Actuarial Outpost
 
Actuarial Outpost > Exams - Please Limit Discussion to Exam-Related Topics > SoA/CAS Preliminary Exams > Exam PA: Predictive Analytics
  #601  
Old 06-18-2019, 02:38 PM
RiskyBusiness7 RiskyBusiness7 is offline
Member
SOA
 
Join Date: Apr 2018
Posts: 53
Default

Quote:
Originally Posted by man_risk View Post
I think that proper data exploration and factor releveling would take too long for the points given for those questions. Due to this, I was strict in time management and did not recombine too many levels or do data exploration that was too in-depth, but rather just combined a couple factors to show I knew how to do it and moved on. I was worried getting bogged down would be too dangerous with this exam. Hopefully I did enough.
same
  #602  
Old 06-18-2019, 02:51 PM
mushibug mushibug is offline
SOA
 
Join Date: Nov 2018
Favorite beer: Anything from Cascade Brewing
Posts: 17
Default

Quote:
Originally Posted by justinmichaelknox View Post
I also logged the Thursday target because it was obviously right-skewed. One thing I wasn't sure about was which family/link then makes sense (aside from Gaussian/identity, i.e., OLS).
What I am still not clear on is whether having a right-skewed target variable is actually bad. If we use a GLM with a Gamma distribution, isn't it in fact good if the target variable is right-skewed, since Gamma distributions are always right-skewed to some degree anyway (same with the Inverse Gaussian)?
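
For reference, both families have closed-form skewness that is strictly positive, which is the point being made here. A minimal sketch (the function names are made up for illustration; "shape" is the Gamma shape parameter k and the Inverse Gaussian lambda):

```python
import math

def gamma_skewness(shape):
    """Skewness of a Gamma distribution with shape parameter k: 2 / sqrt(k)."""
    return 2.0 / math.sqrt(shape)

def inverse_gaussian_skewness(mean, shape):
    """Skewness of an Inverse Gaussian with mean mu and shape lambda: 3 * sqrt(mu / lambda)."""
    return 3.0 * math.sqrt(mean / shape)

# Skewness shrinks as the shape parameter grows, but never reaches zero.
for k in (1, 4, 25, 100):
    print(k, gamma_skewness(k))
```

So a right-skewed target is consistent with either family; how much skew each one implies depends on the fitted shape/dispersion.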
__________________
ASA: P FM MFE MLC C IA FA PA APC
  #603  
Old 06-18-2019, 02:52 PM
ActuariallyDecentAtBest ActuariallyDecentAtBest is offline
Member
SOA
 
Join Date: Dec 2016
Posts: 333
Default

Quote:
Originally Posted by man_risk View Post
I think that proper data exploration and factor releveling would take too long for the points given for those questions. Due to this, I was strict in time management and did not recombine too many levels or do data exploration that was too in-depth, but rather just combined a couple factors to show I knew how to do it and moved on. I was worried getting bogged down would be too dangerous with this exam. Hopefully I did enough.
Makes sense to me. I commented on some and mentioned that if I had more time I'd explore more.

What did you guys use as your interaction(s)?
  #604  
Old 06-18-2019, 04:05 PM
man_risk man_risk is offline
Member
SOA
 
Join Date: Oct 2017
Studying for FAP Modules
College: Carthage College Alumnus
Favorite beer: New Glarus - Moon Man
Posts: 45
Default

Quote:
Originally Posted by ActuariallyDecentAtBest View Post
Makes sense to me. I commented on some and mentioned that if I had more time I'd explore more.

What did you guys use as your interaction(s)?
For me, similar to the case with combining factor levels, I felt that testing too many interactions would be distracting and not worth the time, so I tested about six interactions (six box plots split by a second factor) and just picked the one that was clearest. In my case, it was the average crash score for work areas vs. non-work areas differing depending on whether the location was an intersection or a non-intersection. The sample of records with work area = yes was pretty small, so in reality I am not sure this would be a very useful interaction, but I inserted some mumbo-jumbo about the increased risk and political/legislative nature of road work and how that made it an important variable to consider. Once again, I was mostly interested in proving I knew how to do it, and hopefully that was enough.
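
The same check can be done numerically instead of by eye: compare the work-area effect at intersections with the effect elsewhere. A minimal sketch, assuming made-up records (the numbers and the tuple layout below are purely illustrative, not the exam data):

```python
from collections import defaultdict

# Hypothetical records: (work_area, intersection, crash_score). Made-up values.
records = [
    ("yes", "yes", 9.0), ("yes", "yes", 11.0),
    ("yes", "no", 6.0), ("yes", "no", 6.5),
    ("no", "yes", 6.0), ("no", "yes", 7.0),
    ("no", "no", 5.5), ("no", "no", 6.0),
]

def cell_stats(rows):
    """Return {(factor_a, factor_b): (mean score, record count)} for each cell."""
    totals = defaultdict(lambda: [0.0, 0])
    for a, b, score in rows:
        totals[(a, b)][0] += score
        totals[(a, b)][1] += 1
    return {cell: (s / n, n) for cell, (s, n) in totals.items()}

stats = cell_stats(records)

# Work-area effect within each level of the intersection factor.
effect_at_intersection = stats[("yes", "yes")][0] - stats[("no", "yes")][0]
effect_elsewhere = stats[("yes", "no")][0] - stats[("no", "no")][0]

# If there were no interaction, the two effects would be roughly equal.
interaction_contrast = effect_at_intersection - effect_elsewhere
print(effect_at_intersection, effect_elsewhere, interaction_contrast)
```

The cell counts are returned alongside the means because a large contrast built on a handful of records, as with work area = yes here, is weak evidence.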
__________________
------------------
FIN ECON STAT IA FA APC
P FM SRM IFM STAM LTAM PA
  #605  
Old 06-18-2019, 04:46 PM
Gettin Lucky In Kentucky Gettin Lucky In Kentucky is offline
Member
SOA
 
Join Date: Apr 2018
Location: Louisville Ky
Studying for Specialty
College: Eastern Kentucky University Graduate
Posts: 32
Default

Is it August 30th yet?
  #606  
Old 06-18-2019, 05:49 PM
RannowA RannowA is offline
Member
CAS SOA
 
Join Date: Aug 2014
College: Iowa State University Graduate
Posts: 585
Default

Quote:
Originally Posted by Gettin Lucky In Kentucky View Post
Is it August 30th yet?
Yes.
  #607  
Old 06-18-2019, 06:47 PM
DjPim DjPim is offline
Member
SOA
 
Join Date: Nov 2015
Location: SoCal
Posts: 699
Default

Quote:
Originally Posted by mushibug View Post
What I am still not clear on is whether having a right-skewed target variable is actually bad. If we use a GLM with a Gamma distribution, isn't it in fact good if the target variable is right-skewed, since Gamma distributions are always right-skewed to some degree anyway (same with the Inverse Gaussian)?
That's my understanding of it. I also logged the response variable as a knee-jerk reaction on my exam, and halfway through, when picking family/link, I realized I didn't need to for creating a model. I commented on that a good amount though, so hopefully my grader is understanding. I played it off as "I transformed the variable so EDA / outliers / other interactions would be easier to see when creating boxplots etc., and my choice of family/link changes depending on whether I log the response first."
__________________
Quote:
Originally Posted by Dr T Non-Fan View Post
"Cali" SMH.
  #608  
Old 06-18-2019, 09:48 PM
kimjongfun kimjongfun is offline
Member
SOA
 
Join Date: Jan 2015
Posts: 870
Default

Quote:
Originally Posted by RiskyBusiness7 View Post
same
I wasted 1/3 of my time on 1 and 2 and I think I failed for that exact reason...
  #609  
Old 06-19-2019, 11:31 AM
WhatsAnActuary8282 WhatsAnActuary8282 is offline
Member
SOA
 
Join Date: Oct 2017
Studying for PA
College: University of Kentucky Alumni
Posts: 37
Default

The interaction I chose was Traffic_Control * Rd_Configuration. This seemed to make logical sense, so I created the boxplot and saw significant differences in crash_score. When I ran my GLM (Poisson with a log link) it showed my interaction as being very significant (***), so that's always nice to see. Also, I believe the question said something about the interaction needing to make sense? I think the preloaded code had something that tried to interact driveways with US highways, which makes zero sense. So I used that logic to explain why I chose my interaction.

I only tested 2 interactions in total before going with mine. It made sense and there was significant data so I hope that was enough.
__________________
P FM MFE LTAM C PA
Econ VEE Fin VEE Stat VEE
FAP ILA
  #610  
Old 06-19-2019, 01:11 PM
midwesterner midwesterner is offline
SOA
 
Join Date: Apr 2019
College: UVA
Posts: 21
Default

Two things:

1. It's fine to have a right-skewed variable, and it's really your link function that relates to the skew of your variable, not your family distribution. I.e., instead of logging your target variable, you can use a log link. Then:

E[Y | x] = e^{XB}


Then the fitted means e^{XB} will naturally be right-skewed across observations if your XB is roughly normally distributed.

This is similar to (but not exactly the same as) logging your target and then using the identity link, which implies

E[ln(Y) | x] = XB.

Again, not the same, but they are similar in terms of distribution and nonconstant variance.
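
The reason the two setups are not the same is Jensen's inequality: ln E[Y] >= E[ln Y], so exponentiating a prediction made on the log scale understates the mean of Y. A minimal sketch using simulated lognormal data (mu = 1.0, sigma = 0.8 and the seed are illustrative assumptions, not exam values):

```python
import math
import random

random.seed(0)

# Simulated right-skewed target with ln(Y) ~ Normal(mu, sigma). Assumed parameters.
mu, sigma = 1.0, 0.8
y = [math.exp(random.gauss(mu, sigma)) for _ in range(10_000)]

sample_mean = sum(y) / len(y)

# Quantity targeted by an identity link on ln(Y): E[ln Y]
mean_of_logs = sum(math.log(v) for v in y) / len(y)

# Quantity targeted by a log link on Y: ln E[Y]
log_of_mean = math.log(sample_mean)

# Back-transforming the log-scale mean gives the geometric mean, which sits
# below the arithmetic mean for any non-constant positive sample.
naive_back_transform = math.exp(mean_of_logs)

print("E[ln Y] estimate:", mean_of_logs)
print("ln E[Y] estimate:", log_of_mean)
print("exp(E[ln Y]) vs mean(Y):", naive_back_transform, sample_mean)
```

For lognormal data the gap between the two log-scale quantities is about sigma^2 / 2, which is why predictions from a logged target usually need a bias correction before being read as means.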

Your family selection mainly sets the assumed variance of Y around its mean, so it drives your AIC calculation, which is trying to capture how "likely" your residuals are if Y follows a certain distribution around its mean. It also changes how heavily each observation is weighted in the fit, so with the same link function the predictions are usually close across families and are identical when the model only uses fully crossed binarized features, but they are not guaranteed to match in general. For this reason, I wouldn't recommend logging the y-variable. But I'm just one person; if you choose your link and family carefully after logging and explain why, that's fine. IMO the justification of "It's right-skewed, therefore log it" would not be sufficient on the exam if I were grading.
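
A rough way to see how much the family matters for predictions under a shared log link is to fit the same data with two variance functions. The sketch below is a hand-rolled IRLS fit for a single predictor, not the exam's R code; the function name, the tiny dataset, and the 0/1 coding are all made up for illustration:

```python
import math

def fit_log_link(x, y, variance_power, n_iter=100):
    """Fit E[Y|x] = exp(b0 + b1*x) by IRLS, assuming Var(Y) is proportional to mu**variance_power.

    variance_power=1 is the Poisson variance function, 2 is the Gamma one.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        # Log link: weight = mu^2 / V(mu), working response = eta + (y - mu) / mu
        w = [m ** (2 - variance_power) for m in mu]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        # Weighted least squares of z on (1, x), solved via 2x2 normal equations.
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx * swx
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
    return b0, b1

y = [1.0, 3.0, 2.0, 8.0, 5.0, 20.0]      # made-up positive target
x_binary = [0, 0, 0, 1, 1, 1]            # binarized feature
x_numeric = [0, 1, 2, 3, 4, 5]           # numeric feature

# Binarized feature: both families reproduce the two group means.
pois_bin = fit_log_link(x_binary, y, variance_power=1)
gamma_bin = fit_log_link(x_binary, y, variance_power=2)

# Numeric feature: the different weights lead to different coefficients.
pois_num = fit_log_link(x_numeric, y, variance_power=1)
gamma_num = fit_log_link(x_numeric, y, variance_power=2)

print("binary  :", pois_bin, gamma_bin)
print("numeric :", pois_num, gamma_num)
```

With only a binarized feature the model is saturated, so the fitted values are the group means whichever family is used; with a numeric feature the Poisson fit leans more on the large observations than the Gamma fit does.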


2. I chose US HWY and Concrete Road. Typically STATE HWY had the highest mean, but on Concrete Roads there was a big jump for US HWY. It ended up in my final model as the highest impact for a categorical variable. I only chose these two levels of the two features because I binarized my data for the final model, and this was the only level of the interaction that jumped out.
All times are GMT -4.

