



#601





#602




What I am still not clear on is whether having a right-skewed target variable is actually bad. If we use a GLM with a Gamma distribution, isn't it actually good for the target variable to be right skewed, since Gamma distributions are usually right skewed to some degree anyway (the same goes for the Inverse Gaussian)?
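For reference, both families mentioned here are right skewed for every parameter value. These are the standard skewness formulas (Gamma with shape k and scale θ; Inverse Gaussian with mean μ and shape λ):

$$
\text{Gamma}(k,\theta):\ \gamma_1 = \frac{2}{\sqrt{k}} > 0,
\qquad
\text{Inverse Gaussian}(\mu,\lambda):\ \gamma_1 = 3\sqrt{\frac{\mu}{\lambda}} > 0 .
$$

So the Gamma skew shrinks as the shape k grows (it gets closer to symmetric), but it never becomes zero or negative.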
#603




Quote:
What did you guys use as your interaction(s)? 
#604




For me, similar to the case with combining factor levels, I felt that testing too many interactions would be distracting and not worth the time, so I tested about six interactions (six box plots split by a second factor) and just picked the one that was clearest. In my case, it was the average crash score for work areas vs. non-work areas differing depending on whether it was an intersection or a non-intersection. The sample of records with work area = yes was pretty small, so in reality I am not sure this would be a horribly useful interaction, but I inserted some mumbo jumbo about the increased risk and the political/legislative nature of road work and how that made it an important variable to consider. Once again, I was mostly interested in proving I knew how to do it, and hopefully that was enough.
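The split box plot check described above boils down to comparing cell means: does the work-area effect on crash score change between intersections and non-intersections? A minimal sketch in plain Python, assuming hypothetical toy records (not the exam data) and a hypothetical "small cell" threshold of fewer than 2 records:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records (work_area, intersection, crash_score); NOT the exam data.
records = [
    ("No", "Intersection", 4.0),
    ("No", "Intersection", 6.0),
    ("No", "Non-intersection", 2.0),
    ("No", "Non-intersection", 4.0),
    ("Yes", "Intersection", 9.0),  # only one record: a thin cell, like work area = yes
    ("Yes", "Non-intersection", 3.0),
    ("Yes", "Non-intersection", 5.0),
]


def cell_summary(rows):
    """Return {(work_area, intersection): (record_count, mean_crash_score)}."""
    cells = defaultdict(list)
    for work_area, intersection, score in rows:
        cells[(work_area, intersection)].append(score)
    return {key: (len(scores), mean(scores)) for key, scores in cells.items()}


def interaction_contrast(summary):
    """Difference-in-differences of cell means on the raw crash-score scale.

    Measures how much the work-area effect at intersections differs from the
    work-area effect elsewhere; 0 means the lines in the split plot are parallel.
    """
    m = {key: avg for key, (_, avg) in summary.items()}
    effect_at_intersection = m[("Yes", "Intersection")] - m[("No", "Intersection")]
    effect_elsewhere = m[("Yes", "Non-intersection")] - m[("No", "Non-intersection")]
    return effect_at_intersection - effect_elsewhere


summary = cell_summary(records)
for key, (n, avg) in sorted(summary.items()):
    flag = "  <- small cell" if n < 2 else ""  # hypothetical threshold
    print(f"{key}: n={n}, mean={avg:.2f}{flag}")
print("interaction contrast:", interaction_contrast(summary))
```

The contrast is on the additive (raw mean) scale, while a log-link GLM measures interactions on a multiplicative scale, so this is only a quick screen. Printing the cell counts alongside the means makes the small-sample concern visible immediately.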
#607





#609




The interaction I chose was Traffic_Control * Rd_Configuration. This seemed to make logical sense, so I created the boxplot and saw clear differences in crash_score. When I ran my GLM (Poisson with log link), it showed my interaction as being very significant (***), so that's always nice to see. Also, I believe the question said something about the interaction needing to make sense? I think the preloaded code had something that tried to interact driveways with US highways, which makes zero sense. So I used that logic to explain why I chose my interaction.
I only tested 2 interactions in total before going with mine. It made sense and there was plenty of data, so I hope that was sufficient.
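In R, a formula term like `Traffic_Control * Rd_Configuration` expands to both main effects plus their `:` interaction. A minimal sketch of what that adds to the design matrix under treatment coding (first level of each factor as the reference), assuming hypothetical, simplified level lists rather than the dataset's actual levels:

```python
from itertools import product

# Hypothetical, simplified level lists; the first level of each factor is the reference.
TRAFFIC_CONTROL = ["NONE", "SIGNAL", "STOP-SIGN"]
RD_CONFIGURATION = ["TWO-WAY-NO-MEDIAN", "ONE-WAY", "TWO-WAY-PROTECTED-MEDIAN"]


def column_names(a_name, a_levels, b_name, b_levels):
    """R-style labels for y ~ A * B: intercept, main-effect dummies, then products."""
    a_cols = [f"{a_name}{lvl}" for lvl in a_levels[1:]]
    b_cols = [f"{b_name}{lvl}" for lvl in b_levels[1:]]
    inter = [f"{a}:{b}" for a, b in product(a_cols, b_cols)]
    return ["(Intercept)"] + a_cols + b_cols + inter


def design_row(a, b, a_levels, b_levels):
    """Treatment-coded design-matrix row for one record under y ~ A * B."""
    a_dummies = [1.0 if a == lvl else 0.0 for lvl in a_levels[1:]]
    b_dummies = [1.0 if b == lvl else 0.0 for lvl in b_levels[1:]]
    # Each interaction column is the product of one A dummy and one B dummy,
    # so it equals 1 only for records in that specific (A level, B level) cell.
    inter = [da * db for da, db in product(a_dummies, b_dummies)]
    return [1.0] + a_dummies + b_dummies + inter


names = column_names("Traffic_Control", TRAFFIC_CONTROL, "Rd_Configuration", RD_CONFIGURATION)
row = design_row("SIGNAL", "ONE-WAY", TRAFFIC_CONTROL, RD_CONFIGURATION)
for name, value in zip(names, row):
    print(f"{name:50s} {value}")
```

With k and m levels, the interaction adds (k - 1)(m - 1) columns on top of the main effects, which is one reason to keep the number of tested interactions small. Binarizing a single level pair instead keeps just one product column.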

#610




Two things:
1. It's fine to have a right-skewed target variable, and it's really your link function that relates to the skew of your variable, not your family distribution. I.e., instead of logging your target variable, you can use a log link. Then E[Y | x] = e^{XB}, and Y will naturally be right skewed if your XB is normally distributed. This is similar (but not exactly the same) to logging your target and then using the identity link, which implies E[ln(Y) | x] = XB. Again, not the same, but they are similar in terms of distribution and non-constant variance. Your family selection sets the assumed distribution of Y around its mean, so it drives your AIC calculation (AIC is trying to capture how "likely" your residuals are if Y follows that distribution), and it also changes the standard errors and how much weight each record gets when the coefficients are fitted. With the same link function, predictions from different families are often close, but they are not identical in general; they match exactly only in special cases such as a saturated model (e.g. a single factor as the only predictor, where every family reproduces the group means). Overall, I wouldn't recommend logging the y-variable. But I'm just one person; if you choose your link and family carefully after logging and explain why, that's fine. But IMO the justification "It's right skewed, therefore log it" would not be sufficient on the exam if I were grading.
2. I chose US HWY and Concrete Road. Typically STATE HWY had the highest mean, but on concrete roads there was a big jump for US HWY. It ended up in my final model as the categorical variable with the highest impact. I only chose these two levels of the two features because I binarized my data for the final model, and this was the only level of the interaction that jumped out.
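Point 1 contrasts logging the target with using a log link, and it also touches on what the family choice does. A minimal numpy sketch of both ideas, under stated assumptions: part 1 simulates a hypothetical lognormal target (ln Y ~ Normal(1.0, 0.8²)); part 2 fits a log-link GLM by Fisher scoring (IRLS) with variance function V(μ) = μ^p, where p = 1 gives Poisson point estimates and p = 2 gives Gamma ones, on hypothetical toy data. The helper `fit_log_link_glm` is written for this illustration and is not a library function.

```python
import numpy as np

# Part 1: log link vs. logging the target.
# Assumption for illustration: ln(Y) ~ Normal(1.0, 0.8**2), so Y is lognormal (right skewed).
rng = np.random.default_rng(0)
y = np.exp(rng.normal(loc=1.0, scale=0.8, size=200_000))

arith_mean = y.mean()                        # E[Y]: what an intercept-only log-link GLM fits
back_transformed = np.exp(np.log(y).mean())  # exp(E[ln Y]): naive back-transform after logging y
print(f"E[Y] ~ {arith_mean:.3f}  vs  exp(E[ln Y]) ~ {back_transformed:.3f}")
# Theory for this lognormal: E[Y] = exp(1.0 + 0.8**2 / 2), exp(E[ln Y]) = exp(1.0).


# Part 2: same log link, different family.
def fit_log_link_glm(X, y, var_power, max_iter=100, tol=1e-10):
    """Fisher scoring (IRLS) for a log-link GLM with V(mu) = mu**var_power.

    var_power=1 gives the Poisson coefficients, var_power=2 the Gamma ones
    (the dispersion parameter does not affect the coefficients). Requires y > 0
    here because the starting values use log(y).
    """
    mu = y.astype(float)
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu    # working response; d(eta)/d(mu) = 1/mu for the log link
        w = mu ** (2 - var_power)  # working weights: (d mu / d eta)^2 / V(mu) = mu^2 / mu^p
        XtW = X.T * w              # X^T W without building the diagonal matrix
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        mu = np.exp(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


# Hypothetical toy data with one numeric predictor (not from any real dataset).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y_small = np.array([1.0, 3.0, 2.0, 6.0, 9.0, 30.0])
X = np.column_stack([np.ones_like(x), x])

beta_pois = fit_log_link_glm(X, y_small, var_power=1)
beta_gamma = fit_log_link_glm(X, y_small, var_power=2)
print("Poisson slope:", beta_pois[1], " Gamma slope:", beta_gamma[1])

# Saturated case: a single factor with three groups, so both families return the group means.
groups = np.array([0, 0, 1, 1, 2, 2])
Xg = np.column_stack([np.ones(6), groups == 1, groups == 2]).astype(float)
cell_means = np.array([2.0, 2.0, 4.0, 4.0, 19.5, 19.5])
fit_pois_g = np.exp(Xg @ fit_log_link_glm(Xg, y_small, var_power=1))
fit_gamma_g = np.exp(Xg @ fit_log_link_glm(Xg, y_small, var_power=2))
print("Group fits (Poisson):", fit_pois_g.round(4))
print("Group fits (Gamma):  ", fit_gamma_g.round(4))
```

Part 1 shows why logging and back-transforming targets the geometric mean, which sits below E[Y] for a right-skewed target. Part 2 shows that the family enters the fit through the weights: Poisson weights each record by μ, Gamma weights them equally, so the slopes differ on the numeric-predictor data, while the saturated one-factor model gives the same group means under both.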

