Actuarial Outpost June 2019 Exam PA

#241
05-26-2019, 05:37 AM
 Inactuary SOA Join Date: May 2019 Posts: 2

Quote:
 Originally Posted by DjPim There's a thread for this project that might prove useful. Most of these are answered there, but I'll try to summarize:
1. Base level refers to the level of the factor with the most observations. To binarize 'Race', we create 4 indicator variables, one for each race. The problem is that these 4 variables are perfectly correlated; their sum is always 1. To get around this multicollinearity, we remove the base level. We recognize its meaning as 'if the other 3 indicators are 0, then the observation is the base level.'
2. When doing variable selection with backward stepAIC, it calculates the AIC of the model with all variables, then the AIC of the model with each single variable removed in turn, then decides whether to remove a variable and which one. If Race were one variable with 4 levels, this process could only ask 'is it significant to include the Race variable with all the races?' whereas if we split it up, it can instead ask 'is it significant to include a distinction for White race specifically, or are we fine just saying Black, Hispanic, and Other?' (an example of being able to remove just 1 level of the variable and not the whole thing).
3. The factor variables were split into binary indicators, so if we have RaceWhite, RaceBlack, RaceHispanic, RaceOther, then we don't also need the original Race. However, Gender is already binary. It doesn't matter if we call the levels M/F or 0/1. Therefore, there's nothing to binarize/add/remove; it's fine as-is.
4. I don't quite remember the part you're referring to, I'd have to look at it again, but maybe the answers to 1-3 help with this question?
Thank you! It makes a lot of sense!
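To make the base-level idea concrete, here's a minimal sketch; the 'Race' data below is made up purely for illustration:

```r
# Hypothetical 'Race' factor; model.matrix() builds the indicator columns
# and automatically drops one level (by default the first alphabetically).
race <- factor(c("White", "Black", "Hispanic", "Other", "White"))
race <- relevel(race, ref = "White")   # make the most-observed level the base
mm <- model.matrix(~ race)             # intercept + 3 indicators, base omitted
colnames(mm)
# A row with all three indicators equal to 0 is the base level (White).
```

Note R's default base level is just the first level alphabetically; relevel() is how you force the most-observed level to be the base, as the study material recommends.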
#242
05-26-2019, 01:07 PM
 ActuariallyDecentAtBest Member SOA Join Date: Dec 2016 Posts: 383

Haven't gotten started on this sample project yet, still haven't even finished the freaking material.

How hard is the sample exam? Is it very doable?? Seems like we have to memorize more code...
#243
05-27-2019, 01:51 AM
 noone Member SOA Join Date: Feb 2017 Posts: 138
Partial Dependence Plots

Can someone please provide a layman's description of Partial Dependence Plots: what are they good for, and how do you interpret them? Thanks!
#244
05-27-2019, 10:00 AM
 LyActuary Member SOA Join Date: Sep 2017 Location: Rochester, NY College: University of Rochester Posts: 102

Quote:
 Originally Posted by jdman929 Has anyone else gotten stuck on the Student Success Practice Exam Decision Tree portion? I'm trying to run the code provided and I get errors. The code in question is:
library(rpart)
library(rpart.plot)
set.seed(123)
excluded_variables <- c("G3") # List excluded variables
dt <- rpart(G3.Pass.Flag ~ .,
            data = Train.DS[, !(names(Full.DS) %in% excluded_variables)],
            control = rpart.control(minbucket = 5, cp = .001, maxdepth = 20),
            parms = list(split = "gini"))
rpart.plot(dt)
Error in `[.data.frame`(Train.DS, , !(names(Full.DS) %in% excluded_variables)) : undefined columns selected
Does anyone know what's going on?
I think you need to change the !(names(Full.DS) %in% excluded_variables) to !(names(Train.DS) %in% excluded_variables)
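With that change, the chunk runs. A self-contained sketch with a toy data frame standing in for Train.DS (the predictor columns here are made up, except G3 and G3.Pass.Flag):

```r
library(rpart)

# Toy stand-in for Train.DS, just to show the corrected indexing
set.seed(123)
Train.DS <- data.frame(G3 = rnorm(100),
                       G3.Pass.Flag = factor(sample(c("Fail", "Pass"), 100, replace = TRUE)),
                       studytime = sample(1:4, 100, replace = TRUE),
                       absences = rpois(100, 3))
excluded_variables <- c("G3")  # List excluded variables

# Key fix: subset Train.DS by its own column names, not Full.DS's
dt <- rpart(G3.Pass.Flag ~ .,
            data = Train.DS[, !(names(Train.DS) %in% excluded_variables)],
            control = rpart.control(minbucket = 5, cp = .001, maxdepth = 20),
            parms = list(split = "gini"))
class(dt)
```

The original error happens because Full.DS and Train.DS don't have identical columns, so the logical index built from names(Full.DS) points at columns Train.DS doesn't have.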
#245
05-27-2019, 10:53 AM
 Josh Peck Member SOA Join Date: Dec 2016 College: Towson University Posts: 99

Quote:
 Originally Posted by noone Can someone please provide a laymans description/interpretation of Partial Dependence Plots and what they are good for and how to interpret? Thanks!
It would be like if you took the variable in question, broke it up into a bunch of buckets, and then predicted the target variable for each of those buckets.

Thus, it shows the model's average prediction of the target across the different values of that predictor, with the other predictors held at their observed values.

Because these plots take a lot of time to run, I think they would crash the prometric computer and will not be tested on.
However, you should still understand the idea so you can explain it if they ask how you could benefit from running it if you had more time.
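A hand-rolled version of that idea, on made-up data (variable names here are hypothetical, not from the project):

```r
# Made-up data and a simple logistic fit, just to illustrate the mechanics
set.seed(1)
df <- data.frame(age = runif(200, 20, 70), other = rnorm(200))
df$y <- rbinom(200, 1, plogis(-3 + 0.05 * df$age + 0.3 * df$other))
fit <- glm(y ~ age + other, data = df, family = binomial)

grid <- seq(20, 70, by = 10)   # the "buckets" of the predictor
pd <- sapply(grid, function(a) {
  tmp <- df
  tmp$age <- a                 # set age to this value for every observation
  mean(predict(fit, newdata = tmp, type = "response"))  # average prediction
})
# plot(grid, pd, type = "b")   # the partial dependence curve for age
```

Each point on the curve answers: "if everyone in the data had this age, what would the model predict on average?"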
__________________
P FM MFE C PA

#246
05-27-2019, 04:58 PM
 noone Member SOA Join Date: Feb 2017 Posts: 138

           Reference
Prediction  Bad Good
      Bad    79   35
      Good  221  665

The above is a confusion matrix for rmd 7.3 chunk 21.

According to the results, sensitivity is .2633 and specificity is .95. Sensitivity is the proportion of true positive predictions among all positive cases, so that would be 665/(665+35) = .95. And specificity is TN/(TN+FP) = 79/(79+221) = .2633. It looks like the code switched them up, but I don't see how that could happen. Any thoughts here?
#247
05-28-2019, 10:06 AM
 rstein SOA Join Date: Jan 2019 Posts: 12

In the exam solution it says that GLMs "cannot capture non-linear relationships", which confuses me, since they do model non-normal distributions. I know I must be mixing up two things, but can someone explain the difference, or explain what it means to "not capture non-linear relationships"?

Thanks!
#248
05-28-2019, 10:55 AM
 TranceBrah Member SOA Join Date: Mar 2014 Location: Best Coast Posts: 238

Quote:
 Originally Posted by rstein In the exam solution it says that GLMs "cannot capture non-linear relationships" which is confusing me with the fact that they do model non-normal distributions. I know I must be mixing up two things but can someone explain the difference to me, or explain what it means to "not capture non-linear relationships". Thanks!
Normal linear regression models the expected value of a continuous variable Y as a linear function of a continuous predictor X: E(Yi) = β0 + β1xi.

GLM does NOT assume a linear relationship between the dependent variable and the independent variables, but it does assume a linear relationship between the transformed response (in terms of the link function) and the explanatory variables; e.g., for binary logistic regression, logit(p) = log(p/(1-p)) = β0 + βX.
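A quick simulated check of that statement (data made up for illustration):

```r
# Simulate data where logit(p) is exactly linear in x, then fit a GLM
set.seed(42)
x <- rnorm(500)
p <- plogis(-1 + 2 * x)            # i.e., log(p/(1-p)) = -1 + 2x
y <- rbinom(500, 1, p)
fit <- glm(y ~ x, family = binomial)
coef(fit)                          # roughly (-1, 2)

# On the link scale the fit is a straight line in x;
# on the response (probability) scale it is the S-shaped logistic curve.
eta  <- predict(fit, type = "link")      # linear in x
phat <- predict(fit, type = "response")  # plogis(eta), non-linear in x
```

So "linear" refers to linearity in the coefficients on the link scale, and it has nothing to do with the response distribution being normal or not.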
#249
05-28-2019, 11:17 AM
 Josh Peck Member SOA Join Date: Dec 2016 College: Towson University Posts: 99

Quote:
 Originally Posted by noone Reference Prediction Bad Good Bad 79 35 Good 221 665 The above is a confusion matrix for rmd 7.3 chunk 21. According to the results, sensitivity is .2633 and specificity is .95. Sensitivity is the proportion of true positive predictions among all positive cases. So that would be 665/(665+35) =.95. And specificity is TN/(TN+FP) = 79/(79+221) =.2633. It looks like the code switched them up but i don't see how that could happen. Any thoughts here?
Bad is mapped to 1 (TRUE)

You can see this if you run the following chunk
```{r}
str(credit$Credit)
str(as.factor(credit$Credit))
```

It makes no difference whether you use sensitivity or specificity as long as you know how to interpret them, which it seems that you do. It simply depends on which factor level is labeled as TRUE (i.e., treated as the positive class).
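Working the thread's numbers by hand shows how the two statistics swap when the positive class flips (counts taken from the confusion matrix quoted above):

```r
# Confusion-matrix counts from the thread, with "Bad" as the positive class
tp <- 79;  fn <- 221   # actual Bad: predicted Bad / predicted Good
tn <- 665; fp <- 35    # actual Good: predicted Good / predicted Bad

sens_bad <- tp / (tp + fn)   # 79/300  = .2633, the reported sensitivity
spec_bad <- tn / (tn + fp)   # 665/700 = .95,   the reported specificity

# Flip the positive class to "Good" and the two statistics trade places
sens_good <- spec_bad        # .95
spec_good <- sens_bad        # .2633
```

So the code didn't switch anything; it just treated "Bad" as the positive class, while the manual calculation treated "Good" as positive.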
#250
05-29-2019, 02:15 AM
 noone Member SOA Join Date: Feb 2017 Posts: 138

Quote:
 Originally Posted by Josh Peck It would be like if you took the variable in question and broke it up into a bunch of buckets Then predicted the target variable for each of those buckets Thus, it shows how the target variable is predicted on average for all the different values of that predictor. Because these plots take a lot of time to run, I think they would crash the prometric computer and will not be tested on. However, you should still understand the idea so you can explain it if they ask how you could benefit from running it if you had more time.
Thanks. So the examples in module 7 use issage (a predictor) on the x-axis and yhat on the y-axis. What does yhat mean? The target variable is claim count and is either C (actual_cnt = 0) or N (actual_cnt >= 1).