



#221




Searched a bunch of other previews of the book and although probit is mentioned several times in snippets, none of the results were very helpful.

#222




Find a dead lizard, soak it in sambuca overnight, and then just keep slapping yourself in the face with it until this interest fades. I'd wager the experience will be more pleasant than reading through the modules...

#224




Quote:
SSE is the sum of the squared errors.

MSE is essentially the variance of the errors. (Note that the variance formula squares each error, which penalizes errors more when they are large and less when they are small, since squaring shrinks values below 1. The sum is then divided by n − 1.)

RMSE is essentially the standard deviation of the errors. (The square root is what makes standard deviation easier to understand, because it roughly undoes the earlier squaring.)

Out of the three I would use RMSE, because it is the most interpretable: people in general understand standard deviation a little bit. You could even describe RMSE as roughly the expected size of the error. SSE only makes sense for comparing models fit to the same sample, since it depends on the number of observations used. (More observations means more terms in the sum, which means a bigger number.)

Log-likelihood is better to use for a skewed distribution like the Poisson, but it is less interpretable than RMSE: it is the log of the probability of your dataset under the distribution you are assuming. A higher number is better, but the number itself doesn't mean much.

For classification models you can use accuracy, error rate, AUC, and AIC (and some additional measures like BIC, but I wouldn't worry about those). AUC is hard to sum up briefly, but there should be a lot of resources on it since it does get used in the real world. Here is a video that explains AIC: https://www.youtube.com/watch?v=LkifE44myLc

For classification models I like to use accuracy on the training set, accuracy on the testing set, and then the difference between the two as a measure of the stability of the model. (Note: I scored a 10 on the classification section of the December exam using this method, but with log-likelihood.)
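To make the three error measures above concrete, here is a minimal R sketch with made-up actual and predicted values (the numbers are illustrative only, not from the modules; this version divides by n for MSE, while the variance analogy in the post divides by n − 1):

```r
# Toy actual vs. predicted values (illustrative only)
actual    <- c(2.0, 3.5, 1.0, 4.2, 5.1)
predicted <- c(2.3, 3.0, 1.4, 4.0, 5.5)

errors <- actual - predicted

SSE  <- sum(errors^2)         # sum of squared errors; grows with sample size
MSE  <- SSE / length(errors)  # mean squared error (variance-like; n - 1 in the post's analogy)
RMSE <- sqrt(MSE)             # root mean squared error (SD-like, same units as Y)
```

Note that RMSE comes back in the same units as the target variable, which is why it is the easiest of the three to explain to a non-technical audience.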
Last edited by Josh Peck; 06-11-2019 at 01:39 PM.
#225




I also just realized I left out R^2 and deviance.

R^2 is the variation explained by the model divided by the total variation, so it can be thought of as the % of variation explained by the model. For a linear model with one predictor this is equivalent to the squared correlation between X and Y, which is where it got its name. Note that adding any additional predictor to the model will increase R^2 by at least a tiny amount, so take this into account when you are comparing models with different numbers of predictors. Also note that adjusted R^2 attempts to fix this issue. If you want more details on R^2, I'm sure they are very easy to find.

Deviance is −2L, where L is the log-likelihood. Because of the negative sign, we want to minimize it (for the same reason we want to maximize log-likelihood). Note that the null deviance is the deviance of the null model, which simply uses the sample mean for all predictions.

More info on deviance: https://bookdown.org/egarpor/SSS2UC...deviance.html

If anyone can think of other methods I am leaving out, please add them.
Last edited by Josh Peck; 05-22-2019 at 01:51 PM.
#226




Has anyone else gotten stuck on the Student Success Practice Exam Decision Tree portion? I'm trying to run the code provided and I get errors. The code in question is:
library(rpart)
library(rpart.plot)
set.seed(123)
excluded_variables <- c("G3") # List excluded variables
dt <- rpart(G3.Pass.Flag ~ .,
            data = Train.DS[, !(names(Full.DS) %in% excluded_variables)],
            control = rpart.control(minbucket = 5, cp = .001, maxdepth = 20),
            parms = list(split = "gini"))
rpart.plot(dt)

Error in `[.data.frame`(Train.DS, , !(names(Full.DS) %in% excluded_variables)) :
  undefined columns selected

Does anyone know what's going on?
#227




Quote:
Then check your dataset Train.DS: does it have a variable named G3? (Easy to check in the Global Environment pane, or by typing names(Train.DS).)
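One quick way to run that check in code: compare the two data frames' column names directly. The toy Full.DS and Train.DS below are hypothetical stand-ins built only to mimic the symptom; swap in the real data frames from the project:

```r
# Toy stand-ins for Full.DS / Train.DS (hypothetical; use the exam's real data)
Full.DS  <- data.frame(G1 = 1, G2 = 2, G3 = 3, G3.Pass.Flag = "P")
Train.DS <- data.frame(G1 = 1, G2 = 2, G3.Pass.Flag = "P")  # G3 already dropped

# Any columns Full.DS has that Train.DS lacks will trigger
# "undefined columns selected" when Full.DS's names are used to index Train.DS
setdiff(names(Full.DS), names(Train.DS))
"G3" %in% names(Train.DS)
```

If `setdiff` returns anything, the logical index built from `names(Full.DS)` is the wrong length for `Train.DS`, which is exactly the error in the posted code.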
#228




Quote:
I just figured out that if I delete the entire section about excluded_variables and run the code with just data = Train.DS, it works, but there's only one node! It spits out G3 as P or F... which isn't too helpful!
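For what it's worth, the likely fix is to index Train.DS by its own column names instead of Full.DS's, so G3 still gets excluded and the tree can't lean on it. A sketch, using a toy stand-in for Train.DS (hypothetical data; substitute the exam's real training set):

```r
library(rpart)

# Toy stand-in for Train.DS (hypothetical; substitute the real training data)
set.seed(123)
n <- 200
Train.DS <- data.frame(
  G1 = sample(0:20, n, replace = TRUE),
  G2 = sample(0:20, n, replace = TRUE),
  G3 = sample(0:20, n, replace = TRUE)
)
Train.DS$G3.Pass.Flag <- factor(ifelse(Train.DS$G3 >= 10, "P", "F"))

excluded_variables <- c("G3")  # drop the raw grade so the tree can't split on it

# Key change: index Train.DS by its OWN names, not Full.DS's
dt <- rpart(G3.Pass.Flag ~ .,
            data = Train.DS[, !(names(Train.DS) %in% excluded_variables)],
            control = rpart.control(minbucket = 5, cp = 0.001, maxdepth = 20),
            parms = list(split = "gini"))
# rpart.plot::rpart.plot(dt)  # plot it if rpart.plot is installed
```

With G3 genuinely excluded, the tree has to find structure in the remaining predictors rather than trivially reading the pass/fail flag off the raw grade.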

