

#1




Hyperparameters to keep in mind?
AFAIK the models covered in the modules are:
1. GLM
2. CART/decision trees
3. Random Forest
4. Gradient Boosted Machines
5. Penalized regression, i.e. lasso, ridge, elastic net
6. PCA
7. KMeans clustering

For these, are there any general hyperparameter values I'd best be aware of? So far I only know that the nstart parameter for kmeans should generally be over 20, and that regularized regression has the alpha (mixing) and lambda (shrinkage, i.e. variance reduction) parameters, but that's all, if my memory serves me right.
__________________
Spoiler:  Want to connect on LinkedIn? PM me! 
#2




Quote:
__________________
ASA 
#3




Ah right! I can use hclust to implement that in R too. Hopefully I won't have to read up too much on the cluster dissimilarity measures, e.g. Ward's minimum variance.
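For reference, a minimal sketch of what that could look like with base R's hclust. The iris data, the scaling step, and the choice of k = 3 are assumptions made purely for illustration, not anything taken from the modules.

```r
# Standardize the numeric columns so no single variable dominates the distances.
x <- scale(as.matrix(iris[, 1:4]))

# Pairwise Euclidean distances between observations.
d <- dist(x, method = "euclidean")

# Agglomerative clustering; "ward.D2" is Ward's minimum variance linkage.
# Other linkages accepted by hclust include "complete", "average" and "single".
hc <- hclust(d, method = "ward.D2")

# Cut the dendrogram into k = 3 clusters (k is an illustrative choice).
clusters <- cutree(hc, k = 3)
table(clusters)
```

Swapping the method argument is usually all it takes to compare linkages, so the dissimilarity choice is a one-word change in the code.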
#4




Those and hclustering are it. Don't forget fit measures like:
Adjusted R^2
Mallows’s C_p
AIC/BIC
AUC
Loglikelihood
xerror

Also, offset and weight.
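A rough sketch of how a few of these show up in base R, using a Poisson GLM with an offset. The simulated data and the variable names (claims, exposure, age) are made up for illustration only.

```r
set.seed(123)  # make the simulated data reproducible

# Hypothetical claim-count data: exposure varies by policy.
n        <- 200
exposure <- runif(n, 0.5, 2)
age      <- runif(n, 18, 80)
claims   <- rpois(n, lambda = exposure * exp(-2 + 0.02 * age))
dat      <- data.frame(claims, exposure, age)

# The offset enters the linear predictor with its coefficient fixed at 1,
# so the model describes the claim rate per unit of exposure.
fit <- glm(claims ~ age + offset(log(exposure)),
           family = poisson(link = "log"), data = dat)

ll <- logLik(fit)   # log-likelihood
k  <- attr(ll, "df") # number of estimated parameters (intercept + age)

AIC(fit)  # equals -2 * loglik + 2 * k
BIC(fit)  # equals -2 * loglik + log(n) * k
```

Lower AIC/BIC is better when comparing models fitted to the same data; BIC penalizes extra parameters more heavily once n is moderately large.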
#5




I'm wondering whether there are any hyperparameter ranges one had better keep in mind, though, aside from the one for kmeans...
#6




Quote:
kMeans Clustering, nstart: 20 to 50
Random Forest, mtry: sqrt(p) for classification; p/3 for regression
Gradient Boosting, nrounds: 1,000
Gradient Boosting, eta: 0.01 to 0.20
Tweedie, p: p = 1 is Poisson, p = 2 is gamma, so p between 1 and 2 is most interesting

We'll probably be given some default values and will be asked to tune the parameters incrementally in a way that meets the business needs.
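To make the nstart point concrete, here is a small sketch with base R's kmeans. The iris data and centers = 3 are assumptions for illustration; nstart = 20 is just the low end of the range above.

```r
set.seed(1)  # k-means starts from random centers, so fix the seed

x <- scale(as.matrix(iris[, 1:4]))

# One random start: the result can be a poor local optimum.
km_1 <- kmeans(x, centers = 3, nstart = 1)

# Twenty random starts: kmeans keeps the run with the smallest
# total within-cluster sum of squares.
km_20 <- kmeans(x, centers = 3, nstart = 20)

# Compare the objective values of the two fits.
c(single_start = km_1$tot.withinss, multi_start = km_20$tot.withinss)
```

A larger nstart costs more run time but makes the reported clustering less dependent on a lucky or unlucky initialization.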
#7




Quote:
#8




I think p is the number of predictor variables. To keep the trees in your random forest from being highly correlated, the algorithm picks a random subset of the p predictors (of size sqrt(p) or p/3, depending on whether it's a classification or regression problem) at each node and makes the split based only on that subset. If it didn't do this, all of the trees in your random forest would likely look much the same and make the same splits.
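As a quick illustration of those rules of thumb, the sketch below computes the subset size and draws one random subset of predictors. default_mtry is a hypothetical helper written for this post; the rounding (floor, with a minimum of 1 for regression) follows my understanding of the randomForest package defaults, so treat it as an assumption.

```r
# Hypothetical helper: rule-of-thumb mtry for p predictors.
default_mtry <- function(p, type = c("classification", "regression")) {
  type <- match.arg(type)
  if (type == "classification") {
    floor(sqrt(p))
  } else {
    max(floor(p / 3), 1)
  }
}

p <- 16
default_mtry(p, "classification")  # 4
default_mtry(p, "regression")      # 5

# At each node, only a random subset of predictor indices is considered.
set.seed(7)
candidates <- sample.int(p, default_mtry(p, "classification"))
candidates
```

Setting mtry equal to p would let every split see every predictor, which amounts to bagging rather than a random forest.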

#9




For decision trees, cp (the complexity parameter) is a particularly important hyperparameter, especially since rpart() has a built-in cross-validation procedure that facilitates cp tuning, and pruning a decision tree can usually be done by specifying an optimal (tuned) cp.

#10




Quote:
cp = tree.reduced$cptable[which.min(tree.reduced$cptable[, "xerror"]), "CP"]
In the previous practice exams they gave it to us, but I could see it being the complicated code piece they like to test.
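For anyone who wants to see that cp lookup in context, here is a minimal end-to-end sketch. The kyphosis data (shipped with rpart), the control settings, and the object names tree.full, cp.table, best.cp and tree.pruned are all assumptions for illustration, not the exam's setup.

```r
library(rpart)  # recommended package included with standard R installs

set.seed(42)  # cross-validation folds are random, so fix the seed

# Grow a deliberately large tree (cp = 0) with 10-fold cross-validation.
tree.full <- rpart(Kyphosis ~ Age + Number + Start,
                   data = kyphosis, method = "class",
                   control = rpart.control(cp = 0, minsplit = 5, xval = 10))

# Each row of cptable is a candidate subtree; xerror is its
# cross-validated error relative to the root node.
cp.table <- tree.full$cptable
best.row <- which.min(cp.table[, "xerror"])
best.cp  <- cp.table[best.row, "CP"]

# Prune back to the subtree chosen by cross-validation.
tree.pruned <- prune(tree.full, cp = best.cp)

printcp(tree.full)  # prints the CP table, including xerror and xstd
```

The inner which.min() finds the row with the lowest cross-validated error, and the outer index pulls the CP value from that row, which is then handed to prune().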

