Actuarial Outpost: Exam PA (Predictive Analytics)
#1  12-03-2019, 05:36 AM
ThereIsNoSpoon
Hyperparameters to keep in mind?

AFAIK the models covered in the modules are:

1. GLM
2. CART/decision trees
3. Random Forest
4. Gradient Boosted Machines
5. Penalized regression i.e. lasso, ridge, elastic net
6. PCA
7. K-Means clustering

For these, are there any general hyperparameter values or ranges I should be aware of? Currently I only know that the nstart parameter for k-means should generally be over 20, and about the alpha (mixing) and lambda (shrinkage, i.e. variance-reduction) parameters for regularized regression, but that's all, if memory serves.
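
A minimal sketch of where those two show up in R, in case it helps (df, x, and y are placeholder objects, not anything from the modules):

Code:
# k-means: nstart = number of random initializations tried;
# kmeans() keeps the run with the lowest total within-cluster SS.
km <- kmeans(scale(df), centers = 3, nstart = 20)

# Elastic net via glmnet: alpha mixes ridge (alpha = 0) and
# lasso (alpha = 1); lambda is the shrinkage penalty, usually
# chosen by cross-validation.
library(glmnet)
cv_fit <- cv.glmnet(x, y, alpha = 0.5)
best_lambda <- cv_fit$lambda.min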
#2  12-03-2019, 07:45 AM
crcosme88

Quote:
Originally Posted by ThereIsNoSpoon
AFAIK the models covered in the modules are:

1. GLM
2. CART/decision trees
3. Random Forest
4. Gradient Boosted Machines
5. Penalized regression i.e. lasso, ridge, elastic net
6. PCA
7. K-Means clustering

For these, are there any general hyperparameter values or ranges I should be aware of? Currently I only know that the nstart parameter for k-means should generally be over 20, and about the alpha (mixing) and lambda (shrinkage, i.e. variance-reduction) parameters for regularized regression, but that's all, if memory serves.
Don't forget about Hierarchical Clustering.
#3  12-03-2019, 10:32 AM
ThereIsNoSpoon

Quote:
Originally Posted by crcosme88
Don't forget about Hierarchical Clustering.
Ah, right! I can use hclust to implement that in R too. Hopefully I won't have to read up too much on the cluster dissimilarity measures, e.g. Ward's minimum variance.
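
A minimal hclust sketch, for reference (df is a placeholder data frame):

Code:
# Hierarchical clustering: dist() builds the dissimilarity matrix;
# method picks the linkage, e.g. Ward's minimum variance.
d  <- dist(scale(df))
hc <- hclust(d, method = "ward.D2")
clusters <- cutree(hc, k = 3)  # cut the dendrogram into 3 clusters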
#4  12-03-2019, 07:50 PM
Life

Those and hierarchical clustering are it. Don't forget fit measures like:
- Adjusted R^2
- Mallows' C_p
- AIC/BIC
- AUC
- Log-likelihood
- xerror (rpart's cross-validated error)

Also, offset and weight.
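
A quick sketch of where those measures come from in R (fit, lm_fit, tree_fit, and test are placeholder objects):

Code:
AIC(fit); BIC(fit); logLik(fit)  # for a fitted glm()
summary(lm_fit)$adj.r.squared    # adjusted R^2 from an lm() summary

# AUC for a binary classifier, via the pROC package:
library(pROC)
auc(test$target, predict(fit, newdata = test, type = "response"))

# xerror appears in the cptable of a fitted rpart tree:
library(rpart)
printcp(tree_fit)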
#5  12-04-2019, 09:39 PM
ThereIsNoSpoon

Quote:
Originally Posted by Life
Those and hierarchical clustering are it. Don't forget fit measures like:
- Adjusted R^2
- Mallows' C_p
- AIC/BIC
- AUC
- Log-likelihood
- xerror (rpart's cross-validated error)

Also, offset and weight.
Wondering if there are any ranges for the hyperparameters that one should keep in mind, though, aside from k-means...
#6  12-04-2019, 09:56 PM
Life
Quote:
Originally Posted by ThereIsNoSpoon
Wondering if there are any ranges for the hyperparameters that one should keep in mind, though, aside from k-means...
Here's my probably-not-useful list:
k-Means Clustering, nstart: 20 to 50
Random Forest, mtry: sqrt(p) for classification; p/3 for regression
Gradient Boosting, nrounds: 1,000
Gradient Boosting, eta: 0.01 to 0.20
Tweedie, p: p = 1 is Poisson and p = 2 is gamma, so 1 < p < 2 (the compound Poisson-gamma case) is the most interesting range

We'll probably be given some default values and will be asked to tune the parameters incrementally in a way that meets the business needs.
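
Roughly where those enter the code, if it helps (x, y, and df are placeholders; assumes the xgboost and statmod packages):

Code:
# Gradient boosting with xgboost: eta is the learning rate;
# a smaller eta generally needs a larger nrounds.
library(xgboost)
fit <- xgboost(data = as.matrix(x), label = y,
               nrounds = 1000, eta = 0.01,
               objective = "reg:squarederror", verbose = 0)

# Tweedie power parameter p in a GLM (link.power = 0 gives a log link):
library(statmod)
glm(y ~ ., data = df, family = tweedie(var.power = 1.5, link.power = 0))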
#7  12-08-2019, 11:51 AM
ThereIsNoSpoon

Quote:
Originally Posted by Life
Here's my probably-not-useful list:
k-Means Clustering, nstart: 20 to 50
Random Forest, mtry: sqrt(p) for classification; p/3 for regression
Gradient Boosting, nrounds: 1,000
Gradient Boosting, eta: 0.01 to 0.20
Tweedie, p: p = 1 is Poisson and p = 2 is gamma, so 1 < p < 2 (the compound Poisson-gamma case) is the most interesting range

We'll probably be given some default values and will be asked to tune the parameters incrementally in a way that meets the business needs.
Remind me, what's p for RF?
#8  12-08-2019, 01:30 PM
TheFinnyKinkajou

Quote:
Originally Posted by ThereIsNoSpoon
Remind me, what's p for RF?
I think p is the number of predictor variables. To keep the trees in your random forest from being correlated with one another, the algorithm picks a random subset of the p predictors (of size sqrt(p) for classification or p/3 for regression) at each node and makes the split based only on that subset. If it didn't do this, all of the trees in the forest would likely look the same and make the same splits.
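
A minimal randomForest sketch of those defaults (df and target are placeholders):

Code:
library(randomForest)
# mtry = number of predictors sampled at each split.
# Package defaults: floor(sqrt(p)) for classification,
# floor(p/3) for regression, where p = number of predictors.
p  <- ncol(df) - 1   # predictors, excluding the target column
rf <- randomForest(target ~ ., data = df,
                   mtry = floor(sqrt(p)), ntree = 500)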
#9  12-10-2019, 03:01 AM
jericc1

For decision trees, cp (the complexity parameter) is a particularly important hyperparameter, especially since rpart() has a built-in cross-validation procedure that facilitates cp tuning, and pruning a decision tree can usually be done by specifying an optimal/tuned cp.
#10  12-10-2019, 07:08 AM
DrWillKirby

Quote:
Originally Posted by jericc1
For decision trees, cp (the complexity parameter) is a particularly important hyperparameter, especially since rpart() has a built-in cross-validation procedure that facilitates cp tuning, and pruning a decision tree can usually be done by specifying an optimal/tuned cp.
I want to make sure I remember a snippet like this from scratch.

cp = tree.reduced$cptable[which.min(tree.reduced$cptable[, "xerror"]), "CP"]

In the previous practice exams they gave it to us, but I could see it being the complicated code piece they like to test.
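
Putting the whole pruning workflow together, a minimal sketch (df and target are placeholders):

Code:
library(rpart)
tree.full <- rpart(target ~ ., data = df, method = "anova", cp = 0.001)

# Pick the cp that minimizes the cross-validated error (xerror),
# then prune the tree back to that complexity.
cp.opt <- tree.full$cptable[which.min(tree.full$cptable[, "xerror"]), "CP"]
tree.pruned <- prune(tree.full, cp = cp.opt)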