Actuarial Outpost
 
  #1  
11-16-2018, 02:48 PM
ARodOmaha
Member
SOA

Join Date: May 2016
Location: Omaha, NE
College: University of Nebraska (alma mater)
Favorite beer: Captain Morgan
Posts: 207
RStudio Study Guide

Dear Outposters,

Unlike many of you, I am a mere mortal and have no prior experience with R. Going through the modules, I found a lot of code and a lot of fluff. What I needed was a concise list of simple code that I could reference, something like what Coaching Actuaries would put together. To that end, I have compiled the attached document. Although we are provided two cheat sheets during the PA exam, the R code in them is pretty basic and they don't cover the predictive analytics elements.

One will still need to go through the material to understand the purpose behind each section. But for me, I can now sit down and practice the material without flipping through hundreds of slides.

Please let me know of any edits that are needed. I plan on making changes but I wanted to publish this now for people that are in the same boat as me.
----
Update: I have made several suggested changes. Modified 11/26/2018.
Attached Images
File Type: pdf Rstudio Study Notes for PA 20181126.pdf (630.6 KB, 376 views)
__________________
P FM MLC MFE C PA FAP APC

Last edited by ARodOmaha; 11-26-2018 at 01:46 PM.
  #2  
11-16-2018, 05:57 PM
Jsobiera
SOA

Join Date: Jan 2018
College: Loyola Marymount University
Favorite beer: Pilsner/Blonde
Posts: 5
2 Cheat Sheets for PA?

We are provided 2 cheat sheets for PA?
Where is that stated? What does that mean exactly?
I have never heard of this. I need every advantage I can get!
__________________
P FM MFE C LTAM PA
VEEs FAP APC
...ASA (just waiting on the paperwork)
  #3  
11-16-2018, 06:26 PM
NchooseK
Member
SOA

Join Date: Nov 2012
Location: Philly area
Studying for PA, LTAM, FA (FAP)
College: Swarthmore College (BA Mathematics), Villanova University (MS Applied Stat)
Favorite beer: I don't drink beer, but I love the Dos Equis commercials.
Posts: 353

Quote:
Originally Posted by ARodOmaha
Dear Outposters,
I wanted to publish this now for people that are in the same boat as me.

This is an outstanding reference for those familiar with R and those who are not. Thank you. Well done.

Quote:
Originally Posted by Jsobiera
We are provided 2 cheat sheets for PA?
Where is that stated? What does that mean exactly?
I have never heard of this. I need every advantage I can get!

Check the syllabus. IMO, the cheat sheets are not very valuable. They offer very basic commands/code/ideas that test takers should be very familiar with at the exam sitting. IMO, if you have to refer to these sheets often, you will run out of time. Use them if you forget something small or have a brain cramp. Just my $0.02.
__________________
Exams: P | FM | C | MFE | LTAM | SRM Credit | PA

VEE: Statistics | Finance | Economics

FAP: 1 | 2 | 3 | 4 | IA | 6 | 7 | FA

Conferences: APC
  #4  
11-17-2018, 11:08 AM
Alanbb
Member
SOA

Join Date: Aug 2013
Location: Nigeria
Studying for ILALRM
College: Postgraduate
Posts: 105

Quote:
Originally Posted by ARodOmaha
Dear Outposters,

Thanks a lot for this
__________________
ASA
Modules: FinEcons ERM Reg&Tax DMAC
Exams: ILALRM LP LFV
FAC
  #5  
11-17-2018, 03:55 PM
Adapt and Chill
Member
SOA AAA

Join Date: Sep 2017
College: Davidson College
Posts: 189

This is a useful cheatsheet, thanks for sharing. I have a couple of possible suggestions for your next round of edits:

For subsets, there are many correct ways to select the rows/columns you want. It might be easier without select.
  • data.new <- data.old[,c("GENDER","AGE","BLUEBOOK")] #Gives you all rows, but just the three named variables (columns)
  • data.new <- data.old[data.old$GENDER=="M" & data.old$AGE>20,] #Only selects rows with males over 20, but all variables
  • data.new <- subset(data.old, GENDER=="M" & AGE>20) #same as the case above using subset
  • data.new <- data.old[data.old$GENDER=="M" & data.old$AGE>20,c("GENDER","AGE","BLUEBOOK")] #Only selects rows with males over 20, and returns only the three named columns
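The row and column filters above can be tried on a small toy data frame. This is only a sketch: the column names follow the post, but the values (and the extra INCOME column) are made up for illustration.

```r
# Hypothetical toy data; column names follow the post, values are invented.
data.old <- data.frame(
  GENDER   = c("M", "F", "M", "M", "F"),
  AGE      = c(19, 35, 42, 27, 18),
  BLUEBOOK = c(9000, 15000, 22000, 12000, 7000),
  INCOME   = c(30, 55, 80, 48, 25),
  stringsAsFactors = FALSE
)

cols <- c("GENDER", "AGE", "BLUEBOOK")

# Bracket indexing: rows with males over 20, only the three named columns
by_bracket <- data.old[data.old$GENDER == "M" & data.old$AGE > 20, cols]

# Same selection with subset(); select= takes the column names
by_subset <- subset(data.old, GENDER == "M" & AGE > 20, select = cols)

print(by_bracket)
print(identical(by_bracket, by_subset))
```

One difference worth knowing: when the condition evaluates to NA for a row, bracket indexing returns a row of NAs, while subset() drops that row.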
To binarize columns I think it's easier to use model.matrix. Since dummyVars doesn’t work with factors, factor vectors need to be converted to characters before being binarized with dummyVars.
  • data.binarized <- model.matrix(~continent, data=data.new)
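As a minimal sketch of what model.matrix returns, here is a made-up data frame with a continent factor (the lifeExp column and all values are assumptions for illustration). By default the result has an intercept column and one dummy per level except the first; adding "- 1" to the formula gives one column per level instead.

```r
# Hypothetical data; only the "continent" name comes from the post.
data.new <- data.frame(
  continent = factor(c("Asia", "Europe", "Africa", "Asia")),
  lifeExp   = c(70, 78, 60, 72)
)

# Default coding: intercept plus k-1 dummies, first level (Africa) is the baseline
mm.ref <- model.matrix(~ continent, data = data.new)
print(colnames(mm.ref))

# "- 1" removes the intercept, so every level gets its own 0/1 column
mm.full <- model.matrix(~ continent - 1, data = data.new)
print(colnames(mm.full))

# Attach the dummy columns back onto the numeric variables
data.binarized <- cbind(data.new["lifeExp"], mm.full)
print(data.binarized)
```

Which coding to keep depends on what comes next: a GLM wants the baseline dropped, while clustering or PCA may be easier to read with all levels present.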
As you noted, prcomp only works with numerical data. After binarizing with model.matrix or dummyVars, PCA can be visually examined using the following plots:
  • biplot(pca)
  • screeplot(pca, type="lines")
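Putting the binarize-then-PCA steps together, a sketch on the built-in iris data (my choice of data set, not one from the study notes) could look like this. Scaling the variables before prcomp is an assumption here, though it is the usual choice when columns are on different scales.

```r
# Binarize the factor, dropping the intercept column so only dummies remain
dummies  <- model.matrix(~ Species, data = iris)[, -1]
data.num <- cbind(iris[, 1:4], dummies)

# PCA on the all-numeric data, centered and scaled
pca <- prcomp(data.num, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component
pve <- pca$sdev^2 / sum(pca$sdev^2)
print(round(pve, 3))

biplot(pca)                      # observations and loadings on PC1 vs PC2
screeplot(pca, type = "lines")   # variance by component
```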
For k-means, I have the following snippet that builds a data frame for graphing the elbow plot (this can also be done quickly by using rbind inside a for loop that iterates through each value of k):
  • data.frame(k=c(1:6), bss_tss = c(km1$betweenss/km1$totss, km2$betweenss/km2$totss, …… km6$betweenss/km6$totss))
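The rbind-in-a-for-loop variant mentioned above might look like the sketch below. The data set (scaled iris measurements), the seed, and nstart = 20 are my assumptions, not values from the study notes.

```r
set.seed(42)                          # k-means starts are random
data.scaled <- scale(iris[, 1:4])     # hypothetical numeric data

elbow <- data.frame()
for (k in 1:6) {
  km <- kmeans(data.scaled, centers = k, nstart = 20)
  # Share of total variation explained by the clustering
  elbow <- rbind(elbow, data.frame(k = k, bss_tss = km$betweenss / km$totss))
}
print(elbow)

# Elbow plot: look for the k where the curve flattens out
plot(elbow$k, elbow$bss_tss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Between SS / Total SS")
```

The loop avoids keeping six separately named kmeans objects around, at the cost of not being able to inspect each fit afterwards.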
Not sure if hierarchical clustering will come up, but if it does, you need to create a dissimilarity matrix first, then use hclust.
  • dissmatrix<-dist(data)
  • hclust(dissmatrix)
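A slightly fuller sketch of the two hierarchical clustering steps, using the built-in USArrests data as a stand-in (the data set, the scaling, and the choice of 4 clusters are all assumptions for illustration):

```r
data.scaled <- scale(USArrests)       # hypothetical numeric data, scaled

# Step 1: dissimilarity matrix (Euclidean distance is the default)
dissmatrix <- dist(data.scaled)

# Step 2: hierarchical clustering (complete linkage is the default)
hc <- hclust(dissmatrix, method = "complete")

plot(hc)                              # dendrogram

# Cut the tree into a chosen number of clusters
clusters <- cutree(hc, k = 4)
print(table(clusters))
```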

Again, these are just nitpicky personal preferences from my notes. I think you hit all of the major points effectively in your first draft though. Great job.
  #6  
11-17-2018, 05:21 PM
Whoaminoneofyourbusiness
Member
SOA

Join Date: Jan 2017
Location: The Grand Tournament
Studying for GH Spec
Posts: 886

I was under the impression that boosting could be on the exam, but with train instead of xgboost. Not sure if you agree but a small section on that may help. Otherwise this is awesome, thanks a bunch!
  #7  
11-17-2018, 05:33 PM
Whoaminoneofyourbusiness
Member
SOA

Join Date: Jan 2017
Location: The Grand Tournament
Studying for GH Spec
Posts: 886

Also, I'm curious why this was in the regression tree example in SOA's rmd file:

parms = list(split = "information")

Information gain is an entropy-based measure that should really only be used in a classification tree. Looking at the rpart documentation directly:

parms
optional parameters for the splitting function.
Anova splitting has no parameters.

I'd probably remove the parms statement in the regression tree because of this, unless I'm missing something.
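One way to check this is to fit the same regression tree with and without the parms argument and compare the results. The sketch below uses the built-in mtcars data (my choice, not the SOA's data set) and assumes, per the documentation excerpt above, that the anova method ignores parms; suppressWarnings is there in case rpart complains about the unused argument.

```r
library(rpart)

# Regression tree without any splitting parameters
fit.plain <- rpart(mpg ~ ., data = mtcars, method = "anova")

# Same tree with the parms argument from the SOA example
fit.parms <- suppressWarnings(
  rpart(mpg ~ ., data = mtcars, method = "anova",
        parms = list(split = "information"))
)

# If parms is ignored for anova, the fitted trees should match
same.tree <- isTRUE(all.equal(fit.plain$frame, fit.parms$frame))
print(same.tree)
```

If the two frames match, dropping the parms line from the regression tree changes nothing except making the code less confusing.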

Last edited by Whoaminoneofyourbusiness; 11-17-2018 at 05:58 PM.
  #8  
11-19-2018, 12:10 PM
kasem
SOA

Join Date: Apr 2018
Posts: 16

Thanks! Useful notes. Is there a reason why the code for boosting was not included in the notes?
  #9  
11-19-2018, 02:33 PM
ARodOmaha
Member
SOA

Join Date: May 2016
Location: Omaha, NE
College: University of Nebraska (alma mater)
Favorite beer: Captain Morgan
Posts: 207

Quote:
Originally Posted by kasem
Thanks! Useful notes. Is there a reason why the code for boosting was not included in the notes?

(1) I didn't easily understand the code from the modules and (2) it was not in the two sample projects. But if someone has some nice clean code then I can include it.
__________________
P FM MLC MFE C PA FAP APC
  #10  
11-19-2018, 02:35 PM
ARodOmaha
Member
SOA

Join Date: May 2016
Location: Omaha, NE
College: University of Nebraska (alma mater)
Favorite beer: Captain Morgan
Posts: 207

Quote:
Originally Posted by Adapt and Chill
This is a useful cheatsheet, thanks for sharing. I have a couple of possible suggestions for your next round of edits:

Thank you very much for your suggestions. I'll be sure to include many of them in the edit.
__________________
P FM MLC MFE C PA FAP APC
Tags
cheat sheet, coding, predictive analytics, rstudio, study guide
