Actuarial Outpost
 
Forum: SoA/CAS Preliminary Exams > Exam PA: Predictive Analytics
  #71  
Old 06-12-2019, 10:26 AM
kakalapoo kakalapoo is offline
SOA
 
Join Date: Nov 2017
Location: Atlanta
College: Florida State University Alum
Posts: 21
Default

Quote:
Originally Posted by Modigliani-Miller View Post
The solution uses the testing data AUC value to compare the logit and probit link functions. To me, this is cheating. You're not supposed to use the performance on the testing dataset for model selection!!!

I would just compare the model AIC from training data. Do you guys agree?
I'd like to piggyback off this. Some solutions only use test sets and some use training and testing sets for comparing the models. Is there a time you should use both and a time you use only the test set? Or does it not matter as long as you justify?
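
For anyone who wants to see the two comparisons side by side, here is a minimal sketch. The data are simulated and the column names (x1, x2, y) are made up for illustration, not taken from the exam files. The AUC helper is a hand-written rank-based version, used only to avoid loading extra packages.

```r
# Hypothetical simulated data; none of these names come from the exam files.
set.seed(1)
n <- 1000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.5 + dat$x1 - 0.5 * dat$x2))

# 70/30 train/test split
idx   <- sample(n, size = round(0.7 * n))
train <- dat[idx, ]
test  <- dat[-idx, ]

# Same predictors, two link functions, both fit on the training data only
glmlogit  <- glm(y ~ x1 + x2, data = train, family = binomial(link = "logit"))
glmprobit <- glm(y ~ x1 + x2, data = train, family = binomial(link = "probit"))

# Rank-based (Mann-Whitney) AUC, written out to avoid extra packages
auc <- function(y, p) {
  r  <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Training-data AIC (lower is better) and testing-data AUC (higher is better)
aic_train <- c(logit = AIC(glmlogit), probit = AIC(glmprobit))
auc_test  <- c(
  logit  = auc(test$y, predict(glmlogit,  newdata = test, type = "response")),
  probit = auc(test$y, predict(glmprobit, newdata = test, type = "response"))
)
print(aic_train)
print(auc_test)
```

Choosing the link by training AIC keeps the test set untouched for a final performance estimate. Choosing by test AUC means the reported test performance is no longer a fully independent check.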
__________________
ASA GHDP
  #72  
Old 06-12-2019, 10:32 AM
Squeenasaurus Squeenasaurus is offline
Member
SOA
 
Join Date: Jul 2016
College: Illinois State University
Favorite beer: Lagunitas
Posts: 185
Default

Quote:
Originally Posted by kakalapoo View Post
I'd like to piggyback off this. Some solutions only use test sets and some use training and testing sets for comparing the models. Is there a time you should use both and a time you use only the test set? Or does it not matter as long as you justify?
The SOA only refits GLMs on the entire dataset, to get final coefficient values. I don't think we have a general consensus on tree-based models. I personally don't think you should re-train a tree-based model on the entire dataset, because this will most likely result in different splits being made and a whole new model.
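
A rough sketch of what refitting a GLM on the full data looks like, assuming simulated data and made-up names (full, train, x, y). The model form is chosen on the training rows, and only the coefficients are re-estimated on all rows.

```r
# Hypothetical simulated data; names are illustrative only.
set.seed(2)
n <- 500
full <- data.frame(x = rnorm(n))
full$y <- rbinom(n, 1, plogis(0.3 + 0.8 * full$x))

idx   <- sample(n, size = round(0.7 * n))
train <- full[idx, ]

# Model selected and fit on the training rows
fit_train <- glm(y ~ x, data = train, family = binomial)

# Same formula and family, refit on all rows for the final coefficients
fit_full <- update(fit_train, data = full)

print(cbind(train = coef(fit_train), full = coef(fit_full)))
```

For a GLM the structure stays fixed and only the estimates move, which is why this step is less controversial than re-growing a tree.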
  #73  
Old 06-12-2019, 01:13 PM
kakalapoo kakalapoo is offline
SOA
 
Join Date: Nov 2017
Location: Atlanta
College: Florida State University Alum
Posts: 21
Default

Quote:
Originally Posted by Squeenasaurus View Post
The SOA only refits GLMs on the entire dataset, to get final coefficient values. I don't think we have a general consensus on tree-based models. I personally don't think you should re-train a tree-based model on the entire dataset, because this will most likely result in different splits being made and a whole new model.
I wasn't really referring to rerunning on the full dataset. I meant that sometimes they display the model performance on both the training and testing data using log-likelihood, SSE, etc. Other times, they just display the results from the testing data.
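
To make the distinction concrete, here is a small sketch that reports the same metrics on both sets. Everything in it is an assumption for illustration: the data are simulated, the helper perf() is hypothetical, and per-observation averages are used instead of raw totals so that sets of different sizes can be compared.

```r
# Hypothetical simulated data; names are illustrative only.
set.seed(3)
n <- 800
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.2 + dat$x))

idx   <- sample(n, size = round(0.7 * n))
train <- dat[idx, ]
test  <- dat[-idx, ]

fit <- glm(y ~ x, data = train, family = binomial)

# Mean Bernoulli log-likelihood and mean squared error on any data set
perf <- function(model, data) {
  p <- predict(model, newdata = data, type = "response")
  c(
    mean_loglik = mean(data$y * log(p) + (1 - data$y) * log(1 - p)),
    mse         = mean((data$y - p)^2)
  )
}

perf_table <- rbind(train = perf(fit, train), test = perf(fit, test))
print(round(perf_table, 4))
```

Showing both rows lets you see the gap between training and testing performance, which is a quick overfitting check. The test row alone is what estimates performance on new data.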
__________________
ASA GHDP
  #74  
Old 06-12-2019, 05:12 PM
windows7forever windows7forever is offline
Member
SOA
 
Join Date: Apr 2016
Posts: 277
Default

I wonder if any of you did the modeling without realizing that the plot function provided shows some, though not strong, evidence of an interaction effect between race and gender. I feel there's not enough time to do all the tasks in the solution, even if the solution was not designed to be completed in 5 hours.

The extra cleaning and modeling that the interaction brings mean the training and testing sets have to be updated more often.

Which main things from the solution would you keep, besides the mandatory modeling tasks, to make sure your work gets done in 5 hours?

Thanks.
  #75  
Old 06-12-2019, 07:54 PM
Rankik Rankik is offline
Member
SOA
 
Join Date: Sep 2013
Location: USA
Studying for FAP
College: USC
Favorite beer: Vodka
Posts: 170
Default Task 5 - Confusion Matrix

Question: in the confusion matrix code, the factor line has 1*(predslogit>0.8), and in the solution the 0.8 is changed to 0.5. Can someone explain what predslogit is? The help window in R doesn't have any information on it.

Edit: Never mind, total brain fart lol, I just saw it's the prediction variable.

Last edited by Rankik; 06-12-2019 at 08:10 PM..
  #76  
Old 06-12-2019, 08:05 PM
DZHOU DZHOU is offline
SOA
 
Join Date: May 2019
College: RPI
Posts: 3
Default

Quote:
Originally Posted by Rankik View Post
Question: in the confusion matrix code, the factor line has 1*(predslogit>0.8), and in the solution the 0.8 is changed to 0.5. Can someone explain what predslogit is? The help window in R doesn't have any information on it.
two lines above
predslogit <- predict(glmlogit,newdata=test,type="response")

predicted probabilities for the testing data set
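
In case it helps, a minimal sketch of how the cutoff feeds into the confusion matrix. The data are simulated and conf_mat() is a made-up helper built on base R's table(), which may not be the function the exam template uses. Only glmlogit, predslogit, test, and the 1*(predslogit>cutoff) idea come from this thread.

```r
# Hypothetical simulated data; only the object names glmlogit, predslogit,
# and test mirror the thread.
set.seed(4)
n <- 1000
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.5 + 1.5 * dat$x))

idx   <- sample(n, size = round(0.7 * n))
train <- dat[idx, ]
test  <- dat[-idx, ]

glmlogit   <- glm(y ~ x, data = train, family = binomial(link = "logit"))
predslogit <- predict(glmlogit, newdata = test, type = "response")

# Turn probabilities into 0/1 classes at a cutoff, then cross-tabulate
conf_mat <- function(probs, actual, cutoff) {
  pred <- factor(1 * (probs > cutoff), levels = c(0, 1))
  table(predicted = pred, actual = factor(actual, levels = c(0, 1)))
}

cm_05 <- conf_mat(predslogit, test$y, 0.5)
cm_08 <- conf_mat(predslogit, test$y, 0.8)
print(cm_05)
print(cm_08)
```

Raising the cutoff from 0.5 to 0.8 can only move observations from predicted 1 to predicted 0, so false positives go down and false negatives go up.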
  #77  
Old 06-12-2019, 08:07 PM
iloveexams06 iloveexams06 is offline
SOA
 
Join Date: Jun 2019
Posts: 4
Default

Quote:
Originally Posted by Rankik View Post
Question: in the confusion matrix code, the factor line has 1*(predslogit>0.8), and in the solution the 0.8 is changed to 0.5. Can someone explain what predslogit is? The help window in R doesn't have any information on it.
it's a variable created before:
predslogit <- predict(glmlogit,newdata=test,type="response")
  #78  
Old 06-12-2019, 08:11 PM
Rankik Rankik is offline
Member
SOA
 
Join Date: Sep 2013
Location: USA
Studying for FAP
College: USC
Favorite beer: Vodka
Posts: 170
Default

Quote:
Originally Posted by DZHOU View Post
two lines above
predslogit <- predict(glmlogit,newdata=test,type="response")

predicted probabilities for the testing data set
Quote:
Originally Posted by iloveexams06 View Post
it's a variable created before:
predslogit <- predict(glmlogit,newdata=test,type="response")
Thank you, I realized that as I was doing Task 6 lol. God I can't wait for this thing to be over
  #79  
Old 06-12-2019, 08:17 PM
DZHOU DZHOU is offline
SOA
 
Join Date: May 2019
College: RPI
Posts: 3
Default

In the last paragraph of the Task 7 solution, the SOA said: "A reasonable additional step here would be to note that the coefficients of DRGOtherSURG and DRGOtherMED are similar. There may be improvement by combining them into a single factor level."

What single factor level should we create? Do we combine only these two, or also combine the rest of the DRG levels except for the base level?
  #80  
Old 06-12-2019, 08:50 PM
avocado avocado is offline
Member
SOA
 
Join Date: Apr 2018
Posts: 82
Default

Quote:
Originally Posted by DZHOU View Post
In the last paragraph of the Task 7 solution, the SOA said: "A reasonable additional step here would be to note that the coefficients of DRGOtherSURG and DRGOtherMED are similar. There may be improvement by combining them into a single factor level."

What single factor level should we create? Do we combine only these two, or also combine the rest of the DRG levels except for the base level?
I think that means creating a new variable to flag whether the observation is in DRGOtherSURG or DRGOtherMED. I did this and it improved the accuracy a little.
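
One way to do the combining is to merge the two levels directly, sketched below on simulated data. The DRG level names OtherMED and OtherSURG are inferred from the coefficient names in the quote. The MED and SURG levels, the new level name "Other", and the columns x and y are assumptions for illustration.

```r
# Hypothetical simulated data; MED, SURG, x, and y are made-up names.
set.seed(5)
n <- 600
dat <- data.frame(
  DRG = factor(sample(c("MED", "SURG", "OtherMED", "OtherSURG"), n, replace = TRUE)),
  x   = rnorm(n)
)
is_other <- dat$DRG %in% c("OtherMED", "OtherSURG")
dat$y <- rbinom(n, 1, plogis(-0.3 + 0.5 * dat$x + 0.6 * is_other))

# Merge the two similar levels into one; all other levels stay as they are
dat$DRG2 <- dat$DRG
levels(dat$DRG2)[levels(dat$DRG2) %in% c("OtherMED", "OtherSURG")] <- "Other"

fit_sep  <- glm(y ~ DRG  + x, data = dat, family = binomial)
fit_comb <- glm(y ~ DRG2 + x, data = dat, family = binomial)

# One fewer coefficient in the combined model; compare fit with AIC
print(table(dat$DRG2))
print(c(separate = AIC(fit_sep), combined = AIC(fit_comb)))
```

This combines only the two levels named in the solution and leaves the rest of the DRG levels and the base level untouched.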
All times are GMT -4.

