Actuarial Outpost
 
  #21  
Old 04-08-2019, 01:25 PM
Vorian Atreides
Wiki/Note Contributor
CAS
 
Join Date: Apr 2005
Location: As far as 3 cups of sugar will take you
Studying for ACAS
College: Hard Knocks
Favorite beer: Most German dark lagers
Posts: 65,751
Default

Quote:
Originally Posted by Actuarially Me View Post
Very true. Leads me to some questions for anyone:

For those with experience in building pure premium models, what structure do you use?
Separate Frequency and Severity
Single Pure Premium model
Hurdle

What distributions have worked best for your data?
One answer:

Perform an extensive EDA and see what distributions seem to fit the data that you actually have. Your book of business could be (significantly) different from mine because of differing underwriting standards and/or claim practices.

Then, do several types of modeling: FxS, single PP, PP by peril/loss_type, FxS by peril/loss_type, etc. Then compare a range of (relevant) metrics on a hold-out set to see which are performing quantitatively "the best".

Be sure to also examine each model's underlying assumptions and how well the data conforms to them when making a final decision on your "champion" model.
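
For concreteness, a minimal sketch of that kind of hold-out comparison in R; the data frame dat, its columns (loss, exposure, claim_count, age_group, territory), the variance power, and the choice of an exposure-weighted MSE as the metric are all illustrative assumptions, not a recommendation.

Code:
# Sketch: fit two candidate structures on a training split and
# compare predicted pure premium on a hold-out set.
library(statmod)   # provides the tweedie() family for glm()

set.seed(1)
holdout <- runif(nrow(dat)) < 0.3
train   <- dat[!holdout, ]
test    <- dat[holdout, ]

# Candidate 1: single pure premium (Tweedie) model
pp_fit <- glm(loss / exposure ~ age_group + territory,
              family = tweedie(var.power = 1.6, link.power = 0),
              data = train, weights = exposure)

# Candidate 2: frequency x severity
freq_fit <- glm(claim_count ~ age_group + territory + offset(log(exposure)),
                family = poisson(), data = train)
sev_fit  <- glm(loss / claim_count ~ age_group + territory,
                family = Gamma(link = "log"),
                data = subset(train, claim_count > 0), weights = claim_count)

# Hold-out predictions of pure premium for each candidate
pred_pp <- predict(pp_fit, newdata = test, type = "response")
pred_fs <- predict(freq_fit, newdata = test, type = "response") / test$exposure *
           predict(sev_fit, newdata = test, type = "response")

# One possible metric: exposure-weighted MSE against actual pure premium
wmse <- function(pred) weighted.mean((test$loss / test$exposure - pred)^2, test$exposure)
c(single_pp = wmse(pred_pp), freq_sev = wmse(pred_fs))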
__________________
I find your lack of faith disturbing

Why should I worry about dying? It's not going to happen in my lifetime!


Freedom of speech is not a license to discourtesy

#BLACKMATTERLIVES
  #22  
Old 04-16-2019, 11:50 AM
mattcarp
Member
CAS
 
Join Date: Sep 2016
Studying for Exam 6
College: UC Berkeley
Posts: 381
Default

Quote:
Originally Posted by Actuarially Me View Post
Pure Premium:
Response: log(Loss/Exposure)
Distribution: Tweedie p = 1.5-1.65
Weight: log(exposure)
Offset: log(exposure)
You're supposed to use both weight and offset for tweedie?
__________________
P FM VEE MFE C S OC1 5 OC2 6 7 8 9
  #23  
Old 04-16-2019, 02:03 PM
Actuarially Me
Member
CAS
 
Join Date: Jun 2013
Posts: 191
Default

Quote:
Originally Posted by mattcarp View Post
You're supposed to use both weight and offset for tweedie?
Should just be weight. Thanks for pointing it out. I'll fix it.
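
For what it's worth, a minimal sketch of that corrected spec in R, reading the response as loss per exposure with a log link and earned exposure as the prior weight (no offset). The data frame and column names are hypothetical, and the tweedie() family comes from the statmod package.

Code:
library(statmod)

# Pure premium Tweedie GLM: response = loss / exposure, log link,
# exposure used as the prior weight only -- no offset term.
pp_model <- glm(loss / exposure ~ age_group + territory,
                family = tweedie(var.power = 1.6, link.power = 0),  # p in the 1.5-1.65 range
                data = policy_data, weights = exposure)
summary(pp_model)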
  #24  
Old 04-16-2019, 02:12 PM
Actuarially Me
Member
CAS
 
Join Date: Jun 2013
Posts: 191
Default

Anyone know how to create the below graph in R? It comes from the Practitioner's Guide to GLMs. I can get close, but since the GLM coefficient table doesn't include the reference level for categorical variables, I can only add them manually. Even though their coefficient is 0, it's nice to have them in the output for underwriters to see.

There's a nice function in R called tidy() that creates a tibble of the model summary data, and confint() computes the 95% confidence intervals. That gets me most of the way there, but it's missing the reference levels.

There's also dummy.coef(), which creates a list of all the variables and their factor levels including the reference level, but it doesn't include the standard errors, and I can't find a good way to join in the standard errors from tidy().


https://imgur.com/p4hrpI6
  #25  
Old 04-16-2019, 02:32 PM
Tacoactuary
Member
CAS
 
Join Date: Nov 2014
Location: Des Moines, IA
College: Vanderbilt, UIUC
Favorite beer: Yazoo Sue
Posts: 1,591
Default

Quote:
Originally Posted by Actuarially Me View Post
Anyone know how to create the below graph in R? It comes from the Practitioner's Guide to GLMs. I can get close, but since the GLM coefficient table doesn't include the reference level for categorical variables, I can only add them manually. Even though their coefficient is 0, it's nice to have them in the output for underwriters to see.

There's a nice function in R called tidy() that creates a tibble of the model summary data, and confint() computes the 95% confidence intervals. That gets me most of the way there, but it's missing the reference levels.

There's also dummy.coef(), which creates a list of all the variables and their factor levels including the reference level, but it doesn't include the standard errors, and I can't find a good way to join in the standard errors from tidy().


https://imgur.com/p4hrpI6
https://stackoverflow.com/a/54931600/9095127
__________________
ACAS 7 8 9 FCAS
  #26  
Old 04-16-2019, 02:41 PM
Actuarially Me
Member
CAS
 
Join Date: Jun 2013
Posts: 191
Default

Quote:
Originally Posted by Tacoactuary View Post
https://stackoverflow.com/a/54931600/9095127
I even looked at that, but only looked at the first answer. Thanks!

Seems to work, but I'll have to do some regex changes cause I use "_" for variable names.
  #27  
Old 04-16-2019, 04:11 PM
Actuarially Me
Member
CAS
 
Join Date: Jun 2013
Posts: 191
Default

Quote:
Originally Posted by Actuarially Me View Post
I even looked at that, but only looked at the first answer. Thanks!

Seems to work, but I'll have to do some regex changes cause I use "_" for variable names.
I tweaked the code from that Stack Overflow answer to show only the categorical variables, since that's all I care about graphing. This makes it easy to facet by variable name. It can take a while to calculate the confidence intervals, since confint() on a glm profiles the likelihood.

Here's the code for anyone interested in something similar:

Code:
library(broom)    # tidy()
library(dplyr)    # %>%, left_join, mutate
library(tidyr)    # unnest
library(tibble)   # rownames_to_column, enframe

tidy_coefs <- function(model, level = 0.95) {
  # Confidence intervals at the user-supplied level
  tidy_conf <- as.data.frame(confint(model, level = level)) %>%
    rownames_to_column()
  names(tidy_conf) <- c("Term", "Lower_CI", "Upper_CI")

  # Tidy the model summary and left join the confidence intervals
  tidy_model <- tidy(model)
  names(tidy_model) <- c("Term", "Point_Estimate", "Std_Error", "Statistic", "P_Value")
  tidy_model <- tidy_model %>% left_join(tidy_conf, by = "Term")

  # Round the numeric columns to 3 digits
  is_num <- sapply(tidy_model, is.numeric)
  tidy_model[is_num] <- lapply(tidy_model[is_num], round, 3)

  # Pull the model's categorical variables (including reference levels),
  # unnest them, and left join the coefficient table
  xlevels <- model$xlevels %>%
    enframe() %>%
    unnest(cols = value) %>%
    mutate(Term = paste0(name, value)) %>%
    left_join(tidy_model, by = "Term")

  # Reference levels have no coefficient row, so replace their NAs with zeros
  xlevels[is.na(xlevels)] <- 0

  xlevels
}
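
And a hypothetical usage sketch to get a faceted plot like the one in the Practitioner's Guide; the fitted model object (pp_model) and the styling are my own assumptions.

Code:
library(ggplot2)

# pp_model is any fitted glm with factor predictors
coef_df <- tidy_coefs(pp_model, level = 0.95)

ggplot(coef_df, aes(x = value, y = Point_Estimate)) +
  geom_point() +
  geom_errorbar(aes(ymin = Lower_CI, ymax = Upper_CI), width = 0.2) +
  facet_wrap(~ name, scales = "free_x") +
  labs(x = NULL, y = "Coefficient (reference levels shown at 0)")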
  #28  
Old 04-18-2019, 03:34 PM
RockThatScoober
CAS
 
Join Date: Mar 2018
Posts: 10
Default

I'm curious about the best way to structure the data for modeling: policy year or calendar/accident year?

With calendar/accident year, I'm concerned about claims that occur at the end of a policy but at the beginning of a year. For example, I'm including claims that occurred between 1/1/2015 and 12/31/2018 (the past four years). If a policy had a claim in January 2015 but only earned one month of exposure in that window, then that claim would have a small weight in the GLM, right? Should that policy show a full year of exposure (assuming no cancellation) or just the one month?
  #29  
Old 04-18-2019, 04:27 PM
itGetsBetter
Member
CAS AAA
 
Join Date: Feb 2016
Location: Midwest
Favorite beer: Spruce Springsteen
Posts: 270
Default

Quote:
Originally Posted by RockThatScoober View Post
I'm curious about the best way to structure the data for modeling: policy year or calendar/accident year?

With calendar/accident year, I'm concerned about claims that occur at the end of a policy but at the beginning of a year. For example, I'm including claims that occurred between 1/1/2015 and 12/31/2018 (the past four years). If a policy had a claim in January 2015 but only earned one month of exposure in that window, then that claim would have a small weight in the GLM, right? Should that policy show a full year of exposure (assuming no cancellation) or just the one month?
Policy year structure is best because you have a perfect match between policy characteristics and the losses that resulted from them. An additional tip is to add policy year as a categorical variable in the model to control for trend without having to perform trend analysis.
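
For example, a hypothetical formula along those lines; the column names and the Tweedie family are my own assumptions, not from the post.

Code:
library(statmod)

# policy_year entered as a factor so each year gets its own level,
# picking up trend without a separate trend analysis
pp_model <- glm(loss / exposure ~ factor(policy_year) + age_group + territory,
                family = tweedie(var.power = 1.6, link.power = 0),
                data = policy_data, weights = exposure)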
  #30  
Old 04-18-2019, 07:44 PM
MoralHazard
Member
CAS
 
Join Date: Jul 2011
Favorite beer: Sam Adams Rebel Rouser
Posts: 110
Default

Quote:
Originally Posted by RockThatScoober View Post
I'm curious about the best way to structure the data for modeling: policy year or calendar/accident year?

With calendar/accident year, I'm concerned about claims that occur at the end of a policy but at the beginning of a year. For example, I'm including claims that occurred between 1/1/2015 and 12/31/2018 (the past four years). If a policy had a claim in January 2015 but only earned one month of exposure in that window, then that claim would have a small weight in the GLM, right? Should that policy show a full year of exposure (assuming no cancellation) or just the one month?
Assuming you are doing a pure premium analysis, then yes, use the one month as your exposure. Pure prem is loss divided by exposure, so the loss would be divided by a smaller denominator. This might make pure prem hella high, but the weight would be commensurately low, so everything balances in the end.
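
A quick made-up numeric illustration of that offsetting effect:

Code:
# Record 1: one month of earned exposure with a $6,000 January claim
# Record 2: a full claim-free policy year
exposure  <- c(1/12, 1)
loss      <- c(6000, 0)

pure_prem <- loss / exposure        # 72,000 and 0 -- the short record looks extreme
weighted.mean(pure_prem, exposure)  # 5,538.46 = 6,000 / (1/12 + 1), sane overall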