

#21




Quote:
Perform an extensive EDA and see which distributions seem to fit the data that you actually have. Your book of business could be (significantly) different from mine because of differing underwriting standards and/or claim practices. Then do several types of modeling: FxS, single PP, PP by peril/loss_type, FxS by peril/loss_type, etc. Then compare a range of (relevant) metrics on a holdout set to see which models are performing quantitatively "the best". Be sure to also examine the underlying assumptions of each model, and how well the data conforms to those assumptions, when making a final decision on your "champion" model.
__________________
I find your lack of faith disturbing Why should I worry about dying? It’s not going to happen in my lifetime! Freedom of speech is not a license to discourtesy #BLACKMATTERLIVES 
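For anyone who wants to try the FxS vs. single PP comparison described above, here is a rough sketch in base R. Everything in it is an assumption for illustration: the data is simulated, the column names (territory, age, exposure, claims, loss) are made up, and a quasi-Poisson GLM stands in for a Tweedie pure premium model just to avoid extra packages. The two holdout metrics are only examples of what "relevant" might mean for your book.

```r
set.seed(1)
n <- 5000

# Simulated book of business (hypothetical columns, illustration only)
dat <- data.frame(
  territory = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  age       = runif(n, 18, 80),
  exposure  = runif(n, 0.25, 1)
)
terr      <- as.character(dat$territory)
freq_true <- unname(c(A = 0.05, B = 0.08, C = 0.12)[terr]) * exp(-0.01 * (dat$age - 40))
sev_true  <- unname(c(A = 4000, B = 5000, C = 4500)[terr])

dat$claims <- rpois(n, freq_true * dat$exposure)
dat$loss   <- 0
has_claim  <- dat$claims > 0
dat$loss[has_claim] <- rgamma(sum(has_claim),
                              shape = 2 * dat$claims[has_claim],
                              rate  = 2 / sev_true[has_claim])

# 70/30 train/holdout split
train_idx <- sample(n, size = round(0.7 * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Frequency x severity (FxS): Poisson frequency and Gamma severity
freq_fit <- glm(claims ~ territory + age + offset(log(exposure)),
                family = poisson(link = "log"), data = train)

sev_train <- train[train$claims > 0, ]
sev_train$avg_sev <- sev_train$loss / sev_train$claims
sev_fit <- glm(avg_sev ~ territory + age,
               family = Gamma(link = "log"),
               weights = claims, data = sev_train)

# Single pure premium (PP) model; quasi-Poisson used as a stand-in for Tweedie
pp_fit <- glm(loss ~ territory + age + offset(log(exposure)),
              family = quasipoisson(link = "log"), data = train)

# Expected loss per holdout record under each approach
pred_fxs <- predict(freq_fit, newdata = test, type = "response") *
            predict(sev_fit,  newdata = test, type = "response")
pred_pp  <- predict(pp_fit, newdata = test, type = "response")

rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))

results <- data.frame(
  model          = c("FxS", "PP"),
  rmse           = c(rmse(test$loss, pred_fxs), rmse(test$loss, pred_pp)),
  pred_to_actual = c(sum(pred_fxs), sum(pred_pp)) / sum(test$loss)
)
print(results)
```

On real data you would likely add lift charts or a Gini-type measure, and repeat the comparison by peril/loss_type.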
#24




Anyone know how to create the graph below in R? It comes from the Practitioner's Guide to GLMs. I can get close, but since the GLM coefficient output doesn't include the reference level for categorical variables, I can only add them manually. Even though their coefficient is 0, it's nice to have them in the output for underwriters to see.
There's a nice function in R called tidy() (from the broom package) that creates a tibble of the model summary data, and a function confint() that creates the 95% confidence interval. That gets me most of the way there, but the reference levels are missing. There's also the function dummy.coef(), which creates a list of all the variables and their factor levels, including the reference level, but it doesn't include the standard error, and I can't find a good way to join the standard errors from tidy(). https://imgur.com/p4hrpI6
#25




Quote:
__________________
ACAS 7 8 9 
#27




Quote:
Here's the code for anyone interested in something similar:
Code:
library(dplyr)
library(tidyr)
library(tibble)
library(broom)

tidy_coefs <- function(model, level = 0.95) {
  # Create confidence intervals based on user input
  tidy_conf <- as.data.frame(confint(model, level = level)) %>%
    rownames_to_column()
  names(tidy_conf) <- c("Term", "Lower_CI", "Upper_CI")

  # Create the tidy model data frame and left join the confidence intervals
  tidy_model <- tidy(model)
  names(tidy_model) <- c("Term", "Point_Estimate", "Std_Error", "Statistic", "P_Value")
  tidy_model <- tidy_model %>% left_join(tidy_conf, by = "Term")

  # Round the numbers to 3 digits
  is_num <- sapply(tidy_model, is.numeric)
  tidy_model[is_num] <- lapply(tidy_model[is_num], round, 3)

  # Pull in the model's categorical variables and unnest them,
  # then left join to tidy_model. Only factor levels are kept;
  # the intercept and continuous terms drop out at this step.
  xlevels <- model$xlevels %>%
    enframe() %>%
    unnest(cols = value) %>%
    mutate(Term = paste0(name, value)) %>%
    left_join(tidy_model, by = "Term")

  # Replace NA (the reference levels) with zeros
  xlevels[is.na(xlevels)] <- 0
  return(xlevels)
}
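If you'd rather avoid the tidyverse dependencies, below is a rough base-R variant of the same idea that also keeps the intercept and continuous terms. Note the assumptions: coef_table_with_ref is a made-up name, the intervals are simple Wald intervals (estimate +/- z * SE) rather than the profile intervals confint() returns for a GLM, the reference level is taken to be the first level of each factor (default treatment contrasts), and the Relativity column is only meaningful for a log link. The example model is fit to simulated data.

```r
coef_table_with_ref <- function(model, level = 0.95) {
  est <- summary(model)$coefficients
  z   <- qnorm(1 - (1 - level) / 2)

  out <- data.frame(
    Term           = rownames(est),
    Point_Estimate = est[, "Estimate"],
    Std_Error      = est[, "Std. Error"],
    stringsAsFactors = FALSE
  )

  # Reference levels: first level of each factor (treatment contrasts assumed)
  ref_terms <- paste0(names(model$xlevels),
                      vapply(model$xlevels, `[`, character(1), 1))
  ref <- data.frame(
    Term           = ref_terms,
    Point_Estimate = rep(0, length(ref_terms)),
    Std_Error      = rep(0, length(ref_terms)),
    stringsAsFactors = FALSE
  )

  out <- rbind(out, ref)
  out$Lower_CI   <- out$Point_Estimate - z * out$Std_Error  # Wald interval
  out$Upper_CI   <- out$Point_Estimate + z * out$Std_Error
  out$Relativity <- exp(out$Point_Estimate)                 # assumes log link
  rownames(out) <- NULL
  out
}

# Simulated example (hypothetical data)
set.seed(42)
d <- data.frame(
  territory = factor(sample(c("A", "B", "C"), 500, replace = TRUE)),
  age       = runif(500, 18, 80)
)
d$claims <- rpois(500, 0.2)

fit <- glm(claims ~ territory + age, family = poisson(link = "log"), data = d)
tab <- coef_table_with_ref(fit)
print(tab)
```

The reference level (territoryA here) shows up with an estimate of 0 and a relativity of 1, so it can be plotted alongside the other levels.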
#28




I'm curious what the best way to structure the data for modeling is: policy year or calendar/accident year?
With calendar/accident year, I'm concerned about claims that occur at the end of a policy term but at the beginning of the year. For example, I'm including claims that occurred between 1/1/2015 and 12/31/2018 (past 3 years). If a policy had a claim in January 2015 but earned only one month of exposure in that window, then that claim would have a small weight in the GLM, right? Should that policy show a full year of exposure (assuming no cancellation) or just the one month?
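To make the one-month case concrete, here is a small sketch of how earned exposure inside a calendar/accident window could be computed by clipping each policy term to the window. This is only one possible convention, not a recommendation: earned_exposure is a made-up helper, the policies are invented, the window end is treated as exclusive (1/1/2019), and a year is taken as 365.25 days.

```r
# Earned exposure (in years) that falls inside [window_start, window_end)
earned_exposure <- function(eff_date, exp_date, window_start, window_end) {
  # Work on day counts so the clipping is plain numeric arithmetic
  start <- pmax(as.numeric(eff_date), as.numeric(window_start))
  end   <- pmin(as.numeric(exp_date), as.numeric(window_end))
  pmax(end - start, 0) / 365.25
}

# Hypothetical policies; P1 is the "claim in January 2015" case
policies <- data.frame(
  policy_id = c("P1", "P2", "P3"),
  eff_date  = as.Date(c("2014-02-01", "2015-06-01", "2018-09-01")),
  exp_date  = as.Date(c("2015-02-01", "2016-06-01", "2019-09-01")),
  claims    = c(1, 0, 0)
)

policies$exposure <- earned_exposure(
  policies$eff_date, policies$exp_date,
  window_start = as.Date("2015-01-01"),
  window_end   = as.Date("2019-01-01")
)
print(policies)
```

Under this convention P1 contributes 31 days (about 0.085 years) of exposure with one claim. In a frequency GLM that uses offset(log(exposure)), that record enters as one claim on a short exposure, matched to the period in which the claim could have been observed.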
#29




Quote:

#30




Quote:


