


#1




Anyone help with by-Peril GLMs?
Background:
I've been tasked with creating a rating model by Peril using GLMs. It's commercial lines property, so the data is pretty sparse. The carriers have been asking for Premiums by peril, so we're going with it regardless if it's a better model than a single pure premium model. We're also working under the assumption that Perils are independent. Excluding CAT, that assumption isn't too far off. My data goes back 15 years, but only 20112017 has complete information on some variables so I only use those years. After doing all the necessary scrubbing, I'm sitting at only 50,000 policies with about 6.5% having an incurred claim. Split by Peril: Peril1 has 900 claims. Of those 850 have 1 claim, 45 have 2 claims and 5 have 3 claims Peril2 and Peril 3 have 1500 and 800 claims respectively with a similar claim count breakout as Peril1 I split the data using a random partition of 70% train, 30% test, and leave 2017 as a validation set. The same breakout percentages are reserved. For the frequency models, they're in the R format: Code:
glm(formula = Count_Peril1 ~ Variables, family = poisson(link = "log"),
    offset = log(Exposure), data = data.train)
The Gini is calculated by: Code:
o <- with(f.model1, order(prediction))
x <- with(data.test, cumsum(Count_Peril1[o]) / sum(Count_Peril1[o]))
y <- with(data.test, cumsum(Exposure[o]) / sum(Exposure[o]))
dx <- x[-1] - x[-length(x)]
h  <- (y[-1] + y[-length(y)]) / 2
gini.peril1 <- 2 * (0.5 - sum(h * dx))
I've been using the deviance ratio as a replacement for R^2, i.e., how much of the data is actually explained by the model. I create the deviance ratio by: Code:
deviance.ratio <- 1 - (model.peril1$deviance / model.peril1$null.deviance)
I measure severity with a GLM as well, but the target variable is Inc_Peril1, the family is Gamma, the offset is log(Count_Peril1), and it's fit on the subset where Count_Peril1 > 0. The Gini is calculated similarly to the above, but the x-axis is Count_Peril1 and the y-axis is Incurred_Peril1.

Problem:
I feel like my models are horrible and I don't know how/where to improve them first. The Q-Q plots suggest I'm using the wrong distribution. I've tried using negative binomial with various thetas, but that didn't seem to work. Also, when I create the null model just looking at the intercept, the Gini is much higher than when I include any variables. The AIC and deviance are worse, though. Not sure why that's the case.

While testing different variables, I check the summary to see if they're statistically significant. Then I'll look at the Gini, AIC, and deviance. I'll add/remove variables, checking for improvements. Once I'm only marginally increasing the Gini, I'll do an ANOVA chi-squared test to determine which models are best. The Gini varies from .08 to .22, which sounds pretty horrible, but I have no benchmark.

When I run the summary plots, they all look pretty bad. Here's an example of Peril1's frequency model (Gini of .12):

[Attached plots: model summary, Q-Q plot, residual plot, Cook's distance]

Where do I go from here to improve the model?
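For reference, here's a minimal runnable sketch of the severity model as described above (Gamma family, log link, offset by log of claim count, restricted to claim-bearing records). The data and variable `x1` are simulated stand-ins, not the actual policy data:

```r
# Simulated stand-in for the scrubbed policy data (hypothetical)
set.seed(1)
n <- 5000
df <- data.frame(x1 = rnorm(n), Count_Peril1 = rpois(n, 0.06))
# Per-claim severity averaging ~$50k, scaled by a rating variable
sev <- rgamma(n, shape = 2, rate = 2 / (50000 * exp(0.3 * df$x1)))
df$Inc_Peril1 <- df$Count_Peril1 * sev

# Gamma severity GLM on claims-bearing records; the offset makes the
# linear predictor model incurred per claim rather than per policy
sev.model1 <- glm(Inc_Peril1 ~ x1,
                  family = Gamma(link = "log"),
                  offset = log(Count_Peril1),
                  data = subset(df, Count_Peril1 > 0))
```

With the log link, `exp(coef(sev.model1))` gives multiplicative severity relativities, which is usually what you want for a rating plan.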
#2




For frequency, you'll want to calculate crunched residuals instead of "raw" residuals.
See page/slide 13 & 14 of this CAS document for more information.
__________________
I find your lack of faith disturbing Why should I worry about dying? It’s not going to happen in my lifetime! Freedom of speech is not a license to discourtesy #BLACKMATTERLIVES 
#3




1. Try some capping to improve the fit of the distribution and the plots.
2. Check out how the models are performing at segmenting the risks by using a lift chart from "7.2.3. Loss Ratio Charts" of the GLM Monograph (https://www.casact.org/pubs/monograp...hareTevet.pdf). You can use PP, freq, or sev instead of LR in the lift chart.
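A minimal sketch of the decile lift idea from the monograph, in base R on simulated data (all names hypothetical; the monograph's version buckets by predicted loss ratio, but predicted pure premium works the same way):

```r
# Simulated stand-in for predicted vs. actual pure premium (hypothetical)
set.seed(42)
df <- data.frame(pred.pp = rgamma(10000, shape = 2, rate = 1 / 500))
df$actual.pp <- df$pred.pp * rexp(10000)  # noisy actuals centered on predictions

# Bucket policies into deciles of predicted PP, then compare averages
df$decile <- cut(rank(df$pred.pp), 10, labels = FALSE)
lift <- aggregate(cbind(pred.pp, actual.pp) ~ decile, data = df, FUN = mean)
# A model that segments well shows actual.pp rising monotonically by decile
```

Plotting `avg actual` against decile (optionally with `avg predicted` overlaid as a double lift chart) gives a much more interpretable read on segmentation than a single Gini number.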
__________________
FCAS 
#4




Quote:
He'll want to address that first before trying to "improve" the model.
#5




Quote:
Because of this, the crunched residual plot shows a band for 0 claims, 1 claim, and 2 claims. (There were none larger than 2 in the test set.) Did I even calculate crunched residuals correctly? Code:
data <- data %>%
  arrange(Count_Peril1) %>%
  mutate(res = Count_Peril1 - Predict_Peril1,
         bucket = cut(res, 500))
ggplot(data, aes(x = res, y = bucket)) + geom_point()
Assuming my code is correct, there's definitely something wrong with my model. The average claim amount is $50k, so would it be better to do a logistic model to predict whether or not there's a claim, THEN do a severity model offset by the number of claims?
#6




Quote:

#7




Quote:
Quote:
With a logistic model, you still run into the issue of "did a claim happen" since the result of a logistic is simply the probability that a claim happens. (You essentially get the same thing with the overdispersed Poisson model.)
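To make that concrete, here's a sketch of the logistic alternative on simulated data (hypothetical names). Note the response is a 0/1 claim indicator and the prediction is a probability, not a count:

```r
# Simulated claim-occurrence data (hypothetical)
set.seed(7)
df <- data.frame(x1 = rnorm(2000))
df$has.claim <- rbinom(2000, 1, plogis(-3 + 0.5 * df$x1))

# Logistic model of "did at least one claim happen?"
logit.model <- glm(has.claim ~ x1,
                   family = binomial(link = "logit"),
                   data = df)
p.claim <- predict(logit.model, type = "response")  # probability, not a count
```

To get an expected claim count out of this you'd still need to multiply the probability by an expected-claims-given-occurrence term, which is why it ends up close to the overdispersed Poisson in practice.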
__________________
Last edited by Vorian Atreides; 04-11-2019 at 02:23 PM.
#8




Quote:
Quote:
Quote:
Also, try changing the "poisson" in your GLM code to "quasipoisson", and take another look at the GLM summary.
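The quasipoisson swap is a one-word change to the family argument. A sketch on simulated overdispersed counts (hypothetical names): the coefficients come out identical to the Poisson fit, but the standard errors get scaled by the square root of the estimated dispersion, which changes which variables look significant:

```r
# Simulated overdispersed counts via a gamma-mixed Poisson (hypothetical)
set.seed(3)
n <- 5000
df <- data.frame(x1 = rnorm(n), Exposure = runif(n, 0.5, 1))
mu <- df$Exposure * exp(-2 + 0.4 * df$x1)
df$Count <- rpois(n, mu * rgamma(n, shape = 0.2, rate = 0.2))

m.pois <- glm(Count ~ x1, family = poisson(link = "log"),
              offset = log(Exposure), data = df)
m.qp   <- glm(Count ~ x1, family = quasipoisson(link = "log"),
              offset = log(Exposure), data = df)

summary(m.qp)$dispersion  # well above 1 here, signaling overdispersion
```

A dispersion estimate near 1 would say the plain Poisson was fine; materially above 1 says the Poisson standard errors were overstating significance.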
#10




Quote:
Here's the revised code: Code:
library(Hmisc)  # for cut2
crunched <- data %>%
  arrange(predicted.peril1) %>%
  mutate(bucket = cut2(predicted.peril1, g = 100)) %>%
  group_by(bucket) %>%
  summarize(avg.pred = mean(predicted.peril1),
            actual = mean(Count_Peril1)) %>%
  mutate(crunch.res = actual - avg.pred)
ggplot(crunched, aes(x = bucket, y = crunch.res)) + geom_point()
The purpose of the model is to have a rating plan by pure premium. Also, thanks for taking the time to explain things.
Tags 
glm, peril, pure premium 

