Actuarial Outpost
 
#1  04-23-2019, 11:01 AM
Actuarially Me (Member, CAS)

What do you do with the intercept in a GLM?

I couldn't find any clarification on this in CAS Monograph 5. We're not using software to implement the model; the factors get put into SQL. I present the factors to the underwriters with a confidence interval, and they make selections within that interval.

For continuous variables, we pick a base level for the variable and bin the continuous data. For each bin, we take the midpoint to calculate the factor. So for a property value bin of [0, $1M], the factor is ($500K / base level)^coefficient. I know this degrades the model, but the underwriters would rather have bins than have me put the formula into SQL.
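Here's a minimal sketch of that midpoint calculation, with made-up numbers (base_level, beta, and the bin edges are placeholders, not values from a real model):

[code]
# Illustrative only: base level, coefficient, and bin edges are made up.
base_level = 250_000   # property value the relativities are keyed to
beta = 0.35            # hypothetical GLM coefficient on log(property value)

bins = [(0, 1_000_000), (1_000_000, 2_000_000), (2_000_000, 5_000_000)]
for lo, hi in bins:
    midpoint = (lo + hi) / 2
    factor = (midpoint / base_level) ** beta   # e.g. (500K / 250K)^0.35
    print(f"({lo:,}, {hi:,}]: factor = {factor:.3f}")
[/code]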

Some of the intercept values are very different across models. One model has an intercept of -9.68 on the log scale, which is about 0.00006 on the linear scale (e^-9.68) and has a large impact on the final value. How do you handle that? Excluding it makes the factors look unrealistic.

My guess is that you treat it like a control variable and ignore it, since I'm really only interested in the relativities, but I'm curious what others do.
#2  04-23-2019, 11:28 AM
MoralHazard (Member, CAS)

Quote: Originally Posted by Actuarially Me
For continuous variables, we pick a base level for the variable and bin the continuous data. For each bin, we take the midpoint to calculate the factor. So for a property value bin of [0, $1M], the factor is ($500K / base level)^coefficient. I know this degrades the model, but the underwriters would rather have bins than have me put the formula into SQL.
Better idea: rather than taking midpoints, run a second model with the predictions of your main model as the target (i.e., the procedure described in section 5.1.2 of the monograph). This is a generally useful procedure any time the model will be implemented differently from the way it was built. In this case, the main model has the continuous variables modeled continuously (better modeling practice, imo), while the second model has them binned (for easier implementation) and estimates factors for the bins that better match the distribution in the data (which may or may not line up with the midpoints).
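If it helps, here's a rough sketch of that two-stage restatement in Python/statsmodels. The column names and toy data are all made up; it just illustrates the mechanics:

[code]
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Toy data standing in for the real book (names/values are placeholders).
rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "prop_value": rng.uniform(5e4, 5e6, n),
    "region": rng.choice(["A", "B", "C"], n),
})
df["claim_count"] = rng.poisson(0.05 * (df["prop_value"] / 2.5e5) ** 0.3)

# Stage 1: main model, continuous variable modeled continuously.
fit1 = smf.glm("claim_count ~ np.log(prop_value) + C(region)",
               data=df, family=sm.families.Poisson()).fit()

# Attach stage-1 predictions to the training data.
df["pred1"] = fit1.predict(df)

# Stage 2: same variables, but with the continuous one binned.
df["value_bin"] = pd.cut(df["prop_value"],
                         bins=[0, 1e6, 2e6, np.inf]).astype(str)
fit2 = smf.glm("pred1 ~ C(value_bin) + C(region)", data=df,
               family=sm.families.Gamma(link=sm.families.links.Log())).fit()

# exp() of the stage-2 coefficients gives the bin relativities
# (and the exponentiated intercept is the implied base).
print(np.exp(fit2.params))
[/code]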

Quote: Originally Posted by Actuarially Me
Some of the intercept values are very different across models. One model has an intercept of -9.68 on the log scale, which is about 0.00006 on the linear scale (e^-9.68) and has a large impact on the final value. How do you handle that? Excluding it makes the factors look unrealistic.
The intercept represents the "base rate," i.e., the pure premium (or frequency, severity, etc.) when all the model variables are zero. So ignore the intercept and instead focus on the base rate you will be using: is it adequate? Does the average prediction match the expected average pure premium (considering the historical overall pure premium and factoring in any prospective adjustments such as trend)? If not, the base rate may need to be revised.
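As a back-of-the-envelope version of that check (all numbers illustrative):

[code]
# All numbers illustrative.
current_base = 500.0   # base rate currently in the rating plan
avg_pred = 412.0       # average modeled pure premium across the book
target_pp = 455.0      # expected prospective pure premium (history + trend)

off_balance = target_pp / avg_pred         # > 1 means predictions run low
revised_base = current_base * off_balance  # scale the base to rebalance
print(f"off-balance: {off_balance:.3f}, revised base: {revised_base:.2f}")
[/code]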
#3  04-23-2019, 12:16 PM
Actuarially Me (Member, CAS)

Quote: Originally Posted by MoralHazard
Better idea: rather than taking midpoints, run a second model with the predictions of your main model as the target (i.e., the procedure described in section 5.1.2 of the monograph). This is a generally useful procedure any time the model will be implemented differently from the way it was built. In this case, the main model has the continuous variables modeled continuously (better modeling practice, imo), while the second model has them binned (for easier implementation) and estimates factors for the bins that better match the distribution in the data (which may or may not line up with the midpoints).
That's a good idea. I may run into some issues with severity, since I have trouble getting those models to converge. Still working on them, though. Would you exclude control variables such as Policy Year at this point?

Quote: Originally Posted by MoralHazard
The intercept represents the "base rate," i.e., the pure premium (or frequency, severity, etc.) when all the model variables are zero. So ignore the intercept and instead focus on the base rate you will be using: is it adequate? Does the average prediction match the expected average pure premium (considering the historical overall pure premium and factoring in any prospective adjustments such as trend)? If not, the base rate may need to be revised.
We're looking to revise the base rate as well. Even when I multiply frequency and severity, I get an unrealistic base rate. My severity models are poor quality, though (Gini around .02-.05, whereas the frequency models are around .3-.4).
#4  04-23-2019, 01:37 PM
MoralHazard (Member, CAS)

Quote: Originally Posted by Actuarially Me
That's a good idea. I may run into some issues with severity, since I have trouble getting those models to converge. Still working on them, though. Would you exclude control variables such as Policy Year at this point?
For the second model, you probably don't even need more than one PY's worth of data; there is literally no noise in the target variable, so credibility isn't a concern. If you do want to include more than one PY, then yes, you should include PY as a control: since the first model generates different predictions for different PYs, not including PY as a control in the second model may cause those differences to be misattributed to other variables that correlate with PY.

However, note that having control variables in the model that aren't included in the scoring will make your intercept meaningless, so you will need to manually re-set your base rate to bring the predictions back into balance.
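One way to do that re-set, assuming the PY coefficients and the exposure mix are known (numbers below are illustrative, and in practice you might rebase to the latest PY or trend forward rather than use the average):

[code]
import numpy as np

# Hypothetical fitted PY coefficients (log scale; 2016 is the base level).
py_coef = {"2016": 0.00, "2017": 0.04, "2018": 0.09}
# Exposure share of each PY in the modeling data.
py_share = {"2016": 0.30, "2017": 0.33, "2018": 0.37}

# Exposure-weighted average PY relativity the scored plan no longer picks up.
avg_py_rel = sum(np.exp(py_coef[y]) * py_share[y] for y in py_coef)

base_rate = 500.0                 # illustrative current base
rebased = base_rate * avg_py_rel  # fold the average PY effect into the base
print(f"avg PY relativity: {avg_py_rel:.4f}, rebased base: {rebased:.2f}")
[/code]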
#5  04-23-2019, 02:04 PM
Actuarially Me (Member, CAS)

Quote: Originally Posted by MoralHazard
For the second model, you probably don't even need more than one PY's worth of data; there is literally no noise in the target variable, so credibility isn't a concern. If you do want to include more than one PY, then yes, you should include PY as a control: since the first model generates different predictions for different PYs, not including PY as a control in the second model may cause those differences to be misattributed to other variables that correlate with PY.

However, note that having control variables in the model that aren't included in the scoring will make your intercept meaningless, so you will need to manually re-set your base rate to bring the predictions back into balance.
Thanks. Know of any resources on best practices for manually determining/adjusting the base rate?

I've been poking around old GLM posts on here, and the same people who answered questions then are still answering them now. Thanks for continuing to answer my basic questions!
#6  04-23-2019, 02:20 PM
Vorian Atreides (Wiki/Note Contributor, CAS)

You should read through the ratemaking material for CAS Exam 5; it addresses the fundamentals of your question about the "base rate."

There's really too much to address in posts here, and it's already captured rather well (if I may say so) elsewhere.
#7  04-23-2019, 03:01 PM
Actuarially Me (Member, CAS)

Quote: Originally Posted by Vorian Atreides
You should read through the ratemaking material for CAS Exam 5; it addresses the fundamentals of your question about the "base rate."

There's really too much to address in posts here, and it's already captured rather well (if I may say so) elsewhere.

That was the exam I stopped at before deciding not to be an actuary, only to end up back in insurance. I'll give it a read; hopefully it doesn't give me PTSD. I guess I have no choice but to learn ratemaking if I want to fully understand how to implement a model.

Would be much easier if I could just create an API lol.
#8  04-23-2019, 04:03 PM
Actuarially Me (Member, CAS)

For anyone else in the same boat as me, this article is pretty useful:

https://www.casact.org/pubs/forum/00wforum/00wf107.pdf
#9  04-24-2019, 01:47 PM
Actuarially Me (Member, CAS)

After binning my continuous variables, I ran the frequency models using MoralHazard's advice.

Is this the correct way to do it? I ran the model with the continuous variables, took its predictions, attached them to my training data, and then ran a second model with the same variables except with binned versions of the continuous ones. Since the predictions are continuous, I switched to a Gamma distribution. The residuals look random, so I think that's a safe move.

If this is correct, I have some additional questions.

Is the reason this works that the original model did all the work, so if you keep all the original variables, the second model is just regressing toward the mean of each bin and adjusting the other variables to account for that?

Where would I get the standard errors to create confidence intervals? Use the continuous version's standard error for every bucket?
#10  04-24-2019, 02:40 PM
MoralHazard (Member, CAS)

Quote: Originally Posted by Actuarially Me
Is this the correct way to do it? I ran the model with the continuous variables, took its predictions, attached them to my training data, and then ran a second model with the same variables except with binned versions of the continuous ones. Since the predictions are continuous, I switched to a Gamma distribution. The residuals look random, so I think that's a safe move.
Sounds correct to me.

Quote: Originally Posted by Actuarially Me
Is the reason this works that the original model did all the work, so if you keep all the original variables, the second model is just regressing toward the mean of each bin and adjusting the other variables to account for that?
Yes, I think that sums it up.

Quote: Originally Posted by Actuarially Me
Where would I get the standard errors to create confidence intervals? Use the continuous version's standard error for every bucket?
Interesting question. This procedure really just produces restated model factors, not standard errors. If you do need standard errors for the restated model, one thing you can try is bootstrapping the whole procedure: take a random sample of your training data with replacement, fit the first model, fit the second model on top of that, and save the resulting factors. Repeat 1,000 times (or 10,000, or whatever is practical) and use the 2.5th and 97.5th percentiles of the distribution of each factor as your 95% confidence interval. (Though I imagine there's probably some simpler way to get CIs.)
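A sketch of what that loop might look like; fit_both() is a hypothetical helper that runs both models on one resampled dataset and returns the restated factors as a pandas Series, and df is the training data from earlier:

[code]
import pandas as pd

# fit_both() is a HYPOTHETICAL helper: refits the continuous model and the
# binned restatement on one bootstrap sample, returning exp(stage-2 coefs).
n_boot = 1000
boot_factors = [fit_both(df.sample(frac=1.0, replace=True))
                for _ in range(n_boot)]

boot = pd.DataFrame(boot_factors)
ci = boot.quantile([0.025, 0.975])  # 95% interval for each restated factor
print(ci)
[/code]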