#1
04-22-2019, 10:18 AM
TDH (Member, CAS Non-Actuary)

GLM modelling - what response variable?

I'm trying to put together a few simple GLM models in R and I'm struggling to come up with the correct formula structure.

1. Frequency model. I want to come out with an expected claim frequency for each policy. Would my response variable be the claim count divided by exposure (e.g. for motor insurance, the number of claims per vehicle year), or the claim count itself? In R pseudocode:

Quote:
model <- glm(formula = Claim.number / exposure ~ Age + Gender,
             data = datatrain, family = poisson())
model2 <- glm(formula = Claim.number / exposure ~ Age + Gender,
              data = datatrain, family = poisson(), weights = exposure)
model3 <- glm(formula = Claim.number ~ Age + Gender,
              data = datatrain, family = poisson(), weights = exposure)
model4 <- glm(formula = Claim.number ~ Age + Gender,
              data = datatrain, family = poisson())
Which of these would be correct, given that they all output different parameters?
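For completeness, a fifth formulation I've seen uses a log-exposure offset instead of weights; as I understand it, for a Poisson with a log link this reproduces model2's coefficients:

Code:
# count response with log(exposure) as an offset; for a Poisson with a log
# link this gives the same coefficients as model2 (rate response, exposure weights)
model5 <- glm(formula = Claim.number ~ Age + Gender + offset(log(exposure)),
              data = datatrain, family = poisson())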

2. Loss ratio model. Here I have the same question as above. Would my response variable be the claim amount (perhaps weighted by the premium), or the claim amount divided by premium (i.e. the loss ratio)?

I'm struggling to understand when you would use an offset and when you would use a weight in each of these cases. For instance, if I fit a frequency model with a response of claim count whilst weighting on exposure (i.e. model3 above) and get a predicted output of 0.5 for age group B (say), would I interpret this as expecting 0.5 claims in this age group, or as expecting 0.5 claims in this age group per unit of exposure (even though the response is not per unit of exposure)? Ideally, I would like an expected claim cost per unit of exposure.
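In other words, is the following the right way to get a per-vehicle-year figure for age group B (the levels "B" and "F" here are made up for illustration)?

Code:
# predicting at exposure = 1 should give the expected frequency per vehicle year
newdat <- data.frame(Age = "B", Gender = "F", exposure = 1)
predict(model5, newdata = newdat, type = "response")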

#2
04-22-2019, 02:20 PM
Vorian Atreides (Wiki/Note Contributor, CAS)

The answer to 1 & 2 is "what is the purpose of the model?" You can get at "results" using any of the items you're looking at; the issue is basically whether you want to do the "calculations" up front (e.g., modeling frequency directly) or on the back end (e.g., modeling claim count and then deriving an appropriate frequency).

Bottom line, look at the bigger question you're trying to answer and then seek "simple" methods to answer that question first. If you can supply the general goal, you might get better (or more direct) advice.

As for your last item: from a very simplistic perspective, offsets generally incorporate some influence that is already accounted for in some other manner (or that results from other restrictions placed on your modeling, whether regulatory or business-driven). For example, California requires that model parameters be estimated in a fixed sequence for auto coverages, so parameters determined in earlier steps would be accounted for through the offset.
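As a rough sketch of how that might look in R (prior.factor is a hypothetical column of relativities fixed in an earlier step):

Code:
# relativities from earlier steps enter through the offset, so their
# coefficient is pinned at 1 rather than re-estimated
m <- glm(Claim.number ~ Age + offset(log(exposure) + log(prior.factor)),
         data = datatrain, family = poisson())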

On the other hand, weights help control the influence of specific data points (more or less). For example, an observation with 1/12 the exposure of its neighbors (e.g., a policy on risk for only one month while other data points contribute a full year's worth of exposure) will not overly influence the model fit through some large-ish loss.
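For example, a severity model weighted by claim count might look like this (a sketch; Severity is a hypothetical column holding the average loss per claim):

Code:
# rows averaging many claims carry more weight, so a single large claim on a
# one-claim row doesn't dominate the fit; Gamma assumes positive severities
m_sev <- glm(Severity ~ Age + Gender, family = Gamma(link = "log"),
             weights = Claim.number, data = subset(datatrain, Claim.number > 0))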
#3
04-22-2019, 02:34 PM
TDH (Member, CAS Non-Actuary)

Quote:
Originally Posted by Vorian Atreides View Post
The answer to 1 & 2 is "what is the purpose of the model?" ...
The problem I have is that the two approaches lead to different results. If I am trying to model the loss ratio (i.e. the end result is a loss ratio), then I have two options:

1. Model the claim amount as the response, with the premium as an offset and weights of 1. With a log link, this assumes that expected claims increase proportionally with the premium, i.e. higher-premium policies lead to higher losses. You can then recover the loss ratio by dividing the modelled loss by the actual premium charged.

2. Model loss / premium as the response, with the premium as the weight. This implies that policies with higher premium have lower variance of the loss ratio (is this correct?).
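In R, I think the two options would look like this (Loss and Premium are made-up column names; Gamma with a log link is just for illustration and assumes strictly positive losses):

Code:
# Option 1: claim amount as response, premium as offset, weights of 1
lr1 <- glm(Loss ~ Age + Gender + offset(log(Premium)),
           data = datatrain, family = Gamma(link = "log"))

# Option 2: loss ratio as response, premium as weights
lr2 <- glm(Loss / Premium ~ Age + Gender,
           data = datatrain, family = Gamma(link = "log"), weights = Premium)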

Which of these is "standard" in the industry? If you fit the two scenarios with a Poisson error structure (e.g. with count data) you get identical results; with every other error structure the results differ.

Apologies for not being more specific - this is a general question to help my understanding of which to choose, rather than a specific problem I am looking at.
#4
04-22-2019, 03:32 PM
Vorian Atreides (Wiki/Note Contributor, CAS)

The "standard" in the industry would tell you to look at both models and find metrics applicable to both from which you can make a decision for which one works better for the business problem under consideration.

Keep in mind that most companies view their models as proprietary, so the industry isn't going to have a "standard" way of doing things.

If you've read the CAS Monograph on GLMs, you'll find key items to consider in your evaluation, but it won't give guidance on how to do things for the reason stated above.
#6
04-23-2019, 09:41 AM
Actuarially Me (Member, CAS)

What's the purpose of the model? Is it the ability to segment risks, or accurate prediction? This will tell you which metrics to look at.

Does the model need to be interpretable? If not, a GLM may not be the best algorithm to use.

Are you only looking at claims that have a count, or are you including zero counts as well? You'll need to check whether your data is zero-inflated and/or overdispersed; Poisson may not be the best fit. Most insurance data is overdispersed, and Poisson will understate the variance. Quasi-Poisson or negative binomial may be a better fit. Quasi-Poisson gives the same coefficients and predictions as Poisson but different standard errors, which matters if you plan on giving confidence intervals for the coefficients.
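A quick way to check in R (a sketch; glm.nb comes from the MASS package):

Code:
# quasi-Poisson dispersion well above 1 suggests overdispersion
m_pois  <- glm(Claim.number ~ Age + Gender + offset(log(exposure)),
               data = datatrain, family = poisson())
m_quasi <- update(m_pois, family = quasipoisson())
summary(m_quasi)$dispersion

# negative binomial alternative if overdispersion is material
library(MASS)
m_nb <- glm.nb(Claim.number ~ Age + Gender + offset(log(exposure)),
               data = datatrain)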


Weights have an inverse relationship with the variance: for observation i, Var[Y_i] = phi * V(mu_i) / w_i, so observations with more weight have lower variance. Offsets are just a variable whose coefficient is fixed at 1.

If we use weights for frequency, it says higher exposure has less variance, which isn't true, especially if exposure is the policy year. So instead, we use an offset.

The default link function for Poisson is log, and you usually want everything on the same scale as your response variable, so we log the continuous variables. That means the offset should be logged along with every continuous variable.