

#1




GLM modelling - what response variable?
I'm trying to put together a few simple GLM models in R and I'm struggling to come up with the correct formula structure.
1. Frequency model. I want to come out with an expected claim frequency for each policy. In this case, would my response variable be the claim count / exposure (e.g. for motor insurance, the number of claims per vehicle-year), or would it be the claim count itself? In R pseudo-code:
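(Column names below are placeholders and these are sketches rather than final specifications; model3 is the count-response, exposure-weighted variant referred to in question 2.)

Code:
# model1: count response, exposure entering as a log offset
model1 <- glm(claim_count ~ age_group + vehicle_class + offset(log(exposure)),
              family = poisson(link = "log"), data = policies)

# model2: frequency response (claim_count / exposure), exposure as a prior weight
model2 <- glm(claim_count / exposure ~ age_group + vehicle_class,
              family = poisson(link = "log"), weights = exposure,
              data = policies)

# model3: count response with exposure as a weight (no offset)
model3 <- glm(claim_count ~ age_group + vehicle_class,
              family = poisson(link = "log"), weights = exposure,
              data = policies)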
2. Loss ratio model. If I wanted to build a loss ratio model, I have the same question as above. Would my response variable be the claim amount (perhaps weighted by the premium), or the claim amount / premium (i.e. the loss ratio)?

I'm struggling to understand when you would use an offset and when you would use a weight parameter in each of these cases. For instance, if I fit a frequency model with a response of claim count (whilst weighting on exposure, i.e. model3 in the above) and I get a predicted output of 0.5 in age group B (say), would I interpret this as expecting 0.5 claims in this age group? Or would I interpret it as 0.5 claims in this age group per unit of exposure (even though our response is not per unit of exposure)? Ideally, I would like to get an expected claim cost per unit of exposure.

Last edited by TDH; 04-22-2019 at 10:53 AM.
#2




The answer to 1 & 2 is "what is the purpose of the model?". You can get at "results" using any of the items you're looking at; the issue is basically whether you want to do the "calculations" up front (e.g., modeling frequency directly) or on the back end (e.g., modeling claim count and then deriving an appropriate frequency).
Bottom line: look at the bigger question you're trying to answer, and then seek "simple" methods to answer that question first. If you can supply the general goal, you might get better (or more direct) advice.

As for your last item: from a very simplistic perspective, "offsets" generally incorporate some influence that is already accounted for in some other manner (or that results from other restrictions placed on your modeling, whether regulatory or business-case). For example, California requires that model parameters be estimated in a fixed sequence for auto coverages, so parameters determined in earlier steps would be accounted for through the offset.

On the other hand, weights help control the influence of specific data points (more or less). For example, an observation that has 1/12 the "exposure" to contributing information (e.g., a policy that has exposure for only 1 month while other data points might be contributing a full year's worth) will not overly influence the model fit due to some large-ish loss.
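To make the mechanics concrete, here is a minimal sketch of a two-stage fit in which effects estimated in an earlier step are held fixed through an offset (the data and variable names are purely illustrative, not a real regulatory workflow):

Code:
# Stage 1: fit territory with an exposure offset (log link, so offsets are on the log scale)
stage1 <- glm(claim_count ~ territory + offset(log(exposure)),
              family = poisson(link = "log"), data = policies)

# Stage 2: carry the stage-1 linear predictor (territory effects plus the
# exposure offset) into the next fit as a fixed offset with coefficient 1
policies$stage1_lp <- predict(stage1, type = "link")
stage2 <- glm(claim_count ~ age_group + vehicle_class + offset(stage1_lp),
              family = poisson(link = "log"), data = policies)

Stage 2 then estimates only the remaining factors, taking the stage-1 effects as given.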
__________________
I find your lack of faith disturbing.
Why should I worry about dying? It’s not going to happen in my lifetime!
Freedom of speech is not a license to discourtesy.
#BLACKMATTERLIVES
#3




For the loss ratio model, I can see two possible setups:

1. Model a response of claim severity and have the premium as an offset, with a weight of 1. This assumes (with a log link) that expected claims will increase proportionally with the premium, i.e. higher-premium policies will lead to higher losses. You can then get back the loss ratio by dividing the modelled loss by the actual premium charged.

2. Model a response of loss / premium with a weight of the premium. This means that policies with higher premium will have lower variance of the loss ratio (is this correct?).

Which one of these is "standard" in the industry? If you model these two scenarios using a Poisson error structure (e.g. with count data) then you get identical results; however, with every other error structure the results are not identical.

Apologies for not being more specific; this is definitely a more general question to help my understanding of which to choose, rather than a specific problem I am looking at.
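In R, the two setups might look like this (column names are placeholders, and the Tweedie family is an assumption for illustration, since policy-level losses include zeros that a Gamma cannot handle; tweedie() comes from the statmod package):

Code:
library(statmod)  # provides the tweedie() family for glm

# Setup 1: loss response, log(premium) as an offset
m_offset <- glm(loss ~ age_group + vehicle_class + offset(log(premium)),
                family = tweedie(var.power = 1.5, link.power = 0),  # log link
                data = policies)

# Setup 2: loss ratio response, premium as the prior weight
m_weight <- glm(loss / premium ~ age_group + vehicle_class,
                family = tweedie(var.power = 1.5, link.power = 0),
                weights = premium, data = policies)

With var.power = 1 (a Poisson-type variance function) the two setups give identical coefficients; with any other variance power they diverge, as noted above.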
#4




The "standard" in the industry would tell you to look at both models and find metrics applicable to both from which you can make a decision for which one works better for the business problem under consideration.
Keep in mind that most companies view their models as proprietary and aren't going to look to have a "standard" way of doing things. If you've read the CAS Monograph on GLMs, you'll find key items to consider in your evaluation, but it won't give guidance on how to do things for the reason stated above.
__________________
I find your lack of faith disturbing.
Why should I worry about dying? It’s not going to happen in my lifetime!
Freedom of speech is not a license to discourtesy.
#BLACKMATTERLIVES
#5




What's the purpose of the model? Is it the ability to segment risks or is it accurate prediction? This will tell you what metrics to look at.
Does the model need to be interpretable? If not, a GLM may not be the best algorithm to use.

Are you only looking at claims that have a count, or are you including zero counts as well? You'll need to check whether your data is zero-inflated and/or overdispersed; Poisson may not be the best fit. Most data is overdispersed, and Poisson will understate the variance. Quasi-Poisson or negative binomial may be a better fit. Quasi-Poisson will give the same coefficients and predictions as Poisson but different standard errors, which matters if you plan on giving confidence intervals for the coefficients.

Weights have an inverse relationship with the variance: Var(Y_i) = phi * V(mu_i) / w_i, so observations with more weight have lower variance. Offsets are just a variable whose coefficient is fixed at 1. If we weighted a claim-count response by exposure, we'd be saying that higher-exposure observations have lower variance of the count, which isn't true (a count's variance grows with its exposure, especially over a full policy year). So for counts we use an offset instead.

The default link function for Poisson is log, and the offset sits in the linear predictor, so it must be on the same (log) scale as everything else: enter it as offset(log(exposure)). Weights are not logged, since they act on the variance rather than the linear predictor. Logging heavily skewed continuous predictors is common for the same reason, to keep them on the model's multiplicative scale.
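A minimal sketch of those checks in R (variable names are illustrative; glm.nb lives in MASS):

Code:
library(MASS)  # for glm.nb

# Poisson frequency model with a (logged) exposure offset
m_pois <- glm(claim_count ~ age_group + vehicle_class + offset(log(exposure)),
              family = poisson(link = "log"), data = policies)

# Crude overdispersion check: Pearson chi-square over residual df; >> 1 is a red flag
sum(residuals(m_pois, type = "pearson")^2) / df.residual(m_pois)

# Quasi-Poisson: same coefficients, standard errors scaled by the estimated dispersion
m_quasi <- update(m_pois, family = quasipoisson(link = "log"))

# Negative binomial: adds a dispersion parameter, Var(Y) = mu + mu^2 / theta
m_nb <- glm.nb(claim_count ~ age_group + vehicle_class + offset(log(exposure)),
               data = policies)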

