02-26-2020, 09:21 AM
 letsplaay SOA Join Date: Jul 2014 Posts: 27

02-27-2020, 04:53 AM
 KarimZ Member SOA Join Date: May 2015 Location: Pakistan College: Graduate, Bsc Accounting and Finance Posts: 205

Have a general question regarding GLM.

What would be your choice of distribution/link function for a GLM if you want to predict claim amounts?

There is data of 1 million individuals, and claim amounts are recorded against them. But approx 95% of those individuals have a 0 claim amount recorded against them.

Thoughts?
02-27-2020, 09:52 AM
 Louisville_Toy Member SOA Join Date: Aug 2019 Posts: 68

02-27-2020, 10:09 AM
 LilActuary SOA Join Date: Dec 2019 Posts: 5

Quote:
 Originally Posted by KarimZ Have a general question regarding GLM. What would be your choice of distribution/link function for a GLM if you want to predict claim amounts? There is data of 1 million individuals, and claim amounts are recorded against them. But approx 95% of those individuals have a 0 claim amount recorded against them. Thoughts?
That would be a Tweedie distribution, sir. Discrete at zero and continuous beyond that. I believe you'd also need to use a log link for interpretability of model results. Research the package for doing this in R, where you'd need to start by determining the optimal power parameter for your data.

Happy to hear other opinions around here.
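
To make the "discrete at zero and continuous beyond that" description concrete: for a power parameter between 1 and 2, a Tweedie variable can be viewed as a Poisson number of gamma-sized claims added together. The sketch below is not from the thread. It simulates that construction with Python's standard library only, and the claim rate, gamma shape and gamma scale are made-up values chosen so that roughly 95% of the totals come out as exactly zero.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw a Poisson(lam) count with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_claim_total(rng, lam, shape, scale):
    """Total claim amount for one person: a Poisson number of gamma-distributed claims."""
    n_claims = poisson_draw(rng, lam)
    # With zero claims the sum is empty, so the total is exactly 0.
    return sum(rng.gammavariate(shape, scale) for _ in range(n_claims))


rng = random.Random(2020)

# Assumed, illustrative parameters (not estimated from any real data).
lam = -math.log(0.95)      # chosen so that P(no claim) = exp(-lam) = 0.95
shape, scale = 2.0, 500.0  # gamma claim size with mean shape * scale = 1000

totals = [simulate_claim_total(rng, lam, shape, scale) for _ in range(100_000)]

zero_share = sum(1 for t in totals if t == 0) / len(totals)
mean_total = sum(totals) / len(totals)

print(f"share of exact zeros: {zero_share:.3f}")
print(f"mean claim amount:    {mean_total:.2f} (theory: {lam * shape * scale:.2f})")
```

The point of the simulation is that the mean claim amount stays strictly positive even though most observations are zero. A GLM link function is applied to that mean, not to the individual observations.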
02-27-2020, 10:13 AM
 ActuariallyDecentAtBest Member SOA Join Date: Dec 2016 Posts: 385

Quote:
 Originally Posted by LilActuary That would be a Tweedie distribution, sir. Discrete at zero and continuous beyond that. I believe you'd also need to use a log link for interpretability of model results. Research the package for doing this in R, where you'd need to start by determining the optimal power parameter for your data. Happy to hear other opinions around here.
I don't think I'd use a log link in this situation since he said that 95% of the claim amounts are zero.
02-27-2020, 10:20 AM
 Louisville_Toy Member SOA Join Date: Aug 2019 Posts: 68

Quote:
 Originally Posted by LilActuary That would be a Tweedie distribution, sir. Discrete at zero and continuous beyond that. I believe you'd also need to use a log link for interpretability of model results. Research the package for doing this in R, where you'd need to start by determining the optimal power parameter for your data. Happy to hear other opinions around here.

https://stats.stackexchange.com/ques...-a-tweedie-glm

with p≈1.75
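
For anyone wondering what a power parameter like 1.75 implies: with 1 < p < 2 the Tweedie distribution is a compound Poisson-gamma, and its mean, dispersion and power map onto a Poisson claim rate and a gamma claim size. The sketch below uses the standard parameterization with variance phi * mu^p. The values of mu and phi are made up for illustration; only p = 1.75 comes from the post above.

```python
import math


def tweedie_to_compound_poisson(mu, phi, p):
    """Map Tweedie (mean mu, dispersion phi, power p) to Poisson-gamma parameters.

    Valid only for 1 < p < 2, where Var(Y) = phi * mu**p.
    Returns (Poisson rate, gamma shape, gamma scale).
    """
    if not 1.0 < p < 2.0:
        raise ValueError("compound Poisson-gamma form requires 1 < p < 2")
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))
    shape = (2.0 - p) / (p - 1.0)
    scale = phi * (p - 1.0) * mu ** (p - 1.0)
    return lam, shape, scale


# mu and phi are assumed values; p is the power parameter quoted in the thread.
mu, phi, p = 50.0, 200.0, 1.75
lam, shape, scale = tweedie_to_compound_poisson(mu, phi, p)

prob_zero = math.exp(-lam)  # probability of no claims at all
print(f"Poisson rate: {lam:.4f}, gamma shape: {shape:.4f}, gamma scale: {scale:.2f}")
print(f"implied P(claim amount = 0): {prob_zero:.3f}")
```

Note that the gamma shape depends on p alone, so fixing p fixes the coefficient of variation of individual claim sizes. The share of zeros depends on mu and phi as well.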
02-27-2020, 11:06 AM
 Nactuary SOA Join Date: Oct 2019 College: Middle Tennessee State University Posts: 6

Quote:
 Originally Posted by KarimZ Have a general question regarding GLM. What would be your choice of distribution/link function for a GLM if you want to predict claim amounts? There is data of 1 million individuals, and claim amounts are recorded against them. But approx 95% of those individuals have a 0 claim amount recorded against them. Thoughts?
I would use a binomial or Poisson for whether or not they make a claim (binomial if they can only make one, Poisson if they can make more than one claim), and then model severity using only the ~50,000 that have claims. Distribution/link would depend on the shape of the curve, but I would consider inverse Gaussian, lognormal, or a Gamma distribution, and I would consider an identity or log link.
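
A rough sketch of this frequency-severity split, using only Python's standard library. The portfolio, the two rating groups "A" and "B", and all parameter values are hypothetical. With a single categorical factor and no other covariates, the fitted means of both the binomial and the gamma GLM reduce to per-group averages whatever the link, so no fitting routine is needed here. A real model with several covariates would need a proper GLM fit.

```python
import random
from collections import defaultdict

rng = random.Random(7)


def simulate_policy(rng):
    """One hypothetical individual: (rating group, claim amount), at most one claim."""
    group = rng.choice(["A", "B"])
    claim_prob = 0.04 if group == "A" else 0.06
    gamma_scale = 400.0 if group == "A" else 600.0
    if rng.random() < claim_prob:
        return group, rng.gammavariate(2.0, gamma_scale)
    return group, 0.0


policies = [simulate_policy(rng) for _ in range(200_000)]

n_policies = defaultdict(int)
n_claims = defaultdict(int)
claim_total = defaultdict(float)
for group, amount in policies:
    n_policies[group] += 1
    if amount > 0:
        # Severity is modelled on claimants only, as suggested in the post.
        n_claims[group] += 1
        claim_total[group] += amount

fitted = {}
for group in sorted(n_policies):
    frequency = n_claims[group] / n_policies[group]  # binomial part
    severity = claim_total[group] / n_claims[group]  # gamma part
    fitted[group] = {
        "frequency": frequency,
        "severity": severity,
        "expected_claim": frequency * severity,
    }
    print(group, {name: round(value, 4) for name, value in fitted[group].items()})
```

Multiplying the two parts gives the expected claim amount per individual, which is the quantity a single Tweedie model would target directly.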
02-27-2020, 06:05 PM
 noone Member SOA Join Date: Feb 2017 Posts: 138

02-27-2020, 06:17 PM
 BabyHorse15 Member CAS SOA Join Date: Apr 2015 Studying for life Posts: 81

02-27-2020, 06:44 PM
 nole61 SOA Join Date: Jan 2019 Studying for FSA PRF Module College: FSU Grad Favorite beer: The High Life Posts: 10

