Actuarial Outpost
 
  #1  
07-13-2018, 05:38 AM
juandeoyar
 
Join Date: Feb 2018
College: ISEG Actuarial Science
Favorite beer: Leffe Blond
Posts: 23
GLM Medical Insurance

Hi everyone,

I am looking for someone with experience in multivariate modelling (GLMs) on medical insurance portfolios.
My first question is how to handle group insurance policies. The database has one insured person per row, but most of them belong to the same policy, so the independence assumption between observations required for a GLM is broken.
My second question relates to the heavy tail. A few contracts have very high loss ratios, which distort the GLM output. If I apply a threshold of 30,000 the results stabilize, but the maximum sum insured is about 1 million, so I have to do something with the long tail.


Thank you,

Last edited by juandeoyar; 07-13-2018 at 06:28 AM..
  #2  
07-13-2018, 08:55 AM
JohnLocke
Member
SOA
 
Join Date: Mar 2007
Posts: 16,292

What exactly are you trying to model? Group claims?
Do you even have good enough data?
Are the features at the member level or the group level?
Is the response at the member level or the group level?
__________________
i always post when i'm in a shitty mood. if i didn't do that, i'd so rarely post. --AO Fan

Lucky for you I was raised by people with a good moral center because if that were not the case, you guys would be in a lot of trouble.
So be very, very glad people like me exist. Your future basically depends on it. --jas66kent

The stock market is going to go up significantly due to Trump Economics --jas66kent
  #3  
07-13-2018, 09:38 AM
AMedActuary
Member
SOA
 
Join Date: May 2007
College: UCLA Alumni
Posts: 391

As JohnLocke says, more information is needed before anyone can help much.

If you're trying to model the total claims for each member, I would recommend a log transform to handle the heavy tails. You can use log(x + 1), since log(0) is undefined; that has worked fairly well for me.
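Something like this, roughly (just a sketch; the DataFrame df and the column names total_claims, age, and sex are placeholders for whatever you actually have):

Code:
# Rough sketch -- df, total_claims, age, sex are placeholder names.
import numpy as np
import statsmodels.formula.api as smf

df["log_claims"] = np.log1p(df["total_claims"])  # log(x + 1), handles zero claims
log_fit = smf.ols("log_claims ~ age + sex", data=df).fit()
print(log_fit.summary())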

As far as the independence assumption goes, it's the residuals that need to be independent. I would make the policy type a categorical predictor (mixed modelling is another option, but the categorical approach is easier for now). Yes, the policy will affect the claim payments, but for members within the same policy this will make the errors more nearly independent. Of course, you may still have an issue with independence depending on what other predictors you have available. For example, members within the same region may have correlated claims, and if you don't have region as a predictor, that can affect the independence assumption.
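In code it could look something like this (again just a sketch; policy_id stands in for whatever identifies the policy in your data, and log_claims is from the snippet above):

Code:
# Sketch: policy as a categorical predictor vs. a random intercept per policy.
import statsmodels.api as sm
import statsmodels.formula.api as smf

# (a) fixed effect: one level per policy (fine when there aren't too many policies)
glm_fit = smf.glm("log_claims ~ age + sex + C(policy_id)",
                  data=df,
                  family=sm.families.Gaussian()).fit()

# (b) mixed model: random intercept for each policy (scales better with many policies)
mixed_fit = smf.mixedlm("log_claims ~ age + sex", data=df,
                        groups=df["policy_id"]).fit()
print(mixed_fit.summary())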

I would recommend graphing the residuals if you haven't done so already. Does the residual plot show a pattern that puts independence in doubt?
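For the residual check, something along these lines (glm_fit is the fit from the sketch above):

Code:
# Sketch: deviance residuals vs. fitted values, plus residual means by policy.
import matplotlib.pyplot as plt

resid = glm_fit.resid_deviance
plt.scatter(glm_fit.fittedvalues, resid, s=5, alpha=0.3)
plt.axhline(0, color="grey")
plt.xlabel("Fitted value")
plt.ylabel("Deviance residual")
plt.show()

# If residuals within the same policy sit systematically above or below zero,
# independence is questionable.
print(df.assign(resid=resid).groupby("policy_id")["resid"].mean().describe())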
  #4  
07-13-2018, 11:17 AM
JohnLocke
Member
SOA
 
Join Date: Mar 2007
Posts: 16,292

Quote:
Originally Posted by AMedActuary View Post
I would make the policy type a categorical predictor (mixed modelling is another option, but the categorical approach is easier for now).
I'm curious why you would include plan design factors within the model. I would account for those with "actuarial" factors and model allowed claims directly.
  #5  
07-13-2018, 11:25 AM
AMedActuary
Member
SOA
 
Join Date: May 2007
College: UCLA Alumni
Posts: 391

Yes, that makes sense. I forgot to mention that I did this in the past with Medicaid data (which is basically allowed claims), but yes, what you described would be best.
  #6  
07-13-2018, 03:24 PM
juandeoyar
 
Join Date: Feb 2018
College: ISEG Actuarial Science
Favorite beer: Leffe Blond
Posts: 23

The portfolio has about 65,000 insureds, and some of them are grouped under the same policy; I guess these are family plans.
I am trying to model the pure premium for each insured by fitting frequency and severity GLMs separately. I have also tried a Tweedie model on the observed pure premiums, but it didn't improve performance (in terms of model fit, comparing means).
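Roughly what the Tweedie attempt looked like (a sketch only; pure_premium, age_band, sum_insured_band, etc. are placeholder names, with the banded factors built as in the next snippet):

Code:
# Sketch: pure premium modelled directly with a Tweedie GLM.
import statsmodels.api as sm
import statsmodels.formula.api as smf

# var_power between 1 and 2 gives a compound Poisson-gamma; 1.5 is just a starting point
tweedie_fit = smf.glm("pure_premium ~ age_band + sex + sum_insured_band",
                      data=df,
                      family=sm.families.Tweedie(var_power=1.5,
                                                 link=sm.families.links.Log())).fit()
print(tweedie_fit.summary())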
I have rating factors such as age, sex, coverage, nationality, and benefit level. I found that the only significant predictors are age (grouped into 0-15, 16-25, 26-35, 36-45, 46-60, and >60), sex, and sum insured (grouped into bands up to 100k, 250k, 500k, and 1M). I didn't use coverage because that field is not clean in the database and seems to contain mistakes.
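The banding itself is just this (column names are placeholders for my data):

Code:
# Sketch: banding age and sum insured as described above.
import numpy as np
import pandas as pd

df["age_band"] = pd.cut(df["age"],
                        bins=[0, 15, 25, 35, 45, 60, np.inf],
                        labels=["0-15", "16-25", "26-35", "36-45", "46-60", ">60"],
                        include_lowest=True)
df["sum_insured_band"] = pd.cut(df["sum_insured"],
                                bins=[0, 100_000, 250_000, 500_000, 1_000_000],
                                labels=["<=100k", "100k-250k", "250k-500k", "500k-1M"],
                                include_lowest=True)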
About the log transformation: I am modelling severity with a Normal distribution and a log link function, so I think it is equivalent to what you suggested.
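The severity fit, roughly (avg_severity and claim_count are placeholder names; note that a Gaussian GLM with a log link models the mean on the log scale, which is close in spirit to, though not exactly the same as, regressing on log-transformed claims):

Code:
# Sketch: severity via a Normal (Gaussian) GLM with a log link, claimants only.
import statsmodels.api as sm
import statsmodels.formula.api as smf

sev_fit = smf.glm("avg_severity ~ age_band + sex + sum_insured_band",
                  data=df[df["claim_count"] > 0],
                  family=sm.families.Gaussian(sm.families.links.Log())).fit()
print(sev_fit.summary())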
With respect to frequency, I found that the Negative Binomial distribution fits well. I also tried a hurdle model (NB), and the AIC improves.
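And the frequency side (again a sketch; a proper hurdle model uses a truncated count distribution for the positive part, so the two-part fit below is only an approximation):

Code:
# Sketch: Negative Binomial frequency, plus a simple two-part ("hurdle-style") fit.
import statsmodels.formula.api as smf

# NB frequency (the discrete-model version estimates the dispersion parameter)
nb_fit = smf.negativebinomial("claim_count ~ age_band + sex + sum_insured_band",
                              data=df).fit()
print(nb_fit.aic)

# Two-part approximation of a hurdle model:
# (1) probability of having any claim, (2) count model on claimants only.
df["has_claim"] = (df["claim_count"] > 0).astype(int)
hurdle_zero = smf.logit("has_claim ~ age_band + sex + sum_insured_band", data=df).fit()
hurdle_pos = smf.negativebinomial("claim_count ~ age_band + sex + sum_insured_band",
                                  data=df[df["claim_count"] > 0]).fit()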
In both cases the tail is still a problem. I've heard there are approaches that combine credibility theory with GLMs. I should point out that, out of the 65,000 insureds, I have about 30-40 contracts with extremely high loss ratios, close to the sum insured.
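For reference, the capping that stabilises the results is basically this (a sketch; incurred_loss and exposure are placeholder names, and the excess would need to be loaded back in somehow):

Code:
# Sketch: cap individual losses before fitting so a handful of very large claims
# does not distort the GLM; track the excess separately.
import numpy as np

threshold = 30_000  # the threshold that stabilised the results above
df["capped_loss"] = np.minimum(df["incurred_loss"], threshold)
df["excess_loss"] = np.maximum(df["incurred_loss"] - threshold, 0.0)

# One simple option: fit the severity GLM on capped_loss, then spread the total
# excess back as a flat (or exposure-weighted) loading on the modelled pure premium.
excess_loading = df["excess_loss"].sum() / df["exposure"].sum()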
About the residuals: I couldn't identify a pattern, but there is a lot of noise. One reason could be that I have only a few factors, so they cannot capture all the variability. Of course, when the data is limited there is only so much we can do; at some point we draw our conclusions and move on.

Last edited by juandeoyar; 07-14-2018 at 03:11 PM..