Actuarial Outpost
 
#1  10-23-2018, 02:30 PM
Examinator
Note Contributor
CAS AAA
Join Date: May 2004
Posts: 2,873

Predictive model biasing the target variable?

First, I'm not sure whether this is the appropriate location for my question, but it's been a long time since I've been here and this seemed to be as good a place as any.

My company has a process for removing vehicles from auto policies when we find there's no insurable interest in the vehicle. To determine whether insurable interest exists, we order a vehicle registration report, which identifies the individual to whom the vehicle is titled. If that individual isn't a driver on the policy, an underwriter reviews the case, and if no legitimate insurable interest is found, the vehicle is removed from the policy. There is a clear business case for a predictive model to help identify which cases an underwriter should review, because the existing process (manually reviewing anything with a mismatching vehicle registration report) yields too many cases to review. We can identify the situations where vehicles were removed for lack of insurable interest, and these will serve as the predictive model's target.

The predictive model would suggest whether it's worth having an underwriter manually review a particular case, reducing the backlog. However, my real concern is that once such a model is implemented, the only cases where a vehicle could ever be removed will be those the model predicts positive. Future refits will continue to target vehicle removals, but vehicles will only have been removed in cases where the previous model gave a positive prediction. We'll essentially know when previous positive predictions turn out true or false, but we won't know when our negative predictions are true or false, since those cases generally won't be reviewed and can never result in a vehicle removal. We can always review a sample of the negative predictions to see whether the model is slipping over time, but we'll never have the full history to re-analyze if we generally don't pursue negative predictions.
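To make the concern concrete, here's a toy illustration (invented numbers, nothing to do with our actual model or data):

Code:
import numpy as np

rng = np.random.default_rng(0)

# 10,000 mismatched-registration cases; some truly lack insurable interest.
n = 10_000
truly_no_interest = rng.random(n) < 0.15            # ground truth, unknown to us
score = np.clip(0.2 + 0.5 * truly_no_interest
                + rng.normal(0, 0.15, n), 0, 1)     # model score, correlated with truth

flagged = score > 0.5       # only flagged cases get an underwriter review

# Labels for the next refit exist only where a review actually happened.
label = np.where(flagged, truly_no_interest, np.nan)

print(f"cases with any label at all: {np.isfinite(label).mean():.1%}")
print(f"removals we never observe: {(truly_no_interest & ~flagged).sum()}")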

I'm curious whether this is something anyone has come across or considered. I appreciate any insight offered!
#2  10-23-2018, 03:15 PM
mathmajor
Member
SOA AAA
Join Date: Dec 2010
Location: Nowhere in particular
Studying for: Japanese
College: B.S. Applied Math
Favorite beer: La Croix Grapefruit
Posts: 9,617

I'm definitely not a PA expert, but my "reasonably smart guy" opinion is that the model would score the likelihood of no insurable interest as a probability, and it's up to you to design the criteria around which cases get pursued (p > 75%, for example).
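As a minimal sketch of what I mean (assumes scikit-learn and completely made-up stand-in data; the cutoff is a business choice, not a statistical one):

Code:
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Stand-in data: in practice X holds case features and y the historical
# "vehicle removed for lack of insurable interest" outcome.
X = rng.normal(size=(1_000, 5))
y = X[:, 0] + rng.normal(scale=0.5, size=1_000) > 1.0

clf = LogisticRegression().fit(X, y)
scores = clf.predict_proba(X)[:, 1]       # P(removal | case features)

REVIEW_THRESHOLD = 0.75                   # the "p > 75%" criterion
flagged = np.where(scores > REVIEW_THRESHOLD)[0]
queue = flagged[np.argsort(-scores[flagged])]   # work highest scores first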

You can train the model on historical data (completed investigations) and use it to analyze the cases that were never completed. Yes, those completed cases will be biased toward exhibiting the traits correlated with successful investigations, and the unsuccessful ones will likely exhibit some of those traits too, but you can still study the relationship between each trait and success. Just weigh carefully how much predictive power you assign to those relationships.

I think the opportunity is to study traits that the manual selection process did not consider at all. There should be no bias there, given a credible dataset.
__________________
FSA
Opinions are provided for entertainment purposes only and are no substitute for professional guidance.
#3  10-23-2018, 05:15 PM
examsarehard
Member
CAS
Join Date: May 2011
Posts: 576

Yes, this feedback loop is a common problem when implementing triage models. It's a form of confirmation bias for predictive models, and the feedback can become pernicious, as when deciding where to deploy police across different neighborhoods.

In your particular case, this might not actually be a problem if your model has a high true positive rate. Remember, the goal of your model is to optimize the underwriters' time, not to discover every single instance of missing insurable interest; that's a goal you could not accomplish anyway, with or without the model. If your underwriters end up verifying more vehicles that need to be removed as a result of the model, that's a win, even if those vehicles are in some sense "biased".
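Back-of-the-envelope version of that point (all numbers invented):

Code:
# Same review capacity either way; the model only changes what fills the queue.
reviews_per_month = 500

manual_hit_rate = 0.10   # invented: 10% of manually triaged reviews end in a removal
model_hit_rate = 0.35    # invented: a model-ranked queue concentrates true positives

manual_removals = reviews_per_month * manual_hit_rate   # 50 removals found
model_removals = reviews_per_month * model_hit_rate     # 175 removals found

# More removals per underwriter-hour is the win, even if the reviewed
# population is "biased" toward what the model already knows.
print(manual_removals, model_removals)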
#4  10-26-2018, 02:05 PM
Examinator
Note Contributor
CAS AAA
Join Date: May 2004
Posts: 2,873

The comment about true positive rate (recall) is a good one, and it matches the conclusion I've been coming to as I talk with different folks. I also agree it's important to keep the overall objective in mind: the underwriters' time, and their effectiveness in actually finding what they're trying to find.

Tracking how well the model performs over time, relative to how many true positives we find (e.g., does that number suddenly drop, or drift slowly downward?), and comparing the profiles of true positives, false positives, and the true and false negatives we intentionally audit, also seem like good measures to take alongside and after deployment. If our ultimate goal is to reduce workload (without reducing what that workload finds), then by definition we're creating a blind spot going forward. So the priority will be tracking performance as closely as possible while casting as wide a net (as high a recall) as we can while still reducing workload.
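For what it's worth, the monitoring I have in mind looks roughly like this (a sketch only; the threshold and audit rate are placeholders we'd have to tune):

Code:
import numpy as np

rng = np.random.default_rng(2)

REVIEW_THRESHOLD = 0.75   # placeholder
AUDIT_RATE = 0.02         # placeholder: random slice of negatives reviewed anyway

def route_cases(scores):
    """Flag positives for review; audit a small random sample of negatives."""
    flagged = scores > REVIEW_THRESHOLD
    audited = ~flagged & (rng.random(scores.size) < AUDIT_RATE)
    return flagged, audited

def monthly_metrics(removed_flagged, removed_audited):
    """Boolean outcome arrays for the cases underwriters actually reviewed."""
    return {
        "hit_rate": removed_flagged.mean(),       # watch for sudden drops or slow drift
        "est_miss_rate": removed_audited.mean(),  # estimated from audited negatives
    }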
#5  10-26-2018, 02:35 PM
Colonel Smoothie
Member
CAS
Join Date: Sep 2010
College: Jamba Juice University
Favorite beer: AO Amber Ale
Posts: 47,998

I've had this problem with fraud detection models, and with any model that prioritizes cases for decision making. Once you start looking only at high-scoring claims, each attempt to refresh the parameters just reinforces the variables the model was originally trained on.

You need to convince management to set aside part of your book that is neither scored nor influenced by the model, to get an unbiased data set. That data set then feeds further model building and maintenance. It's a business decision, justified as an investment in better-quality models. It's what Capital One did: they sustained losses for some time for the sake of gathering data before the models went into production.
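A sketch of the carve-out (hash-based so assignment stays stable across refits; the 5% is illustrative):

Code:
import hashlib

CONTROL_FRACTION = 0.05   # illustrative: 5% of the book bypasses the model

def in_control_group(policy_id: str) -> bool:
    """Deterministically assign a stable slice of the book to the model-free
    control group; hashing keeps the split fixed across model refreshes."""
    digest = int(hashlib.sha256(policy_id.encode()).hexdigest(), 16)
    return digest % 10_000 < CONTROL_FRACTION * 10_000

# Control cases follow the old manual process regardless of model score,
# and become the unbiased data set for future model building/maintenance.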
__________________
Recommended Readings for the EL Actuary || Recommended Readings for the EB Actuary

Quote:
Originally Posted by Wigmeister General
Don't you even think about sending me your resume. I'll turn it into an origami boulder and return it to you.
#6  10-26-2018, 03:12 PM
Examinator
Note Contributor
CAS AAA
Join Date: May 2004
Posts: 2,873

Yep, we've brought this up; some in management understand the need, while it remains counter-intuitive to others.
#7  10-27-2018, 03:12 PM
whoanonstop
Member
Non-Actuary
Join Date: Aug 2013
Location: Los Angeles, CA
Studying for: Spark / Scala
College: College of William and Mary
Favorite beer: Orange Juice
Posts: 5,865
Blog Entries: 1

Quote:
Originally Posted by examsarehard
Yes, this feedback loop is a common problem when implementing triage models. It's a form of confirmation bias for predictive models, and the feedback can become pernicious, as when deciding where to deploy police across different neighborhoods.

In your particular case, this might not actually be a problem if your model has a high true positive rate. Remember, the goal of your model is to optimize the underwriters' time, not to discover every single instance of missing insurable interest; that's a goal you could not accomplish anyway, with or without the model. If your underwriters end up verifying more vehicles that need to be removed as a result of the model, that's a win, even if those vehicles are in some sense "biased".
Yeah, I think this is a good post. This happens all the time with production models, where the model's own decisions affect future data distributions. A quick general search (not restricted to insurance) should turn up a handful of good articles on how to combat this. Most commonly:

1. Golden set: if you don't expect much drift, train on an initial, fixed set of data and don't update it with new sets.

2. Holdout set: not the same as a traditional "hold-out", and it would probably break down unless you have a good amount of data, but keep a small percentage of these cases manually "graded" regardless of model score, to be used as the training set for future models.

You can even combine 1 and 2 with an online approach that requires a smaller percentage under #2: just slowly retire old training data as new training data arrives (rough sketch below).
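Here's the shape of the combined approach (a sketch; load_golden_set is a hypothetical stand-in for wherever your frozen pre-deployment data lives):

Code:
from collections import deque

def load_golden_set():
    """Hypothetical loader for the frozen, pre-deployment graded cases."""
    return []   # stand-in; read from your actual golden-set store

WINDOW = 10_000                   # illustrative rolling-window size
golden_set = load_golden_set()
rolling = deque(maxlen=WINDOW)    # manually graded cases, scored blind to the model

def add_graded_case(case):
    """New audited case arrives; the oldest one retires automatically."""
    rolling.append(case)

def next_training_data():
    """Golden set plus the current rolling window feeds the next refit."""
    return list(golden_set) + list(rolling)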

-Riley

Tags
bias, predictive analytics, predictive modeling
