Actuarial Outpost
 
  #1  
Old 11-21-2011, 07:07 PM
emg3000 emg3000 is offline
 
Join Date: Oct 2007
Posts: 8
Predictive Modeling Question

It's best to pose this issue by example:
Suppose you want to predict the performance of several hedge funds. You gather historical data, predictive variables and annual rates of return (your target). One of the variables you select is whether the members of the board left the annual fund meeting with smiles on their faces. Preliminary analysis suggests this variable is highly predictive; smiles are a good thing.

Some of the hedge funds have board meetings before Thanksgiving. Others wait until just before Christmas. Meeting dates may differ from year to year within the same fund. However, you will always use this model on November 30th to select funds for next year's portfolio. All funds require that you put in your order by November 30th or they won't accept your money. Your share in the fund takes effect on January 1st of the new year.

So, when training your predictive model on the historical data, do you include board member smiles collected at December annual meetings, even though you would not know the smile information when applying the model to any such record (your order is in by the time they hold their meeting)? Put another way, if a hedge fund observed 3 years ago had a December board meeting at which all members were smiling, and the fund then went on to generate great returns, would you use that smile data in your model in order to better understand how your model will react to similar data in the future?

Thanks for any insight.
  #2  
Old 11-22-2011, 09:47 AM
nonlnear nonlnear is offline
Member
 
Join Date: May 2010
Posts: 6,166
Default

What do you know about the impact of meeting times, and the impact of smile times? Does a smile in February mean the same thing as a smile in December? Just because smiles appeared predictive doesn't mean all smiles are created equal, especially given that orders are locked in prior to the December meeting... (I'm assuming this is a metaphor for some other problem and other variables, but even if it's not I'd probably ask the same thing.) Even if you weren't considering the role of old December data, you should look at the significance of your fixed timing constraint.

It's rare that there is a good reason to ignore otherwise perfectly good data that you already have. Without anything else to go on, I would look at the full smile history of a fund at the point of observation to make the best prediction possible. Of course, if you are back-testing the model I wouldn't use the 3-year-old December smile to make the prediction for the January following that smile, but you should use the smile data from all Decembers prior to that one when making that evaluation.
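A minimal sketch of the back-testing rule described above, assuming a hypothetical list of (fund, meeting date, smiled) records and a November 30th decision date. The record layout and the function names `smiles_known_at` and `smile_rate` are illustrative assumptions, not anything from the actual model:

```python
from datetime import date

# Hypothetical records: (fund, meeting_date, smiled)
meetings = [
    ("A", date(2008, 12, 15), True),
    ("A", date(2009, 11, 20), False),
    ("A", date(2010, 12, 14), True),
    ("B", date(2009, 11, 18), True),
    ("B", date(2010, 11, 19), True),
]

def smiles_known_at(meetings, fund, decision_date):
    """Return the smile observations for `fund` dated on or before `decision_date`."""
    return [
        (d, s) for f, d, s in meetings
        if f == fund and d <= decision_date
    ]

def smile_rate(history):
    """Fraction of known meetings that ended in smiles; None if there is no history."""
    if not history:
        return None
    return sum(1 for _, s in history if s) / len(history)

# Back-test the decision made on 2010-11-30 for fund A:
# the 2010-12-14 meeting is excluded, the 2008-12-15 meeting is kept.
history_a = smiles_known_at(meetings, "A", date(2010, 11, 30))
rate_a = smile_rate(history_a)
```

The same filter applies at every historical decision date, so older December smiles enter the history while the December smile of the decision year never does.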
__________________
Do not reply to this post if you rely on red font.
  #3  
Old 11-22-2011, 10:07 AM
Ron Weasley Ron Weasley is offline
Member
CAS AAA
 
Join Date: Oct 2001
Studying for naught.
Favorite beer: Butterbeer
Posts: 5,275
Default

I'll go with "don't include December smiles for a model to be used in November". Best case is that December smiles are not predictive, worst case is that they are.

Worst case: to the degree that other predictors are correlated with December smiles, the fitted model will "adjust" for the information contained in December smiles. December smiles will not be available to a model that is fired in November, so some of the predictive power that exists in the other predictors was "used" by December smiles and is thrown out when that predictor is missing. Your model is less useful than it otherwise would be.

Best case: You spend a lot of time fretting over, and collecting information for, a data element you're not going to get the benefit from.
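The worst case can be illustrated with a small simulation. This is only a sketch under made-up assumptions: `x` stands for a predictor known in November, `s` for a December smile score correlated with `x`, and the coefficients and noise levels are arbitrary. It compares a model fitted with `s` and then scored without it against a model fitted on `x` alone:

```python
import random

random.seed(0)

def simulate(n):
    """Simulated rows (x, s, y): x is known in November, s is a December smile score."""
    rows = []
    for _ in range(n):
        x = random.gauss(0, 1)
        s = 0.8 * x + random.gauss(0, 0.6)   # s is correlated with x
        y = x + s + random.gauss(0, 0.5)     # the return depends on both
        rows.append((x, s, y))
    return rows

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def fit_two(rows):
    """No-intercept least squares of y on (x, s) via the 2x2 normal equations."""
    xs, ss, ys = zip(*rows)
    sxx, sxs, sss = dot(xs, xs), dot(xs, ss), dot(ss, ss)
    sxy, ssy = dot(xs, ys), dot(ss, ys)
    det = sxx * sss - sxs ** 2
    return (sxy * sss - sxs * ssy) / det, (sxx * ssy - sxs * sxy) / det

def fit_one(rows):
    """No-intercept least squares of y on x alone."""
    xs, _, ys = zip(*rows)
    return dot(xs, ys) / dot(xs, xs)

def mse(preds, rows):
    return sum((p - y) ** 2 for p, (_, _, y) in zip(preds, rows)) / len(rows)

train, test = simulate(2000), simulate(2000)
b_x_full, b_s_full = fit_two(train)
b_x_only = fit_one(train)

# Scoring in November: s is unknown, so the full model can only apply its x coefficient.
mse_full_without_s = mse([b_x_full * x for x, _, _ in test], test)
mse_x_only = mse([b_x_only * x for x, _, _ in test], test)
```

In this setup the model fitted on `x` alone gives `x` a larger coefficient, because `x` also carries the part of `s` it is correlated with. The model fitted with `s` assigns that signal to `s` and loses it at scoring time.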
  #4  
Old 11-22-2011, 10:17 AM
nonlnear nonlnear is offline
Member
 
Join Date: May 2010
Posts: 6,166
Default

Quote:
Originally Posted by Ron Weasley View Post
I'll go with "don't include December smiles for a model to be used in November". Best case is that December smiles are not predictive, worst case is that they are.

Worst case: to the degree that other predictors are correlated with December smiles, the fitted model will "adjust" for the information contained in December smiles. December smiles will not be available to a model that is fired in November, so some of the predictive power that exists in the other predictors was "used" by December smiles and is thrown out when that predictor is missing. Your model is less useful than it otherwise would be.

Best case: You spend a lot of time fretting over, and collecting information for, a data element you're not going to get the benefit from.
What about previous Decembers? To make the decision for November 2011, are you saying that there is a good reason to ignore smile data from previous years? I fully agree that for fitting and testing, the December data for the decision year should be ignored, but I don't see a case for ignoring December data that is prior to the decision in question.
  #5  
Old 11-22-2011, 11:26 AM
emg3000 emg3000 is offline
 
Join Date: Oct 2007
Posts: 8
Default

No doubt, questions a good statistician/actuary should ask. For my purposes, I'm only concerned with whether to use smile data from the current year's November and December. The issue is that December data is readily available (and predictive) for the historical records, but is not available when applying the model (some funds have their meetings after putting in the order).

And yes, as cool as hedge fund espionage may be, this is a metaphor for an insurance-related predictive model.

By chance, do you know of any literature to reference? I can find all kinds of textbooks and linear regression tutorials that comment on sources of bias, but they're all aimed at fitting a model for process discovery or explanation. Thus far, I can't find anything addressing this issue in the context of making predictions with suppressed/incomplete data.

And thank you for your comments!
  #6  
Old 11-22-2011, 11:26 AM
Ron Weasley Ron Weasley is offline
Member
CAS AAA
 
Join Date: Oct 2001
Studying for naught.
Favorite beer: Butterbeer
Posts: 5,275
Default

I was addressing the same calendar years. If smiles from 11 months ago are predictive, I don't see a statistical problem in using 11-month-old smiles.
  #7  
Old 11-22-2011, 11:35 AM
Ron Weasley Ron Weasley is offline
Member
CAS AAA
 
Join Date: Oct 2001
Studying for naught.
Favorite beer: Butterbeer
Posts: 5,275
Default

No book recommendations, but one idea on a frame of reference: consider limiting yourself to the kinds of data available at the time the model is used. For example, the number of accidents in the next month is available in the historical data, and would probably be very predictive, but is not available to an insurer or an insured at any given time. (If it's available to the insured, you may want to let your fraud folks know about it.)
  #8  
Old 11-28-2011, 01:27 PM
BassFreq BassFreq is offline
Member
CAS
 
Join Date: Jun 2003
Location: Chicago
Studying for all eternity
Favorite beer: Duff
Posts: 1,213
Default

You want the data to reflect the reality that you will be faced with when using the resulting model. So, you need to "zero out" the smile indicator variable for the December smiles. In addition to that, there is probably a difference between the December smiles being zeroed out and the November smile indicator being missing because you never were actually able to get someone to the meeting to find out. So, you should also create an "Unable to Observe the Smiles" indicator and test the significance of that. Usually, we see that missing data indicates something bad (and that makes sense here, because if a company has good news, they usually want it to be widely known).
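One way the zeroing-out and the "Unable to Observe the Smiles" indicator might be encoded is sketched below. The function name `smile_features`, the feature names, and the extra `meeting_after_cutoff` flag (used here to keep a zeroed-out December smile distinct from an unobserved November one) are all assumptions made for illustration:

```python
from datetime import date

def smile_features(meeting_date, smiled, decision_date):
    """Build model inputs using only what is known on `decision_date`.

    meeting_date: date of this year's board meeting
    smiled: True/False if observed, None if nobody could observe the meeting
    """
    if meeting_date > decision_date:
        # Meeting falls after the order deadline (e.g. December): zero out the smile.
        return {"smile": 0, "meeting_after_cutoff": 1, "smile_unobserved": 0}
    if smiled is None:
        # Meeting happened in time, but the smile could not be observed.
        return {"smile": 0, "meeting_after_cutoff": 0, "smile_unobserved": 1}
    return {"smile": int(smiled), "meeting_after_cutoff": 0, "smile_unobserved": 0}

cutoff = date(2010, 11, 30)
december_row = smile_features(date(2010, 12, 14), True, cutoff)
unobserved_row = smile_features(date(2010, 11, 20), None, cutoff)
observed_row = smile_features(date(2010, 11, 20), True, cutoff)
```

Applying the same function to the historical training rows and to the live November scoring rows keeps the two consistent, and the significance of each indicator can then be tested in the fitted model.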
__________________
Res ipsa loquitur, sed quid in infernos dicet?
  #9  
Old 11-28-2011, 03:50 PM
magillaG magillaG is offline
Member
 
Join Date: Jun 2007
Posts: 2,618
Default

If the results of your predictive model for a given hedge fund depend heavily on whether that model was trained with December data included, then it seems to me that you may have bigger problems than whether to include December data, i.e., a rather fragile model.

If you are trying to draw conclusions about a single fund, then I don't know why you wouldn't want to include all the data possible, unless there is some particular reason why December smiles would have a different effect than November smiles. For example, as somebody else pointed out, maybe the fact that the meeting is in December itself carries information that you don't want to ignore.

Of course, when you are evaluating what the real-world performance of your model will or should be, then you should remember you will not have the December data. For example, you want to think about what you will do for funds that have an unknown latest meeting.
  #10  
Old 11-28-2011, 04:01 PM
Lucy
Guest
 
Posts: n/a
Default

Can't you attempt to interact the smile data with the "how long ago" data, so that smiles last month and smiles 11 months ago can be distinguished by the model? Then you can find out directly if smiles from 11 months ago are predictive.

(I assume you aren't trying to use data that isn't available, like smiles from the following month.)
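The interaction suggested above could be set up roughly as follows. This is a sketch only: the helper names, the whole-calendar-month definition of "how long ago", and the 12-month cap are assumptions chosen for illustration:

```python
from datetime import date

def months_between(earlier, later):
    """Whole calendar months from `earlier` to `later` (ignores day of month)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

def smile_by_age_features(meeting_date, smiled, decision_date, max_age=12):
    """One-hot 'smile x months ago' features; all zero if no usable smile."""
    feats = {f"smile_{m}m_ago": 0 for m in range(max_age + 1)}
    age = months_between(meeting_date, decision_date)
    # Only meetings on or before the decision date may contribute.
    if smiled and meeting_date <= decision_date and 0 <= age <= max_age:
        feats[f"smile_{age}m_ago"] = 1
    return feats

cutoff = date(2010, 11, 30)
old_december = smile_by_age_features(date(2009, 12, 15), True, cutoff)
future_december = smile_by_age_features(date(2010, 12, 14), True, cutoff)
```

With one coefficient per age bucket, the fitted model itself shows whether an 11-month-old smile carries any signal, while a smile from the following month never enters the features.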

Tags
predictive modeling

All times are GMT -4.


Powered by vBulletin®
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.
*PLEASE NOTE: Posts are not checked for accuracy, and do not
represent the views of the Actuarial Outpost or its sponsors.
Page generated in 0.45776 seconds with 7 queries