Actuarial Outpost
 
#1
11-01-2017, 03:44 PM
Alan.Actex
Member
Non-Actuary
 
Join Date: Nov 2016
Location: Holyoke, Massachusetts
College: UMass Amherst
Favorite beer: Paper City Dam Ale
Posts: 100
ACTEX Webinar: Mary Pat Campbell's "Intro to Predictive Analytics for Actuaries"

ACTEX eLearning Presents: Mary Pat Campbell

ACTEX eLearning Webinar:
"Introduction to Predictive Analytics for Actuaries"


November 30, 2017, 1:00 – 2:30 PM ET


Co-Sponsored by the Canadian Institute of Actuaries

1.8 SOA/EA CPD, CAS CE Credits/Training Hours
This is a potential source of 1.5 hours of CPD for the Canadian Institute of Actuaries

Predictive Analytics is all the rage in the actuarial (and larger) world – but what is it? And how can actuaries use it and implement it? In this webcast, viewers will be introduced to predictive analytics as applicable to actuarial work, and the world of insurance more generally.

There will be no formulas! The focus will be on what different techniques can do, as opposed to the mathematical underpinnings of the approaches. Some pros and cons of each technique will be explored.

Enroll Today: Click Here

Sampling of approaches to be covered:
  • Regressions—linear, logistic, and generalized approaches
  • Clustering
  • Decision trees
  • Support vector machines

Examples will be shown in an R environment, using publicly available data.

Instructor:

Mary Pat Campbell, FSA, MAAA, PRM, is Vice President, Insurance Research at Conning in Hartford, Connecticut. She also teaches courses on computing (Excel, Access, and VBA) and business writing for actuarial science students at the University of Connecticut.

Mary Pat has had a long-standing interest in modeling techniques, having worked on models covering molecular physics, neuroscience, finance, population studies, signal processing, statistics, information retrieval, electronic logic games, and, of course, actuarial models. She is a founding member of the SOA Modeling Section and is co-editor of its newsletter, The Modeling Platform.

Mary Pat wrote "Getting Started in Predictive Analytics: Books and Courses" in the December 2015 issue of the Predictive Analytics and Futurism newsletter (link: Click HERE) and presented at the 2016 Life and Annuity Symposium: Session 16 -- Predictive Analytics, Where Do I Even Start?

Who Should Attend?

Actuaries and non-actuaries who are interested in understanding the different approaches to predictive analytics. No prior knowledge of predictive analytics or the R language is assumed.

Enroll Today: Click Here

Questions? Contact me: Alan.ACTEX
__________________
Alan C. Elman | eLearning Lead
ACTEX Learning | Mad River Books
(860) 379-5470 Ext. 4007

Click to view our Actuarial eLearning Offerings:
Webinars | Exam Prep | VEE Courses
#2
11-29-2017, 05:06 PM
campbell
Mary Pat Campbell
SOA AAA

Join Date: Nov 2003
Location: NY
Studying for duolingo and coursera
Favorite beer: Murphy's Irish Stout
Posts: 79,664
Blog Entries: 6



Will be doing this tomorrow - recordings will be available for purchase afterwards, too, but wouldn't you like a live webinar with me?

Get some CE credit! The year is waning...
__________________
It's STUMP

LinkedIn Profile
#3
12-01-2017, 02:30 PM
oedipus rex
Member
SOA AAA

Join Date: Nov 2002
Favorite beer: too many to list here
Posts: 15,278

It was a good talk; I'd like to see more presentations on applying "data science" methods to actuarial tasks.
__________________
Life can only be understood backwards; but it must be lived forwards. --S.K.
#4
12-01-2017, 04:51 PM
Stephen Camilli
Member
SOA

Join Date: Oct 2013
Location: Hyde Park, NY
College: Brown University Alumni
Favorite beer: Two-Hearted Ale
Posts: 85

I'm glad you liked it, Oedipus Rex. We have several upcoming webinars on applying "data science" to actuarial tasks. You can see the whole list here:

Webinars

Some highlights are:

Data Science for Actuarial Managers

Advanced Python for Actuaries

Data Visualization Design Concepts in R

and

Machine Learning in R for Actuaries

We also have a couple more webinars on blockchain in the works.
__________________
Stephen Camilli, FSA | President ACTEX Learning | Mad River Books

Click to view our Actuarial eLearning Offerings:
Webinars | Exam Prep | VEE Courses
#5
12-01-2017, 05:20 PM
oedipus rex
Member
SOA AAA

Join Date: Nov 2002
Favorite beer: too many to list here
Posts: 15,278

great, thanks!
__________________
Life can only be understood backwards; but it must be lived forwards. --S.K.
#6
12-03-2017, 09:16 PM
Chuck
Member
SOA AAA

Join Date: Oct 2001
Location: Illinois
Posts: 4,229

Hi MPC, Stephen - I was on the webinar and enjoyed it. Certainly a lot to bite off in 90 minutes.

So I am going to ask some really basic, expose-my-ignorance questions (feel free to treat any of this as feedback, and use it however you use feedback)...

I am bad with the lingo and also not knowledgeable about what languages like R actually do. My main interest is in understanding the life underwriting predictive modeling/indexing projects. Tell me if this is basically what is going on, or set me straight where I am not making sense...

So say I am LexisNexis (or some other data aggregator) and I have a training database with lots of fields that I think are related to mortality expectations. I assume they are things like credit scores, financial info, medical info, other stuff(?) on individuals. I don't see how you can use that to actually try to directly predict mortality rates. So I am assuming what we are really doing is developing a "formula" or "algorithm" which uses the data to predict the underwriting class (preferred, select, std, rated, etc) that would be assigned if the individual actually went thru traditional medical underwriting.

Presumably then the big reinsurers can take those predictions on their database of risks and back-test how well the prediction matches the actual assignments. To the extent that they don't match, presumably they can look at their mortality experience, reallocated to the new classes, and try to determine whether the predictive index actually does a better or worse job (I've heard predictions that the PI actually does better in the out years than traditional underwriting).

So, for a simple example, when you are doing some "multi-variable" linear regression on, say, N variables (X1, ..., XN), you come up with some "index formula" of "coefficients" (C1, ..., CN) such that INDEX = C1*X1 + ... + CN*XN, and you use the index to categorize the risks into classes.

So what does R do exactly? Is it coming up with the coefficients C1, ..., CN that best fit the test data (under some measure of fit)?

Then is the exercise to choose the relevant variables, or other more complex methods besides linear regression, to come up with the index, until you decide on the one that appears to work best? And R is the tool that performs some algorithm to achieve the best fit once you have chosen your variables and method?

Is that basically what is being done? Set me straight where I go astray if you will.

Now what if I had additional data, maybe proprietary or maybe other public info that I think could improve upon the index. Would it make sense to build another test database that includes my data plus the published index result (as a single variable in my new test data) and then start to use that plus my fields to see if I can come up with a better working index that builds upon the original index?

Last edited by Chuck; 12-03-2017 at 09:22 PM..
#7
12-03-2017, 09:55 PM
campbell
Mary Pat Campbell
SOA AAA

Join Date: Nov 2003
Location: NY
Studying for duolingo and coursera
Favorite beer: Murphy's Irish Stout
Posts: 79,664
Blog Entries: 6

The specific functions I used from R were doing best fit (for regressions, minimizing the sum of squares of residuals - but one can use other weightings/metrics; the metrics being optimized are different for classification processes).

There's nothing particularly special about R, other than it was developed by a community of statistics-minded people and thus has been optimized for specific kinds of analysis and model-fitting. Theoretically, one could implement any of the algorithms used in R via Excel VBA (I DON'T RECOMMEND IT, THOUGH).

R functions have generally been developed by people who know the underlying theory and algorithmic approaches; it's the kind of stuff I used to code in Fortran back in the day, when I took numerical computing classes in the math department.

You can do it in R, Fortran, Python, whatever -- the point is that somebody (or, more specifically, somebodies) has already done the work to code the standard algorithms for people.

I gave multiple examples of regressions -- I had to tell the lm() function which variables I wanted to regress against. I had to tell it what kind of function I wanted to fit (the first examples were linear, but I also did a few other kinds).

So yes, you'd have to stipulate the form of the model you're trying to fit, what you want to optimize, and which data to use for the fit. The functions/procedures give back various statistics to indicate the significance of the variables, the amount of correlation, etc.
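To make that concrete, here is a minimal sketch of that kind of lm() call, run on R's built-in mtcars data rather than the webinar's Kaggle dataset (the formula and variables are chosen purely for illustration, and are not the webinar's actual code):

[CODE]
# Minimal sketch (not the webinar's actual code): lm() finds the coefficients
# that minimize the sum of squared residuals for the formula you give it.
fit <- lm(mpg ~ wt + hp, data = mtcars)  # regress fuel economy on weight and horsepower

summary(fit)                            # coefficients, standard errors, p-values, R-squared
coef(fit)                               # the fitted C0, C1, C2 in Chuck's "index formula" notation
predict(fit, newdata = mtcars[1:3, ])   # score a few records with the fitted model
[/CODE]

The formula argument is where the model form gets stipulated, and summary() is what gives back the significance and fit statistics.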

As for the higher-level predictive analytics suites out there, such as the reinsurers' (as you mention), I believe they've fitted and tested a variety of data sets to see what kinds of structures work best for the kinds of models they're trying to fit (or the kinds of problems they're trying to solve).

That's where cross-validation and other techniques come in -- what they do is help support the conclusion that particular model structures work well for the kinds of data you're looking at. As new data comes in, the parameters of that particular structure are updated -- I mentioned credibility as something similar.
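A rough sketch of the cross-validation idea, hand-rolled in base R on the same toy data (real predictive analytics suites are far more elaborate than this):

[CODE]
# Rough sketch of k-fold cross-validation in base R (illustration of the idea
# only): fit on k-1 folds, score the held-out fold, compare out-of-sample error.
set.seed(1)
k     <- 5
folds <- sample(rep(1:k, length.out = nrow(mtcars)))

cv_rmse <- sapply(1:k, function(i) {
  train <- mtcars[folds != i, ]
  test  <- mtcars[folds == i, ]
  fit   <- lm(mpg ~ wt + hp, data = train)
  sqrt(mean((test$mpg - predict(fit, newdata = test))^2))
})

mean(cv_rmse)  # average out-of-sample RMSE for this particular model structure
[/CODE]

Comparing the held-out error across candidate formulas is what supports choosing one model structure over another.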



For my demonstrations, I used public data on Kaggle because I wanted to easily share it with attendees without them needing to install any particular software.

If one wants to use proprietary data, then yes, you could add it to publicly available data... but you wouldn't be posting it to Kaggle Kernels. R can be used in a variety of environments, and your work needn't be public the way mine was.

For those who want to see the Kaggle Kernel I ran, it's here:

https://www.kaggle.com/meepbobeep/in...-in-r/notebook

I plan on adding some more comments and some more code over time.
__________________
It's STUMP

LinkedIn Profile
#8
12-03-2017, 10:31 PM
Guinness
Member
SOA

Join Date: Jul 2016
Posts: 38

Quote:
Originally Posted by Chuck

So say I am LexisNexis (or some other data aggregator) and I have a training database with lots of fields that I think are related to mortality expectations. I assume they are things like credit scores, financial info, medical info, other stuff(?) on individuals. I don't see how you can use that to actually try to directly predict mortality rates. So I am assuming what we are really doing is developing a "formula" or "algorithm" which uses the data to predict the underwriting class (preferred, select, std, rated, etc) that would be assigned if the individual actually went thru traditional medical underwriting.
While you CAN try to predict the underwriting decision, I would not recommend it (at all... it's one of those things where actuaries instinctively want to do it this way... and some companies have... but it is absolutely the wrong way to do it). It makes much more sense to try to predict the mortality rate directly than to try to predict the underwriting class. Why? First of all, underwriting classes are rather coarse. It makes sense to predict mortality as a continuous attribute so you don't lose any information. Second, and more importantly, traditional underwriting is often flawed. Predictive models will also be flawed, though in different ways, but if done right they will be more accurate than traditional underwriting. It makes no sense to constrain your predictive model to match the flaws of traditional underwriting. If you want to predict mortality, then predict it as well as possible.
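A hedged illustration of that point, on simulated data rather than any real underwriting file: even though each individual record's outcome is just 0 or 1, a logistic regression (here via R's glm()) produces a continuous predicted mortality probability for every individual, so nothing forces the output into coarse classes.

[CODE]
# Illustration on simulated data (NOT a real underwriting dataset): each record's
# outcome is 0/1, but the fitted logistic regression returns a continuous
# predicted mortality probability per individual -- no coarse classes needed.
set.seed(1)
n      <- 10000
age    <- runif(n, 30, 80)
bmi    <- rnorm(n, 27, 4)
true_q <- plogis(-9 + 0.08 * age + 0.05 * bmi)  # "true" probability, used only to simulate
died   <- rbinom(n, 1, true_q)                  # observed outcome: 1 = died, 0 = survived

fit <- glm(died ~ age + bmi, family = binomial)
head(predict(fit, type = "response"))           # continuous fitted q's for individuals
[/CODE]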

Quote:

Now what if I had additional data, maybe proprietary or maybe other public info that I think could improve upon the index. Would it make sense to build another test database that includes my data plus the published index result (as a single variable in my new test data) and then start to use that plus my fields to see if I can come up with a better working index that builds upon the original index?
If I understand what you are saying, it's best to add the new fields to the original dataset and retrain the whole model.
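In R terms, that might look something like this tiny sketch (the extra column is made up): refit on the combined data with the expanded formula, rather than regressing on top of the published index.

[CODE]
# Sketch only: "retrain the whole model" = refit on the original data with the
# extra field included, rather than regressing on top of the published index.
mt <- mtcars
mt$new_field <- rnorm(nrow(mt))                 # stand-in for a proprietary field

fit_old <- lm(mpg ~ wt + hp, data = mt)
fit_new <- update(fit_old, . ~ . + new_field)   # same data, expanded formula
summary(fit_new)
[/CODE]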
#9
12-04-2017, 02:11 PM
Chuck
Member
SOA AAA

Join Date: Oct 2001
Location: Illinois
Posts: 4,229

Thanks MPC and Guinness.

Guinness - I think I understand what you are saying regarding the flaws in traditional underwriting, not trying to reproduce them, and going for a kind of continuous mortality rate result. But it seems to me the problem is having adequate learning and test data. It seems it would have to be enormous, and I am not sure how it could easily fit with what MPC described.

The mortality rate for an individual record is either 1 or 0 (over some period of time). It seems that to really predict a mortality "rate", by definition you must put the records into classes of some kind, whether traditional or otherwise, and then predict the rate on the classes. I'm sure there is something here I am not getting my head around.

One thought on why the direct approach of predicting a continuous mortality rate makes sense to me: the SOA has trouble classifying all the mortality data from companies (because of all the variations in underwriting) in order to develop mortality tables for the various overall types of underwriting (SI, GI, accelerated, traditional medical, etc.).

It seems to me that a new approach should be to ultimately dump the idea of a mortality "table" and replace it with a continuous index of predicted mortality rates (based on whatever set of data fields, where age/duration are just two of the variables) that gets used directly in any actuarial application. Then the SOA mortality studies, instead of being used to build static tables (which seem to be getting more and more non-homogeneous), would be for the purpose of measuring the A/E fit of the index against ALL the industry data, slicing and dicing the data in various ways to measure how well the index fits and, on an ongoing basis, suggesting new fields and methods to come up with a better mortality index. To me intuitively, it seems you would need that level of enormous amounts of data to do anything meaningful.
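A hypothetical sketch of that kind of A/E slicing, on simulated data with made-up column names (not any actual SOA study data):

[CODE]
# Hypothetical sketch of the A/E check described above: with a record-level
# death flag and a model-predicted mortality rate, compare actual to expected
# deaths by slice. The data frame here is simulated purely for illustration.
set.seed(2)
study <- data.frame(
  age_band    = sample(c("30-39", "40-49", "50-59", "60-69"), 5000, replace = TRUE),
  died        = rbinom(5000, 1, 0.02),
  predicted_q = runif(5000, 0.005, 0.04)        # stand-in for the index's predicted rate
)

ae <- aggregate(cbind(actual = died, expected = predicted_q) ~ age_band,
                data = study, FUN = sum)
ae$AE_ratio <- ae$actual / ae$expected          # A/E near 1 means the index fits that slice
ae
[/CODE]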

It currently takes years to build the SOA static tables, and the results seem to get less and less ideal relative to all the variables that apply under current underwriting practices.
#10
12-04-2017, 02:14 PM
Chuck
Member
SOA AAA

Join Date: Oct 2001
Location: Illinois
Posts: 4,229

Quote:
Originally Posted by Guinness
If I understand what you are saying, its best to add the new fields to the original dataset and retrain the whole model.
My thought here is that the original dataset and index may be proprietary (I don't know, but I assume they often are?). If all we have is the index, then maybe using it as a single field is a reasonable substitute.