



#1




ACTEX Webinar: Mary Pat Campbell's "Intro to Predictive Analytics for Actuaries"
ACTEX eLearning Presents: Mary Pat Campbell
ACTEX eLearning Webinar: "Introduction to Predictive Analytics for Actuaries"
November 30, 2017, 1:00 – 2:30 PM ET
Co-sponsored by the Canadian Institute of Actuaries
1.8 SOA/EA CPD, CAS CE Credits/Training Hours
This is a potential source of 1.5 hours of CPD for the Canadian Institute of Actuaries.

Predictive analytics is all the rage in the actuarial (and larger) world – but what is it? And how can actuaries use and implement it? In this webcast, viewers will be introduced to predictive analytics as applicable to actuarial work, and to the world of insurance more generally. There will be no formulas! The focus will be on what different techniques can do, as opposed to the mathematical underpinnings of the approaches. Some pros and cons of each technique will be explored.

Enroll Today: Click Here

Sampling of approaches to be covered:
Instructor: Mary Pat Campbell, FSA, MAAA, PRM, is Vice President, Insurance Research at Conning in Hartford, Connecticut. She also teaches courses on computing (Excel, Access, and VBA) and business writing for actuarial science students at the University of Connecticut. Mary Pat has had a longstanding interest in modeling techniques, having worked on models covering molecular physics, neuroscience, finance, population studies, signal processing, statistics, information retrieval, electronic logic games, and, of course, actuarial models. She is a founding member of the SOA Modeling Section and is co-editor of its newsletter, The Modeling Platform.

Mary Pat wrote "Getting Started in Predictive Analytics: Books and Courses" in the December 2015 issue of the Predictive Analytics and Futurism newsletter (link: Click HERE) and presented at the 2016 Life and Annuity Symposium, Session 16 – Predictive Analytics: Where Do I Even Start?

Who Should Attend? Actuaries and non-actuaries who are interested in understanding the different approaches to predictive analytics. No prior knowledge of predictive analytics or the R language is assumed.

Enroll Today: Click Here
Questions? Contact me: Alan.ACTEX
__________________
Alan C. Elman | eLearning Lead
ACTEX Learning | Mad River Books
(860) 379-5470 Ext. 4007
Click to view our Actuarial eLearning Offerings: Webinars | Exam Prep | VEE Courses
#2




Will be doing this tomorrow – recordings will be available for purchase afterwards, too, but wouldn't you like a live webinar with me? Get some CE credit! The year is waning...
#4




I'm glad you liked it, Oedipus Rex. We have several "data science applied to actuarial tasks" webinars coming up. You can see the whole list here: Webinars

Some highlights are:
Data Science for Actuarial Managers
Advanced Python for Actuaries
Data Visualization Design Concepts in R
Machine Learning in R for Actuaries

A couple more webinars on blockchain are in the works.
__________________
Stephen Camilli, FSA | President
ACTEX Learning | Mad River Books
Click to view our Actuarial eLearning Offerings: Webinars | Exam Prep | VEE Courses
#6




Hi MPC, Stephen – I was on the webinar and enjoyed it. Certainly a lot to bite off in 90 minutes.
So I am going to ask some really basic, expose-my-ignorance questions (feel free to consider any of this feedback, to use however you use feedback)... I am bad with the lingo and also not knowledgeable about what languages like R actually do. My main interest is in understanding the life underwriting predictive modeling/indexing projects. Tell me if this is basically what is going on, or set me straight where I am not making sense...

So say I am LexisNexis (or some other data aggregator) and I have a training database with lots of fields that I think are related to mortality expectations. I assume they are things like credit scores, financial info, medical info, other stuff(?) on individuals. I don't see how you can use that to directly predict mortality rates. So I am assuming what we are really doing is developing a "formula" or "algorithm" which uses the data to predict the underwriting class (preferred, select, std, rated, etc.) that would be assigned if the individual actually went through traditional medical underwriting.

Presumably then the big reinsurers can take those predictions on their database of risks and back-test how well the prediction matches the actual assignments. To the extent that they don't match, presumably they can look at their mortality experience, reallocated to the new classes, and try to predict whether the predictive index actually does a better or worse job (I've heard predictions that the PI actually does better in out years than traditional underwriting).

So, for a simple example, when you are doing some "multivariable" linear regression on, say, n variables (X1, ..., XN), you come up with some "index formula" of "coefficients" (C1, ..., CN) such that INDEX = C1*X1 + ... + CN*XN, and you use the index to categorize the risks into classes. So what does R do exactly? Is it coming up with the coefficients C1, ..., CN that best fit the test data (under some measure of fit)?

Then is the exercise to choose the relevant variables, or other more complex methods besides linear regression, to come up with the index until you decide on the one that appears to work best? And R is the tool that performs some algorithm to achieve the best fit once you have chosen your variables and method? Is that basically what is being done? Set me straight where I go astray if you will.

Now what if I had additional data, maybe proprietary or maybe other public info, that I think could improve upon the index. Would it make sense to build another test database that includes my data plus the published index result (as a single variable in my new test data) and then start to use that plus my fields to see if I can come up with a better working index that builds upon the original index?

Last edited by Chuck; 12-03-2017 at 08:22 PM.
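If it helps make Chuck's description concrete, here is a minimal sketch in Python (the webinar itself used R) of fitting coefficients C1..CN by ordinary least squares and bucketing the resulting index into classes. All data values, thresholds, and class labels below are invented purely for illustration – this is not anyone's actual underwriting method:

```python
import numpy as np

# Two illustrative data fields per applicant (rows = applicants).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])

# Synthetic "target" index values implied by historical underwriting
# decisions (constructed here from made-up true coefficients).
true_c = np.array([0.5, 1.5])
y = X @ true_c

# Least squares finds the coefficients C1..CN minimizing the sum of
# squared residuals -- exactly the "best fit" Chuck asks about.
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)

# INDEX = C1*X1 + ... + CN*XN for each applicant.
index = X @ coeffs

# Bucket the continuous index into discrete underwriting classes
# (cutoff points are arbitrary for this sketch).
classes = np.digitize(index, bins=[2.0, 4.0, 6.0])
```

The call to `np.linalg.lstsq` plays the same role as R's `lm()`: given the data and the chosen model form, it returns the coefficients that minimize the squared-error criterion.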
#7




The specific functions I used from R were doing best fit (for regressions, minimizing the sum of squares of residuals – but one can use other weightings/metrics; the metrics being optimized are different for classification processes).
There's nothing particularly special about R, other than that it was developed by a community of statistics-minded people and thus has been optimized for specific kinds of analysis and model-fitting. Theoretically, one could implement any of the algorithms used in R via Excel VBA (I DON'T RECOMMEND IT, THOUGH). R functions have generally been developed by people who know the underlying theory and algorithmic approaches; it's the kind of stuff I used to code in Fortran back in the day, when I took numerical computing classes in the math department. You can do it in R, Fortran, Python, whatever – the point is that somebody (or, more specifically, somebodies) has already done the work of coding the standard algorithms for you.

I gave multiple examples of linear regressions – I had to tell the lm() function which variables I wanted to regress against, and what kind of function I wanted to fit (the first examples were linear, but I also did a few other kinds). So yes, you'd have to stipulate the form of the model you're trying to fit, what you want to optimize, and which data to use to do the fit. The functions/procedures give back various statistics to indicate the significance of various variables, the amount of correlation, etc.

As for the various predictive analytics suites out there at a higher level, such as with reinsurers (as you mention), I believe they've fitted and tested a variety of data sets to see what kinds of structures work best for the kinds of models they're trying to fit (or the kinds of problems they're trying to solve). That's where cross-validation and other techniques come in – what they do is help support that particular model structures work well for the kinds of data you're looking at. As new data comes in, parameters of the particular structure are updated – I mentioned credibility as something similar.
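For readers who want to see the mechanics of the cross-validation mentioned above, here is a minimal Python sketch (the webinar used R; the data, fold scheme, and candidate model forms here are all made up). It compares two model structures – linear vs. quadratic – by held-out error rather than in-sample fit:

```python
import numpy as np

# Synthetic data: truly linear relationship plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x + 0.05 * rng.standard_normal(20)

def cv_error(degree, k=5):
    """Mean squared held-out error of a polynomial fit of given degree,
    using simple k-fold cross-validation."""
    idx = np.arange(len(x))
    errs = []
    for fold in range(k):
        test = (idx % k == fold)          # every k-th point held out
        coef = np.polyfit(x[~test], y[~test], degree)
        pred = np.polyval(coef, x[test])
        errs.append(np.mean((pred - y[test]) ** 2))
    return float(np.mean(errs))

linear_err = cv_error(1)   # candidate structure 1: linear
quad_err = cv_error(2)     # candidate structure 2: quadratic
```

Whichever structure achieves the lower held-out error is the one the data supports – which is how cross-validation "helps support that particular model structures work well" for a given kind of data.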
For my demonstrations, I used public data on Kaggle because I wanted to easily share it with attendees without them needing to install any particular software. If one wants to use proprietary data, then yes, you could add it to publicly available data... but you wouldn't be posting it to Kaggle Kernels. R can be used in a variety of environments, and your work needn't be public the way mine was.

For those who want to see the Kaggle Kernel I ran, it's here: https://www.kaggle.com/meepbobeep/in...inr/notebook

I plan on adding some more comments and some more code over time.
#8




#9




Thanks MPC and Guinness.
Guinness – I think I understand what you are saying regarding the flaws in traditional underwriting, not trying to reproduce them, and going for a kind of continuous mortality rate result. But it seems to me the problem is having adequate learning and test data. It seems it would have to be enormous, and I am not sure how it could easily fit with what MPC described. The mortality rate for an individual record is either 1 or 0 (over some period of time). It seems that to really predict a mortality "rate", by definition you must put the records into classes of some kind, whether traditional or otherwise, and then predict the rate on the classes. I'm sure there is something here I am not getting my head around.

One place where the direct approach of predicting a continuous mortality rate makes sense to me: the SOA has trouble classifying all the mortality data from companies (because of all the variations in underwriting) in order to develop mortality tables for the various overall types of underwriting (SI, GI, accelerated, traditional medical, etc.). It seems to me that a new approach should be to ultimately dump the idea of a mortality "table" and replace it with a continuous index of predicted mortality rates (based on whatever set of data fields, where age/duration are just two of the variables) that gets used directly in any actuarial application. Then the SOA mortality studies, instead of being used to build static tables (which seem to be getting more and more non-homogeneous), would be for the purpose of measuring the A/E fit of the index on ALL the industry data – slicing and dicing the data in various ways to measure how the index fits, and suggesting, on an ongoing basis, new fields and methods to come up with a better mortality index. Intuitively, it seems you would need this level of enormous data to do anything meaningful.

It currently takes years to build the SOA static tables, and the results seem to get less and less ideal in relation to all the variables that currently get applied in underwriting practice.
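On the "each record is 0 or 1" puzzle above: one standard technique (not necessarily what any insurer or the SOA actually uses) is logistic regression, which produces a continuous predicted probability of death directly from binary death records, with no prior grouping into classes. A minimal Python sketch on entirely made-up data:

```python
import numpy as np

# Made-up records: each individual has an age and a 0/1 outcome
# (1 = died within the observation period).
age = np.array([30., 35, 40, 45, 50, 55, 60, 65, 70, 75,
                30, 35, 40, 45, 50, 55, 60, 65, 70, 75])
died = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
                 0, 0, 0, 1, 0, 1, 0, 1, 1, 1])

# Design matrix: intercept plus age, centered and scaled for stability.
X = np.column_stack([np.ones_like(age), (age - 50.0) / 10.0])

# Plain gradient descent on the logistic log-loss.
w = np.zeros(2)
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-X @ w))       # current predicted probabilities
    w -= 0.1 * X.T @ (p - died) / len(died)

def mortality_rate(a):
    """Predicted probability of death for an individual of age a --
    a continuous 'rate' fitted straight from 0/1 records."""
    z = w[0] + w[1] * (a - 50.0) / 10.0
    return float(1.0 / (1.0 + np.exp(-z)))
```

The fitted curve assigns every age a smooth probability, so "classes" fall out afterward (if wanted at all) by cutting the continuous prediction – which is roughly the continuous-index idea sketched in the post above.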
#10




My thought here is that the original dataset and index may be proprietary (I don't know, but I assume they often are). If all we have is the index, then maybe using it as a single field is a reasonable substitute.


