Actuarial Outpost
 
Exams - Please Limit Discussion to Exam-Related Topics > SoA/CAS Preliminary Exams > Exam PA: Predictive Analytics
  #1  
12-06-2019, 10:10 PM
samdman82
Member
 
Join Date: Jul 2007
Posts: 240
Which link functions are "easily interpretable"?

I know the log link relates the GLM coefficients to percentage changes in the target variable, but which link functions do NOT have an easy interpretation like that? How would I know whether they do or not?
  #2  
12-07-2019, 11:26 PM
actuary121110
SOA
 
Join Date: Jun 2017
Location: Kuala Lumpur, Malaysia
Studying for PA, QFIQF, QFIPM
College: University of Michigan, Ann Arbor (Alumni)
Posts: 18

The only easy ones are:
1. identity: the coefficient is the change in the predicted mean of y for a 1-unit increase in x
2. logit: exp(coefficient) is the multiplicative change in the odds of y for a 1-unit increase in x
3. log: exp(coefficient) is the multiplicative change in the predicted mean of y for a 1-unit increase in x
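A minimal Python sketch (an editorial illustration, not from the post) that checks these three readings numerically. The coefficients b0 and b1 and the value of x are made up; the point is that each interpretation follows from applying the inverse link to the linear predictor eta = b0 + b1 * x, holding everything else fixed.

```python
import math

# Hypothetical fitted GLM coefficients for one predictor x (not from any real model).
b0, b1 = 0.5, 0.2
x = 3.0

def eta(x_value):
    """Linear predictor eta = b0 + b1 * x."""
    return b0 + b1 * x_value

# 1. Identity link: mu = eta, so the predicted mean changes by b1 (additive).
identity_diff = eta(x + 1) - eta(x)

# 2. Logit link: p = 1 / (1 + exp(-eta)); the odds p / (1 - p) change by a factor of exp(b1).
def odds_at(x_value):
    p = 1 / (1 + math.exp(-eta(x_value)))
    return p / (1 - p)

odds_ratio = odds_at(x + 1) / odds_at(x)

# 3. Log link: mu = exp(eta), so the predicted mean changes by a factor of exp(b1).
mean_ratio = math.exp(eta(x + 1)) / math.exp(eta(x))

print(f"identity: mean changes by {identity_diff:.4f} (b1 = {b1})")
print(f"logit:    odds multiplied by {odds_ratio:.4f} (exp(b1) = {math.exp(b1):.4f})")
print(f"log:      mean multiplied by {mean_ratio:.4f} (exp(b1) = {math.exp(b1):.4f})")
```

The same ratios come out for any starting x, which is exactly why these three links are the "easy" ones.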
  #3  
12-10-2019, 06:53 PM
Relmiw
Member
CAS
 
Join Date: Apr 2013
Posts: 201

Quote:
Originally Posted by actuary121110
The only easy ones are:
1. identity: the coefficient is the change in the predicted mean of y for a 1-unit increase in x
2. logit: exp(coefficient) is the multiplicative change in the odds of y for a 1-unit increase in x
3. log: exp(coefficient) is the multiplicative change in the predicted mean of y for a 1-unit increase in x
I am having a hard time figuring out how to interpret GLM coefficients. I notice the June 2019 solution states the following regarding a Gamma distribution with a log link:
Quote:
An appropriate way to interpret coefficients is to exponentiate them and subtract 1.
This is consistent with actuary121110's statement above. So is this the coefficient interpretation regardless of the overall distribution (Gamma), because it comes from the link function (log)? That is, would a Gaussian distribution with a log link have the same interpretation?
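As a sketch of the rule quoted from the June 2019 solution: under a log link, mu = exp(eta), so a 1-unit increase in a predictor multiplies the predicted mean by exp(beta), i.e., changes it by exp(beta) - 1 in percentage terms. The mapping from a coefficient to the predicted mean uses only the inverse link; the distribution affects how beta is estimated (and its standard error), not how a given beta is read. The coefficient values below are hypothetical.

```python
import math

def pct_change_log_link(beta: float) -> float:
    """Percentage change in the predicted mean for a 1-unit increase in x under a log link."""
    return math.exp(beta) - 1

# Hypothetical log-link coefficients. The response distribution (Gamma, Gaussian, Poisson, ...)
# affects how beta is estimated, but not how a given beta translates into the predicted mean,
# because that translation only uses the inverse link mu = exp(eta).
for beta in (0.10, -0.25, 0.0):
    print(f"beta = {beta:+.2f} -> {pct_change_log_link(beta):+.1%} change in the predicted mean")
```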

Are the following true?
1. The overall distribution (Gaussian, Gamma, Binomial, Poisson, Quasi-Poisson) should be selected based on the distribution of the target variable, taking into consideration whether it is discrete or continuous, whether it can be negative, and its skewness.
2. The link function should be selected in a similar way. For example, a Gaussian distribution is sufficient to model a continuous variable without skew, but it should be given a log link if only positive predictions are desired.
3. Interpreting the coefficients depends on the link function, not on the overall distribution. For example, coefficients for a Gaussian, Gamma, or Poisson distribution with a log link are best interpreted as the product of the predictor and the exponentiated coefficient.
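To illustrate the positivity point in item 2, here is a small Python sketch with made-up linear predictor values: an identity link passes a negative linear predictor straight through as a negative predicted mean, while a log link maps every value to a positive mean.

```python
import math

# Hypothetical linear predictor values eta = b0 + b1 * x for four observations.
etas = [-2.0, -0.5, 0.0, 1.5]

identity_means = list(etas)                  # identity link: mu = eta, can be negative
log_means = [math.exp(eta) for eta in etas]  # log link: mu = exp(eta), always positive

print("identity:", identity_means)
print("log:     ", [round(m, 3) for m in log_means])
```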

Thank you very much for any last-minute help.
  #4  
12-10-2019, 08:38 PM
Colymbosathon ecplecticos
Member
 
Join Date: Dec 2003
Posts: 6,167

Let's look just at #2 below:
Quote:
Originally Posted by Relmiw
2. The link function should be selected in a similar way. For example, a Gaussian distribution is sufficient to model a continuous variable without skew, but it should be given a log link if only positive predictions are desired.
Suppose that we are estimating the vote count for president in the 2020 election; our target variable is specifically Trump's vote count.

This is clearly a discrete variable, but we will certainly be happy with a continuous approximation. It can't be negative, so your reasoning above suggests that we should account for skewness. But really, that doesn't make any sense, does it?

True, you might want a skewness component to account for correlation, but not only because it is a non-negative variable.

If you know any statistical mechanics, you'll know of many other examples.
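One way to see the point about non-negative counts (an illustration added here, assuming for simplicity that the count is Poisson): a Poisson(λ) distribution has skewness 1/sqrt(λ), so a count with a mean in the millions is essentially symmetric even though it can never be negative. The means below are hypothetical.

```python
import math

def poisson_skewness(lam: float) -> float:
    """Skewness of a Poisson(lam) distribution: 1 / sqrt(lam)."""
    return 1 / math.sqrt(lam)

# Hypothetical expected counts, from a small count up to millions.
for lam in (4, 100, 10_000, 1_000_000):
    print(f"mean {lam:>9,}: skewness {poisson_skewness(lam):.4f}")
```

Non-negativity only forces noticeable skew when the mean sits close to zero relative to the spread.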
__________________
"What do you mean I don't have the prerequisites for this class? I've failed it twice before!"


"I think that probably clarifies things pretty good by itself."

"I understand health care now especially very well."
  #5  
12-10-2019, 09:18 PM
Relmiw
Member
CAS
 
Join Date: Apr 2013
Posts: 201

Quote:
Originally Posted by Colymbosathon ecplecticos
Let's look just at #2 below:


Suppose that we are estimating the vote count for president in the 2020 election; our target variable is specifically Trump's vote count.

This is clearly a discrete variable, but we will certainly be happy with a continuous approximation. It can't be negative, so your reasoning above suggests that we should account for skewness. But really, that doesn't make any sense, does it?

True, you might want a skewness component to account for correlation, but not only because it is a non-negative variable.

If you know any statistical mechanics, you'll know of many other examples.
OK, so looking at #2: for modeling the number of votes one candidate is going to receive, my first thought is that the target is technically discrete, but it might as well be continuous, so either type of distribution would be fine. My next thought is that while it technically can't be negative, our sample values are probably in the millions and nowhere near zero, so that is not a concern either. I'd then want to plot* the target variable to determine whether a symmetric Gaussian distribution is adequate or whether we need a Gamma model.

So I suppose the lesson you're imparting regarding #2 is that while those rules are technically in place, they need to be applied with judgment and can be bent.

Quote:
... It can't be negative, so your reasoning above suggests that we should account for skewness. But really, that doesn't make any sense, does it?
That doesn't make sense? Wouldn't a Gaussian distribution be inappropriate if the distribution of the target in the training data is right-skewed?
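A plot is the usual check; as a numeric complement, here is a minimal Python sketch of the moment-based sample skewness, m3 / m2^1.5, with no small-sample correction. The data are made up, and no fixed cutoff for switching from Gaussian to Gamma is implied; a clearly positive value is just one sign of a right-skewed target.

```python
def sample_skewness(values):
    """Moment-based sample skewness m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical target values: one symmetric sample, one with a long right tail.
symmetric = [8, 9, 10, 10, 11, 12]
right_skewed = [1, 1, 2, 2, 3, 4, 6, 9, 15, 40]

print(f"symmetric:    {sample_skewness(symmetric):.3f}")
print(f"right-skewed: {sample_skewness(right_skewed):.3f}")
```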

*Edited post to swap out the word "model" for "plot"

Last edited by Relmiw; 12-10-2019 at 09:33 PM.
  #6  
12-10-2019, 09:27 PM
windows7forever
Member
SOA
 
Join Date: Apr 2016
Posts: 412

Quote:
Originally Posted by Colymbosathon ecplecticos
Let's look just at #2 below:


Suppose that we are estimating the vote count for president in the 2020 election; our target variable is specifically Trump's vote count.

This is clearly a discrete variable, but we will certainly be happy with a continuous approximation. It can't be negative, so your reasoning above suggests that we should account for skewness. But really, that doesn't make any sense, does it?

True, you might want a skewness component to account for correlation, but not only because it is a non-negative variable.

If you know any statistical mechanics, you'll know of many other examples.
To correct skewness, we use a log transformation. Do we apply such a transformation only to continuous variables and not to discrete ones, unless the discrete variable has many distinct values like a continuous variable?

For example, in the hospital readmissions project, ER was not log-transformed because it had few distinct values. Also, the majority of ER values were 0, and log(0) is undefined, so it could not be log-transformed.
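A minimal Python sketch of the two checks described above, using made-up data: a plain log transform needs strictly positive values, and it does little for a variable with only a handful of distinct values. The function name log_transform_check and the min_distinct threshold are hypothetical choices for illustration, not an exam rule.

```python
import math

def log_transform_check(values, min_distinct=10):
    """Return (ok, reason) for whether a plain log transform is sensible.

    min_distinct is a hypothetical threshold chosen for illustration.
    """
    if any(v <= 0 for v in values):
        return False, "has zero or negative values, where log is undefined"
    if len(set(values)) < min_distinct:
        return False, "has too few distinct values for the transform to matter much"
    return True, "ok"

# Hypothetical ER visit counts: mostly zeros, few distinct values.
er_visits = [0, 0, 0, 1, 0, 2, 0, 0, 1, 0, 3, 0]
# Hypothetical positive, continuous, right-skewed variable.
cost = [1.2, 2.5, 3.1, 4.8, 5.0, 6.7, 8.9, 12.4, 15.0, 22.3, 30.1, 45.6]

print(log_transform_check(er_visits))
print(log_transform_check(cost))

# Python's math.log rejects zero outright.
try:
    math.log(0)
except ValueError as err:
    print("math.log(0) raises ValueError:", err)
```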

For votes, if the data are at an aggregate level, such as city or state counts, then we may think about a Poisson GLM with an offset. If they are at an average-per-case level, then we may consider a Poisson GLM with weights instead. Offsets were tested on last December's exam, but weights have not been tested before except in a short example in the online modules.
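A minimal sketch of the two set-ups, in plain Python with a hand-written Newton-Raphson fit for log(mu) = offset + b0 + b1 * x (the fitting function fit_poisson and all data are hypothetical; in practice you would use a GLM routine). It fits (a) aggregate counts with offset log(exposure) and (b) per-exposure averages with exposure as the weight. For a Poisson model with a log link, the two give the same coefficients, because their score equations are algebraically identical: sum of w * x * (y/e - exp(x*b)) with w = e equals sum of x * (y - e * exp(x*b)).

```python
import math

def fit_poisson(x, y, weights=None, offset=None, iters=25):
    """Newton-Raphson fit of a Poisson GLM with log link: log(mu_i) = offset_i + b0 + b1 * x_i.

    Maximizes the weighted Poisson log-likelihood sum_i w_i * (y_i * log(mu_i) - mu_i).
    """
    n = len(x)
    w = weights if weights is not None else [1.0] * n
    o = offset if offset is not None else [0.0] * n
    # Start from the intercept-only solution: exp(b0) = sum(w * y) / sum(w * exp(offset)).
    b0 = math.log(sum(wi * yi for wi, yi in zip(w, y)) / sum(wi * math.exp(oi) for wi, oi in zip(w, o)))
    b1 = 0.0
    for _ in range(iters):
        mu = [math.exp(o[i] + b0 + b1 * x[i]) for i in range(n)]
        # Score (gradient of the log-likelihood).
        g0 = sum(w[i] * (y[i] - mu[i]) for i in range(n))
        g1 = sum(w[i] * x[i] * (y[i] - mu[i]) for i in range(n))
        # Information matrix (negative Hessian), symmetric 2x2.
        h00 = sum(w[i] * mu[i] for i in range(n))
        h01 = sum(w[i] * mu[i] * x[i] for i in range(n))
        h11 = sum(w[i] * mu[i] * x[i] ** 2 for i in range(n))
        det = h00 * h11 - h01 ** 2
        # Newton step: beta += inverse(information) @ score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: exposure (e.g., eligible voters) and a predictor x for five regions.
exposure = [1000.0, 2000.0, 1500.0, 3000.0, 2500.0]
x = [0.0, 1.0, 2.0, 3.0, 4.0]
counts = [52.0, 118.0, 101.0, 230.0, 205.0]

# (a) Aggregate counts, offset = log(exposure).
offset_fit = fit_poisson(x, counts, offset=[math.log(e) for e in exposure])

# (b) Per-unit averages, weight = exposure.
rates = [c / e for c, e in zip(counts, exposure)]
weight_fit = fit_poisson(x, rates, weights=exposure)

print("offset model: b0 = %.6f, b1 = %.6f" % offset_fit)
print("weight model: b0 = %.6f, b1 = %.6f" % weight_fit)
```

Which form to use is then mostly a question of how the data arrive: totals per group suggest the offset, averages per group suggest the weights.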
  #7  
12-11-2019, 04:06 AM
ThereIsNoSpoon
Member
CAS SOA
 
Join Date: Sep 2014
Studying for PA,FAP
College: When the smog clears, _ _ _ _
Favorite beer: PABST! BLUE RIBBON!
Posts: 491

Is there going to be a case where we're asked to interpret the inverse, cloglog, or any of those other link functions? Is it worth worrying about them?
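These links are harder to put into words because exp(beta) no longer multiplies the mean or the odds. A small Python sketch with a made-up slope: under the complementary log-log link, log(-log(1 - p)) = eta, so a 1-unit increase in x multiplies -log(1 - p) by exp(beta), but the change in p itself depends on where you start. Under the inverse link, 1/mu = eta, so beta is an additive change in 1/mu rather than in mu.

```python
import math

beta = 0.3  # hypothetical slope

def inv_cloglog(eta):
    """Inverse of the complementary log-log link: p = 1 - exp(-exp(eta))."""
    return 1 - math.exp(-math.exp(eta))

prob_ratios, cll_ratios = [], []
for eta in (-3.0, -1.0, 0.5):
    p0, p1 = inv_cloglog(eta), inv_cloglog(eta + beta)
    prob_ratios.append(p1 / p0)                             # depends on the starting eta
    cll_ratios.append(math.log(1 - p1) / math.log(1 - p0))  # always exp(beta)
    print(f"eta={eta:+.1f}: p {p0:.3f} -> {p1:.3f}, p ratio {prob_ratios[-1]:.3f}, "
          f"-log(1-p) ratio {cll_ratios[-1]:.4f}")

# Inverse link: mu = 1 / eta, so the same beta moves mu by different amounts.
mu_changes = [1 / (eta + beta) - 1 / eta for eta in (0.5, 2.0)]
print("exp(beta) =", round(math.exp(beta), 4), "| inverse-link changes in mu:",
      [round(d, 3) for d in mu_changes])
```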
__________________

------------------------------------------
P FM MFE C MLC VEE Economics VEE Applied Statistics VEE Corporate Finance PA FAP

Want to connect on LinkedIn? PM me!
  #8  
12-11-2019, 05:55 AM
KarimZ
Member
SOA
 
Join Date: May 2015
Location: Pakistan
College: Graduate, Bsc Accounting and Finance
Posts: 204

Quote:
Originally Posted by actuary121110
The only easy ones are:
1. identity: the coefficient is the change in the predicted mean of y for a 1-unit increase in x
2. logit: exp(coefficient) is the multiplicative change in the odds of y for a 1-unit increase in x
3. log: exp(coefficient) is the multiplicative change in the predicted mean of y for a 1-unit increase in x
On the exam, do you think we should end up recommending one of the three link functions mentioned above, since they are easy to interpret and will make the executive summary easier to write?
__________________
P FM MFE C LTAM PA

VEEs

FAP Interim FAP Final

APC

All times are GMT -4.
*PLEASE NOTE: Posts are not checked for accuracy, and do not
represent the views of the Actuarial Outpost or its sponsors.