Actuarial Outpost which link functions are "easily interpretable"?
#1
12-06-2019, 10:10 PM
 samdman82 Member Join Date: Jul 2007 Posts: 240
which link functions are "easily interpretable"?

I know the log link relates the GLM coefficients to the % change in the target variable, but which ones do NOT? How would I know whether they do or do not?
#2
12-07-2019, 11:26 PM
 actuary121110 SOA Join Date: Jun 2017 Location: Kuala Lumpur, Malaysia Studying for PA, QFIQF, QFIPM College: University of Michigan, Ann Arbor (Alumni) Posts: 18

The only easy ones are
1. identity - the coefficient is the change in predicted y for a 1-unit increase in x
2. logit - exp(coefficient) is the multiplicative change in the odds of y for a 1-unit increase in x
3. log - exp(coefficient) is the multiplicative change in predicted y for a 1-unit increase in x
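A quick numeric check of those three interpretations. This is only a sketch: the coefficients `b0` and `b1` and the helper names are made up for illustration and do not come from any exam model.

```python
import math

# Hypothetical coefficients for a one-predictor GLM: eta = b0 + b1 * x
b0, b1 = 0.5, 0.2

def eta(x):
    return b0 + b1 * x

# Each function applies the inverse of the link to get the predicted mean.
def identity_mean(x):
    return eta(x)

def log_mean(x):
    return math.exp(eta(x))

def logit_mean(x):
    return 1.0 / (1.0 + math.exp(-eta(x)))

def odds(p):
    return p / (1.0 - p)

for x in (0.0, 3.0, 10.0):
    diff = identity_mean(x + 1) - identity_mean(x)               # equals b1 at every x
    ratio = log_mean(x + 1) / log_mean(x)                        # equals exp(b1) at every x
    odds_ratio = odds(logit_mean(x + 1)) / odds(logit_mean(x))   # equals exp(b1) at every x
    print(x, round(diff, 6), round(ratio, 6), round(odds_ratio, 6))
```

The point is that each quantity is the same no matter where x starts, which is what makes these three links easy to describe in a sentence.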
#3
12-10-2019, 06:53 PM
 Relmiw Member CAS Join Date: Apr 2013 Posts: 201

Quote:
 Originally Posted by actuary121110 The only easy ones are 1. identity - the coefficient is the change in predicted y for a 1-unit increase in x 2. logit - exp(coefficient) is the multiplicative change in the odds of y for a 1-unit increase in x 3. log - exp(coefficient) is the multiplicative change in predicted y for a 1-unit increase in x
I am having a hard time identifying how to interpret GLM coefficients. I notice the June 2019 solution states the following regarding a Gamma distribution with Log link:
Quote:
 "An appropriate way to interpret coefficients is to exponentiate them and subtract 1"
This is congruent with actuary121110's statement above. So is this the coefficient interpretation regardless of the overall distribution (Gamma), because of the link function (log)? As in, would a Gaussian with a log link have the same interpretation?

Are the following true?
1. The overall distribution (Gaussian, Gamma, Binomial, Poisson, Quasi-Poisson) should be selected based on the distribution of the target variable, taking into consideration whether it is discrete or continuous, whether it can be negative, and its skew.
2. The link function should be selected similar to the above. For example, a Gaussian is sufficient to model continuous variables without a skew, but should be modified to have a log link function if only positive results are desired.
3. Interpreting the coefficients is a function of the link function, but not the overall distribution. For example, coefficients for either a Gaussian, Gamma or Poisson distribution with a log link are best interpreted as the product of the predictor and the exponentiated coefficient

Thank you very much for any last-minute help
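For reference on point 3, a short derivation for a single predictor x with a log link (the symbols \beta_0 and \beta_1 are generic notation, not taken from the exam solution). Nothing in it uses the family, which suggests the "exponentiate and subtract 1" reading applies to any log-link GLM:

```latex
\begin{align*}
\log E[y \mid x] &= \beta_0 + \beta_1 x \\
\frac{E[y \mid x+1]}{E[y \mid x]}
  &= \frac{e^{\beta_0 + \beta_1 (x+1)}}{e^{\beta_0 + \beta_1 x}} = e^{\beta_1} \\
\text{percent change in } E[y \mid x] \text{ per unit of } x
  &= \left(e^{\beta_1} - 1\right) \times 100\%
\end{align*}
```

The family (Gaussian, Gamma, Poisson) changes the assumed variance, and therefore the fitted value of \beta_1, but not what \beta_1 means.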
#4
12-10-2019, 08:38 PM
 Colymbosathon ecplecticos Member Join Date: Dec 2003 Posts: 6,167

Let's look just at #2 below:
Quote:
 Originally Posted by Relmiw 2. The link function should be selected similar to the above. For example, a Gaussian is sufficient to model continuous variables without a skew, but should be modified to have a log link function if only positive results are desired.
Suppose that we are estimating the vote count for president in the 2020 election --- our target variable is specifically Trump's vote count.

This is clearly a discrete variable, but we will certainly be happy with a continuous approximation. It can't be negative, so your reasoning above suggests that we should account for skewness. But really, that doesn't make any sense does it?

True, you might want a skewness component to account for correlation, but not only because it is a non-negative variable.

If you know any statistical mechanics, you'll know of many other examples.
#5
12-10-2019, 09:18 PM
 Relmiw Member CAS Join Date: Apr 2013 Posts: 201

Quote:
 Originally Posted by Colymbosathon ecplecticos Let's look just at #2 below: Suppose that we are estimating the vote count for president in the 2020 election --- our target variable is specifically Trump's vote count. This is clearly a discrete variable, but we will certainly be happy with a continuous approximation. It can't be negative, so your reasoning above suggests that we should account for skewness. But really, that doesn't make any sense does it? True, you might want a skewness component to account for correlation, but not only because it is a non-negative variable. If you know any statistical mechanics, you'll know of many other examples.
Ok, so looking at #2. For modeling the number of votes one candidate is going to receive, my first thought is that this is technically discrete, but it might as well be continuous, and either distribution type would be fine. My next thought is that while it technically cannot be negative, our sample data are probably in the ballpark of millions and nowhere near zero, so that is also not a concern. I'd then want to plot* the target variable to determine whether a symmetric Gaussian distribution is all right, or whether we need to use a Gamma model.
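Alongside a plot, a numeric skewness check can help with that Gaussian-versus-Gamma decision. A minimal sketch, assuming the simple moment-based (biased) skewness estimator and made-up data; `sample_skewness` is a hypothetical helper, not a library function:

```python
def sample_skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Illustrative targets only: one symmetric, one with a long right tail.
symmetric_target = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed_target = [1.0, 1.0, 2.0, 2.0, 3.0, 10.0]

print(sample_skewness(symmetric_target))
print(sample_skewness(right_skewed_target))
```

A value near zero is consistent with a symmetric target, while a clearly positive value points toward a right-skewed family such as Gamma.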

So I suppose the lesson you're imparting regarding #2 is that while those rules are technically in place, they need to be met with consideration and can be bent.

Quote:
 ... It can't be negative, so your reasoning above suggests that we should account for skewness. But really, that doesn't make any sense does it?
That doesn't make sense? Wouldn't a Gaussian distribution be inappropriate if the distribution of the training target is right-skewed?

*Edited post to swap out the word "model" for "plot"

Last edited by Relmiw; 12-10-2019 at 09:33 PM..
#6
12-10-2019, 09:27 PM
 windows7forever Member SOA Join Date: Apr 2016 Posts: 412

Quote:
 Originally Posted by Colymbosathon ecplecticos Let's look just at #2 below: Suppose that we are estimating the vote count for president in the 2020 election --- our target variable is specifically Trump's vote count. This is clearly a discrete variable, but we will certainly be happy with a continuous approximation. It can't be negative, so your reasoning above suggests that we should account for skewness. But really, that doesn't make any sense does it? True, you might want a skewness component to account for correlation, but not only because it is a non-negative variable. If you know any statistical mechanics, you'll know of many other examples.
For skewness correction, we use a log transformation. Do we apply that transformation only to continuous variables, and not to discrete variables unless the discrete variable has many distinct values, like a continuous variable?

For example, ER was not log transformed in the hospital readmission project because it has few distinct values. Also, most ER values are 0, which cannot be log transformed.

For votes, if the data are at an aggregate level, such as city and state counts, then we may think about a Poisson GLM with an offset. If they are at some average per-case level, then we may consider a Poisson GLM with weights instead. Offsets were tested in last December's exam, but weights have not been tested before, except in a short online module example.
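To make the offset-versus-weight distinction concrete, here is a sketch for the simplest case: an intercept-only Poisson model with a log link, where both fits have a closed form. The counts and exposures are made up, and the variable names are illustrative.

```python
import math

# Made-up data: event counts and exposure (for example, population) per group.
counts = [12, 30, 7, 51]
exposure = [100.0, 240.0, 80.0, 400.0]

# Offset form: model the counts with log(mu_i) = log(exposure_i) + b0.
# Setting the Poisson score sum(y_i - e_i * exp(b0)) to zero gives:
b0_offset = math.log(sum(counts) / sum(exposure))

# Weight form: model the rates y_i / e_i with prior weights e_i.
# The weighted score sum(e_i * (r_i - exp(b0))) = 0 gives the weighted mean rate.
rates = [y / e for y, e in zip(counts, exposure)]
weighted_mean_rate = sum(e * r for e, r in zip(exposure, rates)) / sum(exposure)
b0_weight = math.log(weighted_mean_rate)

print(b0_offset, b0_weight)
```

In this setup the two formulations give the same intercept: the offset version takes counts as the target, and the weight version takes rates as the target.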
#7
12-11-2019, 04:06 AM
 ThereIsNoSpoon Member CAS SOA Join Date: Sep 2014 Studying for PA,FAP College: When the smog clears, _ _ _ _ Favorite beer: PABST! BLUE RIBBON! Posts: 491

Is there going to be a case where we're asked to interpret inverse/cloglog or any of those link functions? Is it worth worrying about them?
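For what it's worth, a small sketch of why those links are harder to put into words. The coefficients are hypothetical defaults chosen only to keep the numbers in a sensible range, and the helper names are made up.

```python
import math

def inverse_mean(x, b0=1.0, b1=0.5):
    # Inverse link: 1 / mu = b0 + b1 * x (the linear predictor must stay positive)
    return 1.0 / (b0 + b1 * x)

def cloglog_mean(x, b0=-2.0, b1=0.5):
    # Complementary log-log link: log(-log(1 - p)) = b0 + b1 * x
    return 1.0 - math.exp(-math.exp(b0 + b1 * x))

def odds(p):
    return p / (1.0 - p)

for x in (0.0, 2.0, 4.0):
    inv_ratio = inverse_mean(x + 1) / inverse_mean(x)
    cll_odds_ratio = odds(cloglog_mean(x + 1)) / odds(cloglog_mean(x))
    # Both numbers change with x, so no single "per unit of x" statement applies.
    print(f"x={x}: inverse-link ratio={inv_ratio:.3f}, cloglog odds ratio={cll_odds_ratio:.3f}")
```

Unlike the log and logit links, the effect of a 1-unit increase here depends on the starting value of x. Under cloglog, exp(coefficient) does multiply -log(1 - p), but that quantity is much less natural to explain than a mean or an odds.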
#8
12-11-2019, 05:55 AM
 KarimZ Member SOA Join Date: May 2015 Location: Pakistan College: Graduate, Bsc Accounting and Finance Posts: 204

Quote:
 Originally Posted by actuary121110 The only easy ones are 1. identity - the coefficient is the change in predicted y for a 1-unit increase in x 2. logit - exp(coefficient) is the multiplicative change in the odds of y for a 1-unit increase in x 3. log - exp(coefficient) is the multiplicative change in predicted y for a 1-unit increase in x
In the exam, do you think we should end up recommending one of the three link functions mentioned, since they are easy to interpret and will make writing the executive summary easier?
