


#1




Single Model Pure Premium GLM: Any reason not to use Poisson over Tweedie?
I've been testing various single-model solutions for non-Auto pure premium modeling in R. My goal is to minimize deviance and maximize the Gini index. The Tweedie distribution usually performs slightly better, but the 'tweedie' and 'HDtweedie' packages have some limitations that make them difficult to work with.
The CAS Monograph 5 suggests Poisson for frequency modeling, Gamma for severity modeling, and Tweedie for single-model pure premium. But I look at that as a suggestion rather than the be-all and end-all. Is there any major disadvantage to using Poisson for pure premium modeling that I'm overlooking?

The 'HDtweedie' package is a wrapper around glmnet. However, glmnet allows the use of a sparse matrix in dgCMatrix format, which removes all zeros to reduce storage size; HDtweedie only accepts a dense matrix. When I try using model.matrix with the same variables, the matrix is too large. I checked memory.limit() and I have 16 GB allocated to R. There are also some better metrics you can pull from the glmnet package that you can't get from HDtweedie. I prefer glmnet because it produces reasonable coefficients and penalizes non-credible categories; as a result, it outperforms the plain glm by 20%.

Edit for people who discover this later and don't want to browse the thread: this thread got a bit distracted. Here's what I've gathered from user responses and other reading I've done, per the "Practitioner's Guide to GLMs" (I'd link it, but the link goes directly to a PDF):

Page 3: log-linked Poisson GLMs are equivalent to the multiplicative balance principle of minimum bias estimation (minimum bias estimation goes all the way back to the 1960s and was used when computing power was limited).

Page 19: log-linked Poisson is commonly used for frequencies because the log link makes it a multiplicative model (much easier to implement and compare factors) and because it is invariant to measures of time: modeling frequencies per year will yield the same results as per month.

Page 20: log-linked Gamma is commonly used for severities because the log link makes it multiplicative and Gamma is invariant to measures of currency: measuring severity in dollars or in cents will yield the same results.

The log-linked Tweedie distribution with p in (1, 2) is a compound Poisson-Gamma distribution. The closer p is to 1, the more it acts like Poisson; the closer to 2, the more it acts like Gamma. Common values are 1.5 to 1.65. It also assumes that frequency and severity are highly correlated. I'm not too familiar with the Tweedie likelihood function, but due to its complexity it's a bit harder to pull some metrics. Whether it is appropriate really depends on your data.

When you select a family, you're choosing the mean-variance relationship. For Poisson GLMs, the mean-variance relationship is the identity (variance equals the mean). Despite the warnings that most statistical software gives you, it's completely reasonable to model continuous data where the relationship between two variables is linear on the log scale and the variance increases with the mean. If you look at the residuals, you can determine whether the Poisson mean-variance relationship is accurate. If not, it may be better to use Gamma, whose variance is proportional to the mean squared.

Back to my original question: are there any major disadvantages to using Poisson over Tweedie? No, but it's worth also checking Gamma.

Last edited by Actuarially Me; 02-21-2019 at 10:57 AM.
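To make the comparison concrete, here's a minimal sketch of fitting the log-linked Poisson and Tweedie pure premium models side by side and eyeballing the mean-variance assumption. The data frame dat and the columns pp (pure premium), expo (exposure), x1, and x2 are hypothetical placeholders, not from the thread.

Code:
# Sketch only -- 'dat', 'pp', 'expo', 'x1', 'x2' are assumed names
library(statmod)   # tweedie() family object for glm()
library(tweedie)   # tweedie.profile() to estimate the power p

# Log-linked Poisson on pure premium (quasi-likelihood, so the
# "non-integer response" warning from a plain poisson() fit is avoided)
fit_pois <- glm(pp ~ x1 + x2, family = quasipoisson(link = "log"),
                weights = expo, data = dat)

# Profile p over (1, 2), then fit the Tweedie with the same predictors
p_hat <- tweedie.profile(pp ~ x1 + x2, data = dat,
                         p.vec = seq(1.1, 1.9, by = 0.1),
                         link.power = 0)$p.max
fit_tw <- glm(pp ~ x1 + x2,
              family = statmod::tweedie(var.power = p_hat, link.power = 0),
              weights = expo, data = dat)

# Rough check of the mean-variance assumption: squared Pearson residuals
# against fitted values; a flat trend supports variance proportional to mean
plot(fitted(fit_pois), residuals(fit_pois, type = "pearson")^2, log = "xy")

Compare the two fits on holdout deviance and Gini rather than in-sample statistics, since segmentation on new business is the stated goal.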
#3




Note that one of the parameters you can select for a Tweedie model is the "power parameter". Let's call it p.
Note that if p = 1, the model is the same as a Poisson-distributed model; if p = 2, the model is the same as a Gamma-distributed model. So using a Poisson model is essentially saying that frequency is driving pure premium (which might very well be the case if severity is pretty homogeneous regardless of the risk profile on the books). One thing you might do to assess whether a Poisson model is appropriate is to create a severity model and see whether there is significant variation in severity in your data. If not (very few variables are statistically significant and/or parameter values have very little deviation and the lift chart is fairly flat), then you have support for modeling pure premium with Poisson.
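If it helps, here's one way that severity check might look in R. This is a sketch under assumed names: clm is a claims-level data frame with severity sev, claim counts n_claims, and the same rating variables x1 and x2.

Code:
# Sketch only -- 'clm', 'sev', 'n_claims', 'x1', 'x2' are assumed names
fit_sev <- glm(sev ~ x1 + x2, family = Gamma(link = "log"),
               weights = n_claims, data = clm)
summary(fit_sev)   # are any predictors significant for severity?

# Quick lift check: bucket by predicted severity and compare actual vs
# predicted means; a fairly flat curve supports a Poisson pure premium model
clm$pred   <- predict(fit_sev, type = "response")
clm$decile <- cut(rank(clm$pred), breaks = 10, labels = FALSE)
aggregate(cbind(actual = sev, predicted = pred) ~ decile, data = clm, FUN = mean)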
#5




I remember seeing that h2o supports Tweedie. I haven't dealt with it much yet, but it's something worth looking into. It's not really the file size that's large; it's the matrix that gets created. Only about 75,000 observations, but encoding the dummy variables blows up the size, I guess. So I'd rather use h2o than invest time in getting Spark up and running. Have you noticed any improvements with h2o or Spark?
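For what it's worth, h2o's GLM takes a Tweedie family directly and handles the dummy encoding internally, so you never materialize the dense model matrix yourself. A minimal sketch, assuming the same hypothetical data frame dat with response pp and predictors x1, x2, and an assumed variance power of 1.5:

Code:
# Sketch only -- column names and the variance power are assumptions
library(h2o)
h2o.init()

hf <- as.h2o(dat)
fit_h2o <- h2o.glm(x = c("x1", "x2"), y = "pp", training_frame = hf,
                   family = "tweedie",
                   tweedie_variance_power = 1.5,  # p in (1, 2)
                   tweedie_link_power = 0,        # log link
                   alpha = 0.5,                   # elastic-net mix
                   lambda_search = TRUE)          # penalized, like glmnet
h2o.coef(fit_h2o)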
#7




For 75k records you can use your cellphone. I would not be considering h2o when there are much better R packages.
It sounds like you have some high-cardinality categorical features. Why do you think the shrinkage priors of a ridge/lasso regression are giving you better coefficient estimates? Think about it for a while. Do you think you don't need to reduce the number of levels of those factors, or add some hierarchical structure, just because you're using ________ software/package/fancy machine learning method? Also, think about why you may not want to (or be able to) use a counting distribution to model a random process X with support on [0, Inf).
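One cheap way to deal with those high-cardinality factors before fitting, rather than leaning on the penalty to shrink them, is to pool sparse levels. A sketch using forcats, with a hypothetical territory column and an arbitrary cutoff:

Code:
# Sketch only -- 'territory' and the 500 cutoff are assumptions
library(forcats)

# Pool any level that appears fewer than 500 times into "Other"
dat$territory_grp <- fct_lump_min(dat$territory, min = 500, other_level = "Other")
table(dat$territory_grp)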
__________________
7 8 9
Last edited by FactuarialStatement; 02-13-2019 at 09:27 PM.
#9




How about being helpful instead of being insufferable all the time? In every post of yours I see, you're putting someone down, you don't give helpful advice, and you end up in a dick measuring contest with other users. I'm sorry mommy and daddy didn't pay attention to you or put your pictures on the refrigerator. It's pretty clear you're the reason they're divorced. You'll probably retort with something doubling down on your arrogance, but please leave the discussion to people who actually want to be helpful.

I'm new to actuarial modeling: I worked 5 years in reserving, then switched to a non-actuarial predictive modeling role. I don't know all of the pricing actuarial best practices and am just trying to do better at my job. Every pricing example I've seen uses personal lines Auto data, which is much higher quality than the data for the lines I'm working with. I don't work for a large company; I have to build models from the ground up with no one to really bounce ideas off of, and none of the actuaries I work with have a background in predictive modeling. This is the only actuarial forum I know of, so I come here when I have actuarial-specific questions. I guess I'm used to the data science community, which is generally collaborative.

This is the first GLM I'm implementing for this company. They have their rating plan set up through SQL and run through a web service. All I can do is update the rating factors; doing anything else, I'd need to overhaul the SQL and the web service, which is my long-term goal. I'm not currently concerned about feature selection. The goal of the model isn't to find the best point estimate, but to be better at segmenting risks than the current rating plan, hence the focus on Gini and deviance. Is this ideal? Of course not, but it's the situation I'm in. So instead of assuming everyone here is an idiot, realize not everyone works under ideal circumstances. Or just work on being a better person! You'll feel better long term, and people will like you more if you share your knowledge rather than chasing the small shots of dopamine you get from your sardonic comments.

I understand the difference between frequency and severity.
I understand the theoretical differences between Poisson, Gamma, and Tweedie.
I understand high cardinality is bad.
I understand the importance of feature engineering.
I understand how Lasso/Ridge regression works.

I only know of the 'tweedie', 'HDtweedie', and 'HDBoost' packages that handle Tweedie distributions. I've only known about these packages for a couple of months, and in my experience they're not friendly with the tidyverse and/or haven't been updated to work with it. This leads me to rely on the more fleshed-out packages. I need a multiplicative model since that's what underwriters are used to, so as far as I know I'm limited to Poisson, Gamma, and Tweedie. I'm able to create better visualizations with Poisson given the package support, and I'm willing to give up some predictive power because of this.

I came here looking for advice and appreciate everyone who has shared some. I hope these threads will prove useful for future predictive modelers looking for advice, so they don't make the same mistakes as me. As a user pointed out, h2o has built-in Tweedie support. That was useful information. Is it overkill for the small amount of data I'm training on? Yes, but it's another possibility if I want to keep using the Tweedie distribution. As another user pointed out, it's likely that the data I'm working with is highly frequency-driven, which is why Poisson is performing better.
Solid advice. 
#10




Relevant paper: https://www.casact.org/pubs/forum/17...rossEvans.pdf
The authors present a case study of a claims severity model, which is commonly built as a gamma GLM. They compare that approach to a minimum bias model, which is equivalent to a Poisson GLM, and conclude that the minimum bias (i.e., Poisson) model validated similarly to, or maybe slightly better than, the gamma model. So at least based on this, I'd say yes, you're probably OK going with the Poisson model for pure premium.
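The same kind of check is easy to run on your own severity data: fit the model under both mean-variance assumptions and compare them on a holdout set with a common yardstick. A sketch, with clm, sev, x1, and x2 as assumed names rather than anything from the paper:

Code:
# Sketch only -- data frame and column names are assumptions
set.seed(42)
idx   <- sample(nrow(clm), 0.7 * nrow(clm))
train <- clm[idx, ]
test  <- clm[-idx, ]

fit_gamma <- glm(sev ~ x1 + x2, family = Gamma(link = "log"),        data = train)
fit_pois  <- glm(sev ~ x1 + x2, family = quasipoisson(link = "log"), data = train)

# Out-of-sample Gamma deviance as a common yardstick (lower is better)
gamma_dev <- function(y, mu) 2 * sum(-log(y / mu) + (y - mu) / mu)
gamma_dev(test$sev, predict(fit_gamma, test, type = "response"))
gamma_dev(test$sev, predict(fit_pois,  test, type = "response"))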
Tags: glm, poisson, tweedie

