
#1324
Michael Barr
Participant

    Hi – I can go into more detail if you like, but here is a high level description of a workflow and how to implement it with glmnet (and cv.glmnet).

    As you know, Lasso/Ridge/Elastic Net perform a penalized regression – the penalties are the L1 norm (aka “Manhattan distance”), the L2 norm (Euclidean distance), and a weighted mix of the two, respectively. The mix between them is controlled by a hyperparameter which, if I recall, is denoted alpha, and the overall magnitude or amount of penalty is controlled by the hyperparameter lambda.
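    For concreteness (this is from memory, so double check against the package docs), the penalty glmnet adds to the loss is roughly

        lambda * [ (1 - alpha)/2 * ||beta||_2^2  +  alpha * ||beta||_1 ]

    so alpha controls the L1/L2 mix and lambda scales the whole penalty.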

    How do you decide which penalty or what mixture of L1/L2 to use (alpha)? Sometimes that can be decided a priori (e.g. you are primarily interested in feature selection, so you choose the L1 penalty to drive some coefficients exactly to zero). If I recall, alpha = 1 (i.e. Lasso) is the default; alpha = 0 would be Ridge. Double check the help file on that.
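    A minimal sketch of setting alpha (with made-up simulated data purely so the snippet runs):

        # toy data purely for illustration
        library(glmnet)
        set.seed(42)
        n <- 100; p <- 20
        x <- matrix(rnorm(n * p), n, p)
        y <- drop(x[, 1:3] %*% c(2, -1, 0.5) + rnorm(n))

        fit_lasso <- glmnet(x, y, alpha = 1)    # pure L1 penalty (Lasso); I believe this is the default
        fit_ridge <- glmnet(x, y, alpha = 0)    # pure L2 penalty (Ridge)
        fit_enet  <- glmnet(x, y, alpha = 0.5)  # elastic net mix of the two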

    How do you decide the optimal amount of penalty (lambda)? Again, sometimes you may know a priori what you want – maybe no penalty at all (lambda = 0, a vanilla MLE regression), or you want the penalty to dominate (lambda set arbitrarily large, so all coefficients are shrunk to approximately 0 and you get back the null model). But you can see these are special or edge cases, and you wouldn’t be using glmnet if you wanted them. So in practice you always need to tune lambda.
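    Sticking with the toy x/y from the snippet above, you can see both extremes by asking for coefficients at a tiny and a huge lambda (coef() interpolates along the fitted path, so treat these as approximate):

        coef(fit_lasso, s = 0)      # (essentially) no penalty – close to the plain least-squares fit
        coef(fit_lasso, s = 1e6)    # penalty dominates – everything shrunk to (about) zero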

    The glmnet package uses k-fold cross-validation for hyperparameter tuning, implemented in cv.glmnet(). So you want to start with this method – it will give you a grid of values for lambda chosen heuristically, or you can provide your own grid of test values. But since there is no real interpretation or significance to any particular value of lambda above 0 (it depends on the scale of your data and the number of predictors), you pretty much want to start by letting the algo choose a grid for you.
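    Continuing the sketch, with the default (automatically chosen) lambda grid:

        # 10-fold CV over an automatically chosen lambda grid
        cvfit <- cv.glmnet(x, y, alpha = 1, nfolds = 10)

        # you *can* pass your own grid, but usually there is no need:
        # cvfit <- cv.glmnet(x, y, alpha = 1, lambda = exp(seq(log(1e-4), log(1), length.out = 100)))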

    So you run cv.glmnet() to cross-validate lambda (and potentially alpha) and receive back a corresponding cross-validated estimate of prediction error (I forget the defaults, but I believe it is the deviance, which for a plain gaussian model is just mean squared error). You can plot the metric by calling the method’s plot function [simply plot(cv.object)]. You can also access a couple of values of lambda which are stored directly: one corresponding to the minimum prediction error, and a “1se” value, meaning a slightly more regularized result whose error is within 1 s.e. of that minimum. The 1se choice is often preferred since we have a preference for parsimony in models.
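    In code, that looks something like:

        plot(cvfit)                     # CV error curve vs log(lambda), with error bars
        cvfit$lambda.min                # lambda with the minimum cross-validated error
        cvfit$lambda.1se                # largest lambda within 1 s.e. of that minimum (more regularized)
        coef(cvfit, s = "lambda.1se")   # coefficients at the parsimonious choice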

    Now with your selected value of lambda, you get your “final” model fit to all the data by using the glmnet() method directly. If I recall correctly, this isn’t strictly necessary, since cv.glmnet() also fits the candidate models to all of the data (in addition to the folds) and stores that fit, but I may be mistaken. In any case, glmnet() will return a smaller object with just the one set of coefficients and fitted results.
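    A minimal sketch of that last step (the single-lambda refit works, though I believe the package docs generally prefer predicting from the stored full-path fit):

        # refit on all data at the chosen lambda (returns a small, single-lambda object)
        final_fit <- glmnet(x, y, alpha = 1, lambda = cvfit$lambda.1se)
        coef(final_fit)

        # or just use the full-data fit already stored inside the cv object (cvfit$glmnet.fit)
        predict(cvfit, newx = x, s = "lambda.1se")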

     

    HTH