
Convert parameters of logistic regression model from glmnet to sklearn #6

lingling93 opened this issue Apr 28, 2019 · 5 comments

lingling93 commented Apr 28, 2019

code in predictr.ipynb:

fit = hetior::glmnet_train(
  X = X_train, y = y_train, alpha = 0.2, s = lambda,
  cores = 10, seed = 0, penalty.factor = penalty,
  lambda.min.ratio = 1e-8, nlambda = 150, standardize = TRUE
)

How do these parameters map to sklearn's logistic regression parameters?

@dhimmel dhimmel changed the title from "Reverse parameters of logistic regression model" to "Convert parameters of logistic regression model from glmnet to sklearn" on Apr 30, 2019

dhimmel commented Apr 30, 2019

Thanks @lingling93 for the interest.

Looking at the source code for the hetior::glmnet_train function, it fits the logistic regression model with the following:

  # cross-validated elastic net logistic regression
  fit$cv_model <- glmnet::cv.glmnet(x = X, y = y, weights = w, family = 'binomial',
    alpha = alpha, parallel = TRUE, ...)
  # select lambda according to s (e.g. "lambda.min" or "lambda.1se")
  fit$lambda <- fit$cv_model[[s]]

I believe the closest thing in sklearn is sklearn.linear_model.SGDClassifier, which can fit an elastic net logistic regression model. Note that the alpha parameter in glmnet is equivalent to the l1_ratio parameter in sklearn (the elastic-net mixing parameter), while the lambda parameter in glmnet corresponds to the alpha parameter in sklearn (the regularization strength).
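
To make the mapping concrete, here is a minimal sketch (my addition, not from the original thread) of an elastic net logistic regression in sklearn; chosen_lambda is a hypothetical placeholder for whatever λ cross-validation selects:

    # Sketch of the glmnet -> sklearn parameter mapping. Assumes X_train and
    # y_train are a feature matrix and binary labels; chosen_lambda is a
    # hypothetical stand-in for glmnet's selected lambda.
    from sklearn.linear_model import SGDClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    chosen_lambda = 0.01  # placeholder for glmnet's selected lambda

    model = make_pipeline(
        StandardScaler(),  # plays the role of glmnet's standardize=TRUE
        SGDClassifier(
            loss="log_loss",      # logistic regression ("log" in sklearn < 1.1)
            penalty="elasticnet",
            l1_ratio=0.2,         # glmnet's alpha: elastic-net mixing parameter
            alpha=chosen_lambda,  # glmnet's lambda: regularization strength
            random_state=0,
        ),
    )
    # model.fit(X_train, y_train)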

If I remember correctly, glmnet::cv.glmnet is more efficient at evaluating many regularization parameters than implementations that use sklearn.model_selection.GridSearchCV with SGDClassifier. I believe that is why I did the model fitting in R rather than Python. It may be worth checking out python-glmnet, which claims to follow the sklearn API while providing access to the underlying glmnet Fortran code.
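
For reference, a hedged sketch of what an equivalent fit with python-glmnet might look like, assuming its LogitNet estimator; the parameter names below are my reading of that package and should be verified against its documentation:

    # Sketch using python-glmnet's LogitNet (assumed API; verify against
    # https://github.com/civisanalytics/python-glmnet before relying on it).
    from glmnet import LogitNet

    model = LogitNet(
        alpha=0.2,              # elastic-net mixing, as in the R call above
        n_lambda=150,           # nlambda=150
        min_lambda_ratio=1e-8,  # lambda.min.ratio=1e-8
        standardize=True,       # standardize=TRUE
        random_state=0,         # seed=0
    )
    # model.fit(X_train, y_train)  # penalty.factor may correspond to the
    #                              # relative_penalties argument of fit()
    # model.lambda_best_           # lambda selected by cross-validation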



lingling93 commented May 9, 2019

@dhimmel Thank you for your quick and informative answer.
I think I have a clue now. One more question about the code in predictr.ipynb: I didn't find the s parameter in glmnet. What is s?


dhimmel commented May 9, 2019

The s parameter specifies which λ (lambda regularization parameter) value to use based on the cross-validation results. From the glmnet vignette:

lambda.min is the value of λ that gives minimum mean cross-validated error. The other λ saved is lambda.1se, which gives the most regularized model such that error is within one standard error of the minimum. To use that, we only need to replace lambda.min with lambda.1se above.

For Project Rephetio, we use s="lambda.1se". We also used lambda.1se in our previous work, where we wrote the following:

Regularized logistic regression requires a parameter, λ, setting the strength of regularization. We optimized λ separately for each model fit. Using 10-fold cross-validation and the “one-standard-error” rule to choose the optimal λ from deviance, we adopted a conservative approach designed to prevent overfitting [80].

The “one-standard-error” rule is further described in Regularization Paths for Generalized Linear Models via Coordinate Descent:

We often use the “one-standard-error” rule when selecting the best model; this acknowledges the fact that the risk curves are estimated with error, so errs on the side of parsimony (Hastie et al. 2009). Cross-validation can be used to select α as well, although it is often viewed as a higher-level parameter and chosen on more subjective grounds.
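
The rule itself is simple to state in code. Here is an illustrative sketch (my own, with made-up numbers) of how the one-standard-error λ is chosen from cross-validation results:

    # One-standard-error rule with made-up CV results: pick the largest
    # (most regularizing) lambda whose CV error is within one standard
    # error of the minimum CV error.
    import numpy as np

    lambdas  = np.array([1.0, 0.5, 0.1, 0.05, 0.01])     # descending, as in glmnet
    cv_error = np.array([0.70, 0.55, 0.40, 0.39, 0.41])  # mean cross-validated error
    cv_se    = np.array([0.04, 0.04, 0.03, 0.03, 0.04])  # standard error of each mean

    i_min = cv_error.argmin()                    # index of lambda.min
    threshold = cv_error[i_min] + cv_se[i_min]   # minimum error + one standard error
    lambda_1se = lambdas[cv_error <= threshold].max()
    print(lambda_1se)  # 0.1 for these made-up numbers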

In Project Rephetio discussion, I made the following comment related to our use of lambda.1se:

Such a small number of positive coefficients is a bit disappointing. Our feature assessment ... shows that a broad range of metapaths are informative. The origin of our model's selectivity appears to lie with the “one-standard-error” rule [2] we use to identify the optimal regularization strength (λ). Our model had high cross-validated standard error leading to substantial regularization on top of the deviance minimizing model. While it's tempting to relax our λ selection, I'd rather be more confident in a minimalist model than risk a less coherent but more complex model.

lingling93 commented

@dhimmel Hi Daniel, problem solved. I used python-glmnet to reproduce your work, training the logistic regression model and matching every parameter. Then I checked lambda_best: with different seeds it fluctuates a little, and with a certain seed it gives a result close to yours. The coefficient of prior_prob is steadier, around 0.7. So I think this is enough to show that I can use python-glmnet.
Thank you for all your help.


dhimmel commented May 12, 2019

Cool! I'm looking forward to trying out python-glmnet myself. Yeah, there is a random seed, and I'm guessing it won't be possible to achieve exactly the same results in Python versus R because the randomness will be different.
