GLM tests of scikit-learn #723

lorentzenchr · 2023-10-31T20:59:47Z

Scikit-learn has some very strict tests for GLMs in https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/linear_model/_glm/tests/test_glm.py. I modified the file to test glum.GeneralizedLinearRegressor instead, see https://gist.github.com/lorentzenchr/2e319bcfd4aadfbea64c6330e5b33521. Running pytest test_glm.py results in 76 failed, 212 passed, 104 warnings.

It might be interesting to include those tests in glum.

jtilly · 2023-10-31T22:10:00Z

Thanks a lot! For future reference, these are the failing tests:

FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='binomial')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='binomial')-True-irls-cd] - assert 0.690107820640591 == 1.2837501395684472 ± 6.4e-05
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='binomial')-False-lbfgs] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='binomial')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='binomial')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='poisson')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='poisson')-True-irls-cd] - assert 0.8533955861703721 == 1.2837501395684472 ± 6.4e-05
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='poisson')-False-lbfgs] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='poisson')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='poisson')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='gamma')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='gamma')-True-irls-cd] - assert 0.8051836315439316 == 1.2837501395684472 ± 6.4e-05
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='gamma')-False-lbfgs] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='gamma')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='gamma')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='tweedie')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='tweedie')-True-irls-cd] - assert 0.6292876313885498 == 1.2837501395684472 ± 6.4e-05
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='tweedie')-False-lbfgs] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='tweedie')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='tweedie')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='binomial')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='binomial')-True-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='binomial')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='binomial')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='poisson')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='poisson')-True-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='poisson')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='poisson')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='gamma')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='gamma')-True-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='gamma')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='gamma')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='tweedie')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='tweedie')-True-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='tweedie')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='tweedie')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='binomial')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='binomial')-True-irls-cd] - assert 1.3802141997400277 == 1.2837501395684472 ± 6.4e-06
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='binomial')-False-lbfgs] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='binomial')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='binomial')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='poisson')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='poisson')-True-irls-cd] - assert 1.706489240970734 == 1.2837501395684472 ± 6.4e-06
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='poisson')-False-lbfgs] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='poisson')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='poisson')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='gamma')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='gamma')-True-irls-cd] - assert 2.1915879526750373 == 1.2837501395684472 ± 6.4e-06
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='gamma')-False-lbfgs] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='gamma')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='gamma')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='tweedie')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='tweedie')-True-irls-cd] - assert 1.2585688960366177 == 1.2837501395684472 ± 6.4e-06
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='tweedie')-False-lbfgs] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='tweedie')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='tweedie')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='binomial')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='binomial')-True-irls-cd] - assert 0.6901078206405913 == 1.2837501395684472 ± 1.3e-04
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='binomial')-False-lbfgs] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='binomial')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='binomial')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='poisson')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='poisson')-True-irls-cd] - assert 0.8533955861703684 == 1.2837501395684472 ± 1.3e-04
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='poisson')-False-lbfgs] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='poisson')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='poisson')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='gamma')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='gamma')-True-irls-cd] - assert 0.8051836315439317 == 1.2837501395684472 ± 1.3e-04
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='gamma')-False-lbfgs] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='gamma')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='gamma')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='tweedie')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='tweedie')-True-irls-cd] - assert 0.6292876313579576 == 1.2837501395684472 ± 1.3e-04
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='tweedie')-False-lbfgs] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='tweedie')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='tweedie')-False-irls-cd] - AssertionError: 
================================================================== 76 failed, 212 passed, 107 warnings in 197.18s (0:03:17) ==================================================================

I can get the 12 failing L-BFGS related tests to pass by not standardizing the design matrix here.

64 failing tests to go.

MarcAntoineSchmidtQC · 2023-11-01T14:07:39Z

All the failing tests seem to be for unpenalized regression with a singular design matrix (either the wide problem with p=12, n=4, or the stacked problem where we duplicate all columns). Is that correct? Maybe this is a dumb question, but what is the expected result in this case? I'm not surprised to see these tests fail for glum, but if we want to support this case, the tests are great!
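
To make the two cases concrete, here is a minimal NumPy sketch of how a wide matrix and a column-duplicated matrix both end up rank deficient. The random data and variable names are assumptions for illustration only; this is not the fixture used in the scikit-learn tests.

```python
import numpy as np

rng = np.random.default_rng(0)

# Wide problem: fewer rows than columns (n=4, p=12 as in the comment above).
n, p = 4, 12
X_wide = rng.standard_normal((n, p))

# Stacked problem: a long matrix whose columns are all duplicated.
X_long = rng.standard_normal((12, 4))
X_stacked = np.hstack([X_long, X_long])

# The rank can never exceed min(n_rows, n_cols), and duplicated columns
# add no new directions, so both matrices have fewer independent columns
# than total columns.
rank_wide = np.linalg.matrix_rank(X_wide)
rank_stacked = np.linalg.matrix_rank(X_stacked)

print(rank_wide, "of", X_wide.shape[1], "columns")
print(rank_stacked, "of", X_stacked.shape[1], "columns")
```

In both cases X.T @ X is singular, which is consistent with the `LinAlgError: Matrix is singular` failures listed above.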

lorentzenchr · 2023-11-01T15:37:04Z

It is often said that singular design matrices don't allow for a solution, but this is wrong: there are just infinitely many solutions. For OLS, there is a particularly nice one called the minimum-norm solution, i.e. the coefficients with the smallest L2 norm among all solutions.
It may be that this is of little practical value, but in light of the discovered interpolation regime, it is at least interesting.
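
For the OLS case, the minimum-norm solution can be illustrated with a short NumPy sketch. The data below is random and purely illustrative (an assumption, not the test data), and this only covers the linear least-squares case, not a general GLM.

```python
import numpy as np

rng = np.random.default_rng(0)

# Wide design matrix: 4 samples, 12 features, so X.T @ X is singular.
X = rng.standard_normal((4, 12))
y = rng.standard_normal(4)

# For rank-deficient problems, lstsq returns the minimum-norm solution.
coef_min_norm, *_ = np.linalg.lstsq(X, y, rcond=None)

# The pseudo-inverse gives the same coefficients.
coef_pinv = np.linalg.pinv(X) @ y

# Any vector from the null space of X can be added without changing the fit.
# With full_matrices=True (the default), the last rows of vt span that null space.
_, _, vt = np.linalg.svd(X)
null_vec = vt[-1]
coef_other = coef_min_norm + null_vec

print(np.allclose(X @ coef_min_norm, y), np.allclose(X @ coef_other, y))
print(np.linalg.norm(coef_min_norm), np.linalg.norm(coef_other))
```

Both coefficient vectors reproduce y exactly, but the first has the smaller L2 norm, which is what makes it a well-defined target for a test.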

I have at least one PR for the line search in mind that could help at least with a few of those test failures.

lorentzenchr mentioned this issue Nov 2, 2023

ENH deal with tiny loss improvements in line search #724

Merged
