Release glum 2.0.0 · Quantco/glum

Breaking changes:

Renamed the package to glum!!! Hurray! Celebration.
GeneralizedLinearRegressor and GeneralizedLinearRegressorCV lose the fit_dispersion parameter.
Please use the dispersion method of the appropriate family instance instead.
All functions now use sample_weight as a keyword instead of weights, in line with scikit-learn.
All functions now use dispersion as a keyword instead of phi.
Several methods of GeneralizedLinearRegressor and GeneralizedLinearRegressorCV that should have been private now have an underscore prefixed to their names: _tear_down_from_fit, _set_up_for_fit, _set_up_and_check_fit_args, _get_start_coef, _solve and _solve_regularization_path.
glum.GeneralizedLinearRegressor.report_diagnostics and glum.GeneralizedLinearRegressor.get_formatted_diagnostics are now public.
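
For callers that still build keyword arguments with the old names, the two renames above (weights to sample_weight, phi to dispersion) can be handled with a small translation step. The sketch below is only an illustration: RENAMED_KEYWORDS and migrate_kwargs are hypothetical names and are not part of glum.

```python
# Hypothetical migration helper, not part of glum.
# Maps the pre-2.0 keyword names to the names used from 2.0.0 onwards.
RENAMED_KEYWORDS = {"weights": "sample_weight", "phi": "dispersion"}


def migrate_kwargs(kwargs):
    """Return a copy of kwargs with old keyword names replaced by new ones."""
    migrated = {}
    for name, value in kwargs.items():
        new_name = RENAMED_KEYWORDS.get(name, name)
        if new_name in migrated:
            # Both the old and the new spelling were passed; refuse to guess.
            raise TypeError(f"got both old and new spelling of {new_name!r}")
        migrated[new_name] = value
    return migrated


old_style = {"weights": [1.0, 2.0], "phi": 1.5, "alpha": 0.1}
print(migrate_kwargs(old_style))
```

Keywords that were not renamed pass through unchanged, so the helper can wrap any call site during a transition.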

New features:

P1 and P2 now accept 1d arrays with as many elements as the unexpanded design matrix has columns. In this case,
the penalty associated with a categorical feature is expanded to as many elements as the feature has levels,
all with the same value.
ExponentialDispersionModel gains a dispersion method.
BinomialDistribution and TweedieDistribution gain a log_likelihood method.
The fit method of GeneralizedLinearRegressor and GeneralizedLinearRegressorCV
now saves the column types of pandas data frames.
GeneralizedLinearRegressor and GeneralizedLinearRegressorCV gain two properties: family_instance and link_instance.
GeneralizedLinearRegressor.std_errors and GeneralizedLinearRegressor.covariance_matrix have been added and support non-robust, robust (HC-1), and clustered
covariance matrices.
GeneralizedLinearRegressor and GeneralizedLinearRegressorCV now accept family='gaussian' as an alternative to family='normal'.
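
The expansion rule for P1 and P2 described above can be pictured with a short sketch. It assumes that a numeric feature occupies one column and a categorical feature occupies one column per level; expand_penalty and the example features are hypothetical and only mimic the described behaviour, they are not glum's internal code.

```python
def expand_penalty(penalty, levels_per_feature):
    """Repeat each per-feature penalty once per column the feature expands to.

    penalty: one value per column of the unexpanded design matrix.
    levels_per_feature: 1 for a numeric feature, the number of levels
        for a categorical feature (assumption made for this sketch).
    """
    if len(penalty) != len(levels_per_feature):
        raise ValueError("need exactly one penalty per unexpanded feature")
    expanded = []
    for value, n_levels in zip(penalty, levels_per_feature):
        expanded.extend([value] * n_levels)
    return expanded


# Hypothetical design: numeric "age", categorical "region" with
# three levels, numeric "income".
p1 = [1.0, 0.5, 2.0]
levels = [1, 3, 1]
print(expand_penalty(p1, levels))  # [1.0, 0.5, 0.5, 0.5, 2.0]
```

Every level of the categorical feature receives the same penalty value, which matches the "all with the same value" wording in the note.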

Bug fixes:

The score method of GeneralizedLinearRegressor and GeneralizedLinearRegressorCV now accepts data frames.
Upgraded the code to use tabmat 3.0.0.

Other:

A major overhaul of the documentation. Everything is better!
The methods of the link classes now return scalars when given scalar inputs. Previously, under certain circumstances, they returned zero-dimensional arrays.
A new benchmark based on the Boston housing dataset is available in glm_benchmarks_run. See here.
glm_benchmarks_analyze now includes offset in the index. See here.
glmnet_python was removed from the benchmarks suite.
The innermost coordinate descent loop was optimized. This speeds up problems dominated by coordinate descent, such as LASSO, by a factor of about 1.5 to 2. See here.
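
For readers unfamiliar with the term, the following is a minimal textbook sketch of cyclic coordinate descent for a LASSO problem. It minimizes (1/(2n)) * ||y - Xb||^2 + alpha * ||b||_1 in plain Python. It is not glum's implementation and says nothing about the optimization mentioned above; all function names are illustrative.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_sweeps=100):
    """Cyclic coordinate descent for (1/(2n))*||y - Xb||^2 + alpha*||b||_1."""
    n_rows, n_cols = len(X), len(X[0])
    coef = [0.0] * n_cols
    residual = list(y)  # equals y - X @ coef because coef starts at zero
    for _ in range(n_sweeps):
        for j in range(n_cols):
            column = [row[j] for row in X]
            z = sum(x * x for x in column) / n_rows
            if z == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes coef[j].
            rho = sum(
                x * (r + x * coef[j]) for x, r in zip(column, residual)
            ) / n_rows
            new_coef = soft_threshold(rho, alpha) / z
            delta = new_coef - coef[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                residual = [r - x * delta for x, r in zip(column, residual)]
                coef[j] = new_coef
    return coef


# Two orthogonal indicator columns: the weak second effect is shrunk to zero.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
print(lasso_coordinate_descent(X, y, alpha=0.1))
```

The inner loop over j is the part that dominates run time on such problems, which is why speeding it up matters for L1-penalized fits.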
