
glum 2.0.0

MarcAntoineSchmidtQC released this 08 Oct 15:02
aa22946

Breaking changes:

  • Renamed the package to glum!!! Hurray! Celebration.
  • GeneralizedLinearRegressor and GeneralizedLinearRegressorCV lose the fit_dispersion parameter.
    Please use the dispersion method of the appropriate family instance instead.
  • All functions now use sample_weight as a keyword instead of weights, in line with scikit-learn.
  • All functions now use dispersion as a keyword instead of phi.
  • Several methods of GeneralizedLinearRegressor and GeneralizedLinearRegressorCV that should have been private now have an underscore prefixed to their names: _tear_down_from_fit, _set_up_for_fit, _set_up_and_check_fit_args, _get_start_coef, _solve and _solve_regularization_path.
  • glum.GeneralizedLinearRegressor.report_diagnostics and glum.GeneralizedLinearRegressor.get_formatted_diagnostics are now public.
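
The two keyword renames above (weights to sample_weight, phi to dispersion) are mechanical, so call sites that build their keyword arguments as a dict can be updated with a small shim. The sketch below is only an illustration: migrate_kwargs and RENAMED_KEYWORDS are hypothetical names and are not part of glum.

```python
# Old keyword -> new keyword, as listed in the breaking changes above.
# Both names in this block are hypothetical helpers, not glum API.
RENAMED_KEYWORDS = {"weights": "sample_weight", "phi": "dispersion"}


def migrate_kwargs(kwargs):
    """Return a copy of kwargs with pre-2.0 keyword names replaced."""
    migrated = {}
    for name, value in kwargs.items():
        new_name = RENAMED_KEYWORDS.get(name, name)
        if new_name in migrated:
            # Both the old and the new spelling were passed; refuse to guess.
            raise ValueError(f"keyword {new_name!r} given twice")
        migrated[new_name] = value
    return migrated


old_style = {"weights": [1.0, 2.0], "phi": 1.5}
new_style = migrate_kwargs(old_style)
```

After migration, new_style can be unpacked into a 2.0-style call with `**new_style`. Passing both spellings of the same argument raises an error rather than silently picking one.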

New features:

  • P1 and P2 now accept 1d arrays with the same number of elements as there are columns in the unexpanded design matrix. In this case,
    the penalty associated with a categorical feature is expanded to as many elements as there are levels,
    all with the same value.
  • ExponentialDispersionModel gains a dispersion method.
  • BinomialDistribution and TweedieDistribution gain a log_likelihood method.
  • The fit method of GeneralizedLinearRegressor and GeneralizedLinearRegressorCV
    now saves the column types of pandas data frames.
  • GeneralizedLinearRegressor and GeneralizedLinearRegressorCV gain two properties: family_instance and link_instance.
  • GeneralizedLinearRegressor.std_errors and GeneralizedLinearRegressor.covariance_matrix have been added and support non-robust, robust (HC-1), and clustered
    covariance matrices.
  • GeneralizedLinearRegressor and GeneralizedLinearRegressorCV now accept family='gaussian' as an alternative to family='normal'.
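
The P1/P2 expansion described above can be sketched in plain Python. This is not glum's internal code: expand_penalty is a hypothetical helper, and it assumes that a numeric column counts as a single level and that every level of a categorical column gets its own coefficient.

```python
def expand_penalty(penalty, n_levels):
    """Repeat each per-feature penalty once per level of that feature.

    penalty  -- one value per column of the unexpanded design matrix
    n_levels -- number of expanded columns per feature (1 for numeric)
    """
    if len(penalty) != len(n_levels):
        raise ValueError("penalty and n_levels must have the same length")
    expanded = []
    for value, levels in zip(penalty, n_levels):
        # Every level of a categorical feature shares the same penalty.
        expanded.extend([value] * levels)
    return expanded


# One numeric feature followed by a categorical feature with three levels.
expanded = expand_penalty([0.5, 2.0], [1, 3])
```

Here the two-element penalty becomes a four-element one, with the categorical feature's value repeated for each of its three levels.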

Bug fixes:

  • The score method of GeneralizedLinearRegressor and GeneralizedLinearRegressorCV now accepts data frames.
  • Upgraded the code to use tabmat 3.0.0.

Other:

  • A major overhaul of the documentation. Everything is better!
  • The methods of the link classes now return scalars when given scalar inputs. Previously, under certain circumstances, they returned zero-dimensional arrays.
  • A new benchmark based on the Boston housing dataset is available in glm_benchmarks_run. See here.
  • glm_benchmarks_analyze now includes offset in the index. See here.
  • glmnet_python was removed from the benchmarks suite.
  • The innermost coordinate descent loop was optimized. This speeds up problems dominated by coordinate descent, such as LASSO, by about 1.5-2x. See here.
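
For readers unfamiliar with why LASSO is dominated by coordinate descent, the textbook algorithm can be sketched as follows. This is a plain-Python illustration for a Gaussian objective, (1/(2n))·||y - Xb||² + alpha·||b||₁, and not glum's optimized implementation. The function names and the fixed number of sweeps are assumptions made for the example.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_sweeps=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1 over b."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)  # equals y - X @ coef because coef starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot affect the fit
            # Correlation of column j with the residual, adding back
            # the contribution of the current coefficient j.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n
            rho += col_sq[j] * coef[j]
            new_coef = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_coef - coef[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                coef[j] = new_coef
    return coef


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
coef = lasso_coordinate_descent(X, y, alpha=0.2)
```

Almost all of the work happens in the inner loop over j, which is why a faster inner loop translates almost directly into a faster LASSO fit. In this small example the second coefficient is shrunk exactly to zero, the sparsity that the L1 penalty is known for.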