
Porting loss function to XGBoost #76

Open

kawaho opened this issue Jul 25, 2022 · 1 comment

Hi authors of hep_ml,
I am wondering if there is an easy way to use the loss functions from this package (in particular BinFlatnessLossFunction) in XGBoost, since XGBoost supports custom loss functions in the usual grad/hess format (https://xgboost.readthedocs.io/en/stable/tutorials/custom_metric_obj.html). This could help improve training speed, since hep_ml does not support multithreading (please correct me if I am wrong).
Thanks,
Andy

arogozhnikov (Owner) commented:

It should be possible. Just try and see.

hep_ml has a more general loss format, see here: https://github.com/arogozhnikov/hep_ml/blob/master/hep_ml/losses.py#L88-L138

You need init, fit, and prepare_tree_params within XGBoost.

The difference from other methods is its ability to remember additional characteristics of each observation (such as control variables).
Most loss functions I'm aware of ignore the possibility of such factors: they assume that the loss for each observation does not depend on the others.
So, depending on the implementation in XGBoost (i.e. whether it preserves the order of observations on each call), you can just init & fit outside of XGBoost, then wrap prepare_tree_params and pass it to XGBoost as the loss, roughly as sketched below.
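A rough sketch of that wrapping (untested; `X`, `y`, and the `mass` control variable are placeholders, and using the tree weights as the Hessian is an assumption on my part, since hep_ml losses expose gradients but no Hessians):

```python
import numpy as np
import xgboost as xgb
from hep_ml.losses import BinFlatnessLossFunction

# Placeholders: X is a pandas DataFrame containing the control
# variable 'mass'; y holds the binary labels.
loss = BinFlatnessLossFunction(uniform_features=['mass'], uniform_label=0)
loss.fit(X, y, sample_weight=np.ones(len(y)))  # init & fit once, outside XGBoost

def flatness_obj(preds, dtrain):
    # prepare_tree_params returns (tree_target, tree_weight), where
    # tree_target is the negative gradient of the loss.
    target, weight = loss.prepare_tree_params(preds)
    grad = -target * weight
    # No true Hessian is available; passing the tree weights here makes
    # XGBoost's leaf values the weighted mean of the targets.
    hess = weight
    return grad, hess

# This relies on XGBoost keeping observations in DMatrix row order on
# every call, so preds[i] always matches the row i the loss was fit on.
dtrain = xgb.DMatrix(X.drop(columns=['mass']), label=y)
booster = xgb.train({'eta': 0.1}, dtrain, num_boost_round=100, obj=flatness_obj)
```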

That said, I'd start by checking that you're really bottlenecked by tree building rather than by the loss computation (the flatness computation is rather resource-consuming). If the loss is the bottleneck, you'll see no benefit from moving to XGBoost.
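A quick check, continuing the sketch above (it assumes the `loss`, `y`, `dtrain`, and `flatness_obj` names from there): time the loss computation alone against one full boosting round.

```python
import time
import numpy as np
import xgboost as xgb

preds = np.zeros(len(y))  # dummy predictions, just for timing
t0 = time.perf_counter()
loss.prepare_tree_params(preds)  # flatness/loss computation alone
t1 = time.perf_counter()
xgb.train({'eta': 0.1}, dtrain, num_boost_round=1, obj=flatness_obj)  # one full round
t2 = time.perf_counter()
print(f"loss computation: {t1 - t0:.3f}s, full boosting round: {t2 - t1:.3f}s")
```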
