Fix init with scale pos weight. #11280
Conversation
Please help review when you are available @razdoburdin @david-cortes |
@trivialfis I cannot find much info in the docs about what `scale_pos_weight` does. Does it set weights for observations of the positive class in the same way as passing weights to the DMatrix does? If so, then the intercept for it should be obtainable by a weighted mean instead. And what's more, you shouldn't even need to do the calculation with a vector of weights, since the unweighted mean could be adjusted after the fact if you know what the scaling should be.
Otherwise, from what I see in the issue description, the performance increase from this change would be just a coincidence: this is an imbalanced dataset and one-step Newton happens to drive the number closer to zero, so if the imbalance were towards the other side it would have the opposite effect.
Also from the issue: it looks like the parameters being tried are very suboptimal, since the test accuracy (from a quick look at the data description; I haven't seen it in detail) appears to be below what you'd obtain from constant predictions. A better metric to follow for such purposes would be the training logloss (which is what xgboost is optimizing for), or the test AUROC after tuning the hyperparameters. |
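For illustration, a minimal sketch of the "adjust the unweighted mean after the fact" idea, assuming `scale_pos_weight` acts like a per-row weight on the positive class; the helper name and standalone form are hypothetical, not xgboost's API:

```cpp
#include <cstdio>

// Hypothetical helper: given the unweighted mean p of the 0/1 labels and
// w = scale_pos_weight, return the weighted mean without materializing a
// per-row weight vector. Positives carry total weight w * p, negatives 1 - p.
double AdjustMeanForScalePosWeight(double p, double w) {
  return (w * p) / (w * p + (1.0 - p));
}

int main() {
  // With 10% positives and scale_pos_weight = 9 the weighted mean becomes
  // 0.9 / (0.9 + 0.9) = 0.5, i.e. both classes contribute equal total weight.
  std::printf("%f\n", AdjustMeanForScalePosWeight(0.1, 9.0));
}
```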
@david-cortes You are correct; as mentioned in the PR description, ignoring the weight should improve the training loss instead. We can change it to mean-adjusting for the logistic objective. |
But if |
Currently not. I don't have a strong preference for this, since:
|
@trivialfis Does `scale_pos_weight` get used consistently like a sample weight throughout training? If so, then it sounds like you might want to consider having an option to account for it in the training metrics calculations too. Could be helpful when the number is large enough that it substantially changes what the model is optimizing for. |
No, it affects only the gradient. It's a really old parameter; I don't think it's used consistently like a sample weight, but it has so far proven useful. |
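As a rough illustration of "it affects only the gradient" for the binary logistic objective (a simplified sketch, not xgboost's actual kernel; the struct and function names are made up):

```cpp
#include <cmath>
#include <cstdio>

// Simplified sketch: for binary logistic loss, scale_pos_weight multiplies the
// gradient and hessian of the positive-class rows only. Names are illustrative.
struct GradientPair {
  double grad;
  double hess;
};

GradientPair LogisticGrad(double margin, double label, double scale_pos_weight) {
  double p = 1.0 / (1.0 + std::exp(-margin));          // predicted probability
  double w = (label == 1.0) ? scale_pos_weight : 1.0;  // only positives scaled
  return {(p - label) * w, p * (1.0 - p) * w};
}

int main() {
  GradientPair g = LogisticGrad(0.0, 1.0, 5.0);
  std::printf("grad=%f hess=%f\n", g.grad, g.hess);  // -2.5 and 1.25
}
```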
Force-pushed from d1e1182 to 173f2c9.
Done. It's not the best way to handle the weighted mean, but it's better to be consistent with the other GLM-like objectives. ;-) |
@david-cortes Please help take another look when you are available. |
src/objective/regression_obj.cu (Outdated)
for (std::size_t i = 0, n = h_s.Size(); i < n; ++i) {
  // revert the mean back to sum, which is the number of positive samples
  auto n_pos = h_s(i) * m;
If it has sample weights, here it would need to use the sum of weights instead of the number of rows.
src/objective/regression_obj.cu (Outdated)
}
// Special handling for the scale_pos_weight parameter
auto w = this->param_.scale_pos_weight;
Since these are scalar calculations, they could be done in higher precision (fp64 or even 'long double') without loss of speed.
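A small sketch of that suggestion, with hypothetical names and inputs; the only point is that the scalar arithmetic can be carried in `long double` and narrowed back to `float` at the end:

```cpp
#include <cstdio>

// Sketch: do the scalar intercept arithmetic in long double, then narrow.
// The inputs (counts and scale_pos_weight) are illustrative.
float InterceptFromCounts(double n_pos, double n_neg, double scale_pos_weight) {
  long double num = static_cast<long double>(scale_pos_weight) * n_pos;
  long double den = num + static_cast<long double>(n_neg);
  return static_cast<float>(num / den);  // weighted mean in probability space
}

int main() {
  std::printf("%f\n", InterceptFromCounts(1.0e6, 9.0e6, 9.0));  // prints 0.5
}
```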
I'm leaning toward using one-step Newton instead. I don't want too many workarounds for a single parameter in the initialization step.
I would say the mean initialization approach is quite valuable in other important ways too.
For example, in the referenced issue, it would make the model predictions have an expected value consistent with the (weighted) mean of the labels. |
@trivialfis I haven't looked at the code but: does `scale_pos_weight` also get applied in the regression objectives? |
Yes, but in regression, there are no "positive samples" or "negative samples", so one should not use this parameter. |
In theory it shouldn't, but you could still generalize the calculation for decimal values: If |
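The comment above is cut off; purely as an illustration (not necessarily what was being proposed), one way to generalize the weighted-mean calculation to decimal labels, fractional `scale_pos_weight`, and per-row sample weights is to let the label interpolate the weight:

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Illustrative generalization (not taken from the PR): treat the label
// y in [0, 1] as the degree of "positiveness", so each row's effective weight
// is sample_weight * (1 + (scale_pos_weight - 1) * y). With 0/1 labels this
// reduces to scaling only the positive rows; decimal labels interpolate.
double WeightedBaseScore(const std::vector<double>& y,
                         const std::vector<double>& sample_weight,
                         double scale_pos_weight) {
  double num = 0.0, den = 0.0;
  for (std::size_t i = 0; i < y.size(); ++i) {
    double w = sample_weight[i] * (1.0 + (scale_pos_weight - 1.0) * y[i]);
    num += w * y[i];
    den += w;  // sum of weights, not the number of rows
  }
  return num / den;
}

int main() {
  std::vector<double> y = {1, 0, 0, 0};
  std::vector<double> w = {1, 1, 1, 1};
  std::printf("%f\n", WeightedBaseScore(y, w, 3.0));  // 3 / (3 + 3) = 0.5
}
```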
@trivialfis I'm thinking it might be better to throw an error when the data has both sample weights and `scale_pos_weight`. |
@david-cortes That's an option as well. But I'm preparing for a new release, let's not introduce a last-minute breaking change for now. ;-)
I will revert to the Newton method if |
This reverts commit 009fee30c1f799e6633f2d1e5ac9fb81097dbf99.
Force-pushed from 5818cec to c52eac4.
Reverted to using the Newton method. Will revisit it in the next release. |
Since this PR uses the same setting as 2.1, merging it before branching out 3.0. |
Close #11198
Use one-step Newton if `scale_pos_weight` is used.
Actually, ignoring `scale_pos_weight` should improve the training performance, since the weight explicitly instructs xgboost to create a bias. I think the discrepancy for the Bosch dataset is caused by a different initialization point for the Newton iteration.
One-step Newton is quite close to optimal for logistic loss. Also, the result isn't changed for unweighted data.
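For reference, a minimal sketch of what a one-step Newton estimate of the intercept looks like for unweighted logistic loss, starting from a margin of 0; the names and standalone form are illustrative, not the PR's implementation:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Minimal sketch: one Newton step on the intercept for unweighted logistic
// loss, starting from margin 0 (probability 0.5). Illustrative only.
double OneStepNewtonIntercept(const std::vector<double>& y) {
  double sum_grad = 0.0, sum_hess = 0.0;
  const double p0 = 0.5;  // starting prediction
  for (double yi : y) {
    sum_grad += p0 - yi;          // d logloss / d margin
    sum_hess += p0 * (1.0 - p0);  // d^2 logloss / d margin^2
  }
  return -sum_grad / sum_hess;  // Newton step from margin 0
}

int main() {
  std::vector<double> y = {1, 1, 0, 0, 0, 0, 0, 0, 0, 0};  // 20% positives
  double newton = OneStepNewtonIntercept(y);
  double exact = std::log(0.2 / 0.8);  // the optimum: logit of the label mean
  std::printf("one-step: %f  exact: %f\n", newton, exact);  // -1.20 vs -1.39
}
```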