
Fix init with scale pos weight. #11280

Merged
merged 5 commits into dmlc:master from fix-scale-pos-weight-init on Feb 25, 2025

Conversation

@trivialfis (Member) commented Feb 24, 2025

Close #11198

Use one-step Newton if scale pos weight is used.

Actually, ignoring the scale_pos_weight should improve the training performance, since the weight explicitly instructs XGBoost to create a bias. I think the discrepancy on the Bosch dataset is caused by a different initialization point for the Newton iteration.

One-step Newton is quite close to optimal for logistic loss. Also, the result isn't changed for unweighted data.
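As a rough illustration of the one-step Newton initialization described above (not the XGBoost implementation; the function name is hypothetical and it assumes scale_pos_weight simply scales the gradient and hessian of positive samples):

```cpp
#include <vector>

// Hypothetical sketch: one Newton step for the intercept (base margin) of the
// binary logistic loss, starting from a zero margin, i.e. p = sigmoid(0) = 0.5.
double OneStepNewtonIntercept(std::vector<float> const& labels, double scale_pos_weight) {
  double grad_sum = 0.0;
  double hess_sum = 0.0;
  for (float y : labels) {
    double const p = 0.5;                                 // initial prediction
    double const w = (y > 0.5) ? scale_pos_weight : 1.0;  // weight positives only
    grad_sum += (p - y) * w;                              // d(loss)/d(margin)
    hess_sum += p * (1.0 - p) * w;                        // d^2(loss)/d(margin)^2
  }
  return -grad_sum / hess_sum;  // Newton update from a zero margin
}
```

With scale_pos_weight = 1 this reduces to the plain unweighted Newton step.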

@trivialfis (Member Author)

Please help review when you are available @razdoburdin @david-cortes

@david-cortes (Contributor)

@trivialfis I cannot find much info in the docs about what scale_pos_weight does.

Does it set weights for observations of the positive class in the same way as passing weights to the DMatrix does? If so, then the intercept for it should be obtainable by a weighted mean instead. And what's more, you shouldn't even need to do the calculation with a vector of weights, since the unweighted mean could be adjusted after-the-fact if you know what the scaling should be.

Otherwise, from what I see in the issue description, the performance increase from this change would be just a coincidence - this is an imbalanced dataset and one-step Newton happens to drive the number closer to zero, so if the imbalance were to be towards the other side it should have the opposite effect.

Also from the issue: it looks like the parameters being tried are very suboptimal since the test accuracy (from a quick look at the data description, haven't seen it in detail) appears to be below what you'd obtain from constant predictions. A better metric to follow for such purposes would be the training logloss (which is what xgboost is optimizing for), or the test AUROC after tuning the hyperparameters.
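As a rough sketch of the after-the-fact adjustment mentioned above (a hypothetical helper, not the XGBoost code; it assumes scale_pos_weight acts as a per-sample weight on the positive class):

```cpp
#include <cmath>

// Hypothetical sketch: given the plain (unweighted) positive rate and
// scale_pos_weight, recover the weighted positive rate without needing a
// weight vector, then map it to a base margin via the logit.
double AdjustedInterceptFromMean(double pos_rate, double scale_pos_weight) {
  // Weighted mean s * n_pos / (s * n_pos + n_neg), written in terms of the rate.
  double const adjusted = scale_pos_weight * pos_rate /
                          (scale_pos_weight * pos_rate + (1.0 - pos_rate));
  return std::log(adjusted / (1.0 - adjusted));  // logit of the adjusted rate
}
```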

@trivialfis (Member Author)

@david-cortes You are correct, as mentioned in the PR description, ignoring the weight should improve the training loss instead.

We can change it to mean-adjusting for logistic

@david-cortes (Contributor)

> @david-cortes You are correct, as mentioned in the PR description, ignoring the weight should improve the training loss instead.
>
> We can change it to mean-adjusting for logistic

But if scale_pos_weight acts as weights, shouldn't it also be accounted for in the logloss calculations?

@trivialfis (Member Author)

> But if scale_pos_weight acts as weights, shouldn't it also be accounted for in the logloss calculations?

Currently not. I don't have a strong preference for this, since:

  • Validation datasets should not be affected; this should be considered a training hyper-parameter.
  • One can argue that a hyper-parameter should not prevent metrics from making accurate estimates of model performance, and a biased (weighted) estimate might not be desirable. But then we have sample weight, which acts differently. So I will keep it as it is for now.

@david-cortes (Contributor)

@trivialfis Does scale_pos_weight have an effect on other functionalities that depend on a calculation of the objective function during training? For example, min_split_loss, leafwise growth policy, reg_lambda, etc.

If so, then it sounds like you might want to consider having an option to account for it in the training metrics calculations too. Could be helpful when the number is large enough that it substantially changes what the model is optimizing for.

@trivialfis (Member Author)

No, it affects only the gradient. It's a really old parameter; I don't think it's used as consistently as sample weight, but it has so far proven useful.

@trivialfis trivialfis force-pushed the fix-scale-pos-weight-init branch from d1e1182 to 173f2c9 on February 24, 2025 20:00
@trivialfis (Member Author)

> since the unweighted mean could be adjusted after-the-fact if you know what the scaling should be.

Done. It's not the best way to handle the weighted mean, but it's better to be consistent with the other GLM-like objectives. ;-)

@trivialfis (Member Author)

@david-cortes Please help take another look when you are available.

@trivialfis trivialfis mentioned this pull request Feb 25, 2025

for (std::size_t i = 0, n = h_s.Size(); i < n; ++i) {
// revert the mean back to sum, which is the number of positive samples
auto n_pos = h_s(i) * m;
Contributor:

If it has sample weights, here it would need to use the sum of weights instead of the number of rows.

}

// Special handling for the scale_pos_weight parameter
auto w = this->param_.scale_pos_weight;
Contributor:

Since these are scalar calculations, they could be done in higher precision (fp64 or even 'long double') without loss of speed.

Member Author:

I'm leaning toward using one-step Newton instead. I don't want too many workarounds for a single parameter in the initialization step.

@david-cortes (Contributor) commented Feb 25, 2025:

I would say the mean initialization approach is quite valuable in other important ways too.

For example, in the referenced issue, it would make the model predictions have an expected value of $\mathbb{E}[\hat{y}] = 0.5$ by design, which would not be the case with a one-step Newton initialization.
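As a worked check of this point: with a mean-based intercept $b = \operatorname{logit}(\bar{p})$, the initial prediction is $\sigma(b) = \bar{p}$, so if the weighting is chosen to balance the classes ($\bar{p} = 0.5$ after weighting), the expected initial prediction is $0.5$ by construction; a one-step Newton update from a zero margin only approximates this.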

@david-cortes (Contributor)

@trivialfis I haven't looked at the code but: does scale_pos_weight apply also to objectives like reg:logistic?

@trivialfis (Member Author)

> I haven't looked at the code but: does scale_pos_weight apply also to objectives like reg:logistic?

Yes, but in regression, there are no "positive samples" or "negative samples", so one should not use this parameter.

@david-cortes (Contributor) commented Feb 25, 2025

> I haven't looked at the code but: does scale_pos_weight apply also to objectives like reg:logistic?
>
> Yes, but in regression, there are no "positive samples" or "negative samples", so one should not use this parameter.

In theory it shouldn't, but you could still generalize the calculation to fractional label values:

$s \times y \log {p} + (1 - y) \log (1 - p)$

If scale_pos_weight gets applied to reg:logistic, the optimal intercept adjustment would be exactly the same as for binary:logistic.
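A quick derivation of that claim, writing $\bar{y}$ for the label mean: setting the derivative of the scaled objective to zero gives

$\frac{s \bar{y}}{p} - \frac{1 - \bar{y}}{1 - p} = 0 \;\Rightarrow\; p^{*} = \frac{s \bar{y}}{s \bar{y} + 1 - \bar{y}},$

which is the same adjusted rate, and hence the same intercept $\operatorname{logit}(p^{*})$, as in the binary case.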

@david-cortes (Contributor)

@trivialfis I'm thinking it might be better to throw an error when the data has both sample weights and scale_pos_weight.

@trivialfis (Member Author) commented Feb 25, 2025

@david-cortes That's an option as well. But I'm preparing for a new release; let's not introduce a last-minute breaking change for now. ;-)

> If it has sample weights, here it would need to use the sum of weights instead of the number of rows.

I will revert to the Newton method if scale_pos_weight is used to avoid further complicating things.

Commits: lint. / lint. / Fix. / warning. / tidy. / Lint. (this reverts commit 009fee30c1f799e6633f2d1e5ac9fb81097dbf99)
@trivialfis trivialfis force-pushed the fix-scale-pos-weight-init branch from 5818cec to c52eac4 on February 25, 2025 19:19
@trivialfis trivialfis requested a review from hcho3 February 25, 2025 19:33
@trivialfis (Member Author)

Reverted to using the Newton method. Will revisit it in the next release.

@trivialfis (Member Author)

Since this PR uses the same setting as 2.1, merging so we can branch out 3.0.

@trivialfis trivialfis merged commit bdc5a26 into dmlc:master Feb 25, 2025
59 checks passed
@trivialfis trivialfis deleted the fix-scale-pos-weight-init branch February 25, 2025 20:56

Linked issue: Accuracy degradation on bosch dataset (#11198)