
Fix init with scale pos weight. #11280

Merged
merged 5 commits into dmlc:master from fix-scale-pos-weight-init on Feb 25, 2025

Conversation

@trivialfis (Member) commented Feb 24, 2025

Close #11198

Use one-step Newton if scale pos weight is used.

Actually, ignoring the scale_pos_weight should improve the training performance, since the weight explicitly instructs XGBoost to create a bias. I think the discrepancy on the Bosch dataset is caused by a different initialization point for the Newton iteration.

One-step Newton is quite close to optimal for logistic loss. Also, the result isn't changed for unweighted data.
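As a rough illustration of the one-step Newton initialization described above (not the XGBoost implementation; the function name is hypothetical and it assumes scale_pos_weight simply scales the gradient and hessian of positive samples):

```cpp
#include <vector>

// Hypothetical sketch: one Newton step for the intercept (base margin) of the
// binary logistic loss, starting from a zero margin, i.e. p = sigmoid(0) = 0.5.
double OneStepNewtonIntercept(std::vector<float> const& labels, double scale_pos_weight) {
  double grad_sum = 0.0;
  double hess_sum = 0.0;
  for (float y : labels) {
    double const p = 0.5;                                 // initial prediction
    double const w = (y > 0.5) ? scale_pos_weight : 1.0;  // weight positives only
    grad_sum += (p - y) * w;                              // d(loss)/d(margin)
    hess_sum += p * (1.0 - p) * w;                        // d^2(loss)/d(margin)^2
  }
  return -grad_sum / hess_sum;  // Newton update from a zero margin
}
```

With scale_pos_weight = 1 this reduces to the plain unweighted Newton step.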

@trivialfis (Member Author)

Please help review when you are available @razdoburdin @david-cortes

@david-cortes (Contributor)

@trivialfis I cannot find much info in the docs about what scale_pos_weight does.

Does it set weights for observations of the positive class in the same way as passing weights to the DMatrix does? If so, then the intercept for it should be obtainable by a weighted mean instead. And what's more, you shouldn't even need to do the calculation with a vector of weights, since the unweighted mean could be adjusted after-the-fact if you know what the scaling should be.

Otherwise, from what I see in the issue description, the performance increase from this change would be just a coincidence - this is an imbalanced dataset and one-step Newton happens to drive the number closer to zero, so if the imbalance were to be towards the other side it should have the opposite effect.

Also from the issue: it looks like the parameters being tried are very suboptimal since the test accuracy (from a quick look at the data description, haven't seen it in detail) appears to be below what you'd obtain from constant predictions. A better metric to follow for such purposes would be the training logloss (which is what xgboost is optimizing for), or the test AUROC after tuning the hyperparameters.
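As a rough sketch of the after-the-fact adjustment mentioned above (a hypothetical helper, not the XGBoost code; it assumes scale_pos_weight acts as a per-sample weight on the positive class):

```cpp
#include <cmath>

// Hypothetical sketch: given the plain (unweighted) positive rate and
// scale_pos_weight, recover the weighted positive rate without needing a
// weight vector, then map it to a base margin via the logit.
double AdjustedInterceptFromMean(double pos_rate, double scale_pos_weight) {
  // Weighted mean s * n_pos / (s * n_pos + n_neg), written in terms of the rate.
  double const adjusted = scale_pos_weight * pos_rate /
                          (scale_pos_weight * pos_rate + (1.0 - pos_rate));
  return std::log(adjusted / (1.0 - adjusted));  // logit of the adjusted rate
}
```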

@trivialfis (Member Author)

@david-cortes You are correct, as mentioned in the PR description, ignoring the weight should improve the training loss instead.

We can change it to mean-adjusting for logistic

@david-cortes (Contributor)

> @david-cortes You are correct, as mentioned in the PR description, ignoring the weight should improve the training loss instead.
>
> We can change it to mean-adjusting for logistic

But if scale_pos_weight acts as weights, shouldn't it also be accounted for in the logloss calculations?

@trivialfis (Member Author)

> But if scale_pos_weight acts as weights, shouldn't it also be accounted for in the logloss calculations?

Currently not. I don't have a strong preference for this, since:

  • Validation datasets should not be affected; this should be considered a training hyper-parameter.
  • One can argue that a hyper-parameter should not prevent metrics from making accurate estimates of model performance, and a biased (weighted) estimate might not be desirable. But then we have sample weight, which acts differently. So I will keep it as it is for now.

@david-cortes (Contributor)

@trivialfis Does scale_pos_weight have an effect on other functionalities that depend on a calculation of the objective function during training? For example, min_split_loss, leafwise growth policy, reg_lambda, etc.

If so, then it sounds like you might want to consider having an option to account for it in the training metrics calculations too. Could be helpful when the number is large enough that it substantially changes what the model is optimizing for.

@trivialfis (Member Author)

No, it affects only the gradient. It's a really old parameter; I don't think it's used as consistently as sample weight, but it has so far proven useful.

@trivialfis trivialfis force-pushed the fix-scale-pos-weight-init branch from d1e1182 to 173f2c9 on February 24, 2025 20:00
@trivialfis (Member Author)

> since the unweighted mean could be adjusted after-the-fact if you know what the scaling should be.

Done. It's not the best way to handle the weighted mean, but it's better to be consistent with the other GLM-like objectives. ;-)

@trivialfis (Member Author)

@david-cortes Please help take another look when you are available.

@trivialfis trivialfis mentioned this pull request Feb 25, 2025

for (std::size_t i = 0, n = h_s.Size(); i < n; ++i) {
// revert the mean back to sum, which is the number of positive samples
auto n_pos = h_s(i) * m;
Contributor:

If it has sample weights, here it would need to use the sum of weights instead of the number of rows.

}

// Special handling for the scale_pos_weight parameter
auto w = this->param_.scale_pos_weight;
Contributor:

Since these are scalar calculations, they could be done in higher precision (fp64 or even 'long double') without loss of speed.

Member Author:

I'm leaning toward using one-step Newton instead. I don't want too many workarounds for a single parameter in the initialization step.

@david-cortes (Contributor) commented Feb 25, 2025:

I would say the mean initialization approach is quite valuable in other important ways too.

For example, in the referenced issue, it would make the model predictions have an expected value of $\mathbb{E}[\hat{y}] = 0.5$ by design, which would not be the case with a one-step Newton initialization.
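As a worked check of this point: with a mean-based intercept $b = \operatorname{logit}(\bar{p})$, the initial prediction is $\sigma(b) = \bar{p}$, so if the weighting is chosen to balance the classes ($\bar{p} = 0.5$ after weighting), the expected initial prediction is $0.5$ by construction; a one-step Newton update from a zero margin only approximates this.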

@david-cortes (Contributor)

@trivialfis I haven't looked at the code but: does scale_pos_weight apply also to objectives like reg:logistic?

@trivialfis (Member Author)

> I haven't looked at the code but: does scale_pos_weight apply also to objectives like reg:logistic?

Yes, but in regression, there are no "positive samples" or "negative samples", so one should not use this parameter.

@david-cortes (Contributor) commented Feb 25, 2025

> I haven't looked at the code but: does scale_pos_weight apply also to objectives like reg:logistic?
>
> Yes, but in regression, there are no "positive samples" or "negative samples", so one should not use this parameter.

In theory it shouldn't, but you could still generalize the calculation to fractional label values:

$s \times y \log {p} + (1 - y) \log (1 - p)$

If scale_pos_weight gets applied to reg:logistic, the optimal intercept adjustment would be exactly the same as for binary:logistic.
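A quick derivation of that claim, writing $\bar{y}$ for the label mean: setting the derivative of the scaled objective to zero gives

$\frac{s \bar{y}}{p} - \frac{1 - \bar{y}}{1 - p} = 0 \;\Rightarrow\; p^{*} = \frac{s \bar{y}}{s \bar{y} + 1 - \bar{y}},$

which is the same adjusted rate, and hence the same intercept $\operatorname{logit}(p^{*})$, as in the binary case.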

@david-cortes (Contributor)

@trivialfis I'm thinking it might be better to throw an error when the data has both sample weights and scale_pos_weight.

@trivialfis (Member Author) commented Feb 25, 2025

@david-cortes That's an option as well. But I'm preparing for a new release; let's not introduce a last-minute breaking change for now. ;-)

> If it has sample weights, here it would need to use the sum of weights instead of the number of rows.

I will revert to the Newton method if scale_pos_weight is used to avoid further complicating things.

Commits: lint. / lint. / Fix. / warning. / tidy. / Lint. (this reverts commit 009fee30c1f799e6633f2d1e5ac9fb81097dbf99)
@trivialfis trivialfis force-pushed the fix-scale-pos-weight-init branch from 5818cec to c52eac4 on February 25, 2025 19:19
@trivialfis trivialfis requested a review from hcho3 February 25, 2025 19:33
@trivialfis (Member Author)

Reverted to using the Newton method. Will revisit it in the next release.

@trivialfis (Member Author)

Since this PR uses the same setting as 2.1, merging so we can branch out 3.0.

@trivialfis trivialfis merged commit bdc5a26 into dmlc:master Feb 25, 2025
59 checks passed
@trivialfis trivialfis deleted the fix-scale-pos-weight-init branch February 25, 2025 20:56

Linked issue: Accuracy degradation on bosch dataset (#11198)