
Conversation

@lorentzenchr
Member

@lorentzenchr lorentzenchr commented Dec 7, 2025

What

This SLEP is for finding a consensus on how to remove accuracy as the default metric for classifiers, which is currently what classifier.score computes.

Why

Because accuracy has many severe weaknesses, and we have even baked a probability threshold of 50% into it.

What else

I hope the irony of the title is not lost despite its passive aggressiveness.

This SLEP is based on @amueller's proposal scikit-learn/scikit-learn#28995.

@scikit-learn/core-devs @scikit-learn/communication-team @scikit-learn/contributor-experience-team @scikit-learn/documentation-team ping

@jjerphan
Member

jjerphan commented Dec 7, 2025

Thank you for writing the SLEP.

While discussing the rectification of this design choice is appropriate (I really think it is valuable), I find the title a bit aggressive or at least detrimental to the intent of the proposal (improving scikit-learn's theoretical consistency): how about "Redefining default metrics"?

I think this could be implemented as part of scikit-learn 2.0.

@lorentzenchr lorentzenchr changed the title from "SLEP025: Killing Accuracy in Scikit-Learn" to "SLEP025: Losing Accuracy in Scikit-Learn" on Dec 7, 2025
@reshamas
Member

reshamas commented Dec 7, 2025

@lorentzenchr How about: Removing the accuracy metric in scikit-learn scoring


The fact that different scoring metrics focus on different things, i.e. ``predict``
vs. ``predict_proba``, and that not all classifiers provide ``predict_proba``,
complicates a unified choice.
Member

Do we need to choose the same metric for all classifiers?

I think the answer is yes, because people will use the results of est1.score(X, y) and est2.score(X, y) to evaluate which one is the better estimator. It seems very hard to educate people that they can't compare scores from different estimators.

(This is almost a rhetorical question, but I wanted to double-check my thinking.)

Member Author
@lorentzenchr lorentzenchr Dec 8, 2025

Given your assumption that users will continue to compare score results of different estimators, and given that a generally satisfying metric does not exist, the conclusion is to remove the score method.

My current best choice for a general classifier metric is the skill score (R^2) variant of the Brier score. Classifiers and regressors would then have the same metric, which is nice.
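For illustration, here is a minimal sketch of such a skill-score variant for the binary case; the helper name `brier_skill_score` is hypothetical (not an existing scikit-learn API), and the baseline predicts the observed positive-class frequency, analogous to R^2 using the mean of the target:

```python
# Hypothetical sketch of a skill-score (R^2-style) variant of the Brier score
# for binary classification; not an existing scikit-learn API.
import numpy as np
from sklearn.metrics import brier_score_loss

def brier_skill_score(y_true, y_proba):
    """Return 1 - Brier(model) / Brier(baseline), where the baseline always
    predicts the observed positive-class frequency (analogous to R^2, which
    uses the mean of the target as baseline)."""
    y_true = np.asarray(y_true)
    baseline_proba = np.full(len(y_true), y_true.mean(), dtype=float)
    bs_model = brier_score_loss(y_true, y_proba)
    bs_baseline = brier_score_loss(y_true, baseline_proba)
    return 1.0 - bs_model / bs_baseline

y_true = [0, 0, 1, 1, 1]
y_proba = [0.1, 0.3, 0.6, 0.8, 0.9]
print(brier_skill_score(y_true, y_proba))  # ~0.74, i.e. clearly better than the baseline
```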

Member

I'm not sure I am ready to remove score().

Member Author

It is listed as an alternative, not as the proposal!

@lucyleeow
Member

What about 'Replacing accuracy'? It reads more clearly IMO, and it could mean either that we replace it with another metric or that we replace it with 'nothing'.

Comment on lines +47 to +48
a. The time frame of the deprecation period. Should it be longer than the usual 2 minor
releases? Should steps 1 and 2 happen in the same minor release?
Member

  • I think we need to decide on a default by step 2, so we can tell users what to set to emulate the new default.
  • I am okay with steps 1 and 2 happening at the same time as long as we choose a default. If we have not chosen a new default, then only step 1 can happen.

Member

I think that if we can't agree on a new default, we should not start the process. Agreeing on the new default is the hard part of this task, and once we start it we are on the hook for finishing the transition, which will be impossible without agreement.

Otherwise I agree with Thomas that we can do steps 1 and 2 at the same time.

Member Author

I included it as a proposal.


There are three questions with this approach:

a. The time frame of the deprecation period. Should it be longer than the usual 2 minor
Member

I feel like this needs longer than the usual 2 minor releases. Personally, I'd be okay with 3 minor releases or 1 major release.

available in scikit-learn, see the ``sklearn.metrics`` module and [2]_. The advantages of
removing ``score`` are:

- An active choice by the user is triggered, as there is no longer a default.
Member

My assumption is that most people who blindly use score() do not know better. It is unclear to me if forcing them to make a decision is going to improve the quality of the decision they make. scikit-learn is about "machine learning without the learning curve", so we are on the hook for making a "not unreasonable" decision for the beginner user.

It doesn't stop us from extending our documentation and educational material to increase the chances of people reading it (e.g. we could have a blog post about this topic when the deprecation starts to explain why this is a much bigger deal than it might seem) and hopefully making a better decision than the default argument to score().

Member Author

if forcing them to make a decision is going to improve the quality of the decision they make

Let's go through the options:

  • a lucky user chooses a strictly consistent scoring function like log loss or the Brier score: situation improved
  • an informed user chooses a metric close to a business/application metric they have: situation improved
  • a stubborn (or unlucky) user chooses accuracy or something similar (balanced accuracy, F2, you name it): situation is not worse.

Quintessence: the situation can only improve, never worsen.

@lorentzenchr
Member Author

I improved (I hope) the writing, the clarity of the proposal, the reasoning, and the alternatives. May I ask someone to approve? Or can/should I merge it myself?

I'd like to call for a vote soon.

@lorentzenchr lorentzenchr changed the title from "SLEP025: Losing Accuracy in Scikit-Learn" to "SLEP025: Losing Accuracy in Scikit-Learn Score" on Jan 16, 2026
with ``predict_proba``. At the same time, the Brier score returns a valid score even
for ``predict``, in contrast to log loss (which returns infinity for false
certainty). On top of that, this would result in classifiers and regressors having
the same score (it's just a different name), returning values in the range [0, 1].
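To make the quoted point concrete, a small numeric illustration (mine, not part of the SLEP text): for a positive example predicted with probability p, the log loss term -log(p) diverges as p goes to 0, while the Brier term (1 - p)^2 stays bounded by 1.

```python
import numpy as np

# Per-sample penalties for a positive example predicted with probability p:
# log loss uses -log(p), which blows up for confidently wrong predictions,
# while the Brier score uses (1 - p)^2, which is bounded by 1.
for p in [0.5, 0.1, 0.01, 0.0]:
    log_loss_term = -np.log(p) if p > 0 else np.inf
    brier_term = (1.0 - p) ** 2
    print(f"p={p:4.2f}  log loss term: {log_loss_term:7.3f}  Brier term: {brier_term:.3f}")
```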
Member
@thomasjpfan thomasjpfan Jan 16, 2026

For the multi-class case, will the brier_score_loss be normalized to be in the [0, 1] range?

Specifically, do we set scale_by_half=True in brier_score_loss?

Member Author

Yes, I would do that.

Member Author

On second thought: it does not matter. The R^2 version is invariant to scaling: Brier(model) / Brier(mean of data) = MSE(model) / MSE(mean of data), so a constant factor applied to both numerator and denominator cancels.
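A quick numeric check of that invariance, with made-up Brier values (any constant factor, such as the 1/2 from scale_by_half, cancels in the ratio):

```python
# Made-up numbers, only to show that a constant rescaling of the Brier score
# (e.g. the factor 1/2 from scale_by_half) cancels in the skill score.
brier_model, brier_baseline = 0.12, 0.40

skill = 1 - brier_model / brier_baseline
skill_halved = 1 - (0.5 * brier_model) / (0.5 * brier_baseline)

assert skill == skill_halved  # the factor 1/2 cancels exactly
print(skill)  # ~0.7
```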
