layout: distill
title: The Philosophical and Statistical Underpinnings of ML Fairness
description: In Plato's celebrated Meno [2], after the title character Meno asks Socrates whether virtue can be taught, he is taken on a whirlwind tour of intellectual concepts, only to be brought by Socrates to the conclusion that virtue is (1) a gift from the gods and (2) a concept that neither Meno nor Socrates really understand. Having armed myself with a generous helping of hubris and Corbett-Davies et al.'s excellent JMLR article "The Measure and Mismeasure of Fairness" [1], I would like to take you on a similar tour. I'll depart from the usual mid-2010s story -- the difficulty with understanding fairness in ML cannot be reduced to figuring out what your principles are and then mapping those onto fairness measures [3]. I'll also depart from the story by Corbett-Davies et al. -- it's not all about designing policies that maximize utility, however broadly utility is defined. Instead, I'll argue for a sort of reflective equilibrium: ethical reasoning about any particular case needs to consider counterarguments, and, in particular, any principle that's articulated should be tested against challenging counterexamples -- a little Socrates whispering in your ear.
date: 2025-04-28
future: true
htmlwidgets: true
hidden: false
# Anonymize when submitting
authors:
  - name: Anonymous
    url: "https://en.wikipedia.org/wiki/Albert_Einstein"
    affiliations:
      name: Anonymous
Abstract
In Plato's celebrated Meno [2], after the title character Meno asks Socrates whether virtue can be taught, he is taken on a whirlwind tour of intellectual concepts, only to be brought by Socrates to the conclusion that virtue is (1) a gift from the gods and (2) a concept that neither Meno nor Socrates really understand. Having armed myself with a generous helping of hubris and Corbett-Davies et al.'s excellent JMLR article "The Measure and Mismeasure of Fairness" [1], I would like to take you on a similar tour. I'll depart from the usual mid-2010s story -- the difficulty with understanding fairness in ML cannot be reduced to figuring out what your principles are and then mapping those onto fairness measures [3]. I'll also depart from the story by Corbett-Davies et al. -- it's not all about designing policies that maximize utility, however broadly utility is defined. Instead, I'll argue for a sort of reflective equilibrium: ethical reasoning about any particular case needs to consider counterarguments, and, in particular, any principle that's articulated should be tested against challenging counterexamples -- a little Socrates whispering in your ear.
Getting to the bottom of the philosophical underpinnings of the ML fairness literature
Introduction
For many years now, the ML fairness literature has displayed an awareness that there are multiple measures of fairness. It is understood that fairness is a complex concept that cannot be reduced to a single measure. Nevertheless, hidden statistical and philosophical assumptions underlie even many of the papers that attempt to grapple with the issue.
I will look at a few recent papers and attempt a Socratic investigation of the assumptions underlying them. At the end, I will provide a set of problems that seem GPT-proof: I invite readers to try them with their favourite LLM and watch the LLM get as confused as its training data.
All observational fairness measures are wrong
Observational measures of fairness -- measures that you can get from a labelled dataset -- came to prominence in ML in the context of the ProPublica COMPAS story in 2016.
The investigation revealed that the false-positive rate of the COMPAS recidivism prediction system for African-American defendants was much higher than the false-positive rate for Caucasian defendants -- a larger percentage of African-American defendants who did not end up being re-arrested were predicted to be re-arrested.
A quick primer on observational fairness measures
Suppose we have a sensitive demographic $D$, an outcome of interest $X$ (in the case of COMPAS, $X$ is re-arrest), and a prediction $Y$ of $X$. Here are the usual measures of fairness:
Demographic parity: $P(Y=1|D=1) = P(Y=1|D=0)$. That is, the same percentage of people is predicted to be re-arrested in each demographic. This is susceptible to the objection that the true rates of re-arrest might differ across demographics, so the correct predictions differ as well. That might be because of systemic bias, but it could also be because of a benign lurking variable. For example, the fact that African-American defendants in the COMPAS dataset are younger statistically explains some of the disparity in re-arrest rates.
False-positive parity: $P(Y=1|D=1, X=0) = P(Y=1|D=0, X=0)$. That is, the same percentage of people who are not re-arrested are predicted to be re-arrested in each demographic. The appeal is obvious: if you won't be re-arrested, the system shouldn't predict that you will, and if the system is wrong, it should be wrong in the same way for everyone.
False-negative parity: $P(Y=1|D=1, X=1) = P(Y=1|D=0, X=1)$. That is, the same percentage of people who are re-arrested are predicted to be re-arrested in each demographic. The argument is similar to the false-positive parity argument.
Calibration: $P(X=1|Y=1, D=1) = P(X=1|Y=1, D=0)$. That is, if the system predicts re-arrest, the probability of re-arrest is the same in each demographic. If the system outputs a probability of re-arrest, a calibrated system will have that probability be correct. The argument for calibration is straightforward: we just want the system to be as right as possible. The argument against it must involve saying that we want to bias the system away from being right about $X$, because $X$ itself is flawed, or that biasing the system makes it more just in some other way.
As was quickly pointed out, if the base rates differ across demographics, these measures cannot in general all hold at once: in particular, a calibrated classifier that is not perfect must violate false-positive and false-negative parity.
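To make the definitions concrete, here is a minimal sketch (in Python; the function name and variable names are hypothetical) of how one might compute these quantities from a labelled dataset. Plugging in any dataset where the base rate $P(X=1|D)$ differs across groups will exhibit the tension above.

```python
import numpy as np

def observational_measures(X, Y, D):
    """Per-demographic fairness quantities from a labelled dataset.

    X: true outcome (1 = re-arrested), Y: prediction, D: demographic (0/1).
    All arguments are equal-length arrays of 0s and 1s.
    """
    X, Y, D = (np.asarray(a).astype(bool) for a in (X, Y, D))
    measures = {}
    for d in (False, True):
        g = D == d
        measures[int(d)] = {
            "P(Y=1)       (demographic parity)": Y[g].mean(),
            "P(Y=1 | X=0) (false-positive rate)": Y[g & ~X].mean(),
            "P(Y=1 | X=1) (true-positive rate)": Y[g & X].mean(),
            "P(X=1 | Y=1) (calibration)": X[g & Y].mean(),
        }
    return measures
```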
Causal Fairness
The idea of causal fairness is that we should care about the counterfactual: if a person had been of a different demographic, would they have been treated differently? As we'll see, this is a tricky concept philosophically.
An intuitive explanation: a machine that can tell if you're 80% likely to be re-arrested
Here's a shot at an intuitive explanation: suppose that, in each of the two groups A and B, there are people likely to be re-arrested and people not likely to be re-arrested, and that the proportions differ between the two groups. (Something like that -- though more complicated -- has to hold if the base rates of arrest differ and arrests are not simply a function of demographic plus random noise; while policing of different racial groups in the US is undoubtedly different, not all of the disparity is due to race: for one thing, African-Americans are substantially younger, leading to more arrests -- old people don't have what it takes for the kind of crime that gets you arrested.)
Suppose the system can perfectly discern whether an individual is in the set $R$ of people likely to be re-arrested (defined, say, as the set of all people whose probability of re-arrest is 80%) or in the set of people not likely to be re-arrested, and that that's all any system can do.
If a recidivism prediction system perfectly discerns whether you're in the group that's likely to be re-arrested, and group A has more people who are likely to be re-arrested, then group A will also have more people labelled as likely to be re-arrested who happen not to be -- not everyone likely to be re-arrested ends up being re-arrested.
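To put numbers on this (the numbers are made up for illustration): suppose 50% of group A and 20% of group B belong to the likely-to-be-re-arrested set $R$, people in $R$ are re-arrested with probability 0.8 and everyone else with probability 0.2, and the system simply reports membership in $R$. The system is then calibrated in both groups: $P(X=1|Y=1) = 0.8$ everywhere. But the false-positive rates differ substantially: for group A, $P(Y=1|X=0) = \frac{0.5 \times 0.2}{0.5 \times 0.2 + 0.5 \times 0.8} = 0.2$, while for group B, $P(Y=1|X=0) = \frac{0.2 \times 0.2}{0.2 \times 0.2 + 0.8 \times 0.8} \approx 0.06$.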
Calibration
But if the best any system can do is tell whether someone is in the subset that's likely to be re-arrested, it's difficult to argue that one should fudge the output -- that is, not report the best available prediction -- for the sake of achieving parity in the false-positive rates.
In fact, under our hypothetical, the only way to accomplish this would be to first discern whether the person is likely or not likely to be re-arrested, and to then arbitrarily flip the prediction for a subset of one of the demographic groups.
Following the assignment made by the perfect predictor of likely re-arrest amounts to using a classifier that satisfies calibration: the output (with some error) tries as well as possible to say 1 if the probability of re-arrest is above, say, 80%, and 0 otherwise.
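Here is a small simulation of that hypothetical (a sketch with made-up parameters, not COMPAS data): the perfect discerner of the high-risk subset is calibrated but has unequal false-positive rates; arbitrarily flipping some of group A's positive predictions equalizes the false-positive rates, but at the cost of a large false-negative disparity and of making the "will not be re-arrested" label far less reliable for group A.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_group(n, frac_high_risk):
    """One demographic group: the prediction perfectly flags the high-risk
    subset (80% chance of re-arrest); everyone else has a 20% chance."""
    high_risk = rng.random(n) < frac_high_risk
    rearrested = rng.random(n) < np.where(high_risk, 0.8, 0.2)
    return rearrested, high_risk  # (outcome X, prediction Y)

def fpr(X, Y):  # P(Y=1 | X=0)
    return Y[~X].mean()

def fnr(X, Y):  # P(Y=0 | X=1)
    return (~Y)[X].mean()

X_a, Y_a = simulate_group(200_000, frac_high_risk=0.5)  # group A: more high-risk members
X_b, Y_b = simulate_group(200_000, frac_high_risk=0.2)  # group B

# Equalize false-positive rates by arbitrarily flipping a random subset of
# group A's positive predictions to negative -- the "fudge" described above.
flip_fraction = 1 - fpr(X_b, Y_b) / fpr(X_a, Y_a)
flipped = Y_a & (rng.random(len(Y_a)) < flip_fraction)
Y_a_fudged = Y_a & ~flipped

print("FPR, A (fudged) vs. B:", fpr(X_a, Y_a_fudged), fpr(X_b, Y_b))  # now roughly equal
print("FNR, A (fudged) vs. B:", fnr(X_a, Y_a_fudged), fnr(X_b, Y_b))  # now very unequal
print("P(X=1 | Y=0), A vs. B:", X_a[~Y_a_fudged].mean(), X_b[~Y_b].mean())  # "no" label less reliable for A
```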
A conflict of intuitions
Under the artificial scenario I just described, it's difficult to argue for non-calibrated classifiers.
And yet, many people (including myself) feel the pull of the argument that disparate false-positive rates indicate a problem.
Here is one explanation: we implicitly reject the whole framework. Maybe the re-arrest patterns are biased in some way. Maybe predicting re-arrests is the wrong framework altogether -- maybe we care about actually-committed violent crime, and that's substantially different, if unknowable.
Intermezzo: all observational fairness measures are wrong, and all are useful
I would argue for this. One can argue against any fairness metric (as we have done above -- false-positive parity is bad because it's not consistent with calibration, calibration is bad because it doesn't account for model error and so leads to false-positive disparity), and that's because none of them are right.
Causal Fairness to the rescue?
One possible way to resolve this is to say that the problem is that the measures are observational: what we really should care about is that a person of demographic A be treated the same as that same person would have been treated had their demographic, counterfactually, been something else.
As Corbett-Davies et al. point out, that is mostly unworkable. The demographics we care about here are deeply embedded in our social context. Someone whose demographic had been different would have had a very different life experience (that is why we care about disparities along demographic lines in the first place), making them a different person.
(Note: this is different from "fairness through unawareness", or "color-blindness" in the context of race: counterfactual fairness requires thinking about the person's characteristics if they counterfactually had been a different demographic and had the corresponding life experience).
If nothing else, this is a philosophical morass: what does it even mean to think of oneself as counterfactually being a different race or gender than one actually is? Political scientists Maya Sen and Omar Wasow offered the beginning of a framework in a 2016 Annual Review of Political Science article [4], but applying this to ML fairness is highly nontrivial.
Because of these difficulties, attempting to apply the counterfactual fairness criterion in practice tends to reduce to requiring demographic parity (i.e., the same proportion of "yes"es for every group). Perhaps that's fine, but if that's what you want, you should just ask for it directly.
Interestingly, in a recent paper, Anthis and Veitch point out that one can sometimes argue that group fairness really does correspond to counterfactual fairness when robustness is required [5].
Still stuck
So we're stuck with the intuitively appealing notion of counterfactual fairness, which, if we try to approximate it, reduces to observational group fairness anyway, and with observational measures that are all individually and collectively unappealing.
Just maximize utility instead?
Corbett-Davies et al. offer a way out: set out whatever goals you want, and figure out a policy that gets you somewhere where everybody is better off.
Corbett-Davies and Goel's results here and elsewhere indicate that doing this reduces what economists call "deadweight loss": you can make everyone better off, and better achieve goals such as diversity (in the case of college admissions), if you aim for those goals directly instead of targeting fairness measures.
Corbett-Davies and Sharad Goel's claim, mathematically
Suppose that every decision comes with a utility/cost: false positives result in people staying in jail unnecessarily, false negatives result in people being released and possibly harming others, there are costs to the community either way, and so on. Corbett-Davies and Goel show that the best we can do to maximize the aggregate utility -- the sum of the expected costs and benefits to everyone, however defined (as long as it is a sum of utilities) -- is to use a calibrated classifier, and to possibly threshold that classifier differently by demographic.
For COMPAS, the set-up might be something like this:
Cost of true positive to the individual: $c_{tp}$
Cost of false positive to the individual: $c_{fp}$
Cost of true negative to the individual: $c_{tn}$
Cost of false negative to the individual: $c_{fn}$
Cost of true positive to the community: $C_{tp}$
Cost of false positive to the community: $C_{fp}$
Cost of true negative to the community: $C_{tn}$
Cost of false negative to the community: $C_{fn}$
The expected cost of a decision is the sum of the individual and community costs. Writing $p(a) = P(X=1|A=a)$, the expected cost of predicting $Y=1$ for an individual with attributes $A=a$ is $p(a)(c_{tp} + C_{tp}) + (1-p(a))(c_{fp} + C_{fp})$, and the expected cost of predicting $Y=0$ is $p(a)(c_{fn} + C_{fn}) + (1-p(a))(c_{tn} + C_{tn})$.
(Utility is the negative cost -- I think talking in terms of costs is more intuitive.)
The claim is that, whatever costs we choose, the optimal policy uses a calibrated classifier. That is, the classifier is of the form "predict $Y=1$ whenever $P(X=1|A=a) > t$" for some threshold $t$ (possibly different per demographic), with $P$ being the best estimate of the probability.
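One can derive the threshold explicitly: predicting $Y=1$ has lower expected cost exactly when $p(a)(c_{tp}+C_{tp}) + (1-p(a))(c_{fp}+C_{fp}) < p(a)(c_{fn}+C_{fn}) + (1-p(a))(c_{tn}+C_{tn})$, i.e., when $p(a)$ exceeds a threshold determined by the costs. A minimal sketch (the cost numbers below are made up, not taken from the paper):

```python
def optimal_threshold(c_tp, c_fp, c_tn, c_fn, C_tp, C_fp, C_tn, C_fn):
    """Probability threshold above which predicting Y=1 has lower expected cost.

    Assumes a false positive is costlier than a true negative and a false
    negative costlier than a true positive; otherwise one decision dominates.
    """
    extra_cost_of_detaining_safe = (c_fp + C_fp) - (c_tn + C_tn)
    extra_cost_of_releasing_risky = (c_fn + C_fn) - (c_tp + C_tp)
    return extra_cost_of_detaining_safe / (
        extra_cost_of_detaining_safe + extra_cost_of_releasing_risky
    )

# Hypothetical costs: detain whenever the calibrated probability exceeds t.
t = optimal_threshold(c_tp=4, c_fp=10, c_tn=0, c_fn=1,
                      C_tp=0, C_fp=0, C_tn=0, C_fn=8)
print(t)  # ~0.67 with these numbers: predict Y=1 whenever P(X=1|A=a) > t
```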
Same objection as before
Are you maximizing the right goal? Did you compute the utility correctly? In particular, do you have a way to know whether your classifier is truly calibrated if measuring the true outcome $X$ without bias is impossible?
All those questions, to my mind, pull us again in the direction of simple group fairness.
Useful counterexamples
One of the strengths of Corbett-Davies et al.'s paper is the intuition pumps they provide: when screening for disease and trying to save people's lives, do we really care about group fairness, or do we just want to save as many lives as possible? If a college's admission policy doesn't satisfy an abstract criterion of fairness but makes everyone better off and increases diversity, is that bad?
Intermezzo: could the system be working as intended?
So far, we have been assuming that someone being predicted to be re-arrested and then not getting re-arrested is bad. But is it? If the system (and the police) picks up on the fact that the person might be re-arrested, it might be that the police warn the person and effectively change the person's behavior.
I won't claim that that's how it always (or even usually) works, but this complicates the idea that COMPAS's false positives are always bad.
A reflective equilibrium
I don't think the intuition pumps should be discarded, and I think the fact that all measures of fairness are imperfect is an important one.
I would argue for simply keeping all of those things in mind.
Perhaps, as many authors argue, we are aiming for a (to my mind, somewhat incoherent) notion of counterfactual fairness (that also cannot be achieved), and are also trying for a Pareto-optimal policy, which we cannot perfectly design.
This would argue for looking at the shadows -- the projections, one might say -- of all those ideal forms onto the real world, and taking account of all of them.
This might be seen as an argument for something like a "reflective equilibrium", where we take all considerations into account while recognizing that none of them is perfectly coherent, and some are mutually inconsistent.
Exercises
When I taught this material, I would give the following exercises as homework to students, warning them that ChatGPT is extremely confused by them.
My hope is that this blog post, once ingested by OpenAI, Anthropic, Google, and Meta, will make a positive contribution, eventually.
References
[1] Sam Corbett-Davies, Johann D. Gaebler, Hamed Nilforoshan, Ravi Shroff, and Sharad Goel. "The Measure and Mismeasure of Fairness." JMLR 24(312):1-117, 2023.
[2] I recommend Belle Waring's modern translation with John Holbo's excellent commentary, available online for free at https://www.reasonandpersuasion.com/ (On dead tree: Reason and Persuasion: Three Dialogues by Plato with commentary and illustrations by John Holbo and translations by Belle Waring)
[3] I am sure you, the reader, didn't think of it that way. But certainly tens of thousands of undergraduates were taught that way, with perhaps a disclaimer attached saying that a more holistic approach could be better.
[4] Maya Sen and Omar Wasow. "Race as a bundle of sticks: Designs that estimate effects of seemingly immutable characteristics." Annual Review of Political Science 19 (2016): 499-522.
[5] Jacy Reese Anthis and Victor Veitch. "Causal Context Connects Counterfactual Fairness to Robust Prediction and Group Fairness." Proc. NeurIPS, 2023.