Skip to content

Commit aa82fe9

Browse files
committed
Data flow-based exhaustiveness checking
1 parent d6b541e commit aa82fe9

File tree

1 file changed

+351
-0
lines changed

1 file changed

+351
-0
lines changed
Lines changed: 351 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,351 @@
1+
# Data flow-based exhaustiveness checking
2+
3+
* **Type**: Design proposal
4+
* **Author**: Alejandro Serrano
5+
* **Contributors**: Marat Akhin, Nikita Bobko, Nikolay Lunyak, Brian Norman, Roman Venediktov
6+
* **Discussion**:
7+
* **Status**: Implemented in 2.2.20
8+
* **Related YouTrack issue**:
9+
[KT-8781](https://youtrack.jetbrains.com/issue/KT-8781/Consider-making-smart-casts-smart-enough-to-handle-exhaustive-value-sets),
10+
[KTIJ-20749](https://youtrack.jetbrains.com/issue/KTIJ-20749/Exhaustive-when-check-does-not-take-into-account-the-values-excluded-by-previous-if-conditions)
11+
(among others)
12+
13+
## Abstract
14+
15+
Exhaustiveness checking as currently implemented in Kotlin works in a _local_
16+
fashion, taking into account only the branches of the `when` being inspected.
17+
We propose to take a more global approach, by augmenting the current smart
18+
casting framework with _negative_ information.
19+
20+
## Table of contents
21+
22+
* [Abstract](#abstract)
23+
* [Table of contents](#table-of-contents)
24+
* [Motivation](#motivation)
25+
* [Proposal](#proposal)
26+
* [Technical details](#technical-details)
27+
* [Saving negative information](#saving-negative-information)
28+
* [In specification terms](#in-specification-terms)
29+
* [In K2 terms](#in-k2-terms)
30+
* [Merging information from branches](#merging-information-from-branches)
31+
* [Exhaustiveness checking](#exhaustiveness-checking)
32+
* [The problems of stability](#the-problems-of-stability)
33+
* [Further improvements](#further-improvements)
34+
* [Tracking disjunctions](#tracking-disjunctions)
35+
* [Using negative information to smart cast](#using-negative-information-to-smart-cast)
36+
37+
## Motivation
38+
39+
_Exhaustiveness_ is an important property of a `when` expression. The
40+
[current algorithm](https://kotlinlang.org/spec/expressions.html#exhaustive-when-expressions)
41+
works in a _local_ fashion, by inspecting the branches of each `when`
42+
separately, and comparing it with the (known) type of the subject. Alas,
43+
this analysis fails to consider cases where it is statically known that a
44+
subject may _not_ have a certain form (usually because of an early `return`).
45+
46+
```kotlin
47+
enum class Enum { A, B, C }
48+
49+
fun f(e: Enum): Int {
50+
if (e == Enum.A) return 1
51+
return when (e) {
52+
Enum.B -> 2
53+
Enum.C -> 3
54+
}
55+
}
56+
```
57+
58+
In this and other cases (see the related YouTrack issues) the compiler fails
59+
to check what intuitively the developer sees: that in a particular `when`
60+
expression the flow of the program already makes it impossible for some cases
61+
to occur. Our goal with this KEEP is to provide an alternative algorithm for
62+
exhaustiveness which uses that information, by building upon Kotlin's
63+
[smart cast framework](https://kotlinlang.org/spec/type-inference.html#smart-casts).
64+
65+
## Proposal
66+
67+
Currently the compiler keeps track of a set of types which refine type
68+
information for expressions, which we shall refer to from this moment on as the
69+
**positive** information.
70+
The key part of this proposal is to extend this smart cast information held by
71+
the compiler with **negative** information. This new piece of information is
72+
composed of two parts.
73+
74+
1. A set of **types**, which describe those types which an expression definitely
75+
does not have.
76+
2. A set of **constant values** (either literals or enumeration entries) which
77+
the expression definitely **does not have**.
78+
79+
Once the negative information is saved, [exhaustiveness checking]
80+
(https://kotlinlang.org/spec/expressions.html#exhaustive-when-expressions)
81+
may also be updated. We do not modify the definition of _exhaustive_ as laid
82+
down by the specification, however the **source** of information should not
83+
be the branches themselves, but rather this new negative information.
84+
85+
Consider the `f` function defined above. If the flow of the program does not
86+
go through the body of the `if`, we know that `Enum.A` is now a constant value
87+
that the expression definitely does not have. This reasoning can be extended
88+
to the point after each branch condition. (The comments between the
89+
`when` branches represent the point in which the previous branch is known
90+
to be false yet the new branch is not yet considered.)
91+
92+
```kotlin
93+
fun f(e: Enum): Int {
94+
if (e == Enum.A) return 1
95+
// we know [ e != Enum.A ]
96+
return when (e) {
97+
Enum.B -> 2
98+
// we know [ e != Enum.A, != Enum.B ]
99+
Enum.C -> 3
100+
// we know [ e != Enum.A, != Enum.B, != Enum.C ]
101+
}
102+
}
103+
```
104+
105+
After the last branch the compiler knows that all possible values for the type
106+
of `e` are now out of reach. Thus, that `when` expression is marked as
107+
exhaustive.
108+
109+
In this example we only use the "constant value side" of negative information.
110+
The "types side" is necessary to check exhaustiveness when `is` is involved.
111+
112+
```kotlin
113+
sealed interface Status {
114+
data object Error : Status
115+
data class Ok(val data: String) : Status
116+
}
117+
118+
fun g(status: Status): String? = when (status) {
119+
Status.Error -> null
120+
// we know [ status != Status.Error ]
121+
is Status.Ok -> status.data
122+
// we know [ status != Status.Error, !is Status.Ok ]
123+
}
124+
```
125+
126+
We strongly emphasize that this KEEP only proposes to use the negative
127+
component to check exhaustiveness. Smart casts still depend only of the
128+
positive component.
129+
130+
## Technical details
131+
132+
In this section we describe more technical details about how to store and use
133+
the negative information described in the _Proposal_ section.
134+
135+
### Saving negative information
136+
137+
> [!IMPORTANT]
138+
> This part is described in terms of the specifications. Implementations may
139+
> differ in how smart casts are actually propagated.
140+
141+
#### In specification terms
142+
143+
The proposal is an extension of the already existing [smart cast framework]
144+
(https://kotlinlang.org/spec/type-inference.html#smart-casts), so we shall
145+
only describe the additions.
146+
147+
The smart cast lattice gets a third component: a set of constant values that
148+
the corresponding expression may not have. Order, union, and intersection are
149+
extended naturally by considering the equivalent operations over that third
150+
component.
151+
152+
The transfer functions are updated include negative constant values if
153+
possible,
154+
155+
```
156+
[[assume(x != C)]] = s[x -> s(x) ⨅ (⊤ × ⊤ × { C })], if C is a constant value
157+
[[assume(C != y)]] = s[y -> s(y) ⨅ (⊤ × ⊤ × { C })], if C is a constant value
158+
159+
[[assume(x !== C)]] = s[x -> s(x) ⨅ (⊤ × ⊤ × { C })], if C is a constant value
160+
[[assume(C !== y)]] = s[y -> s(y) ⨅ (⊤ × ⊤ × { C })], if C is a constant value
161+
```
162+
163+
#### In K2 terms
164+
165+
The smart cast framework has been
166+
[completely re-architectured](https://youtrack.jetbrains.com/issue/KT-57516)
167+
for version 2.0 of the Kotlin compiler. In particular, nodes in the control
168+
flow graph store both type information (as in the specification) but also
169+
**implications** that describe the different scenarios. These implications
170+
often refer to the outcome of the expression as a whole, which we shall call
171+
`$result` in the description below.
172+
173+
For example, here is a simplified version of the rule for Boolean negation,
174+
which related the result of the whole expression with the value of the argument.
175+
176+
```
177+
[[ !e ]] = [ $result == true ==> e == false , $result == false ==> e == true ]
178+
```
179+
180+
By recording those implications, `assume` nodes are no longer required.
181+
Instead, implications are checked whenever new information becomes available.
182+
For example, when we are within the body of a `when` branch, we know that the
183+
condition was true.
184+
185+
The transfer functions are updated to include negative information. In the
186+
description below we use `e is T`, `e !is T`, and `e != T` to refer to the
187+
positive, negative types, and negative constant value parts of the smart cast
188+
information, respectively.
189+
190+
```
191+
[[ e is T ]] = [ $result == true ==> e is T , $result == false ==> e !is T ]
192+
[[ e !is T ]] = [ $result == true ==> e !is T , $result == false ==> e is T ]
193+
194+
// for enumeration values and literals
195+
[[ e == C ]] = [ $result == true ==> e is type(C) , $result == false ==> e != C ]
196+
[[ e != C ]] = [ $result == true ==> e != C , $result == false ==> e is type(C) ]
197+
198+
// for objects
199+
[[ e == O ]] = [ $result == true ==> e is O , $result == false ==> e !is O ]
200+
[[ e != O ]] = [ $result == true ==> e !is O , $result == false ==> e is O ]
201+
```
202+
203+
Note that if the `equals` method has been overridden in the type of `e`
204+
(as discussed [below](#the-problems-of-stability))
205+
in any of the rules for enumeration values, literals, and object,
206+
then the corresponding positive information (`is type(C)`, `is O`) should
207+
**not** be produced.
208+
209+
#### Merging information from branches
210+
211+
Merging negative information from different branches poses a challenge,
212+
especially with non-flat sealed hierarchies. In particular, trying to compute
213+
the precise joined negative information can easily devolve into an exponential
214+
algorithm. We expect implementations to use a simpler but slightly inaccurate
215+
procedure in this situation; but at the very least flat hierarchies and
216+
enumeration classes should be covered.
217+
218+
### Exhaustiveness checking
219+
220+
As discussed in the _Proposal_ section, the only change about exhaustiveness
221+
checking proper is that the source of information now comes from the negative
222+
information instead of by inspecting the source.
223+
224+
There is one technical detail, which is how to obtain the smart cast information
225+
after the last branch condition, which is not allocated its own node in the
226+
data flow framework.
227+
- If the `when` expression has an `else` branch, the source is equal to the
228+
smart cast information about the subject at that point. (But in that case
229+
we do not really need to perform exhaustiveness checking anyway).
230+
- Otherwise, we consider the program point right _before the last_ condition,
231+
and extend it by considering that condition as false.
232+
233+
### The problems of stability
234+
235+
One difference between how smart cast and exhaustiveness check are described
236+
in the specification is that only the former takes into account whether the
237+
value under consideration is
238+
[stable](https://kotlinlang.org/spec/type-inference.html#smart-cast-sink-stability).
239+
In general we cannot derive smart cast information from values that "may
240+
change" (mutable variables, open getters), since that information may no longer
241+
be true a moment later.
242+
243+
This impacts exhaustiveness checking in two ways. The first one is that,
244+
without further refinement, we would never be able to compute exhaustiveness
245+
of non-stable expressions. For example, no smart cast would be saved in this
246+
case, so the `when` would be marked as non-exhaustive.
247+
248+
```kotlin
249+
when (unstableExpression) {
250+
Enum.A -> 1
251+
Enum.B -> 2
252+
Enum.C -> 3
253+
}
254+
```
255+
256+
A relatively easy solution is to consider that an intermediate stable reference
257+
is always produced in `when`. Intuitively, it is equivalent to always introducing a variable in the subject.
258+
259+
```kotlin
260+
when (val $subject$ = unstableExpression) {
261+
Enum.A -> 1
262+
// [ $subject$ -> !Enum.A ]
263+
Enum.B -> 2
264+
// [ $subject$ -> !Enum.A, !Enum.B ]
265+
Enum.C -> 3
266+
// [ $subject$ -> !Enum.A, !Enum.B, !Enum.C ]
267+
}
268+
```
269+
270+
Note that this does _not_ imply that smart cast on unstable properties
271+
suddenly start working on `when`. If the `unstableExpression` is one such
272+
reference, the compiler should _not_ transfer information from the new
273+
`$subject` into the unstable reference, keeping our casts safe.
274+
275+
The second problem related to stability is that we may only derive smart cast
276+
information from an equality _if `equals` has not been overriden_. To be
277+
completely correct the same requirement should be imposed for negative
278+
information. However, this would reject some code which is now accepted, since
279+
this requirement was not imposed on exhaustiveness checking.
280+
281+
As a compromise, we propose to track whether negative information comes from
282+
such a scenario. We refer to such information as _tainted_. If exhaustiveness
283+
checks use that information in any form, a warning should be reported.
284+
This does not impact (type) safety, as long as we keep smart cast constrainted
285+
to the positive component.
286+
287+
## Further improvements
288+
289+
### Tracking disjunctions
290+
291+
In this proposal we only improve exhaustiveness when it can be described by
292+
the new negative information. This design leaves out cases like the following:
293+
294+
```kotlin
295+
fun f(e: Enum): Int {
296+
if (e == Enum.B || e == Enum.C) {
297+
return when (e) {
298+
Enum.B -> 2
299+
Enum.C -> 3
300+
}
301+
} else {
302+
return 1
303+
}
304+
}
305+
```
306+
307+
In order to handle these cases, we would need for our data flow analysis to
308+
track _disjunctive_ information -- right now it only handles implications and
309+
conjunctions. This is quite a big investment, but the payoff is unclear.
310+
311+
### Using negative information to smart cast
312+
313+
A fair question is whether negative information could be used not only for
314+
exhaustiveness checking, but for "regular" smart cast. For example, one may
315+
argue that the function `g` above may be rewritten as follows, and the compiler
316+
has enough information to understand that `status` must then be `Status.Ok`.
317+
318+
```kotlin
319+
fun g(status: Status): String? = when (status) {
320+
Status.Error -> null
321+
// we know [ status != Status.Error ]
322+
else -> status.data
323+
}
324+
```
325+
326+
The main argument against such a feature is _future compatibility_. If in the
327+
future the sealed hierarchy is extended with another element, then the code
328+
is no longer valid. In fact, it may break at runtime with some sort of cast
329+
exception -- exactly the kind of problem Kotlin tries to avoid.
330+
331+
> [!NOTE]
332+
> The compiler currently throws `NoWhenBranchMatchedException` to protect
333+
> for future extensions of hierarchies. However, it is not clear what should
334+
> be done if a final cast is done using negative information instead.
335+
336+
Still, there are some cases in which we are sure that no extension is going
337+
to be performed in the future, such as:
338+
339+
- Error-like types like `Result`, `Either`,
340+
or the [new proposal for unions for errors](https://youtrack.jetbrains.com/issue/KT-68296);
341+
- Implementation of (linked list) using `Cons` and `Nil`.
342+
343+
For those cases, we may introduce a `@AllowNegativeInformationSmartCast`
344+
annotation which signals this ability.
345+
346+
We think, though, that there is no high demand for such a feature. First of
347+
all, the applicability is very reduced. Furthermore, currently Kotlin does
348+
not perform such smart casts, and people seem happy with the _status quo_
349+
(the only exception being nullability, which is treated especially by K2).
350+
Finally, we need to carefully consider whether negative information is tainted
351+
before smart casting.

0 commit comments

Comments
 (0)