|
| 1 | +# Data flow-based exhaustiveness checking |
| 2 | + |
| 3 | +* **Type**: Design proposal |
| 4 | +* **Author**: Alejandro Serrano |
| 5 | +* **Contributors**: Marat Akhin, Nikita Bobko, Nikolay Lunyak, Brian Norman, Roman Venediktov |
| 6 | +* **Discussion**: |
| 7 | +* **Status**: Implemented in 2.2.20 |
| 8 | +* **Related YouTrack issue**: |
| 9 | + [KT-8781](https://youtrack.jetbrains.com/issue/KT-8781/Consider-making-smart-casts-smart-enough-to-handle-exhaustive-value-sets), |
| 10 | + [KTIJ-20749](https://youtrack.jetbrains.com/issue/KTIJ-20749/Exhaustive-when-check-does-not-take-into-account-the-values-excluded-by-previous-if-conditions) |
| 11 | + (among others) |
| 12 | + |
| 13 | +## Abstract |
| 14 | + |
| 15 | +Exhaustiveness checking as currently implemented in Kotlin works in a _local_ |
| 16 | +fashion, taking into account only the branches of the `when` being inspected. |
| 17 | +We propose to take a more global approach, by augmenting the current smart |
| 18 | +casting framework with _negative_ information. |
| 19 | + |
| 20 | +## Table of contents |
| 21 | + |
| 22 | +* [Abstract](#abstract) |
| 23 | +* [Table of contents](#table-of-contents) |
| 24 | +* [Motivation](#motivation) |
| 25 | +* [Proposal](#proposal) |
| 26 | +* [Technical details](#technical-details) |
| 27 | + * [Saving negative information](#saving-negative-information) |
| 28 | + * [In specification terms](#in-specification-terms) |
| 29 | + * [In K2 terms](#in-k2-terms) |
| 30 | + * [Merging information from branches](#merging-information-from-branches) |
| 31 | + * [Exhaustiveness checking](#exhaustiveness-checking) |
| 32 | + * [The problems of stability](#the-problems-of-stability) |
| 33 | +* [Further improvements](#further-improvements) |
| 34 | + * [Tracking disjunctions](#tracking-disjunctions) |
| 35 | + * [Using negative information to smart cast](#using-negative-information-to-smart-cast) |
| 36 | + |
| 37 | +## Motivation |
| 38 | + |
| 39 | +_Exhaustiveness_ is an important property of a `when` expression. The |
| 40 | +[current algorithm](https://kotlinlang.org/spec/expressions.html#exhaustive-when-expressions) |
| 41 | +works in a _local_ fashion, by inspecting the branches of each `when` |
| 42 | +separately, and comparing it with the (known) type of the subject. Alas, |
| 43 | +this analysis fails to consider cases where it is statically known that a |
| 44 | +subject may _not_ have a certain form (usually because of an early `return`). |
| 45 | + |
| 46 | +```kotlin |
| 47 | +enum class Enum { A, B, C } |
| 48 | + |
| 49 | +fun f(e: Enum): Int { |
| 50 | + if (e == Enum.A) return 1 |
| 51 | + return when (e) { |
| 52 | + Enum.B -> 2 |
| 53 | + Enum.C -> 3 |
| 54 | + } |
| 55 | +} |
| 56 | +``` |
| 57 | + |
| 58 | +In this and other cases (see the related YouTrack issues) the compiler fails |
| 59 | +to check what intuitively the developer sees: that in a particular `when` |
| 60 | +expression the flow of the program already makes it impossible for some cases |
| 61 | +to occur. Our goal with this KEEP is to provide an alternative algorithm for |
| 62 | +exhaustiveness which uses that information, by building upon Kotlin's |
| 63 | +[smart cast framework](https://kotlinlang.org/spec/type-inference.html#smart-casts). |
| 64 | + |
| 65 | +## Proposal |
| 66 | + |
| 67 | +Currently the compiler keeps track of a set of types which refine type |
| 68 | +information for expressions, which we shall refer to from this moment on as the |
| 69 | +**positive** information. |
| 70 | +The key part of this proposal is to extend this smart cast information held by |
| 71 | +the compiler with **negative** information. This new piece of information is |
| 72 | +composed of two parts. |
| 73 | + |
| 74 | +1. A set of **types**, which describe those types which an expression definitely |
| 75 | + does not have. |
| 76 | +2. A set of **constant values** (either literals or enumeration entries) which |
| 77 | + the expression definitely **does not have**. |
| 78 | + |
| 79 | +Once the negative information is saved, [exhaustiveness checking] |
| 80 | +(https://kotlinlang.org/spec/expressions.html#exhaustive-when-expressions) |
| 81 | +may also be updated. We do not modify the definition of _exhaustive_ as laid |
| 82 | +down by the specification, however the **source** of information should not |
| 83 | +be the branches themselves, but rather this new negative information. |
| 84 | + |
| 85 | +Consider the `f` function defined above. If the flow of the program does not |
| 86 | +go through the body of the `if`, we know that `Enum.A` is now a constant value |
| 87 | +that the expression definitely does not have. This reasoning can be extended |
| 88 | +to the point after each branch condition. (The comments between the |
| 89 | +`when` branches represent the point in which the previous branch is known |
| 90 | +to be false yet the new branch is not yet considered.) |
| 91 | + |
| 92 | +```kotlin |
| 93 | +fun f(e: Enum): Int { |
| 94 | + if (e == Enum.A) return 1 |
| 95 | + // we know [ e != Enum.A ] |
| 96 | + return when (e) { |
| 97 | + Enum.B -> 2 |
| 98 | + // we know [ e != Enum.A, != Enum.B ] |
| 99 | + Enum.C -> 3 |
| 100 | + // we know [ e != Enum.A, != Enum.B, != Enum.C ] |
| 101 | + } |
| 102 | +} |
| 103 | +``` |
| 104 | + |
| 105 | +After the last branch the compiler knows that all possible values for the type |
| 106 | +of `e` are now out of reach. Thus, that `when` expression is marked as |
| 107 | +exhaustive. |
| 108 | + |
| 109 | +In this example we only use the "constant value side" of negative information. |
| 110 | +The "types side" is necessary to check exhaustiveness when `is` is involved. |
| 111 | + |
| 112 | +```kotlin |
| 113 | +sealed interface Status { |
| 114 | + data object Error : Status |
| 115 | + data class Ok(val data: String) : Status |
| 116 | +} |
| 117 | + |
| 118 | +fun g(status: Status): String? = when (status) { |
| 119 | + Status.Error -> null |
| 120 | + // we know [ status != Status.Error ] |
| 121 | + is Status.Ok -> status.data |
| 122 | + // we know [ status != Status.Error, !is Status.Ok ] |
| 123 | +} |
| 124 | +``` |
| 125 | + |
| 126 | +We strongly emphasize that this KEEP only proposes to use the negative |
| 127 | +component to check exhaustiveness. Smart casts still depend only of the |
| 128 | +positive component. |
| 129 | + |
| 130 | +## Technical details |
| 131 | + |
| 132 | +In this section we describe more technical details about how to store and use |
| 133 | +the negative information described in the _Proposal_ section. |
| 134 | + |
| 135 | +### Saving negative information |
| 136 | + |
| 137 | +> [!IMPORTANT] |
| 138 | +> This part is described in terms of the specifications. Implementations may |
| 139 | +> differ in how smart casts are actually propagated. |
| 140 | +
|
| 141 | +#### In specification terms |
| 142 | + |
| 143 | +The proposal is an extension of the already existing [smart cast framework] |
| 144 | +(https://kotlinlang.org/spec/type-inference.html#smart-casts), so we shall |
| 145 | +only describe the additions. |
| 146 | + |
| 147 | +The smart cast lattice gets a third component: a set of constant values that |
| 148 | +the corresponding expression may not have. Order, union, and intersection are |
| 149 | +extended naturally by considering the equivalent operations over that third |
| 150 | +component. |
| 151 | + |
| 152 | +The transfer functions are updated include negative constant values if |
| 153 | +possible, |
| 154 | + |
| 155 | +``` |
| 156 | +[[assume(x != C)]] = s[x -> s(x) ⨅ (⊤ × ⊤ × { C })], if C is a constant value |
| 157 | +[[assume(C != y)]] = s[y -> s(y) ⨅ (⊤ × ⊤ × { C })], if C is a constant value |
| 158 | +
|
| 159 | +[[assume(x !== C)]] = s[x -> s(x) ⨅ (⊤ × ⊤ × { C })], if C is a constant value |
| 160 | +[[assume(C !== y)]] = s[y -> s(y) ⨅ (⊤ × ⊤ × { C })], if C is a constant value |
| 161 | +``` |
| 162 | + |
| 163 | +#### In K2 terms |
| 164 | + |
| 165 | +The smart cast framework has been |
| 166 | +[completely re-architectured](https://youtrack.jetbrains.com/issue/KT-57516) |
| 167 | +for version 2.0 of the Kotlin compiler. In particular, nodes in the control |
| 168 | +flow graph store both type information (as in the specification) but also |
| 169 | +**implications** that describe the different scenarios. These implications |
| 170 | +often refer to the outcome of the expression as a whole, which we shall call |
| 171 | +`$result` in the description below. |
| 172 | + |
| 173 | +For example, here is a simplified version of the rule for Boolean negation, |
| 174 | +which related the result of the whole expression with the value of the argument. |
| 175 | + |
| 176 | +``` |
| 177 | +[[ !e ]] = [ $result == true ==> e == false , $result == false ==> e == true ] |
| 178 | +``` |
| 179 | + |
| 180 | +By recording those implications, `assume` nodes are no longer required. |
| 181 | +Instead, implications are checked whenever new information becomes available. |
| 182 | +For example, when we are within the body of a `when` branch, we know that the |
| 183 | +condition was true. |
| 184 | + |
| 185 | +The transfer functions are updated to include negative information. In the |
| 186 | +description below we use `e is T`, `e !is T`, and `e != T` to refer to the |
| 187 | +positive, negative types, and negative constant value parts of the smart cast |
| 188 | +information, respectively. |
| 189 | + |
| 190 | +``` |
| 191 | +[[ e is T ]] = [ $result == true ==> e is T , $result == false ==> e !is T ] |
| 192 | +[[ e !is T ]] = [ $result == true ==> e !is T , $result == false ==> e is T ] |
| 193 | +
|
| 194 | +// for enumeration values and literals |
| 195 | +[[ e == C ]] = [ $result == true ==> e is type(C) , $result == false ==> e != C ] |
| 196 | +[[ e != C ]] = [ $result == true ==> e != C , $result == false ==> e is type(C) ] |
| 197 | +
|
| 198 | +// for objects |
| 199 | +[[ e == O ]] = [ $result == true ==> e is O , $result == false ==> e !is O ] |
| 200 | +[[ e != O ]] = [ $result == true ==> e !is O , $result == false ==> e is O ] |
| 201 | +``` |
| 202 | + |
| 203 | +Note that if the `equals` method has been overridden in the type of `e` |
| 204 | +(as discussed [below](#the-problems-of-stability)) |
| 205 | +in any of the rules for enumeration values, literals, and object, |
| 206 | +then the corresponding positive information (`is type(C)`, `is O`) should |
| 207 | +**not** be produced. |
| 208 | + |
| 209 | +#### Merging information from branches |
| 210 | + |
| 211 | +Merging negative information from different branches poses a challenge, |
| 212 | +especially with non-flat sealed hierarchies. In particular, trying to compute |
| 213 | +the precise joined negative information can easily devolve into an exponential |
| 214 | +algorithm. We expect implementations to use a simpler but slightly inaccurate |
| 215 | +procedure in this situation; but at the very least flat hierarchies and |
| 216 | +enumeration classes should be covered. |
| 217 | + |
| 218 | +### Exhaustiveness checking |
| 219 | + |
| 220 | +As discussed in the _Proposal_ section, the only change about exhaustiveness |
| 221 | +checking proper is that the source of information now comes from the negative |
| 222 | +information instead of by inspecting the source. |
| 223 | + |
| 224 | +There is one technical detail, which is how to obtain the smart cast information |
| 225 | +after the last branch condition, which is not allocated its own node in the |
| 226 | +data flow framework. |
| 227 | +- If the `when` expression has an `else` branch, the source is equal to the |
| 228 | + smart cast information about the subject at that point. (But in that case |
| 229 | + we do not really need to perform exhaustiveness checking anyway). |
| 230 | +- Otherwise, we consider the program point right _before the last_ condition, |
| 231 | + and extend it by considering that condition as false. |
| 232 | + |
| 233 | +### The problems of stability |
| 234 | + |
| 235 | +One difference between how smart cast and exhaustiveness check are described |
| 236 | +in the specification is that only the former takes into account whether the |
| 237 | +value under consideration is |
| 238 | +[stable](https://kotlinlang.org/spec/type-inference.html#smart-cast-sink-stability). |
| 239 | +In general we cannot derive smart cast information from values that "may |
| 240 | +change" (mutable variables, open getters), since that information may no longer |
| 241 | +be true a moment later. |
| 242 | + |
| 243 | +This impacts exhaustiveness checking in two ways. The first one is that, |
| 244 | +without further refinement, we would never be able to compute exhaustiveness |
| 245 | +of non-stable expressions. For example, no smart cast would be saved in this |
| 246 | +case, so the `when` would be marked as non-exhaustive. |
| 247 | + |
| 248 | +```kotlin |
| 249 | +when (unstableExpression) { |
| 250 | + Enum.A -> 1 |
| 251 | + Enum.B -> 2 |
| 252 | + Enum.C -> 3 |
| 253 | +} |
| 254 | +``` |
| 255 | + |
| 256 | +A relatively easy solution is to consider that an intermediate stable reference |
| 257 | +is always produced in `when`. Intuitively, it is equivalent to always introducing a variable in the subject. |
| 258 | + |
| 259 | +```kotlin |
| 260 | +when (val $subject$ = unstableExpression) { |
| 261 | + Enum.A -> 1 |
| 262 | + // [ $subject$ -> !Enum.A ] |
| 263 | + Enum.B -> 2 |
| 264 | + // [ $subject$ -> !Enum.A, !Enum.B ] |
| 265 | + Enum.C -> 3 |
| 266 | + // [ $subject$ -> !Enum.A, !Enum.B, !Enum.C ] |
| 267 | +} |
| 268 | +``` |
| 269 | + |
| 270 | +Note that this does _not_ imply that smart cast on unstable properties |
| 271 | +suddenly start working on `when`. If the `unstableExpression` is one such |
| 272 | +reference, the compiler should _not_ transfer information from the new |
| 273 | +`$subject` into the unstable reference, keeping our casts safe. |
| 274 | + |
| 275 | +The second problem related to stability is that we may only derive smart cast |
| 276 | +information from an equality _if `equals` has not been overriden_. To be |
| 277 | +completely correct the same requirement should be imposed for negative |
| 278 | +information. However, this would reject some code which is now accepted, since |
| 279 | +this requirement was not imposed on exhaustiveness checking. |
| 280 | + |
| 281 | +As a compromise, we propose to track whether negative information comes from |
| 282 | +such a scenario. We refer to such information as _tainted_. If exhaustiveness |
| 283 | +checks use that information in any form, a warning should be reported. |
| 284 | +This does not impact (type) safety, as long as we keep smart cast constrainted |
| 285 | +to the positive component. |
| 286 | + |
| 287 | +## Further improvements |
| 288 | + |
| 289 | +### Tracking disjunctions |
| 290 | + |
| 291 | +In this proposal we only improve exhaustiveness when it can be described by |
| 292 | +the new negative information. This design leaves out cases like the following: |
| 293 | + |
| 294 | +```kotlin |
| 295 | +fun f(e: Enum): Int { |
| 296 | + if (e == Enum.B || e == Enum.C) { |
| 297 | + return when (e) { |
| 298 | + Enum.B -> 2 |
| 299 | + Enum.C -> 3 |
| 300 | + } |
| 301 | + } else { |
| 302 | + return 1 |
| 303 | + } |
| 304 | +} |
| 305 | +``` |
| 306 | + |
| 307 | +In order to handle these cases, we would need for our data flow analysis to |
| 308 | +track _disjunctive_ information -- right now it only handles implications and |
| 309 | +conjunctions. This is quite a big investment, but the payoff is unclear. |
| 310 | + |
| 311 | +### Using negative information to smart cast |
| 312 | + |
| 313 | +A fair question is whether negative information could be used not only for |
| 314 | +exhaustiveness checking, but for "regular" smart cast. For example, one may |
| 315 | +argue that the function `g` above may be rewritten as follows, and the compiler |
| 316 | +has enough information to understand that `status` must then be `Status.Ok`. |
| 317 | + |
| 318 | +```kotlin |
| 319 | +fun g(status: Status): String? = when (status) { |
| 320 | + Status.Error -> null |
| 321 | + // we know [ status != Status.Error ] |
| 322 | + else -> status.data |
| 323 | +} |
| 324 | +``` |
| 325 | + |
| 326 | +The main argument against such a feature is _future compatibility_. If in the |
| 327 | +future the sealed hierarchy is extended with another element, then the code |
| 328 | +is no longer valid. In fact, it may break at runtime with some sort of cast |
| 329 | +exception -- exactly the kind of problem Kotlin tries to avoid. |
| 330 | + |
| 331 | +> [!NOTE] |
| 332 | +> The compiler currently throws `NoWhenBranchMatchedException` to protect |
| 333 | +> for future extensions of hierarchies. However, it is not clear what should |
| 334 | +> be done if a final cast is done using negative information instead. |
| 335 | +
|
| 336 | +Still, there are some cases in which we are sure that no extension is going |
| 337 | +to be performed in the future, such as: |
| 338 | + |
| 339 | +- Error-like types like `Result`, `Either`, |
| 340 | + or the [new proposal for unions for errors](https://youtrack.jetbrains.com/issue/KT-68296); |
| 341 | +- Implementation of (linked list) using `Cons` and `Nil`. |
| 342 | + |
| 343 | +For those cases, we may introduce a `@AllowNegativeInformationSmartCast` |
| 344 | +annotation which signals this ability. |
| 345 | + |
| 346 | +We think, though, that there is no high demand for such a feature. First of |
| 347 | +all, the applicability is very reduced. Furthermore, currently Kotlin does |
| 348 | +not perform such smart casts, and people seem happy with the _status quo_ |
| 349 | +(the only exception being nullability, which is treated especially by K2). |
| 350 | +Finally, we need to carefully consider whether negative information is tainted |
| 351 | +before smart casting. |
0 commit comments