diff --git a/src/doc/rustc-dev-guide/ci/sembr/src/main.rs b/src/doc/rustc-dev-guide/ci/sembr/src/main.rs index 6720267e14f3b..4038f112d59fd 100644 --- a/src/doc/rustc-dev-guide/ci/sembr/src/main.rs +++ b/src/doc/rustc-dev-guide/ci/sembr/src/main.rs @@ -159,6 +159,9 @@ fn lengthen_lines(content: &str, limit: usize) -> String { if in_html_div { continue; } + if line.trim_end().ends_with("
") { + continue; + } if ignore(line, in_code_block) || REGEX_SPLIT.is_match(line) { continue; } diff --git a/src/doc/rustc-dev-guide/rust-version b/src/doc/rustc-dev-guide/rust-version index 209f4226eae7a..b6e1b2bc55df4 100644 --- a/src/doc/rustc-dev-guide/rust-version +++ b/src/doc/rustc-dev-guide/rust-version @@ -1 +1 @@ -44e34e1ac6d7e69b40856cf1403d3da145319c30 +c78a29473a68f07012904af11c92ecffa68fcc75 diff --git a/src/doc/rustc-dev-guide/src/building/bootstrapping/what-bootstrapping-does.md b/src/doc/rustc-dev-guide/src/building/bootstrapping/what-bootstrapping-does.md index a5dfd9a0e8329..0623d176403e4 100644 --- a/src/doc/rustc-dev-guide/src/building/bootstrapping/what-bootstrapping-does.md +++ b/src/doc/rustc-dev-guide/src/building/bootstrapping/what-bootstrapping-does.md @@ -1,12 +1,12 @@ # What Bootstrapping does [*Bootstrapping*][boot] is the process of using a compiler to compile itself. -More accurately, it means using an older compiler to compile a newer version of -the same compiler. +More accurately, it means using an older compiler to compile a newer version of the same compiler. This raises a chicken-and-egg paradox: where did the first compiler come from? -It must have been written in a different language. In Rust's case it was -[written in OCaml][ocaml-compiler]. However, it was abandoned long ago, and the +It must have been written in a different language. +In Rust's case, it was [written in OCaml][ocaml-compiler]. +However, it was abandoned long ago, and the only way to build a modern version of `rustc` is with a slightly less modern version. This is exactly how [`./x.py`] works: it downloads the current beta release of @@ -14,8 +14,8 @@ This is exactly how [`./x.py`] works: it downloads the current beta release of [`./x.py`]: https://github.com/rust-lang/rust/blob/HEAD/x.py -Note that this documentation mostly covers user-facing information. See -[bootstrap/README.md][bootstrap-internals] to read about bootstrap internals. 
+Note that this documentation mostly covers user-facing information. +See [bootstrap/README.md][bootstrap-internals] to read about bootstrap internals. [bootstrap-internals]: https://github.com/rust-lang/rust/blob/HEAD/src/bootstrap/README.md @@ -28,7 +28,8 @@ Note that this documentation mostly covers user-facing information. See - Stage 2: the truly current compiler - Stage 3: the same-result test -Compiling `rustc` is done in stages. Here's a diagram, adapted from Jynn +Compiling `rustc` is done in stages. +Here's a diagram, adapted from Jynn Nelson's [talk on bootstrapping][rustconf22-talk] at RustConf 2022, with detailed explanations below. @@ -36,8 +37,7 @@ The `A`, `B`, `C`, and `D` show the ordering of the stages of bootstrapping. Blue nodes are downloaded, yellow nodes are built with the `stage0` compiler, and green nodes are built with the `stage1` -compiler. +lightgreen; color: black">green nodes are built with the `stage1` compiler. [rustconf22-talk]: https://www.youtube.com/watch?v=oUIjG-y4zaA @@ -61,8 +61,8 @@ graph TD ### Stage 0: the pre-compiled compiler The stage0 compiler is by default the very recent _beta_ `rustc` compiler and its -associated dynamic libraries, which `./x.py` will download for you. (You can -also configure `./x.py` to change stage0 to something else.) +associated dynamic libraries, which `./x.py` will download for you. +(You can also configure `./x.py` to change stage0 to something else.) The precompiled stage0 compiler is then used only to compile [`src/bootstrap`] and [`compiler/rustc`] with precompiled stage0 std. @@ -72,15 +72,15 @@ Therefore, to use a compiler with a std that is freshly built from the tree, you build the stage2 compiler. There are two concepts at play here: a compiler (with its set of dependencies) and its -'target' or 'object' libraries (`std` and `rustc`). Both are staged, but in a staggered manner. +'target' or 'object' libraries (`std` and `rustc`). +Both are staged, but in a staggered manner. 
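The staggering described above can be sketched as a toy rule (an illustrative sketch only, not actual bootstrap code; the function names are ours): the stage N compiler is *linked against* the `std` produced in stage N-1, while programs it compiles use the `std` produced in stage N.

```rust
// Toy model of the staggered staging rule. Only the stage arithmetic here
// reflects the guide's description; the functions themselves are illustrative.
fn std_linked_by_rustc(stage: u32) -> String {
    match stage {
        // The stage0 compiler is downloaded together with the beta std.
        0 => "the downloaded beta std".to_string(),
        // stageN/rustc is linked against the std built by the stage N-1 compiler.
        n => format!("stage{} std", n - 1),
    }
}

fn std_for_compiled_programs(stage: u32) -> String {
    // Programs compiled by stageN/rustc are built against stage N std.
    format!("stage{stage} std")
}

fn main() {
    for stage in 1..=2 {
        println!(
            "stage{stage}/rustc links against {}, compiles programs against {}",
            std_linked_by_rustc(stage),
            std_for_compiled_programs(stage),
        );
    }
}
```

This is only a mnemonic for the "staggered" relationship; the real logic lives in `src/bootstrap`.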
[`compiler/rustc`]: https://github.com/rust-lang/rust/tree/HEAD/compiler/rustc [`src/bootstrap`]: https://github.com/rust-lang/rust/tree/HEAD/src/bootstrap ### Stage 1: from current code, by an earlier compiler -The rustc source code is then compiled with the `stage0` compiler to produce the -`stage1` compiler. +The rustc source code is then compiled with the `stage0` compiler to produce the `stage1` compiler. ### Stage 2: the truly current compiler @@ -88,7 +88,8 @@ We then rebuild the compiler using `stage1` compiler with in-tree std to produce compiler. The `stage1` compiler itself was built by precompiled `stage0` compiler and std -and hence not by the source in your working directory. This means that the ABI +and hence not by the source in your working directory. +This means that the ABI generated by the `stage0` compiler may not match the ABI that would have been made by the `stage1` compiler, which can cause problems for dynamic libraries, tests and tools using `rustc_private`. @@ -96,8 +97,8 @@ and tools using `rustc_private`. Note that the `proc_macro` crate avoids this issue with a `C` FFI layer called `proc_macro::bridge`, allowing it to be used with `stage1`. -The `stage2` compiler is the one distributed with `rustup` and all other install -methods. However, it takes a very long time to build because one must first +The `stage2` compiler is the one distributed with `rustup` and all other install methods. +However, it takes a very long time to build because one must first build the new compiler with an older compiler and then use that to build the new compiler with itself. @@ -106,14 +107,14 @@ See [Building the compiler](../how-to-build-and-run.html#building-the-compiler). ### Stage 3: the same-result test -Stage 3 is optional. To sanity check our new compiler we can build the libraries -with the `stage2` compiler. The result ought to be identical to before, unless -something has broken. +Stage 3 is optional. 
+To sanity check our new compiler, we can build the libraries with the `stage2` compiler. +The result ought to be identical to before, unless something has broken. ### Building the stages -The script [`./x`] tries to be helpful and pick the stage you most likely meant -for each subcommand. Here are some `x` commands with their default stages: +The script [`./x`] tries to be helpful and pick the stage you most likely meant for each subcommand. +Here are some `x` commands with their default stages: - `check`: `--stage 1` - `clippy`: `--stage 1` @@ -126,8 +127,7 @@ for each subcommand. Here are some `x` commands with their default stages: You can always override the stage by passing `--stage N` explicitly. -For more information about stages, [see -below](#understanding-stages-of-bootstrap). +For more information about stages, [see below](#understanding-stages-of-bootstrap). [`./x`]: https://github.com/rust-lang/rust/blob/HEAD/x @@ -135,22 +135,23 @@ below](#understanding-stages-of-bootstrap). Since the build system uses the current beta compiler to build a `stage1` bootstrapping compiler, the compiler source code can't use some features until -they reach beta (because otherwise the beta compiler doesn't support them). On -the other hand, for [compiler intrinsics][intrinsics] and internal features, the -features _have_ to be used. Additionally, the compiler makes heavy use of -`nightly` features (`#![feature(...)]`). How can we resolve this problem? +they reach beta (because otherwise the beta compiler doesn't support them). +On the other hand, for [compiler intrinsics][intrinsics] and internal features, the +features _have_ to be used. +Additionally, the compiler makes heavy use of `nightly` features (`#![feature(...)]`). +How can we resolve this problem? There are two methods used: 1. The build system sets `--cfg bootstrap` when building with `stage0`, so we can use `cfg(not(bootstrap))` to only use features when built with `stage1`. 
Setting `--cfg bootstrap` in this way is used for features that were just - stabilized, which require `#![feature(...)]` when built with `stage0`, but - not for `stage1`. -2. The build system sets `RUSTC_BOOTSTRAP=1`. This special variable means to + stabilized, which require `#![feature(...)]` when built with `stage0`, but not for `stage1`. +2. The build system sets `RUSTC_BOOTSTRAP=1`. + This special variable means to _break the stability guarantees_ of Rust: allowing use of `#![feature(...)]` - with a compiler that's not `nightly`. _Setting `RUSTC_BOOTSTRAP=1` should - never be used except when bootstrapping the compiler._ + with a compiler that's not `nightly`. + _Setting `RUSTC_BOOTSTRAP=1` should never be used except when bootstrapping the compiler._ [boot]: https://en.wikipedia.org/wiki/Bootstrapping_(compilers) [intrinsics]: ../../appendix/glossary.md#intrinsic @@ -165,15 +166,14 @@ This is a detailed look into the separate bootstrap stages. The convention `./x` uses is that: - A `--stage N` flag means to run the stage N compiler (`stageN/rustc`). -- A "stage N artifact" is a build artifact that is _produced_ by the stage N - compiler. +- A "stage N artifact" is a build artifact that is _produced_ by the stage N compiler. - The stage N+1 compiler is assembled from stage N *artifacts*. This process is called _uplifting_. #### Build artifacts -Anything you can build with `./x` is a _build artifact_. Build artifacts -include, but are not limited to: +Anything you can build with `./x` is a _build artifact_. +Build artifacts include, but are not limited to: - binaries, like `stage0-rustc/rustc-main` - shared objects, like `stage0-sysroot/rustlib/libstd-6fae108520cf72fe.so` @@ -184,26 +184,26 @@ include, but are not limited to: #### Examples -- `./x test tests/ui` means to build the `stage1` compiler and run `compiletest` - on it. If you're working on the compiler, this is normally the test command - you want. 
+- `./x test tests/ui` means to build the `stage1` compiler and run `compiletest` on it. + If you're working on the compiler, this is normally the test command you want. - `./x test --stage 0 library/std` means to run tests on the standard library - without building `rustc` from source ('build with `stage0`, then test the - artifacts'). If you're working on the standard library, this is normally the - test command you want. + without building `rustc` from source ('build with `stage0`, then test the artifacts'). + If you're working on the standard library, this is normally the test command you want. - `./x build --stage 0` means to build with the stage0 `rustc`. - `./x doc --stage 1` means to document using the stage0 `rustdoc`. #### Examples of what *not* to do - `./x test --stage 0 tests/ui` is not useful: it runs tests on the _beta_ - compiler and doesn't build `rustc` from source. Use `test tests/ui` instead, + compiler and doesn't build `rustc` from source. + Use `test tests/ui` instead, which builds `stage1` from source. - `./x test --stage 0 compiler/rustc` builds the compiler but runs no tests: - it's running `cargo test -p rustc`, but `cargo` doesn't understand Rust's - tests. You shouldn't need to use this, use `test` instead (without arguments). + it's running `cargo test -p rustc`, but `cargo` doesn't understand Rust's tests. + You shouldn't need to use this; use `test` instead (without arguments). - `./x build --stage 0 compiler/rustc` builds the compiler, but does not build - `libstd` or even `libcore`. Most of the time, you'll want `./x build library` + `libstd` or even `libcore`. + Most of the time, you'll want `./x build library` instead, which allows compiling programs without needing to define lang items. ### Building vs. 
running @@ -219,71 +219,72 @@ In each stage besides 0, two major steps are performed: This is somewhat intuitive if one thinks of the stage N artifacts as "just" another program we are building with the stage N compiler: `build --stage N -compiler/rustc` is linking the stage N artifacts to the `std` built by the stage -N compiler. +compiler/rustc` is linking the stage N artifacts to the `std` built by the stage N compiler. ### Stages and `std` Note that there are two `std` libraries in play here: -1. The library _linked_ to `stageN/rustc`, which was built by stage N-1 (stage - N-1 `std`) +1. The library _linked_ to `stageN/rustc`, which was built by stage N-1 (stage N-1 `std`) 2. The library _used to compile programs_ with `stageN/rustc`, which was built by stage N (stage N `std`). -Stage N `std` is pretty much necessary for any useful work with the stage N -compiler. Without it, you can only compile programs with `#![no_core]` -- not -terribly useful! +Stage N `std` is pretty much necessary for any useful work with the stage N compiler. +Without it, you can only compile programs with `#![no_core]` -- not terribly useful! The reason these need to be different is because they aren't necessarily ABI-compatible: there could be new layout optimizations, changes to `MIR`, or other changes to Rust metadata on `nightly` that aren't present in beta. -This is also where `--keep-stage 1 library/std` comes into play. Since most -changes to the compiler don't actually change the ABI, once you've produced a -`std` in `stage1`, you can probably just reuse it with a different compiler. If -the ABI hasn't changed, you're good to go, no need to spend time recompiling -that `std`. The flag `--keep-stage` simply instructs the build script to assume +This is also where `--keep-stage 1 library/std` comes into play. +Since most changes to the compiler don't actually change the ABI, once you've produced a +`std` in `stage1`, you can probably just reuse it with a different compiler. 
+If the ABI hasn't changed, you're good to go; no need to spend time recompiling that `std`. +The flag `--keep-stage` simply instructs the build script to assume the previous compile is fine and copies those artifacts into the appropriate place, skipping the `cargo` invocation. ### Cross-compiling rustc -*Cross-compiling* is the process of compiling code that will run on another -architecture. For instance, you might want to build an ARM version of rustc -using an x86 machine. Building `stage2` `std` is different when you are -cross-compiling. +*Cross-compiling* is the process of compiling code that will run on another architecture. +For instance, you might want to build an ARM version of rustc using an x86 machine. +Building `stage2` `std` is different when you are cross-compiling. This is because `./x` uses the following logic: if `HOST` and `TARGET` are the -same, it will reuse `stage1` `std` for `stage2`! This is sound because `stage1` +same, it will reuse `stage1` `std` for `stage2`! +This is sound because `stage1` `std` was compiled with the `stage1` compiler, i.e. a compiler using the source -code you currently have checked out. So it should be identical (and therefore +code you currently have checked out. +So it should be identical (and therefore ABI-compatible) to the `std` that `stage2/rustc` would compile. -However, when cross-compiling, `stage1` `std` will only run on the host. So the -`stage2` compiler has to recompile `std` for the target. +However, when cross-compiling, `stage1` `std` will only run on the host. +So, the `stage2` compiler has to recompile `std` for the target. (See in the table how `stage2` only builds non-host `std` targets). ### What is a 'sysroot'? When you build a project with `cargo`, the build artifacts for dependencies are -normally stored in `target/debug/deps`. This only contains dependencies `cargo` -knows about; in particular, it doesn't have the standard library. Where do `std` -or `proc_macro` come from? 
They come from the **sysroot**, the root of a number -of directories where the compiler loads build artifacts at runtime. The -`sysroot` doesn't just store the standard library, though - it includes anything -that needs to be loaded at runtime. That includes (but is not limited to): +normally stored in `target/debug/deps`. +This only contains dependencies `cargo` +knows about; in particular, it doesn't have the standard library. +Where do `std` or `proc_macro` come from? +They come from the **sysroot**, the root of a number +of directories where the compiler loads build artifacts at runtime. +The `sysroot` doesn't just store the standard library, though - it includes anything +that needs to be loaded at runtime. +That includes (but is not limited to): - Libraries `libstd`/`libtest`/`libproc_macro`. -- Compiler crates themselves, when using `rustc_private`. In-tree these are - always present; out of tree, you need to install `rustc-dev` with `rustup`. -- Shared object file `libLLVM.so` for the LLVM project. In-tree this is either - built from source or downloaded from CI; out-of-tree, you need to install +- Compiler crates themselves, when using `rustc_private`. + In-tree, these are always present; out-of-tree, you need to install `rustc-dev` with `rustup`. +- Shared object file `libLLVM.so` for the LLVM project. + In-tree, this is either built from source or downloaded from CI; out-of-tree, you need to install `llvm-tools-preview` with `rustup`. -All the artifacts listed so far are *compiler* runtime dependencies. You can see -them with `rustc --print sysroot`: +All the artifacts listed so far are *compiler* runtime dependencies. +You can see them with `rustc --print sysroot`: ``` $ ls $(rustc --print sysroot)/lib @@ -293,8 +294,8 @@ librustc_driver-4f0cc9f50e53f0ba.so libtracing_attributes-e4be92c35ab2a33b.so librustc_macros-5f0ec4a119c6ac86.so rustlib ``` -There are also runtime dependencies for the standard library! 
These are in -`lib/rustlib/`, not `lib/` directly. +There are also runtime dependencies for the standard library! +These are in `lib/rustlib/`, not `lib/` directly. ``` $ ls $(rustc --print sysroot)/lib/rustlib/x86_64-unknown-linux-gnu/lib | head -n 5 @@ -306,48 +307,50 @@ libcompiler_builtins-ef2408da76957905.rlib ``` Directory `lib/rustlib/` includes libraries like `hashbrown` and `cfg_if`, which -are not part of the public API of the standard library, but are used to -implement it. Also `lib/rustlib/` is part of the search path for linkers, but +are not part of the public API of the standard library, but are used to implement it. +Also, `lib/rustlib/` is part of the search path for linkers, but `lib` will never be part of the search path. #### `-Z force-unstable-if-unmarked` Since `lib/rustlib/` is part of the search path we have to be careful about -which crates are included in it. In particular, all crates except for the +which crates are included in it. +In particular, all crates except for the standard library are built with the flag `-Z force-unstable-if-unmarked`, which means that you have to use `#![feature(rustc_private)]` in order to load it (as opposed to the standard library, which is always available). The `-Z force-unstable-if-unmarked` flag has a variety of purposes to help -enforce that the correct crates are marked as `unstable`. It was introduced -primarily to allow rustc and the standard library to link to arbitrary crates on -crates.io which do not themselves use `staged_api`. `rustc` also relies on this +enforce that the correct crates are marked as `unstable`. +It was introduced primarily to allow rustc and the standard library to link to arbitrary crates on +crates.io which do not themselves use `staged_api`. +`rustc` also relies on this flag to mark all of its crates as `unstable` with the `rustc_private` feature so that each crate does not need to be carefully marked with `unstable`. 
This flag is automatically applied to all of `rustc` and the standard library by -the bootstrap scripts. This is needed because the compiler and all of its +the bootstrap scripts. +This is needed because the compiler and all of its dependencies are shipped in `sysroot` to all users. This flag has the following effects: - Marks the crate as "`unstable`" with the `rustc_private` feature if it is not itself marked as `stable` or `unstable`. -- Allows these crates to access other forced-unstable crates without any need - for attributes. Normally a crate would need a `#![feature(rustc_private)]` - attribute to use other `unstable` crates. However, that would make it +- Allows these crates to access other forced-unstable crates without any need for attributes. + Normally, a crate would need a `#![feature(rustc_private)]` + attribute to use other `unstable` crates. + However, that would make it impossible for a crate from crates.io to access its own dependencies since that crate won't have a `feature(rustc_private)` attribute, but *everything* is compiled with `-Z force-unstable-if-unmarked`. Code which does not use `-Z force-unstable-if-unmarked` should include the -`#![feature(rustc_private)]` crate attribute to access these forced-unstable -crates. This is needed for things which link `rustc` itself, such as `Miri` or -`clippy`. +`#![feature(rustc_private)]` crate attribute to access these forced-unstable crates. +This is needed for things which link `rustc` itself, such as `Miri` or `clippy`. 
You can find more discussion about sysroots in: -- The [rustdoc PR] explaining why it uses `extern crate` for dependencies loaded - from `sysroot` +- The [rustdoc PR] explaining why it uses `extern crate` for dependencies loaded from `sysroot` - [Discussions about sysroot on Zulip](https://rust-lang.zulipchat.com/#narrow/stream/182449-t-compiler.2Fhelp/topic/deps.20in.20sysroot/) - [Discussions about building rustdoc out of @@ -358,31 +361,33 @@ You can find more discussion about sysroots in: ## Passing flags to commands invoked by `bootstrap` Conveniently `./x` allows you to pass stage-specific flags to `rustc` and -`cargo` when bootstrapping. The `RUSTFLAGS_BOOTSTRAP` environment variable is +`cargo` when bootstrapping. +The `RUSTFLAGS_BOOTSTRAP` environment variable is passed as `RUSTFLAGS` to the bootstrap stage (`stage0`), and `RUSTFLAGS_NOT_BOOTSTRAP` is passed when building artifacts for later stages. `RUSTFLAGS` will work, but also affects the build of `bootstrap` itself, so it -will be rare to want to use it. Finally, `MAGIC_EXTRA_RUSTFLAGS` bypasses the +will be rare to want to use it. +Finally, `MAGIC_EXTRA_RUSTFLAGS` bypasses the `cargo` cache to pass flags to rustc without recompiling all dependencies. - `RUSTDOCFLAGS`, `RUSTDOCFLAGS_BOOTSTRAP` and `RUSTDOCFLAGS_NOT_BOOTSTRAP` are analogous to `RUSTFLAGS`, but for `rustdoc`. - `CARGOFLAGS` will pass arguments to cargo itself (e.g. `--timings`). - `CARGOFLAGS_BOOTSTRAP` and `CARGOFLAGS_NOT_BOOTSTRAP` work analogously to - `RUSTFLAGS_BOOTSTRAP`. -- `--test-args` will pass arguments through to the test runner. For `tests/ui`, - this is `compiletest`. For unit tests and doc tests this is the `libtest` - runner. + `CARGOFLAGS_BOOTSTRAP` and `CARGOFLAGS_NOT_BOOTSTRAP` work analogously to `RUSTFLAGS_BOOTSTRAP`. +- `--test-args` will pass arguments through to the test runner. + For `tests/ui`, + this is `compiletest`. + For unit tests and doc tests, this is the `libtest` runner. 
Most test runners accept `--help`, which you can use to find out the options accepted by the runner. ## Environment Variables -During bootstrapping, there are a bunch of compiler-internal environment -variables that are used. If you are trying to run an intermediate version of -`rustc`, sometimes you may need to set some of these environment variables -manually. Otherwise, you get an error like the following: +During bootstrapping, there are a bunch of compiler-internal environment variables that are used. +If you are trying to run an intermediate version of +`rustc`, sometimes you may need to set some of these environment variables manually. +Otherwise, you get an error like the following: ```text thread 'main' panicked at 'RUSTC_STAGE was not set: NotPresent', library/core/src/result.rs:1165:5 @@ -390,14 +395,12 @@ thread 'main' panicked at 'RUSTC_STAGE was not set: NotPresent', library/core/sr If `./stageN/bin/rustc` gives an error about environment variables, that usually means something is quite wrong -- such as you're trying to compile `rustc` or -`std` or something which depends on environment variables. In the unlikely case -that you actually need to invoke `rustc` in such a situation, you can tell the -bootstrap shim to print all `env` variables by adding `-vvv` to your `x` -command. +`std` or something which depends on environment variables. +In the unlikely case that you actually need to invoke `rustc` in such a situation, you can tell the +bootstrap shim to print all `env` variables by adding `-vvv` to your `x` command. Finally, bootstrap makes use of the [cc-rs crate] which has [its own -method][env-vars] of configuring `C` compilers and `C` flags via environment -variables. +method][env-vars] of configuring `C` compilers and `C` flags via environment variables. [cc-rs crate]: https://github.com/rust-lang/cc-rs [env-vars]: https://docs.rs/cc/latest/cc/#external-configuration-via-environment-variables @@ -406,8 +409,7 @@ variables. 
In this part, we will investigate the build command's `stdout` in an action (similar, but more detailed and complete documentation compare to topic above). -When you execute `x build --dry-run` command, the build output will be something -like the following: +When you execute the `x build --dry-run` command, the build output will be something like the following: ```text Building stage0 library artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu) @@ -434,10 +436,10 @@ This copies the library and compiler artifacts from `cargo` into ### Assembling stage1 compiler This copies the libraries we built in "building `stage0` ... artifacts" into the -`stage1` compiler's `lib/` directory. These are the host libraries that the -compiler itself uses to run. These aren't actually used by artifacts the new -compiler generates. This step also copies the `rustc` and `rustdoc` binaries we -generated into `build/$HOST/stage/bin`. +`stage1` compiler's `lib/` directory. +These are the host libraries that the compiler itself uses to run. +These aren't actually used by artifacts the new compiler generates. +This step also copies the `rustc` and `rustdoc` binaries we generated into `build/$HOST/stage/bin`. The `stage1/bin/rustc` is a fully functional compiler built with stage0 (precompiled) compiler and std. To use a compiler built entirely from source with the in-tree compiler and std, you need to build the diff --git a/src/doc/rustc-dev-guide/src/building/suggested.md b/src/doc/rustc-dev-guide/src/building/suggested.md index 0014ba0e9a94c..959d0687b7fb5 100644 --- a/src/doc/rustc-dev-guide/src/building/suggested.md +++ b/src/doc/rustc-dev-guide/src/building/suggested.md @@ -442,6 +442,9 @@ If you're using the flake, make sure to also update it with the following comman nix flake update --flake ./src/tools/nix-dev-shell ``` +The shell creates a command named `x` that runs the `./x.py` script with all dependencies set up correctly. 
+ ### Note Note that when using nix on a not-NixOS distribution, it may be necessary to set diff --git a/src/doc/rustc-dev-guide/src/const-generics.md b/src/doc/rustc-dev-guide/src/const-generics.md index 344d9b1d26de8..3f84b99fb637e 100644 --- a/src/doc/rustc-dev-guide/src/const-generics.md +++ b/src/doc/rustc-dev-guide/src/const-generics.md @@ -5,16 +5,20 @@ Most of the kinds of `ty::Const` that exist have direct parallels to kinds of types that exist, for example `ConstKind::Param` is equivalent to `TyKind::Param`. The main interesting points here are: -- [`ConstKind::Unevaluated`], this is equivalent to `TyKind::Alias` and in the long term should be renamed (as well as introducing an `AliasConstKind` to parallel `ty::AliasKind`). -- [`ConstKind::Value`], this is the final value of a `ty::Const` after monomorphization. This is similar-ish to fully concrete to things like `TyKind::Str` or `TyKind::ADT`. +- [`ConstKind::Unevaluated`], which is equivalent to `TyKind::Alias` and in the long term should be renamed (as well as introducing an `AliasConstKind` to parallel `ty::AliasKind`). +- [`ConstKind::Value`], which is the final value of a `ty::Const` after monomorphization. + This is somewhat similar to fully concrete things like `TyKind::Str` or `TyKind::ADT`. For a complete list of *all* kinds of const arguments and how they are actually represented in the type system, see the [`ConstKind`] type. -Inference Variables are quite boring and treated equivalently to type inference variables almost everywhere. Const Parameters are also similarly boring and equivalent to uses of type parameters almost everywhere. However, there are some interesting subtleties with how they are handled during parsing, name resolution, and AST lowering: [ambig-unambig-ty-and-consts]. +Inference Variables are quite boring and treated equivalently to type inference variables almost everywhere. 
+Const Parameters are also similarly boring and equivalent to uses of type parameters almost everywhere. +However, there are some interesting subtleties with how they are handled during parsing, name resolution, and AST lowering: [ambig-unambig-ty-and-consts]. ## Anon Consts -Anon Consts (short for anonymous const items) are how arbitrary expression are represented in const generics, for example an array length of `1 + 1` or `foo()` or even just `0`. These are unique to const generics and have no real type equivalent. +Anon Consts (short for anonymous const items) are how arbitrary expressions are represented in const generics, for example an array length of `1 + 1` or `foo()` or even just `0`. +These are unique to const generics and have no real type equivalent. ### Desugaring @@ -31,7 +35,7 @@ const ANON: usize = 1 + 1; type Alias = [u8; ANON]; ``` -Where the array length in `[u8; ANON]` isn't itself an anon const containing a usage of `ANON`, but a kind of "direct" usage of the `ANON` const item ([`ConstKind::Unevaluated`]). +Where the array length in `[u8; ANON]` isn't itself an anon const containing a usage of `ANON`, but a kind of "direct" usage of the `ANON` const item ([`ConstKind::Unevaluated`]). Anon consts do not inherit any generic parameters of the item they are inside of: ```rust @@ -46,13 +50,19 @@ const ANON: usize = 1 + 1; type Alias = [T; ANON]; ``` -Note how the `ANON` const has no generic parameters or where clauses, even though `Alias` has both a type parameter `T` and a where clauses `T: Sized`. This desugaring is part of how we enforce that anon consts can't make use of generic parameters. +Note how the `ANON` const has no generic parameters or where clauses, even though `Alias` has both a type parameter `T` and a where clause `T: Sized`. +This desugaring is part of how we enforce that anon consts can't make use of generic parameters. 
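As a small illustrative example on stable Rust (the names are ours), each const argument expression below is lowered to its own anon const:

```rust
// Illustration only: every const argument expression is lowered to its own
// anonymous const item, which cannot see surrounding generic parameters.
fn len_of<const N: usize>(_arr: &[u8; N]) -> usize {
    N
}

fn main() {
    let a = [0u8; 1 + 1];              // the repeat count `1 + 1` is an anon const
    let b: [u8; { 21 * 2 }] = [7; 42]; // so is the block `{ 21 * 2 }`
    assert_eq!(len_of(&a), 2);
    assert_eq!(len_of(&b), 42);
    println!("{} {}", len_of(&a), len_of(&b));
}
```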
While it's useful to think of anon consts as being desugared to real const items, the compiler does not actually implement things this way. -At AST lowering time we do not yet know the *type* of the anon const, so we can't desugar to a real HIR item with an explicitly written type. To work around this we have [`DefKind::AnonConst`] and [`hir::Node::AnonConst`] which are used to represent these anonymous const items that can't actually be desugared. +At AST lowering time we do not yet know the *type* of the anon const, so we can't desugar to a real HIR item with an explicitly written type. +To work around this, we have [`DefKind::AnonConst`] and [`hir::Node::AnonConst`], +which are used to represent these anonymous const items that can't actually be desugared. -The types of these anon consts are obtainable from the [`type_of`] query. However, the `type_of` query does not actually contain logic for computing the type (infact it just ICEs when called), instead HIR Ty lowering is responsible for *feeding* the value of the `type_of` query for any anon consts that get lowered. HIR Ty lowering can determine the type of the anon const by looking at the type of the Const Parameter that the anon const is an argument to. +The types of these anon consts are obtainable from the [`type_of`] query. +However, the `type_of` query does not actually contain logic for computing the type (and, in fact, it just ICEs when called). +Instead, HIR Ty lowering is responsible for *feeding* the value of the `type_of` query for any anon consts that get lowered. +HIR Ty lowering can determine the type of the anon const by looking at the type of the Const Parameter that the anon const is an argument to. TODO: write a chapter on query feeding and link it here @@ -68,11 +78,12 @@ const ANON = 1 + 1; type Alias = [u8; ANON]; ``` -Where when we go through HIR ty lowering for the array type in `Alias`, we will lower the array length too and feed `type_of(ANON) -> usize`. 
Effectively setting the type of the `ANON` const item during some later part of the compiler rather than when constructing the HIR. +When we go through HIR ty lowering for the array type in `Alias`, we will lower the array length too, and feed `type_of(ANON) -> usize`. +This will effectively set the type of the `ANON` const item during some later part of the compiler rather than when constructing the HIR. After all of this desugaring has taken place the final representation in the type system (ie as a `ty::Const`) is a `ConstKind::Unevaluated` with the `DefId` of the `AnonConst`. This is equivalent to how we would representa a usage of an actual const item if we were to represent them without going through an anon const (e.g. when `min_generic_const_args` is enabled). -This allows the representation for const "aliases" to be the same as the representation of `TyKind::Alias`. Having a proper HIR body also allows for a *lot* of code re-use, e.g. we can reuse HIR typechecking and all of the lowering steps to MIR where we can then reuse const eval. +This allows the representation for const "aliases" to be the same as the representation of `TyKind::Alias`. Having a proper HIR body also allows for a *lot* of code re-use, e.g. we can reuse HIR typechecking and all of the lowering steps to MIR where we can then reuse const eval. ### Enforcing lack of Generic Parameters @@ -103,16 +114,19 @@ where (): Trait {} ``` -The second point is particularly subtle as it is very easy to get HIR Ty lowering wrong and not properly enforce that anon consts can't use generic parameters. The existing check is too conservative and accidentally permits some generic parameters to wind up in the body of the anon const [#144547](https://github.com/rust-lang/rust/issues/144547). +The second point is particularly subtle as it is very easy to get HIR Ty lowering wrong and not properly enforce that anon consts can't use generic parameters. 
+The existing check is too conservative and accidentally permits some generic parameters to wind up in the body of the anon const [#144547](https://github.com/rust-lang/rust/issues/144547).
 Erroneously allowing generic parameters in anon consts can sometimes lead to ICEs but can also lead to accepting illformed programs.
 
-The third point is also somewhat subtle, by not inheriting any of the where clauses of the parent item we can't wind up with the trait solving inferring inference variables to generic parameters based off where clauses in scope that mention generic parameters. For example inferring `?x=T` from the expression `<() as Trait>::ASSOC` and an in scope where clause of `(): Trait`.
+The third point is also somewhat subtle: by not inheriting any of the where clauses of the parent item, we can't wind up with the trait solver inferring inference variables to generic parameters based on where clauses in scope that mention generic parameters.
+For example, inferring `?x=T` from the expression `<() as Trait>::ASSOC` and an in-scope where clause of `(): Trait`.
 
 This also makes it much more likely that the compiler will ICE or atleast incidentally emit some kind of error if we *do* accidentally allow generic parameters in an anon const, as the anon const will have none of the necessary information in its environment to properly handle the generic parameters.
 
 #### Array repeat expressions
 
-The one exception to all of the above is repeat counts of array expressions. As a *backwards compatibility hack* we allow the repeat count const argument to use generic parameters.
+The one exception to all of the above is repeat counts of array expressions.
+As a *backwards compatibility hack*, we allow the repeat count const argument to use generic parameters.
 
```rust fn foo() { @@ -124,14 +138,17 @@ However, to avoid most of the problems involved in allowing generic parameters i In the previous example the anon const can be evaluated for any type parameter `T` because raw pointers to sized types always have the same size (e.g. `8` on 64bit platforms). -When detecting that we evaluated an anon const that syntactically contained generic parameters, but did not actually depend on them for evaluation to succeed, we emit the [`const_evaluatable_unchecked` FCW][cec_fcw]. This is intended to become a hard error once we stabilize more ways of using generic parameters in const arguments, for example `min_generic_const_args` or (the now dead) `generic_const_exprs`. +When detecting that we evaluated an anon const that syntactically contained generic parameters, but did not actually depend on them for evaluation to succeed, we emit the [`const_evaluatable_unchecked` FCW][cec_fcw]. +This is intended to become a hard error once we stabilize more ways of using generic parameters in const arguments, for example `min_generic_const_args` or (the now dead) `generic_const_exprs`. The implementation for this FCW can be found here: [`const_eval_resolve_for_typeck`] ### Incompatibilities with `generic_const_parameter_types` -Supporting const paramters such as `const N: [u8; M]` or `const N: Foo` does not work very nicely with the current anon consts setup. There are two reasons for this: -1. As anon consts cannot use generic parameters, their type *also* can't reference generic parameters. This means it is fundamentally not possible to use an anon const as an argument to a const parameeter whose type still references generic parameters. +Supporting const parameters such as `const N: [u8; M]` or `const N: Foo` does not work very nicely with the current anon consts setup. +There are two reasons for this: +1. As anon consts cannot use generic parameters, their type *also* can't reference generic parameters. 
+ This means it is fundamentally not possible to use an anon const as an argument to a const parameter whose type still references generic parameters. ```rust #![feature(adt_const_params, generic_const_parameter_types)] @@ -144,7 +161,8 @@ Supporting const paramters such as `const N: [u8; M]` or `const N: Foo` does } ``` -2. We currently require knowing the type of anon consts when lowering them during HIR ty lowering. With generic const parameter types it may be the case that the currently known type contains inference variables (ie may not be fully known yet). +2. We currently require knowing the type of anon consts when lowering them during HIR ty lowering. + With generic const parameter types it may be the case that the currently known type contains inference variables (ie may not be fully known yet). ```rust #![feature(adt_const_params, generic_const_parameter_types)] @@ -158,22 +176,27 @@ Supporting const paramters such as `const N: [u8; M]` or `const N: Foo` does } ``` -It is currently unclear what the right way to make `generic_const_parameter_types` work nicely with the rest of const generics is. +It is currently unclear what the right way to make `generic_const_parameter_types` work nicely with the rest of const generics is. -`generic_const_exprs` would have allowed for anon consts with types referencing generic parameters, but that design wound up unworkable. +`generic_const_exprs` would have allowed for anon consts with types referencing generic parameters, but that design wound up unworkable. `min_generic_const_args` will allow for some expressions (for example array construction) to be representable without an anon const and therefore without running into these issues, though whether this is *enough* has yet to be determined. ## Checking types of Const Arguments -In order for a const argument to be well formed it must have the same type as the const parameter it is an argument to. 
For example a const argument of type `bool` for an array length is not well formed, as an array's length parameter has type `usize`.
+In order for a const argument to be well formed it must have the same type as the const parameter it is an argument to.
+For example, a const argument of type `bool` for an array length is not well formed, as an array's length parameter has type `usize`.
 
 ```rust
 type Alias<const B: bool> = [u8; B];
-//~^ ERROR: 
+//~^ ERROR:
 ```
 
-To check this we have [`ClauseKind::ConstArgHasType(ty::Const, Ty)`][const_arg_has_type], where for each Const Parameter defined on an item we also desugar an equivalent `ConstArgHasType` clause into its list of where cluases. This ensures that whenever we check wellformedness of anything by proving all of its clauses, we also check happen to check that all of the Const Arguments have the correct type.
+To check this, we have [`ClauseKind::ConstArgHasType(ty::Const, Ty)`][const_arg_has_type], where,
+for each Const Parameter defined on an item,
+we also desugar an equivalent `ConstArgHasType` clause into its list of where clauses.
+This ensures that whenever we check wellformedness of anything by proving all of its clauses,
+we also happen to check that all of the Const Arguments have the correct type.
 
 ```rust
 fn foo<const N: usize>() {}
@@ -186,13 +209,15 @@ where
 N: usize,
 {}
 ```
 
-Proving `ConstArgHasType` goals is implemented by first computing the type of the const argument, then equating it with the provided type. 
+A rough outline of how the type of a Const Argument may be computed: - [`ConstKind::Param(N)`][`ConstKind::Param`] can be looked up in the [`ParamEnv`] to find a `ConstArgHasType(N, ty)` clause - [`ConstKind::Value`] stores the type of the value inside itself so can trivially be accessed - [`ConstKind::Unevaluated`] can have its type computed by calling the `type_of` query - See the implementation of proving `ConstArgHasType` goals for more detailed information -`ConstArgHasType` is *the* soundness critical way that we check Const Arguments have the correct type. However, we do *indirectly* check the types of Const Arguments a different way in some cases. +`ConstArgHasType` is *the* soundness critical way that we check Const Arguments have the correct type. +However, we do *indirectly* check the types of Const Arguments a different way in some cases. ```rust type Alias = [u8; true]; @@ -203,7 +228,10 @@ const ANON: usize = true; type Alias = [u8; ANON]; ``` -By feeding the type of an anon const with the type of the Const Parameter we guarantee that the `ConstArgHasType` goal involving the anon const will succeed. In cases where the type of the anon const doesn't match the type of the Const Parameter what actually happens is a *type checking* error when type checking the anon const's body. +By feeding the type of an anon const with the type of the Const Parameter, +we guarantee that the `ConstArgHasType` goal involving the anon const will succeed. +In cases where the type of the anon const doesn't match the type of the Const Parameter, +what actually happens is a *type checking* error when type checking the anon const's body. Looking at the above example, this corresponds to `[u8; ANON]` being a well formed type because `ANON` has type `usize`, but the *body* of `ANON` being illformed and resulting in a type checking error because `true` can't be returned from a const item of type `usize`. 
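Stepping outside the patch for a moment: the `ConstArgHasType` rule described in the hunks above can be illustrated from the user's side with a small, self-contained sketch (the function name `first_n` is hypothetical; this is ordinary user code, not compiler internals):

```rust
// A const parameter `N` of type `usize`; every const argument supplied
// for `N` must have type `usize`, which is what the compiler's
// `ConstArgHasType` clause enforces.
fn first_n<const N: usize>(bytes: &[u8]) -> [u8; N] {
    let mut out = [0u8; N];
    out.copy_from_slice(&bytes[..N]);
    out
}

fn main() {
    // `3` is a well-formed const argument because it has type `usize`.
    let head = first_n::<3>(&[1, 2, 3, 4]);
    assert_eq!(head, [1, 2, 3]);
    // Something like `first_n::<true>(..)` would be rejected, since a
    // `bool` argument fails the check described above, just like the
    // `[u8; true]` example in the text.
}
```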
diff --git a/src/doc/rustc-dev-guide/src/debugging-support-in-rustc.md b/src/doc/rustc-dev-guide/src/debugging-support-in-rustc.md
index f2193e8abf98e..e6984417086d8 100644
--- a/src/doc/rustc-dev-guide/src/debugging-support-in-rustc.md
+++ b/src/doc/rustc-dev-guide/src/debugging-support-in-rustc.md
@@ -19,8 +19,9 @@ According to Wikipedia
 > other programs (the "target" program).
 
 Writing a debugger from scratch for a language requires a lot of work, especially if
-debuggers have to be supported on various platforms. GDB and LLDB, however, can be
-extended to support debugging a language. This is the path that Rust has chosen.
+debuggers have to be supported on various platforms.
+GDB and LLDB, however, can be extended to support debugging a language.
+This is the path that Rust has chosen.
 This document's main goal is to document the said debuggers support in Rust compiler.
 
 ### DWARF
@@ -35,7 +36,8 @@ According to the [DWARF] standard website
 > as well as in stand-alone environments.
 
 DWARF reader is a program that consumes the DWARF format and creates debugger compatible output.
-This program may live in the compiler itself. DWARF uses a data structure called
+This program may live in the compiler itself.
+DWARF uses a data structure called
 Debugging Information Entry (DIE) which stores the information as "tags" to denote
 functions, variables etc., e.g., `DW_TAG_variable`, `DW_TAG_pointer_type`, `DW_TAG_subprogram` etc.
 You can also invent your own tags and attributes.
@@ -45,7 +47,8 @@ You can also invent your own tags and attributes.
 [PDB] (Program Database) is a file format created by Microsoft that contains debug information.
 PDBs can be consumed by debuggers such as WinDbg/CDB and other tools to display debug information.
 A PDB contains multiple streams that describe debug information about a specific binary such
-as types, symbols, and source files used to compile the given binary. 
CodeView is another
+as types, symbols, and source files used to compile the given binary.
+CodeView is another
 format which defines the structure of [symbol records] and [type records] that appear within
 PDB streams.
 
@@ -61,15 +64,18 @@ and can parse only a subset of Rust expressions.
 GDB parser was written from scratch and has no relation to any other parser,
 including that of rustc.
 
-GDB has Rust-like value and type output. It can print values and types in a way
-that look like Rust syntax in the output. Or when you print a type as [ptype] in GDB,
-it also looks like Rust source code. Checkout the documentation in the [manual for GDB/Rust].
+GDB has Rust-like value and type output.
+It can print values and types in a way that looks like Rust syntax in the output.
+Or when you print a type as [ptype] in GDB,
+it also looks like Rust source code.
+Check out the documentation in the [manual for GDB/Rust].
 
 #### Parser extensions
 
 Expression parser has a couple of extensions in it to facilitate features that you cannot do
-with Rust. Some limitations are listed in the [manual for GDB/Rust]. There is some special
-code in the DWARF reader in GDB to support the extensions.
+with Rust.
+Some limitations are listed in the [manual for GDB/Rust].
+There is some special code in the DWARF reader in GDB to support the extensions.
 
 A couple of examples of DWARF reader support needed are as follows:
 
@@ -81,12 +87,14 @@ A couple of examples of DWARF reader support needed are as follows:
 2. Dissect trait objects: DWARF extension where the trait object's description in the DWARF
    also points to a stub description of the corresponding vtable which in turn points to the
-   concrete type for which this trait object exists. This means that you can do a `print *object`
+   concrete type for which this trait object exists.
+   This means that you can do a `print *object`
    for that trait object, and GDB will understand how to find the correct type of the payload in
    the trait object.
 
**TODO**: Figure out if the following should be mentioned in the GDB-Rust document rather than -this guide page so there is no duplication. This is regarding the following comments: +this guide page so there is no duplication. +This is regarding the following comments: [This comment by Tom](https://github.com/rust-lang/rustc-dev-guide/pull/316#discussion_r284027340) > gdb's Rust extensions and limitations are documented in the gdb manual: @@ -102,7 +110,8 @@ document so there is no duplication etc.? #### Rust expression parser -This expression parser is written in C++. It is a type of [Recursive Descent parser]. +This expression parser is written in C++. +It is a type of [Recursive Descent parser]. It implements slightly less of the Rust language than GDB. LLDB has Rust-like value and type output. @@ -114,19 +123,21 @@ LLDB has Rust-like value and type output. ### WinDbg/CDB Microsoft provides [Windows Debugging Tools] such as the Windows Debugger (WinDbg) and -the Console Debugger (CDB) which both support debugging programs written in Rust. These -debuggers parse the debug info for a binary from the `PDB`, if available, to construct a +the Console Debugger (CDB) which both support debugging programs written in Rust. +These debuggers parse the debug info for a binary from the `PDB`, if available, to construct a visualization to serve up in the debugger. #### Natvis Both WinDbg and CDB support defining and viewing custom visualizations for any given type -within the debugger using the Natvis framework. The Rust compiler defines a set of Natvis +within the debugger using the Natvis framework. +The Rust compiler defines a set of Natvis files that define custom visualizations for a subset of types in the standard libraries such -as, `std`, `core`, and `alloc`. These Natvis files are embedded into `PDBs` generated by the +as, `std`, `core`, and `alloc`. 
+These Natvis files are embedded into `PDBs` generated by the `*-pc-windows-msvc` target triples to automatically enable these custom visualizations when -debugging. This default can be overridden by setting the `strip` rustc flag to either `debuginfo` -or `symbols`. +debugging. +This default can be overridden by setting the `strip` rustc flag to either `debuginfo` or `symbols`. Rust has support for embedding Natvis files for crates outside of the standard libraries by using the `#[debugger_visualizer]` attribute. @@ -147,17 +158,17 @@ We have some DWARF extensions that the Rust compiler emits and the debuggers und are _not_ in the DWARF standard. * Rust compiler will emit DWARF for a virtual table, and this `vtable` object will have a - `DW_AT_containing_type` that points to the real type. This lets debuggers dissect a trait object - pointer to correctly find the payload. E.g., here's such a DIE, from a test case in the gdb - repository: - - ```asm - <1><1a9>: Abbrev Number: 3 (DW_TAG_structure_type) - <1aa> DW_AT_containing_type: <0x1b4> - <1ae> DW_AT_name : (indirect string, offset: 0x23d): vtable - <1b2> DW_AT_byte_size : 0 - <1b3> DW_AT_alignment : 8 - ``` + `DW_AT_containing_type` that points to the real type. + This lets debuggers dissect a trait object pointer to correctly find the payload. + Here is an example of such a DIE, from a test case in the gdb repository: + + ```asm + <1><1a9>: Abbrev Number: 3 (DW_TAG_structure_type) + <1aa> DW_AT_containing_type: <0x1b4> + <1ae> DW_AT_name : (indirect string, offset: 0x23d): vtable + <1b2> DW_AT_byte_size : 0 + <1b3> DW_AT_alignment : 8 + ``` * The other extension is that the Rust compiler can emit a tagless discriminated union. See [DWARF feature request] for this item. @@ -165,7 +176,8 @@ are _not_ in the DWARF standard. ### Current limitations of DWARF * Traits - require a bigger change than normal to DWARF, on how to represent Traits in DWARF. 
-* DWARF provides no way to differentiate between Structs and Tuples. Rust compiler emits
+* DWARF provides no way to differentiate between Structs and Tuples.
+  The Rust compiler emits
   fields with `__0` and debuggers look for a sequence of such names to overcome this limitation.
   For example, in this case the debugger would look at a field via `x.__0` instead of `x.0`.
   This is resolved via the Rust parser in the debugger so now you can do `x.0`.
 
@@ -189,40 +201,46 @@ According to Wikipedia, [System Integrity Protection] is
 > files and directories against modifications by processes without a specific "entitlement",
 > even when executed by the root user or a user with root privileges (sudo).
 
-It prevents processes using `ptrace` syscall. If a process wants to use `ptrace` it has to be
-code signed. The certificate that signs it has to be trusted on your machine.
+It prevents processes using the `ptrace` syscall.
+If a process wants to use `ptrace`, it has to be code signed.
+The certificate that signs it has to be trusted on your machine.
 See [Apple developer documentation for System Integrity Protection].
 
-We may need to sign up with Apple and get the keys to do this signing. Tom has looked into if
-Mozilla cannot do this because it is at the maximum number of
-keys it is allowed to sign. Tom does not know if Mozilla could get more keys.
+We may need to sign up with Apple and get the keys to do this signing.
+Tom has looked into this; Mozilla cannot do it because it is at the maximum number of
+keys it is allowed to sign.
+Tom does not know if Mozilla could get more keys.
 Alternatively, Tom suggests that maybe a Rust legal entity is needed to get the keys via Apple.
 
-This problem is not technical in nature. If we had such a key we could sign GDB as well and
-ship that.
+This problem is not technical in nature.
+If we had such a key, we could sign GDB as well and ship that.
 
 ### DWARF and Traits
 
-Rust traits are not emitted into DWARF at all. 
The impact of this is calling a method `x.method()`
-does not work as is. The reason being that method is implemented by a trait, as opposed
-to a type. That information is not present so finding trait methods is missing.
+Rust traits are not emitted into DWARF at all.
+The impact of this is that calling a method `x.method()` does not work as-is.
+The reason being that the method is implemented by a trait, as opposed to a type.
+That information is not present, so support for finding trait methods is missing.
 
-DWARF has a notion of interface types (possibly added for Java). Tom's idea was to use this
-interface type as traits.
+DWARF has a notion of interface types (possibly added for Java).
+Tom's idea was to use this interface type as traits.
 
-DWARF only deals with concrete names, not the reference types. So, a given implementation of a
-trait for a type would be one of these interfaces (`DW_tag_interface` type). Also, the type for
-which it is implemented would describe all the interfaces this type implements. This requires a
-DWARF extension.
+DWARF only deals with concrete names, not the reference types.
+So, a given implementation of a
+trait for a type would be one of these interfaces (`DW_tag_interface` type).
+Also, the type for which it is implemented would describe all the interfaces this type implements.
+This requires a DWARF extension.
 
 Issue on GitHub: [https://github.com/rust-lang/rust/issues/33014]
 
 ## Typical process for a Debug Info change (LLVM)
 
-LLVM has Debug Info (DI) builders. This is the primary thing that Rust calls into.
+LLVM has Debug Info (DI) builders.
+This is the primary thing that Rust calls into.
 This is why we need to change LLVM first because that is emitted first and not DWARF directly.
 
-This is a kind of metadata that you construct and hand-off to LLVM. For the Rustc/LLVM hand-off
+This is a kind of metadata that you construct and hand-off to LLVM. 
+For the Rustc/LLVM hand-off, some LLVM DI builder methods are called to construct representation of a type. The steps of this process are as follows: @@ -246,7 +264,8 @@ The steps of this process are as follows: ### Procedural macro stepping A deeply profound question is that how do you actually debug a procedural macro? -What is the location you emit for a macro expansion? Consider some of the following cases - +What is the location you emit for a macro expansion? +Consider some of the following cases - * You can emit location of the invocation of the macro. * You can emit the location of the definition of the macro. @@ -254,9 +273,10 @@ What is the location you emit for a macro expansion? Consider some of the follow RFC: [https://github.com/rust-lang/rfcs/pull/2117] -Focus is to let macros decide what to do. This can be achieved by having some kind of attribute -that lets the macro tell the compiler where the line marker should be. This affects where you -set the breakpoints and what happens when you step it. +Focus is to let macros decide what to do. +This can be achieved by having some kind of attribute +that lets the macro tell the compiler where the line marker should be. +This affects where you set the breakpoints and what happens when you step it. ## Source file checksums in debug info @@ -264,16 +284,20 @@ Both DWARF and CodeView (PDB) support embedding a cryptographic hash of each sou contributed to the associated binary. The cryptographic hash can be used by a debugger to verify that the source file matches the -executable. If the source file does not match, the debugger can provide a warning to the user. +executable. +If the source file does not match, the debugger can provide a warning to the user. The hash can also be used to prove that a given source file has not been modified since it was -used to compile an executable. Because MD5 and SHA1 both have demonstrated vulnerabilities, +used to compile an executable. 
+Because MD5 and SHA1 both have demonstrated vulnerabilities,
 using SHA256 is recommended for this application.
 
 The Rust compiler stores the hash for each source file in the corresponding `SourceFile` in
-the `SourceMap`. The hashes of input files to external crates are stored in `rlib` metadata.
+the `SourceMap`.
+The hashes of input files to external crates are stored in `rlib` metadata.
 
-A default hashing algorithm is set in the target specification. This allows the target to
+A default hashing algorithm is set in the target specification.
+This allows the target to
 specify the best hash available, since not all targets support all hash algorithms.
 
 The hashing algorithm for a target can also be overridden with the `-Z source-file-checksum=`
@@ -304,19 +328,21 @@ Clang always embeds an MD5 checksum, though this does not appear in documentatio
 * New demangler in `libiberty` (gcc source tree).
 * New demangler in LLVM or LLDB.
 
-**TODO**: Check the location of the demangler source. [#1157](https://github.com/rust-lang/rustc-dev-guide/issues/1157)
+**TODO**: Check the location of the demangler source.
+[#1157](https://github.com/rust-lang/rustc-dev-guide/issues/1157)
 
 #### Reuse Rust compiler for expressions
 
-This is an important idea because debuggers by and large do not try to implement type
-inference. You need to be much more explicit when you type into the debugger than your
-actual source code. So, you cannot just copy and paste an expression from your source
-code to debugger and expect the same answer but this would be nice. This can be helped
-by using compiler.
+This is an important idea because debuggers by and large do not try to implement type inference.
+You need to be much more explicit when you type into the debugger than your actual source code.
+So, you cannot just copy and paste an expression from your source
+code to the debugger and expect the same answer, but this would be nice.
+This can be helped by using the compiler. 
-It is certainly doable but it is a large project. You certainly need a bridge to the
-debugger because the debugger alone has access to the memory. Both GDB (gcc) and LLDB (clang)
-have this feature. LLDB uses Clang to compile code to JIT and GDB can do the same with GCC.
+It is certainly doable, but it is a large project.
+You certainly need a bridge to the debugger because the debugger alone has access to the memory.
+Both GDB (gcc) and LLDB (clang) have this feature.
+LLDB uses Clang to compile code to JIT and GDB can do the same with GCC.
 
 Both debuggers expression evaluation implement both a superset and a subset of Rust.
 They implement just the expression language,
diff --git a/src/doc/rustc-dev-guide/src/diagnostics.md b/src/doc/rustc-dev-guide/src/diagnostics.md
index 1ed19663118f8..bdd7a3dfa9b8e 100644
--- a/src/doc/rustc-dev-guide/src/diagnostics.md
+++ b/src/doc/rustc-dev-guide/src/diagnostics.md
@@ -26,31 +26,35 @@ LL | more code
 
 - Level (`error`, `warning`, etc.). It indicates the severity of the message.
   (See [diagnostic levels](#diagnostic-levels))
-- Code (for example, for "mismatched types", it is `E0308`). It helps
-  users get more information about the current error through an extended
-  description of the problem in the error code index. Not all diagnostic have a
-  code. For example, diagnostics created by lints don't have one.
-- Message. It is the main description of the problem. It should be general and
-  able to stand on its own, so that it can make sense even in isolation.
-- Diagnostic window. This contains several things:
+- Code (for example, for "mismatched types", it is `E0308`).
+  It helps users get more information about the current error through an extended
+  description of the problem in the error code index.
+  Not all diagnostics have a code.
+  For example, diagnostics created by lints don't have one.
+- Message.
+  It is the main description of the problem. 
+ It should be general and able to stand on its own, so that it can make sense even in isolation. +- Diagnostic window. + This contains several things: - The path, line number and column of the beginning of the primary span. - The users' affected code and its surroundings. - - Primary and secondary spans underlying the users' code. These spans can - optionally contain one or more labels. + - Primary and secondary spans underlying the users' code. + These spans can optionally contain one or more labels. - Primary spans should have enough text to describe the problem in such a way that if it were the only thing being displayed (for example, in an - IDE) it would still make sense. Because it is "spatially aware" (it - points at the code), it can generally be more succinct than the error - message. + IDE) it would still make sense. + Because it is "spatially aware" (it + points at the code), it can generally be more succinct than the error message. - If cluttered output can be foreseen in cases when multiple span labels - overlap, it is a good idea to tweak the output appropriately. For - example, the `if/else arms have incompatible types` error uses different + overlap, it is a good idea to tweak the output appropriately. + For example, the `if/else arms have incompatible types` error uses different spans depending on whether the arms are all in the same line, if one of the arms is empty and if none of those cases applies. -- Sub-diagnostics. Any error can have multiple sub-diagnostics that look - similar to the main part of the error. These are used for cases where the - order of the explanation might not correspond with the order of the code. If - the order of the explanation can be "order free", leveraging secondary labels +- Sub-diagnostics. + Any error can have multiple sub-diagnostics that look similar to the main part of the error. + These are used for cases where the + order of the explanation might not correspond with the order of the code. 
+ If the order of the explanation can be "order free", leveraging secondary labels in the main diagnostic is preferred, as it is typically less verbose. The text should be matter of fact and avoid capitalization and periods, unless @@ -69,22 +73,23 @@ error: the identifier `foo.bar` is invalid ### Error codes and explanations -Most errors have an associated error code. Error codes are linked to long-form +Most errors have an associated error code. +Error codes are linked to long-form explanations which contains an example of how to trigger the error and in-depth -details about the error. They may be viewed with the `--explain` flag, or via -the [error index]. +details about the error. +They may be viewed with the `--explain` flag, or via the [error index]. As a general rule, give an error a code (with an associated explanation) if the -explanation would give more information than the error itself. A lot of the time -it's better to put all the information in the emitted error itself. However, +explanation would give more information than the error itself. +A lot of the time it's better to put all the information in the emitted error itself. +However, sometimes that would make the error verbose or there are too many possible triggers to include useful information for all cases in the error, in which case it's a good idea to add an explanation.[^estebank] As always, if you are not sure, just ask your reviewer! If you decide to add a new error with an associated error code, please read -[this section][error-codes] for a guide and important details about the -process. +[this section][error-codes] for a guide and important details about the process. [^estebank]: This rule of thumb was suggested by **@estebank** [here][estebank-comment]. @@ -94,29 +99,27 @@ process. ### Lints versus fixed diagnostics -Some messages are emitted via [lints](#lints), where the user can control the -level. Most diagnostics are hard-coded such that the user cannot control the -level. 
+Some messages are emitted via [lints](#lints), where the user can control the level. +Most diagnostics are hard-coded such that the user cannot control the level. Usually it is obvious whether a diagnostic should be "fixed" or a lint, but there are some grey areas. Here are a few examples: -- Borrow checker errors: these are fixed errors. The user cannot adjust the - level of these diagnostics to silence the borrow checker. -- Dead code: this is a lint. While the user probably doesn't want dead code in - their crate, making this a hard error would make refactoring and development - very painful. -- [future-incompatible lints]: - these are silenceable lints. +- Borrow checker errors: these are fixed errors. + The user cannot adjust the level of these diagnostics to silence the borrow checker. +- Dead code: this is a lint. + While the user probably doesn't want dead code in + their crate, making this a hard error would make refactoring and development very painful. +- [future-incompatible lints]: these are silenceable lints. It was decided that making them fixed errors would cause too much breakage, so warnings are instead emitted, and will eventually be turned into fixed (hard) errors. Hard-coded warnings (those using methods like `span_warn`) should be avoided -for normal code, preferring to use lints instead. Some cases, such as warnings -with CLI flags, will require the use of hard-coded warnings. +for normal code, preferring to use lints instead. +Some cases, such as warnings with CLI flags, will require the use of hard-coded warnings. See the `deny` [lint level](#diagnostic-levels) below for guidelines when to use an error-level lint instead of a fixed error. @@ -125,56 +128,54 @@ use an error-level lint instead of a fixed error. ## Diagnostic output style guide -- Write in plain simple English. If your message, when shown on a – possibly +- Write in plain simple English. 
+ If your message, when shown on a – possibly small – screen (which hasn't been cleaned for a while), cannot be understood by a normal programmer, who just came out of bed after a night partying, it's too complex. - `Error`, `Warning`, `Note`, and `Help` messages start with a lowercase letter and do not end with punctuation. -- Error messages should be succinct. Users will see these error messages many - times, and more verbose descriptions can be viewed with the `--explain` - flag. That said, don't make it so terse that it's hard to understand. -- The word "illegal" is illegal. Prefer "invalid" or a more specific word - instead. +- Error messages should be succinct. + Users will see these error messages many + times, and more verbose descriptions can be viewed with the `--explain` flag. + That said, don't make it so terse that it's hard to understand. +- The word "illegal" is illegal. + Prefer "invalid" or a more specific word instead. - Errors should document the span of code where they occur (use [`rustc_errors::DiagCtxt`][DiagCtxt]'s - `span_*` methods or a diagnostic struct's `#[primary_span]` to easily do - this). Also `note` other spans that have contributed to the error if the span - isn't too large. + `span_*` methods or a diagnostic struct's `#[primary_span]` to easily do this). + Also `note` other spans that have contributed to the error if the span isn't too large. - When emitting a message with span, try to reduce the span to the smallest amount possible that still signifies the issue -- Try not to emit multiple error messages for the same error. This may require - detecting duplicates. +- Try not to emit multiple error messages for the same error. + This may require detecting duplicates. - When the compiler has too little information for a specific error message, consult with the compiler team to add new attributes for library code that - allow adding more information. For example see - [`#[rustc_on_unimplemented]`](#rustc_on_unimplemented). 
Use these
-  annotations when available!
+  allow adding more information.
+  For example, see [`#[rustc_on_unimplemented]`](#rustc_on_unimplemented).
+  Use these annotations when available!
 - Keep in mind that Rust's learning curve is rather steep, and that the
   compiler messages are an important learning tool.
-- When talking about the compiler, call it `the compiler`, not `Rust` or
-  `rustc`.
-- Use the [Oxford comma](https://en.wikipedia.org/wiki/Serial_comma) when
-  writing lists of items.
+- When talking about the compiler, call it `the compiler`, not `Rust` or `rustc`.
+- Use the [Oxford comma](https://en.wikipedia.org/wiki/Serial_comma) when writing lists of items.

 ### Lint naming

-From [RFC 0344], lint names should be consistent, with the following
-guidelines:
+From [RFC 0344], lint names should be consistent, with the following guidelines:

 The basic rule is: the lint name should make sense when read as "allow
-*lint-name*" or "allow *lint-name* items". For example, "allow
-`deprecated` items" and "allow `dead_code`" makes sense, while "allow
+*lint-name*" or "allow *lint-name* items".
+For example, "allow `deprecated` items" and "allow `dead_code`" make sense, while "allow
 `unsafe_block`" is ungrammatical (should be plural).

 - Lint names should state the bad thing being checked for, e.g. `deprecated`,
-  so that `#[allow(deprecated)]` (items) reads correctly. Thus `ctypes` is not
-  an appropriate name; `improper_ctypes` is.
+  so that `#[allow(deprecated)]` (items) reads correctly.
+  Thus, `ctypes` is not an appropriate name; `improper_ctypes` is.
 - Lints that apply to arbitrary items (like the stability lints) should just
-  mention what they check for: use `deprecated` rather than
-  `deprecated_items`. This keeps lint names short. (Again, think "allow
-  *lint-name* items".)
+  mention what they check for: use `deprecated` rather than `deprecated_items`.
+  This keeps lint names short.
+  (Again, think "allow *lint-name* items".)
- If a lint applies to a specific grammatical class, mention that class and use the plural form: use `unused_variables` rather than `unused_variable`. @@ -197,65 +198,64 @@ Guidelines for different diagnostic levels: - `warning`: emitted when the compiler detects something odd about a program. Care should be taken when adding warnings to avoid warning fatigue, and - avoid false-positives where there really isn't a problem with the code. Some - examples of when it is appropriate to issue a warning: + avoid false-positives where there really isn't a problem with the code. + Some examples of when it is appropriate to issue a warning: - A situation where the user *should* take action, such as swap out a - deprecated item, or use a `Result`, but otherwise doesn't prevent - compilation. - - Unnecessary syntax that can be removed without affecting the semantics of - the code. For example, unused code, or unnecessary `unsafe`. + deprecated item, or use a `Result`, but otherwise doesn't prevent compilation. + - Unnecessary syntax that can be removed without affecting the semantics of the code. + For example, unused code, or unnecessary `unsafe`. - Code that is very likely to be incorrect, dangerous, or confusing, but the - language technically allows, and is not ready or confident enough to make - an error. For example `unused_comparisons` (out of bounds comparisons) or + language technically allows, and is not ready or confident enough to make an error. + Examples are `unused_comparisons` (out of bounds comparisons) or `bindings_with_variant_name` (the user likely did not intend to create a binding in a pattern). - [Future-incompatible lints](#future-incompatible), where something was accidentally or erroneously accepted in the past, but rejecting would cause excessive breakage in the ecosystem. - - Stylistic choices. For example, camel or snake case, or the `dyn` trait - warning in the 2018 edition. 
These have a high bar to be added, and should - only be used in exceptional circumstances. Other stylistic choices should - either be allow-by-default lints, or part of other tools like Clippy or - rustfmt. + - Stylistic choices. + For example, camel or snake case, or the `dyn` trait warning in the 2018 edition. + These have a high bar to be added, and should only be used in exceptional circumstances. + Other stylistic choices should + either be allow-by-default lints, or part of other tools like Clippy or rustfmt. - `help`: emitted following an `error` or `warning` to give additional - information to the user about how to solve their problem. These messages - often include a suggestion string and [`rustc_errors::Applicability`] - confidence level to guide automated source fixes by tools. See the - [Suggestions](#suggestions) section for more details. + information to the user about how to solve their problem. + These messages often include a suggestion string and [`rustc_errors::Applicability`] + confidence level to guide automated source fixes by tools. + See the [Suggestions](#suggestions) section for more details. The error or warning portion should *not* suggest how to fix the problem, only the "help" sub-diagnostic should. - `note`: emitted to give more context and identify additional circumstances - and parts of the code that caused the warning or error. For example, the - borrow checker will note any previous conflicting borrows. + and parts of the code that caused the warning or error. + For example, the borrow checker will note any previous conflicting borrows. `help` vs `note`: `help` should be used to show changes the user can - possibly make to fix the problem. `note` should be used for everything else, + possibly make to fix the problem. + `note` should be used for everything else, such as other context, information and facts, online resources to read, etc. 
Not to be confused with *lint levels*, whose guidelines are: - `forbid`: Lints should never default to `forbid`. -- `deny`: Equivalent to `error` diagnostic level. Some examples: +- `deny`: Equivalent to `error` diagnostic level. + Some examples: - - A future-incompatible or edition-based lint that has graduated from the - warning level. + - A future-incompatible or edition-based lint that has graduated from the warning level. - Something that has an extremely high confidence that is incorrect, but still want an escape hatch to allow it to pass. -- `warn`: Equivalent to the `warning` diagnostic level. See `warning` above - for guidelines. +- `warn`: Equivalent to the `warning` diagnostic level. + See `warning` above for guidelines. - `allow`: Examples of the kinds of lints that should default to `allow`: - The lint has a too high false positive rate. - The lint is too opinionated. - The lint is experimental. - The lint is used for enforcing something that is not normally enforced. - For example, the `unsafe_code` lint can be used to prevent usage of unsafe - code. + For example, the `unsafe_code` lint can be used to prevent usage of unsafe code. More information about lint levels can be found in the [rustc book][rustc-lint-levels] and the [reference][reference-diagnostics]. @@ -270,16 +270,15 @@ book][rustc-lint-levels] and the [reference][reference-diagnostics]. There are three main ways to find where a given error is emitted: -- `grep` for either a sub-part of the error message/label or error code. This - usually works well and is straightforward, but there are some cases where +- `grep` for either a sub-part of the error message/label or error code. + This usually works well and is straightforward, but there are some cases where the code emitting the error is removed from the code where the error is - constructed behind a relatively deep call-stack. Even then, it is a good way - to get your bearings. + constructed behind a relatively deep call-stack. 
+ Even then, it is a good way to get your bearings. - Invoking `rustc` with the nightly-only flag `-Z treat-err-as-bug=1` will treat the first error being emitted as an Internal Compiler Error, which - allows you to get a - stack trace at the point the error has been emitted. Change the `1` to - something else if you wish to trigger on a later error. + allows you to get a stack trace at the point the error has been emitted. + Change the `1` to something else if you wish to trigger on a later error. There are limitations with this approach: - Some calls get elided from the stack trace because they get inlined in the compiled `rustc`. @@ -296,7 +295,8 @@ order things are happening. ## `Span` [`Span`][span] is the primary data structure in `rustc` used to represent a -location in the code being compiled. `Span`s are attached to most constructs in +location in the code being compiled. +`Span`s are attached to most constructs in HIR and MIR, allowing for more informative error reporting. [span]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.Span.html @@ -310,21 +310,21 @@ similar methods on the `SourceMap`. ## Error messages -The [`rustc_errors`][errors] crate defines most of the utilities used for -reporting errors. +The [`rustc_errors`][errors] crate defines most of the utilities used for reporting errors. [errors]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_errors/index.html -Diagnostics can be implemented as types which implement the `Diagnostic` -trait. This is preferred for new diagnostics as it enforces a separation -between diagnostic emitting logic and the main code paths. For less-complex -diagnostics, the `Diagnostic` trait can be derived -- see [Diagnostic -structs][diagnostic-structs]. Within the trait implementation, the APIs -described below can be used as normal. +Diagnostics can be implemented as types which implement the `Diagnostic` trait. 
+This is preferred for new diagnostics as it enforces a separation +between diagnostic emitting logic and the main code paths. +For less-complex diagnostics, the `Diagnostic` trait can be derived -- see [Diagnostic +structs][diagnostic-structs]. +Within the trait implementation, the APIs described below can be used as normal. [diagnostic-structs]: ./diagnostics/diagnostic-structs.md -[`DiagCtxt`][DiagCtxt] has methods that create and emit errors. These methods +[`DiagCtxt`][DiagCtxt] has methods that create and emit errors. +These methods usually have names like `span_err` or `struct_span_err` or `span_warn`, etc... There are lots of them; they emit different types of "errors", such as warnings, errors, fatal errors, suggestions, etc. @@ -332,10 +332,10 @@ warnings, errors, fatal errors, suggestions, etc. [DiagCtxt]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_errors/struct.DiagCtxt.html In general, there are two classes of such methods: ones that emit an error -directly and ones that allow finer control over what to emit. For example, +directly and ones that allow finer control over what to emit. +For example, [`span_err`][spanerr] emits the given error message at the given `Span`, but -[`struct_span_err`][strspanerr] instead returns a -[`Diag`][diag]. +[`struct_span_err`][strspanerr] instead returns a [`Diag`][diag]. Most of these methods will accept strings, but it is recommended that typed identifiers for translatable diagnostics be used for new diagnostics (see @@ -344,8 +344,8 @@ identifiers for translatable diagnostics be used for new diagnostics (see [translation]: ./diagnostics/translation.md `Diag` allows you to add related notes and suggestions to an error -before emitting it by calling the [`emit`][emit] method. (Failing to either -emit or [cancel][cancel] a `Diag` will result in an ICE.) See the +before emitting it by calling the [`emit`][emit] method. +(Failing to either emit or [cancel] a `Diag` will result in an ICE.) 
See the
 [docs][diag] for more info on what you can do.

 [spanerr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_errors/struct.DiagCtxt.html#method.span_err

@@ -384,7 +384,8 @@ example-example-error = oh no! this is an error!

 ## Suggestions

 In addition to telling the user exactly _why_ their code is wrong, it's
-oftentimes furthermore possible to tell them how to fix it. To this end,
+often also possible to tell them how to fix it.
+To this end,
 [`Diag`][diag] offers a structured suggestions API, which formats code
 suggestions pleasingly in the terminal, or (when the `--error-format json` flag
 is passed) as JSON for consumption by tools like [`rustfix`][rustfix].

@@ -395,25 +396,22 @@ is passed) as JSON for consumption by tools like [`rustfix`][rustfix].

 Not all suggestions should be applied mechanically, they have a degree of
 confidence in the suggested code, from high (`Applicability::MachineApplicable`)
 to low (`Applicability::MaybeIncorrect`).
-Be conservative when choosing the level. Use the
-[`span_suggestion`][span_suggestion] method of `Diag` to
-make a suggestion. The last argument provides a hint to tools whether
-the suggestion is mechanically applicable or not.
+Be conservative when choosing the level.
+Use the [`span_suggestion`][span_suggestion] method of `Diag` to
+make a suggestion.
+The last argument provides a hint to tools about whether the suggestion is mechanically applicable.

 Suggestions point to one or more spans with corresponding code that will
 replace their current content.
-The message that accompanies them should be understandable in the following
-contexts:
+The message that accompanies them should be understandable in the following contexts:

 - shown as an independent sub-diagnostic (this is the default output)
 - shown as a label pointing at the affected span (this is done automatically if
   some heuristics for verbosity are met)
 - shown as a `help` sub-diagnostic with no content (used for cases where the
-suggestion is obvious from the text, but we still want to let tools to apply
-them)
-- not shown (used for _very_ obvious cases, but we still want to allow tools to
-apply them)
+suggestion is obvious from the text, but we still want to let tools apply them)
+- not shown (used for _very_ obvious cases, but we still want to allow tools to apply them)

 [span_suggestion]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_errors/struct.Diag.html#method.span_suggestion

@@ -474,8 +472,8 @@ The possible values of [`Applicability`][appl] are:

 - `MachineApplicable`: Can be applied mechanically.
 - `HasPlaceholders`: Cannot be applied mechanically because it has placeholder
-  text in the suggestions. For example: ```try adding a type: `let x:
-  ` ```.
+  text in the suggestions.
+  For example: ```try adding a type: `let x: ` ```.
 - `MaybeIncorrect`: Cannot be applied mechanically because the suggestion may
   or may not be a good one.
 - `Unspecified`: Cannot be applied mechanically because we don't know which

@@ -485,43 +483,40 @@ The possible values of [`Applicability`][appl] are:

 ### Suggestion Style Guide

-- Suggestions should not be a question. In particular, language like "did you
-  mean" should be avoided. Sometimes, it's unclear why a particular suggestion
-  is being made. In these cases, it's better to be upfront about what the
-  suggestion is.
+- Suggestions should not be a question.
+  In particular, language like "did you mean" should be avoided.
+  Sometimes, it's unclear why a particular suggestion is being made.
+  In these cases, it's better to be upfront about what the suggestion is.

-  Compare "did you mean: `Foo`" vs. "there is a struct with a similar name:
-  `Foo`".
+  Compare "did you mean: `Foo`"
+  vs. "there is a struct with a similar name: `Foo`".
 - The message should not contain any phrases like "the following", "as shown",
   etc. Use the span to convey what is being talked about.
-- The message may contain further instruction such as "to do xyz, use" or "to do
-  xyz, use abc".
-- The message may contain a name of a function, variable, or type, but avoid
-  whole expressions.
+- The message may contain further instruction such as "to do xyz, use" or "to do xyz, use abc".
+- The message may contain a name of a function, variable, or type, but avoid whole expressions.

 ## Lints

-The compiler linting infrastructure is defined in the [`rustc_middle::lint`][rlint]
-module.
+The compiler linting infrastructure is defined in the [`rustc_middle::lint`][rlint] module.

 [rlint]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/lint/index.html

 ### When do lints run?

-Different lints will run at different times based on what information the lint
-needs to do its job. Some lints get grouped into *passes* where the lints
-within a pass are processed together via a single visitor. Some of the passes
-are:
+Different lints will run at different times based on what information the lint needs to do its job.
+Some lints get grouped into *passes* where the lints
+within a pass are processed together via a single visitor.
+Some of the passes are:

-- Pre-expansion pass: Works on [AST nodes] before [macro expansion]. This
-  should generally be avoided.
+- Pre-expansion pass: Works on [AST nodes] before [macro expansion].
+  This should generally be avoided.
   - Example: [`keyword_idents`] checks for identifiers that will become
-    keywords in future editions, but is sensitive to identifiers used in
-    macros.
+    keywords in future editions, but is sensitive to identifiers used in macros.
- Early lint pass: Works on [AST nodes] after [macro expansion] and name - resolution, just before [AST lowering]. These lints are for purely - syntactical lints. + resolution, just before [AST lowering]. + These lints are for purely syntactical lints. - Example: The [`unused_parens`] lint checks for parenthesized-expressions in situations where they are not needed, like an `if` condition. @@ -532,22 +527,24 @@ are: uninitialized values) is a late lint because it needs type information to figure out whether a type allows being left uninitialized. -- MIR pass: Works on [MIR nodes]. This isn't quite the same as other passes; +- MIR pass: Works on [MIR nodes]. + This isn't quite the same as other passes; lints that work on MIR nodes have their own methods for running. - Example: The [`arithmetic_overflow`] lint is emitted when it detects a constant value that may overflow. Most lints work well via the pass systems, and they have a fairly straightforward interface and easy way to integrate (mostly just implementing -a specific `check` function). However, some lints are easier to write when -they live on a specific code path anywhere in the compiler. For example, the -[`unused_mut`] lint is implemented in the borrow checker as it requires some +a specific `check` function). +However, some lints are easier to write when +they live on a specific code path anywhere in the compiler. +For example, the [`unused_mut`] lint is implemented in the borrow checker as it requires some information and state in the borrow checker. -Some of these inline lints fire before the linting system is ready. Those -lints will be *buffered* where they are held until later phases of the -compiler when the linting system is ready. See [Linting early in the -compiler](#linting-early-in-the-compiler). +Some of these inline lints fire before the linting system is ready. +Those lints will be *buffered* where they are held until later phases of the +compiler when the linting system is ready. 
+See [Linting early in the compiler](#linting-early-in-the-compiler). [AST nodes]: the-parser.md @@ -564,30 +561,28 @@ compiler](#linting-early-in-the-compiler). ### Lint definition terms -Lints are managed via the [`LintStore`][LintStore] and get registered in -various ways. The following terms refer to the different classes of lints +Lints are managed via the [`LintStore`][LintStore] and get registered in various ways. +The following terms refer to the different classes of lints generally based on how they are registered. - *Built-in* lints are defined inside the compiler source. - *Driver-registered* lints are registered when the compiler driver is created - by an external driver. This is the mechanism used by Clippy, for example. + by an external driver. + This is the mechanism used by Clippy, for example. - *Tool* lints are lints with a path prefix like `clippy::` or `rustdoc::`. - *Internal* lints are the `rustc::` scoped tool lints that only run on the - rustc source tree itself and are defined in the compiler source like a - regular built-in lint. + rustc source tree itself and are defined in the compiler source like a regular built-in lint. -More information about lint registration can be found in the [LintStore] -chapter. +More information about lint registration can be found in the [LintStore] chapter. [LintStore]: diagnostics/lintstore.md ### Declaring a lint -The built-in compiler lints are defined in the [`rustc_lint`][builtin] -crate. Lints that need to be implemented in other crates are defined in -[`rustc_lint_defs`]. You should prefer to place lints in `rustc_lint` if -possible. One benefit is that it is close to the dependency root, so it can be -much faster to work on. +The built-in compiler lints are defined in the [`rustc_lint`][builtin] crate. +Lints that need to be implemented in other crates are defined in [`rustc_lint_defs`]. +You should prefer to place lints in `rustc_lint` if possible. 
+One benefit is that it is close to the dependency root, so it can be much faster to work on. [builtin]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lint/index.html [`rustc_lint_defs`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lint_defs/index.html @@ -596,12 +591,11 @@ Every lint is implemented via a `struct` that implements the `LintPass` `trait` (you can also implement one of the more specific lint pass traits, either `EarlyLintPass` or `LateLintPass` depending on when is best for your lint to run). The trait implementation allows you to check certain syntactic constructs -as the linter walks the AST. You can then choose to emit lints in a -very similar way to compile errors. +as the linter walks the AST. +You can then choose to emit lints in a very similar way to compile errors. -You also declare the metadata of a particular lint via the [`declare_lint!`] -macro. This macro includes the name, the default level, a short description, and some -more details. +You also declare the metadata of a particular lint via the [`declare_lint!`] macro. +This macro includes the name, the default level, a short description, and some more details. Note that the lint and the lint pass must be registered with the compiler. @@ -673,7 +667,8 @@ example-use-loop = denote infinite loops with `loop {"{"} ... {"}"}` ### Edition-gated lints -Sometimes we want to change the behavior of a lint in a new edition. To do this, +Sometimes we want to change the behavior of a lint in a new edition. +To do this, we just add the transition to our invocation of `declare_lint!`: ```rust,ignore @@ -692,8 +687,8 @@ See [Edition-specific lints](./guides/editions.md#edition-specific-lints) for mo ### Feature-gated lints -Lints belonging to a feature should only be usable if the feature is enabled in the -crate. To support this, lint declarations can contain a feature gate like so: +Lints belonging to a feature should only be usable if the feature is enabled in the crate. 
+To support this, lint declarations can contain a feature gate like so: ```rust,ignore declare_lint! { @@ -710,10 +705,10 @@ The use of the term `future-incompatible` within the compiler has a slightly broader meaning than what rustc exposes to users of the compiler. Inside rustc, future-incompatible lints are for signalling to the user that code they have -written may not compile in the future. In general, future-incompatible code -exists for two reasons: -* The user has written unsound code that the compiler mistakenly accepted. While -it is within Rust's backwards compatibility guarantees to fix the soundness hole +written may not compile in the future. +In general, future-incompatible code exists for two reasons: +* The user has written unsound code that the compiler mistakenly accepted. + While it is within Rust's backwards compatibility guarantees to fix the soundness hole (breaking the user's code), the lint is there to warn the user that this will happen in some upcoming version of rustc *regardless of which edition the code uses*. This is the meaning that rustc exclusively exposes to users as "future incompatible". @@ -723,8 +718,7 @@ typically seen in the various "edition compatibility" lint groups (e.g., `rust_2 that are used to lint against code that will break if the user updates the crate's edition. See [migration lints](guides/editions.md#migration-lints) for more details. -A future-incompatible lint should be declared with the `@future_incompatible` -additional "field": +A future-incompatible lint should be declared with the `@future_incompatible` additional "field": ```rust,ignore declare_lint! { @@ -739,7 +733,8 @@ declare_lint! { Notice the `reason` field which describes why the future incompatible change is happening. This will change the diagnostic message the user receives as well as determine which -lint groups the lint is added to. In the example above, the lint is an "edition lint" +lint groups the lint is added to. 
+In the example above, the lint is an "edition lint" (since its "reason" is `EditionError`), signifying to the user that the use of anonymous parameters will no longer compile in Rust 2018 and beyond. @@ -750,14 +745,14 @@ an edition) or into the `future_incompatibility` lint group. [fi-lint-groupings]: https://github.com/rust-lang/rust/blob/51fd129ac12d5bfeca7d216c47b0e337bf13e0c2/compiler/rustc_lint/src/context.rs#L212-L237 If you need a combination of options that's not supported by the -`declare_lint!` macro, you can always change the `declare_lint!` macro -to support this. +`declare_lint!` macro, you can always change the `declare_lint!` macro to support this. ### Renaming or removing a lint If it is determined that a lint is either improperly named or no longer needed, the lint must be registered for renaming or removal, which will trigger a warning if a user tries -to use the old lint name. To declare a rename/remove, add a line with +to use the old lint name. +To declare a rename/remove, add a line with [`store.register_renamed`] or [`store.register_removed`] to the code of the [`rustc_lint::register_builtins`] function. @@ -771,9 +766,10 @@ store.register_renamed("single_use_lifetime", "single_use_lifetimes"); ### Lint groups -Lints can be turned on in groups. These groups are declared in the -[`register_builtins`][rbuiltins] function in [`rustc_lint::lib`][builtin]. The -`add_lint_group!` macro is used to declare a new group. +Lints can be turned on in groups. +These groups are declared in the +[`register_builtins`][rbuiltins] function in [`rustc_lint::lib`][builtin]. +The `add_lint_group!` macro is used to declare a new group. [rbuiltins]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lint/fn.register_builtins.html @@ -787,15 +783,17 @@ add_lint_group!(sess, NON_UPPER_CASE_GLOBALS); ``` -This defines the `nonstandard_style` group which turns on the listed lints. 
A -user can turn on these lints with a `#![warn(nonstandard_style)]` attribute in +This defines the `nonstandard_style` group which turns on the listed lints. +A user can turn on these lints with a `#![warn(nonstandard_style)]` attribute in the source code, or by passing `-W nonstandard-style` on the command line. -Some lint groups are created automatically in `LintStore::register_lints`. For instance, +Some lint groups are created automatically in `LintStore::register_lints`. +For instance, any lint declared with `FutureIncompatibleInfo` where the reason is `FutureIncompatibilityReason::FutureReleaseError` (the default when `@future_incompatible` is used in `declare_lint!`), will be added to -the `future_incompatible` lint group. Editions also have their own lint groups +the `future_incompatible` lint group. +Editions also have their own lint groups (e.g., `rust_2021_compatibility`) automatically generated for any lints signaling future-incompatible code that will break in the specified edition. @@ -806,10 +804,10 @@ has been initialized (e.g. during parsing or macro expansion). This is problematic because we need to have computed lint levels to know whether we should emit a warning or an error or nothing at all. -To solve this problem, we buffer the lints until the linting system is -processed. [`Session`][sessbl] and [`ParseSess`][parsebl] both have -`buffer_lint` methods that allow you to buffer a lint for later. The linting -system automatically takes care of handling buffered lints later. +To solve this problem, we buffer the lints until the linting system is processed. +[`Session`][sessbl] and [`ParseSess`][parsebl] both have +`buffer_lint` methods that allow you to buffer a lint for later. +The linting system automatically takes care of handling buffered lints later. 
[sessbl]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_session/struct.Session.html#method.buffer_lint [parsebl]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_session/parse/struct.ParseSess.html#method.buffer_lint @@ -820,19 +818,20 @@ like normal but invokes the lint with `buffer_lint`. #### Linting even earlier in the compiler The parser (`rustc_ast`) is interesting in that it cannot have dependencies on -any of the other `rustc*` crates. In particular, it cannot depend on -`rustc_middle::lint` or `rustc_lint`, where all of the compiler linting -infrastructure is defined. That's troublesome! +any of the other `rustc*` crates. +In particular, it cannot depend on +`rustc_middle::lint` or `rustc_lint`, where all of the compiler linting infrastructure is defined. +That's troublesome! -To solve this, `rustc_ast` defines its own buffered lint type, which -`ParseSess::buffer_lint` uses. After macro expansion, these buffered lints are +To solve this, `rustc_ast` defines its own buffered lint type, which `ParseSess::buffer_lint` uses. +After macro expansion, these buffered lints are then dumped into the `Session::buffered_lints` used by the rest of the compiler. ## JSON diagnostic output The compiler accepts an `--error-format json` flag to output -diagnostics as JSON objects (for the benefit of tools such as `cargo -fix`). It looks like this: +diagnostics as JSON objects (for the benefit of tools such as `cargo fix`). +It looks like this: ```console $ rustc json_error_demo.rs --error-format json @@ -844,10 +843,9 @@ $ rustc json_error_demo.rs --error-format json Note that the output is a series of lines, each of which is a JSON object, but the series of lines taken together is, unfortunately, not valid JSON, thwarting tools and tricks (such as [piping to `python3 -m -json.tool`](https://docs.python.org/3/library/json.html#module-json.tool)) -that require such. 
(One speculates that this was intentional for LSP -performance purposes, so that each line/object can be sent as -it is flushed?) +json.tool`](https://docs.python.org/3/library/json.html#module-json.tool)) that require such. +(One speculates that this was intentional for LSP +performance purposes, so that each line/object can be sent as it is flushed?) Also note the "rendered" field, which contains the "human" output as a string; this was introduced so that UI tests could both make use of @@ -860,15 +858,16 @@ The "human" readable and the json format emitter can be found under The JSON emitter defines [its own `Diagnostic` struct](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_errors/json/struct.Diagnostic.html) -(and sub-structs) for the JSON serialization. Don't confuse this with +(and sub-structs) for the JSON serialization. +Don't confuse this with [`errors::Diag`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_errors/struct.Diag.html)! ## `#[rustc_on_unimplemented]` This attribute allows trait definitions to modify error messages when an implementation was -expected but not found. The string literals in the attribute are format strings and can be -formatted with named parameters. See the Formatting -section below for what parameters are permitted. +expected but not found. +The string literals in the attribute are format strings and can be formatted with named parameters. +See the Formatting section below for what parameters are permitted. ```rust,ignore #[rustc_on_unimplemented(message = "an iterator over \ @@ -942,21 +941,22 @@ application of these fields with `on`. You can filter on the following boolean flags: - `crate_local`: whether the code causing the trait bound to not be - fulfilled is part of the user's crate. This is used to avoid suggesting - code changes that would require modifying a dependency. + fulfilled is part of the user's crate. + This is used to avoid suggesting code changes that would require modifying a dependency. 
- `direct`: whether this is a user-specified rather than derived obligation. - `from_desugaring`: whether we are in some kind of desugaring, like `?` - or a `try` block for example. This flag can also be matched on, see below. + or a `try` block for example. + This flag can also be matched on, see below. You can match on the following names and values, using `name = "value"`: - - `cause`: Match against one variant of the `ObligationCauseCode` - enum. Only `"MainFunctionType"` is supported. - - `from_desugaring`: Match against a particular variant of the `DesugaringKind` - enum. The desugaring is identified by its variant name, for example + - `cause`: Match against one variant of the `ObligationCauseCode` enum. + Only `"MainFunctionType"` is supported. + - `from_desugaring`: Match against a particular variant of the `DesugaringKind` enum. + The desugaring is identified by its variant name, for example `"QuestionMark"` for `?` desugaring or `"TryBlock"` for `try` blocks. - `Self` and any generic arguments of the trait, like `Self = "alloc::string::String"` or `Rhs="i32"`. - + The compiler can provide several values to match on, for example: - the self_ty, pretty printed with and without type arguments resolved. - `"{integral}"`, if self_ty is an integral of which the type is known. @@ -1014,14 +1014,15 @@ pub trait From: Sized { } ``` -### Formatting +### Formatting The string literals are format strings that accept parameters wrapped in braces but positional and listed parameters and format specifiers are not accepted. The following parameter names are valid: - `Self` and all generic parameters of the trait. - `This`: the name of the trait the attribute is on, without generics. -- `Trait`: the name of the "sugared" trait. See `TraitRefPrintSugared`. +- `Trait`: the name of the "sugared" trait. + See `TraitRefPrintSugared`. - `ItemContext`: the kind of `hir::Node` we're in, things like `"an async block"`, `"a function"`, `"an async function"`, etc. 
@@ -1042,7 +1043,7 @@ fn main() { } ``` -Will format the message into +Will format the message into ```text "Self = `i8`, T = `i32`, this = `From`, trait = `From`, context = `a function`" ``` diff --git a/src/doc/rustc-dev-guide/src/diagnostics/diagnostic-structs.md b/src/doc/rustc-dev-guide/src/diagnostics/diagnostic-structs.md index 2260b1ec4df1a..a99f6a2849c32 100644 --- a/src/doc/rustc-dev-guide/src/diagnostics/diagnostic-structs.md +++ b/src/doc/rustc-dev-guide/src/diagnostics/diagnostic-structs.md @@ -4,8 +4,7 @@ rustc has three diagnostic traits that can be used to create diagnostics: For simple diagnostics, derived impls can be used, e.g. `#[derive(Diagnostic)]`. They are only suitable for simple diagnostics that -don't require much logic in deciding whether or not to add additional -subdiagnostics. +don't require much logic in deciding whether or not to add additional subdiagnostics. In cases where diagnostics require more complex or dynamic behavior, such as conditionally adding subdiagnostics, customizing the rendering logic, or selecting messages at runtime, you will need to manually implement @@ -16,8 +15,7 @@ Diagnostic can be translated into different languages. ## `#[derive(Diagnostic)]` and `#[derive(LintDiagnostic)]` -Consider the [definition][defn] of the "field already declared" diagnostic -shown below: +Consider the [definition][defn] of the "field already declared" diagnostic shown below: ```rust,ignore #[derive(Diagnostic)] @@ -32,47 +30,47 @@ pub struct FieldAlreadyDeclared { } ``` -`Diagnostic` can only be derived on structs and enums. +`Diagnostic` can only be derived on structs and enums. Attributes that are placed on the type for structs are placed on each -variants for enums (or vice versa). Each `Diagnostic` has to have one +variants for enums (or vice versa). +Each `Diagnostic` has to have one attribute, `#[diag(...)]`, applied to the struct or each enum variant. If an error has an error code (e.g. 
"E0624"), then that can be specified using -the `code` sub-attribute. Specifying a `code` isn't mandatory, but if you are +the `code` sub-attribute. +Specifying a `code` isn't mandatory, but if you are porting a diagnostic that uses `Diag` to use `Diagnostic` then you should keep the code if there was one. -`#[diag(..)]` must provide a message as the first positional argument. -The message is written in English, but might be translated to the locale requested by the user. See -[translation documentation](./translation.md) to learn more about how +`#[diag(..)]` must provide a message as the first positional argument. +The message is written in English, but might be translated to the locale requested by the user. +See [translation documentation](./translation.md) to learn more about how translatable error messages are written and how they are generated. Every field of the `Diagnostic` which does not have an annotation is -available in Fluent messages as a variable, like `field_name` in the example -above. Fields can be annotated `#[skip_arg]` if this is undesired. +available in Fluent messages as a variable, like `field_name` in the example above. +Fields can be annotated `#[skip_arg]` if this is undesired. Using the `#[primary_span]` attribute on a field (that has type `Span`) -indicates the primary span of the diagnostic which will have the main message -of the diagnostic. +indicates the primary span of the diagnostic which will have the main message of the diagnostic. Diagnostics are more than just their primary message, they often include -labels, notes, help messages and suggestions, all of which can also be -specified on a `Diagnostic`. +labels, notes, help messages and suggestions, all of which can also be specified on a `Diagnostic`. `#[label]`, `#[help]`, `#[warning]` and `#[note]` can all be applied to fields which have the -type `Span`. Applying any of these attributes will create the corresponding -subdiagnostic with that `Span`. 
These attributes take a diagnostic message as an argument. +type `Span`. +Applying any of these attributes will create the corresponding subdiagnostic with that `Span`. +These attributes take a diagnostic message as an argument. Other types have special behavior when used in a `Diagnostic` derive: - Any attribute applied to an `Option` will only emit a subdiagnostic if the option is `Some(..)`. -- Any attribute applied to a `Vec` will be repeated for each element of the - vector. +- Any attribute applied to a `Vec` will be repeated for each element of the vector. `#[help]`, `#[warning]` and `#[note]` can also be applied to the struct itself, in which case -they work exactly like when applied to fields except the subdiagnostic won't -have a `Span`. These attributes can also be applied to fields of type `()` for +they work exactly like when applied to fields except the subdiagnostic won't have a `Span`. +These attributes can also be applied to fields of type `()` for the same effect, which when combined with the `Option` type can be used to represent optional `#[note]`/`#[help]`/`#[warning]` subdiagnostics. @@ -84,8 +82,8 @@ Suggestions can be emitted using one of four field attributes: - `#[suggestion_verbose("message", code = "...", applicability = "...")]` Suggestions must be applied on either a `Span` field or a `(Span, -MachineApplicability)` field. Similarly to other field attributes, a message -needs to be provided which will be shown to the user. +MachineApplicability)` field. +Similarly to other field attributes, a message needs to be provided which will be shown to the user. `code` specifies the code that should be suggested as a replacement and is a format string (e.g. `{field_name}` would be replaced by the value of the `field_name` field of the struct). @@ -113,8 +111,8 @@ impl<'a, G: EmissionGuarantee> Diagnostic<'a> for FieldAlreadyDeclared { } ``` -Now that we've defined our diagnostic, how do we [use it][use]? 
It's quite -straightforward, just create an instance of the struct and pass it to +Now that we've defined our diagnostic, how do we [use it][use]? +It's quite straightforward, just create an instance of the struct and pass it to `emit_err` (or `emit_warning`): ```rust,ignore @@ -126,8 +124,7 @@ tcx.dcx().emit_err(FieldAlreadyDeclared { ``` ### Reference for `#[derive(Diagnostic)]` and `#[derive(LintDiagnostic)]` -`#[derive(Diagnostic)]` and `#[derive(LintDiagnostic)]` support the -following attributes: +`#[derive(Diagnostic)]` and `#[derive(LintDiagnostic)]` support the following attributes: - `#[diag("message", code = "...")]` - _Applied to struct or enum variant._ @@ -164,17 +161,17 @@ following attributes: - Value is the suggestion message that will be shown to the user. - See [translation documentation](./translation.md). - `code = "..."`/`code("...", ...)` (_Mandatory_) - - One or multiple format strings indicating the code to be suggested as a - replacement. Multiple values signify multiple possible replacements. + - One or multiple format strings indicating the code to be suggested as a replacement. + Multiple values signify multiple possible replacements. - `applicability = "..."` (_Optional_) - String which must be one of `machine-applicable`, `maybe-incorrect`, `has-placeholders` or `unspecified`. - `#[subdiagnostic]` - - _Applied to a type that implements `Subdiagnostic` (from - `#[derive(Subdiagnostic)]`)._ + - _Applied to a type that implements `Subdiagnostic` (from `#[derive(Subdiagnostic)]`)._ - Adds the subdiagnostic represented by the subdiagnostic struct. - `#[primary_span]` (_Optional_) - - _Applied to `Span` fields on `Subdiagnostic`s. Not used for `LintDiagnostic`s._ + - _Applied to `Span` fields on `Subdiagnostic`s. + Not used for `LintDiagnostic`s._ - Indicates the primary span of the diagnostic. 
- `#[skip_arg]` (_Optional_) - _Applied to any field._ @@ -182,14 +179,14 @@ following attributes: ## `#[derive(Subdiagnostic)]` It is common in the compiler to write a function that conditionally adds a -specific subdiagnostic to an error if it is applicable. Oftentimes these -subdiagnostics could be represented using a diagnostic struct even if the -overall diagnostic could not. In this circumstance, the `Subdiagnostic` +specific subdiagnostic to an error if it is applicable. +Oftentimes these subdiagnostics could be represented using a diagnostic struct even if the +overall diagnostic could not. +In this circumstance, the `Subdiagnostic` derive can be used to represent a partial diagnostic (e.g a note, label, help or suggestion) as a struct. -Consider the [definition][subdiag_defn] of the "expected return type" label -shown below: +Consider the [definition][subdiag_defn] of the "expected return type" label shown below: ```rust #[derive(Subdiagnostic)] @@ -208,10 +205,10 @@ pub enum ExpectedReturnTypeLabel<'tcx> { } ``` -Like `Diagnostic`, `Subdiagnostic` can be derived for structs or -enums. Attributes that are placed on the type for structs are placed on each -variants for enums (or vice versa). Each `Subdiagnostic` should have one -attribute applied to the struct or each variant, one of: +Like `Diagnostic`, `Subdiagnostic` can be derived for structs or enums. +Attributes that are placed on the type for structs are placed on each +variants for enums (or vice versa). +Each `Subdiagnostic` should have one attribute applied to the struct or each variant, one of: - `#[label(..)]` for defining a label - `#[note(..)]` for defining a note @@ -224,15 +221,14 @@ See [translation documentation](./translation.md) to learn more about how translatable error messages are generated. Using the `#[primary_span]` attribute on a field (with type `Span`) will denote -the primary span of the subdiagnostic. 
A primary span is only necessary for a -label or suggestion, which can not be spanless. +the primary span of the subdiagnostic. +A primary span is only necessary for a label or suggestion, which can not be spanless. Every field of the type/variant which does not have an annotation is available -in Fluent messages as a variable. Fields can be annotated `#[skip_arg]` if this -is undesired. +in Fluent messages as a variable. +Fields can be annotated `#[skip_arg]` if this is undesired. -Like `Diagnostic`, `Subdiagnostic` supports `Option` and -`Vec` fields. +Like `Diagnostic`, `Subdiagnostic` supports `Option` and `Vec` fields. Suggestions can be emitted using one of four attributes on the type/variant: @@ -241,8 +237,7 @@ Suggestions can be emitted using one of four attributes on the type/variant: - `#[suggestion_short("...", code = "...", applicability = "...")]` - `#[suggestion_verbose("...", code = "...", applicability = "...")]` -Suggestions require `#[primary_span]` be set on a field and can have the -following sub-attributes: +Suggestions require `#[primary_span]` be set on a field and can have the following sub-attributes: - The first positional argument specifies the message which will be shown to the user. - `code` specifies the code that should be suggested as a replacement and is a @@ -276,8 +271,7 @@ impl<'tcx> Subdiagnostic for ExpectedReturnTypeLabel<'tcx> { Once defined, a subdiagnostic can be used by passing it to the `subdiagnostic` function ([example][subdiag_use_1] and [example][subdiag_use_2]) on a -diagnostic or by assigning it to a `#[subdiagnostic]`-annotated field of a -diagnostic struct. +diagnostic or by assigning it to a `#[subdiagnostic]`-annotated field of a diagnostic struct. 
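To make the `#[subdiagnostic]` field form concrete, a hypothetical diagnostic struct could carry the `ExpectedReturnTypeLabel` defined above as an optional field. This is a sketch only; the struct name, message, and field names are invented for illustration, following the attribute reference in this chapter:

```rust,ignore
#[derive(Diagnostic)]
#[diag("mismatched return type")]
pub struct MismatchedReturnType<'tcx> {
    // Main span of the overall diagnostic.
    #[primary_span]
    pub span: Span,
    // The label subdiagnostic; per the `Option<T>` behavior described
    // earlier, it is only emitted when the field is `Some(..)`.
    #[subdiagnostic]
    pub label: Option<ExpectedReturnTypeLabel<'tcx>>,
}
```

Assigning the subdiagnostic to a field like this keeps the conditional logic (emit the label or not) at the construction site of the struct rather than in a manual `Diagnostic` impl.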
### Argument sharing and isolation @@ -310,22 +304,24 @@ Additionally, subdiagnostics can access arguments from the main diagnostic with `#[derive(Subdiagnostic)]` supports the following attributes: - `#[label("message")]`, `#[help("message")]`, `#[warning("message")]` or `#[note("message")]` - - _Applied to struct or enum variant. Mutually exclusive with struct/enum variant attributes._ + - _Applied to struct or enum variant. + Mutually exclusive with struct/enum variant attributes._ - _Mandatory_ - Defines the type to be representing a label, help or note. - Message (_Mandatory_) - The diagnostic message that will be shown to the user. - See [translation documentation](./translation.md). - `#[suggestion{,_hidden,_short,_verbose}("message", code = "...", applicability = "...")]` - - _Applied to struct or enum variant. Mutually exclusive with struct/enum variant attributes._ + - _Applied to struct or enum variant. + Mutually exclusive with struct/enum variant attributes._ - _Mandatory_ - Defines the type to be representing a suggestion. - Message (_Mandatory_) - The diagnostic message that will be shown to the user. - See [translation documentation](./translation.md). - `code = "..."`/`code("...", ...)` (_Mandatory_) - - One or multiple format strings indicating the code to be suggested as a - replacement. Multiple values signify multiple possible replacements. + - One or multiple format strings indicating the code to be suggested as a replacement. + Multiple values signify multiple possible replacements. - `applicability = "..."` (_Optional_) - _Mutually exclusive with `#[applicability]` on a field._ - Value is the applicability of the suggestion. @@ -335,7 +331,8 @@ Additionally, subdiagnostics can access arguments from the main diagnostic with - `has-placeholders` - `unspecified` - `#[multipart_suggestion{,_hidden,_short,_verbose}("message", applicability = "...")]` - - _Applied to struct or enum variant. 
Mutually exclusive with struct/enum variant attributes._ + - _Applied to struct or enum variant. + Mutually exclusive with struct/enum variant attributes._ - _Mandatory_ - Defines the type to be representing a multipart suggestion. - Message (_Mandatory_): see `#[suggestion]` @@ -348,8 +345,7 @@ to multipart suggestions) - _Applied to `Span` fields._ - Indicates the span to be one part of the multipart suggestion. - `code = "..."` (_Mandatory_) - - Value is a format string indicating the code to be suggested as a - replacement. + - Value is a format string indicating the code to be suggested as a replacement. - `#[applicability]` (_Optional_; only applicable to (simple and multipart) suggestions) - _Applied to `Applicability` fields._ - Indicates the applicability of the suggestion. diff --git a/src/doc/rustc-dev-guide/src/diagnostics/translation.md b/src/doc/rustc-dev-guide/src/diagnostics/translation.md index 112bc661ff8a1..88526ec1c5c78 100644 --- a/src/doc/rustc-dev-guide/src/diagnostics/translation.md +++ b/src/doc/rustc-dev-guide/src/diagnostics/translation.md @@ -5,7 +5,8 @@ rustc's current diagnostics translation infrastructure (as of October 2024 ) unfortunately causes some friction for compiler contributors, and the current infrastructure is mostly pending a redesign that better addresses needs of both -compiler contributors and translation teams. Note that there is no current +compiler contributors and translation teams. +Note that there is no current active redesign proposals (as of October 2024 )! @@ -14,13 +15,13 @@ Please see the tracking issue for status updates. The translation infra is waiting for a yet-to-be-proposed redesign and thus rework, we are not -mandating usage of current translation infra. Use the infra if you *want to* or +mandating usage of current translation infra. +Use the infra if you *want to* or otherwise makes the code cleaner, but otherwise sidestep the translation infra if you need more flexibility. 
-rustc's diagnostic infrastructure supports translatable diagnostics using -[Fluent]. +rustc's diagnostic infrastructure supports translatable diagnostics using [Fluent]. ## Writing translatable diagnostics @@ -28,9 +29,8 @@ There are two ways of writing translatable diagnostics: 1. For simple diagnostics, using a diagnostic (or subdiagnostic) derive. ("Simple" diagnostics being those that don't require a lot of logic in - deciding to emit subdiagnostics and can therefore be represented as - diagnostic structs). See [the diagnostic and subdiagnostic structs - documentation](./diagnostic-structs.md). + deciding to emit subdiagnostics and can therefore be represented as diagnostic structs). + See [the diagnostic and subdiagnostic structs documentation](./diagnostic-structs.md). 2. Using typed identifiers with `Diag` APIs (in `Diagnostic` or `Subdiagnostic` or `LintDiagnostic` implementations). @@ -42,14 +42,15 @@ Only updating the original English message is required. Fluent is built around the idea of "asymmetric localization", which aims to decouple the expressiveness of translations from the grammar of the source -language (English in rustc's case). Prior to translation, rustc's diagnostics +language (English in rustc's case). +Prior to translation, rustc's diagnostics relied heavily on interpolation to build the messages shown to the users. Interpolated strings are hard to translate because writing a natural-sounding translation might require more, less, or just different interpolation than the -English string, all of which would require changes to the compiler's source -code to support. +English string, all of which would require changes to the compiler's source code to support. -Diagnostic messages are defined in Fluent resources. A combined set of Fluent +Diagnostic messages are defined in Fluent resources. +A combined set of Fluent resources for a given locale (e.g. `en-US`) is known as Fluent bundle. 
```fluent @@ -57,9 +58,9 @@ typeck_address_of_temporary_taken = cannot take address of a temporary ``` In the above example, `typeck_address_of_temporary_taken` is the identifier for -a Fluent message and corresponds to the diagnostic message in English. Other -Fluent resources can be written which would correspond to a message in another -language. Each diagnostic therefore has at least one Fluent message. +a Fluent message and corresponds to the diagnostic message in English. +Other Fluent resources can be written which would correspond to a message in another language. +Each diagnostic therefore has at least one Fluent message. ```fluent typeck_address_of_temporary_taken = cannot take address of a temporary @@ -68,13 +69,14 @@ typeck_address_of_temporary_taken = cannot take address of a temporary By convention, diagnostic messages for subdiagnostics are specified as "attributes" on Fluent messages (additional related messages, denoted by the -`.` syntax). In the above example, `label` is an attribute of +`.` syntax). +In the above example, `label` is an attribute of `typeck_address_of_temporary_taken` which corresponds to the message for the label added to this diagnostic. Diagnostic messages often interpolate additional context into the message shown -to the user, such as the name of a type or of a variable. Additional context to -Fluent messages is provided as an "argument" to the diagnostic. +to the user, such as the name of a type or of a variable. +Additional context to Fluent messages is provided as an "argument" to the diagnostic. ```fluent typeck_struct_expr_non_exhaustive = @@ -82,19 +84,19 @@ typeck_struct_expr_non_exhaustive = ``` In the above example, the Fluent message refers to an argument named `what` -which is expected to exist (how arguments are provided to diagnostics is -discussed in detail later). +which is expected to exist (how arguments are provided to diagnostics is discussed in detail later). 
-You can consult the [Fluent] documentation for other usage examples of Fluent -and its syntax. +You can consult the [Fluent] documentation for other usage examples of Fluent and its syntax. ### Guideline for message naming -Usually, fluent uses `-` for separating words inside a message name. However, -`_` is accepted by fluent as well. As `_` fits Rust's use cases better, due to +Usually, fluent uses `-` for separating words inside a message name. +However, +`_` is accepted by fluent as well. +As `_` fits Rust's use cases better, due to the identifiers on the Rust side using `_` as well, inside rustc, `-` is not -allowed for separating words, and instead `_` is recommended. The only exception -is for leading `-`s, for message names like `-passes_see_issue`. +allowed for separating words, and instead `_` is recommended. +The only exception is for leading `-`s, for message names like `-passes_see_issue`. ### Guidelines for writing translatable messages @@ -104,22 +106,22 @@ argument (not just the information required in the English message). As the compiler team gain more experience writing diagnostics that have all of the information necessary to be translated into different languages, this page -will be updated with more guidance. For now, the [Fluent] documentation has +will be updated with more guidance. +For now, the [Fluent] documentation has excellent examples of translating messages into different locales and the information that needs to be provided by the code to do so. ### Compile-time validation and typed identifiers -rustc's `#[derive(Diagnostic)]` macro performs compile-time validation of Fluent -messages. Compile-time validation of Fluent resources will emit any parsing errors +rustc's `#[derive(Diagnostic)]` macro performs compile-time validation of Fluent messages. 
+Compile-time validation of Fluent resources will emit any parsing errors from Fluent resources while building the compiler, preventing invalid Fluent -resources from causing panics in the compiler. Compile-time validation also -emits an error if multiple Fluent messages have the same identifier. +resources from causing panics in the compiler. +Compile-time validation also emits an error if multiple Fluent messages have the same identifier. ## Internals -Various parts of rustc's diagnostic internals are modified in order to support -translation. +Various parts of rustc's diagnostic internals are modified in order to support translation. ### Messages @@ -127,10 +129,10 @@ All of rustc's traditional diagnostic APIs (e.g. `struct_span_err` or `note`) take any message that can be converted into a `DiagMessage`. [`rustc_error_messages::DiagMessage`] can represent legacy non-translatable -diagnostic messages and translatable messages. Non-translatable messages are -just `String`s. Translatable messages are just a `&'static str` with the -identifier of the Fluent message (sometimes with an additional `&'static str` -with an attribute). +diagnostic messages and translatable messages. +Non-translatable messages are just `String`s. +Translatable messages are just a `&'static str` with the +identifier of the Fluent message (sometimes with an additional `&'static str` with an attribute). `DiagMessage` never needs to be interacted with directly: `DiagMessage` constants are created for each diagnostic message in a @@ -139,8 +141,7 @@ either be created in the macro-generated code of a diagnostic derive. `DiagMessage` implements `Into` for any type that can be converted into a string, and converts these into -non-translatable diagnostics - this keeps all existing diagnostic calls -working. +non-translatable diagnostics - this keeps all existing diagnostic calls working. 
### Arguments @@ -151,8 +152,8 @@ Diagnostics have a `set_arg` function that can be used to provide this additional context to a diagnostic. Arguments have both a name (e.g. "what" in the earlier example) and a value. -Argument values are represented using the `DiagArgValue` type, which is -just a string or a number. rustc types can implement `IntoDiagArg` with +Argument values are represented using the `DiagArgValue` type, which is just a string or a number. +rustc types can implement `IntoDiagArg` with conversion into a string or a number, and common types like `Ty<'tcx>` already have such implementations. diff --git a/src/doc/rustc-dev-guide/src/feature-gate-check.md b/src/doc/rustc-dev-guide/src/feature-gate-check.md index c5a499f5708cc..59e50837c52e2 100644 --- a/src/doc/rustc-dev-guide/src/feature-gate-check.md +++ b/src/doc/rustc-dev-guide/src/feature-gate-check.md @@ -4,9 +4,9 @@ For the how-to steps to add, remove, rename, or stabilize feature gates, see [Feature gates][feature-gates]. Feature gates prevent usage of unstable language and library features without a -nightly-only `#![feature(...)]` opt-in. This chapter documents the implementation -of feature gating: where gates are defined, how they are enabled, and how usage -is verified. +nightly-only `#![feature(...)]` opt-in. +This chapter documents the implementation +of feature gating: where gates are defined, how they are enabled, and how usage is verified. @@ -15,15 +15,14 @@ is verified. All feature gate definitions are located in the `rustc_feature` crate: - **Unstable features** are declared in [`rustc_feature/src/unstable.rs`] via - the `declare_features!` macro. This associates features with issue numbers and - tracking metadata. + the `declare_features!` macro. + This associates features with issue numbers and tracking metadata. - **Accepted features** (stabilized) are listed in [`rustc_feature/src/accepted.rs`]. 
- **Removed features** (explicitly disallowed) are listed in [`rustc_feature/src/removed.rs`]. - **Gated built-in attributes and cfgs** are declared in [`rustc_feature/src/builtin_attrs.rs`]. -The [`rustc_feature::Features`] type represents the **active feature set** for a -crate. Helpers like `enabled`, `incomplete`, and `internal` are used during -compilation to check status. +The [`rustc_feature::Features`] type represents the **active feature set** for a crate. +Helpers like `enabled`, `incomplete`, and `internal` are used during compilation to check status. ## Collecting Features @@ -31,11 +30,10 @@ Before AST validation or expansion, `rustc` collects crate-level `#![feature(...)]` attributes to build the active `Features` set. - The collection happens in [`rustc_expand/src/config.rs`] in [`features`]. -- Each `#![feature]` entry is classified against the `unstable`, `accepted`, and - `removed` tables: +- Each `#![feature]` entry is classified against the `unstable`, `accepted`, and `removed` tables: - **Removed** features cause an immediate error. - - **Accepted** features are recorded but do not require nightly. On - stable/beta, `maybe_stage_features` in + - **Accepted** features are recorded but do not require nightly. + On stable/beta, `maybe_stage_features` in [`rustc_ast_passes/src/feature_gate.rs`] emits the non-nightly diagnostic and lists stable features, which is where the "already stabilized" messaging comes from. @@ -43,13 +41,14 @@ Before AST validation or expansion, `rustc` collects crate-level - Unknown features are treated as **library features** and validated later. - With `-Z allow-features=...`, any **unstable** or **unknown** feature not in the allowlist is rejected. -- [`RUSTC_BOOTSTRAP`] feeds into `UnstableFeatures::from_environment`. This - variable controls whether the compiler is treated as "nightly", allowing +- [`RUSTC_BOOTSTRAP`] feeds into `UnstableFeatures::from_environment`. 
+ This variable controls whether the compiler is treated as "nightly", allowing feature gates to be bypassed during bootstrapping or explicitly disabled (`-1`). ## Parser Gating -Some syntax is detected and gated during parsing. The parser records spans for +Some syntax is detected and gated during parsing. +The parser records spans for later checking to keep diagnostics consistent and deferred until after parsing. - [`rustc_session/src/parse.rs`] defines [`GatedSpans`] and the `gate` method. @@ -77,8 +76,7 @@ in `check_crate` and its AST visitor. `check_crate` iterates over `sess.psess.gated_spans`: -- The `gate_all!` macro emits diagnostics for each gated span if the feature is - not enabled. +- The `gate_all!` macro emits diagnostics for each gated span if the feature is not enabled. - Some gates have extra logic (e.g., `yield` can be allowed by `coroutines` or `gen_blocks`). - Legacy gates (e.g., `box_patterns`, `try_blocks`) may use a separate path that @@ -92,8 +90,7 @@ easier to validate after expansion. - The visitor uses helper macros (`gate!`, `gate_alt!`, `gate_multi!`) to check: 1. Is the feature enabled? 2. Does `span.allows_unstable` permit it (for internal compiler macros)? -- Examples include `trait_alias`, `decl_macro`, `extern types`, and various - `impl Trait` forms. +- Examples include `trait_alias`, `decl_macro`, `extern types`, and various `impl Trait` forms. ## Attributes and `cfg` @@ -101,8 +98,7 @@ Beyond syntax, rustc also gates attributes and `cfg` options. ### Built-in attributes -- [`rustc_ast_passes::check_attribute`] inspects attributes against - `BUILTIN_ATTRIBUTE_MAP`. +- [`rustc_ast_passes::check_attribute`] inspects attributes against `BUILTIN_ATTRIBUTE_MAP`. - If the attribute is `AttributeGate::Gated` and the feature isn’t enabled, `feature_err` is emitted. @@ -121,9 +117,9 @@ Diagnostic helpers are located in [`rustc_session/src/parse.rs`]. 
- `feature_err` and `feature_warn` emit standardized diagnostics, attaching the tracking issue number where possible. - `Span::allows_unstable` in [`rustc_span/src/lib.rs`] checks if a span originates - from a macro marked with `#[allow_internal_unstable]`. This allows internal - macros to use unstable features on stable channels while enforcing gates for - user code. + from a macro marked with `#[allow_internal_unstable]`. + This allows internal + macros to use unstable features on stable channels while enforcing gates for user code. [`rustc_feature/src/unstable.rs`]: https://github.com/rust-lang/rust/blob/HEAD/compiler/rustc_feature/src/unstable.rs [`rustc_feature/src/removed.rs`]: https://github.com/rust-lang/rust/blob/HEAD/compiler/rustc_feature/src/removed.rs diff --git a/src/doc/rustc-dev-guide/src/feature-gates.md b/src/doc/rustc-dev-guide/src/feature-gates.md index 76bf111fe7729..12166577ae30a 100644 --- a/src/doc/rustc-dev-guide/src/feature-gates.md +++ b/src/doc/rustc-dev-guide/src/feature-gates.md @@ -1,7 +1,6 @@ # Feature gates -This chapter is intended to provide basic help for adding, removing, and -modifying feature gates. +This chapter is intended to provide basic help for adding, removing, and modifying feature gates. For how rustc enforces and checks feature gates in the compiler pipeline, see [Feature Gate Checking][feature-gate-check]. @@ -67,9 +66,8 @@ to follow when [removing a feature gate][removing]): Some("renamed to `$new_feature_name`")) ``` -3. Add a feature gate declaration with the new name to - `rustc_feature/src/unstable.rs`. It should look very similar to the old - declaration: +3. Add a feature gate declaration with the new name to `rustc_feature/src/unstable.rs`. 
+ It should look very similar to the old declaration: ```rust,ignore /// description of feature @@ -79,9 +77,8 @@ to follow when [removing a feature gate][removing]): ## Stabilizing a feature -See ["Updating the feature-gate listing"] in the "Stabilizing Features" chapter -for instructions. There are additional steps you will need to take beyond just -updating the declaration! +See ["Updating the feature-gate listing"] in the "Stabilizing Features" chapter for instructions. +There are additional steps you will need to take beyond just updating the declaration! ["Stability in code"]: ./implementing-new-features.md#stability-in-code diff --git a/src/doc/rustc-dev-guide/src/git.md b/src/doc/rustc-dev-guide/src/git.md index e85e6bd708500..bf31e79a9a154 100644 --- a/src/doc/rustc-dev-guide/src/git.md +++ b/src/doc/rustc-dev-guide/src/git.md @@ -308,8 +308,8 @@ reapplied to the most recent version of `main`. In other words, Git tries to pretend that the changes you made to the old version of `main` were instead made to the new version of `main`. -During this process, you should expect to -encounter at least one "rebase conflict". This happens when Git's attempt to +During this process, you should expect to encounter at least one "rebase conflict". +This happens when Git's attempt to reapply the changes fails because your changes conflicted with other changes that have been made. You can tell that this happened because you'll see lines in the output that look like @@ -410,6 +410,13 @@ because they only represent "fixups" and not real changes. For example, `git rebase --interactive HEAD~2` will allow you to edit the two commits only. +For pull requests in `rust-lang/rust`, you can ask [bors] to squash by commenting +`@bors squash` on the PR. +By default, [bors] combines all commit messages in the PR. +To customize the commit message, use `@bors squash [msg|message=]`. 
+ +[bors]: https://github.com/rust-lang/bors + ### `git range-diff` After completing a rebase, and before pushing up your changes, you may want to @@ -472,8 +479,8 @@ command useful, especially their ["Examples" section][range-diff-example-docs]. ## No-Merge Policy -The rust-lang/rust repo uses what is known as a "rebase workflow". This means -that merge commits in PRs are not accepted. +The rust-lang/rust repo uses what is known as a "rebase workflow". +This means that merge commits in PRs are not accepted. As a result, if you are running `git merge` locally, chances are good that you should be rebasing instead. Of course, this is not always true; if your merge will just be a fast-forward, diff --git a/src/doc/rustc-dev-guide/src/macro-expansion.md b/src/doc/rustc-dev-guide/src/macro-expansion.md index 3199e9950d7ef..60067c4f85c37 100644 --- a/src/doc/rustc-dev-guide/src/macro-expansion.md +++ b/src/doc/rustc-dev-guide/src/macro-expansion.md @@ -1,6 +1,7 @@ # Macro expansion -Rust has a very powerful macro system. In the previous chapter, we saw how +Rust has a very powerful macro system. +In the previous chapter, we saw how the parser sets aside macros to be expanded (using temporary [placeholders]). This chapter is about the process of expanding those macros iteratively until we have a complete [*Abstract Syntax Tree* (AST)][ast] for our crate with no @@ -9,9 +10,9 @@ unexpanded macros (or a compile error). [ast]: ./ast-validation.md [placeholders]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/placeholders/index.html -First, we discuss the algorithm that expands and integrates macro output into -ASTs. Next, we take a look at how hygiene data is collected. Finally, we look -at the specifics of expanding different types of macros. +First, we discuss the algorithm that expands and integrates macro output into ASTs. +Next, we take a look at how hygiene data is collected. +Finally, we look at the specifics of expanding different types of macros. 
Many of the algorithms and data structures described below are in [`rustc_expand`], with fundamental data structures in [`rustc_expand::base`][base]. @@ -25,21 +26,25 @@ handled in [`rustc_expand::config`][cfg]. ## Expansion and AST Integration -Firstly, expansion happens at the crate level. Given a raw source code for +Firstly, expansion happens at the crate level. +Given a raw source code for a crate, the compiler will produce a massive AST with all macros expanded, all modules inlined, etc. The primary entry point for this process is the -[`MacroExpander::fully_expand_fragment`][fef] method. With few exceptions, we +[`MacroExpander::fully_expand_fragment`][fef] method. +With few exceptions, we use this method on the whole crate (see ["Eager Expansion"](#eager-expansion) below for more detailed discussion of edge case expansion issues). [`rustc_builtin_macros`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_builtin_macros/index.html [reb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/build/index.html -At a high level, [`fully_expand_fragment`][fef] works in iterations. We keep a +At a high level, [`fully_expand_fragment`][fef] works in iterations. +We keep a queue of unresolved macro invocations (i.e. macros we haven't found the -definition of yet). We repeatedly try to pick a macro from the queue, resolve -it, expand it, and integrate it back. If we can't make progress in an -iteration, this represents a compile error. Here is the [algorithm][original]: +definition of yet). +We repeatedly try to pick a macro from the queue, resolve it, expand it, and integrate it back. +If we can't make progress in an iteration, this represents a compile error. 
+ Here is the [algorithm][original]: [fef]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.MacroExpander.html#method.fully_expand_fragment [original]: https://github.com/rust-lang/rust/pull/53778#issuecomment-419224049 @@ -49,13 +54,13 @@ iteration, this represents a compile error. Here is the [algorithm][original]: 1. [Resolve](./name-resolution.md) imports in our partially built crate as much as possible. 2. Collect as many macro [`Invocation`s][inv] as possible from our - partially built crate (`fn`-like, attributes, derives) and add them to the - queue. + partially built crate (`fn`-like, attributes, derives) and add them to the queue. 3. Dequeue the first element and attempt to resolve it. 4. If it's resolved: 1. Run the macro's expander function that consumes a [`TokenStream`] or AST and produces a [`TokenStream`] or [`AstFragment`] (depending on - the macro kind). (A [`TokenStream`] is a collection of [`TokenTree`s][tt], + the macro kind). + (A [`TokenStream`] is a collection of [`TokenTree`s][tt], each of which are a token (punctuation, identifier, or literal) or a delimited group (anything inside `()`/`[]`/`{}`)). - At this point, we know everything about the macro itself and can @@ -63,17 +68,18 @@ iteration, this represents a compile error. Here is the [algorithm][original]: data; that is the [hygiene] data associated with [`ExpnId`] (see [Hygiene][hybelow] below). 2. Integrate that piece of AST into the currently-existing though - partially-built AST. This is essentially where the "token-like mass" - becomes a proper set-in-stone AST with side-tables. It happens as - follows: + partially-built AST. + This is essentially where the "token-like mass" + becomes a proper set-in-stone AST with side-tables. + It happens as follows: - If the macro produces tokens (e.g. a proc macro), we parse into an AST, which may produce parse errors. - During expansion, we create [`SyntaxContext`]s (hierarchy 2) (see [Hygiene][hybelow] below). 
 - These three passes happen one after another on every AST fragment freshly expanded from a macro:
- - [`NodeId`]s are assigned by [`InvocationCollector`]. This
- also collects new macro calls from this new AST piece and
+ - [`NodeId`]s are assigned by [`InvocationCollector`].
+ This also collects new macro calls from this new AST piece and
 adds them to the queue.
 - ["Def paths"][defpath] are created and [`DefId`]s are assigned to them by [`DefCollector`].
@@ -115,22 +121,23 @@ so that `rustc` can report more errors than just the original failure.
 ### Name Resolution
 Notice that name resolution is involved here: we need to resolve imports and
-macro names in the above algorithm. This is done in
-[`rustc_resolve::macros`][mresolve], which resolves macro paths, validates
+macro names in the above algorithm.
+This is done in [`rustc_resolve::macros`][mresolve], which resolves macro paths, validates
 those resolutions, and reports various errors (e.g. "not found", "found, but
-it's unstable", "expected x, found y"). However, we don't try to resolve
-other names yet. This happens later, as we will see in the chapter: [Name
-Resolution](./name-resolution.md).
+it's unstable", "expected x, found y").
+However, we don't try to resolve other names yet.
+This happens later, as we will see in the chapter: [Name Resolution](./name-resolution.md).
 [mresolve]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/macros/index.html
 ### Eager Expansion
 _Eager expansion_ means we expand the arguments of a macro invocation before
-the macro invocation itself. This is implemented only for a few special
+the macro invocation itself.
+This is implemented only for a few special
 built-in macros that expect literals; expanding arguments first for some of
-these macro results in a smoother user experience. As an example, consider
-the following:
+these macros results in a smoother user experience.
+As an example, consider the following: ```rust,ignore macro bar($i: ident) { $i } @@ -139,29 +146,27 @@ macro foo($i: ident) { $i } foo!(bar!(baz)); ``` -A lazy-expansion would expand `foo!` first. An eager-expansion would expand -`bar!` first. +A lazy-expansion would expand `foo!` first. +An eager-expansion would expand `bar!` first. -Eager-expansion is not a generally available feature of Rust. Implementing -eager-expansion more generally would be challenging, so we implement it for a -few special built-in macros for the sake of user-experience. The built-in -macros are implemented in [`rustc_builtin_macros`], along with some other +Eager-expansion is not a generally available feature of Rust. +Implementing eager-expansion more generally would be challenging, so we implement it for a +few special built-in macros for the sake of user-experience. +The built-in macros are implemented in [`rustc_builtin_macros`], along with some other early code generation facilities like injection of standard library imports or -generation of test harness. There are some additional helpers for building -AST fragments in [`rustc_expand::build`][reb]. Eager-expansion generally -performs a subset of the things that lazy (normal) expansion does. It is done -by invoking [`fully_expand_fragment`][fef] on only part of a crate (as opposed +generation of test harness. +There are some additional helpers for building AST fragments in [`rustc_expand::build`][reb]. +Eager-expansion generally performs a subset of the things that lazy (normal) expansion does. +It is done by invoking [`fully_expand_fragment`][fef] on only part of a crate (as opposed to the whole crate, like we normally do). ### Other Data Structures -Here are some other notable data structures involved in expansion and -integration: -- [`ResolverExpand`] - a `trait` used to break crate dependencies. 
This allows the - resolver services to be used in [`rustc_ast`], despite [`rustc_resolve`] and +Here are some other notable data structures involved in expansion and integration: +- [`ResolverExpand`] - a `trait` used to break crate dependencies. + This allows the resolver services to be used in [`rustc_ast`], despite [`rustc_resolve`] and pretty much everything else depending on [`rustc_ast`]. -- [`ExtCtxt`]/[`ExpansionData`] - holds various intermediate expansion - infrastructure data. +- [`ExtCtxt`]/[`ExpansionData`] - holds various intermediate expansion infrastructure data. - [`Annotatable`] - a piece of AST that can be an attribute target, almost the same thing as [`AstFragment`] except for types and patterns that can be produced by macros but cannot be annotated with attributes. @@ -182,7 +187,8 @@ integration: ## Hygiene and Hierarchies If you have ever used the C/C++ preprocessor macros, you know that there are some -annoying and hard-to-debug gotchas! For example, consider the following C code: +annoying and hard-to-debug gotchas! +For example, consider the following C code: ```c #define DEFINE_FOO struct Bar {int x;}; struct Foo {Bar bar;}; @@ -195,9 +201,9 @@ struct Bar { DEFINE_FOO ``` -Most people avoid writing C like this – and for good reason: it doesn't -compile. The `struct Bar` defined by the macro clashes names with the `struct -Bar` defined in the code. Consider also the following example: +Most people avoid writing C like this – and for good reason: it doesn't compile. +The `struct Bar` defined by the macro clashes names with the `struct Bar` defined in the code. +Consider also the following example: ```c #define DO_FOO(x) {\ @@ -210,20 +216,23 @@ int y = 22; DO_FOO(y); ``` -Do you see the problem? We wanted to generate a call `foo(22, 0)`, but instead +Do you see the problem? +We wanted to generate a call `foo(22, 0)`, but instead we got `foo(0, 0)` because the macro defined its own `y`! -These are both examples of _macro hygiene_ issues. 
_Hygiene_ relates to how to -handle names defined _within a macro_. In particular, a hygienic macro system -prevents errors due to names introduced within a macro. Rust macros are hygienic -in that they do not allow one to write the sorts of bugs above. +These are both examples of _macro hygiene_ issues. +_Hygiene_ relates to how to handle names defined _within a macro_. +In particular, a hygienic macro system prevents errors due to names introduced within a macro. +Rust macros are hygienic in that they do not allow one to write the sorts of bugs above. At a high level, hygiene within the Rust compiler is accomplished by keeping -track of the context where a name is introduced and used. We can then -disambiguate names based on that context. Future iterations of the macro system -will allow greater control to the macro author to use that context. For example, -a macro author may want to introduce a new name to the context where the macro -was called. Alternately, the macro author may be defining a variable for use +track of the context where a name is introduced and used. +We can then disambiguate names based on that context. +Future iterations of the macro system +will allow greater control to the macro author to use that context. +For example, +a macro author may want to introduce a new name to the context where the macro was called. +Alternately, the macro author may be defining a variable for use only within the macro (i.e. it should not be visible outside the macro). [code_dir]: https://github.com/rust-lang/rust/tree/HEAD/compiler/rustc_expand/src/mbe @@ -232,8 +241,9 @@ only within the macro (i.e. it should not be visible outside the macro). [code_parse_int]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/struct.TtParser.html#method.parse_tt [parsing]: ./the-parser.html -The context is attached to AST nodes. All AST nodes generated by macros have -context attached. 
Additionally, there may be other nodes that have context +The context is attached to AST nodes. +All AST nodes generated by macros have context attached. +Additionally, there may be other nodes that have context attached, such as some desugared syntax (non-macro-expanded nodes are considered to just have the "root" context, as described below). Throughout the compiler, we use [`rustc_span::Span`s][span] to refer to code locations. @@ -242,27 +252,29 @@ This struct also has hygiene information attached to it, as we will see later. [span]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.Span.html Because macros invocations and definitions can be nested, the syntax context of -a node must be a hierarchy. For example, if we expand a macro and there is +a node must be a hierarchy. +For example, if we expand a macro and there is another macro invocation or definition in the generated output, then the syntax context should reflect the nesting. However, it turns out that there are actually a few types of context we may -want to track for different purposes. Thus, there are not just one but _three_ -expansion hierarchies that together comprise the hygiene information for a -crate. +want to track for different purposes. +Thus, there are not just one but _three_ +expansion hierarchies that together comprise the hygiene information for a crate. All of these hierarchies need some sort of "macro ID" to identify individual -elements in the chain of expansions. This ID is [`ExpnId`]. All macros receive -an integer ID, assigned continuously starting from 0 as we discover new macro -calls. All hierarchies start at [`ExpnId::root`][rootid], which is its own -parent. +elements in the chain of expansions. +This ID is [`ExpnId`]. +All macros receive an integer ID, assigned continuously starting from 0 as we discover new macro +calls. +All hierarchies start at [`ExpnId::root`][rootid], which is its own parent. 
The [`rustc_span::hygiene`][hy] crate contains all of the hygiene-related algorithms (with the exception of some hacks in [`Resolver::resolve_crate_root`][hacks]) and structures related to hygiene and expansion that are kept in global data.
-The actual hierarchies are stored in [`HygieneData`][hd]. This is a global
-piece of data containing hygiene and expansion info that can be accessed from
+The actual hierarchies are stored in [`HygieneData`][hd].
+This is a global piece of data containing hygiene and expansion info that can be accessed from
 any [`Ident`] without any context.
@@ -278,8 +290,8 @@ any [`Ident`] without any context.
 The first hierarchy tracks the order of expansions, i.e., when a macro invocation is in the output of another macro.
-Here, the children in the hierarchy will be the "innermost" tokens. The
-[`ExpnData`] struct itself contains a subset of properties from both macro
+Here, the children in the hierarchy will be the "innermost" tokens.
+The [`ExpnData`] struct itself contains a subset of properties from both macro
 definition and macro call available through global data.
 [`ExpnData::parent`][edp] tracks the child-to-parent link in this hierarchy.
@@ -300,19 +312,19 @@ In this code, the AST nodes that are finally generated would have hierarchy
 ### The Macro Definition Hierarchy
 The second hierarchy tracks the order of macro definitions, i.e., when we are
-expanding one macro another macro definition is revealed in its output. This
-one is a bit tricky and more complex than the other two hierarchies.
+expanding one macro, another macro definition is revealed in its output.
+This one is a bit tricky and more complex than the other two hierarchies.
 [`SyntaxContext`][sc] represents a whole chain in this hierarchy via an ID.
 [`SyntaxContextData`][scd] contains data associated with the given
-[`SyntaxContext`][sc]; mostly it is a cache for results of filtering that chain in
-different ways.
[`SyntaxContextData::parent`][scdp] is the child-to-parent
-link here, and [`SyntaxContextData::outer_expns`][scdoe] are individual
-elements in the chain. The "chaining-operator" is
-[`SyntaxContext::apply_mark`][am] in compiler code.
+[`SyntaxContext`][sc]; mostly it is a cache for results of filtering that chain in different ways.
+[`SyntaxContextData::parent`][scdp] is the child-to-parent
+link here, and [`SyntaxContextData::outer_expns`][scdoe] are individual elements in the chain.
+The "chaining-operator" is [`SyntaxContext::apply_mark`][am] in compiler code.
 A [`Span`][span], mentioned above, is actually just a compact representation of
-a code location and [`SyntaxContext`][sc]. Likewise, an [`Ident`] is just an interned
+a code location and [`SyntaxContext`][sc].
+Likewise, an [`Ident`] is just an interned
 [`Symbol`] + `Span` (i.e. an interned string + hygiene data).
 [`Symbol`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/symbol/struct.Symbol.html
@@ -324,13 +336,14 @@ a code location and [`SyntaxContext`][sc]. Likewise, an [`Ident`] is just an int
 ### Transparency
 For built-in macros, we use the context:
 [`SyntaxContext::empty().apply_mark(expn_id)`], and such macros are
-considered to be defined at the hierarchy root. We do the same for `proc
-macro`s because we haven't implemented cross-crate hygiene yet.
+considered to be defined at the hierarchy root.
+We do the same for `proc macro`s because we haven't implemented cross-crate hygiene yet.
 [`SyntaxContext::empty().apply_mark(expn_id)`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.apply_mark
 If the token had context `X` before being produced by a macro then after being
-produced by the macro it has context `X -> macro_id`. Here are some examples:
+produced by the macro it has context `X -> macro_id`.
+Here are some examples: Example 0: @@ -374,8 +387,8 @@ After all expansions, `foo` has context `ROOT -> id(n)` and `bar` has context Currently this hierarchy for tracking macro definitions is subject to the so-called ["context transplantation hack"][hack]. Modern (i.e. experimental) macros have stronger hygiene than the legacy "Macros By Example" (MBE) -system which can result in weird interactions between the two. The hack is -intended to make things "just work" for now. +system which can result in weird interactions between the two. +The hack is intended to make things "just work" for now. [`ExpnId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html [hack]: https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732 @@ -384,8 +397,7 @@ intended to make things "just work" for now. The third and final hierarchy tracks the location of macro invocations. -In this hierarchy [`ExpnData::call_site`][callsite] is the `child -> parent` -link. +In this hierarchy [`ExpnData::call_site`][callsite] is the `child -> parent` link. [callsite]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html#structfield.call_site @@ -399,8 +411,7 @@ foo!(bar!(baz)); ``` For the `baz` AST node in the final output, the expansion-order hierarchy is -`ROOT -> id(foo) -> id(bar) -> baz`, while the call-site hierarchy is `ROOT -> -baz`. +`ROOT -> id(foo) -> id(bar) -> baz`, while the call-site hierarchy is `ROOT -> baz`. ### Macro Backtraces @@ -412,16 +423,18 @@ in [`rustc_span::hygiene`][hy]. ## Producing Macro Output Above, we saw how the output of a macro is integrated into the AST for a crate, -and we also saw how the hygiene data for a crate is generated. But how do we -actually produce the output of a macro? It depends on the type of macro. +and we also saw how the hygiene data for a crate is generated. +But how do we actually produce the output of a macro? +It depends on the type of macro. 
+
+There are two types of macros in Rust:
+ 1. `macro_rules!` macros
+ (a.k.a. "Macros By Example" (MBE)), and,
+ 2. procedural macros (proc macros); including custom derives.
-There are two types of macros in Rust:
- 1. `macro_rules!` macros (a.k.a. "Macros By Example" (MBE)), and,
- 2. procedural macros (proc macros); including custom derives.
-
 During the parsing phase, the normal Rust parser will set aside the contents of
-macros and their invocations. Later, macros are expanded using these
-portions of the code.
+macros and their invocations.
+Later, macros are expanded using these portions of the code.
 Some important data structures/interfaces here:
 - [`SyntaxExtension`] - a lowered macro representation, contains its expander [`TokenStream`] or AST + some additional data like stability, or a list of unstable features allowed inside the macro.
 - [`SyntaxExtensionKind`] - expander functions may have several different
- signatures (take one token stream, or two, or a piece of AST, etc). This is
- an `enum` that lists them.
+ signatures (take one token stream, or two, or a piece of AST, etc).
+ This is an `enum` that lists them.
 - [`BangProcMacro`]/[`TTMacroExpander`]/[`AttrProcMacro`]/[`MultiItemModifier`] - `trait`s representing the expander function signatures.
@@ -443,12 +456,12 @@ Some important data structures/interfaces here:
 ## Macros By Example
-MBEs have their own parser distinct from the Rust parser. When macros are
-expanded, we may invoke the MBE parser to parse and expand a macro. The
-MBE parser, in turn, may call the Rust parser when it needs to bind a
+MBEs have their own parser distinct from the Rust parser.
+When macros are expanded, we may invoke the MBE parser to parse and expand a macro.
+The MBE parser, in turn, may call the Rust parser when it needs to bind a
 metavariable (e.g. `$my_expr`) while parsing the contents of a macro
-invocation.
The code for macro expansion is in -[`compiler/rustc_expand/src/mbe/`][code_dir]. +invocation. +The code for macro expansion is in [`compiler/rustc_expand/src/mbe/`][code_dir]. ### Example @@ -464,21 +477,22 @@ macro_rules! printer { } ``` -Here `$mvar` is called a _metavariable_. Unlike normal variables, rather than -binding to a value _at runtime_, a metavariable binds _at compile time_ to a -tree of _tokens_. A _token_ is a single "unit" of the grammar, such as an +Here `$mvar` is called a _metavariable_. +Unlike normal variables, rather than +binding to a value _at runtime_, a metavariable binds _at compile time_ to a tree of _tokens_. +A _token_ is a single "unit" of the grammar, such as an identifier (e.g. `foo`) or punctuation (e.g. `=>`). There are also other -special tokens, such as `EOF`, which itself indicates that there are no more -tokens. There are token trees resulting from the paired parentheses-like +special tokens, such as `EOF`, which itself indicates that there are no more tokens. +There are token trees resulting from the paired parentheses-like characters (`(`...`)`, `[`...`]`, and `{`...`}`) – they include the open and -close and all the tokens in between (Rust requires that parentheses-like -characters be balanced). Having macro expansion operate on token streams +close and all the tokens in between (Rust requires that parentheses-like characters be balanced). +Having macro expansion operate on token streams rather than the raw bytes of a source-file abstracts away a lot of complexity. The macro expander (and much of the rest of the compiler) doesn't consider the exact line and column of some syntactic construct in the code; it considers -which constructs are used in the code. Using tokens allows us to care about -_what_ without worrying about _where_. For more information about tokens, see -the [Parsing][parsing] chapter of this book. +which constructs are used in the code. 
+Using tokens allows us to care about _what_ without worrying about _where_. +For more information about tokens, see the [Parsing][parsing] chapter of this book. ```rust,ignore printer!(print foo); // `foo` is a variable @@ -490,15 +504,14 @@ The process of expanding the macro invocation into the syntax tree ### The MBE parser -There are two parts to MBE expansion done by the macro parser: +There are two parts to MBE expansion done by the macro parser: 1. parsing the definition, and, - 2. parsing the invocations. + 2. parsing the invocations. We think of the MBE parser as a nondeterministic finite automaton (NFA) based regex parser since it uses an algorithm similar in spirit to the [Earley -parsing algorithm](https://en.wikipedia.org/wiki/Earley_parser). The macro -parser is defined in -[`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp]. +parsing algorithm](https://en.wikipedia.org/wiki/Earley_parser). +The macro parser is defined in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp]. The interface of the macro parser is as follows (this is slightly simplified): @@ -513,31 +526,31 @@ fn parse_tt( We use these items in macro parser: - a `parser` variable is a reference to the state of a normal Rust parser, - including the token stream and parsing session. The token stream is what we - are about to ask the MBE parser to parse. We will consume the raw stream of + including the token stream and parsing session. + The token stream is what we are about to ask the MBE parser to parse. + We will consume the raw stream of tokens and output a binding of metavariables to corresponding token trees. The parsing session can be used to report parser errors. - a `matcher` variable is a sequence of [`MatcherLoc`]s that we want to match the token stream - against. They're converted from the original token trees in the macro's definition before - matching. + against. + They're converted from the original token trees in the macro's definition before matching. 
[`MatcherLoc`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/enum.MatcherLoc.html In the analogy of a regex parser, the token stream is the input and we are -matching it against the pattern defined by matcher. Using our examples, the +matching it against the pattern defined by matcher. +Using our examples, the token stream could be the stream of tokens containing the inside of the example -invocation `print foo`, while matcher might be the sequence of token (trees) -`print $mvar:ident`. +invocation `print foo`, while matcher might be the sequence of token (trees) `print $mvar:ident`. -The output of the parser is a [`ParseResult`], which indicates which of -three cases has occurred: +The output of the parser is a [`ParseResult`], which indicates which of three cases has occurred: - **Success**: the token stream matches the given matcher and we have produced a binding from metavariables to the corresponding token trees. - **Failure**: the token stream does not match matcher and results in an error message such as "No rule expected token ...". -- **Error**: some fatal error has occurred _in the parser_. For example, this - happens if there is more than one pattern match, since that indicates the +- **Error**: some fatal error has occurred _in the parser_. + For example, this happens if there is more than one pattern match, since that indicates the macro is ambiguous. The full interface is defined [here][code_parse_int]. @@ -553,11 +566,13 @@ For more information about the macro parser's implementation, see the comments i Using our example, we would try to match the token stream `print foo` from the invocation against the matchers `print $mvar:ident` and `print twice $mvar:ident` that we previously extracted from the -rules in the macro definition. When the macro parser comes to a place in the current matcher where +rules in the macro definition. 
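The macro parser's three possible outcomes can be pictured with a small enum. This is only an illustrative model; the real `ParseResult` in `macro_parser.rs` carries spans, the parse session, and typed metavariable bindings.

```rust
use std::collections::HashMap;

// Illustrative model of the macro parser's three outcomes; the real
// `ParseResult` is richer (spans, diagnostics, typed bindings).
enum ParseOutcome {
    // The matcher matched: metavariable name -> captured token text.
    Success(HashMap<String, String>),
    // The token stream did not match this matcher.
    Failure(String),
    // A fatal error in the parser itself, e.g. an ambiguous macro.
    Error(String),
}

fn describe(outcome: &ParseOutcome) -> String {
    match outcome {
        ParseOutcome::Success(bindings) => format!("matched, {} binding(s)", bindings.len()),
        ParseOutcome::Failure(msg) => format!("no match: {}", msg),
        ParseOutcome::Error(msg) => format!("fatal: {}", msg),
    }
}

fn main() {
    let mut bindings = HashMap::new();
    bindings.insert("mvar".to_string(), "foo".to_string());
    assert_eq!(describe(&ParseOutcome::Success(bindings)), "matched, 1 binding(s)");
    let failure = ParseOutcome::Failure("no rule expected token `bar`".to_string());
    assert_eq!(describe(&failure), "no match: no rule expected token `bar`");
}
```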
+When the macro parser comes to a place in the current matcher where it needs to match a _non-terminal_ (e.g. `$mvar:ident`), it calls back to the normal Rust parser to -get the contents of that non-terminal. In this case, the Rust parser would look for an `ident` -token, which it finds (`foo`) and returns to the macro parser. Then, the macro parser continues -parsing. +get the contents of that non-terminal. +In this case, the Rust parser would look for an `ident` +token, which it finds (`foo`) and returns to the macro parser. +Then, the macro parser continues parsing. Note that exactly one of the matchers from the various rules should match the invocation; if there is more than one match, the parse is ambiguous, while if there are no matches at all, there is a syntax @@ -568,17 +583,19 @@ rule, substituting the values of any matches it captured when matching against t ## Procedural Macros -Procedural macros are also expanded during parsing. However, rather than -having a parser in the compiler, proc macros are implemented as custom, -third-party crates. The compiler will compile the proc macro crate and +Procedural macros are also expanded during parsing. +However, rather than having a parser in the compiler, proc macros are implemented as custom, +third-party crates. +The compiler will compile the proc macro crate and specially annotated functions in them (i.e. the proc macro itself), passing -them a stream of tokens. A proc macro can then transform the token stream and +them a stream of tokens. +A proc macro can then transform the token stream and output a new token stream, which is synthesized into the AST. -The token stream type used by proc macros is _stable_, so `rustc` does not -use it internally. The compiler's (unstable) token stream is defined in -[`rustc_ast::tokenstream::TokenStream`][rustcts]. 
This is converted into the -stable [`proc_macro::TokenStream`][stablets] and back in +The token stream type used by proc macros is _stable_, so `rustc` does not use it internally. +The compiler's (unstable) token stream is defined in +[`rustc_ast::tokenstream::TokenStream`][rustcts]. +This is converted into the stable [`proc_macro::TokenStream`][stablets] and back in [`rustc_expand::proc_macro`][pm] and [`rustc_expand::proc_macro_server`][pms]. Since the Rust ABI is currently unstable, we use the C ABI for this conversion. diff --git a/src/doc/rustc-dev-guide/src/notification-groups/about.md b/src/doc/rustc-dev-guide/src/notification-groups/about.md index 2c2c98860a9b5..86797b1e0bb51 100644 --- a/src/doc/rustc-dev-guide/src/notification-groups/about.md +++ b/src/doc/rustc-dev-guide/src/notification-groups/about.md @@ -7,11 +7,11 @@ and joining does not entail any particular commitment. Once you [join a notification group](#join), you will be added to a list that receives pings on github whenever a new issue is found -that fits the notification group's criteria. If you are interested, you -can then [claim the issue] and start working on it. +that fits the notification group's criteria. +If you are interested, you can then [claim the issue] and start working on it. -Of course, you don't have to wait for new issues to be tagged! If you -prefer, you can use the GitHub label for a notification group to +Of course, you don't have to wait for new issues to be tagged! +If you prefer, you can use the GitHub label for a notification group to search for existing issues that haven't been claimed yet. [claim the issue]: https://forge.rust-lang.org/triagebot/issue-assignment.html @@ -37,8 +37,8 @@ particularly those of **middle priority**: - By **isolated**, we mean that we do not expect large-scale refactoring to be required to fix the bug. 
- By **middle priority**, we mean that we'd like to see the bug fixed, - but it's not such a burning problem that we are dropping everything - else to fix it. The danger with such bugs, of course, is that they + but it's not such a burning problem that we are dropping everything else to fix it. + The danger with such bugs, of course, is that they can accumulate over time, and the role of the notification group is to try and stop that from happening! @@ -48,8 +48,7 @@ particularly those of **middle priority**: To join a notification group, you just have to open a PR adding your GitHub username to the appropriate file in the Rust team repository. -See the "example PRs" below to get a precise idea and to identify the -file to edit. +See the "example PRs" below to get a precise idea and to identify the file to edit. Also, if you are not already a member of a Rust team then -- in addition to adding your name to the file -- you have to checkout the repository and @@ -73,8 +72,8 @@ Example PRs: ## Tagging an issue for a notification group To tag an issue as appropriate for a notification group, you give -[rustbot] a [`ping`] command with the name of the notification -group. For example: +[rustbot] a [`ping`] command with the name of the notification group. +For example: ```text @rustbot ping apple @@ -87,8 +86,8 @@ group. For example: ``` To make some commands shorter and easier to remember, there are aliases, -defined in the [`triagebot.toml`] file. For example, all of these commands -are equivalent and will ping the Apple group: +defined in the [`triagebot.toml`] file. +For example, all of these commands are equivalent and will ping the Apple group: ```text @rustbot ping apple @@ -97,12 +96,12 @@ are equivalent and will ping the Apple group: ``` Keep in mind that these aliases are meant to make humans' life easier. -They might be subject to change. If you need to ensure that a command +They might be subject to change. 
+If you need to ensure that a command will always be valid, prefer the full invocations over the aliases. **Note though that this should only be done by compiler team members -or contributors, and is typically done as part of compiler team -triage.** +or contributors, and is typically done as part of compiler team triage.** [rustbot]: https://github.com/rust-lang/triagebot/ [`ping`]: https://forge.rust-lang.org/triagebot/pinging.html diff --git a/src/doc/rustc-dev-guide/src/notification-groups/arm.md b/src/doc/rustc-dev-guide/src/notification-groups/arm.md index bffcc6c04571e..b71c3d6067512 100644 --- a/src/doc/rustc-dev-guide/src/notification-groups/arm.md +++ b/src/doc/rustc-dev-guide/src/notification-groups/arm.md @@ -10,13 +10,11 @@ ARM-related issues as well as suggestions on how to resolve interesting questions regarding our ARM support. The group also has an associated Zulip channel ([`#t-compiler/arm`]) -where people can go to pose questions and discuss ARM-specific -topics. +where people can go to pose questions and discuss ARM-specific topics. -So, if you are interested in participating, please sign up for the -ARM group! To do so, open a PR against the [rust-lang/team] -repository. Just [follow this example][eg], but change the username to -your own! +So, if you are interested in participating, please sign up for the ARM group! +To do so, open a PR against the [rust-lang/team] repository. +Just [follow this example][eg], but change the username to your own! 
[`#t-compiler/arm`]: https://rust-lang.zulipchat.com/#narrow/stream/242906-t-compiler.2Farm [rust-lang/team]: https://github.com/rust-lang/team diff --git a/src/doc/rustc-dev-guide/src/notification-groups/emscripten.md b/src/doc/rustc-dev-guide/src/notification-groups/emscripten.md index 4996ed62e46ab..685517c8d16fb 100644 --- a/src/doc/rustc-dev-guide/src/notification-groups/emscripten.md +++ b/src/doc/rustc-dev-guide/src/notification-groups/emscripten.md @@ -10,13 +10,11 @@ Emscripten-related issues as well as suggestions on how to resolve interesting questions regarding our Emscripten support. The group also has an associated Zulip channel ([`#t-compiler/wasm`]) -where people can go to pose questions and discuss Emscripten-specific -topics. +where people can go to pose questions and discuss Emscripten-specific topics. -So, if you are interested in participating, please sign up for the -Emscripten group! To do so, open a PR against the [rust-lang/team] -repository. Just [follow this example][eg], but change the username to -your own! +So, if you are interested in participating, please sign up for the Emscripten group! +To do so, open a PR against the [rust-lang/team] repository. +Just [follow this example][eg], but change the username to your own! [`#t-compiler/wasm`]: https://rust-lang.zulipchat.com/#narrow/stream/463513-t-compiler.2Fwasm [rust-lang/team]: https://github.com/rust-lang/team diff --git a/src/doc/rustc-dev-guide/src/notification-groups/fuchsia.md b/src/doc/rustc-dev-guide/src/notification-groups/fuchsia.md index fd9c5d236f5c2..3c07caaa5ef83 100644 --- a/src/doc/rustc-dev-guide/src/notification-groups/fuchsia.md +++ b/src/doc/rustc-dev-guide/src/notification-groups/fuchsia.md @@ -6,7 +6,6 @@ [O-fuchsia]: https://github.com/rust-lang/rust/labels/O-fuchsia This list will be used to notify [Fuchsia][fuchsia] maintainers -when the compiler or the standard library changes in a way that would -break the Fuchsia integration. 
+when the compiler or the standard library changes in a way that would break the Fuchsia integration. [fuchsia]: ../tests/ecosystem-test-jobs/fuchsia.md diff --git a/src/doc/rustc-dev-guide/src/notification-groups/loongarch.md b/src/doc/rustc-dev-guide/src/notification-groups/loongarch.md index 09620a6a5ce84..8b1c33370f73f 100644 --- a/src/doc/rustc-dev-guide/src/notification-groups/loongarch.md +++ b/src/doc/rustc-dev-guide/src/notification-groups/loongarch.md @@ -12,10 +12,9 @@ interesting questions regarding our LoongArch support. The group also has an associated Zulip channel ([`#t-compiler/loong-arch`]) where people can go to pose questions and discuss LoongArch-specific topics. -So, if you are interested in participating, please sign up for the -LoongArch group! To do so, open a PR against the [rust-lang/team] -repository. Just [follow this example][eg], but change the username to -your own! +So, if you are interested in participating, please sign up for the LoongArch group! +To do so, open a PR against the [rust-lang/team] repository. +Just [follow this example][eg], but change the username to your own! [`#t-compiler/loong-arch`]: https://rust-lang.zulipchat.com/#narrow/channel/551512-t-compiler.2Floong-arch [rust-lang/team]: https://github.com/rust-lang/team diff --git a/src/doc/rustc-dev-guide/src/notification-groups/risc-v.md b/src/doc/rustc-dev-guide/src/notification-groups/risc-v.md index 250a512fbaaca..d0b51572d52a0 100644 --- a/src/doc/rustc-dev-guide/src/notification-groups/risc-v.md +++ b/src/doc/rustc-dev-guide/src/notification-groups/risc-v.md @@ -10,13 +10,11 @@ RISC-V-related issues as well as suggestions on how to resolve interesting questions regarding our RISC-V support. The group also has an associated Zulip channel ([`#t-compiler/risc-v`]) -where people can go to pose questions and discuss RISC-V-specific -topics. +where people can go to pose questions and discuss RISC-V-specific topics. 
-So, if you are interested in participating, please sign up for the -RISC-V group! To do so, open a PR against the [rust-lang/team] -repository. Just [follow this example][eg], but change the username to -your own! +So, if you are interested in participating, please sign up for the RISC-V group! +To do so, open a PR against the [rust-lang/team] repository. +Just [follow this example][eg], but change the username to your own! [`#t-compiler/risc-v`]: https://rust-lang.zulipchat.com/#narrow/stream/250483-t-compiler.2Frisc-v [rust-lang/team]: https://github.com/rust-lang/team diff --git a/src/doc/rustc-dev-guide/src/notification-groups/rust-for-linux.md b/src/doc/rustc-dev-guide/src/notification-groups/rust-for-linux.md index c08cf9deecec2..cb2a477cea1db 100644 --- a/src/doc/rustc-dev-guide/src/notification-groups/rust-for-linux.md +++ b/src/doc/rustc-dev-guide/src/notification-groups/rust-for-linux.md @@ -7,14 +7,13 @@ This list will be used to notify [Rust for Linux (RfL)][rfl] maintainers when the compiler or the standard library changes in a way that would -break Rust for Linux, since it depends on several unstable flags -and features. The RfL maintainers should then ideally provide support +break Rust for Linux, since it depends on several unstable flags and features. +The RfL maintainers should then ideally provide support for resolving the breakage or decide to temporarily accept the breakage and unblock CI by temporarily removing the RfL CI jobs. The group also has an associated Zulip channel ([`#rust-for-linux`]) -where people can go to ask questions and discuss topics related to Rust -for Linux. +where people can go to ask questions and discuss topics related to Rust for Linux. If you are interested in participating, please sign up for the Rust for Linux group on [Zulip][`#rust-for-linux`]! 
diff --git a/src/doc/rustc-dev-guide/src/notification-groups/wasi.md b/src/doc/rustc-dev-guide/src/notification-groups/wasi.md index 3d7fd01af28dc..93962a54fdfb7 100644 --- a/src/doc/rustc-dev-guide/src/notification-groups/wasi.md +++ b/src/doc/rustc-dev-guide/src/notification-groups/wasi.md @@ -10,13 +10,11 @@ WASI-related issues as well as suggestions on how to resolve interesting questions regarding our WASI support. The group also has an associated Zulip channel ([`#t-compiler/wasm`]) -where people can go to pose questions and discuss WASI-specific -topics. +where people can go to pose questions and discuss WASI-specific topics. -So, if you are interested in participating, please sign up for the -WASI group! To do so, open a PR against the [rust-lang/team] -repository. Just [follow this example][eg], but change the username to -your own! +So, if you are interested in participating, please sign up for the WASI group! +To do so, open a PR against the [rust-lang/team] repository. +Just [follow this example][eg], but change the username to your own! [`#t-compiler/wasm`]: https://rust-lang.zulipchat.com/#narrow/stream/463513-t-compiler.2Fwasm [rust-lang/team]: https://github.com/rust-lang/team diff --git a/src/doc/rustc-dev-guide/src/notification-groups/windows.md b/src/doc/rustc-dev-guide/src/notification-groups/windows.md index 4b3970a9d63f1..797eb69fd1cff 100644 --- a/src/doc/rustc-dev-guide/src/notification-groups/windows.md +++ b/src/doc/rustc-dev-guide/src/notification-groups/windows.md @@ -10,8 +10,7 @@ Windows-related issues as well as suggestions on how to resolve interesting questions regarding our Windows support. The group also has an associated Zulip channel ([`#t-compiler/windows`]) -where people can go to pose questions and discuss Windows-specific -topics. +where people can go to pose questions and discuss Windows-specific topics. 
To get a better idea for what the group will do, here are some examples of the kinds of questions where we would have reached out to @@ -19,12 +18,12 @@ the group for advice in determining the best course of action: * Which versions of MinGW should we support? * Should we remove the legacy InnoSetup GUI installer? [#72569] -* What names should we use for static libraries on Windows? [#29520] +* What names should we use for static libraries on Windows? + [#29520] -So, if you are interested in participating, please sign up for the -Windows group! To do so, open a PR against the [rust-lang/team] -repository. Just [follow this example][eg], but change the username to -your own! +So, if you are interested in participating, please sign up for the Windows group! +To do so, open a PR against the [rust-lang/team] repository. +Just [follow this example][eg], but change the username to your own! [`#t-compiler/windows`]: https://rust-lang.zulipchat.com/#streams/242869/t-compiler.2Fwindows [rust-lang/team]: https://github.com/rust-lang/team diff --git a/src/doc/rustc-dev-guide/src/overview.md b/src/doc/rustc-dev-guide/src/overview.md index b90bb173c475c..374a3c5bfa406 100644 --- a/src/doc/rustc-dev-guide/src/overview.md +++ b/src/doc/rustc-dev-guide/src/overview.md @@ -1,75 +1,77 @@ # Overview of the compiler -This chapter is about the overall process of compiling a program -- how -everything fits together. +This chapter is about the overall process of compiling a program -- how everything fits together. The Rust compiler is special in two ways: it does things to your code that other compilers don't do (e.g. borrow-checking) and it has a lot of -unconventional implementation choices (e.g. queries). We will talk about these -in turn in this chapter, and in the rest of the guide, we will look at the +unconventional implementation choices (e.g. queries). 
+We will talk about these in turn in this chapter, and in the rest of the guide, we will look at the individual pieces in more detail. ## What the compiler does to your code -So first, let's look at what the compiler does to your code. For now, we will -avoid mentioning how the compiler implements these steps except as needed. +So first, let's look at what the compiler does to your code. +For now, we will avoid mentioning how the compiler implements these steps except as needed. ### Invocation Compilation begins when a user writes a Rust source program in text and invokes -the `rustc` compiler on it. The work that the compiler needs to perform is -defined by command-line options. For example, it is possible to enable nightly +the `rustc` compiler on it. +The work that the compiler needs to perform is defined by command-line options. +For example, it is possible to enable nightly features (`-Z` flags), perform `check`-only builds, or emit the LLVM Intermediate Representation (`LLVM-IR`) rather than executable machine code. The `rustc` executable call may be indirect through the use of `cargo`. -Command line argument parsing occurs in the [`rustc_driver`]. This crate -defines the compile configuration that is requested by the user and passes it +Command line argument parsing occurs in the [`rustc_driver`]. +This crate defines the compile configuration that is requested by the user and passes it to the rest of the compilation process as a [`rustc_interface::Config`]. ### Lexing and parsing -The raw Rust source text is analyzed by a low-level *lexer* located in -[`rustc_lexer`]. At this stage, the source text is turned into a stream of -atomic source code units known as _tokens_. The `lexer` supports the -Unicode character encoding. +The raw Rust source text is analyzed by a low-level *lexer* located in [`rustc_lexer`]. +At this stage, the source text is turned into a stream of +atomic source code units known as _tokens_. 
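To illustrate what turning text into "a stream of tokens" means, here is a toy sketch that recognizes only identifiers and single-character punctuation. `rustc_lexer`'s real token kinds, cursor, and error handling are far more involved.

```rust
// Toy tokenizer: identifiers and single-character punctuation only.
// rustc_lexer's actual implementation is far richer.
#[derive(Debug, PartialEq)]
enum Token {
    Ident(String),
    Punct(char),
}

fn tokenize(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next(); // skip whitespace between tokens
        } else if c.is_alphabetic() || c == '_' {
            // Accumulate an identifier: alphanumerics and underscores.
            let mut ident = String::new();
            while let Some(&c) = chars.peek() {
                if c.is_alphanumeric() || c == '_' {
                    ident.push(c);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Ident(ident));
        } else {
            // Everything else becomes a one-character punctuation token.
            tokens.push(Token::Punct(c));
            chars.next();
        }
    }
    tokens
}

fn main() {
    let tokens = tokenize("foo = bar;");
    assert_eq!(
        tokens,
        vec![
            Token::Ident("foo".to_string()),
            Token::Punct('='),
            Token::Ident("bar".to_string()),
            Token::Punct(';'),
        ]
    );
}
```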
+ The `lexer` supports the Unicode character encoding. The token stream passes through a higher-level lexer located in -[`rustc_parse`] to prepare for the next stage of the compile process. The -[`Lexer`] `struct` is used at this stage to perform a set of validations +[`rustc_parse`] to prepare for the next stage of the compile process. +The [`Lexer`] `struct` is used at this stage to perform a set of validations and turn strings into interned symbols (_interning_ is discussed later). -[String interning] is a way of storing only one immutable -copy of each distinct string value. +[String interning] is a way of storing only one immutable copy of each distinct string value. The lexer has a small interface and doesn't depend directly on the diagnostic -infrastructure in `rustc`. Instead it provides diagnostics as plain data which -are emitted in [`rustc_parse::lexer`] as real diagnostics. The `lexer` -preserves full fidelity information for both IDEs and procedural macros +infrastructure in `rustc`. +Instead it provides diagnostics as plain data which +are emitted in [`rustc_parse::lexer`] as real diagnostics. +The `lexer` preserves full fidelity information for both IDEs and procedural macros (sometimes referred to as "proc-macros"). The *parser* [translates the token stream from the `lexer` into an Abstract Syntax -Tree (AST)][parser]. It uses a recursive descent (top-down) approach to syntax -analysis. The crate entry points for the `parser` are the +Tree (AST)][parser]. +It uses a recursive descent (top-down) approach to syntax analysis. +The crate entry points for the `parser` are the [`Parser::parse_crate_mod`][parse_crate_mod] and [`Parser::parse_mod`][parse_mod] -methods found in [`rustc_parse::parser::Parser`]. The external module parsing +methods found in [`rustc_parse::parser::Parser`]. +The external module parsing entry point is [`rustc_expand::module::parse_external_mod`][parse_external_mod]. 
And the macro-`parser` entry point is [`Parser::parse_nonterminal`][parse_nonterminal]. Parsing is performed with a set of [`parser`] utility methods including [`bump`], [`check`], [`eat`], [`expect`], [`look_ahead`]. -Parsing is organized by semantic construct. Separate -`parse_*` methods can be found in the [`rustc_parse`][rustc_parse_parser_dir] -directory. The source file name follows the construct name. For example, the -following files are found in the `parser`: +Parsing is organized by semantic construct. +Separate `parse_*` methods can be found in the [`rustc_parse`][rustc_parse_parser_dir] directory. +The source file name follows the construct name. +For example, the following files are found in the `parser`: - [`expr.rs`](https://github.com/rust-lang/rust/blob/HEAD/compiler/rustc_parse/src/parser/expr.rs) - [`pat.rs`](https://github.com/rust-lang/rust/blob/HEAD/compiler/rustc_parse/src/parser/pat.rs) - [`ty.rs`](https://github.com/rust-lang/rust/blob/HEAD/compiler/rustc_parse/src/parser/ty.rs) - [`stmt.rs`](https://github.com/rust-lang/rust/blob/HEAD/compiler/rustc_parse/src/parser/stmt.rs) -This naming scheme is used across many compiler stages. You will find either a -file or directory with the same name across the parsing, lowering, type +This naming scheme is used across many compiler stages. +You will find either a file or directory with the same name across the parsing, lowering, type checking, [Typed High-level Intermediate Representation (`THIR`)][thir] lowering, and [Mid-level Intermediate Representation (`MIR`)][mir] building sources. @@ -77,26 +79,25 @@ Macro-expansion, `AST`-validation, name-resolution, and early linting also take place during the lexing and parsing stage. The [`rustc_ast::ast`]::{[`Crate`], [`Expr`], [`Pat`], ...} `AST` nodes are -returned from the parser while the standard [`Diag`] API is used -for error handling. 
Generally Rust's compiler will try to recover from errors +returned from the parser while the standard [`Diag`] API is used for error handling. +Generally Rust's compiler will try to recover from errors by parsing a superset of Rust's grammar, while also emitting an error type. ### `AST` lowering Next the `AST` is converted into [High-Level Intermediate Representation -(`HIR`)][hir], a more compiler-friendly representation of the `AST`. This process -is called "lowering" and involves a lot of desugaring (the expansion and -formalizing of shortened or abbreviated syntax constructs) of things like loops -and `async fn`. +(`HIR`)][hir], a more compiler-friendly representation of the `AST`. +This process is called "lowering" and involves a lot of desugaring (the expansion and +formalizing of shortened or abbreviated syntax constructs) of things like loops and `async fn`. We then use the `HIR` to do [*type inference*] (the process of automatic detection of the type of an expression), [*trait solving*] (the process of -pairing up an impl with each reference to a `trait`), and [*type checking*]. Type -checking is the process of converting the types found in the `HIR` ([`hir::Ty`]), +pairing up an impl with each reference to a `trait`), and [*type checking*]. +Type checking is the process of converting the types found in the `HIR` ([`hir::Ty`]), which represent what the user wrote, into the internal representation used by -the compiler ([`Ty<'tcx>`]). It's called type checking because the information -is used to verify the type safety, correctness and coherence of the types used -in the program. +the compiler ([`Ty<'tcx>`]). +It's called type checking because the information +is used to verify the type safety, correctness and coherence of the types used in the program. ### `MIR` lowering @@ -105,29 +106,31 @@ The `HIR` is further lowered to `MIR` pattern and exhaustiveness checking) to convert into `MIR`. 
We do [many optimizations on the MIR][mir-opt] because it is generic and that -improves later code generation and compilation speed. It is easier to do some -optimizations at `MIR` level than at `LLVM-IR` level. For example LLVM doesn't seem +improves later code generation and compilation speed. +It is easier to do some optimizations at `MIR` level than at `LLVM-IR` level. +For example LLVM doesn't seem to be able to optimize the pattern the [`simplify_try`] `MIR`-opt looks for. Rust code is also [_monomorphized_] during code generation, which means making -copies of all the generic code with the type parameters replaced by concrete -types. To do this, we need to collect a list of what concrete types to generate -code for. This is called _monomorphization collection_ and it happens at the -`MIR` level. +copies of all the generic code with the type parameters replaced by concrete types. +To do this, we need to collect a list of what concrete types to generate code for. +This is called _monomorphization collection_ and it happens at the `MIR` level. [_monomorphized_]: https://en.wikipedia.org/wiki/Monomorphization ### Code generation -We then begin what is simply called _code generation_ or _codegen_. The [code -generation stage][codegen] is when higher-level representations of source are -turned into an executable binary. Since `rustc` uses LLVM for code generation, -the first step is to convert the `MIR` to `LLVM-IR`. This is where the `MIR` is -actually monomorphized. The `LLVM-IR` is passed to LLVM, which does a lot more +We then begin what is simply called _code generation_ or _codegen_. +The [code generation stage][codegen] is when higher-level representations of source are +turned into an executable binary. +Since `rustc` uses LLVM for code generation, +the first step is to convert the `MIR` to `LLVM-IR`. +This is where the `MIR` is actually monomorphized. 
+The `LLVM-IR` is passed to LLVM, which does a lot more optimizations on it, emitting machine code which is basically assembly code with additional low-level types and annotations added (e.g. an ELF object or -`WASM`). The different libraries/binaries are then linked together to produce -the final binary. +`WASM`). +The different libraries/binaries are then linked together to produce the final binary. [*trait solving*]: traits/resolution.md [*type checking*]: hir-typeck/summary.md @@ -171,24 +174,24 @@ the final binary. ## How it does it Now that we have a high-level view of what the compiler does to your code, -let's take a high-level view of _how_ it does all that stuff. There are a lot -of constraints and conflicting goals that the compiler needs to -satisfy/optimize for. For example, - -- Compilation speed: how fast is it to compile a program? More/better - compile-time analyses often means compilation is slower. - - Also, we want to support incremental compilation, so we need to take that - into account. How can we keep track of what work needs to be redone and +let's take a high-level view of _how_ it does all that stuff. +There are a lot of constraints and conflicting goals that the compiler needs to +satisfy/optimize for. +For example, + +- Compilation speed: how fast is it to compile a program? + More/better compile-time analyses often mean compilation is slower. + - Also, we want to support incremental compilation, so we need to take that into account. + How can we keep track of what work needs to be redone and what can be reused if the user modifies their program? - Also we can't store too much stuff in the incremental cache because it would take a long time to load from disk and it could take a lot of space on the user's system... -- Compiler memory usage: while compiling a program, we don't want to use more - memory than we need. -- Program speed: how fast is your compiled program? 
More/better compile-time - analyses often means the compiler can do better optimizations. -- Program size: how large is the compiled binary? Similar to the previous - point. +- Compiler memory usage: while compiling a program, we don't want to use more memory than we need. +- Program speed: how fast is your compiled program? + More/better compile-time analyses often mean the compiler can do better optimizations. +- Program size: how large is the compiled binary? + Similar to the previous point. - Compiler compilation speed: how long does it take to compile the compiler? This impacts contributors and compiler maintenance. - Implementation complexity: building a compiler is one of the hardest @@ -199,123 +202,131 @@ satisfy/optimize for. For example, tremendous amount of change constantly going on. - Integration: a number of other tools need to use the compiler in various ways (e.g. `cargo`, `clippy`, `Miri`) that must be supported. -- Compiler stability: the compiler should not crash or fail ungracefully on the - stable channel. +- Compiler stability: the compiler should not crash or fail ungracefully on the stable channel. - Rust stability: the compiler must respect Rust's stability guarantees by not breaking programs that previously compiled despite the many changes that are always going on to its implementation. - Limitations of other tools: `rustc` uses LLVM in its backend, and LLVM has some strengths we leverage and some aspects we need to work around. -So, as you continue your journey through the rest of the guide, keep these -things in mind. They will often inform decisions that we make. +So, as you continue your journey through the rest of the guide, keep these things in mind. +They will often inform decisions that we make. ### Intermediate representations As with most compilers, `rustc` uses some intermediate representations (IRs) to -facilitate computations. In general, working directly with the source code is -extremely inconvenient and error-prone. 
Source code is designed to be human-friendly while at +facilitate computations. +In general, working directly with the source code is extremely inconvenient and error-prone. +Source code is designed to be human-friendly while at the same time being unambiguous, but it's less convenient for doing something like, say, type checking. Instead most compilers, including `rustc`, build some sort of IR out of the -source code which is easier to analyze. `rustc` has a few IRs, each optimized -for different purposes: +source code which is easier to analyze. +`rustc` has a few IRs, each optimized for different purposes: -- Token stream: the lexer produces a stream of tokens directly from the source - code. This stream of tokens is easier for the parser to deal with than raw - text. +- Token stream: the lexer produces a stream of tokens directly from the source code. + This stream of tokens is easier for the parser to deal with than raw text. - Abstract Syntax Tree (`AST`): the abstract syntax tree is built from the stream - of tokens produced by the lexer. It represents - pretty much exactly what the user wrote. It helps to do some syntactic sanity + of tokens produced by the lexer. + It represents pretty much exactly what the user wrote. + It helps to do some syntactic sanity checking (e.g. checking that a type is expected where the user wrote one). -- High-level IR (HIR): This is a sort of desugared `AST`. It's still close - to what the user wrote syntactically, but it includes some implicit things +- High-level IR (HIR): This is a sort of desugared `AST`. + It's still close to what the user wrote syntactically, but it includes some implicit things such as some elided lifetimes, etc. This IR is amenable to type checking. - Typed `HIR` (THIR) _formerly High-level Abstract IR (HAIR)_: This is an - intermediate between `HIR` and MIR. It is like the `HIR` but it is fully typed + intermediate between `HIR` and MIR. 
+ It is like the `HIR` but it is fully typed and a bit more desugared (e.g. method calls and implicit dereferences are - made fully explicit). As a result, it is easier to lower to `MIR` from `THIR` than - from HIR. -- Middle-level IR (`MIR`): This IR is basically a Control-Flow Graph (CFG). A CFG - is a type of diagram that shows the basic blocks of a program and how control - flow can go between them. Likewise, `MIR` also has a bunch of basic blocks with + made fully explicit). + As a result, it is easier to lower to `MIR` from `THIR` than from HIR. +- Middle-level IR (`MIR`): This IR is basically a Control-Flow Graph (CFG). + A CFG is a type of diagram that shows the basic blocks of a program and how control + flow can go between them. + Likewise, `MIR` also has a bunch of basic blocks with simple typed statements inside them (e.g. assignment, simple computations, etc) and control flow edges to other basic blocks (e.g., calls, dropping - values). `MIR` is used for borrow checking and other + values). + `MIR` is used for borrow checking and other important dataflow-based checks, such as checking for uninitialized values. - It is also used for a series of optimizations and for constant evaluation (via - `Miri`). Because `MIR` is still generic, we can do a lot of analyses here more + It is also used for a series of optimizations and for constant evaluation (via `Miri`). + Because `MIR` is still generic, we can do a lot of analyses here more efficiently than after monomorphization. -- `LLVM-IR`: This is the standard form of all input to the LLVM compiler. `LLVM-IR` - is a sort of typed assembly language with lots of annotations. It's +- `LLVM-IR`: This is the standard form of all input to the LLVM compiler. + `LLVM-IR` is a sort of typed assembly language with lots of annotations. + It's a standard format that is used by all compilers that use LLVM (e.g. the clang - C compiler also outputs `LLVM-IR`). 
`LLVM-IR` is designed to be easy for other - compilers to emit and also rich enough for LLVM to run a bunch of - optimizations on it. + C compiler also outputs `LLVM-IR`). + `LLVM-IR` is designed to be easy for other + compilers to emit and also rich enough for LLVM to run a bunch of optimizations on it. One other thing to note is that many values in the compiler are _interned_. This is a performance and memory optimization in which we allocate the values in -a special allocator called an -_[arena]_. Then, we pass -around references to the values allocated in the arena. This allows us to make +a special allocator called an _[arena]_. +Then, we pass around references to the values allocated in the arena. +This allows us to make sure that identical values (e.g. types in your program) are only allocated once -and can be compared cheaply by comparing pointers. Many of the intermediate -representations are interned. +and can be compared cheaply by comparing pointers. +Many of the intermediate representations are interned. [arena]: https://en.wikipedia.org/wiki/Region-based_memory_management ### Queries -The first big implementation choice is Rust's use of the _query_ system in its -compiler. The Rust compiler _is not_ organized as a series of passes over the -code which execute sequentially. The Rust compiler does this to make +The first big implementation choice is Rust's use of the _query_ system in its compiler. +The Rust compiler _is not_ organized as a series of passes over the code which execute sequentially. +The Rust compiler does this to make incremental compilation possible -- that is, if the user makes a change to their program and recompiles, we want to do as little redundant work as possible to output the new binary. -In `rustc`, all the major steps above are organized as a bunch of queries that -call each other. For example, there is a query to ask for the type of something -and another to ask for the optimized `MIR` of a function. 
These queries can call -each other and are all tracked through the query system. The results of the -queries are cached on disk so that the compiler can tell which queries' results -changed from the last compilation and only redo those. This is how incremental -compilation works. - -In principle, for the query-fied steps, we do each of the above for each item -individually. For example, we will take the `HIR` for a function and use queries -to ask for the `LLVM-IR` for that HIR. This drives the generation of optimized -`MIR`, which drives the borrow checker, which drives the generation of `MIR`, and -so on. - -... except that this is very over-simplified. In fact, some queries are not +In `rustc`, all the major steps above are organized as a bunch of queries that call each other. +For example, there is a query to ask for the type of something +and another to ask for the optimized `MIR` of a function. +These queries can call each other and are all tracked through the query system. +The results of the queries are cached on disk so that the compiler can tell which queries' results +changed from the last compilation and only redo those. +This is how incremental compilation works. + +In principle, for the query-fied steps, we do each of the above for each item individually. +For example, we will take the `HIR` for a function and use queries +to ask for the `LLVM-IR` for that HIR. +This drives the generation of optimized +`MIR`, which drives the borrow checker, which drives the generation of `MIR`, and so on. + +... except that this is very over-simplified. +In fact, some queries are not cached on disk, and some parts of the compiler have to run for all code anyway for correctness even if the code is dead code (e.g. the borrow checker). 
For example, [currently the `mir_borrowck` query is first executed on all functions of a
crate.][passes]
Then the codegen backend invokes the
`collect_and_partition_mono_items` query, which first recursively requests the
`optimized_mir` for all reachable functions, which in turn runs `mir_borrowck`
-for that function and then creates codegen units. This kind of split will need
+for that function and then creates codegen units.
+This kind of split will need
 to remain to ensure that unreachable functions still have their errors emitted.

[passes]: https://github.com/rust-lang/rust/blob/e69c7306e2be08939d95f14229e3f96566fb206c/compiler/rustc_interface/src/passes.rs#L791

Moreover, the compiler wasn't originally built to use a query system; the query
-system has been retrofitted into the compiler, so parts of it are not query-fied
-yet. Also, LLVM isn't our code, so that isn't querified either. The plan is to
-eventually query-fy all of the steps listed in the previous section,
+system has been retrofitted into the compiler, so parts of it are not query-fied yet.
+Also, LLVM isn't our code, so that isn't query-fied either.
+The plan is to eventually query-fy all of the steps listed in the previous section,
 but as of November 2022, only the steps between `HIR` and
-`LLVM-IR` are query-fied. That is, lexing, parsing, name resolution, and macro
+`LLVM-IR` are query-fied.
+That is, lexing, parsing, name resolution, and macro
 expansion are done all at once for the whole program.

One other thing to mention here is the all-important "typing context",
[`TyCtxt`], which is a giant struct that is at the center of all things.
-(Note that the name is mostly historic. This is _not_ a "typing context" in the
-sense of `Γ` or `Δ` from type theory. The name is retained because that's what
-the name of the struct is in the source code.) All
+(Note that the name is mostly historic.
+This is _not_ a "typing context" in the sense of `Γ` or `Δ` from type theory. 
+The name is retained because that's what the name of the struct is in the source code.) All queries are defined as methods on the [`TyCtxt`] type, and the in-memory query -cache is stored there too. In the code, there is usually a variable called -`tcx` which is a handle on the typing context. You will also see lifetimes with +cache is stored there too. +In the code, there is usually a variable called `tcx` which is a handle on the typing context. +You will also see lifetimes with the name `'tcx`, which means that something is tied to the lifetime of the [`TyCtxt`] (usually it is stored or interned there). @@ -327,9 +338,10 @@ For more information about queries in the compiler, see [the queries chapter][qu ### `ty::Ty` -Types are really important in Rust, and they form the core of a lot of compiler -analyses. The main type (in the compiler) that represents types (in the user's -program) is [`rustc_middle::ty::Ty`][ty]. This is so important that we have a whole chapter +Types are really important in Rust, and they form the core of a lot of compiler analyses. +The main type (in the compiler) that represents types (in the user's +program) is [`rustc_middle::ty::Ty`][ty]. +This is so important that we have a whole chapter on [`ty::Ty`][ty], but for now, we just want to mention that it exists and is the way `rustc` represents types! @@ -340,20 +352,21 @@ Also note that the [`rustc_middle::ty`] module defines the [`TyCtxt`] struct we ### Parallelism -Compiler performance is a problem that we would like to improve on -(and are always working on). One aspect of that is parallelizing -`rustc` itself. +Compiler performance is a problem that we would like to improve on (and are always working on). +One aspect of that is parallelizing `rustc` itself. Currently, there is only one part of rustc that is parallel by default: [code generation](./parallel-rustc.md#Codegen). -However, the rest of the compiler is still not yet parallel. 
There have been
-lots of efforts spent on this, but it is generally a hard problem. The current
-approach is to turn [`RefCell`]s into [`Mutex`]s -- that is, we
-switch to thread-safe internal mutability. However, there are ongoing
+However, the rest of the compiler is still not yet parallel.
+A lot of effort has been spent on this, but it is generally a hard problem.
+The current approach is to turn [`RefCell`]s into [`Mutex`]s -- that is, we
+switch to thread-safe internal mutability.
+However, there are ongoing
 challenges with lock contention, maintaining query-system invariants under
-concurrency, and the complexity of the code base. One can try out the current
-work by enabling parallel compilation in `bootstrap.toml`. It's still early days,
+concurrency, and the complexity of the code base.
+One can try out the current work by enabling parallel compilation in `bootstrap.toml`.
+It's still early days,
 but there are already some promising performance improvements.

[`RefCell`]: https://doc.rust-lang.org/std/cell/struct.RefCell.html
@@ -361,15 +374,15 @@ but there are already some promising performance improvements.

### Bootstrapping

-`rustc` itself is written in Rust. So how do we compile the compiler? We use an
-older compiler to compile the newer compiler. This is called [_bootstrapping_].
+`rustc` itself is written in Rust.
+So how do we compile the compiler? We use an older compiler to compile the newer compiler.
+This is called [_bootstrapping_].

-Bootstrapping has a lot of interesting implications. For example, it means
-that one of the major users of Rust is the Rust compiler, so we are
+Bootstrapping has a lot of interesting implications.
+For example, it means that one of the major users of Rust is the Rust compiler, so we are
 constantly testing our own software ("eating our own dogfood").

-For more details on bootstrapping, see
-[the bootstrapping section of the guide][rustc-bootstrap]. 
+For more details on bootstrapping, see [the bootstrapping section of the guide][rustc-bootstrap]. [_bootstrapping_]: https://en.wikipedia.org/wiki/Bootstrapping_(compilers) [rustc-bootstrap]: building/bootstrapping/intro.md @@ -382,8 +395,7 @@ For more details on bootstrapping, see parser, HIR, etc)? - e.g., `cargo rustc -- -Z unpretty=hir-tree` allows you to view `HIR` representation - What is the main source entry point for `X`? -- Where do phases diverge for cross-compilation to machine code across - different platforms? +- Where do phases diverge for cross-compilation to machine code across different platforms? --> # References @@ -406,7 +418,7 @@ For more details on bootstrapping, see - [Entry point for outline module parsing](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/module/fn.parse_external_mod.html) - [Entry point for macro fragments][parse_nonterminal] - `AST` definition: [`rustc_ast`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html) - - Feature gating: **TODO** + - Feature gating: [Feature Gate Checking](feature-gate-check.md) - Early linting: **TODO** - The High Level Intermediate Representation (HIR) - Guide: [The HIR](hir.md) @@ -439,6 +451,6 @@ For more details on bootstrapping, see - Guide: [Code Generation](backend/codegen.md) - Generating Machine Code from `LLVM-IR` with LLVM - **TODO: reference?** - Main entry point: [`rustc_codegen_ssa::base::codegen_crate`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_crate.html) - - This monomorphizes and produces `LLVM-IR` for one codegen unit. It then - starts a background thread to run LLVM, which must be joined later. + - This monomorphizes and produces `LLVM-IR` for one codegen unit. + It then starts a background thread to run LLVM, which must be joined later. 
- Monomorphization happens lazily via [`FunctionCx::monomorphize`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/struct.FunctionCx.html#method.monomorphize) and [`rustc_codegen_ssa::base::codegen_instance `](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_instance.html) diff --git a/src/doc/rustc-dev-guide/src/queries/incremental-compilation-in-detail.md b/src/doc/rustc-dev-guide/src/queries/incremental-compilation-in-detail.md index cab9f6871f7c5..988284620d5c3 100644 --- a/src/doc/rustc-dev-guide/src/queries/incremental-compilation-in-detail.md +++ b/src/doc/rustc-dev-guide/src/queries/incremental-compilation-in-detail.md @@ -1,7 +1,8 @@ # Incremental compilation in detail The incremental compilation scheme is, in essence, a surprisingly -simple extension to the overall query system. It relies on the fact that: +simple extension to the overall query system. +It relies on the fact that: 1. queries are pure functions -- given the same inputs, a query will always yield the same result, and @@ -14,8 +15,8 @@ incremental and then goes on to discuss version implementation issues. ## A Basic Algorithm For Incremental Query Evaluation As explained in the [query evaluation model primer][query-model], query -invocations form a directed-acyclic graph. Here's the example from the -previous chapter again: +invocations form a directed-acyclic graph. +Here's the example from the previous chapter again: ```ignore list_of_all_hir_items <----------------------------- type_check_crate() @@ -30,38 +31,38 @@ previous chapter again: ``` Since every access from one query to another has to go through the query -context, we can record these accesses and thus actually build this dependency -graph in memory. With dependency tracking enabled, when compilation is done, +context, we can record these accesses and thus actually build this dependency graph in memory. 
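As a toy illustration of that recording (a hypothetical sketch, not rustc's actual implementation; all names here are invented), a query context can memoize results and note an edge whenever one query reads another:

```rust
use std::collections::{HashMap, HashSet};

/// A toy "query context": memoizes results and records which other
/// queries were read while computing each query (the graph edges).
struct QueryCtx {
    cache: HashMap<String, i64>,
    edges: HashMap<String, HashSet<String>>, // query -> its dependencies
    stack: Vec<String>,                      // queries currently executing
}

impl QueryCtx {
    fn new() -> Self {
        QueryCtx { cache: HashMap::new(), edges: HashMap::new(), stack: Vec::new() }
    }

    /// Run query `name`, computing it with `compute` on a cache miss.
    /// Either way, record an edge from the query that called us.
    fn query(&mut self, name: &str, compute: fn(&mut QueryCtx) -> i64) -> i64 {
        if let Some(parent) = self.stack.last().cloned() {
            self.edges.entry(parent).or_default().insert(name.to_string());
        }
        if let Some(&cached) = self.cache.get(name) {
            return cached;
        }
        self.stack.push(name.to_string());
        let result = compute(self);
        self.stack.pop();
        self.cache.insert(name.to_string(), result);
        result
    }
}

/// "type_check" asks for "type_of", which asks for the "hir" input.
fn demo() -> (i64, bool, bool) {
    let mut tcx = QueryCtx::new();
    let v = tcx.query("type_check", |tcx| {
        tcx.query("type_of", |tcx| tcx.query("hir", |_| 1) + 1) + 1
    });
    (
        v,
        tcx.edges["type_check"].contains("type_of"),
        tcx.edges["type_of"].contains("hir"),
    )
}

fn main() {
    // The result is computed once, and the recorded edges are
    // exactly the dependency graph of the toy queries.
    assert_eq!(demo(), (3, true, true));
}
```

Running the toy `type_check` query thus leaves behind the dependency edges as a side effect, without any query having to declare its dependencies explicitly.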
+With dependency tracking enabled, when compilation is done, we know which queries were invoked (the nodes of the graph) and for each invocation, which other queries or input has gone into computing the query's result (the edges of the graph). Now suppose we change the source code of our program so that -HIR of `bar` looks different than before. Our goal is to only recompute -those queries that are actually affected by the change while re-using -the cached results of all the other queries. Given the dependency graph we can -do exactly that. For a given query invocation, the graph tells us exactly +HIR of `bar` looks different than before. +Our goal is to only recompute those queries that are actually affected by the change while re-using +the cached results of all the other queries. +Given the dependency graph we can do exactly that. +For a given query invocation, the graph tells us exactly what data has gone into computing its results, we just have to follow the -edges until we reach something that has changed. If we don't encounter -anything that has changed, we know that the query still would evaluate to +edges until we reach something that has changed. +If we don't encounter anything that has changed, we know that the query still would evaluate to the same result we already have in our cache. Taking the `type_of(foo)` invocation from above as an example, we can check -whether the cached result is still valid by following the edges to its -inputs. The only edge leads to `Hir(foo)`, an input that has not been affected -by the change. So we know that the cached result for `type_of(foo)` is still -valid. +whether the cached result is still valid by following the edges to its inputs. +The only edge leads to `Hir(foo)`, an input that has not been affected by the change. +So we know that the cached result for `type_of(foo)` is still valid. The story is a bit different for `type_check_item(foo)`: We again walk the -edges and already know that `type_of(foo)` is fine. 
Then we get to
-`type_of(bar)` which we have not checked yet, so we walk the edges of
-`type_of(bar)` and encounter `Hir(bar)` which *has* changed. Consequently
-the result of `type_of(bar)` might yield a different result than what we
-have in the cache and, transitively, the result of `type_check_item(foo)`
-might have changed too. We thus re-run `type_check_item(foo)`, which in
+edges and already know that `type_of(foo)` is fine.
+Then we get to `type_of(bar)` which we have not checked yet, so we walk the edges of
+`type_of(bar)` and encounter `Hir(bar)` which *has* changed.
+Consequently, `type_of(bar)` might yield a different result than what we
+have in the cache and, transitively, the result of `type_check_item(foo)` might have changed too.
+We thus re-run `type_check_item(foo)`, which in
 turn will re-run `type_of(bar)`, which will yield an up-to-date result
-because it reads the up-to-date version of `Hir(bar)`. Also, we re-run
-`type_check_item(bar)` because result of `type_of(bar)` might have changed.
+because it reads the up-to-date version of `Hir(bar)`.
+Also, we re-run `type_check_item(bar)` because the result of `type_of(bar)` might have changed.

## The problem with the basic algorithm: false positives

If you read the previous paragraph carefully you'll notice that it says that
`type_of(bar)` *might* have changed because one of its inputs has changed.
There's also the possibility that it might still yield exactly the same
-result *even though* its input has changed. 
+Consider an example with a simple query that just computes the sign of an integer: ```ignore IntValue(x) <---- sign_of(x) <--- some_other_query(x) @@ -81,23 +82,22 @@ Even though `IntValue(x)` is different in the two cases, `sign_of(x)` yields the result `+` in both cases. If we follow the basic algorithm, however, `some_other_query(x)` would have to -(unnecessarily) be re-evaluated because it transitively depends on a changed -input. Change detection yields a "false positive" in this case because it has -to conservatively assume that `some_other_query(x)` might be affected by that -changed input. +(unnecessarily) be re-evaluated because it transitively depends on a changed input. +Change detection yields a "false positive" in this case because it has +to conservatively assume that `some_other_query(x)` might be affected by that changed input. Unfortunately it turns out that the actual queries in the compiler are full of examples like this and small changes to the input often potentially affect -very large parts of the output binaries. As a consequence, we had to make the -change detection system smarter and more accurate. +very large parts of the output binaries. +As a consequence, we had to make the change detection system smarter and more accurate. ## Improving accuracy: the red-green algorithm The "false positives" problem can be solved by interleaving change detection -and query re-evaluation. Instead of walking the graph all the way to the +and query re-evaluation. +Instead of walking the graph all the way to the inputs when trying to find out if some cached result is still valid, we can -check if a result has *actually* changed after we were forced to re-evaluate -it. +check if a result has *actually* changed after we were forced to re-evaluate it. 
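As a toy illustration of that interleaving (hypothetical code, not the compiler's real API), re-evaluating `sign_of` and comparing the fresh result against the cached one tells us whether a dependent query really has to re-run:

```rust
/// Toy query from the `sign_of` example above.
fn sign_of(x: i64) -> char {
    if x >= 0 { '+' } else { '-' }
}

/// Decide whether a dependent like `some_other_query(x)` must re-run
/// after the input changed: force `sign_of` to re-evaluate and compare
/// the fresh result with the cached one ("green" if equal, "red" if not).
fn dependent_needs_rerun(cached_sign: char, new_input: i64) -> bool {
    sign_of(new_input) != cached_sign
}

fn main() {
    // Previous session: IntValue(x) was 2, so sign_of(x) was '+'.
    let cached = sign_of(2);
    // The input changed from 2 to 3: sign_of still yields '+',
    // so some_other_query(x) does not need to be re-evaluated.
    assert!(!dependent_needs_rerun(cached, 3));
    // Had it changed to -5, the result would really differ.
    assert!(dependent_needs_rerun(cached, -5));
}
```

Only when the freshly computed result actually differs from the cached one does the change propagate onward to `some_other_query(x)`.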
We call this algorithm the red-green algorithm because nodes
in the dependency graph are assigned the color green if we were able to prove
@@ -181,10 +181,12 @@ fn try_mark_green(tcx, current_node) -> bool {
> [`compiler/rustc_middle/src/dep_graph/graph.rs`][try_mark_green]

By using red-green marking we can avoid the devastating cumulative effect of
-having false positives during change detection. Whenever a query is executed
-in incremental mode, we first check if its already green. If not, we run
-`try_mark_green()` on it. If it still isn't green after that, then we actually
-invoke the query provider to re-compute the result. Re-computing the query might
+having false positives during change detection.
+Whenever a query is executed in incremental mode, we first check if it's already green.
+If not, we run `try_mark_green()` on it.
+If it still isn't green after that, then we actually
+invoke the query provider to re-compute the result.
+Re-computing the query might
then itself involve recursively invoking more queries, which can mean we come
back to the `try_mark_green()` algorithm for the dependencies recursively.

@@ -200,7 +202,8 @@ This comes with a whole new set of implementation challenges:
- The query result cache is stored to disk, so they are not readily available
  for change comparison.
- A subsequent compilation session will start off with new version of the code
-  that has arbitrary changes applied to it. All kinds of IDs and indices that
+  that has arbitrary changes applied to it.
+  All kinds of IDs and indices that
  are generated from a global, sequential counter (e.g. `NodeId`, `DefId`, etc)
  might have shifted, making the persisted results on disk not immediately
  usable anymore because the same numeric IDs and indices might refer to
@@ -215,41 +218,46 @@ The following sections describe how the compiler solves these issues.
### A Question Of Stability: Bridging The Gap Between Compilation Sessions As noted before, various IDs (like `DefId`) are generated by the compiler in a -way that depends on the contents of the source code being compiled. ID assignment -is usually deterministic, that is, if the exact same code is compiled twice, -the same things will end up with the same IDs. However, if something +way that depends on the contents of the source code being compiled. +ID assignment is usually deterministic. +That is, if the exact same code is compiled twice, +the same things will end up with the same IDs. +However, if something changes, e.g. a function is added in the middle of a file, there is no guarantee that anything will have the same ID as it had before. As a consequence we cannot represent the data in our on-disk cache the same -way it is represented in memory. For example, if we just stored a piece +way it is represented in memory. +For example, if we just stored a piece of type information like `TyKind::FnDef(DefId, &'tcx Substs<'tcx>)` (as we do in memory) and then the contained `DefId` points to a different function in a new compilation session we'd be in trouble. The solution to this problem is to find "stable" forms for IDs which remain -valid in between compilation sessions. For the most important case, `DefId`s, -these are the so-called `DefPath`s. Each `DefId` has a -corresponding `DefPath` but in place of a numeric ID, a `DefPath` is based on +valid in between compilation sessions. +For the most important case, `DefId`s, +these are the so-called `DefPath`s. +Each `DefId` has a corresponding `DefPath`, but in place of a numeric ID, a `DefPath` is based on the path to the identified item, e.g. `std::collections::HashMap`. The advantage of an ID like this is that it is not affected by unrelated changes. For example, one can add a new function to `std::collections` but -`std::collections::HashMap` would still be `std::collections::HashMap`. 
A -`DefPath` is "stable" across changes made to the source code while a `DefId` -isn't. +`std::collections::HashMap` would still be `std::collections::HashMap`. +A `DefPath` is "stable" across changes made to the source code while a `DefId` isn't. -There is also the `DefPathHash` which is just a 128-bit hash value of the -`DefPath`. The two contain the same information and we mostly use the +There is also the `DefPathHash` which is just a 128-bit hash value of the `DefPath`. +The two contain the same information and we mostly use the `DefPathHash` because it simpler to handle, being `Copy` and self-contained. This principle of stable identifiers is used to make the data in the on-disk -cache resilient to source code changes. Instead of storing a `DefId`, we store +cache resilient to source code changes. +Instead of storing a `DefId`, we store the `DefPathHash` and when we deserialize something from the cache, we map the `DefPathHash` to the corresponding `DefId` in the *current* compilation session (which is just a simple hash table lookup). The `HirId`, used for identifying HIR components that don't have their own -`DefId`, is another such stable ID. It is (conceptually) a pair of a `DefPath` +`DefId`, is another such stable ID. +It is (conceptually) a pair of a `DefPath` and a `LocalId`, where the `LocalId` identifies something (e.g. a `hir::Expr`) locally within its "owner" (e.g. a `hir::Item`). If the owner is moved around, the `LocalId`s within it are still the same. @@ -259,30 +267,31 @@ the `LocalId`s within it are still the same. ### Checking query results for changes: `HashStable` and `Fingerprint`s In order to do red-green-marking we often need to check if the result of a -query has changed compared to the result it had during the previous -compilation session. There are two performance problems with this though: +query has changed compared to the result it had during the previous compilation session. 
+There are two performance problems with this though: -- We'd like to avoid having to load the previous result from disk just for - doing the comparison. We already computed the new result and will use that. +- We'd like to avoid having to load the previous result from disk just for doing the comparison. + We already computed the new result and will use that. Also loading a result from disk will "pollute" the interners with data that is unlikely to ever be used. -- We don't want to store each and every result in the on-disk cache. For - example, it would be wasted effort to persist things to disk that are +- We don't want to store each and every result in the on-disk cache. + For example, it would be wasted effort to persist things to disk that are already available in upstream crates. -The compiler avoids these problems by using so-called `Fingerprint`s. Each time -a new query result is computed, the query engine will compute a 128 bit hash -value of the result. We call this hash value "the `Fingerprint` of the query -result". The hashing is (and has to be) done "in a stable way". This means -that whenever something is hashed that might change in between compilation +The compiler avoids these problems by using so-called `Fingerprint`s. +Each time a new query result is computed, the query engine will compute a 128 bit hash +value of the result. +We call this hash value "the `Fingerprint` of the query result". +The hashing is (and has to be) done "in a stable way". +This means that whenever something is hashed that might change in between compilation sessions (e.g. a `DefId`), we instead hash its stable equivalent (e.g. the corresponding `DefPath`). That's what the whole `HashStable` -infrastructure is for. This way `Fingerprint`s computed in two -different compilation sessions are still comparable. +infrastructure is for. +This way `Fingerprint`s computed in two different compilation sessions are still comparable. 
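A minimal sketch of that substitution (illustrative only: the `Def` type is invented, and Rust's 64-bit `DefaultHasher` stands in for the real 128-bit `HashStable` machinery):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a definition: the numeric ID may differ
/// between compilation sessions, but the path to the item stays the same.
struct Def {
    id: u32,            // unstable: depends on e.g. declaration order
    path: &'static str, // stable: e.g. "std::collections::HashMap"
}

/// Fingerprint a result that mentions a `Def`: hash the stable path,
/// never the numeric ID. (The real compiler uses the `HashStable`
/// infrastructure and 128-bit hashes; `DefaultHasher` is a stand-in.)
fn fingerprint(def: &Def) -> u64 {
    let mut h = DefaultHasher::new();
    def.path.hash(&mut h);
    h.finish()
}

fn main() {
    // Two sessions assigned different numeric IDs to the same item
    // (say a new function was inserted above it) ...
    let old = Def { id: 7, path: "std::collections::HashMap" };
    let new = Def { id: 8, path: "std::collections::HashMap" };
    // ... yet the fingerprints still compare equal, so the cached
    // result can be recognized as unchanged.
    assert_eq!(fingerprint(&old), fingerprint(&new));
    let _ = (old.id, new.id); // the sketch never reads the unstable IDs
}
```

Because only the stable path enters the hash, the fingerprint survives a reassignment of numeric IDs between sessions, while a change to the path itself still changes the fingerprint.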
The next step is to store these fingerprints along with the dependency graph. -This is cheap since fingerprints are just bytes to be copied. It's also cheap to -load the entire set of fingerprints together with the dependency graph. +This is cheap since fingerprints are just bytes to be copied. +It's also cheap to load the entire set of fingerprints together with the dependency graph. Now, when red-green-marking reaches the point where it needs to check if a result has changed, it can just compare the (already loaded) previous @@ -290,17 +299,17 @@ fingerprint to the fingerprint of the new result. This approach works rather well but it's not without flaws: -- There is a small possibility of hash collisions. That is, two different - results could have the same fingerprint and the system would erroneously +- There is a small possibility of hash collisions. + That is, two different results could have the same fingerprint and the system would erroneously assume that the result hasn't changed, leading to a missed update. - We mitigate this risk by using a high-quality hash function and a 128 bit - wide hash value. Due to these measures the practical risk of a hash - collision is negligible. + We mitigate this risk by using a high-quality hash function and a 128 bit wide hash value. + Due to these measures the practical risk of a hash collision is negligible. -- Computing fingerprints is quite costly. It is the main reason why incremental - compilation can be slower than non-incremental compilation. We are forced to - use a good and thus expensive hash function, and we have to map things to +- Computing fingerprints is quite costly. + It is the main reason why incremental + compilation can be slower than non-incremental compilation. + We are forced to use a good and thus expensive hash function, and we have to map things to their stable equivalents while doing the hashing. 
@@ -313,32 +322,36 @@ dependency graphs: The one we built during the previous compilation session and
 the one that we are building for the current compilation session.
 
 When a compilation session starts, the compiler loads the previous dependency
-graph into memory as an immutable piece of data. Then, when a query is invoked,
-it will first try to mark the corresponding node in the graph as green. This
-means really that we are trying to mark the node in the *previous* dep-graph
-as green that corresponds to the query key in the *current* session. How do we
-do this mapping between current query key and previous `DepNode`? The answer
-is again `Fingerprint`s: Nodes in the dependency graph are identified by a
-fingerprint of the query key. Since fingerprints are stable across compilation
+graph into memory as an immutable piece of data.
+Then, when a query is invoked,
+it will first try to mark the corresponding node in the graph as green.
+This really means that we are trying to mark as green the node in the *previous* dep-graph
+that corresponds to the query key in the *current* session.
+How do we do this mapping between current query key and previous `DepNode`?
+The answer is again `Fingerprint`s: Nodes in the dependency graph are identified by a
+fingerprint of the query key.
+Since fingerprints are stable across compilation
 sessions, computing one in the current session allows us to find a node
-in the dependency graph from the previous session. If we don't find a node with
+in the dependency graph from the previous session.
+If we don't find a node with
 the given fingerprint, it means that the query key refers to something that did
 not yet exist in the previous session.
 
 So, having found the dep-node in the previous dependency graph, we can look up
 its dependencies (i.e. also dep-nodes in the previous graph) and continue with
-the rest of the try-mark-green algorithm. The next interesting thing happens
-when we successfully marked the node as green. 
At that point we copy the node
-and the edges to its dependencies from the old graph into the new graph. We
-have to do this because the new dep-graph cannot acquire the
-node and edges via the regular dependency tracking. The tracking system can
-only record edges while actually running a query -- but running the query,
+the rest of the try-mark-green algorithm.
+The next interesting thing happens when we successfully mark the node as green.
+At that point we copy the node
+and the edges to its dependencies from the old graph into the new graph.
+We have to do this because the new dep-graph cannot acquire the
+node and edges via the regular dependency tracking.
+The tracking system can only record edges while actually running a query -- but running the query,
 although we have the result already cached, is exactly what we want to avoid.
 
 Once the compilation session has finished, all the unchanged parts have been
 copied over from the old into the new dependency graph, while the changed parts
-have been added to the new graph by the tracking system. At this point, the
-new graph is serialized out to disk, alongside the query result cache, and can
+have been added to the new graph by the tracking system.
+At this point, the new graph is serialized out to disk, alongside the query result cache, and can
 act as the previous dep-graph in a subsequent compilation session.
@@ -346,18 +359,18 @@ act as the previous dep-graph in a subsequent compilation session.
 
 The system described so far has a somewhat subtle property: If all inputs of a
 dep-node are green then the dep-node itself can be marked as green without
-computing or loading the corresponding query result. Applying this property
-transitively often leads to the situation that some intermediate results are
+Applying this property transitively often leads to the situation that some intermediate results are never actually loaded from disk, as in the following example: ```ignore input(A) <-- intermediate_query(B) <-- leaf_query(C) ``` -The compiler might need the value of `leaf_query(C)` in order to generate some -output artifact. If it can mark `leaf_query(C)` as green, it will load the -result from the on-disk cache. The result of `intermediate_query(B)` is never -loaded though. As a consequence, when the compiler persists the *new* result +The compiler might need the value of `leaf_query(C)` in order to generate some output artifact. +If it can mark `leaf_query(C)` as green, it will load the result from the on-disk cache. +The result of `intermediate_query(B)` is never loaded though. +As a consequence, when the compiler persists the *new* result cache by writing all in-memory query results to disk, `intermediate_query(B)` will not be in memory and thus will be missing from the new result cache. @@ -367,25 +380,25 @@ had a perfectly valid result for it in the cache just before. In order to prevent this from happening, the compiler does something called "cache promotion": Before emitting the new result cache it will walk all green -dep-nodes and make sure that their query result is loaded into memory. That way -the result cache doesn't unnecessarily shrink again. +dep-nodes and make sure that their query result is loaded into memory. +That way the result cache doesn't unnecessarily shrink again. # Incremental compilation and the compiler backend The compiler backend, the part involving LLVM, is using the query system but -it is not implemented in terms of queries itself. As a consequence it does not -automatically partake in dependency tracking. However, the manual integration -with the tracking system is pretty straight-forward. 
The compiler simply tracks
-what queries get invoked when generating the initial LLVM version of each
-codegen unit (CGU), which results in a dep-node for each CGU. In subsequent
-compilation sessions it then tries to mark the dep-node for a CGU as green. If
-it succeeds, it knows that the corresponding object and bitcode files on disk
-are still valid. If it doesn't succeed, the entire CGU has to be recompiled.
-
-This is the same approach that is used for regular queries. The main differences
-are:
+it is not implemented in terms of queries itself.
+As a consequence, it does not automatically partake in dependency tracking.
+However, the manual integration with the tracking system is pretty straightforward.
+The compiler simply tracks what queries get invoked when generating the initial LLVM version of each
+codegen unit (CGU), which results in a dep-node for each CGU.
+In subsequent compilation sessions it then tries to mark the dep-node for a CGU as green.
+If it succeeds, it knows that the corresponding object and bitcode files on disk are still valid.
+If it doesn't succeed, the entire CGU has to be recompiled.
+
+This is the same approach that is used for regular queries.
+The main differences are:
 
 - that we cannot easily compute a fingerprint for LLVM modules (because they
   are opaque C++ objects),
@@ -399,39 +412,41 @@ are:
   executed when and what stays in memory for how long.
 
 The query system could probably be extended with general purpose mechanisms to
-deal with all of the above but so far that seemed like more trouble than it
-would save.
+deal with all of the above but so far that seemed like more trouble than it would save.
 
 ## Query modifiers
 
-The query system allows for applying [modifiers][mod] to queries. These
-modifiers affect certain aspects of how the system treats the query with
+The query system allows for applying [modifiers][mod] to queries. 
+These modifiers affect certain aspects of how the system treats the query with
 respect to incremental compilation:
 
 - `eval_always` - A query with the `eval_always` attribute is re-executed
-  unconditionally during incremental compilation. I.e. the system will not
-  even try to mark the query's dep-node as green. This attribute has two use
-  cases:
+  unconditionally during incremental compilation.
+  I.e. the system will not even try to mark the query's dep-node as green.
+  This attribute has two use cases:
 
   - `eval_always` queries can read inputs (from files, global state, etc).
     They can also produce side effects like writing to files and changing global
     state.
 
   - Some queries are very likely to be re-evaluated because their result
-    depends on the entire source code. In this case `eval_always` can be used
+    depends on the entire source code.
+    In this case `eval_always` can be used
     as an optimization because the system can skip recording dependencies in
     the first place.
 
 - `no_hash` - Applying `no_hash` to a query tells the system to not compute
-  the fingerprint of the query's result. This has two consequences:
+  the fingerprint of the query's result.
+  This has two consequences:
 
   - Not computing the fingerprint can save quite a bit of time because
     fingerprinting is expensive, especially for large, complex values.
 
   - Without the fingerprint, the system has to unconditionally assume that
-    the result of the query has changed. As a consequence anything depending
-    on a `no_hash` query will always be re-executed.
+    the result of the query has changed.
+    As a consequence, anything depending on a `no_hash` query will always be re-executed.
 
   Using `no_hash` for a query can make sense in two circumstances:
@@ -446,24 +461,24 @@ respect to incremental compilation:
   and there are "projection queries" reading from that collection (e.g.
   `hir_owner`). 
In such a case the big collection will likely fulfill the condition above (any changed input means recomputing the whole collection) - and the results of the projection queries will be hashed anyway. If we also - hashed the collection query it would mean that we effectively hash the same + and the results of the projection queries will be hashed anyway. + If we also hashed the collection query, it would mean that we effectively hash the same data twice: once when hashing the collection and another time when hashing all - the projection query results. `no_hash` allows us to avoid that redundancy + the projection query results. + `no_hash` allows us to avoid that redundancy and the projection queries act as a "firewall", shielding their dependents from the unconditionally red `no_hash` node. - `cache_on_disk_if` - This attribute is what determines which query results - are persisted in the incremental compilation query result cache. The - attribute takes an expression that allows per query invocation - decisions. For example, it makes no sense to store values from upstream - crates in the cache because they are already available in the upstream - crate's metadata. - - - `anon` - This attribute makes the system use "anonymous" dep-nodes for the - given query. An anonymous dep-node is not identified by the corresponding - query key, instead its ID is computed from the IDs of its dependencies. This - allows the red-green system to do its change detection even if there is no + are persisted in the incremental compilation query result cache. + The attribute takes an expression that allows per query invocation decisions. + For example, it makes no sense to store values from upstream + crates in the cache because they are already available in the upstream crate's metadata. + + - `anon` - This attribute makes the system use "anonymous" dep-nodes for the given query. + An anonymous dep-node is not identified by the corresponding query key. 
+  Instead, its ID is computed from the IDs of its dependencies.
+  This allows the red-green system to do its change detection even if there is no
   query key available for a given dep-node -- something which is needed for
   handling trait selection because it is not based on queries.
@@ -473,7 +488,8 @@ respect to incremental compilation:
 ## The projection query pattern
 
 It's interesting to note that `eval_always` and `no_hash` can be used together
-in the so-called "projection query" pattern. It is often the case that there is
+in the so-called "projection query" pattern.
+It is often the case that there is
 one query that depends on the entirety of the compiler's input (e.g. the
 indexed HIR) and another query that projects individual values out of this
 monolithic value (e.g. a HIR item with a certain `DefId`). These projection
 queries allow for
@@ -500,18 +516,18 @@ can still mostly be marked as green.
 
 Let's assume that the result `monolithic_query` changes so that also the result
 of `projection(x)` has changed, i.e. both their dep-nodes are being marked as
-red. As a consequence `foo(a)` needs to be re-executed; but `bar(b)` and
-`baz(c)` can be marked as green. However, if `foo`, `bar`, and `baz` would have
-directly depended on `monolithic_query` then all of them would have had to be
-re-evaluated.
+red.
+As a consequence, `foo(a)` needs to be re-executed; but `bar(b)` and `baz(c)` can be marked as green.
+However, if `foo`, `bar`, and `baz` had
+directly depended on `monolithic_query`, then all of them would have had to be re-evaluated.
 
 This pattern works even without `eval_always` and `no_hash` but the two
-modifiers can be used to avoid unnecessary overhead. If the monolithic query
+modifiers can be used to avoid unnecessary overhead.
+If the monolithic query
 is likely to change at any minor modification of the compiler's input it makes
-sense to mark it as `eval_always`, thus getting rid of its dependency tracking
-cost. 
And it always makes sense to mark the monolithic query as `no_hash`
-because we have the projections to take care of keeping things green as much
-as possible.
+sense to mark it as `eval_always`, thus getting rid of its dependency tracking cost.
+And it always makes sense to mark the monolithic query as `no_hash`
+because we have the projections to take care of keeping things green as much as possible.
 
 # Shortcomings of the current system
 
@@ -520,17 +536,17 @@ There are many things that still can be improved.
 
 ## Incrementality of on-disk data structures
 
-The current system is not able to update on-disk caches and the dependency graph
-in-place. Instead it has to rewrite each file entirely in each compilation
-session. The overhead of doing so is a few percent of total compilation time.
+The current system is not able to update on-disk caches and the dependency graph in-place.
+Instead, it has to rewrite each file entirely in each compilation session.
+The overhead of doing so is a few percent of total compilation time.
 
 ## Unnecessary data dependencies
 
 Data structures used as query results could be factored in a way that removes
-edges from the dependency graph. Especially "span" information is very volatile,
-so including it in query result will increase the chance that the result won't
-be reusable. See for more
-information.
+edges from the dependency graph.
+Especially "span" information is very volatile,
+so including it in a query result will increase the chance that the result won't be reusable.
+See for more information.
 
 [query-model]: ./query-evaluation-model-in-detail.html
 
diff --git a/src/doc/rustc-dev-guide/src/rustdoc.md b/src/doc/rustc-dev-guide/src/rustdoc.md
index 47b18f4e7e52d..1259ed10c09bd 100644
--- a/src/doc/rustc-dev-guide/src/rustdoc.md
+++ b/src/doc/rustc-dev-guide/src/rustdoc.md
@@ -37,28 +37,25 @@ Note that literally all that does is call the `main()` that's in this crate's `l
 
 ## Cheat sheet
 
-* Run `./x setup tools` before getting started. 
This will configure `x` - with nice settings for developing rustdoc and other tools, including +* Run `./x setup tools` before getting started. + This will configure `x` with nice settings for developing rustdoc and other tools, including downloading a copy of rustc rather than building it. * Use `./x check rustdoc` to quickly check for compile errors. -* Use `./x build library rustdoc` to make a usable - rustdoc you can run on other projects. +* Use `./x build library rustdoc` to make a usable rustdoc you can run on other projects. * Add `library/test` to be able to use `rustdoc --test`. * Run `rustup toolchain link stage2 build/host/stage2` to add a custom toolchain called `stage2` to your rustup environment. After running that, `cargo +stage2 doc` in any directory will build with your locally-compiled rustdoc. -* Use `./x doc library` to use this rustdoc to generate the - standard library docs. +* Use `./x doc library` to use this rustdoc to generate the standard library docs. * The completed docs will be available in `build/host/doc` (under `core`, `alloc`, and `std`). * If you want to copy those docs to a webserver, copy all of `build/host/doc`, since that's where the CSS, JS, fonts, and landing page are. * For frontend debugging, disable the `rust.docs-minification` option in [`bootstrap.toml`]. -* Use `./x test tests/rustdoc*` to run the tests using a stage1 - rustdoc. +* Use `./x test tests/rustdoc*` to run the tests using a stage1 rustdoc. * See [Rustdoc internals] for more information about tests. -* Use `./x.py test tidy --extra-checks=js` to run rustdoc’s JavaScript checks (`eslint`, `es-check`, and `tsc`). -> **Note:** `./x.py test tidy` already runs these checks automatically when JS/TS sources changed; `--extra-checks=js` forces them explicitly. +* Use `./x test tidy --extra-checks=js` to run rustdoc’s JavaScript checks (`eslint`, `es-check`, and `tsc`). 
+> **Note:** `./x test tidy` already runs these checks automatically when JS/TS sources change; `--extra-checks=js` forces them explicitly.
 
 ### JavaScript CI checks
 
 Rustdoc’s JavaScript and TypeScript are checked during CI by `eslint`, `es-check`, and `tsc`.
 These run as part of the `tidy` job.
 
 ```console
-./x.py test tidy --extra-checks=js
+./x test tidy --extra-checks=js
 ```
 
 The `--extra-checks=js` flag enables the frontend linting that runs in CI.
@@ -82,12 +79,12 @@ All paths in this section are relative to `src/librustdoc/` in the rust-lang/rus
 * The data types that get rendered by the functions mentioned above are defined
   in `clean/types.rs`. The functions responsible for creating them from the
   `HIR` and the `rustc_middle::ty` IR live in `clean/mod.rs`.
-* The bits specific to using rustdoc as a test harness are in
-  `doctest.rs`.
+* The bits specific to using rustdoc as a test harness are in `doctest.rs`.
 * The Markdown renderer is loaded up in `html/markdown.rs`, including functions
   for extracting doctests from a given block of Markdown.
 * Frontend CSS and JavaScript are stored in `html/static/`.
-  * Re. JavaScript, type annotations are written using [TypeScript-flavored JSDoc]
+  * Regarding JavaScript, type annotations are written using [TypeScript-flavored JSDoc]
     comments and an external `.d.ts` file. This way, the code itself remains
     plain, valid JavaScript. We only use `tsc` as a linter.
diff --git a/src/doc/rustc-dev-guide/src/syntax-intro.md b/src/doc/rustc-dev-guide/src/syntax-intro.md
index a5a8bab149719..7290e9b4f90da 100644
--- a/src/doc/rustc-dev-guide/src/syntax-intro.md
+++ b/src/doc/rustc-dev-guide/src/syntax-intro.md
@@ -2,7 +2,8 @@
 
 Working directly with source code is very inconvenient and error-prone.
 Thus, before we do anything else, we convert raw source code into an
-[Abstract Syntax Tree (AST)][AST]. It turns out that doing this involves a lot of work,
+[Abstract Syntax Tree (AST)][AST]. 
+It turns out that doing this involves a lot of work, including [lexing, parsing], [macro expansion], [name resolution], conditional compilation, [feature-gate checking], and [validation] of the [AST]. In this chapter, we take a look at all of these steps. diff --git a/src/doc/rustc-dev-guide/src/tests/compiletest.md b/src/doc/rustc-dev-guide/src/tests/compiletest.md index 959a73a5a9ad9..6ca9653c18540 100644 --- a/src/doc/rustc-dev-guide/src/tests/compiletest.md +++ b/src/doc/rustc-dev-guide/src/tests/compiletest.md @@ -785,16 +785,12 @@ only apply to the test as a whole, not to particular revisions. The only directives that are intended to really work when customized to a revision are error patterns and compiler flags. - -The following test suites support revisions: - -- ui -- assembly -- codegen -- coverage -- debuginfo -- rustdoc UI tests -- incremental (these are special in that they inherently cannot be run in parallel) + +> Note that these test suites do not support revisions: +> - `codegen-units` +> - `run-make` +> - `rustdoc-html` +> - `rustdoc-json` ### Ignoring unused revision names diff --git a/src/doc/rustc-dev-guide/src/ty-module/binders.md b/src/doc/rustc-dev-guide/src/ty-module/binders.md index 21cf80abc6e34..6bb7085a2e0df 100644 --- a/src/doc/rustc-dev-guide/src/ty-module/binders.md +++ b/src/doc/rustc-dev-guide/src/ty-module/binders.md @@ -1,8 +1,12 @@ # `Binder` and Higher ranked regions -Sometimes we define generic parameters not on an item but as part of a type or a where clause. As an example the type `for<'a> fn(&'a u32)` or the where clause `for<'a> T: Trait<'a>` both introduce a generic lifetime named `'a`. Currently there is no stable syntax for `for` or `for` but on nightly `feature(non_lifetime_binders)` can be used to write where clauses (but not types) using `for`/`for`. +Sometimes, we define generic parameters not on an item but as part of a type or a where clause. 
+As an example, the type `for<'a> fn(&'a u32)` or the where clause `for<'a> T: Trait<'a>` both introduce a generic lifetime named `'a`.
+Currently, there is no stable syntax for `for<T>` or `for<const N: usize>`,
+but on nightly, `feature(non_lifetime_binders)` can be used to write where clauses (but not types) using `for<T>`/`for<const N: usize>`.
 
-The `for` is referred to as a "binder" because it brings new names into scope. In rustc we use the `Binder` type to track where these parameters are introduced and what the parameters are (i.e. how many and whether the parameter is a type/const/region). A type such as `for<'a> fn(&'a u32)` would be
+The `for` is referred to as a "binder" because it brings new names into scope.
+In rustc we use the `Binder` type to track where these parameters are introduced and what the parameters are (i.e. how many and whether the parameter is a type/const/region). A type such as `for<'a> fn(&'a u32)` would be
 represented in rustc as:
 ```
 Binder(
@@ -11,13 +15,19 @@ Binder(
 )
 ```
-Usages of these parameters is represented by the `RegionKind::Bound` (or `TyKind::Bound`/`ConstKind::Bound` variants). These bound regions/types/consts are composed of two main pieces of data:
+Usages of these parameters are represented by the `RegionKind::Bound` (or `TyKind::Bound`/`ConstKind::Bound` variants).
+These bound regions/types/consts are composed of two main pieces of data:
 - A [DebruijnIndex](../appendix/background.md#what-is-a-de-bruijn-index) to specify which binder we are referring to.
 - A [`BoundVar`] which specifies which of the parameters that the `Binder` introduces we are referring to.
 
-We also sometimes store some extra information for diagnostics reasons via the [`BoundTyKind`]/[`BoundRegionKind`] but this is not important for type equality or more generally the semantics of `Ty`. 
(omitted from the above example)
+We also sometimes store some extra information for diagnostic reasons via the [`BoundTyKind`]/[`BoundRegionKind`],
+but this is not important for type equality, or, more generally, the semantics of `Ty`,
+so it is omitted from the above example.
 
-In debug output (and also informally when talking to each other) we tend to write these bound variables in the format of `^DebruijnIndex_BoundVar`. The above example would instead be written as `Binder(fn(&'^0_0), &[BoundVariableKind::Region])`. Sometimes when the `DebruijnIndex` is `0` we just omit it and would write `^0`.
+In debug output (and also informally when talking to each other),
+we tend to write these bound variables in the format of `^DebruijnIndex_BoundVar`.
+The above example would instead be written as `Binder(fn(&'^0_0), &[BoundVariableKind::Region])`.
+Sometimes, when the `DebruijnIndex` is `0`, we just omit it and would write `^0`.
 
 Another concrete example, this time a mixture of `for<'a>` in a where clause and a type:
 ```
@@ -35,16 +45,24 @@ Binder(
 )
 ```
-Note how the `'^1_0` refers to the `'a` parameter. We use a `DebruijnIndex` of `1` to refer to the binder one level up from the innermost one, and a var of `0` to refer to the first parameter bound which is `'a`. We also use `'^0` to refer to the `'b` parameter, the `DebruijnIndex` is `0` (referring to the innermost binder) so we omit it, leaving only the boundvar of `0` referring to the first parameter bound which is `'b`.
+Note how the `'^1_0` refers to the `'a` parameter.
+We use a `DebruijnIndex` of `1` to refer to the binder one level up from the innermost one, and a var of `0` to refer to the first parameter bound which is `'a`.
+We also use `'^0` to refer to the `'b` parameter; the `DebruijnIndex` is `0` (referring to the innermost binder) so we omit it, leaving only the boundvar of `0` referring to the first parameter bound which is `'b`. 
-We did not always explicitly track the set of bound vars introduced by each `Binder`, this caused a number of bugs (read: ICEs [#81193](https://github.com/rust-lang/rust/issues/81193), [#79949](https://github.com/rust-lang/rust/issues/79949), [#83017](https://github.com/rust-lang/rust/issues/83017)). By tracking these explicitly we can assert when constructing higher ranked where clauses/types that there are no escaping bound variables or variables from a different binder. See the following example of an invalid type inside of a binder: +We did not always explicitly track the set of bound vars introduced by each `Binder`, +and this caused a number of bugs (read: ICEs [#81193](https://github.com/rust-lang/rust/issues/81193), [#79949](https://github.com/rust-lang/rust/issues/79949), [#83017](https://github.com/rust-lang/rust/issues/83017)). +By tracking these explicitly, we can assert when constructing higher ranked where clauses/types that there are no escaping bound variables or variables from a different binder. +See the following example of an invalid type inside of a binder: ``` Binder( fn(&'^1_0 &'^1 T/#0), &[BoundVariableKind::Region(...)], ) ``` -This would cause all kinds of issues as the region `'^1_0` refers to a binder at a higher level than the outermost binder i.e. it is an escaping bound var. The `'^1` region (also writeable as `'^0_1`) is also ill formed as the binder it refers to does not introduce a second parameter. Modern day rustc will ICE when constructing this binder due to both of those reasons, in the past we would have simply allowed this to work and then ran into issues in other parts of the codebase. +This would cause all kinds of issues as the region `'^1_0` refers to a binder at a higher level than the outermost binder i.e. it is an escaping bound var. +The `'^1` region (also writeable as `'^0_1`) is also ill formed as the binder it refers to does not introduce a second parameter. 
+Modern-day rustc will ICE when constructing this binder due to both of those reasons.
+In the past, we would have simply allowed this to work and then run into issues in other parts of the codebase.
 
 [`Binder`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Binder.html
 [`BoundVar`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.BoundVar.html
diff --git a/src/doc/rustc-dev-guide/src/ty-module/instantiating-binders.md b/src/doc/rustc-dev-guide/src/ty-module/instantiating-binders.md
index 82c340e87a810..f6fc574a3ba2d 100644
--- a/src/doc/rustc-dev-guide/src/ty-module/instantiating-binders.md
+++ b/src/doc/rustc-dev-guide/src/ty-module/instantiating-binders.md
@@ -1,6 +1,8 @@
 # Instantiating `Binder`s
 
-Much like [`EarlyBinder`], when accessing the inside of a [`Binder`] we must first discharge it by replacing the bound vars with some other value. This is for much the same reason as with `EarlyBinder`, types referencing parameters introduced by the `Binder` do not make any sense outside of that binder. See the following erroring example:
+Much like [`EarlyBinder`], when accessing the inside of a [`Binder`], we must first discharge it by replacing the bound vars with some other value.
+This is for much the same reason as with `EarlyBinder`: types referencing parameters introduced by the `Binder` do not make any sense outside of that binder.
+See the following erroring example:
 ```rust,ignore
 fn foo<'a>(a: &'a u32) -> &'a u32 {
     a
@@ -15,7 +17,9 @@ fn main() {
     let references_bound_vars = bar(higher_ranked_fn_ptr);
 }
 ```
-In this example we are providing an argument of type `for<'a> fn(&'^0 u32) -> &'^0 u32` to `bar`, we do not want to allow `T` to be inferred to the type `&'^0 u32` as it would be rather nonsensical (and likely unsound if we did not happen to ICE). `main` doesn't know about `'a` so the borrow checker would not be able to handle a borrow with lifetime `'a`. 
+In this example, we are providing an argument of type `for<'a> fn(&'^0 u32) -> &'^0 u32` to `bar`.
+We do not want to allow `T` to be inferred to the type `&'^0 u32` as it would be rather nonsensical (and likely unsound if we did not happen to ICE).
+`main` doesn't know about `'a`, so the borrow checker would not be able to handle a borrow with lifetime `'a`.
 
 Unlike `EarlyBinder` we typically do not instantiate `Binder` with some concrete set of arguments from the user, i.e. `['b, 'static]` as arguments to a `for<'a1, 'a2> fn(&'a1 u32, &'a2 u32)`. Instead we usually instantiate the binder with inference variables or placeholders.
 
@@ -31,27 +35,39 @@ As another example of instantiating with infer vars, given some `for<'a> T: Trai
 - Equate the goal of `T: Trait<'static>` with the instantiated where clause, inferring `'?0 = 'static`
 - The goal holds because we were successfully able to unify `T: Trait<'static>` with `T: Trait<'?0>`
 
-Instantiating binders with inference variables can be accomplished by using the [`instantiate_binder_with_fresh_vars`] method on [`InferCtxt`]. Binders should be instantiated with infer vars when we only care about one specific instantiation of the binder, if instead we wish to reason about all possible instantiations of the binder then placeholders should be used instead.
+Instantiating binders with inference variables can be accomplished by using the [`instantiate_binder_with_fresh_vars`] method on [`InferCtxt`].
+Binders should be instantiated with infer vars when we only care about one specific instantiation of the binder; if instead we wish to reason about all possible instantiations of the binder, then placeholders should be used.
 
 ## Instantiating with placeholders
 
-Placeholders are very similar to `Ty/ConstKind::Param`/`ReEarlyParam`, they represent some unknown type that is only equal to itself. `Ty`/`Const` and `Region` all have a [`Placeholder`] variant that is comprised of a [`Universe`] and a [`BoundVar`]. 
+Placeholders are very similar to `Ty/ConstKind::Param`/`ReEarlyParam`: they represent some unknown type that is only equal to itself.
+`Ty`/`Const` and `Region` all have a [`Placeholder`] variant that is comprised of a [`Universe`] and a [`BoundVar`].
 
-The `Universe` tracks which binder the placeholder originated from, and the `BoundVar` tracks which parameter on said binder that this placeholder corresponds to. Equality of placeholders is determined solely by whether the universes are equal and the `BoundVar`s are equal. See the [chapter on Placeholders and Universes][ch_placeholders_universes] for more information.
+The `Universe` tracks which binder the placeholder originated from, and the `BoundVar` tracks which parameter on said binder this placeholder corresponds to.
+Equality of placeholders is determined solely by whether the universes are equal and the `BoundVar`s are equal.
+See the [chapter on Placeholders and Universes][ch_placeholders_universes] for more information.
 
-When talking with other rustc devs or seeing `Debug` formatted `Ty`/`Const`/`Region`s, `Placeholder` will often be written as `'!UNIVERSE_BOUNDVARS`. For example given some type `for<'a> fn(&'a u32, for<'b> fn(&'b &'a u32))`, after instantiating both binders (assuming the `Universe` in the current `InferCtxt` was `U0` beforehand), the type of `&'b &'a u32` would be represented as `&'!2_0 &!1_0 u32`.
+When talking with other rustc devs or seeing `Debug` formatted `Ty`/`Const`/`Region`s, `Placeholder` will often be written as `'!UNIVERSE_BOUNDVARS`.
+For example, given some type `for<'a> fn(&'a u32, for<'b> fn(&'b &'a u32))`,
+after instantiating both binders (assuming the `Universe` in the current `InferCtxt` was `U0` beforehand),
+the type of `&'b &'a u32` would be represented as `&'!2_0 &!1_0 u32`.
 
-When the universe of the placeholder is `0`, it will be entirely omitted from the debug output, i.e. `!0_2` would be printed as `!2`. 
This rarely happens in practice though as we increase the universe in the `InferCtxt` when instantiating a binder with placeholders so usually the lowest universe placeholders encounterable are ones in `U1`. +When the universe of the placeholder is `0`, it will be entirely omitted from the debug output, i.e. `!0_2` would be printed as `!2`. +This rarely happens in practice though, as we increase the universe in the `InferCtxt` when instantiating a binder with placeholders, +so usually the lowest-universe placeholders that can be encountered are ones in `U1`. -`Binder`s can be instantiated with placeholders via the [`enter_forall`] method on `InferCtxt`. It should be used whenever the compiler should care about any possible instantiation of the binder instead of one concrete instantiation. +`Binder`s can be instantiated with placeholders via the [`enter_forall`] method on `InferCtxt`. +It should be used whenever the compiler should care about any possible instantiation of the binder instead of one concrete instantiation. -Note: in the original example of this chapter it was mentioned that we should not infer a local variable to have type `&'^0 u32`. This code is prevented from compiling via universes (as explained in the linked chapter) +Note: in the original example of this chapter it was mentioned that we should not infer a local variable to have type `&'^0 u32`. +This code is prevented from compiling via universes (as explained in the linked chapter). ### Why have both `RePlaceholder` and `ReBound`? You may be wondering why we have both of these variants; after all, the data stored in `Placeholder` is effectively equivalent to that of `ReBound`: something to track which binder, and an index to track which parameter the `Binder` introduced. -The main reason for this is that `Bound` is a more syntactic representation of bound variables whereas `Placeholder` is a more semantic representation.
As a concrete example: +The main reason for this is that `Bound` is a more syntactic representation of bound variables whereas `Placeholder` is a more semantic representation. +As a concrete example: ```rust impl<'a> Other<'a> for &'a u32 { } @@ -66,7 +82,10 @@ where { ... } ``` -Given these trait implementations `u32: Bar` should _not_ hold. `&'a u32` only implements `Other<'a>` when the lifetime of the borrow and the lifetime on the trait are equal. However if we only used `ReBound` and did not have placeholders it may be easy to accidentally believe that trait bound does hold. To explain this let's walk through an example of trying to prove `u32: Bar` in a world where rustc did not have placeholders: +Given these trait implementations, `u32: Bar` should _not_ hold. +`&'a u32` only implements `Other<'a>` when the lifetime of the borrow and the lifetime on the trait are equal. +However, if we only used `ReBound` and did not have placeholders, it may be easy to accidentally believe that trait bound does hold. 
+To explain this, let's walk through an example of trying to prove `u32: Bar` in a world where rustc did not have placeholders: - We start by trying to prove `u32: Bar` - We find the `impl<T> Bar for T` impl, we would wind up instantiating the `EarlyBinder` with `u32` (note: this is not _quite_ accurate as we first instantiate the binder with an inference variable that we then infer to be `u32` but that distinction is not super important here) - There is a where clause `for<'a> &'^0 T: Trait` on the impl, as we instantiated the early binder with `u32` we actually have to prove `for<'a> &'^0 u32: Trait` @@ -83,22 +102,25 @@ While in theory we could make this work it would be quite involved and more comp - When resolving inference variables rewrite any bound variables according to the current binder depth of the infcx - Maybe more (while writing this list items kept getting added so it seems naive to think this is exhaustive) -Fundamentally all of this complexity is because `Bound` ty/const/regions have a different representation for a given parameter on a `Binder` depending on how many other `Binder`s there are between the binder introducing the parameter, and its usage. For example given the following code: +Fundamentally, all of this complexity is because `Bound` ty/const/regions have a different representation for a given parameter on a `Binder` depending on how many other `Binder`s there are between the binder introducing the parameter, and its usage. +For example, given the following code: ```rust fn foo<T>() where for<'a> T: Trait<'a, for<'b> fn(&'b T, &'a u32)> { ... } ``` -That where clause would be written as: -`for<'a> T: Trait<'^0, for<'b> fn(&'^0 T, &'^1_0 u32)>` -Despite there being two references to the `'a` parameter they are both represented differently: `^0` and `^1_0`, due to the fact that the latter usage is nested under a second `Binder` for the inner function pointer type.
+That where clause would be written as `for<'a> T: Trait<'^0, for<'b> fn(&'^0 T, &'^1_0 u32)>`. +Despite there being two references to the `'a` parameter, +they are both represented differently, `^0` and `^1_0`, +due to the fact that the latter usage is nested under a second `Binder` for the inner function pointer type. This is in contrast to `Placeholder` ty/const/regions which do not have this limitation due to the fact that `Universe`s are specific to the current `InferCtxt` not the usage site of the parameter. -It is trivially possible to instantiate `EarlyBinder`s and unify inference variables with existing `Placeholder`s as no matter what context the `Placeholder` is in, it will have the same representation. As an example if we were to instantiate the binder on the higher ranked where clause from above, it would be represented like so: -`T: Trait<'!1_0, for<'b> fn(&'^0 T, &'!1_0 u32)>` -the `RePlaceholder` representation for both usages of `'a` are the same despite one being underneath another `Binder`. +It is trivially possible to instantiate `EarlyBinder`s and unify inference variables with existing `Placeholder`s as no matter what context the `Placeholder` is in, it will have the same representation. +As an example, if we were to instantiate the binder on the higher ranked where clause from above, it would be represented like +`T: Trait<'!1_0, for<'b> fn(&'^0 T, &'!1_0 u32)>`. +The `RePlaceholder` representation for both usages of `'a` is the same despite one being underneath another `Binder`. If we were to then instantiate the binder on the function pointer we would get a type such as: `fn(&'!2_0 T, &'!1_0 u32)` @@ -107,9 +129,12 @@ the `RePlaceholder` for the `'b` parameter is in a higher universe to track the ## Instantiating with `ReLateParam` As discussed in [the chapter about representing types][representing-types], `RegionKind` has two variants for representing generic parameters, `ReLateParam` and `ReEarlyParam`.
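Before moving on, the placeholder rules described above (equality is decided solely by the `(Universe, BoundVar)` pair, and a universe of `0` is omitted when printing) can be modeled in a few lines of plain Rust. This is a deliberately simplified, hypothetical sketch — rustc's real `Placeholder` is generic over the kind of bound variable and its `Debug` impl lives elsewhere:

```rust
// Toy model of placeholder equality and debug formatting. The names
// `Placeholder`, `universe`, and `bound_var` mirror the guide's prose,
// not rustc's actual definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Placeholder {
    universe: u32,
    bound_var: u32,
}

// Renders `!UNIVERSE_BOUNDVAR`, with a universe of 0 entirely omitted.
fn debug_fmt(p: Placeholder) -> String {
    if p.universe == 0 {
        format!("!{}", p.bound_var)
    } else {
        format!("!{}_{}", p.universe, p.bound_var)
    }
}

fn main() {
    // Two placeholders are equal only if both the universe and the var match.
    let a = Placeholder { universe: 1, bound_var: 0 };
    let b = Placeholder { universe: 2, bound_var: 0 };
    assert_ne!(a, b);
    assert_eq!(a, Placeholder { universe: 1, bound_var: 0 });

    // `!0_2` is printed as `!2`.
    assert_eq!(debug_fmt(Placeholder { universe: 0, bound_var: 2 }), "!2");
    assert_eq!(debug_fmt(b), "!2_0");
}
```

For regions the printed form additionally carries a leading `'`, as in `'!2_0` above.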
-`ReLateParam` is conceptually a `Placeholder` that is always in the root universe (`U0`). It is used when instantiating late bound parameters of functions/closures while inside of them. Its actual representation is relatively different from both `ReEarlyParam` and `RePlaceholder`: +`ReLateParam` is conceptually a `Placeholder` that is always in the root universe (`U0`). +It is used when instantiating late bound parameters of functions/closures while inside of them. +Its actual representation is relatively different from both `ReEarlyParam` and `RePlaceholder`: - A `DefId` for the item that introduced the late bound generic parameter -- A [`BoundRegionKind`] which either specifies the `DefId` of the generic parameter and its name (via a `Symbol`), or that this placeholder is representing the anonymous lifetime of a `Fn`/`FnMut` closure's self borrow. There is also a variant for `BrAnon` but this is not used for `ReLateParam`. +- A [`BoundRegionKind`] which either specifies the `DefId` of the generic parameter and its name (via a `Symbol`), or that this placeholder is representing the anonymous lifetime of a `Fn`/`FnMut` closure's self borrow. + There is also a variant for `BrAnon` but this is not used for `ReLateParam`. For example, given the following code: ```rust,ignore @@ -128,11 +153,18 @@ ReLateParam( ) ``` -In this specific case of referencing late bound generic parameters of a function from inside the body this is done implicitly during `hir_ty_lowering` rather than explicitly when instantiating a `Binder` somewhere. In some cases however, we do explicitly instantiate a `Binder` with `ReLateParam`s. +In this specific case of referencing late bound generic parameters of a function from inside the body, +this is done implicitly during `hir_ty_lowering`, +rather than explicitly when instantiating a `Binder` somewhere. +In some cases however, we do explicitly instantiate a `Binder` with `ReLateParam`s. 
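The early/late distinction that `ReLateParam` supports is observable in ordinary Rust: an early bound lifetime is part of a function's generic arguments and may be named with turbofish, while a late bound one is only instantiated (via a `Binder`) at each use site. A small sketch — the vacuous `'a: 'a` bound is the usual trick to force a lifetime to be early bound:

```rust
// `'a` here is late bound: it appears only in the signature, so it lives in a
// `Binder` and is instantiated separately at every call site.
fn late<'a>(x: &'a u32) -> &'a u32 {
    x
}

// The (vacuous) `'a: 'a` bound forces `'a` to be early bound: it becomes part
// of the function's generic arguments.
fn early<'a: 'a>(x: &'a u32) -> &'a u32 {
    x
}

fn main() {
    // Early bound lifetimes may be supplied explicitly...
    let f = early::<'static>;
    // ...while writing `late::<'static>` here would be rejected, because a
    // late bound lifetime cannot be specified as an explicit argument.
    let g = late;
    assert_eq!(*f(&1), 1);
    assert_eq!(*g(&2), 2);
}
```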
-Generally whenever we have a `Binder` for late bound parameters on a function/closure and we are conceptually inside of the binder already, we use [`liberate_late_bound_regions`] to instantiate it with `ReLateParam`s. That makes this operation the `Binder` equivalent to `EarlyBinder`'s `instantiate_identity`. +Generally, whenever we have a `Binder` for late bound parameters on a function/closure, +and we are conceptually inside of the binder already, +we use [`liberate_late_bound_regions`] to instantiate it with `ReLateParam`s. +That makes this operation the `Binder` equivalent to `EarlyBinder`'s `instantiate_identity`. -As a concrete example, accessing the signature of a function we are type checking will be represented as `EarlyBinder<Binder<FnSig>>`. As we are already "inside" of these binders, we would call `instantiate_identity` followed by `liberate_late_bound_regions`. +As a concrete example, accessing the signature of a function we are type checking will be represented as `EarlyBinder<Binder<FnSig>>`. +As we are already "inside" of these binders, we would call `instantiate_identity` followed by `liberate_late_bound_regions`. [`liberate_late_bound_regions`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/context/struct.TyCtxt.html#method.liberate_late_bound_regions [representing-types]: param-ty-const-regions.md diff --git a/src/doc/rustc-dev-guide/src/ty-module/param-ty-const-regions.md b/src/doc/rustc-dev-guide/src/ty-module/param-ty-const-regions.md index b0c7930030415..f9963907897cf 100644 --- a/src/doc/rustc-dev-guide/src/ty-module/param-ty-const-regions.md +++ b/src/doc/rustc-dev-guide/src/ty-module/param-ty-const-regions.md @@ -1,7 +1,7 @@ # Parameter `Ty`/`Const`/`Region`s -When inside of generic items, types can be written that use in scope generic parameters, for example `fn foo<'a, T>(_: &'a Vec<T>)`.
In this specific case -the `&'a Vec<T>` type would be represented internally as: +When inside of generic items, types can be written that use in-scope generic parameters, for example `fn foo<'a, T>(_: &'a Vec<T>)`. +In this specific case, the `&'a Vec<T>` type would be represented internally as: ``` TyKind::Ref( RegionKind::LateParam(DefId(foo), DefId(foo::'a), "'a"), @@ -29,8 +29,11 @@ struct Foo<T>(Vec<T>); ``` The `Vec<T>` type is represented as `TyKind::Adt(Vec, &[GenericArgKind::Type(Param("T", 0))])`. -The name is somewhat self explanatory, it's the name of the type parameter. The index of the type parameter is an integer indicating -its order in the list of generic parameters in scope (note: this includes parameters defined on items on outer scopes than the item the parameter is defined on). Consider the following examples: +The name is somewhat self-explanatory; it's the name of the type parameter. +The index of the type parameter is an integer indicating +its order in the list of generic parameters in scope. +Note that this includes parameters defined on enclosing items, not only the item the parameter itself is defined on. +Consider the following examples: ```rust,ignore struct Foo { @@ -49,15 +52,20 @@ impl Foo { } ``` -Concretely given the `ty::Generics` for the item the parameter is defined on, if the index is `2` then starting from the root `parent`, it will be the third parameter to be introduced. For example in the above example, `Z` has index `2` and is the third generic parameter to be introduced, starting from the `impl` block. +Concretely, given the `ty::Generics` for the item the parameter is defined on, +if the index is `2` and we start from the root `parent`, it will be the third parameter to be introduced. +In the above example, `Z` has index `2` and is the third generic parameter to be introduced, starting from the `impl` block.
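The indexing scheme can be sketched with a toy model of `ty::Generics`: walk to the root parent, concatenate each item's own parameters in declaration order, and a parameter's index is its position in that flattened list. Everything here (the `Generics` struct, `flatten`, the `X`/`Y`/`Z` names) is illustrative, not rustc's real API:

```rust
// Toy model of `ty::Generics`: an optional parent plus the parameters the
// item itself introduces, in declaration order.
struct Generics {
    parent: Option<Box<Generics>>,
    own_params: Vec<&'static str>,
}

// Collect all in-scope parameters, starting from the root parent.
fn flatten(generics: &Generics) -> Vec<&'static str> {
    let mut all = match &generics.parent {
        Some(parent) => flatten(parent),
        None => Vec::new(),
    };
    all.extend(&generics.own_params);
    all
}

fn main() {
    // Mimics an `impl<X, Y> ... { fn method<Z>() {} }`: the method's
    // generics have the impl's generics as their parent.
    let impl_generics = Generics { parent: None, own_params: vec!["X", "Y"] };
    let method_generics = Generics {
        parent: Some(Box::new(impl_generics)),
        own_params: vec!["Z"],
    };

    // `Z` is the third parameter to be introduced, so its index is 2.
    assert_eq!(flatten(&method_generics), ["X", "Y", "Z"]);
    assert_eq!(flatten(&method_generics)[2], "Z");
}
```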
The index fully defines the `Ty` and is the only part of `TyKind::Param` that matters for reasoning about the code we are compiling. -Generally we do not care what the name is and only use the index. The name is included for diagnostics and debug logs as otherwise it would be +Generally, we do not care what the name is, and we only use the index. +The name is included for diagnostics and debug logs as otherwise it would be incredibly difficult to understand the output, i.e. `Vec<T>: Sized` vs `Vec<U>: Sized`. In debug output, parameter types are often printed out as `{name}/#{index}`, for example in the function `foo` if we were to debug print `Vec<T>` it would be written as `Vec<T/#1>`. -An alternative representation would be to only have the name, however using an index is more efficient as it means we can index into `GenericArgs` when instantiating generic parameters with some arguments. We would otherwise have to store `GenericArgs` as a `HashMap` and do a hashmap lookup everytime we used a generic item. +An alternative representation would be to only have the name. +However, using an index is more efficient as it means we can index into `GenericArgs` when instantiating generic parameters with some arguments. +We would otherwise have to store `GenericArgs` as a `HashMap` and do a hashmap lookup every time we used a generic item. In theory an index would also allow for having multiple distinct parameters that use the same name, e.g. `impl<T> Foo<T> { fn bar<T>() { .. } }`. @@ -65,9 +73,15 @@ The rules against shadowing make this difficult but those language rules could c ### Lifetime parameters -In contrast to `Ty`/`Const`'s `Param` singular `Param` variant, lifetimes have two variants for representing region parameters: [`RegionKind::EarlyParam`] and [`RegionKind::LateParam`].
+In contrast to `Ty`/`Const`'s singular `Param` variant, lifetimes have two variants for representing region parameters: [`RegionKind::EarlyParam`] and [`RegionKind::LateParam`]. +The reason for this is that functions distinguish between [early and late bound parameters][ch_early_late_bound], which is discussed in an earlier chapter (see link). -`RegionKind::EarlyParam` is structured identically to `Ty/Const`'s `Param` variant, it is simply a `u32` index and a `Symbol`. For lifetime parameters defined on non-function items we always use `ReEarlyParam`. For functions we use `ReEarlyParam` for any early bound parameters and `ReLateParam` for any late bound parameters. Note that just like `Ty` and `Const` params we often debug format them as `'SYMBOL/#INDEX`, see for example: +`RegionKind::EarlyParam` is structured identically to `Ty/Const`'s `Param` variant; it is simply a `u32` index and a `Symbol`. +For lifetime parameters defined on non-function items, we always use `ReEarlyParam`. +For functions, we use `ReEarlyParam` for any early bound parameters and `ReLateParam` for any late bound parameters. +Note that, just like `Ty` and `Const` params, we often debug format them as `'SYMBOL/#INDEX`. + +An example: ```rust,ignore // This function would have its signature represented as: diff --git a/src/doc/rustc-dev-guide/src/ty.md b/src/doc/rustc-dev-guide/src/ty.md index c84e82adf5c5d..ef2ea37b7ca46 100644 --- a/src/doc/rustc-dev-guide/src/ty.md +++ b/src/doc/rustc-dev-guide/src/ty.md @@ -1,38 +1,40 @@ # The `ty` module: representing types -The `ty` module defines how the Rust compiler represents types internally. It also defines the +The `ty` module defines how the Rust compiler represents types internally. +It also defines the *typing context* (`tcx` or `TyCtxt`), which is the central data structure in the compiler. ## `ty::Ty` -When we talk about how rustc represents types, we usually refer to a type called `Ty`.
There are -quite a few modules and types for `Ty` in the compiler ([Ty documentation][ty]). +When we talk about how rustc represents types, we usually refer to a type called `Ty`. +There are quite a few modules and types for `Ty` in the compiler ([Ty documentation][ty]). [ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/index.html The specific `Ty` we are referring to is [`rustc_middle::ty::Ty`][ty_ty] (and not -[`rustc_hir::Ty`][hir_ty]). The distinction is important, so we will discuss it first before going -into the details of `ty::Ty`. +[`rustc_hir::Ty`][hir_ty]). +The distinction is important, so we will discuss it first before going into the details of `ty::Ty`. [ty_ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Ty.html [hir_ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/struct.Ty.html ## `rustc_hir::Ty` vs `ty::Ty` -The HIR in rustc can be thought of as the high-level intermediate representation. It is more or less -the AST (see [this chapter](hir.md)) as it represents the +The HIR in rustc can be thought of as the high-level intermediate representation. +It is more or less the AST (see [this chapter](hir.md)) as it represents the syntax that the user wrote, and is obtained after parsing and some *desugaring*. It has a representation of types, but in reality it reflects more of what the user wrote, that is, what they wrote so as to represent that type. In contrast, `ty::Ty` represents the semantics of a type, that is, the *meaning* of what the user -wrote. For example, `rustc_hir::Ty` would record the fact that a user used the name `u32` twice +wrote. +For example, `rustc_hir::Ty` would record the fact that a user used the name `u32` twice in their program, but the `ty::Ty` would record the fact that both usages refer to the same type. **Example: `fn foo(x: u32) → u32 { x }`** -In this function, we see that `u32` appears twice. 
We know -that that is the same type, +In this function, we see that `u32` appears twice. +We know that that is the same type, i.e. the function takes an argument and returns an argument of the same type, but from the point of view of the HIR, there would be two distinct type instances because these @@ -43,13 +45,16 @@ That is, they have two different [`Span`s][span] (locations). **Example: `fn foo(x: &u32) -> &u32`** -In addition, HIR might have information left out. This type +In addition, HIR might have information left out. +This type `&u32` is incomplete, since in the full Rust type there is actually a lifetime, but we didn’t need -to write those lifetimes. There are also some elision rules that insert information. The result may -look like `fn foo<'a>(x: &'a u32) -> &'a u32`. +to write those lifetimes. +There are also some elision rules that insert information. +The result may look like `fn foo<'a>(x: &'a u32) -> &'a u32`. In the HIR level, these things are not spelled out and you can say the picture is rather incomplete. -However, at the `ty::Ty` level, these details are added and it is complete. Moreover, we will have +However, at the `ty::Ty` level, these details are added and it is complete. +Moreover, we will have exactly one `ty::Ty` for a given type, like `u32`, and that `ty::Ty` is used for all `u32`s in the whole program, not a specific usage, unlike `rustc_hir::Ty`. @@ -67,12 +72,15 @@ Here is a summary: **Order** -HIR is built directly from the AST, so it happens before any `ty::Ty` is produced. After -HIR is built, some basic type inference and type checking is done. During the type inference, we +HIR is built directly from the AST, so it happens before any `ty::Ty` is produced. +After HIR is built, some basic type inference and type checking is done. +During the type inference, we figure out what the `ty::Ty` of everything is and we also check if the type of something is -ambiguous. 
The `ty::Ty` is then used for type checking while making sure everything has the -expected type. The [`hir_ty_lowering` module][hir_ty_lowering] is where the code responsible for -lowering a `rustc_hir::Ty` to a `ty::Ty` is located. The main routine used is `lower_ty`. +ambiguous. +The `ty::Ty` is then used for type checking while making sure everything has the expected type. +The [`hir_ty_lowering` module][hir_ty_lowering] is where the code responsible for +lowering a `rustc_hir::Ty` to a `ty::Ty` is located. +The main routine used is `lower_ty`. This occurs during the type-checking phase, but also in other parts of the compiler that want to ask questions like "what argument types does this function expect?" @@ -80,20 +88,23 @@ questions like "what argument types does this function expect?" **How semantics drive the two instances of `Ty`** -You can think of HIR as the perspective -of the type information that assumes the least. We assume two things are distinct until they are -proven to be the same thing. In other words, we know less about them, so we should assume less about -them. +You can think of HIR as the perspective of the type information that assumes the least. +We assume two things are distinct until they are proven to be the same thing. +In other words, we know less about them, so we should assume less about them. They are syntactically two strings: `"u32"` at line N column 20 and `"u32"` at line N column 35. We -don’t know that they are the same yet. So, in the HIR we treat them as if they are different. Later, +don’t know that they are the same yet. +So, in the HIR we treat them as if they are different. +Later, we determine that they semantically are the same type and that’s the `ty::Ty` we use. -Consider another example: `fn foo<T>(x: T) -> u32`. Suppose that someone invokes `foo::<u32>(0)`. +Consider another example: `fn foo<T>(x: T) -> u32`. +Suppose that someone invokes `foo::<u32>(0)`.
This means that `T` and `u32` (in this invocation) actually turn out to be the same type, so we would eventually end up with the same `ty::Ty` in the end, but we have distinct `rustc_hir::Ty`. (This is a bit over-simplified, though, since during type checking, we would check the function -generically and would still have a `T` distinct from `u32`. Later, when doing code generation, +generically and would still have a `T` distinct from `u32`. +Later, when doing code generation, we would always be handling "monomorphized" (fully substituted) versions of each function, and hence we would know what `T` represents (and specifically that it is `u32`).) @@ -110,9 +121,11 @@ mod b { } ``` -Here the type `X` will vary depending on context, clearly. If you look at the `rustc_hir::Ty`, +Here the type `X` will vary depending on context, clearly. +If you look at the `rustc_hir::Ty`, you will get back that `X` is an alias in both cases (though it will be mapped via name resolution -to distinct aliases). But if you look at the `ty::Ty` signature, it will be either `fn(u32) -> u32` +to distinct aliases). +But if you look at the `ty::Ty` signature, it will be either `fn(u32) -> u32` or `fn(i32) -> i32` (with type aliases fully expanded). ## `ty::Ty` implementation @@ -121,15 +134,15 @@ or `fn(i32) -> i32` (with type aliases fully expanded). [`Interned<WithCachedTypeInfo<TyKind>>`][tykind]. You can ignore `Interned` in general; you will basically never access it explicitly. We always hide them within `Ty` and skip over it via `Deref` impls or methods. -`TyKind` is a big enum -with variants to represent many different Rust types +`TyKind` is a big enum with variants to represent many different Rust types (e.g. primitives, references, algebraic data types, generics, lifetimes, etc). -`WithCachedTypeInfo` has a few cached values like `flags` and `outer_exclusive_binder`. They
+They are convenient hacks for efficiency and summarize information about the type that we may want to -know, but they don’t come into the picture as much here. Finally, [`Interned`](./memory.md) allows -the `ty::Ty` to be a thin pointer-like -type. This allows us to do cheap comparisons for equality, along with the other -benefits of interning. +know, but they don’t come into the picture as much here. +Finally, [`Interned`](./memory.md) allows the `ty::Ty` to be a thin pointer-like +type. +This allows us to do cheap comparisons for equality, along with the other benefits of interning. [tykind]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_type_ir/ty_kind/enum.TyKind.html @@ -137,16 +150,16 @@ benefits of interning. To allocate a new type, you can use the various `new_*` methods defined on [`Ty`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Ty.html). -These have names -that correspond mostly to the various kinds of types. For example: +These have names that correspond mostly to the various kinds of types. +For example: ```rust,ignore let array_ty = Ty::new_array_with_const_len(tcx, ty, count); ``` These methods all return a `Ty<'tcx>` – note that the lifetime you get back is the lifetime of the -arena that this `tcx` has access to. Types are always canonicalized and interned (so we never -allocate exactly the same type twice). +arena that this `tcx` has access to. +Types are always canonicalized and interned (so we never allocate exactly the same type twice). You can also find various common types in the `tcx` itself by accessing its fields: `tcx.types.bool`, `tcx.types.char`, etc. (See [`CommonTypes`] for more.) @@ -158,13 +171,15 @@ You can also find various common types in the `tcx` itself by accessing its fiel Because types are interned, it is possible to compare them for equality efficiently using `==` – however, this is almost never what you want to do unless you happen to be hashing and looking -for duplicates. 
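Interning is what makes the cheap comparisons mentioned above possible: each distinct type is allocated exactly once, so equality degenerates to comparing a pointer (or, in the toy model below, an integer id). This is a deliberately simplified sketch — rustc interns `TyKind` values into arenas, not strings into a map:

```rust
use std::collections::HashMap;

// Toy interner: maps a type's description to a stable id. Interning the same
// type twice yields the same id, so equality is a single integer comparison.
struct Interner {
    map: HashMap<String, u32>,
}

impl Interner {
    fn new() -> Self {
        Interner { map: HashMap::new() }
    }

    fn intern(&mut self, ty: &str) -> u32 {
        let next = self.map.len() as u32;
        // Reuse the existing id if this type was interned before.
        *self.map.entry(ty.to_string()).or_insert(next)
    }
}

fn main() {
    let mut interner = Interner::new();
    let a = interner.intern("u32");
    let b = interner.intern("u32");
    let c = interner.intern("i32");
    assert_eq!(a, b); // "we never allocate exactly the same type twice"
    assert_ne!(a, c);
}
```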
This is because often in Rust there are multiple ways to represent the same type, +for duplicates. +This is because often in Rust there are multiple ways to represent the same type, particularly once inference is involved. For example, the type `{integer}` (`ty::Infer(ty::IntVar(..))` an integer inference variable, the type of an integer literal like `0`) and `u8` (`ty::UInt(..)`) should often be treated as equal when testing whether they can be assigned to each other (which is a common operation in -diagnostics code). `==` on them will return `false` though, since they are different types. +diagnostics code). +`==` on them will return `false` though, since they are different types. The simplest way to compare two types correctly requires an inference context (`infcx`). If you have one, you can use `infcx.can_eq(param_env, ty1, ty2)` @@ -174,32 +189,36 @@ as whether two types can be assigned to each other, not whether they're represen the compiler's type-checking layer. When working with an inference context, you have to be careful to ensure that potential inference -variables inside the types actually belong to that inference context. If you are in a function -that has access to an inference context already, this should be the case. Specifically, this is the -case during HIR type checking or MIR borrow checking. - -Another consideration is normalization. Two types may actually be the same, but one is behind an -associated type. To compare them correctly, you have to normalize the types first. This is -primarily a concern during HIR type checking and with all types from a `TyCtxt` query +variables inside the types actually belong to that inference context. +If you are in a function that has access to an inference context already, this should be the case. +Specifically, this is the case during HIR type checking or MIR borrow checking. + +Another consideration is normalization. +Two types may actually be the same, but one is behind an associated type. 
+To compare them correctly, you have to normalize the types first. +This is primarily a concern during HIR type checking and with all types from a `TyCtxt` query (for example from `tcx.type_of()`). When a `FnCtxt` or an `ObligationCtxt` is available during type checking, `.normalize(ty)` -should be used on them to normalize the type. After type checking, diagnostics code can use -`tcx.normalize_erasing_regions(ty)`. +should be used on them to normalize the type. +After type checking, diagnostics code can use `tcx.normalize_erasing_regions(ty)`. -There are also cases where using `==` on `Ty` is fine. This is for example the case in late lints +There are also cases where using `==` on `Ty` is fine. +This is, for example, the case in late lints or after monomorphization, since type checking has been completed, meaning all inference variables -are resolved and all regions have been erased. In these cases, if you know that inference variables +are resolved and all regions have been erased. +In these cases, if you know that inference variables or normalization won't be a concern, `#[allow]` or `#[expect]`ing the lint is recommended. When diagnostics code does not have access to an inference context, it should be threaded through the function calls if one is available in some place (like during type checking). If no inference context is available at all, then one can be created as described in -[type-inference]. But this is only useful when the involved types (for example, if +[type-inference]. +But this is only useful when the involved types (for example, if they came from a query like `tcx.type_of()`) are actually substituted with fresh -inference variables using [`fresh_args_for_item`]. This can be used to answer questions -like "can `Vec<T>` for any `T` be unified with `Vec<u32>`?". +inference variables using [`fresh_args_for_item`]. +This can be used to answer questions like "can `Vec<T>` for any `T` be unified with `Vec<u32>`?".
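The workflow just described can be sketched as a toy unifier: substitute the item's parameters with fresh inference variables, then try to equate the two types while recording what each variable must be. All names here (`Ty`, `unify`, `subst`) are illustrative stand-ins for the real inference machinery, not rustc's API:

```rust
// Minimal structural unifier over a toy type representation.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    U32,
    Adt(&'static str, Vec<Ty>),
    Infer(usize), // inference variable ?N
}

fn unify(a: &Ty, b: &Ty, subst: &mut Vec<Option<Ty>>) -> bool {
    match (a, b) {
        // An unconstrained variable unifies with anything, and we record it;
        // an already-constrained one must unify with its recorded value.
        (Ty::Infer(v), other) | (other, Ty::Infer(v)) => {
            if let Some(known) = subst[*v].clone() {
                unify(&known, other, subst)
            } else {
                subst[*v] = Some(other.clone());
                true
            }
        }
        (Ty::U32, Ty::U32) => true,
        (Ty::Adt(n1, args1), Ty::Adt(n2, args2)) => {
            n1 == n2
                && args1.len() == args2.len()
                && args1.iter().zip(args2).all(|(x, y)| unify(x, y, subst))
        }
        _ => false,
    }
}

fn main() {
    // "Can `Vec<T>` for any `T` be unified with `Vec<u32>`?": replace `T`
    // with a fresh inference variable `?0`, then unify.
    let vec_t = Ty::Adt("Vec", vec![Ty::Infer(0)]);
    let vec_u32 = Ty::Adt("Vec", vec![Ty::U32]);
    let mut subst = vec![None];
    assert!(unify(&vec_t, &vec_u32, &mut subst));
    assert_eq!(subst[0], Some(Ty::U32)); // `?0` was inferred to `u32`
}
```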
[type-inference]: ./type-inference.md#creating-an-inference-context [`fresh_args_for_item`]: https://doc.rust-lang.org/beta/nightly-rustc/rustc_infer/infer/struct.InferCtxt.html#method.fresh_substs_for_item @@ -229,25 +248,29 @@ There are a lot of related types, and we’ll cover them in time (e.g regions/li “substitutions”, etc). There are many variants on the `TyKind` enum, which you can see by looking at its -[documentation][tykind]. Here is a sampling: +[documentation][tykind]. +Here is a sampling: - [**Algebraic Data Types (ADTs)**][kindadt] An [*algebraic data type*][wikiadt] is a `struct`, - `enum` or `union`. Under the hood, `struct`, `enum` and `union` are actually implemented - the same way: they are all [`ty::TyKind::Adt`][kindadt]. It’s basically a user defined type. + `enum` or `union`. + Under the hood, `struct`, `enum` and `union` are actually implemented + the same way: they are all [`ty::TyKind::Adt`][kindadt]. + It’s basically a user defined type. We will talk more about these later. - [**Foreign**][kindforeign] Corresponds to `extern type T`. -- [**Str**][kindstr] Is the type str. When the user writes `&str`, `Str` is the how we represent the - `str` part of that type. +- [**Str**][kindstr] Is the type str. + When the user writes `&str`, `Str` is how we represent the `str` part of that type. - [**Slice**][kindslice] Corresponds to `[T]`. - [**Array**][kindarray] Corresponds to `[T; n]`. - [**RawPtr**][kindrawptr] Corresponds to `*mut T` or `*const T`. -- [**Ref**][kindref] `Ref` stands for safe references, `&'a mut T` or `&'a T`. `Ref` has some +- [**Ref**][kindref] `Ref` stands for safe references, `&'a mut T` or `&'a T`. + `Ref` has some associated parts, like `Ty<'tcx>` which is the type that the reference references. `Region<'tcx>` is the lifetime or region of the reference and `Mutability` if the reference is mutable or not. - [**Param**][kindparam] Represents a type parameter (e.g. the `T` in `Vec`). 
-- [**Error**][kinderr] Represents a type error somewhere so that we can print better diagnostics. We - will discuss this more later. +- [**Error**][kinderr] Represents a type error somewhere so that we can print better diagnostics. + We will discuss this more later. - [**And many more**...][kindvars] [wikiadt]: https://en.wikipedia.org/wiki/Algebraic_data_type @@ -270,19 +293,22 @@ Although there is no hard and fast rule, the `ty` module tends to be used like s use ty::{self, Ty, TyCtxt}; ``` -In particular, since they are so common, the `Ty` and `TyCtxt` types are imported directly. Other +In particular, since they are so common, the `Ty` and `TyCtxt` types are imported directly. +Other types are often referenced with an explicit `ty::` prefix (e.g. `ty::TraitRef<'tcx>`). But some modules choose to import a larger or smaller set of names explicitly. ## Type errors -There is a `TyKind::Error` that is produced when the user makes a type error. The idea is that +There is a `TyKind::Error` that is produced when the user makes a type error. +The idea is that we would propagate this type and suppress other errors that come up due to it so as not to overwhelm the user with cascading compiler error messages. -There is an **important invariant** for `TyKind::Error`. The compiler should -**never** produce `Error` unless we **know** that an error has already been -reported to the user. This is usually +There is an **important invariant** for `TyKind::Error`. +The compiler should **never** produce `Error` unless we **know** that an error has already been +reported to the user. +This is usually because (a) you just reported it right there or (b) you are propagating an existing Error type (in which case the error should've been reported when that error type was produced). @@ -291,10 +317,12 @@ other errors -- i.e., we don't report them. 
If we were to produce an `Error` type without actually
emitting an error to the user, then this could cause later errors to be suppressed, and the
compilation might inadvertently succeed!

-Sometimes there is a third case. You believe that an error has been reported, but you believe it
-would've been reported earlier in the compilation, not locally. In that case, you can create a
-"delayed bug" with [`delayed_bug`] or [`span_delayed_bug`]. This will make a note that you expect
-compilation to yield an error -- if however compilation should succeed, then it will trigger a
+Sometimes there is a third case.
+You believe that an error has been reported, but you believe it
+would've been reported earlier in the compilation, not locally.
+In that case, you can create a "delayed bug" with [`delayed_bug`] or [`span_delayed_bug`].
+This will make a note that you expect
+compilation to yield an error -- if, however, compilation should succeed, then it will trigger a
 compiler bug report.

[`delayed_bug`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_errors/struct.DiagCtxt.html#method.delayed_bug

@@ -302,13 +330,13 @@ compiler bug report.

 For added safety, it's not actually possible to produce a `TyKind::Error` value
 outside of [`rustc_middle::ty`][ty]; there is a private member of
-`TyKind::Error` that prevents it from being constructable elsewhere. Instead,
-one should use the [`Ty::new_error`][terr] or
-[`Ty::new_error_with_message`][terrmsg] methods. These methods either take an `ErrorGuaranteed`
-or call `span_delayed_bug` before returning an interned `Ty` of kind `Error`. If you
-were already planning to use [`span_delayed_bug`], then you can just pass the
-span and message to [`ty_error_with_message`][terrmsg] instead to avoid
-a redundant delayed bug.
+`TyKind::Error` that prevents it from being constructable elsewhere.
+Instead,
+one should use the [`Ty::new_error`][terr] or [`Ty::new_error_with_message`][terrmsg] methods.
+These methods either take an `ErrorGuaranteed` +or call `span_delayed_bug` before returning an interned `Ty` of kind `Error`. +If you were already planning to use [`span_delayed_bug`], then you can just pass the +span and message to [`ty_error_with_message`][terrmsg] instead to avoid a redundant delayed bug. [terr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Ty.html#method.new_error [terrmsg]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Ty.html#method.new_error_with_message @@ -316,9 +344,12 @@ a redundant delayed bug. ## `TyKind` variant shorthand syntax -When looking at the debug output of `Ty` or simply talking about different types in the compiler, you may encounter syntax that is not valid rust but is used to concisely represent internal information about types. Below is a quick reference cheat sheet to tell what the various syntax actually means, these should be covered in more depth in later chapters. +When looking at the debug output of `Ty` or simply talking about different types in the compiler, you may encounter syntax that is not valid Rust but is used to concisely represent internal information about types. +Below is a quick reference cheat sheet to tell what the various syntax actually means: - Generic parameters: `{name}/#{index}` e.g. `T/#0`, where `index` corresponds to its position in the list of generic parameters - Inference variables: `?{id}` e.g. `?x`/`?0`, where `id` identifies the inference variable - Variables from binders: `^{binder}_{index}` e.g. `^0_x`/`^0_2`, where `binder` and `index` identify which variable from which binder is being referred to - Placeholders: `!{id}` or `!{id}_{universe}` e.g. `!x`/`!0`/`!x_2`/`!0_2`, representing some unique type in the specified universe. The universe is often elided when it is `0` + +These should be covered in more depth in later chapters.
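As a side note to the `TyKind` sampling reflowed in the hunks above, the variants can be pictured with a toy enum. This is a deliberately simplified sketch: `ToyTyKind` and `describe` are made-up names, and the real `rustc_middle::ty::TyKind` is interned in the `TyCtxt` arena and parameterized over `'tcx`, none of which is modeled here.

```rust
// Toy mirror of a few TyKind variants from the sampling above.
// Illustration only; not the compiler's actual representation.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum ToyTyKind {
    /// struct / enum / union: all represented uniformly as "Adt"
    Adt { name: String },
    /// the `str` part of `&str`
    Str,
    /// `[T]`
    Slice(Box<ToyTyKind>),
    /// `[T; n]`
    Array(Box<ToyTyKind>, usize),
    /// `*const T` / `*mut T`
    RawPtr { mutable: bool, pointee: Box<ToyTyKind> },
    /// `&'a T` / `&'a mut T`: region, mutability, and the referenced type
    Ref { region: String, mutable: bool, pointee: Box<ToyTyKind> },
    /// a generic parameter such as the `T` in `Vec<T>`, with its index
    Param { name: String, index: u32 },
    /// a type error; the real variant is only constructable in rustc_middle::ty
    Error,
}

// Matching on the kind, analogous to how compiler code inspects a Ty.
fn describe(ty: &ToyTyKind) -> String {
    match ty {
        ToyTyKind::Adt { name } => format!("user-defined type `{name}`"),
        ToyTyKind::Str => "the `str` type".to_string(),
        ToyTyKind::Slice(elem) => format!("slice of {}", describe(elem)),
        ToyTyKind::Array(elem, n) => format!("array of {n} x {}", describe(elem)),
        ToyTyKind::RawPtr { mutable, .. } => {
            format!("raw pointer ({})", if *mutable { "*mut" } else { "*const" })
        }
        ToyTyKind::Ref { region, mutable, pointee } => format!(
            "{} reference with region '{region} to {}",
            if *mutable { "mutable" } else { "shared" },
            describe(pointee)
        ),
        ToyTyKind::Param { name, index } => format!("type parameter {name}/#{index}"),
        ToyTyKind::Error => "type error (a diagnostic was already emitted)".to_string(),
    }
}

fn main() {
    // `&'a mut [T]` as a toy value
    let ty = ToyTyKind::Ref {
        region: "a".to_string(),
        mutable: true,
        pointee: Box::new(ToyTyKind::Slice(Box::new(ToyTyKind::Param {
            name: "T".to_string(),
            index: 0,
        }))),
    };
    println!("{}", describe(&ty));
}
```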
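The `TyKind::Error` invariant discussed above (never produce `Error` unless an error has already been reported) can be sketched with a zero-sized proof token, loosely mirroring rustc's `ErrorGuaranteed`. All names below (`ErrorReported`, `emit_error`, `new_error_ty`) are hypothetical; in the real compiler the token is only obtainable from the diagnostics machinery and the enforcement relies on module privacy, which a single-file sketch cannot fully reproduce.

```rust
/// Zero-sized proof token: holding one means a diagnostic was emitted.
/// Loosely modeled on rustc's `ErrorGuaranteed`.
#[derive(Clone, Copy, Debug)]
struct ErrorReported(());

#[allow(dead_code)]
#[derive(Debug)]
enum Ty {
    Bool,
    /// Carries the token, so `Ty::Error` implies the user was already told.
    Error(ErrorReported),
}

/// The intended way to obtain the token: actually report an error.
/// (In rustc, privacy prevents constructing the token any other way.)
fn emit_error(msg: &str) -> ErrorReported {
    eprintln!("error: {msg}");
    ErrorReported(())
}

/// Analogous to `Ty::new_error`: demands the guarantee up front,
/// so compilation cannot silently succeed with an error type in it.
fn new_error_ty(guar: ErrorReported) -> Ty {
    Ty::Error(guar)
}

fn main() {
    let guar = emit_error("mismatched types");
    let ty = new_error_ty(guar);
    // Later passes can suppress cascading diagnostics when they see Ty::Error.
    println!("{ty:?}");
}
```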
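The debug-shorthand cheat sheet in the final hunk lends itself to a small worked example. The toy renderer below (`DebugVar` and `render` are illustrative names, not compiler APIs) produces the notations listed there, including eliding the universe of a placeholder when it is `0`.

```rust
// Toy renderer for the debug shorthand described in the cheat sheet.
#[allow(dead_code)]
enum DebugVar {
    /// generic parameter: name plus position in the generics list
    Param { name: String, index: u32 },
    /// inference variable
    Infer { id: u32 },
    /// variable from a binder: which binder, which variable
    Bound { binder: u32, index: u32 },
    /// placeholder in some universe (universe elided when 0)
    Placeholder { id: u32, universe: u32 },
}

fn render(v: &DebugVar) -> String {
    match v {
        DebugVar::Param { name, index } => format!("{name}/#{index}"),
        DebugVar::Infer { id } => format!("?{id}"),
        DebugVar::Bound { binder, index } => format!("^{binder}_{index}"),
        DebugVar::Placeholder { id, universe: 0 } => format!("!{id}"),
        DebugVar::Placeholder { id, universe } => format!("!{id}_{universe}"),
    }
}

fn main() {
    println!("{}", render(&DebugVar::Param { name: "T".into(), index: 0 })); // T/#0
    println!("{}", render(&DebugVar::Bound { binder: 0, index: 2 }));        // ^0_2
}
```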