diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 5558072..813e43d 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -8,7 +8,6 @@ - [Default constructors](./idioms/constructors/default_constructors.md) - [Copy and move constructors](./idioms/constructors/copy_and_move_constructors.md) - [Rule of three/five/zero](./idioms/constructors/rule_of_three_five_zero.md) - - [Destructors and resource cleanup](./idioms/destructors.md) - [Data modeling](./idioms/data_modeling.md) - [Abstract classes, interfaces, and dynamic dispatch](./idioms/data_modeling/abstract_classes.md) @@ -43,26 +42,24 @@ - [Multiple return values](./idioms/out_params/multiple_return.md) - [Optional return values](./idioms/out_params/optional_return.md) - [Pre-allocated buffers](./idioms/out_params/pre-allocated_buffers.md) -- [Varargs]() -- [Attributes]() -- [Calling C (FFI)]() -- [NRVO, RVO, and placement new]() +- [Rust and C++ interoperability (FFI)](./idioms/ffi.md) +- [NRVO and RVO](./idioms/rvo.md) +- [Placement new](./idioms/placement_new.md) - [Concurrency (threads and async)]() # Patterns - [Adapter pattern](./patterns/adapter.md) -- [Visitor pattern and double dispatch]() +- [Visitor pattern and double dispatch](./patterns/visitor.md) - [Curiously recurring template pattern (CRTP)](./patterns/crtp.md) -- [Pointer-to-implementation (PImpl)]() -- [X macros]() +- [Pointer-to-implementation (PIMPL)](./patterns/pimpl.md) # Ecosystem - [Libraries](./etc/libraries.md) - [Tests](./etc/tests.md) -- [Documentation (Doxygen)]() -- [Build systems (CMake)]() +- [Documentation (e.g., Doxygen)](./etc/documentation.md) +- [Build systems (e.g., CMake)](./etc/build_systems.md) --- diff --git a/src/etc/build_systems.md b/src/etc/build_systems.md new file mode 100644 index 0000000..e2662d2 --- /dev/null +++ b/src/etc/build_systems.md @@ -0,0 +1,44 @@ +# Build systems (e.g., CMake) + +One major difference between the C++ and Rust ecosystems is that C++ and C +libraries tend to be either provided by OS 
distributions or be included in the +repository for a project, while Rust has a central language-specific package +registry called [crates.io](https://crates.io/). + +This difference is amplified by the fact that the Rust build tool, Cargo, has a +built-in package manager that works with crates.io, private registries, local +packages, and vendored sources. + +Cargo is documented in detail in the [Cargo +Book](https://doc.rust-lang.org/cargo/). + +## Packages for C and C++ system libraries + +Many C libraries have crates on crates.io providing both low-level bindings and +high-level safe Rust abstractions. For example, for the libgit2 library there is +both a low-level [libgit2-sys crate](https://crates.io/crates/libgit2-sys) and a +high-level [git2 crate](https://crates.io/crates/git2). See the [chapter on the +Rust FFI](../idioms/ffi.md) for more information on how to define these crates. + +## Building C, C++, and Rust code + +Cargo [build +scripts](https://doc.rust-lang.org/cargo/reference/build-script-examples.html) +can be used to build C and C++ code as part of a Rust project. The linked +chapter of the Cargo book includes links to resources handling the compilation +of C, C++, and other code, working with `pkg-config`, etc. + +## Testing (CTest) + +Cargo includes support for [running +tests](https://doc.rust-lang.org/cargo/guide/tests.html). + +## Packaging for distribution (CPack) + +Unlike CPack, which is provided with CMake, Cargo does not come with tools for +packaging for distribution to end users. However, there are third-party Cargo +helpers for packaging, such as [cargo-deb](https://crates.io/crates/cargo-deb) +for creating Debian packages, +[cargo-generate-rpm](https://crates.io/crates/cargo-generate-rpm) for creating +RPM packages, and [cargo-wix](https://crates.io/crates/cargo-wix) for creating +Windows installers. 
diff --git a/src/etc/documentation.md b/src/etc/documentation.md new file mode 100644 index 0000000..8d66461 --- /dev/null +++ b/src/etc/documentation.md @@ -0,0 +1,198 @@ +# Documentation (e.g., Doxygen) + +While C++ has several documentation tools, such as Doxygen and Sphinx, Rust has +a single documentation tool, +[Rustdoc](https://doc.rust-lang.org/rustdoc/index.html). Rustdoc is supported by +[`docs.rs`](https://docs.rs/), Cargo, and +[rust-analyzer](https://rust-analyzer.github.io/), and is the tool used for +documenting [the standard +library](https://doc.rust-lang.org/std/vec/struct.Vec.html#blanket-implementations). +Rustdoc is installed by default with the Rust toolchain for most distributions. + +The features and options available for Rustdoc are documented in the [Rustdoc +Book](https://doc.rust-lang.org/rustdoc/). The book also documents [best +practices for documenting Rust +code](https://doc.rust-lang.org/rustdoc/how-to-write-documentation.html), which +differ slightly from the recommended practices for documenting C++ code using +Doxygen. + +The Cargo integration is documented in the Cargo book under the [`cargo +doc`](https://doc.rust-lang.org/cargo/commands/cargo-doc.html) and [`cargo +rustdoc`](https://doc.rust-lang.org/cargo/commands/cargo-rustdoc.html) commands, +as well as in the [doctests +section](https://doc.rust-lang.org/cargo/commands/cargo-test.html#documentation-tests) +of the `cargo test` command. + +This chapter compares some aspects of Rustdoc with Doxygen in order to help with +understanding what to expect when using Rustdoc when coming from Doxygen or +similar C++ documentation tools. + +## Output formats + +Unlike Doxygen, which can also produce PDF and man page output, Rustdoc only +produces HTML output. The produced documentation does include client-side +searching, which includes the ability to search by type signature. 
+ +The [rust-analyzer](https://rust-analyzer.github.io/) language server supports +Rustdoc comments, and makes them available to editors with language server +protocol support on hover, even for Rustdoc comments in the current project. + +## Rustdoc comment syntax + +Unlike Doxygen, which has several supported comment syntaxes for C++, Rustdoc +supports a single comment syntax. Comments beginning with `//!` document the +top-level module or crate. Comments beginning with `///` document the following +item. + +
+ +```cpp +/** + * @file myheader.h + * @brief A description of this file. + * + * A longer description, with examples, etc. + */ + +/** + * @brief A description of this class. + * + * A longer description, with examples, etc. + */ +struct MyClass { + // ... +}; +``` + +```rust +//! A description of this module or crate. +//! +//! A longer description, with examples, etc. + +/// A description of this type. +/// +/// A longer description, with examples, etc. +struct MyClass { + // ... +} +``` + +
+ +### Special forms + +The content of the comment up until the first blank line is treated similarly to +the `@brief` form in Doxygen. + +Aside from that, Rustdoc does not have special forms for documenting various +parts of an item, such as the parameters of a function. Instead, [Markdown +syntax can be +used](https://doc.rust-lang.org/rustdoc/how-to-write-documentation.html#markdown) +to format the documentation, which is otherwise given in prose. + +There are several common conventions used for structuring documentation +comments. The most common convention is to include sections (defined using +Markdown header syntax) for whichever of the following are necessary for the +item being documented: + +- panics (for functions that panic, e.g., on + [`Vec::split_at`](https://doc.rust-lang.org/std/vec/struct.Vec.html#panics-32)), +- safety (for unsafe functions, e.g., on + [`Vec::split_at_unchecked`](https://doc.rust-lang.org/std/primitive.slice.html#safety-10)), + and +- examples (e.g., on [`Vec::split_at`](https://doc.rust-lang.org/std/vec/struct.Vec.html#examples-105)). + +The following example compares documentation for a C++ function using Doxygen to +documentation for a Rust function using Rustdoc. + +
+ + +```cpp +/** + * @brief Computes the factorial. + * + * Computes the factorial in a stack-safe way. + * The factorial is defined as... + * + * @code + * #include <cassert> + * #include "factorial.h" + * + * int main() { + * int res = factorial(3); + * assert(6 == res); + * } + * @endcode + * + * @param n The number of which to take the factorial + * + * @return The factorial + * + * @exception domain_error If n < 0 + */ +int factorial(int n); +``` + +```rust +/// Computes the factorial. +/// +/// Computes the factorial in a stack-safe way. +/// The factorial is defined as... +/// +/// # Examples +/// +/// ``` +/// let res = factorial(3); +/// assert_eq!(6, res); +/// ``` +/// +/// # Panics +/// +/// Requires that `n >= 0`, otherwise panics. +/// For the non-panicking version see +/// [`factorial_checked`]. +fn factorial(n: i32) -> i32 { + // ... +# todo!() +} +``` + +
+ +### Automatic documentation + +Many of the things that can be derived from the code are automatically included +by Rustdoc. A major one is that trait implementations (e.g., on +[`Vec`](https://doc.rust-lang.org/std/vec/struct.Vec.html#trait-implementations)), +including blanket implementations (e.g., on +[`Vec`](https://doc.rust-lang.org/std/vec/struct.Vec.html#blanket-implementations)), +for a type do not have to be documented manually because implementations that +are visible in the crate are automatically discovered and included by Rustdoc. + +# Additional features + +Some valuable Rustdoc features may not be expected by someone coming from using +Doxygen. Because those features provide significant benefit, they are pointed +out here. + +## Doctest support via Rustdoc and `cargo test` + +One specific benefit of including examples when documenting Rust programs using +Rustdoc is that [the examples can be included in the test suite when running +`cargo +test`](https://doc.rust-lang.org/rustdoc/write-documentation/documentation-tests.html). + +The handling of examples as tests in Rustdoc includes logic for handling partial +programs, so that even [the code example in the earlier +comparison](#special-forms-comparison) can serve as a test. + +## Local documentation for project and installed libraries + +Local documentation for both the working project and dependent libraries can be +viewed in-browser using `cargo doc --open`. Private items for the project can be +included in the documentation by using `cargo doc --open +--document-private-items`. Because Rustdoc comments are also used by the +[rust-analyzer](https://rust-analyzer.github.io/) language server to provide +documentation on hover in compatible editors, it is often worth it to document +private items using Rustdoc comments. diff --git a/src/idioms/closures.md b/src/idioms/closures.md index 1b2c985..d0883f0 100644 --- a/src/idioms/closures.md +++ b/src/idioms/closures.md @@ -1429,3 +1429,5 @@ closure. 
The example involving `FnOnce` functions in [the previous section](#closures-ownership-and-fnonce) may be a point of frustration initially, but the behavior has the benefit of reducing the documentation and reasoning burdens. + +{{#quiz closures.toml}} diff --git a/src/idioms/closures.toml b/src/idioms/closures.toml new file mode 100644 index 0000000..302f87a --- /dev/null +++ b/src/idioms/closures.toml @@ -0,0 +1,80 @@ +[[questions]] +type = "MultipleChoice" +prompt.prompt = """ +Does the following Rust program compile? If not, why not? + +```rust +fn main() { + let s = String::from("hi"); + + for x in (0..10).map(|_| s) { + println!("{}", x); + } +} +``` +""" +prompt.distractors = [ +# Thinking in terms of C copy semantics. +""" +The program compiles. +""", +# Thinking that `FnOnce` has to do with the number of calls rather than +# ownership. +""" +The program does not compile because `|_| s` is a `FnOnce` closure, but is +called 10 times because the range is `0..10`. +""" +] +answer.answer = """ +The program does not compile because the `|_| s` is a `FnOnce` closure, but +`map` takes a `FnMut` closure. +""" +context = """ +The compiler will actually infer the equivalent problem that the closure is +`FnMut` and then state that captured variables cannot be moved out of a `FnMut` +closure. + +The way to fix the problem depends on whether the closure needs to return owned +`String` values, or if `&str` is enough. In the latter case, the closure can be +`|_| &s` instead. In the former case, the closure needs to return a clone of +`s`. +""" +id = "15d61c38-0bd1-45eb-bb2f-db9d357995e3" + +[[questions]] +type = "MultipleChoice" +prompt.prompt = """ +Does the following Rust program compile? If not, why not? + +```rust +fn main() { + let n: i32 = 42; + + for x in (0..10).map(|_| n) { + println!("{}", x); + } +} +``` +""" +prompt.distractors = [ +# Not seeing the difference with the previous question. 
+""" +The program does not compile because the `|_| n` is a `FnOnce` closure, but +`map` takes a `FnMut` closure. +""", +# Thinking that `FnOnce` has to do with the number of calls rather than +# ownership. +""" +The program does not compile because `|_| n` is a `FnOnce` closure, but is +called 10 times because the range is `0..10`. +""" +] +answer.answer = """ +The program compiles. +""" +context = """ +Because `i32` implements the `Copy` trait, returning `n` implicitly copies `n`, +making the closure a `Fn` closure instead of a `FnOnce` closure. `Fn` closures +implement `FnMut`, and so can be used with `map`. +""" +id = "798f623e-0802-4b29-8516-09d443dbb7e8" diff --git a/src/idioms/constructors/partial_initialzation.md b/src/idioms/constructors/partial_initialzation.md deleted file mode 100644 index b98dc98..0000000 --- a/src/idioms/constructors/partial_initialzation.md +++ /dev/null @@ -1,146 +0,0 @@ -# Separate construction and initialization - -The approach to take in Rust depends on the reason for separating construction -and initialization. - -- For incremental initialization, use a [builder pattern](#rust-builder-pattern). -- Using virtual methods during construction is [not applicable to - Rust](#using-virtual-methods-during-initialization). -- For pre-allocating storage or re-using allocated objects, the techniques and - limitations described in the chapter on [pre-allocated - buffers](../out_params/pre-allocated_buffers.md) apply. - -## Rust builder pattern - -Implementing the builder pattern in Rust involves defining a second "builder" -type to represent the partially constructed value, where each field of type `T` -has type `Option` in the builder. This differs from C++ where either null -values or uninitialized memory can be used to construct an object incrementally. - -
- -```cpp -#include -#include - -struct Pet { - std::string name; -}; - -struct Person { - int age; - std::shared_ptr pet; -}; - -int main() { - Person person; - // Can initialize incrementally without a builder. - // - // Initilizes with age indeterminate and pet as - // nullptr. - person.age = 42; - person.pet = std::make_shared("Mittens"); -} -``` - -```rust -use std::rc::Rc; - -struct Pet { - name: String, -} - -struct Person { - age: i32, - pet: Rc, -} - -struct PersonBuilder { - age: Option, - pet: Option>, -} - -impl PersonBuilder { - fn new() -> PersonBuilder { - PersonBuilder { - age: None, - pet: None, - } - } - - fn age(&mut self, age: i32) -> &mut Self { - self.age = Some(age); - self - } - - fn pet(&mut self, pet: Rc) -> &mut Self { - self.pet = Some(pet); - self - } - - fn build(&self) -> Option { - Some(Person { - age: self.age?, - pet: self.pet.clone()?, - }) - } -} - -fn main() { - let mut builder = PersonBuilder::new(); - let pet = Rc::new(Pet { - name: "Mittens".to_string(), - }); - let person = builder.age(42).pet(pet).build(); -} -``` - -
- -This pattern is sufficiently common that there are libraries to support it, such -as the [`derive_builder` crate](https://crates.io/crates/derive_builder). Using -that crate, the above example is much shorter. - -```rust,ignore -#[derive(Builder)] -struct Person { - age: i32, - name: String, -} -``` - -The resulting API also includes additional features, such as the `build` method -returning a `Result::Err` with an informative error, rather than just `None`, -when not all of required fields are set. - -### An alternative: updating based on a default value - -If there is a reasonable default value for a type, then instead of the builder -pattern, the `Default` trait can be implemented. [Values can be constructed -based on the default value proved by the `Default` -implementation](./default_constructors.md#struct-update). - -### Why builders are more common in Rust than in C++ - -The builder pattern is used more often in C++ than in Rust because - -1. Rust models ownership of pointers orthogonally to optionality, and -2. Rust requires handling all variants of a tagged union. - -This encourages using the type system to model invariants more explicitly, which -means that if different invariants hold before and after construction is -completed, different structs need to be defined to represent those different -states. - -In particular, while a value is in the middle of being incrementally -constructed, the fields are optional. Once fully constructed, the fields are no -longer optional. - -## Using virtual methods during initialization - -Separate initialization is sometimes used in C++ to overcome the limitation that -calling virtual methods during construction in is undefined behavior. The -difference in mechanics in construction in Rust make this kind of workaround -unnecessary. The code that usually runs as part of a constructor in C++ is -defined as a static method in Rust. 
The kind of partially-constructed state that -exists during the execution of the constructor in C++ does not exist in Rust. diff --git a/src/idioms/data_modeling.md b/src/idioms/data_modeling.md index 3d5a640..bf0316d 100644 --- a/src/idioms/data_modeling.md +++ b/src/idioms/data_modeling.md @@ -43,12 +43,18 @@ Additionally, despite it not being strictly necessary to model a fixed set of variants, the visitor pattern is sometimes used for this situation, especially when using versions of the C++ standard before the introduction of `std::variant`. In most of these cases the idiomatic Rust solution is the same -as what one would do when converting a C++ solution that uses tagged unions. The -chapter on the visitor pattern describes when to use a -Rust version of the visitor pattern or when to use Rust's enums (which are -closer to `std::variant` than to C++ enums) to model the data. +as what one would do when converting a C++ solution that uses [tagged +unions](./data_modeling/tagged_unions.md). The chapter on the [visitor +pattern](../patterns/visitor.md) describes when to use a Rust version of the +visitor pattern or when to use Rust's enums (which are closer to `std::variant` +than to C++ enums) to model the data. ## Varying data and operations -When both data and operations may be extended by a client, the visitor pattern -is used in both C++ and in Rust. +When both data and operations may be extended by a client, the required +solutions are more complex. In C++, the approach usually involves some kind of +extension to the [visitor pattern](../patterns/visitor.md) along with dynamic +casting. Because Rust does not support the kind of RTTI necessary for a dynamic +cast operator, different approaches need to be used. Some of those approaches +are discussed in the [chapter on the visitor +pattern](../patterns/visitor.md#expression-problem). 
diff --git a/src/idioms/exceptions.md b/src/idioms/exceptions.md index a4570ff..b271831 100644 --- a/src/idioms/exceptions.md +++ b/src/idioms/exceptions.md @@ -27,7 +27,7 @@ are that 1. `Result` and `Option` force explicit handling of the error case in order to access the contained value. This also differs from `std::expected` in C++23. -2. When propagating errors with `Result`, the types of the errors much match. +2. When propagating errors with `Result`, the types of the errors must match. There are libraries for making this easier to handle. ## `Result` vs `Option` diff --git a/src/idioms/exceptions/bugs.md b/src/idioms/exceptions/bugs.md index a707922..1932cf3 100644 --- a/src/idioms/exceptions/bugs.md +++ b/src/idioms/exceptions/bugs.md @@ -265,3 +265,5 @@ default panic handler. Instead one must be specified using the The Embedded Rust Book [chapter on handling panics](https://docs.rust-embedded.org/book/start/panicking.html) has more details on implementing panic handlers for in `no_std` programs. + +{{#quiz bugs.toml}} diff --git a/src/idioms/exceptions/bugs.toml b/src/idioms/exceptions/bugs.toml new file mode 100644 index 0000000..cc86bb2 --- /dev/null +++ b/src/idioms/exceptions/bugs.toml @@ -0,0 +1,42 @@ +[[questions]] +type = "MultipleChoice" +prompt.prompt = """ +Can unsafe Rust code rely on the `assert!` macro to check invariants that, if +violated, would lead to undefined behavior? +""" +prompt.distractors = [ +""" +It cannot because the assertions might be disabled at runtime. +""" +] +answer.answer = """ +It can, because the assertions are guaranteed to be checked. +""" +context = """ +Unlike `assert` in C++, the `assert!` macro in Rust cannot be disabled. For +expensive-to-check invariants that should be disabled in release builds, use +`debug_assert!` instead. +""" +id = "42c3ea1a-4cf3-4e6c-869b-ae71dd0db493" + +[[questions]] +type = "MultipleChoice" +prompt.prompt = """ +True or false: a failed `assert!` will always halt the program. 
+""" +prompt.distractors = [ +""" +True +""" +] +answer.answer = """ +False +""" +context = """ +Unlike `assert` in C++ which calls `std::abort()` when the assertion is false, +in Rust, when the condition for an `assert!` is false, `panic!` is called. How +the program handles a panic depends on the panic strategy, the panic handler, +whether the panic was on the main thread, and whether `catch_unwind` was used. +In any case, the program will not resume from where the panic occurred. +""" +id = "91c357c8-786c-4f09-b164-3ffa7a68f180" diff --git a/src/idioms/exceptions/expected_errors.md b/src/idioms/exceptions/expected_errors.md index b7c99a0..2f2f344 100644 --- a/src/idioms/exceptions/expected_errors.md +++ b/src/idioms/exceptions/expected_errors.md @@ -724,3 +724,5 @@ Both [thiserror](https://docs.rs/thiserror/latest/thiserror/) and [anyhow](https://docs.rs/anyhow/latest/anyhow/) have support for conveniently adding backtrace information to errors. Instructions for including backtraces are given on the main documentation page for each crate. + +{{#quiz expected_errors.toml}} diff --git a/src/idioms/exceptions/expected_errors.toml b/src/idioms/exceptions/expected_errors.toml new file mode 100644 index 0000000..e63052f --- /dev/null +++ b/src/idioms/exceptions/expected_errors.toml @@ -0,0 +1,140 @@ +[[questions]] +type = "MultipleChoice" +prompt.prompt = """ +Which Rust signature is the most idiomatic to use for a translation of the +following C++ function? + +```cpp +#include +#include + +/** + * @exception domain_error If the given vector is + * empty or too big. + */ +void process(std::vector userInput) { + if (userInput.empty()) { + throw std::domain_error("Non-empty vector required"); + } + if (userInput.size() > 100) { + throw std::domain_error("Vector is too big"); + } + + // process elements of userInput +} +``` +""" +prompt.distractors = [ +""" +```rust +/// # Panics +/// +/// Panics if the given vector is empty or too big. 
+fn process(userInput: Vec); +``` +""", +""" +```rust +/// Returns `None` if the given vector is empty or too big. +fn process(userInput: Vec) -> Option; +``` +""" +] +answer.answer = """ +```rust +use thiserror::Error; + +#[derive(Clone, Copy, Debug, Error)] +enum ProcessError { + #[error("the vector is empty")] + EmptyVec, + #[error("the vector is too big")] + TooBigVec, +} + +/// Returns `Err(EmptyVec)` if the given vector is empty, +/// or `Err(TooBigVec)` if the given vector is too big. +fn process(userInput: Vec) -> Result<(), ProcessError>; +``` +""" +context = """ +Errors that are expected (such as might arise from handling user input) should +be represented with `Result` or `Option`, rather than panics, so that they can +be handled. + +Since there are multiple kinds of errors that might be produced, `Result` should +be used instead of `Option` so that the errors can be distinguished, in order +to, e.g., provide different error messages for the user. +""" +id = "5654c1c0-b526-4248-8684-64e55b26e715" + +[[questions]] +type = "MultipleChoice" +# Focuses on ? operator, since it is so important for readable error handling. +prompt.prompt = """ +Assume `f` is some function + +```rust +fn f(n: i32) -> Option<i32> { + // ... +} +``` + +Which programs have equivalent behavior to the following program? + +```rust +fn go() -> Option<(i32, i32)> { + let x = f(0)?; + let y = f(1)?; + Some((x, y)) +} +``` +""" +# Focuses on ? operator, since it is so important for readable error handling. 
+prompt.distractors = [ +""" +```rust +fn go() -> Option<(i32, i32)> { + match (f(0), f(1)) { + (Some(x), Some(y)) => Some((x, y)), + _ => None, + } +} +``` +""" +] +answer.answer = [ +""" +```rust +fn go() -> Option<(i32, i32)> { + if let Some(x) = f(0) + && let Some(y) = f(1) + { + return Some((x, y)); + } + None +} +``` +""", +""" +```rust +fn go() -> Option<(i32, i32)> { + let Some(x) = f(0) else { + return None; + }; + let Some(y) = f(1) else { + return None; + }; + Some((x, y)) +} +``` +""" +] +context = """ +The `?` operator returns early, so that the rest of the function is not +executed. + +The program using `match` calls `f` on both `0` and `1`, while the others return +early and so do not call `f` on `1`. +""" +id = "08e6862a-31a2-4554-bfbf-bdf13cd4d8c0" diff --git a/src/idioms/ffi.md b/src/idioms/ffi.md new file mode 100644 index 0000000..b0f1460 --- /dev/null +++ b/src/idioms/ffi.md @@ -0,0 +1,14 @@ +# Rust and C++ interoperability (FFI) + +The Rustonomicon [contains a +chapter](https://doc.rust-lang.org/nomicon/ffi.html) covering many of the +concerns relevant to a C++ programmer that wants to call C (or C++ via `extern +"C"` functions) from Rust or Rust from C or C++ code. + +Many C libraries have existing crates, both with low-level bindings and with +high-level safe Rust abstractions. For example, for the libgit2 library there is +both a low-level [libgit2-sys crate](https://crates.io/crates/libgit2-sys) and a +high-level [git2 crate](https://crates.io/crates/git2). + +Bindings to libraries can be generated from a C header file using +[`bindgen`](https://rust-lang.github.io/rust-bindgen/). diff --git a/src/idioms/iterators.md b/src/idioms/iterators.md index 2e309d9..2522d8f 100644 --- a/src/idioms/iterators.md +++ b/src/idioms/iterators.md @@ -980,3 +980,5 @@ iterators](https://doc.rust-lang.org/std/iter/trait.DoubleEndedIterator.html), which allow consuming items from the back of the iterator. 
However, each item can still only be consumed once: when the front and back meet in the middle, iteration is over. + +{{#quiz iterators.toml}} diff --git a/src/idioms/iterators.toml b/src/idioms/iterators.toml new file mode 100644 index 0000000..fdcf20a --- /dev/null +++ b/src/idioms/iterators.toml @@ -0,0 +1,149 @@ +[[questions]] +type = "MultipleChoice" +prompt.prompt = """ +Does the following Rust program compile? If not, why not? + +```rust +fn main() { + let v = vec!["a".to_string(), "b".to_string()]; + + let mut last = None; + for x in v { + println!("{}", x); + last = Some(x); + } + + println!("{}", v.len()); + println!("last: {:?}", last); +} +``` +""" +prompt.distractors = [ +""" +The program compiles. +""", +""" +The program does not compile because `v` is a `Vec`, not an `Iterator`. +""", +""" +The program does not compile because `last` is mutable and borrows `x` while `v` +is borrowed by `len()`. +""" +] +answer.answer = """ +The program does not compile because the for loop moves `v`, and so it can't be +borrowed afterwards. +""" +context = """ +For loops implicitly call `into_iter` on the iterated value. The method +`into_iter` takes ownership of the value. To make this compile, the loop should +be on a reference to `v`, so that it is the reference that has ownership taken +of it. + +Even though `last` is mutable, the `&String` reference it would contain if `v` +were borrowed isn't. Thus, that does not conflict with the later use of `v`. +""" +id = "34e60ac8-ba03-4076-9ee4-a8a50e9c446d" + +# This question should have the additional benefit of forcing people to actually +# look at the standard library docs. +[[questions]] +type = "MultipleChoice" +prompt.prompt = """ +Which of the following programs are ways in Rust to update each element of a +vector based on all of the other elements in the vector? 
+ +The documentation for the following methods might be useful in answering this +question: + +- `enumerate` +- `split_at_mut` +- `split_first_mut` +- `chain` +""" +prompt.distractors = [ +""" +```rust +fn main() { + let mut v: Vec = (0..10).collect(); + + for (i, x) in (&mut v).into_iter().enumerate() { + for (j, y) in (&v).into_iter().enumerate() { + if i != j { + *x += *y; + } + } + } +} +``` +""", +""" +```rust +fn main() { + let mut v: Vec = (0..10).collect(); + + for i in 0..v.len() { + let mut x = &mut v[i]; + for j in 0..v.len() { + if i != j { + let y = &mut v[j]; + *x += *y; + } + } + } +} +``` +""" +] +answer.answer = [ +""" +```rust +fn main() { + let mut v: Vec = (0..10).collect(); + + for i in 0..v.len() { + for j in 0..v.len() { + if i != j { + v[i] += v[j]; + } + } + } +} +``` +""", +""" +```rust +fn main() { + let mut v: Vec = (0..10).collect(); + + for i in 0..v.len() { + let (before, rest) = v.split_at_mut(i); + let (x, after) = rest.split_first_mut().unwrap(); + for y in before.into_iter().chain(after) { + *x += *y; + } + } +} +``` +""" +] +context = """ +The solution that uses `enumerate` fails to compile because `v` is borrowed for +the entirety of each for loop. The borrow checker cannot tell that the condition +prevents `x` and `y` from being used at the same time when they point to the +same index. + +The solution that uses indices but takes the references immediately fails to +compile for the same reason. + +The solution that uses indices throughout works because the lifetimes of the +borrows of `v` to compute the new value and to update the value at index `i` do +not overlap. + +The solution splitting the vector into parts makes use of the fact that the standard +library provides a safe API for partitioning a vector. There are never two +mutable references to the same index in the vector. The standard library +contains many other similarly useful methods for operating on vectors, arrays, +and slices. 
+""" +id = "5c1a704a-bc85-470a-9443-07fca80d7d1f" diff --git a/src/idioms/placement_new.md b/src/idioms/placement_new.md new file mode 100644 index 0000000..e222326 --- /dev/null +++ b/src/idioms/placement_new.md @@ -0,0 +1,279 @@ +# Placement new + +
+ +Some of the statements about Rust in this chapter are dependent on the specifics +of how the compiler optimizes various programs. Unless otherwise stated, the +results presented here are based on rustc 1.87 using the [2024 language +edition](https://doc.rust-lang.org/edition-guide/introduction.html). + +
+ +The primary purposes of placement new in C++ are + +- situations where [storage allocation is separate from + initialization](#custom-allocators-and-custom-containers) such as in the + implementation of `std::vector` or memory pools, +- situations where the structures need to be placed at a specific memory + location, e.g., for [working with memory-mapped + registers](#memory-mapped-registers-and-embedded-development), and +- [storage reuse for performance reasons](#performance-and-storage-reuse). + +You also might have ended up on this page looking for [how to construct large +values directly on the heap in Rust](#constructing-large-values-on-the-heap). + +There is an [open proposal](https://github.com/rust-lang/rfcs/pull/2884) for +adding the features analogous to placement new in Rust, but the design of the +features is still under discussion. In the meantime, for many of the use cases +of placement new, there are either alternatives in safe Rust or approaches that +use unsafe Rust that can accomplish the required behaviors. + +## Custom allocators and custom containers + +It is uncommon to use placement new for the first reason because the major use +cases are covered by using STL containers with custom allocators. Similarly, +Rust's standard libraries can be used with custom allocators. However, in Rust +the API for custom allocators is still +[unstable](https://github.com/rust-lang/rust/issues/32838), and so they are only +available when using the nightly compiler with [a feature +flag](https://doc.rust-lang.org/unstable-book/library-features/allocator-api.html). +The Rust Book has [instructions on how to install the nightly +toolchain](https://doc.rust-lang.org/book/appendix-07-nightly-rust.html#unstable-features) +and The Rust Unstable Book has [instructions on how to use unstable +features](https://doc.rust-lang.org/unstable-book/). + +For stable Rust, there are libraries that cover many of the uses of allocators. 
+For example, [bumpalo](https://docs.rs/bumpalo/latest/bumpalo/) provides a safe +interface to a bump allocation arena, a [vector type using the +arena](https://docs.rs/bumpalo/latest/bumpalo/collections/vec/struct.Vec.html), +and other utility types using the arena. + +For implementing custom collection types that involve separate allocation and +initialization of memory, the chapters in the Rustonomicon on [implementing +`Vec`](https://doc.rust-lang.org/nomicon/vec/vec.html) are a useful resource. + +## Memory-mapped registers and embedded development + +If you are using Rust for embedded development, you may want to additionally +read the [Embedded Rust Book](https://docs.rust-embedded.org/book/). The +chapters on +[peripherals](https://docs.rust-embedded.org/book/peripherals/index.html) +discuss how to work with structures that are located at a specific address in +memory. + +The Embedded Rust Book also includes [a chapter on advice for embedded C +programmers using Rust for embedded +development](https://docs.rust-embedded.org/book/c-tips/index.html). + +## Performance and storage reuse + +The use of placement new in C++ for the purpose of reusing storage can usually +be replaced in Rust by a simple assignment. Because [assignment in Rust is +always a move, and in Rust moves do not leave behind objects that require +destruction](./constructors/copy_and_move_constructors.md), the optimizer will +usually produce code analogous to placement new for this use case. In some +cases, this also depends on an [RVO or NRVO optimization](./rvo.md). While these +optimizations are not guaranteed, they are reliable enough for common coding +patterns, especially when combined with +[benchmarking](https://bheisler.github.io/criterion.rs/book/index.html) the +performance-sensitive code to confirm that the desired optimization was +performed. 
Additionally, the generated assembly for specific functions can be +examined using a tool like +[cargo-show-asm](https://github.com/pacak/cargo-show-asm). + +The Rust version of the following example relies on the optimizations [to +achieve the desired behavior][godbolt-storage-reuse]. + +[godbolt-storage-reuse]: https://godbolt.org/#z:OYLghAFBqd5TKALEBjA9gEwKYFFMCWALugE4A0BIEAZgQDbYB2AhgLbYgDkAjF%2BTXRMiAZVQtGIHgBYBQogFUAztgAKAD24AGfgCsp5eiyahSAVyVFyKxqiIEh1ZpgDC6embZMpATnLOAGQImbAA5TwAjbFIQADZyAAd0JWIHJjcPL19E5NShIJDwtiiY%2BJtsOzSRIhZSIgzPbx4/csqhatqiArDI6LjrGrqGrOaBzu6ikriASmt0M1JUTi4AUgAmAGYVgFYAIRxSAgA3bAgAEWwaFjN6Immds5WtAEFLczsAagDa4GwAdQImF%2BRA%2BKwA7Lsns8PjCPoCQB8LAQAF7YchQ8GPF4YzY7XZmJhKFg0U5MdAAfTYxmAjHu2yxzxoTA%2BmApAHcyABrCBshHrWJsMwg76kX4AoHYO6giFQ2EfQSkOFw5laAB0qp4Wi00shLzlcoAVGzQRszl8fv9AcCdbL9Xb4XD0Xq7dLHhtddCXQB6L2IlQfNm2u2WTAgEBIYJEMMRIyoTnkiLodQQfls%2B7uoOujFghlQpkfKnBCDTG3OmGMEGCkFKVCkFhEVBIE1mkViq2SsM4K43IjFlYZsss9lclNrAVCj41usNpDpj2YqFcWb0bjbfjeLg6cjobgAJQs1fmi2woM2fHIUc3S9mnJA2zWquksTB21iGwAHD5PzwZBtDNxpH4Ng7y0cgNy3HcuH4JQQFAy8dFmOBYBQDAcHwYgyEoag6EYVgOG4c9BGEMQJE4GQ5GEZQ1E0K9yH0NZDGpNB1khNZdmsbBbHsRwIGcYZvDWHh/CYTAJl6GJZCSFJuPSdxGhAQScmktIxOKPpZFaGSOiGOSskUzSqkGLpgh6NSJLGHTMgEoTLHGEzJnU2YlCPJZuHWDZglQDwcBNFxUBDLt%2B1wHEPKYLyzB8/sXBCQMNmC7EXjeMxPlbS0JRBcEPRhEMwxSVFyRBQF%2B3nHNioxF4iGwNgEiMSrfKIABPBJmHYE8ABUgqhbB1Eq0hmSOdBAQ%2BGMWDjXYkwgNrT1idQ53K54BqG1k/hHRbMA%2BA0p3rRsS0y20FQ%2BCAcpAPLsAKpV%2BzNLRioujYXA%2BTVrvdOEWJY3aZUHa4SA2tkIBiw6tpnEtUvFYFMtVIbLpenM5sHH0/RPQNBxGsaJrTMrBzZABaIKAD8QfbXtYc9BdsVKhLnkjAsWCLd6so%2BCQCGAVglAgRnmfQGgIAJ9LphLRtakzGEODYMgGrxU7Oe5i1QclOk3XptafubAsqrFjHPWWkd0YHZ5SeeJd/y4NcwO0fhIJcV62MnFyT3ctZ%2BHg69yCQbAWAOahlwAoCQNN2jIOg2CLzNvnyFve9H2fV8Py/Hwf2kP8Vy4DZ1zN7duEdkPyCQhB4AgFD0GqhhoiwiAMCLxgYlIHh3zBUCcN6mCIAiNOImCWoGoI/g29YUgGoAeQiXQKkvc9y44YR%2B6YehO9onAIjMYAXAkegYN4fgcCpExJDnghSBH45sDXrduoqIVli3SNOLT%2BgCAiOs%2B7cHA06IQ5gPX8gTlIRMVAuLeaWCKAK8swaBGGAEoAAagQbAbJ%2B7NQ3IReQJFJDkSIooFQGg076CEkYEwIBzCWEMHfGCkBZjoASDJNeWMs
b3XIUQLGjATj0Eug7bcX9Dg4BIcWDiXE0hOBEvxAwgR7LiQMFJPIskrJiNyDJVSUwbKcQPu0IygiFG8OUXZQooibIqN0k0CyxktFmSkE5W2JijYm3AubbgHwCENgeqqWuqptQQHQiQRU7keDTEzsA7OKB5hEASEKUu5cEjF1IKEVq3A7GoAcU4k%2BhB3HwiEmg5BZFZBoKopg2i2DyBsjrAkLuXtjap39twfuQogkgk5rYg8sSeCOLBM4w6bgK7RFPBsLxPiEI3hANIJp2wNhrATrETUmp3zvm2O%2BI2gFyDAW2KBKx6coLWCDk7RCiB85oELmEyuISdnhJAMAHg2wUkMEbtQFutEe4dy7uQG5fdB7DzsHc8ezAiBTxnmneei9l70FXncze1Id5bkIPvSoJxj78FPqgc%2Bdyr5Jy3Lfe%2BHcn4X0dm/O5X8f7YD/sCwBWdQEsHAVAmBcDmB3NSeIFBGT5BZJolueijE8GoCtuxZFXCyEULSFQkMl1WVsRYpsD4WN%2B6sPQOwwER94BOUUW0bwvEBF6KESJORfQhLiJkqopSEi1UxDUUopg2l6jKoNfK41eqDC2UsvJHRmjTLyNMQsVyaximWLTpBD46h3yxCxrEaQHxgCoHqdsDUh03FkA6a67phstkBKqfstpESolcG9b6/1gbg0PVDeebAiSyDwgYlS0iUhaWUQwQyvQCk8kFKKRY0pEFymVInDUtNfqA1BpDWGiArTdntPtt44OwDZiu3dn0bhSdZnzMWR6jOqy4Ih16f01UgzhkbFGVocZkzplJxTn7RtKz1nFNYUsgOQ6emf2iCkRw0ggA%3D%3D + +
+ +```cpp +#include <cstddef> +#include <new> + +struct LargeWidget { + std::size_t id; +}; + +template <typename T> +extern void blackBox(T &x); + +void doWork(void *scratch) { + for (std::size_t i = 0; i < 100; i++) { + auto *w(new (scratch) LargeWidget{.id = i}); + // use w + blackBox(w); + w->~LargeWidget(); + } +} + +int main() { + alignas(alignof(LargeWidget)) char + memory[sizeof(LargeWidget)]; + void *w = memory; + doWork(w); +} +``` + +```rust +#[derive(Default)] +struct LargeWidget { + id: usize, +} + +fn do_work(w: &mut LargeWidget) { + for i in 0..100 { + *w = LargeWidget { id: i }; + // use w + std::hint::black_box(&w); + } +} + +fn main() { + let mut scratch = LargeWidget::default(); + do_work(&mut scratch); +} +``` + +
+ +Adding a `Drop` implementation for `LargeWidget` results in the drop +function being called on each loop iteration, but it makes the generated assembly +much harder to read, and so it has been omitted from the example. + +## Constructing large values on the heap + +`new` in C++ constructs objects directly in dynamic storage, and placement `new` +constructs them directly in the provided location. In Rust, `Box::new` is a +normal function, so the value is constructed on the stack and then moved to the +heap (or to the storage provided by the custom allocator). + +While the initial construction of the value on the stack can sometimes be +optimized away, guaranteeing that the stack is not used for the large +value requires the use of unsafe Rust and `MaybeUninit`. Even then, the +mechanisms available for initializing a value on the heap do not guarantee that +the individual values will not be created on the stack and then moved to the heap. Instead, +they just make it possible to incrementally initialize a structure (either +field-by-field or element-by-element), so that the entire structure does not +have to be on the stack at once. The same optimizations do apply, however, and +so the additional copies might be avoided. + +
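As a minimal sketch of the field-by-field case described above, the following initializes a struct directly in heap storage. The `Packet` type, the `PAYLOAD_LEN` constant, and the `boxed_packet` function are hypothetical names introduced only for illustration, and the sketch assumes a toolchain where `Box::new_uninit` is stable (as it is for the rustc version this chapter is based on).

```rust
use std::ptr::addr_of_mut;

// Hypothetical size and type, used only for illustration.
const PAYLOAD_LEN: usize = 4096;

struct Packet {
    id: u32,
    payload: [u8; PAYLOAD_LEN],
}

// Builds a `Packet` in heap storage one field at a time, so that a
// complete `Packet` value never has to exist on the stack.
fn boxed_packet(id: u32, fill: u8) -> Box<Packet> {
    // Allocate uninitialized heap storage for the whole struct.
    let mut b = Box::<Packet>::new_uninit();
    let p = b.as_mut_ptr();
    unsafe {
        // Write each field through a raw pointer, without ever
        // creating a reference to uninitialized memory.
        addr_of_mut!((*p).id).write(id);
        addr_of_mut!((*p).payload)
            .cast::<u8>()
            .write_bytes(fill, PAYLOAD_LEN);
        // SAFETY: every field has been initialized above.
        b.assume_init()
    }
}
```

Calling `boxed_packet(7, 0xAB)` yields a `Box<Packet>` whose `id` is 7 and whose payload bytes are all `0xAB`. Only the scalar arguments pass through the stack; the payload is filled in place.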
+ +```cpp +#include <array> +#include <cstddef> +#include <iostream> +#include <memory> + +int main() { + constexpr unsigned int SIZE = 8000000; + std::unique_ptr<std::array<int, SIZE>> b = + std::make_unique<std::array<int, SIZE>>(); + for (std::size_t i = 0; i < SIZE; ++i) { + (*b)[i] = 42; + } + + // use b so that it isn't optimized away + for (std::size_t i = 0; i < SIZE; ++i) { + std::cout << (*b)[i] << std::endl; + } +} +``` + + + +```rust,no_run +fn main() { + const SIZE: usize = 8_000_000; + + // optimization here makes it not overflow + // the stack with opt-level=2 + let mut b = Box::new([0; SIZE]); + for i in 0..SIZE { + b[i] = 42; + } + + // use b so that it isn't optimized away + std::hint::black_box(&b); +} +``` + +
+ +On the other hand, directly defining the array as `[42; SIZE]` does result in +the value being first constructed on the stack, which produces an error when +run. + +```rust,no_run +fn main() { + const SIZE: usize = 8_000_000; + + let b = Box::new([42; SIZE]); + + // use b so that it isn't optimized away + std::hint::black_box(&b); +} +``` + +```text +thread 'main' has overflowed its stack +fatal runtime error: stack overflow +Aborted (core dumped) +``` + +While it is not possible to enforce construction of the value directly on the +heap, it is possible to incrementally construct the value by using unsafe +Rust, which avoids overflowing the stack. This technique relies on both +[`MaybeUninit`](https://doc.rust-lang.org/std/mem/union.MaybeUninit.html) and +[`addr_of_mut!`](https://doc.rust-lang.org/std/ptr/macro.addr_of_mut.html). + +```rust,no_run +fn main() { + const SIZE: usize = 8_000_000; + let mut b = Box::<[i32; SIZE]>::new_uninit(); + let bptr = b.as_mut_ptr(); + for i in 0..SIZE { + unsafe { + std::ptr::addr_of_mut!(((*bptr)[i])).write(42); + } + } + + let b2 = unsafe { b.assume_init() }; + + for i in 0..SIZE { + println!("{}", b2[i]); + } +} +``` + +Depending on what is needed, this particular use can be generalized. + +```rust +fn init_with<T, const SIZE: usize>( + f: impl Fn(usize) -> T, +) -> Box<[T; SIZE]> { + let mut b = Box::<[T; SIZE]>::new_uninit(); + let bptr = b.as_mut_ptr(); + for i in 0..SIZE { + unsafe { + std::ptr::addr_of_mut!(((*bptr)[i])) + .write(f(i)); + } + } + + unsafe { b.assume_init() } +} +``` + +Note that a more idiomatic way to deal with a large array on the heap is to +represent it as either a boxed slice or a vector instead of a boxed array, in +which case using iterators to define the value avoids constructing it on the +stack, and does not require the use of unsafe Rust. 
+ +```rust +fn init_with<T, const SIZE: usize>( + f: impl Fn(usize) -> T, +) -> Box<[T]> { + (0..SIZE).map(f).collect() +} +``` diff --git a/src/idioms/raii.md b/src/idioms/raii.md deleted file mode 100644 index a9ce14a..0000000 --- a/src/idioms/raii.md +++ /dev/null @@ -1 +0,0 @@ -# RAII diff --git a/src/idioms/rvo.md b/src/idioms/rvo.md new file mode 100644 index 0000000..d945584 --- /dev/null +++ b/src/idioms/rvo.md @@ -0,0 +1,184 @@ +# NRVO and RVO + +
+ +Some of the statements about Rust in this chapter are dependent on the specifics +of how the compiler optimizes various programs. Unless otherwise stated, the +results presented here are based on rustc 1.87 using the [2024 language +edition](https://doc.rust-lang.org/edition-guide/introduction.html) and with +`-O2` for C++ and `-C opt-level=2` for Rust. + +
+ +Unlike C++, Rust does not guarantee return value optimization (RVO). Neither +language guarantees named return value optimization (NRVO). However, RVO and +NRVO are usually applied in Rust where they would be in C++. + +## RVO + +The pattern where RVO and NRVO are likely most important is in static factory +methods (which Rust calls [constructor methods](./constructors.md)). In the +following example, C++17 and later guarantee RVO. In Rust the optimization is +performed reliably, but is not guaranteed. + +
+ +```cpp +struct Widget { + signed char x; + double y; + long z; +}; + +Widget make(signed char x, double y, long z) { + return Widget{x, y, z}; +} +``` + +```rust +struct Widget { + x: i8, + y: f64, + z: i64, +} + +impl Widget { + fn new(x: i8, y: f64, z: i64) -> Self { + Widget { x, y, z } + } +} +``` + +
+ +One can see in [the assembly][rvo-godbolt] that for both programs the value is +written directly into the destination provided by the caller. + +
+ +```asm +// C++ +make(signed char, double, long): + mov BYTE PTR [rdi], sil + mov rax, rdi + mov QWORD PTR [rdi+16], rdx + movsd QWORD PTR [rdi+8], xmm0 + ret +``` + +```asm +// Rust +new: + mov rax, rdi + mov byte ptr [rdi + 16], sil + movsd qword ptr [rdi], xmm0 + mov qword ptr [rdi + 8], rdx + ret +``` + +
+ +[rvo-godbolt]: https://godbolt.org/#z:OYLghAFBqd5TKALEBjA9gEwKYFFMCWALugE4A0BIEAZgQDbYB2AhgLbYgDkAjF%2BTXRMiAZVQtGIHgBYBQogFUAztgAKAD24AGfgCsp5eiyahSAVyVFyKxqiIEh1ZpgDC6embZMQANnLOAGQImbAA5TwAjbFIpAFZyAAd0JWIHJjcPL19E5NShIJDwtiiYnnibbDs0kSIWUiIMz28/CqqhGrqiArDI6LjrWvrGrJbBruCe4r6ygEprdDNSVE4uS3M7AGoAdQJMYGwiDYBSAHYAISOtAEEN243NDYIADnJLm7uATxANmh9ZN7uGwAXt8CH9XtdTgARN5vAhsBL0ba7faHU4Xa6Ao4AJgAzEdYmczEwlCwaNgIEx0AB9NjGYCMGYEmGYu40JgbEIAdwgD2e5A2Xx%2B4OBoL%2BMw2AFojrjcMi9gdjucAYC7jsFWjlazVar1BD3jrAR99YbVUCTYboSrblbIScWVcuHN6NxYvxvFwdOR0NwAEoWQ5KBZLbDHPF8chEbROuYAaxAsWxADppD4TrEfLingBOHM8GS4wzcaT8NgJrTkD1en1cfhKEAVqOep3kOCwFAYHD4YhkSjUOiMVgcbgRwTCMQSTgyOTCZRqTTN8j6bGGeloHEXbFnazYWz2RwQZzDbwrwITIolEDZnIpffpdxNEArpK3tLdC99a%2BtO8dIYPrI8Due7VGM769KUAydMeBiWJ0YFTKUcxBosyzcGsZibOqqJKhiBobCkwAhJgGyoEgdT3DKuGApgCwRIwgqUdaGz0EIwDAoxdowriuFvNg6hENEHJYYqdKxhSBFESRZGkPcAo0WYdGhsazGscCErokxpAHIsQkogc6J6oKApAla3GwvaLYulwbqVtG3rcC4G4bvhwbLGGuLYvwTY6DMcxINgLA4DEEDOsWpblrZi41nWDaRtGvnkPGiYpmmGZZrm2b5tIhZWbi7p2dFcXNnMbYIPAEAdugCIMNEfYQBg1WMDEpA8E8JwVgOAmkPWEARHZETBHUXy8PwA2sKQHwAPIRLolRNhGDUcMIk1MPQw1ejgERmMALgSPQ9YjeQOB0iYkiLoQWlVAAbtgB1enxlRmAJdnBAJVlevQBARKQQ1uDgdlEKQ8IjvwN2kBEyTYFC2AnQywSgMVAhGMASgAGoENgXKTQkzAgzOojiJI05jooKgaHZ%2BiAUYJggOYliGF99aQHM6AJHeB2SpKLgbKzRCSowN30DKUKed6YNAzgTMhUBc1pE4TCuP%2B3g8KeCvwZefgvnk96ZMrz65He6t9C0u6y%2B0YzQSrMttEwv7jIU4HZLBf661IK7O/bkwa0hrlTqF1n5VF3AbHTRCoBsPBJm1SZaBsEDdiQMk4riPAzF58WtigCxEAkT11Q1CQ1aQoTsCsofh5H0f3YQie7AYJMTkTsgk3O5OLpT5Bcj9CQg/7NlVvwNaTU9ueHOgNAhwGFdRycMdx24jXRO5qfp8VcYgNIs%2BxB52U%2BDwWj708TyxC8VkluQZaxBWA/2bW1ixd5LalRVaBVYXTX52/RcgMAZSAZ10Qep9UXGNIaeNQETWmrNOweNFrMCICtNadlNrbV2vQfaeNjr0jOhtAgl17A3TuvwB6qAnorAjK9XcdlPrfV%2BlgFYXpAbA0OmDCGKhoaw0%2BjTRGNBkZowxljHGHpRzyEblOZu8hW4Li9MuVcNNUBOS3AzCIUsWZszSBzSwmBhYKK3BuPEUpJqi3QOLXYt14BIVNjbeWitXaATPA7BCBgtZ3ktjebWRsILfhAlBJWMErE/lAueR2gEPZuI9p4qQPsUKcGxH3QO1Zg7qCeD4SUfwNjAFQBXWISYeBxwTmQdycTV4%2BUziAbOo9P6L2LqXbgyTUnpMydk3JxCa5kDriuBuhNxH4ykRTJ8ndu69yLAHSKiSuDDxzk9HmE96lpOkBkrJEccl5
IgAvd%2BS9k7FKKj5PyAUgrUH9ufS%2B18CrcBio2eK69N5Jm3tiXe%2B9D7H1PtwPKYzB7nJ2TGEZosb6FUfglMGKRHDSCAA%3D%3D + +## NRVO + +NRVO isn't guaranteed in either C++ or Rust, but the optimization often triggers +in cases where it is commonly desired. For example, in both C++ and Rust +when creating an array, initializing its contents, and then returning it, the +initialization assigns directly to the return location. + +
+ +```cpp +#include <array> + +std::array<int, 10> make() { + std::array<int, 10> v; + for (int i = 0; i < 10; i++) { + v[i] = i; + } + return v; +} +``` + +```rust +#[unsafe(no_mangle)] +fn new() -> [i32; 10] { + let mut v = [0; 10]; + for i in 0..10 { + v[i] = i as i32; + } + v +} +``` + +
+ +The [generated assembly][nrvo-godbolt] for the two versions of the program is +nearly identical, and both construct the array directly in the return location. + +
+ +```asm +// C++ +make(): + movdqa xmm0, XMMWORD PTR .LC0[rip] + mov rdx, QWORD PTR .LC2[rip] + mov rax, rdi + movups XMMWORD PTR [rdi], xmm0 + movdqa xmm0, XMMWORD PTR .LC1[rip] + mov QWORD PTR [rdi+32], rdx + movups XMMWORD PTR [rdi+16], xmm0 + ret +.LC0: + .long 0 + .long 1 + .long 2 + .long 3 +.LC1: + .long 4 + .long 5 + .long 6 + .long 7 +.LC2: + .long 8 + .long 9 +``` + +```asm +// Rust +.LCPI0_0: + .long 0 + .long 1 + .long 2 + .long 3 +.LCPI0_1: + .long 4 + .long 5 + .long 6 + .long 7 +new: + mov rax, rdi + movaps xmm0, xmmword ptr [rip + .LCPI0_0] + movups xmmword ptr [rdi], xmm0 + movaps xmm0, xmmword ptr [rip + .LCPI0_1] + movups xmmword ptr [rdi + 16], xmm0 + movabs rcx, 38654705672 + mov qword ptr [rdi + 32], rcx + ret +``` + +
+ +[nrvo-godbolt]: https://godbolt.org/#z:OYLghAFBqd5TKALEBjA9gEwKYFFMCWALugE4A0BIEAZgQDbYB2AhgLbYgDkAjF%2BTXRMiAZVQtGIHgBYBQogFUAztgAKAD24AGfgCsp5eiyahSAVyVFyKxqiIEh1ZpgDC6embZMDzgDIEmbAA5TwAjbFIQAE5yAAd0JWIHJjcPLwN4xPshf0CQtnDImJtsO2SRIhZSIlTPbx5rbFtspgqqolzgsIjo60rq2vSGy3bO/MLogEprdDNSVE4uAFIAJgBmJYBWACEzJiUWGmwIJnQAfTZjYEZJrYARJa0AQRomAGpAgHcISbeAWiWa1wby22wIaxWgO2bx4WnuIIA7NtHk83mi3owiG82GYsQA3EFrO4gnZwtbQ2H3KEo9FvQSkN4ERnvLQAOlZsMRyOetNpeNBBHhgOJTJYSkZEOpPPRSwRD2laP5z1l8qeXGm9G4m343i4OnI6G4ACULFilLN5tgQes%2BOQiNp1dMANYgTYrVnSABsCM2nrWAA4ooGeDI1oZuNJ%2BGxXVpyLr9YauPwlCBY/a9eryHBYCgMDh8MQyJRqHRGKwONxbYJhGIJJwZHJhMo1JoM%2BR9CtDFc0KtkStto1msknExXO46iA1ptyH4Al0Cj1p5kkkJBt4p3EEiumGNupFpyUykI2gNx%2BkN4eWieOnPxou%2Bu015OD/0b3k967pua5gtuKs1gEqAeDghIuFUpAsAAnoCuAoiiliYCAIDgVBgIuAEVgwmSwKXE6xy/LK3KouiCFISh0FrOhwjkFhMFvPy5I0ui9JvBAGGMoSxJktCTJobR5KMr2vYEUiTG8vRApCkSgmMQqiKqrSpDYEQczvAxREqpmmpcNqcYOga3AuEJ/ZvN%2BlrWhC/DpjokzTEg2AsDgkQ/OGXCRuQ0abLG8b8ImyapnaDq2eQLpuh63q%2BgGQZRCG0hhtpaw6vpfmBRm0zZgg8AQLm6BsLEDARMWEAYHlBWRKQPD%2BgisalkQEQphAoT6aEARVJBlb8C1rCkJBADyoS6KU6a2iVHDCL1TD0O1bY4KEZjAGB9D0CmvD8DglwmJIM0EEpZR4tgK36tg6ilLiiz6hhTT6fQBChBBPVuDg%2BlEKQBDRqt5D7aQoQJNgdzYBt1wBKAaUCEYwBKAAagQ2CfL1sTMB1jaiOIkgNtWigqBo%2Bn6A0RgmCA5iWIYt0ppA0zoLELQrX8fwuG8lNEH8jD7fQworL5X2vTgZMuZew4QM4T6ejOo67gukQNsuLTC5uWTJOLEwNvzx6vrLKutK%2Bis9Mratnt4IsjNU2uS9IX4Wr%2BPAalqSVtombxE0QqAwqyVWslorEFiQDL/lbVlBVmKCzEQsS4kVJX5YwpBBOwiyO87PCuwi7v8NghDewQiENBjtZo7IGPNtjba4%2BQnwQbEHXWzptsJtwvW4qHWLoDQDumgnSfu6xbilVHFl%2B6lNnOiA0jJ5sEJxZ6sKwv6/qbP6rnuZ53nJdw/lpgHGXZWguWR4VVDFTvZUgMAPCbNnDB1aQDVNW2XVtUjd89f1g12Ejo3MEQE1Tfps3zYty1I3WlcLa%2BpCC7XsPtQ6qcTqoDOkjS62l9Q3Tum1R650rKvXeraL6P0VD/UBjdAmoMaDgyhjDOGCNdRVnkLnes%2Bd5CF1bPqDsXYCaoGMgOZBvMKZU2SDTBCwp2H9l7Osf4vUOYGi5pnA68AvxNCGgLIW%2BsQAIlFpgE2IARbS2SE%2BVR2ici3g/IbeRR5NaPmUaojW14NGGz1mkbwlitaGIlpo82P5OArCrrpHyBkuBvHUP6T0fxPTSDeMAVACdNgck9unMgFlPH%2B1BlvYOjdw6HyjjHCsXAAlBJCWEiJMIom2jToWbmIBOw51RnQ5GjCcblNLuXSurlvEry4PXEOuIGYtxycE0J4TInRIgN3XePt1gJIHo6cg9l
HI9BctpReMY9J21XtYAK1lJkuhHqyMeKwJ5Ty0DPOerlEpLNrkmCZwVtISJ8SldZwUvqJEcNIIAA%3D%3D + +## Determining whether the optimization occurred + +Tools like [cargo-show-asm](https://github.com/pacak/cargo-show-asm) can be used +to show the assembly for individual functions in order to confirm that RVO or +NRVO was applied where desired. + +There are also high-quality [benchmarking +tools](https://bheisler.github.io/criterion.rs/book/index.html) for Rust, which +can be used to ensure that changes do not unexpectedly result in worse +performance. diff --git a/src/idioms/type_equivalents.md b/src/idioms/type_equivalents.md index 4631fd9..fcc6b5c 100644 --- a/src/idioms/type_equivalents.md +++ b/src/idioms/type_equivalents.md @@ -210,18 +210,18 @@ handling when a type is `void`. The following table maps the ownership-managing classes from C++ to equivalents types in Rust. -| Use | C++ type | Rust type | -|-----------------------------------------------------------|-------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| Owned | `T` | `T` | -| Single owner, dynamic storage | `std::unique_ptr` | `Box` | -| Shared owner, dynamic storage, immutable, not thread-safe | `std::shared_ptr` | `std::rc::Rc` | -| Shared owner, dynamic storage, immutable, thread-safe | `std::shared_ptr` | `std::sync::Arc` | -| Shared owner, dynamic storage, mutable, not thread-safe | `std::shared_ptr` | [`std::rc::Rc>`](https://doc.rust-lang.org/book/ch15-05-interior-mutability.html#having-multiple-owners-of-mutable-data-by-combining-rct-and-refcellt) | -| Shared owner, dynamic storage, mutable, thread-safe | `std::shared_ptrM` with a `std::mutex` | [`std::sync::Arc>`](https://doc.rust-lang.org/book/ch16-03-shared-state.html) | -| Const reference | `const &T` | `&T` | -| Mutable reference | `&T` | `&mut T` | -| Const observer pointer | `const 
*T` | `&T` | -| Mutable observer pointer | `*T` | `&mut T` | +| Use | C++ type | Rust type | +|-----------------------------------------------------------|------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Owned | `T` | `T` | +| Single owner, dynamic storage | `std::unique_ptr<T>` | `Box<T>` | +| Shared owner, dynamic storage, immutable, not thread-safe | `std::shared_ptr<T>` | `std::rc::Rc<T>` | +| Shared owner, dynamic storage, immutable, thread-safe | `std::shared_ptr<T>` | `std::sync::Arc<T>` | +| Shared owner, dynamic storage, mutable, not thread-safe | `std::shared_ptr<T>` | [`std::rc::Rc<std::cell::RefCell<T>>`](https://doc.rust-lang.org/book/ch15-05-interior-mutability.html#having-multiple-owners-of-mutable-data-by-combining-rct-and-refcellt) | +| Shared owner, dynamic storage, mutable, thread-safe | `std::shared_ptr<T>` with a `std::mutex` | [`std::sync::Arc<std::sync::Mutex<T>>`](https://doc.rust-lang.org/book/ch16-03-shared-state.html) | +| Const reference | `const T&` | `&T` | +| Mutable reference | `T&` | `&mut T` | +| Const observer pointer | `const T*` | `&T` | +| Mutable observer pointer | `T*` | `&mut T` | In C++, the thread safety of `std::shared_ptr` is more nuanced than it appears in this table (e.g., some uses may require `std::atomic`). 
However, in safe Rust diff --git a/src/idioms/varargs.md b/src/idioms/varargs.md deleted file mode 100644 index f35a073..0000000 --- a/src/idioms/varargs.md +++ /dev/null @@ -1 +0,0 @@ -# Varargs diff --git a/src/optimizations/rvo_and_placement_new.md b/src/optimizations/rvo_and_placement_new.md deleted file mode 100644 index 46c0684..0000000 --- a/src/optimizations/rvo_and_placement_new.md +++ /dev/null @@ -1 +0,0 @@ -# NRVO, RVO, and placement new diff --git a/src/patterns/adapter.md b/src/patterns/adapter.md index 59f19e2..f936028 100644 --- a/src/patterns/adapter.md +++ b/src/patterns/adapter.md @@ -112,3 +112,5 @@ fn main() { The `map` method returns a different type than `iter`, but `middle` can be called on the result of either one. + +{{#quiz adapter.toml}} diff --git a/src/patterns/adapter.toml b/src/patterns/adapter.toml new file mode 100644 index 0000000..6a864d2 --- /dev/null +++ b/src/patterns/adapter.toml @@ -0,0 +1,43 @@ +[[questions]] +type = "MultipleChoice" +prompt.prompt = """ +Consider a situation where you are working with a type `Circle` and a trait +`Shape`. The type and trait are each defined in one of: + +- the current crate, +- a dependency crate `crate_a` of the current crate, or +- a dependency `crate_b` of the current crate. + +For which situations is the adapter pattern necessary in Rust? +""" +prompt.distractors = [ +""" +`Circle` and `Shape` are both defined in the current crate. +""", +""" +`Circle` is defined in the current crate, and `Shape` is defined in `crate_a`. +""", +""" +`Shape` is defined in the current crate, and `Circle` is defined in `crate_a`. +""" +] +answer.answer = [ +""" +`Circle` and `Shape` are both defined in `crate_a`. +""", +""" +`Circle` is defined in `crate_a`, and `Shape` is defined in `crate_b`. +""" +] +context = """ +The adapter pattern is needed when the orphan rule prevents implementing a +trait for a type. 
+ +The [orphan +rule](https://doc.rust-lang.org/reference/items/implementations.html#orphan-rules) +says that at least one of the trait or a type in the trait implementation +needs to be defined in the current crate. When at least one of `Shape` or +`Circle` is defined in the current crate, the orphan rule does not prevent the +implementation. +""" +id = "b6394504-9fdb-49c8-88c0-046c42f74287" diff --git a/src/patterns/pimpl.md b/src/patterns/pimpl.md index c96ba11..22f75cc 100644 --- a/src/patterns/pimpl.md +++ b/src/patterns/pimpl.md @@ -1 +1,17 @@ -# Pointer-to-implementation (PImpl) +# Pointer-to-implementation (PIMPL) + +The [PIMPL pattern](https://en.cppreference.com/w/cpp/language/pimpl.html) in +C++ is usually used for the purpose of improving compilation times by removing +implementation details from the ABI of a translation unit. It also can be used +to hide implementation details that otherwise would be exposed in a header file. + +In Rust, the unit of separate compilation is the crate, rather than the file or +module. Within a crate, the compiler minimizes compilation times via incremental +compilation, rather than via separate compilation. Between crates, there is no +guarantee of Rust-native ABI stability, so if an upstream crate changes, +downstream crates need to be recompiled. Thus, for performance purposes, the +PIMPL pattern does not apply. + +For the hiding of implementation details, [instead of excluding details from a +header file, modules can be used to control +visibility](../idioms/encapsulation.md). diff --git a/src/patterns/visitor.md b/src/patterns/visitor.md new file mode 100644 index 0000000..34adff3 --- /dev/null +++ b/src/patterns/visitor.md @@ -0,0 +1,754 @@ +# Visitor pattern and double dispatch + +In C++ the visitor pattern is typically used to enable adding behaviors to a +type without modifying the class definitions. 
In Rust, the same goal is +conventionally accomplished by using Rust enums, which resemble C++ [tagged +unions](../idioms/data_modeling/tagged_unions.md). While the chapter on tagged +unions compares using Rust enums with C++ `std::variant`, this chapter [compares +using the visitor pattern in C++ with using Rust +enums](#use-a-rust-enum-instead). + +Since the visitor pattern and double dispatch may be useful for other purposes +as well, a [Rust visitor pattern version of the example](#visitors) is also +given. + +Extensions of the visitor pattern are sometimes used in C++ to make it possible +to extend both data and behavior without modifying the original definitions +(i.e., to solve [the expression +problem](https://cs.brown.edu/~sk/Publications/Papers/Published/kff-synth-fp-oo/)). +Other approaches, enabled by Rust's traits and generics, are [more likely to be +used in Rust](#varying-data-and-behavior). + +## Use a Rust enum instead + +For the first case, where the variants are fixed but behaviors are not, the +idiomatic approach in Rust is to implement the data structure as an enum instead +of as many structs with a common interface. This is similar to using +`std::variant` in C++. + +
+ +```cpp +#include <exception> +#include <iostream> +#include <memory> +#include <stdexcept> +#include <string> +#include <unordered_map> +#include <utility> + +// Declare types that visitor can visit +struct Lit; +struct Plus; +struct Var; +struct Let; + +// Define abstract class for visitor +struct Visitor { + virtual void visit(Lit &e) = 0; + virtual void visit(Plus &e) = 0; + virtual void visit(Var &e) = 0; + virtual void visit(Let &e) = 0; + virtual ~Visitor() = default; + +protected: + Visitor() = default; +}; + +// Define abstract class for expressions +struct Exp { + virtual void accept(Visitor &v) = 0; + virtual ~Exp() = default; +}; + +// Implement each expression variant +struct Lit : public Exp { + int value; + + Lit(int value) : value(value) {} + + void accept(Visitor &v) override { + v.visit(*this); + } +}; + +struct Plus : public Exp { + std::unique_ptr<Exp> lhs; + std::unique_ptr<Exp> rhs; + + Plus(std::unique_ptr<Exp> lhs, + std::unique_ptr<Exp> rhs) + : lhs(std::move(lhs)), rhs(std::move(rhs)) { + } + + void accept(Visitor &v) override { + v.visit(*this); + } +}; + +struct Var : public Exp { + std::string name; + + Var(std::string name) : name(name) {} + + void accept(Visitor &v) override { + v.visit(*this); + } +}; + +struct Let : public Exp { + std::string name; + std::unique_ptr<Exp> exp; + std::unique_ptr<Exp> body; + + Let(std::string name, std::unique_ptr<Exp> exp, + std::unique_ptr<Exp> body) + : name(std::move(name)), + exp(std::move(exp)), + body(std::move(body)) {} + + void accept(Visitor &v) override { + v.visit(*this); + } +}; + +// Define Visitor for evaluating expressions + +// Exception for representing expression +// evaluation errors +struct UnknownVar : std::exception { + std::string name; + + UnknownVar(std::string name) : name(name) {} + + const char *what() const noexcept override { + return "Unknown variable"; + } +}; + +// Define type for evaluation environment +using Env = std::unordered_map<std::string, int>; + +// Define evaluator +struct EvalVisitor : public Visitor { + // Return value. Results propagate up the stack. + int value = 0; + + // Evaluation environment. 
Changes propagate + // down the stack + Env env; + + // Define behavior for each case of the + // expression. + void visit(Lit &e) override { value = e.value; } + void visit(Plus &e) override { + e.lhs->accept(*this); + auto lhs = value; + e.rhs->accept(*this); + auto rhs = value; + value = lhs + rhs; + } + void visit(Var &e) override { + try { + value = env.at(e.name); + } catch (const std::out_of_range &) { + throw UnknownVar(e.name); + } + } + void visit(Let &e) override { + e.exp->accept(*this); + auto orig_env = env; + env[e.name] = value; + e.body->accept(*this); + env = orig_env; + } +}; + +int main() { + // Construct an expression + auto x = Plus(std::make_unique<Let>( + std::string("x"), + std::make_unique<Lit>(3), + std::make_unique<Var>( + std::string("x"))), + std::make_unique<Lit>(2)); + + // Construct the evaluator + EvalVisitor visitor; + + // Run the evaluator + x.accept(visitor); + + // Print the output + std::cout << visitor.value << std::endl; +} +``` + +```rust +use std::collections::HashMap; + +// Define expressions. +// +// This covers the first 3 sections of the +// C++ version. 
+enum Exp { + Var(String), + Lit(i32), + Plus { + lhs: Box<Exp>, + rhs: Box<Exp>, + }, + Let { + var: String, + exp: Box<Exp>, + body: Box<Exp>, + }, +} + +// Error type for representing expression +// evaluation errors +#[derive(Debug)] +enum EvalError<'a> { + UnknownVar(&'a str), +} + +// Define type for evaluation environment +type Env<'a> = HashMap<&'a str, i32>; + +// Define evaluator +fn eval<'a>( + env: &Env<'a>, + e: &'a Exp, +) -> Result<i32, EvalError<'a>> { + match e { + Exp::Var(x) => env + .get(x.as_str()) + .cloned() + .ok_or(EvalError::UnknownVar(x)), + Exp::Lit(n) => Ok(*n), + Exp::Plus { lhs, rhs } => { + let lv = eval(env, lhs)?; + let rv = eval(env, rhs)?; + Ok(lv + rv) + } + Exp::Let { var, exp, body } => { + let val = eval(env, exp)?; + let mut env = env.clone(); + env.insert(var, val); + eval(&env, body) + } + } +} + +fn main() { + use Exp::*; + + // Construct an expression + let e = Let { + var: "x".to_string(), + exp: Box::new(Lit(3)), + body: Box::new(Plus { + lhs: Box::new(Var("x".to_string())), + rhs: Box::new(Lit(2)), + }), + }; + + // Run the evaluator + let res = eval(&HashMap::new(), &e); + + // Print the output + println!("{:?}", res); +} +``` + +
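To illustrate why the enum covers the "fixed variants, open set of behaviors" case, here is a minimal sketch of adding a second behavior without touching the data definition. The `show` function is a hypothetical name introduced for this sketch, and the enum is repeated only so that the block is self-contained.

```rust
// Same shape as the `Exp` enum from the evaluator example.
enum Exp {
    Var(String),
    Lit(i32),
    Plus { lhs: Box<Exp>, rhs: Box<Exp> },
    Let { var: String, exp: Box<Exp>, body: Box<Exp> },
}

// A second behavior (hypothetical): render an expression as text.
// Adding it requires only a new function with a `match`, which plays
// the role that a new `Visitor` subclass plays in the C++ version.
fn show(e: &Exp) -> String {
    match e {
        Exp::Var(x) => x.clone(),
        Exp::Lit(n) => n.to_string(),
        Exp::Plus { lhs, rhs } => {
            format!("({} + {})", show(lhs), show(rhs))
        }
        Exp::Let { var, exp, body } => {
            format!("(let {} = {} in {})", var, show(exp), show(body))
        }
    }
}
```

Because `match` is checked for exhaustiveness, adding a new variant to `Exp` later would make the compiler flag every function like `show` that needs a new case. This is the same trade-off the visitor pattern makes in C++.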
+ +## Visitors + +If the visitor pattern is still needed for some reason, it can be implemented +similarly to how it is in C++. This can make direct ports of programs that use +the visitor pattern more feasible. However, the enum-based implementation should +still be preferred. + +The following example shows how to implement the same program as in the previous +example, but using a visitor in Rust. The C++ program is identical to the +previous one. + +The example also demonstrates using double dispatch with trait objects in Rust. +The expressions are represented as `dyn Exp` trait objects which accept a `dyn +Visitor` trait object, and then call on the visitor the method specific to the +type of expression. + +
+ +```cpp +#include <exception> +#include <iostream> +#include <memory> +#include <stdexcept> +#include <string> +#include <unordered_map> +#include <utility> + +// Declare types that visitor can visit +struct Lit; +struct Plus; +struct Var; +struct Let; + +// Define abstract class for visitor +struct Visitor { + virtual void visit(Lit &e) = 0; + virtual void visit(Plus &e) = 0; + virtual void visit(Var &e) = 0; + virtual void visit(Let &e) = 0; + virtual ~Visitor() = default; + +protected: + Visitor() = default; +}; + +// Define abstract class for expressions +struct Exp { + virtual void accept(Visitor &v) = 0; + virtual ~Exp() = default; +}; + +// Implement each expression variant +struct Lit : public Exp { + int value; + + Lit(int value) : value(value) {} + + void accept(Visitor &v) override { + v.visit(*this); + } +}; + +struct Plus : public Exp { + std::unique_ptr<Exp> lhs; + std::unique_ptr<Exp> rhs; + + Plus(std::unique_ptr<Exp> lhs, + std::unique_ptr<Exp> rhs) + : lhs(std::move(lhs)), rhs(std::move(rhs)) { + } + + void accept(Visitor &v) override { + v.visit(*this); + } +}; + +struct Var : public Exp { + std::string name; + + Var(std::string name) : name(name) {} + + void accept(Visitor &v) override { + v.visit(*this); + } +}; + +struct Let : public Exp { + std::string name; + std::unique_ptr<Exp> exp; + std::unique_ptr<Exp> body; + + Let(std::string name, std::unique_ptr<Exp> exp, + std::unique_ptr<Exp> body) + : name(std::move(name)), + exp(std::move(exp)), + body(std::move(body)) {} + + void accept(Visitor &v) override { + v.visit(*this); + } +}; + +// Define Visitor for evaluating expressions + +// Exception for representing expression +// evaluation errors +struct UnknownVar : std::exception { + std::string name; + + UnknownVar(std::string name) : name(name) {} + + const char *what() const noexcept override { + return "Unknown variable"; + } +}; + +// Define type for evaluation environment +using Env = std::unordered_map<std::string, int>; + +// Define evaluator +struct EvalVisitor : public Visitor { + // Return value. Results propagate up the stack. + int value = 0; + + // Evaluation environment. 
+  // Changes propagate down the stack.
+  Env env;
+
+  // Define behavior for each case of the
+  // expression.
+  void visit(Lit &e) override { value = e.value; }
+  void visit(Plus &e) override {
+    e.lhs->accept(*this);
+    auto lhs = value;
+    e.rhs->accept(*this);
+    auto rhs = value;
+    value = lhs + rhs;
+  }
+  void visit(Var &e) override {
+    try {
+      value = env.at(e.name);
+    } catch (std::out_of_range &ex) {
+      throw UnknownVar(e.name);
+    }
+  }
+  void visit(Let &e) override {
+    e.exp->accept(*this);
+    auto orig_env = env;
+    env[e.name] = value;
+    e.body->accept(*this);
+    env = orig_env;
+  }
+};
+
+int main() {
+  // Construct an expression
+  auto x = Plus(std::make_unique<Let>(
+                    std::string("x"),
+                    std::make_unique<Lit>(3),
+                    std::make_unique<Var>(
+                        std::string("x"))),
+                std::make_unique<Lit>(2));
+
+  // Construct the evaluator
+  EvalVisitor visitor;
+
+  // Run the evaluator
+  x.accept(visitor);
+
+  // Print the output
+  std::cout << visitor.value << std::endl;
+}
+```
+
+```rust
+// This is NOT an idiomatic translation. The
+// previous example using Rust enums is.
+
+use std::collections::HashMap;
+
+// Define types that the visitor can visit
+struct Lit(i32);
+struct Plus {
+    lhs: Box<dyn Exp>,
+    rhs: Box<dyn Exp>,
+}
+struct Var(String);
+struct Let {
+    name: String,
+    exp: Box<dyn Exp>,
+    body: Box<dyn Exp>,
+}
+
+// Define trait for expressions
+trait Exp {
+    // Much like C++ can't have virtual template
+    // methods, Rust can't have trait objects
+    // where the traits have generic methods.
+    //
+    // Therefore the visitor either has to be
+    // mutable to collect the results or the
+    // accept method has to be specialized to a
+    // specific return type.
+    fn accept<'a>(&'a self, v: &mut dyn Visitor<'a>);
+}
+
+// Define trait for the visitor
+trait Visitor<'a> {
+    fn visit_lit(&mut self, e: &'a Lit);
+    fn visit_plus(&mut self, e: &'a Plus);
+    fn visit_var(&mut self, e: &'a Var);
+    fn visit_let(&mut self, e: &'a Let);
+}
+
+// Implement accept behavior for each expression variant
+impl Exp for Lit {
+    fn accept<'a>(&'a self, v: &mut dyn Visitor<'a>) {
+        v.visit_lit(self);
+    }
+}
+
+impl Exp for Plus {
+    fn accept<'a>(&'a self, v: &mut dyn Visitor<'a>) {
+        v.visit_plus(self);
+    }
+}
+
+impl Exp for Var {
+    fn accept<'a>(&'a self, v: &mut dyn Visitor<'a>) {
+        v.visit_var(self);
+    }
+}
+
+impl Exp for Let {
+    fn accept<'a>(&'a self, v: &mut dyn Visitor<'a>) {
+        v.visit_let(self);
+    }
+}
+
+// Define Visitor for evaluating expressions
+
+// Error for representing expression evaluation
+// errors.
+//
+// Has a lifetime parameter because it borrows
+// the name from the expression.
+#[derive(Debug)]
+enum EvalError<'a> {
+    UnknownVar(&'a str),
+}
+
+// Define type for evaluation environment
+//
+// Has a lifetime parameter because it borrows
+// the names from the expression.
+type Env<'a> = HashMap<&'a str, i32>;
+
+// Define the evaluator
+struct EvalVisitor<'a> {
+    // Evaluation environment. Changes propagate
+    // down the stack
+    env: Env<'a>,
+
+    // Return value. Results propagate up the stack.
+    value: Result<i32, EvalError<'a>>,
+}
+
+// Define behavior for each case of the
+// expression.
+impl<'a> Visitor<'a> for EvalVisitor<'a> {
+    fn visit_lit(&mut self, e: &'a Lit) {
+        self.value = Ok(e.0);
+    }
+
+    fn visit_plus(&mut self, e: &'a Plus) {
+        e.lhs.accept(self);
+        let Ok(lv) = self.value else {
+            return;
+        };
+        e.rhs.accept(self);
+        let Ok(rv) = self.value else {
+            return;
+        };
+        self.value = Ok(lv + rv);
+    }
+
+    fn visit_var(&mut self, e: &'a Var) {
+        self.value = self
+            .env
+            .get(e.0.as_str())
+            .ok_or(EvalError::UnknownVar(&e.0))
+            .copied();
+    }
+
+    fn visit_let(&mut self, e: &'a Let) {
+        e.exp.accept(self);
+        let Ok(val) = self.value else {
+            return;
+        };
+        let orig_env = self.env.clone();
+        self.env.insert(e.name.as_ref(), val);
+        e.body.accept(self);
+        self.env = orig_env;
+    }
+}
+
+fn main() {
+    // Construct an expression
+    let x = Plus {
+        lhs: Box::new(Let {
+            name: "x".to_string(),
+            exp: Box::new(Lit(3)),
+            body: Box::new(Var("x".to_string())),
+        }),
+        rhs: Box::new(Lit(2)),
+    };
+
+    // Construct the evaluator
+    let mut visitor = EvalVisitor {
+        value: Ok(0),
+        env: HashMap::new(),
+    };
+
+    // Run the evaluator
+    x.accept(&mut visitor);
+
+    // Print the output
+    println!("{:?}", visitor.value);
+}
+```
+
+
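The comment on `Exp::accept` in the Rust example above mentions a second option: specializing `accept` to a specific return type instead of collecting results in a mutable visitor. The following is a minimal sketch of that alternative, not part of the original example. It assumes a hypothetical `EvalResult` alias and a hypothetical `eval` helper, and it uses owned `String` keys and error values so that no lifetime parameters are needed.

```rust
use std::collections::HashMap;

// Hypothetical simplification: the environment owns its keys, and errors
// carry an owned copy of the unknown variable's name.
type Env = HashMap<String, i32>;
type EvalResult = Result<i32, String>;

struct Lit(i32);
struct Plus {
    lhs: Box<dyn Exp>,
    rhs: Box<dyn Exp>,
}
struct Var(String);
struct Let {
    name: String,
    exp: Box<dyn Exp>,
    body: Box<dyn Exp>,
}

// `accept` is fixed to one return type, so the trait can still be used
// as a trait object.
trait Exp {
    fn accept(&self, v: &mut dyn Visitor) -> EvalResult;
}

trait Visitor {
    fn visit_lit(&mut self, e: &Lit) -> EvalResult;
    fn visit_plus(&mut self, e: &Plus) -> EvalResult;
    fn visit_var(&mut self, e: &Var) -> EvalResult;
    fn visit_let(&mut self, e: &Let) -> EvalResult;
}

impl Exp for Lit {
    fn accept(&self, v: &mut dyn Visitor) -> EvalResult {
        v.visit_lit(self)
    }
}

impl Exp for Plus {
    fn accept(&self, v: &mut dyn Visitor) -> EvalResult {
        v.visit_plus(self)
    }
}

impl Exp for Var {
    fn accept(&self, v: &mut dyn Visitor) -> EvalResult {
        v.visit_var(self)
    }
}

impl Exp for Let {
    fn accept(&self, v: &mut dyn Visitor) -> EvalResult {
        v.visit_let(self)
    }
}

// The evaluator only needs to store the environment; results are returned.
struct EvalVisitor {
    env: Env,
}

impl Visitor for EvalVisitor {
    fn visit_lit(&mut self, e: &Lit) -> EvalResult {
        Ok(e.0)
    }

    fn visit_plus(&mut self, e: &Plus) -> EvalResult {
        Ok(e.lhs.accept(self)? + e.rhs.accept(self)?)
    }

    fn visit_var(&mut self, e: &Var) -> EvalResult {
        self.env.get(&e.0).copied().ok_or_else(|| e.0.clone())
    }

    fn visit_let(&mut self, e: &Let) -> EvalResult {
        let val = e.exp.accept(self)?;
        // Bind the name for the body only, then restore the old environment.
        let orig_env = self.env.clone();
        self.env.insert(e.name.clone(), val);
        let result = e.body.accept(self);
        self.env = orig_env;
        result
    }
}

// Evaluate an expression in an empty environment.
fn eval(e: &dyn Exp) -> EvalResult {
    e.accept(&mut EvalVisitor { env: HashMap::new() })
}

fn main() {
    let x = Plus {
        lhs: Box::new(Let {
            name: "x".to_string(),
            exp: Box::new(Lit(3)),
            body: Box::new(Var("x".to_string())),
        }),
        rhs: Box::new(Lit(2)),
    };

    // Prints Ok(5)
    println!("{:?}", eval(&x));
}
```

The trade-off in this sketch is that every visitor must produce the same result type, so a visitor that computes something other than an `i32` would need its own pair of traits.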
+
+## Varying data and behavior
+
+In C++, extensions to the visitor pattern are sometimes used to handle
+situations where both data and behavior vary. However, those solutions also
+make use of dynamic casting. In Rust, that requires opting into
+[RTTI](./../idioms/rtti.md) by making `Any` a supertrait of the trait for the
+visitors, so they can be downcast. While this extension to the visitor pattern
+is possible to implement, the ergonomics of the approach make other approaches
+more common in Rust.
+
+One of the alternative approaches, adopted from functional programming and
+leveraging the design of traits and generics in Rust, is called ["data types à
+la
+carte"](https://www.cambridge.org/core/services/aop-cambridge-core/content/view/14416CB20C4637164EA9F77097909409/S0956796808006758a.pdf/data-types-a-la-carte.pdf).
+
+The following example shows a variation on the earlier examples using this
+pattern so that two parts of the expression type can be defined separately
+and given evaluators separately. This approach can lead to performance
+problems (in large part due to the indirection through nested structures) or
+increases in compilation time, so its necessity should be carefully evaluated
+before it is used.
+
+```rust
+use std::collections::HashMap;
+
+// A type for combining separately-defined
+// expressions. Defining individual expressions
+// completely separately and then using an
+// application-specific sum type instead of nesting
+// Sum can improve performance.
+enum Sum<L, R> {
+    Inl(L),
+    Inr(R),
+}
+
+// Define arithmetic expressions
+enum ArithExp<E> {
+    Lit(i32),
+    Plus { lhs: E, rhs: E },
+}
+
+// Define let bindings and variables
+enum LetExp<E> {
+    Var(String),
+    Let { name: String, exp: E, body: E },
+}
+
+// Combine the expressions
+type Sig<E> = Sum<ArithExp<E>, LetExp<E>>;
+
+// Define the fixed-point for recursive
+// expressions.
+struct Exp(Sig<Box<Exp>>);
+
+// Define an evaluator
+
+// The evaluation environment
+type Env<'a> = HashMap<&'a str, i32>;
+
+// Evaluation errors
+#[derive(Debug)]
+enum EvalError<'a> {
+    UndefinedVar(&'a str),
+}
+
+// A trait for expressions that can
+// be evaluated.
+trait Eval {
+    fn eval<'a>(&'a self, env: &Env<'a>) -> Result<i32, EvalError<'a>>;
+}
+
+// Implement the evaluator trait for
+// the administrative types
+
+impl<L: Eval, R: Eval> Eval for Sum<L, R> {
+    fn eval<'a>(&'a self, env: &Env<'a>) -> Result<i32, EvalError<'a>> {
+        match self {
+            Sum::Inl(left) => left.eval(env),
+            Sum::Inr(right) => right.eval(env),
+        }
+    }
+}
+
+impl<E: Eval> Eval for Box<E> {
+    fn eval<'a>(&'a self, env: &Env<'a>) -> Result<i32, EvalError<'a>> {
+        self.as_ref().eval(env)
+    }
+}
+
+// Implement the trait for the desired variants.
+impl<E: Eval> Eval for ArithExp<E> {
+    fn eval<'a>(&'a self, env: &Env<'a>) -> Result<i32, EvalError<'a>> {
+        match self {
+            ArithExp::Lit(n) => Ok(*n),
+            ArithExp::Plus { lhs, rhs } => Ok(lhs.eval(env)? + rhs.eval(env)?),
+        }
+    }
+}
+
+impl<E: Eval> Eval for LetExp<E> {
+    fn eval<'a>(&'a self, env: &Env<'a>) -> Result<i32, EvalError<'a>> {
+        match self {
+            LetExp::Var(x) => env
+                .get(x.as_str())
+                .copied()
+                .ok_or(EvalError::UndefinedVar(x)),
+            LetExp::Let { name, exp, body } => {
+                let arg = exp.eval(env)?;
+                let mut env = env.clone();
+                env.insert(name, arg);
+                body.eval(&env)
+            }
+        }
+    }
+}
+
+// Since the trait is implemented for everything
+// inside of Exp, it can be implemented for Exp.
+impl Eval for Exp {
+    fn eval<'a>(&'a self, env: &Env<'a>) -> Result<i32, EvalError<'a>> {
+        self.0.eval(env)
+    }
+}
+
+// helpers for constructing expressions
+
+fn lit(n: i32) -> Exp {
+    Exp(Sum::Inl(ArithExp::Lit(n)))
+}
+
+fn plus(lhs: Exp, rhs: Exp) -> Exp {
+    Exp(Sum::Inl(ArithExp::Plus {
+        lhs: Box::new(lhs),
+        rhs: Box::new(rhs),
+    }))
+}
+
+fn var(x: &str) -> Exp {
+    Exp(Sum::Inr(LetExp::Var(x.to_string())))
+}
+
+fn elet(name: &str, val: Exp, body: Exp) -> Exp {
+    Exp(Sum::Inr(LetExp::Let {
+        name: name.to_string(),
+        exp: Box::new(val),
+        body: Box::new(body),
+    }))
+}
+
+fn main() {
+    let e = elet("x", lit(3), plus(var("x"), lit(2)));
+
+    println!("{:?}", e.eval(&HashMap::new()));
+}
+```
+
+Note that the above implementation requires no dynamic dispatch: every call to
+`eval` is resolved statically through generics.
+
+{{#quiz visitor.toml}}
diff --git a/src/patterns/visitor.toml b/src/patterns/visitor.toml
new file mode 100644
index 0000000..6bc2f8a
--- /dev/null
+++ b/src/patterns/visitor.toml
@@ -0,0 +1,31 @@
+[[questions]]
+type = "MultipleChoice"
+prompt.prompt = """
+Which trade-offs are made when defining free functions over an enum
+instead of using the visitor pattern in Rust?
+"""
+prompt.distractors = [
+"""
+Adding behaviors requires changing the enum definition.
+""",
+"""
+Adding behaviors requires changing existing behaviors.
+""",
+"""
+RTTI is necessary to add new variants.
+"""
+]
+answer.answer = [
+"""
+Adding variants requires changing the enum definition.
+""",
+"""
+Adding variants requires changing existing behaviors.
+"""
+]
+context = """
+The compiler checks for exhaustiveness in handling the cases of a Rust
+enum. This makes it easy to determine where in the code additional logic is
+needed to handle the new variants, making the trade-offs easier to live with.
+"""
+id = "8435b363-2e95-42e7-a518-d5746ffef385"
diff --git a/src/patterns/visitor_pattern.md b/src/patterns/visitor_pattern.md
deleted file mode 100644
index 38c9e7f..0000000
--- a/src/patterns/visitor_pattern.md
+++ /dev/null
@@ -1 +0,0 @@
-# Visitor pattern and double dispatch
diff --git a/src/tooling/build_systems.md b/src/tooling/build_systems.md
deleted file mode 100644
index 20231e2..0000000
--- a/src/tooling/build_systems.md
+++ /dev/null
@@ -1 +0,0 @@
-# Build systems (CMake)