diff --git a/src/example-archive/Rust/broken/error-proof/error-crash/00007-string-transfer-no-io.c b/src/example-archive/Rust/broken/error-proof/00007-string-transfer-no-io.c similarity index 100% rename from src/example-archive/Rust/broken/error-proof/error-crash/00007-string-transfer-no-io.c rename to src/example-archive/Rust/broken/error-proof/00007-string-transfer-no-io.c diff --git a/src/tutorial-with-bcp-comments.md b/src/tutorial-with-bcp-comments.md deleted file mode 100644 index 872ef700..00000000 --- a/src/tutorial-with-bcp-comments.md +++ /dev/null @@ -1,578 +0,0 @@ -include(src/setup.m4) - - - - ---- -title: CN tutorial -fontsize: 18px -mainfont: sans-serif -linestretch: 1.4 -maxwidth: 45em -lang: en-GB -toc-title: Table of contents -header-includes: | - ---- - - -CN is a type system for verifying C code, focusing especially on low-level systems code. Compared to the normal C type system, CN checks not only that expressions and statements follow the correct typing discipline for C-types, but also that the C code executes *safely* --- does not raise C undefined behaviour --- and *correctly* --- according to strong user-defined specifications. To accurately handle the complex semantics of C, CN builds on the [Cerberus semantics for C](https://github.com/rems-project/cerberus/). - -This tutorial introduces CN along a series of examples, starting with basic usage of CN on simple arithmetic functions and slowly moving towards more elaborate separation logic specifications of data structures. - -Many examples are taken from Arthur Charguéraud's excellent [Separation Logic Foundations](https://softwarefoundations.cis.upenn.edu). These example have names starting with "slf...". - -## Installation - - -To fetch and install CN, check the Cerberus repository at and follow the instructions in [`backend/cn/INSTALL.md`](https://github.com/rems-project/cerberus/blob/master/backend/cn/INSTALL.md). - -Once completed, type `cn --help` in your terminal to ensure CN is installed and found by your system. This should print the list of available options CN can be executed with. - -To apply CN to a C file, run `cn CFILE`. - - - - -## Basic usage - -### First example - -For a first example, let's look at a simple arithmetic function: `add`, shown below, takes two `int` arguments, `x` and `y`, and returns their sum. - -include_example(exercises/0.c) - -Running CN on the example produces an error message: -``` -cn exercises/0.c -[1/1]: add -exercises/0.c:3:10: error: Undefined behaviour - return x+y; - ~^~ -an exceptional condition occurs during the evaluation of an expression (§6.5#5) -Consider the state in /var/folders/_v/ndl32wpj4bb3y9dg11rvc8ph0000gn/T/state_393431.html -``` - -CN rejects the program because it has *C undefined behaviour*, meaning it is not safe to execute. CN points to the relevant source location, the addition `x+y`, and paragraph §6.5#5 of the C language standard that specifies the undefined behaviour. It also puts a link to an HTML file with more details on the error to help in diagnosing the problem. - -Inspecting this HTML report (as we do in a moment) gives us possible example values for `x` and `y` that cause the undefined behaviour and hint at the problem: for very large values for `x` and `y`, such as $1073741825$ and $1073741824$, the sum of `x` and `y` can exceed the representable range of a C `int` value: $1073741825 + 1073741824 = 2^31+1$, so their sum is larger than the maximal `int` value, $2^31-1$. - -Here `x` and `y` are *signed integers*, and in C, signed integer *overflow* is undefined behaviour (UB). Hence, `add` is only safe to execute for smaller values. Similarly, *large negative* values of `x` and `y` can cause signed integer *underflow*, also UB in C. We therefore need to rule out too large values for `x` and `y`, both positive and negative, which we do by writing a CN function specification. - - -### First function specification - -Shown below is our first function specification, for `add`, with a precondition that constraints `x` and `y` such that the sum of `x` and `y` lies between $-2147483648$ and $2147483647$, so within the representable range of a C `int` value. - -include_example(solutions/0.c) - -In detail: - -- Function specification are given using special `/*@ ... @*/` comments, placed in-between the function argument list and the function body. - -- The keyword `requires` starts the pre-condition, a list of one or more CN conditions separated by semicolons. - -- In function specifications, the names of the function arguments, here `x` and `y`, refer to their *initial values*. (Function arguments are mutable in C.) - -- `let sum = (i64) x + (i64) y` is a let-binding, which defines `sum` as the value `(i64) x + (i64) y` in the remainder of the function specification. - -- Instead of C syntax, CN uses Rust-like syntax for integer types, such as `u32` for 32-bit unsigned integers and `i64` for signed 64-bit integers to make their sizes unambiguous. Here, `x` and `y`, of C-type `int`, have CN type `i32`. - -- To define `sum` we cast `x` and `y` to the larger `i64` type, using syntax `(i64)`, which is large enough to hold the sum of any two `i32` values. - -- Finally, we require this sum to be in-between the minimal and maximal `int` value. Integer constants, such as `-2147483648i64`, must specifiy their CN type (`i64`). - -Running CN on the annotated program passes without errors. This means with our specified precondition, `add` is safe to execute. - -We may, however, wish to be more precise. So far the specification gives no information to callers of `add` about its output. To also specify the return values we add a postcondition, using the `ensures` keyword. - -include_example(solutions/1.c) - -Here we use the keyword `return`, only available in function postconditions, to refer to the return value, and equate it to `sum` as defined in the preconditions, cast back to `i32` type: `add` returns the sum of `x` and `y`. - - - -Running CN confirms that this postcondition also holds. - - -### Error reports - -In the original example CN reported a type error due to C undefined behaviour. While that example was perhaps simple enough to guess the problem and solution, this can become quite challenging as program and specification complexity increases. Diagnosing type errors is therefore an important part of using CN. CN tries to help with that by producting detailed error information, in the form of an HTML error report. - -Let's return to the type error from earlier (`add` without precondition) and take a closer look at this report. The report comprises two sections. - -BCP: Can it be bigger in the HTML output? Hard to read... - -![**CN error report**](src/images/0.error.png) - -**Path.** The first section, "Path to error", contains information about the control-flow path leading to the error. - -When type checking a C function, CN checks each possible control-flow path through the program individually. If CN detects UB or a violation of user-defined specifications, CN reports the problematic control-flow path, as a nested structure of statements: paths are split into sections, which group together statements between high-level control-flow positions (e.g. function entry, the start of a loop, the invocation of a `continue`, `break`, or `return` statement, etc.); within each section, statements are listed by source code location; finally, per statement, CN lists the typechecked sub-expressions, and the memory accesses and function calls within these. - -In our example, there is only one possible control-flow path: entering the function body (section "function body") and executing the block from lines 2 to 4, followed by the return statement at line 3. The entry for the latter contains the sequence of sub-expressions in the return statement, including reads of the variables `x` and `y`. - -In C, local variables in a function, including its arguments, are mutable and their address can be taken and passed as a value. CN therefore represents local variables as memory allocations that are manipulated using memory reads and writes. Here, type checking the return statement includes checking memory reads for `x` and `y`, at their locations `&ARG0` and `&ARG1`. The path report lists these reads and their return values: the read at `&ARG0` returns `x` (that is, the value of `x` originally passed to `add`); the read at `&ARG1` returns `y`. Alongside this symbolic information, CN displays concrete values: - -- `1073741825i32 /* 0x40000001 */` for x (the first value is the decimal representation, the second, in `/*...*/` comments, the hex equivalent) and - -- `1073741824i32 /* 0x40000000 */` for `y`. - -For now, ignore the pointer values `{@0; 4}` for `x` and `{@0; 0}` for `y`. - -These concrete values are part of a *counterexample*: a concrete valuation of variables and pointers in the program that that leads to the error. (The exact values may vary on your machine, depending on the version of Z3 installed on your system.) - - -**Proof context.** The second section, below the error trace, lists the proof context CN has reached along this control-flow path. - -"Available resources" lists the owned resources, as discussed in later sections. - -"Variables" lists counterexample values for program variables and pointers. In addition to `x` and `y`, assigned the same values as above, this includes values for their memory locations `&ARG0` and `&ARG1`, function pointers in scope, and the `__cn_alloc_history`, all of which we ignore for now. - -Finally, "Constraints" records all logical facts CN has learned along the path. This includes user-specified assumptions from preconditions or loop invariants, value ranges inferred from the C-types of variables, and facts learned during the type checking of the statements. Here (`add` without precondition) the only constraints are some contraints inferred from C-types in the code. - -- For instance, `good(x)` says that the initial value of `x` is a "good" `signed int` value (i.e. in range). Here `signed int` is the same type as `int`, CN just makes the sign explicit. For integer types `T`, `good` requires the value to be in range of type `T`; for pointer types `T` it also requires the pointer to be aligned. For structs and arrays this extends in the obvious way to struct members or array cells. - -- `repr` requires representability (not alignment) at type `T`, so `repr(&ARGO)`, for instance, records that the pointer to `x` is representable at C-type `signed int*`; - -- `aligned(&ARGO, 4u64)`, moreover, states that it is 4-byte aligned. - - - - -### Another arithmetic example - -Let's apply what we know so far to another simple arithmetic example. - -The function `doubled`, shown below, takes an int `n`, defines `a` as `n` incremented, `b` as `n` decremented, and returns the sum of the two. - -include_example(exercises/slf1_basic_example_let.signed.c) - -We would like to verify this is safe, and that `doubled` returns twice the value of `n`. Running CN on `doubled` leads to a type error: the increment of `a` has undefined behaviour. - -As in the first example, we need to ensure that `n+1` does not overflow and `n-1` does not underflow. Similarly also `a+b` has to be representable at `int` type. - -include_example(solutions/slf1_basic_example_let.signed.c) - -We can specify these using a similar style of precondition as in the first example. We first define `n_` as `n` cast to type `i64` --- i.e. a type large enough to hold `n+1`, `n-1` and `a+b` for any possible `i32` value for `n`. Then we specify that decrementing `n_` does not go below the minimal `int` value, that incrementing `n_` does not go above the maximal value, and that `n` doubled is also in range. These preconditions together guarantee safe execution. - -To capture the functional behaviour, the postcondition specifies that `return` is twice the value of `n`. - -BCP: Why don't we have constants we can use for maxint and minint of given size? Remembering / recognizing these huge numbers feels like a burden... - -### Exercise - -**Quadruple.** Specify the precondition needed to ensure safety of the C function `quadruple`, and a postcondition that describes its return value. - -include_example(exercises/slf2_basic_quadruple.signed.c) - -BCP: Leaving off a semicolon like this... - /*@ requires let n_ = (i64) n; - -2147483647i64 <= n_ + n_ + n_ + n_ - n_ + n_ + n_ + n_ <= 2147483647i64 - ensures return == n * 4i32 - @*/ -...leads to an exciting run-time failure: - cn slf2_basic_quadruple.signed.c - cn: internal error, uncaught exception: - Invalid_argument("String.sub / Bytes.sub") - Raised at Stdlib.invalid_arg in file "stdlib.ml", line 30, characters 20-45 - Called from Stdlib__String.sub in file "string.ml" (inlined), line 43, characters 2-23 - Called from Cerb_location.string_at_line in file "util/cerb_location.ml", line 384, characters 28-66 - Called from Cerb_location.head_pos_of_location in file "util/cerb_location.ml", line 423, characters 14-69 - Called from Dune__exe__Pp.error in file "backend/cn/pp.ml", line 271, characters 20-54 - Called from Dune__exe__Main.main in file "backend/cn/main.ml", line 216, characters 64-95 - Re-raised at Dune__exe__Main.main in file "backend/cn/main.ml", line 222, characters 8-73 - Called from Cmdliner_term.app.(fun) in file "cmdliner_term.ml", line 24, characters 19-24 - Called from Cmdliner_eval.run_parser in file "cmdliner_eval.ml", line 34, characters 37-44 -And attempting to rewrite the spec a little... - /*@ requires let n_ = (i64) n; - -2147483647i64 <= n_ * 4; - n_ + n_ + n_ + n_ <= 2147483647i64 - ensures return == n * 4i32 - @*/ -...leads to a similar failure: - cn slf2_basic_quadruple.signed.c - cn: internal error, uncaught exception: - Invalid_argument("String.sub / Bytes.sub") - Raised at Stdlib.invalid_arg in file "stdlib.ml", line 30, characters 20-45 - Called from Stdlib__String.sub in file "string.ml" (inlined), line 43, characters 2-23 - Called from Cerb_location.string_at_line in file "util/cerb_location.ml", line 384, characters 28-66 - Called from Cerb_location.head_pos_of_location in file "util/cerb_location.ml", line 423, characters 14-69 - Called from Dune__exe__Pp.error in file "backend/cn/pp.ml", line 271, characters 20-54 - Called from Dune__exe__Main.main in file "backend/cn/main.ml", line 216, characters 64-95 - Re-raised at Dune__exe__Main.main in file "backend/cn/main.ml", line 222, characters 8-73 - Called from Cmdliner_term.app.(fun) in file "cmdliner_term.ml", line 24, characters 19-24 - Called from Cmdliner_eval.run_parser in file "cmdliner_eval.ml", line 34, characters 37-44 - - -**Abs.** Give a specification to the C function `abs`, which computes the absolute value of a given `int` value. To describe the return value, use CN's ternary "`_ ? _ : _`" operator. Given a boolean `b`, and expressions `e1` and `e2` of the same basetype, `b ? e1 : e2` returns `e1` if `b` holds and `e2` otherwise. - -include_example(exercises/abs.c) - -BCP: There seem to be a LOT of ways to get this Invalid_argument error! -I "fixed" it by hacking line 384 of util/cerb_location.ml like this: - try - " ..." ^ String.sub l_ start (term_col - 5 - 3) ^ "..." - with _ -> ("OOPS: " ^ l_)) - -BCP: I actually found this exercise rather hard -- was not sure how to think about all the different integer widths, which one to use where, etc. Here was the solution I finally arrived at: - /*@ requires let x_ = (i64) x; - -2147483647i64 <= x_; x_ <= 2147483647i64 - ensures return == (x_ <= 0i64 ? 0i32 - x : x) @*/ -Not sure how best to guide people to the correct solution, but maybe we need a few more exercises stressing this stuff? - -## Pointers and simple ownership - -So far we've only considered example functions manipulating integer values. Verification becomes more interesting and challenging when *pointers* are involved, because the safety of memory accesses via pointers has to be verified. - -CN uses *separation logic resource types* and the concept of *ownership* to reason about memory accesses. A resource is the permission to access a region of memory. Unlike logical constraints, resource ownership is *unique*, meaning resources cannot be duplicated. - -Let's look at a simple example. The function `read` takes an `int` pointer `p` and returns the pointee value. - -include_example(exercises/read.c) - -Running CN on this example produces the following error: -``` -cn exercises/read.c -[1/1]: read -exercises/read.c:3:10: error: Missing resource for reading - return *p; - ^~ -Resource needed: Owned(p) -Consider the state in /var/folders/_v/ndl32wpj4bb3y9dg11rvc8ph0000gn/T/state_403624.html -``` - -For the read `*p` to be safe, ownership of a resource is missing: a resource `Owned(p)`. - -### The Owned resource type - -Given a C-type `T` and pointer `p`, the resource `Owned(p)` asserts ownership of a memory cell at location `p` of the size of C-type `T`. It is is CN's equivalent of a points-to assertion in separation logic (indexed by C-types `T`). - -In this example we can ensure the safe execution of `read` by adding a precondition that requires ownership of `Owned(p)`, as shown below. For now ignore the notation `take ... = Owned(p)`. Since `read` maintains this ownership, we also add a corresponding post-condition, whereby `read` returns ownership of `p` after it is finished executing, in the form of another `Owned(p)` resource. - -include_example(solutions/read.c) - -This specifications means that - -- any function calling `read` has to be able to provide a resource `Owned(p)` to pass into `read`, and - -- the caller will receive back a resource `Owned(p)` when `read` returns. - -BCP: What are v1 and v2? Can I leave them out or replace with _? - -### Resource outputs - -However, a caller of `read` may also wish to know that `read` actually returns the correct value, the pointee of `p`, and also that it does not change memory at location `p`. To phrase both we need a way to refer to the pointee of `p`. - -In CN resources have *outputs*. Each resource outputs the information that can be derived from ownership of the resource. What information is returned is specific to the type of resource. A resource `Owned(p)` (for some C-type `T`) outputs the *pointee value* of `p`, since that can be derived from the resource ownership: assume you have a pointer `p` and the associated ownership, then this uniquely determines the pointee value of `p`. - -CN uses the `take`-notation seen in the example above to refer to the output of a resource, introducing a new name binding for the output. The precondition `take v1 = Owned(p)` from the precondition does two things: (1) it assert ownership of resource `Owned(p)`, and (2) it binds the name `v1` to the resource output, here the pointee value of `p` at the start of the function. Similarly, the postcondition introduces the name `v2` for the pointee value on function return. - -That means we can use the resource outputs from the pre- and postcondition to strengthen the specification of `read` as planned. We add two new postconditions: we specify - -#. that `read` returns `v1` (the initial pointee value of `p`), and - -#. that the pointee values `v1` and `v2` before and after execution of `read` (respectively) are the same. - -include_example(solutions/read2.c) - - -**Aside.** In standard separation logic the equivalent specification for `read` could have been phrased as follows (where `return` binds the return value in the postcondition): -``` -{ ∃v1. p ↦ v1 } -read(p) -{ return. ∃v2. (p ↦ v2) * (return = v1 /\ v1 = v2) } -``` -BCP: Really?? The uses of v1 on the last line look ill scoped... -CN's `take` notation is just an alternative syntax for existential quantification over the values of resources (e.g. `take v1 = Owned<...>(p)` vs. `∃v1. p ↦ v1`), but a useful one: the `take` notation syntactically restricts how these quantifiers can be used to ensure CN can always infer them. - - -### Exercises - -**Quadruple**. Specify the function `quadruple_mem`, that is similar to the earlier `quadruple` function, except that the input is passed as an `int` pointer. Write a specification that takes ownership of this pointer on entry and returns this ownership on exit, leaving the pointee value unchanged. - -include_example(exercises/slf_quadruple_mem.c) - -**Abs**. Give a specification to the function `abs_mem`, which computes the absolute value of a number passed as an `int` pointer. - -include_example(exercises/abs_mem.c) - - -### Linear resource ownership - -In the specifications we have written so far, functions that receive resources as part of their precondition also return this ownership in their postcondition. - -Let's try the `read` example from earlier again, but with a postcondition that does not return the ownership: - -include_example(exercises/read.broken.c) - -CN rejects this program with the following message: -``` -cn build/exercises/read.broken.c -[1/1]: read -build/exercises/read.broken.c:4:3: error: Left-over unused resource 'Owned(p)(v1)' - return *p; - ^~~~~~~~~~ -Consider the state in /var/folders/_v/ndl32wpj4bb3y9dg11rvc8ph0000gn/T/state_17eb4a.html -``` - -CN has typechecked the function, verified that it is safe to execute under the precondition (given ownership `Owned(p)`), and that the function (vacuously) satisfies its postcondition. But, following the check of the postcondition it finds that not all resources have been "used up". - -Given the above specification, `read` leaks memory: it takes ownership, does not return it, but also does not deallocate the owned memory or otherwise dispose of it. In CN this is a type error. - -CN's resource types are *linear* (as opposed to affine). This means not only that resources cannot be duplicated, resources also cannot simply be dropped or "forgotten". Every resource passed into a function has to either be used up by it, by returning it or passing it to another function that consumes it, or destroyed, by deallocating the owned area of memory (as we shall see later). - -CN's motivation for linear tracking of resources is its focus on low-level systems software. CN checks C programs, in which, unlike higher-level garbage-collected languages, memory is managed manually, and memory leaks are typically very undesirable. - -As a consequence, function specifications have to do precise "book-keeping" of their resource footprint, and, in particular, return any unused resources back to the caller. - - - -### The Block resource type - -Aside from the `Owned` resource seen so far, CN has another built-in resource type: `Block`. Given a C-type `T` and pointer `p`, `Block(p)` asserts the same ownership as `Owned(p)` --- so ownership of a memory cell at `p` the size of type `T` --- but in contrast to `Owned`, `Block` memory is not necessarily initialised. - -CN uses this distinction to prevent reads from uninitialised memory: - -- A read at C-type `T` and pointer `p` requires a resource `Owned(p)`, so ownership of *initialised* memory at the right C-type. The load returns the resource unchanged. - -- A write at C-type `T` and pointer `p` needs only a `Block(p)` (so, unlike reads, writes to uninitialised memory are fine). The write consumes ownership of the resource (it destroys it) and returns a new resource `Owned(p)` with the value written as the output. This means the resource returned from a write records the fact that this memory cell is now initialised and can be read from. - -Since `Owned` carries the same ownership as `Block`, just with the additional information that the `Owned` memory is initalised, a resource `Owned(p)` is "at least as good" as `Block(p)` --- an `Owned(p)` resource can be used whenever `Block(p)` is needed. For instance CN's type checking of a write to `p` requires a `Block(p)`, but if an `Owned(p)` resource is what is available, this can be used just the same. (In other words, an already-initialised memory cell can, of course, be over-written again.) - -Unlike `Owned`, whose output is the pointee value, `Block` has no meaningful output: its output is `void`/`unit`. - - -### Write example - -Let's explore resources and their outputs in another example. The C function `incr` takes an `int` pointer `p` and increments the pointee value. - -include_example(solutions/slf0_basic_incr.signed.c) - -In the precondition we assert ownership of resource `Owned(p)`, binding its output/pointee value to `v1`, and use `v1` to specify that `p` must point to a sufficiently small value at the start of the function not to overflow when incremented. The postcondition asserts ownership of `p` with output `v2`, similar to before, and uses this to express that the value `p` points to is incremented by `incr`: `v2 == v1+1i32`. - - -If we incorrectly tweaked this specification and used `Block(p)` instead of `Owned(p)` in the precondition, as below, then CN would reject the program. - -include_example(exercises/slf0_basic_incr.signed.broken.c) - -CN reports: -``` -build/solutions/slf0_basic_incr.signed.broken.c:6:11: error: Missing resource for reading - int n = *p; - ^~ -Resource needed: Owned(p) -Consider the state in /var/folders/_v/ndl32wpj4bb3y9dg11rvc8ph0000gn/T/state_5da0f3.html -``` - -The `Owned(p)` resource required for reading is missing, since, as per precondition, only `Block(p)` is available. Checking the linked HTML file confirms this. Here the section "Available resources" lists all resource ownership at the point of the failure: - -- `Block(p)(u)`, so ownership of uninitialised memory at location `p`; the output is a `void`/`unit` value `u` (specified in the second pair of parentheses) - -- `Owned(&ARG0)(p)`, the ownership of (initialised) memory at location `&ARG0`, so the memory location where the first function argument is stored; its output is the pointer `p` (not to be confused with the pointee of `p`); and finally - -- `__CN_Alloc(&ARG0)(void)` is a resource that records allocation information for location `&ARG0`; this is related to CN's memory-object semantics, which we ignore for the moment. - - -### Exercises - -**Zero.** Write a specification for the function `zero`, which takes a pointer to *uninitialised* memory and initialises it to $0$. - -include_example(exercises/zero.c) - -**In-place double.** Give a specification for the function `inplace_double`, which takes an `int` pointer `p` and doubles the pointee value: specify the precondition needed to guarantee safe execution and a postcondition that captures the function's behaviour. - -include_example(exercises/slf3_basic_inplace_double.c) - - -### Multiple owned pointers - -When functions manipulate multiple pointers, we can assert their ownership just like before. However (as in standard separation logic) pointer ownership is unique, so simultaneous ownership of `Owned` or `Block` resources for two pointers requires these pointers to be disjoint. - -The following example shows the use of two `Owned` resources for accessing two different pointers: function `add` reads two `int` values in memory, at locations `p` and `q`, and returns their sum. - -include_example(exercises/add_read.c) - -This time we use C's `unsigned int` type. In C, over- and underflow of unsigned integers is not undefined behaviour, so we do not need any special preconditions to rule this out. Instead, when an arithmetic operation at unsigned type goes outside the representable range, the value "wraps around". - -The CN variables `m` and `n` (resp. `m2` and `n2`) for the pointee values of `p` and `q` before (resp. after) the execution of `add` have CN basetype `u32`, so unsigned 32-bit integers, matching the C `unsigned int` type. Like C's unsigned integer arithmetic, CN unsigned int values wrap around when exceeding the value range of the type. - -Hence, the postcondition `return == m+n` holds also when the sum of `m` and `n` is greater than the maximal `unsigned int` value. - -In the following we will sometimes use unsigned integer types to focus on specifying memory ownership, rather than the conditions necessary to show absence of C arithmetic undefined behaviour. - -BCP: Maybe it would be better to use unsigned throughout the early parts of the tutorial, to emphasize higher-level aspects of reasoning in CN, and then deal with all the intricacies of numbers all at once. - -### Exercises - -**Swap.** Specify the function `swap`, which takes two owned `unsigned int` pointers and swaps their values. - -include_example(exercises/swap.c) - -**Transfer.** Write a specification for the function `transfer`, shown below. - -include_example(exercises/slf8_basic_transfer.c) - -BCP: I started off with this wrong annotation - /*@ requires take p_ = Owned(p); - take q_ = Owned(q); - -2147483648i64 <= ((i64) p) + ((i64) q); - ((i64) p) + ((i64) q) <= 2147483647i64 - ensures take p_1 = Owned(p); - take q_1 = Owned(q); - p_1 == p_ + q_; - q_1 == 0i32 - @*/ -but I was confused by the error message: - slf8_basic_transfer.c:12:20: error: Missing resource for reading - unsigned int n = *p; - ^~ - Resource needed: Owned(p) -Then I fiddled some more and somehow wound up with this annotation (which also didn't work, but it was not immediately clear why): - /*@ requires take p_ = Owned(p); - take q_ = Owned(q) - ensures take p_1 = Owned(p); - take q_1 = Owned(q); - p_1 == p_ + q_; - q_1 == 0u32 - @*/ - - - -## Ownership of compound objects - -So far all examples have worked with just integers and pointers, but larger programs typically also manipulate compound values, often represented using C struct types. Specifying functions manipulating structs works in much the same way as with basic types. - -For instance, the following example uses a `struct` `point` to represent a point in two-dimensional space. The function `transpose` swaps a point's `x` and `y` coordinates. - -include_example(exercises/transpose.c) - -Here the precondition asserts ownership for `p`, at type `struct point`; the output `s` is a value of CN type `struct point`, i.e. a record with members `i32` `x` and `i32` `y`. The postcondition similarly asserts ownership of `p`, with output `s2`, and asserts the coordinates have been swapped, by referring to the members of `s` and `s2` individually. - -**Note.** In CN, as in C, structurally equal struct types with *different tags* are not the same type. - -BCP: ??? - -### Compound Owned and Block resources - -While one might like to think of a struct as a single (compound) object that is manipulated as a whole, C permits more flexible struct manipulation: given a struct pointer, programmers can construct pointers to *individual struct members* and pass these as values, even to other functions. - -CN therefore cannot treat resources for compound C types, such as structs, as primitive, indivisible units. Instead, `Owned` and `Block` are defined inductively in the structure of the C-type `T`. - -For struct types `T`, the `Owned` resource is defined as the collection of `Owned` resources for its members, as well as `Block` resources for any padding bytes in-between them. The resource `Block`, similarly, is made up of `Block` resources for all members and padding bytes. - -To handle code that manipulates pointers into parts of a struct object, CN can automatically decompose a struct resource into the member resources, and recompose it, as needed. The following example illustrates this. - -Recall the function `zero` from our earlier exercise. It takes an `int` pointer to uninitialised memory, with `Block` ownership, and initialises the value to zero, returning an `Owned` resource with output $0$. - -Now consider the new function `init_point`, shown below, which takes a pointer `p` to a `struct point` and zero-initialises its members by calling `zero` twice, once with a pointer to struct member `x`, and once with a pointer to `y`. - -include_example(exercises/init_point.c) - -As per precondition, `init_point` receives ownership `Block(p)`. The `zero` function, however, works on `int` pointers and requires `Block` ownership. - -CN can prove the calls to `zero` with `&p->x` and `&p->y`are safe because it decomposes the `Block(p)` into two `Block`, one for member `x`, one for member `y`. Later, the reverse happens: following the two calls to `zero`, as per `zero`'s precondition, `init_point` has ownership of two adjacent `Owned` resources: ownership for the two struct member pointers, with the member now initialised. Since the postcondition of `init_point` requires ownership `Owned(p)`, CN combines these back into a compound resource. The resulting `Owned` resource has for an output the struct value `s2` that is composed of the zeroed member values for `x` and `y`. - -### Resource inference - -To handle the required resource inference, CN "eagerly" decomposes all `struct` resources into resources for the struct members, and "lazily" re-composes them as needed. - -We can see this if we experimentally, for instance, change the `transpose` example from above to force a type error. Let's insert an `/*@ assert(false) @*/` CN assertion in the middle of the `transpose` function (more on CN assertions later), so we can inspect CN's proof context shown in the error report. - -include_example(exercises/transpose.broken.c) - -The precondition of `transpose` asserts ownership of an `Owned(p)` resource. The error report now instead lists under "Available resources" two resources: - -- `Owned(member_shift(p, x))` with output `s.x` and - -- `Owned(member_shift(p, y))` with output `s.y` - -Here `member_shift(p,m)` is the CN expression that constructs, from a `struct s` pointer `p`, the "shifted" pointer for its member `m`. - -When the function returns the two member resources are recombined "on demand" to satisfy the postcondition `Owned(p)`. - - - -### Exercises - -**Init point.** Insert CN `assert(false)` statements in different statement positions of `init_point` and check how the available resources evolve. - -**Transpose (again).** Recreate the transpose function from before, now using the swap function verified earlier (for `struct upoint`, with unsigned member values). - -include_example(exercises/transpose2.c) - - - - - -## TODO: Datastructures and resource predicates: linked lists - -C linked list example. This requires a different ownership pattern from before because the structure is recursive. Introduce **resource predicates** for ownership of recursive datastructures. - -Example of CN linked list resource predicate without output ("ignore `void` return"). Verify safety of some simple list manipulating functions? List length? - -Feeing all elements of a list (HLMC). - - -CN's **resource inference for resource predicates** unfolds resource predicates automatically as new information is learned about the inputs. Show with an example, ideally where this unfolding is not available at first, but then works after branching on whether the input pointer is null. - -Maybe also show **manual case splitting**? - - -### TODO: CN datatypes and logical functions - -For going beyond safety proofs there needs to be a way to refer to the data represented in memory. User-defined resource predicates have user-defined outputs that return information derived from inputs and owned memory. For instance, resource predicate that returns length or sum of the list as output (VeriFast?). - -We need a way to talk about recursive data in specifications. Define CN list **datatype** and **CN pattern matching**. Modify linked list predicate to output the represented list. - -Verify C implementations of list nil, cons, head, tail, with functional specification (HLMC?). (For the following: HLMC linked list library?) - -For more complicated operations there needs to be a way to express recursive definitions on lists: introduce **(recursive) specification functions** . - -Define CN list append, zero'ing out a list. Summing all the values and computing the length of a list? (VeriFast) -- so far without matching C functions. - -Recursive CN functions cannot be handled automatically in proof. CN only knows f(args) = f(args') when args = args' for recursive functions, nothing else. - -Show list append example, written as recursive C function, where **unfold** is used to unfold the definition of a recursive specification function to verify the example. Do the same for zeroing out, summing up the values, counting the list. - -In all cases the C function should have a recursive structure matching the CN logical function, so unfolding works well. - - - -### TODO: Loops, loop invariants, lemmata - -Alternative formulation of list append using a while loop. This requires a loop invariant: basically this works the same as function preconditions, but more variables are in scope and function arguments are mutable. - -To phrase the specification we need a predicate for partial linked lists/linked list segments. Verify list append safey without functional correctness; maybe other examples. - -When we move to functional correctness we'll find that often `unfold` does not work, because the shape of the loop does not align nicely with the recursion of the specification function. - -Instead, inductive reasoning is required, and we need to **specify and apply lemmata**. Apply that to the append example. (There's also the option of VeriFast-style C programs as proofs?) - - -### TODO: Binary trees? - - -## TODO: Iterated resources and quantified constraints - -CN computed pointers and aliasing? - -Now move to arrays, and explain **iterated resources**, first using just Owned. - -Verify safety of some simple example program that loops over an array. Specify the ownership using iterated resources. CN rejects the program; fix by adding `extract`. - -Verify a functional property of the same code. The **output of an iterated resource** is a map from indices to values. - - -Reasoning about functional properties of an array may require specifying properties that hold for all values of an array: we need **logical quantifiers**. - -Specify such a property using quantifiers, and show how to **instantiate** it manually using `instantiate`. diff --git a/src/tutorial-with-cht-comments.adoc b/src/tutorial-with-cht-comments.adoc deleted file mode 100644 index e4697141..00000000 --- a/src/tutorial-with-cht-comments.adoc +++ /dev/null @@ -1,1168 +0,0 @@ -:source-highlighter: pygments -:pygments-style: manni -// :pygments-style: tango -:nofooter: -:prewrap!: - -++++ - -++++ - -// CHT: had the same issue as PS with "/bin/sh: 1: [[: not found", but it worked when I used bash -// CHT: adding to PS' comment that you have to `make` in the tutorial to get the `exercises` - they are linked as exercises/* in here, -// but they're in ../build/exercises from here; none of the links work for me -// CHT: I'm confused by the file structure generally; what is the role of "solutions" vs. "examples" -// (also is there a purpose for the duplicate files - 0.c vs. 1.c, list_rev_lemmas.h vs. liste_rev_lemmas.h) - -// __________________________________________________________________________ - -= CN tutorial - -CN is a type system for verifying C code, focusing especially on low-level systems code. Compared to the normal C type system, CN checks not only that expressions and statements follow the correct typing discipline for C-types, but also that the C code executes _safely_ — does not raise C undefined behaviour — and _correctly_ — according to strong user-defined specifications. To accurately handle the complex semantics of C, CN builds on the https://github.com/rems-project/cerberus/[Cerberus semantics for C]. - -This tutorial introduces CN along a series of examples, starting with basic usage of CN on simple arithmetic functions and slowly moving towards more elaborate separation logic specifications of data structures. - -Many examples are taken from Arthur Charguéraud’s excellent https://softwarefoundations.cis.upenn.edu[Separation Logic Foundations]. These example have names starting with "`slf...`". - -== Installing CN - -To fetch and install CN, check the Cerberus repository at https://github.com/rems-project/cerberus and follow the instructions in https://github.com/rems-project/cerberus/blob/master/backend/cn/INSTALL.md[backend/cn/INSTALL.md]. - -Once completed, type `+cn --help+` in your terminal to ensure CN is installed and found by your system. This should print the list of available options CN can be executed with. - -To apply CN to a C file, run `+cn CFILE+`. - -== Source files for the exercises and examples - -The source files can be downloaded from link:exercises.zip[here]. - -== Basic usage - -=== First example - -For a first example, let’s look at a simple arithmetic function: `+add+`, shown below, takes two `+int+` arguments, `+x+` and `+y+`, and returns their sum. - -// BCP: We should probably adopt the convention that all the files in -// the exercises directory have a comment at the top giving their name. -// (We could actually auto-generate those header comments when we process -// /src/examples into build/exercises, to avoid having to maintain them -// and possibly get them wrong...) -[source,c] ----- -include::exercises/0.c[] ----- - -Running CN on the example produces an error message: - -.... -cn exercises/0.c -[1/1]: add -exercises/0.c:3:10: error: Undefined behaviour - return x+y; - ~^~ -an exceptional condition occurs during the evaluation of an expression (§6.5#5) -// CHT: The default state file path doesn't seem to exist or get created for me; I ended up using the --state-file flag -Consider the state in /var/folders/_v/ndl32wpj4bb3y9dg11rvc8ph0000gn/T/state_393431.html -.... - -CN rejects the program because it has _C undefined behaviour_, meaning it is not safe to execute. CN points to the relevant source location, the addition `+x+y+`, and paragraph §6.5#5 of the C language standard that specifies the undefined behaviour. It also puts a link to an HTML file with more details on the error to help in diagnosing the problem. - -Inspecting this HTML report (as we do in a moment) gives us possible example values for `+x+` and `+y+` that cause the undefined behaviour and hint at the problem: for very large values for `+x+` and `+y+`, such as `+1073741825+` and `+1073741824+`, the sum of `+x+` and `+y+` can exceed the representable range of a C `+int+` value: `+1073741825 + 1073741824 = 2^31+1+`, so their sum is larger than the maximal `+int+` value, `+2^31-1+`. - -Here `+x+` and `+y+` are _signed integers_, and in C, signed integer _overflow_ is undefined behaviour (UB). Hence, `+add+` is only safe to execute for smaller values. Similarly, _large negative_ values of `+x+` and `+y+` can cause signed integer _underflow_, also UB in C. We therefore need to rule out too large values for `+x+` and `+y+`, both positive and negative, which we do by writing a CN function specification. - -=== First function specification - -Shown below is our first function specification, for `+add+`, with a precondition that constrains `+x+` and `+y+` such that the sum of `+x+` and `+y+` lies between `+-2147483648+` and `+2147483647+`, so within the representable range of a C `+int+` value. - -[source,c] ----- -include::solutions/0.c[] ----- - -In detail: - -* Function specifications are given using special `+/*@ ... @*/+` comments, placed in-between the function argument list and the function body. - -* The keyword `+requires+` starts the precondition, a list of one or more CN conditions separated by semicolons. - -* In function specifications, the names of the function arguments, here `+x+` and `+y+`, refer to their _initial values_. (Function arguments are mutable in C.) - -* `+let sum = (i64) x + (i64) y+` is a let-binding, which defines `+sum+` as the value `+(i64) x + (i64) y+` in the remainder of the function specification. - -* Instead of C syntax, CN uses Rust-like syntax for integer types, such as `+u32+` for 32-bit unsigned integers and `+i64+` for signed 64-bit integers to make their sizes unambiguous. Here, `+x+` and `+y+`, of C-type `+int+`, have CN type `+i32+`. - -* To define `+sum+` we cast `+x+` and `+y+` to the larger `+i64+` type, using syntax `+(i64)+`, which is large enough to hold the sum of any two `+i32+` values. - -* Finally, we require this sum to be in-between the minimal and maximal `+int+` value. Integer constants, such as `+-2147483648i64+`, must specifiy their CN type (`+i64+`). - -Running CN on the annotated program passes without errors. This means with our specified precondition, `+add+` is safe to execute. - -We may, however, wish to be more precise. So far the specification gives no information to callers of `+add+` about its output. To also specify the return values we add a postcondition, using the `+ensures+` keyword. - -[source,c] ----- -include::solutions/1.c[] ----- - -Here we use the keyword `+return+`, only available in function postconditions, to refer to the return value, and equate it to `+sum+` as defined in the preconditions, cast back to `+i32+` type: `+add+` returns the sum of `+x+` and `+y+`. - -Running CN confirms that this postcondition also holds. - -=== Error reports - -In the original example CN reported a type error due to C undefined behaviour. While that example was perhaps simple enough to guess the problem and solution, this can become quite challenging as program and specification complexity increases. Diagnosing type errors is therefore an important part of using CN. CN tries to help with that by producting detailed error information, in the form of an HTML error report. - -Let’s return to the type error from earlier (`+add+` without precondition) and take a closer look at this report. The report comprises two sections. - -.*CN error report* -image::images/0.error.png[*CN error report*] - -*Path.* The first section, "`Path to error`", contains information about the control-flow path leading to the error. - -When type checking a C function, CN checks each possible control-flow path through the program individually. If CN detects UB or a violation of user-defined specifications, CN reports the problematic control-flow path, as a nested structure of statements: paths are split into sections, which group together statements between high-level control-flow positions (e.g. function entry, the start of a loop, the invocation of a `+continue+`, `+break+`, or `+return+` statement, etc.); within each section, statements are listed by source code location; finally, per statement, CN lists the typechecked sub-expressions, and the memory accesses and function calls within these. - -In our example, there is only one possible control-flow path: entering the function body (section "`function body`") and executing the block from lines 2 to 4, followed by the return statement at line 3. The entry for the latter contains the sequence of sub-expressions in the return statement, including reads of the variables `+x+` and `+y+`. - -In C, local variables in a function, including its arguments, are mutable and their address can be taken and passed as a value. CN therefore represents local variables as memory allocations that are manipulated using memory reads and writes. Here, type checking the return statement includes checking memory reads for `+x+` and `+y+`, at their locations `+&ARG0+` and `+&ARG1+`. The path report lists these reads and their return values: the read at `+&ARG0+` returns `+x+` (that is, the value of `+x+` originally passed to `+add+`); the read at `+&ARG1+` returns `+y+`. Alongside this symbolic information, CN displays concrete values: - -* `+1073741825i32 /* 0x40000001 */+` for x (the first value is the decimal representation, the second, in `+/*...*/+` comments, the hex equivalent) and - -* `+1073741824i32 /* 0x40000000 */+` for `+y+`. - -For now, ignore the pointer values `+{@0; 4}+` for `+x+` and `+{@0; 0}+` for `+y+`. - -These concrete values are part of a _counterexample_: a concrete valuation of variables and pointers in the program that that leads to the error. (The exact values may vary on your machine, depending on the version of Z3 installed on your system.) - -*Proof context.* The second section, below the error trace, lists the proof context CN has reached along this control-flow path. - -"`Available resources`" lists the owned resources, as discussed in later sections. - -"`Variables`" lists counterexample values for program variables and pointers. In addition to `+x+` and `+y+`, assigned the same values as above, this includes values for their memory locations `+&ARG0+` and `+&ARG1+`, function pointers in scope, and the `+__cn_alloc_history+`, all of which we ignore for now. - -Finally, "`Constraints`" records all logical facts CN has learned along the path. This includes user-specified assumptions from preconditions or loop invariants, value ranges inferred from the C-types of variables, and facts learned during the type checking of the statements. Here (`+add+` without precondition) the only constraints are some contraints inferred from C-types in the code. - -* For instance, `+good(x)+` says that the initial value of `+x+` is a "`good`" `+signed int+` value (i.e. in range). Here `+signed int+` is the same type as `+int+`, CN just makes the sign explicit. For integer types `+T+`, `+good+` requires the value to be in range of type `+T+`; for pointer types `+T+` it also requires the pointer to be aligned. For structs and arrays this extends in the obvious way to struct members or array cells. - -* `+repr+` requires representability (not alignment) at type `+T+`, so `+repr(&ARGO)+`, for instance, records that the pointer to `+x+` is representable at C-type `+signed int*+`; - -* `+aligned(&ARGO, 4u64)+`, moreover, states that it is 4-byte aligned. - -=== Another arithmetic example - -Let’s apply what we know so far to another simple arithmetic example. - -The function `+doubled+`, shown below, takes an int `+n+`, defines `+a+` as `+n+` incremented, `+b+` as `+n+` decremented, and returns the sum of the two. - -// BCP: Is it important to number the slf examples? If so, we should do it consistently, but IMO it is not. -[source,c] ----- -include::exercises/slf1_basic_example_let.signed.c[] ----- - -We would like to verify this is safe, and that `+doubled+` returns twice the value of `+n+`. Running CN on `+doubled+` leads to a type error: the increment of `+a+` has undefined behaviour. - -As in the first example, we need to ensure that `+n+1+` does not overflow and `+n-1+` does not underflow. Similarly also `+a+b+` has to be representable at `+int+` type. - -[source,c] ----- -include::solutions/slf1_basic_example_let.signed.c[] ----- - -We can specify these using a similar style of precondition as in the first example. We first define `+n_+` as `+n+` cast to type `+i64+` — i.e. a type large enough to hold `+n+1+`, `+n-1+` and `+a+b+` for any possible `+i32+` value for `+n+`. Then we specify that decrementing `+n_+` does not go below the minimal `+int+` value, that incrementing `+n_+` does not go above the maximal value, and that `+n+` doubled is also in range. These preconditions together guarantee safe execution. - -To capture the functional behaviour, the postcondition specifies that `+return+` is twice the value of `+n+`. - -=== Exercise - -*Quadruple.* Specify the precondition needed to ensure safety of the C function `+quadruple+`, and a postcondition that describes its return value. - -[source,c] ----- -include::exercises/slf2_basic_quadruple.signed.c[] ----- - -*Abs.* Give a specification to the C function `+abs+`, which computes the absolute value of a given `+int+` value. To describe the return value, use CN’s ternary "`+_ ? _ : _+`" operator. Given a boolean `+b+`, and expressions `+e1+` and `+e2+` of the same basetype, `+b ? e1 : e2+` returns `+e1+` if `+b+` holds and `+e2+` otherwise. - -[source,c] ----- -include::exercises/abs.c[] ----- - -== Pointers and simple ownership - -So far we’ve only considered example functions manipulating integer values. Verification becomes more interesting and challenging when _pointers_ are involved, because the safety of memory accesses via pointers has to be verified. - -CN uses _separation logic resource types_ and the concept of _ownership_ to reason about memory accesses. A resource is the permission to access a region of memory. Unlike logical constraints, resource ownership is _unique_, meaning resources cannot be duplicated. - -Let’s look at a simple example. The function `+read+` takes an `+int+` pointer `+p+` and returns the pointee value. - -[source,c] ----- -include::exercises/read.c[] ----- - -Running CN on this example produces the following error: - -.... -cn exercises/read.c -[1/1]: read -exercises/read.c:3:10: error: Missing resource for reading - return *p; - ^~ -Resource needed: Owned(p) -Consider the state in /var/folders/_v/ndl32wpj4bb3y9dg11rvc8ph0000gn/T/state_403624.html -.... - -For the read `+*p+` to be safe, ownership of a resource is missing: a resource `+Owned(p)+`. - -=== The Owned resource type - -Given a C-type `+T+` and pointer `+p+`, the resource `+Owned(p)+` asserts ownership of a memory cell at location `+p+` of the size of C-type `+T+`. It is CN’s equivalent of a points-to assertion in separation logic (indexed by C-types `+T+`). - -In this example we can ensure the safe execution of `+read+` by adding a precondition that requires ownership of `+Owned(p)+`, as shown below. For now ignore the notation `+take ... = Owned(p)+`. Since `+read+` maintains this ownership, we also add a corresponding postcondition, whereby `+read+` returns ownership of `+p+` after it is finished executing, in the form of another `+Owned(p)+` resource. - -[source,c] ----- -include::solutions/read.c[] ----- - -This specifications means that - -* any function calling `+read+` has to be able to provide a resource `+Owned(p)+` to pass into `+read+`, and - -* the caller will receive back a resource `+Owned(p)+` when `+read+` returns. - -=== Resource outputs - -However, a caller of `+read+` may also wish to know that `+read+` actually returns the correct value, the pointee of `+p+`, and also that it does not change memory at location `+p+`. To phrase both we need a way to refer to the pointee of `+p+`. - -In CN resources have _outputs_. Each resource outputs the information that can be derived from ownership of the resource. What information is returned is specific to the type of resource. A resource `+Owned(p)+` (for some C-type `+T+`) outputs the _pointee value_ of `+p+`, since that can be derived from the resource ownership: assume you have a pointer `+p+` and the associated ownership, then this uniquely determines the pointee value of `+p+`. - -CN uses the `+take+`-notation seen in the example above to refer to the output of a resource, introducing a new name binding for the output. The precondition `+take v1 = Owned(p)+` from the precondition does two things: (1) it assert ownership of resource `+Owned(p)+`, and (2) it binds the name `+v1+` to the resource output, here the pointee value of `+p+` at the start of the function. Similarly, the postcondition introduces the name `+v2+` for the pointee value on function return. - -That means we can use the resource outputs from the pre- and postcondition to strengthen the specification of `+read+` as planned. We add two new postconditions: we specify - -. that `+read+` returns `+v1+` (the initial pointee value of `+p+`), and -. that the pointee values `+v1+` and `+v2+` before and after execution of `+read+` (respectively) are the same. - -[source,c] ----- -include::solutions/read2.c[] ----- - -*Aside.* In standard separation logic the equivalent specification for `+read+` could have been phrased as follows (where `+return+` binds the return value in the postcondition): - -.... -∀p. -∀v1. { p ↦ v1 } - read(p) - { return. ∃v2. (p ↦ v2) /\ return = v1 /\ v1 = v2 } -.... - -CN’s `+take+` notation is just alternative syntax for quantification over the values of resources, but a useful one: the `+take+` notation syntactically restricts how these quantifiers can be used to ensure CN can always infer them. -// CHT: can we elaborate on the part of this sentence after the colon? - -=== Exercises - -*Quadruple*. Specify the function `+quadruple_mem+`, that is similar to the earlier `+quadruple+` function, except that the input is passed as an `+int+` pointer. Write a specification that takes ownership of this pointer on entry and returns this ownership on exit, leaving the pointee value unchanged. - -[source,c] ----- -include::exercises/slf_quadruple_mem.c[] ----- - -*Abs*. Give a specification to the function `+abs_mem+`, which computes the absolute value of a number passed as an `+int+` pointer. - -[source,c] ----- -include::exercises/abs_mem.c[] ----- - -=== Linear resource ownership - -In the specifications we have written so far, functions that receive resources as part of their precondition also return this ownership in their postcondition. - -Let’s try the `+read+` example from earlier again, but with a postcondition that does not return the ownership: - -[source,c] ----- -include::exercises/read.broken.c[] ----- - -CN rejects this program with the following message: - -.... -cn build/exercises/read.broken.c -[1/1]: read -build/exercises/read.broken.c:4:3: error: Left-over unused resource 'Owned(p)(v1)' - return *p; - ^~~~~~~~~~ -Consider the state in /var/folders/_v/ndl32wpj4bb3y9dg11rvc8ph0000gn/T/state_17eb4a.html -.... - -CN has typechecked the function, verified that it is safe to execute under the precondition (given ownership `+Owned(p)+`), and that the function (vacuously) satisfies its postcondition. But, following the check of the postcondition it finds that not all resources have been "`used up`". - -Given the above specification, `+read+` leaks memory: it takes ownership, does not return it, but also does not deallocate the owned memory or otherwise dispose of it. In CN this is a type error. - -CN’s resource types are _linear_ (as opposed to affine). This means not only that resources cannot be duplicated, but also that resources cannot simply be dropped or "`forgotten`". Every resource passed into a function has to either be used up by it, by returning it or passing it to another function that consumes it, or destroyed, by deallocating the owned area of memory (as we shall see later). - -CN’s motivation for linear tracking of resources is its focus on low-level systems software. CN checks C programs, in which, unlike higher-level garbage-collected languages, memory is managed manually, and memory leaks are typically very undesirable. - -As a consequence, function specifications have to do precise "`book-keeping`" of their resource footprint, and, in particular, return any unused resources back to the caller. - -=== The Block resource type - -Aside from the `+Owned+` resource seen so far, CN has another built-in resource type: `+Block+`. Given a C-type `+T+` and pointer `+p+`, `+Block(p)+` asserts the same ownership as `+Owned(p)+` — so ownership of a memory cell at `+p+` the size of type `+T+` — but in contrast to `+Owned+`, `+Block+` memory is not necessarily initialised. - -CN uses this distinction to prevent reads from uninitialised memory: - -* A read at C-type `+T+` and pointer `+p+` requires a resource `+Owned(p)+`, i.e., ownership of _initialised_ memory at the right C-type. The load returns the `+Owned+` resource unchanged. - -* A write at C-type `+T+` and pointer `+p+` needs only a `+Block(p)+` (so, unlike reads, writes to uninitialised memory are fine). The write consumes ownership of the `+Block+` resource (it destroys it) and returns a new resource `+Owned(p)+` with the value written as the output. This means the resource returned from a write records the fact that this memory cell is now initialised and can be read from. - -Since `+Owned+` carries the same ownership as `+Block+`, just with the additional information that the `+Owned+` memory is initalised, a resource `+Owned(p)+` is "`at least as good`" as `+Block(p)+` — an `+Owned(p)+` resource can be used whenever `+Block(p)+` is needed. For instance CN’s type checking of a write to `+p+` requires a `+Block(p)+`, but if an `+Owned(p)+` resource is what is available, this can be used just the same. This allows an already-initialised memory cell to be over-written again. - -Unlike `+Owned+`, whose output is the pointee value, `+Block+` has no meaningful output: its output is `+void+`/`+unit+`. - -=== Write example - -Let’s explore resources and their outputs in another example. The C function `+incr+` takes an `+int+` pointer `+p+` and increments the pointee value. - -[source,c] ----- -include::solutions/slf0_basic_incr.signed.c[] ----- - -In the precondition we assert ownership of resource `+Owned(p)+`, binding its output/pointee value to `+v1+`, and use `+v1+` to specify that `+p+` must point to a sufficiently small value at the start of the function not to overflow when incremented. The postcondition asserts ownership of `+p+` with output `+v2+`, as before, and uses this to express that the value `+p+` points to is incremented by `+incr+`: `+v2 == v1+1i32+`. - -If we incorrectly tweaked this specification and used `+Block(p)+` instead of `+Owned(p)+` in the precondition, as below, then CN would reject the program. - -[source,c] ----- -include::exercises/slf0_basic_incr.signed.broken.c[] ----- - -CN reports: - -.... -build/solutions/slf0_basic_incr.signed.broken.c:6:11: error: Missing resource for reading - int n = *p; - ^~ -Resource needed: Owned(p) -Consider the state in /var/folders/_v/ndl32wpj4bb3y9dg11rvc8ph0000gn/T/state_5da0f3.html -.... - -The `+Owned(p)+` resource required for reading is missing, since, as per precondition, only `+Block(p)+` is available. Checking the linked HTML file confirms this. Here the section "`Available resources`" lists all resource ownership at the point of the failure: - -* `+Block(p)(u)+`, so ownership of uninitialised memory at location `+p+`; the output is a `+void+`/`+unit+` value `+u+` (specified in the second pair of parentheses) - -* `+Owned(&ARG0)(p)+`, the ownership of (initialised) memory at location `+&ARG0+`, so the memory location where the first function argument is stored; its output is the pointer `+p+` (not to be confused with the pointee of `+p+`); and finally - -* `+__CN_Alloc(&ARG0)(void)+` is a resource that records allocation information for location `+&ARG0+`; this is related to CN’s memory-object semantics, which we ignore for the moment. - -=== Exercises - -*Zero.* Write a specification for the function `+zero+`, which takes a pointer to _uninitialised_ memory and initialises it to `+0+`. - -[source,c] ----- -include::exercises/zero.c[] ----- - -*In-place double.* Give a specification for the function `+inplace_double+`, which takes an `+int+` pointer `+p+` and doubles the pointee value: specify the precondition needed to guarantee safe execution and a postcondition that captures the function’s behaviour. - -// CHT: are there any existing features for noticing if the precondition is stronger than necessary for safety -// (e.g., if you typo -2i64 <= sum instead of -2147483648i64 <= sum) - -[source,c] ----- -include::exercises/slf3_basic_inplace_double.c[] ----- - -=== Multiple owned pointers - -When functions manipulate multiple pointers, we can assert their ownership just like before. However (as in standard separation logic) pointer ownership is unique, so simultaneous ownership of `+Owned+` or `+Block+` resources for two pointers requires these pointers to be disjoint. - -The following example shows the use of two `+Owned+` resources for accessing two different pointers: function `+add+` reads two `+int+` values in memory, at locations `+p+` and `+q+`, and returns their sum. - -[source,c] ----- -include::exercises/add_read.c[] ----- - -This time we use C’s `+unsigned int+` type. In C, over- and underflow of unsigned integers is not undefined behaviour, so we do not need any special preconditions to rule this out. Instead, when an arithmetic operation at unsigned type goes outside the representable range, the value "`wraps around`". - -The CN variables `+m+` and `+n+` (resp. `+m2+` and `+n2+`) for the pointee values of `+p+` and `+q+` before (resp. after) the execution of `+add+` have CN basetype `+u32+`, so unsigned 32-bit integers, matching the C `+unsigned int+` type. Like C’s unsigned integer arithmetic, CN unsigned int values wrap around when exceeding the value range of the type. - -Hence, the postcondition `+return == m+n+` holds also when the sum of `+m+` and `+n+` is greater than the maximal `+unsigned int+` value. - -In the following we will sometimes use unsigned integer types to focus on specifying memory ownership, rather than the conditions necessary to show absence of C arithmetic undefined behaviour. - -=== Exercises - -*Swap.* Specify the function `+swap+`, which takes two owned `+unsigned int+` pointers and swaps their values. - -[source,c] ----- -include::exercises/swap.c[] ----- - -*Transfer.* Write a specification for the function `+transfer+`, shown below. - -[source,c] ----- -include::exercises/slf8_basic_transfer.c[] ----- - -== Ownership of compound objects - -So far all examples have worked with just integers and pointers, but larger programs typically also manipulate compound values, often represented using C struct types. Specifying functions manipulating structs works in much the same way as with basic types. - -For instance, the following example uses a `+struct+` `+point+` to represent a point in two-dimensional space. The function `+transpose+` swaps a point’s `+x+` and `+y+` coordinates. - -[source,c] ----- -include::exercises/transpose.c[] ----- - -Here the precondition asserts ownership for `+p+`, at type `+struct point+`; the output `+s+` is a value of CN type `+struct point+`, i.e. a record with members `+i32+` `+x+` and `+i32+` `+y+`. The postcondition similarly asserts ownership of `+p+`, with output `+s2+`, and asserts the coordinates have been swapped, by referring to the members of `+s+` and `+s2+` individually. - -=== Compound Owned and Block resources - -While one might like to think of a struct as a single (compound) object that is manipulated as a whole, C permits more flexible struct manipulation: given a struct pointer, programmers can construct pointers to _individual struct members_ and pass these as values, even to other functions. - -CN therefore cannot treat resources for compound C types, such as structs, as primitive, indivisible units. Instead, `+Owned+` and `+Block+` are defined inductively in the structure of the C-type `+T+`. - -For struct types `+T+`, the `+Owned+` resource is defined as the collection of `+Owned+` resources for its members (as well as `+Block+` resources for any padding bytes in-between). The resource `+Block+`, similarly, is made up of `+Block+` resources for all members (and padding bytes). - -To handle code that manipulates pointers into parts of a struct object, CN can automatically decompose a struct resource into the member resources, and recompose it, as needed. The following example illustrates this. - -Recall the function `+zero+` from our earlier exercise. It takes an `+int+` pointer to uninitialised memory, with `+Block+` ownership, and initialises the value to zero, returning an `+Owned+` resource with output `+0+`. - -Now consider the function `+init_point+`, shown below, which takes a pointer `+p+` to a `+struct point+` and zero-initialises its members by calling `+zero+` twice, once with a pointer to struct member `+x+`, and once with a pointer to `+y+`. - -[source,c] ----- -include::exercises/init_point.c[] ----- - -As stated in its precondition, `+init_point+` receives ownership `+Block(p)+`. The `+zero+` function, however, works on `+int+` pointers and requires `+Block+` ownership. - -CN can prove the calls to `+zero+` with `+&p->x+` and `+&p->y+` are safe because it decomposes the `+Block(p)+` into two `+Block+`, one for member `+x+`, one for member `+y+`. Later, the reverse happens: following the two calls to `+zero+`, as per `+zero+`’s precondition, `+init_point+` has ownership of two adjacent `+Owned+` resources – ownership for the two struct member pointers, with the member now initialised. Since the postcondition of `+init_point+` requires ownership `+Owned(p)+`, CN combines these back into a compound resource. The resulting `+Owned+` resource has for an output the struct value `+s2+` that is composed of the zeroed member values for `+x+` and `+y+`. - -=== Resource inference - -To handle the required resource inference, CN "`eagerly`" decomposes all `+struct+` resources into resources for the struct members, and "`lazily`" re-composes them as needed. - -We can see this if, for instance, we experimentally change the `+transpose+` example from above to force a type error. Let’s insert an `+/*@ assert(false) @*/+` CN assertion in the middle of the `+transpose+` function (more on CN assertions later), so we can inspect CN’s proof context shown in the error report. - -// CHT: seconding others' comment that it would be helpful to have a better way of seeing intermediate proof context - -[source,c] ----- -include::exercises/transpose.broken.c[] ----- - -The precondition of `+transpose+` asserts ownership of an `+Owned(p)+` resource. The error report now instead lists under "`Available resources`" two resources: - -* `+Owned(member_shift(p, x))+` with output `+s.x+` and - -* `+Owned(member_shift(p, y))+` with output `+s.y+` - -Here `+member_shift(p,m)+` is the CN expression that constructs, from a `+struct s+` pointer `+p+`, the "`shifted`" pointer for its member `+m+`. - -When the function returns the two member resources are recombined "`on demand`" to satisfy the postcondition `+Owned(p)+`. - -=== Exercises - -*Init point.* Insert CN `+assert(false)+` statements in different statement positions of `+init_point+` and check how the available resources evolve. - -*Transpose (again).* Recreate the transpose function from before, now using the swap function verified earlier (for `+struct upoint+`, with unsigned member values). - -[source,c] ----- -include::exercises/transpose2.c[] ----- - -//// -BCP: Some more things to think about including... - - Something about CN's version of the frame rule (see - bcp_framerule.c, though the example is arguably a bit - unnatural). - - Examples from Basic.v with allocation - there are lots of - interesting ones! -CP: Agreed. For now continuing with arrays, but will return to this later. -//// - -== Arrays and loops - -Another common datatype in C is arrays. Reasoning about memory ownership for arrays is more difficult than for the datatypes we have seen so far: C allows the programmer to access arrays using _computed pointers_, and the size of an array does not need to be known as a constant at compile time. - -To support reasoning about code manipulating arrays and computed pointers, CN has _iterated resources_. For instance, to specify ownership of an `+int+` array with 10 cells starting at pointer `+p+`, CN uses the iterated resource - -[source,c] ----- -each (i32 i; 0i32 <= i && i < 10i32) - { Owned(array_shift(p,i)) } ----- - -In detail, this can be read as follows: - -* for each integer `+i+` of CN type `+i32+`, … - -* if `+i+` is between `+0+` and `+10+`, … - -* assert ownership of a resource `+Owned+` … - -* for cell `+i+` of the array with base-address `+p+`. - -Here `+array_shift(p,i)+` computes a pointer into the array at pointer `+p+`, appropriately offset for index `+i+`. - -In general, iterated resource specifications take the form - -[source,c] ----- -each (BT Q; GUARD) { RESOURCE } ----- - -comprising three parts: - -* `+BT Q+`, for some CN type `+BT+` and name `+Q+`, introduces the quantifier `+Q+` of basetype `+BT+`, which is bound in `+GUARD+` and `+RESOURCE+`; - -* `+GUARD+` is a boolean-typed expression delimiting the instances of `+Q+` for which ownership is asserted; and - -* `+RESOURCE+` is any non-iterated CN resource. - -=== First array example - -Let’s see how this applies to a first example of an array-manipulating function. Function `+read+` takes three arguments: the base pointer `+p+` of an `+int+` array, the length `+n+` of the array, and an index `+i+` into the array; `+read+` then returns the value of the `+i+`-th array cell. - -[source,c] ----- -include::exercises/array_load.broken.c[] ----- - -The CN precondition requires - -- ownership of the array on entry — one `+Owned+` resource for each array index between `+0+` and `+n+` — and -- that `+i+` lies within the range of owned indices. - -On exit the array ownership is returned again. - -This specification, in principle, should ensure that the access `+p[i]+` is safe. However, running CN on the example produces an error: CN is unable to find the required ownership for reading `+p[i]+`. - -.... -cn build/solutions/array_load.broken.c -[1/1]: read -build/solutions/array_load.broken.c:5:10: error: Missing resource for reading - return p[i]; - ^~~~ -Resource needed: Owned(array_shift(p, (u64)i)) -.... - -The reason is that when searching for a required resource, such as the `+Owned+` resource for `+p[i]+` here, CN’s resource inference does not consider iterated resources. Quantifiers, as used by iterated resources, can make verification undecidable, so, in order to maintain predictable type checking, CN delegates this aspect of the reasoning to the user. - -To make the `+Owned+` resource required for accessing `+p[i]+` available to CN’s resource inference we have to "`extract`" ownership for index `+i+` out of the iterated resource. - -[source,c] ----- -include::exercises/array_load.c[] ----- - -Here the CN comment `+/*@ extract Owned, i; @*/+` is a CN "`ghost statement`"/proof hint that instructs CN to instantiate any available iterated `+Owned+` resource for index `+i+`. In our example this operation splits the iterated resource into two: - -[source,c] ----- -each(i32 j; 0i32 <= j && j < n) { Owned(array_shift(p,j)) } ----- - -is split into - -1. the instantiation of the iterated resource at `+i+` -+ -[source,c] ----- -Owned(array_shift(p,i)) ----- -2. the remainder of the iterated resource, the ownership for all indices except `+i+` -+ -[source,c] ----- -each(i32 j; 0i32 <= j && j < n && j != i) - { Owned(array_shift(p,j)) } ----- - -After this extraction step, CN can use the (former) extracted resource to justify the access `+p[i]+`. - -Following an `+extract+` statement, CN moreover remembers the extracted index and can automatically "`reverse`" the extraction when needed: after type checking the access `+p[i]+` CN must ensure the function’s postcondition holds, which needs the full array ownership again (including the extracted index `+i+`); remembering the index `+i+`, CN then automatically merges resources (1) and (2) again to obtain the required full array ownership, and completes the verification of the function. - -So far the specification only guarantees safe execution but does not specify the behaviour of `+read+`. To address this, let’s return to the iterated resources in the function specification. When we specify `+take a1 = each ...+` here, what is `+a1+`? In CN, the output of an iterated resource is a _map_ from indices to resource outputs. In this example, where index `+j+` has CN type `+i32+` and the iterated resource is `+Owned+`, the output `+a1+` is a map from `+i32+` indices to `+i32+` values — CN type `+map+`. (If the type of `+j+` was `+i64+` and the resource `+Owned+`, `+a1+` would have type `+map+`.) - -We can use this to refine our specification with information about the functional behaviour of `+read+`. - -[source,c] ----- -include::exercises/array_load2.c[] ----- - -We specify that `+read+` does not change the array — the outputs `+a1+` and `+a2+`, before (resp. after) running the function, are the same — and that the value returned is `+a1[i]+`, `+a1+` at index `+i+`. - -=== Exercises - - -*Array read two.* Specify and verify the following function, `+array_read_two+`, which takes the base pointer `+p+` of an `+unsigned int+` array, the array length `+n+`, and two indices `+i+` and `+j+`. Assuming `+i+` and `+j+` are different, it returns the sum of the values at these two indices. -// CHT: is there a good way to write this that is agnostic as to whether i and j are different - -[source,c] ----- -include::exercises/add_two_array.c[] ----- - -//// -BCP: In this one I got quite tangled up in different kinds of integers, then got tangled up in (I think) putting the extract declarations in the wrong place. (I didn't save the not-working version, I'm afraid.) -//// - -*Swap array.* Specify and verify `+swap_array+`, which swaps the values of two cells of an `+int+` array. Assume again that `+i+` and `+j+` are different, and describe the effect of `+swap_array+` on the array value using the CN map update expression `+a[i:v]+`, which denotes the same map as `+a+`, except with index `+i+` updated to `+v+`. - -[source,c] ----- -include::exercises/swap_array.c[] ----- - -//// -BCP: I wrote this, which seemed natural but did not work -- I still don't fully understand why. I think this section will need some more examples / exercises to be fully digestible, or perhaps this is just yet another symptom of my imperfecdt understanding of how the numeric stuff works. - - void swap_array (int *p, int n, int i, int j) - /*@ requires take a1 = each(i32 k; 0i32 <= k && k < n) { Owned(array_shift(p,k)) }; - 0i32 <= i && i < n; - 0i32 <= j && j < n; - j != i; - take xi = Owned(array_shift(p,i)); - take xj = Owned(array_shift(p,j)) - ensures take a2 = each(i32 k; 0i32 <= k && k < n) { Owned(array_shift(p,k)) }; - a1[i:xj][j:xi] == a2 - @*/ - { - extract Owned, i; - extract Owned, j; - int tmp = p[i]; - p[i] = p[j]; - p[j] = tmp; - } -//// - -=== Loops - -The array examples covered so far manipulate one or two individual cells of an array. Another typical pattern in code working over arrays is to *loop*, uniformly accessing all cells of an array, or sub-ranges of it. - -In order to verify code with loops, CN requires the user to supply loop invariants -- CN specifications of all owned resources and the constraints required to verify each iteration of the loop. - - -Let's take a look at a simple first example. The following function, `+init_array+`, takes the base pointer `+p+` of a `+char+` array and the array length `+n+` and writes `+0+` to each array cell. -[source,c] ----- -include::exercises/init_array.c[] ----- - -If, for the moment, we focus just on proving safe execution of `+init_array+`, ignoring its functional behaviour, a specification might look as above: on entry `+init_array+` takes ownership of an iterated `+Owned+` resource -- one `+Owned+` resource for each index `+i+` of type `+u32+` (so necessarily greater or equal to `+0+`) up to `+n+`; on exit `+init_array+` returns the ownership. - -To verify this, we have to supply a loop invariant that specifies all resource ownership and the necessary constraints that hold before and after each iteration of the loop. Loop invariants are specified using the keyword `inv`, followed by CN specifications using the same syntax as in function pre- and postconditions. The variables in scope for loop invariants are all in-scope C variables, as well as CN variables introduced in the function precondition. *In loop invariants, the name of a C variable refers to its current value* (more on this shortly). - -// CHT: details on how 'inv' compares to 'requires' and 'ensures' would be helpful - i.e., should we read it as though the body of 'inv' -// is required at the start of the loop and ensured at the end of the loop? -// this would be helpful for understanding how linearity/resource ownership works in this example - -[source,c] ----- -include::solutions/init_array.c[] ----- -//// -BCP: Concrete syntax: Why not write something like "unchanged {p,n}" or "unchanged: p,n"? -//// -// CHT: seconding the above; as written it's unclear if the curly braces mean something other than set notation for things that are unchanged - -The main condition here is unsurprising: we specify ownership of an iterated resource for an array just like in the the pre- and postcondition. - -The second thing we need to do, however, is less straightforward. Recall that, as discussed at the start of the tutorial, function arguments in C are mutable, and so CN permits this as well.While in this example it is obvious that `+p+` and `+n+` do not change, CN currently requires the loop invariant to explicitly state this, using special notation `+{p} unchanged+` (and similarly for `+n+`). - -**Note.** If we forget to specify `+unchanged+`, this can lead to confusing errors. In this example, for instance, CN would verify the loop against the loop invariant, but would be unable to prove a function postcondition seemingly directly implied by the loop invariant (lacking the information that the postcondition's `+p+` and `+n+` are the same as the loop invariant's). Future CN versions may handle loop invariants differently and treat variables as immutable by default. -//// -BCP: This seems like a good idea! -//// - -The final piece needed in the verification is an `+extract+` statement, as used in the previous examples: to separate the individual `+Owned+` resource for index `+j+` out of the iterated `+Owned+` resource and make it available to the resource inference, we specify `+extract Owned, j;+`. - - -With the `+extract+` statements in place, CN accepts the function. - -=== Second loop example - -However, on closer look, the specification of `+init_array+` is overly strong: it requires an iterated `+Owned+` resource for the array on entry. If, as the name suggests, the purpose of `+init_array+` is to initialise the array, then a precondition asserting only an iterated `+Block+` resource for the array should also be sufficient. The modified specification is then as follows. - -[source,c] ----- -include::exercises/init_array2.c[] ----- - -This specification *should* hold: assuming ownership of an uninitialised array on entry, each iteration of the loop initialises one cell of the array, moving it from `+Block+` to `+Owned+` "`state`", so that on function return the full array is initialised. (Recall that stores only require `+Block+` ownership of the written memory location, so ownership of not-necessarily-initialised memory.) - -To verify this modified example we again need a loop invariant. This time, the loop invariant is more involved, however: since each iteration of the loop initialises one more array cell, the loop invariant has to do precise book-keeping of the initialisation status of the array. - -To do so, we partition the array ownership into two parts: for each index of the array the loop has already visited, we have an `+Owned+` resource, for all other array indices we have the (unchanged) `+Block+` ownership. - -[source,c] ----- -include::solutions/init_array2.c[] ----- - -Let's go through this line-by-line: - -- We assert ownership of an iterated `+Owned+` resource, one for each index `+i+` strictly smaller than loop variable `+j+`. - -- All remaining indices `+i+`, between `+j+` and `+n+` are still uninitialised, so part of the iterated `+Block+` resource. - -- As in the previous example, we assert `+p+` and `+n+` are unchanged. - -- Finally, unlike in the previous example, this loop invariant involves `+j+`. We therefore also need to know that `+j+` does not exceed the array length `+n+`. Otherwise CN would not be able to prove that, on completing the last loop iteration, `+j=n+` holds. This, in turn, is needed to show that when the function returns, ownership of the iterated `+Owned+` resource --- as specified in the loop invariant --- is fully consumed by the function's post-condition and there is no left-over unused resource. - -As before, we also have to instruct CN to `+extract+` ownership of individual array cells out of the iterated resources: - -- to allow CN to extract the individual `+Block+` to be written we use `+extract Block, j;+`; - -- the store returns a matching `+Owned+` resource for index `+j+`; - -- finally, we put `+extract Owned, j;+` to allow CN to "`attach`" this resource to the iterated `+Owned+` resource. CN issues a warning, because nothing is, in fact, extracted: we are using `+extract+` only for the "`reverse`" direction. -// CHT: can we elaborate on this last sentence? - - -=== Exercises - -**Init array reverse.** Verify the function `+init_array_rev+`, which has the same specification as `+init_array2+`, but reverses the array in decreasing index order (from right to left). - -[source,c] ----- -include::exercises/init_array_rev.c[] ----- - - - -//// -___________________________________________________________________________ -___________________________________________________________________________ -___________________________________________________________________________ -___________________________________________________________________________ -___________________________________________________________________________ - -BCP: I'll put my new stuff below here... -//// - -== Defining Predicates - -// We should show how to define predicates earlier -- -// - e.g., with numeric ranges!! - -//// -BCP: The text becomes a bit sketchy from here on! But hopefully there's -still enough structure here to make sense of the examples... -//// - -Suppose we want to write a function that takes *two* pointers to -integers and increments the contents of both of them. - -First, let's deal with the "normal" case where the two arguments do -not alias... - -// CHT: many of these examples repeatedly throw "warning: CN pointer equality is not the same as C's (will not warn again)" -// can we elaborate on the difference - -// CHT: is there a subtle reason for the use of "m" here instead of using "n + 1" directly or is that style? - -[source,c] ----- -include::exercises/slf_incr2_noalias.c[] ----- - -But what if they do alias? The clunky solution is to write a whole -different version of incr2 with a different embedded specification... - -[source,c] ----- -include::exercises/slf_incr2_alias.c[] ----- - -This is horrible. Much better is to define a predicate to use -in the pre- and postconditions that captures both cases together: - -[source,c] ----- -include::exercises/slf_incr2.c[] ----- - -**Note**: At the moment, CN does not derive pointer disjointness -constraints from resources: from simultaneous ownership of the -resources `+Owned(p)+` and `+Owned(q)+` CN does not automatically -learn `+(p != q)+`, even though that’s clearly implied. This was -turned off for performance reasons at some point, but once -performance is back to normal again it should come back. In the mean -time, we have to add `+(p != q)+` as an additional precondition to -`+call_both+`. - -== Allocating and Deallocating Memory - -At the moment, CN does not understand the `+malloc+` and `+free+` -functions. They are a bit tricky because they rely on a bit of -polymorphism and a typecast between `+char*+` -- the result type of -`+malloc+` and argument type of `+free+` -- and the actual type of the -object being allocated or deallocated. - -However, for any given type, we can define a type-specific function -that allocates heap storage with exactly that type. The -implementation of this function cannot be checked by CN, but we can -give just the spec, together with a promise to link against an -external C library providing the implementation: -[source,c] ----- -include::exercises/malloc.h[] ----- - -(Alternatively we can include an implementation written in arbitrary C -inside a CN file by marking it with the keyword `+trusted+` at the top -of its CN specification.) - -Similarly: -[source,c] ----- -include::exercises/free.h[] ----- - -Now we can write code that allocates and frees memory: -[source,c] ----- -include::exercises/slf17_get_and_free.c[] ----- - -We can also define a "safer", ML-style version of `+malloc+` that -handles both allocation and initialization: - -[source,c] ----- -include::exercises/ref.h[] ----- - -//// -BCP: This example is a bit broken: the file `+slf0_basic_incr.c+` does not appear at all in the tutorial, though a slightly different version (with signed numbers) does... -//// - -[source,c] ----- -include::exercises/slf16_basic_succ_using_incr.c[] ----- - -=== Exercises - -// BCP: There should be a non-ref-using version of this earlier, for comparison. - -Prove a specification for the following program that reveals *only* -that it returns a pointer to a number that is greater than the number -pointed to by its argument. -// CHT: agree with bcp, it's a bit of a jump (I can look into adding this) - -[source,c] ----- -include::exercises/slf_ref_greater.c[] ----- - -== Side Note - -Here is another syntax for external / unknown -functions, together with an example of a loose specification: - -//// -BCP: This is a bit random -- it's not clear people need to know about this alternate syntax, and it's awkwardly mixed with a semi-interesting example that's not relevant to this section. -//// - -[source,c] ----- -include::exercises/slf18_two_dice.c[] ----- - -== Lists - -Now it's time to look at some more interesting heap structures. - -To begin with, here is a C definition for linked list cells, together -with allocation and deallocation functions: - -// CHT: it's surprising to me that CN lets you take Block(null). -// this seems to be the cause of the return != null check with the comment saying it should not be needed - -[source,c] ----- -include::exercises/list1.h[] ----- - -To write specifications for C functions that manipulate lists, we need -to define a CN "predicate" that describes *mathematical* list -structures, as one would do in ML, Haskell, or Coq. (We call them -"sequences" here to avoid overloading the word "list".) - - -Intuitively, the `+IntList+` predicate walks over a pointer structure -in the C heap and extracts an `+Owned+` version of the mathematical -list that it represents. - -[source,c] ----- -include::exercises/list2.h[] ----- - -We can also write specification-level "functions" by ordinary -functional programming (in slightly strange, unholy-union-of-C-and-ML -syntax): - -[source,c] ----- -include::exercises/list3.h[] ----- - -We use the `+IntList+` predicate to specify functions returning the -empty list and the cons of a number and a list. - -[source,c] ----- -include::exercises/list4.h[] ----- - -Finally, we can collect all this stuff into a single header file and -add the usual C `+#ifndef+` gorp to avoid complaints from the compiler -if it happens to get included twice from the same source file later. - -[source,c] ----- -include::exercises/list.h[] ----- - -//// -BCP: What does this comment mean? - 'return != NULL' should not be needed -//// - -=== Append - -With this basic infrastructure in place, we can start specifying and -verifying list-manipulating functions. First, `+append+`. - -Here is its specification (in a separate file, because we'll want to -use it multiple times below.) - -[source,c] ----- -include::exercises/append.h[] ----- - -Here is a simple destructive `+append+` function. Note the two uses -of the `+unfold+` annotation in the body, which are needed to help the -CN typechecker. - -// BCP: Can someone add a more technical explanation of why they are needed and exactly what they do? -// CHT: I'm reading this as similar to Rocq's "unfold", and as necessary because doing it eagerly could -// cause nontermination. Is that accurate? - -[source,c] ----- -include::exercises/append.c[] ----- - -=== List copy - -Here is an allocating list copy function with a pleasantly light -annotation burden. - -[source,c] ----- -include::exercises/list_copy.c[] ----- - -=== Merge sort - -Finally, here is a slightly tricky in-place version of merge sort that -avoids allocating any new list cells in the splitting step by taking -alternate cells from the original list and linking them together into -two new lists of roughly equal lengths. - -[source,c] ----- -include::exercises/mergesort.c[] ----- - -=== Exercises - -*Allocating append*. Fill in the CN annotations on -`+IntList_append2+`. (You will need some in the body as well as at -the top.) - -[source,c] ----- -include::exercises/append2.c[] ----- - -Note that it would not make sense to do the usual -functional-programming trick of copying xs but sharing ys. (Why?) - -*Length*. Add annotations as appropriate: - -[source,c] ----- -include::exercises/list_length.c[] ----- - -// CHT: I used this file to test CN's behavior with nonterminating programs, and I think -// it could use some explanation for users with limited experience with partial correctness - does -// this already exist somewhere? - -*List deallocation*. Fill in the body of the following procedure and -add annotations as appropriate: - -[source,c] ----- -include::exercises/list_free.c[] ----- - -*Length with an accumulator*. Add annotations as appropriate: -// BCP: Removing / forgetting the unfold in this one gives a truly -// bizarre error message saying that the constraint "n == (n + length(L1))" -// is unsatisfiable... - -[source,c] ----- -include::exercises/slf_length_acc.c[] ----- - -== Working with External Lemmas - -**TODO**: This section should also show what the proof of the lemmas -looks like on the Coq side! - -// BCP: This needs to be filled in urgently!! - -=== List reverse - -The specification of list reversal in CN relies on the familiar -recursive list reverse function, with a recursive helper. - -[source,c] ----- -include::exercises/list_rev_spec.h[] ----- - -// CHT: I believe the definition of snoc is wrong (the body is just a copy of rev), should be: -// function [rec] (datatype seq) snoc(datatype seq xs, i32 y) { -// match xs { -// Seq_Nil {} => { -// Seq_Cons {head: y, tail : Seq_Nil {}} -// } -// Seq_Cons {head : h, tail : zs} => { -// Seq_Cons {head: h, tail : snoc(zs, y)} -// } -// } -// } - -To reason about the C implementation of list reverse, we need to help -the SMT solver by enriching its knowledge base with a couple of facts -about lists. The proofs of these facts require induction, so in CN we -simply state them as lemmas and defer the proofs to Coq. - -[source,c] ----- -include::exercises/list_rev_lemmas.h[] ----- - -Having stated these lemmnas, we can now complete the specification and -proof of `+IntList_rev+`. Note the two places where `+apply+` is used -to tell the SMT solver where to pay attention to the lemmas. - -//// -BCP: Why can't it always pay attention to them? (I guess -"performance", but at least it would be nice to be able to declare a -general scope where a given set of lemmas might be needed, rather than -specifying exactly where to use them.) -//// - -[source,c] ----- -include::exercises/list_rev.c[] ----- - -For comparison, here is another way to write the program, using a -while loop instead of recursion, with its specification and proof. - -// BCP: Why 0 instead of NULL?? (Is 0 better?) - -[source,c] ----- -include::exercises/list_rev_alt.c[] ----- - -=== Exercises - -**Sized stacks:** Fill in annotations where requested: - -[source,c] ----- -include::exercises/slf_sized_stack.c[] ----- - -// CHT: the solution here seems underspecified (e.g. S_.d == tl(S.d); return == hd(S.d)), -// and the need for unfold in exactly one branch in the solution (and both with the above constraints added) -// is surprising; is there intuition for the unfold placement here? - -// TODO: Trees: -// - deep copy -// - sum -// - maybe the accumulating sum - -//// -Further exercises: - - Some exercises that get THEM to write predicates, datatype - declarations, etc. - -Misc things to do: - - create a top-level TAGS file for emacs - - Figure out the smoothest way to do multiple includes - - Figure out how to display filenames -//// diff --git a/src/tutorial-with-ps-comments.adoc b/src/tutorial-with-ps-comments.adoc deleted file mode 100644 index 7723960c..00000000 --- a/src/tutorial-with-ps-comments.adoc +++ /dev/null @@ -1,964 +0,0 @@ -:source-highlighter: pygments -:pygments-style: manni -// :pygments-style: tango -:nofooter: -:prewrap!: - - -++++ - -++++ - - -**TODO PS: the tutorial doesn't say that you have to `make` in the tutorial to get the `exercises` and `solutions` directories** - -**TODO PS: the build still fails for me, complaining `/bin/sh: 1: [[: not found -R` ** - -= CN tutorial - -CN is a type system for verifying C code, focusing especially on low-level systems code. Compared to the normal C type system, CN checks not only that expressions and statements follow the correct typing discipline for C-types, but also that the C code executes _safely_ — does not raise C undefined behaviour — and _correctly_ — according to strong user-defined specifications. To accurately handle the complex semantics of C, CN builds on the https://github.com/rems-project/cerberus/[Cerberus semantics for C]. - -This tutorial introduces CN along a series of examples, starting with basic usage of CN on simple arithmetic functions and slowly moving towards more elaborate separation logic specifications of data structures. - -Many examples are taken from Arthur Charguéraud’s excellent https://softwarefoundations.cis.upenn.edu[Separation Logic Foundations]. These example have names starting with "`slf...`". - -== Installation - -To fetch and install CN, check the Cerberus repository at https://github.com/rems-project/cerberus and follow the instructions in https://github.com/rems-project/cerberus/blob/master/backend/cn/INSTALL.md[backend/cn/INSTALL.md]. - -Once completed, type `+cn --help+` in your terminal to ensure CN is installed and found by your system. This should print the list of available options CN can be executed with. - -To apply CN to a C file, run `+cn CFILE+`. - -== Basic usage - -=== First example - -For a first example, let’s look at a simple arithmetic function: `+add+`, shown below, takes two `+int+` arguments, `+x+` and `+y+`, and returns their sum. - -**TODO PS: this says `cn exercises/0.c` but the examples seem to be in `CN-tutorial/src/examples`. And the checked-in `0.c` is the version with the CN precondition, not the version in the tutorial `built_doc` without. ** - -[source,c] ----- -include::exercises/0.c[] ----- - -Running CN on the example produces an error message: - -.... -cn exercises/0.c -[1/1]: add -exercises/0.c:3:10: error: Undefined behaviour - return x+y; - ~^~ -an exceptional condition occurs during the evaluation of an expression (§6.5#5) -Consider the state in /var/folders/_v/ndl32wpj4bb3y9dg11rvc8ph0000gn/T/state_393431.html -.... - -**TODO PS: on ubuntu the file ends up in `/tmp/state_35eadb.html`, and for some reason I don't understand, firefox refuses to see it or open it. I had to install `lynx` to look at it. Does this output really exploit html, or could it just (maybe optionally with a CN command-line flag?) print to the terminal? ** - -**TODO PS: the CN error message `an exceptional condition occurs during the evaluation of an expression (§6.5#5)` is not super-informative. It won't even be clear to most people (who are not reading the tutorial when they see it) that 6.5#5 is a reference to a place in the C standard, or where they can find that (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf is a publically accessible final committee draft for C11). Kayvan has the corresponding text; I don't know whether it's trivial to print that paragraph, which explains what an exceptional condition is? ("If an exceptional condition occurs during the evaluation of an expression (that is, if the -result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.")** - - -CN rejects the program because it has _C undefined behaviour_, meaning it is not safe to execute. CN points to the relevant source location, the addition `+x+y+`, and paragraph §6.5#5 of the C language standard that specifies the undefined behaviour. It also puts a link to an HTML file with more details on the error to help in diagnosing the problem. - -Inspecting this HTML report (as we do in a moment) gives us possible example values for `+x+` and `+y+` that cause the undefined behaviour and hint at the problem: for very large values for `+x+` and `+y+`, such as `+1073741825+` and `+1073741824+`, the sum of `+x+` and `+y+` can exceed the representable range of a C `+int+` value: `+1073741825 + 1073741824 = 2^31+1+`, so their sum is larger than the maximal `+int+` value, `+2^31-1+`. - -Here `+x+` and `+y+` are _signed integers_, and in C, signed integer _overflow_ is undefined behaviour (UB). Hence, `+add+` is only safe to execute for smaller values. Similarly, _large negative_ values of `+x+` and `+y+` can cause signed integer _underflow_, also UB in C. We therefore need to rule out too large values for `+x+` and `+y+`, both positive and negative, which we do by writing a CN function specification. - -=== First function specification - -Shown below is our first function specification, for `+add+`, with a precondition that constraints `+x+` and `+y+` such that the sum of `+x+` and `+y+` lies between `+-2147483648+` and `+2147483647+`, so within the representable range of a C `+int+` value. - -[source,c] ----- -include::solutions/0.c[] ----- - -**TODO PS: I don't see a `solutions` directory - that example seems to be the `examples/0.c` ** - -In detail: - -* Function specification are given using special `+/*@ ... @*/+` comments, placed in-between the function argument list and the function body. - -* The keyword `+requires+` starts the precondition, a list of one or more CN conditions separated by semicolons. - -* In function specifications, the names of the function arguments, here `+x+` and `+y+`, refer to their _initial values_. (Function arguments are mutable in C.) - -* `+let sum = (i64) x + (i64) y+` is a let-binding, which defines `+sum+` as the value `+(i64) x + (i64) y+` in the remainder of the function specification. - -* Instead of C syntax, CN uses Rust-like syntax for integer types, such as `+u32+` for 32-bit unsigned integers and `+i64+` for signed 64-bit integers to make their sizes unambiguous. Here, `+x+` and `+y+`, of C-type `+int+`, have CN type `+i32+`. - -**TODO PS: those are not ISO C names, but they are Linux kernel integer type names (in addition to being Rust-like)** - -* To define `+sum+` we cast `+x+` and `+y+` to the larger `+i64+` type, using syntax `+(i64)+`, which is large enough to hold the sum of any two `+i32+` values. - -**TODO PS: The bitvector world does push us into a fairly horrible spec here. E.g., what is the user supposed to do if the arguments are already 64-bit types?** - -* Finally, we require this sum to be in-between the minimal and maximal `+int+` value. Integer constants, such as `+-2147483648i64+`, must specifiy their CN type (`+i64+`). - -Running CN on the annotated program passes without errors. This means with our specified precondition, `+add+` is safe to execute. - -We may, however, wish to be more precise. So far the specification gives no information to callers of `+add+` about its output. To also specify the return values we add a postcondition, using the `+ensures+` keyword. - -[source,c] ----- -include::solutions/1.c[] ----- - -Here we use the keyword `+return+`, only available in function postconditions, to refer to the return value, and equate it to `+sum+` as defined in the preconditions, cast back to `+i32+` type: `+add+` returns the sum of `+x+` and `+y+`. - -Running CN confirms that this postcondition also holds. - -=== Error reports - -In the original example CN reported a type error due to C undefined behaviour. While that example was perhaps simple enough to guess the problem and solution, this can become quite challenging as program and specification complexity increases. Diagnosing type errors is therefore an important part of using CN. CN tries to help with that by producting detailed error information, in the form of an HTML error report. - -**TODO PS: typo `producting` -> `producing` ** - -Let’s return to the type error from earlier (`+add+` without precondition) and take a closer look at this report. The report comprises two sections. - -.*CN error report* -image::images/0.error.png[*CN error report*] - -**TOOD PS: figure doesn't appear in the `built_doc` version ** - -*Path.* The first section, "`Path to error`", contains information about the control-flow path leading to the error. - -When type checking a C function, CN checks each possible control-flow path through the program individually. If CN detects UB or a violation of user-defined specifications, CN reports the problematic control-flow path, as a nested structure of statements: paths are split into sections, which group together statements between high-level control-flow positions (e.g. function entry, the start of a loop, the invocation of a `+continue+`, `+break+`, or `+return+` statement, etc.); within each section, statements are listed by source code location; finally, per statement, CN lists the typechecked sub-expressions, and the memory accesses and function calls within these. - -In our example, there is only one possible control-flow path: entering the function body (section "`function body`") and executing the block from lines 2 to 4, followed by the return statement at line 3. The entry for the latter contains the sequence of sub-expressions in the return statement, including reads of the variables `+x+` and `+y+`. - -In C, local variables in a function, including its arguments, are mutable and their address can be taken and passed as a value. CN therefore represents local variables as memory allocations that are manipulated using memory reads and writes. Here, type checking the return statement includes checking memory reads for `+x+` and `+y+`, at their locations `+&ARG0+` and `+&ARG1+`. The path report lists these reads and their return values: the read at `+&ARG0+` returns `+x+` (that is, the value of `+x+` originally passed to `+add+`); the read at `+&ARG1+` returns `+y+`. Alongside this symbolic information, CN displays concrete values: - -* `+1073741825i32 /* 0x40000001 */+` for x (the first value is the decimal representation, the second, in `+/*...*/+` comments, the hex equivalent) and - -* `+1073741824i32 /* 0x40000000 */+` for `+y+`. - -For now, ignore the pointer values `+{@0; 4}+` for `+x+` and `+{@0; 0}+` for `+y+`. - -These concrete values are part of a _counterexample_: a concrete valuation of variables and pointers in the program that that leads to the error. (The exact values may vary on your machine, depending on the version of Z3 installed on your system.) - -*Proof context.* The second section, below the error trace, lists the proof context CN has reached along this control-flow path. - -"`Available resources`" lists the owned resources, as discussed in later sections. - -"`Variables`" lists counterexample values for program variables and pointers. In addition to `+x+` and `+y+`, assigned the same values as above, this includes values for their memory locations `+&ARG0+` and `+&ARG1+`, function pointers in scope, and the `+__cn_alloc_history+`, all of which we ignore for now. - -Finally, "`Constraints`" records all logical facts CN has learned along the path. This includes user-specified assumptions from preconditions or loop invariants, value ranges inferred from the C-types of variables, and facts learned during the type checking of the statements. Here (`+add+` without precondition) the only constraints are some contraints inferred from C-types in the code. - -**TODO PS: typo `contraints` (spell-check the thing)** - -**TODO PS: delete `For instance,` below** - -* For instance, `+good(x)+` says that the initial value of `+x+` is a "`good`" `+signed int+` value (i.e. in range). Here `+signed int+` is the same type as `+int+`, CN just makes the sign explicit. For integer types `+T+`, `+good+` requires the value to be in range of type `+T+`; for pointer types `+T+` it also requires the pointer to be aligned. For structs and arrays this extends in the obvious way to struct members or array cells. - -* `+repr+` requires representability (not alignment) at type `+T+`, so `+repr(&ARGO)+`, for instance, records that the pointer to `+x+` is representable at C-type `+signed int*+`; - - -**TODO PS: I don't know what that means. If I take it literally, it'd normally be false, as on 64-bit archs, pointers are typically *not* representable at `signed int` or `unsigned int` types.** - -* `+aligned(&ARGO, 4u64)+`, moreover, states that it is 4-byte aligned. - -=== Another arithmetic example - -Let’s apply what we know so far to another simple arithmetic example. - -The function `+doubled+`, shown below, takes an int `+n+`, defines `+a+` as `+n+` incremented, `+b+` as `+n+` decremented, and returns the sum of the two. - -[source,c] ----- -include::exercises/slf1_basic_example_let.signed.c[] ----- - -We would like to verify this is safe, and that `+doubled+` returns twice the value of `+n+`. Running CN on `+doubled+` leads to a type error: the increment of `+a+` has undefined behaviour. - -**TODO PS: when explaining CN, perhaps we want to use another phrase to talk about refinement-type errors: the user will typically expect "type error" to refer to something like a C type error. Perhaps "verification error"?? ** - -As in the first example, we need to ensure that `+n+1+` does not overflow and `+n-1+` does not underflow. Similarly also `+a+b+` has to be representable at `+int+` type. - -[source,c] ----- -include::solutions/slf1_basic_example_let.signed.c[] ----- - -We can specify these using a similar style of precondition as in the first example. We first define `+n_+` as `+n+` cast to type `+i64+` — i.e. a type large enough to hold `+n+1+`, `+n-1+` and `+a+b+` for any possible `+i32+` value for `+n+`. Then we specify that decrementing `+n_+` does not go below the minimal `+int+` value, that incrementing `+n_+` does not go above the maximal value, and that `+n+` doubled is also in range. These preconditions together guarantee safe execution. - -To capture the functional behaviour, the postcondition specifies that `+return+` is twice the value of `+n+`. - -=== Exercise - -*Quadruple.* Specify the precondition needed to ensure safety of the C function `+quadruple+`, and a postcondition that describes its return value. - -[source,c] ----- -include::exercises/slf2_basic_quadruple.signed.c[] ----- - -*Abs.* Give a specification to the C function `+abs+`, which computes the absolute value of a given `+int+` value. To describe the return value, use CN’s ternary "`+_ ? _ : _+`" operator. Given a boolean `+b+`, and expressions `+e1+` and `+e2+` of the same basetype, `+b ? e1 : e2+` returns `+e1+` if `+b+` holds and `+e2+` otherwise. - -[source,c] ----- -include::exercises/abs.c[] ----- - -== Pointers and simple ownership - -So far we’ve only considered example functions manipulating integer values. Verification becomes more interesting and challenging when _pointers_ are involved, because the safety of memory accesses via pointers has to be verified. - -CN uses _separation logic resource types_ and the concept of _ownership_ to reason about memory accesses. A resource is the permission to access a region of memory. Unlike logical constraints, resource ownership is _unique_, meaning resources cannot be duplicated. - -Let’s look at a simple example. The function `+read+` takes an `+int+` pointer `+p+` and returns the pointee value. - -[source,c] ----- -include::exercises/read.c[] ----- - -Running CN on this example produces the following error: - -.... -cn exercises/read.c -[1/1]: read -exercises/read.c:3:10: error: Missing resource for reading - return *p; - ^~ -Resource needed: Owned(p) -Consider the state in /var/folders/_v/ndl32wpj4bb3y9dg11rvc8ph0000gn/T/state_403624.html -.... - -For the read `+*p+` to be safe, ownership of a resource is missing: a resource `+Owned(p)+`. - -=== The Owned resource type - -Given a C-type `+T+` and pointer `+p+`, the resource `+Owned(p)+` asserts ownership of a memory cell at location `+p+` of the size of C-type `+T+`. It is is CN’s equivalent of a points-to assertion in separation logic (indexed by C-types `+T+`). - -In this example we can ensure the safe execution of `+read+` by adding a precondition that requires ownership of `+Owned(p)+`, as shown below. For now ignore the notation `+take ... = Owned(p)+`. Since `+read+` maintains this ownership, we also add a corresponding postcondition, whereby `+read+` returns ownership of `+p+` after it is finished executing, in the form of another `+Owned(p)+` resource. - -[source,c] ----- -include::solutions/read.c[] ----- - -This specifications means that - -* any function calling `+read+` has to be able to provide a resource `+Owned(p)+` to pass into `+read+`, and - -* the caller will receive back a resource `+Owned(p)+` when `+read+` returns. - -=== Resource outputs - -However, a caller of `+read+` may also wish to know that `+read+` actually returns the correct value, the pointee of `+p+`, and also that it does not change memory at location `+p+`. To phrase both we need a way to refer to the pointee of `+p+`. - -In CN resources have _outputs_. Each resource outputs the information that can be derived from ownership of the resource. What information is returned is specific to the type of resource. A resource `+Owned(p)+` (for some C-type `+T+`) outputs the _pointee value_ of `+p+`, since that can be derived from the resource ownership: assume you have a pointer `+p+` and the associated ownership, then this uniquely determines the pointee value of `+p+`. - -CN uses the `+take+`-notation seen in the example above to refer to the output of a resource, introducing a new name binding for the output. The precondition `+take v1 = Owned(p)+` from the precondition does two things: (1) it assert ownership of resource `+Owned(p)+`, and (2) it binds the name `+v1+` to the resource output, here the pointee value of `+p+` at the start of the function. Similarly, the postcondition introduces the name `+v2+` for the pointee value on function return. - -That means we can use the resource outputs from the pre- and postcondition to strengthen the specification of `+read+` as planned. We add two new postconditions: we specify - -. that `+read+` returns `+v1+` (the initial pointee value of `+p+`), and -. that the pointee values `+v1+` and `+v2+` before and after execution of `+read+` (respectively) are the same. - -[source,c] ----- -include::solutions/read2.c[] ----- - -*Aside.* In standard separation logic the equivalent specification for `+read+` could have been phrased as follows (where `+return+` binds the return value in the postcondition): - -.... -∀p. -∀v1. { p ↦ v1 } - read(p) - { return. ∃v2. (p ↦ v2) /\ return = v1 /\ v1 = v2 } -.... - -CN’s `+take+` notation is just alternative syntax for quantification over the values of resources, but a useful one: the `+take+` notation syntactically restricts how these quantifiers can be used to ensure CN can always infer them. - -=== Exercises - -*Quadruple*. Specify the function `+quadruple_mem+`, that is similar to the earlier `+quadruple+` function, except that the input is passed as an `+int+` pointer. Write a specification that takes ownership of this pointer on entry and returns this ownership on exit, leaving the pointee value unchanged. - -[source,c] ----- -include::exercises/slf_quadruple_mem.c[] ----- - -*Abs*. Give a specification to the function `+abs_mem+`, which computes the absolute value of a number passed as an `+int+` pointer. - -[source,c] ----- -include::exercises/abs_mem.c[] ----- - -=== Linear resource ownership - -In the specifications we have written so far, functions that receive resources as part of their precondition also return this ownership in their postcondition. - -Let’s try the `+read+` example from earlier again, but with a postcondition that does not return the ownership: - -[source,c] ----- -include::exercises/read.broken.c[] ----- - -CN rejects this program with the following message: - -.... -cn build/exercises/read.broken.c -[1/1]: read -build/exercises/read.broken.c:4:3: error: Left-over unused resource 'Owned(p)(v1)' - return *p; - ^~~~~~~~~~ -Consider the state in /var/folders/_v/ndl32wpj4bb3y9dg11rvc8ph0000gn/T/state_17eb4a.html -.... - -CN has typechecked the function, verified that it is safe to execute under the precondition (given ownership `+Owned(p)+`), and that the function (vacuously) satisfies its postcondition. But, following the check of the postcondition it finds that not all resources have been "`used up`". - -Given the above specification, `+read+` leaks memory: it takes ownership, does not return it, but also does not deallocate the owned memory or otherwise dispose of it. In CN this is a type error. - -CN’s resource types are _linear_ (as opposed to affine). This means not only that resources cannot be duplicated, but also that resources cannot simply be dropped or "`forgotten`". Every resource passed into a function has to either be used up by it, by returning it or passing it to another function that consumes it, or destroyed, by deallocating the owned area of memory (as we shall see later). - -CN’s motivation for linear tracking of resources is its focus on low-level systems software. CN checks C programs, in which, unlike higher-level garbage-collected languages, memory is managed manually, and memory leaks are typically very undesirable. - -As a consequence, function specifications have to do precise "`book-keeping`" of their resource footprint, and, in particular, return any unused resources back to the caller. - -=== The Block resource type - -Aside from the `+Owned+` resource seen so far, CN has another built-in resource type: `+Block+`. Given a C-type `+T+` and pointer `+p+`, `+Block(p)+` asserts the same ownership as `+Owned(p)+` — so ownership of a memory cell at `+p+` the size of type `+T+` — but in contrast to `+Owned+`, `+Block+` memory is not necessarily initialised. - -CN uses this distinction to prevent reads from uninitialised memory: - -* A read at C-type `+T+` and pointer `+p+` requires a resource `+Owned(p)+`, i.e., ownership of _initialised_ memory at the right C-type. The load returns the `+Owned+` resource unchanged. - -* A write at C-type `+T+` and pointer `+p+` needs only a `+Block(p)+` (so, unlike reads, writes to uninitialised memory are fine). The write consumes ownership of the `+Block+` resource (it destroys it) and returns a new resource `+Owned(p)+` with the value written as the output. This means the resource returned from a write records the fact that this memory cell is now initialised and can be read from. - -Since `+Owned+` carries the same ownership as `+Block+`, just with the additional information that the `+Owned+` memory is initalised, a resource `+Owned(p)+` is "`at least as good`" as `+Block(p)+` — an `+Owned(p)+` resource can be used whenever `+Block(p)+` is needed. For instance CN’s type checking of a write to `+p+` requires a `+Block(p)+`, but if an `+Owned(p)+` resource is what is available, this can be used just the same. This allows an already-initialised memory cell to be over-written again. - -Unlike `+Owned+`, whose output is the pointee value, `+Block+` has no meaningful output: its output is `+void+`/`+unit+`. - -=== Write example - -Let’s explore resources and their outputs in another example. The C function `+incr+` takes an `+int+` pointer `+p+` and increments the pointee value. - -[source,c] ----- -include::solutions/slf0_basic_incr.signed.c[] ----- - -In the precondition we assert ownership of resource `+Owned(p)+`, binding its output/pointee value to `+v1+`, and use `+v1+` to specify that `+p+` must point to a sufficiently small value at the start of the function not to overflow when incremented. The postcondition asserts ownership of `+p+` with output `+v2+`, as before, and uses this to express that the value `+p+` points to is incremented by `+incr+`: `+v2 == v1+1i32+`. - -If we incorrectly tweaked this specification and used `+Block(p)+` instead of `+Owned(p)+` in the precondition, as below, then CN would reject the program. - -[source,c] ----- -include::exercises/slf0_basic_incr.signed.broken.c[] ----- - -CN reports: - -.... -build/solutions/slf0_basic_incr.signed.broken.c:6:11: error: Missing resource for reading - int n = *p; - ^~ -Resource needed: Owned(p) -Consider the state in /var/folders/_v/ndl32wpj4bb3y9dg11rvc8ph0000gn/T/state_5da0f3.html -.... - -The `+Owned(p)+` resource required for reading is missing, since, as per precondition, only `+Block(p)+` is available. Checking the linked HTML file confirms this. Here the section "`Available resources`" lists all resource ownership at the point of the failure: - -* `+Block(p)(u)+`, so ownership of uninitialised memory at location `+p+`; the output is a `+void+`/`+unit+` value `+u+` (specified in the second pair of parentheses) - -* `+Owned(&ARG0)(p)+`, the ownership of (initialised) memory at location `+&ARG0+`, so the memory location where the first function argument is stored; its output is the pointer `+p+` (not to be confused with the pointee of `+p+`); and finally - -* `+__CN_Alloc(&ARG0)(void)+` is a resource that records allocation information for location `+&ARG0+`; this is related to CN’s memory-object semantics, which we ignore for the moment. - -=== Exercises - -*Zero.* Write a specification for the function `+zero+`, which takes a pointer to _uninitialised_ memory and initialises it to `+0+`. - -[source,c] ----- -include::exercises/zero.c[] ----- - -*In-place double.* Give a specification for the function `+inplace_double+`, which takes an `+int+` pointer `+p+` and doubles the pointee value: specify the precondition needed to guarantee safe execution and a postcondition that captures the function’s behaviour. - -[source,c] ----- -include::exercises/slf3_basic_inplace_double.c[] ----- - -=== Multiple owned pointers - -When functions manipulate multiple pointers, we can assert their ownership just like before. However (as in standard separation logic) pointer ownership is unique, so simultaneous ownership of `+Owned+` or `+Block+` resources for two pointers requires these pointers to be disjoint. - -The following example shows the use of two `+Owned+` resources for accessing two different pointers: function `+add+` reads two `+int+` values in memory, at locations `+p+` and `+q+`, and returns their sum. - -[source,c] ----- -include::exercises/add_read.c[] ----- - -This time we use C’s `+unsigned int+` type. In C, over- and underflow of unsigned integers is not undefined behaviour, so we do not need any special preconditions to rule this out. Instead, when an arithmetic operation at unsigned type goes outside the representable range, the value "`wraps around`". - -The CN variables `+m+` and `+n+` (resp. `+m2+` and `+n2+`) for the pointee values of `+p+` and `+q+` before (resp. after) the execution of `+add+` have CN basetype `+u32+`, so unsigned 32-bit integers, matching the C `+unsigned int+` type. Like C’s unsigned integer arithmetic, CN unsigned int values wrap around when exceeding the value range of the type. - -Hence, the postcondition `+return == m+n+` holds also when the sum of `+m+` and `+n+` is greater than the maximal `+unsigned int+` value. - -In the following we will sometimes use unsigned integer types to focus on specifying memory ownership, rather than the conditions necessary to show absence of C arithmetic undefined behaviour. - -=== Exercises - -*Swap.* Specify the function `+swap+`, which takes two owned `+unsigned int+` pointers and swaps their values. - -[source,c] ----- -include::exercises/swap.c[] ----- - -*Transfer.* Write a specification for the function `+transfer+`, shown below. - -[source,c] ----- -include::exercises/slf8_basic_transfer.c[] ----- - -== Ownership of compound objects - -So far all examples have worked with just integers and pointers, but larger programs typically also manipulate compound values, often represented using C struct types. Specifying functions manipulating structs works in much the same way as with basic types. - -For instance, the following example uses a `+struct+` `+point+` to represent a point in two-dimensional space. The function `+transpose+` swaps a point’s `+x+` and `+y+` coordinates. - -[source,c] ----- -include::exercises/transpose.c[] ----- - -Here the precondition asserts ownership for `+p+`, at type `+struct point+`; the output `+s+` is a value of CN type `+struct point+`, i.e. a record with members `+i32+` `+x+` and `+i32+` `+y+`. The postcondition similarly asserts ownership of `+p+`, with output `+s2+`, and asserts the coordinates have been swapped, by referring to the members of `+s+` and `+s2+` individually. - -=== Compound Owned and Block resources - -While one might like to think of a struct as a single (compound) object that is manipulated as a whole, C permits more flexible struct manipulation: given a struct pointer, programmers can construct pointers to _individual struct members_ and pass these as values, even to other functions. - -CN therefore cannot treat resources for compound C types, such as structs, as primitive, indivisible units. Instead, `+Owned+` and `+Block+` are defined inductively in the structure of the C-type `+T+`. - -For struct types `+T+`, the `+Owned+` resource is defined as the collection of `+Owned+` resources for its members (as well as `+Block+` resources for any padding bytes in-between). The resource `+Block+`, similarly, is made up of `+Block+` resources for all members (and padding bytes). - -To handle code that manipulates pointers into parts of a struct object, CN can automatically decompose a struct resource into the member resources, and recompose it, as needed. The following example illustrates this. - -Recall the function `+zero+` from our earlier exercise. It takes an `+int+` pointer to uninitialised memory, with `+Block+` ownership, and initialises the value to zero, returning an `+Owned+` resource with output `+0+`. - -Now consider the function `+init_point+`, shown below, which takes a pointer `+p+` to a `+struct point+` and zero-initialises its members by calling `+zero+` twice, once with a pointer to struct member `+x+`, and once with a pointer to `+y+`. - -[source,c] ----- -include::exercises/init_point.c[] ----- - -As stated in its precondition, `+init_point+` receives ownership `+Block(p)+`. The `+zero+` function, however, works on `+int+` pointers and requires `+Block+` ownership. - -CN can prove the calls to `+zero+` with `+&p->x+` and `+&p->y+` are safe because it decomposes the `+Block(p)+` into two `+Block+`, one for member `+x+`, one for member `+y+`. Later, the reverse happens: following the two calls to `+zero+`, as per `+zero+`’s precondition, `+init_point+` has ownership of two adjacent `+Owned+` resources – ownership for the two struct member pointers, with the member now initialised. Since the postcondition of `+init_point+` requires ownership `+Owned(p)+`, CN combines these back into a compound resource. The resulting `+Owned+` resource has for an output the struct value `+s2+` that is composed of the zeroed member values for `+x+` and `+y+`. - -=== Resource inference - -To handle the required resource inference, CN "`eagerly`" decomposes all `+struct+` resources into resources for the struct members, and "`lazily`" re-composes them as needed. - -We can see this if, for instance, we experimentally change the `+transpose+` example from above to force a type error. Let’s insert an `+/*@ assert(false) @*/+` CN assertion in the middle of the `+transpose+` function (more on CN assertions later), so we can inspect CN’s proof context shown in the error report. - -[source,c] ----- -include::exercises/transpose.broken.c[] ----- - -The precondition of `+transpose+` asserts ownership of an `+Owned(p)+` resource. The error report now instead lists under "`Available resources`" two resources: - -* `+Owned(member_shift(p, x))+` with output `+s.x+` and - -* `+Owned(member_shift(p, y))+` with output `+s.y+` - -Here `+member_shift(p,m)+` is the CN expression that constructs, from a `+struct s+` pointer `+p+`, the "`shifted`" pointer for its member `+m+`. - -When the function returns the two member resources are recombined "`on demand`" to satisfy the postcondition `+Owned(p)+`. - -=== Exercises - -*Init point.* Insert CN `+assert(false)+` statements in different statement positions of `+init_point+` and check how the available resources evolve. - -*Transpose (again).* Recreate the transpose function from before, now using the swap function verified earlier (for `+struct upoint+`, with unsigned member values). - -[source,c] ----- -include::exercises/transpose2.c[] ----- - -//// -BCP: Some more things to think about including... - - Something about CN's version of the frame rule (see - bcp_framerule.c, though the example is arguably a bit - unnatural). - - Examples from Basic.v with allocation - there are lots of - interesting ones! -CP: Agreed. For now continuing with arrays, but will return to this later. -//// - -== Arrays and loops - -Another common datatype in C is arrays. Reasoning about memory ownership for arrays is more difficult than for the datatypes we have seen so far: C allows the programmer to access arrays using _computed pointers_, and the size of an array does not need to be known as a constant at compile time. - -To support reasoning about code manipulating arrays and computed pointers, CN has _iterated resources_. For instance, to specify ownership of an `+int+` array with 10 cells starting at pointer `+p+`, CN uses the iterated resource - -[source,c] ----- -each (i32 i; 0i32 <= i && i < 10i32) - { Owned(array_shift(p,i)) } ----- - -In detail, this can be read as follows: - -* for each integer `+i+` of CN type `+i32+`, … - -* if `+i+` is between `+0+` and `+10+`, … - -* assert ownership of a resource `+Owned+` … - -* for cell `+i+` of the array with base-address `+p+`. - -Here `+array_shift(p,i)+` computes a pointer into the array at pointer `+p+`, appropriately offset for index `+i+`. - -In general, iterated resource specifications take the form - -[source,c] ----- -each (BT Q; GUARD) { RESOURCE } ----- - -comprising three parts: - -* `+BT Q+`, for some CN type `+BT+` and name `+Q+`, introduces the quantifier `+Q+` of basetype `+BT+`, which is bound in `+GUARD+` and `+RESOURCE+`; - -* `+GUARD+` is a boolean-typed expression delimiting the instances of `+Q+` for which ownership is asserted; and - -* `+RESOURCE+` is any non-iterated CN resource. - -=== First array example - -Let’s see how this applies to a first example of an array-manipulating function. Function `+read+` takes three arguments: the base pointer `+p+` of an `+int+` array, the length `+n+` of the array, and an index `+i+` into the array; `+read+` then returns the value of the `+i+`-th array cell. - -[source,c] ----- -include::exercises/array_load.broken.c[] ----- - -The CN precondition requires - -- ownership of the array on entry — one `+Owned+` resource for each array index between `+0+` and `+n+` — and -- that `+i+` lies within the range of owned indices. - -On exit the array ownership is returned again. - -This specification, in principle, should ensure that the access `+p[i]+` is safe. However, running CN on the example produces an error: CN is unable to find the required ownership for reading `+p[i]+`. - -.... -cn build/solutions/array_load.broken.c -[1/1]: read -build/solutions/array_load.broken.c:5:10: error: Missing resource for reading - return p[i]; - ^~~~ -Resource needed: Owned(array_shift(p, (u64)i)) -.... - -The reason is that when searching for a required resource, such as the `+Owned+` resource for `+p[i]+` here, CN’s resource inference does not consider iterated resources. Quantifiers, as used by iterated resources, can make verification undecidable, so, in order to maintain predictable type checking, CN delegates this aspect of the reasoning to the user. - -To make the `+Owned+` resource required for accessing `+p[i]+` available to CN’s resource inference we have to "`extract`" ownership for index `+i+` out of the iterated resource. - -[source,c] ----- -include::exercises/array_load.c[] ----- - -Here the CN comment `+/*@ extract Owned, i; @*/+` is a CN "`ghost statement`"/proof hint that instructs CN to instantiate any available iterated `+Owned+` resource for index `+i+`. In our example this operation splits the iterated resource into two: - -[source,c] ----- -each(i32 j; 0i32 <= j && j < n) { Owned(array_shift(p,j)) } ----- - -is split into - -1. the instantiation of the iterated resource at `+i+` -+ -[source,c] ----- -Owned(array_shift(p,i)) ----- -2. the remainder of the iterated resource, the ownership for all indices except `+i+` -+ -[source,c] ----- -each(i32 j; 0i32 <= j && j < n && j != i) - { Owned(array_shift(p,j)) } ----- - -After this extraction step, CN can use the (former) extracted resource to justify the access `+p[i]+`. - -Following an `+extract+` statement, CN moreover remembers the extracted index and can automatically "`reverse`" the extraction when needed: after type checking the access `+p[i]+` CN must ensure the function’s postcondition holds, which needs the full array ownership again (including the extracted index `+i+`); remembering the index `+i+`, CN then automatically merges resources (1) and (2) again to obtain the required full array ownership, and completes the verification of the function. - -So far the specification only guarantees safe execution but does not specify the behaviour of `+read+`. To address this, let’s return to the iterated resources in the function specification. When we specify `+take a1 = each ...+` here, what is `+a1+`? In CN, the output of an iterated resource is a _map_ from indices to resource outputs. In this example, where index `+j+` has CN type `+i32+` and the iterated resource is `+Owned+`, the output `+a1+` is a map from `+i32+` indices to `+i32+` values — CN type `+map+`. (If the type of `+j+` was `+i64+` and the resource `+Owned+`, `+a1+` would have type `+map+`.) - -We can use this to refine our specification with information about the functional behaviour of `+read+`. - -[source,c] ----- -include::exercises/array_load2.c[] ----- - -We specify that `+read+` does not change the array — the outputs `+a1+` and `+a2+`, before resp. after running the function, are the same — and that the value returned is `+a1[i]+`, `+a1+` at index `+i+`. - -=== Exercises - - -*Array read two.* Specify and verify the following function, `+array_read_two+`, which takes the base pointer `+p+` of an `+unsigned int+` array, the array length `+n+`, and two indices `+i+` and `+j+`. Assuming `+i+` and `+j+` are different, it returns the sum of the values at these two indices. - - -[source,c] ----- -include::exercises/add_two_array.c[] ----- - -//// -BCP: In this one I got quite tangled up in different kinds of integers, then got tangled up in (I think) putting the extract declarations in the wrong place. (I didn't save the not-working version, I'm afraid.) -//// - -*Swap array.* Specify and verify `+swap_array+`, which swaps the values of two cells of an `+int+` array. Assume again that `+i+` and `+j+` are different, and describe the effect of `+swap_array+` on the array value using the CN map update expression `+a[i:v]+`, which denotes the same map as `+a+`, except with index `+i+` updated to `+v+`. - -[source,c] ----- -include::exercises/swap_array.c[] ----- - -//// -BCP: I wrote this, which seemed natural but did not work -- I still don't fully understand why. I think this section will need some more examples / exercises to be fully digestible, or perhaps this is just yet another symptom of my imperfecdt understanding of how the numeric stuff works. - - void swap_array (int *p, int n, int i, int j) - /*@ requires take a1 = each(i32 k; 0i32 <= k && k < n) { Owned(array_shift(p,k)) }; - 0i32 <= i && i < n; - 0i32 <= j && j < n; - j != i; - take xi = Owned(array_shift(p,i)); - take xj = Owned(array_shift(p,j)) - ensures take a2 = each(i32 k; 0i32 <= k && k < n) { Owned(array_shift(p,k)) }; - a1[i:xj][j:xi] == a2 - @*/ - { - extract Owned, i; - extract Owned, j; - int tmp = p[i]; - p[i] = p[j]; - p[j] = tmp; - } -//// - -=== Loops - -The array examples covered so far manipulate one or two individual cells of an array. Another typical pattern in code working over arrays is to *loop*, uniformly accessing all cells of an array, or sub-ranges of it. - -In order to verify code with loops, CN requires the user to supply loop invariants -- CN specifications of all owned resources and the constraints required to verify each iteration of the loop. - - -Let's take a look at a simple first example. The following function, `+init_array+`, takes the base pointer `+p+` of a `+char+` array and the array length `+n+` and writes `+0+` to each array cell. -[source,c] ----- -include::exercises/init_array.c[] ----- - -If, for the moment, we focus just on proving safe execution of `+init_array+`, ignoring its functional behaviour, a specification might look as above: on entry `+init_array+` takes ownership of an iterated `+Owned+` resource -- one `+Owned+` resource for each index `+i+` of type `+u32+` (so necessarily greater or equal to `+0+`) up to `+n+`; on exit `+init_array+` returns the ownership. - -To verify this, we have to supply a loop invariant that specifies all resource ownership and the necessary constraints that hold before and after each iteration of the loop. Loop invariants are specified using the keyword `inv`, followed by CN specifications using the same syntax as in function pre- and postconditions. The variables in scope for loop invariants are all in-scope C variables, as well as CN variables introduced in the function precondition. *In loop invariants, the name of a C variable refers to its current value* (more on this shortly). - -[source,c] ----- -include::solutions/init_array.c[] ----- -//// -BCP: Concrete syntax: Why not write something like "unchanged {p,n}" or "unchanged: p,n"? -//// - -The main condition here is unsurprising: we specify ownership of an iterated resource for an array just like in the the pre- and postcondition. - -The second thing we need to do, however, is less straightforward. Recall that, as discussed at the start of the tutorial, function arguments in C are mutable, and so CN permits this as well.While in this example it is obvious that `+p+` and `+n+` do not change, CN currently requires the loop invariant to explicitly state this, using special notation `+{p} unchanged+` (and similarly for `+n+`). - -**Note.** If we forget to specify `+unchanged+`, this can lead to confusing errors. In this example, for instance, CN would verify the loop against the loop invariant, but would be unable to prove a function postcondition seemingly directly implied by the loop invariant (lacking the information that the postcondition's `+p+` and `+n+` are the same as the loop invariant's). Future CN versions may handle loop invariants differently and treat variables as immutable by default. -//// -BCP: This seems like a good idea! -//// - -The final piece needed in the verification is an `+extract+` statement, as used in the previous examples: to separate the individual `+Owned+` resource for index `+j+` out of the iterated `+Owned+` resource and make it available to the resource inference, we specify `+extract Owned, j;+`. - - -With the `+extract+` statements in place, CN accepts the function. - -=== Second loop example - -However, on closer look, the specification of `+init_array+` is overly strong: it requires an iterated `+Owned+` resource for the array on entry. If, as the name suggests, the purpose of `+init_array+` is to initialise the array, then a precondition asserting only an iterated `+Block+` resource for the array should also be sufficient. The modified specification is then as follows. - -[source,c] ----- -include::exercises/init_array2.c[] ----- - -This specification *should* hold: assuming ownership of an uninitialised array on entry, each iteration of the loop initialises one cell of the array, moving it from `+Block+` to `+Owned+` "`state`", so that on function return the full array is initialised. (Recall that stores only require `+Block+` ownership of the written memory location, so ownership of not-necessarily-initialised memory.) - -To verify this modified example we again need a loop invariant. This time, the loop invariant is more involved, however: since each iteration of the loop initialises one more array cell, the loop invariant has to do precise book-keeping of the initialisation status of the array. - -To do so, we partition the array ownership into two parts: for each index of the array the loop has already visited, we have an `+Owned+` resource, for all other array indices we have the (unchanged) `+Block+` ownership. - -[source,c] ----- -include::solutions/init_array2.c[] ----- - -Let's go through this line-by-line: - -- We assert ownership of an iterated `+Owned+` resource, one for each index `+i+` strictly smaller than loop variable `+j+`. - -- All remaining indices `+i+`, between `+j+` and `+n+` are still uninitialised, so part of the iterated `+Block+` resource. - -- As in the previous example, we assert `+p+` and `+n+` are unchanged. - -- Finally, unlike in the previous example, this loop invariant involves `+j+`. We therefore also need to know that `+j+` does not exceed the array length `+n+`. Otherwise CN would not be able to prove that, on completing the last loop iteration, `+j=n+` holds. This, in turn, is needed to show that when the function returns, ownership of the iterated `+Owned+` resource --- as specified in the loop invariant --- is fully consumed by the function's post-condition and there is no left-over unused resource. - -As before, we also have to instruct CN to `+extract+` ownership of individual array cells out of the iterated resources: - -- to allow CN to extract the individual `+Block+` to be written we use `+extract Block, j;+`; - -- the store returns a matching `+Owned+` resource for index `+j+`; - -- finally, we put `+extract Owned, j;+` to allow CN to "`attach`" this resource to the iterated `+Owned+` resource. CN issues a warning, because nothing is, in fact, extracted: we are using `+extract+` only for the "`reverse`" direction. - - -=== Exercises - -**Init array reverse.** Verify the function `+init_array_rev+`, which has the same specification as `+init_array2+`, but reverses the array in decreasing index order ("`from right to left`"). - -(BCP: Currently broken...) -//// -[source,c] ----- -include::exercises/init_array_rev.broken.c[] ----- -//// - - - - - - -//// _____________________________________________________________________ -//// -BCP: I'll put my new stuff below here... -//// - -== Defining Predicates - -// We should show how to define predicates earlier -- -// - e.g., with numeric ranges!! - -(Text is needed from here on! But hopefully there's enough structure to -make sense of the examples...) - -[source,c] ----- -include::exercises/bcp_incr2.c[] ----- - -== Allocating and Deallocating Memory - -[source,c] ----- -include::exercises/malloc.h[] ----- - -[source,c] ----- -include::exercises/free.h[] ----- - -[source,c] ----- -include::exercises/slf17_get_and_free.c[] ----- - -[source,c] ----- -include::exercises/slf_ref.h[] ----- - -[source,c] ----- -include::exercises/slf16_basic_succ_using_incr.c[] ----- - -=== Exercises - -// BCP: There should be a non-ref-using version of this earlier, for comparison. - -Prove a specification for the following program that reveals *only* that -it returns a number that is greater than its argument. - -[source,c] ----- -include::exercises/slf_ref_greater.c[] ----- - -=== Side note: External functions - -Another way to specify external / unknown functions: - -[source,c] ----- -include::exercises/slf18_two_dice.c[] ----- - - -== Lists - -Common definitions for lists: - -[source,c] ----- -include::exercises/list.h[] ----- - -The spec of append: -[source,c] ----- -include::exercises/append.h[] ----- - -A simple destructive append: - -[source,c] ----- -include::exercises/bcp_append.c[] ----- - -A non-allocating merge sort: - -[source,c] ----- -include::exercises/bcp_mergesort.c[] ----- - -An allocating list-copy: - -[source,c] ----- -include::exercises/bcp_copy.c[] ----- - -=== Exercises - -*Allocating append*. Fill in the CN annotations on IntList_append2. (You will need -some in the body as well as at the top.) - -[source,c] ----- -include::exercises/bcp_append2.c[] ----- - -Note that it would not make sense to do the usual -functional-programming trick of copying xs but sharing ys. (Why?) - -*Length*. Add annotations as appropriate: - -[source,c] ----- -include::exercises/bcp_length.c[] ----- - -*List deallocation*. Fill in the body of the following procedure and -add annotations as appropriate: - -[source,c] ----- -include::exercises/list_free.c[] ----- - -*Length with an accumulator*. Add annotations as appropriate: -// BCP: Removing / forgetting the unfold in this one gives a truly -// bizarre error message saying that the constraint "n == (n + length(L1))" -// is unsatisfiable... - -[source,c] ----- -include::exercises/slf_length_acc.c[] ----- - -// BCP: We should include the file names of the exercises so that people do -// not have to cut and paste. - -//// -== External Lemmas - -[source,c] ----- -include::exercises/list_rev.c[] ----- -//// - -// TODO (exercise?): sized stacks (significant exercise) - -// TODO: Trees: -// - deep copy -// - sum -// - maybe the accumulating sum - -//// -Further exercises: - - Some exercises that get THEM to write predicates, datatype - declarations, etc. - -Misc things to do: - - create a top-level TAGS file for emacs - - remove the bcp_ prefix from filenames (and here) - - Figure out the smoothest way to do multiple includes - - Figure out how to display filenames -////