Skip to content

Latest commit

 

History

History
2088 lines (1572 loc) · 70.4 KB

api-guidelines.md

File metadata and controls

2088 lines (1572 loc) · 70.4 KB

Rust API guidelines

Crate conformance checklist

  • Organization (crate is structured in an intelligible way)
    • Crate root re-exports common functionality (C-REEXPORT)
    • Modules provide a sensible API hierarchy (C-HIERARCHY)
  • Naming (crate aligns with Rust naming conventions)
    • Casing conforms to RFC 430 (C-CASE)
    • Ad-hoc conversions follow as_, to_, into_ conventions (C-CONV)
    • Methods on collections that produce iterators follow iter, iter_mut, into_iter (C-ITER)
    • Iterator type names match the methods that produce them (C-ITER-TY)
    • Ownership suffixes use _mut and _ref (C-OWN-SUFFIX)
    • Single-element containers implement appropriate getters (C-GETTERS)
  • Interoperability (crate interacts nicely with other library functionality)
    • Types eagerly implement common traits (C-COMMON-TRAITS)
      • Copy, Clone, Eq, PartialEq, Ord, PartialOrd, Hash Debug, Display, Default
    • Conversions use the standard traits From, AsRef, AsMut (C-CONV-TRAITS)
    • Collections implement FromIterator and Extend (C-COLLECT)
    • Data structures implement Serde's Serialize, Deserialize (C-SERDE)
    • Crate has a "serde" cfg option that enables Serde (C-SERDE-CFG)
    • Types are Send and Sync where possible (C-SEND-SYNC)
    • Error types are Send and Sync (C-SEND-SYNC-ERR)
    • Error types are meaningful, not () (C-MEANINGFUL-ERR)
    • Binary number types provide Hex, Octal, Binary formatting (C-NUM-FMT)
  • Macros (crate presents well-behaved macros)
  • Documentation (crate is abundantly documented)
  • Predictability (crate enables legible code that acts how it looks)
    • Smart pointers do not add inherent methods (C-SMART-PTR)
    • Conversions live on the most specific type involved (C-CONV-SPECIFIC)
    • Functions with a clear receiver are methods (C-METHOD)
    • Functions do not take out-parameters (C-NO-OUT)
    • Operator overloads are unsurprising (C-OVERLOAD)
    • Only smart pointers implement Deref and DerefMut (C-DEREF)
    • Deref and DerefMut never fail (C-DEREF-FAIL)
    • Constructors are static, inherent methods (C-CTOR)
  • Flexibility (crate supports diverse real-world use cases)
    • Functions expose intermediate results to avoid duplicate work (C-INTERMEDIATE)
    • Caller decides where to copy and place data (C-CALLER-CONTROL)
    • Functions minimize assumptions about parameters by using generics (C-GENERIC)
    • Traits are object-safe if they may be useful as a trait object (C-OBJECT)
  • Type safety (crate leverages the type system effectively)
    • Newtypes provide static distinctions (C-NEWTYPE)
    • Arguments convey meaning through types, not bool or Option (C-CUSTOM-TYPE)
    • Types for a set of flags are bitflags, not enums (C-BITFLAG)
    • Builders enable construction of complex values (C-BUILDER)
  • Dependability (crate is unlikely to do the wrong thing)
  • Debuggability (crate is conducive to easy debugging)
  • Future proofing (crate is free to improve without breaking users' code)
  • Necessities (to whom they matter, they really matter)
    • Public dependencies of a stable crate are stable (C-STABLE)
    • Crate and its dependencies have a permissive license (C-PERMISSIVE)

Organization

Crate root re-exports common functionality (C-REEXPORT)

Crates pub use the most common types for convenience, so that clients do not have to remember or write the crate's module hierarchy to use these types.

Re-exporting is covered in more detail in the The Rust Programming Language under Crates and Modules.

Examples from serde_json

The serde_json::Value type is the most commonly used type from serde_json. It is a re-export of a type that lives elsewhere in the module hierarchy, at serde_json::value::Value. The serde_json::value module defines other JSON-value-related things that are not re-exported. For example serde_json::value::Index is the trait that defines types that can be used to index into a Value using square bracket indexing notation. The Index trait is not re-exported at the crate root because it would be comparatively rare for a client crate to need to refer to it.

In addition to types, functions can be re-exported as well. In serde_json the serde_json::from_str function is a re-export of a function from the serde_json::de deserialization module, which contains other less common deserialization-related functionality that is not re-exported.

Modules provide a sensible API hierarchy (C-HIERARCHY)

Examples from Serde

The serde crate is two independent frameworks in one crate - a serialization half and a deserialization half. The crate is divided accordingly into serde::ser and serde::de. Part of the deserialization framework is isolated under serde::de::value because it is a relatively large API surface that is relatively unimportant, and it would crowd the more common, more important functionality located in serde::de if it were to share the same namespace.

Naming

Casing conforms to RFC 430 (C-CASE)

Basic Rust naming conventions are described in RFC 430.

In general, Rust tends to use CamelCase for "type-level" constructs (types and traits) and snake_case for "value-level" constructs. More precisely:

Item Convention
Crates unclear
Modules snake_case
Types CamelCase
Traits CamelCase
Enum variants CamelCase
Functions snake_case
Methods snake_case
General constructors new or with_more_details
Conversion constructors from_some_other_type
Local variables snake_case
Static variables SCREAMING_SNAKE_CASE
Constant variables SCREAMING_SNAKE_CASE
Type parameters concise CamelCase, usually single uppercase letter: T
Lifetimes short lowercase, usually a single letter: 'a, 'de, 'src

In CamelCase, acronyms count as one word: use Uuid rather than UUID. In snake_case, acronyms are lower-cased: is_xid_start.

In snake_case or SCREAMING_SNAKE_CASE, a "word" should never consist of a single letter unless it is the last "word". So, we have btree_map rather than b_tree_map, but PI_2 rather than PI2.

Examples from the standard library

The whole standard library. This guideline should be easy!

Ad-hoc conversions follow as_, to_, into_ conventions (C-CONV)

Conversions should be provided as methods, with names prefixed as follows:

Prefix Cost Ownership
as_ Free borrowed -> borrowed
to_ Expensive borrowed -> owned
into_ Variable owned -> owned

For example:

  • str::as_bytes() gives a &[u8] view into a &str, which is free.
  • str::to_owned() copies a &str to a new String, which may require memory allocation.
  • String::into_bytes() takes ownership a String and yields the underlying Vec<u8>, which is free.
  • BufReader::into_inner() takes ownership of a buffered reader and extracts out the underlying reader, which is free. Data in the buffer is discarded.
  • BufWriter::into_inner() takes ownership of a buffered writer and extracts out the underlying writer, which requires a potentially expensive flush of any buffered data.

Conversions prefixed as_ and into_ typically decrease abstraction, either exposing a view into the underlying representation (as) or deconstructing data into its underlying representation (into). Conversions prefixed to_, on the other hand, typically stay at the same level of abstraction but do some work to change one representation into another.

More examples from the standard library

Methods on collections that produce iterators follow iter, iter_mut, into_iter (C-ITER)

Per RFC 199.

For a container with elements of type U, iterator methods should be named:

fn iter(&self) -> Iter             // Iter implements Iterator<Item = &U>
fn iter_mut(&mut self) -> IterMut  // IterMut implements Iterator<Item = &mut U>
fn into_iter(self) -> IntoIter     // IntoIter implements Iterator<Item = U>

This guideline applies to data structures that are conceptually homogeneous collections. As a counterexample, the str type is slice of bytes that are guaranteed to be valid UTF-8. This is conceptually more nuanced than a homogeneous collection so rather than providing the iter/iter_mut/into_iter group of iterator methods, it provides str::bytes to iterate as bytes and str::chars to iterate as chars.

This guideline applies to methods only, not functions. For example percent_encode from the url crate returns an iterator over percent-encoded string fragments. There would be no clarity to be had by using an iter/iter_mut/into_iter convention.

Examples from the standard library

Iterator type names match the methods that produce them (C-ITER-TY)

A method called into_iter() should return a type called IntoIter and similarly for all other methods that return iterators.

This guideline applies chiefly to methods, but often makes sense for functions as well. For example the percent_encode function from the url crate returns an iterator type called PercentEncode.

These type names make the most sense when prefixed with their owning module, for example vec::IntoIter.

Examples from the standard library

Ownership suffixes use _mut, _ref (C-OWN-SUFFIX)

Functions often come in multiple variants: immutably borrowed, mutably borrowed, and owned.

The right default depends on the function in question. Variants should be marked through suffixes.

Exceptions

In the case of iterators, the moving variant can also be understood as an into conversion, into_iter, and for x in v.into_iter() reads arguably better than for x in v.iter_move(), so the convention is into_iter.

For mutably borrowed variants, if the mut qualifier is part of a type name, it should appear as it would appear in the type. For example Vec::as_mut_slice returns a mut slice; it does what it says.

Immutably borrowed by default

If foo uses/produces an immutable borrow by default, use:

  • The _mut suffix (e.g. foo_mut) for the mutably borrowed variant.
  • The _move suffix (e.g. foo_move) for the owned variant.
Examples from the standard library

TODO rust-api-guidelines#37

Owned by default

If foo uses/produces owned data by default, use:

  • The _ref suffix (e.g. foo_ref) for the immutably borrowed variant.
  • The _mut suffix (e.g. foo_mut) for the mutably borrowed variant.
Examples from the standard library

Single-element containers implement appropriate getters (C-GETTERS)

Single-element contains where accessing the element cannot fail should implement get and get_mut, with the following signatures.

fn get(&self) -> &V;
fn get_mut(&mut self) -> &mut V;

Single-element containers where the element is Copy (e.g. Cell-like containers) should instead return the value directly, and not implement a mutable accessor. TODO rust-api-guidelines#44

fn get(&self) -> V;

For getters that do runtime validation, consider adding unsafe _unchecked variants.

unsafe fn get_unchecked(&self, index) -> &V;
Examples from the standard library

Interoperability

Types eagerly implement common traits (C-COMMON-TRAITS)

Rust's trait system does not allow orphans: roughly, every impl must live either in the crate that defines the trait or the implementing type. Consequently, crates that define new types should eagerly implement all applicable, common traits.

To see why, consider the following situation:

  • Crate std defines trait Display.
  • Crate url defines type Url, without implementing Display.
  • Crate webapp imports from both std and url,

There is no way for webapp to add Display to url, since it defines neither. (Note: the newtype pattern can provide an efficient, but inconvenient workaround.

The most important common traits to implement from std are:

Conversions use the standard traits From, AsRef, AsMut (C-CONV-TRAITS)

The following conversion traits should be implemented where it makes sense:

The following conversion traits should never be implemented:

These traits have a blanket impl based on From and TryFrom. Implement those instead.

Examples from the standard library
  • From<u16> is implemented for u32 because a smaller integer can always be converted to a bigger integer.
  • From<u32> is not implemented for u16 because the conversion may not be possible if the integer is too big.
  • TryFrom<u32> is implemented for u16 and returns an error if the integer is too big to fit in u16.
  • From<Ipv6Addr> is implemented for IpAddr, which is a type that can represent both v4 and v6 IP addresses.

Collections implement FromIterator and Extend (C-COLLECT)

FromIterator and Extend enable collections to be used conveniently with the following iterator methods:

FromIterator is for creating a new collection containing items from an iterator, and Extend is for adding items from an iterator onto an existing collection.

Examples from the standard library
  • Vec<T> implements both FromIterator<T> and Extend<T>.

Data structures implement Serde's Serialize, Deserialize (C-SERDE)

Types that play the role of a data structure should implement Serialize and Deserialize.

An example of a type that plays the role of a data structure is linked_hash_map::LinkedHashMap.

An example of a type that does not play the role of a data structure is byteorder::LittleEndian.

Crate has a "serde" cfg option that enables Serde (C-SERDE-CFG)

If the crate relies on serde_derive to provide Serde impls, the name of the cfg can still be simply "serde" by using this workaround. Do not use a different name for the cfg like "serde_impls" or "serde_serialization".

Types are Send and Sync where possible (C-SEND-SYNC)

Send and Sync are automatically implemented when the compiler determines it is appropriate.

In types that manipulate raw pointers, be vigilant that the Send and Sync status of your type accurately reflects its thread safety characteristics. Tests like the following can help catch unintentional regressions in whether the type implements Send or Sync.

#[test]
fn test_send() {
    fn assert_send<T: Send>() {}
    assert_send::<MyStrangeType>();
}

#[test]
fn test_sync() {
    fn assert_sync<T: Sync>() {}
    assert_sync::<MyStrangeType>();
}

Error types are Send and Sync (C-SEND-SYNC-ERR)

An error that is not Send cannot be returned by a thread run with thread::spawn. An error that is not Sync cannot be passed across threads using an Arc. These are common requirements for basic error handling in a multithreaded application.

Error types are meaningful, not () (C-MEANINGFUL-ERR)

When defining functions that return Result, and the error carries no useful additional information, do not use () as the error type. () does not implement std::error::Error, and this causes problems for callers that expect to be able to convert errors to Error. Common error handling libraries like error-chain expect errors to implement Error.

Instead, define a meaningful error type specific to your crate.

Examples from the standard library
  • ParseBoolError is returned when failing to parse a bool from a string.

Binary number types provide Hex, Octal, Binary formatting (C-NUM-FMT)

These traits control the representation of a type under the {:X}, {:x}, {:o}, and {:b} format specifiers.

Implement these traits for any number type on which you would consider doing bitwise manipulations like | or &. This is especially appropriate for bitflag types. Numeric quantity types like struct Nanoseconds(u64) probably do not need these.

Macros

Input syntax is evocative of the output (C-EVOCATIVE)

Rust macros let you dream up practically whatever input syntax you want. Aim to keep input syntax familiar and cohesive with the rest of your users' code by mirroring existing Rust syntax where possible. Pay attention to the choice and placement of keywords and punctuation.

A good guide is to use syntax, especially keywords and punctuation, that is similar to what will be produced in the output of the macro.

For example if your macro declares a struct with a particular name given in the input, preface the name with the keyword struct to signal to readers that a struct is being declared with the given name.

// Prefer this...
bitflags! {
    struct S: u32 { /* ... */ }
}

// ...over no keyword...
bitflags! {
    S: u32 { /* ... */ }
}

// ...or some ad-hoc word.
bitflags! {
    flags S: u32 { /* ... */ }
}

Another example is semicolons vs commas. Constants in Rust are followed by semicolons so if your macro declares a chain of constants, they should likely be followed by semicolons even if the syntax is otherwise slightly different from Rust's.

// Ordinary constants use semicolons.
const A: u32 = 0b000001;
const B: u32 = 0b000010;

// So prefer this...
bitflags! {
    struct S: u32 {
        const C = 0b000100;
        const D = 0b001000;
    }
}

// ...over this.
bitflags! {
    struct S: u32 {
        const E = 0b010000,
        const F = 0b100000,
    }
}

Macros are so diverse that these specific examples won't be relevant, but think about how to apply the same principles to your situation.

Item macros compose well with attributes (C-MACRO-ATTR)

Macros that produce more than one output item should support adding attributes to any one of those items. One common use case would be putting individual items behind a cfg.

bitflags! {
    struct Flags: u8 {
        #[cfg(windows)]
        const ControlCenter = 0b001;
        #[cfg(unix)]
        const Terminal = 0b010;
    }
}

Macros that produce a struct or enum as output should support attributes so that the output can be used with derive.

bitflags! {
    #[derive(Default, Serialize)]
    struct Flags: u8 {
        const ControlCenter = 0b001;
        const Terminal = 0b010;
    }
}

Item macros work anywhere that items are allowed (C-ANYWHERE)

Rust allows items to be placed at the module level or within a tighter scope like a function. Item macros should work equally well as ordinary items in all of these places. The test suite should include invocations of the macro in at least the module scope and function scope.

#[cfg(test)]
mod tests {
    test_your_macro_in_a!(module);

    #[test]
    fn anywhere() {
        test_your_macro_in_a!(function);
    }
}

As a simple example of how things can go wrong, this macro works great in a module scope but fails in a function scope.

macro_rules! broken {
    ($m:ident :: $t:ident) => {
        pub struct $t;
        pub mod $m {
            pub use super::$t;
        }
    }
}

broken!(m::T); // okay, expands to T and m::T

fn g() {
    broken!(m::U); // fails to compile, super::U refers to the containing module not g
}

Item macros support visibility specifiers (C-MACRO-VIS)

Follow Rust syntax for visibility of items produced by a macro. Private by default, public if pub is specified.

bitflags! {
    struct PrivateFlags: u8 {
        const A = 0b0001;
        const B = 0b0010;
    }
}

bitflags! {
    pub struct PublicFlags: u8 {
        const C = 0b0100;
        const D = 0b1000;
    }
}

Type fragments are flexible (C-MACRO-TY)

If your macro accepts a type fragment like $t:ty in the input, it should be usable with all of the following:

  • Primitives: u8, &str
  • Relative paths: m::Data
  • Absolute paths: ::base::Data
  • Upward relative paths: super::Data
  • Generics: Vec<String>

As a simple example of how things can go wrong, this macro works great with primitives and absolute paths but fails with relative paths.

macro_rules! broken {
    ($m:ident => $t:ty) => {
        pub mod $m {
            pub struct Wrapper($t);
        }
    }
}

broken!(a => u8); // okay

broken!(b => ::std::marker::PhantomData<()>); // okay

struct S;
broken!(c => S); // fails to compile

Documentation

Crate level docs are thorough and include examples (C-CRATE-DOC)

See RFC 1687.

All items have a rustdoc example (C-EXAMPLE)

Every public module, trait, struct, enum, function, method, macro, and type definition should have an example that exercises the functionality.

The purpose of an example is not always to show how to use the item. For example users can be expected to know how to instantiate and match on an enum like enum E { A, B }. Rather, an example is often intended to show why someone would want to use the item.

This guideline should be applied within reason.

A link to an applicable example on another item may be sufficient. For example if exactly one function uses a particular type, it may be appropriate to write a single example on either the function or the type and link to it from the other.

Examples use ?, not try!, not unwrap (C-QUESTION-MARK)

Like it or not, example code is often copied verbatim by users. Unwrapping an error should be a conscious decision that the user needs to make.

A common way of structuring fallible example code is the following. The lines beginning with # are compiled by cargo test when building the example but will not appear in user-visible rustdoc.

/// ```rust
/// # use std::error::Error;
/// #
/// # fn try_main() -> Result<(), Box<dyn Error>> {
/// your;
/// example?;
/// code;
/// #
/// #     Ok(())
/// # }
/// #
/// # fn main() {
/// #     try_main().unwrap();
/// # }
/// ```

Function docs include error conditions in "Errors" section (C-ERROR-DOC)

Per RFC 1574.

This applies to trait methods as well. Trait methods for which the implementation is allowed or expected to return an error should be documented with an "Errors" section.

Examples from the standard library

Some implementations of the std::io::Read::read trait method may return an error.

/// Pull some bytes from this source into the specified buffer, returning
/// how many bytes were read.
///
/// ... lots more info ...
///
/// # Errors
///
/// If this function encounters any form of I/O or other error, an error
/// variant will be returned. If an error is returned then it must be
/// guaranteed that no bytes were read.

Function docs include panic conditions in "Panics" section (C-PANIC-DOC)

Per RFC 1574.

This applies to trait methods as well. Traits methods for which the implementation is allowed or expected to panic should be documented with a "Panics" section.

Examples from the standard library

The Vec::insert method may panic.

/// Inserts an element at position `index` within the vector, shifting all
/// elements after it to the right.
///
/// # Panics
///
/// Panics if `index` is out of bounds.

Prose contains hyperlinks to relevant things (C-LINK)

Links to methods within the same type usually look like this:

[`serialize_struct`]: #method.serialize_struct

Links to other types usually look like this:

[`Deserialize`]: trait.Deserialize.html

Links may also point to a parent or child module:

[`Value`]: ../enum.Value.html
[`DeserializeOwned`]: de/trait.DeserializeOwned.html

This guideline is officially recommended by RFC 1574 under the heading "Link all the things".

Cargo.toml publishes CI badges for tier 1 platforms (C-CI)

The Rust compiler regards tier 1 platforms as "guaranteed to work." Specifically they will each satisfy the following requirements:

  • Official binary releases are provided for the platform.
  • Automated testing is set up to run tests for the platform.
  • Landing changes to the rust-lang/rust repository's master branch is gated on tests passing.
  • Documentation for how to use and how to build the platform is available.

Stable, high-profile crates should meet the same level of rigor when it comes to tier 1. To prove it, Cargo.toml should publish CI badges.

[badges]
travis-ci = { repository = "..." }
appveyor = { repository = "..." }

Cargo.toml includes all common metadata (C-METADATA)

  • authors
  • description
  • license
  • homepage (though see rust-api-guidelines#26)
  • documentation
  • repository
  • readme
  • keywords
  • categories

Crate sets html_root_url attribute (C-HTML-ROOT)

It should point to "https://docs.rs/$crate/$version".

Cargo.toml should contain a note next to the version to remember to bump the html_root_url when bumping the crate version.

Cargo.toml documentation key points to docs.rs (C-DOCS-RS)

It should point to "https://docs.rs/$crate".

Predictability

Smart pointers do not add inherent methods (C-SMART-PTR)

For example, this is why the Box::into_raw function is defined the way it is.

impl<T> Box<T> where T: ?Sized {
    fn into_raw(b: Box<T>) -> *mut T { /* ... */ }
}

let boxed_str: Box<str> = /* ... */;
let ptr = Box::into_raw(boxed_str);

If this were defined as an inherent method instead, it would be confusing at the call site whether the method being called is a method on Box<T> or a method on T.

impl<T> Box<T> where T: ?Sized {
    // Do not do this.
    fn into_raw(self) -> *mut T { /* ... */ }
}

let boxed_str: Box<str> = /* ... */;

// This is a method on str accessed through the smart pointer Deref impl.
boxed_str.chars()

// This is a method on Box<str>...?
boxed_str.into_raw()

Conversions live on the most specific type involved (C-CONV-SPECIFIC)

When in doubt, prefer to_/as_/into_ to from_, because they are more ergonomic to use (and can be chained with other methods).

For many conversions between two types, one of the types is clearly more "specific": it provides some additional invariant or interpretation that is not present in the other type. For example, str is more specific than &[u8], since it is a UTF-8 encoded sequence of bytes.

Conversions should live with the more specific of the involved types. Thus, str provides both the as_bytes method and the from_utf8 constructor for converting to and from &[u8] values. Besides being intuitive, this convention avoids polluting concrete types like &[u8] with endless conversion methods.

Functions with a clear receiver are methods (C-METHOD)

Prefer

impl Foo {
    pub fn frob(&self, w: widget) { /* ... */ }
}

over

pub fn frob(foo: &Foo, w: widget) { /* ... */ }

for any operation that is clearly associated with a particular type.

Methods have numerous advantages over functions:

  • They do not need to be imported or qualified to be used: all you need is a value of the appropriate type.
  • Their invocation performs autoborrowing (including mutable borrows).
  • They make it easy to answer the question "what can I do with a value of type T" (especially when using rustdoc).
  • They provide self notation, which is more concise and often more clearly conveys ownership distinctions.

Functions do not take out-parameters (C-NO-OUT)

Prefer

fn foo() -> (Bar, Bar)

over

fn foo(output: &mut Bar) -> Bar

for returning multiple Bar values.

Compound return types like tuples and structs are efficiently compiled and do not require heap allocation. If a function needs to return multiple values, it should do so via one of these types.

The primary exception: sometimes a function is meant to modify data that the caller already owns, for example to re-use a buffer:

fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>

Operator overloads are unsurprising (C-OVERLOAD)

Operators with built in syntax (*, |, and so on) can be provided for a type by implementing the traits in std::ops. These operators come with strong expectations: implement Mul only for an operation that bears some resemblance to multiplication (and shares the expected properties, e.g. associativity), and so on for the other traits.

Only smart pointers implement Deref and DerefMut (C-DEREF)

The Deref traits are used implicitly by the compiler in many circumstances, and interact with method resolution. The relevant rules are designed specifically to accommodate smart pointers, and so the traits should be used only for that purpose.

Examples from the standard library

Deref and DerefMut never fail (C-DEREF-FAIL)

Because the Deref traits are invoked implicitly by the compiler in sometimes subtle ways, failure during dereferencing can be extremely confusing.

Constructors are static, inherent methods (C-CTOR)

In Rust, "constructors" are just a convention:

impl<T> Example<T> {
    pub fn new() -> Example<T> { /* ... */ }
}

Constructors are static (no self) inherent methods for the type that they construct. Combined with the practice of fully importing type names, this convention leads to informative but concise construction:

use example::Example;

// Construct a new Example.
let ex = Example::new();

This convention also applied to conversion constructors (prefix from rather than new).

Constructors for structs with sensible defaults allow clients to concisely override using the struct update syntax.

pub struct Config {
    pub color: Color,
    pub size: Size,
    pub shape: Shape,
}

impl Config {
    pub fn new() -> Config {
        Config {
            color: Brown,
            size: Medium,
            shape: Square,
        }
    }
}

// In user's code.
let config = Config { color: Red, .. Config::new() };
Examples from the standard library

Flexibility

Functions expose intermediate results to avoid duplicate work (C-INTERMEDIATE)

Many functions that answer a question also compute interesting related data. If this data is potentially of interest to the client, consider exposing it in the API.

Examples from the standard library
  • Vec::binary_search does not return a bool of whether the value was found, nor an Option<usize> of the index at which the value was maybe found. Instead it returns information about the index if found, and also the index at which the value would need to be inserted if not found.

  • String::from_utf8 may fail if the input bytes are not UTF-8. In the error case it returns an intermediate result that exposes the byte offset up to which the input was valid UTF-8, as well as handing back ownership of the input bytes.

Caller decides where to copy and place data (C-CALLER-CONTROL)

If a function requires ownership of an argument, it should take ownership of the argument rather than borrowing and cloning the argument.

// Prefer this:
fn foo(b: Bar) {
    /* use b as owned, directly */
}

// Over this:
fn foo(b: &Bar) {
    let b = b.clone();
    /* use b as owned after cloning */
}

If a function does not require ownership of an argument, it should take a shared or exclusive borrow of the argument rather than taking ownership and dropping the argument.

// Prefer this:
fn foo(b: &Bar) {
    /* use b as borrowed */
}

// Over this:
fn foo(b: Bar) {
    /* use b as borrowed, it is implicitly dropped before function returns */
}

The Copy trait should only be used as a bound when absolutely needed, not as a way of signaling that copies should be cheap to make.

Functions minimize assumptions about parameters by using generics (C-GENERIC)

The fewer assumptions a function makes about its inputs, the more widely usable it becomes.

Prefer

fn foo<I: Iterator<Item = i64>>(iter: I) { /* ... */ }

over any of

fn foo(c: &[i64]) { /* ... */ }
fn foo(c: &Vec<i64>) { /* ... */ }
fn foo(c: &SomeOtherCollection<i64>) { /* ... */ }

if the function only needs to iterate over the data.

More generally, consider using generics to pinpoint the assumptions a function needs to make about its arguments.

Advantages of generics
  • Reusability. Generic functions can be applied to an open-ended collection of types, while giving a clear contract for the functionality those types must provide.

  • Static dispatch and optimization. Each use of a generic function is specialized ("monomorphized") to the particular types implementing the trait bounds, which means that (1) invocations of trait methods are static, direct calls to the implementation and (2) the compiler can inline and otherwise optimize these calls.

  • Inline layout. If a struct and enum type is generic over some type parameter T, values of type T will be laid out inline in the struct/enum, without any indirection.

  • Inference. Since the type parameters to generic functions can usually be inferred, generic functions can help cut down on verbosity in code where explicit conversions or other method calls would usually be necessary.

  • Precise types. Because generic give a name to the specific type implementing a trait, it is possible to be precise about places where that exact type is required or produced. For example, a function

    fn binary<T: Trait>(x: T, y: T) -> T

    is guaranteed to consume and produce elements of exactly the same type T; it cannot be invoked with parameters of different types that both implement Trait.

Disadvantages of generics
  • Code size. Specializing generic functions means that the function body is duplicated. The increase in code size must be weighed against the performance benefits of static dispatch.

  • Homogeneous types. This is the other side of the "precise types" coin: if T is a type parameter, it stands for a single actual type. So for example a Vec<T> contains elements of a single concrete type (and, indeed, the vector representation is specialized to lay these out in line). Sometimes heterogeneous collections are useful; see trait objects.

  • Signature verbosity. Heavy use of generics can make it more difficult to read and understand a function's signature.

Examples from the standard library
  • std::fs::File::open takes an argument of generic type AsRef<Path>. This allows files to be opened conveniently from a string literal "f.txt", a Path, an OsString, and a few other types.

Traits are object-safe if they may be useful as a trait object (C-OBJECT)

Trait objects have some significant limitations: methods invoked through a trait object cannot use generics, and cannot use Self except in receiver position.

When designing a trait, decide early on whether the trait will be used as an object or as a bound on generics.

If a trait is meant to be used as an object, its methods should take and return trait objects rather than use generics.

A where clause of Self: Sized may be used to exclude specific methods from the trait's object. The following trait is not object-safe due to the generic method.

trait MyTrait {
    fn object_safe(&self, i: i32);

    fn not_object_safe<T>(&self, t: T);
}

Adding a requirement of Self: Sized to the generic method excludes it from the trait object and makes the trait object-safe.

trait MyTrait {
    fn object_safe(&self, i: i32);

    fn not_object_safe<T>(&self, t: T) where Self: Sized;
}
Advantages of trait objects
  • Heterogeneity. When you need it, you really need it.
  • Code size. Unlike generics, trait objects do not generate specialized (monomorphized) versions of code, which can greatly reduce code size.
Disadvantages of trait objects
  • No generic methods. Trait objects cannot currently provide generic methods.
  • Dynamic dispatch and fat pointers. Trait objects inherently involve indirection and vtable dispatch, which can carry a performance penalty.
  • No Self. Except for the method receiver argument, methods on trait objects cannot use the Self type.
Examples from the standard library
  • The io::Read and io::Write traits are often used as objects.
  • The Iterator trait has several generic methods marked with where Self: Sized to retain the ability to use Iterator as an object.

Type safety

Newtypes provide static distinctions (C-NEWTYPE)

Newtypes can statically distinguish between different interpretations of an underlying type.

For example, a f64 value might be used to represent a quantity in miles or in kilometers. Using newtypes, we can keep track of the intended interpretation:

struct Miles(pub f64);
struct Kilometers(pub f64);

impl Miles {
    fn as_kilometers(&self) -> Kilometers { /* ... */ }
}
impl Kilometers {
    fn as_miles(&self) -> Miles { /* ... */ }
}

Once we have separated these two types, we can statically ensure that we do not confuse them. For example, the function

fn are_we_there_yet(distance_travelled: Miles) -> bool { /* ... */ }

cannot accidentally be called with a Kilometers value. The compiler will remind us to perform the conversion, thus averting certain catastrophic bugs.

Arguments convey meaning through types, not bool or Option (C-CUSTOM-TYPE)

Prefer

let w = Widget::new(Small, Round)

over

let w = Widget::new(true, false)

Core types like bool, u8 and Option have many possible interpretations.

Use custom types (whether enums, struct, or tuples) to convey interpretation and invariants. In the above example, it is not immediately clear what true and false are conveying without looking up the argument names, but Small and Round are more suggestive.

Using custom types makes it easier to expand the options later on, for example by adding an ExtraLarge variant.

See the newtype pattern for a no-cost way to wrap existing types with a distinguished name.

Types for a set of flags are bitflags, not enums (C-BITFLAG)

Rust supports enum types with explicitly specified discriminants:

enum Color {
    Red = 0xff0000,
    Green = 0x00ff00,
    Blue = 0x0000ff,
}

Custom discriminants are useful when an enum type needs to be serialized to an integer value compatibly with some other system/language. They support "typesafe" APIs: by taking a Color, rather than an integer, a function is guaranteed to get well-formed inputs, even if it later views those inputs as integers.

An enum allows an API to request exactly one choice from among many. Sometimes an API's input is instead the presence or absence of a set of flags. In C code, this is often done by having each flag correspond to a particular bit, allowing a single integer to represent, say, 32 or 64 flags. Rust's bitflags crate provides a typesafe representation of this pattern.

#[macro_use]
extern crate bitflags;

bitflags! {
    flags Flags: u32 {
        const FLAG_A = 0b00000001,
        const FLAG_B = 0b00000010,
        const FLAG_C = 0b00000100,
    }
}

fn f(settings: Flags) {
    if settings.contains(FLAG_A) {
        println!("doing thing A");
    }
    if settings.contains(FLAG_B) {
        println!("doing thing B");
    }
    if settings.contains(FLAG_C) {
        println!("doing thing C");
    }
}

fn main() {
    f(FLAG_A | FLAG_C);
}

Builders enable construction of complex values (C-BUILDER)

Some data structures are complicated to construct, due to their construction needing:

  • a large number of inputs
  • compound data (e.g. slices)
  • optional configuration data
  • choice between several flavors

which can easily lead to a large number of distinct constructors with many arguments each.

If T is such a data structure, consider introducing a T builder:

  1. Introduce a separate data type TBuilder for incrementally configuring a T value. When possible, choose a better name: e.g. Command is the builder for a child process, Url can be created from a ParseOptions.
  2. The builder constructor should take as parameters only the data required to make a T.
  3. The builder should offer a suite of convenient methods for configuration, including setting up compound inputs (like slices) incrementally. These methods should return self to allow chaining.
  4. The builder should provide one or more "terminal" methods for actually building a T.

The builder pattern is especially appropriate when building a T involves side effects, such as spawning a task or launching a process.

In Rust, there are two variants of the builder pattern, differing in the treatment of ownership, as described below.

Non-consuming builders (preferred):

In some cases, constructing the final T does not require the builder itself to be consumed. The follow variant on std::process::Command is one example:

// NOTE: the actual Command API does not use owned Strings;
// this is a simplified version.

pub struct Command {
    program: String,
    args: Vec<String>,
    cwd: Option<String>,
    // etc
}

impl Command {
    pub fn new(program: String) -> Command {
        Command {
            program: program,
            args: Vec::new(),
            cwd: None,
        }
    }

    /// Add an argument to pass to the program.
    pub fn arg(&mut self, arg: String) -> &mut Command {
        self.args.push(arg);
        self
    }

    /// Add multiple arguments to pass to the program.
    pub fn args(&mut self, args: &[String]) -> &mut Command {
        self.args.extend_from_slice(args);
        self
    }

    /// Set the working directory for the child process.
    pub fn current_dir(&mut self, dir: String) -> &mut Command {
        self.cwd = Some(dir);
        self
    }

    /// Executes the command as a child process, which is returned.
    pub fn spawn(&self) -> io::Result<Child> {
        /* ... */
    }
}

Note that the spawn method, which actually uses the builder configuration to spawn a process, takes the builder by immutable reference. This is possible because spawning the process does not require ownership of the configuration data.

Because the terminal spawn method only needs a reference, the configuration methods take and return a mutable borrow of self.

The benefit

By using borrows throughout, Command can be used conveniently for both one-liner and more complex constructions:

// One-liners
Command::new("/bin/cat").arg("file.txt").spawn();

// Complex configuration
let mut cmd = Command::new("/bin/ls");
cmd.arg(".");
if size_sorted {
    cmd.arg("-S");
}
cmd.spawn();

Consuming builders:

Sometimes builders must transfer ownership when constructing the final type T, meaning that the terminal methods must take self rather than &self.

impl TaskBuilder {
    /// Name the task-to-be.
    pub fn named(mut self, name: String) -> TaskBuilder {
        self.name = Some(name);
        self
    }

    /// Redirect task-local stdout.
    pub fn stdout(mut self, stdout: Box<io::Write + Send>) -> TaskBuilder {
        self.stdout = Some(stdout);
        self
    }

    /// Creates and executes a new child task.
    pub fn spawn<F>(self, f: F) where F: FnOnce() + Send {
        /* ... */
    }
}

Here, the stdout configuration involves passing ownership of an io::Write, which must be transferred to the task upon construction (in spawn).

When the terminal methods of the builder require ownership, there is a basic tradeoff:

  • If the other builder methods take/return a mutable borrow, the complex configuration case will work well, but one-liner configuration becomes impossible.

  • If the other builder methods take/return an owned self, one-liners continue to work well but complex configuration is less convenient.

Under the rubric of making easy things easy and hard things possible, all builder methods for a consuming builder should take and returned an owned self. Then client code works as follows:

// One-liners
TaskBuilder::new("my_task").spawn(|| { /* ... */ });

// Complex configuration
let mut task = TaskBuilder::new();
task = task.named("my_task_2"); // must re-assign to retain ownership
if reroute {
    task = task.stdout(mywriter);
}
task.spawn(|| { /* ... */ });

One-liners work as before, because ownership is threaded through each of the builder methods until being consumed by spawn. Complex configuration, however, is more verbose: it requires re-assigning the builder at each step.

Dependability

Functions validate their arguments (C-VALIDATE)

Rust APIs do not generally follow the robustness principle: "be conservative in what you send; be liberal in what you accept".

Instead, Rust code should enforce the validity of input whenever practical.

Enforcement can be achieved through the following mechanisms (listed in order of preference).

Static enforcement:

Choose an argument type that rules out bad inputs.

For example, prefer

fn foo(a: Ascii) { /* ... */ }

over

fn foo(a: u8) { /* ... */ }

where Ascii is a wrapper around u8 that guarantees the highest bit is zero; see newtype patterns for more details on creating typesafe wrappers.

Static enforcement usually comes at little run-time cost: it pushes the costs to the boundaries (e.g. when a u8 is first converted into an Ascii). It also catches bugs early, during compilation, rather than through run-time failures.

On the other hand, some properties are difficult or impossible to express using types.

Dynamic enforcement:

Validate the input as it is processed (or ahead of time, if necessary). Dynamic checking is often easier to implement than static checking, but has several downsides:

  1. Runtime overhead (unless checking can be done as part of processing the input).
  2. Delayed detection of bugs.
  3. Introduces failure cases, either via fail! or Result/Option types, which must then be dealt with by client code.

Dynamic enforcement with debug_assert!:

Same as dynamic enforcement, but with the possibility of easily turning off expensive checks for production builds.

Dynamic enforcement with opt-out:

Same as dynamic enforcement, but adds sibling functions that opt out of the checking.

The convention is to mark these opt-out functions with a suffix like _unchecked or by placing them in a raw submodule.

The unchecked functions can be used judiciously in cases where (1) performance dictates avoiding checks and (2) the client is otherwise confident that the inputs are valid.

Destructors never fail (C-DTOR-FAIL)

Destructors are executed on task failure, and in that context a failing destructor causes the program to abort.

Instead of failing in a destructor, provide a separate method for checking for clean teardown, e.g. a close method, that returns a Result to signal problems.

Destructors that may block have alternatives (C-DTOR-BLOCK)

Similarly, destructors should not invoke blocking operations, which can make debugging much more difficult. Again, consider providing a separate method for preparing for an infallible, nonblocking teardown.

Debuggability

All public types implement Debug (C-DEBUG)

If there are exceptions, they are rare.

Debug representation is never empty (C-DEBUG-NONEMPTY)

Even for conceptually empty values, the Debug representation should never be empty.

let empty_str = "";
assert_eq!(format!("{:?}", empty_str), "\"\"");

let empty_vec = Vec::<bool>::new();
assert_eq!(format!("{:?}", empty_vec), "[]");

Future proofing

Structs have private fields (C-STRUCT-PRIVATE)

Making a field public is a strong commitment: it pins down a representation choice, and prevents the type from providing any validation or maintaining any invariants on the contents of the field, since clients can mutate it arbitrarily.

Public fields are most appropriate for struct types in the C spirit: compound, passive data structures. Otherwise, consider providing getter/setter methods and hiding fields instead.

Newtypes encapsulate implementation details (C-NEWTYPE-HIDE)

A newtype can be used to hide representation details while making precise promises to the client.

For example, consider a function my_transform that returns a compound iterator type.

use std::iter::{Enumerate, Skip};

pub fn my_transform<I: Iterator>(input: I) -> Enumerate<Skip<I>> {
    input.skip(3).enumerate()
}

We wish to hide this type from the client, so that the client's view of the return type is roughly Iterator<Item = (usize, T)>. We can do so using the newtype pattern:

use std::iter::{Enumerate, Skip};

pub struct MyTransformResult<I>(Enumerate<Skip<I>>);

impl<I: Iterator> Iterator for MyTransformResult<I> {
    type Item = (usize, I::Item);

    fn next(&mut self) -> Option<Self::Item> {
        self.0.next()
    }
}

pub fn my_transform<I: Iterator>(input: I) -> MyTransformResult<I> {
    MyTransformResult(input.skip(3).enumerate())
}

Aside from simplifying the signature, this use of newtypes allows us to promise less to the client. The client does not know how the result iterator is constructed or represented, which means the representation can change in the future without breaking client code.

In the future the same thing can be accomplished more concisely with the impl Trait feature but this is currently unstable.

#![feature(conservative_impl_trait)]

pub fn my_transform<I: Iterator>(input: I) -> impl Iterator<Item = (usize, I::Item)> {
    input.skip(3).enumerate()
}

Necessities

Public dependencies of a stable crate are stable (C-STABLE)

A crate cannot be stable (>=1.0.0) without all of its public dependencies being stable.

Public dependencies are crates from which types are used in the public API of the current crate.

pub fn do_my_thing(arg: other_crate::TheirThing) { /* ... */ }

A crate containing this function cannot be stable unless other_crate is also stable.

Be careful because public dependencies can sneak in at unexpected places.

pub struct Error {
    private: ErrorImpl,
}

enum ErrorImpl {
    Io(io::Error),
    // Should be okay even if other_crate isn't
    // stable, because ErrorImpl is private.
    Dep(other_crate::Error),
}

// Oh no! This puts other_crate into the public API
// of the current crate.
impl From<other_crate::Error> for Error {
    fn from(err: other_crate::Error) -> Self {
        Error { private: ErrorImpl::Dep(err) }
    }
}

Crate and its dependencies have a permissive license (C-PERMISSIVE)

The software produced by the Rust project is dual-licensed, under either the MIT or Apache 2.0 licenses. Crates that simply need the maximum compatibility with the Rust ecosystem are recommended to do the same, in the manner described herein. Other options are described below.

These API guidelines do not provide a detailed explanation of Rust's license, but there is a small amount said in the Rust FAQ. These guidelines are concerned with matters of interoperability with Rust, and are not comprehensive over licensing options.

To apply the Rust license to your project, define the license field in your Cargo.toml as:

[package]
name = "..."
version = "..."
authors = ["..."]
license = "MIT/Apache-2.0"

And toward the end of your README.md:

## License

Licensed under either of

 * Apache License, Version 2.0
   ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
 * MIT license
   ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)

at your option.

### Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in the work by you, as defined in the Apache-2.0 license, shall be
dual licensed as above, without any additional terms or conditions.

Besides the dual MIT/Apache-2.0 license, another common licensing approach used by Rust crate authors is to apply a single permissive license such as MIT or BSD. This license scheme is also entirely compatible with Rust's, because it imposes the minimal restrictions of Rust's MIT license.

Crates that desire perfect license compatibility with Rust are not recommended to choose only the Apache license. The Apache license, though it is a permissive license, imposes restrictions beyond the MIT and BSD licenses that can discourage or prevent their use in some scenarios, so Apache-only software cannot be used in some situations where most of the Rust runtime stack can.

The license of a crate's dependencies can affect the restrictions on distribution of the crate itself, so a permissively-licensed crate should generally only depend on permissively-licensed crates.

External Links

License

This guidelines document is licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this document by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.