Merge pull request #40 from SciML/ChrisRackauckas-patch-1
Add a bunch of safety tips from the JuliaHub Secure Julia Coding Best…
ChrisRackauckas authored Nov 19, 2023
2 parents a6f9c08 + 5151810 commit 9807778
Showing 1 changed file with 197 additions and 0 deletions.
197 changes: 197 additions & 0 deletions README.md
@@ -34,6 +34,17 @@ the style guide.
- [Prefer code reuse over rewrites whenever possible](#prefer-code-reuse-over-rewrites-whenever-possible)
- [Prefer to not shadow functions](#prefer-to-not-shadow-functions)
- [Avoid unmaintained dependencies](#avoid-unmaintained-dependencies)
- [Avoid unsafe operations](#avoid-unsafe-operations)
- [Avoid non public operations in Julia Base and packages](#avoid-non-public-operations-in-julia-base-and-packages)
- [Always default to constructs which initialize data](#always-default-to-constructs-which-initialize-data)
- [Use extra precaution when running external processes](#use-extra-precaution-when-running-external-processes)
- [Avoid eval whenever possible](#avoid-eval-whenever-possible)
- [Avoid bounds check removal, and if done, add appropriate manual checks](#avoid-bounds-check-removal-and-if-done-add-appropriate-manual-checks)
- [Avoid ccall unless necessary, and use safe ccall practices when required](#avoid-ccall-unless-necessary-and-use-safe-ccall-practices-when-required)
- [Validate all user inputs to avoid code injection](#validate-all-user-inputs-to-avoid-code-injection)
- [Ensure secure random number generators are used when required](#ensure-secure-random-number-generators-are-used-when-required)
- [Be aware of distributed computing encryption principles](#be-aware-of-distributed-computing-encryption-principles)
- [Always immediately flush secret data after handling](#always-immediately-flush-secret-data-after-handling)
- [Specific Rules](#specific-rules)
- [High Level Rules](#high-level-rules)
- [General Naming Principles](#general-naming-principles)
@@ -373,6 +384,190 @@ then all dependencies from that organization should be considered for removal. N
may take a long time to fix, so it may take more time than 2 weeks to fix, it's simply that the
communication should be open, consistent, and timely.

### Avoid unsafe operations

Like other high-level languages that provide strong safety guarantees by default, Julia nevertheless has a small
set of operations that bypass normal checks. Most of these operations are clearly marked with the prefix `unsafe_`;
`ccall`, `@ccall`, and `@inbounds` are the exceptions. By using an
“unsafe” operation, the programmer asserts that they know the operation is valid even though the language cannot
automatically ensure it. For high reliability, these constructs should be avoided or carefully inspected during code
review. They are:

* `unsafe_load`
* `unsafe_store!`
* `unsafe_read`
* `unsafe_write`
* `unsafe_string`
* `unsafe_wrap`
* `unsafe_convert`
* `unsafe_copyto!`
* `unsafe_pointer_to_objref`
* `ccall`
* `@ccall`
* `@inbounds`

### Avoid non public operations in Julia Base and packages

The Julia standard library and packages developed in the Julia programming language have an intended public API indicated by
marking symbols with the `export` keyword or [in v1.11+ with the new `public` keyword](https://github.com/JuliaLang/julia/pull/50105).
However, it is possible to use non-public names via explicit qualification, e.g. `Base.foobar`. This practice is not necessarily unsafe,
but it should be avoided since non-public operations may have unexpected invariants and behaviors, and are subject to change in future
releases of the language.

Note that qualified names are commonly used in method definitions to clarify that a function is being extended, e.g. `function Base.getindex(...) … end`.
Such uses do not fall under this concern.
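
As a rough aid during review, one can ask Julia whether a name is exported before relying on it. The sketch below uses `Base.isexported`; the helper name `is_exported_api` is hypothetical, and the check does not cover names that are marked `public` without being exported (v1.11+).

```julia
# Hypothetical helper: true if `name` is exported from module `m`.
# Names that are only marked `public` (Julia 1.11+) are NOT detected here.
is_exported_api(m::Module, name::Symbol) = Base.isexported(m, name)

is_exported_api(Base, :push!)         # `push!` is exported from Base
is_exported_api(Base, :SecretBuffer)  # not exported; it has to be written `Base.SecretBuffer`
```

A name that fails this check is not automatically off limits, but it is a signal to look for documentation that declares it part of the public API.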

### Always default to constructs which initialize data

For certain newly-allocated data structures, such as numeric arrays, the Julia compiler and runtime do not check whether data is accessed before it has been
initialized. Therefore such data structures can “leak” information from one part of a program to another. Uninitialized structures should be avoided in favor
of functions like `zeros` and `fill` that create data with well-defined contents. If code does allocate uninitialized memory, it should ensure that this memory
is fully initialized before being returned from the function in which it is allocated.

Constructs that create uninitialized memory should only be used if there is a demonstrated performance benefit, and the code should ensure that all
memory is initialized within the same function that allocates it.

Example:

```julia
function make_array(n::Int)
    A = Vector{Int}(undef, n)
    # function body
    return A
end
```

This function allocates an integer array with undefined initial contents (note the language forces you to request this explicitly).
A code reviewer should ensure that the function body assigns every element of the array.
One can similarly create structs with undefined fields, and if used this way, one should ensure all fields are initialized:

```julia
struct Foo
    x::Int
    Foo() = new()  # leaves `x` uninitialized
end

julia> Foo().x  # leftover memory contents; the value differs from run to run
139736495677280
```
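
A minimal sketch of the preferred alternatives follows. The function name `squares` and the concrete values are only illustrative.

```julia
# Preferred: constructors that define every element up front.
A = zeros(Int, 4)        # every element is 0
B = fill(-1, 4)          # every element is -1
C = [i^2 for i in 1:4]   # comprehension: each element is computed on creation

# If `undef` is justified by profiling, assign every element before returning.
function squares(n::Int)
    out = Vector{Int}(undef, n)
    for i in eachindex(out)
        out[i] = i^2     # each index of `out` is written exactly once
    end
    return out
end
```

Because the loop runs over `eachindex(out)`, a reviewer can confirm by inspection that no element is returned uninitialized.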

### Use extra precaution when running external processes

The Julia standard library contains a `run` function and other facilities for running external processes. Any program that does this
is only as safe as the external process it runs. If this cannot be avoided, then best practices for using these features are:

1. Only run fixed, known executables, and do not derive the path of the executable to run from user input or other outside sources.
2. Make sure the executables used have also passed required audit procedures.
3. Make sure to handle process failure (non-zero exit code).
4. If possible, run external processes in a sandbox or “jail” environment with access only to what they need in terms of files, file handles and ports.

When run in a sandbox or jail, external processes can actually improve security since the subprocess is isolated from the rest of the system by the kernel.
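
Points 1 and 3 above can be sketched as follows. The helper name `run_ok` is hypothetical, and the Julia executable itself (obtained through `Base.julia_cmd()`) stands in for a fixed, audited tool only so that the example is self-contained.

```julia
# Hypothetical helper: run a command and report whether it exited successfully.
function run_ok(cmd::Cmd)
    # `ignorestatus` stops `run` from throwing on a non-zero exit code,
    # so the status can be inspected and handled explicitly.
    proc = run(ignorestatus(cmd))
    return success(proc)
end

# The executable is fixed in the source and never derived from user input.
julia_exe = Base.julia_cmd()

if !run_ok(`$julia_exe --startup-file=no -e "exit(3)"`)
    @warn "external tool failed; falling back or aborting as appropriate"
end
```

Without `ignorestatus`, `run` throws an exception on failure, which is also an acceptable way to handle it as long as the exception is not silently swallowed.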

### Avoid eval whenever possible

Julia contains an `eval` function that executes program expressions constructed at run time. This is not in itself unsafe, but because the code it will run
is not textually evident in the surrounding program, it can be difficult to determine what it will do. For example, a Julia program could construct and eval
an expression that performs an unsafe operation without the operation being clearly evident to a code reviewer or analysis tool.

In general, programs should try to avoid using `eval` in ways that are influenced by user input because there are many subtle ways this can lead to arbitrary code
execution. If user input must influence `eval`, the input should only be used to select from a known list of possible behaviors. Approaches using pattern matching
to try to validate expressions should be viewed with extreme suspicion because they tend to be brittle and/or exploitable.

Note: it is common for Julia programs to invoke `eval` or `@eval` at the top level, in order to generate global definitions programmatically. Such uses are generally safe.
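
Selecting from a known list of behaviors often removes the need for `eval` altogether. The following is a minimal sketch of that idea; the table `ALLOWED_REDUCTIONS` and the function `apply_reduction` are hypothetical names.

```julia
# Fixed table of permitted operations; user input only chooses a key.
const ALLOWED_REDUCTIONS = Dict{String, Function}(
    "sum" => sum,
    "maximum" => maximum,
    "minimum" => minimum,
)

function apply_reduction(name::AbstractString, data)
    f = get(ALLOWED_REDUCTIONS, name, nothing)
    # Anything outside the table is rejected rather than parsed or evaluated.
    f === nothing && throw(ArgumentError("unsupported reduction: $name"))
    return f(data)
end
```

Here the user-supplied string never becomes code: it is only compared against the keys of a table that was written by the programmer.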

### Avoid bounds check removal, and if done, add appropriate manual checks

While Julia checks the bounds of all array operations by default, it is possible to manually disable bounds checks in a program using `@inbounds`.
Note that in early versions of Julia (pre v1.9) this could be used as a performance optimization, but in later versions it can demonstrably
reduce performance, so bounds check removal should never be the default performance habit.

Uses of this construct should be carefully audited during code review. For maximum safety, it should be avoided, or programs should be run with the
command line option `--check-bounds=yes` to enable all checks regardless of manual annotations.

To check a use of `@inbounds` for correctness, it suffices to examine all array indexing expressions (e.g. `a[i]`) within the expression it applies to,
and ensure that each index will always be within the bounds of the indexed array. For example, the following common use pattern is valid:

```julia
@inbounds for i in eachindex(A)
    A[i] = i
end
```

By inspection, the variable `i` will always be a valid index for `A`.

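For contrast, the following use is invalid unless `A` is known to be a specific type (e.g. `Vector`), because `1:length(A)` assumes one-based indexing, which an arbitrary `AbstractArray` does not guarantee: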

```julia
@inbounds for i in 1:length(A)
    A[i] = i
end
```

`@inbounds` should be applied to as narrow a region of code as possible. When applied to a large block of code, it can be difficult to identify
and verify all indexing expressions.
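
When the indices come from a caller rather than from `eachindex`, a manual check can be placed in front of the `@inbounds` region. The following is a minimal sketch using `checkbounds`; the function name `sum_range` is hypothetical.

```julia
function sum_range(A::AbstractVector, r::AbstractUnitRange{<:Integer})
    # One explicit check up front: throws a BoundsError if any index in `r` is invalid.
    checkbounds(A, r)
    s = zero(eltype(A))
    # Safe to skip per-access checks: every `i` in `r` was validated above.
    @inbounds for i in r
        s += A[i]
    end
    return s
end
```

The `@inbounds` annotation covers only the loop, and the justification for it (the preceding `checkbounds` call) sits directly above it where a reviewer can see it.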

### Avoid ccall unless necessary, and use safe ccall practices when required

Calling C (and Fortran) libraries from Julia is very easy: the `ccall` syntax (and the more convenient `@ccall` macro) allows calling C libraries
without any need for glue files or boilerplate. They do require caution, however: the programmer tells Julia what the signature of each library
function is, and if this is not done correctly, it can be the cause of crashes and thus security vulnerabilities. An exploit is just a crash that
an attacker has arranged to fail in a worse way than it would have randomly.

Safe use of `ccall` depends on both automated and manual measures.

What Julia does (automated):

* Julia provides aliases for C types like `Cint`, `Clong`, `Cchar`, `Csize_t`, etc. It makes sure that these match what the C ABI on the machine that code is running on expects them to be.
* The Clang.jl package automates the process of turning C header files into valid `ccall` invocations.
* Pkg+BinaryBuilder.jl allows precise versioning of binary dependencies.
* Julia objects passed directly to `ccall` are protected from garbage collection (GC) for the duration of the call.

What you must do (manual):

* When writing `ccall` signatures, programmers should always look at the signature in the C header file and make sure the signature used in Julia matches exactly.
* Use Julia’s C type aliases. For example, if an argument in C is of type `int` then the corresponding type in Julia is `Cint`, not `Int`; on most platforms `Int` will be the same size as `Clong` rather than `Cint`.
* If a raw pointer to memory managed by Julia’s GC is passed to C via `ccall`, the owning object must be preserved using `GC.@preserve` around the use of `ccall`. See the documentation of this macro for more information and examples of proper usage.
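
The manual measures above can be sketched with two functions from the C standard library, `strlen` and `memset`. The wrapper names `c_strlen` and `zero_fill!` are hypothetical, and the sketch assumes a platform where these C symbols can be resolved without naming a library explicitly.

```julia
# C signature: size_t strlen(const char *s);
# The Julia signature mirrors it with the C type aliases `Cstring` and `Csize_t`.
c_strlen(s::AbstractString) = @ccall strlen(s::Cstring)::Csize_t

# C signature: void *memset(void *s, int c, size_t n);
function zero_fill!(buf::Vector{UInt8})
    # `pointer(buf)` is a raw pointer, so the GC no longer sees a reference to
    # `buf` through it; `GC.@preserve` keeps `buf` alive for the whole block.
    GC.@preserve buf begin
        p = pointer(buf)
        @ccall memset(p::Ptr{Cvoid}, 0::Cint, length(buf)::Csize_t)::Ptr{Cvoid}
    end
    return buf
end
```

Note the use of `Cint` for the C `int` argument and `Csize_t` for `size_t`; writing `Int` for either would only happen to work on some platforms.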

### Validate all user inputs to avoid code injection

When writing programs that construct any kind of code based on user input, extra caution is required and the user input must be validated or escaped.
For example, a common type of attack in web applications written in all programming languages is SQL injection: a user input is spliced into an SQL
query to construct a customized query based on the user’s input. If raw user input is spliced into an SQL query as a string, it is easy to craft
inputs that will execute arbitrary SQL commands, including destructive ones or ones that will reveal private data to an attacker. To prevent this,
the user input should be passed as parameters to SQL prepared statements; a package such as SqlStrings.jl can be used to do this without a syntax burden.
This protects against malicious input, but also encourages systematic marshaling of Julia types into SQL types. If string interpolation must be used,
all user input should be either validated to match a strict, safe pattern (e.g. only consists of decimal digits or ASCII letters), or it should be
escaped to ensure that SQL treats it only as data, not as code (e.g. turn a user input into an escaped string literal).

While we have talked specifically about SQL here, this issue is not limited to SQL. The same concern occurs when executing programs via shells, for example.
Julia is more secure than most programming languages in this respect because the default mechanism for running external code (see `Cmd` objects in the Julia manual)
is carefully designed to not be susceptible to this kind of injection, but programmers may be tempted to use a shell to call external code for convenience.
The fact that a shell must be explicitly invoked in Julia helps catch these kinds of circumstances. Using a shell like this is usually a bad idea and can typically
be avoided. If an external shell must be used, be certain that any user data used to construct the shell command is carefully validated or escaped to avoid shell
injection attacks.
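
The difference between `Cmd` interpolation and shell string splicing can be seen by inspecting a command without running it. In this sketch the input string and the validator `is_safe_identifier` are hypothetical; the pattern shown is one example of a strict allow-list, not the only reasonable one.

```julia
# Untrusted input containing shell metacharacters.
user_input = "notes.txt; rm -rf ~"

# Interpolation into a Cmd literal inserts the value as ONE argument.
# No shell parses it, so `;` has no special meaning here.
cmd = `cat $user_input`
args = collect(cmd)   # the program name followed by a single argument

# Hypothetical strict validator: non-empty, ASCII letters and digits only.
# `\A` and `\z` anchor the whole string, so a trailing newline is rejected too.
is_safe_identifier(s::AbstractString) = occursin(r"\A[A-Za-z0-9]+\z", s)
```

The command is only constructed here, never executed. Had the same input been spliced into a string handed to `sh -c`, the part after `;` would have been run as a second command.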

### Ensure secure random number generators are used when required

The default pseudo-random number generator in Julia, which can be accessed by calling `rand()` and `randn()`, for example, is intended for simulation purposes and
not for applications requiring cryptographic security. An attacker can, by observing a series of random values, reconstruct its internal state and predict future pseudo-random values.
For security-sensitive applications like generating secret values used for authentication, the `RandomDevice()` random-number generator (from the `Random` standard library) should be used.
It obtains its entropy from the operating system, so its output cannot be predicted from previously observed values.
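
A minimal sketch of generating a secret token this way is shown below; the helper name `secret_token` and the default of 32 bytes are illustrative choices.

```julia
using Random

# Hypothetical helper: hex-encoded token built from `nbytes` bytes of OS-provided entropy.
function secret_token(nbytes::Integer = 32)
    rng = RandomDevice()                        # not the default simulation RNG
    return bytes2hex(rand(rng, UInt8, nbytes))  # two hex characters per byte
end
```

Passing the `RandomDevice` explicitly to `rand` is what matters; calling `rand(UInt8, nbytes)` without it would silently use the default generator.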

### Be aware of distributed computing encryption principles

Julia’s distributed computing uses unencrypted TCP/IP sockets for communication by default and expects to be running on a fully trusted cluster. If your code uses the `Distributed` standard library
(through `@distributed`, `pmap`, etc.), be aware that the communication channels are not encrypted. Julia opens ports for communication between processes in a distributed cluster. A pre-generated
random cookie is necessary to successfully connect (Julia 0.5 onwards), which defeats arbitrary external connections. This mechanism is described in detail in https://github.com/JuliaLang/julia/pull/16292.

For additional security, these communication channels can be encrypted through the use of a custom ClusterManager which enables SSH port forwarding, or uses some other mechanism to encrypt the communication channel.

### Always immediately flush secret data after handling

When it’s necessary to manage secret data (for example, a user’s password), it’s desirable to have this erased from memory immediately after finishing with the data. However, when a normal `String` or `Array` is used as
a container for such data, the underlying bytes persist after the container is deallocated and in principle could be recovered by an attacker at a later time. There are also situations where a string or array may be
implicitly copied: for example, if it is assigned to a location with a compatible but different type, it will be converted, thus creating a copy of the original data. Normally this is harmless and convenient, but
making copies of secrets is obviously bad for security. To prevent this, Julia provides the type `Base.SecretBuffer` and a `Base.shred!` function, which should be called as soon as the data is no longer needed.
The contents of a `SecretBuffer` must be explicitly extracted (they are never implicitly copied). If `shred!` was never called, the contents are shredded automatically when the `SecretBuffer` object is garbage
collected, with a warning indicating that the buffer should have been explicitly shredded by the programmer.
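
A minimal sketch of this pattern follows. The byte vector stands in for a secret obtained elsewhere (the literal value is purely illustrative), and the block form of `Base.shred!` is used so that the buffer is wiped even if the body throws.

```julia
# Stand-in for secret bytes received from elsewhere (hypothetical value).
bytes = Vector{UInt8}("hunter2")

# `SecretBuffer!` copies the bytes into the buffer and zeroes the input vector.
buf = Base.SecretBuffer!(bytes)

# Use the secret only inside this block; `shred!` wipes `buf` when the block exits.
first_byte = Base.shred!(buf) do b
    read(b, UInt8)
end
```

After the block, both the original vector and the buffer contain only zeros, which can be confirmed with `all(iszero, bytes)` and `Base.isshred(buf)`.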

## Specific Rules

### High Level Rules
@@ -1237,3 +1432,5 @@ Add [FormatCheck.yml](https://github.com/SciML/ModelingToolkit.jl/blob/master/.g

Many of these style choices were derived from the [Julia style guide](https://docs.julialang.org/en/v1/manual/style-guide/),
the [YASGuide](https://github.com/jrevels/YASGuide), and the [Blue style guide](https://github.com/invenia/BlueStyle#module-imports).
Additionally, many tips and requirements from the [JuliaHub Secure Coding Practices](https://juliahub.com/company/resources/white-papers/secure-julia-coding-practices/index.html)
manual were incorporated into this style.
