Merge pull request #40 from SciML/ChrisRackauckas-patch-1

Add a bunch of safety tips from the JuliaHub Secure Julia Coding Best…
SciML · Nov 19, 2023 · 9807778 · 9807778
2 parents a6f9c08 + 5151810
commit 9807778
Showing 1 changed file with 197 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -34,6 +34,17 @@ the style guide.
     - [Prefer code reuse over rewrites whenever possible](#prefer-code-reuse-over-rewrites-whenever-possible)
     - [Prefer to not shadow functions](#prefer-to-not-shadow-functions)
     - [Avoid unmaintained dependencies](#avoid-unmaintained-dependencies)
+    - [Avoid unsafe operations](#avoid-unsafe-operations)
+    - [Avoid non public operations in Julia Base and packages](#avoid-non-public-operations-in-julia-base-and-packages)
+    - [Always default to constructs which initialize data](#always-default-to-constructs-which-initialize-data)
+    - [Use extra precaution when running external processes](#use-extra-precaution-when-running-external-processes)
+    - [Avoid eval whenever possible](#avoid-eval-whenever-possible)
+    - [Avoid bounds check removal, and if done, add appropriate manual checks](#avoid-bounds-check-removal-and-if-done-add-appropriate-manual-checks)
+    - [Avoid ccall unless necessary, and use safe ccall practices when required](#avoid-ccall-and-ccall-unless-necessary-and-use-safe-ccall-practices-when-required)
+    - [Validate all user inputs to avoid code injection](#validate-all-user-inputs-to-avoid-code-injection)
+    - [Ensure secure random number generators are used when required](#ensure-secure-random-number-generators-are-used-when-required)
+    - [Be aware of distributed computing encryption principles](#be-aware-of-distributed-computing-encryption-principles)
+    - [Always immediately flush secret data after handling](#always-immediately-flush-secret-data-after-handling)
   - [Specific Rules](#specific-rules)
     - [High Level Rules](#high-level-rules)
     - [General Naming Principles](#general-naming-principles)
@@ -373,6 +384,190 @@ then all dependencies from that organization should be considered for removal. N
 may take a long time to fix, so it may take more time than 2 weeks to fix, it's simply that the 
 communication should be open, consistent, and timely. 
 
+### Avoid unsafe operations
+
+Like other high-level languages that provide strong safety guarantees by default, Julia nevertheless has a small 
+set of operations that bypass normal checks. These operations are clearly marked with the prefix unsafe_. By using an 
+“unsafe” operation, the programmer asserts that they know the operation is valid even though the language cannot 
+automatically ensure it. For high reliability these constructs should be avoided or carefully inspected during code 
+review. They are:
+
+* unsafe_load 
+* unsafe_store! 
+* unsafe_read 
+* unsafe_write 
+* unsafe_string 
+* unsafe_wrap
+* unsafe_convert
+* unsafe_copyto! 
+* unsafe_pointer_to_objref 
+* ccall 
+* @ccall 
+* @inbounds
+
+### Avoid non public operations in Julia Base and packages
+
+The Julia standard library and packages developed in the Julia programming language have an intended public API indicated by 
+marking symbols with the export keyword or [in v1.11+ with the new `public` keyword](https://github.com/JuliaLang/julia/pull/50105). 
+However, it is possible to use non public names via explicit qualification, e.g. Base.foobar. This practice is not necessarily unsafe, 
+but should be avoided since non public operations may have unexpected invariants and behaviors, and are subject to changes in future 
+releases of the language.
+
+Note that qualified names are commonly used in method definitions to clarify that a function is being extended, e.g. function Base.getindex(...) … end. 
+Such uses do not fall under this concern.
+
+### Always default to constructs which initialize data
+
+For certain newly-allocated data structures, such as numeric arrays, the Julia compiler and runtime do not check whether data is accessed before it has been 
+initialized. Therefore such data structures can “leak” information from one part of a program to another. Uninitialized structures should be avoided in favor 
+of functions like `zeros` and `fill` that create data with well-defined contents. If code does allocate uninitialized memory, it should ensure that this memory 
+is fully initialized before being returned from the function in which it is allocated.
+
+Constructs which create uninitialized memory should only be used if there is a demonstrated performance impact and it should ensure that all memory is initialized
+in the same function in which the array is intended to be used.
+
+Example:
+
+```julia
+function make_array(n::Int)
+    A = Vector{Int}(undef, n)
+    # function body
+    return A
+end
+```
+
+This function allocates an integer array with undefined initial contents (note the language forces you to request this explicitly). 
+A code reviewer should ensure that the function body assigns every element of the array.
+One can similarly create structs with undefined fields, and if used this way, one should ensure all fields are initialized:
+
+```julia
+struct Foo
+  x::Int
+  Foo() = new()
+end
+
+julia> Foo().x
+139736495677280
+```
+
+### Use extra precaution when running external processes
+
+The Julia standard library contains a `run` function and other facilities for running external processes. Any program that does this 
+is only as safe as the external process it runs. If this cannot be avoided, then best practices for using these features are:
+
+1. Only run fixed, known executables, and do not derive the path of the executable to run from user input or other outside sources.
+2. Make sure the executables used have also passed required audit procedures.
+3. Make sure to handle process failure (non-zero exit code).
+4. If possible, run external processes in a sandbox or “jail” environment with access only to what they need in terms of files, file handles and ports.
+
+When run in a sandbox or jail, external processes can actually improve security since the subprocess is isolated from the rest of the system by the kernel.
+
+### Avoid eval whenever possible
+
+Julia contains an `eval` function that executes program expressions constructed at run time. This is not in itself unsafe, but because the code it will run 
+is not textually evident in the surrounding program, it can be difficult to determine what it will do. For example, a Julia program could construct and eval 
+an expression that performs an unsafe operation without the operation being clearly evident to a code reviewer or analysis tool.
+
+In general, programs should try to avoid using eval in ways that are influenced by user input because there are many subtle ways this can lead to arbitrary code 
+execution. If user input must influence eval, the input should only be used to select from a known list of possible behaviors. Approaches using pattern matching 
+to try to validate expressions should be viewed with extreme suspicion because they tend to be brittle and/or exploitable.
+
+Note: it is common for Julia programs to invoke `eval` or `@eval` at the top level, in order to generate global definitions programmatically. Such uses are generally safe.
+
+### Avoid bounds check removal, and if done, add appropriate manual checks
+
+While Julia checks the bounds of all array operations by default, it is possible to manually disable bounds checks in a program using @inbounds.
+Note that in early versions of Julia (pre v1.9) this could be used as a performance optimization, but in later versions it can demonstrably
+reduce performance and thus one should never immediately default to bounds check removal as a performance habit.
+
+Uses of this construct should be carefully audited during code review. For maximum safety, it should be avoided or programs should be run with the 
+command line option --check-bounds=yes to enable all checks regardless of manual annotations.
+
+To check a use of @inbounds for correctness, it suffices to examine all array indexing expressions (e.g. a[i]) within the expression it applies to, 
+and ensure that each index will always be within the bounds of the indexed array. For example the following common use pattern is valid:
+
+```julia
+@inbounds for i in eachindex(A)
+    A[i] = i
+end
+```
+
+By inspection, the variable `i` will always be a valid index for `A`.
+
+For contrast, the following use is invalid unless A is known to be a specific type (eg: `Vector`)
+
+```julia
+@inbounds for i in 1:length(A)
+    A[i] = i
+end
+```
+
+`@inbounds` should be applied to as narrow a region of code as possible. When applied to a large block of code, it can be difficult to identify 
+and verify all indexing expressions.
+
+### Avoid ccall unless necessary, and use safe ccall practices when required
+
+Calling C (and Fortran) libraries from Julia is very easy: the ccall syntax (and the more convenient @ccall macro) allow calling C libraries 
+without any need for glue files or boilerplate. They do require caution, however: the programmer tells Julia what the signature of each library 
+function is and if this is not done correctly, it can be the cause of crashes and thus security vulnerabilities. An exploit is just a crash that 
+an attacker has arranged to fail in a worse way than it would have randomly.
+
+Safe use of ccall depends on both automated and manual measures.
+
+What Julia does (automated):
+
+* Julia provides aliases for C types like Cint, Clong, Cchar, Csize_t, etc. It makes sure that these match what the C ABI on the machine that code is running on expects them to be. 
+* The Clang.jl package automates the process of turning C header files into valid ccall invocations.
+* Pkg+BinaryBuilder.jl allows precise versioning of binary dependencies.
+* Julia objects passed directly to ccall are protected from garbage collection (GC) for the duration of the call.
+
+What you must do (manual):
+* When writing ccall signatures, programmers should always look at the signature in the C header file and make sure the signature used in Julia matches exactly.
+* Use Julia’s C type aliases. For example, if an argument in C is of type int then the corresponding type in Julia is Cint, not Int — on most platforms Int will be the same size as Clong rather than Cint.
+* If a raw pointer to memory managed by Julia’s GC is passed to C via ccall, the owning object must be preserved using GC.@preserve around the use of ccall. See the documentation of this macro for more information and examples of proper usage.
+
+### Validate all user inputs to avoid code injection
+
+When writing programs that construct any kind of code based on user input, extra caution is required and the user input must be validated or escaped. 
+For example, a common type of attack in web applications written in all programming languages is SQL injection: a user input is spliced into an SQL 
+query to construct a customized query based on the user’s input. If raw user input is spliced into an SQL query as a string, it is easy to craft 
+inputs that will execute arbitrary SQL commands, including destructive ones or ones that will reveal private data to an attacker. To prevent this, 
+the user input should be passed as parameters to SQL prepared statements; a package such as SqlStrings.jl can be used to do this without a syntax burden. 
+This protects against malicious input, but also encourages systematic marshaling of Julia types into SQL types. If string interpolation must be used, 
+all user input should be either validated to match a strict, safe pattern (e.g. only consists of decimal digits or ASCII letters), or it should be 
+escaped to ensure that SQL treats it only as data, not as code (e.g. turn a user input into an escaped string literal).
+
+While we have talked specifically about SQL here, this issue is not limited to SQL. The same concern occurs when executing programs via shells, for example. 
+Julia is more secure than most programming languages in this respect because the default mechanism for running external code (see Cmd objects in the Julia manual) 
+is carefully designed to not be susceptible to this kind of injection, but programmers may be tempted to use a shell to call external code for convenience sake. 
+The fact that a shell must be explicitly invoked in Julia helps catch these kinds of circumstances. Using a shell like this is usually a bad idea and can typically 
+be avoided. If an external shell must be used, be certain that any user data used to construct the shell command is carefully validated or escaped to avoid shell 
+injection attacks.
+
+## Ensure secure random number generators are used when required
+
+The default pseudo-random number generator in Julia, which can be accessed by calling `rand()` and `randn()`, for example, is intended for simulation purposes and 
+not for applications requiring cryptographic security. An attacker can, by observing a series of random values, construct its internal state and predict future pseudo-random values. 
+For security-sensitive applications like generating secret values used for authentication, the `RandomDevice()` random-number generator should be used. This produces genuinely 
+random numbers which cannot be predicted.
+
+### Be aware of distributed computing encryption principles
+
+Julia’s distributed computing uses unencrypted TCP/IP sockets for communication by default and expects to be running on a fully trusted cluster. If `using Distributed` in your code through 
+`@distributed`, `pmap`, etc., be aware that the communication channels are not encrypted. Julia opens ports for communication between processes in a distributed cluster. A pre-generated 
+random cookie is necessary to successfully connect (Julia 0.5 onwards), which defeats arbitrary external connections. This mechanism is described in detail in https://github.com/JuliaLang/julia/pull/16292.
+
+For additional security, these communication channels can be encrypted through the use of a custom ClusterManager which enables SSH port forwarding, or uses some other mechanism to encrypt the communication channel.
+
+### Always immediately flush secret data after handling
+
+When it’s necessary to manage secret data (for example, a user’s password) it’s desirable to have this erased from memory immediately after finishing with the data. However, when a normal `String` or `Array` is used as 
+a container for such data, the underlying bytes persist after the container is deallocated and in principle could be recovered by an attacker at a later time. There are also situations where a string or array may be 
+implicitly copied, for example, if it is assigned to a location with a compatible but different type, it will be converted, thus creating a copy of the original data. Normally this is harmless and convenient, but 
+making copies of secrets is obviously bad for security. To prevent this, Julia provides the type `Base.SecretBuffer` and a `shred!` function which should be called immediately after the data is finished with. 
+The contents of a `SecretBuffer` must be explicitly extracted — it is never implicitly copied — and its contents will be automatically shredded upon garbage collection of the `SecretBuffer` object if the `shred!` 
+function was never called on it, with a warning indicating that the buffer should have been explicitly shredded by the programmer.
+
 ## Specific Rules
 
 ### High Level Rules
@@ -1237,3 +1432,5 @@ Add [FormatCheck.yml](https://github.com/SciML/ModelingToolkit.jl/blob/master/.g
 
 Many of these style choices were derived from the [Julia style guide](https://docs.julialang.org/en/v1/manual/style-guide/),
 the [YASGuide](https://github.com/jrevels/YASGuide), and the [Blue style guide](https://github.com/invenia/BlueStyle#module-imports).
+Additionally, many tips and requirements from the [JuliaHub Secure Coding Practices](https://juliahub.com/company/resources/white-papers/secure-julia-coding-practices/index.html)
+manual were incorporated into this style.