[Security] Make formatted expression free from CVE-2021-42574 (Trojan Source) #276

ShamrockLee · 2021-11-25T16:11:14Z

Proposal

Close the Unicode RTL sections, remove trailing unclosed RTL characters in strings and comments, and warn the user about their existence to prevent CVE-2021-42574.

CVE-2021-42574, a.k.a. the Trojan Source vulnerability, is resulted from the unclosed Unicode RTL control characters. It allows the attacker to alter the order of sections of words in a line, and to make a commented section / a section inside a string to appear at the right side of the comment/string.

https://github.com/nickboucher/trojan-source
https://www.trojansource.codes/
Boucher and Anderson (2021). Trojan Source: Invisible Vulnerabilities. https://trojansource.codes/trojan-source.pdf

Input

{ lib, hello }:
hello.overrideAttrs (oldAttrs:
  let
    scrSecure = builtins.trace "Using the secure source" oldAttrs.src;
  in {
    pname = oldAttrs.pname + "-secure";
    /*Replace the source with a secure one‮⁦src = srcSecure;⁩⁦*/
})

Output

{ lib, hello }:
hello.overrideAttrs (oldAttrs:
  let
    scrSecure = builtins.trace "Using the secure source" oldAttrs.src;
  in
  {
    pname = oldAttrs.pname + "-secure";
    /*Replace the source with a secure one‮⁦src = srcSecure;⁩⁦*/
  })

Desired output

{ lib, hello }:
hello.overrideAttrs (oldAttrs:
  let
    scrSecure = builtins.trace "Using the secure source" oldAttrs.src;
  in
  {
    pname = oldAttrs.pname + "-secure";
    /*Replace the source with a secure one‮⁦src = srcSecure;⁩‬*/
  })

The trick

The execution result of the input is equivalent to that of the desired output, which would be surprising to reviewers. This is due to the explicit formatting support of the Unicode BIDI (bi-direction) algorithm with control characters:

<RLE>...<PDF>
<LRE>...<PDF>
<RLO>...<PDF>
<LRO>...<PDF>
<RLI>...<PDI>
<LRI>...<PDI>
<FSI>...<PDI>

When a explicitly-directional-changing section is not closed, it may infect the following characters in the same line.
See https://www.unicode.org/reports/tr9/tr9-42.html for more detail.
When a explicitly-directional-changing section is not closed, it may infect the following characters in the same line, and that's what happen to the line 8 of the Output:

/*Replace the source with a secure one<RLO><LRI>src = srcSecure;<PDI><LRI>*/

The solution is to:

Remove the <RLE>, <LRE>, <RLO>, <LRO>, <RLI>, <LRI>, <FSI> followed by ", '', */ or the end of line.
Otherwise, add the corresponding <PDF> or <PDI> before ", '', */ or the end of line.
Show warnings about the existence of the RTL control characters, especially when they are not closed in the unformatted file.

After that, the line should become

/*Replace the source with a secure one<RLO><LRI>src = srcSecure;<PDI><PDF>*/

(with the trailing <RLI> removed and <RLO> closed with <PDF>)

I'm not a Unicode expert. It would need someone who are more familiar with this topic to refine the proposal.

Reproduction of the example Nix expressions

Prepare the following python scripts:

gen_code.py

#!/usr/bin/env python3

# Note the unclosed RTL section
code_string = '''
{ lib, hello }:
hello.overrideAttrs (oldAttrs:
  let
    scrSecure = builtins.trace "Using the secure source" oldAttrs.src;
  in {
    pname = oldAttrs.pname + "-secure";
    /*Replace the source with a secure one\N{RLO}\N{LRI}src = srcSecure;\N{PDI}\N{LRI}*/
})
''';

if __name__ == '__main__':
    print(code_string)

gen_code_clean.py

#!/usr/bin/env python3

# The RTL sections are now closed
code_string = '''
{ lib, hello }:
hello.overrideAttrs (oldAttrs:
  let
    scrSecure = builtins.trace "Using the secure source" oldAttrs.src;
  in {
    pname = oldAttrs.pname + "-secure";
    /*Replace the source with a secure one\N{RLO}\N{LRI}src = srcSecure;\N{PDI}\N{PDF}*/
})
''';

if __name__ == '__main__':
    print(code_string)

Run

python3 ./gen_code.py > reproduce.nix
cp reproduce{,_formatted}.nix
nixpkgs-fmt reproduce_formatted.nix
python3 ./gen_code_clean.py > reproduce_clean.py
cp reproduce_clean{,_formatted}.nix
nixpkgs-fmt reproduce_clean_formatted.nix

Get the content:
- Input: reproduce.nix
- Output: reproduce_formatted.nix
- Desired output: reproduce_clean_formatted.nix

The text was updated successfully, but these errors were encountered:

nixos-discourse · 2021-11-29T07:53:30Z

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/protect-nix-codebases-against-trojan-source-cve-2021-42574/16316/1

erikarvstedt · 2021-11-29T13:52:08Z

Related:
https://research.swtch.com/trojan
ziglang/zig#10074

Summary:
This has to be addressed in editors/code review tools, not at lower levels of the tooling stack.

ShamrockLee · 2021-12-02T10:12:21Z

Related: https://research.swtch.com/trojan ziglang/zig#10074

Summary: This has to be addressed in editors/code review tools, not at lower levels of the tooling stack.

The main point in the two links is that the fix shouldn't be done in compilers (interpreters in the case of Nix). This formatter may be used as part of the editors or review tools (e.g. the jnoortheen.nix-ide VSCode extension), and the implementation may benefits developers using the related tools.

zimbatm · 2021-12-23T17:17:22Z

I'm fine having this in nixpkgs-fmt, we just need somebody to send in a PR. It would also be nice to have that as part of the Nix language as well, and the nixpkgs CI. It doesn't have to be an either/or situtation.

ShamrockLee · 2021-12-30T18:24:57Z

I'm currently busy working on the finals, and my knowledge to Rust is limited. It would take several weeks before trying to implement it.

I'll open a PR containing the test data in case someone wants to take a look.

ShamrockLee added formatting Issue with the output format needs triage labels Nov 25, 2021

ShamrockLee mentioned this issue Dec 30, 2021

[WIP] Enable nixpkgs-fmt to fix RTL attacks inside the Nix expressions #281

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Security] Make formatted expression free from CVE-2021-42574 (Trojan Source) #276

[Security] Make formatted expression free from CVE-2021-42574 (Trojan Source) #276

ShamrockLee commented Nov 25, 2021 •

edited

Loading

nixos-discourse commented Nov 29, 2021

erikarvstedt commented Nov 29, 2021

ShamrockLee commented Dec 2, 2021

zimbatm commented Dec 23, 2021

ShamrockLee commented Dec 30, 2021

[Security] Make formatted expression free from CVE-2021-42574 (Trojan Source) #276

[Security] Make formatted expression free from CVE-2021-42574 (Trojan Source) #276

Comments

ShamrockLee commented Nov 25, 2021 • edited Loading

Proposal

Input

Output

Desired output

The trick

Reproduction of the example Nix expressions

nixos-discourse commented Nov 29, 2021

erikarvstedt commented Nov 29, 2021

ShamrockLee commented Dec 2, 2021

zimbatm commented Dec 23, 2021

ShamrockLee commented Dec 30, 2021

ShamrockLee commented Nov 25, 2021 •

edited

Loading