@art1f1c3R art1f1c3R commented Sep 26, 2025

Summary

The excessive whitespace obfuscation rule introduced in #1086 has been triggering many false positives because it is not specific enough. This PR resolves the issue by requiring the ; character, which is the primary malicious indicator for this method of obfuscation.

Description of changes

The rule originally matched any run of excessive whitespace (50 or more characters) before encountering code. Whitespace here includes newline characters, and because Semgrep runs regex pattern matching in multiline mode, the rule would trigger on code where a long statement was broken across multiple lines, like:

<indentation>                                             foo(arg1, another_foo(arg_2), 
                                                                                   arg3_on_other_line)

Due to differences in formatting in code files, it would sometimes also simply detect vertical spacing between lines.
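The false positives described above can be reproduced with a minimal sketch. The pattern below is a hypothetical stand-in for the original rule (50 or more whitespace characters followed by code), not the actual regex from obfuscation.yaml or #1086, and the sample strings are invented for illustration.

```python
import re

# Hypothetical stand-in for the original rule, NOT the regex from obfuscation.yaml:
# 50 or more whitespace characters followed by any non-whitespace character.
# \s also matches "\n", so a run can span several lines.
NAIVE = re.compile(r"\s{50,}\S")

# A long call wrapped onto a second line with deep continuation indentation.
wrapped_call = (
    " " * 8 + "foo(arg1, another_foo(arg_2),\n"
    + " " * 60 + "arg3_on_other_line)\n"
)

# Vertical spacing: a few blank lines that carry leftover indentation (assumed example).
vertical_gap = "x = 1\n" + (" " * 20 + "\n") * 3 + "y = 2\n"

# Ordinary short indentation for comparison.
short_indent = "def f():\n    return 1\n"

naive_hits_wrapped = NAIVE.search(wrapped_call) is not None
naive_hits_gap = NAIVE.search(vertical_gap) is not None
naive_hits_short = NAIVE.search(short_indent) is not None

print(naive_hits_wrapped, naive_hits_gap, naive_hits_short)
```

Both benign inputs match because the newline counts toward the whitespace run, which is the lack of specificity this PR addresses.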

The key malicious indicator this rule aims to detect is a benign (syntactically valid) code statement at the start of a line, followed by a ; character that ends that statement and starts a new, malicious one outside the visible area of a typical IDE (unless text wrapping is turned on, which is often off by default). This is syntactically valid when there is excessive spacing:

  • After the benign code statement, before a ; that is immediately followed by a malicious statement
  • After the benign code statement, before a ;, with further excessive spacing between the ; and the malicious statement
  • After a ; placed immediately after the benign code statement, before the malicious statement
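The three cases above can be sketched as follows. The regex is an assumption written for illustration (50 or more horizontal whitespace characters adjacent to a ;), not the pattern in the rewritten obfuscation.yaml, and `payload()` is a hypothetical placeholder that is never executed.

```python
import re

# Assumed illustrative pattern, NOT the regex from obfuscation.yaml.
# [^\S\r\n] matches whitespace except line breaks, so runs cannot span lines.
# Alternative 1: excessive spacing, then ";".
# Alternative 2: ";", then excessive spacing, then more code on the same line.
SEMICOLON_RULE = re.compile(r"[^\S\r\n]{50,};|;[^\S\r\n]{50,}\S")

PAD = " " * 60  # padding that pushes code out of view

samples = {
    # Case 1: spacing before the ";", malicious statement right after it.
    "space_then_semicolon": "x = 1" + PAD + ";payload()",
    # Case 2: spacing before the ";" and again before the malicious statement.
    "space_semicolon_space": "x = 1" + PAD + ";" + PAD + "payload()",
    # Case 3: ";" right after the benign statement, then spacing.
    "semicolon_then_space": "x = 1;" + PAD + "payload()",
    # Benign: a wrapped call with deep continuation indentation and no ";".
    "wrapped_call": " " * 8 + "foo(arg1,\n" + PAD + "arg3_on_other_line)",
    # Benign: ordinary use of ";" with normal spacing.
    "normal_semicolon": "a = 1; b = 2",
}

results = {name: bool(SEMICOLON_RULE.search(text)) for name, text in samples.items()}

for name, matched in results.items():
    print(f"{name}: {matched}")
```

Requiring the ; next to the whitespace run is what lets the wrapped call and the ordinary one-line statements pass while all three obfuscated layouts are still flagged.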

The updated excessive_spacing.py showcases each of these cases. The obfuscation.yaml file has been rewritten with regex that reflects them. The rule has been tested on the recently detected false positives logit-graph-0.1.0, kryon-ai-1.2.0, and cispark-0.1.0, as well as the integration test case django-5.0.6. It was also run against the Backstabbers dataset, which confirmed that it detects this malicious behaviour, and against popular and trusted packages from the ICSE25-AE-Evaluation dataset, on which it did not trigger.

Checklist

  • I have reviewed the contribution guide.
  • My PR title and commits follow the Conventional Commits convention.
  • My commits include the "Signed-off-by" line.
  • I have signed my commits following the instructions provided by GitHub. Note that we run GitHub's commit verification tool to check the commit signatures. A green verified label should appear next to all of your commits on GitHub.
  • I have updated the relevant documentation, if applicable.
  • I have tested my changes and verified they work as expected.

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Sep 26, 2025
@art1f1c3R art1f1c3R force-pushed the art1f1c3R/whitespaces-fp-reduction branch from 1d3f4bc to 33711dd Compare September 26, 2025 04:23
@art1f1c3R art1f1c3R merged commit 27f3cdd into main Sep 26, 2025
9 checks passed
Demolus13 pushed a commit to Demolus13/macaron that referenced this pull request Sep 26, 2025
…#1186)

Reduce false positives in the whitespace semgrep rule by considering the ; statement.

Signed-off-by: Carl Flottmann <[email protected]>