chore: reduce FPs in whitespace PR by considering ; statement #1186
+69
−26
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
Many false positives were being triggered from the introduction of the excessive whitespace obfuscation rule in #1086. This is due to a lack of specificity in the rule. This PR resolves that change by considering the key syntax of using
;
that is the primary malicious indicator for this method of obfuscation.Description of changes
The rule originally considered any amount of excessive spacing (50 or more) before encountering code. Whitespaces here includes newline characters, and with Semgrep running regex pattern matching in multiline matching mode, this would trigger against code lines where a long line of code was broken across multiple lines like:
Due to differences in formatting in code files, it would sometimes also simply detect vertical spacing between lines.
The key malicious indicator white rules aims to detect is when a benign (syntactically valid) code statement is used at the start of the code line, and then the
;
character is used to finish that statement, and start a new, malicious one out of the view of the general IDE (unless wrapped text was turned on, though this is often off by default). This is syntactically valid when there is excessive spacing:;
and then writing a malicious statement;
, then further excessive spacing and a malicious statement;
inserted immediately after the benign code statement, and writing a malicious statementThe updated
excessive_spacing.py
showcases each of these examples. The newobfuscation.yaml
file has been rewritten to include regex that reflects this. It has been tested on recently detected false positiveslogit-graph-0.1.0
,kryon-ai-1.2.0
, andcispark-0.1.0
, as well as the integration test casedjango-5.0.6
. It was run against the Backstabbers dataset, which confirmed it was able to detect this malicious behaviour, and on popular and trusted packages from the ICSE25-AE-Evaluation dataset, for which it did not trigger.Checklist
verified
label should appear next to all of your commits on GitHub.