Impact
Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+
where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways.Some regular expressions take a long time to match certain input strings to the point where the time it
takes to match a string of length is proportional to or even . Such regular expressions can
negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS")
attack by crafting an expensive input string for the regular expression to match.
The regular expression engine provided by Python uses a backtracking non-deterministic finite automata
to implement regular expression matching. While this approach is space-efficient and allows supporting
advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity
of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape,
increasing the input length by ten characters may make the automaton about 1000 times slower.
Patches
_N/A
Workarounds
Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the
regular expression are short enough that the time-complexity does not matter
References
_OWASP: Regular expression Denial of Service - ReDoS.
• Wikipedia: ReDoS.
• Wikipedia: Time complexity.
• James Kirrage, Asiri Rathnayake, Hayo Thielecke: Static Analysis for Regular Expression Denial-ofService
Attack.
• Common Weakness Enumeration: CWE-1333.
• Common Weakness Enumeration: CWE-730.
• Common Weakness Enumeration: CWE-400.
#POC
src/Tabs/ProgramsTab.py:59
This part of the regular expression may cause exponential backtracking on strings starting
with '0-' and containing many repetitions of '--'.
56 r"(?:(?Phttps?)://)?"
57 r"(?P"
58 r"(?:*.)?" # allow wildcard at the start
59 r"[a-zA-Z0-9]+(?:-[a-zA-Z0-9-]+)"
60 r"(?:.[a-zA-Z0-9]+(?:-[a-zA-Z0-9-]+))+"
61 r")"
62 r"(?P:[0-9]+)?" # potential port
Consider this regular expression:
^(__|.)+$
Its sub-expression "(|.)+?" can match the string "" either by the first alternative "" to the left
of the "|" operator, or by two repetitions of the second alternative "." to the right. Thus, a string
consisting of an odd number of underscores followed by some other character will cause the regular
expression engine to run for an exponential amount of time before rejecting the input.
This problem can be avoided by rewriting the regular expression to remove the ambiguity between the
two branches of the alternative inside the repetition:
^_(|[^_])+_$
Impact
Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+
where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways.Some regular expressions take a long time to match certain input strings to the point where the time it
takes to match a string of length is proportional to or even . Such regular expressions can
negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS")
attack by crafting an expensive input string for the regular expression to match.
The regular expression engine provided by Python uses a backtracking non-deterministic finite automata
to implement regular expression matching. While this approach is space-efficient and allows supporting
advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity
of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape,
increasing the input length by ten characters may make the automaton about 1000 times slower.
Patches
_N/A
Workarounds
Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the
regular expression are short enough that the time-complexity does not matter
References
_OWASP: Regular expression Denial of Service - ReDoS.
• Wikipedia: ReDoS.
• Wikipedia: Time complexity.
• James Kirrage, Asiri Rathnayake, Hayo Thielecke: Static Analysis for Regular Expression Denial-ofService
Attack.
• Common Weakness Enumeration: CWE-1333.
• Common Weakness Enumeration: CWE-730.
• Common Weakness Enumeration: CWE-400.
#POC
src/Tabs/ProgramsTab.py:59
This part of the regular expression may cause exponential backtracking on strings starting
with '0-' and containing many repetitions of '--'.
56 r"(?:(?Phttps?)://)?"
57 r"(?P"
58 r"(?:*.)?" # allow wildcard at the start
59 r"[a-zA-Z0-9]+(?:-[a-zA-Z0-9-]+)"
60 r"(?:.[a-zA-Z0-9]+(?:-[a-zA-Z0-9-]+))+"
61 r")"
62 r"(?P:[0-9]+)?" # potential port
Consider this regular expression:
^(__|.)+$
Its sub-expression "(|.)+?" can match the string "" either by the first alternative "" to the left
of the "|" operator, or by two repetitions of the second alternative "." to the right. Thus, a string
consisting of an odd number of underscores followed by some other character will cause the regular
expression engine to run for an exponential amount of time before rejecting the input.
This problem can be avoided by rewriting the regular expression to remove the ambiguity between the
two branches of the alternative inside the repetition:
^_(|[^_])+_$