When searching multiple patterns, why doesn't -o always return the longest match?
#3114
when combining Example: given:
and a set of patterns:
then
but changing the order of patterns (i.e. PATTERN_EXT before PATTERN in
i.e. ripgrep may not report the longest match if a pattern is a prefix of another pattern and the prefix-pattern is listed before the longer pattern in
Version: ripgrep-14.1.1-x86_64-unknown-linux-musl
Replies: 2 comments 1 reply
This isn't a bug and it is the intended semantic. ripgrep provides no way to do what you want. A work-around, assuming your pattern file is just a list of literals (i.e., not regexes) is to sort your patterns in descending order of length. This guarantees that no preceding pattern is a prefix of one that follows it.

ripgrep only supports "leftmost-first" match semantics. You can find a description of that semantic in the library documentation for ripgrep's default regex engine. PCRE2 will behave the same way.

The upside of leftmost-first is that it allows you to express a preference order. The downside is that it isn't commutative, as you've discovered.

It is conceivable that, in the far future, ripgrep will support a flag to use leftmost-longest semantics while still supporting non-greedy operators (GNU grep's default regex engine only supports leftmost-longest, but not non-greedy operators). This is a large task and there are no immediate plans to work on it.
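For anyone landing here later, the work-around above (sorting literal patterns in descending order of length) can be sketched roughly as follows. This is only an illustration, not ripgrep's implementation: it uses Python's `re` module as a stand-in, since its alternation also prefers the earlier alternative at a given position, and the pattern names `foo` and `foobar` and the helper functions are made up for the example.

```python
import re


def sort_longest_first(patterns):
    """Order literal patterns by descending length (ties broken alphabetically)."""
    return sorted(patterns, key=lambda p: (-len(p), p))


def only_matching(patterns, text):
    """Emulate an ordered alternation over literal patterns.

    Assumption: Python's `re` stands in for a leftmost-first engine here;
    this is not ripgrep's regex engine.
    """
    regex = re.compile("|".join(re.escape(p) for p in patterns))
    return [m.group(0) for m in regex.finditer(text)]


# Hypothetical literals: the shorter one is a prefix of the longer one
# and is listed first.
patterns = ["foo", "foobar"]
text = "foobar foo"

print(only_matching(patterns, text))                      # ['foo', 'foo']
print(only_matching(sort_longest_first(patterns), text))  # ['foobar', 'foo']
```

Writing the sorted list back to the pattern file (one literal per line) before passing it to `rg -o -f` should then give the longer literal priority wherever both could match at the same position. This only helps when every pattern is a plain literal, as noted above.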
Thanks for the pointers to the relevant documentation.