Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lift compile time default maximum pattern length #406

Open
mvorisek opened this issue Apr 17, 2024 · 1 comment
Open

Lift compile time default maximum pattern length #406

mvorisek opened this issue Apr 17, 2024 · 1 comment

Comments

@mvorisek
Copy link

mvorisek commented Apr 17, 2024

This is a feature request to lift the default length limit of 64k bytes.

PCRE library is used in many complex applications like to parse language grammars and the requirements naturally grow every year.

We are at year 2024 and the PCRE library itself should not impose too much unneeded limitation. Especially compile time limitation.

I propose to lift the compile time limit by raising the LINK_SIZE option to 4 by default. At least of 64-bit platforms.

The matching time/performance should not increase much.

If there should be a limit, it should be runtime configureable like other limits using pcre2_set_*_limit() function with default like 10M (UPDATE: implemented in 05aafb2).

related past issues: #119 #271 JuliaText/TextAnalysis.jl#258

@PhilipHazel
Copy link
Collaborator

I am not at all keen on making such a change, for a number of reasons:

  1. It's a serious incompatible change that may affect existing users. I don't like doing that.
  2. There are very many PCRE2 users for whom 64K is more than enough. I'm thinking of, for example, email routing (for which I originally wrote PCRE), text editing, and similar applications where the patterns are generally quite short.
  3. I know (because there have been issues in the past) that there are applications using PCRE2 that run very large numbers of separate threads, each of which has very limited memory. Such a change may well impact them.
  4. There is of course a performance cost, though it would depend very much on the pattern details.

If I were starting again (knowing what I now know) I would certainly do things differently, probably always compiling into 32-bit units, but hindsight is always 20/20.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants