Allow to cancel hs_scan*() #139
Comments
Hi @rschu1ze, we can provide the second method, but it will go into the next version: this one (5.4.9) needs to be released asap, as it's already overdue.
That would be awesome, thanks :)
We need to release 5.4.10 asap, so this has been moved to the next version. However, it will not take long, as we have increased our resources for this project.
@rschu1ze we will begin development of this feature now. As explained in the README, due to the recent closed-sourcing of the original hyperscan project for versions >5.4, we will keep compatibility with that version, but we will not pursue compatibility with later IPL hyperscan versions. This is actually a good thing for us, as it allows us to extend functionality without having to chase the original project anymore. Regarding this problem, we intend to add a few more hs_scan_*_extended() functions that can do things the original API does not provide, without changing the original API. We will start by adding another periodic callback function, as you called it, with a user-provided period. Is there anything else you would like to add while we're still in the design phase?
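A minimal sketch of how a periodic callback with a user-provided period could behave. Everything here is hypothetical: `progress_cb`, `toy_scan_extended()` and the `TOY_*` codes are illustration names, not vectorscan API, and the toy scanner only counts one byte value instead of running a real pattern database. It polls the callback every `period` bytes, and a non-zero return cancels the scan, mirroring how a non-zero return from a match callback stops hs_scan().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical progress callback: called every `period` bytes scanned.
 * Returning non-zero asks the scanner to stop. */
typedef int (*progress_cb)(size_t bytes_scanned, void *ctx);

/* Hypothetical return codes for this sketch only. */
enum { TOY_SUCCESS = 0, TOY_CANCELLED = 1 };

/* Toy stand-in for an hs_scan_*_extended() function (NOT vectorscan code):
 * counts occurrences of `needle` in `data`, polling `cb` every `period`
 * bytes. A period of 0 or a NULL callback disables polling. */
static int toy_scan_extended(const char *data, size_t len, char needle,
                             size_t *matches, size_t period,
                             progress_cb cb, void *ctx) {
    assert(matches != NULL);
    *matches = 0;
    for (size_t i = 0; i < len; i++) {
        if (data[i] == needle)
            (*matches)++;
        /* Poll after each full `period` of bytes; stop on non-zero. */
        if (cb && period && (i + 1) % period == 0 && cb(i + 1, ctx))
            return TOY_CANCELLED;
    }
    return TOY_SUCCESS;
}

/* Example callback: cancel once a byte budget (passed via ctx) is used up. */
static int budget_cb(size_t bytes_scanned, void *ctx) {
    size_t budget = *(const size_t *)ctx;
    return bytes_scanned >= budget;
}
```

For example, scanning the 20-byte string `"abababababababababab"` for `'a'` with a period of 4 and a budget of 8 stops at the second poll with 4 matches counted. A real callback would more likely compare a monotonic clock against a deadline or check a cancellation flag; the period trades polling overhead against how quickly a cancellation takes effect.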
@markos Sorry for not checking back earlier. New functions
Sounds good, looking forward to this. The only addition I would have is that pattern compilation is also prone to ReDoS attacks, meaning that a similar mechanism in
We (ClickHouse) recently encountered some patterns which are extremely expensive to evaluate with vector/hyperscan, for example bounded repeats "x{n,m}" (these are also documented as being expensive). As a mitigation, we now check patterns on a best-effort basis and reject them when they will likely be expensive.
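A best-effort check of the kind described above could look like the following sketch. It is not ClickHouse's actual check: the function name and the `MAX_REPEAT_BOUND` threshold are hypothetical. It only looks for explicit `{n}`, `{n,}` and `{n,m}` quantifiers, skipping backslash escapes but ignoring character classes and other regex syntax.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical limit for this sketch; not a vectorscan or ClickHouse constant. */
#define MAX_REPEAT_BOUND 100

/* Returns true if `pattern` contains a bounded repeat {n}, {n,} or {n,m}
 * with an explicit bound greater than `limit`. Best-effort only: escaped
 * characters are skipped, but character classes etc. are not parsed. */
static bool has_large_bounded_repeat(const char *pattern, long limit) {
    assert(pattern != NULL);
    for (const char *p = pattern; *p; p++) {
        if (*p == '\\') {               /* skip escaped char, e.g. "\{" */
            if (p[1] == '\0')
                break;
            p++;
            continue;
        }
        if (*p != '{' || !isdigit((unsigned char)p[1]))
            continue;
        char *end;
        long lo = strtol(p + 1, &end, 10);
        long hi = lo;                   /* "{n}": both bounds equal n */
        if (*end == ',') {
            if (isdigit((unsigned char)end[1]))
                hi = strtol(end + 1, &end, 10);  /* "{n,m}" */
            else
                end++;                  /* "{n,}": only the lower bound */
        }
        if (*end != '}')
            continue;                   /* not a quantifier, e.g. "{12a" */
        if (lo > limit || hi > limit)
            return true;
    }
    return false;
}
```

With this threshold, `x{1,1000}` and `x{500,}` are rejected, while `x{2,5}` and the escaped literal `\{1000}` pass. A pattern can still be expensive below any fixed threshold, so a check like this only narrows the problem.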
A better solution would be to either
hs_scan_*() (*)
There are callbacks which can stop the scan, but they are only called when a match is found. Ideally, a second callback could be provided which is called regularly (every N "steps", whatever that means in the context of vectorscan). I know that vectorscan attempts to stay API-compatible with hyperscan, so these callbacks could be added as new parameters with a default value.
EDIT: Just noticed that it is pattern compilation, i.e. hs_compile_multi(), that becomes slow (not the scan). A callback for canceling hs_compile_*() would be great.
(*) ClickHouse actually only uses block mode, not streaming or vectored modes.
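Until a periodic callback exists, the closest workaround with the existing API is to check a deadline or a cancellation flag inside the match callback. The sketch below assumes a hypothetical `struct scan_guard` passed as the `context` argument. `on_match` has the same parameter list as hyperscan's `match_event_handler`, where a non-zero return stops the scan and hs_scan() returns `HS_SCAN_TERMINATED`. As the issue points out, this only runs when a match is reported, so a slow scan that produces no matches cannot be interrupted this way.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime / CLOCK_MONOTONIC */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-scan state, passed as hs_scan()'s `context` argument. */
struct scan_guard {
    struct timespec deadline;   /* CLOCK_MONOTONIC deadline */
    atomic_bool *cancelled;     /* may be set from another thread */
    unsigned long long matches; /* matches seen so far */
};

static bool deadline_passed(const struct timespec *deadline) {
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return now.tv_sec > deadline->tv_sec ||
           (now.tv_sec == deadline->tv_sec && now.tv_nsec >= deadline->tv_nsec);
}

/* Same parameters as match_event_handler. Returning non-zero stops the
 * scan; returning 0 lets it continue. Only invoked on a match. */
static int on_match(unsigned int id, unsigned long long from,
                    unsigned long long to, unsigned int flags, void *context) {
    (void)id; (void)from; (void)to; (void)flags;
    struct scan_guard *g = context;
    g->matches++;
    if (atomic_load(g->cancelled) || deadline_passed(&g->deadline))
        return 1;  /* cancel: stop scanning */
    return 0;      /* keep scanning */
}
```

It would be passed as `hs_scan(db, data, len, 0, scratch, on_match, &guard)`, with `guard.deadline` set to the current monotonic time plus the allowed budget.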