CPU Bound Issue in Parser with Complex Grammar (Possible Error with handling of Zero-Width/empty Strings) #863
Comments
Thank you so much for reporting this, and for sharing your grammar to help us reproduce! Would you mind checking whether the issue existed on Guidance v0.1.14? We introduced some richer support for zero-width strings in this recent release, and it'd be helpful to know if that may be the culprit.
@bytemouse you did indeed uncover a bug here, but here's a simple workaround for you for the moment. "Works on my machine", so your mileage may vary, but why don't you give this a try:
@guidance(stateless=True)
def NAME(lm):
    return lm + zero_or_more(WS()) + gen(regex='[a-zA-Z_\$][a-zA-Z_0-9\$]*')
The library we are using to parse regular expressions doesn't correctly escape special characters inside of character classes, so you have to do that manually for now (note that my only change above from your code was escaping the dollar signs).
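For anyone who wants to convince themselves that escaping the dollar signs does not change what the pattern is meant to match, here is a minimal sketch using Python's standard `re` module. It says nothing about how guidance's own regex parser behaves; it only shows that under `re` the escaped and unescaped character classes accept the same identifiers. The names `UNESCAPED`, `ESCAPED`, and `same_language` are made up for this illustration.

```python
import re

# Pattern as in the original report: bare `$` inside the character classes.
UNESCAPED = r'[a-zA-Z_$][a-zA-Z_0-9$]*'
# Workaround pattern: `$` escaped inside the character classes.
ESCAPED = r'[a-zA-Z_\$][a-zA-Z_0-9\$]*'


def same_language(samples):
    """Return True if both patterns accept or reject every sample identically under `re`."""
    return all(
        bool(re.fullmatch(UNESCAPED, s)) == bool(re.fullmatch(ESCAPED, s))
        for s in samples
    )


# A few valid and invalid identifier candidates (hypothetical test data).
samples = ["foo", "$bar", "_x1", "a$b", "1abc", "", "a-b"]
print(same_language(samples))  # → True
```

Since the two spellings are equivalent in a standard regex engine, the escaped form should be safe to keep even after the underlying parsing bug is fixed.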
This did resolve the issue. Thank you for your help!
The bug
With a complex grammar, generation becomes CPU bound and never terminates. My guess is that the parser does not handle empty (zero-width) strings properly. Line profiling shows that all of the time is spent in these lines:
guidance/guidance/_parser.py
Lines 191 to 195 in c9e71fb
To Reproduce
I use this model and this code:
https://huggingface.co/cognitivecomputations/dolphin-2.9-llama3-8b
https://gist.github.com/bytemouse/6b8eaa647840c3793d5a4f23516b2a5f
System info
OS: Fedora 40
Guidance Version: 0.1.15