CPU Bound Issue in Parser with Complex Grammar (Possible Error with handling of Zero-Width/empty Strings) #863

bytemouse · 2024-05-27T08:40:31Z

The bug
With a complex grammar, generation becomes CPU-bound and never terminates. My guess is that the parser does not properly handle empty (zero-width) strings. Line profiling shows that all of the time is spent in these lines:

guidance/guidance/_parser.py

Lines 191 to 195 in c9e71fb

    
    start_state_set = self.state_sets[item.start]
    for start_item in start_state_set:
        if (
            start_item.pos < len(start_item.values)
            and start_item.values[start_item.pos] == item.node
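
For context, the profiled lines have the shape of the completion step in an Earley-style chart parser: when an item finishes, the parser walks the state set where that item started and advances every item that was waiting on it. The sketch below is a minimal textbook recognizer written for illustration, not guidance's implementation. `Item`, `recognize`, and the grammar encoding are hypothetical names, and the nullable check in the predict branch is one common way to cope with empty rules, assumed here rather than taken from the library.

```python
from collections import namedtuple

# An Earley item: rule head, rule body (tuple of symbols), dot position,
# and the index of the state set in which the item was first predicted.
Item = namedtuple("Item", "head body pos start")


def recognize(grammar, start, tokens):
    """Return True if `tokens` can be derived from `start`.

    `grammar` maps each nonterminal to a list of bodies (tuples of symbols);
    any symbol that is not a key of `grammar` is treated as a terminal.
    """
    sets = [[] for _ in range(len(tokens) + 1)]

    def add(i, item):
        # Deduplication is what keeps recursive nullable rules from looping.
        if item not in sets[i]:
            sets[i].append(item)

    for body in grammar[start]:
        add(0, Item(start, body, 0, 0))

    for i in range(len(tokens) + 1):
        j = 0
        while j < len(sets[i]):  # the set may grow while it is processed
            item = sets[i][j]
            j += 1
            if item.pos < len(item.body):
                sym = item.body[item.pos]
                if sym in grammar:  # predict
                    for body in grammar[sym]:
                        add(i, Item(sym, body, 0, i))
                    # Nullable handling: if `sym` has already completed here
                    # without consuming input, advance past it immediately.
                    if any(c.head == sym and c.pos == len(c.body) and c.start == i
                           for c in sets[i]):
                        add(i, item._replace(pos=item.pos + 1))
                elif i < len(tokens) and tokens[i] == sym:  # scan
                    add(i + 1, item._replace(pos=item.pos + 1))
            else:  # complete: same shape as the profiled loop above
                for start_item in sets[item.start]:
                    if (start_item.pos < len(start_item.body)
                            and start_item.body[start_item.pos] == item.head):
                        add(i, start_item._replace(pos=start_item.pos + 1))

    return any(it.head == start and it.pos == len(it.body) and it.start == 0
               for it in sets[len(tokens)])


# A recursive nullable grammar: A -> A A | 'a' | (empty)
grammar = {"S": [("A",)], "A": [("A", "A"), ("a",), ()]}
print(recognize(grammar, "S", list("aaa")))
print(recognize(grammar, "S", ["b"]))
```

In this sketch the recognizer terminates on the recursive nullable grammar only because `add` refuses duplicate items. Without that check, the completion loop would keep re-adding the same items to the current state set.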

To Reproduce
I use this model and this code:
https://huggingface.co/cognitivecomputations/dolphin-2.9-llama3-8b
https://gist.github.com/bytemouse/6b8eaa647840c3793d5a4f23516b2a5f

System info
OS: Fedora 40
Guidance Version: 0.1.15

Harsha-Nori · 2024-05-28T22:41:29Z

Thank you so much for reporting this, and for sharing your grammar to help us reproduce! Would you mind checking whether the issue also exists on Guidance v0.1.14? We introduced some richer support for zero-width strings in the most recent release, and it'd be helpful to know if that may be the culprit.

bytemouse · 2024-05-29T22:27:13Z

I tried 0.1.14 as well, and the same error persists with the same prompt and with others. It always seems to 'crash' on the last closing curly brace, as in this example, in both 0.1.14 and 0.1.15 across various prompts:

package cd4code.lighting;

class CarLighting {
  
  class Light {
    boolean on;
  }
  
  class HeadLight extends Light {
    double brightness;
  }
  
  class TailLight extends Light {
    double blinkRate;
  }

Even when there is no empty token among the top logits, generation stalls at what is probably the last token. Sometimes it stalls in _compute_children, but I am unsure whether these similar problems stem from the same error. Thank you for your fast response.

| Method | Calls | Time (ms) | Own time (ms) |
| --- | --- | --- | --- |
| `_compute_children` | 13,070,618 | 139,349 | 126,432 |
| `_inner_loop` | 1,701 | 105,906 | 55,044 |
| `<built-in method builtins.len>` | 563,931,597 | 37,746 | 37,698 |
| `__eq__` | 128,219,828 | 25,312 | 15,477 |
| `<built-in method builtins.isinstance>` | 155,682,311 | 13,623 | 12,804 |
| `_pre_process_regex` | 543,995 | 16,668 | 9,877 |
| `replace_grammar_node` | 20,817 | 11,413 | 6,110 |
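
A per-function breakdown like the one above (call count, cumulative time, own time) can be collected with Python's built-in `cProfile`. Which profiler produced the table isn't stated in the thread, so the following is only a sketch. `profile_top` and `busy` are hypothetical names, and `busy` stands in for the real generation call.

```python
import cProfile
import io
import pstats


def profile_top(func, *args, limit=10, **kwargs):
    """Run `func` under cProfile and return (result, text report).

    The report is sorted by own time ("tottime" in cProfile's terms), which
    corresponds to the "Own time" column; "cumtime" corresponds to "Time".
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats(pstats.SortKey.TIME).print_stats(limit)
    return result, buf.getvalue()


def busy(n):
    # Placeholder workload; replace with the call that stalls.
    return sum(i * i for i in range(n))


result, report = profile_top(busy, 10_000)
print(report)
```

Because a stalled run never returns, in practice the workload would need a token limit or a timeout so that the profiler gets a chance to print its report.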

hudson-ai · 2024-05-31T00:45:15Z

@bytemouse you did indeed uncover a bug here, but here's a simple workaround for you for the moment. "Works on my machine", so your mileage may vary, but why don't you give this a try:

@guidance(stateless=True)
def NAME(lm):
    # Raw string so the backslashes reach the regex parser unchanged.
    return lm + zero_or_more(WS()) + gen(regex=r'[a-zA-Z_\$][a-zA-Z_0-9\$]*')

The library we are using to parse regular expressions doesn't correctly handle unescaped special characters inside character classes, so you have to escape them manually for now (note that my only change to the pattern in your code was escaping the dollar signs).
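
As a sanity check that the workaround doesn't change what the pattern accepts: in Python's standard `re` module a `$` inside a character class is already literal, so the escaped and unescaped patterns match the same strings. This only demonstrates the stdlib behaviour, not the regex library guidance uses internally. `same_verdict` and the sample strings are made up for illustration.

```python
import re

# The identifier pattern from the grammar, without and with escaped dollars.
UNESCAPED = re.compile(r'[a-zA-Z_$][a-zA-Z_0-9$]*')
ESCAPED = re.compile(r'[a-zA-Z_\$][a-zA-Z_0-9\$]*')


def same_verdict(text):
    """Return True if both patterns agree on whether `text` is a full match."""
    return bool(UNESCAPED.fullmatch(text)) == bool(ESCAPED.fullmatch(text))


samples = ["foo", "$bar", "a$b1", "_x", "1abc", "a-b", ""]
for s in samples:
    print(repr(s), bool(ESCAPED.fullmatch(s)), same_verdict(s))
```

If the two patterns agree on every sample, escaping the dollar signs is a safe change on the Python side, and any difference in behaviour comes from how the grammar's regex parser reads the unescaped form.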

bytemouse · 2024-05-31T11:35:21Z

This did resolve the issue. Thank you for your help!

Harsha-Nori self-assigned this May 30, 2024

hudson-ai added a commit to hudson-ai/guidance that referenced this issue Jun 1, 2024

First pass at a fix for guidance-ai#863

6037242

hudson-ai added a commit to hudson-ai/guidance that referenced this issue Jun 2, 2024

First pass at a fix for guidance-ai#863

f17e434

hudson-ai mentioned this issue Jun 2, 2024

[WIP] Fix infinite loop when computing parse tree for recursive nullable grammars #874

Closed

hudson-ai linked a pull request Sep 16, 2024 that will close this issue

Test recursive nullable grammars #1026

Open
