The bug
Regex rules in guidance.gen fail to handle non-ASCII characters (e.g., the German umlauts ä, ö, ü and the eszett ß). Even when these characters are explicitly included in the regex pattern, the generated text systematically omits them.
To Reproduce
The following code demonstrates the issue. The regex pattern explicitly permits German umlauts and ß, and the generated text is expected to use them where the language requires. However, the output consistently avoids these characters.
from guidance import gen
from guidance.models import Transformers

lm = Transformers("Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4", device_map="cuda", echo=False)
# System prompt (German): "Create a list of 10 rules about football." The assistant turn starts with "Regelliste:" ("List of rules:").
lm += "<|im_start|>system\nErstelle eine Liste mit 10 Regeln zum Fußball.<|im_end|>\n<|im_start|>assistant\nRegelliste:\n"
for i in range(1, 11):
    lm += f"{i}. " + gen('rule', stop='\n', regex=r'[A-ZÄÖÜ][a-zA-Z., äöüÄÖÜß]*\.\n')
    print(i, lm['rule'].strip())
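To separate the pattern from the constrained decoding, it may help to check the pattern on its own. The sketch below uses Python's standard `re` module, which is an assumption for illustration and not necessarily the regex engine that guidance uses internally; `matches_rule` is a hypothetical helper name. It suggests that the pattern accepts both the umlaut spelling and the ASCII transliteration, so the transliterated output does not violate the regex as such.

```python
import re

# Same pattern as in the reproduction, compiled with Python's `re`
# (assumption: guidance may use a different regex engine internally).
PATTERN = re.compile(r'[A-ZÄÖÜ][a-zA-Z., äöüÄÖÜß]*\.\n')


def matches_rule(line: str) -> bool:
    """Return True if the whole line, including its trailing newline, matches."""
    return PATTERN.fullmatch(line) is not None


# Spelling with ß and the ASCII transliteration are both accepted.
print(matches_rule("Fußball ist ein beliebter Sport.\n"))
print(matches_rule("Fussball ist ein beliebter Sport.\n"))
# A lowercase first letter is rejected by the leading character class.
print(matches_rule("fußball ist ein beliebter Sport.\n"))
```

If both spellings match here, the question becomes why the constrained generation never picks the non-ASCII alternatives, rather than whether the pattern allows them.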
Expected behavior
The generated text should match the regex pattern and use umlauts and ß where German spelling requires them, as in "Fußball", "München", or "Größe". Expected output example:
1. Fußball ist ein beliebter Sport.
2. Spieler dürfen keine Handspiele machen.
...
Actual behavior
The generated text omits umlauts and ß, even though they are explicitly allowed in the regex pattern, and uses ASCII transliterations (ss, ue) instead. For example:
1. Fussball ist ein beliebter Sport.
2. Spieler duerfen keine Handspiele machen.
...
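The difference between the expected and actual output can be made checkable with a small helper. This is a minimal sketch using only the standard library; `non_ascii_used` is a hypothetical name, and the sample lines are copied from the examples in this report.

```python
# The non-ASCII characters that the regex in the reproduction allows.
ALLOWED_NON_ASCII = set("äöüÄÖÜß")


def non_ascii_used(text: str) -> list[str]:
    """Return the allowed non-ASCII characters that occur in text, sorted."""
    return sorted({ch for ch in text if ch in ALLOWED_NON_ASCII})


expected = [
    "Fußball ist ein beliebter Sport.",
    "Spieler dürfen keine Handspiele machen.",
]
actual = [
    "Fussball ist ein beliebter Sport.",
    "Spieler duerfen keine Handspiele machen.",
]

for line in expected + actual:
    # An empty list means the line contains none of the allowed non-ASCII characters.
    print(non_ascii_used(line), line)
```

Running the same check over every `lm['rule']` value from the loop would show whether any generated rule contains one of these characters at all.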
System info: