fix: encoding probes storing translated text in pre_translation_prompt
#1483
base: main
Thank you @paulinek13! I can see that one of the translation tests is failing - would you like to take a look?
@leondz sorry, I didn't run all the tests locally. This should fix it:
jmartin-tech left a comment
I think the actual bug is translation of the wrong item; this needs some testing to validate. Note that if triggers should be translated, then the test revision should be rolled back, and the number of calls to get_text would become consistent again.
    self.prompts, self.triggers = zip(
        *random.sample(generated_prompts, self.soft_probe_prompt_cap)
    )
    self.prompts = self.langprovider.get_text(self.prompts)
Should this actually be translating the self.triggers?
@paulinek13 Would appreciate your input here
I was thinking: for encoding probes, since the attack is in the encoding itself, does the language of the triggers really matter? Plus, some payloads like code snippets or English slur terms may not translate well anyway.
And if users want to test with terms in other languages, they can provide a custom payload JSON file (like slur_terms_de.json for example).
That's how I currently see it, but I might be missing something here.
Do you think that makes sense?
Sorry for the delay here. Looking closely at _generate_encoded_prompts(), you are correct: the triggers are set before encoding, so the response value should be compared to the original text, not a translation.
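To illustrate the point about triggers, here is a minimal sketch of how an encoding probe pairs each encoded prompt with its un-encoded payload. The names generate_encoded_prompts and trigger_hit, the prompt template, and the choice of base64 are assumptions made for illustration; they are not the project's actual _generate_encoded_prompts() or detector code.

```python
import base64


def generate_encoded_prompts(payloads, template="Decode this: {encoded}"):
    """Hypothetical helper: return (prompt, trigger) pairs.

    The trigger is the original payload, captured before encoding,
    so it is never affected by any later translation of the prompt.
    """
    pairs = []
    for payload in payloads:
        encoded = base64.b64encode(payload.encode("utf-8")).decode("ascii")
        pairs.append((template.format(encoded=encoded), payload))
    return pairs


def trigger_hit(response, trigger):
    """Hypothetical detector: did the model reproduce the original payload?"""
    return trigger.lower() in response.lower()


pairs = generate_encoded_prompts(["hello world"])
prompt, trigger = pairs[0]
```

In this sketch a response that decodes the payload matches the original trigger, while a response containing only a translated form of the payload would not, which is why translating the triggers would change what counts as a hit.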
What This Change Does
This small change fixes #1461.
Problem: Encoding probes were incorrectly storing translated text in the pre_translation_prompt field while marking it with the source language tag in reports.
Fix: Removed early translation from EncodingMixin.__init__() to ensure prompts remain untranslated until the translation flow in Probe.probe().
Verification
- python -m pytest tests/probes/test_probes_encoding.py (84 passed, 1 skipped in 1.92s)
- pre_translation_prompt in reports contains English text tagged as "en"
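The problem described above can be reproduced with a toy model. This is only a sketch under stated assumptions: FakeLangProvider, build_attempts, and the dictionary layout are hypothetical stand-ins, and the real EncodingMixin and Probe.probe() flow is more involved. Only the get_text call and the pre_translation_prompt field name come from this PR.

```python
class FakeLangProvider:
    """Hypothetical stand-in for a translation provider."""

    def get_text(self, prompts):
        # Mark each prompt so that translated text is easy to recognise.
        return [f"[fr] {p}" for p in prompts]


def build_attempts(prompts, langprovider, translate_early):
    """Hypothetical model of probe setup followed by the probe-time flow."""
    if translate_early:
        # Models the bug: translation already happens during initialisation.
        prompts = langprovider.get_text(prompts)
    attempts = []
    for p in prompts:
        attempts.append(
            {
                # Recorded as the source-language text, tagged "en".
                "pre_translation_prompt": p,
                "lang": "en",
                # Models the translation step that runs at probe time.
                "prompt": langprovider.get_text([p])[0],
            }
        )
    return attempts


buggy = build_attempts(["Decode: aGk="], FakeLangProvider(), translate_early=True)
fixed = build_attempts(["Decode: aGk="], FakeLangProvider(), translate_early=False)
```

With translate_early=True the recorded pre_translation_prompt already carries the translation marker even though it is tagged "en"; with translate_early=False it holds the untranslated text, which is the behaviour the verification steps check for.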