Add PromptGuard to safety_utils #608
base: main
Conversation
Overall looks good! I added only a couple of simple comments.
@@ -34,6 +34,7 @@ def main(
     enable_sensitive_topics: bool=False, # Enable check for sensitive topics using AuditNLG APIs
     enable_salesforce_content_safety: bool=True, # Enable safety check with Salesforce safety flan t5
     enable_llamaguard_content_safety: bool=False, # Enable safety check with Llama-Guard
+    enable_promptguard_safety: bool = False,
Given the small size of the model, can we leave it as default true?
@@ -37,6 +37,7 @@ def main(
     enable_saleforce_content_safety: bool=True, # Enable safety check woth Saleforce safety flan t5
     use_fast_kernels: bool = False, # Enable using SDPA from PyTorch Accelerated Transformers, make use Flash Attention and Xformer memory-efficient kernels
     enable_llamaguard_content_safety: bool = False,
+    enable_promptguard_safety: bool = False,
Especially for this call, can we set it to true?
    scores = self.get_scores(sentence)
    running_scores['jailbreak'] = max([running_scores['jailbreak'], scores['jailbreak']])
    running_scores['indirect_injection'] = max([running_scores['indirect_injection'], scores['indirect_injection']])
is_safe = True if running_scores['jailbreak'] < 0.5 else False
Should we set the bar at 0.5? I think 0.8 or 0.9 would be better based on talks with the team.
for sentence in sentences:
    scores = self.get_scores(sentence)
    running_scores['jailbreak'] = max([running_scores['jailbreak'], scores['jailbreak']])
    running_scores['indirect_injection'] = max([running_scores['indirect_injection'], scores['indirect_injection']])
Should we comment that this is not being used for the user dialog?
Looks good overall, but let's address the way we feed the prompt to the model (see comments).
from torch.nn.functional import softmax
inputs = self.tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
inputs = inputs.to(device)
if len(inputs[0]) > 512:
As max_length is 512, this condition should never be true. Can we instead follow the PromptGuard recommendation and split the text into multiple segments that we then run through the model in parallel (batched)? Especially given the much bigger context length of Llama 3.1.
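To make the dead-code point concrete, a small illustrative snippet (not from the PR): with truncation enabled, the tokenizer already caps the sequence at max_length, so the length check above can never fire. A batched alternative is sketched under the next comment.

```python
# Illustrative only: truncation=True already enforces the 512-token cap.
inputs = self.tokenizer("a very long prompt " * 1000, return_tensors="pt",
                        padding=True, truncation=True, max_length=512)
assert inputs["input_ids"].shape[1] <= 512  # tokens beyond the cap are silently dropped
```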
return "PromptGuard", True, "PromptGuard is not used for model output so checking not carried out" | ||
sentences = text_for_check.split(".") | ||
running_scores = {'jailbreak':0, 'indirect_injection' :0} | ||
for sentence in sentences: |
It's probably more efficient to do this batched, as commented above. Let's split the prompt into blocks of 512 tokens (with some overlap) and feed them batched into the model, which will be far more efficient than scoring the sentences one by one.
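One possible shape for that batched approach, as a sketch only: it assumes the checker keeps self.tokenizer, self.model, and self.device as attributes, that the tokenizer is a fast tokenizer (required for return_overflowing_tokens), that the text is non-empty, and that the label order is 0 = benign, 1 = injection, 2 = jailbreak; none of these are confirmed by this PR.

```python
import torch
from torch.nn.functional import softmax

def get_scores_batched(self, text, max_length=512, stride=64):
    """Score overlapping fixed-size blocks of the prompt in one batched forward pass."""
    enc = self.tokenizer(
        text,
        truncation=True,
        max_length=max_length,
        stride=stride,                    # overlap between consecutive blocks
        return_overflowing_tokens=True,   # one row per block instead of dropping the tail
        padding=True,
        return_tensors="pt",
    )
    enc.pop("overflow_to_sample_mapping", None)  # bookkeeping field the model does not accept
    enc = enc.to(self.device)
    with torch.no_grad():
        logits = self.model(**enc).logits
    probs = softmax(logits, dim=-1)
    # Aggregate with a max over blocks, mirroring the running max used in the PR.
    return {
        "indirect_injection": probs[:, 1].max().item(),
        "jailbreak": probs[:, 2].max().item(),
    }
```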
PromptGuard has been introduced as a system-level safety tool for checking LLM prompts for malicious text. This PR adds the guard to the safety_utils module and adjusts the recipes to use the new check.
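For context, a minimal sketch of how a PromptGuard-based checker could plug into safety_utils. The model id, label indices, default threshold, and class interface below are illustrative assumptions, not necessarily what this PR implements:

```python
import torch
from torch.nn.functional import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification

class PromptGuardSafetyChecker:
    """Sketch of a prompt-level checker reporting 'jailbreak' and 'indirect_injection' scores."""

    def __init__(self, model_id="meta-llama/Prompt-Guard-86M", device="cpu"):
        # model_id is an assumption; substitute whatever checkpoint the recipe actually loads.
        self.device = device
        self.tokenizer = AutoTokenizer.from_pretrained(model_id)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_id).to(device)
        self.model.eval()

    def get_scores(self, text):
        inputs = self.tokenizer(text, return_tensors="pt", padding=True,
                                truncation=True, max_length=512).to(self.device)
        with torch.no_grad():
            logits = self.model(**inputs).logits
        probs = softmax(logits, dim=-1)[0]
        # Assumed label order: 0 = benign, 1 = injection, 2 = jailbreak.
        return {"indirect_injection": probs[1].item(), "jailbreak": probs[2].item()}

    def __call__(self, text_for_check, threshold=0.5):
        # threshold is the value debated in the review (0.5 vs. 0.8/0.9).
        scores = self.get_scores(text_for_check)
        is_safe = scores["jailbreak"] < threshold
        return "PromptGuard", is_safe, str(scores)
```

A recipe could then run `_, is_safe, report = checker(user_prompt)` before passing the prompt to the model and refuse generation when `is_safe` is False.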
General simple test
To test the text splitter feature