-
Notifications
You must be signed in to change notification settings - Fork 718
fix: resolve --generate_autodan CLI feature bugs #1521
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: resolve --generate_autodan CLI feature bugs #1521
Conversation
Fixes NVIDIA#1520 - Fix cli.py: Use parse_cli_plugin_config() to properly parse --probe_options instead of referencing non-existent _config.probe_options. Also add SystemExit(1) to prevent execution with undefined variables. - Fix autodan.py: Wrap string prompt in Conversation object to match updated generator.generate() API that requires Conversation objects. - Fix genetic.py: Convert old OpenAI-style dict format to Conversation object for mutation generator calls.
jmartin-tech
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great find, this cli option is very much an edge case that needs some attention.
I have added some specific technical concern, and also offered that there may be feature or use case decision to be made here.
@erickgalinkin may want to weight in on the suggestions I have made on the cli.py changes. I suspect this cli option should be deprecated.
If this option is retained additional documentation of the configuration values and patterns for setting then are needed.
| "AutoDAN generation requires --probe_options with a .json containing a `prompt` and `target` " | ||
| "string" | ||
| ) | ||
| raise SystemExit(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| raise SystemExit(1) | |
| raise |
The exception handler should simply re-raise.
| gen_str = gen_output.text if isinstance(gen_output, Message) else str(gen_output) | ||
| success = check_for_attack_success(gen_str, REJECTION_STRINGS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generator.generate must return a list[Message|None| this mean if this is not Message type it will be NoneType and casting that as a string is not a useful action. Another possible condition is for gen_output.text to be None.
Consider:
| gen_str = gen_output.text if isinstance(gen_output, Message) else str(gen_output) | |
| success = check_for_attack_success(gen_str, REJECTION_STRINGS) | |
| success = False | |
| if gen_output and gen_output.text: | |
| success = check_for_attack_success(gen_output.text, REJECTION_STRINGS) |
| response = mutation_generator.generate(prompt=conv) | ||
| response_text = response[0].text if isinstance(response[0], Message) else str(response[0]) | ||
| revised_sentence = response_text.replace("\n", "") | ||
| received = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to other comment:
| response = mutation_generator.generate(prompt=conv) | |
| response_text = response[0].text if isinstance(response[0], Message) else str(response[0]) | |
| revised_sentence = response_text.replace("\n", "") | |
| received = True | |
| response = mutation_generator.generate(prompt=conv)[0] | |
| if response and response.text: | |
| revised_sentence = response.text.replace("\n", "") | |
| received = True |
| probe_options = parse_cli_plugin_config("probe", args) | ||
| if probe_options is None: | ||
| raise ValueError("probe_options is None") | ||
| prompt = probe_options["prompt"] | ||
| target = probe_options["target"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is inconsistent with other cli options, the help details mention prompt_options and the exception below mentions probe_options, this PR needs to expanded to ensure consistent messaging.
This also looks like something of a divergence from how configuration is done generally in the tooling as this is only possible as a cli option as coded.
At a minimum rely on the general configuration object that has already been processed and merged with file based configuration:
| probe_options = parse_cli_plugin_config("probe", args) | |
| if probe_options is None: | |
| raise ValueError("probe_options is None") | |
| prompt = probe_options["prompt"] | |
| target = probe_options["target"] | |
| probe_options = config_plugin_type.get("probe", None) | |
| if probe_options is None: | |
| raise ValueError("probe_options is None") | |
| prompt = probe_options["prompt"] | |
| target = probe_options["target"] |
As an alternative maybe the generate_autodan option should be remove and the expectation should be adjusted to execute the dan.AutoDAN with exposed DEFAULT_PARAMS for goal_str and target which map to prompt and target here.
Fixes #1520
This PR fixes the broken `--generate_autodan` CLI feature by updating the AutoDAN code to match the current generator API.
Changes
cli.py
autodan.py
genetic.py
Test plan