fix: resolve --generate_autodan CLI feature bugs #1521
base: main
```diff
@@ -611,13 +611,17 @@ def worker_count_validation(workers):
 from garak.resources.autodan import autodan_generate

 try:
-    prompt = _config.probe_options["prompt"]
-    target = _config.probe_options["target"]
+    probe_options = parse_cli_plugin_config("probe", args)
+    if probe_options is None:
+        raise ValueError("probe_options is None")
+    prompt = probe_options["prompt"]
+    target = probe_options["target"]
 except Exception as e:
     print(
         "AutoDAN generation requires --probe_options with a .json containing a `prompt` and `target` "
         "string"
     )
     raise SystemExit(1)
```
Collaborator
Suggested change
The exception handler should simply re-raise.
```diff
 autodan_generate(generator=generator, prompt=prompt, target=target)

 command.start_run() # start the run now that all config validation is complete
```
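The reviewer's point that the exception handler should simply re-raise could be sketched as follows. `read_autodan_options` is a hypothetical, self-contained stand-in for the `try` block in the diff, not garak code; it assumes the parsed probe options arrive as a plain `dict` (or `None`) and narrows the `except` clause to the errors that lookup can actually raise.

```python
from typing import Optional, Tuple


def read_autodan_options(probe_options: Optional[dict]) -> Tuple[str, str]:
    """Hypothetical helper: pull `prompt` and `target` out of parsed probe options.

    The error is reported once and then re-raised unchanged, so the caller
    (or a top-level CLI handler) still sees the original exception and traceback.
    """
    try:
        if probe_options is None:
            raise ValueError("probe_options is None")
        prompt = probe_options["prompt"]
        target = probe_options["target"]
    except (KeyError, TypeError, ValueError):
        print(
            "AutoDAN generation requires --probe_options with a .json "
            "containing a `prompt` and `target` string"
        )
        raise  # bare raise: keep the original exception instead of SystemExit(1)
    return prompt, target
```

With a bare `raise`, deciding whether to exit non-zero is left to whatever handles exceptions at the top of the CLI, rather than being hard-coded at this call site.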
```diff
@@ -12,6 +12,7 @@
 from garak._plugins import load_plugin
 from garak.generators import Generator
 from garak.generators.huggingface import Model
+from garak.attempt import Conversation, Turn, Message
 import garak._config
 from garak.data import path as data_path
 from garak.resources.autodan.genetic import (
@@ -178,7 +179,9 @@ def autodan_generate(
 best_new_adv_prefix = new_adv_prefixes[best_new_adv_prefix_id]

 adv_prefix = best_new_adv_prefix
-gen_str = generator.generate(prompt=adv_prefix)[0]
+conv = Conversation(turns=[Turn(role="user", content=Message(text=adv_prefix))])
+gen_output = generator.generate(prompt=conv)[0]
+gen_str = gen_output.text if isinstance(gen_output, Message) else str(gen_output)
 success = check_for_attack_success(gen_str, REJECTION_STRINGS)
```
Comment on lines +184 to 185
Collaborator
Consider:
Suggested change
```diff
 if success:
     logger.info(
```
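One detail worth checking in the `gen_str = ... else str(gen_output)` line: if a generator can return `None` for a failed generation, `str(None)` produces the string `"None"`. Assuming the success check works by looking for rejection strings in the output, `"None"` contains none of them and could be scored as a jailbreak. The sketch below keeps "no output" distinct from "some text". `Message` is a minimal stand-in modelling only the `text` attribute used in the diff, and `output_text` / `attack_succeeded` are hypothetical names, not garak's `check_for_attack_success`.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass
class Message:
    """Stand-in for garak.attempt.Message; only `text` is modelled here."""
    text: Optional[str] = None


def output_text(output: Union[Message, str, None]) -> Optional[str]:
    """Normalise one generator output; None means no output was produced."""
    if output is None:
        return None
    if isinstance(output, Message):
        return output.text
    return str(output)


def attack_succeeded(output: Union[Message, str, None],
                     rejection_strings: Sequence[str]) -> bool:
    """Hypothetical success check: a missing output never counts as success;
    otherwise succeed only if no rejection string appears in the text."""
    text = output_text(output)
    if text is None:
        return False
    return not any(r in text for r in rejection_strings)
```

For example, `attack_succeeded(None, ["I'm sorry"])` is `False`, whereas running a plain substring check on `str(None)` would report success.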
```diff
@@ -16,6 +16,7 @@
 from garak.resources.api import nltk
 from garak.resources.autodan.model_utils import AutoDanPrefixManager, forward
+from garak.attempt import Conversation, Turn, Message

 logger = getLogger(__name__)

@@ -278,13 +279,13 @@ def gpt_mutate(mutation_generator, sentence: str) -> str:
 while not received:
     try:
         # TODO: Make the model configurable.
-        response = mutation_generator.generate(
-            prompt=[
-                {"role": "system", "content": system_msg},
-                {"role": "user", "content": user_message},
-            ]
-        )
-        revised_sentence = response[0].replace("\n", "")
+        conv = Conversation(turns=[
+            Turn(role="system", content=Message(text=system_msg)),
+            Turn(role="user", content=Message(text=user_message)),
+        ])
+        response = mutation_generator.generate(prompt=conv)
+        response_text = response[0].text if isinstance(response[0], Message) else str(response[0])
+        revised_sentence = response_text.replace("\n", "")
         received = True
```
Comment on lines +286 to 289
Collaborator
Similar to other comment:
Suggested change
```diff
     except Exception as e:
         logger.error(e)
```
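The `gpt_mutate` change follows the same pattern as `autodan_generate`: build a `Conversation` (here a system turn followed by a user turn), call `generate`, then turn the first output back into text. A minimal sketch of that pattern is below. The dataclasses are stand-ins that mirror only the attributes the diff uses (`Conversation.turns`, `Turn.role`, `Turn.content`, `Message.text`); the real garak classes may carry more fields and validation. `build_mutation_conversation`, `first_response_text` and `EchoGenerator` are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Message:
    text: Optional[str] = None


@dataclass
class Turn:
    role: str
    content: Message


@dataclass
class Conversation:
    turns: List[Turn] = field(default_factory=list)


def build_mutation_conversation(system_msg: str, user_message: str) -> Conversation:
    """Same shape as the diff: one system turn followed by one user turn."""
    return Conversation(turns=[
        Turn(role="system", content=Message(text=system_msg)),
        Turn(role="user", content=Message(text=user_message)),
    ])


def first_response_text(responses: list) -> Optional[str]:
    """Text of the first output, or None if the list is empty or the output is None."""
    if not responses or responses[0] is None:
        return None
    first = responses[0]
    return first.text if isinstance(first, Message) else str(first)


class EchoGenerator:
    """Hypothetical generator that echoes the last turn plus a suffix."""

    def generate(self, prompt: Conversation) -> list:
        return [Message(text=prompt.turns[-1].content.text + "\nrevised")]


conv = build_mutation_conversation("You rewrite sentences.", "Rewrite: hello")
text = first_response_text(EchoGenerator().generate(prompt=conv))
# Strip newlines as gpt_mutate does, but only when there is text to strip.
revised_sentence = text.replace("\n", "") if text is not None else None
```

Factoring the "first output to text" step into one helper shared by both files would also give a single place to decide how a `None` output is handled; in the `while not received` retry loop, for instance, it could simply leave `received` unset so the call is retried.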
This is inconsistent with other CLI options: the `help` details mention `prompt_options`, while the exception below mentions `probe_options`. This PR needs to be expanded to ensure consistent messaging.

This also looks like something of a divergence from how configuration is generally done in the tooling, since, as coded, this is only possible as a CLI option.

At a minimum, rely on the general configuration object that has already been processed and merged with file-based configuration:
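A rough sketch of the "merge first, then read" idea, assuming file-based and CLI probe options are both plain dicts and CLI values take precedence. `deep_merge` and `autodan_inputs` are hypothetical names; garak's own config loading performs its own merging, which this does not reproduce. The point is only that the AutoDAN path would read `prompt` and `target` from the merged result instead of re-parsing CLI arguments itself.

```python
from typing import Any, Dict, Optional, Tuple


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where `override` wins and nested dicts merge recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def autodan_inputs(merged_probe_options: Optional[Dict[str, Any]]) -> Tuple[str, str]:
    """Read prompt/target from already-merged options, whatever their source."""
    opts = merged_probe_options or {}
    missing = [k for k in ("prompt", "target") if not isinstance(opts.get(k), str)]
    if missing:
        raise ValueError(
            "AutoDAN generation requires string probe options: " + ", ".join(missing)
        )
    return opts["prompt"], opts["target"]
```

For example, merging `{"prompt": "from file", "target": "Sure, here is"}` with CLI options `{"prompt": "from cli"}` yields `("from cli", "Sure, here is")`, so the error message only has to describe one configuration path.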
As an alternative, maybe the `generate_autodan` option should be removed, and the expectation adjusted to execute `dan.AutoDAN` with exposed `DEFAULT_PARAMS` for `goal_str` and `target`, which map to `prompt` and `target` here.
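The alternative in the last paragraph could be sketched as follows, assuming a probe that exposes `goal_str` and `target` through a `DEFAULT_PARAMS`-style dict and forwards them to a generation function as `prompt` and `target`. `AutoDANProbeSketch` and `generate_fn` are hypothetical. The real `dan.AutoDAN` probe and garak's plugin configuration machinery are not reproduced here, and the default values are placeholders rather than garak's defaults.

```python
from typing import Any, Callable, Dict, Optional


class AutoDANProbeSketch:
    """Hypothetical probe-style wrapper where goal_str/target are ordinary params."""

    DEFAULT_PARAMS: Dict[str, Any] = {
        "goal_str": "<placeholder goal>",    # placeholder, not garak's default
        "target": "<placeholder target>",    # placeholder, not garak's default
    }

    def __init__(self, config: Optional[Dict[str, Any]] = None):
        # Start from the defaults, then apply any user-supplied overrides.
        params = {**self.DEFAULT_PARAMS, **(config or {})}
        self.goal_str = params["goal_str"]
        self.target = params["target"]

    def run(self, generator: Any, generate_fn: Callable[..., Any]) -> Any:
        # goal_str -> prompt, target -> target, as described in the review comment.
        return generate_fn(generator=generator, prompt=self.goal_str, target=self.target)
```

Because the values live in probe parameters, they can come from a config file or from the usual CLI plugin options, so a dedicated `--generate_autodan` code path would no longer need its own parsing or error message.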