
Conversation

@sinhaabhiraj0

Fixes #1520

This PR fixes the broken `--generate_autodan` CLI feature by updating the AutoDAN code to match the current generator API.

Changes

cli.py

  • Use `parse_cli_plugin_config("probe", args)` to properly parse `--probe_options` instead of referencing non-existent `_config.probe_options`
  • Add `SystemExit(1)` after error message to prevent execution with undefined variables

autodan.py

  • Import `Conversation`, `Turn`, `Message` from `garak.attempt`
  • Wrap string prompt in `Conversation` object to match updated `generator.generate()` API

genetic.py

  • Import `Conversation`, `Turn`, `Message` from `garak.attempt`
  • Convert old OpenAI-style dict format to a `Conversation` object for mutation generator calls (see the short sketch below)
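
For readers new to the updated API, here is a minimal sketch of the wrapping described above. The exact constructor signatures of `Conversation`, `Turn`, and `Message` are assumptions for illustration, not taken from this diff:

from garak.attempt import Conversation, Turn, Message

# Hypothetical illustration: wrap a plain string prompt so it matches the
# Conversation-based generator.generate() interface (constructor signatures assumed).
prompt_text = "Test prompt"
conv = Conversation([Turn("user", Message(prompt_text))])
# outputs = generator.generate(conv)  # generate() now takes a Conversation, not a str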

Test plan

  • Run `python -m garak --target_type huggingface.Model --target_name gpt2 --generate_autodan --probe_options '{"prompt": "Test prompt", "target": "Test target"}'`
  • Verify AutoDAN generation starts without errors
  • Verify mutation generator (gpt-3.5-turbo) works correctly

Fixes NVIDIA#1520

- Fix cli.py: Use parse_cli_plugin_config() to properly parse --probe_options
  instead of referencing non-existent _config.probe_options. Also add
  SystemExit(1) to prevent execution with undefined variables.

- Fix autodan.py: Wrap string prompt in Conversation object to match
  updated generator.generate() API that requires Conversation objects.

- Fix genetic.py: Convert old OpenAI-style dict format to Conversation
  object for mutation generator calls.

@jmartin-tech (Collaborator) left a comment


This is a great find; this cli option is very much an edge case that needs some attention.

I have added some specific technical concerns, and also noted that there may be a feature or use-case decision to be made here.

@erickgalinkin may want to weigh in on the suggestions I have made on the cli.py changes. I suspect this cli option should be deprecated.

If this option is retained, additional documentation of the configuration values and the patterns for setting them is needed.

"AutoDAN generation requires --probe_options with a .json containing a `prompt` and `target` "
"string"
)
raise SystemExit(1)

Suggested change
- raise SystemExit(1)
+ raise

The exception handler should simply re-raise.
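
For illustration, a minimal sketch of the bare re-raise pattern; the try/except shape, the KeyError case, and the logging call are assumptions about the surrounding cli.py code, not taken from the diff:

import logging

probe_options = {}  # hypothetical stand-in for the parsed --probe_options dict

try:
    prompt = probe_options["prompt"]
    target = probe_options["target"]
except KeyError:
    logging.error("AutoDAN generation requires --probe_options with `prompt` and `target` strings")
    raise  # bare raise: propagate the original KeyError with its traceback intact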

Comment on lines +184 to 185
gen_str = gen_output.text if isinstance(gen_output, Message) else str(gen_output)
success = check_for_attack_success(gen_str, REJECTION_STRINGS)

generator.generate must return a list[Message | None]; this means that if an entry is not a Message it will be None, and casting that to a string is not a useful action. Another possible condition is for gen_output.text to be None.

Consider:

Suggested change
- gen_str = gen_output.text if isinstance(gen_output, Message) else str(gen_output)
- success = check_for_attack_success(gen_str, REJECTION_STRINGS)
+ success = False
+ if gen_output and gen_output.text:
+     success = check_for_attack_success(gen_output.text, REJECTION_STRINGS)
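
To spell out the hazard: when a generation slot is None (allowed by the list[Message | None] contract), the isinstance fallback turns it into the literal string "None", which is then scanned for rejection strings as if it were model output. A small self-contained illustration, with check_for_attack_success replaced by a stand-in helper:

REJECTION_STRINGS = ["I'm sorry", "I cannot"]  # illustrative values only

def check_for_attack_success(text, rejections):
    # stand-in for the real helper: "success" here means no rejection phrase appears
    return not any(r in text for r in rejections)

gen_output = None                 # a failed generation returned by generate()
gen_str = str(gen_output)         # "None" - not real model output
print(check_for_attack_success(gen_str, REJECTION_STRINGS))  # True: a spurious "success"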

Comment on lines +286 to 289
response = mutation_generator.generate(prompt=conv)
response_text = response[0].text if isinstance(response[0], Message) else str(response[0])
revised_sentence = response_text.replace("\n", "")
received = True

Similar to the other comment:

Suggested change
- response = mutation_generator.generate(prompt=conv)
- response_text = response[0].text if isinstance(response[0], Message) else str(response[0])
- revised_sentence = response_text.replace("\n", "")
- received = True
+ response = mutation_generator.generate(prompt=conv)[0]
+ if response and response.text:
+     revised_sentence = response.text.replace("\n", "")
+     received = True

Comment on lines +614 to +618
probe_options = parse_cli_plugin_config("probe", args)
if probe_options is None:
raise ValueError("probe_options is None")
prompt = probe_options["prompt"]
target = probe_options["target"]

This is inconsistent with other cli options: the help details mention prompt_options while the exception below mentions probe_options; this PR needs to be expanded to ensure consistent messaging.

This also looks like something of a divergence from how configuration is generally done in the tooling, as this is only possible as a cli option as coded.

At a minimum, rely on the general configuration object that has already been processed and merged with file-based configuration:

Suggested change
- probe_options = parse_cli_plugin_config("probe", args)
+ probe_options = config_plugin_type.get("probe", None)
  if probe_options is None:
      raise ValueError("probe_options is None")
  prompt = probe_options["prompt"]
  target = probe_options["target"]

As an alternative, maybe the generate_autodan option should be removed and the expectation adjusted to executing the dan.AutoDAN probe with exposed DEFAULT_PARAMS for goal_str and target, which map to prompt and target here.
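
A rough sketch of what that alternative might look like; the import path, parameter names, and default values below are illustrative assumptions rather than existing garak code:

from garak.probes.base import Probe

# Hypothetical: expose the two values as tunable parameters on dan.AutoDAN via the
# DEFAULT_PARAMS convention, so they come through the normal plugin configuration
# path (config files / plugin options) instead of a bespoke --probe_options CLI hook.
class AutoDAN(Probe):
    # Partial parameter block for illustration; the real dan.AutoDAN may define others.
    DEFAULT_PARAMS = {
        "goal_str": "example goal string",    # would map to `prompt` in the current CLI path
        "target": "example target response",  # would map to `target` in the current CLI path
    }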
