
Conversation

@sinhaabhiraj0

Fixes #1520

This PR fixes the broken `--generate_autodan` CLI feature by updating the AutoDAN code to match the current generator API.

Changes

cli.py

  • Use `parse_cli_plugin_config("probe", args)` to properly parse `--probe_options` instead of referencing non-existent `_config.probe_options`
  • Add `SystemExit(1)` after error message to prevent execution with undefined variables

autodan.py

  • Import `Conversation`, `Turn`, `Message` from `garak.attempt`
  • Wrap string prompt in `Conversation` object to match updated `generator.generate()` API

genetic.py

  • Import `Conversation`, `Turn`, `Message` from `garak.attempt`
  • Convert old OpenAI-style dict format to a `Conversation` object for mutation generator calls (see the short sketch below)
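
For readers new to the updated API, here is a minimal sketch of the wrapping described above. The exact constructor signatures of `Conversation`, `Turn`, and `Message` are assumptions for illustration, not taken from this diff:

from garak.attempt import Conversation, Turn, Message

# Hypothetical illustration: wrap a plain string prompt so it matches the
# Conversation-based generator.generate() interface (constructor signatures assumed).
prompt_text = "Test prompt"
conv = Conversation([Turn("user", Message(prompt_text))])
# outputs = generator.generate(conv)  # generate() now takes a Conversation, not a str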

Test plan

  • Run `python -m garak --target_type huggingface.Model --target_name gpt2 --generate_autodan --probe_options '{"prompt": "Test prompt", "target": "Test target"}'`
  • Verify AutoDAN generation starts without errors
  • Verify mutation generator (gpt-3.5-turbo) works correctly

Fixes NVIDIA#1520

- Fix cli.py: Use parse_cli_plugin_config() to properly parse --probe_options
  instead of referencing non-existent _config.probe_options. Also add
  SystemExit(1) to prevent execution with undefined variables.

- Fix autodan.py: Wrap string prompt in Conversation object to match
  updated generator.generate() API that requires Conversation objects.

- Fix genetic.py: Convert old OpenAI-style dict format to Conversation
  object for mutation generator calls.

@jmartin-tech (Collaborator) left a comment


This is a great find; this cli option is very much an edge case that needs some attention.

I have added some specific technical concerns, and also noted that there may be a feature or use-case decision to be made here.

@erickgalinkin may want to weigh in on the suggestions I have made on the cli.py changes. I suspect this cli option should be deprecated.

If this option is retained, additional documentation of the configuration values and the patterns for setting them is needed.

"AutoDAN generation requires --probe_options with a .json containing a `prompt` and `target` "
"string"
)
raise SystemExit(1)

Suggested change
- raise SystemExit(1)
+ raise

The exception handler should simply re-raise.
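
For illustration, a minimal sketch of the bare re-raise pattern; the try/except shape, the KeyError case, and the logging call are assumptions about the surrounding cli.py code, not taken from the diff:

import logging

probe_options = {}  # hypothetical stand-in for the parsed --probe_options dict

try:
    prompt = probe_options["prompt"]
    target = probe_options["target"]
except KeyError:
    logging.error("AutoDAN generation requires --probe_options with `prompt` and `target` strings")
    raise  # bare raise: propagate the original KeyError with its traceback intact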

Comment on lines +184 to 185
gen_str = gen_output.text if isinstance(gen_output, Message) else str(gen_output)
success = check_for_attack_success(gen_str, REJECTION_STRINGS)

generator.generate must return a list[Message | None]; this means that if an entry is not a Message it will be None, and casting that to a string is not a useful action. Another possible condition is for gen_output.text to be None.

Consider:

Suggested change
- gen_str = gen_output.text if isinstance(gen_output, Message) else str(gen_output)
- success = check_for_attack_success(gen_str, REJECTION_STRINGS)
+ success = False
+ if gen_output and gen_output.text:
+     success = check_for_attack_success(gen_output.text, REJECTION_STRINGS)
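
To spell out the hazard: when a generation slot is None (allowed by the list[Message | None] contract), the isinstance fallback turns it into the literal string "None", which is then scanned for rejection strings as if it were model output. A small self-contained illustration, with check_for_attack_success replaced by a stand-in helper:

REJECTION_STRINGS = ["I'm sorry", "I cannot"]  # illustrative values only

def check_for_attack_success(text, rejections):
    # stand-in for the real helper: "success" here means no rejection phrase appears
    return not any(r in text for r in rejections)

gen_output = None                 # a failed generation returned by generate()
gen_str = str(gen_output)         # "None" - not real model output
print(check_for_attack_success(gen_str, REJECTION_STRINGS))  # True: a spurious "success"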

Comment on lines +286 to 289
response = mutation_generator.generate(prompt=conv)
response_text = response[0].text if isinstance(response[0], Message) else str(response[0])
revised_sentence = response_text.replace("\n", "")
received = True

Similar to the other comment:

Suggested change
- response = mutation_generator.generate(prompt=conv)
- response_text = response[0].text if isinstance(response[0], Message) else str(response[0])
- revised_sentence = response_text.replace("\n", "")
- received = True
+ response = mutation_generator.generate(prompt=conv)[0]
+ if response and response.text:
+     revised_sentence = response.text.replace("\n", "")
+     received = True

Comment on lines +614 to +618
probe_options = parse_cli_plugin_config("probe", args)
if probe_options is None:
raise ValueError("probe_options is None")
prompt = probe_options["prompt"]
target = probe_options["target"]

This is inconsistent with other cli options: the help details mention prompt_options while the exception below mentions probe_options; this PR needs to be expanded to ensure consistent messaging.

This also looks like something of a divergence from how configuration is generally done in the tooling, as this is only possible as a cli option as coded.

At a minimum, rely on the general configuration object that has already been processed and merged with file-based configuration:

Suggested change
- probe_options = parse_cli_plugin_config("probe", args)
+ probe_options = config_plugin_type.get("probe", None)
  if probe_options is None:
      raise ValueError("probe_options is None")
  prompt = probe_options["prompt"]
  target = probe_options["target"]

As an alternative, maybe the generate_autodan option should be removed and the expectation adjusted to executing the dan.AutoDAN probe with exposed DEFAULT_PARAMS for goal_str and target, which map to prompt and target here.
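
A rough sketch of what that alternative might look like; the import path, parameter names, and default values below are illustrative assumptions rather than existing garak code:

from garak.probes.base import Probe

# Hypothetical: expose the two values as tunable parameters on dan.AutoDAN via the
# DEFAULT_PARAMS convention, so they come through the normal plugin configuration
# path (config files / plugin options) instead of a bespoke --probe_options CLI hook.
class AutoDAN(Probe):
    # Partial parameter block for illustration; the real dan.AutoDAN may define others.
    DEFAULT_PARAMS = {
        "goal_str": "example goal string",    # would map to `prompt` in the current CLI path
        "target": "example target response",  # would map to `target` in the current CLI path
    }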
