@saichandrapandraju commented Oct 22, 2025

This PR introduces the GOAT (Generative Offensive Agent Tester) probe, a multi-turn red-teaming system that uses an Observation-Thought-Strategy-Reply (O-T-S-R) reasoning framework to iteratively craft adversarial prompts.
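
As a rough illustration of the multi-turn loop described above, the sketch below shows one way an O-T-S-R step could drive a conversation with a target model. It is not the code in this PR: `AttackerStep`, `run_multi_turn_attack`, and the `attacker`, `target`, and `judge` callables are hypothetical names introduced only for this example, and the early-stop-on-success behaviour is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (adversarial prompt, target response)


@dataclass
class AttackerStep:
    """One O-T-S-R step from the attacker model (hypothetical structure)."""
    observation: str  # what the attacker noticed in the last target response
    thought: str      # reasoning about progress toward the goal
    strategy: str     # the attack technique chosen for the next turn
    reply: str        # the adversarial prompt actually sent to the target


def run_multi_turn_attack(
    goal: str,
    attacker: Callable[[str, List[Turn]], AttackerStep],
    target: Callable[[List[Turn], str], str],
    judge: Callable[[str, str], bool],
    max_iterations: int = 5,
) -> Tuple[bool, List[Turn]]:
    """Run up to max_iterations turns; stop early once the judge reports success."""
    history: List[Turn] = []
    for _ in range(max_iterations):
        # The attacker sees the goal and the full conversation so far.
        step = attacker(goal, history)
        # The target answers the new prompt in the context of earlier turns.
        response = target(history, step.reply)
        history.append((step.reply, response))
        if judge(goal, response):
            return True, history
    return False, history
```

The three callables stand in for the attacker model, the model under test, and whatever detector decides that a response meets the goal. `max_iterations` mirrors the probe option of the same name used in the verification command.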

Verification

List the steps needed to make sure this thing works

  • Supporting configuration such as generator configuration file
{"openai": {"OpenAICompatible": {"uri": "https:<placeholder>/v1", "model": "qwen2", "api_key": "DUMMY", "suppressed_params": ["n"], "max_tokens": 512, "temperature": 1.0, "top_p":null, "frequency_penalty": null, "presence_penalty":null, "seed":null, "stop":null}}}
  • garak --model_type openai.OpenAICompatible --model_name qwen2 --generator_options '{"openai": {"OpenAICompatible": {"uri": "<placeholder>/v1", "model": "qwen2", "api_key": "dummy", "suppressed_params": ["n"], "max_tokens": 512,"temperature":1.0, "top_p":null, "frequency_penalty": null, "presence_penalty":null, "seed":null, "stop":null}}}' --probes goat.GOATAttack --probe_options '{"goat": {"GOATAttack": {"max_iterations": 5, "attacker_model_type": "openai.OpenAICompatible", "attacker_model_name": "qwen3", "attacker_model_config": {"temperature": 1.0, "uri": "https:<placeholder>/v1", "api_key": "dummy", "model": "qwen3", "max_tokens": null, "top_p": null, "frequency_penalty": null, "presence_penalty": null, "seed": null, "stop": null, "stream": true}}}}'
  • Run the tests and ensure they pass: python -m pytest tests/
  • Verify the thing does what it should
  • Verify the thing does not do what it should not
  • Document the thing and how it works (Example)
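
The long garak command in the steps above embeds two JSON documents inside shell quotes, which is easy to get wrong by hand. A minimal sketch, assuming Python 3.8+ and using only the flags and option keys that appear in that command, builds the same invocation from dictionaries. `ENDPOINT_URI` is a helper name introduced here, and its value is the placeholder from this PR, which must be replaced with a real endpoint.

```python
import json
import shlex

# Placeholder copied from the PR description; substitute a real endpoint URI.
ENDPOINT_URI = "https:<placeholder>/v1"

# Options for the model under test (keys taken from the command above).
generator_options = {
    "openai": {"OpenAICompatible": {
        "uri": ENDPOINT_URI, "model": "qwen2", "api_key": "dummy",
        "suppressed_params": ["n"], "max_tokens": 512, "temperature": 1.0,
        "top_p": None, "frequency_penalty": None, "presence_penalty": None,
        "seed": None, "stop": None,
    }}
}

# Options for the GOAT probe, including the attacker model configuration.
probe_options = {
    "goat": {"GOATAttack": {
        "max_iterations": 5,
        "attacker_model_type": "openai.OpenAICompatible",
        "attacker_model_name": "qwen3",
        "attacker_model_config": {
            "temperature": 1.0, "uri": ENDPOINT_URI, "api_key": "dummy",
            "model": "qwen3", "max_tokens": None, "top_p": None,
            "frequency_penalty": None, "presence_penalty": None,
            "seed": None, "stop": None, "stream": True,
        },
    }}
}

argv = [
    "garak",
    "--model_type", "openai.OpenAICompatible",
    "--model_name", "qwen2",
    "--generator_options", json.dumps(generator_options),
    "--probes", "goat.GOATAttack",
    "--probe_options", json.dumps(probe_options),
]

# shlex.join quotes each argument so the JSON survives a POSIX shell intact.
command = shlex.join(argv)
print(command)
```

`json.dumps` serialises Python `None` as JSON `null` and `True` as `true`, matching the values in the original command. The `argv` list can also be passed directly to `subprocess.run` without going through a shell.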

github-actions bot commented Oct 22, 2025

DCO Assistant Lite bot: All contributors have signed the DCO ✍️ ✅

@saichandrapandraju (Author)

I have read the DCO Document and I hereby sign the DCO

@saichandrapandraju (Author)

recheck

github-actions bot added a commit that referenced this pull request Oct 22, 2025
@saichandrapandraju changed the title from "Feature/dynamic probe goat" to "feat(probes): Add multi-turn GOAT probe" Oct 22, 2025
@jmartin-tech added the "probes" label (Content & activity of LLM probes) Oct 28, 2025
@jmartin-tech self-assigned this Nov 3, 2025