
Conversation

@erickgalinkin erickgalinkin commented Aug 18, 2025

Allows configuration of the system prompt at the run level

Verification

List the steps needed to make sure this thing works

  • Supporting configuration, such as a generator configuration file:
---
system:
  parallel_attempts: 20
  lite: true

run:
  system_prompt: "This is a system prompt to check if it's in the logs"
  generations: 1

plugins:
  probe_spec: dan.AutoDANCached
  extended_detectors: false
  model_type: nim
  model_name: qwen/qwen-235b

Existing tests are passing; functional tests using the configuration above and docs still need to be added.
Initially intended to do this in the base Generator class, BUT unfortunately everything gets serialized in Attempt, so it required much more significant refactoring to log appropriately.
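The run-level behavior described above can be sketched as follows. This is a minimal, self-contained sketch: `Turn` and `Conversation` here are stubs standing in for `garak.attempt`'s classes, and `apply_system_prompt` is a hypothetical helper name, not the PR's actual implementation.

```python
from dataclasses import dataclass, field

# Stubs standing in for garak.attempt.Turn / Conversation (hypothetical sketch).
@dataclass
class Turn:
    role: str
    content: str

@dataclass
class Conversation:
    turns: list = field(default_factory=list)

def apply_system_prompt(conv: Conversation, system_prompt: str) -> Conversation:
    """Prepend a run-level system prompt unless one is already present."""
    if conv.turns and conv.turns[0].role == "system":
        return conv  # a probe-supplied system prompt takes precedence
    conv.turns.insert(0, Turn("system", system_prompt))
    return conv

conv = Conversation([Turn("user", "Hello")])
apply_system_prompt(conv, "This is a system prompt to check if it's in the logs")
```

The precedence check mirrors the docs wording below ("if not overridden by the probe itself"): the run-level value only fills the slot when the probe left it empty.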

…tors/base.py`. Add system prompt support to `Probe`. Remove system prompt injection in `openai.py`.
…od for `Turn` to a classmethod. Fix tests with incorrect signatures for `Conversation`
…orrectly on init. Add `initial_user_message` property to avoid issues with system prompt index.
…etector.detect` to raise `NotImplementedError`. Fix `judge` detectors by ensuring proper return types and proper loading of conversation from list of dicts. Update test_nim.py to conform with expected return value for _call_model.
@leondz leondz left a comment

ok this is great, we now have something concrete to talk about. cue .. talking

@erickgalinkin erickgalinkin linked an issue Aug 22, 2025 that may be closed by this pull request
@erickgalinkin erickgalinkin marked this pull request as ready for review August 22, 2025 15:00
@jmartin-tech jmartin-tech left a comment

Will need to do some testing of various generators, this looks like a pretty clean pass.

…f conversations that already have system prompt. Add test for call to `self._conversation_to_list` to huggingface.py.
@jmartin-tech jmartin-tech requested a review from leondz August 26, 2025 15:49
@leondz leondz left a comment

mild refactoring, clarification about where sysprompt comes from - what's the canonical source? how mutable/overridable is it?

``run`` config items
""""""""""""""""""""

* ``system_prompt`` -- If given and not overridden by the probe itself, probes will pass the specified system prompt to generators that support chat modality.

yaml is tricky and escaping is unstable depending on implementation. maybe not needed for PR to land, but how can we afford a more flexible and less painful route to supplying sysprompts? filename?


I think we defer on this and add `system_prompt_file` support in a future iteration.

Comment on lines +216 to +221
if len(turns) > 0:
    prompt = garak.attempt.Conversation(
        turns=turns,
        notes=notes,
    )


when would we want to permit a prompt with empty Conversation? (distinct from prompt of Conversation with Turn of empty string)

@jmartin-tech jmartin-tech Aug 28, 2025

I don't think empty turns is a thing; this guard is simply to avoid changing the input param `prompt` if the local helper variable `turns` was never populated. Again, this may be possible to remove if/when we validate that all `prompt` values are Conversation objects.
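The guard's intent — leave `prompt` untouched when no turns were collected — can be illustrated with a self-contained stand-in. The `Conversation` here is a stub, not garak's class, and `normalize_prompt` is a hypothetical name for the surrounding logic.

```python
from dataclasses import dataclass, field

@dataclass
class Conversation:  # stub standing in for garak.attempt.Conversation
    turns: list = field(default_factory=list)
    notes: dict = field(default_factory=dict)

def normalize_prompt(prompt, raw_turns):
    """Rebuild prompt as a Conversation only if turns were actually collected."""
    turns = list(raw_turns)
    if len(turns) > 0:
        prompt = Conversation(turns=turns, notes={})
    return prompt  # input passes through unchanged when turns stayed empty
```

So an empty `Conversation` is never constructed; the branch simply preserves the caller's original `prompt` value.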

Comment on lines +60 to +67
def conversation_from_list(turns: list[dict]) -> Conversation:
    """Take a list of dicts and return a Conversation object.

    In the future this should be factored out and implemented in the probe.
    """
    return Conversation([Turn.from_dict(msg) for msg in turns])



or in garak.attempt (which holds Conversation) as a module function, or even a Conversation @staticmethod (seems kinda du jour). I don't think it's perfect here unless the format is specific to resources.red_team.evaluation, which I don't think it is, because the format's the standard one used everywhere else in the PR
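The suggested relocation might look like this. A sketch only: `Turn` and `Conversation` are stubs, and the assumption that `Turn.from_dict` accepts OpenAI-style `{"role": ..., "content": ...}` dicts is inferred from the rest of the PR, not confirmed.

```python
from dataclasses import dataclass, field

@dataclass
class Turn:  # stub for garak.attempt.Turn
    role: str
    content: str

    @classmethod
    def from_dict(cls, msg: dict) -> "Turn":
        # Assumes OpenAI-style message dicts with "role" and "content" keys.
        return cls(role=msg["role"], content=msg["content"])

@dataclass
class Conversation:  # stub for garak.attempt.Conversation
    turns: list = field(default_factory=list)

    @staticmethod
    def from_list(turns: list[dict]) -> "Conversation":
        """Same logic as conversation_from_list, hosted on the class instead."""
        return Conversation([Turn.from_dict(msg) for msg in turns])
```

Hosting the constructor next to the type it builds keeps the standard dict format in one place rather than in a probe-specific module.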


This will be removed in a future, focused refactor of red_team.evaluation; the usage of fschat in `_create_conv` will be removed and that function converted to output a Conversation.

Co-authored-by: Jeffrey Martin <[email protected]>
Co-authored-by: Leon Derczynski <[email protected]>
Signed-off-by: Erick Galinkin <[email protected]>
jmartin-tech added a commit to jmartin-tech/garak that referenced this pull request Aug 28, 2025
jmartin-tech added a commit to jmartin-tech/garak that referenced this pull request Aug 28, 2025
@jmartin-tech jmartin-tech self-assigned this Aug 28, 2025
@erickgalinkin erickgalinkin requested a review from leondz August 28, 2025 20:26
@jmartin-tech jmartin-tech merged commit b2401f4 into NVIDIA:main Aug 29, 2025
15 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Aug 29, 2025

Successfully merging this pull request may close these issues.

native support for system prompts