
Conversation

@erickgalinkin erickgalinkin commented Aug 18, 2025

Allows configuration of the system prompt at the run level

Verification

List the steps needed to make sure this thing works

  • Supporting configuration, such as a generator configuration file:
---
system:
  parallel_attempts: 20
  lite: true

run:
  system_prompt: "This is a system prompt to check if it's in the logs"
  generations: 1

plugins:
  probe_spec: dan.AutoDANCached
  extended_detectors: false
  model_type: nim
  model_name: qwen/qwen-235b

Existing tests are passing; functional tests using the configuration above and docs still need to be added.
Initially intended to do this in the base Generator class, BUT unfortunately everything gets serialized in Attempt, so it required much more significant refactoring to log appropriately.
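The run-level behavior described above can be sketched as follows. This is a minimal, self-contained sketch: `Turn` and `Conversation` here are stubs standing in for `garak.attempt`'s classes, and `apply_system_prompt` is a hypothetical helper name, not the PR's actual implementation.

```python
from dataclasses import dataclass, field

# Stubs standing in for garak.attempt.Turn / Conversation (hypothetical sketch).
@dataclass
class Turn:
    role: str
    content: str

@dataclass
class Conversation:
    turns: list = field(default_factory=list)

def apply_system_prompt(conv: Conversation, system_prompt: str) -> Conversation:
    """Prepend a run-level system prompt unless one is already present."""
    if conv.turns and conv.turns[0].role == "system":
        return conv  # a probe-supplied system prompt takes precedence
    conv.turns.insert(0, Turn("system", system_prompt))
    return conv

conv = Conversation([Turn("user", "Hello")])
apply_system_prompt(conv, "This is a system prompt to check if it's in the logs")
```

The precedence check mirrors the docs wording below ("if not overridden by the probe itself"): the run-level value only fills the slot when the probe left it empty.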

…tors/base.py`. Add system prompt support to `Probe`. Remove system prompt injection in `openai.py`.
…od for `Turn` to a classmethod. Fix tests with incorrect signatures for `Conversation`
…orrectly on init. Add `initial_user_message` property to avoid issues with system prompt index.
…etector.detect` to raise `NotImplementedError`. Fix `judge` detectors by ensuring proper return types and proper loading of conversation from list of dicts. Update test_nim.py to conform with expected return value for _call_model.
@leondz leondz left a comment

ok this is great, we now have something concrete to talk about. cue .. talking

@erickgalinkin erickgalinkin linked an issue Aug 22, 2025 that may be closed by this pull request
@erickgalinkin erickgalinkin marked this pull request as ready for review August 22, 2025 15:00
@jmartin-tech jmartin-tech left a comment

Will need to do some testing of various generators, this looks like a pretty clean pass.

…f conversations that already have system prompt. Add test for call to `self._conversation_to_list` to huggingface.py.
@jmartin-tech jmartin-tech requested a review from leondz August 26, 2025 15:49
@leondz leondz left a comment

mild refactoring, clarification about where sysprompt comes from - what's the canonical source? how mutable/overridable is it?

``run`` config items
""""""""""""""""""""

* ``system_prompt`` -- If given and not overridden by the probe itself, probes will pass the specified system prompt to generators that support chat modality.

yaml is tricky and escaping is unstable depending on implementation. maybe not needed for PR to land, but how can we afford a more flexible and less painful route to supplying sysprompts? filename?


I think we defer on this and add `system_prompt_file` support in a future iteration.

Comment on lines +216 to +221
if len(turns) > 0:
    prompt = garak.attempt.Conversation(
        turns=turns,
        notes=notes,
    )


when would we want to permit a prompt with empty Conversation? (distinct from prompt of Conversation with Turn of empty string)

@jmartin-tech jmartin-tech Aug 28, 2025

I don't think empty turns is a thing; this guard is simply to avoid changing the input param `prompt` if the local helper variable `turns` was never populated. Again, this may be possible to remove if/when we validate that all `prompt` values are Conversation objects.
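The guard's intent — leave `prompt` untouched when no turns were collected — can be illustrated with a self-contained stand-in. The `Conversation` here is a stub, not garak's class, and `normalize_prompt` is a hypothetical name for the surrounding logic.

```python
from dataclasses import dataclass, field

@dataclass
class Conversation:  # stub standing in for garak.attempt.Conversation
    turns: list = field(default_factory=list)
    notes: dict = field(default_factory=dict)

def normalize_prompt(prompt, raw_turns):
    """Rebuild prompt as a Conversation only if turns were actually collected."""
    turns = list(raw_turns)
    if len(turns) > 0:
        prompt = Conversation(turns=turns, notes={})
    return prompt  # input passes through unchanged when turns stayed empty
```

So an empty `Conversation` is never constructed; the branch simply preserves the caller's original `prompt` value.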

Comment on lines +60 to +67
def conversation_from_list(turns: list[dict]) -> Conversation:
    """Take a list of dicts and return a Conversation object.

    In the future this should be factored out and implemented in the probe.
    """
    return Conversation([Turn.from_dict(msg) for msg in turns])



or in garak.attempt (which holds Conversation) as a module function, or even a Conversation @staticmethod (seems kinda du jour). I don't think it's perfect here unless the format is specific to resources.red_team.evaluation, which I don't think it is, because the format's the standard one used everywhere else in the PR
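The suggested relocation might look like this. A sketch only: `Turn` and `Conversation` are stubs, and the assumption that `Turn.from_dict` accepts OpenAI-style `{"role": ..., "content": ...}` dicts is inferred from the rest of the PR, not confirmed.

```python
from dataclasses import dataclass, field

@dataclass
class Turn:  # stub for garak.attempt.Turn
    role: str
    content: str

    @classmethod
    def from_dict(cls, msg: dict) -> "Turn":
        # Assumes OpenAI-style message dicts with "role" and "content" keys.
        return cls(role=msg["role"], content=msg["content"])

@dataclass
class Conversation:  # stub for garak.attempt.Conversation
    turns: list = field(default_factory=list)

    @staticmethod
    def from_list(turns: list[dict]) -> "Conversation":
        """Same logic as conversation_from_list, hosted on the class instead."""
        return Conversation([Turn.from_dict(msg) for msg in turns])
```

Hosting the constructor next to the type it builds keeps the standard dict format in one place rather than in a probe-specific module.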


This will be removed in a future, focused refactor of red_team.evaluation; the usage of fschat in `_create_conv` will be removed and that function converted to output a Conversation.

Co-authored-by: Jeffrey Martin <[email protected]>
Co-authored-by: Leon Derczynski <[email protected]>
Signed-off-by: Erick Galinkin <[email protected]>
jmartin-tech added a commit to jmartin-tech/garak that referenced this pull request Aug 28, 2025
jmartin-tech added a commit to jmartin-tech/garak that referenced this pull request Aug 28, 2025
@jmartin-tech jmartin-tech self-assigned this Aug 28, 2025
@erickgalinkin erickgalinkin requested a review from leondz August 28, 2025 20:26
@jmartin-tech jmartin-tech merged commit b2401f4 into NVIDIA:main Aug 29, 2025
15 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Aug 29, 2025

Successfully merging this pull request may close these issues.

native support for system prompts