feat: Add Crescendo Attack, Multimodal Support, and Advanced Text/Multi(modal/lingual) Transforms #223

rdheekonda · 2025-11-14T03:48:51Z

This PR introduces an implementation of the Crescendo multi-turn jailbreak attack and a broad expansion of the text transforms available for adversarial testing. It also adds multimodal transform support through the Message DataType architecture, with transforms applied through a hook pattern.

Overview

Added production-ready multimodal support for adversarial attacks with:

Message DataType for multimodal content (text, image, audio, video)
Hook-based transforms following agent framework patterns
Bidirectional rigging conversion for LLM API compatibility
Type-safe serialization for frontend rendering
Early stopping fixes for optimization studies

Key Features

1. Multimodal Message DataType

Message container supporting text, images, audio, and video
Bidirectional conversion: dn.Message ↔ rg.Message
Content-aware serialization with explicit type discrimination
Deep cloning for safe transformations
Type-safe properties: text_parts, image_parts, audio_parts, video_parts
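
The container described above can be illustrated with a minimal sketch. This is not the code in `data_types/message.py`: the class names `Part` and `SketchMessage`, the `kind` field, and the dictionary layout are assumptions made for illustration. Only the property names `text_parts` and `image_parts` and the idea of a `to_serializable` method come from this PR.

```python
import copy
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Part:
    """One piece of content; `kind` acts as the explicit type discriminator."""
    kind: str  # "text", "image", "audio", or "video"
    data: Any


@dataclass
class SketchMessage:
    """Hypothetical stand-in for the multimodal Message container."""
    role: str
    parts: list = field(default_factory=list)

    @property
    def text_parts(self) -> list:
        return [p for p in self.parts if p.kind == "text"]

    @property
    def image_parts(self) -> list:
        return [p for p in self.parts if p.kind == "image"]

    def clone(self) -> "SketchMessage":
        # Deep copy, so a transform can mutate the clone without touching the original.
        return copy.deepcopy(self)

    def to_serializable(self) -> dict:
        # Every part carries its own type tag so a consumer can dispatch on it.
        return {
            "role": self.role,
            "parts": [{"type": p.kind, "data": p.data} for p in self.parts],
        }


original = SketchMessage("user", [Part("text", "describe this"), Part("image", "<bytes>")])
mutated = original.clone()
mutated.text_parts[0].data = "d_e_s_c_r_i_b_e"
```

Because `clone()` deep-copies the parts, the edit to `mutated` leaves `original` unchanged, which is the property a transform pipeline relies on.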

2. Hook-Based Transform Architecture

apply_transforms() - Unified hook factory for input/output transforms
Handler registry pattern - Type-specific transform dispatch
Multimodal utilities - apply_transforms_to_message/value/kwargs
Consistent with agents - Same hook pattern as Agent._dispatch
Traced execution - Optional task wrapping for observability
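
A rough sketch of how a hook factory and a handler registry can fit together is shown here. The function name `apply_transforms_sketch`, the `HANDLERS` registry, and the `(stage, value)` hook signature are all hypothetical. The real `apply_transforms()` in `eval/hooks/transforms.py` may be shaped quite differently.

```python
from typing import Any, Callable

Transform = Callable[[Any], Any]

# Handler registry: maps a value type to the function that knows how to
# apply a transform to values of that type. (Hypothetical layout.)
HANDLERS = {}


def register(kind: type):
    def decorator(fn):
        HANDLERS[kind] = fn
        return fn
    return decorator


def dispatch(value: Any, transform: Transform) -> Any:
    handler = HANDLERS.get(type(value))
    # Values with no registered handler pass through unchanged.
    return handler(value, transform) if handler else value


@register(str)
def _apply_to_str(value: str, transform: Transform) -> str:
    return transform(value)


@register(list)
def _apply_to_list(value: list, transform: Transform) -> list:
    return [dispatch(item, transform) for item in value]


def apply_transforms_sketch(transforms, stage: str = "input"):
    """Build a hook that runs `transforms` in order, but only for `stage`."""
    def hook(event_stage: str, value: Any) -> Any:
        if event_stage != stage:
            return value
        for transform in transforms:
            value = dispatch(value, transform)
        return value
    return hook


hook = apply_transforms_sketch([str.upper, lambda s: "_".join(s)], stage="input")
```

In this sketch, `hook("input", ["ab", 3])` transforms the string and leaves the integer alone, while `hook("output", "ab")` returns its argument untouched because the stage does not match.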

3. Attack Framework Updates

TAP Attack - Updated to work with DnMessage end-to-end
Prompt Attack - Refactored with message-aware refiner
GOAT Attack - Migrated to DnMessage with graph context
LLMTarget - Now accepts/returns DnMessage only
Early stopping fix - Score condition now checks all finished trials
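
One plausible reading of the early stopping fix is that the score condition looks at every finished trial rather than a single one. The sketch below illustrates that reading only. `TrialSketch`, its `status` values, and `score_reached` are hypothetical names, not the contents of `optimization/stop.py`.

```python
from dataclasses import dataclass


@dataclass
class TrialSketch:
    status: str  # assumed values: "finished", "running", "failed"
    score: float


def score_reached(trials, threshold: float) -> bool:
    """Return True if any finished trial meets the threshold."""
    finished = [t for t in trials if t.status == "finished"]
    return any(t.score >= threshold for t in finished)


trials = [
    TrialSketch("finished", 0.95),  # an early trial already met the threshold
    TrialSketch("failed", 1.0),     # not finished, so it is ignored
    TrialSketch("finished", 0.20),  # the most recent finished trial scored low
]
```

With this list, a check that inspected only the most recent trial would miss the earlier 0.95 score, whereas scanning all finished trials stops the study.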

4. Evaluation System Enhancements

Hook integration - Eval uses same hook pattern as agents
Pre/post-process events - SamplePreProcess, SamplePostProcess
Reactions - ModifyInput, ModifyOutput, SkipSample, StopEval
Transform tracking - Sample.transformed_input for debugging
Console display - Shows transformed prompts in attack results
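
To make the reaction names concrete, here is a minimal sketch of an evaluation loop that consults hooks before each sample. The reaction class names mirror the ones listed above, but their fields, the `run_eval` function, and the result dictionary are assumptions, not the actual `eval/reactions.py` or `eval/eval.py` code.

```python
from dataclasses import dataclass


class Reaction:
    """Base class for what a hook may ask the eval loop to do."""


@dataclass
class ModifyInput(Reaction):
    new_input: str


class SkipSample(Reaction):
    pass


class StopEval(Reaction):
    pass


def run_eval(samples, task, hooks):
    results = []
    for sample in samples:
        current, skip, stop = sample, False, False
        for hook in hooks:
            reaction = hook(current)
            if isinstance(reaction, ModifyInput):
                current = reaction.new_input
            elif isinstance(reaction, SkipSample):
                skip = True
                break
            elif isinstance(reaction, StopEval):
                stop = True
                break
        if stop:
            break
        if skip:
            continue
        # Keep both the original and the transformed input for debugging.
        results.append(
            {"input": sample, "transformed_input": current, "output": task(current)}
        )
    return results


def demo_hook(value: str):
    if value == "skip":
        return SkipSample()
    if value == "stop":
        return StopEval()
    return ModifyInput(value.upper())


results = run_eval(["a", "skip", "b", "stop", "c"], lambda s: s + "!", [demo_hook])
```

Here the sample `"skip"` is dropped, `"stop"` ends the run before `"c"` is reached, and the remaining samples record their transformed input alongside the original.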

5. Crescendo Attack Implementation

Multi-turn jailbreak attack with progressive escalation strategy
Single-path iterative search (not tree-based like TAP)
Configurable variants with YAML-based templates
Early stopping and backtracking mechanisms
LLM-based prompt refinement with context awareness
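
The control flow described above (a single conversation path, escalation turn by turn, backtracking on refusal, early stopping on a high score) can be sketched without any model calls. Everything in this block is a simplified assumption: `crescendo_sketch`, its parameters, and the stub functions are hypothetical, and in the real attack an attacker LLM would play the role of `propose` and an evaluator would play the role of `score`.

```python
def crescendo_sketch(propose, target, is_refusal, score,
                     max_turns=5, max_backtracks=2, threshold=0.8):
    history = []  # accepted (prompt, reply) turns on the single path
    backtracks = 0
    best = 0.0
    while len(history) < max_turns:
        prompt = propose(history, backtracks)
        reply = target(history, prompt)
        if is_refusal(reply):
            # Backtrack: the refused turn is not added to the history,
            # and the next proposal can rephrase from the same point.
            if backtracks >= max_backtracks:
                break
            backtracks += 1
            continue
        history.append((prompt, reply))
        best = max(best, score(reply))
        if best >= threshold:
            break  # early stopping once the objective score is reached
    return history, best


# Stubs standing in for the attacker model, target model, and scorer.
def propose(history, backtracks):
    suffix = " (rephrased)" if backtracks else ""
    return f"step {len(history) + 1}{suffix}"


def target(history, prompt):
    return "refused" if prompt == "step 2" else f"ok: {prompt}"


def is_refusal(reply):
    return reply == "refused"


def score(reply):
    return 1.0 if "step 3" in reply else 0.3


history, best = crescendo_sketch(propose, target, is_refusal, score)
```

In this run the second turn is refused once, retried in rephrased form, and the loop stops after the third accepted turn because the stub scorer crosses the threshold.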

6. Advanced Text Transforms

Cipher Suite: Vigenère, Playfair, Rail Fence, Columnar Transposition, XOR, Affine, Bacon, Autokey, Beaufort
Encoding Expansion: Unicode escape, Punycode, Base58, Base91, Zero-width, Mixed-case hex, Quoted-printable
Perturbation Techniques: Homoglyph attacks, token smuggling, semantic preserving perturbations, linguistic camouflage
Style Injections: Authority exploitation, temporal misdirection, complexity amplification, sentiment inversion
Text Operations: Word removal/duplication, case alternation, whitespace manipulation, sentence reordering
90+ new transform functions across text, cipher, encoding, and perturbation modules
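
As an illustration of what two of the listed ciphers do to a prompt, the following are textbook implementations of Vigenère and Rail Fence. They are not the functions shipped in this PR. The names `vigenere` and `rail_fence` and their signatures are assumptions, and the library's transforms may handle case, punctuation, or keys differently.

```python
def vigenere(text: str, key: str) -> str:
    """Shift each ASCII letter by the matching key letter; other characters pass through."""
    shifts = [ord(k) - ord("A") for k in key.upper() if k.isascii() and k.isalpha()]
    if not shifts:
        raise ValueError("key must contain at least one ASCII letter")
    out = []
    i = 0  # counts letters only, so punctuation does not consume key letters
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shifts[i % len(shifts)]) % 26 + base))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


def rail_fence(text: str, rails: int) -> str:
    """Write the text in a zigzag across `rails` rows, then read row by row."""
    if rails < 2:
        return text
    rows = [[] for _ in range(rails)]
    row, step = 0, 1
    for ch in text:
        rows[row].append(ch)
        if row == 0:
            step = 1
        elif row == rails - 1:
            step = -1
        row += step
    return "".join("".join(r) for r in rows)


print(vigenere("ATTACKATDAWN", "LEMON"))            # LXFOPVEFRNHR
print(rail_fence("WEAREDISCOVEREDFLEEATONCE", 3))  # WECRLTEERDSOEEFEAOCAIVDEN
```

Both outputs match the standard worked examples for these ciphers. Vigenère substitutes letters, while Rail Fence only reorders them.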

Added Files

dreadnode/
├── data_types/
│   └── message.py             # Multimodal message container
├── eval/
│   ├── hooks/
│   │   ├── base.py            # EvalHook protocol
│   │   └── transforms.py      # apply_transforms() factory
│   └── reactions.py           # EvalReaction hierarchy
├── transforms/
│   └── multimodal.py          # Multimodal transform utilities
└── examples/
    ├── multimodal_attack_eval.ipynb
    └── tap_with_transforms.ipynb
data/
└── meth.png                   # Example image for attacks

Modified Files

Core Framework

eval/eval.py - Integrated hook architecture, added _dispatch_hooks(), _run_sample_with_hooks()
eval/events.py - Added SamplePreProcess, SamplePostProcess events
eval/sample.py - Added transformed_input field for tracking
optimization/study.py - Passes hooks to Eval, fixed dataset injection
optimization/stop.py - Fixed score_value() to check all finished trials
optimization/trial.py - Added transformed_input property
optimization/console.py - Shows transformed inputs in best trial display

Attack Components

airt/target/llm.py - Updated to accept/return DnMessage only
airt/attack/prompt.py - Refactored with message-aware refiner
airt/attack/tap.py - Updated to DnMessage types
airt/attack/goat.py - Migrated to DnMessage with hooks support
transforms/refine.py - Updated llm_refine() for string-based refinement

Data Types

data_types/text.py - Added explicit metadata for serialization
data_types/image.py - Enhanced with source metadata tracking

Examples

Using Hooks with TAP Attack

from dreadnode.airt.attack import tap_attack
from dreadnode.airt.target import LLMTarget
from dreadnode.eval.hooks import apply_transforms
from dreadnode.transforms import text, image

target = LLMTarget(model="openai/gpt-4o")

attack = tap_attack(
    goal="Extract sensitive information",
    target=target,
    attacker_model="gpt-4",
    evaluator_model="gpt-4",
    hooks=[
        apply_transforms([
            text.char_join(delimiter="_"),
            image.add_text_overlay("CONFIDENTIAL"),
        ], stage="input"),
    ]
)

result = await attack.console()

---

## Generated Summary:

- Added a new data type `Message` for multimodal message handling, which can contain text, images, audio, video, or strings.
- Updated `LLMTarget` to accommodate the new `Message` type as input, enhancing clarity and functionality by requiring the input to be of type `DnMessage`.
- Introduced input validation in `task_factory` so that invalid types raise the appropriate exception (`TypeError` or `ValueError`).
- Enhanced the `to_serializable` method to include additional metadata for content type specificity.
- Created a new example notebook `multimodal_attack.ipynb` to demonstrate multimodal adversarial attacks using the updated `Message` type.
- Added the binary file `meth.png`, which is utilized in the example notebook.
- Updated `text.py` to include explicit metadata for serialized text.

These changes extend the handling of multimodal inputs for the LLM API, and the example notebook demonstrates them in practice.

This summary was generated with ❤️ by [rigging](https://rigging.dreadnode.io/)



add multimodal datatype and attack

3387dbd

dreadnode-renovate-bot bot added the area/examples Changes to example code and demonstrations label Nov 14, 2025

rdheekonda requested a review from monoxgas November 14, 2025 03:49

rdheekonda added 6 commits November 13, 2025 20:05

fix precommit errors

72a2824

Update llm target output to dn message

d6c7dcb

Merge branch 'main' into users/raja/slice-3105-implement-multimodal-a…

ae9707b

…ttack

Add schema to dn message structure

04c3cdb

add transform hooks; update tap, goat, prompt attacks iwth transforms…

99b2c35

…, adopt transform hooks in eval; clean up

Merge branch 'main' into users/raja/slice-3105-implement-multimodal-a…

9eb845b

…ttack

rdheekonda changed the title ~~feat: Add Multimodal Attack Support with Message DataType and Rigging Integration~~ feat: Add Multimodal Attack Support with Hook-Based Transform Architecture Nov 25, 2025

update docs

35078ab

dreadnode-renovate-bot bot added area/docs Changes to documentation and guides type/docs Documentation updates and improvements labels Nov 25, 2025

rdheekonda added 6 commits November 24, 2025 19:28

fix ruff

3e073a4

fix precommit

e5bc204

add crescendo variants and update constants

2343391

add more transforms and crescendo attack

c72c2ae

fix precommit errors

6cce9bc

fix precommit

8a50018

dreadnode-renovate-bot bot added the area/pre-commit Changes made to pre-commit hooks label Nov 29, 2025

rdheekonda changed the title ~~feat: Add Multimodal Attack Support with Hook-Based Transform Architecture~~ feat: Add Crescendo Attack, Multimodal Support, and Advanced Text/Multi(modal/lingual) Transforms Nov 29, 2025

rdheekonda added 7 commits December 2, 2025 14:14

update goat on topic rubric to better reason about jailbreaks

a2d8a69

precommit error

89d44c7

fix crescendo rubric

83ebecb

add ai red teaming eval notebook

72d1eca

precommit

7a79f70

merge main onto this branch

cd1fafb

add safety dataset

efe6145

rdheekonda commented Nov 14, 2025 • edited by vabruzzo


2 participants
