@rdheekonda rdheekonda commented Nov 14, 2025

This PR adds an implementation of the Crescendo multi-turn jailbreak attack and substantially expands the text transforms available for adversarial testing. Key additions include multimodal transform support through the Message DataType and advanced perturbation techniques applied through a hook pattern.

Overview

Added production-ready multimodal support for adversarial attacks with:

  • Message DataType for multimodal content (text, image, audio, video)
  • Hook-based transforms following agent framework patterns
  • Bidirectional rigging conversion for LLM API compatibility
  • Type-safe serialization for frontend rendering
  • Early stopping fixes for optimization studies

Key Features

1. Multimodal Message DataType

  • Message container supporting text, images, audio, and video
  • Bidirectional conversion: dn.Message ↔ rg.Message
  • Content-aware serialization with explicit type discrimination
  • Deep cloning for safe transformations
  • Type-safe properties: text_parts, image_parts, audio_parts, video_parts
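As a rough illustration of the container idea above, the sketch below models typed parts, deep cloning, and serialization with an explicit type tag. `SketchMessage` and `Part` are hypothetical names invented for this example; they are not the dreadnode `Message` class, whose actual fields and methods may differ.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class Part:
    """One piece of content; `kind` acts as an explicit type discriminator."""

    kind: str  # "text", "image", "audio", or "video"
    data: object  # str for text, bytes for binary media in this sketch


@dataclass
class SketchMessage:
    """Hypothetical stand-in for a multimodal message container."""

    role: str
    parts: list[Part] = field(default_factory=list)

    @property
    def text_parts(self) -> list[str]:
        return [p.data for p in self.parts if p.kind == "text"]

    @property
    def image_parts(self) -> list[bytes]:
        return [p.data for p in self.parts if p.kind == "image"]

    def clone(self) -> SketchMessage:
        # Deep copy so a transform can mutate the clone without touching the original.
        return copy.deepcopy(self)

    def to_serializable(self) -> dict:
        # Every part carries its type, so a consumer can pick a renderer without guessing.
        return {
            "role": self.role,
            "parts": [
                {"type": p.kind, "size": len(p.data)}
                if isinstance(p.data, bytes)
                else {"type": p.kind, "text": p.data}
                for p in self.parts
            ],
        }
```

Cloning before transforming means a failed or rejected transform leaves the original message intact for the next trial.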

2. Hook-Based Transform Architecture

  • apply_transforms() - Unified hook factory for input/output transforms
  • Handler registry pattern - Type-specific transform dispatch
  • Multimodal utilities - apply_transforms_to_message/value/kwargs
  • Consistent with agents - Same hook pattern as Agent._dispatch
  • Traced execution - Optional task wrapping for observability
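The hook factory and handler registry described above can be sketched in a few lines. Everything here (`make_transform_hook`, `register`, `dispatch`, and the dict-shaped event) is a hypothetical simplification for illustration and does not reproduce the signature of `apply_transforms()` in this PR.

```python
from typing import Callable

Transform = Callable[[str], str]

# Registry mapping a value type to the function that knows how to transform it.
_HANDLERS: dict = {}


def register(value_type: type):
    """Decorator that records a handler for one value type."""

    def decorator(fn):
        _HANDLERS[value_type] = fn
        return fn

    return decorator


@register(str)
def _transform_str(value, transforms):
    # Apply each transform in order to a plain string.
    for transform in transforms:
        value = transform(value)
    return value


@register(list)
def _transform_list(value, transforms):
    # Recurse so nested values are dispatched by their own type.
    return [dispatch(item, transforms) for item in value]


def dispatch(value, transforms):
    handler = _HANDLERS.get(type(value))
    # Values with no registered handler (for example ints) pass through unchanged.
    return handler(value, transforms) if handler else value


def make_transform_hook(transforms, stage="input"):
    """Return a hook that only reacts to events for the given stage."""

    def hook(event: dict) -> dict:
        if event.get("stage") != stage:
            return event
        return {**event, "value": dispatch(event["value"], transforms)}

    return hook
```

Keeping dispatch in a registry means a new content type only needs a new handler, and the hook itself stays unchanged.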

3. Attack Framework Updates

  • TAP Attack - Updated to work with DnMessage end-to-end
  • Prompt Attack - Refactored with message-aware refiner
  • GOAT Attack - Migrated to DnMessage with graph context
  • LLMTarget - Now accepts/returns DnMessage only
  • Early stopping fix - Score condition now checks all finished trials
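A minimal sketch of a score-based stop condition that considers every finished trial is shown below. The `Trial` dataclass and `score_reached` function are hypothetical; the PR does not describe the earlier behavior or the real `score_value()` signature, so this only illustrates the stated intent.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    score: float
    status: str  # "finished", "running", or "failed" in this sketch


def score_reached(trials, threshold):
    """Return True if any finished trial meets the threshold."""
    # Unfinished trials are ignored because their scores are not final.
    finished = [t for t in trials if t.status == "finished"]
    return any(t.score >= threshold for t in finished)
```

Scanning all finished trials matters when trials complete out of order: a qualifying score is still seen even if a later, weaker trial finished after it.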

4. Evaluation System Enhancements

  • Hook integration - Eval uses same hook pattern as agents
  • Pre/post-process events - SamplePreProcess, SamplePostProcess
  • Reactions - ModifyInput, ModifyOutput, SkipSample, StopEval
  • Transform tracking - Sample.transformed_input for debugging
  • Console display - Shows transformed prompts in attack results
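To make the reaction flow concrete, here is a small sketch of an evaluation loop in which pre-process hooks return reactions. The reaction names mirror the list above, but their fields, the hook signature, and `run_eval` are assumptions made for this example, not the library's API.

```python
from dataclasses import dataclass


@dataclass
class ModifyInput:
    new_input: str


class SkipSample:
    """Reaction: drop the current sample and continue."""


class StopEval:
    """Reaction: end the evaluation immediately."""


def run_eval(dataset, task, hooks):
    results = []
    for sample in dataset:
        current = sample
        skip = stop = False
        for hook in hooks:
            reaction = hook(current)  # a hook may return None for "no reaction"
            if isinstance(reaction, ModifyInput):
                current = reaction.new_input
            elif isinstance(reaction, SkipSample):
                skip = True
                break
            elif isinstance(reaction, StopEval):
                stop = True
                break
        if stop:
            break
        if skip:
            continue
        # Keep both the original and the transformed input for debugging.
        results.append(
            {"input": sample, "transformed_input": current, "output": task(current)}
        )
    return results
```

Recording the transformed input next to the original is what lets a console or report show exactly which prompt reached the target.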

5. Crescendo Attack Implementation

  • Multi-turn jailbreak attack with progressive escalation strategy
  • Single-path iterative search (not tree-based like TAP)
  • Configurable variants with YAML-based templates
  • Early stopping and backtracking mechanisms
  • LLM-based prompt refinement with context awareness
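The single-path escalation with backtracking and early stopping might look roughly like the loop below. This is a simplified sketch under several assumptions: the escalation steps are a fixed list instead of LLM-generated prompts, a refusal is modeled as `None`, and `crescendo_sketch`, `target`, `score`, and `soften` are hypothetical callables, not the interface implemented in this PR.

```python
from typing import Callable, Optional

Turn = tuple  # (prompt, reply) pair


def crescendo_sketch(
    escalation: list,
    target: Callable[[list, str], Optional[str]],
    score: Callable[[str], float],
    soften: Callable[[str], str],
    threshold: float = 0.8,
    max_backtracks: int = 2,
):
    """Walk one conversation path, escalating turn by turn."""
    history = []  # accepted (prompt, reply) turns only
    best = 0.0
    backtracks = 0
    for step in escalation:
        prompt = step
        reply = target(history, prompt)
        # Backtracking: a refused turn never enters the history; retry a softer prompt.
        while reply is None and backtracks < max_backtracks:
            backtracks += 1
            prompt = soften(prompt)
            reply = target(history, prompt)
        if reply is None:
            break  # backtrack budget exhausted
        history.append((prompt, reply))
        best = max(best, score(reply))
        if best >= threshold:
            break  # early stop once the objective score is reached
    return history, best
```

Unlike a tree search, only one conversation is kept, so the cost grows linearly with the number of turns plus retries.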

6. Advanced Text Transforms

  • Cipher Suite: Vigenère, Playfair, Rail Fence, Columnar Transposition, XOR, Affine, Bacon, Autokey, Beaufort
  • Encoding Expansion: Unicode escape, Punycode, Base58, Base91, Zero-width, Mixed-case hex, Quoted-printable
  • Perturbation Techniques: Homoglyph attacks, token smuggling, semantic preserving perturbations, linguistic camouflage
  • Style Injections: Authority exploitation, temporal misdirection, complexity amplification, sentiment inversion
  • Text Operations: Word removal/duplication, case alternation, whitespace manipulation, sentence reordering
  • 90+ new transform functions across text, cipher, encoding, and perturbation modules
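For readers unfamiliar with the classical ciphers named above, two of them are short enough to sketch directly. These are textbook versions of the Vigenère and Rail Fence ciphers written for this example; the function names and behavior (such as passing non-letters through unchanged) are assumptions and may not match the transforms added in this PR.

```python
def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Shift each ASCII letter by the matching key letter; other characters pass through."""
    shifts = [ord(k) - ord("a") for k in key.lower() if k.isascii() and k.isalpha()]
    if not shifts:
        raise ValueError("key must contain at least one ASCII letter")
    out = []
    i = 0  # counts letters only, so punctuation does not consume key positions
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            shift = shifts[i % len(shifts)]
            if decrypt:
                shift = -shift
            out.append(chr((ord(ch) - base + shift) % 26 + base))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


def rail_fence(text: str, rails: int) -> str:
    """Write characters in a zigzag across `rails` rows, then read row by row."""
    if rails < 2:
        return text
    rows = [[] for _ in range(rails)]
    row, step = 0, 1
    for ch in text:
        rows[row].append(ch)
        # Reverse direction at the top and bottom rails.
        if row == 0:
            step = 1
        elif row == rails - 1:
            step = -1
        row += step
    return "".join("".join(r) for r in rows)
```

Both are reversible, which is the property that matters for adversarial testing: the target model can be asked to decode the payload while simple keyword filters see only the encoded form.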

Added Files

dreadnode/
├── data_types/
│   └── message.py              # Multimodal message container
├── eval/
│   ├── hooks/
│   │   ├── base.py            # EvalHook protocol
│   │   └── transforms.py      # apply_transforms() factory
│   └── reactions.py           # EvalReaction hierarchy
├── transforms/
│   └── multimodal.py          # Multimodal transform utilities
└── examples/
    ├── multimodal_attack_eval.ipynb
    └── tap_with_transforms.ipynb
data/
└── meth.png                   # Example image for attacks

Modified Files

Core Framework

  • eval/eval.py - Integrated hook architecture, added _dispatch_hooks(), _run_sample_with_hooks()
  • eval/events.py - Added SamplePreProcess, SamplePostProcess events
  • eval/sample.py - Added transformed_input field for tracking
  • optimization/study.py - Passes hooks to Eval, fixed dataset injection
  • optimization/stop.py - Fixed score_value() to check all finished trials
  • optimization/trial.py - Added transformed_input property
  • optimization/console.py - Shows transformed inputs in best trial display

Attack Components

  • airt/target/llm.py - Updated to accept/return DnMessage only
  • airt/attack/prompt.py - Refactored with message-aware refiner
  • airt/attack/tap.py - Updated to DnMessage types
  • airt/attack/goat.py - Migrated to DnMessage with hooks support
  • transforms/refine.py - Updated llm_refine() for string-based refinement

Data Types

  • data_types/text.py - Added explicit metadata for serialization
  • data_types/image.py - Enhanced with source metadata tracking

Examples

Using Hooks with TAP Attack

from dreadnode.airt.attack import tap_attack
from dreadnode.airt.target import LLMTarget
from dreadnode.eval.hooks import apply_transforms
from dreadnode.transforms import text, image

# Model under attack.
target = LLMTarget(model="openai/gpt-4o")

attack = tap_attack(
    goal="Extract sensitive information",
    target=target,
    attacker_model="gpt-4",
    evaluator_model="gpt-4",
    hooks=[
        # Transform the input stage: text and image transforms in one hook.
        apply_transforms([
            text.char_join(delimiter="_"),
            image.add_text_overlay("CONFIDENTIAL"),
        ], stage="input"),
    ]
)

# Top-level await works in a notebook or another async context.
result = await attack.console()

---

## Generated Summary:

- Added a new data type `Message` for multimodal message handling, which can contain text, images, audio, video, or strings.
- Updated `LLMTarget` to accept the new `Message` type, requiring input to be of type `DnMessage`.
- Added input validation in `task_factory` so that invalid types raise the appropriate exceptions (`TypeError` and `ValueError`).
- Enhanced the `to_serializable` method to include additional metadata for content-type specificity.
- Created a new example notebook `multimodal_attack.ipynb` to demonstrate multimodal adversarial attacks using the updated `Message` type.
- Added the binary file `meth.png`, which is utilized in the example notebook.
- Updated `text.py` to include explicit metadata for serialized text.

These changes extend multimodal input handling for the LLM API, and the example notebook gives a practical demonstration of them.

This summary was generated with ❤️ by [rigging](https://rigging.dreadnode.io/)



@dreadnode-renovate-bot dreadnode-renovate-bot bot added the area/examples Changes to example code and demonstrations label Nov 14, 2025
@rdheekonda rdheekonda requested a review from monoxgas November 14, 2025 03:49
@rdheekonda rdheekonda changed the title from "feat: Add Multimodal Attack Support with Message DataType and Rigging Integration" to "feat: Add Multimodal Attack Support with Hook-Based Transform Architecture" Nov 25, 2025
@dreadnode-renovate-bot dreadnode-renovate-bot bot added area/docs Changes to documentation and guides type/docs Documentation updates and improvements labels Nov 25, 2025
@dreadnode-renovate-bot dreadnode-renovate-bot bot added the area/pre-commit Changes made to pre-commit hooks label Nov 29, 2025
@rdheekonda rdheekonda changed the title from "feat: Add Multimodal Attack Support with Hook-Based Transform Architecture" to "feat: Add Crescendo Attack, Multimodal Support, and Advanced Text/Muliti( modal/lingual) Transforms" Nov 29, 2025