feat: Add Crescendo Attack, Multimodal Support, and Advanced Text/Multi(modal/lingual) Transforms #223
Open: rdheekonda wants to merge 21 commits into main from users/raja/slice-3105-implement-multimodal-attack
+15,049 −1,777
…, adopt transform hooks in eval; clean up
Labels:
- area/docs: Changes to documentation and guides
- area/examples: Changes to example code and demonstrations
- area/pre-commit: Changes made to pre-commit hooks
- type/docs: Documentation updates and improvements
This PR introduces the Crescendo multi-turn jailbreak attack alongside a comprehensive expansion of text transformation capabilities for adversarial testing. Key additions include multimodal transform support through the Message DataType architecture and advanced perturbation techniques built on a hook-based pattern.
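The hook-based pattern can be illustrated with a small, self-contained sketch. Everything below is hypothetical: `TransformHook`, `wrap()`, `leetspeak()`, and `echo_target()` are illustrative names, not the `apply_transforms()` API added in this PR. The idea shown is that input transforms run on the prompt before the target is called and output transforms run on the response afterward, so perturbations can be composed without modifying the target itself.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical transform type: maps one string to another.
Transform = Callable[[str], str]


@dataclass
class TransformHook:
    """Applies transform chains to a call's input and output.

    Illustrative sketch only; not the dreadnode apply_transforms() API.
    """

    input_transforms: list[Transform] = field(default_factory=list)
    output_transforms: list[Transform] = field(default_factory=list)

    @staticmethod
    def _chain(value: str, transforms: list[Transform]) -> str:
        # Apply each transform in order, feeding the result forward.
        for transform in transforms:
            value = transform(value)
        return value

    def wrap(self, call: Callable[[str], str]) -> Callable[[str], str]:
        # Return a new callable that transforms the prompt, invokes the
        # original target, then transforms the response.
        def hooked(prompt: str) -> str:
            transformed = self._chain(prompt, self.input_transforms)
            response = call(transformed)
            return self._chain(response, self.output_transforms)

        return hooked


def leetspeak(text: str) -> str:
    # Simple character-substitution perturbation (hypothetical example).
    return text.translate(str.maketrans({"a": "4", "e": "3", "o": "0"}))


def echo_target(prompt: str) -> str:
    # Stand-in for a model call: echoes back the prompt it received.
    return f"received: {prompt}"


hook = TransformHook(input_transforms=[leetspeak], output_transforms=[str.upper])
hooked_target = hook.wrap(echo_target)
print(hooked_target("hello world"))  # RECEIVED: H3LL0 W0RLD
```

Keeping the transforms outside the target means the same perturbation chain can be reused across different attacks or evaluation runs; recording the transformed prompt (as `Sample.transformed_input` does in this PR) would be a natural extension for debugging.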
Overview
Added production-ready multimodal support for adversarial attacks with:
Key Features
1. Multimodal Message DataType
- `Message` container supporting text, images, audio, and video
- `dn.Message` ↔ `rg.Message`
- `text_parts`, `image_parts`, `audio_parts`, `video_parts`
2. Hook-Based Transform Architecture
- `apply_transforms()` - Unified hook factory for input/output transforms
- `apply_transforms_to_message/value/kwargs`
- `Agent._dispatch`
3. Attack Framework Updates
- `DnMessage` end-to-end
- `DnMessage` with graph context
- `DnMessage` only
4. Evaluation System Enhancements
- `SamplePreProcess`, `SamplePostProcess`
- `ModifyInput`, `ModifyOutput`, `SkipSample`, `StopEval`
- `Sample.transformed_input` for debugging
5. Crescendo Attack Implementation
6. Advanced Text Transforms
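The multimodal `Message` container in item 1 exposes per-modality accessors such as `text_parts` and `image_parts`. Below is a minimal, hypothetical sketch of that idea in plain Python; it is not the dreadnode `Message` class, and the `TextPart`/`ImagePart`/`AudioPart`/`VideoPart` types and the `map_text()` helper are assumptions made for illustration. It shows how a text transform can be applied to only the text parts of a message while image, audio, and video parts pass through unchanged.

```python
from dataclasses import dataclass, field
from typing import Callable, Union


# Hypothetical part types, one per modality.
@dataclass(frozen=True)
class TextPart:
    text: str


@dataclass(frozen=True)
class ImagePart:
    data: bytes
    mime_type: str = "image/png"


@dataclass(frozen=True)
class AudioPart:
    data: bytes
    mime_type: str = "audio/wav"


@dataclass(frozen=True)
class VideoPart:
    data: bytes
    mime_type: str = "video/mp4"


Part = Union[TextPart, ImagePart, AudioPart, VideoPart]


@dataclass
class Message:
    role: str
    parts: list[Part] = field(default_factory=list)

    # Per-modality views over the ordered list of parts.
    @property
    def text_parts(self) -> list[TextPart]:
        return [p for p in self.parts if isinstance(p, TextPart)]

    @property
    def image_parts(self) -> list[ImagePart]:
        return [p for p in self.parts if isinstance(p, ImagePart)]

    @property
    def audio_parts(self) -> list[AudioPart]:
        return [p for p in self.parts if isinstance(p, AudioPart)]

    @property
    def video_parts(self) -> list[VideoPart]:
        return [p for p in self.parts if isinstance(p, VideoPart)]

    def map_text(self, transform: Callable[[str], str]) -> "Message":
        """Return a new message with `transform` applied to text parts only."""
        new_parts: list[Part] = [
            TextPart(transform(p.text)) if isinstance(p, TextPart) else p
            for p in self.parts
        ]
        return Message(self.role, new_parts)


msg = Message("user", [TextPart("describe this"), ImagePart(b"\x89PNG...")])
perturbed = msg.map_text(lambda s: s[::-1])  # reverse text as a toy perturbation
print([p.text for p in perturbed.text_parts])  # ['siht ebircsed']
print(len(perturbed.image_parts))  # 1
```

Returning a new `Message` rather than mutating in place keeps the original input available for comparison with the transformed one.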
Added Files
Modified Files
Core Framework
- `eval/eval.py` - Integrated hook architecture, added `_dispatch_hooks()`, `_run_sample_with_hooks()`
- `eval/events.py` - Added `SamplePreProcess`, `SamplePostProcess` events
- `eval/sample.py` - Added `transformed_input` field for tracking
- `optimization/study.py` - Passes hooks to `Eval`, fixed dataset injection
- `optimization/stop.py` - Fixed `score_value()` to check all finished trials
- `optimization/trial.py` - Added `transformed_input` property
- `optimization/console.py` - Shows transformed inputs in best trial display

Attack Components
- `airt/target/llm.py` - Updated to accept/return `DnMessage` only
- `airt/attack/prompt.py` - Refactored with message-aware refiner
- `airt/attack/tap.py` - Updated to `DnMessage` types
- `airt/attack/goat.py` - Migrated to `DnMessage` with hooks support
- `transforms/refine.py` - Updated `llm_refine()` for string-based refinement

Data Types
- `data_types/text.py` - Added explicit metadata for serialization
- `data_types/image.py` - Enhanced with source metadata tracking

Examples
Using Hooks with TAP Attack