Merge pull request #66 from HubertR21/patch-2
Add FLIRT presentation
Showing 3 changed files with 6 additions and 1 deletion.
Binary file not shown.
5 changes: 5 additions & 0 deletions
2024/2024_01_08_FLIRT_Feedback_Loop_In-context_Red_Teaming/README.md
@@ -0,0 +1,5 @@
# FLIRT: Feedback Loop In-context Red Teaming

As generative models become available for public use in various applications, testing and analyzing vulnerabilities of these models has become a priority. In this paper, the authors propose FLIRT: an automatic red teaming framework that evaluates a given model and exposes its vulnerabilities against unsafe and inappropriate content generation. Contrary to the currently available solutions, FLIRT fully automates the feedback loop responsible for offensive content generation and does not follow a human-in-the-loop approach. The experiments demonstrate that compared to baseline approaches, the proposed strategy is significantly more effective in exposing vulnerabilities in the Stable Diffusion (SD) model, even when the latter is enhanced with safety features.

The presentation is based on this [paper](https://openreview.net/forum?id=JTBe1WG3Ws)