
Commit

Merge pull request #66 from HubertR21/patch-2
Add FLIRT presentation
sobieskibj authored Jan 22, 2024
2 parents 694193b + a38b066 commit 1102525
Showing 3 changed files with 6 additions and 1 deletion.
Binary file not shown.
@@ -0,0 +1,5 @@
# FLIRT: Feedback Loop In-context Red Teaming

As generative models become available for public use in various applications, testing and analyzing vulnerabilities of these models has become a priority. In this paper, the authors propose FLIRT: an automatic red teaming framework that evaluates a given model and exposes its vulnerabilities against unsafe and inappropriate content generation. Contrary to the currently available solutions, FLIRT fully automates the feedback loop responsible for offensive content generation and does not follow a human-in-the-loop approach. The experiments demonstrate that compared to baseline approaches, the proposed strategy is significantly more effective in exposing vulnerabilities in the Stable Diffusion (SD) model, even when the latter is enhanced with safety features.

The presentation is based on this [paper](https://openreview.net/forum?id=JTBe1WG3Ws)
2 changes: 1 addition & 1 deletion README.md
@@ -21,7 +21,7 @@ Join us at https://meet.drwhy.ai.
* 04.12.2023 - Discussion - RedTeaming of foundation models
* 11.12.2023 - [Introduction to Diffusion Models](https://github.com/MI2DataLab/MI2DataLab_Seminarium/tree/master/2023/2023_12_11_intro_to_diffusion_models) - Bartek Sobieski
* 18.12.2023 - [Glaze: Protecting artists from style mimicry by text-to-image model](https://github.com/MI2DataLab/MI2DataLab_Seminarium/tree/master/2023/2023_12_18_glaze_protecting_artists_from_style_mimicry) - Tymoteusz Kwieciński
-* 08.01.2024 - TBD - Hubert Ruczyński
+* 08.01.2023 - [FLIRT: Feedback Loop In-context Red Teaming](https://github.com/HubertR21/MI2DataLab_Seminarium/tree/patch-2/2024/2024_01_08_FLIRT_Feedback_Loop_In-context_Red_Teaming) - Hubert Ruczyński
* 15.01.2024 - [Red-Teaming the Stable Diffusion Safety Filter](https://github.com/MI2DataLab/MI2DataLab_Seminarium/tree/master/2023/2024_01_15_red_teaming_stable_diffusion_safety_filter) - Mateusz Grzyb
* 22.01.2024 - Discussion - Diffusion models for XAI
