Merge pull request #66 from HubertR21/patch-2

Add FLIRT presentation
MI2DataLab · Jan 22, 2024 · 1102525
2 parents 694193b + a38b066
commit 1102525

Showing 3 changed files with 6 additions and 1 deletion.
diff --git a/2024/2024_01_08_FLIRT_Feedback_Loop_In-context_Red_Teaming/FLIRT.pdf b/2024/2024_01_08_FLIRT_Feedback_Loop_In-context_Red_Teaming/FLIRT.pdf
diff --git a/2024/2024_01_08_FLIRT_Feedback_Loop_In-context_Red_Teaming/README.md b/2024/2024_01_08_FLIRT_Feedback_Loop_In-context_Red_Teaming/README.md
@@ -0,0 +1,5 @@
+# FLIRT: Feedback Loop In-context Red Teaming
+
+As generative models become publicly available in a growing range of applications, testing and analyzing their vulnerabilities has become a priority. In this paper, the authors propose FLIRT, an automatic red teaming framework that evaluates a given model and exposes its vulnerability to generating unsafe and inappropriate content. Unlike currently available solutions, FLIRT fully automates the feedback loop that drives offensive content generation and does not rely on a human in the loop. The experiments demonstrate that, compared to baseline approaches, the proposed strategy is significantly more effective at exposing vulnerabilities in the Stable Diffusion (SD) model, even when that model is enhanced with safety features.
+
+The presentation is based on this [paper](https://openreview.net/forum?id=JTBe1WG3Ws).
diff --git a/README.md b/README.md
@@ -21,7 +21,7 @@ Join us at https://meet.drwhy.ai.
 * 04.12.2023 - Discussion - RedTeaming of foundation models
 * 11.12.2023 - [Introduction to Diffusion Models](https://github.com/MI2DataLab/MI2DataLab_Seminarium/tree/master/2023/2023_12_11_intro_to_diffusion_models) - Bartek Sobieski
 * 18.12.2023 - [Glaze: Protecting artists from style mimicry by text-to-image model](https://github.com/MI2DataLab/MI2DataLab_Seminarium/tree/master/2023/2023_12_18_glaze_protecting_artists_from_style_mimicry) - Tymoteusz Kwieciński
-* 08.01.2024 - TBD - Hubert Ruczyński
+* 08.01.2024 - [FLIRT: Feedback Loop In-context Red Teaming](https://github.com/HubertR21/MI2DataLab_Seminarium/tree/patch-2/2024/2024_01_08_FLIRT_Feedback_Loop_In-context_Red_Teaming) - Hubert Ruczyński
 * 15.01.2024 - [Red-Teaming the Stable Diffusion Safety Filter](https://github.com/MI2DataLab/MI2DataLab_Seminarium/tree/master/2023/2024_01_15_red_teaming_stable_diffusion_safety_filter) - Mateusz Grzyb
 * 22.01.2024 - Discussion - Diffusion models for XAI