Merge branch 'master' into patch-2

MI2DataLab · Jan 22, 2024 · a38b066
2 parents 60a9630 + 694193b
commit a38b066

Showing 3 changed files with 7 additions and 1 deletion.
diff --git a/..._stable_diffusion_safety_filter/2024_01_15_red_teaming_stable_diffusion_safety_filter.pdf b/..._stable_diffusion_safety_filter/2024_01_15_red_teaming_stable_diffusion_safety_filter.pdf
diff --git a/2023/2024_01_15_red_teaming_stable_diffusion_safety_filter/README.md b/2023/2024_01_15_red_teaming_stable_diffusion_safety_filter/README.md
@@ -0,0 +1,6 @@
+
+# Red-Teaming the Stable Diffusion Safety Filter
+
+Stable Diffusion, a recently introduced open-source image generation model comparable to proprietary counterparts like DALLE, Imagen, or Parti, incorporates a safety filter designed to mitigate the generation of explicit images. The filter's implementation is obscured and lacks comprehensive documentation. This hinders users from effectively preventing potential misuse in their applications and understanding the filter's constraints for possible improvements. Paper authors' investigation reveals that it is relatively simple to generate unsettling content that bypasses the safety filter. Upon reverse-engineering the filter, they discovered its focus on preventing sexual content while overlooking violence, gore, and other disturbing elements. Based on their analysis, they advocate for future model releases to prioritize fully open and well-documented safety measures to encourage community contributions for enhancing security.
+
+Link to the paper: https://arxiv.org/abs/2210.04610
diff --git a/README.md b/README.md
@@ -22,7 +22,7 @@ Join us at https://meet.drwhy.ai.
 * 11.12.2023 - [Introduction to Diffusion Models](https://github.com/MI2DataLab/MI2DataLab_Seminarium/tree/master/2023/2023_12_11_intro_to_diffusion_models) - Bartek Sobieski
 * 18.12.2023 - [Glaze: Protecting artists from style mimicry by text-to-image model](https://github.com/MI2DataLab/MI2DataLab_Seminarium/tree/master/2023/2023_12_18_glaze_protecting_artists_from_style_mimicry) - Tymoteusz Kwieciński
 * 08.01.2023 - [FLIRT: Feedback Loop In-context Red Teaming](https://github.com/HubertR21/MI2DataLab_Seminarium/tree/patch-2/2024/2024_01_08_FLIRT_Feedback_Loop_In-context_Red_Teaming) - Hubert Ruczyński
-* 15.01.2024 - Red-Teaming the Stable Diffusion Safety Filter - Mateusz Grzyb
+* 15.01.2024 - [Red-Teaming the Stable Diffusion Safety Filter](https://github.com/MI2DataLab/MI2DataLab_Seminarium/tree/master/2023/2024_01_15_red_teaming_stable_diffusion_safety_filter) - Mateusz Grzyb
 * 22.01.2024 - Discussion - Diffusion models for XAI
 
 **Overviews of previous editions**