Merge branch 'master' into patch-2
HubertR21 authored Jan 22, 2024
2 parents 60a9630 + 694193b commit a38b066
Showing 3 changed files with 7 additions and 1 deletion.
Binary file not shown.
@@ -0,0 +1,6 @@

# Red-Teaming the Stable Diffusion Safety Filter

Stable Diffusion, a recently introduced open-source image generation model comparable to proprietary counterparts such as DALLE, Imagen, and Parti, incorporates a safety filter designed to block explicit images. The filter's implementation is obfuscated and poorly documented, which makes it hard for users to prevent misuse in their applications or to understand the filter's limitations well enough to improve it. The paper's authors show that it is relatively easy to generate disturbing content that bypasses the safety filter. By reverse-engineering the filter, they found that it targets sexual content while ignoring violence, gore, and other similarly disturbing content. Based on this analysis, they argue that future model releases should ship with fully open and well-documented safety measures so that the community can contribute to improving them.

Link to the paper: https://arxiv.org/abs/2210.04610
2 changes: 1 addition & 1 deletion README.md
@@ -22,7 +22,7 @@ Join us at https://meet.drwhy.ai.
* 11.12.2023 - [Introduction to Diffusion Models](https://github.com/MI2DataLab/MI2DataLab_Seminarium/tree/master/2023/2023_12_11_intro_to_diffusion_models) - Bartek Sobieski
* 18.12.2023 - [Glaze: Protecting artists from style mimicry by text-to-image model](https://github.com/MI2DataLab/MI2DataLab_Seminarium/tree/master/2023/2023_12_18_glaze_protecting_artists_from_style_mimicry) - Tymoteusz Kwieciński
* 08.01.2024 - [FLIRT: Feedback Loop In-context Red Teaming](https://github.com/HubertR21/MI2DataLab_Seminarium/tree/patch-2/2024/2024_01_08_FLIRT_Feedback_Loop_In-context_Red_Teaming) - Hubert Ruczyński
- * 15.01.2024 - Red-Teaming the Stable Diffusion Safety Filter - Mateusz Grzyb
+ * 15.01.2024 - [Red-Teaming the Stable Diffusion Safety Filter](https://github.com/MI2DataLab/MI2DataLab_Seminarium/tree/master/2023/2024_01_15_red_teaming_stable_diffusion_safety_filter) - Mateusz Grzyb
* 22.01.2024 - Discussion - Diffusion models for XAI

**Overviews of previous editions**