Showing 3 changed files with 7 additions and 1 deletion.
Binary file added (+1.58 MB): ..._stable_diffusion_safety_filter/2024_01_15_red_teaming_stable_diffusion_safety_filter.pdf
2023/2024_01_15_red_teaming_stable_diffusion_safety_filter/README.md (6 additions & 0 deletions)
@@ -0,0 +1,6 @@
# Red-Teaming the Stable Diffusion Safety Filter

Stable Diffusion, a recently introduced open-source image generation model comparable to proprietary counterparts such as DALL-E, Imagen, and Parti, ships with a safety filter intended to block the generation of explicit images. The filter's implementation is obfuscated and poorly documented, which makes it hard for users to prevent misuse in their own applications or to understand the filter's limitations well enough to improve it. The paper's authors show that it is relatively easy to generate disturbing content that bypasses the safety filter. By reverse-engineering the filter, they found that it targets sexual content while ignoring violence, gore, and other similarly disturbing content. Based on this analysis, they argue that safety measures in future model releases should be fully open and well documented, so that the community can contribute to improving them.

Link to the paper: https://arxiv.org/abs/2210.04610