<!DOCTYPE html>
<html lang="en">
<head>
<!-- ***** -->
<title>
Breaking Free: How to Hack Safety Guardrails in Black-Box Diffusion Models!
</title>
<link rel="icon" type="image/x-icon" href="./assets/icons/EvoSeed.png">
<meta name="description"
content="Project page for 'Breaking Free: How to Hack Safety Guardrails in Black-Box Diffusion Models!'">
<!-- ***** -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="style.css" type="text/css">
<link rel="stylesheet" href="https://fonts.cdnfonts.com/css/chalkduster">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css"
integrity="sha384-rbsA2VBKQhggwzxH7pPCaAqO46MgnOM80zW1RWuH61DGLwZJEdK2Kadq2F9CUG65"
crossorigin="anonymous">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
<script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
<script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
<script src="https://code.jquery.com/jquery-3.2.1.slim.min.js"
integrity="sha384-KJ3o2DKtIkvYIK3UENzmM7KCkRr/rE9/Qpg6aAZGJwFDMVNA/GpGFF93hXpG5KkN"
crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/@popperjs/[email protected]/dist/umd/popper.min.js"
integrity="sha384-oBqDVmMz9ATKxIep9tiCxS/Z9fNfEXiDAYTujMAeBAsjFuCZSmKbSSUnQlmh/jp3"
crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/js/bootstrap.bundle.min.js"
integrity="sha384-kenU1KFdBIe4zVF0s0G1M5b4hcpxyD9F7jL+jjXkk+Q2h455rYXK/7HAuoJl+0I4"
crossorigin="anonymous"></script>
</head>
<body>
<p class="title">
<img src="./assets/icons/EvoSeed.png" alt="EvoSeed" class="icon" style="height:2em;">
Breaking Free: How to Hack Safety Guardrails in Black-Box Diffusion Models!
<img src="./assets/icons/EvoSeed.png" alt="EvoSeed" class="image" style="height:2em;">
</p>
<div style="text-align: center; font-size: 40pt; margin-bottom: 30px">
<!-- <span>CVPR 2023</span> -->
</div>
<p class="author">
<span class="author"><a target="_blank"
href="https://sites.google.com/site/shashankkotyan">Shashank Kotyan</a> <sup>1*</sup></span>
<span class="author"><a target="_blank"
href="https://scholar.google.co.in/citations?user=R_HEXc0AAAAJ">Po-Yuan Mao</a> <sup>1*</sup></span>
<span class="author"><a target="_blank"
href="https://sites.google.com/site/pinyuchenpage/home">Pin-Yu Chen</a> <sup>2</sup></span>
<span class="author"><a target="_blank"
href="https://danilovargas.org/">Danilo Vasconcellos Vargas</a> <sup>1</sup></span>
</p>
<div class="affiliations">
<span>1 Kyushu University</span>
<span>2 IBM Research</span>
<span>* Equal Contribution</span>
</div>
<div class="menu">
<a href="https://arxiv.org/abs/2402.04699">
<img src="./assets/icons/publication.png" alt="Publication" class="image"style="height:2em;" />
<span> [Article]</span>
</a>
<a href="https://github.com/shashankkotyan/EvoSeed/">
<img src="./assets/icons/github.png" alt="GitHub" class="image" style="height:2em;" />
<span> [Code]</span>
</a>
<a href="https://github.com/shashankkotyan/EvoSeed/blob/main/code/Tutorial.ipynb">
<img src="./assets/icons/mortarboard.png" alt="Tutorial" class="image" style="height:2em;" />
<span> [Tutorial]</span>
</a>
<a href="#bibtex">
<img src="./assets/icons/cite.png" alt="Reference" class="image" style="height:2em;" />
<span> [BibTeX]</span>
</a>
</div>
<div class="container">
<br>
<hr class="hr-twill-colorful"><br>
<figure>
<img src="./assets/cover.jpg"
alt="Is it a Volcano? Is it a Seashore? It's a generate image that can fool AI"
style="width:100%" />
</figure>
<br>
<div align="center" style="display: inline-block;">
<img src="./assets/Correct Classification.jpg" alt="" width="25%" style="vertical-align: middle;">
<img src="./assets/icons/arrow-right.png" alt="" width="10%" style="vertical-align: middle;">
<img src="./assets/volcano-optimize.gif" alt="" width="25%" style="vertical-align: middle;">
<img src="./assets/icons/arrow-right.png" alt="" width="10%" style="vertical-align: middle;">
<img src="./assets/Wrong Classification.jpg" alt="" width="25%" style="vertical-align: middle;">
</div>
<br><hr class="hr-twill-colorful"><br>
<figure>
<div class="nsfw-img-container">
<div onclick="revealImage(this)">
<span> This image may contain sensitive or offensive content.<br>Click to view at your own discretion.</span>
</div>
<img src="./assets/transparent/demo.png" onclick="hideImage(this)"
alt="
Adversarial images created with EvoSeed are prime examples of how to deceive a range of classifiers tailored for various tasks.
Note that the generated natural adversarial images differ from non-adversarial ones, highlighting their unrestricted nature.
" />
</div>
<figcaption>Figure:
Adversarial images created with EvoSeed are prime examples of how to deceive a range of classifiers tailored for various tasks.
Note that the generated natural adversarial images differ from non-adversarial ones, highlighting their unrestricted nature.
</figcaption>
</figure>
<br><hr class="hr-twill-colorful"><br>
<p class="section">
<img src="./assets/icons/contributions.png" alt="Contributions" class="image" style="height:2em;">
<strong>Key Contributions:</strong>
</p>
<p class="text">
<ul>
<li>
We propose a black-box algorithmic framework based on an Evolutionary Strategy titled EvoSeed to
generate natural adversarial samples in an unrestricted setting.
</li>
<li>
Our results show that adversarial samples created using EvoSeed are photo-realistic and do not change the
human perception of the generated image; however, can be misclassified by various robust and non-robust
classifiers.
</li>
</ul>
</p>
<br><hr class="hr-twill-colorful"><br>
<p class="section">
<img src="./assets/icons/abstract.png" alt="Abstract" class="image" style="height:2em;">
<strong>Abstract</strong>
</p>
<p class="text">
Deep neural networks can be exploited using natural adversarial samples, which do not impact human perception.
Current approaches often rely on deep neural networks' white-box nature to generate these adversarial samples or
synthetically alter the distribution of adversarial samples compared to the training distribution.
In contrast, we propose EvoSeed, a novel evolutionary strategy-based algorithmic framework for generating
photo-realistic natural adversarial samples.
Our EvoSeed framework uses auxiliary Conditional Diffusion and Classifier models to operate in a black-box
setting.
We employ CMA-ES to optimize the search for an initial seed vector, which, when processed by the Conditional
Diffusion Model, results in the natural adversarial sample misclassified by the Classifier Model.
Experiments show that the generated adversarial images are of high quality, raising concerns about the
generation of harmful content that bypasses safety classifiers.
Our research opens new avenues to understanding the limitations of current safety mechanisms and the risk of
plausible attacks against classifier systems using image generation.
</p>
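<p class="text">
Read as an optimization problem, the search just described can be summarized as follows (a simplified
paraphrase; the paper's exact objective and constraints may differ):
\[ z' = \arg\min_{z} \; F_c\big(G(z, c)\big), \]
where \( G(z, c) \) is the image generated from seed vector \( z \) under condition \( c \), and
\( F_c(\cdot) \) denotes the Classifier Model's confidence in class \( c \). An adversarial sample
\( x = G(z', c) \) is found once this confidence drops low enough for the classifier to misclassify it.
</p>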
<br><hr class="hr-twill-colorful"><br>
<p class="section">
<img src="./assets/icons/EvoSeed.png" alt="EvoSeed" class="image" style="height:2em;">
<strong>EvoSeed Framework</strong>
</p>
<figure class="image">
<img src="./assets/transparent/framework.png"
alt="Illustration of the EvoSeed framework to optimize initial seed vector \( z \) to generate a natural adversarial sample. The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) iteratively refines the initial seed vector \( z \) and finds an adversarial initial seed vector \( z' \). This adversarial seed vector \( z' \) can then be utilized by the Conditional Diffusion Model \( G \) to generate a natural adversarial sample \( x \) capable of deceiving the Classifier Model \( F \)."
style="width:100%" />
<figcaption>Figure:
Illustration of the EvoSeed framework to optimize initial seed vector \( z \)
to generate a natural adversarial sample. The Covariance Matrix Adaptation Evolution Strategy (CMA-ES)
iteratively refines the initial seed vector \( z \) and finds an adversarial initial seed vector \( z' \). This
adversarial seed vector \( z' \) can then be utilized by the Conditional Diffusion Model \( G \) to generate a
natural adversarial sample \( x \) capable of deceiving the Classifier Model \( F \).
</figcaption>
</figure>
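<p class="text">
The loop below is a minimal sketch of this search using the open-source <code>cma</code> package
(<code>pip install cma</code>). The helpers <code>generate_image</code> and
<code>true_class_confidence</code> are hypothetical toy stand-ins for the Conditional Diffusion Model
\( G \) and the Classifier Model \( F \); see the linked repository and tutorial for the actual
EvoSeed implementation.
</p>
<pre>
import numpy as np
import cma  # pip install cma

# Tiny dimensionality so the sketch runs quickly; EvoSeed optimizes a
# full diffusion latent (e.g. 4 x 64 x 64 values).
SEED_DIM = 64

def generate_image(z, condition):
    # Hypothetical stand-in for the Conditional Diffusion Model G.
    return z  # toy: pretend the seed already is the image

def true_class_confidence(image, condition):
    # Hypothetical stand-in for the Classifier Model F: a toy score in
    # (0, 1] so the sketch runs end to end. A real classifier would
    # return its softmax confidence in the conditioned class.
    return 1.0 / (1.0 + float(np.abs(image).mean()))

def fitness(z, condition):
    # Lower is better: search for a seed whose generated image gets the
    # lowest confidence on the conditioned class c.
    return true_class_confidence(generate_image(z, condition), condition)

def evoseed_search(condition="volcano", sigma=0.1, iterations=100):
    z = np.random.randn(SEED_DIM)            # initial seed vector z
    es = cma.CMAEvolutionStrategy(z, sigma)  # CMA-ES over seed space
    for _ in range(iterations):
        candidates = es.ask()                # sample candidate seeds
        es.tell(candidates, [fitness(zc, condition) for zc in candidates])
        if es.result.fbest < 0.5:            # illustrative stopping rule
            break
    return es.result.xbest                   # adversarial seed z'
</pre>
<p class="text">
Because only fitness values flow back to CMA-ES, the search needs no gradients or internals of
\( G \) or \( F \), which is what makes the attack black-box.
</p>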
<br><hr class="hr-twill-colorful"><br>
<p class="section">
<img src="./assets/icons/object-classification.png" alt="Object Classification" class="image"
style="height:2em;">
<strong>Adversarial Images for Object Classification Task</strong>
</p>
<figure class="image">
<img src="./assets/transparent/object.png" alt="Exemplar adversarial images generated for the Object Classification Task.
We show that images that are aligned with the conditioning can be misclassified." style="width:100%" />
<figcaption> Figure: Exemplar adversarial images generated for the Object Classification Task.
We show that images that are aligned with the conditioning can be misclassified. </figcaption>
</figure>
<br><hr class="hr-twill-colorful"><br>
<p class="section">
<img src="./assets/icons/nsfw.png" alt="NSFW" class="image" style="height:2em;">
<strong>Adversarial Images bypass Safety Checkers</strong>
</p>
<figure>
<div class="nsfw-img-container">
<div onclick="revealImage(this)">
<span> This image may contain sensitive or offensive content.<br>Click to view at your own discretion.</span>
</div>
<img src="./assets/transparent/nsfw.png" onclick="hideImage(this)"
alt="We demonstrate a malicious use of EvoSeed to generate harmful content bypassing safety mechanisms.
These adversarial images are misclassified as appropriate, highlighting better post-image generation checking for such generated images." />
</div>
<figcaption> Figure: We demonstrate a malicious use of EvoSeed to generate harmful content bypassing safety
mechanisms. These adversarial images are misclassified as appropriate, highlighting the need for better
post-generation checks on generated images. </figcaption>
</figure>
<br><hr class="hr-twill-colorful"><br>
<p class="section">
<img src="./assets/icons/race.png" alt="Ethinicity" class="image" style="height:2em;">
<strong>Adversarial Images for Ethinicity Classification Task</strong>
</p>
<figure>
<img src="./assets/transparent/race.png"
alt="We demonstrate an application of EvoSeed to misclassify the individual's ethnicity in the generated image. This raises concerns about misrepresenting a demographic group's representation estimated by such classifiers. "
style="width:100%" />
<figcaption> Figure: We demonstrate an application of EvoSeed to misclassify an individual's ethnicity in the
generated image. This raises concerns about such classifiers misestimating a demographic group's
representation. </figcaption>
</figure>
<br><hr class="hr-twill-colorful"><br>
<p class="section">
<img src="./assets/icons/alignment.png" alt="Misalignment" class="image" style="height:2em;">
<strong>Adversarial Images exploiting Misalignment</strong>
</p>
<figure>
<img src="./assets/transparent/interesting.png"
alt="Exemplar adversarial images generated by EvoSeed where the gender of the person in the generated image was changed. This example also shows brittleness in the current diffusion model to generate non-aligned images with the conditioning. "
style="width:100%" />
<figcaption> Figure: Exemplar adversarial images generated by EvoSeed where the gender of the person in the
generated image was changed. This example also shows the brittleness of current diffusion models, which can
generate images that are not aligned with the conditioning. </figcaption>
</figure>
<br><hr class="hr-twill-colorful"><br>
<p class="section">
<img src="./assets/icons/evolution.png" alt="Evolution" class="image" style="height:2em;">
<strong>Evolution of an Adversarial Image</strong>
</p>
<figure class="image">
<img src="./assets/transparent/flow.png"
alt="Demonstration of degrading confidence on the conditioned object c by the classifier for generated images.
Note that the right-most image is the adversarial image misclassified by the classifier model, and the left-most is the initial non-adversarial image with the highest confidence. "
style="width:100%" />
<figcaption> Figure: Demonstration of the classifier's degrading confidence in the conditioned object \( c \)
across generated images.
Note that the right-most image is the adversarial image misclassified by the classifier model, while the left-most
is the initial non-adversarial image with the highest confidence. </figcaption>
</figure>
<br><hr class="hr-twill-colorful"><br>
<p class="section" id="bibtex"><b>Bibtex</b></p>
<pre class="bibtex">
@article{kotyan2024EvoSeed,
title = {Breaking Free: How to Hack Safety Guardrails in Black-Box Diffusion Models!},
author = {Kotyan, Shashank and Mao, Po-Yuan and Chen, Pin-Yu and Vargas, Danilo Vasconcellos},
year = {2024},
month = may,
number = {arXiv:2402.04699},
eprint = {2402.04699},
publisher = {{arXiv}},
doi = {10.48550/arXiv.2402.04699},
}
</pre>
</div>
<script>
// Hide the warning overlay so the sensitive image underneath becomes visible.
function revealImage(overlay) {
overlay.style.display = "none";
}
// Cover the image again by re-showing the warning overlay (its previous sibling).
function hideImage(img) {
var overlay = img.previousElementSibling;
overlay.style.display = "flex";
}
</script>
</body>
</html>