<html lang="" xml:lang="" xmlns="http://www.w3.org/1999/xhtml"><head>
<meta charset="utf-8">
<meta content="width=device-width, initial-scale=1" name="viewport">
<link href="media/graphics/favicon.ico" rel="shortcut icon">
<title> Full-Range Virtual Try-On with Recurrent Tri-Level Transform </title>
<link rel="stylesheet" href="style.css">
<link rel="stylesheet" href="box_swipe.css">
<script src="box_swipe.js"></script>
<link href="https://fonts.googleapis.com/css?family=Montserrat|Segoe+UI" rel="stylesheet">
</head>
<body>
<!-- SECTION: HEADER -->
<div class="n-header">
</div>
<div class="n-title">
<h1> Full-Range Virtual Try-On with Recurrent Tri-Level Transform </h1>
</div>
<!-- SECTION: AUTHORS -->
<div class="n-byline">
<div class="byline">
<ul class="authors">
<li> <a href="https://github.com/LZQhardworker" target="_blank">Han Yang</a> <sup> 1, 2 </sup>
</li>
<li> <a href="" target="_blank">Xinrui Yu</a> <sup> 3 </sup>
</li>
<li> <a href="https://liuziwei7.github.io/" target="_blank">Ziwei Liu</a> <sup> ✉️ 4 </sup>
</li>
</ul>
<div class="authors-affiliations-gap"></div>
<ul class="authors affiliations">
<li>
<sup> 1 </sup> ZMO AI Inc.
</li>
<li>
<sup> 2 </sup> ETH Zurich
</li>
<li>
<sup> 3 </sup> Harbin Institute of Technology, Shenzhen
</li>
<li>
<sup> 4 </sup> S-Lab, Nanyang Technological University
</li>
</ul>
<ul class="authors affiliations">
<li>
<sup> ✉️ </sup> Corresponding author.
</li>
</ul>
</div>
</div>
<!-- SECTION: MAIN BODY -->
<div class="n-article">
<!-- teaser -->
<div class="l-article video youtube-embed">
<iframe class="l-article youtube-video" width="100%" height="100%" src="https://www.youtube.com/embed/2XoW-HcrevM" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>
</div>
<!-- abstract -->
<h2 id="abstract"> Abstract </h2>
<p align="justify"> Virtual try-on aims to transfer a target clothing image onto a reference person.
Though great progress has been achieved, the functioning zone of existing works is still limited to <strong>standard clothes </strong>
(e.g., plain shirt without complex laces or ripped effect),
while the vast complexity and variety of <strong>non-standard clothes</strong> (e.g., off-shoulder shirt, one-shoulder dress) are largely ignored. </p>
<p align="justify"> In this work, we propose a principled framework, <strong>Recurrent Tri-Level Transform (RT-VTON)</strong> ,
that performs full-range virtual try-on on both standard and non-standard clothes.
We have two key insights towards the framework design:
<strong>1) Semantics transfer</strong> requires a gradual feature transform on three different levels of clothing representations,
namely clothes code, pose code and parsing code.
<strong>2) Geometry transfer</strong> requires a regularized image deformation between rigidity and flexibility.
Firstly, we predict the semantics of the “after-try-on” person by recurrently refining the tri-level feature codes using local gated attention and non-local correspondence learning.
Next, we design a semi-rigid deformation to align the clothing image and the predicted semantics, which preserves local warping similarity.
Finally, a canonical try-on synthesizer fuses all the processed information to generate the clothed person image. Extensive experiments on conventional benchmarks along with user studies demonstrate that our framework achieves state-of-the-art performance both quantitatively and qualitatively.
Notably, RT-VTON shows compelling results on a wide range of non-standard clothes.</p>
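<p align="justify"> To make the recurrent tri-level refinement more concrete, the snippet below is a minimal PyTorch sketch of how one refinement step over the three codes could be organized. The module names, tensor shapes, and the gated residual update are illustrative assumptions for exposition, not the released implementation.</p>
<pre>
# Illustrative sketch only: one recurrent refinement step over the three
# feature codes (clothes, pose, parsing) with a local gating mask.
# Shapes and the fusion scheme are assumptions, not the authors' code.
import torch
import torch.nn as nn

class TriLevelBlock(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        # local gated attention: one soft mask per code decides where
        # that code should be updated
        self.gate = nn.Sequential(
            nn.Conv2d(3 * channels, 3, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        self.update = nn.Conv2d(3 * channels, 3 * channels, kernel_size=3, padding=1)

    def forward(self, clothes, pose, parsing):
        x = torch.cat([clothes, pose, parsing], dim=1)
        gates = self.gate(x)                      # (B, 3, H, W): one gate per code
        deltas = self.update(x).chunk(3, dim=1)   # proposed update for each code
        codes = (clothes, pose, parsing)
        return [c + gates[:, i:i + 1] * d
                for i, (c, d) in enumerate(zip(codes, deltas))]

# Recurrent refinement: apply the block several times so the three codes
# gradually agree (the SGM in the paper stacks six Tri-Level Blocks).
block = TriLevelBlock(64)
clothes = pose = parsing = torch.randn(1, 64, 32, 24)
for _ in range(6):
    clothes, pose, parsing = block(clothes, pose, parsing)
</pre>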
<!-- paper links -->
<h2 id="links"> Links </h2>
<div class="grid download-section">
<div class="download-thumb">
<a href="image/RT_VITON.pdf" target="_blank">
<img class="dropshadow" src="image/front_cover.png">
</a>
</div>
<div class="download-links">
<ul>
<li>
<a href="RT_VITON.pdf" target="_blank"> paper pdf </a>
</li>
<li>
<a href="/" target="_blank"> arXiv </a>
</li>
</ul>
</div>
</div>
<h2 id="videos"> Experiments </h2>
<h3>Qualitative Results</h3>
<p align="justify"> The test pair and test results are shown <a href="https://drive.google.com/file/d/1e4YxOahv1X6jxjaxtn_GmZpKwQ6eBMwN/view?usp=sharing" target="_blank"><font color="blue">this</font></a> and <a href="https://drive.google.com/file/d/1tl-hvPUcTXbBN_3TKyWViv9y_TN25LpZ/view?usp=sharing" target="_blank"><font color="blue">here</font></a>,
from left to right are reference person, target clothes, try-on results of four algorithms including CP-VITON+, ACGPN, DCTON and RT-VITON.</p>
<img src="image/1.png" alt="image1" />
<div class="videocaption">
<div>
<p align="justify"><strong>Figure 1.</strong> Visual comparison of four virtual try-on methods in a standard to non-standard manner (top to bottom).
With our Tri-Level Transform and semi-rigid deformation, RT-VTON produces photo-realistic results for the full-range of clothing types and preserves the fine details of the clothing texture.</p>
</div>
</div>
<img src="image/2.png" alt="image2" />
<div class="videocaption">
<div>
<p align="justify"><strong>Figure 2.</strong> The visual comparison of the image deformation methods between the TPS warping and our semi-rigid deformation.</p>
</div>
</div>
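<p align="justify"> The intuition behind a deformation regularized between rigidity and flexibility can be illustrated with the toy PyTorch snippet below: a coarse control grid is optimized to fit a target while a rigidity term keeps the local grid spacing close to its rest length. The resolution, loss weights, and penalty are assumptions for illustration only and do not reproduce the paper's semi-rigid deformation.</p>
<pre>
# Toy illustration of a regularized warp. Purely illustrative; not the
# paper's semi-rigid deformation.
import torch
import torch.nn.functional as F

def rigidity_penalty(grid):
    # grid: (B, H, W, 2) sampling grid in [-1, 1]
    dx = grid[:, :, 1:, :] - grid[:, :, :-1, :]   # horizontal edge vectors
    dy = grid[:, 1:, :, :] - grid[:, :-1, :, :]   # vertical edge vectors
    rest_x = 2.0 / (grid.shape[2] - 1)            # rest length between columns
    rest_y = 2.0 / (grid.shape[1] - 1)            # rest length between rows
    # penalize edges whose length deviates from the rest length, which
    # discourages strong local stretching while still allowing bending
    return ((dx.norm(dim=-1) - rest_x) ** 2).mean() + \
           ((dy.norm(dim=-1) - rest_y) ** 2).mean()

cloth = torch.rand(1, 3, 256, 192)    # stand-in for the clothing image
target = torch.rand(1, 3, 256, 192)   # stand-in for the predicted semantics
base = F.affine_grid(torch.eye(2, 3).unsqueeze(0), list(cloth.shape), align_corners=True)
offset = torch.zeros(1, 8, 6, 2, requires_grad=True)   # coarse control-grid offsets

opt = torch.optim.Adam([offset], lr=1e-2)
for _ in range(100):
    # upsample the coarse offsets to a dense sampling grid
    up = F.interpolate(offset.permute(0, 3, 1, 2), size=(256, 192),
                       mode="bilinear", align_corners=True).permute(0, 2, 3, 1)
    grid = base + up
    warped = F.grid_sample(cloth, grid, align_corners=True)
    loss = F.l1_loss(warped, target) + 10.0 * rigidity_penalty(grid)
    opt.zero_grad()
    loss.backward()
    opt.step()
</pre>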
<h3>Quantitative Results</h3>
<p align="justify"> Quantitative evaluation of try-on task is hard to conduct as there is no ground-truth of the reference person in the target clothes.</p>
<img src="image/table1.png" alt="table1" />
<div class="videocaption">
<div>
<p align="justify"><strong>Table 1.</strong> Quantitative Comparisons. “N.S.” denotes non-standard.
We show the Frechet Inception Distance (FID) and user study results of four methods.</p>
</div>
</div>
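<p align="justify"> Because there is no paired ground truth, FID compares the distribution of generated try-on images against real person images. The snippet below sketches how such a score is typically computed; it assumes the torchmetrics package and uses random stand-in tensors, and is not the evaluation script used in the paper.</p>
<pre>
# Sketch of a typical FID computation for try-on outputs (assumes the
# torchmetrics package; not the authors' evaluation code).
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)

# real person images and generated try-on results as uint8 (B, 3, H, W);
# random tensors are used here purely as placeholders
real_batch = torch.randint(0, 256, (8, 3, 256, 192), dtype=torch.uint8)
fake_batch = torch.randint(0, 256, (8, 3, 256, 192), dtype=torch.uint8)

fid.update(real_batch, real=True)    # accumulate statistics of real images
fid.update(fake_batch, real=False)   # accumulate statistics of try-on results
print(fid.compute())                 # lower is better
</pre>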
<h3>Ablation Study</h3>
<p align="justify">Our ablation studies are conducted mainly on analyzing the effectiveness of our Tri-Level Block in Semantic Generation Module (SGM).
Three settings are given as: <strong>1)</strong> full RT-VTON with Tri-Level Transform, <strong>2)</strong> RTVTON with plain encoder-decoder connected by residual
blocks, following, <strong>3)</strong> RT-VTON with Unet as SGM, which is a common backbone in designing the tryon pipelines.</p>
<img src="image/3.png" alt="image3" />
<div class="videocaption">
<div>
<p align="justify"><strong>Figure 3.</strong> Visual ablation study of Semantic Generation Module (SGM) in RT-VTON.</p>
</div>
</div>
<h3>Effectiveness of Non-Local Correspondence</h3>
<p align="justify">In Fig. 4, non-local correspondence learning we used helps capture the non-standard clothing pattern (on the left), which demonstrates strong relationship of the off-shoulder area to retain the clothing shape. Moreover, the boundaries of the sleeves (on the right) are well depicted with the target clothes which leverages the long-range correlation to reconstruct the final semantic layout.</p>
<img src="image/4.png" alt="image4" />
<div class="videocaption">
<div>
<p align="justify"><strong>Figure 4.</strong> Visualization of our non-local correspondence given some manually selected positions.</p>
</div>
</div>
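<p align="justify"> The non-local correspondence in Fig. 4 can be read as an affinity matrix between every position of the person features and every position of the clothing features. The snippet below is a minimal, generic non-local attention sketch of that idea; the shapes and the softmax normalization are illustrative assumptions rather than the exact formulation in RT-VTON.</p>
<pre>
# Generic non-local correspondence between two feature maps; an
# illustrative sketch, not the RT-VTON implementation.
import torch

def non_local_correspondence(person_feat, cloth_feat):
    # person_feat, cloth_feat: (B, C, H, W)
    b, c, h, w = person_feat.shape
    q = person_feat.flatten(2).transpose(1, 2)        # (B, HW, C) queries
    k = cloth_feat.flatten(2)                         # (B, C, HW) keys
    attn = torch.softmax(q @ k / c ** 0.5, dim=-1)    # (B, HW, HW) correspondence
    v = cloth_feat.flatten(2).transpose(1, 2)         # (B, HW, C) values
    out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
    return out, attn

person = torch.randn(1, 64, 32, 24)
cloth = torch.randn(1, 64, 32, 24)
warped, attn = non_local_correspondence(person, cloth)
# attn[0, i] is the correspondence of person position i to every clothing
# position, which is what Figure 4 visualizes for a few selected positions.
</pre>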
<h3>Effectiveness of Gated Attention</h3>
<p>We extract the attention masks from the six Tri-Level Blocks used in RT-VTON.</p>
<img src="image/5.png" alt="image4" />
<div class="videocaption">
<div>
<p align="justify"><strong>Figure 5.</strong> Visualization of the attention masks in our local gating mechanism for clothes code (top) and pose code (bottom).
TLB1-6 denotes the six Tri-Level Blocks we use in our Semantic Generation Module (SGM).</p>
</div>
</div>
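<p align="justify"> Gate masks like those in Fig. 5 can be pulled out of a model for inspection with forward hooks. The sketch below reuses the illustrative TriLevelBlock defined earlier on this page and only shows the visualization mechanics, not the released model.</p>
<pre>
# Collecting the gate masks of each block with forward hooks; builds on
# the illustrative TriLevelBlock sketch above.
import torch

blocks = [TriLevelBlock(64) for _ in range(6)]   # six Tri-Level Blocks (TLB1-6)
masks = []

def save_mask(module, inputs, output):
    masks.append(output.detach())                # (B, 3, H, W): one gate per code

hooks = [b.gate.register_forward_hook(save_mask) for b in blocks]

clothes = pose = parsing = torch.randn(1, 64, 32, 24)
for b in blocks:
    clothes, pose, parsing = b(clothes, pose, parsing)

for h in hooks:
    h.remove()
# masks[i][:, 0] and masks[i][:, 1] are the clothes and pose gates of TLB(i+1).
</pre>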
<h2 id="citation"> Citation </h2>
<pre>@inproceedings{yang2022full,
title = {Full-Range Virtual Try-On With Recurrent Tri-Level Transform},
author = {Yang, Han and Yu, Xinrui and Liu, Ziwei},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages = {3460--3469},
year = {2022}
}</pre>
<h2 id="acknowledgments"> Acknowledgments </h2>
<p align="justify"> This work is supported by NTU NAP, MOE AcRF Tier 1 (2021-T1-001-088), and under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s). </p>
</div>
</body>
</html>