MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance (ACM MM2024)
This repository is the official implementation of MAG-Edit.
Qi Mao, Lan Chen, Yuchao Gu, Zhen Fang, Mike Zheng Shou
(Teaser figure omitted: comparison with (a) Blended Latent Diffusion, (b) DiffEdit, (c) Prompt2Prompt, (d) Plug-and-Play, (e) P2P+Blend, (f) PnP+Blend.)
TL;DR: MAG-Edit is the first training-free method specifically designed for localized image editing in complex scenarios.
CLICK for the full abstract
Recent diffusion-based image editing approaches have exhibited impressive editing capabilities in images with simple compositions. However, localized editing in complex scenarios has not been well-studied in the literature, despite its growing real-world demands. Existing mask-based inpainting methods fall short of retaining the underlying structure within the edit region. Meanwhile, mask-free attention-based methods often exhibit editing leakage and misalignment in more complex compositions. In this work, we develop MAG-Edit, a training-free, inference-stage optimization method, which enables localized image editing in complex scenarios. In particular, MAG-Edit optimizes the noise latent feature in diffusion models by maximizing two mask-based cross-attention constraints of the edit token, which in turn gradually enhances the local alignment with the desired prompt. Extensive quantitative and qualitative experiments demonstrate the effectiveness of our method in achieving both text alignment and structure preservation for localized editing within complex scenarios.

News
- 2024.05.24 Release Token Ratio Code!
- 2023.12.19 Release Project Page and Paper!
TODO
- Release Spatial Ratio Code
- Release Token Ratio Code
- Release MAG-Edit paper and project page
Our method is tested with CUDA 12.0 on a single A100 or V100 GPU. Preparation mainly consists of downloading the pre-trained model and configuring the environment.
```shell
conda create -n mag python=3.8
conda activate mag
pip install -r requirements.txt
```
We use Stable Diffusion v1-4 as the backbone. Please download it from Hugging Face and change the file path on line 26 of code_tr/network.py.
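For reference, the snippet below shows what that edited line typically points at; the variable name MODEL_PATH and the diffusers loading call are illustrative assumptions, so adapt them to whatever code_tr/network.py actually does on line 26.

```python
# Illustrative only (not the repository's actual code): loading the
# Stable Diffusion v1-4 backbone with diffusers.
# MODEL_PATH is a placeholder; the Hugging Face model id also works.
from diffusers import StableDiffusionPipeline

MODEL_PATH = "/path/to/stable-diffusion-v1-4"  # or "CompVis/stable-diffusion-v1-4"
pipe = StableDiffusionPipeline.from_pretrained(MODEL_PATH)
```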
To run MAG-Edit, a single GPU with at least 32 GB of VRAM is required. The script code_tr/edit.sh provides an editing example:
```bash
CUDA_VISIBLE_DEVICES=0 python edit.py \
    --source_prompt="there is a set of sofas on the red carpet in the living room" \
    --target_prompt="there is a set of sofas on the yellow carpet in the living room" \
    --target_word="yellow" \
    --img_path="examples/1/1.jpg" \
    --mask_path="examples/1/mask.png" \
    --result_dir="result" \
    --max_iteration=15 \
    --scale=2.5
```
The result is saved in code_tr/result.
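The --max_iteration and --scale arguments control the inference-stage optimization described in the abstract: at each denoising step the noise latent is updated so that the edit token's cross-attention concentrates inside the mask. The sketch below is a minimal, unofficial illustration of that idea, not the repository's implementation; the attention hook attn_fn and the single token-ratio loss are assumptions (the paper uses two mask-based constraints).

```python
# Unofficial sketch of mask-based attention-ratio guidance.
# Assumption: `attn_fn(latent)` is a hypothetical hook that runs the UNet on
# the latent and returns cross-attention probabilities with shape
# (heads, H*W, num_text_tokens); `token_idx` indexes the edit token
# (e.g. "yellow") in the target prompt.
import torch


def token_ratio_loss(attn, mask, token_idx):
    """1 - (edit-token attention inside the mask / its total attention)."""
    heads = attn.shape[0]
    h, w = mask.shape
    token_attn = attn[..., token_idx].reshape(heads, h, w)
    inside = (token_attn * mask).sum(dim=(1, 2))
    total = token_attn.sum(dim=(1, 2)) + 1e-8
    return 1.0 - (inside / total).mean()  # maximizing the ratio = minimizing this loss


def guidance_step(latent, attn_fn, mask, token_idx, scale=2.5):
    """One gradient update on the noise latent (repeated up to --max_iteration times)."""
    latent = latent.detach().requires_grad_(True)
    attn = attn_fn(latent)
    loss = token_ratio_loss(attn, mask, token_idx)
    grad = torch.autograd.grad(loss, latent)[0]
    return (latent - scale * grad).detach()
```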
Comparison with training-free methods
(Qualitative image grid omitted: MAG-Edit vs. Blended LD, DiffEdit, P2P, and PnP on the simplified prompts "Green pillow", "Denim pants", "White bird", and "Slices of steak".)
Comparison with training and finetuning methods
(Qualitative image grid omitted: MAG-Edit vs. InstructPix2Pix, MagicBrush, and SINE on the simplified prompts "Yellow car", "Plaid sofa", "Tropical fish", and "Strawberry".)
Comparison with Inversion methods
(Qualitative image grid omitted: MAG-Edit vs. StyleDiffusion, ProxNPI, and DirectInversion on the simplified prompts "Jeep", "Floral sofa", and "Yellow shirt".)
@inproceedings{mao2024mag,
title={Mag-edit: Localized image editing in complex scenarios via mask-based attention-adjusted guidance},
author={Mao, Qi and Chen, Lan and Gu, Yuchao and Fang, Zhen and Shou, Mike Zheng},
booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
pages={6842--6850},
year={2024}
}
This repository borrows heavily from prompt-to-prompt and layout-guidance. Thanks to the authors for sharing their code and models.