This is the codebase for "More Control for Free! Image Synthesis with Semantic Diffusion Guidance".
This repository is based on openai/guided-diffusion, with modifications for semantic guidance.
git clone https://github.com/xh-liu/SDG_code
cd SDG
pip install -r requirements.txt
pip install -e .
The pretrained unconditional diffusion models are from openai/guided-diffusion and jychoi118/ilvr_adm.
- LSUN bedroom unconditional diffusion: lsun_bedroom.pt
- LSUN cat unconditional diffusion: lsun_cat.pt
- LSUN horse unconditional diffusion: lsun_horse.pt
- LSUN horse (no dropout): lsun_horse_nodropout.pt
- FFHQ unconditional diffusion: ffhq_10m.pt
We finetune the CLIP image encoders on noisy images for semantic guidance. We provide the checkpoints as follows:
- FFHQ semantic guidance: clip_ffhq.pt
- LSUN bedroom semantic guidance: clip_bedroom.pt
- LSUN cat semantic guidance: clip_cat.pt
- LSUN horse semantic guidance: clip_horse.pt
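At each reverse diffusion step, the finetuned encoder embeds the noisy intermediate image and compares it with the embedding of the reference (an image or a text instruction); the gradient of that similarity with respect to the noisy image is what steers sampling. The sketch below only illustrates the idea: the `encode_image(x, t)` interface of the noise-aware encoder and the helper names are assumptions, not this repo's API.

```python
import torch

def clip_guidance_grad(clip_model, x_t, t, ref_embed, weight):
    # clip_model: CLIP image encoder finetuned on noisy images (noise-aware)
    # ref_embed:  L2-normalized embedding of the reference image or text
    # weight:     guidance scale, cf. --image_weight / --text_weight
    x = x_t.detach().requires_grad_(True)
    img_embed = clip_model.encode_image(x, t)              # assumed interface
    img_embed = img_embed / img_embed.norm(dim=-1, keepdim=True)
    sim = (img_embed * ref_embed).sum(dim=-1)              # cosine similarity
    grad = torch.autograd.grad(sim.sum(), x)[0]
    return weight * grad                                   # shifts the predicted mean
```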
To sample from these models, you can use scripts/sample.py.
Here, we provide flags for sampling from all of these models.
We assume that you have downloaded the relevant model checkpoints into a folder called models/.
For LSUN cat, LSUN horse, and LSUN bedroom, the model flags are defined as below; point --model_path at the checkpoint for the dataset you are sampling from (e.g. models/lsun_cat.pt or models/lsun_horse.pt):
MODEL_FLAGS="--attention_resolutions 32,16,8 --class_cond False --diffusion_steps 1000 --dropout 0.1 --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 256 --num_head_channels 64 --num_res_blocks 2 --resblock_updown True --use_fp16 False --use_scale_shift_norm True --model_path models/lsun_bedroom.pt"
For the FFHQ dataset, the model flags are defined as:
MODEL_FLAGS="--attention_resolutions 16 --class_cond False --diffusion_steps 1000 --dropout 0.0 --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 128 --num_head_channels 64 --num_res_blocks 1 --resblock_updown True --use_fp16 False --use_scale_shift_norm True --model_path models/ffhq_10m.pt"
Sampling flags (--timestep_respacing 100 samples with 100 diffusion steps instead of the full 1000):
SAMPLE_FLAGS="--batch_size 8 --timestep_respacing 100"
Sampling with image content (semantic) guidance:
GUIDANCE_FLAGS="--data_dir ref/ref_bedroom --text_weight 0 --image_weight 100 --image_loss semantic --clip_path models/CLIP_bedroom.pt"
CUDA_VISIBLE_DEVICES=0 python -u scripts/sample.py --exp_name bedroom_image_guidance --single_gpu $MODEL_FLAGS $SAMPLE_FLAGS $GUIDANCE_FLAGS
Sampling with image style guidance:
GUIDANCE_FLAGS="--data_dir ref/ref_bedroom --text_weight 0 --image_weight 100 --image_loss style --clip_path models/CLIP_bedroom.pt"
CUDA_VISIBLE_DEVICES=0 python -u scripts/sample.py --exp_name bedroom_image_style_guidance --single_gpu $MODEL_FLAGS $SAMPLE_FLAGS $GUIDANCE_FLAGS
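The --image_loss flag selects how the reference image is used: `semantic` matches the pooled CLIP embedding of the reference (what the image shows), while `style` matches feature statistics (how it looks). The sketch below contrasts the two; `encode_image_features` returning spatial feature maps is an assumed interface, and the Gram-matrix style term is one standard formulation that this repo's exact loss may refine.

```python
import torch

def gram(feats):
    # feats: (B, C, H, W) feature maps -> (B, C, C) normalized Gram matrices
    b, c, h, w = feats.shape
    f = feats.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def image_guidance_losses(clip_model, x, t, ref_feats, ref_embed):
    feats = clip_model.encode_image_features(x, t)  # assumed: spatial CLIP features
    embed = feats.mean(dim=(2, 3))                  # pooled embedding (illustrative)
    embed = embed / embed.norm(dim=-1, keepdim=True)
    semantic = 1.0 - (embed * ref_embed).sum(dim=-1)                # content match
    style = (gram(feats) - gram(ref_feats)).pow(2).sum(dim=(1, 2))  # statistics match
    return semantic, style
```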
Sampling with language guidance:
GUIDANCE_FLAGS="--data_dir ref/ref_bedroom --text_weight 160 --image_weight 0 --text_instruction_file ref/bedroom_instructions.txt --clip_path models/CLIP_bedroom.pt"
CUDA_VISIBLE_DEVICES=0 python -u scripts/sample.py --exp_name bedroom_language_guidance --single_gpu $MODEL_FLAGS $SAMPLE_FLAGS $GUIDANCE_FLAGS
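The instruction file is read as one text prompt per line, and each prompt is embedded with the CLIP text encoder; the resulting embedding then plays the same role as an image reference in the guidance gradient. A minimal sketch using the OpenAI `clip` package (the ViT-B/16 backbone here is an arbitrary choice for illustration):

```python
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/16", device=device)   # text encoder; backbone assumed

with open("ref/bedroom_instructions.txt") as f:
    prompts = [line.strip() for line in f if line.strip()]

tokens = clip.tokenize(prompts).to(device)
with torch.no_grad():
    text_embed = model.encode_text(tokens)
text_embed = text_embed / text_embed.norm(dim=-1, keepdim=True)  # guidance reference
```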
Sampling with both language and image guidance:
GUIDANCE_FLAGS="--data_dir ref/ref_bedroom --text_weight 160 --image_weight 100 --image_loss semantic --text_instruction_file ref/bedroom_instructions.txt --clip_path models/CLIP_bedroom.pt"
CUDA_VISIBLE_DEVICES=0 python -u scripts/sample.py --exp_name bedroom_image_language_guidance --single_gpu $MODEL_FLAGS $SAMPLE_FLAGS $GUIDANCE_FLAGS
You may need to adjust --text_weight and --image_weight for better visual quality of the generated samples.
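The two weights combine linearly: the guidance score is text_weight times the text similarity plus image_weight times the image similarity, so raising one weight strengthens that signal's pull on the sample relative to the other. A schematic of the combined gradient, reusing the assumed helper names from the sketches above:

```python
import torch

def combined_guidance_grad(clip_model, x_t, t, text_embed, ref_embed,
                           text_weight, image_weight):
    # Linear combination of the two guidance signals (cf. the flags above).
    x = x_t.detach().requires_grad_(True)
    img_embed = clip_model.encode_image(x, t)      # assumed noise-aware interface
    img_embed = img_embed / img_embed.norm(dim=-1, keepdim=True)
    score = (text_weight * (img_embed * text_embed).sum(dim=-1)
             + image_weight * (img_embed * ref_embed).sum(dim=-1))
    return torch.autograd.grad(score.sum(), x)[0]  # steers the next reverse step
```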
If you find our work useful for your research, please cite our paper:
@inproceedings{liu2023more,
title={More Control for Free! Image Synthesis with Semantic Diffusion Guidance},
author={Liu, Xihui and Park, Dong Huk and Azadi, Samaneh and Zhang, Gong and Chopikyan, Arman and Hu, Yuxiao and Shi, Humphrey and Rohrbach, Anna and Darrell, Trevor},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
year={2023}
}