VAR-CLIP:
Text-to-Image Generator with Visual Auto-Regressive Modeling

Qian Zhang, Xiangzi Dai, Ninghua Yang, Xiang An, Ziyong Feng, Xingyu Ren
Institute of Applied Physics and Computational Mathematics, DeepGlint, Shanghai Jiao Tong University

Some examples of text-conditional generation:


Some examples of class-conditional generation:


TODO

Released pre-trained model on ImageNet.
Released training code.
Released arXiv paper.
Training T2I on the ImageNet dataset has been completed.
Training on the ImageNet dataset has been completed.

Getting Started

Requirements

pip install -r requirements.txt

Download Pretrained Models/Dataset

1. Place the downloaded ImageNet train and val splits under *train* and *val*, respectively, in the directory **./imagenet/**.
2. Download the pretrained **CLIP and VAE** models and put them in **pretrained/**.
3. Download the pretrained **VAR_CLIP_d16** model and put it in **local_output/**.

Download CLIP_L14
Download VAE
Download VAR_CLIP Model Weight
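
Before launching training or the demo, it can help to confirm that the directories named in the steps above exist. The following is a minimal sketch, not part of the repository: the helper `missing_dirs` and the constant `EXPECTED_DIRS` are hypothetical names, and the script checks only that the directories exist, not the files inside them.

```python
from pathlib import Path

# Directory names are taken from the download steps above.
# The check itself is an illustrative assumption, not repository code.
EXPECTED_DIRS = (
    "imagenet/train",
    "imagenet/val",
    "pretrained",
    "local_output",
)


def missing_dirs(root="."):
    """Return the expected directories that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Directory layout looks complete.")
```

Run it from the repository root. An empty result means all four directories are in place.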

Training Scripts

# Training VAR-CLIP-d16 for 1000 epochs on ImageNet 256x256 takes 4.1 days on 64 A100s.
# Before running, set data_path and the IP addresses of the participating machines in run.py.
python run.py
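
For budgeting, the training cost quoted above can be converted to GPU-hours. This is plain arithmetic on the stated figures (4.1 days, 64 A100s). The helper name `gpu_hours` is hypothetical and not part of the repository.

```python
def gpu_hours(days, num_gpus):
    """Total GPU-hours for a run lasting `days` wall-clock days on `num_gpus` GPUs."""
    return days * 24 * num_gpus


# Figures from the training note: 4.1 days on 64 A100s.
print(round(gpu_hours(4.1, 64), 1))  # 6297.6
```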

Demo Scripts

# After training completes, run demo_sample.py to get text-conditional generation results.
python demo_sample.py

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citations

@misc{zhang2024varclip,
      title={VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling}, 
      author={Qian Zhang and Xiangzi Dai and Ninghua Yang and Xiang An and Ziyong Feng and Xingyu Ren},
      year={2024},
      journal={arXiv:2408.01181},
}

VAR - https://github.com/FoundationVision/VAR
CLIP - https://github.com/openai/CLIP
