diff --git a/README.md b/README.md
index 74afe9d..67bcfc0 100644
--- a/README.md
+++ b/README.md
@@ -20,13 +20,14 @@
 This repo contains PyTorch model definitions, pre-trained weights and inference/sampling code for our paper exploring Hunyuan-DiT. You can find more visualizations on our [project page](https://dit.hunyuan.tencent.com/).
-> [**Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding**](https://arxiv.org/abs/2405.08748)
-
-> [**DialogGen:Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation**](https://arxiv.org/abs/2403.08857)
+> [**Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding**](https://arxiv.org/abs/2405.08748)
+> [**DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation**](https://arxiv.org/abs/2403.08857)
 ## 🔥🔥🔥 News!!
-
+* Jun 06, 2024: 🚀 We introduce Distillation version for Hunyuan-DiT acceleration, which achieves **50%** acceleration on NVIDIA GPUs. Please check [Tencent-Hunyuan/Distillation](https://huggingface.co/Tencent-Hunyuan/Distillation) for more details.
+* Jun 05, 2024: 🤗 Hunyuan-DiT is now available in 🤗 Diffusers! Please check the [example](#using--diffusers) below.
+* Jun 04, 2024: :globe_with_meridians: Support Tencent Cloud links to download the pretrained models! Please check the [links](#-download-pretrained-models) below.
 * May 22, 2024: 🚀 We introduce TensorRT version for Hunyuan-DiT acceleration, which achieves **47%** acceleration on NVIDIA GPUs. Please check [TensorRT-libs](https://huggingface.co/Tencent-Hunyuan/TensorRT-libs) for instructions.
 * May 22, 2024: 💬 We support demo running multi-turn text2image generation now. Please check the [script](#using-gradio) below.
@@ -59,14 +60,23 @@ or multi-turn language interactions to create the picture.
 - Hunyuan-DiT (Text-to-Image Model)
   - [x] Inference
   - [x] Checkpoints
-  - [ ] Distillation Version (Coming soon ⏩️)
+  - [x] Distillation Version (Coming soon ⏩️)
   - [x] TensorRT Version (Coming soon ⏩️)
   - [ ] Training (Coming later ⏩️)
+  - [ ] Lora
+  - [ ] Controlnet (Pose, Canny, Depth, Tile)
+  - [ ] IP-adapter
+  - [ ] Hunyuan-DiT-XL checkpoints (0.7B model)
+  - [ ] Caption model (Re-caption the raw image-text pairs)
 - [DialogGen](https://github.com/Centaurusalpha/DialogGen) (Prompt Enhancement Model)
-  - [x] Inference
+  - [x] Inference
   - [X] Web Demo (Gradio)
   - [x] Multi-turn T2I Demo (Gradio)
   - [X] Cli Demo
+- [ ] ComfyUI
+- [X] Diffusers
+- [ ] WebUI
+
 ## Contents
 - [Hunyuan-DiT](#hunyuan-dit--a-powerful-multi-resolution-diffusion-transformer-with-fine-grained-chinese-understanding)
@@ -81,6 +91,7 @@ or multi-turn language interactions to create the picture.
   - [🧱 Download Pretrained Models](#-download-pretrained-models)
   - [🔑 Inference](#-inference)
     - [Using Gradio](#using-gradio)
+    - [Using Diffusers](#using--diffusers)
     - [Using Command Line](#using-command-line)
     - [More Configurations](#more-configurations)
   - [🚀 Acceleration (for Linux)](#-acceleration-for-linux)
@@ -257,13 +268,14 @@ process, you can ignore the error and rerun the download command.
 All models will be automatically downloaded. For more information about the model, visit the Hugging Face repository [here](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT).
-| Model | #Params | Download URL |
-|:------------------:|:-------:|:-------------------------------------------------------------------------------------------------------:|
-| mT5 | 1.6B | [mT5](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/t2i/mt5) |
-| CLIP | 350M | [CLIP](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/t2i/clip_text_encoder) |
-| DialogGen | 7.0B | [DialogGen](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/dialoggen) |
-| sdxl-vae-fp16-fix | 83M | [sdxl-vae-fp16-fix](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/t2i/sdxl-vae-fp16-fix) |
-| Hunyuan-DiT | 1.5B | [Hunyuan-DiT](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/t2i/model) |
+| Model | #Params | Huggingface Download URL | Tencent Cloud Download URL |
+|:------------------:|:-------:|:-------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------:|
+| mT5 | 1.6B | [mT5](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/t2i/mt5) | [mT5](https://dit.hunyuan.tencent.com/download/HunyuanDiT/mt5.zip) |
+| CLIP | 350M | [CLIP](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/t2i/clip_text_encoder) | [CLIP](https://dit.hunyuan.tencent.com/download/HunyuanDiT/clip_text_encoder.zip) |
+| Tokenizer | - | [Tokenizer](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/t2i/tokenizer) | [Tokenizer](https://dit.hunyuan.tencent.com/download/HunyuanDiT/tokenizer.zip) |
+| DialogGen | 7.0B | [DialogGen](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/dialoggen) | [DialogGen](https://dit.hunyuan.tencent.com/download/HunyuanDiT/dialoggen.zip) |
+| sdxl-vae-fp16-fix | 83M | [sdxl-vae-fp16-fix](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/t2i/sdxl-vae-fp16-fix) | [sdxl-vae-fp16-fix](https://dit.hunyuan.tencent.com/download/HunyuanDiT/sdxl-vae-fp16-fix.zip) |
+| Hunyuan-DiT | 1.5B | [Hunyuan-DiT](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/t2i/model) | [Hunyuan-DiT](https://dit.hunyuan.tencent.com/download/HunyuanDiT/model.zip) |
 ## 🔑 Inference
@@ -292,6 +304,34 @@ python app/multiTurnT2I_app.py
 ```
 Then the demo can be accessed through http://0.0.0.0:443
+### Using 🤗 Diffusers
+
+Please install PyTorch version 2.0 or higher in advance to satisfy the requirements of the specified version of the diffusers library.
+
+Install 🤗 diffusers, ensuring that the version is at least 0.28.1:
+
+```shell
+pip install git+https://github.com/huggingface/diffusers.git
+```
+or
+```shell
+pip install diffusers
+```
+
+You can generate images with both Chinese and English prompts using the following Python script:
+```py
+import torch
+from diffusers import HunyuanDiTPipeline
+
+pipe = HunyuanDiTPipeline.from_pretrained("Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16)
+pipe.to("cuda")
+
+# You may also use English prompt as HunyuanDiT supports both English and Chinese
+# prompt = "An astronaut riding a horse"
+prompt = "一个宇航员在骑马"
+image = pipe(prompt).images[0]
+```
+
 ### Using Command Line
 We provide several commands to quick start:
@@ -337,9 +377,11 @@ We list some more useful configurations for easy usage:
 ## 🚀 Acceleration (for Linux)
-We provide TensorRT version of HunyuanDiT for inference acceleration (faster than flash attention).
+- We provide TensorRT version of HunyuanDiT for inference acceleration (faster than flash attention).
 See [Tencent-Hunyuan/TensorRT-libs](https://huggingface.co/Tencent-Hunyuan/TensorRT-libs) for more details.
+- We provide Distillation version of HunyuanDiT for inference acceleration.
+See [Tencent-Hunyuan/Distillation](https://huggingface.co/Tencent-Hunyuan/Distillation) for more details.
 ## 🔗 BibTeX
 If you find [Hunyuan-DiT](https://arxiv.org/abs/2405.08748) or [DialogGen](https://arxiv.org/abs/2403.08857) useful for your research and applications, please cite using this BibTeX: