# Stable Video Diffusion (VideoLDM)

## Introduction

Stable Video Diffusion is an Image-to-Video generation model that extends Stable Diffusion to video generation by introducing temporal layers into the architecture (a.k.a. VideoLDM). Additionally, it uses a modified decoder with added temporal layers to counteract flickering artifacts.

*VideoLDM U-Net block architecture: an example of a single U-Net block with added temporal layers (for more information, please refer to [2]).*
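
Conceptually, each temporal layer processes the frame axis and its output is blended with the output of the corresponding spatial layer through a learned mixing factor, following the idea in [2]. Below is a minimal, illustrative sketch of that mechanism; it is not the repository's actual implementation, and all class and parameter names are hypothetical.

```python
import mindspore as ms
from mindspore import nn, ops


class TemporalMixBlock(nn.Cell):
    """Toy block: a per-frame spatial conv blended with a temporal conv."""

    def __init__(self, channels: int, num_frames: int):
        super().__init__()
        self.num_frames = num_frames
        # Spatial layer: processes every frame independently.
        self.spatial = nn.Conv2d(channels, channels, kernel_size=3, pad_mode="same")
        # Temporal layer: 1D convolution along the frame axis.
        self.temporal = nn.Conv1d(channels, channels, kernel_size=3, pad_mode="same")
        # Learnable mixing factor; a sigmoid keeps the blend weight in (0, 1).
        self.mix_factor = ms.Parameter(ms.Tensor(0.0, ms.float32))

    def construct(self, x):
        # x: (batch * frames, channels, height, width)
        bt, c, h, w = x.shape
        b = bt // self.num_frames

        z_spatial = self.spatial(x)

        # Rearrange so the temporal conv sees (batch * h * w, channels, frames).
        z = z_spatial.reshape(b, self.num_frames, c, h, w)
        z = z.transpose(0, 3, 4, 2, 1).reshape(b * h * w, c, self.num_frames)
        z = self.temporal(z)
        z = z.reshape(b, h, w, c, self.num_frames)
        z_temporal = z.transpose(0, 4, 3, 1, 2).reshape(bt, c, h, w)

        # Blend the spatial and temporal paths with the learned factor.
        alpha = ops.sigmoid(self.mix_factor)
        return alpha * z_spatial + (1.0 - alpha) * z_temporal
```

With the mixing factor initialized near zero, the block starts as an even blend and learns how much temporal information to inject during training.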

## Pretrained Models

| SD Base Version | SVD Version | Trained for          | Config | Checkpoint      |
|-----------------|-------------|----------------------|--------|-----------------|
| v2.0 & v2.1     | SVD         | 14 frames generation | yaml   | Download (9GB)  |
| v2.0 & v2.1     | SVD-XT      | 25 frames generation | yaml   | Download (9GB)  |

The weights above were converted from the PyTorch version. To convert another custom model, use `svd_tools/convert.py`. For example:

```shell
python svd_tools/convert.py \
--pt_weights_file PATH_TO_YOUR_TORCH_MODEL \
--config CONFIG_FILE \
--out_dir PATH_TO_OUTPUT_DIR
```

## Inference

Currently, only Image-to-Video generation is supported. For video generation from text, an image must first be created using either SD or SDXL (recommended resolution is 1024x576). Once the image is created, the video can be generated using the following command:

```shell
python image_to_video.py --mode=1 \
--SVD.config=configs/svd.yaml \
--SVD.checkpoint=PATH_TO_YOUR_SVD_CHECKPOINT \
--SVD.num_frames=NUM_FRAMES_TO_GENERATE \
--SVD.fps=FPS \
--image=PATH_TO_INPUT_IMAGE
```

> [!TIP]
> If you encounter an out-of-memory (OOM) error while running the above command, first try setting the `--SVD.decode_chunk_size` argument to a lower value (the default is `num_frames`) before reducing `num_frames`, since decoding is very memory-intensive.
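
For example, the command above can be rerun with a smaller decode chunk; the frame count and chunk size below are purely illustrative values:

```shell
python image_to_video.py --mode=1 \
--SVD.config=configs/svd.yaml \
--SVD.checkpoint=PATH_TO_YOUR_SVD_CHECKPOINT \
--SVD.num_frames=25 \
--SVD.fps=FPS \
--SVD.decode_chunk_size=4 \
--image=PATH_TO_INPUT_IMAGE
```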

For more information on possible parameters and usage, please execute the following command:

```shell
python image_to_video.py --help
```

## Training

### Dataset Preparation

Video labels should be stored in a CSV file in the following format:

```text
path,length,motion_bucket_id
path_to_video1,video_length1,motion_bucket_id1
path_to_video2,video_length2,motion_bucket_id2
...
```

The generation of motion bucket IDs is described in detail in the SVD [1] paper. Please refer to Appendix C of the paper for more information.
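
The following is a minimal sketch (not part of the repository) for producing such a CSV from a directory of videos. It assumes OpenCV is available for reading frame counts, that relative paths are acceptable in the `path` column, and that real motion bucket IDs are computed separately as described in [1]; the placeholder value here is illustrative only.

```python
import csv
import os

import cv2  # assumed dependency, used only to read frame counts

DATA_DIR = "PATH_TO_DATASET"        # directory containing the training videos
OUT_CSV = "PATH_TO_LABELS"          # metadata CSV consumed by train.py
PLACEHOLDER_MOTION_BUCKET_ID = 127  # replace with real per-video values ([1], Appendix C)

with open(OUT_CSV, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["path", "length", "motion_bucket_id"])
    for name in sorted(os.listdir(DATA_DIR)):
        if not name.lower().endswith((".mp4", ".avi", ".mov")):
            continue
        # Read the number of frames in the video.
        cap = cv2.VideoCapture(os.path.join(DATA_DIR, name))
        length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        cap.release()
        writer.writerow([name, length, PLACEHOLDER_MOTION_BUCKET_ID])
```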

### Training

Currently, only Image-to-Video generation training is supported. To train Stable Video Diffusion, execute the following command:

```shell
python train.py --config=configs/svd_train.yaml \
--svd_config=configs/svd.yaml \
--train.pretrained=PATH_TO_YOUR_SVD_CHECKPOINT \
--train.output_dir=PATH_TO_OUTPUT_DIR \
--environment.mode=0 \
--train.temporal_only=True \
--train.epochs=NUM_EPOCHS \
--train.dataset.init_args.frames=NUM_FRAMES \
--train.dataset.init_args.step=FRAMES_FETCHING_STEP \
--train.dataset.init_args.data_dir=PATH_TO_DATASET \
--train.dataset.init_args.metadata=PATH_TO_LABELS
```

> [!NOTE]
> More details on the training arguments can be found in the training config (`configs/svd_train.yaml`) and the model config (`configs/svd.yaml`).

> [!IMPORTANT]
> For Ascend 910* devices, please set `export MS_ASCEND_CHECK_OVERFLOW_MODE="INFNAN_MODE"` before running training.

## Acknowledgements

  1. Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, Varun Jampani, Robin Rombach. Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets. Stability AI, 2023.
  2. Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis. Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models. arXiv:2304.08818, 2023.