
Commit c78bf7b

add time cost and gpu memory to readme and add corresponding configs
Signed-off-by: Can-Zhao <[email protected]>
1 parent cd5e398 commit c78bf7b

1 file changed: +28 -0 lines changed

generation/maisi/README.md

Lines changed: 28 additions & 0 deletions
@@ -31,6 +31,34 @@ We retrained several state-of-the-art diffusion model-based methods using our da
</div>

## Inference Time Cost and GPU Memory Usage

| `output_size` | `autoencoder_sliding_window_infer_size` | `autoencoder_tp_num_splits` | Peak Memory | DM Time | VAE Time |
|---------------|:--------------------------------------:|:---------------------------:|:-----------:|:-------:|:--------:|
| 256x256x128 | >=[64,64,32], not used | 2 | 14G | 57s | 1s |
| 256x256x256 | [48,48,64], 4 patches | 2 | 14G | 81s | 7s |
| 512x512x128 | [64,64,32], 9 patches | 1 | 14G | 138s | 7s |
|---------------|----------------------------------------|-----------------------------|-------------|---------|----------|
| 256x256x256 | >=[64,64,64], not used | 4 | 22G | 81s | 2s |
| 512x512x128 | [80,80,32], 4 patches | 1 | 18G | 138s | 9s |
| 512x512x512 | [64,64,48], 36 patches | 2 | 22G | 569s | 29s |
|---------------|----------------------------------------|-----------------------------|-------------|---------|----------|
| 512x512x512 | [64,64,64], 27 patches | 2 | 26G | 569s | 40s |
|---------------|----------------------------------------|-----------------------------|-------------|---------|----------|
| 512x512x128 | >=[128,128,32], not used | 4 | 37G | 138s | 140s |
| 512x512x512 | [80,80,80], 8 patches | 2 | 44G | 569s | 30s |
| 512x512x768 | [80,80,112], 8 patches | 4 | 55G | 904s | 48s |

The experiments were run on an A100 80G GPU.
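
One way to read these measurements is as a lookup table: given a target `output_size` and a GPU memory budget, list the measured settings whose peak memory fits. The sketch below is a minimal, hypothetical helper; the names `Measurement`, `TABLE`, and `feasible_settings` are illustrative and not part of MAISI. It only transcribes the rows of the table, and peak memory on other hardware may differ from these A100 numbers.

```python
from typing import List, NamedTuple, Tuple


class Measurement(NamedTuple):
    """One row of the measurement table (hypothetical container)."""
    output_size: Tuple[int, int, int]
    window: Tuple[int, int, int]   # autoencoder_sliding_window_infer_size
    window_used: bool              # False where the table says "not used"
    tp_num_splits: int             # autoencoder_tp_num_splits
    peak_gb: int                   # peak GPU memory, in GB
    dm_s: int                      # diffusion model time, in seconds
    vae_s: int                     # autoencoder (VAE) decoding time, in seconds


# Rows transcribed from the table; nothing here is measured independently.
TABLE = [
    Measurement((256, 256, 128), (64, 64, 32), False, 2, 14, 57, 1),
    Measurement((256, 256, 256), (48, 48, 64), True, 2, 14, 81, 7),
    Measurement((512, 512, 128), (64, 64, 32), True, 1, 14, 138, 7),
    Measurement((256, 256, 256), (64, 64, 64), False, 4, 22, 81, 2),
    Measurement((512, 512, 128), (80, 80, 32), True, 1, 18, 138, 9),
    Measurement((512, 512, 512), (64, 64, 48), True, 2, 22, 569, 29),
    Measurement((512, 512, 512), (64, 64, 64), True, 2, 26, 569, 40),
    Measurement((512, 512, 128), (128, 128, 32), False, 4, 37, 138, 140),
    Measurement((512, 512, 512), (80, 80, 80), True, 2, 44, 569, 30),
    Measurement((512, 512, 768), (80, 80, 112), True, 4, 55, 904, 48),
]


def feasible_settings(output_size, budget_gb) -> List[Measurement]:
    """Return measured rows for output_size whose peak memory fits the budget,
    sorted from lowest to highest peak memory."""
    rows = [
        m for m in TABLE
        if m.output_size == tuple(output_size) and m.peak_gb <= budget_gb
    ]
    return sorted(rows, key=lambda m: m.peak_gb)


if __name__ == "__main__":
    for row in feasible_settings((512, 512, 512), budget_gb=24):
        print(row)
```

With a 24 GB budget at 512x512x512, only the `[64,64,48]` row with 2 splits qualifies. An empty result means the table contains no measured setting that fits, not that no such setting exists.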

During inference, GPU memory usage peaks when the autoencoder decodes the latent features.
To reduce GPU memory usage, either increase `autoencoder_tp_num_splits` or reduce `autoencoder_sliding_window_infer_size`.
Increasing `autoencoder_tp_num_splits` has a smaller impact on the generated image quality.
Reducing `autoencoder_sliding_window_infer_size`, by contrast, may introduce stitching artifacts and has a larger impact on the generated image quality.

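
To get a feel for how `autoencoder_sliding_window_infer_size` translates into a number of patches, the following sketch counts sliding windows per axis. It rests on assumptions that are not stated in this README: that the window size is given in latent voxels, that the latent volume is the output size divided by 4 along each axis, and that neighbouring windows overlap by 25%. The function name `count_patches` and both default values are hypothetical; treat the result as an estimate rather than the exact behaviour of the inference code.

```python
import math


def count_patches(output_size, window, compression=4, overlap=0.25):
    """Estimate the number of sliding-window patches used when decoding.

    compression and overlap are ASSUMED values, not taken from the README.
    """
    total = 1
    for size, roi in zip(output_size, window):
        latent = size // compression      # assumed latent extent on this axis
        if roi >= latent:
            continue                      # one window covers the whole axis
        stride = max(int(roi * (1 - overlap)), 1)
        # Windows needed to cover the axis when stepping by `stride`.
        total *= math.ceil((latent - roi) / stride) + 1
    return total


if __name__ == "__main__":
    # Window covers the whole latent volume: a single patch,
    # matching the "not used" case.
    print(count_patches((256, 256, 128), (64, 64, 32)))   # 1
    # Smaller windows produce more patches and therefore more seams.
    print(count_patches((512, 512, 128), (64, 64, 32)))   # 9
    print(count_patches((512, 512, 512), (64, 64, 48)))   # 36
```

Under these assumed values the estimates agree with the patch counts listed in the table. More patches mean more seams between windows, which is where stitching artifacts can appear.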
## MAISI Model Workflow
The training and inference workflows of MAISI are depicted in the figure below. It begins by training an autoencoder in pixel space to encode images into latent features. Following that, it trains a diffusion model in the latent space to denoise the noisy latent features. During inference, it first generates latent features from random noise by applying multiple denoising steps using the trained diffusion model. Finally, it decodes the denoised latent features into images using the trained autoencoder.
<p align="center">
