@@ -53,19 +53,19 @@ We retrained several state-of-the-art diffusion model-based methods using our da
 ### Inference Time Cost and GPU Memory Usage
 | `output_size` | latent size | `autoencoder_sliding_window_infer_size` | `autoencoder_tp_num_splits` | Peak Memory | DM Time | VAE Time |
 | --------------- | :--------------------------------------: | :-------------------------------------: | :---------------------------: | :-----------: | :-------: | :--------: |
-| [256x256x128](./configs/config_infer_16g_256x256x128.json) | 64x64x32 | >=[64,64,32], not used | 2 | 14G | 57s | 1s |
-| [256x256x256](./configs/config_infer_16g_256x256x256.json) | 64x64x64 | [48,48,64], 4 patches | 2 | 14G | 81s | 7s |
-| [512x512x128](./configs/config_infer_16g_512x512x128.json) | 128x128x32 | [64,64,32], 9 patches | 1 | 14G | 138s | 7s |
+| [256x256x128](./configs/config_infer_16g_256x256x128.json) | 4x64x64x32 | >=[64,64,32], not used | 2 | 14G | 57s | 1s |
+| [256x256x256](./configs/config_infer_16g_256x256x256.json) | 4x64x64x64 | [48,48,64], 4 patches | 2 | 14G | 81s | 7s |
+| [512x512x128](./configs/config_infer_16g_512x512x128.json) | 4x128x128x32 | [64,64,32], 9 patches | 1 | 14G | 138s | 7s |
 | | | | | | | |
-| [256x256x256](./configs/config_infer_24g_256x256x256.json) | 64x64x64 | >=[64,64,64], not used | 4 | 22G | 81s | 2s |
-| [512x512x128](./configs/config_infer_24g_512x512x128.json) | 128x128x32 | [80,80,32], 4 patches | 1 | 18G | 138s | 9s |
-| [512x512x512](./configs/config_infer_24g_512x512x512.json) | 128x128x128 | [64,64,48], 36 patches | 2 | 22G | 569s | 29s |
+| [256x256x256](./configs/config_infer_24g_256x256x256.json) | 4x64x64x64 | >=[64,64,64], not used | 4 | 22G | 81s | 2s |
+| [512x512x128](./configs/config_infer_24g_512x512x128.json) | 4x128x128x32 | [80,80,32], 4 patches | 1 | 18G | 138s | 9s |
+| [512x512x512](./configs/config_infer_24g_512x512x512.json) | 4x128x128x128 | [64,64,48], 36 patches | 2 | 22G | 569s | 29s |
 | | | | | | | |
-| [512x512x512](./configs/config_infer_32g_512x512x512.json) | 128x128x128 | [64,64,64], 27 patches | 2 | 26G | 569s | 40s |
+| [512x512x512](./configs/config_infer_32g_512x512x512.json) | 4x128x128x128 | [64,64,64], 27 patches | 2 | 26G | 569s | 40s |
 | | | | | | | |
-| [512x512x128](./configs/config_infer_80g_512x512x128.json) | 128x128x32 | >=[128,128,32], not used | 4 | 37G | 138s | 140s |
-| [512x512x512](./configs/config_infer_80g_512x512x512.json) | 128x128x128 | [80,80,80], 8 patches | 2 | 44G | 569s | 30s |
-| [512x512x768](./configs/config_infer_24g_512x512x768.json) | 128x128x192 | [80,80,112], 8 patches | 4 | 55G | 904s | 48s |
+| [512x512x128](./configs/config_infer_80g_512x512x128.json) | 4x128x128x32 | >=[128,128,32], not used | 4 | 37G | 138s | 140s |
+| [512x512x512](./configs/config_infer_80g_512x512x512.json) | 4x128x128x128 | [80,80,80], 8 patches | 2 | 44G | 569s | 30s |
+| [512x512x768](./configs/config_infer_24g_512x512x768.json) | 4x128x128x192 | [80,80,112], 8 patches | 4 | 55G | 904s | 48s |
 
 **Table 3:** Inference Time Cost and GPU Memory Usage. `DM Time` refers to the time cost of diffusion model inference; `VAE Time` refers to the time cost of VAE decoder inference. The total inference time is `DM Time` plus `VAE Time`. When `autoencoder_sliding_window_infer_size` is equal to or larger than the latent feature size, sliding-window inference is not used,
 and the time and memory cost remain the same. The experiments were run on an A100 80G GPU.
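The patch counts in the table are consistent with a MONAI-style sliding window at the default 25% overlap (stride = 0.75 x window per axis). The sketch below, with a hypothetical helper name `num_patches` not taken from this repository, shows how each count follows from the spatial latent size and `autoencoder_sliding_window_infer_size`; this is an illustration of the arithmetic, not the repository's actual inference code.

```python
import math

def num_patches(latent_size, window_size, overlap=0.25):
    """Total sliding-window patches over a 3D latent volume.

    Assumes MONAI-style tiling: per axis, windows start every
    `window * (1 - overlap)` voxels, plus one final window to cover the end.
    Channel dimension (the leading 4 in Table 3) is excluded.
    """
    total = 1
    for length, window in zip(latent_size, window_size):
        if window >= length:
            n = 1  # window covers the whole axis; no sliding on this axis
        else:
            stride = window * (1 - overlap)
            n = math.ceil((length - window) / stride) + 1
        total *= n
    return total

# Spatial latent sizes from Table 3 (leading channel dim dropped):
print(num_patches((64, 64, 64), (48, 48, 64)))     # -> 4
print(num_patches((128, 128, 128), (64, 64, 48)))  # -> 36
print(num_patches((128, 128, 128), (64, 64, 64)))  # -> 27
```

Under this assumption, a window equal to or larger than the latent size on every axis yields a single patch, matching the "not used" rows in the table.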