@@ -53,19 +53,19 @@ We retrained several state-of-the-art diffusion model-based methods using our da
 ### Inference Time Cost and GPU Memory Usage
 | `output_size` | latent size | `autoencoder_sliding_window_infer_size` | `autoencoder_tp_num_splits` | Peak Memory | DM Time | VAE Time |
 | --------------- | :--------------------------------------: | :-------------------------------------: | :---------------------------: | :-----------: | :-------: | :--------: |
-| [256x256x128](./configs/config_infer_16g_256x256x128.json) | 64x64x32 | >=[64,64,32], not used | 2 | 14G | 57s | 1s |
-| [256x256x256](./configs/config_infer_16g_256x256x256.json) | 64x64x64 | [48,48,64], 4 patches | 2 | 14G | 81s | 7s |
-| [512x512x128](./configs/config_infer_16g_512x512x128.json) | 128x128x32 | [64,64,32], 9 patches | 1 | 14G | 138s | 7s |
+| [256x256x128](./configs/config_infer_16g_256x256x128.json) | 4x64x64x32 | >=[64,64,32], not used | 2 | 14G | 57s | 1s |
+| [256x256x256](./configs/config_infer_16g_256x256x256.json) | 4x64x64x64 | [48,48,64], 4 patches | 2 | 14G | 81s | 7s |
+| [512x512x128](./configs/config_infer_16g_512x512x128.json) | 4x128x128x32 | [64,64,32], 9 patches | 1 | 14G | 138s | 7s |
 | | | | | | | |
-| [256x256x256](./configs/config_infer_24g_256x256x256.json) | 64x64x64 | >=[64,64,64], not used | 4 | 22G | 81s | 2s |
-| [512x512x128](./configs/config_infer_24g_512x512x128.json) | 128x128x32 | [80,80,32], 4 patches | 1 | 18G | 138s | 9s |
-| [512x512x512](./configs/config_infer_24g_512x512x512.json) | 128x128x128 | [64,64,48], 36 patches | 2 | 22G | 569s | 29s |
+| [256x256x256](./configs/config_infer_24g_256x256x256.json) | 4x64x64x64 | >=[64,64,64], not used | 4 | 22G | 81s | 2s |
+| [512x512x128](./configs/config_infer_24g_512x512x128.json) | 4x128x128x32 | [80,80,32], 4 patches | 1 | 18G | 138s | 9s |
+| [512x512x512](./configs/config_infer_24g_512x512x512.json) | 4x128x128x128 | [64,64,48], 36 patches | 2 | 22G | 569s | 29s |
 | | | | | | | |
-| [512x512x512](./configs/config_infer_32g_512x512x512.json) | 128x128x128 | [64,64,64], 27 patches | 2 | 26G | 569s | 40s |
+| [512x512x512](./configs/config_infer_32g_512x512x512.json) | 4x128x128x128 | [64,64,64], 27 patches | 2 | 26G | 569s | 40s |
 | | | | | | | |
-| [512x512x128](./configs/config_infer_80g_512x512x128.json) | 128x128x32 | >=[128,128,32], not used | 4 | 37G | 138s | 140s |
-| [512x512x512](./configs/config_infer_80g_512x512x512.json) | 128x128x128 | [80,80,80], 8 patches | 2 | 44G | 569s | 30s |
-| [512x512x768](./configs/config_infer_24g_512x512x768.json) | 128x128x192 | [80,80,112], 8 patches | 4 | 55G | 904s | 48s |
+| [512x512x128](./configs/config_infer_80g_512x512x128.json) | 4x128x128x32 | >=[128,128,32], not used | 4 | 37G | 138s | 140s |
+| [512x512x512](./configs/config_infer_80g_512x512x512.json) | 4x128x128x128 | [80,80,80], 8 patches | 2 | 44G | 569s | 30s |
+| [512x512x768](./configs/config_infer_24g_512x512x768.json) | 4x128x128x192 | [80,80,112], 8 patches | 4 | 55G | 904s | 48s |
 
 **Table 3:** Inference Time Cost and GPU Memory Usage. `DM Time` refers to the time cost of diffusion model inference; `VAE Time` refers to the time cost of VAE decoder inference. The total inference time is `DM Time` plus `VAE Time`. When `autoencoder_sliding_window_infer_size` is equal to or larger than the latent feature size, sliding-window inference is not used,
 and the time and memory cost remain the same. The experiments were run on an A100 80G GPU.
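The patch counts in the table are consistent with a MONAI-style sliding window at the default 25% overlap (stride = 0.75 x window per axis). The sketch below, with a hypothetical helper name `num_patches` not taken from this repository, shows how each count follows from the spatial latent size and `autoencoder_sliding_window_infer_size`; this is an illustration of the arithmetic, not the repository's actual inference code.

```python
import math

def num_patches(latent_size, window_size, overlap=0.25):
    """Total sliding-window patches over a 3D latent volume.

    Assumes MONAI-style tiling: per axis, windows start every
    `window * (1 - overlap)` voxels, plus one final window to cover the end.
    Channel dimension (the leading 4 in Table 3) is excluded.
    """
    total = 1
    for length, window in zip(latent_size, window_size):
        if window >= length:
            n = 1  # window covers the whole axis; no sliding on this axis
        else:
            stride = window * (1 - overlap)
            n = math.ceil((length - window) / stride) + 1
        total *= n
    return total

# Spatial latent sizes from Table 3 (leading channel dim dropped):
print(num_patches((64, 64, 64), (48, 48, 64)))     # -> 4
print(num_patches((128, 128, 128), (64, 64, 48)))  # -> 36
print(num_patches((128, 128, 128), (64, 64, 64)))  # -> 27
```

Under this assumption, a window equal to or larger than the latent size on every axis yields a single patch, matching the "not used" rows in the table.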