generation/maisi/README.md
This example demonstrates the applications of training and validating NVIDIA MAISI:

- A Foundation Diffusion model that can generate large CT volumes up to 512 × 512 × 768 in size, with flexible volume size and voxel size
- A ControlNet to generate image/mask pairs that can improve downstream tasks, with controllable organ/tumor size
## Minimum GPU requirement

For image sizes equal to or smaller than 512 × 512 × 128, the minimum GPU memory for both training and inference is 16 GB.

For image sizes equal to or smaller than 512 × 512 × 512, the minimum GPU memory for training is 40 GB and for inference is 24 GB.
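As a quick sanity check, the requirements above can be encoded as a small lookup table. The helper below is our own illustrative sketch (names are not part of MAISI); it returns the smallest listed requirement that covers a given image size:

```python
# Minimum GPU memory requirements (GB) from the section above.
# Keys are (width, height, depth) image sizes; values give per-mode needs.
MIN_GPU_MEMORY_GB = {
    (512, 512, 128): {"train": 16, "infer": 16},
    (512, 512, 512): {"train": 40, "infer": 24},
}

def min_memory_gb(image_size, mode):
    """Smallest listed requirement whose size covers `image_size` in every dim."""
    candidates = [
        req[mode]
        for size, req in MIN_GPU_MEMORY_GB.items()
        if all(a <= b for a, b in zip(image_size, size))
    ]
    if not candidates:
        raise ValueError(f"no requirement listed for image size {image_size}")
    return min(candidates)
```

For example, a 256 × 256 × 256 volume is not covered by the 512 × 512 × 128 row (its depth exceeds 128), so the 512 × 512 × 512 row applies.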
## Example Results and Evaluation
We retrained several state-of-the-art diffusion model-based methods using our dataset. The results in the table and figure below show that our method outperforms previous methods on an unseen dataset ([autoPET 2023](https://www.nature.com/articles/s41597-022-01718-3)). Our method shows superior performance to previous methods based on all [Fréchet Inception Distance (FID)](https://papers.nips.cc/paper/2017/hash/8a1d694707eb0fefe65871369074926d-Abstract.html) scores on different 2D planes. Here we compared the generated images with real images of size 512 × 512 × 512 and spacing 1.0 × 1.0 × 1.0 mm<sup>3</sup>.
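FID measures the Fréchet distance between Gaussians fitted to the feature statistics of real and generated images. A minimal sketch of that computation, operating on pre-extracted 2D feature arrays (function names are illustrative; this is not the evaluation code used for the table):

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # Matrix square root of the covariance product; may pick up a tiny
    # imaginary component from numerical error, which we discard.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

def fid_from_features(feats_real, feats_fake):
    """FID from feature arrays of shape (num_samples, feature_dim)."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_f = np.cov(feats_fake, rowvar=False)
    return frechet_distance(mu_r, sigma_r, mu_f, sigma_f)
```

Identical feature sets give an FID of zero; shifting every feature moves the mean term and raises the score.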
**Table 2:** Performance comparison of the `MAISI VAE` model on out-of-distribution datasets (i.e., unseen during MAISI VAE training) versus `Dedicated VAE` models (i.e., trained from scratch on in-distribution data). The “GPU” column shows the additional GPU hours for training with one 32G V100 GPU. The MAISI VAE model achieved comparable results on unseen datasets without additional GPU resource expenditure.
49
+
50
+
34
51
## Time Cost and GPU Memory Usage
### Inference Time Cost and GPU Memory Usage
The VAE is trained on patches and thus can be trained with a 16G GPU if the patch size is small.
Users can adjust the patch size to fit the available GPU memory.
For the released model, we first trained the autoencoder on a 16G V100 with a small patch size of [64, 64, 64], then continued training on a 32G V100 with a patch size of [128, 128, 128].
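Patch-based training only ever loads a sub-volume at a time, which is what keeps memory usage independent of the full image size. A minimal sketch of random patch cropping (illustrative only; not the MAISI training code):

```python
import numpy as np

def random_patch(volume, patch_size, rng=None):
    """Randomly crop a 3D patch, e.g. [64, 64, 64] or [128, 128, 128]."""
    rng = rng if rng is not None else np.random.default_rng()
    starts = [rng.integers(0, d - p + 1) for d, p in zip(volume.shape, patch_size)]
    slices = tuple(slice(s, s + p) for s, p in zip(starts, patch_size))
    return volume[slices]
```

Shrinking `patch_size` is the knob mentioned above for fitting the VAE training step into a smaller GPU.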
The DM and ControlNet are trained on the whole image instead of on patches, so the training GPU memory usage depends on the input image size.
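Because the DM and ControlNet operate in latent space, their memory footprint tracks the latent size, which scales with the image size. Assuming the VAE reduces each spatial dimension by a fixed downsampling factor (the factor of 4 below is an assumption for illustration, not a value stated in this README), the latent size can be estimated as:

```python
def latent_size(image_size, downsample_factor=4):
    """Estimated latent spatial size for a given image size, assuming each
    spatial dimension is reduced by `downsample_factor` (an assumption)."""
    for dim in image_size:
        if dim % downsample_factor:
            raise ValueError(f"{dim} is not divisible by {downsample_factor}")
    return tuple(dim // downsample_factor for dim in image_size)
```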