reformat

Can-Zhao · Can-Zhao · commit a90cbd0b8e3d · 2025-03-11T22:26:16.000Z
Signed-off-by: Can-Zhao <canz@nvidia.com>
diff --git a/generation/maisi/maisi_diff_unet_training_tutorial.ipynb b/generation/maisi/maisi_diff_unet_training_tutorial.ipynb
@@ -28,7 +28,7 @@
     "\n",
     "In this notebook, we detail the procedure for training a 3D latent diffusion model to generate high-dimensional 3D medical images. Due to the potential for out-of-memory issues on most GPUs when generating large images (e.g., those with dimensions of 512 x 512 x 512 or greater), we have structured the training process into two primary steps: 1) generating image embeddings and 2) training 3D latent diffusion models. The subsequent sections will demonstrate the entire process using a simulated dataset.\n",
     "\n",
-    "`[Release Note (March 2025)]:` We are excited to announce the new MAISI Version `'maisi-rflow'`. Compared with the previous version `'maisi-ddpm'`, it accelerated latent diffusion model inference by 33x. Please see the detailed difference in the following section."
+    "`[Release Note (March 2025)]:` We are excited to announce the new MAISI Version `'maisi3d-rflow'`. Compared with the previous version `'maisi3d-ddpm'`, it accelerated latent diffusion model inference by 33x. Please see the detailed difference in the following section."
    ]
   },
   {
@@ -38,10 +38,10 @@
    "source": [
     "## Set up the MAISI version\n",
     "\n",
-    "Choose between `'maisi-ddpm'` and `'maisi-rflow'`. The differences are:\n",
-    "- The maisi version `'maisi-ddpm'` uses basic noise scheduler DDPM. `'maisi-rflow'` uses Rectified Flow scheduler, can be 33 times faster during inference.\n",
-    "- The maisi version `'maisi-ddpm'` requires training images to be labeled with body region (`\"top_region_index\"` and `\"bottom_region_index\"`), while `'maisi-rflow'` does not have such requirement. In other words, it is easier to prepare training data for `'maisi-rflow'`.\n",
-    "- For the released model weights, `'maisi-rflow'` can generate images with better quality for head region and small output volumes, and comparable quality for other cases compared with `'maisi-ddpm'`."
+    "Choose between `'maisi3d-ddpm'` and `'maisi3d-rflow'`. The differences are:\n",
+    "- The maisi version `'maisi3d-ddpm'` uses basic noise scheduler DDPM. `'maisi3d-rflow'` uses Rectified Flow scheduler, can be 33 times faster during inference.\n",
+    "- The maisi version `'maisi3d-ddpm'` requires training images to be labeled with body region (`\"top_region_index\"` and `\"bottom_region_index\"`), while `'maisi3d-rflow'` does not have such requirement. In other words, it is easier to prepare training data for `'maisi3d-rflow'`.\n",
+    "- For the released model weights, `'maisi3d-rflow'` can generate images with better quality for head region and small output volumes, and comparable quality for other cases compared with `'maisi3d-ddpm'`."
    ]
   },
   {
@@ -51,8 +51,8 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "maisi_version = \"maisi-ddpm\"\n",
-    "assert maisi_version in [\"maisi-ddpm\", \"maisi-rflow\"]"
+    "maisi_version = \"maisi3d-ddpm\"\n",
+    "assert maisi_version in [\"maisi3d-ddpm\", \"maisi3d-rflow\"]"
    ]
   },
   {
@@ -131,13 +131,12 @@
     "import numpy as np\n",
     "import nibabel as nib\n",
     "import subprocess\n",
+    "from IPython.display import Image, display\n",
     "\n",
     "from monai.apps import download_url\n",
     "from monai.data import create_test_image_3d\n",
     "from monai.config import print_config\n",
     "\n",
-    "from IPython.display import Image, display\n",
-    "\n",
     "from scripts.diff_model_setting import setup_logging\n",
     "\n",
     "print_config()\n",
@@ -152,10 +151,10 @@
    "source": [
     "## Set up the MAISI version\n",
     "\n",
-    "Choose between `'maisi-ddpm'` and `'maisi-rflow'`. The differences are:\n",
-    "- The maisi version `'maisi-ddpm'` uses basic noise scheduler DDPM. `'maisi-rflow'` uses Rectified Flow scheduler, can be 33 times faster during inference.\n",
-    "- The maisi version `'maisi-ddpm'` requires training images to be labeled with body region (`\"top_region_index\"` and `\"bottom_region_index\"`), while `'maisi-rflow'` does not have such requirement. In other words, it is easier to prepare training data for `'maisi-rflow'`.\n",
-    "- For the released model weights, `'maisi-rflow'` can generate images with better quality for head region and small output volumes, and comparable quality for other cases compared with `'maisi-ddpm'`."
+    "Choose between `'maisi3d-ddpm'` and `'maisi3d-rflow'`. The differences are:\n",
+    "- The maisi version `'maisi3d-ddpm'` uses basic noise scheduler DDPM. `'maisi3d-rflow'` uses Rectified Flow scheduler, can be 33 times faster during inference.\n",
+    "- The maisi version `'maisi3d-ddpm'` requires training images to be labeled with body region (`\"top_region_index\"` and `\"bottom_region_index\"`), while `'maisi3d-rflow'` does not have such requirement. In other words, it is easier to prepare training data for `'maisi3d-rflow'`.\n",
+    "- For the released model weights, `'maisi3d-rflow'` can generate images with better quality for head region and small output volumes, and comparable quality for other cases compared with `'maisi3d-ddpm'`."
    ]
   },
   {
@@ -165,8 +164,8 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "maisi_version = \"maisi-ddpm\"\n",
-    "assert maisi_version in [\"maisi-ddpm\", \"maisi-rflow\"]"
+    "maisi_version = \"maisi3d-ddpm\"\n",
+    "assert maisi_version in [\"maisi3d-ddpm\", \"maisi3d-rflow\"]"
    ]
   },
   {
@@ -213,7 +212,7 @@
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "[2025-03-11 22:05:02.952][ INFO](notebook) - Generated simulated images.\n"
+      "[2025-03-11 22:16:41.000][ INFO](notebook) - Generated simulated images.\n"
      ]
     }
    ],
@@ -260,22 +259,22 @@
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "[2025-03-11 22:05:02.966][ INFO](notebook) - files and folders under work_dir: ['predictions', 'config_maisi.json', 'models', 'sim_dataroot', 'config_maisi_diff_model.json', 'embeddings', 'environment_maisi_diff_model.json', 'sim_datalist.json'].\n",
-      "[2025-03-11 22:05:02.966][ INFO](notebook) - number of GPUs: 1.\n"
+      "[2025-03-11 22:16:41.012][ INFO](notebook) - files and folders under work_dir: ['predictions', 'config_maisi.json', 'models', 'sim_dataroot', 'config_maisi_diff_model.json', 'embeddings', 'environment_maisi_diff_model.json', 'sim_datalist.json'].\n",
+      "[2025-03-11 22:16:41.012][ INFO](notebook) - number of GPUs: 1.\n"
      ]
     }
    ],
    "source": [
     "env_config_path = \"./configs/environment_maisi_diff_model.json\"\n",
     "model_config_path = \"./configs/config_maisi_diff_model.json\"\n",
-    "if maisi_version == \"maisi-ddpm\":\n",
-    "    model_def_path = \"./configs/config_maisi-ddpm.json\"\n",
+    "if maisi_version == \"maisi3d-ddpm\":\n",
+    "    model_def_path = \"./configs/config_maisi3d-ddpm.json\"\n",
     "    include_body_region = True\n",
-    "elif maisi_version == \"maisi-rflow\":\n",
-    "    model_def_path = \"./configs/config_maisi-rflow.json\"\n",
+    "elif maisi_version == \"maisi3d-rflow\":\n",
+    "    model_def_path = \"./configs/config_maisi3d-rflow.json\"\n",
     "    include_body_region = False\n",
     "else:\n",
-    "    raise ValueError(f\"maisi_version has to be chosen from ['maisi-ddpm', 'maisi-rflow'], yet got {maisi_version}.\")\n",
+    "    raise ValueError(f\"maisi_version has to be chosen from ['maisi3d-ddpm', 'maisi3d-rflow'], yet got {maisi_version}.\")\n",
     "\n",
     "# Load environment configuration, model configuration and model definition\n",
     "with open(env_config_path, \"r\") as f:\n",
@@ -407,16 +406,16 @@
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "[2025-03-11 22:05:02.977][ INFO](notebook) - Creating training data...\n"
+      "[2025-03-11 22:16:41.021][ INFO](notebook) - Creating training data...\n"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
       "\n",
-      "[2025-03-11 22:05:10.881][ INFO](creating training data) - Using device cuda:0\n",
-      "[2025-03-11 22:05:11.686][ INFO](creating training data) - filenames_raw: ['tr_image_001.nii.gz', 'tr_image_002.nii.gz']\n",
+      "[2025-03-11 22:16:50.396][ INFO](creating training data) - Using device cuda:0\n",
+      "[2025-03-11 22:16:51.402][ INFO](creating training data) - filenames_raw: ['tr_image_001.nii.gz', 'tr_image_002.nii.gz']\n",
       "\n"
      ]
     }
@@ -460,9 +459,9 @@
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "[2025-03-11 22:05:13.881][ INFO](notebook) - data: {'dim': (64, 64, 32), 'spacing': [0.875, 0.875, 0.75], 'top_region_index': [0, 1, 0, 0], 'bottom_region_index': [0, 0, 1, 0]}.\n",
-      "[2025-03-11 22:05:13.884][ INFO](notebook) - data: {'dim': (64, 64, 32), 'spacing': [0.875, 0.875, 0.75], 'top_region_index': [0, 1, 0, 0], 'bottom_region_index': [0, 0, 1, 0]}.\n",
-      "[2025-03-11 22:05:13.885][ INFO](notebook) - Completed creating .json files for all embedding files.\n"
+      "[2025-03-11 22:16:53.638][ INFO](notebook) - data: {'dim': (64, 64, 32), 'spacing': [0.875, 0.875, 0.75], 'top_region_index': [0, 1, 0, 0], 'bottom_region_index': [0, 0, 1, 0]}.\n",
+      "[2025-03-11 22:16:53.640][ INFO](notebook) - data: {'dim': (64, 64, 32), 'spacing': [0.875, 0.875, 0.75], 'top_region_index': [0, 1, 0, 0], 'bottom_region_index': [0, 0, 1, 0]}.\n",
+      "[2025-03-11 22:16:53.641][ INFO](notebook) - Completed creating .json files for all embedding files.\n"
      ]
     }
    ],
@@ -539,34 +538,34 @@
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "[2025-03-11 22:05:13.892][ INFO](notebook) - Training the model...\n"
+      "[2025-03-11 22:16:53.646][ INFO](notebook) - Training the model...\n"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
       "\n",
-      "[2025-03-11 22:05:24.419][ INFO](training) - Using cuda:0 of 1\n",
-      "[2025-03-11 22:05:24.419][ INFO](training) - [config] ckpt_folder -> ./temp_work_dir/./models.\n",
-      "[2025-03-11 22:05:24.419][ INFO](training) - [config] data_root -> ./temp_work_dir/./embeddings.\n",
-      "[2025-03-11 22:05:24.419][ INFO](training) - [config] data_list -> ./temp_work_dir/sim_datalist.json.\n",
-      "[2025-03-11 22:05:24.419][ INFO](training) - [config] lr -> 0.0001.\n",
-      "[2025-03-11 22:05:24.419][ INFO](training) - [config] num_epochs -> 2.\n",
-      "[2025-03-11 22:05:24.419][ INFO](training) - [config] num_train_timesteps -> 1000.\n",
-      "[2025-03-11 22:05:24.420][ INFO](training) - num_files_train: 2\n",
-      "[2025-03-11 22:05:26.152][ INFO](training) - Training from scratch.\n",
-      "[2025-03-11 22:05:26.539][ INFO](training) - Scaling factor set to 1.159977912902832.\n",
-      "[2025-03-11 22:05:26.539][ INFO](training) - scale_factor -> 1.159977912902832.\n",
-      "[2025-03-11 22:05:26.542][ INFO](training) - torch.set_float32_matmul_precision -> highest.\n",
-      "[2025-03-11 22:05:26.542][ INFO](training) - Epoch 1, lr 0.0001.\n",
-      "[2025-03-11 22:05:28.578][ INFO](training) - [2025-03-11 22:05:28] epoch 1, iter 1/2, loss: 0.7974, lr: 0.000100000000.\n",
-      "[2025-03-11 22:05:28.719][ INFO](training) - [2025-03-11 22:05:28] epoch 1, iter 2/2, loss: 0.7943, lr: 0.000056250000.\n",
-      "[2025-03-11 22:05:28.762][ INFO](training) - epoch 1 average loss: 0.7958.\n",
-      "[2025-03-11 22:05:30.615][ INFO](training) - Epoch 2, lr 2.5e-05.\n",
-      "[2025-03-11 22:05:31.002][ INFO](training) - [2025-03-11 22:05:31] epoch 2, iter 1/2, loss: 0.7898, lr: 0.000025000000.\n",
-      "[2025-03-11 22:05:31.105][ INFO](training) - [2025-03-11 22:05:31] epoch 2, iter 2/2, loss: 0.7886, lr: 0.000006250000.\n",
-      "[2025-03-11 22:05:31.168][ INFO](training) - epoch 2 average loss: 0.7892.\n",
+      "[2025-03-11 22:17:02.004][ INFO](training) - Using cuda:0 of 1\n",
+      "[2025-03-11 22:17:02.004][ INFO](training) - [config] ckpt_folder -> ./temp_work_dir/./models.\n",
+      "[2025-03-11 22:17:02.004][ INFO](training) - [config] data_root -> ./temp_work_dir/./embeddings.\n",
+      "[2025-03-11 22:17:02.004][ INFO](training) - [config] data_list -> ./temp_work_dir/sim_datalist.json.\n",
+      "[2025-03-11 22:17:02.004][ INFO](training) - [config] lr -> 0.0001.\n",
+      "[2025-03-11 22:17:02.004][ INFO](training) - [config] num_epochs -> 2.\n",
+      "[2025-03-11 22:17:02.004][ INFO](training) - [config] num_train_timesteps -> 1000.\n",
+      "[2025-03-11 22:17:02.005][ INFO](training) - num_files_train: 2\n",
+      "[2025-03-11 22:17:03.887][ INFO](training) - Training from scratch.\n",
+      "[2025-03-11 22:17:04.338][ INFO](training) - Scaling factor set to 1.159977912902832.\n",
+      "[2025-03-11 22:17:04.339][ INFO](training) - scale_factor -> 1.159977912902832.\n",
+      "[2025-03-11 22:17:04.341][ INFO](training) - torch.set_float32_matmul_precision -> highest.\n",
+      "[2025-03-11 22:17:04.341][ INFO](training) - Epoch 1, lr 0.0001.\n",
+      "[2025-03-11 22:17:05.278][ INFO](training) - [2025-03-11 22:17:05] epoch 1, iter 1/2, loss: 0.7973, lr: 0.000100000000.\n",
+      "[2025-03-11 22:17:05.673][ INFO](training) - [2025-03-11 22:17:05] epoch 1, iter 2/2, loss: 0.7969, lr: 0.000056250000.\n",
+      "[2025-03-11 22:17:05.718][ INFO](training) - epoch 1 average loss: 0.7971.\n",
+      "[2025-03-11 22:17:07.383][ INFO](training) - Epoch 2, lr 2.5e-05.\n",
+      "[2025-03-11 22:17:07.777][ INFO](training) - [2025-03-11 22:17:07] epoch 2, iter 1/2, loss: 0.7932, lr: 0.000025000000.\n",
+      "[2025-03-11 22:17:07.881][ INFO](training) - [2025-03-11 22:17:07] epoch 2, iter 2/2, loss: 0.7904, lr: 0.000006250000.\n",
+      "[2025-03-11 22:17:07.942][ INFO](training) - epoch 2 average loss: 0.7918.\n",
       "\n"
      ]
     }
@@ -612,32 +611,32 @@
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "[2025-03-11 22:05:35.033][ INFO](notebook) - Running inference...\n",
-      "[2025-03-11 22:05:50.259][ INFO](notebook) - Completed all steps.\n"
+      "[2025-03-11 22:17:11.993][ INFO](notebook) - Running inference...\n",
+      "[2025-03-11 22:17:27.730][ INFO](notebook) - Completed all steps.\n"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
       "\n",
-      "[2025-03-11 22:05:43.502][ INFO](inference) - Using cuda:0 of 1 with random seed: 7854\n",
-      "[2025-03-11 22:05:43.502][ INFO](inference) - [config] ckpt_filepath -> ./temp_work_dir/./models/diff_unet_ckpt.pt.\n",
-      "[2025-03-11 22:05:43.502][ INFO](inference) - [config] random_seed -> 7854.\n",
-      "[2025-03-11 22:05:43.502][ INFO](inference) - [config] output_prefix -> unet_3d.\n",
-      "[2025-03-11 22:05:43.502][ INFO](inference) - [config] output_size -> (256, 256, 128).\n",
-      "[2025-03-11 22:05:43.502][ INFO](inference) - [config] out_spacing -> (1.0, 1.0, 0.75).\n",
-      "[2025-03-11 22:05:43.502][ INFO](root) - `controllable_anatomy_size` is not provided.\n",
-      "[2025-03-11 22:05:45.793][ INFO](inference) - checkpoints ./temp_work_dir/./models/diff_unet_ckpt.pt loaded.\n",
-      "[2025-03-11 22:05:45.795][ INFO](inference) - scale_factor -> 1.159977912902832.\n",
-      "[2025-03-11 22:05:45.796][ INFO](inference) - num_downsample_level -> 4, divisor -> 4.\n",
-      "[2025-03-11 22:05:45.798][ INFO](inference) - noise: cuda:0, torch.float32, <class 'torch.Tensor'>\n",
+      "[2025-03-11 22:17:20.465][ INFO](inference) - Using cuda:0 of 1 with random seed: 23141\n",
+      "[2025-03-11 22:17:20.466][ INFO](inference) - [config] ckpt_filepath -> ./temp_work_dir/./models/diff_unet_ckpt.pt.\n",
+      "[2025-03-11 22:17:20.466][ INFO](inference) - [config] random_seed -> 23141.\n",
+      "[2025-03-11 22:17:20.466][ INFO](inference) - [config] output_prefix -> unet_3d.\n",
+      "[2025-03-11 22:17:20.466][ INFO](inference) - [config] output_size -> (256, 256, 128).\n",
+      "[2025-03-11 22:17:20.466][ INFO](inference) - [config] out_spacing -> (1.0, 1.0, 0.75).\n",
+      "[2025-03-11 22:17:20.466][ INFO](root) - `controllable_anatomy_size` is not provided.\n",
+      "[2025-03-11 22:17:23.065][ INFO](inference) - checkpoints ./temp_work_dir/./models/diff_unet_ckpt.pt loaded.\n",
+      "[2025-03-11 22:17:23.067][ INFO](inference) - scale_factor -> 1.159977912902832.\n",
+      "[2025-03-11 22:17:23.068][ INFO](inference) - num_downsample_level -> 4, divisor -> 4.\n",
+      "[2025-03-11 22:17:23.070][ INFO](inference) - noise: cuda:0, torch.float32, <class 'torch.Tensor'>\n",
       "\n",
       "  0%|          | 0/10 [00:00<?, ?it/s]\n",
-      " 10%|█         | 1/10 [00:00<00:05,  1.78it/s]\n",
-      " 60%|██████    | 6/10 [00:00<00:00, 11.19it/s]\n",
-      "100%|██████████| 10/10 [00:00<00:00, 12.88it/s]\n",
-      "[2025-03-11 22:05:48.356][ INFO](inference) - Saved ./temp_work_dir/./predictions/unet_3d_seed7854_size256x256x128_spacing1.00x1.00x0.75_20250311220547_rank0.nii.gz.\n",
+      " 10%|█         | 1/10 [00:00<00:07,  1.24it/s]\n",
+      " 60%|██████    | 6/10 [00:00<00:00,  8.37it/s]\n",
+      "100%|██████████| 10/10 [00:01<00:00,  9.78it/s]\n",
+      "[2025-03-11 22:17:25.828][ INFO](inference) - Saved ./temp_work_dir/./predictions/unet_3d_seed23141_size256x256x128_spacing1.00x1.00x0.75_20250311221725_rank0.nii.gz.\n",
       "\n"
      ]
     }