diff --git a/applications/surgical_scene_recon/README.md b/applications/surgical_scene_recon/README.md
index 28aa61badc..a423de6cbb 100644
--- a/applications/surgical_scene_recon/README.md
+++ b/applications/surgical_scene_recon/README.md
@@ -1,16 +1,14 @@
# Surgical Scene Reconstruction with Gaussian Splatting

-Real-time 3D surgical scene reconstruction using Gaussian Splatting in a Holoscan streaming pipeline with temporal deformation for accurate tissue modeling.
+This application demonstrates real-time 3D surgical scene reconstruction by combining **Holoscan SDK** for high-performance streaming, **3D Gaussian Splatting** for neural 3D representation, and **temporal deformation networks** for accurate modeling of dynamic tissue.

-![Training Visualization - Ground Truth vs Rendered](train_gt_animation.gif)

-## Overview
+![Training Visualization - Ground Truth vs Rendered](train_gt_animation.gif)

-This application demonstrates real-time 3D surgical scene reconstruction by combining **Holoscan SDK** for high-performance streaming, **3D Gaussian Splatting** for neural 3D representation, and **temporal deformation networks** for accurate modeling of dynamic tissue.
-The application provides a complete end-to-end pipeline—from raw surgical video to real-time 3D reconstruction—enabling researchers and developers to train custom models on their own endoscopic data and visualize results with GPU-accelerated rendering.
+The application provides a complete end-to-end pipeline—from raw surgical video to real-time 3D reconstruction. Researchers and developers can use it to train custom models on their own endoscopic data and visualize results with GPU-accelerated rendering.

-### Key Features
+Features of this application include:

- **Real-time Visualization:** Stream surgical scene reconstruction at >30 FPS using Holoscan
- **Temporal Deformation:** Accurate per-frame tissue modeling as it deforms over time
@@ -19,13 +17,9 @@ The application provides a complete end-to-end pipeline—from raw surgical vide
- **Two Operation Modes:** Inference-only (with pre-trained checkpoint) OR train-then-render
- **Production Ready:** Tested and optimized Holoscan pipeline with complete Docker containerization

-### What It Does
+The application takes EndoNeRF surgical datasets (RGB images + stereo depth + camera poses + tool masks) as input, processes them with multi-frame Gaussian Splatting and a 4D spatiotemporal deformation network, and outputs a real-time 3D tissue reconstruction without surgical instruments.

-- **Input:** EndoNeRF surgical dataset (RGB images + stereo depth + camera poses + tool masks)
-- **Process:** Multi-frame Gaussian Splatting with 4D spatiotemporal deformation network
-- **Output:** Real-time 3D tissue reconstruction without surgical instruments
-
-### Use Cases
+It is ideal for use cases such as:

- Surgical scene understanding and visualization
- Tool-free tissue reconstruction for analysis
@@ -34,7 +28,7 @@ The application provides a complete end-to-end pipeline—from raw surgical vide

## Quick Start

-### Step 1: Clone HoloHub
+### Step 1: Clone the HoloHub Repository

```bash
git clone https://github.com/nvidia-holoscan/holohub.git
@@ -43,23 +37,26 @@ cd holohub
```

### Step 2: Read and Agree to the Terms and Conditions of the EndoNeRF Sample Dataset

-- Read and agree to the [Terms and Conditions](https://docs.google.com/document/d/1P6q2hXoGpVMKeD-PpjYYdZ0Yx1rKZdJF1rXxpobbFMY/edit?usp=share_link) for the EndoNeRF dataset.
-- EndoNeRF sample dataset is being downloaded automatically when building the application. For manual download, please refer to the [Data](#data) section below.
-- If you do not agree to the terms and conditions, set the `HOLOHUB_DOWNLOAD_DATASETS` environment variable to `OFF` and manually download the dataset and place it in the correct location by following the instructions in the [Data](#data) section below.
+1. Read and agree to the [Terms and Conditions](https://docs.google.com/document/d/1P6q2hXoGpVMKeD-PpjYYdZ0Yx1rKZdJF1rXxpobbFMY/edit?usp=share_link) for the EndoNeRF dataset.
+2. The EndoNeRF sample dataset is downloaded automatically when the application is built.
+3. Optionally, to download the dataset manually, refer to the [Data](#obtaining-the-pulling-soft-tissues-dataset) section below.
+4. If you do not agree to the terms and conditions, set the `HOLOHUB_DOWNLOAD_DATASETS` environment variable to `OFF`, then manually download the dataset and place it in the correct location by following the instructions in the [Data](#obtaining-the-pulling-soft-tissues-dataset) section below.

-  ```bash
-  export HOLOHUB_DOWNLOAD_DATASETS=OFF
-  ```
+   ```bash
+   export HOLOHUB_DOWNLOAD_DATASETS=OFF
+   ```

### Step 3: Run Training

+To train the model, run:
+
```bash
./holohub run surgical_scene_recon train
```

-### Step 4: Dynamic Rendering with Trained Model
+### Step 4: Dynamic Rendering with a Trained Model

-After training completes, visualize your results in real-time:
+After training completes, run the render mode to visualize your results in real time:

```bash
./holohub run surgical_scene_recon render
@@ -67,7 +64,7 @@ After training completes, visualize your results in real-time:
```

![Dynamic Rendering Visualization](surg_recon_inference.gif)

-## Data
+## Obtaining the Pulling Soft Tissues Dataset

This application uses the **EndoNeRF "pulling_soft_tissues" dataset**, which contains:

@@ -76,36 +73,43 @@ This application uses the **EndoNeRF "pulling_soft_tissues" dataset**, which con
- Tool segmentation masks for instrument removal
- Camera poses and bounds (poses_bounds.npy)

-### Download
+### Download the Dataset

-📦 **Direct Google Drive:**
+You can download the dataset from one of the following locations:

-In the Google Drive folder, you'll see:
+* 📦 Direct Google Drive:

-- `cutting_tissues_twice`
-- `pulling_soft_tissues` ← **Download this one**
+  1. In the Google Drive folder, you'll see:
+
+     - `cutting_tissues_twice`
+     - `pulling_soft_tissues`
+
+  2. Download `pulling_soft_tissues`.

-**Alternative:** Visit the [EndoNeRF repository](https://github.com/med-air/EndoNeRF)
+* Alternatively, visit the [EndoNeRF repository](https://github.com/med-air/EndoNeRF).

### Dataset Setup

The dataset will be automatically used by the application when placed in the correct location. Refer to the [HoloHub glossary](../../README.md#Glossary) for definitions of HoloHub-specific directory terms used below.

-Place the dataset at `/data/EndoNeRF/pulling/`:
+To place the dataset at `/data/EndoNeRF/pulling/`:

-```bash
-# From the HoloHub root directory
-mkdir -p data/EndoNeRF
+1. From the HoloHub root directory, create the dataset directory:
+
+   ```bash
+   mkdir -p data/EndoNeRF
+   ```

-# Extract and move (or copy) the downloaded dataset
-mv /path/to/pulling_soft_tissues data/EndoNeRF/pulling
-```
+2. Extract and move (or copy) the downloaded dataset:
+
+   ```bash
+   mv /path/to/pulling_soft_tissues data/EndoNeRF/pulling
+   ```

-**⚠️ Important:** The dataset MUST be physically at the path above—do NOT use symlinks! Docker containers cannot follow symlinks outside mounted volumes.
+**Important:** The dataset MUST be physically located at the path above; do NOT use symlinks. Docker containers cannot follow symlinks outside mounted volumes.

-### Verify Dataset Structure
+### Verify the Dataset Structure

-Your dataset should have this structure:
+Verify that your dataset has this structure:

```text
/
@@ -118,39 +122,53 @@ Your dataset should have this structure:
└── poses_bounds.npy          # Camera poses (8.5 KB)
```
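
+As a quick sanity check before running the application, you can confirm the layout and the no-symlink requirement with a short script. This is a minimal sketch; the `check_dataset.py` name is illustrative and the script is not part of the application:
+
+```python
+# check_dataset.py -- hypothetical helper, not shipped with the application.
+from pathlib import Path
+
+root = Path("data/EndoNeRF/pulling")
+
+# The dataset must be physically present, not a symlink (Docker containers
+# cannot follow symlinks outside mounted volumes).
+assert root.is_dir() and not root.is_symlink(), "dataset must be a real directory"
+
+# poses_bounds.npy is required by the EndoNeRF data parser.
+assert (root / "poses_bounds.npy").is_file(), "missing poses_bounds.npy"
+
+# List the remaining top-level entries (RGB frames, depth maps, tool masks).
+for entry in sorted(root.iterdir()):
+    print(entry.name)
+```
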
-## Model
+## Models Used by the `surgical_scene_recon` Application
+
+The `surgical_scene_recon` application uses a **3D Gaussian Splatting** model with a **temporal deformation network** for dynamic scene reconstruction.
+
+- Gaussian Splatting Model

-The application uses **3D Gaussian Splatting** with a **temporal deformation network** for dynamic scene reconstruction:
+  The Gaussian Splatting model can be described as:

-### Gaussian Splatting
+  - Architecture: 3D Gaussians with learned position, scale, rotation, opacity, and color
+  - Initialization: Multi-frame point cloud (~30,000-50,000 points from all frames)
+  - Renderer: `gsplat` library (CUDA-accelerated differentiable rasterization)
+  - Color: spherical harmonics of degree 3 (16 coefficients per Gaussian for view-dependent color)
+  - Resolution: 640×512 pixels (RGB, three channels)

-- **Architecture:** 3D Gaussians with learned position, scale, rotation, opacity, and color
-- **Initialization:** Multi-frame point cloud (~30,000-50,000 points from all frames)
-- **Renderer:** gsplat library (CUDA-accelerated differentiable rasterization)
-- **Spherical Harmonics:** Degree 3 (16 coefficients per gaussian for view-dependent color)
-- **Resolution:** 640×512 pixels (RGB, 3 channels)
+- Temporal Deformation Network Model

-### Temporal Deformation Network
+  The Temporal Deformation Network model deforms the 3D Gaussians over time and can be described as:

-- **Architecture:** HexPlane 4D spatiotemporal grid + MLP decoder
-- **Input:** 3D position + normalized time value [0, 1]
-- **Output:** Deformed position, scale, rotation, and opacity changes
-- **Training:** Two-stage process (coarse: static, fine: with deformation)
-- **Inference:** Direct PyTorch (no conversion, full precision)
+  - Architecture: HexPlane 4D spatiotemporal grid + MLP decoder
+  - Input: 3D position + normalized time value [0, 1]
+  - Output: Deformed position, scale, rotation, and opacity changes
+  - Training: Two-stage process (coarse: static, fine: with deformation)
+  - Inference: Direct PyTorch (no conversion, full precision)
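
+To make the deformation interface concrete, the sketch below shows how a network of this kind maps Gaussian positions plus a normalized time value to per-Gaussian offsets. It is illustrative only: a plain MLP stands in for the application's HexPlane grid + MLP decoder, and every name in it is hypothetical:
+
+```python
+# Illustrative stand-in: a plain MLP replaces the HexPlane 4D grid used by
+# the application; class and variable names here are hypothetical.
+import torch
+import torch.nn as nn
+
+class ToyDeformationNet(nn.Module):
+    def __init__(self, hidden: int = 128):
+        super().__init__()
+        # Input: 3D position concatenated with a normalized time in [0, 1]
+        self.encoder = nn.Sequential(
+            nn.Linear(4, hidden), nn.ReLU(),
+            nn.Linear(hidden, hidden), nn.ReLU(),
+        )
+        # One output head per predicted change, mirroring the list above
+        self.d_xyz = nn.Linear(hidden, 3)      # position offset
+        self.d_scale = nn.Linear(hidden, 3)    # scale offset
+        self.d_rot = nn.Linear(hidden, 4)      # rotation (quaternion) offset
+        self.d_opacity = nn.Linear(hidden, 1)  # opacity offset
+
+    def forward(self, xyz: torch.Tensor, t: float):
+        time = torch.full_like(xyz[:, :1], t)  # same timestamp for every point
+        h = self.encoder(torch.cat([xyz, time], dim=-1))
+        return self.d_xyz(h), self.d_scale(h), self.d_rot(h), self.d_opacity(h)
+
+# Deform ~50k Gaussian centers for frame 17 of a 63-frame sequence.
+net = ToyDeformationNet()
+xyz = torch.randn(50_000, 3)
+d_xyz, d_scale, d_rot, d_opacity = net(xyz, t=17 / 62)
+deformed_xyz = xyz + d_xyz  # offsets are applied to the base Gaussians
+```
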
-### Training Process
+## About the Model Training Process

The application trains in two stages:

-1. **Coarse Stage:** Learn base static Gaussians without deformation
-2. **Fine Stage:** Add temporal deformation network for dynamic tissue modeling
+1. The coarse stage, in which the application learns the base static Gaussians without deformation.
+2. The fine stage, in which a temporal deformation network is added for dynamic tissue modeling.

The training uses:

-- **Multi-modal Data:** RGB images, depth maps, tool segmentation masks
-- **Loss Functions:** RGB loss, depth loss, TV loss, masking losses
-- **Optimization:** Adam optimizer with batch-size scaled learning rates
-- **Tool Removal:** Segmentation masks applied during training for tissue-only reconstruction
+- Multi-modal Data: RGB images, depth maps, tool segmentation masks
+- Loss Functions: RGB loss, depth loss, TV loss, masking losses
+- Optimization: Adam optimizer with batch-size-scaled learning rates
+- Tool Removal: Segmentation masks applied during training for tissue-only reconstruction
+
+The **training pipeline** (`gsplat_train.py`) runs in the following order:
+
+1. Data loading: the EndoNeRF parser loads RGB images, depth maps, tool masks, and camera poses.
+2. Initialization: a multi-frame point cloud (~30k points) seeds the Gaussians.
+3. Training: the coarse stage, followed by the fine stage.
+4. Optimization: the Adam (Adaptive Moment Estimation) optimizer with batch-size-scaled learning rates.
+5. Regularization: depth loss, TV loss, and masking losses, as sketched below.
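
+The fine-stage loss composition can be sketched as follows. This is a schematic only: tensor names and weights are illustrative, and the actual implementation lives in `gsplat_train.py`:
+
+```python
+# Schematic of the fine-stage loss terms; names and weights are illustrative.
+import torch
+
+def total_variation(depth: torch.Tensor) -> torch.Tensor:
+    """TV regularizer: penalize large jumps between neighboring depth pixels."""
+    dh = (depth[..., 1:, :] - depth[..., :-1, :]).abs().mean()
+    dw = (depth[..., :, 1:] - depth[..., :, :-1]).abs().mean()
+    return dh + dw
+
+def training_loss(pred_rgb, pred_depth, gt_rgb, gt_depth, tool_mask,
+                  depth_weight=0.1, tv_weight=0.01):
+    # Tool pixels (mask == 1) are excluded so only tissue drives the loss.
+    tissue = 1.0 - tool_mask
+    rgb_loss = (tissue * (pred_rgb - gt_rgb).abs()).mean()
+    depth_loss = (tissue * (pred_depth - gt_depth).abs()).mean()
+    tv_loss = total_variation(pred_depth)  # smoothness regularizer
+    return rgb_loss + depth_weight * depth_loss + tv_weight * tv_loss
+```
+
+Each training step renders the current (deformed) Gaussians with `gsplat`, evaluates a loss of this shape, and lets the Adam optimizer update the Gaussian parameters and the deformation network together.
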
The default training command trains a model on all 63 frames with 2000 iterations, producing smooth temporal deformation and high-quality reconstruction.

@@ -178,60 +196,48 @@ EndoNeRFLoaderOp → GsplatLoaderOp → GsplatRenderOp → HolovizOp
                         ImageSaverOp
```

-**Components:**
-
+The pipeline is composed of the following operators:
+
- **EndoNeRFLoaderOp:** Streams camera poses and timestamps
- **GsplatLoaderOp:** Loads checkpoint and deformation network
- **GsplatRenderOp:** Applies temporal deformation and renders
- **HolovizOp:** Real-time GPU-accelerated visualization
- **ImageSaverOp:** Optional frame saving
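
+The wiring of these operators can be sketched with the Holoscan Python API. The sketch below is a rough outline, not the application's actual code: the custom operator import path, port names, and constructor arguments are assumptions:
+
+```python
+# Illustrative wiring of the render pipeline with the Holoscan Python API.
+from holoscan.core import Application
+from holoscan.operators import HolovizOp
+
+# Hypothetical import path for this application's custom operators.
+from operators import EndoNeRFLoaderOp, GsplatLoaderOp, GsplatRenderOp
+
+class SurgicalSceneReconApp(Application):
+    def compose(self):
+        loader = EndoNeRFLoaderOp(self, name="endonerf_loader")  # poses + timestamps
+        gs_loader = GsplatLoaderOp(self, name="gsplat_loader")   # checkpoint + deformation net
+        renderer = GsplatRenderOp(self, name="gsplat_render")    # deform + rasterize
+        viz = HolovizOp(self, name="holoviz")                    # GPU-accelerated display
+
+        # Connect the operators in the order shown in the diagram above;
+        # the port names here are assumptions.
+        self.add_flow(loader, gs_loader)
+        self.add_flow(gs_loader, renderer)
+        self.add_flow(renderer, viz, {("output", "receivers")})
+
+if __name__ == "__main__":
+    SurgicalSceneReconApp().run()
+```
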
-## Requirements
+## Requirements for the `surgical_scene_recon` Application

- **Hardware:**
  - NVIDIA GPU (RTX 3000+ series recommended, tested on RTX 6000 Ada Generation)
-  - ~2 GB free disk space (dataset)
-  - ~30 GB free disk space (Docker container)
+  - ~2 GB free disk space (for the dataset)
+  - ~30 GB free disk space (for Docker containers)

- **Software:**
  - Docker with NVIDIA GPU support
  - X11 display server (for visualization)
-  - Holoscan SDK 3.7.0 or later (automatically provided in container)
+  - Holoscan SDK 3.7.0 or later (automatically provided in containers)

-## Testing
+## Application Integration Testing

-We provide integration tests that can be run with the following command to test the application for training and inference:
+We provide integration tests covering both training and inference.
+
+To run them:

```bash
./holohub test surgical_scene_recon --verbose
```

-## Technical Details
-
-### Training Pipeline (gsplat_train.py)
-
-1. **Data Loading:** EndoNeRF parser loads RGB, depth, masks, poses
-2. **Initialization:** Multi-frame point cloud (~30k points)
-3. **Two-Stage Training:**
-   - **Coarse:** Learn base Gaussians (no deformation)
-   - **Fine:** Add temporal deformation network
-4. **Optimization:** Adam with batch-size scaled learning rates
-5. **Regularization:** Depth loss, TV loss, masking losses
-
-### Performance
+## Performance

-**Tested Configuration:**
+Tested Configuration:

-- **GPU:** NVIDIA RTX 6000 Ada Generation
-- **Container:** Holoscan SDK 3.7.0
-- **Training Time:** ~5 minutes (63 frames, 2000 iterations)
-- **Rendering:** Real-time >30 FPS
+- GPU: NVIDIA RTX 6000 Ada Generation
+- Container: Holoscan SDK 3.7.0
+- Training Time: ~5 minutes (63 frames, 2000 iterations)
+- Rendering: Real-time >30 FPS

-**Quality Metrics (train mode):**
+Quality Metrics (train mode):

-- **PSNR:** ~36-38 dB
-- **SSIM:** ~0.80
-- **Gaussians:** ~50,000 splats
-- **Deformation:** Smooth temporal consistency
+- PSNR: ~36-38 dB
+- SSIM: ~0.80
+- Gaussians: ~50,000 splats
+- Deformation: Smooth temporal consistency

## Troubleshooting

@@ -263,40 +269,40 @@

### Citation

-If you use this work, please cite:
+If you use this work, cite the following:

-**EndoNeRF:**
+* EndoNeRF:

-```bibtex
-@inproceedings{wang2022endonerf,
-  title={EndoNeRF: Neural Rendering for Stereo 3D Reconstruction of Deformable Tissues in Robotic Surgery},
-  author={Wang, Yuehao and Yifan, Wang and Tao, Rui and others},
-  booktitle={MICCAI},
-  year={2022}
-}
-```
+  ```bibtex
+  @inproceedings{wang2022endonerf,
+    title={EndoNeRF: Neural Rendering for Stereo 3D Reconstruction of Deformable Tissues in Robotic Surgery},
+    author={Wang, Yuehao and Yifan, Wang and Tao, Rui and others},
+    booktitle={MICCAI},
+    year={2022}
+  }
+  ```

-**3D Gaussian Splatting:**
+* 3D Gaussian Splatting:

-```bibtex
-@article{kerbl20233d,
-  title={3d gaussian splatting for real-time radiance field rendering},
-  author={Kerbl, Bernhard and Kopanas, Georgios and Leimk{\"u}hler, Thomas and Drettakis, George},
-  journal={ACM Transactions on Graphics},
-  year={2023}
-}
-```
+  ```bibtex
+  @article{kerbl20233d,
+    title={3D Gaussian Splatting for Real-Time Radiance Field Rendering},
+    author={Kerbl, Bernhard and Kopanas, Georgios and Leimk{\"u}hler, Thomas and Drettakis, George},
+    journal={ACM Transactions on Graphics},
+    year={2023}
+  }
+  ```

-**gsplat Library:**
+* `gsplat` Library:

-```bibtex
-@software{ye2024gsplat,
-  title={gsplat},
-  author={Ye, Vickie and Turkulainen, Matias and others},
-  year={2024},
-  url={https://github.com/nerfstudio-project/gsplat}
-}
-```
+  ```bibtex
+  @software{ye2024gsplat,
+    title={gsplat},
+    author={Ye, Vickie and Turkulainen, Matias and others},
+    year={2024},
+    url={https://github.com/nerfstudio-project/gsplat}
+  }
+  ```

### License