# Surgical Scene Reconstruction with Gaussian Splatting

This application demonstrates real-time 3D surgical scene reconstruction by combining **Holoscan SDK** for high-performance streaming, **3D Gaussian Splatting** for neural 3D representation, and **temporal deformation networks** for accurate modeling of dynamic tissue.

![Training Visualization - Ground Truth vs Rendered](train_gt_animation.gif)

## Overview

The application provides a complete end-to-end pipeline—from raw surgical video to real-time 3D reconstruction. Researchers and developers can use it to train custom models on their own endoscopic data and visualize results with GPU-accelerated rendering.

Features of this application include:

- **Real-time Visualization:** Stream surgical scene reconstruction at >30 FPS using Holoscan
- **Temporal Deformation:** Accurate per-frame tissue modeling as it deforms over time
- **Two Operation Modes:** Inference-only (with pre-trained checkpoint) OR train-then-render
- **Production Ready:** Tested and optimized Holoscan pipeline with complete Docker containerization

It takes input from EndoNeRF surgical datasets (RGB images + stereo depth + camera poses + tool masks), processes it using multi-frame Gaussian Splatting with a 4D spatiotemporal deformation network, and outputs a real-time 3D tissue reconstruction without surgical instruments.

It is ideal for use cases such as:

- Surgical scene understanding and visualization
- Tool-free tissue reconstruction for analysis

## Quick Start

### Step 1: Clone the HoloHub Repository

```bash
git clone https://github.com/nvidia-holoscan/holohub.git
cd holohub
```

### Step 2: Read and Agree to the Terms and Conditions of the EndoNeRF Sample Dataset

1. Read and agree to the [Terms and Conditions](https://docs.google.com/document/d/1P6q2hXoGpVMKeD-PpjYYdZ0Yx1rKZdJF1rXxpobbFMY/edit?usp=share_link) for the EndoNeRF dataset.
1. The EndoNeRF sample dataset is downloaded automatically when you build the application.
1. Optionally, for manual download of the dataset, refer to the [Data](#pulling-soft-tissues-dataset) section below.
1. Optionally, if you do not agree to the terms and conditions, set the `HOLOHUB_DOWNLOAD_DATASETS` environment variable to `OFF`, then manually download the dataset and place it in the correct location by following the instructions in the [Data](#pulling-soft-tissues-dataset) section below.

   ```bash
   export HOLOHUB_DOWNLOAD_DATASETS=OFF
   ```

### Step 3: Run Training

To train the model on the sample dataset:

```bash
./holohub run surgical_scene_recon train
```

### Step 4: Dynamic Rendering with a Trained Model

After training completes, visualize your results in real-time:

```bash
./holohub run surgical_scene_recon render
```

![Dynamic Rendering Visualization](surg_recon_inference.gif)

## Pulling Soft Tissues Dataset

This application uses the **EndoNeRF "pulling_soft_tissues" dataset**, which contains:

- Tool segmentation masks for instrument removal
- Camera poses and bounds (poses_bounds.npy)

### Download the Dataset

You can download the dataset from one of the following locations:

* 📦 Direct Google Drive: <https://drive.google.com/drive/folders/1zTcX80c1yrbntY9c6-EK2W2UVESVEug8?usp=sharing>

  1. In the Google Drive folder, you'll see:

     - `cutting_tissues_twice`
     - `pulling_soft_tissues`

  1. Download `pulling_soft_tissues`.

* Visit the [EndoNeRF repository](https://github.com/med-air/EndoNeRF).

### Dataset Setup

The dataset will be automatically used by the application when placed in the correct location. Refer to the [HoloHub glossary](../../README.md#Glossary) for definitions of HoloHub-specific directory terms used below.

To place the dataset at `<HOLOHUB_ROOT>/data/EndoNeRF/pulling/`:

1. From the HoloHub root directory:

   ```bash
   mkdir -p data/EndoNeRF
   ```

1. Extract and move (or copy) the downloaded dataset:

   ```bash
   mv /path/to/pulling_soft_tissues data/EndoNeRF/pulling
   ```

**Important:** The dataset MUST be physically located at the path above; do NOT use symlinks. Docker containers cannot follow symlinks outside mounted volumes.

### Verify the Dataset Structure

Verify that your dataset has this structure:

```text
<HOLOHUB_ROOT>/
└── data/
    └── EndoNeRF/
        └── pulling/
            ├── ...                      # RGB images, depth maps, and tool masks
            └── poses_bounds.npy         # Camera poses (8.5 KB)
```
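
If you want to sanity-check the layout programmatically, a minimal sketch along these lines can help. The helper below is hypothetical (not part of the application); it only checks what the instructions above guarantee: the `pulling` path, the presence of `poses_bounds.npy`, and the no-symlink rule.

```python
from pathlib import Path

def verify_endonerf_layout(root: str = "data/EndoNeRF/pulling") -> None:
    """Hypothetical helper: sanity-check the dataset location before training."""
    root_path = Path(root)
    if root_path.is_symlink():
        # Docker containers cannot follow symlinks outside mounted volumes.
        raise RuntimeError(f"{root} is a symlink; copy the data here instead")
    if not (root_path / "poses_bounds.npy").is_file():
        raise FileNotFoundError(f"missing {root_path / 'poses_bounds.npy'}")
    print(f"Dataset at {root_path} looks plausible.")

verify_endonerf_layout()
```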

## Models Used by the `surgical_scene_recon` Application

The `surgical_scene_recon` application uses a **3D Gaussian Splatting** model with a **temporal deformation network** for dynamic scene reconstruction. A minimal illustrative sketch of both components follows the list below.

- Gaussian Splatting Model

  Both the training and rendering pipelines build on this model:

  - Architecture: 3D Gaussians with learned position, scale, rotation, opacity, and color
  - Initialization: Multi-frame point cloud (~30,000-50,000 points from all frames)
  - Renderer: `gsplat` library (CUDA-accelerated differentiable rasterization)
  - Spherical Harmonics: Degree 3 (16 coefficients per Gaussian for view-dependent color)
  - Resolution: 640×512 pixels (RGB, three channels)

- Temporal Deformation Network Model

  The Temporal Deformation Network enables dynamic scene modeling by deforming the base Gaussians over time, accurately capturing tissue movement and deformation during surgery:

  - Architecture: HexPlane 4D spatiotemporal grid + MLP decoder
  - Input: 3D position + normalized time value [0, 1]
  - Output: Deformed position, scale, rotation, and opacity changes
  - Training: Two-stage process (coarse: static, fine: with deformation)
  - Inference: Direct PyTorch (no conversion, full precision)
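
For orientation, the sketch below shows roughly what these two components look like in PyTorch. It is illustrative only: the class names are invented, and the deformation network is reduced to a plain MLP stand-in for the HexPlane 4D grid + decoder described above.

```python
import torch
import torch.nn as nn

class GaussianParams(nn.Module):
    """Per-Gaussian learnable parameters (illustrative shapes only)."""
    def __init__(self, num_points: int, sh_degree: int = 3):
        super().__init__()
        num_sh = (sh_degree + 1) ** 2  # degree 3 -> 16 coefficients per Gaussian
        self.means = nn.Parameter(torch.zeros(num_points, 3))       # position
        self.scales = nn.Parameter(torch.zeros(num_points, 3))      # log-scale
        self.quats = nn.Parameter(torch.zeros(num_points, 4))       # rotation
        self.opacities = nn.Parameter(torch.zeros(num_points))      # logit opacity
        self.sh = nn.Parameter(torch.zeros(num_points, num_sh, 3))  # view-dependent color

class DeformationNet(nn.Module):
    """Stand-in for the HexPlane 4D grid + MLP decoder: (xyz, t) -> offsets."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3 + 3 + 4 + 1),  # d_position, d_scale, d_rotation, d_opacity
        )

    def forward(self, xyz: torch.Tensor, t: float) -> torch.Tensor:
        t_col = torch.full_like(xyz[:, :1], float(t))  # normalized time in [0, 1]
        return self.mlp(torch.cat([xyz, t_col], dim=-1))
```

At render time, the predicted offsets would be added to the base parameters before `gsplat` rasterizes the deformed Gaussians.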

## About the Model Training Process

The application trains in two stages:

1. The Coarse Stage, in which the application learns the base static Gaussians without deformation.
2. The Fine Stage, in which a temporal deformation network is added for dynamic tissue modeling.

The training uses:

- Multi-modal Data: RGB images, depth maps, tool segmentation masks
- Loss Functions: RGB loss, depth loss, TV loss, masking losses
- Optimization: Adam optimizer with batch-size scaled learning rates
- Tool Removal: Segmentation masks applied during training for tissue-only reconstruction

The **training pipeline** (`gsplat_train.py`) runs in the following order:

1. Data Loading: the EndoNeRF parser loads RGB images, depth maps, masks, and poses.
2. Initialization: a multi-frame point cloud (~30k points).
3. Training in two stages:
   - Coarse
   - Fine
4. Optimization: the Adam (Adaptive Moment Estimation) optimizer with batch-size scaled learning rates.
5. Regularization: depth loss, TV loss, and masking losses.
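
Put together, the loop can be pictured as in the sketch below. It is a simplification under stated assumptions: `render`, `rgb_loss`, and `depth_loss` are hypothetical helpers, and the real schedule, learning rates, and regularizers live in `gsplat_train.py`.

```python
import torch

def train(gaussians, deform_net, frames, coarse_iters=1000, fine_iters=1000):
    opt = torch.optim.Adam(
        list(gaussians.parameters()) + list(deform_net.parameters()), lr=1e-3
    )
    for step in range(coarse_iters + fine_iters):
        frame = frames[step % len(frames)]       # RGB, depth, tool mask, pose, time
        fine_stage = step >= coarse_iters        # stage 2 adds temporal deformation
        offsets = deform_net(gaussians.means, frame.t) if fine_stage else None
        rgb, depth = render(gaussians, offsets, frame.pose)  # hypothetical renderer
        # Tool masks exclude instrument pixels so only tissue is supervised.
        loss = (rgb_loss(rgb, frame.rgb, frame.mask)
                + depth_loss(depth, frame.depth, frame.mask))
        opt.zero_grad()
        loss.backward()
        opt.step()
```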

The default training command trains a model on all 63 frames with 2000 iterations, producing smooth temporal deformation and high-quality reconstruction.

The rendering (inference) pipeline connects the following operators:

```text
EndoNeRFLoaderOp → GsplatLoaderOp → GsplatRenderOp → HolovizOp
                                          ↓
                                    ImageSaverOp
```

**Components:**

- **EndoNeRFLoaderOp:** Streams camera poses and timestamps
- **GsplatLoaderOp:** Loads checkpoint and deformation network
- **GsplatRenderOp:** Applies temporal deformation and renders
- **HolovizOp:** Real-time GPU-accelerated visualization
- **ImageSaverOp:** Optional frame saving
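
In Holoscan's Python API, wiring these operators together looks roughly like the sketch below. The custom operator classes, their import path, and their constructor arguments are assumptions based on the component list above; only `Application`, `add_flow`, and `HolovizOp` are standard Holoscan SDK names.

```python
from holoscan.core import Application
from holoscan.operators import HolovizOp

# Hypothetical import path for the application's custom operators.
from operators import EndoNeRFLoaderOp, GsplatLoaderOp, GsplatRenderOp, ImageSaverOp

class SurgicalSceneReconApp(Application):
    def compose(self):
        loader = EndoNeRFLoaderOp(self, name="endonerf_loader")  # poses + timestamps
        ckpt = GsplatLoaderOp(self, name="gsplat_loader")        # checkpoint + deformation net
        renderer = GsplatRenderOp(self, name="gsplat_render")    # deform + render
        viz = HolovizOp(self, name="holoviz")                    # GPU visualization
        saver = ImageSaverOp(self, name="image_saver")           # optional frame saving

        self.add_flow(loader, ckpt)
        self.add_flow(ckpt, renderer)
        self.add_flow(renderer, viz)
        self.add_flow(renderer, saver)  # tap point for saving is an assumption

if __name__ == "__main__":
    SurgicalSceneReconApp().run()
```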

## Requirements for the `surgical_scene_recon` Application

- **Hardware:**
  - NVIDIA GPU (RTX 3000+ series recommended, tested on RTX 6000 Ada Generation)
  - ~2 GB free disk space (for the dataset)
  - ~30 GB free disk space (for Docker containers)
- **Software:**
  - Docker with NVIDIA GPU support
  - X11 display server (for visualization)
  - Holoscan SDK 3.7.0 or later (automatically provided in containers)

## Application Integration Testing

We provide integration tests. To test the application for training and inference, run:

```bash
./holohub test surgical_scene_recon --verbose
```

## Performance

Tested Configuration:

- GPU: NVIDIA RTX 6000 Ada Generation
- Container: Holoscan SDK 3.7.0
- Training Time: ~5 minutes (63 frames, 2000 iterations)
- Rendering: Real-time >30 FPS

Quality Metrics (train mode):

- PSNR: ~36-38 dB
- SSIM: ~0.80
- Gaussians: ~50,000 splats
- Deformation: Smooth temporal consistency
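
For reference, the PSNR figure above is the standard image-quality metric. Here is a quick sketch of computing it for a rendered frame against ground truth, assuming float tensors scaled to [0, 1]:

```python
import torch

def psnr(rendered: torch.Tensor, ground_truth: torch.Tensor) -> float:
    """PSNR in dB for images in [0, 1]: 10 * log10(1 / MSE)."""
    mse = torch.mean((rendered - ground_truth) ** 2)
    return float(10.0 * torch.log10(1.0 / mse))

# Example: a 640x512 RGB frame pair
a, b = torch.rand(3, 512, 640), torch.rand(3, 512, 640)
print(f"PSNR: {psnr(a, b):.2f} dB")
```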

## Troubleshooting


### Citation

If you use this work, cite the following:

* EndoNeRF:

  ```bibtex
  @inproceedings{wang2022endonerf,
    title={EndoNeRF: Neural Rendering for Stereo 3D Reconstruction of Deformable Tissues in Robotic Surgery},
    author={Wang, Yuehao and Yifan, Wang and Tao, Rui and others},
    booktitle={MICCAI},
    year={2022}
  }
  ```

* 3D Gaussian Splatting:

  ```bibtex
  @article{kerbl20233d,
    title={3D Gaussian Splatting for Real-Time Radiance Field Rendering},
    author={Kerbl, Bernhard and Kopanas, Georgios and Leimk{\"u}hler, Thomas and Drettakis, George},
    journal={ACM Transactions on Graphics},
    year={2023}
  }
  ```

* `gsplat` Library:

  ```bibtex
  @software{ye2024gsplat,
    title={gsplat},
    author={Ye, Vickie and Turkulainen, Matias and others},
    year={2024},
    url={https://github.com/nerfstudio-project/gsplat}
  }
  ```

### License
