# Surgical Scene Reconstruction with Gaussian Splatting

This application demonstrates real-time 3D surgical scene reconstruction by combining **Holoscan SDK** for high-performance streaming, **3D Gaussian Splatting** for neural 3D representation, and **temporal deformation networks** for accurate modeling of dynamic tissue.

![Training Visualization - Ground Truth vs Rendered](train_gt_animation.gif)
The application provides a complete end-to-end pipeline—from raw surgical video to real-time 3D reconstruction. Researchers and developers can use it to train custom models on their own endoscopic data and visualize results with GPU-accelerated rendering.

Features of this application include:

- **Real-time Visualization:** Stream surgical scene reconstruction at >30 FPS using Holoscan
- **Temporal Deformation:** Accurate per-frame tissue modeling as it deforms over time
- **Two Operation Modes:** Inference-only (with a pre-trained checkpoint) or train-then-render
- **Production Ready:** Tested and optimized Holoscan pipeline with complete Docker containerization

It takes input from EndoNeRF surgical datasets (RGB images + stereo depth + camera poses + tool masks), processes it using multi-frame Gaussian Splatting with a 4D spatiotemporal deformation network, and outputs a real-time 3D tissue reconstruction without surgical instruments.

It is ideal for use cases such as:

- Surgical scene understanding and visualization
- Tool-free tissue reconstruction for analysis

## Quick Start

### Step 1: Clone the HoloHub Repository

```bash
git clone https://github.com/nvidia-holoscan/holohub.git
cd holohub
```

### Step 2: Read and Agree to the Terms and Conditions of the EndoNeRF Sample Dataset

1. Read and agree to the [Terms and Conditions](https://docs.google.com/document/d/1P6q2hXoGpVMKeD-PpjYYdZ0Yx1rKZdJF1rXxpobbFMY/edit?usp=share_link) for the EndoNeRF dataset.
1. The EndoNeRF sample dataset is downloaded automatically when the application is built.
1. Optionally, for manual download of the dataset, refer to the [Data](#pulling-soft-tissues-dataset) section below.
1. Optionally, if you do not agree to the terms and conditions, set the `HOLOHUB_DOWNLOAD_DATASETS` environment variable to `OFF` and manually download the dataset and place it in the correct location by following the instructions in the [Data](#pulling-soft-tissues-dataset) section below.

   ```bash
   export HOLOHUB_DOWNLOAD_DATASETS=OFF
   ```

### Step 3: Run Training

To run the model training:

```bash
./holohub run surgical_scene_recon train
```

### Step 4: Dynamic Rendering with a Trained Model

After training completes, visualize your results in real time by running the render command:

```bash
./holohub run surgical_scene_recon render
```

![Dynamic Rendering Visualization](surg_recon_inference.gif)

## Pulling Soft Tissues Dataset

This application uses the **EndoNeRF "pulling_soft_tissues" dataset**, which contains:

- Tool segmentation masks for instrument removal
- Camera poses and bounds (poses_bounds.npy)

### Download the Dataset

You can download the dataset from one of the following locations:

* 📦 Direct Google Drive: <https://drive.google.com/drive/folders/1zTcX80c1yrbntY9c6-EK2W2UVESVEug8?usp=sharing>

  1. In the Google Drive folder, you'll see:

     - `cutting_tissues_twice`
     - `pulling_soft_tissues`

  1. Download `pulling_soft_tissues`.

* Visit the [EndoNeRF repository](https://github.com/med-air/EndoNeRF).

### Dataset Setup

The dataset will be automatically used by the application when placed in the correct location. Refer to the [HoloHub glossary](../../README.md#Glossary) for definitions of HoloHub-specific directory terms used below.

To place the dataset at `<HOLOHUB_ROOT>/data/EndoNeRF/pulling/`:

1. From the HoloHub root directory:

   ```bash
   mkdir -p data/EndoNeRF
   ```

1. Extract and move (or copy) the downloaded dataset:

   ```bash
   mv /path/to/pulling_soft_tissues data/EndoNeRF/pulling
   ```

**Important:** The dataset MUST be physically at the path above; do NOT use symlinks. Docker containers cannot follow symlinks outside mounted volumes.

### Verify the Dataset Structure

Verify that your dataset has this structure:

```text
<HOLOHUB_ROOT>/
└── data/
    └── EndoNeRF/
        └── pulling/
            ├── ...                  # RGB images, depth maps, and tool masks
            └── poses_bounds.npy     # Camera poses (8.5 KB)
```
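
As a quick check, a minimal Python sketch like the following can confirm the expected layout. The path and the symlink check reflect the instructions above; adjust the root if your checkout differs.

```python
from pathlib import Path

# Assumed location from this README; run from the HoloHub root directory.
root = Path("data/EndoNeRF/pulling")

print(f"exists:     {root.exists()}")
print(f"is symlink: {root.is_symlink()}")  # should be False (see warning above)
print(f"poses file: {(root / 'poses_bounds.npy').exists()}")
```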

## Models Used by the `surgical_scene_recon` Application

The `surgical_scene_recon` application uses a **3D Gaussian Splatting** model with a **temporal deformation network** for dynamic scene reconstruction.

- Gaussian Splatting Model

  Each part of the application uses different aspects of the Gaussian Splatting model; a combined sketch of both models follows this list.

  - Architecture: 3D Gaussians with learned position, scale, rotation, opacity, and color
  - Initialization: Multi-frame point cloud (~30,000-50,000 points from all frames)
  - Renderer: `gsplat` library (CUDA-accelerated differentiable rasterization)
  - Spherical Harmonics: Degree 3 (16 coefficients per gaussian for view-dependent color)
  - Resolution: 640×512 pixels (RGB, three channels)

- Temporal Deformation Network Model

  The Temporal Deformation Network deforms the 3D Gaussians over time to model dynamic tissue movement during surgery.

  - Architecture: HexPlane 4D spatiotemporal grid + MLP decoder
  - Input: 3D position + normalized time value [0, 1]
  - Output: Deformed position, scale, rotation, and opacity changes
  - Training: Two-stage process (coarse: static, fine: with deformation)
  - Inference: Direct PyTorch (no conversion, full precision)
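
The following is a minimal, illustrative sketch of how a deformation network and `gsplat` rasterization fit together. It is not the application's code: the tiny MLP stands in for the HexPlane + MLP decoder, all tensors are synthetic, and the `rasterization` call assumes the gsplat 1.x API.

```python
import torch
from gsplat import rasterization  # assumes the gsplat 1.x API

device = "cuda"
N = 1_000  # toy Gaussian count; the trained model uses ~50,000

# Synthetic Gaussians placed in front of the camera.
means = torch.randn(N, 3, device=device) * 0.3 + torch.tensor([0.0, 0.0, 3.0], device=device)
quats = torch.nn.functional.normalize(torch.randn(N, 4, device=device), dim=-1)
scales = torch.rand(N, 3, device=device) * 0.05
opacities = torch.rand(N, device=device)
colors = torch.rand(N, 3, device=device)

# Stand-in for the deformation network: maps (position, time) -> position offset.
deform = torch.nn.Sequential(
    torch.nn.Linear(4, 64), torch.nn.ReLU(), torch.nn.Linear(64, 3)
).to(device)

t = torch.full((N, 1), 0.5, device=device)  # normalized time in [0, 1]
means_t = means + deform(torch.cat([means, t], dim=-1))

viewmat = torch.eye(4, device=device)[None]  # [1, 4, 4] world-to-camera
K = torch.tensor([[500.0, 0.0, 320.0],
                  [0.0, 500.0, 256.0],
                  [0.0, 0.0, 1.0]], device=device)[None]

renders, alphas, meta = rasterization(
    means_t, quats, scales, opacities, colors, viewmat, K, width=640, height=512
)
print(renders.shape)  # [1, 512, 640, 3] at the app's 640×512 resolution
```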

## About the Model Training Process

The application trains in two stages:

1. The Coarse Stage, in which the application learns the base static Gaussians without deformation.
2. The Fine Stage, in which a temporal deformation network is added for dynamic tissue modeling.

The training uses the following (a loss sketch follows the list):

- Multi-modal Data: RGB images, depth maps, tool segmentation masks
- Loss Functions: RGB loss, depth loss, TV loss, masking losses
- Optimization: Adam optimizer with batch-size scaled learning rates
- Tool Removal: Segmentation masks applied during training for tissue-only reconstruction
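
As an illustration of how these losses can combine, here is a minimal sketch of a masked photometric + depth objective. The weights and the image-space TV term are placeholders, not the application's actual values (the app's TV regularizer may be applied elsewhere, such as to the deformation grids):

```python
import torch

# Illustrative loss combination; weights are placeholders, not the app's defaults.
def training_loss(pred_rgb, gt_rgb, pred_depth, gt_depth, tool_mask):
    """tool_mask is 1.0 on tissue pixels and 0.0 on instrument pixels,
    so tool regions contribute nothing (tissue-only reconstruction)."""
    m = tool_mask.unsqueeze(-1)  # [H, W, 1]
    rgb = (m * (pred_rgb - gt_rgb).abs()).sum() / m.sum().clamp(min=1.0)
    depth = (tool_mask * (pred_depth - gt_depth).abs()).sum() / tool_mask.sum().clamp(min=1.0)
    # Total-variation smoothness on the rendered image.
    tv = (pred_rgb[1:, :] - pred_rgb[:-1, :]).abs().mean() \
       + (pred_rgb[:, 1:] - pred_rgb[:, :-1]).abs().mean()
    return rgb + 0.1 * depth + 0.01 * tv

h, w = 512, 640
loss = training_loss(torch.rand(h, w, 3), torch.rand(h, w, 3),
                     torch.rand(h, w), torch.rand(h, w),
                     (torch.rand(h, w) > 0.2).float())
print(loss.item())
```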

The **training pipeline** (`gsplat_train.py`) runs in the following order:

1. Data Loading: EndoNeRF parser loads RGB, depth, masks, and poses.

2. Initialization: Multi-frame point cloud (~30k points).

3. Training: Two stages:
   - Coarse
   - Fine
4. Optimization: Adam (Adaptive Moment Estimation) optimizer with batch-size scaled learning rates.
5. Regularization: Depth loss, TV loss, and masking losses are applied.

The default training command trains a model on all 63 frames with 2000 iterations, producing smooth temporal deformation and high-quality reconstruction.
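
To make the two-stage schedule concrete, here is a self-contained toy loop: synthetic data, with a plain `Linear` layer in place of the deformation network. Nothing here is taken from `gsplat_train.py`.

```python
import torch

# Toy coarse -> fine schedule on synthetic data; illustrative only.
target = torch.rand(100, 3)
base = torch.zeros(100, 3, requires_grad=True)   # stand-in for static Gaussian params
deform = torch.nn.Linear(4, 3)                   # stand-in for the deformation network

for stage, iters in [("coarse", 200), ("fine", 200)]:
    params = [base] if stage == "coarse" else [base, *deform.parameters()]
    opt = torch.optim.Adam(params, lr=1e-2)
    for _ in range(iters):
        t = torch.rand(100, 1)                   # normalized time in [0, 1]
        pred = base if stage == "coarse" else base + deform(torch.cat([base, t], dim=-1))
        loss = (pred - target).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(stage, loss.item())
```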

The **rendering pipeline** connects the following operators:

```text
EndoNeRFLoaderOp → GsplatLoaderOp → GsplatRenderOp → HolovizOp
                                          ↓
                                    ImageSaverOp
```

**Components** (see the wiring sketch after this list):

- **EndoNeRFLoaderOp:** Streams camera poses and timestamps
- **GsplatLoaderOp:** Loads checkpoint and deformation network
- **GsplatRenderOp:** Applies temporal deformation and renders
- **HolovizOp:** Real-time GPU-accelerated visualization
- **ImageSaverOp:** Optional frame saving
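
Below is a hypothetical sketch of how this topology could be expressed with Holoscan's Python API. The custom-operator import path and constructors are illustrative assumptions (only `HolovizOp` is a stock Holoscan operator); see the application source for the real wiring and port names.

```python
from holoscan.core import Application
from holoscan.operators import HolovizOp

# Hypothetical import path for the app's custom operators.
from operators import EndoNeRFLoaderOp, GsplatLoaderOp, GsplatRenderOp, ImageSaverOp


class RenderApp(Application):
    def compose(self):
        loader = EndoNeRFLoaderOp(self, name="endonerf_loader")
        ckpt = GsplatLoaderOp(self, name="gsplat_loader")
        render = GsplatRenderOp(self, name="gsplat_render")
        viz = HolovizOp(self, name="holoviz")
        saver = ImageSaverOp(self, name="image_saver")

        # Mirrors the diagram above: poses/timestamps -> checkpoint -> render -> display.
        self.add_flow(loader, ckpt)
        self.add_flow(ckpt, render)
        self.add_flow(render, viz)
        self.add_flow(render, saver)  # optional frame-saving branch


if __name__ == "__main__":
    RenderApp().run()
```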

## Requirements for the `surgical_scene_recon` Application

- **Hardware:**
  - NVIDIA GPU (RTX 3000+ series recommended, tested on RTX 6000 Ada Generation)
  - ~2 GB free disk space (for the dataset)
  - ~30 GB free disk space (for Docker containers)
- **Software:**
  - Docker with NVIDIA GPU support
  - X11 display server (for visualization)
  - Holoscan SDK 3.7.0 or later (automatically provided in containers)

## Application Integration Testing

We provide integration tests.

To test the application for training and inference, run:

```bash
./holohub test surgical_scene_recon --verbose
```

## Performance

Tested Configuration:

- GPU: NVIDIA RTX 6000 Ada Generation
- Container: Holoscan SDK 3.7.0
- Training Time: ~5 minutes (63 frames, 2000 iterations)
- Rendering: Real-time >30 FPS

Quality Metrics (train mode; PSNR is sketched below):

- PSNR: ~36-38 dB
- SSIM: ~0.80
- Gaussians: ~50,000 splats
- Deformation: Smooth temporal consistency
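
For reference, PSNR relates to per-pixel mean squared error as in this small sketch (values assume images scaled to [0, 1]):

```python
import math

def psnr(mse: float, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    return 10.0 * math.log10(max_val**2 / mse)

# ~36-38 dB corresponds to a per-pixel MSE of roughly 1.6e-4 to 2.5e-4.
print(round(psnr(2.0e-4), 1))  # 37.0
```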

## Troubleshooting


### Citation

If you use this work, cite the following:

* EndoNeRF:

  ```bibtex
  @inproceedings{wang2022endonerf,
    title={EndoNeRF: Neural Rendering for Stereo 3D Reconstruction of Deformable Tissues in Robotic Surgery},
    author={Wang, Yuehao and Yifan, Wang and Tao, Rui and others},
    booktitle={MICCAI},
    year={2022}
  }
  ```

* 3D Gaussian Splatting:

  ```bibtex
  @article{kerbl20233d,
    title={3d gaussian splatting for real-time radiance field rendering},
    author={Kerbl, Bernhard and Kopanas, Georgios and Leimk{\"u}hler, Thomas and Drettakis, George},
    journal={ACM Transactions on Graphics},
    year={2023}
  }
  ```

* `gsplat` Library:

  ```bibtex
  @software{ye2024gsplat,
    title={gsplat},
    author={Ye, Vickie and Turkulainen, Matias and others},
    year={2024},
    url={https://github.com/nerfstudio-project/gsplat}
  }
  ```

### License
