# Surgical Scene Reconstruction with Gaussian Splatting

This application demonstrates real-time 3D surgical scene reconstruction by combining **Holoscan SDK** for high-performance streaming, **3D Gaussian Splatting** for neural 3D representation, and **temporal deformation networks** for accurate modeling of dynamic tissue.

![Training Visualization - Ground Truth vs Rendered](train_gt_animation.gif)

The application provides a complete end-to-end pipeline—from raw surgical video to real-time 3D reconstruction. Researchers and developers can use it to train custom models on their own endoscopic data and visualize results with GPU-accelerated rendering.

Features of this application include:

- **Real-time Visualization:** Stream surgical scene reconstruction at >30 FPS using Holoscan
- **Temporal Deformation:** Accurate per-frame tissue modeling as it deforms over time
- **Two Operation Modes:** Inference-only (with pre-trained checkpoint) OR train-then-render
- **Production Ready:** Tested and optimized Holoscan pipeline with complete Docker containerization

It takes input from EndoNeRF surgical datasets (RGB images + stereo depth + camera poses + tool masks). It processes the input using multi-frame Gaussian Splatting with a 4D spatiotemporal deformation network. And it outputs real-time 3D tissue reconstruction without surgical instruments.

It is ideal for use cases such as:

- Surgical scene understanding and visualization
- Tool-free tissue reconstruction for analysis

## Quick Start

### Step 1: Clone the HoloHub Repository

```bash
git clone https://github.com/nvidia-holoscan/holohub.git
cd holohub
```

### Step 2: Read and Agree to the Terms and Conditions of the EndoNeRF Sample Dataset

1. Read and agree to the [Terms and Conditions](https://docs.google.com/document/d/1P6q2hXoGpVMKeD-PpjYYdZ0Yx1rKZdJF1rXxpobbFMY/edit?usp=share_link) for the EndoNeRF dataset.
1. The EndoNeRF sample dataset is downloaded automatically when building the application.
1. Optionally, for manual download of the dataset, refer to the [Data](#pulling-soft-tissues-dataset) section below.
1. Optionally, if you do not agree to the terms and conditions, set the `HOLOHUB_DOWNLOAD_DATASETS` environment variable to `OFF` and manually download the dataset and place it in the correct location by following the instructions in the [Data](#pulling-soft-tissues-dataset) section below.

    ```bash
    export HOLOHUB_DOWNLOAD_DATASETS=OFF
    ```

### Step 3: Run Training

To run the model training:

```bash
./holohub run surgical_scene_recon train
```

### Step 4: Dynamic Rendering with a Trained Model

After training completes, visualize your results in real time:

```bash
./holohub run surgical_scene_recon render
```

![Dynamic Rendering Visualization](surg_recon_inference.gif)

## Pulling Soft Tissues Dataset

This application uses the **EndoNeRF "pulling_soft_tissues" dataset**, which contains:

- RGB endoscopic video frames
- Stereo depth maps
- Tool segmentation masks for instrument removal
- Camera poses and bounds (poses_bounds.npy)

### Download the Dataset

You can download the dataset from one of the following locations:

* 📦 Direct Google Drive: <https://drive.google.com/drive/folders/1zTcX80c1yrbntY9c6-EK2W2UVESVEug8?usp=sharing>

  1. In the Google Drive folder, you'll see:

     - `cutting_tissues_twice`
     - `pulling_soft_tissues`

  1. Download `pulling_soft_tissues`.

* Alternatively, visit the [EndoNeRF repository](https://github.com/med-air/EndoNeRF).

### Dataset Setup

The dataset will be automatically used by the application when placed in the correct location. Refer to the [HoloHub glossary](../../README.md#Glossary) for definitions of HoloHub-specific directory terms used below.

To place the dataset at `<HOLOHUB_ROOT>/data/EndoNeRF/pulling/`:

1. From the HoloHub root directory, create the dataset directory:

    ```bash
    mkdir -p data/EndoNeRF
    ```

1. Extract and move (or copy) the downloaded dataset:

    ```bash
    mv /path/to/pulling_soft_tissues data/EndoNeRF/pulling
    ```

**Important:** The dataset MUST be physically at the path above; do NOT use symlinks. Docker containers cannot follow symlinks outside mounted volumes.

### Verify the Dataset Structure

Verify that your dataset has this structure:

```text
<HOLOHUB_ROOT>/
└── data/
    └── EndoNeRF/
        └── pulling/
            ├── images/          # RGB frames
            ├── depth/           # Stereo depth maps
            ├── masks/           # Tool segmentation masks
            └── poses_bounds.npy # Camera poses (8.5 KB)
```
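
If you want a quick programmatic check, the hypothetical snippet below (not part of the application) walks the expected layout; the entry names `images`, `depth`, and `masks` are assumptions based on the dataset description above.

```python
# check_dataset.py -- hypothetical helper, not shipped with the application.
# Verifies the assumed EndoNeRF dataset layout; run from <HOLOHUB_ROOT>.
from pathlib import Path

root = Path("data/EndoNeRF/pulling")

# Docker containers cannot follow symlinks outside mounted volumes,
# so warn if the dataset path is not a real directory.
if root.is_symlink():
    print("WARNING: dataset path is a symlink; the container may not resolve it")

expected = ["images", "depth", "masks", "poses_bounds.npy"]  # assumed entry names
missing = [name for name in expected if not (root / name).exists()]

print("Dataset layout OK" if not missing else f"Missing entries: {missing}")
```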

## Models Used by the `surgical_scene_recon` Application

The `surgical_scene_recon` application uses a **3D Gaussian Splatting** model with a **temporal deformation network** for dynamic scene reconstruction. A minimal sketch of both components follows the list below.

- Gaussian Splatting Model

  The Gaussian Splatting model provides the static 3D scene representation:

  - Architecture: 3D Gaussians with learned position, scale, rotation, opacity, and color
  - Initialization: Multi-frame point cloud (~30,000-50,000 points from all frames)
  - Renderer: `gsplat` library (CUDA-accelerated differentiable rasterization)
  - Spherical Harmonics: Degree 3 (16 coefficients per Gaussian for view-dependent color)
  - Resolution: 640×512 pixels (RGB, three channels)

- Temporal Deformation Network model

  The Temporal Deformation Network deforms 3D Gaussians over time to model dynamic tissue movement during surgery.

  - Architecture: HexPlane 4D spatiotemporal grid + MLP decoder
  - Input: 3D position + normalized time value [0, 1]
  - Output: Deformed position, scale, rotation, and opacity changes
  - Training: Two-stage process (coarse: static, fine: with deformation)
  - Inference: Direct PyTorch (no conversion, full precision)
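
To make the parameterization above concrete, here is a minimal PyTorch sketch of both components. The class and attribute names are hypothetical, and a plain MLP stands in for the HexPlane grid; the application's actual modules differ.

```python
# Illustrative sketch only -- hypothetical names, not the application's classes.
import torch
import torch.nn as nn

SH_DEGREE = 3  # degree-3 spherical harmonics -> (3 + 1) ** 2 = 16 coefficients

class GaussianParams(nn.Module):
    """Learnable per-Gaussian parameters of a 3D Gaussian Splatting model."""

    def __init__(self, n: int = 50_000):  # ~50k splats after training
        super().__init__()
        self.means = nn.Parameter(torch.zeros(n, 3))    # position
        self.scales = nn.Parameter(torch.ones(n, 3))    # anisotropic scale
        self.quats = nn.Parameter(torch.randn(n, 4))    # rotation quaternion
        self.opacities = nn.Parameter(torch.zeros(n))   # pre-sigmoid opacity
        self.sh = nn.Parameter(torch.zeros(n, (SH_DEGREE + 1) ** 2, 3))  # view-dependent color

class DeformationNet(nn.Module):
    """Stand-in for the HexPlane 4D grid + MLP decoder: maps (xyz, t) to offsets."""

    def __init__(self, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3 + 3 + 4 + 1),  # d_position, d_scale, d_rotation, d_opacity
        )

    def forward(self, xyz: torch.Tensor, t: float) -> torch.Tensor:
        # t is the frame time normalized to [0, 1]
        time = torch.full((xyz.shape[0], 1), t, device=xyz.device)
        return self.mlp(torch.cat([xyz, time], dim=-1))
```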

## About the Model Training Process

The application trains in two stages:

1. The coarse stage, where the application learns the base static Gaussians without deformation.
2. The fine stage, where a temporal deformation network is added for dynamic tissue modeling.

The training uses:

- Multi-modal Data: RGB images, depth maps, tool segmentation masks
- Loss Functions: RGB loss, depth loss, TV loss, masking losses
- Optimization: Adam optimizer with batch-size scaled learning rates
- Tool Removal: Segmentation masks applied during training for tissue-only reconstruction

The **training pipeline** (`gsplat_train.py`) runs in the following order:

1. Data loading: the EndoNeRF parser loads RGB images, depth maps, masks, and poses.
2. Initialization: a multi-frame point cloud (~30k points) seeds the Gaussians.
3. Training: runs in two stages, coarse and fine.
4. Optimization: the Adam (Adaptive Moment Estimation) optimizer with batch-size scaled learning rates.
5. Regularization: depth loss, TV loss, and masking losses.

The default training command trains a model on all 63 frames with 2000 iterations, producing smooth temporal deformation and high-quality reconstruction. A simplified sketch of the two-stage loop follows.
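
The sketch below condenses this two-stage flow into simplified PyTorch. The `render` callable, loss weights, and learning rates are placeholders, not the values used by `gsplat_train.py`, and the TV regularizer is omitted for brevity.

```python
# Simplified two-stage training loop -- a sketch, not the application's code.
import torch

def masked_l1(pred, target, mask):
    # mask == 1 on tissue pixels; tool pixels are excluded from the loss
    return (mask * (pred - target).abs()).sum() / mask.sum().clamp(min=1)

def train_two_stage(render, gaussians, deform_net, frames,
                    coarse_iters=1000, fine_iters=1000):
    """`render(gaussians, camera, offsets=None) -> (rgb, depth)` is supplied by the caller."""
    opt = torch.optim.Adam(gaussians.parameters(), lr=1.6e-4)

    # Stage 1 (coarse): fit static Gaussians; the deformation network is unused.
    for step in range(coarse_iters):
        f = frames[step % len(frames)]
        rgb, depth = render(gaussians, f.camera)
        loss = (masked_l1(rgb, f.rgb, f.tool_mask)
                + 0.1 * masked_l1(depth, f.depth, f.tool_mask))
        opt.zero_grad(); loss.backward(); opt.step()

    # Stage 2 (fine): jointly optimize Gaussians and the deformation network.
    opt.add_param_group({"params": deform_net.parameters(), "lr": 1.6e-3})
    for step in range(fine_iters):
        f = frames[step % len(frames)]
        offsets = deform_net(gaussians.means, f.time)  # time normalized to [0, 1]
        rgb, depth = render(gaussians, f.camera, offsets)
        loss = (masked_l1(rgb, f.rgb, f.tool_mask)
                + 0.1 * masked_l1(depth, f.depth, f.tool_mask))
        opt.zero_grad(); loss.backward(); opt.step()
```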

The **rendering pipeline** connects the following operators:

```text
EndoNeRFLoaderOp → GsplatLoaderOp → GsplatRenderOp → HolovizOp
                                          ↓
                                    ImageSaverOp
```

**Components** (a schematic wiring sketch follows the list):

- **EndoNeRFLoaderOp:** Streams camera poses and timestamps
- **GsplatLoaderOp:** Loads checkpoint and deformation network
- **GsplatRenderOp:** Applies temporal deformation and renders
- **HolovizOp:** Real-time GPU-accelerated visualization
- **ImageSaverOp:** Optional frame saving
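
As a rough illustration of how these operators could be wired with the Holoscan SDK Python API, here is a schematic `compose()`. The custom operator classes, their import paths, and their constructor arguments are assumptions; only `Application`, `add_flow`, and `HolovizOp` are standard Holoscan names, and the branch point for `ImageSaverOp` is a guess.

```python
# Schematic wiring sketch -- assumes the application's operator classes exist.
from holoscan.core import Application
from holoscan.operators import HolovizOp

class SurgicalSceneReconApp(Application):
    def compose(self):
        loader = EndoNeRFLoaderOp(self, name="endonerf_loader")  # poses + timestamps
        ckpt = GsplatLoaderOp(self, name="gsplat_loader")        # checkpoint + deformation net
        render = GsplatRenderOp(self, name="gsplat_render")      # deform + rasterize
        viz = HolovizOp(self, name="holoviz")                    # GPU-accelerated display
        saver = ImageSaverOp(self, name="image_saver")           # optional frame saving

        self.add_flow(loader, ckpt)
        self.add_flow(ckpt, render)
        self.add_flow(render, viz)
        self.add_flow(render, saver)  # assumed branch for optional frame saving

if __name__ == "__main__":
    SurgicalSceneReconApp().run()
```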

## Requirements for the `surgical_scene_recon` Application

- **Hardware:**
- NVIDIA GPU (RTX 3000+ series recommended, tested on RTX 6000 Ada Generation)
- ~2 GB free disk space (for the dataset)
- ~30 GB free disk space (for Docker containers)
- **Software:**
- Docker with NVIDIA GPU support
- X11 display server (for visualization)
- Holoscan SDK 3.7.0 or later (automatically provided in containers)

## Application Integration Testing

We provide integration tests. To test the application for training and inference, run:

```bash
./holohub test surgical_scene_recon --verbose
```

## Performance

Tested Configuration:

- GPU: NVIDIA RTX 6000 Ada Generation
- Container: Holoscan SDK 3.7.0
- Training Time: ~5 minutes (63 frames, 2000 iterations)
- Rendering: Real-time >30 FPS

Quality Metrics (train mode; see the PSNR example after the list):

- PSNR: ~36-38 dB
- SSIM: ~0.80
- Gaussians: ~50,000 splats
- Deformation: Smooth temporal consistency
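
For reference, PSNR is defined as 10 · log10(MAX² / MSE). The generic snippet below (not the project's evaluation code) shows how the ~36-38 dB range maps to per-pixel error at the application's 640×512 resolution.

```python
# Generic PSNR illustration -- not the application's evaluation code.
import numpy as np

def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    mse = float(np.mean((pred - target) ** 2))
    return 10.0 * np.log10(max_val**2 / mse)

# A uniform error of ~1.5% of the dynamic range gives roughly 36 dB:
pred = np.full((512, 640, 3), 0.015)
target = np.zeros((512, 640, 3))
print(f"{psnr(pred, target):.1f} dB")  # ~36.5 dB
```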

## Troubleshooting

### Citation

If you use this work, cite the following:

* EndoNeRF:

  ```bibtex
  @inproceedings{wang2022endonerf,
    title={EndoNeRF: Neural Rendering for Stereo 3D Reconstruction of Deformable Tissues in Robotic Surgery},
    author={Wang, Yuehao and Yifan, Wang and Tao, Rui and others},
    booktitle={MICCAI},
    year={2022}
  }
  ```

* 3D Gaussian Splatting:

  ```bibtex
  @article{kerbl20233d,
    title={3d gaussian splatting for real-time radiance field rendering},
    author={Kerbl, Bernhard and Kopanas, Georgios and Leimk{\"u}hler, Thomas and Drettakis, George},
    journal={ACM Transactions on Graphics},
    year={2023}
  }
  ```

* `gsplat` Library:

  ```bibtex
  @software{ye2024gsplat,
    title={gsplat},
    author={Ye, Vickie and Turkulainen, Matias and others},
    year={2024},
    url={https://github.com/nerfstudio-project/gsplat}
  }
  ```

### License
