A comprehensive implementation of monocular depth estimation using PyTorch, based on the Monodepth2 architecture. This project was developed as a solution to Problem 2: Monocular Depth Estimation Algorithm Implementation for a company assignment.
Objective: Estimate depth data using only RGB images without using depth data from sensors.
Technical Specifications:
- Implemented in Python
- No depth data from sensors used (self-supervised learning)
- Open-source libraries/tools used (PyTorch, OpenCV, etc.)
- Real-time depth estimation capability
- Comprehensive evaluation metrics
Optional Features Implemented:
- Real-time depth estimation (demonstrated in demos)
- Advanced loss functions and optimization
- Multi-scale depth prediction
- Comprehensive visualization tools
Dataset: Make3D
- Size: 534 images (outdoor scenes)
- Resolution: 2272 × 1704 RGB images, with corresponding laser depth maps at roughly 55 × 305
- Type: Outdoor scenes with corresponding depth maps
Why This Dataset Fits:
- Extremely lightweight and suitable for demonstration
- Can be trained quickly on a laptop or entry-level GPU
- Enough data for demonstrating monocular depth estimation pipelines
- Perfect for assignment requirements
Download Link: Make3D Dataset
- ResNet-based Encoder: Supports ResNet-18, ResNet-34, and ResNet-50 backbones
- Multi-scale Decoder: Produces depth predictions at multiple scales
- Skip Connections: Optional skip connections for better detail preservation
- Comprehensive Loss Functions: Photometric, smoothness, and edge-aware losses
- Real-time Inference: Optimized for real-time depth estimation
- Evaluation Metrics: RMSE, MAE, AbsRel, SqRel, and δ accuracy metrics
- Visualization Tools: Depth map visualization and comparison tools
- Dataset Support: Custom dataset loader with augmentation support
The project includes comprehensive demo videos showcasing the capabilities:
Features demonstrated in the main demo:
- Complete demonstration of the monocular depth estimation system
- Real-time processing capabilities
- Model architecture overview
- Training and inference pipeline
Features demonstrated in the batch processing demo:
- Batch processing of multiple images
- Performance optimization techniques
- Scalability demonstration
Features demonstrated in the comparison demo:
- Side-by-side comparisons of different approaches
- Performance metrics visualization
- Quality assessment demonstrations
For high-quality viewing, the complete demo videos are available on Google Drive: Download Full Demo Videos
Available videos:
- main_demo.mp4 (130.2 MB) - Complete system demonstration
- batch_processing_demo.mp4 (28.2 MB) - Performance optimization demo
- comparison_demo.mp4 (49.6 MB) - Quality assessment comparisons
Note: The GIFs above provide quick previews, while the full MP4 videos offer higher quality for detailed analysis.
monocular_depth_estimation_v1/
├── models/
│ ├── encoder.py # ResNet encoder implementation
│ ├── decoder.py # Multi-scale decoder
│ └── monodepth.py # Complete model architecture
├── utils/
│ ├── dataset.py # Dataset loading and augmentation
│ ├── losses.py # Loss function implementations
│ └── metrics.py # Evaluation metrics
├── demo/ # Demo videos (GIF format)
│ ├── main_demo.gif # Main demonstration video
│ ├── batch_processing_demo.gif # Batch processing demo
│ └── comparison_demo.gif # Comparison demo
├── demo.py # Real-time demo script
├── train.py # Training script
├── train_improved.py # Improved training script
├── inference.py # Inference script
├── test_trained_model.py # Model testing script
├── prepare_dataset.py # Dataset preparation script
├── split_dataset.py # Dataset splitting utility
├── test_dataset.py # Dataset testing script
├── test_implementation.py # Implementation testing
└── requirements.txt # Dependencies
- Clone the repository:
git clone https://github.com/omrode1/monocular_depth_estimation_v1.git
cd monocular_depth_estimation_v1
- Create and activate a virtual environment:
python -m venv venv
# On Windows:
.\venv\Scripts\Activate.ps1
# On Linux/Mac:
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Download the dataset from Make3D Dataset
- Prepare the dataset (the depth conversion behind this step is sketched after these steps):
python prepare_dataset.py --data_path path/to/make3d/dataset
- Split the dataset:
python split_dataset.py --data_path data/paired --train_ratio 0.8
- Test the dataset:
python test_dataset.py --data_path data/paired
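For reference, the depth conversion at the heart of the prepare step looks roughly like the following. This is a hypothetical sketch, not the actual prepare_dataset.py; the `Position3DGrid` key, output naming, and target size are assumptions based on the Make3D release and should be verified against the downloaded files.

```python
import cv2
import numpy as np
import scipy.io

def convert_make3d_depth(mat_path, out_path, target_size=(640, 480)):
    """Convert one Make3D .mat laser scan into a 16-bit PNG depth map."""
    # Make3D stores a 55x305x4 grid; the last channel is depth in meters
    # (key name as in the Make3D release -- verify against your download).
    grid = scipy.io.loadmat(mat_path)["Position3DGrid"]
    depth = grid[:, :, 3].astype(np.float32)
    # Upsample the coarse laser grid toward the RGB resolution.
    depth = cv2.resize(depth, target_size, interpolation=cv2.INTER_LINEAR)
    # Save in millimeters as uint16 to keep sub-meter precision.
    cv2.imwrite(out_path, (depth * 1000.0).astype(np.uint16))
```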
After preparation, the dataset structure should be:
data/
├── paired/
│ ├── train/
│ │ ├── images/ # Training images
│ │ └── depth/ # Training depth maps
│ └── val/
│ ├── images/ # Validation images
│ └── depth/ # Validation depth maps
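A minimal loader for this layout might look like the sketch below. It is illustrative only; the project's real loader in utils/dataset.py also applies augmentation, and the input size and filename pairing are assumptions.

```python
import os
import cv2
import numpy as np
import torch
from torch.utils.data import Dataset

class PairedDepthDataset(Dataset):
    """Loads RGB/depth pairs from data/paired/{train,val}/{images,depth}.
    Assumes depth maps share filenames with their RGB images."""
    def __init__(self, root, split="train", size=(256, 320)):
        self.img_dir = os.path.join(root, split, "images")
        self.depth_dir = os.path.join(root, split, "depth")
        self.files = sorted(os.listdir(self.img_dir))
        self.size = size  # (H, W)

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        name = self.files[idx]
        img = cv2.cvtColor(cv2.imread(os.path.join(self.img_dir, name)),
                           cv2.COLOR_BGR2RGB)
        depth = cv2.imread(os.path.join(self.depth_dir, name),
                           cv2.IMREAD_UNCHANGED).astype(np.float32)
        h, w = self.size
        img = cv2.resize(img, (w, h)).astype(np.float32) / 255.0
        depth = cv2.resize(depth, (w, h))
        return (torch.from_numpy(img).permute(2, 0, 1),
                torch.from_numpy(depth).unsqueeze(0))
```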
Basic training:
python train.py \
--data_path data/paired \
--dataset_type general \
--num_layers 18 \
--epochs 50 \
--batch_size 8 \
--lr 0.0001 \
--save_dir checkpoints \
--log_dir logs
Improved training:
python train_improved.py \
--data_path data/paired \
--num_layers 34 \
--epochs 100 \
--batch_size 4 \
--lr 0.0001 \
--supervised \
--save_dir checkpoints_improved
Training arguments:
- --data_path: Path to the dataset directory
- --dataset_type: Dataset type ('general' or 'kitti')
- --num_layers: ResNet depth (18, 34, or 50)
- --epochs: Number of training epochs
- --batch_size: Batch size (adjust based on GPU memory)
- --lr: Learning rate
- --supervised: Use supervised training with ground truth depth
- --save_dir: Directory to save model checkpoints
- --log_dir: Directory for TensorBoard logs
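To make the moving parts of the commands above concrete, here is a hedged sketch of one supervised training epoch. The multi-scale output convention and model interface are assumptions based on the project layout, not the repository's exact signatures.

```python
import torch

def train_one_epoch(model, loader, optimizer, device="cuda"):
    """One pass of supervised training with multi-scale L1 depth loss."""
    model.train()
    for images, gt_depth in loader:
        images, gt_depth = images.to(device), gt_depth.to(device)
        preds = model(images)  # assumed: list of depth maps, finest first
        loss = 0.0
        for scale, pred in enumerate(preds):
            # Bring each scale up to ground-truth resolution before the loss.
            pred = torch.nn.functional.interpolate(
                pred, size=gt_depth.shape[-2:], mode="bilinear",
                align_corners=False)
            # Down-weight coarser scales so fine detail dominates.
            loss = loss + torch.abs(pred - gt_depth).mean() / (2 ** scale)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```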
Monitor training with TensorBoard:
tensorboard --logdir logs
Single-image inference:
python inference.py \
--model_path checkpoints/best_model.pth \
--image_path path/to/image.jpg \
--output_path output_depth.png
Real-time webcam demo:
python demo.py \
--model_path checkpoints/best_model.pth \
--camera_id 0
Evaluate a trained model:
python test_trained_model.py \
--model_path checkpoints/best_model.pth \
--data_path data/paired \
--split val
Encoder:
- ResNet backbone (18/34/50 layers)
- Pre-trained on ImageNet
- Extracts multi-scale features
Decoder:
- Multi-scale depth prediction
- Skip connections (optional)
- Upsampling with transposed convolutions
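A condensed sketch of this encoder/decoder pattern follows; it is a simplification of models/encoder.py and models/decoder.py, not the repository's exact code, and the channel widths are assumptions.

```python
import torch
import torch.nn as nn
import torchvision

class ResNetEncoder(nn.Module):
    """ImageNet-pretrained ResNet-18 returning features at 5 scales."""
    def __init__(self):
        super().__init__()
        r = torchvision.models.resnet18(pretrained=True)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu)
        self.pool = r.maxpool
        self.stages = nn.ModuleList([r.layer1, r.layer2, r.layer3, r.layer4])

    def forward(self, x):
        feats = [self.stem(x)]        # 1/2 resolution, 64 channels
        x = self.pool(feats[-1])
        for stage in self.stages:     # 1/4 .. 1/32, 64/128/256/512 channels
            x = stage(x)
            feats.append(x)
        return feats

class DepthDecoder(nn.Module):
    """Upsamples coarsest features back to input resolution, fusing skips."""
    def __init__(self, enc_ch=(64, 64, 128, 256, 512),
                 dec_ch=(16, 32, 64, 128, 256)):
        super().__init__()
        self.up, self.fuse = nn.ModuleList(), nn.ModuleList()
        in_ch = enc_ch[-1]
        for i in reversed(range(5)):
            # Transposed convolution doubles the spatial resolution.
            self.up.append(nn.ConvTranspose2d(in_ch, dec_ch[i], 4, 2, 1))
            skip = enc_ch[i - 1] if i > 0 else 0
            self.fuse.append(nn.Sequential(
                nn.Conv2d(dec_ch[i] + skip, dec_ch[i], 3, padding=1),
                nn.ELU()))
            in_ch = dec_ch[i]
        self.out = nn.Conv2d(dec_ch[0], 1, 3, padding=1)

    def forward(self, feats):
        x = feats[-1]
        for i, up, fuse in zip(reversed(range(5)), self.up, self.fuse):
            x = up(x)
            if i > 0:
                x = torch.cat([x, feats[i - 1]], dim=1)  # skip connection
            x = fuse(x)
        return torch.sigmoid(self.out(x))  # normalized inverse depth
```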
Loss Functions:
- Photometric Loss: RGB reconstruction error
- Smoothness Loss: Depth smoothness regularization
- Edge-aware Loss: Preserves depth discontinuities
- Supervised Loss: Direct depth supervision (optional)
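The edge-aware term is the least obvious of these. A minimal sketch, following the standard formulation from the Monodepth line of work this project is based on:

```python
import torch

def edge_aware_smoothness(disp, img):
    """Penalize disparity gradients, attenuated where the image has edges."""
    # Mean-normalize disparity so the penalty is scale-invariant.
    disp = disp / (disp.mean(dim=(2, 3), keepdim=True) + 1e-7)
    dx_d = torch.abs(disp[:, :, :, 1:] - disp[:, :, :, :-1])
    dy_d = torch.abs(disp[:, :, 1:, :] - disp[:, :, :-1, :])
    dx_i = torch.abs(img[:, :, :, 1:] - img[:, :, :, :-1]).mean(1, keepdim=True)
    dy_i = torch.abs(img[:, :, 1:, :] - img[:, :, :-1, :]).mean(1, keepdim=True)
    # Strong image gradients (likely depth edges) suppress the penalty there.
    return (dx_d * torch.exp(-dx_i)).mean() + (dy_d * torch.exp(-dy_i)).mean()
```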
Evaluation Metrics:
- RMSE: Root Mean Square Error
- MAE: Mean Absolute Error
- AbsRel: Absolute Relative Error
- SqRel: Squared Relative Error
- δ < 1.25: Percentage of pixels where max(pred/gt, gt/pred) < 1.25
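These are the standard depth-benchmark definitions; a compact reference implementation over valid pixels (the validity mask and minimum depth are illustrative):

```python
import torch

def depth_metrics(pred, gt, min_depth=1e-3):
    """Compute the metrics above over pixels with valid ground truth."""
    mask = gt > min_depth
    pred, gt = pred[mask], gt[mask]
    rmse = torch.sqrt(((pred - gt) ** 2).mean())
    mae = torch.abs(pred - gt).mean()
    abs_rel = (torch.abs(pred - gt) / gt).mean()
    sq_rel = (((pred - gt) ** 2) / gt).mean()
    delta1 = (torch.max(pred / gt, gt / pred) < 1.25).float().mean()
    return {"rmse": rmse, "mae": mae, "abs_rel": abs_rel,
            "sq_rel": sq_rel, "delta<1.25": delta1}
```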
Run the full evaluation and save result visualizations:
python test_trained_model.py \
--model_path checkpoints/best_model.pth \
--data_path data/paired \
--split val \
--save_results
- Webcam integration for live depth estimation
- Optimized inference pipeline
- Real-time visualization
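The core of a webcam loop like demo.py's can be sketched as follows; the input size, output convention, and display details are assumptions, not the script's actual code.

```python
import cv2
import torch

def run_webcam(model, device="cuda", camera_id=0):
    """Live depth preview; press 'q' to quit."""
    model.eval().to(device)
    cap = cv2.VideoCapture(camera_id)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # BGR -> RGB, resize to the network input, scale to [0, 1].
        x = cv2.cvtColor(cv2.resize(frame, (320, 256)), cv2.COLOR_BGR2RGB)
        x = torch.from_numpy(x).permute(2, 0, 1).float().div(255)[None].to(device)
        with torch.no_grad():
            depth = model(x)[0]  # assume multi-scale outputs, finest first
        d = depth.squeeze().cpu().numpy()
        d = (255 * (d - d.min()) / (d.max() - d.min() + 1e-7)).astype("uint8")
        cv2.imshow("depth", cv2.applyColorMap(d, cv2.COLORMAP_MAGMA))
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()
```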
- Efficient processing of multiple images
- Progress tracking and logging
- Quality assessment
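Batch inference amortizes per-image overhead; a minimal sketch using a plain DataLoader (the batch size and dataset interface are illustrative):

```python
import torch
from torch.utils.data import DataLoader

def batch_inference(model, dataset, batch_size=8, device="cuda"):
    """Run the model over a whole dataset in batches; returns stacked depths."""
    model.eval().to(device)
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=2)
    outputs = []
    with torch.no_grad():
        for images, _ in loader:
            preds = model(images.to(device))[0]  # finest-scale predictions
            outputs.append(preds.cpu())
    return torch.cat(outputs)
```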
- Side-by-side comparisons
- Color-coded depth maps
- Performance metrics display
The model achieves competitive performance on the Make3D dataset.
Training Hardware:
- Laptop: Windows 11 Home Single Language
- Processor: AMD Ryzen 5 4600H with Radeon Graphics (6 cores, 3.0 GHz)
- Memory: 7.5 GB RAM
- GPU: NVIDIA GeForce GTX 1650 Ti (4GB VRAM)
- Integrated Graphics: AMD Radeon(TM) Graphics (512MB)
Performance:
- Training Time: under 40 minutes on a GTX 1650 Ti
- Inference Speed: ~30-50 FPS on GPU
- Memory Usage: ~2-4 GB GPU memory
- Model Size: ~50-100 MB depending on backbone
- Batch Size: 4-8 (optimized for 4GB VRAM)
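The FPS figure can be reproduced approximately with a generic timing loop like this (a benchmark sketch, not the project's script; the input size is illustrative):

```python
import time
import torch

def benchmark(model, input_size=(1, 3, 256, 320), iters=100, device="cuda"):
    model.eval().to(device)
    x = torch.randn(input_size, device=device)
    torch.cuda.reset_peak_memory_stats()
    with torch.no_grad():
        for _ in range(10):            # warm-up: CUDA kernels, allocator
            model(x)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(iters):
            model(x)
        torch.cuda.synchronize()       # wait for queued kernels before timing
    fps = iters / (time.time() - start)
    mem_gb = torch.cuda.max_memory_allocated() / 1e9
    print(f"{fps:.1f} FPS, {mem_gb:.2f} GB peak GPU memory")
```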
Dependencies (requirements.txt):
torch>=1.9.0
torchvision>=0.10.0
numpy>=1.21.0
opencv-python>=4.5.0
matplotlib>=3.4.0
Pillow>=8.3.0
tqdm>=4.62.0
tensorboard>=2.7.0
albumentations>=1.1.0
scipy>=1.7.0
Key Achievements:
- Self-supervised Learning: Successfully implemented depth estimation without depth sensors
- Real-time Performance: Achieved real-time inference capabilities
- Lightweight Solution: Optimized for resource-constrained environments
- Comprehensive Evaluation: Multiple metrics and visualization tools
- Production Ready: Clean code structure with proper documentation
Acknowledgements:
- Make3D Dataset: Cornell University for providing the dataset
- PyTorch: Deep learning framework
- OpenCV: Computer vision library
Below are the key test and comparison images generated during model evaluation:
Test Results:
(Image grid: three test images with their predicted depth maps; see the repository for the full-resolution results.)
Aspect Ratio Comparison:
Approach Comparison:
Each image above demonstrates the model's performance on different test cases, aspect ratio handling, and comparison between different approaches or settings.