10 changes: 10 additions & 0 deletions .gitignore
@@ -1,3 +1,13 @@
*.pyc
__pycache__

**/lightning_logs/
**/checkpoints/
.DS_Store
.env
.vscode/
.idea/
.ipynb_checkpoints/
.venv/
.ruff_cache/
.claude/
144 changes: 119 additions & 25 deletions README.md
@@ -1,77 +1,171 @@
# DINOv2-3D: Self-Supervised 3D Vision Transformer Pretraining

A configuration-first (and therefore easily understandable and trackable) repository for a 3D implementation of DINOv2. Based on the implementations from Lightly (thank you!) and integrated with PyTorch Lightning. The 3D capabilities of this implementation come largely through MONAI's functionality.
A configuration-driven repository for 3D DINOv2 self-supervised learning. Built with [Lighter](https://github.com/project-lighter/lighter), PyTorch Lightning, and MONAI.

## What you can do with this Repo
- Train your own 3D DINOv2 on CT, MRI, PET data, etc. with very little configuration other than what's been provided.
- Use the state-of-the-art PRIMUS transformer from medical segmentation to pretrain your DINOv2
- Make a baseline for DINOv2 to improve and build on.
- Change elements of the framework through modular extensions.
## What You Can Do with This Repo
- Train your own 3D DINOv2 on CT, MRI, PET data, etc. with minimal configuration
- Use state-of-the-art PRIMUS transformer for medical imaging pretraining
- Make a baseline for DINOv2 to improve and build on
- Change elements of the framework through modular extensions (see the sketch below)
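
For instance, swapping in a different backbone could look something like the sketch below; the `model::backbone` key path and the ViT settings are assumptions for illustration, not taken from this repo's configs (`configs/models/vit.yaml` plays the real equivalent role):

```yaml
# my_backbone.yaml -- hypothetical override file, merged after the base configs
model:
  backbone:
    _target_: monai.networks.nets.ViT  # instantiate MONAI's ViT from config
    in_channels: 1
    img_size: [96, 96, 96]
    patch_size: [16, 16, 16]
```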

## Features
- DINOv2-style self-supervised learning with teacher-student models
- Block masking for 3D volumes
- Flexible 3D augmentations (global/local views) courtesy of MONAI
- PyTorch Lightning training loop
- YAML-based experiment configuration that is explainable at a glance due to its abstraction!

- PyTorch Lightning training loop
- YAML-based experiment configuration powered by Lighter

## Installation

1. Clone the repository:
```bash
git clone https://github.com/AIM-Harvard/DINOv2-3D-Med.git
cd DINOv2_3D
cd DINOv2-3D-Med
```
2. Create a virtual environment with UV(recommended):

2. Create a virtual environment with UV (recommended):
```bash
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
```

3. Install dependencies:
```bash
uv sync
```

If you do not want to use uv, you could just as easily do a `pip install -e .` in the repo directory
If you do not want to use uv, you can use `pip install -e .` instead.

## Usage

### Training
Run the training script with the default training config:

Run training with the default configuration:
```bash
python -m scripts.run fit --config_file=./configs/train.yaml,./configs/models/primus.yaml,./configs/datasets/amos.yaml
lighter fit configs/train.yaml configs/models/primus.yaml configs/datasets/amos.yaml
```

Here, `train.yaml` contains the heart of the configuration, `primus.yaml` provides the backbone to use for DINOv2, and `amos.yaml` provides the path to the dataset to be used.
Override parameters directly from the CLI:
```bash
lighter fit configs/train.yaml configs/models/primus.yaml configs/datasets/amos.yaml \
    trainer::max_epochs=50 \
    model::base_lr=0.0005 \
    data::batch_size=4
```

### Prediction

```bash
lighter predict configs/predict.yaml
```

### Configuration
- All experiment settings (model, trainer, data) are defined in YAML configs.
- `configs/train.yaml`: Main training configuration with complete setup
- `configs/predict.yaml`: Configuration for inference/prediction tasks

Lighter uses YAML configs with powerful features:

- **Variable references**: `%vars::hidden_size` - reference shared variables
- **Cross-section references**: `%trainer::max_epochs` - reference other config sections
- **Python expressions**: `$int(%trainer::max_epochs * 0.03)` - compute values dynamically
- **Object instantiation**: `_target_: module.ClassName` - create objects from config
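
A minimal sketch of how these features combine in a single file (the keys `hidden_size` and `warmup_epochs` are made-up names for illustration):

```yaml
vars:
  hidden_size: 768  # shared variable

trainer:
  max_epochs: 100

model:
  _target_: project.models.ExampleModel              # hypothetical class instantiated from config
  hidden_size: "%vars::hidden_size"                   # variable reference
  warmup_epochs: "$int(%trainer::max_epochs * 0.03)"  # value computed from another section
```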

#### Config Structure

```
configs/
├── train.yaml # Main training configuration
├── predict.yaml # Inference configuration
├── dinotxt_stage.yaml # Image-text alignment training
├── models/
│ ├── primus.yaml # PRIMUS backbone
│ └── vit.yaml # MONAI ViT backbone
└── datasets/
├── amos.yaml # AMOS dataset
└── idc_dump.yaml # IDC dataset
```

Configs are composable - pass multiple files and they merge in order:
```bash
lighter fit base.yaml model.yaml dataset.yaml # Later files override earlier ones
```
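
As a tiny illustration of that merge order (file contents hypothetical):

```yaml
# base.yaml
trainer:
  max_epochs: 100

# model.yaml -- merged second, so its value wins
trainer:
  max_epochs: 50
```

Running `lighter fit base.yaml model.yaml` would therefore train for 50 epochs.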

## Path Configuration

Each config file defines its paths in the `vars:` section at the top for easy customization:

| Config | Variable | Description |
|--------|----------|-------------|
| `train.yaml` | `experiments_dir` | Output directory for checkpoints and logs |
| `dinotxt_stage.yaml` | `experiments_dir` | Output directory for checkpoints and logs |
| `predict.yaml` | `amos_dataset` | Path to AMOS dataset |
| `datasets/amos.yaml` | `amos_dataset` | Path to AMOS dataset |
| `datasets/idc_dump.yaml` | `idc_dataset` | Path to IDC dataset |

Override paths from the CLI:
```bash
lighter fit configs/train.yaml configs/models/primus.yaml configs/datasets/amos.yaml \
    vars::experiments_dir=/your/output/path

lighter fit configs/train.yaml configs/models/primus.yaml configs/datasets/idc_dump.yaml \
    vars::idc_dataset=/your/idc/data/path
```

## Data Preparation

For now, to run a straightforward DINOv2 pipeline, all you need to do is set up your data paths in a JSON file in the MONAI format.
Create a JSON file in MONAI format:

It looks something like this:
```json
{
  "training": [
    {"image": "/path/to/image1.nii.gz"},
    {"image": "/path/to/image2.nii.gz"}
  ]
}
```

If you need more complex data loading (e.g., with labels for sampling), extend the JSON:

```json
{
  "training": [
    {"image": "/path/to/image.nii.gz", "label": "/path/to/label.nii.gz"}
  ]
}
```
If you'd like to do more complex manipulations, like sampling based on a mask, you can easily extend this JSON to include a "label" in addition to the image and use MONAI transforms to sample as you like.
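
A hedged sketch of what such label-aware sampling could look like; the `transform` key path and all parameter values below are assumptions for illustration, not this repo's actual defaults:

```yaml
data:
  train_dataset:
    dataset:
      transform:
        _target_: monai.transforms.Compose
        transforms:
          - _target_: monai.transforms.LoadImaged
            keys: ["image", "label"]
          - _target_: monai.transforms.EnsureChannelFirstd
            keys: ["image", "label"]
          # Draw random crops centered on foreground voxels of the label mask
          - _target_: monai.transforms.RandCropByPosNegLabeld
            keys: ["image", "label"]
            label_key: "label"
            spatial_size: [96, 96, 96]
            pos: 1.0   # all samples centered on foreground
            neg: 0.0
            num_samples: 2
```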

Then update your dataset config or override from CLI:
```bash
lighter fit configs/train.yaml \
    "data::train_dataset::dataset::data=\$monai.auto3dseg.datafold_read('/path/to/dataset.json', basedir='/path/to/data', key='training')[0]"
```

## Project Structure

```
DINOv2-3D-Med/
├── __lighter__.py # Lighter marker (enables project.* imports)
├── configs/ # YAML configurations
├── models/ # Model architectures
│ ├── meta_arch.py # DINOv2 teacher-student architecture
│ └── backbones/ # PRIMUS, ViT, EVA backbones
├── training/ # Lightning modules
│ ├── dinov2_lightning_module.py
│ ├── dinotxt_lightning_module.py
│ └── data_module.py
├── transforms/ # Data augmentations
│ ├── dinov2_aug.py # DINOv2 3D augmentations
│ └── blockmask.py # Block masking for iBOT
├── losses/ # Loss functions
│ └── dino.py # DINOv2 + iBOT + KoLeo losses
└── utils/ # Utilities
```
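
The `__lighter__.py` marker file is what lets configs reference code in this repo via `project.*` paths; a hedged sketch (the class name below is hypothetical):

```yaml
model:
  _target_: project.models.meta_arch.DINOv2MetaArch  # hypothetical name; meta_arch.py holds the teacher-student architecture
```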

## References
- [Lighter](https://github.com/project-lighter/lighter)
- [Lightly](https://github.com/lightly-ai/lightly)
- [DINOv2 (Facebook Research)](https://github.com/facebookresearch/dinov2)
- [MONAI (Medical Open Network for AI)](https://github.com/Project-MONAI/MONAI)
- [PyTorch Lightning](https://www.pytorchlightning.ai/)


## License
Copyright &copy; 2025 Suraj Pai, Vasco Prudente

2 changes: 2 additions & 0 deletions __lighter__.py
@@ -0,0 +1,2 @@
# Lighter marker file - enables `project.*` imports
# See: https://github.com/project-lighter/lighter
14 changes: 13 additions & 1 deletion configs/datasets/amos.yaml
@@ -1 +1,13 @@
data_module#train_dataset#data: "$monai.auto3dseg.datafold_read('/mnt/data1/datasets/AMOS/amos22/dataset.json', basedir='/mnt/data1/datasets/AMOS/amos22', key='training')[0]"
# AMOS Dataset Configuration
# Multi-organ segmentation dataset

_imports_:
  monai: monai

vars:
  amos_dataset: "/mnt/data1/datasets/AMOS/amos22"

data:
  train_dataset:
    dataset:
      data: "$monai.auto3dseg.datafold_read(@vars::amos_dataset + '/dataset.json', basedir=@vars::amos_dataset, key='training')[0]"
14 changes: 13 additions & 1 deletion configs/datasets/idc_dump.yaml
@@ -1 +1,13 @@
data_module#train_dataset#dataset#data: "$monai.auto3dseg.datafold_read('/mnt/ssd1/ibro/IDC_SSL_CT/idc_dump_datalist.json', basedir='', key='training')[0]"
# IDC Dataset Configuration
# Imaging Data Commons CT dataset

_imports_:
  monai: monai

vars:
  idc_dataset: "/mnt/ssd1/ibro/IDC_SSL_CT"

data:
  train_dataset:
    dataset:
      data: "$monai.auto3dseg.datafold_read(@vars::idc_dataset + '/idc_dump_datalist.json', basedir='', key='training')[0]"