nano-JEPA: a Video Joint Embedding Predictive Architecture that runs on a regular computer. Based on V-JEPA. Read the paper about our work. Consider using nano-datasets to create your dataset ;)
(base)$ conda create -n nano-jepa python=3.9
(base)$ conda activate nano-jepa
# Install PyTorch on hardware that contains GPUs
# (nano-jepa)$ conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
# Install PyTorch only using CPUs
(nano-jepa)$ conda install pytorch torchvision torchaudio cpuonly -c pytorch
(nano-jepa)$ python setup.py install
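After the install steps, a quick check from inside the `nano-jepa` environment can confirm that the packages are visible to Python. The snippet below is only a sketch: `check_environment` is a hypothetical helper written for this README, not part of the repo, and it relies only on the standard library plus `torch.cuda.is_available()` when PyTorch is present.

```python
import importlib.util
import sys


def check_environment(packages=("torch", "torchvision", "torchaudio")):
    """Report the Python version and whether each package can be found.

    Hypothetical helper for illustration; not part of nano-JEPA.
    """
    report = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


report = check_environment()
for key, value in report.items():
    print(f"{key}: {value}")

# Only query CUDA when PyTorch is actually installed.
if report["torch"]:
    import torch

    print("CUDA available:", torch.cuda.is_available())
```

With the CPU-only install, `CUDA available` is expected to print `False`.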
A set of config files is provided in this repo. Change the paths in the *.yaml files using the guidelines in the next section.
# unsupervised training
(nano-jepa)$ python -m app.train_nano_jepa --fname configs/pretrain/vitt.yaml
# video evaluation
(nano-jepa)$ python -m evals.eval_video_nano_jepa --fname configs/evals/vitt16_k400_16x8x3.yaml
# image evaluation
(nano-jepa)$ python -m evals.eval_image_nano_jepa --fname configs/evals/vitt16_in1k.yaml
# video inference
(nano-jepa)$ python -m evals.infer_video_classification --fname configs/infer/infer_vitt_k400x8x3.yaml
# visualize features (work in progress)
(nano-jepa)$ python -m evals.eval_features
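The commands above all follow the same `python -m <module> --fname <config>` pattern, so they can be assembled from a small table if you prefer to launch stages from a script. This is a sketch under assumptions: the stage names (`pretrain`, `eval_video`, ...) and the `build_command` helper are hypothetical and only mirror the module and config paths listed above.

```python
import sys

# Module and config pairs copied from the commands listed above.
# The dictionary keys are hypothetical stage names chosen for this sketch.
STAGES = {
    "pretrain": ("app.train_nano_jepa", "configs/pretrain/vitt.yaml"),
    "eval_video": ("evals.eval_video_nano_jepa", "configs/evals/vitt16_k400_16x8x3.yaml"),
    "eval_image": ("evals.eval_image_nano_jepa", "configs/evals/vitt16_in1k.yaml"),
    "infer_video": ("evals.infer_video_classification", "configs/infer/infer_vitt_k400x8x3.yaml"),
}


def build_command(stage, config=None):
    """Return the argument list for one stage, optionally with a custom config."""
    module, default_config = STAGES[stage]
    return [sys.executable, "-m", module, "--fname", config or default_config]


# Print each command so it can be reviewed before running it.
for stage in STAGES:
    print(" ".join(build_command(stage)))
```

To actually run a stage, the list can be passed to `subprocess.run(build_command("pretrain"), check=True)` from the repo root with the `nano-jepa` environment active.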
Consider the nano-datasets tool to create your local dataset.
- Dataset file index, Windows user path: C:\Users\your-user\Documents\ML-datasets\video_datasets\k400\k400_file_index.csv
- Dataset file index, Linux user path: /home/your-user/Documents/ML-datasets/video_datasets/k400/k400_file_index.csv
- Logging folder, Windows user path: C:\Users\your-user\Documents\ML-logging
- Logging folder, Linux user path: /home/your-user/Documents/ML-logging
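The example paths above share the same layout under the user's `Documents` folder on both systems, so they can be derived with `pathlib` instead of being typed by hand. This is a minimal sketch: `default_paths` and the keys `dataset_index` and `logging_folder` are hypothetical names for illustration and are not the keys used in the repo's YAML files.

```python
from pathlib import Path


def default_paths(home=None):
    """Build the example dataset index and logging paths under a home folder.

    Hypothetical helper; the dictionary keys are illustrative labels only.
    """
    home = Path(home) if home is not None else Path.home()
    docs = home / "Documents"
    return {
        "dataset_index": docs / "ML-datasets" / "video_datasets" / "k400" / "k400_file_index.csv",
        "logging_folder": docs / "ML-logging",
    }


# Show the resolved paths for the current user and whether they exist yet.
for name, path in default_paths().items():
    print(f"{name}: {path} (exists: {path.exists()})")
```

The printed values can then be copied into the *.yaml config file you are editing.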
Read the PDF file here
Paper details:
- Title: "nano-JEPA: Democratizing Video Understanding with Personal Computers"
- Authors: Adrián Rostagno, Javier Iparraguirre, Joel Ermantraut, Lucas Tobio, Segundo Foissac, Santiago Aggio, Guillermo Friedrich
- Event: XXV WASI – Workshop Agentes y Sistemas Inteligentes, CACIC.
- Year: 2024.
Bibtex:
@inproceedings{ermantraut2020resolucion,
title={nano-JEPA: Democratizing Video Understanding with Personal Computers},
author={Adrián Rostagno and Javier Iparraguirre and Joel Ermantraut and Lucas Tobio and Segundo Foissac and Santiago Aggio and Guillermo Friedrich},
booktitle={XXV WASI – Workshop Agentes y Sistemas Inteligentes, CACIC},
year={2024}
}
Here is a list of checkpoints available for experimentation:
Model | Classes | Train Videos | Val Videos | Val/Train Ratio | Accuracy (%) | Epochs | Pre-train |
---|---|---|---|---|---|---|---|
A (download) | 4 | 100 | 50 | 0.5 | 45.50 | 20 | nano-JEPA ViT-T 800 videos (download) |
B (download) | 4 | 25 | 12 | 0.48 | 35.41 | 10 | |
C (download) | 8 | 100 | 50 | 0.5 | 41.50 | 20 | |
D (download) | 4 | 100 | 50 | 0.5 | 99.50 | 6 | V-JEPA ViT-L (see V-JEPA site) |
E (download) | 4 | 25 | 12 | 0.48 | 91.66 | 6 | |
F (download) | 8 | 100 | 50 | 0.5 | 94.25 | 6 | |