This project implements a Vision Transformer (ViT) model from scratch and includes a script for visualizing attention maps on video frames. The implementation follows the original ViT paper (Dosovitskiy et al., 2020) and draws on the educational resources credited below.
- `vision_transformers.py`: Contains the implementation of the ViT model.
- `run.py`: Script for applying attention visualization to video frames using a trained ViT model.
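For orientation, here is a minimal sketch of the pieces a from-scratch ViT typically comprises: a patch embedding, a CLS token with learned positional embeddings, a transformer encoder, and a classification head. The class names and hyperparameters below are illustrative assumptions, not the actual contents of `vision_transformers.py`.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into fixed-size patches and project each to an embedding."""
    def __init__(self, img_size=224, patch_size=16, in_channels=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution applies one linear projection per non-overlapping patch.
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                      # x: (B, C, H, W)
        x = self.proj(x)                       # (B, D, H/P, W/P)
        return x.flatten(2).transpose(1, 2)    # (B, N, D)

class ViT(nn.Module):
    """Minimal ViT: patch embedding + CLS token + transformer encoder + head."""
    def __init__(self, img_size=224, patch_size=16, num_classes=10,
                 embed_dim=768, depth=12, num_heads=12):
        super().__init__()
        self.patch_embed = PatchEmbedding(img_size, patch_size, 3, embed_dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(
            torch.zeros(1, self.patch_embed.num_patches + 1, embed_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, dim_feedforward=4 * embed_dim,
            batch_first=True, norm_first=True)  # pre-norm, as in the ViT paper
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        x = self.patch_embed(x)                          # (B, N, D)
        cls = self.cls_token.expand(x.shape[0], -1, -1)  # (B, 1, D)
        x = torch.cat([cls, x], dim=1) + self.pos_embed  # prepend CLS, add positions
        x = self.encoder(x)
        return self.head(x[:, 0])                        # classify on the CLS token

model = ViT(num_classes=10)
print(model(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 10])
```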
- Vision Transformer (ViT) implementation from scratch
- Training script for the ViT model on image classification tasks
- Video processing script to visualize attention maps on video frames
- Train the ViT model: `python vision_transformers.py`
- Visualize attention on a video: `python run.py`
  Make sure to update `experiment_name` and `input_video_path` in the script before running. (A sketch of the overlay step appears after this list.)
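For reference, here is a hedged sketch of the overlay step such a script performs: averaging CLS-to-patch attention across heads, reshaping it to the patch grid, and blending the resulting heatmap onto a frame with OpenCV. The function names and synthetic inputs are assumptions for illustration; `run.py` may wire things differently (e.g., capturing real attention weights via forward hooks and reading frames with `cv2.VideoCapture`).

```python
import cv2
import numpy as np
import torch

def cls_attention_heatmap(attn, grid_size, frame_shape):
    """Turn CLS-to-patch attention into a uint8 heatmap sized like the frame.

    attn: (num_heads, N+1, N+1) attention weights from one encoder layer,
    where index 0 is the CLS token and the remaining N tokens are patches.
    """
    cls_attn = attn[:, 0, 1:].mean(dim=0)            # average over heads: (N,)
    grid = cls_attn.reshape(grid_size, grid_size)    # back to the patch grid
    grid = (grid - grid.min()) / (grid.max() - grid.min() + 1e-8)
    heat = cv2.resize(grid.numpy(), (frame_shape[1], frame_shape[0]))
    return (heat * 255).astype(np.uint8)

def overlay(frame, heat, alpha=0.5):
    """Blend a JET-colored heatmap over a BGR frame."""
    colored = cv2.applyColorMap(heat, cv2.COLORMAP_JET)
    return cv2.addWeighted(frame, 1 - alpha, colored, alpha, 0)

# Demo with a blank frame and random attention, standing in for a real
# video frame and weights captured from the trained model.
frame = np.zeros((224, 224, 3), dtype=np.uint8)
attn = torch.softmax(torch.randn(12, 197, 197), dim=-1)  # 14x14 patches + CLS
out = overlay(frame, cls_attention_heatmap(attn, 14, frame.shape))
cv2.imwrite("attention_frame.png", out)
```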
- PyTorch
- torchvision
- numpy
- matplotlib
- OpenCV (cv2)
This project is inspired by and adapted from the following resources:
- Implementing Vision Transformer (ViT) from Scratch by Tin Nguyen: this article provided the foundation for our ViT implementation and helped structure the code.
- Let's build GPT: from scratch, in code, spelled out by Andrej Karpathy: while this video focuses on GPT, it offers valuable insights into transformer architecture and implementation details that were helpful in understanding and adapting the ViT model.
Additional Resources
For a deeper understanding of Vision Transformers and attention mechanisms, we recommend the following:
- Attention Is All You Need (Vaswani et al., 2017): the paper that introduced the transformer architecture.
If you have any questions or feedback, please open an issue in this repository.