Hi there.
This repository contains the code and resources for our team project in the AI course at Ajou University, Fall 2024. The project focuses on optimizing Vision Transformers (ViTs): we developed hybrid (convolutional + transformer) models to improve inference speed and accuracy on image classification tasks.
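Much of the speed advantage of hybrid designs such as LeViT comes from the convolutional stem downsampling the input before any attention runs, which shrinks the token count that the quadratic-cost attention layers must process. A minimal back-of-envelope sketch of that effect (all dimensions and the cost model are illustrative assumptions, not measurements from our models):

```python
def attention_flops(num_tokens: int, dim: int) -> int:
    """Rough FLOP count for one self-attention layer:
    ~O(n^2 * d) for the QK^T and AV products, plus
    ~O(n * d^2) for the Q/K/V and output projections."""
    return 2 * num_tokens**2 * dim + 4 * num_tokens * dim**2

# Plain ViT on a 224x224 image with 16x16 patches -> 14*14 = 196 tokens.
vit_tokens = (224 // 16) ** 2

# A hybrid model whose conv stem downsamples to an effective stride of 32
# would run attention on only 7*7 = 49 tokens (illustrative assumption).
hybrid_tokens = (224 // 32) ** 2

dim = 256  # illustrative embedding width

speedup = attention_flops(vit_tokens, dim) / attention_flops(hybrid_tokens, dim)
print(f"{vit_tokens} vs {hybrid_tokens} tokens -> ~{speedup:.1f}x fewer attention FLOPs")
```

The linear-attention repositories listed below attack the same bottleneck from the other direction, replacing the quadratic term with one linear in the token count.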
- Oxford-IIIT Pet Dataset
- (Parkhi et al., "Cats and dogs", 2012 IEEE Conference on Computer Vision and Pattern Recognition)
- CIFAR-10 and CIFAR-100 Datasets
- (A. Krizhevsky, "Learning Multiple Layers of Features from Tiny Images", 2009)
- Caltech 256 Dataset
- (Griffin et al. (2022). Caltech 256 (1.0) [Data set]. CaltechDATA. https://doi.org/10.22002/D1.20087)
- LeViT (https://github.com/facebookresearch/LeViT)
- FLatten Transformer (https://github.com/LeapLabTHU/FLatten-Transformer)
- Swin Transformer (https://github.com/microsoft/Swin-Transformer)
- Flash Linear Attention (https://github.com/sustcsonglin/flash-linear-attention)
- Gated Linear Attention (https://github.com/berlino/gated_linear_attention)