dog_image_classification

This study investigates the efficacy of modern deep learning architectures for image classification, focusing on dog breed recognition. Using the Stanford Dogs Dataset, we evaluate Vision Transformer (ViT), VGG-16, and ResNet-50 models, aiming to surpass the benchmarks set by Hsu (2015) with conventional convolutional neural networks (CNNs). ViT adapts the Transformer architecture, originally developed for natural language processing, to image classification by splitting each image into patches and processing them as a sequence of tokens. Our results show significant accuracy improvements over the Hsu (2015) baseline: VGG-16 reached 65% test accuracy, ResNet-50 reached 84%, and, notably, ViT outperformed both at 91%. These findings suggest that transformer architectures can handle smaller-scale datasets with fine-grained categories. The study adds to the growing body of research on the viability of transformer models in image classification and motivates further work to improve their performance as the architecture continues to evolve.
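For context, the sketch below shows one way such a fine-tuning setup could look in PyTorch/torchvision: each ImageNet-pretrained backbone has its classification head replaced with a 120-way layer for the Stanford Dogs breeds and is then trained with standard cross-entropy. The data directory layout, hyperparameters, and choice of torchvision model variants are illustrative assumptions, not the repository's actual training code.

```python
# Minimal fine-tuning sketch (assumed setup, not the repo's exact pipeline).
# Assumes the Stanford Dogs images are arranged as data/train/<breed>/<image>.jpg.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

NUM_CLASSES = 120  # number of breeds in the Stanford Dogs Dataset


def build_model(name: str) -> nn.Module:
    """Load an ImageNet-pretrained backbone and replace its classification head."""
    if name == "resnet50":
        model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
        model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
    elif name == "vgg16":
        model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        model.classifier[6] = nn.Linear(model.classifier[6].in_features, NUM_CLASSES)
    elif name == "vit_b_16":
        model = models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1)
        model.heads.head = nn.Linear(model.heads.head.in_features, NUM_CLASSES)
    else:
        raise ValueError(f"unknown backbone: {name}")
    return model


def main() -> None:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Standard ImageNet-style preprocessing; all three backbones expect 224x224 inputs.
    transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    # Hypothetical directory layout: one subfolder per breed.
    train_set = datasets.ImageFolder("data/train", transform=transform)
    train_loader = DataLoader(train_set, batch_size=32, shuffle=True, num_workers=4)

    model = build_model("vit_b_16").to(device)  # swap in "resnet50" or "vgg16" to compare
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()

    model.train()
    for epoch in range(5):  # illustrative epoch count
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch + 1} done, last batch loss {loss.item():.4f}")


if __name__ == "__main__":
    main()
```

The same loop works for all three backbones, which keeps the comparison fair: only the architecture (and its replaced head) changes between runs.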
