Simple Vision Transformer (ViT)

Model Architecture

Our simplified Vision Transformer (ViT) uses a patch size of 16 and an encoder with 4 layers, each equipped with 4 attention heads.

Figure: ViT Architecture
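
The sketch below illustrates an architecture of this shape in PyTorch: a patch size of 16 and a 4-layer encoder with 4 attention heads per layer. The image size, embedding dimension, and number of classes are illustrative assumptions, not values taken from this repository.

```python
# Minimal ViT sketch: patch size 16, 4 encoder layers, 4 heads per layer.
# Image size, embedding dim, and class count are illustrative assumptions.
import torch
import torch.nn as nn


class SimpleViT(nn.Module):
    def __init__(self, image_size=224, patch_size=16, embed_dim=256,
                 depth=4, num_heads=4, num_classes=10):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2

        # Patch embedding: a strided convolution splits the image into
        # non-overlapping 16x16 patches and projects each one to embed_dim.
        self.patch_embed = nn.Conv2d(3, embed_dim,
                                     kernel_size=patch_size, stride=patch_size)

        # Learnable [CLS] token and positional embeddings.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))

        # Transformer encoder: 4 layers, each with 4 attention heads.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads,
            dim_feedforward=embed_dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)

        # Classification head applied to the [CLS] token.
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        b = x.shape[0]
        x = self.patch_embed(x)                  # (B, D, H/16, W/16)
        x = x.flatten(2).transpose(1, 2)         # (B, N, D)
        cls = self.cls_token.expand(b, -1, -1)   # (B, 1, D)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.head(x[:, 0])                # logits from the [CLS] token


if __name__ == "__main__":
    model = SimpleViT()
    logits = model(torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 10])
```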

Performance

The model has been evaluated using standard classification metrics:

  • Accuracy: 58.96%
  • Hamming Loss: 0.0680

These are preliminary results obtained under our current experimental setup.
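
The evaluation code is not shown here; as a rough sketch, metrics of this kind could be computed with scikit-learn as follows. The label arrays are placeholders, and the exact evaluation protocol (e.g. multiclass vs. one-hot multilabel encoding for the Hamming loss) is an assumption.

```python
# Sketch of computing accuracy and Hamming loss with scikit-learn.
# y_true / y_pred below are placeholder arrays, not real results.
import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss

y_true = np.array([3, 1, 0, 2, 3])   # placeholder ground-truth labels
y_pred = np.array([3, 1, 1, 2, 0])   # placeholder model predictions

print(f"Accuracy:     {accuracy_score(y_true, y_pred):.4f}")
print(f"Hamming Loss: {hamming_loss(y_true, y_pred):.4f}")
```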