Our implementation of a simplified Vision Transformer (ViT) adopts a patch size of 16 and features an encoder with 4 layers, each equipped with 4 attention heads.
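A minimal sketch of such a model is shown below, assuming a 224x224 RGB input, a 256-dimensional embedding, and a 10-class output head; these values, and the use of PyTorch's built-in `nn.TransformerEncoder`, are illustrative assumptions rather than details of our actual implementation.

```python
import torch
import torch.nn as nn

class SimpleViT(nn.Module):
    def __init__(self, image_size=224, patch_size=16, in_channels=3,
                 embed_dim=256, num_layers=4, num_heads=4, num_classes=10):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # Patch embedding: a strided convolution splits the image into
        # non-overlapping 16x16 patches and projects each to embed_dim.
        self.patch_embed = nn.Conv2d(in_channels, embed_dim,
                                     kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        # Encoder: 4 layers, each with 4 attention heads, as described above.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        x = self.patch_embed(x)                   # (B, embed_dim, H/16, W/16)
        x = x.flatten(2).transpose(1, 2)          # (B, num_patches, embed_dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.head(x[:, 0])                 # classify from the [CLS] token

model = SimpleViT()
logits = model(torch.randn(2, 3, 224, 224))       # -> shape (2, 10)
```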
The model has been evaluated using standard classification metrics:
- Accuracy: 58.96%
- Hamming Loss: 0.0680
These metrics reflect the preliminary results obtained under our current experimental setup.
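For reference, the two metrics could be computed with scikit-learn as in the sketch below; the indicator matrices are placeholders, and the exact label format used in our evaluation is an assumption here.

```python
import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss

# Illustrative binary label matrices (3 samples, 4 labels), not our real data.
y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0],
                   [1, 1, 0, 1]])
y_pred = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [1, 1, 0, 1]])

print(f"Accuracy:     {accuracy_score(y_true, y_pred):.4f}")  # exact-match ratio
print(f"Hamming Loss: {hamming_loss(y_true, y_pred):.4f}")    # fraction of wrong labels
```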