
Initial NaFlex ViT model and training support #2466


Draft · rwightman wants to merge 9 commits into main

Conversation

@rwightman (Collaborator) commented Apr 8, 2025

Working:

  • 'flex' ViT with NaFlex position embedding resize, pre-patched input, and attention padding masks (see the first sketch after this list)
  • Single-node train.py works with a custom NaFlex data pipeline via a dataset wrapper that handles random seq-len & batch-size selection and constrains images to the seq-len budget while keeping aspect ratio (with randomizations); see the second sketch after this list
  • A much faster patch embed kernel resample, torch only, that can be used in forward(); see the third sketch after this list
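A minimal sketch of the two mechanisms named in the first item, assuming a standard timm-style ViT with a learned square position-embedding grid and no class token. `resize_pos_embed` and `build_padding_mask` are hypothetical helpers for illustration, not the PR's actual code.

```python
import torch
import torch.nn.functional as F


def resize_pos_embed(pos_embed: torch.Tensor, new_hw: tuple) -> torch.Tensor:
    """Interpolate a (1, H*W, C) learned position embedding to a new (h, w) patch grid."""
    _, n, c = pos_embed.shape
    old = int(n ** 0.5)  # assumes a square pretraining grid with no class token
    grid = pos_embed.reshape(1, old, old, c).permute(0, 3, 1, 2)       # (1, C, H, W)
    grid = F.interpolate(grid, size=new_hw, mode='bicubic', align_corners=False)
    return grid.permute(0, 2, 3, 1).reshape(1, new_hw[0] * new_hw[1], c)


def build_padding_mask(seq_lens: torch.Tensor, max_len: int) -> torch.Tensor:
    """Boolean mask, True where a patch token is real and False where it is padding."""
    return torch.arange(max_len, device=seq_lens.device)[None, :] < seq_lens[:, None]


# Pre-patched input: each sample arrives as a sequence of patch tokens padded to a
# common max_len per batch; the mask tells attention which tokens to ignore.
pos = torch.randn(1, 14 * 14, 384)               # pretrained 14x14 grid, dim 384
pos_12x20 = resize_pos_embed(pos, (12, 20))      # non-square target grid, 240 tokens
mask = build_padding_mask(torch.tensor([240, 192, 256]), max_len=256)
```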
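For the dataset-wrapper item, a rough illustration of the aspect-ratio-preserving size constraint: scale the image so its patch count fits a sampled seq-len budget, then round down to whole patches. `fit_to_seq_len` is a hypothetical helper; the PR's wrapper additionally randomizes the sampled seq-len and batch size.

```python
import math


def fit_to_seq_len(height: int, width: int, seq_len: int, patch_size: int = 16):
    """Return (new_h, new_w) that preserves aspect ratio and uses at most seq_len patches."""
    ph, pw = math.ceil(height / patch_size), math.ceil(width / patch_size)
    scale = min(1.0, math.sqrt(seq_len / (ph * pw)))  # never upscale in this sketch
    new_ph, new_pw = max(1, int(ph * scale)), max(1, int(pw * scale))
    return new_ph * patch_size, new_pw * patch_size


# A 480x640 image with a budget of 256 patches of size 16 becomes 208x288 (13x18 = 234 patches).
print(fit_to_seq_len(480, 640, seq_len=256))  # (208, 288)
```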
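The third item refers to resampling the patch-embedding kernel on the fly. The PR does not show its method here; below is only a naive bicubic-interpolation stand-in to illustrate the general idea of a torch-only resample cheap enough to call inside forward().

```python
import torch
import torch.nn.functional as F


def resample_patch_embed_naive(weight: torch.Tensor, new_patch: tuple) -> torch.Tensor:
    """Interpolate a (embed_dim, in_chans, ph, pw) conv kernel to a new patch size."""
    return F.interpolate(weight, size=new_patch, mode='bicubic',
                         align_corners=False, antialias=True)


w16 = torch.randn(384, 3, 16, 16)               # pretrained 16x16 patch embed kernel
w8 = resample_patch_embed_naive(w16, (8, 8))    # reuse the same weights at patch size 8
```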

Not tested / not completed:

  • Distributed training not tested; the dataset wrapper needs more verification
  • Dataset wrapper for iterable datasets (wds, tfds, iterable hfds) needs to be added
  • More model definitions
  • Weight loading / translation for existing ViTs
  • SigLIP-2 NaFlex vision encoder weight port
  • Integration of NaFlex data pipeline components into OpenCLIP
  • Add randomization of the patch_size along with seq_len

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@rwightman marked this pull request as draft April 8, 2025 04:39