Releases: huggingface/pytorch-image-models
Release v1.0.7
June 12, 2024
- MobileNetV4 models and initial set of `timm` trained weights added:
model | top1 | top1_err | top5 | top5_err | param_count (M) | img_size |
---|---|---|---|---|---|---|
mobilenetv4_hybrid_large.e600_r384_in1k | 84.266 | 15.734 | 96.936 | 3.064 | 37.76 | 448 |
mobilenetv4_hybrid_large.e600_r384_in1k | 83.800 | 16.200 | 96.770 | 3.230 | 37.76 | 384 |
mobilenetv4_conv_large.e600_r384_in1k | 83.392 | 16.608 | 96.622 | 3.378 | 32.59 | 448 |
mobilenetv4_conv_large.e600_r384_in1k | 82.952 | 17.048 | 96.266 | 3.734 | 32.59 | 384 |
mobilenetv4_conv_large.e500_r256_in1k | 82.674 | 17.326 | 96.31 | 3.69 | 32.59 | 320 |
mobilenetv4_conv_large.e500_r256_in1k | 81.862 | 18.138 | 95.69 | 4.31 | 32.59 | 256 |
mobilenetv4_hybrid_medium.e500_r224_in1k | 81.276 | 18.724 | 95.742 | 4.258 | 11.07 | 256 |
mobilenetv4_conv_medium.e500_r256_in1k | 80.858 | 19.142 | 95.768 | 4.232 | 9.72 | 320 |
mobilenetv4_hybrid_medium.e500_r224_in1k | 80.442 | 19.558 | 95.38 | 4.62 | 11.07 | 224 |
mobilenetv4_conv_blur_medium.e500_r224_in1k | 80.142 | 19.858 | 95.298 | 4.702 | 9.72 | 256 |
mobilenetv4_conv_medium.e500_r256_in1k | 79.928 | 20.072 | 95.184 | 4.816 | 9.72 | 256 |
mobilenetv4_conv_medium.e500_r224_in1k | 79.808 | 20.192 | 95.186 | 4.814 | 9.72 | 256 |
mobilenetv4_conv_blur_medium.e500_r224_in1k | 79.438 | 20.562 | 94.932 | 5.068 | 9.72 | 224 |
mobilenetv4_conv_medium.e500_r224_in1k | 79.094 | 20.906 | 94.77 | 5.23 | 9.72 | 224 |
mobilenetv4_conv_small.e2400_r224_in1k | 74.616 | 25.384 | 92.072 | 7.928 | 3.77 | 256 |
mobilenetv4_conv_small.e1200_r224_in1k | 74.292 | 25.708 | 92.116 | 7.884 | 3.77 | 256 |
mobilenetv4_conv_small.e2400_r224_in1k | 73.756 | 26.244 | 91.422 | 8.578 | 3.77 | 224 |
mobilenetv4_conv_small.e1200_r224_in1k | 73.454 | 26.546 | 91.34 | 8.66 | 3.77 | 224 |
- Apple MobileCLIP (https://arxiv.org/pdf/2311.17049, FastViT and ViT-B) image tower model support & weights added (part of OpenCLIP support).
- ViTamin (https://arxiv.org/abs/2404.02132) CLIP image tower model & weights added (part of OpenCLIP support).
- OpenAI CLIP Modified ResNet image tower modelling & weight support (via ByobNet). Refactor AttentionPool2d.
- Refactoring & improvements, especially related to classifier_reset and num_features vs head_hidden_size for forward_features() vs pre_logits
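In the MobileNetV4 table above, each `*_err` column is 100 minus the matching accuracy, and the same checkpoint appears at several eval resolutions. Below is a minimal pure-Python sketch that picks the most accurate row under a parameter budget, using only the numbers from that table. The `Row` layout and the `best_under_budget` helper are hypothetical illustrations and are not part of `timm`.

```python
import math
from typing import NamedTuple


class Row(NamedTuple):
    model: str
    top1: float
    top1_err: float
    params_m: float  # param_count, in millions
    img_size: int    # eval resolution


# Values copied from the MobileNetV4 table (top5 columns omitted).
ROWS = [
    Row("mobilenetv4_hybrid_large.e600_r384_in1k", 84.266, 15.734, 37.76, 448),
    Row("mobilenetv4_hybrid_large.e600_r384_in1k", 83.800, 16.200, 37.76, 384),
    Row("mobilenetv4_conv_large.e600_r384_in1k", 83.392, 16.608, 32.59, 448),
    Row("mobilenetv4_conv_large.e600_r384_in1k", 82.952, 17.048, 32.59, 384),
    Row("mobilenetv4_conv_large.e500_r256_in1k", 82.674, 17.326, 32.59, 320),
    Row("mobilenetv4_conv_large.e500_r256_in1k", 81.862, 18.138, 32.59, 256),
    Row("mobilenetv4_hybrid_medium.e500_r224_in1k", 81.276, 18.724, 11.07, 256),
    Row("mobilenetv4_conv_medium.e500_r256_in1k", 80.858, 19.142, 9.72, 320),
    Row("mobilenetv4_hybrid_medium.e500_r224_in1k", 80.442, 19.558, 11.07, 224),
    Row("mobilenetv4_conv_blur_medium.e500_r224_in1k", 80.142, 19.858, 9.72, 256),
    Row("mobilenetv4_conv_medium.e500_r256_in1k", 79.928, 20.072, 9.72, 256),
    Row("mobilenetv4_conv_medium.e500_r224_in1k", 79.808, 20.192, 9.72, 256),
    Row("mobilenetv4_conv_blur_medium.e500_r224_in1k", 79.438, 20.562, 9.72, 224),
    Row("mobilenetv4_conv_medium.e500_r224_in1k", 79.094, 20.906, 9.72, 224),
    Row("mobilenetv4_conv_small.e2400_r224_in1k", 74.616, 25.384, 3.77, 256),
    Row("mobilenetv4_conv_small.e1200_r224_in1k", 74.292, 25.708, 3.77, 256),
    Row("mobilenetv4_conv_small.e2400_r224_in1k", 73.756, 26.244, 3.77, 224),
    Row("mobilenetv4_conv_small.e1200_r224_in1k", 73.454, 26.546, 3.77, 224),
]

# Sanity check: the error column is 100 - accuracy for every row.
assert all(math.isclose(r.top1 + r.top1_err, 100.0) for r in ROWS)


def best_under_budget(rows, max_params_m):
    """Return the most accurate row whose param count fits the budget, or None."""
    candidates = [r for r in rows if r.params_m <= max_params_m]
    return max(candidates, key=lambda r: r.top1) if candidates else None


best = best_under_budget(ROWS, 10.0)
print(best.model, best.img_size, best.top1)
# -> mobilenetv4_conv_medium.e500_r256_in1k 320 80.858
```

In every pair listed, the same checkpoint scores higher at the larger eval resolution. This is why a budget query should also record `img_size`, not just the model name.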
Release v1.0.3
May 14, 2024
- Support loading PaliGemma jax weights into SigLIP ViT models with average pooling.
- Add Hiera models from Meta (https://github.com/facebookresearch/hiera).
- Add `normalize=` flag for transforms, returning a non-normalized torch.Tensor with original dtype (for `chug`)
- Version 1.0.3 release
May 11, 2024
- "Searching for Better ViT Baselines (For the GPU Poor)" weights and ViT variants released, exploring model shapes between Tiny and Base.
- `AttentionExtract` helper added to extract attention maps from `timm` models. See example in #1232 (comment)
- `forward_intermediates()` API refined and added to more models, including some ConvNets that have other extraction methods.
- 1017 of 1047 model architectures support `features_only=True` feature extraction. Remaining 34 architectures can be supported based on priority requests.
- Remove `torch.jit.script` annotated functions, including old JIT activations. They conflict with dynamo, and dynamo does a much better job when used.
April 11, 2024
- Prepping for a long overdue 1.0 release; things have been stable for a while now.
- Significant feature that's been missing for a while: `features_only=True` support for ViT models with flat hidden states or non-std module layouts (so far covering `'vit_*', 'twins_*', 'deit*', 'beit*', 'mvitv2*', 'eva*', 'samvit_*', 'flexivit*'`)
- Above feature support achieved through a new `forward_intermediates()` API that can be used with a feature wrapping module or directly.
```python
import torch
import timm

model = timm.create_model('vit_base_patch16_224')
x = torch.randn(2, 3, 224, 224)  # dummy batch: 2 RGB images at 224x224
final_feat, intermediates = model.forward_intermediates(x)
output = model.forward_head(final_feat)  # pooling + classifier head

print(final_feat.shape)
# torch.Size([2, 197, 768])

for f in intermediates:
    print(f.shape)
# torch.Size([2, 768, 14, 14])  (printed 12 times, once per transformer block)

print(output.shape)
# torch.Size([2, 1000])

# features_only=True uses a feature wrapping module around forward_intermediates()
model = timm.create_model('eva02_base_patch16_clip_224', pretrained=True, img_size=512, features_only=True, out_indices=(-3, -2,))
output = model(torch.randn(2, 3, 512, 512))
for o in output:
    print(o.shape)
# torch.Size([2, 768, 32, 32])
# torch.Size([2, 768, 32, 32])
```
- TinyCLIP vision tower weights added, thx Thien Tran
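The shapes printed in the `forward_intermediates()` example follow from patch-grid arithmetic. Below is a minimal sketch that assumes a square image, a patch size that divides it, and a single class token as the only prefix token (true for `vit_base_patch16_224`). `vit_feature_shapes` is a hypothetical helper, not a `timm` function.

```python
def vit_feature_shapes(batch, img_size, patch_size, embed_dim, num_prefix_tokens=1):
    """Return (token_shape, nchw_shape) for a plain ViT.

    token_shape: (B, N, C) for the final un-pooled features, where
    N = prefix tokens (e.g. class token) + grid * grid patch tokens.
    nchw_shape: (B, C, H, W) after dropping prefix tokens and reshaping
    the patch tokens back onto the 2D grid.
    """
    assert img_size % patch_size == 0, "image size must be divisible by patch size"
    grid = img_size // patch_size
    num_tokens = num_prefix_tokens + grid * grid
    return (batch, num_tokens, embed_dim), (batch, embed_dim, grid, grid)


tokens, fmap = vit_feature_shapes(2, 224, 16, 768)
print(tokens, fmap)  # (2, 197, 768) (2, 768, 14, 14)

# EVA-02 example: 512 / 16 = 32 patches per side.
print(vit_feature_shapes(2, 512, 16, 768)[1])  # (2, 768, 32, 32)
```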
Release v0.9.16
Feb 19, 2024
- Next-ViT models added. Adapted from https://github.com/bytedance/Next-ViT
- HGNet and PP-HGNetV2 models added. Adapted from https://github.com/PaddlePaddle/PaddleClas by SeeFun
- Removed setup.py, moved to pyproject.toml based build supported by PDM
- Add updated model EMA impl using `_foreach` ops for less overhead
- Support device args in train script for non-GPU devices
- Other misc fixes and small additions
- Min supported Python version increased to 3.8
- Release 0.9.16
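A model EMA keeps a shadow copy of the weights. After each step, the copy is updated as `ema = decay * ema + (1 - decay) * param`. Below is a minimal plain-Python sketch of that update rule, where flat lists of floats stand in for parameter tensors. It is not the `timm` implementation, which per the note above batches the update with `_foreach` ops to cut per-tensor overhead.

```python
from typing import List


def ema_update(ema_params: List[List[float]], model_params: List[List[float]], decay: float) -> None:
    """In-place EMA update: ema <- decay * ema + (1 - decay) * param.

    Written in lerp form, ema + (1 - decay) * (param - ema), which is algebraically
    the same update. Each inner list stands in for one parameter tensor.
    """
    weight = 1.0 - decay
    for ema_t, p_t in zip(ema_params, model_params):
        for i, (e, p) in enumerate(zip(ema_t, p_t)):
            ema_t[i] = e + weight * (p - e)


ema = [[0.0, 0.0], [1.0]]
params = [[1.0, 2.0], [3.0]]
ema_update(ema, params, decay=0.9)
print(ema)
```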
Jan 8, 2024
Datasets & transform refactoring
- HuggingFace streaming (iterable) dataset support (`--dataset hfids:org/dataset`)
- Webdataset wrapper tweaks for improved split info fetching; can auto-fetch splits from supported HF Hub webdatasets
- Tested HF `datasets` and webdataset wrapper streaming from HF Hub with recent `timm` ImageNet uploads to https://huggingface.co/timm
- Make input & target column/field keys consistent across datasets and pass via args
- Full monochrome support when using e.g. `--input-size 1 224 224` or `--in-chans 1`; sets PIL image conversion appropriately in dataset
- Improved several alternate crop & resize transforms (ResizeKeepRatio, RandomCropOrPad, etc.) for use in PixParse document AI project
- Add SimCLR-style color jitter prob along with grayscale and gaussian blur options to augmentations and args
- Allow train without validation set (`--val-split ''`) in train script
- Add `--bce-sum` (sum over class dim) and `--bce-pos-weight` (positive weighting) args for training, as they're common BCE loss tweaks I was often hard coding
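For reference, the two BCE tweaks change the loss as shown in the sketch below. This is a minimal illustration with two assumptions. First, `pos_weight` scales only the positive-target term, which is the convention of PyTorch's `BCEWithLogitsLoss`. Second, "sum over class dim" means summing per-class losses before averaging over the batch. The exact reduction `timm` applies for `--bce-sum` is not confirmed here, and the function names are illustrative.

```python
import math
from typing import List


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def bce_with_logits(logits: List[List[float]], targets: List[List[float]],
                    pos_weight: float = 1.0, sum_classes: bool = False) -> float:
    """Binary cross-entropy on logits for a batch of multi-label rows.

    loss_ic = -(pos_weight * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x)))
    sum_classes=False: mean over all batch * class elements.
    sum_classes=True:  sum over classes, then mean over the batch (assumed reduction).
    """
    per_row = []
    for xs, ys in zip(logits, targets):
        row = [
            # log(1 - sigmoid(x)) == log(sigmoid(-x))
            -(pos_weight * y * log_sigmoid(x) + (1.0 - y) * log_sigmoid(-x))
            for x, y in zip(xs, ys)
        ]
        per_row.append(sum(row) if sum_classes else sum(row) / len(row))
    return sum(per_row) / len(per_row)


print(bce_with_logits([[2.0, -1.0]], [[1.0, 0.0]], pos_weight=2.0, sum_classes=True))
```

With many classes and few positives per sample, summing over classes keeps the loss magnitude from shrinking with the class count. A `pos_weight` above 1 counteracts the dominance of negative targets.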
Release v0.9.12
Nov 23, 2023
- Added EfficientViT-Large models, thanks SeeFun
- Fix Python 3.7 compat, will be dropping support for it soon
- Other misc fixes
- Release 0.9.12
Release v0.9.11
Nov 20, 2023
- Added significant flexibility for Hugging Face Hub based timm models via `model_args` config entry. `model_args` will be passed as kwargs through to models on creation.
- Updated ImageNet eval and test set csv files with latest models
- `vision_transformer.py` typing and doc cleanup by Laureηt
- 0.9.11 release
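The `model_args` entry works by forwarding a set of keyword arguments from a Hub config to the model constructor. Below is a minimal sketch of that idea, which assumes a Hub-style `config.json` dict with a `model_args` key. `create_from_config` and `TinyNet` are hypothetical stand-ins. The sketch also lets explicitly passed kwargs override the config values; that precedence is an assumption, not documented `timm` behavior.

```python
import json
from typing import Any, Dict


class TinyNet:
    """Hypothetical model class standing in for a timm architecture."""

    def __init__(self, num_classes: int = 1000, img_size: int = 224, drop_rate: float = 0.0):
        self.num_classes = num_classes
        self.img_size = img_size
        self.drop_rate = drop_rate


def create_from_config(config_json: str, **kwargs: Any) -> TinyNet:
    """Hypothetical factory: config's model_args become constructor kwargs."""
    config: Dict[str, Any] = json.loads(config_json)
    # Start from the config's model_args (if any), then apply caller overrides (assumed precedence).
    model_kwargs = dict(config.get("model_args", {}))
    model_kwargs.update(kwargs)
    return TinyNet(**model_kwargs)


hub_config = '{"architecture": "tinynet", "model_args": {"img_size": 384, "num_classes": 11}}'
model = create_from_config(hub_config, drop_rate=0.1)
print(model.img_size, model.num_classes, model.drop_rate)  # 384 11 0.1
```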
Release v0.9.10
Nov 4, 2023
- Patch fix for 0.9.9 to fix FrozenBatchnorm2d import path for old torchvision versions (~2 years old)
Nov 3, 2023
- DFN (Data Filtering Networks) and MetaCLIP ViT weights added
- DINOv2 'register' ViT model weights added
- Add `quickgelu` ViT variants for OpenAI, DFN, MetaCLIP weights that use it (less efficient)
- Improved typing added to ResNet, MobileNet-v3 thanks to Aryan
- ImageNet-12k fine-tuned (from LAION-2B CLIP) `convnext_xxlarge`
- 0.9.9 release
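`quickgelu` refers to the sigmoid-based GELU approximation `x * sigmoid(1.702 * x)` used by the original OpenAI CLIP models. Weights trained with that activation are presumably best paired with the same activation at inference, which would explain the separate variants. Below is a minimal pure-Python comparison against the exact erf-based GELU; the function names are illustrative.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def quick_gelu(x: float) -> float:
    """Sigmoid approximation used by OpenAI CLIP: x * sigmoid(1.702 * x)."""
    return x / (1.0 + math.exp(-1.702 * x))


for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"{x:+.1f}  gelu={gelu(x):+.4f}  quick_gelu={quick_gelu(x):+.4f}")
```

The two curves agree closely but not exactly; the gap peaks at roughly 0.02 near |x| ≈ 2. That gap is small, but it is a systematic difference for a network whose weights were fit to one of them.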
Release v0.9.9
Release v0.9.8
Oct 20, 2023
- SigLIP image tower weights supported in `vision_transformer.py`.
  - Great potential for fine-tune and downstream feature use.
- Experimental 'register' support in vit models as per Vision Transformers Need Registers
- Updated RepViT with new weight release. Thanks wangao
- Add patch resizing support (on pretrained weight load) to Swin models
- 0.9.8 release
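"Vision Transformers Need Registers" describes registers as extra learnable tokens added to the sequence alongside the class token. They take part in attention but are discarded at the output. Below is a minimal sketch of that bookkeeping with plain lists, which assumes the prefix order `[cls, reg_0..reg_k, patches...]`. Both the ordering and the helper names are illustrative assumptions, not `timm` internals.

```python
from typing import List, Tuple


def add_prefix_tokens(patches: List[str], num_reg: int, use_cls: bool = True) -> List[str]:
    """Prepend a class token (optional) and `num_reg` register tokens to the patch tokens."""
    prefix = (["cls"] if use_cls else []) + [f"reg{i}" for i in range(num_reg)]
    return prefix + patches


def split_prefix(tokens: List[str], num_reg: int, use_cls: bool = True) -> Tuple[List[str], List[str]]:
    """Split the output sequence into (prefix tokens, patch tokens)."""
    n_prefix = int(use_cls) + num_reg
    return tokens[:n_prefix], tokens[n_prefix:]


patches = [f"p{i}" for i in range(14 * 14)]  # 224 / 16 = 14 patches per side
seq = add_prefix_tokens(patches, num_reg=4)
print(len(seq))  # 201 = 1 cls + 4 registers + 196 patches

prefix, out_patches = split_prefix(seq, num_reg=4)
cls_token = prefix[0]  # class token kept for the head; the register outputs are dropped
```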
Release v0.9.7
Release v0.9.6
Aug 28, 2023
- Add dynamic img size support to models in `vision_transformer.py`, `vision_transformer_hybrid.py`, `deit.py`, and `eva.py` w/o breaking backward compat.
  - Add `dynamic_img_size=True` to args at model creation time to allow changing the grid size (interpolate abs and/or ROPE pos embed each forward pass).
  - Add `dynamic_img_pad=True` to allow image sizes that aren't divisible by patch size (pad bottom right to patch size each forward pass).
  - Enabling either dynamic mode will break FX tracing unless the PatchEmbed module is added as a leaf.
  - Existing method of resizing position embedding by passing different `img_size` (interpolate pretrained embed weights once) on creation still works.
  - Existing method of changing `patch_size` (resize pretrained patch_embed weights once) on creation still works.
  - Example validation cmd:
    ```bash
    python validate.py /imagenet --model vit_base_patch16_224 --amp --amp-dtype bfloat16 --img-size 255 --crop-pct 1.0 --model-kwargs dynamic_img_size=True dynamic_img_pad=True
    ```
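With `dynamic_img_pad=True`, an input whose size is not a multiple of the patch size is padded on the bottom and right up to the next multiple. With `dynamic_img_size=True`, the position embedding is resampled to the resulting grid. Below is a minimal sketch of the size arithmetic only, applied to the `--img-size 255` validation example with `vit_base_patch16_224`. `pad_and_grid` is a hypothetical helper.

```python
from typing import Tuple


def pad_and_grid(height: int, width: int, patch_size: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return ((pad_bottom, pad_right), (grid_h, grid_w)) for bottom-right padding to patch multiples."""
    pad_h = (patch_size - height % patch_size) % patch_size
    pad_w = (patch_size - width % patch_size) % patch_size
    grid = ((height + pad_h) // patch_size, (width + pad_w) // patch_size)
    return (pad_h, pad_w), grid


print(pad_and_grid(255, 255, 16))  # ((1, 1), (16, 16))
print(pad_and_grid(224, 224, 16))  # ((0, 0), (14, 14)), the pretrained grid
```

Here, the 255-pixel input gains one row and one column of padding. The pretrained 14x14 position embedding would then be resampled to a 16x16 grid on each forward pass.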
Aug 25, 2023
- Many new models since last release
  - FastViT - https://arxiv.org/abs/2303.14189
  - MobileOne - https://arxiv.org/abs/2206.04040
  - InceptionNeXt - https://arxiv.org/abs/2303.16900
  - RepGhostNet - https://arxiv.org/abs/2211.06088 (thanks https://github.com/ChengpengChen)
  - GhostNetV2 - https://arxiv.org/abs/2211.12905 (thanks https://github.com/yehuitang)
  - EfficientViT (MSRA) - https://arxiv.org/abs/2305.07027 (thanks https://github.com/seefun)
  - EfficientViT (MIT) - https://arxiv.org/abs/2205.14756 (thanks https://github.com/seefun)
- Add `--reparam` arg to `benchmark.py`, `onnx_export.py`, and `validate.py` to trigger layer reparameterization / fusion for models with any one of `reparameterize()`, `switch_to_deploy()` or `fuse()`
  - Including FastViT, MobileOne, RepGhostNet, EfficientViT (MSRA), RepViT, RepVGG, and LeViT
- Preparing 0.9.6 'back to school' release
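Reparameterization / fusion folds operations that are constant at inference time into a single layer. A common instance is folding an eval-mode BatchNorm into the preceding linear or conv layer. Given BN statistics `mean` and `var` and affine parameters `gamma` and `beta`, each output channel's weights are scaled by `gamma / sqrt(var + eps)`. The bias becomes `(b - mean) * gamma / sqrt(var + eps) + beta`. Below is a minimal pure-Python sketch of this general technique for a linear layer. It is not the `reparameterize()` / `switch_to_deploy()` / `fuse()` code of any particular model, and the helper names are illustrative.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def linear(w: Matrix, b: List[float], x: List[float]) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def batchnorm_eval(y, mean, var, gamma, beta, eps=1e-5):
    # Eval-mode BN uses stored running statistics, so it is a fixed per-channel affine map.
    return [g * (v - m) / math.sqrt(s + eps) + bt for v, m, s, g, bt in zip(y, mean, var, gamma, beta)]


def fuse_linear_bn(w, b, mean, var, gamma, beta, eps=1e-5) -> Tuple[Matrix, List[float]]:
    """Fold BN into the preceding linear layer: returns (w_fused, b_fused)."""
    scale = [g / math.sqrt(s + eps) for g, s in zip(gamma, var)]
    w_fused = [[wij * sc for wij in row] for row, sc in zip(w, scale)]
    b_fused = [(bi - m) * sc + bt for bi, m, sc, bt in zip(b, mean, scale, beta)]
    return w_fused, b_fused


w = [[0.5, -1.0], [2.0, 0.25]]
b = [0.1, -0.2]
mean, var = [0.3, -0.1], [1.5, 0.8]
gamma, beta = [1.2, 0.7], [0.05, -0.4]
x = [1.0, 2.0]

unfused = batchnorm_eval(linear(w, b, x), mean, var, gamma, beta)
wf, bf = fuse_linear_bn(w, b, mean, var, gamma, beta)
fused = linear(wf, bf, x)
print(unfused, fused)  # equal up to floating-point rounding
```

After fusion, the deployed model runs one layer instead of two with identical outputs. This kind of saving is the motivation for the `--reparam` flag.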
Aug 11, 2023
- Swin, MaxViT, CoAtNet, and BEiT models support resizing of image/window size on creation with adaptation of pretrained weights
- Example validation cmd to test w/ non-square resize:
  ```bash
  python validate.py /imagenet --model swin_base_patch4_window7_224.ms_in22k_ft_in1k --amp --amp-dtype bfloat16 --input-size 3 256 320 --model-kwargs window_size=8,10 img_size=256,320
  ```
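When Swin is resized to a non-square input, the window size should tile the token grid at every stage. Below is a minimal sketch of that check, which assumes the standard Swin layout: a 4x4 patch embed followed by four stages, with 2x downsampling between stages. It also assumes each stage grid must be an exact multiple of the window; real implementations may clamp or pad instead. The helper names are hypothetical. The sketch shows why `window_size=8,10` pairs with `img_size=256,320` in the command above.

```python
from typing import List, Tuple


def swin_stage_grids(img_hw: Tuple[int, int], patch_size: int = 4, num_stages: int = 4) -> List[Tuple[int, int]]:
    """Token grid (H, W) at each stage: patch embed, then 2x downsample before each later stage."""
    h, w = img_hw[0] // patch_size, img_hw[1] // patch_size
    grids = []
    for stage in range(num_stages):
        if stage > 0:
            h, w = h // 2, w // 2
        grids.append((h, w))
    return grids


def window_tiles_all_stages(img_hw, window_hw, patch_size=4, num_stages=4) -> bool:
    """True if the (win_h, win_w) window evenly tiles every stage's token grid."""
    return all(
        gh % window_hw[0] == 0 and gw % window_hw[1] == 0
        for gh, gw in swin_stage_grids(img_hw, patch_size, num_stages)
    )


print(swin_stage_grids((256, 320)))  # [(64, 80), (32, 40), (16, 20), (8, 10)]
print(window_tiles_all_stages((256, 320), (8, 10)))  # True
print(window_tiles_all_stages((256, 320), (7, 7)))   # False
```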