Commit 5d535d7

Committed Jan 19, 2025
Version 1.0.14, update README & changelog
1 parent c6b74eb commit 5d535d7

3 files changed, +151 -49 lines changed


‎README.md

+9 -48
@@ -12,6 +12,15 @@

 ## What's New

+## Jan 19, 2025
+* Fix loading of LeViT safetensor weights, remove conversion code which should have been deactivated
+* Add 'SO150M' ViT weights trained with SBB recipes, decent results, but not optimal shape for ImageNet-12k/1k pretrain/ft
+  * `vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k_ft_in1k` - 86.7% top-1
+  * `vit_so150m_patch16_reg4_gap_384.sbb_e250_in12k_ft_in1k` - 87.4% top-1
+  * `vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k`
+* Misc typing, typo, etc. cleanup
+* 1.0.14 release to get above LeViT fix out
+
 ## Jan 9, 2025
 * Add support to train and validate in pure `bfloat16` or `float16`
 * `wandb` project name arg added by https://github.com/caojiaolong, use arg.experiment for name
@@ -116,7 +125,6 @@ Add a set of new very well trained ResNet & ResNet-V2 18/34 (basic block) weight
   * [mobilenetv3_large_150d.ra4_e3600_r256_in1k](http://hf.co/timm/mobilenetv3_large_150d.ra4_e3600_r256_in1k) - 81.81 @ 320, 80.94 @ 256
   * [mobilenetv3_large_100.ra4_e3600_r224_in1k](http://hf.co/timm/mobilenetv3_large_100.ra4_e3600_r224_in1k) - 77.16 @ 256, 76.31 @ 224

-
 ### Aug 21, 2024
 * Updated SBB ViT models trained on ImageNet-12k and fine-tuned on ImageNet-1k, challenging quite a number of much larger, slower models

@@ -319,53 +327,6 @@ torch.Size([2, 768, 32, 32])
 * Min supported Python version increased to 3.8
 * Release 0.9.16

-### Jan 8, 2024
-Datasets & transform refactoring
-* HuggingFace streaming (iterable) dataset support (`--dataset hfids:org/dataset`)
-* Webdataset wrapper tweaks for improved split info fetching, can auto fetch splits from supported HF hub webdataset
-* Tested HF `datasets` and webdataset wrapper streaming from HF hub with recent `timm` ImageNet uploads to https://huggingface.co/timm
-* Make input & target column/field keys consistent across datasets and pass via args
-* Full monochrome support when using e.g. `--input-size 1 224 224` or `--in-chans 1`, sets PIL image conversion appropriately in dataset
-* Improved several alternate crop & resize transforms (ResizeKeepRatio, RandomCropOrPad, etc) for use in PixParse document AI project
-* Add SimCLR style color jitter prob along with grayscale and gaussian blur options to augmentations and args
-* Allow train without validation set (`--val-split ''`) in train script
-* Add `--bce-sum` (sum over class dim) and `--bce-pos-weight` (positive weighting) args for training as they're common BCE loss tweaks I was often hard coding
-
-### Nov 23, 2023
-* Added EfficientViT-Large models, thanks [SeeFun](https://github.com/seefun)
-* Fix Python 3.7 compat, will be dropping support for it soon
-* Other misc fixes
-* Release 0.9.12
-
-### Nov 20, 2023
-* Added significant flexibility for Hugging Face Hub based timm models via `model_args` config entry. `model_args` will be passed as kwargs through to models on creation.
-  * See example at https://huggingface.co/gaunernst/vit_base_patch16_1024_128.audiomae_as2m_ft_as20k/blob/main/config.json
-  * Usage: https://github.com/huggingface/pytorch-image-models/discussions/2035
-* Updated imagenet eval and test set csv files with latest models
-* `vision_transformer.py` typing and doc cleanup by [Laureηt](https://github.com/Laurent2916)
-* 0.9.11 release
-
-### Nov 3, 2023
-* [DFN (Data Filtering Networks)](https://huggingface.co/papers/2309.17425) and [MetaCLIP](https://huggingface.co/papers/2309.16671) ViT weights added
-* DINOv2 'register' ViT model weights added (https://huggingface.co/papers/2309.16588, https://huggingface.co/papers/2304.07193)
-* Add `quickgelu` ViT variants for OpenAI, DFN, MetaCLIP weights that use it (less efficient)
-* Improved typing added to ResNet, MobileNet-v3 thanks to [Aryan](https://github.com/a-r-r-o-w)
-* ImageNet-12k fine-tuned (from LAION-2B CLIP) `convnext_xxlarge`
-* 0.9.9 release
-
-### Oct 20, 2023
-* [SigLIP](https://huggingface.co/papers/2303.15343) image tower weights supported in `vision_transformer.py`.
-  * Great potential for fine-tune and downstream feature use.
-* Experimental 'register' support in vit models as per [Vision Transformers Need Registers](https://huggingface.co/papers/2309.16588)
-* Updated RepViT with new weight release. Thanks [wangao](https://github.com/jameslahm)
-* Add patch resizing support (on pretrained weight load) to Swin models
-* 0.9.8 release pending
-
-### Sep 1, 2023
-* TinyViT added by [SeeFun](https://github.com/seefun)
-* Fix EfficientViT (MIT) to use torch.autocast so it works back to PT 1.10
-* 0.9.7 release
-
 ## Introduction

 Py**T**orch **Im**age **M**odels (`timm`) is a collection of image models, layers, utilities, optimizers, schedulers, data-loaders / augmentations, and reference training / validation scripts that aim to pull together a wide variety of SOTA models with ability to reproduce ImageNet training results.

‎hfdocs/source/changes.mdx

+141
@@ -1,5 +1,146 @@
 # Changelog

+## Jan 19, 2025
+* Fix loading of LeViT safetensor weights, remove conversion code which should have been deactivated
+* Add 'SO150M' ViT weights trained with SBB recipes, decent results, but not optimal shape for ImageNet-12k/1k pretrain/ft
+  * `vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k_ft_in1k` - 86.7% top-1
+  * `vit_so150m_patch16_reg4_gap_384.sbb_e250_in12k_ft_in1k` - 87.4% top-1
+  * `vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k`
+* Misc typing, typo, etc. cleanup
+* 1.0.14 release to get above LeViT fix out
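
The weight names in this entry follow the `architecture.pretrained_tag` pattern used throughout the changelog. A minimal sketch of pulling the two halves apart; `split_model_name` and `tag_tokens` are hypothetical helpers written for this note, not `timm` functions, and what each tag token means is left to the reader.

```python
from typing import List, Tuple


def split_model_name(name: str) -> Tuple[str, str]:
    """Split 'architecture.pretrained_tag' at the first dot.

    Hypothetical helper for illustration; returns an empty tag if there is no dot.
    """
    arch, _, tag = name.partition(".")
    return arch, tag


def tag_tokens(tag: str) -> List[str]:
    """Break a pretrained tag into its underscore-separated tokens."""
    return tag.split("_") if tag else []


name = "vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k_ft_in1k"
arch, tag = split_model_name(name)
print(arch)             # vit_so150m_patch16_reg4_gap_256
print(tag_tokens(tag))  # ['sbb', 'e250', 'in12k', 'ft', 'in1k']
```

With `timm` installed, the full string is what gets passed to `timm.create_model(name, pretrained=True)`.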
+
+## Jan 9, 2025
+* Add support to train and validate in pure `bfloat16` or `float16`
+* `wandb` project name arg added by https://github.com/caojiaolong, use arg.experiment for name
+* Fix old issue w/ checkpoint saving not working on filesystem w/o hard-link support (e.g. FUSE fs mounts)
+* 1.0.13 release
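
The hard-link item above can be pictured as a link-then-copy fallback. This is a sketch under assumptions, not the code from the fix: `link_or_copy` is a hypothetical helper, and the file names in the demo are made up.

```python
import os
import shutil
import tempfile


def link_or_copy(src: str, dst: str) -> str:
    """Hard-link src to dst, falling back to a copy if linking fails.

    Hypothetical helper for illustration; not timm's actual implementation.
    Returns "link" or "copy" to report which path was taken.
    """
    if os.path.exists(dst):
        os.unlink(dst)  # replace any stale file at the destination
    try:
        os.link(src, dst)
        return "link"
    except (OSError, NotImplementedError, AttributeError):
        # Filesystems without hard-link support (e.g. some FUSE mounts) end up here.
        shutil.copy2(src, dst)
        return "copy"


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "checkpoint-3.pth.tar")  # made-up file names
    dst = os.path.join(tmp, "last.pth.tar")
    with open(src, "wb") as f:
        f.write(b"weights")
    how = link_or_copy(src, dst)
    print(f"{dst} created via {how}")
```

The copy path costs extra disk space and time, which is presumably why a link is tried first.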
+
+## Jan 6, 2025
+* Add `torch.utils.checkpoint.checkpoint()` wrapper in `timm.models` that defaults `use_reentrant=False`, unless `TIMM_REENTRANT_CKPT=1` is set in env.
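
A rough sketch of how such a wrapper could behave, assuming the env var is compared against the string `"1"` (the exact parsing in `timm` is not shown in this commit). The `_checkpoint_impl` parameter is a hypothetical hook added here so the sketch runs without PyTorch; when it is omitted, the real `torch.utils.checkpoint.checkpoint` is used.

```python
import os
from typing import Any, Callable, Optional


def use_reentrant_default() -> bool:
    """Reentrant checkpointing stays off unless TIMM_REENTRANT_CKPT=1 is set."""
    return os.environ.get("TIMM_REENTRANT_CKPT", "0") == "1"


def checkpoint(
    fn: Callable[..., Any],
    *args: Any,
    use_reentrant: Optional[bool] = None,
    _checkpoint_impl: Optional[Callable[..., Any]] = None,  # hypothetical test hook
    **kwargs: Any,
) -> Any:
    """Call the underlying checkpoint function with a non-reentrant default."""
    if use_reentrant is None:
        use_reentrant = use_reentrant_default()
    if _checkpoint_impl is None:
        # Imported lazily so the sketch can be exercised without torch installed.
        import torch.utils.checkpoint
        _checkpoint_impl = torch.utils.checkpoint.checkpoint
    return _checkpoint_impl(fn, *args, use_reentrant=use_reentrant, **kwargs)
```

An explicit `use_reentrant=True` or `use_reentrant=False` at the call site still wins over the environment in this sketch.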
+
+## Dec 31, 2024
+* `convnext_nano` 384x384 ImageNet-12k pretrain & fine-tune. https://huggingface.co/models?search=convnext_nano%20r384
+* Add AIM-v2 encoders from https://github.com/apple/ml-aim, see on Hub: https://huggingface.co/models?search=timm%20aimv2
+* Add PaliGemma2 encoders from https://github.com/google-research/big_vision to existing PaliGemma, see on Hub: https://huggingface.co/models?search=timm%20pali2
+* Add missing L/14 DFN2B 39B CLIP ViT, `vit_large_patch14_clip_224.dfn2b_s39b`
+* Fix existing `RmsNorm` layer & fn to match standard formulation, use PT 2.5 impl when possible. Move old impl to `SimpleNorm` layer, it's LN w/o centering or bias. There were only two `timm` models using it, and they have been updated.
+* Allow override of `cache_dir` arg for model creation
+* Pass through `trust_remote_code` for HF datasets wrapper
+* `inception_next_atto` model added by creator
+* Adan optimizer caution, and Lamb decoupled weight decay options
+* Some feature_info metadata fixed by https://github.com/brianhou0208
+* All OpenCLIP and JAX (CLIP, SigLIP, Pali, etc) model weights that used load time remapping were given their own HF Hub instances so that they work with `hf-hub:` based loading, and thus will work with new Transformers `TimmWrapperModel`
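
To make the `RmsNorm` / `SimpleNorm` distinction above concrete, here is a plain-Python sketch over a single vector. It reads "LN w/o centering or bias" as dividing by the standard deviation without subtracting the mean; that reading, the use of population variance, and the omission of the learnable weight are all assumptions of this note, not details taken from the commit.

```python
import math
from typing import List, Sequence


def rms_norm(x: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Standard RMS norm: scale by 1 / sqrt(mean(x^2) + eps)."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]


def simple_norm(x: Sequence[float], eps: float = 1e-6) -> List[float]:
    """LayerNorm-style scaling by 1 / sqrt(var(x) + eps), without centering x."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)  # population variance (assumption)
    scale = 1.0 / math.sqrt(var + eps)
    return [v * scale for v in x]


x = [1.0, 2.0, 3.0, 4.0]  # non-zero mean, so the two norms disagree
print(rms_norm(x))
print(simple_norm(x))
```

The two functions coincide for zero-mean inputs and drift apart as the mean grows, which is why weights trained against one formulation do not transfer cleanly to the other.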
+
+## Nov 28, 2024
+* More optimizers
+  * Add MARS optimizer (https://arxiv.org/abs/2411.10438, https://github.com/AGI-Arena/MARS)
+  * Add LaProp optimizer (https://arxiv.org/abs/2002.04839, https://github.com/Z-T-WANG/LaProp-Optimizer)
+  * Add masking from 'Cautious Optimizers' (https://arxiv.org/abs/2411.16085, https://github.com/kyleliang919/C-Optim) to Adafactor, Adafactor Big Vision, AdamW (legacy), Adopt, Lamb, LaProp, Lion, NadamW, RMSPropTF, SGDW
+  * Cleanup some docstrings and type annotations re optimizers and factory
+* Add MobileNet-V4 Conv Medium models pretrained on in12k and fine-tuned in1k @ 384x384
+  * https://huggingface.co/timm/mobilenetv4_conv_medium.e250_r384_in12k_ft_in1k
+  * https://huggingface.co/timm/mobilenetv4_conv_medium.e250_r384_in12k
+  * https://huggingface.co/timm/mobilenetv4_conv_medium.e180_ad_r384_in12k
+  * https://huggingface.co/timm/mobilenetv4_conv_medium.e180_r384_in12k
+* Add small cs3darknet, quite good for the speed
+  * https://huggingface.co/timm/cs3darknet_focus_s.ra4_e3600_r256_in1k
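
The 'Cautious Optimizers' masking mentioned above keeps only the components where the proposed update and the current gradient agree in sign, then rescales by the kept fraction. A plain-Python sketch, as I understand the linked reference code; `cautious_update` is a hypothetical name and the `eps` floor of `1e-3` is an assumption.

```python
from typing import List, Sequence


def cautious_update(
    update: Sequence[float],
    grad: Sequence[float],
    eps: float = 1e-3,
) -> List[float]:
    """Zero out update components whose sign disagrees with the gradient.

    Surviving components are divided by the fraction kept (floored at eps)
    so the overall update magnitude is roughly preserved.
    """
    mask = [1.0 if u * g > 0 else 0.0 for u, g in zip(update, grad)]
    scale = 1.0 / max(sum(mask) / len(mask), eps)
    return [u * m * scale for u, m in zip(update, mask)]


# Components 0 and 2 agree in sign with the gradient; 1 and 3 do not.
print(cautious_update([1.0, -2.0, 3.0, 4.0], [0.5, 1.0, 2.0, -1.0]))
```

Here `update` stands for whatever direction the base optimizer would apply (for example a momentum buffer), so the mask sits on top of the existing step rule rather than replacing it.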
+
+## Nov 12, 2024
+* Optimizer factory refactor
+  * New factory works by registering optimizers using an OptimInfo dataclass w/ some key traits
+  * Add `list_optimizers`, `get_optimizer_class`, `get_optimizer_info` to reworked `create_optimizer_v2` fn to explore optimizers, get info or class
+  * deprecate `optim.optim_factory`, move fns to `optim/_optim_factory.py` and `optim/_param_groups.py` and encourage import via `timm.optim`
+* Add Adopt (https://github.com/iShohei220/adopt) optimizer
+* Add 'Big Vision' variant of Adafactor (https://github.com/google-research/big_vision/blob/main/big_vision/optax.py) optimizer
+* Fix original Adafactor to pick better factorization dims for convolutions
+* Tweak LAMB optimizer with some improvements in torch.where functionality since original, refactor clipping a bit
+* dynamic img size support in vit, deit, eva improved to support resize from non-square patch grids, thanks https://github.com/wojtke
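
The registry pattern behind the optimizer factory refactor can be sketched as a small toy. Everything here (`ToyOptimInfo`, its fields, and the `*_toy_*` functions) is hypothetical and only illustrates the idea of registering entries through a dataclass; it does not mirror the real `OptimInfo` fields.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class ToyOptimInfo:
    """Hypothetical stand-in for a registry entry; field names are assumptions."""
    name: str
    factory: Callable[..., Any]
    description: str = ""
    has_momentum: bool = False


_REGISTRY: Dict[str, ToyOptimInfo] = {}


def register_toy_optimizer(info: ToyOptimInfo) -> None:
    """Store an entry under its lower-cased name."""
    _REGISTRY[info.name.lower()] = info


def list_toy_optimizers() -> List[str]:
    """Return registered names in sorted order."""
    return sorted(_REGISTRY)


def get_toy_optimizer_info(name: str) -> ToyOptimInfo:
    """Look up an entry, raising ValueError for unknown names."""
    try:
        return _REGISTRY[name.lower()]
    except KeyError:
        raise ValueError(f"Unknown optimizer: {name}") from None


def create_toy_optimizer(name: str, **kwargs: Any) -> Any:
    """Build an optimizer by delegating to the registered factory."""
    return get_toy_optimizer_info(name).factory(**kwargs)


# Factories return plain config dicts so the sketch runs without torch.
register_toy_optimizer(
    ToyOptimInfo("sgd", lambda **kw: {"kind": "sgd", **kw}, "plain SGD", has_momentum=True)
)
register_toy_optimizer(
    ToyOptimInfo("lamb", lambda **kw: {"kind": "lamb", **kw}, "layer-wise adaptive")
)

print(list_toy_optimizers())                # ['lamb', 'sgd']
print(create_toy_optimizer("SGD", lr=0.1))  # {'kind': 'sgd', 'lr': 0.1}
```

In `timm` itself, the entry points named in this changelog (`list_optimizers`, `get_optimizer_class`, `get_optimizer_info`, `create_optimizer_v2`) are meant to be imported via `timm.optim`.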
+*
+## Oct 31, 2024
+Add a set of new very well trained ResNet & ResNet-V2 18/34 (basic block) weights. See https://huggingface.co/blog/rwightman/resnet-trick-or-treat
+
+## Oct 19, 2024
+* Cleanup torch amp usage to avoid cuda specific calls, merge support for Ascend (NPU) devices from [MengqingCao](https://github.com/MengqingCao) that should work now in PyTorch 2.5 w/ new device extension autoloading feature. Tested Intel Arc (XPU) in Pytorch 2.5 too and it (mostly) worked.
+
+## Oct 16, 2024
+* Fix error on importing from deprecated path `timm.models.registry`, increased priority of existing deprecation warnings to be visible
+* Port weights of InternViT-300M (https://huggingface.co/OpenGVLab/InternViT-300M-448px) to `timm` as `vit_intern300m_patch14_448`
+
+### Oct 14, 2024
+* Pre-activation (ResNetV2) version of 18/18d/34/34d ResNet model defs added by request (weights pending)
+* Release 1.0.10
+
+### Oct 11, 2024
+* MambaOut (https://github.com/yuweihao/MambaOut) model & weights added. A cheeky take on SSM vision models w/o the SSM (essentially ConvNeXt w/ gating). A mix of original weights + custom variations & weights.
+
+|model |img_size|top1 |top5 |param_count|
+|---------------------------------------------------------------------------------------------------------------------|--------|------|------|-----------|
+|[mambaout_base_plus_rw.sw_e150_r384_in12k_ft_in1k](http://huggingface.co/timm/mambaout_base_plus_rw.sw_e150_r384_in12k_ft_in1k)|384 |87.506|98.428|101.66 |
+|[mambaout_base_plus_rw.sw_e150_in12k_ft_in1k](http://huggingface.co/timm/mambaout_base_plus_rw.sw_e150_in12k_ft_in1k)|288 |86.912|98.236|101.66 |
+|[mambaout_base_plus_rw.sw_e150_in12k_ft_in1k](http://huggingface.co/timm/mambaout_base_plus_rw.sw_e150_in12k_ft_in1k)|224 |86.632|98.156|101.66 |
+|[mambaout_base_tall_rw.sw_e500_in1k](http://huggingface.co/timm/mambaout_base_tall_rw.sw_e500_in1k) |288 |84.974|97.332|86.48 |
+|[mambaout_base_wide_rw.sw_e500_in1k](http://huggingface.co/timm/mambaout_base_wide_rw.sw_e500_in1k) |288 |84.962|97.208|94.45 |
+|[mambaout_base_short_rw.sw_e500_in1k](http://huggingface.co/timm/mambaout_base_short_rw.sw_e500_in1k) |288 |84.832|97.27 |88.83 |
+|[mambaout_base.in1k](http://huggingface.co/timm/mambaout_base.in1k) |288 |84.72 |96.93 |84.81 |
+|[mambaout_small_rw.sw_e450_in1k](http://huggingface.co/timm/mambaout_small_rw.sw_e450_in1k) |288 |84.598|97.098|48.5 |
+|[mambaout_small.in1k](http://huggingface.co/timm/mambaout_small.in1k) |288 |84.5 |96.974|48.49 |
+|[mambaout_base_wide_rw.sw_e500_in1k](http://huggingface.co/timm/mambaout_base_wide_rw.sw_e500_in1k) |224 |84.454|96.864|94.45 |
+|[mambaout_base_tall_rw.sw_e500_in1k](http://huggingface.co/timm/mambaout_base_tall_rw.sw_e500_in1k) |224 |84.434|96.958|86.48 |
+|[mambaout_base_short_rw.sw_e500_in1k](http://huggingface.co/timm/mambaout_base_short_rw.sw_e500_in1k) |224 |84.362|96.952|88.83 |
+|[mambaout_base.in1k](http://huggingface.co/timm/mambaout_base.in1k) |224 |84.168|96.68 |84.81 |
+|[mambaout_small.in1k](http://huggingface.co/timm/mambaout_small.in1k) |224 |84.086|96.63 |48.49 |
+|[mambaout_small_rw.sw_e450_in1k](http://huggingface.co/timm/mambaout_small_rw.sw_e450_in1k) |224 |84.024|96.752|48.5 |
+|[mambaout_tiny.in1k](http://huggingface.co/timm/mambaout_tiny.in1k) |288 |83.448|96.538|26.55 |
+|[mambaout_tiny.in1k](http://huggingface.co/timm/mambaout_tiny.in1k) |224 |82.736|96.1 |26.55 |
+|[mambaout_kobe.in1k](http://huggingface.co/timm/mambaout_kobe.in1k) |288 |81.054|95.718|9.14 |
+|[mambaout_kobe.in1k](http://huggingface.co/timm/mambaout_kobe.in1k) |224 |79.986|94.986|9.14 |
+|[mambaout_femto.in1k](http://huggingface.co/timm/mambaout_femto.in1k) |288 |79.848|95.14 |7.3 |
+|[mambaout_femto.in1k](http://huggingface.co/timm/mambaout_femto.in1k) |224 |78.87 |94.408|7.3 |
+
+* SigLIP SO400M ViT fine-tunes on ImageNet-1k @ 378x378, added 378x378 option for existing SigLIP 384x384 models
+  * [vit_so400m_patch14_siglip_378.webli_ft_in1k](https://huggingface.co/timm/vit_so400m_patch14_siglip_378.webli_ft_in1k) - 89.42 top-1
+  * [vit_so400m_patch14_siglip_gap_378.webli_ft_in1k](https://huggingface.co/timm/vit_so400m_patch14_siglip_gap_378.webli_ft_in1k) - 89.03
+* SigLIP SO400M ViT encoder from recent multi-lingual (i18n) variant, patch16 @ 256x256 (https://huggingface.co/timm/ViT-SO400M-16-SigLIP-i18n-256). OpenCLIP update pending.
+* Add two ConvNeXt 'Zepto' models & weights (one w/ overlapped stem and one w/ patch stem). Uses RMSNorm, smaller than previous 'Atto', 2.2M params.
+  * [convnext_zepto_rms_ols.ra4_e3600_r224_in1k](https://huggingface.co/timm/convnext_zepto_rms_ols.ra4_e3600_r224_in1k) - 73.20 top-1 @ 224
+  * [convnext_zepto_rms.ra4_e3600_r224_in1k](https://huggingface.co/timm/convnext_zepto_rms.ra4_e3600_r224_in1k) - 72.81 @ 224
+
+### Sept 2024
+* Add a suite of tiny test models for improved unit tests and niche low-resource applications (https://huggingface.co/blog/rwightman/timm-tiny-test)
+* Add MobileNetV4-Conv-Small (0.5x) model (https://huggingface.co/posts/rwightman/793053396198664)
+  * [mobilenetv4_conv_small_050.e3000_r224_in1k](http://hf.co/timm/mobilenetv4_conv_small_050.e3000_r224_in1k) - 65.81 top-1 @ 256, 64.76 @ 224
+* Add MobileNetV3-Large variants trained with MNV4 Small recipe
+  * [mobilenetv3_large_150d.ra4_e3600_r256_in1k](http://hf.co/timm/mobilenetv3_large_150d.ra4_e3600_r256_in1k) - 81.81 @ 320, 80.94 @ 256
+  * [mobilenetv3_large_100.ra4_e3600_r224_in1k](http://hf.co/timm/mobilenetv3_large_100.ra4_e3600_r224_in1k) - 77.16 @ 256, 76.31 @ 224
+
+### Aug 21, 2024
+* Updated SBB ViT models trained on ImageNet-12k and fine-tuned on ImageNet-1k, challenging quite a number of much larger, slower models
+
+| model | top1 | top5 | param_count | img_size |
+| -------------------------------------------------- | ------ | ------ | ----------- | -------- |
+| [vit_mediumd_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k](https://huggingface.co/timm/vit_mediumd_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k) | 87.438 | 98.256 | 64.11 | 384 |
+| [vit_mediumd_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k](https://huggingface.co/timm/vit_mediumd_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k) | 86.608 | 97.934 | 64.11 | 256 |
+| [vit_betwixt_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k](https://huggingface.co/timm/vit_betwixt_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k) | 86.594 | 98.02 | 60.4 | 384 |
+| [vit_betwixt_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k](https://huggingface.co/timm/vit_betwixt_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k) | 85.734 | 97.61 | 60.4 | 256 |
+* MobileNet-V1 1.25, EfficientNet-B1, & ResNet50-D weights w/ MNV4 baseline challenge recipe
+
+| model | top1 | top5 | param_count | img_size |
+|--------------------------------------------------------------------------------------------------------------------------|--------|--------|-------------|----------|
+| [resnet50d.ra4_e3600_r224_in1k](http://hf.co/timm/resnet50d.ra4_e3600_r224_in1k) | 81.838 | 95.922 | 25.58 | 288 |
+| [efficientnet_b1.ra4_e3600_r240_in1k](http://hf.co/timm/efficientnet_b1.ra4_e3600_r240_in1k) | 81.440 | 95.700 | 7.79 | 288 |
+| [resnet50d.ra4_e3600_r224_in1k](http://hf.co/timm/resnet50d.ra4_e3600_r224_in1k) | 80.952 | 95.384 | 25.58 | 224 |
+| [efficientnet_b1.ra4_e3600_r240_in1k](http://hf.co/timm/efficientnet_b1.ra4_e3600_r240_in1k) | 80.406 | 95.152 | 7.79 | 240 |
+| [mobilenetv1_125.ra4_e3600_r224_in1k](http://hf.co/timm/mobilenetv1_125.ra4_e3600_r224_in1k) | 77.600 | 93.804 | 6.27 | 256 |
+| [mobilenetv1_125.ra4_e3600_r224_in1k](http://hf.co/timm/mobilenetv1_125.ra4_e3600_r224_in1k) | 76.924 | 93.234 | 6.27 | 224 |
+
+* Add SAM2 (HieraDet) backbone arch & weight loading support
+* Add Hiera Small weights trained w/ abswin pos embed on in12k & fine-tuned on 1k
+
+|model |top1 |top5 |param_count|
+|---------------------------------|------|------|-----------|
+|hiera_small_abswin_256.sbb2_e200_in12k_ft_in1k |84.912|97.260|35.01 |
+|hiera_small_abswin_256.sbb2_pd_e200_in12k_ft_in1k |84.560|97.106|35.01 |
+
 ### Aug 8, 2024
 * Add RDNet ('DenseNets Reloaded', https://arxiv.org/abs/2403.19588), thanks [Donghyun Kim](https://github.com/dhkim0225)

‎timm/version.py

+1 -1
@@ -1 +1 @@
-__version__ = '1.0.14.dev0'
+__version__ = '1.0.14'
