Releases: huggingface/pytorch-image-models
Release v1.0.21
Oct 16-20, 2025
- Add an impl of the Muon optimizer (based on https://github.com/KellerJordan/Muon) with customizations
- extra flexibility and improved handling for conv weights and fallbacks for weight shapes not suited for orthogonalization
- small speedup for NS iterations by reducing allocs and using fused (b)add(b)mm ops
- by default uses AdamW (or NAdamW if nesterov=True) updates if muon not suitable for parameter shape (or excluded via param group flag)
- like torch impl, select from several LR scale adjustment fns via adjust_lr_fn
- select from several NS coefficient presets or specify your own via ns_coefficients
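The NS (Newton-Schulz) iteration mentioned above can be sketched in plain Python. This is an illustrative sketch only, not the timm implementation: it uses the quintic coefficients from the reference KellerJordan/Muon repository, and the helper names (`matmul`, `transpose`, `newton_schulz`) are hypothetical. The real optimizer works on tensors with fused ops.

```python
import math

# Quintic coefficients used by the reference Muon repository.
NS_COEFFS = (3.4445, -4.7750, 2.0315)


def matmul(a, b):
    """Naive matrix product for lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def newton_schulz(g, steps=5, coeffs=NS_COEFFS, eps=1e-7):
    """Approximately orthogonalize a 2D matrix (hypothetical helper)."""
    a, b, c = coeffs
    # Normalize by the Frobenius norm so every singular value is <= 1.
    norm = math.sqrt(sum(v * v for row in g for v in row))
    x = [[v / (norm + eps) for v in row] for row in g]
    # Use the wide orientation so X @ X^T is the smaller Gram matrix.
    transposed = len(x) > len(x[0])
    if transposed:
        x = transpose(x)
    for _ in range(steps):
        gram = matmul(x, transpose(x))   # A = X X^T
        gram2 = matmul(gram, gram)       # A @ A
        poly = [[b * p + c * q for p, q in zip(r1, r2)] for r1, r2 in zip(gram, gram2)]
        px = matmul(poly, x)             # (b*A + c*A@A) @ X
        x = [[a * v + w for v, w in zip(r1, r2)] for r1, r2 in zip(x, px)]
    return transpose(x) if transposed else x


update = newton_schulz([[3.0, 1.0], [1.0, 2.0]])
```

After a handful of steps the singular values of `update` sit loosely around 1 rather than exactly at 1, which is the intended behaviour of these coefficients. Passing a different 3-tuple as `coeffs` mirrors the idea of choosing your own NS coefficients.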
 
- First 2 steps of 'meta' device model initialization supported
- Fix several ops that were breaking creation under 'meta' device context
- Add device & dtype factory kwarg support to all models and modules (anything inheriting from nn.Module) in timm
 
- License fields added to pretrained cfgs in code
- Release 1.0.21
What's Changed
- Add calculate_drop_path_rates helper by @rwightman in #2589
- Review huggingface_hub integration by @Wauplin in #2592
- Adding device/dtype factory_kwargs to modules and models by @rwightman in #2591
- Consistent license handling throughout timm by @alexanderdann in #2585
- Add impl of Muon optimizer. Fix #2580 by @rwightman in #2596
- Rename 'simple' flag for Muon to 'fallback' by @rwightman in #2599
New Contributors
- @alexanderdann made their first contribution in #2585
Full Changelog: v1.0.20...v1.0.21
Release v1.0.20
Sept 21, 2025
- Remap DINOv3 ViT weight tags from lvd_1689m -> lvd1689m to match (same for sat_493m -> sat493m)
- Release 1.0.20
Sept 17, 2025
- DINOv3 (https://arxiv.org/abs/2508.10104) ConvNeXt and ViT models added. ConvNeXt models were mapped to existing timm model. ViT support done via the EVA base model w/ a new RotaryEmbeddingDinoV3 to match the DINOv3 specific RoPE impl
- MobileCLIP-2 (https://arxiv.org/abs/2508.20691) vision encoders. New MCI3/MCI4 FastViT variants added and weights mapped to existing FastViT and B, L/14 ViTs.
- MetaCLIP-2 Worldwide (https://arxiv.org/abs/2507.22062) ViT encoder weights added.
- SigLIP-2 (https://arxiv.org/abs/2502.14786) NaFlex ViT encoder weights added via timm NaFlexViT model.
- Misc fixes and contributions
What's Changed
- Pass init_values at hieradet_sam2 by @hassonofer in #2559
- Add mobileclip2 encoder weights by @rwightman in #2560
- Add support for Gemma 3n MobileNetV5 encoder weight loading by @rwightman in #2561
- Fix #2562, add siglip2 naflex vit encoder weights by @rwightman in #2564
- fix: create results_dir if missing before saving results by @zhima771 in #2576
- feat(validate): add precision, recall, and F1 metrics by @ha405 in #2568
- Allow user to ask for features other than image and label in ImageDataset by @grodino in #2571
- Add MobileCLIP2 image encoders by @rwightman in #2578
- Add DINOv3 support by @rwightman in #2579
New Contributors
- @hassonofer made their first contribution in #2559
- @zhima771 made their first contribution in #2576
- @ha405 made their first contribution in #2568
Full Changelog: v1.0.19...v1.0.20
Release v1.0.19
Patch release for Python 3.9 compat break in 1.0.18
July 23, 2025
- Add set_input_size() method to EVA models, used by OpenCLIP 3.0.0 to allow resizing for timm based encoder models.
- Release 1.0.18, needed for PE-Core S & T models in OpenCLIP 3.0.0
- Fix small typing issue that broke Python 3.9 compat. 1.0.19 patch release.
July 21, 2025
- ROPE support added to NaFlexViT. All models covered by the EVA base (eva.py) including EVA, EVA02, Meta PE ViT, timm SBB ViT w/ ROPE, and Naver ROPE-ViT can now be loaded in NaFlexViT when use_naflex=True is passed at model creation time
- More Meta PE ViT encoders added, including small/tiny variants, lang variants w/ tiling, and more spatial variants.
- PatchDropout fixed with NaFlexViT and also w/ EVA models (regression after adding Naver ROPE-ViT)
- Fix XY order with grid_indexing='xy', impacted non-square image use in 'xy' mode (only ROPE-ViT and PE impacted).
What's Changed
- Add ROPE support to NaFlexVit (axial and mixed), and support most (all?) EVA based vit models & weights by @rwightman in #2552
- Support set_input_size() in EVA models by @rwightman in #2554
Full Changelog: v1.0.17...v1.0.18
Release v1.0.18
July 23, 2025
- Add set_input_size() method to EVA models, used by OpenCLIP 3.0.0 to allow resizing for timm based encoder models.
- Release 1.0.18, needed for PE-Core S & T models in OpenCLIP 3.0.0
July 21, 2025
- ROPE support added to NaFlexViT. All models covered by the EVA base (eva.py) including EVA, EVA02, Meta PE ViT, timm SBB ViT w/ ROPE, and Naver ROPE-ViT can now be loaded in NaFlexViT when use_naflex=True is passed at model creation time
- More Meta PE ViT encoders added, including small/tiny variants, lang variants w/ tiling, and more spatial variants.
- PatchDropout fixed with NaFlexViT and also w/ EVA models (regression after adding Naver ROPE-ViT)
- Fix XY order with grid_indexing='xy', impacted non-square image use in 'xy' mode (only ROPE-ViT and PE impacted).
What's Changed
- Add ROPE support to NaFlexVit (axial and mixed), and support most (all?) EVA based vit models & weights by @rwightman in #2552
- Support set_input_size() in EVA models by @rwightman in #2554
Full Changelog: v1.0.17...v1.0.18
Release v1.0.17
July 7, 2025
- MobileNet-v5 backbone tweaks for improved Google Gemma 3n behaviour (to pair with updated official weights)
- Add stem bias (zero'd in updated weights, compat break with old weights)
- GELU -> GELU (tanh approx). A minor change to be closer to JAX
 
- Add two arguments to layer-decay support, a min scale clamp and 'no optimization' scale threshold
- Add 'Fp32' LayerNorm, RMSNorm, SimpleNorm variants that can be enabled to force computation of norm in float32
- Some typing, argument cleanup for norm, norm+act layers done with above
- Support Naver ROPE-ViT (https://github.com/naver-ai/rope-vit) in eva.py, add RotaryEmbeddingMixed module for mixed mode, weights on HuggingFace Hub
| model | img_size | top1 | top5 | param_count | 
|---|---|---|---|---|
| vit_large_patch16_rope_mixed_ape_224.naver_in1k | 224 | 84.84 | 97.122 | 304.4 | 
| vit_large_patch16_rope_mixed_224.naver_in1k | 224 | 84.828 | 97.116 | 304.2 | 
| vit_large_patch16_rope_ape_224.naver_in1k | 224 | 84.65 | 97.154 | 304.37 | 
| vit_large_patch16_rope_224.naver_in1k | 224 | 84.648 | 97.122 | 304.17 | 
| vit_base_patch16_rope_mixed_ape_224.naver_in1k | 224 | 83.894 | 96.754 | 86.59 | 
| vit_base_patch16_rope_mixed_224.naver_in1k | 224 | 83.804 | 96.712 | 86.44 | 
| vit_base_patch16_rope_ape_224.naver_in1k | 224 | 83.782 | 96.61 | 86.59 | 
| vit_base_patch16_rope_224.naver_in1k | 224 | 83.718 | 96.672 | 86.43 | 
| vit_small_patch16_rope_224.naver_in1k | 224 | 81.23 | 95.022 | 21.98 | 
| vit_small_patch16_rope_mixed_224.naver_in1k | 224 | 81.216 | 95.022 | 21.99 | 
| vit_small_patch16_rope_ape_224.naver_in1k | 224 | 81.004 | 95.016 | 22.06 | 
| vit_small_patch16_rope_mixed_ape_224.naver_in1k | 224 | 80.986 | 94.976 | 22.06 | 
- Some cleanup of ROPE modules, helpers, and FX tracing leaf registration
- Preparing version 1.0.17 release
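The two layer-decay arguments noted above (a min scale clamp and a 'no optimization' scale threshold) can be pictured with a small sketch. The function name, argument names, and the choice to compare the clamped scale against the threshold are assumptions made for illustration, not timm's actual signature or logic.

```python
def layer_decay_scales(num_layers, decay=0.75, min_scale=0.0, no_opt_scale=None):
    """Return a (lr_scale, trainable) pair per layer, index 0 = closest to the input.

    Hypothetical helper: the last layer gets scale 1.0 and each earlier layer
    is multiplied by `decay` once more.
    """
    out = []
    for i in range(num_layers):
        raw = decay ** (num_layers - 1 - i)
        # Clamp so early layers never fall below min_scale.
        scale = max(raw, min_scale)
        # Layers whose scale is under the threshold are excluded from optimization.
        trainable = no_opt_scale is None or scale >= no_opt_scale
        out.append((scale, trainable))
    return out


scales = layer_decay_scales(4, decay=0.5, min_scale=0.2, no_opt_scale=0.25)
# -> [(0.2, False), (0.25, True), (0.5, True), (1.0, True)]
```

In a real training setup each scale would multiply the base LR of the matching param group, and the excluded layers would simply be left out of the optimizer (or have gradients disabled).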
What's Changed
- Adding Naver rope-vit compatibility to EVA ViT by @rwightman in #2529
- Update no_grad usage to inference_mode if possible by @GuillaumeErhard in #2534
- Add a min layer-decay scale clamp, and no optimization threshold to exclude groups from optimization by @rwightman in #2537
- Add stem_bias option to MNV5. Resolve the norm layer so can pass string. by @rwightman in #2538
- Add flag to enable float32 computation for normalization (norm + affine) by @rwightman in #2536
- fix: mnv5 conv_stem bias and GELU with approximate=tanh by @RyanMullins in #2533
- Fixup casting issues for weights/bias in fp32 norm layers by @rwightman in #2539
- Fix H, W ordering for xy indexing in ROPE by @rwightman in #2541
- Fix 3 typos in README.md by @robin-ede in #2544
New Contributors
- @GuillaumeErhard made their first contribution in #2534
- @RyanMullins made their first contribution in #2533
- @robin-ede made their first contribution in #2544
Full Changelog: v1.0.16...v1.0.17
Release v1.0.16
June 26, 2025
- MobileNetV5 backbone (w/ encoder only variant) for Gemma 3n image encoder
- Version 1.0.16 released
June 23, 2025
- Add F.grid_sample based 2D and factorized pos embed resize to NaFlexViT. Faster when lots of different sizes (based on example by https://github.com/stas-sl).
- Further speed up patch embed resample by replacing vmap with matmul (based on snippet by https://github.com/stas-sl).
- Add 3 initial native aspect NaFlexViT checkpoints created while testing, ImageNet-1k and 3 different pos embed configs w/ same hparams.
| Model | Top-1 Acc | Top-5 Acc | Params (M) | Eval Seq Len | 
|---|---|---|---|---|
| naflexvit_base_patch16_par_gap.e300_s576_in1k | 83.67 | 96.45 | 86.63 | 576 | 
| naflexvit_base_patch16_parfac_gap.e300_s576_in1k | 83.63 | 96.41 | 86.46 | 576 | 
| naflexvit_base_patch16_gap.e300_s576_in1k | 83.50 | 96.46 | 86.63 | 576 | 
- Support gradient checkpointing for forward_intermediates and fix some checkpointing bugs. Thanks https://github.com/brianhou0208
- Add 'corrected weight decay' (https://arxiv.org/abs/2506.02285) as option to AdamW (legacy), Adopt, Kron, Adafactor (BV), Lamb, LaProp, Lion, NadamW, RmsPropTF, SGDW optimizers
- Switch PE (perception encoder) ViT models to use native timm weights instead of remapping on the fly
- Fix cuda stream bug in prefetch loader
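For the 'corrected weight decay' option listed above, a rough sketch of the idea follows. It assumes, based on a reading of the linked paper, that the decoupled decay term is scaled by lr / max_lr in addition to the usual lr factor. The function and argument names are hypothetical and this is not copied from any timm optimizer.

```python
def decay_factor(lr, weight_decay, max_lr=None, corrected=False):
    """Multiplicative shrink applied to a parameter in one decoupled-decay step.

    Assumption: standard decay uses lr * wd, while the corrected form
    uses (lr ** 2 / max_lr) * wd.
    """
    if corrected:
        if max_lr is None:
            raise ValueError("corrected decay needs the schedule's peak LR")
        wd_scale = lr * lr / max_lr
    else:
        wd_scale = lr
    return 1.0 - wd_scale * weight_decay


peak = 1e-3
standard = decay_factor(0.5 * peak, 0.1)
corrected = decay_factor(0.5 * peak, 0.1, max_lr=peak, corrected=True)
```

At the peak LR both forms agree. As the schedule anneals, the corrected form decays the weights less, which is the behaviour the paper argues for near the end of training.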
June 5, 2025
- Initial NaFlexVit model code. NaFlexVit is a Vision Transformer with:
- Encapsulated embedding and position encoding in a single module
- Support for nn.Linear patch embedding on pre-patchified (dictionary) inputs
- Support for NaFlex variable aspect, variable resolution (SigLip-2: https://arxiv.org/abs/2502.14786)
- Support for FlexiViT variable patch size (https://arxiv.org/abs/2212.08013)
- Support for NaViT fractional/factorized position embedding (https://arxiv.org/abs/2307.06304)
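To make the variable aspect / variable resolution idea concrete, here is a minimal sketch of picking a patch grid that roughly preserves aspect ratio while staying under a sequence-length budget. The function name and rounding policy are assumptions for illustration and do not describe the exact resize rule used by timm or SigLIP-2.

```python
import math


def naflex_grid(height, width, patch_size=16, max_seq_len=256):
    """Pick (grid_h, grid_w) with grid_h * grid_w <= max_seq_len (hypothetical helper)."""
    # Uniform scale that would make the patch count hit the budget exactly.
    scale = math.sqrt(max_seq_len * patch_size ** 2 / (height * width))
    grid_h = min(max(1, math.floor(height * scale / patch_size)), max_seq_len)
    # Clamp the width so extreme aspect ratios still respect the budget.
    grid_w = min(max(1, math.floor(width * scale / patch_size)), max_seq_len // grid_h)
    return grid_h, grid_w


grid = naflex_grid(480, 640, patch_size=16, max_seq_len=256)
# A 480x640 image would be resized to (grid[0] * 16, grid[1] * 16) before patchifying.
```

Images with different aspect ratios end up with different grids but similar token counts, which is what allows them to share a batch once padded to a common sequence length.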
 
- Existing vit models in vision_transformer.py can be loaded into the NaFlexVit model by adding the use_naflex=True flag to create_model
- Some native weights coming soon
 
- A full NaFlex data pipeline is available that allows training / fine-tuning / evaluating with variable aspect / size images
- To enable in train.py and validate.py add the --naflex-loader arg, must be used with a NaFlexVit
 
- To evaluate an existing (classic) ViT loaded in NaFlexVit model w/ NaFlex data pipe:
- python validate.py /imagenet --amp -j 8 --model vit_base_patch16_224 --model-kwargs use_naflex=True --naflex-loader --naflex-max-seq-len 256
 
- Training has some extra args worth noting
- The --naflex-train-seq-lens argument specifies which sequence lengths to randomly pick from per batch during training
- The --naflex-max-seq-len argument sets the target sequence length for validation
- Adding --model-kwargs enable_patch_interpolator=True --naflex-patch-sizes 12 16 24 will enable random patch size selection per-batch w/ interpolation
- The --naflex-loss-scale arg changes loss scaling mode per batch relative to the batch size, timm NaFlex loading changes the batch size for each seq len
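The note that the batch size changes with each sequence length can be pictured as keeping a roughly constant token budget per batch. The sketch below is a guess at that relationship for illustration only. The function name, defaults, and rounding to a multiple of 8 are assumptions, not the loader's actual behaviour.

```python
def batch_size_for_seq_len(seq_len, base_batch_size=128, base_seq_len=256, divisor=8):
    """Scale batch size inversely with sequence length (hypothetical helper)."""
    token_budget = base_batch_size * base_seq_len
    batch_size = token_budget // seq_len
    # Round down to a multiple of `divisor`, but never below one multiple.
    return max(divisor, batch_size // divisor * divisor)


# Batch sizes for a few candidate training sequence lengths.
schedule = {s: batch_size_for_seq_len(s) for s in (128, 256, 576, 1024)}
```

Because the number of samples per step varies under such a scheme, a per-batch loss scale relative to the batch size is one way to keep update magnitudes comparable across sequence lengths.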
 
May 28, 2025
- Add a number of small/fast models thanks to https://github.com/brianhou0208
- SwiftFormer - (ICCV2023) SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications
- FasterNet - (CVPR2023) Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks
- SHViT - (CVPR2024) SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design
- StarNet - (CVPR2024) Rewrite the Stars
- GhostNet-V3 GhostNetV3: Exploring the Training Strategies for Compact Models
 
- Update EVA ViT (closest match) to support Perception Encoder models (https://arxiv.org/abs/2504.13181) from Meta, loading Hub weights but I still need to push dedicated timm weights
- Add some flexibility to ROPE impl
 
- Big increase in number of models supporting forward_intermediates() and some additional fixes thanks to https://github.com/brianhou0208
- DaViT, EdgeNeXt, EfficientFormerV2, EfficientViT(MIT), EfficientViT(MSRA), FocalNet, GCViT, HGNet /V2, InceptionNeXt, Inception-V4, MambaOut, MetaFormer, NesT, Next-ViT, PiT, PVT V2, RepGhostNet, RepViT, ResNetV2, ReXNet, TinyViT, TResNet, VoV
 
- TNT model updated w/ new weights and forward_intermediates() support thanks to https://github.com/brianhou0208
- Add local-dir: pretrained schema, can use local-dir:/path/to/model/folder for model name to source model / pretrained cfg & weights Hugging Face Hub models (config.json + weights file) from a local folder.
- Fixes, improvements for onnx export
What's Changed
- Fix arg merging of sknet, old seresnet. Fix #2470 by @rwightman in #2471
- Fix onnx export by @rwightman in #2475
- Add local-dir: schema support for model loading (config + weights) from folder by @rwightman in #2476
- Fix: Allow img_size to be int or tuple in PatchEmbed by @sddongxh in #2477
- Add LightlyTrain Integration for Pretraining Support by @yutong-xiang-97 in #2474
- Check forward_intermediates features against forward_features output by @rwightman in #2483
- More models support forward_intermediates by @brianhou0208 in #2482
- Update README.md by @atharva-pathak in #2484
- remove download argument from torch_kwargs for torchvision ImageNet class by @ryan-caesar-ramos in #2486
- Update TNT-(S/B) model weights and add feature extraction support by @brianhou0208 in #2480
- Add EVA ViT based PE (Perceptual Encoder) impl by @rwightman in #2487
- Add SwiftFormer, SHViT, StarNet, FasterNet and GhostNetV3 by @brianhou0208 in #2499
- A cleaned up beit3 remap onto vision_transformer.py vit by @rwightman in #2503
- Initial NaFlex ViT model and training support by @rwightman in #2466
- Forgot to compact attention pool branches after verifying by @rwightman in #2507
- Throw exception on non-directory path for pretrained weights by @emmanuel-ferdman in #2510
- Add corrected_weight decay to several optimizers by @rwightman in #2511
- Doing some Claude enabled docstring, type annotation and other cleanup by @rwightman in #2504
- Fix #2513, be explicit about stream devices by @rwightman in #2515
- Update legacy AdamW impl so it has a multi-tensor impl like NAdamW (n… by @rwightman in #2517
- Fix head_dim reference in AttentionRope class of attention.py by @amorehead in #2519
- Refactor patch and pos embed resampling based on feedback from https://github.com/stas-sl by @rwightman in #2518
- Add initial weights for my first 3 naflexvit_base models by @rwightman in #2523
- Support gradient checkpointing in forward_intermediates() by @brianhou0208 in #2501
- Update README: add references for additional supported models by @brianhou0208 in #2526
- MobileNetV5 by @rwightman in #2527
New Contributors
- @sddongxh made their first contribution in #2477
- @yutong-xiang-97 made their first contribution in #2474
- @atharva-pathak made their first contribution in #2484
- @ryan-caesar-ramos made their first contribution in #2486
- @emmanuel-ferdman made their first contribution in #2510
- @amorehead made their first contribution in #2519
Full Changelog: v1.0.15...v1.0.16
Release v1.0.15
Feb 21, 2025
- SigLIP 2 ViT image encoders added (https://huggingface.co/collections/timm/siglip-2-67b8e72ba08b09dd97aecaf9)
- Variable resolution / aspect NaFlex versions are a WIP
 
- Add 'SO150M2' ViT weights trained with SBB recipes, great results, better for ImageNet than previous attempt w/ less training.
- vit_so150m2_patch16_reg1_gap_448.sbb_e200_in12k_ft_in1k - 88.1% top-1
- vit_so150m2_patch16_reg1_gap_384.sbb_e200_in12k_ft_in1k - 87.9% top-1
- vit_so150m2_patch16_reg1_gap_256.sbb_e200_in12k_ft_in1k - 87.3% top-1
- vit_so150m2_patch16_reg4_gap_256.sbb_e200_in12k
 
- Updated InternViT-300M '2.5' weights
- Release 1.0.15
Feb 1, 2025
- FYI PyTorch 2.6 & Python 3.13 are tested and working w/ current main and released version of timm
Jan 27, 2025
- Add Kron Optimizer (PSGD w/ Kronecker-factored preconditioner)
What's Changed
- Fix metavar for --input-size by @JosuaRieder in #2417
- Add arguments to the respective argument groups by @JosuaRieder in #2416
- Add missing training flag to convert_sync_batchnorm by @collinmccarthy in #2423
- Fix num_classes update in reset_classifier and RDNet forward head call by @brianhou0208 in #2421
- timm: add all to init by @adamjstewart in #2399
- Fiddling with Kron (PSGD) optimizer by @rwightman in #2427
- Try to force numpy<2.0 for torch 1.13 tests, update newest tested torch to 2.5.1 by @rwightman in #2429
- Kron flatten improvements + stochastic weight decay by @rwightman in #2431
- PSGD: unify RNG by @ClashLuke in #2433
- Add vit so150m2 weights by @rwightman in #2439
- adapt_input_conv: add type hints by @adamjstewart in #2441
- SigLIP 2 by @rwightman in #2440
- timm.models: explicitly export attributes by @adamjstewart in #2442
New Contributors
- @collinmccarthy made their first contribution in #2423
- @ClashLuke made their first contribution in #2433
Full Changelog: v1.0.14...v1.0.15
Release v1.0.14
Jan 19, 2025
- Fix loading of LeViT safetensor weights, remove conversion code which should have been deactivated
- Add 'SO150M' ViT weights trained with SBB recipes, decent results, but not optimal shape for ImageNet-12k/1k pretrain/ft
- vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k_ft_in1k - 86.7% top-1
- vit_so150m_patch16_reg4_gap_384.sbb_e250_in12k_ft_in1k - 87.4% top-1
- vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k
 
- Misc typing, typo, etc. cleanup
- 1.0.14 release to get above LeViT fix out
What's Changed
- Fix nn.Module type hints by @adamjstewart in #2400
- Add missing paper title by @JosuaRieder in #2405
- fix 'timm recipe scripts' link by @JosuaRieder in #2404
- fix typo in EfficientNet docs by @JosuaRieder in #2403
- disable abbreviating csv inference output with ellipses by @JosuaRieder in #2402
- fix incorrect LaTeX formulas by @JosuaRieder in #2406
- VGG ConvMlp: fix layer defaults/types by @adamjstewart in #2409
- Implement --no-console-results in inference.py by @JosuaRieder in #2408
- LeViT safetensors load is broken by conversion code that wasn't deactivated by @rwightman in #2412
- A few more weights by @rwightman in #2413
- Fix typos by @JosuaRieder in #2415
New Contributors
- @adamjstewart made their first contribution in #2400
Full Changelog: v1.0.13...v1.0.14
Release v1.0.13
Jan 9, 2025
- Add support to train and validate in pure bfloat16 or float16
- wandb project name arg added by https://github.com/caojiaolong, use arg.experiment for name
- Fix old issue w/ checkpoint saving not working on filesystem w/o hard-link support (e.g. FUSE fs mounts)
- 1.0.13 release
Jan 6, 2025
- Add torch.utils.checkpoint.checkpoint() wrapper in timm.models that defaults use_reentrant=False, unless TIMM_REENTRANT_CKPT=1 is set in env.
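The env-var default described above follows a common wrapper pattern, sketched here without importing torch. The helper names are hypothetical, and `backend` stands in for torch.utils.checkpoint.checkpoint. How timm parses the variable internally is an assumption.

```python
import os
from typing import Any, Callable, Mapping, Optional


def resolve_use_reentrant(explicit: Optional[bool] = None,
                          env: Optional[Mapping[str, str]] = None) -> bool:
    """An explicit argument wins, otherwise fall back to the env var, defaulting to False."""
    if explicit is not None:
        return explicit
    env = os.environ if env is None else env
    # Assumption: only the literal value "1" switches reentrant mode on.
    return env.get("TIMM_REENTRANT_CKPT", "0") == "1"


def make_checkpoint(backend: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a checkpoint function so use_reentrant always gets a resolved value."""
    def checkpoint(fn, *args, use_reentrant: Optional[bool] = None, **kwargs):
        return backend(fn, *args, use_reentrant=resolve_use_reentrant(use_reentrant), **kwargs)
    return checkpoint


# Stand-in backend that just reports what it received and calls fn directly.
def fake_backend(fn, *args, use_reentrant, **kwargs):
    return fn(*args, **kwargs), use_reentrant


checkpoint = make_checkpoint(fake_backend)
result = checkpoint(lambda x: x * 2, 21, use_reentrant=True)
```

Routing every checkpoint call through one wrapper means the default can be flipped in one place, or from the environment, without touching model code.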
Dec 31, 2024
- convnext_nano 384x384 ImageNet-12k pretrain & fine-tune. https://huggingface.co/models?search=convnext_nano%20r384
- Add AIM-v2 encoders from https://github.com/apple/ml-aim, see on Hub: https://huggingface.co/models?search=timm%20aimv2
- Add PaliGemma2 encoders from https://github.com/google-research/big_vision to existing PaliGemma, see on Hub: https://huggingface.co/models?search=timm%20pali2
- Add missing L/14 DFN2B 39B CLIP ViT, vit_large_patch14_clip_224.dfn2b_s39b
- Fix existing RmsNorm layer & fn to match standard formulation, use PT 2.5 impl when possible. Move old impl to SimpleNorm layer, it's LN w/o centering or bias. There were only two timm models using it, and they have been updated.
- Allow override of cache_dir arg for model creation
- Pass through trust_remote_code for HF datasets wrapper
- inception_next_atto model added by creator
- Adan optimizer caution, and Lamb decoupled weight decay options
- Some feature_info metadata fixed by https://github.com/brianhou0208
- All OpenCLIP and JAX (CLIP, SigLIP, Pali, etc) model weights that used load time remapping were given their own HF Hub instances so that they work with hf-hub: based loading, and thus will work with new Transformers TimmWrapperModel
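The RmsNorm vs SimpleNorm distinction in the list above can be seen on a single vector. This is a pure-Python sketch under stated assumptions: the standard RMS norm divides by the root mean square, and 'LN w/o centering or bias' is read as dividing the uncentered input by the standard deviation (population variance assumed here). The function names are illustrative, not timm's.

```python
import math


def rms_norm(x, weight=None, eps=1e-6):
    """Standard formulation: x / sqrt(mean(x^2) + eps) * weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv = 1.0 / math.sqrt(mean_sq + eps)
    weight = [1.0] * len(x) if weight is None else weight
    return [v * inv * w for v, w in zip(x, weight)]


def simple_norm(x, weight=None, eps=1e-6):
    """LayerNorm-style scale without centering or bias: x / sqrt(var(x) + eps) * weight."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)  # assumption: population variance
    inv = 1.0 / math.sqrt(var + eps)
    weight = [1.0] * len(x) if weight is None else weight
    return [v * inv * w for v, w in zip(x, weight)]


a = rms_norm([3.0, 4.0], eps=0.0)
b = simple_norm([3.0, 4.0], eps=0.0)
```

The two only coincide when the input already has zero mean, which is why models trained with the old behaviour needed to be moved to the SimpleNorm layer rather than silently switched.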
What's Changed
- Punch cache_dir through model factory / builder / pretrain helpers by @rwightman in #2356
- Yuweihao inception next atto merge by @rwightman in #2360
- Dataset trust remote tweaks by @rwightman in #2361
- Add --dataset-trust-remote-code to the train.py and validate.py scripts by @grodino in #2328
- Fix feature_info.reduction by @brianhou0208 in #2369
- Add caution to Adan. Add decouple decay option to LAMB. by @rwightman in #2357
- Switching to timm specific weight instances for open_clip image encoders by @rwightman in #2376
- Fix broken image link in Quickstart doc by @ariG23498 in #2381
- Supporting aimv2 encoders by @rwightman in #2379
- fix: minor typos in markdowns by @ruidazeng in #2382
- Add 384x384 in12k pretrain and finetune for convnext_nano by @rwightman in #2384
- Fixed unfused attn2d scale by @laclouis5 in #2387
- Fix MQA V2 by @laclouis5 in #2388
- Wrap torch checkpoint() fn to default use_reentrant flag to False and allow env var override by @rwightman in #2394
- Add half-precision (bfloat16, float16) support to train & validate scripts by @rwightman in #2397
- Merging wandb project name changes w/ addition by @rwightman in #2398
New Contributors
- @brianhou0208 made their first contribution in #2369
- @ariG23498 made their first contribution in #2381
- @ruidazeng made their first contribution in #2382
- @laclouis5 made their first contribution in #2387
Full Changelog: v1.0.12...v1.0.13
Release v1.0.12
Nov 28, 2024
- More optimizers
- Add MARS optimizer (https://arxiv.org/abs/2411.10438, https://github.com/AGI-Arena/MARS)
- Add LaProp optimizer (https://arxiv.org/abs/2002.04839, https://github.com/Z-T-WANG/LaProp-Optimizer)
- Add masking from 'Cautious Optimizers' (https://arxiv.org/abs/2411.16085, https://github.com/kyleliang919/C-Optim) to Adafactor, Adafactor Big Vision, AdamW (legacy), Adopt, Lamb, LaProp, Lion, NadamW, RMSPropTF, SGDW
- Cleanup some docstrings and type annotations re optimizers and factory
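The masking from 'Cautious Optimizers' referenced above amounts to zeroing update components whose sign disagrees with the current gradient, then rescaling by the fraction kept. A minimal list-based sketch follows, modelled on the reference C-Optim code. The function name and the eps default are assumptions.

```python
def cautious_update(update, grad, eps=1e-3):
    """Keep only update entries that agree in sign with grad, rescaled by the kept fraction."""
    mask = [1.0 if u * g > 0 else 0.0 for u, g in zip(update, grad)]
    # Mean of the mask, clamped so an all-zero mask does not divide by zero.
    kept = max(sum(mask) / len(mask), eps)
    return [u * m / kept for u, m in zip(update, mask)]


masked = cautious_update([1.0, -2.0, 3.0, 4.0], [1.0, 1.0, -1.0, 2.0])
# Entries 0 and 3 agree in sign, so half the mask is kept and survivors are doubled.
```

The same masking step slots in just before the parameter update of each listed optimizer, with `update` being whatever momentum-style direction that optimizer would normally apply.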
 
- Add MobileNet-V4 Conv Medium models pretrained on in12k and fine-tuned in1k @ 384x384
- Add small cs3darknet, quite good for the speed
Nov 12, 2024
- Optimizer factory refactor
- New factory works by registering optimizers using an OptimInfo dataclass w/ some key traits
- Add list_optimizers, get_optimizer_class, get_optimizer_info to reworked create_optimizer_v2 fn to explore optimizers, get info or class
- deprecate optim.optim_factory, move fns to optim/_optim_factory.py and optim/_param_groups.py and encourage import via timm.optim
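The registry-style factory described above can be illustrated with a toy version. Everything here is hypothetical: the class names carry a `Sketch` suffix to make clear they are not timm's, and the dataclass fields are guesses at what 'key traits' might look like.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class OptimInfoSketch:
    """Illustrative stand-in for an optimizer info record."""
    name: str
    opt_class: type
    description: str = ""
    has_momentum: bool = False


class OptimizerRegistrySketch:
    def __init__(self) -> None:
        self._infos: Dict[str, OptimInfoSketch] = {}

    def register(self, info: OptimInfoSketch) -> None:
        self._infos[info.name.lower()] = info

    def list_optimizers(self, contains: str = "") -> List[str]:
        return sorted(n for n in self._infos if contains.lower() in n)

    def get_optimizer_info(self, name: str) -> OptimInfoSketch:
        return self._infos[name.lower()]

    def get_optimizer_class(self, name: str) -> type:
        return self.get_optimizer_info(name).opt_class


class SGDStub:      # placeholder classes standing in for real optimizers
    pass


class AdamWStub:
    pass


registry = OptimizerRegistrySketch()
registry.register(OptimInfoSketch("sgd", SGDStub, "SGD w/ momentum", has_momentum=True))
registry.register(OptimInfoSketch("adamw", AdamWStub, "Adam w/ decoupled decay"))
names = registry.list_optimizers()
```

In timm itself the corresponding entry points are the list_optimizers, get_optimizer_class, and get_optimizer_info functions named above, imported via timm.optim.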
 
- Add Adopt (https://github.com/iShohei220/adopt) optimizer
- Add 'Big Vision' variant of Adafactor (https://github.com/google-research/big_vision/blob/main/big_vision/optax.py) optimizer
- Fix original Adafactor to pick better factorization dims for convolutions
- Tweak LAMB optimizer with some improvements in torch.where functionality since original, refactor clipping a bit
- dynamic img size support in vit, deit, eva improved to support resize from non-square patch grids, thanks https://github.com/wojtke
Oct 31, 2024
- Add a set of new very well trained ResNet & ResNet-V2 18/34 (basic block) weights. See https://huggingface.co/blog/rwightman/resnet-trick-or-treat
Oct 19, 2024
- Cleanup torch amp usage to avoid cuda specific calls, merge support for Ascend (NPU) devices from MengqingCao that should work now in PyTorch 2.5 w/ new device extension autoloading feature. Tested Intel Arc (XPU) in Pytorch 2.5 too and it (mostly) worked.
What's Changed
- mambaout.py: fixed bug by @NightMachinery in #2305
- Cleanup some amp related behaviour to better support different (non-cuda) devices by @rwightman in #2308
- Add NPU backend support for val and inference by @MengqingCao in #2109
- Update some clip pretrained weights to point to new hub locations by @rwightman in #2311
- ResNet vs MNV4 v1/v2 18 & 34 weights by @rwightman in #2316
- Replace deprecated positional argument with --data-dir by @JosuaRieder in #2322
- Fix typo in train.py: bathes > batches by @JosuaRieder in #2321
- Fix positional embedding resampling for non-square inputs in ViT by @wojtke in #2317
- Add trust_remote_code argument to ReaderHfds by @grodino in #2326
- Extend train epoch schedule by warmup_epochs if warmup_prefix enabled by @rwightman in #2325
- Extend existing unit tests using Cover-Agent by @mrT23 in #2331
- An impl of adafactor as per big vision (scaling vit) changes by @rwightman in #2320
- Add py.typed file as recommended by PEP 561 by @antoinebrl in #2252
- Add CODE_OF_CONDUCT.md and CITATION.cff files by @AlinaImtiaz018 in #2333
- Add some 384x384 small model weights by @rwightman in #2334
- In dist training, update loss running avg every step, sync on log by @rwightman in #2340
- Improve WandB logging by @sinahmr in #2341
- A few weights to merge Friday by @rwightman in #2343
- Update timm torchvision resnet weight urls to the updated urls in torchvision by @JohannesTheo in #2346
- More optimizer updates, add MARS, LaProp, add Adopt fix and more by @rwightman in #2347
- Cautious optimizer impl plus some typing cleanup. by @rwightman in #2349
- Add cautious mars, improve test reliability by skipping grad diff for… by @rwightman in #2351
- See if we can avoid some model / layer pickle issues with the aa attr in ConvNormAct by @rwightman in #2353
New Contributors
- @MengqingCao made their first contribution in #2109
- @JosuaRieder made their first contribution in #2322
- @wojtke made their first contribution in #2317
- @grodino made their first contribution in #2326
- @AlinaImtiaz018 made their first contribution in #2333
- @sinahmr made their first contribution in #2341
- @JohannesTheo made their first contribution in #2346
Full Changelog: v1.0.11...v1.0.12