
Commit 8d81fdf

JosuaRieder authored and rwightman committed
Fix typos
1 parent 3677f67 commit 8d81fdf

33 files changed: +48 −48 lines

CONTRIBUTING.md

+1 −1

@@ -10,7 +10,7 @@ Code linting and auto-format (black) are not currently in place but open to cons
 
 A few specific differences from Google style (or black)
 1. Line length is 120 char. Going over is okay in some cases (e.g. I prefer not to break URL across lines).
-2. Hanging indents are always prefered, please avoid aligning arguments with closing brackets or braces.
+2. Hanging indents are always preferred, please avoid aligning arguments with closing brackets or braces.
 
 Example, from Google guide, but this is a NO here:
 ```
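To make the preferred and rejected styles concrete, a small sketch (the function and argument names are hypothetical, not from the repo):

```python
# Preferred: hanging indent, each argument on its own indented line.
def create_block(
        in_chs: int,
        out_chs: int,
        stride: int = 1,
):
    ...


# Avoided here (Google-style alignment with the opening bracket):
def create_block(in_chs: int,
                 out_chs: int,
                 stride: int = 1):
    ...
```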

README.md

+1 −1

@@ -238,7 +238,7 @@ Add a set of new very well trained ResNet & ResNet-V2 18/34 (basic block) weight
 ### May 14, 2024
 * Support loading PaliGemma jax weights into SigLIP ViT models with average pooling.
 * Add Hiera models from Meta (https://github.com/facebookresearch/hiera).
-* Add `normalize=` flag for transorms, return non-normalized torch.Tensor with original dytpe (for `chug`)
+* Add `normalize=` flag for transforms, return non-normalized torch.Tensor with original dytpe (for `chug`)
 * Version 1.0.3 release
 
 ### May 11, 2024

hfdocs/source/changes.mdx

+5 −5

@@ -93,7 +93,7 @@
 ### May 14, 2024
 * Support loading PaliGemma jax weights into SigLIP ViT models with average pooling.
 * Add Hiera models from Meta (https://github.com/facebookresearch/hiera).
-* Add `normalize=` flag for transorms, return non-normalized torch.Tensor with original dytpe (for `chug`)
+* Add `normalize=` flag for transforms, return non-normalized torch.Tensor with original dytpe (for `chug`)
 * Version 1.0.3 release
 
 ### May 11, 2024
@@ -125,7 +125,7 @@
 ### April 11, 2024
 * Prepping for a long overdue 1.0 release, things have been stable for a while now.
 * Significant feature that's been missing for a while, `features_only=True` support for ViT models with flat hidden states or non-std module layouts (so far covering `'vit_*', 'twins_*', 'deit*', 'beit*', 'mvitv2*', 'eva*', 'samvit_*', 'flexivit*'`)
-* Above feature support achieved through a new `forward_intermediates()` API that can be used with a feature wrapping module or direclty.
+* Above feature support achieved through a new `forward_intermediates()` API that can be used with a feature wrapping module or directly.
 ```python
 model = timm.create_model('vit_base_patch16_224')
 final_feat, intermediates = model.forward_intermediates(input)
@@ -360,7 +360,7 @@ Datasets & transform refactoring
 * 0.8.15dev0
 
 ### Feb 20, 2023
-* Add 320x320 `convnext_large_mlp.clip_laion2b_ft_320` and `convnext_lage_mlp.clip_laion2b_ft_soup_320` CLIP image tower weights for features & fine-tune
+* Add 320x320 `convnext_large_mlp.clip_laion2b_ft_320` and `convnext_large_mlp.clip_laion2b_ft_soup_320` CLIP image tower weights for features & fine-tune
 * 0.8.13dev0 pypi release for latest changes w/ move to huggingface org
 
 ### Feb 16, 2023
@@ -745,7 +745,7 @@ More models, more fixes
 * Add 'group matching' API to all models to allow grouping model parameters for application of 'layer-wise' LR decay, lr scale added to LR scheduler
 * Gradient checkpointing support added to many models
 * `forward_head(x, pre_logits=False)` fn added to all models to allow separate calls of `forward_features` + `forward_head`
-* All vision transformer and vision MLP models update to return non-pooled / non-token selected features from `foward_features`, for consistency with CNN models, token selection or pooling now applied in `forward_head`
+* All vision transformer and vision MLP models update to return non-pooled / non-token selected features from `forward_features`, for consistency with CNN models, token selection or pooling now applied in `forward_head`
 
 ### Feb 2, 2022
 * [Chris Hughes](https://github.com/Chris-hughes10) posted an exhaustive run through of `timm` on his blog yesterday. Well worth a read. [Getting Started with PyTorch Image Models (timm): A Practitioner’s Guide](https://towardsdatascience.com/getting-started-with-pytorch-image-models-timm-a-practitioners-guide-4e77b4bf9055)
@@ -1058,7 +1058,7 @@ More models, more fixes
 * Add 'group matching' API to all models to allow grouping model parameters for application of 'layer-wise' LR decay, lr scale added to LR scheduler
 * Gradient checkpointing support added to many models
 * `forward_head(x, pre_logits=False)` fn added to all models to allow separate calls of `forward_features` + `forward_head`
-* All vision transformer and vision MLP models update to return non-pooled / non-token selected features from `foward_features`, for consistency with CNN models, token selection or pooling now applied in `forward_head`
+* All vision transformer and vision MLP models update to return non-pooled / non-token selected features from `forward_features`, for consistency with CNN models, token selection or pooling now applied in `forward_head`
 
 ### Feb 2, 2022
 * [Chris Hughes](https://github.com/Chris-hughes10) posted an exhaustive run through of `timm` on his blog yesterday. Well worth a read. [Getting Started with PyTorch Image Models (timm): A Practitioner’s Guide](https://towardsdatascience.com/getting-started-with-pytorch-image-models-timm-a-practitioners-guide-4e77b4bf9055)
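As context for the `forward_intermediates()` entries above, a minimal usage sketch (assuming timm >= 1.0; the `indices` argument and negative indexing are per the current API, so treat the exact call shape as an assumption for other versions):

```python
import torch
import timm

model = timm.create_model('vit_base_patch16_224', pretrained=False).eval()
x = torch.randn(1, 3, 224, 224)

# Final (non-pooled) features plus intermediate feature maps from the last two blocks.
final_feat, intermediates = model.forward_intermediates(x, indices=[-2, -1])
print(final_feat.shape)   # token sequence from the final block, pre-pooling
for t in intermediates:
    print(t.shape)        # NCHW feature maps by default for ViT
```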

hfdocs/source/models/adversarial-inception-v3.mdx

+1 −1

@@ -1,6 +1,6 @@
 # Adversarial Inception v3
 
-**Inception v3** is a convolutional neural network architecture from the Inception family that makes several improvements including using [Label Smoothing](https://paperswithcode.com/method/label-smoothing), Factorized 7 x 7 convolutions, and the use of an [auxiliary classifer](https://paperswithcode.com/method/auxiliary-classifier) to propagate label information lower down the network (along with the use of batch normalization for layers in the sidehead). The key building block is an [Inception Module](https://paperswithcode.com/method/inception-v3-module).
+**Inception v3** is a convolutional neural network architecture from the Inception family that makes several improvements including using [Label Smoothing](https://paperswithcode.com/method/label-smoothing), Factorized 7 x 7 convolutions, and the use of an [auxiliary classifier](https://paperswithcode.com/method/auxiliary-classifier) to propagate label information lower down the network (along with the use of batch normalization for layers in the sidehead). The key building block is an [Inception Module](https://paperswithcode.com/method/inception-v3-module).
 
 This particular model was trained for study of adversarial examples (adversarial training).
 

hfdocs/source/models/gloun-inception-v3.mdx

+1 −1

@@ -1,6 +1,6 @@
 # (Gluon) Inception v3
 
-**Inception v3** is a convolutional neural network architecture from the Inception family that makes several improvements including using [Label Smoothing](https://paperswithcode.com/method/label-smoothing), Factorized 7 x 7 convolutions, and the use of an [auxiliary classifer](https://paperswithcode.com/method/auxiliary-classifier) to propagate label information lower down the network (along with the use of batch normalization for layers in the sidehead). The key building block is an [Inception Module](https://paperswithcode.com/method/inception-v3-module).
+**Inception v3** is a convolutional neural network architecture from the Inception family that makes several improvements including using [Label Smoothing](https://paperswithcode.com/method/label-smoothing), Factorized 7 x 7 convolutions, and the use of an [auxiliary classifier](https://paperswithcode.com/method/auxiliary-classifier) to propagate label information lower down the network (along with the use of batch normalization for layers in the sidehead). The key building block is an [Inception Module](https://paperswithcode.com/method/inception-v3-module).
 
 The weights from this model were ported from [Gluon](https://cv.gluon.ai/model_zoo/classification.html).
 

hfdocs/source/models/inception-v3.mdx

+1 −1

@@ -1,6 +1,6 @@
 # Inception v3
 
-**Inception v3** is a convolutional neural network architecture from the Inception family that makes several improvements including using [Label Smoothing](https://paperswithcode.com/method/label-smoothing), Factorized 7 x 7 convolutions, and the use of an [auxiliary classifer](https://paperswithcode.com/method/auxiliary-classifier) to propagate label information lower down the network (along with the use of batch normalization for layers in the sidehead). The key building block is an [Inception Module](https://paperswithcode.com/method/inception-v3-module).
+**Inception v3** is a convolutional neural network architecture from the Inception family that makes several improvements including using [Label Smoothing](https://paperswithcode.com/method/label-smoothing), Factorized 7 x 7 convolutions, and the use of an [auxiliary classifier](https://paperswithcode.com/method/auxiliary-classifier) to propagate label information lower down the network (along with the use of batch normalization for layers in the sidehead). The key building block is an [Inception Module](https://paperswithcode.com/method/inception-v3-module).
 
 ## How do I use this model on an image?
 
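The heading in the context above introduces the usual usage snippet from these model cards; a minimal sketch of that pattern (assuming network access for the pretrained weights; the blank in-memory image stands in for one loaded from disk or a URL):

```python
import numpy as np
import timm
import torch
from PIL import Image

# Create the model; pretrained weights are downloaded on first use.
model = timm.create_model('inception_v3', pretrained=True).eval()

# Build the preprocessing transform matching the model's pretrained config.
config = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**config)

img = Image.fromarray(np.zeros((299, 299, 3), dtype=np.uint8))
x = transform(img).unsqueeze(0)  # (1, 3, 299, 299)

with torch.no_grad():
    probs = model(x).softmax(dim=-1)
print(probs.shape)  # torch.Size([1, 1000])
```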

hfdocs/source/models/tf-inception-v3.mdx

+1 −1

@@ -1,6 +1,6 @@
 # (Tensorflow) Inception v3
 
-**Inception v3** is a convolutional neural network architecture from the Inception family that makes several improvements including using [Label Smoothing](https://paperswithcode.com/method/label-smoothing), Factorized 7 x 7 convolutions, and the use of an [auxiliary classifer](https://paperswithcode.com/method/auxiliary-classifier) to propagate label information lower down the network (along with the use of batch normalization for layers in the sidehead). The key building block is an [Inception Module](https://paperswithcode.com/method/inception-v3-module).
+**Inception v3** is a convolutional neural network architecture from the Inception family that makes several improvements including using [Label Smoothing](https://paperswithcode.com/method/label-smoothing), Factorized 7 x 7 convolutions, and the use of an [auxiliary classifier](https://paperswithcode.com/method/auxiliary-classifier) to propagate label information lower down the network (along with the use of batch normalization for layers in the sidehead). The key building block is an [Inception Module](https://paperswithcode.com/method/inception-v3-module).
 
 The weights from this model were ported from [Tensorflow/Models](https://github.com/tensorflow/models).
 

timm/data/auto_augment.py

+1 −1

@@ -954,7 +954,7 @@ def augment_and_mix_transform(config_str: str, hparams: Optional[Dict] = None):
     Args:
         config_str (str): String defining configuration of random augmentation. Consists of multiple sections separated
             by dashes ('-'). The first section defines the specific variant of rand augment (currently only 'rand').
-            The remaining sections, not order sepecific determine
+            The remaining sections, not order specific determine
                 'm' - integer magnitude (severity) of augmentation mix (default: 3)
                 'w' - integer width of augmentation chain (default: 3)
                 'd' - integer depth of augmentation chain (-1 is random [1, 3], default: -1)
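For concreteness, a hedged sketch of such a config string (note the docstring above still says 'rand'; for this AugMix transform the leading tag is expected to be 'augmix' in current timm, so treat that detail as an assumption for other versions):

```python
import numpy as np
from PIL import Image

from timm.data.auto_augment import augment_and_mix_transform

# magnitude 5, chain width 4, chain depth 2; sections are dash-separated.
tf = augment_and_mix_transform('augmix-m5-w4-d2', hparams={})

img = Image.fromarray(np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8))
out = tf(img)  # returns an augmented, mixed PIL Image
```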

timm/data/imagenet_info.py

+1 −1

@@ -52,7 +52,7 @@ def __init__(self, subset: str = 'imagenet-1k'):
         subset = re.sub(r'[-_\s]', '', subset.lower())
         assert subset in _SUBSETS, f'Unknown imagenet subset {subset}.'
 
-        # WordNet synsets (part-of-speach + offset) are the unique class label names for ImageNet classifiers
+        # WordNet synsets (part-of-speech + offset) are the unique class label names for ImageNet classifiers
         synset_file = _SUBSETS[subset]
         synset_data = pkgutil.get_data(__name__, os.path.join('_info', synset_file))
         self._synsets = synset_data.decode('utf-8').splitlines()
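As a concrete illustration of the synset naming scheme in that comment (the accessor names below follow recent timm; treat them as an assumption if your version differs):

```python
from timm.data import ImageNetInfo

info = ImageNetInfo('imagenet-1k')
# A synset id is part-of-speech + WordNet offset, e.g. 'n01440764' = noun + offset 01440764.
label = info.index_to_label_name(0)
print(label)                                  # 'n01440764'
print(info.label_name_to_description(label))  # 'tench', the human-readable class description
```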

timm/data/readers/reader_hfids.py

+1 −1

@@ -80,7 +80,7 @@ def __init__(
             self.num_samples = split_info.num_examples
         else:
             raise ValueError(
-                "Dataset length is unknown, please pass `num_samples` explicitely. "
+                "Dataset length is unknown, please pass `num_samples` explicitly. "
                 "The number of steps needs to be known in advance for the learning rate scheduler."
             )
 

timm/data/readers/reader_image_folder.py

+1 −1

@@ -25,7 +25,7 @@ def find_images_and_targets(
     """ Walk folder recursively to discover images and map them to classes by folder names.
 
     Args:
-        folder: root of folder to recrusively search
+        folder: root of folder to recursively search
         types: types (file extensions) to search for in path
         class_to_idx: specify mapping for class (folder name) to class index if set
         leaf_name_only: use only leaf-name of folder walk for class names

timm/data/readers/reader_wds.py

+3 −3

@@ -124,7 +124,7 @@ def _info_convert(dict_info):
 
 
 def log_and_continue(exn):
-    """Call in an exception handler to ignore exceptions, isssue a warning, and continue."""
+    """Call in an exception handler to ignore exceptions, issue a warning, and continue."""
     _logger.warning(f'Handling webdataset error ({repr(exn)}). Ignoring.')
     # NOTE: try force an exit on errors that are clearly code / config and not transient
     if isinstance(exn, TypeError):
@@ -277,7 +277,7 @@ def __init__(
             target_img_mode: str = '',
             filename_key: str = 'filename',
             sample_shuffle_size: Optional[int] = None,
-            smaple_initial_size: Optional[int] = None,
+            sample_initial_size: Optional[int] = None,
     ):
         super().__init__()
         if wds is None:
@@ -290,7 +290,7 @@ def __init__(
         self.common_seed = seed  # a seed that's fixed across all worker / distributed instances
         self.shard_shuffle_size = 500
         self.sample_shuffle_size = sample_shuffle_size or SAMPLE_SHUFFLE_SIZE
-        self.sample_initial_size = smaple_initial_size or SAMPLE_INITIAL_SIZE
+        self.sample_initial_size = sample_initial_size or SAMPLE_INITIAL_SIZE
 
         self.input_key = input_key
         self.input_img_mode = input_img_mode
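`log_and_continue` is designed to be plugged into webdataset's `handler=` hooks; a minimal sketch (the shard path is a placeholder):

```python
import webdataset as wds

from timm.data.readers.reader_wds import log_and_continue

# Corrupt or undecodable samples get a warning and are skipped,
# rather than raising out of the whole data pipeline.
dataset = (
    wds.WebDataset('/data/shards/train-{000000..000099}.tar', handler=log_and_continue)
    .decode('pil', handler=log_and_continue)
    .to_tuple('jpg', 'cls')
)
```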

timm/layers/activations.py

+2 −2

@@ -47,7 +47,7 @@ def sigmoid(x, inplace: bool = False):
     return x.sigmoid_() if inplace else x.sigmoid()
 
 
-# PyTorch has this, but not with a consistent inplace argmument interface
+# PyTorch has this, but not with a consistent inplace argument interface
 class Sigmoid(nn.Module):
     def __init__(self, inplace: bool = False):
         super(Sigmoid, self).__init__()
@@ -61,7 +61,7 @@ def tanh(x, inplace: bool = False):
     return x.tanh_() if inplace else x.tanh()
 
 
-# PyTorch has this, but not with a consistent inplace argmument interface
+# PyTorch has this, but not with a consistent inplace argument interface
 class Tanh(nn.Module):
     def __init__(self, inplace: bool = False):
         super(Tanh, self).__init__()

timm/layers/attention2d.py

+2 −2

@@ -16,7 +16,7 @@ class MultiQueryAttentionV2(nn.Module):
     Fast Transformer Decoding: One Write-Head is All You Need
     https://arxiv.org/pdf/1911.02150.pdf
 
-    This is an acceletor optimized version - removing multiple unneccessary
+    This is an acceletor optimized version - removing multiple unnecessary
     tensor transpose by re-arranging indices according to the following rules: 1)
     contracted indices are at the end, 2) other indices have the same order in the
     input and output tensores.
@@ -87,7 +87,7 @@ class MultiQueryAttention2d(nn.Module):
     2. query_strides: horizontal & vertical strides on Query only.
 
     This is an optimized version.
-    1. Projections in Attention is explict written out as 1x1 Conv2D.
+    1. Projections in Attention is explicit written out as 1x1 Conv2D.
     2. Additional reshapes are introduced to bring a up to 3x speed up.
     """
     fused_attn: torch.jit.Final[bool]
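The "one write-head" idea in the docstring above (many query heads sharing a single key/value head) cuts KV memory traffic; a minimal einsum sketch under hypothetical shapes, independent of the timm module:

```python
import torch

B, H, N, M, D = 2, 8, 16, 16, 32   # batch, query heads, query len, kv len, head dim
q = torch.randn(B, H, N, D)        # per-head queries
k = torch.randn(B, M, D)           # one shared key head
v = torch.randn(B, M, D)           # one shared value head

# Contracted index last, matching the transpose-avoidance rules described above.
attn = torch.einsum('bhnd,bmd->bhnm', q * D ** -0.5, k).softmax(dim=-1)
out = torch.einsum('bhnm,bmd->bhnd', attn, v)
print(out.shape)                   # torch.Size([2, 8, 16, 32])
```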

timm/layers/create_norm_act.py

+2 −2

@@ -1,7 +1,7 @@
-""" NormAct (Normalizaiton + Activation Layer) Factory
+""" NormAct (Normalization + Activation Layer) Factory
 
 Create norm + act combo modules that attempt to be backwards compatible with separate norm + act
-isntances in models. Where these are used it will be possible to swap separate BN + act layers with
+instances in models. Where these are used it will be possible to swap separate BN + act layers with
 combined modules like IABN or EvoNorms.
 
 Hacked together by / Copyright 2020 Ross Wightman

timm/layers/weight_init.py

+1 −1

@@ -78,7 +78,7 @@ def trunc_normal_tf_(tensor, mean=0., std=1., a=-2., b=2.):
 
     NOTE: this 'tf' variant behaves closer to Tensorflow / JAX impl where the
     bounds [a, b] are applied when sampling the normal distribution with mean=0, std=1.0
-    and the result is subsquently scaled and shifted by the mean and std args.
+    and the result is subsequently scaled and shifted by the mean and std args.
 
     Args:
         tensor: an n-dimensional `torch.Tensor`
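Equivalently, the ordering that NOTE describes can be written as a two-step sketch (this mirrors the documented behaviour using the standard PyTorch initializer, not the timm code itself):

```python
import torch
from torch.nn.init import trunc_normal_

def trunc_normal_tf_sketch(tensor, mean=0., std=1., a=-2., b=2.):
    # 1. Truncate while sampling in standardized space (mean 0, std 1, bounds [a, b]) ...
    trunc_normal_(tensor, 0., 1., a, b)
    # 2. ... then scale and shift, so effective bounds become [a * std + mean, b * std + mean].
    with torch.no_grad():
        tensor.mul_(std).add_(mean)
    return tensor

w = torch.empty(768, 768)
trunc_normal_tf_sketch(w, std=0.02)  # values bounded by ±0.04 around mean 0
```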

timm/models/_efficientnet_blocks.py

+1 −1

@@ -490,7 +490,7 @@ def __init__(
         # https://arxiv.org/abs/2102.10882
         # 1. Rather than adding one CPE before the attention blocks, we add a CPE
         # into every attention block.
-        # 2. We replace the expensive Conv2D by a Seperable DW Conv.
+        # 2. We replace the expensive Conv2D by a Separable DW Conv.
         if use_cpe:
             self.conv_cpe_dw = create_conv2d(
                 in_chs, in_chs,

timm/models/_features.py

+1 −1

@@ -32,7 +32,7 @@ def feature_take_indices(
 ) -> Tuple[List[int], int]:
     """ Determine the absolute feature indices to 'take' from.
 
-    Note: This function can be called in forwar() so must be torchscript compatible,
+    Note: This function can be called in forward() so must be torchscript compatible,
     which requires some incomplete typing and workaround hacks.
 
     Args:

timm/models/byobnet.py

+2 −2

@@ -611,7 +611,7 @@ def _get_kernel_bias(self) -> Tuple[torch.Tensor, torch.Tensor]:
         return kernel_final, bias_final
 
     def _fuse_bn_tensor(self, branch) -> Tuple[torch.Tensor, torch.Tensor]:
-        """ Method to fuse batchnorm layer with preceeding conv layer.
+        """ Method to fuse batchnorm layer with preceding conv layer.
         Reference: https://github.com/DingXiaoH/RepVGG/blob/main/repvgg.py#L95
         """
         if isinstance(branch, ConvNormAct):
@@ -800,7 +800,7 @@ def _get_kernel_bias(self) -> Tuple[torch.Tensor, torch.Tensor]:
         return kernel_final, bias_final
 
     def _fuse_bn_tensor(self, branch) -> Tuple[torch.Tensor, torch.Tensor]:
-        """ Method to fuse batchnorm layer with preceeding conv layer.
+        """ Method to fuse batchnorm layer with preceding conv layer.
         Reference: https://github.com/DingXiaoH/RepVGG/blob/main/repvgg.py#L95
         """
         if isinstance(branch, ConvNormAct):
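Both `_fuse_bn_tensor` methods here (and the one in fastvit.py below) implement the standard conv+BN folding identity: gamma/sigma scales the kernel, the remainder lands in the bias. A self-contained sketch of that identity, not the timm code itself:

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d):
    """Return (kernel, bias) such that a single conv reproduces conv -> BN at inference."""
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)   # gamma / sigma
    kernel = conv.weight * scale.reshape(-1, 1, 1, 1)
    bias = bn.bias - bn.running_mean * scale
    if conv.bias is not None:
        bias = bias + conv.bias * scale
    return kernel, bias

conv = nn.Conv2d(8, 16, 3, padding=1, bias=False)
bn = nn.BatchNorm2d(16).eval()
kernel, bias = fuse_conv_bn(conv, bn)

# Sanity check: the fused conv matches conv followed by BN (eval mode).
x = torch.randn(1, 8, 32, 32)
fused = torch.nn.functional.conv2d(x, kernel, bias, padding=1)
assert torch.allclose(bn(conv(x)), fused, atol=1e-5)
```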

timm/models/crossvit.py

+1 −1

@@ -21,7 +21,7 @@
 
 
 """
-Modifed from Timm. https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/vision_transformer.py
+Modified from Timm. https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/vision_transformer.py
 
 """
 from functools import partial

timm/models/efficientvit_msra.py

+1 −1

@@ -246,7 +246,7 @@ def __init__(
     def forward(self, x):
         H = W = self.resolution
         B, C, H_, W_ = x.shape
-        # Only check this for classifcation models
+        # Only check this for classification models
         _assert(H == H_, f'input feature has wrong size, expect {(H, W)}, got {(H_, W_)}')
         _assert(W == W_, f'input feature has wrong size, expect {(H, W)}, got {(H_, W_)}')
         if H <= self.window_resolution and W <= self.window_resolution:

timm/models/fastvit.py

+1 −1

@@ -231,7 +231,7 @@ def _get_kernel_bias(self) -> Tuple[torch.Tensor, torch.Tensor]:
     def _fuse_bn_tensor(
             self, branch: Union[nn.Sequential, nn.BatchNorm2d]
     ) -> Tuple[torch.Tensor, torch.Tensor]:
-        """Method to fuse batchnorm layer with preceeding conv layer.
+        """Method to fuse batchnorm layer with preceding conv layer.
         Reference: https://github.com/DingXiaoH/RepVGG/blob/main/repvgg.py#L95
 
         Args:

timm/models/focalnet.py

+2 −2

@@ -78,7 +78,7 @@ def forward(self, x):
         x = self.f(x)
         q, ctx, gates = torch.split(x, self.input_split, 1)
 
-        # context aggreation
+        # context aggregation
         ctx_all = 0
         for l, focal_layer in enumerate(self.focal_layers):
             ctx = focal_layer(ctx)
@@ -353,7 +353,7 @@ def __init__(
         focal_levels: How many focal levels at all stages. Note that this excludes the finest-grain level.
         focal_windows: The focal window size at all stages.
         use_overlap_down: Whether to use convolutional embedding.
-        use_post_norm: Whether to use layernorm after modulation (it helps stablize training of large models)
+        use_post_norm: Whether to use layernorm after modulation (it helps stabilize training of large models)
         layerscale_value: Value for layer scale.
         drop_rate: Dropout rate.
         drop_path_rate: Stochastic depth rate.
