Releases: pytorch/vision
TorchVision 0.13, including new Multi-weights API, new pre-trained weights, and more
Highlights
Models
Multi-weight support API
TorchVision v0.13 offers a new Multi-weight support API for loading different weights to the existing model builder methods:
from torchvision.models import *
# Old weights with accuracy 76.130%
resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
# New weights with accuracy 80.858%
resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
# Best available weights (currently alias for IMAGENET1K_V2)
# Note that these weights may change across versions
resnet50(weights=ResNet50_Weights.DEFAULT)
# Strings are also supported
resnet50(weights="IMAGENET1K_V2")
# No weights - random initialization
resnet50(weights=None)
Along with the weights, the new API bundles important details such as the preprocessing transforms and metadata such as labels. Here is how to make the most of it:
from torchvision.io import read_image
from torchvision.models import resnet50, ResNet50_Weights
img = read_image("test/assets/encode_jpeg/grace_hopper_517x606.jpg")
# Step 1: Initialize model with the best available weights
weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights)
model.eval()
# Step 2: Initialize the inference transforms
preprocess = weights.transforms()
# Step 3: Apply inference preprocessing transforms
batch = preprocess(img).unsqueeze(0)
# Step 4: Use the model and print the predicted category
prediction = model(batch).squeeze(0).softmax(0)
class_id = prediction.argmax().item()
score = prediction[class_id].item()
category_name = weights.meta["categories"][class_id]
print(f"{category_name}: {100 * score:.1f}%")
You can read more about the new API in the docs. To provide your feedback, please use this dedicated Github issue.
New architectures and model variants
Classification
The Swin Transformer and EfficientNetV2 are two popular classification models which are often used for downstream vision tasks. This release includes 6 pre-trained weights for their classification variants. Here is how to use the new models:
import torch
from torchvision.models import *
image = torch.rand(1, 3, 224, 224)
model = swin_t(weights="DEFAULT").eval()
prediction = model(image)
image = torch.rand(1, 3, 384, 384)
model = efficientnet_v2_s(weights="DEFAULT").eval()
prediction = model(image)
In addition to the above, we also provide new variants for existing architectures such as ShuffleNetV2, ResNeXt and MNASNet. The accuracies of all the new pre-trained models obtained on ImageNet-1K are seen below:
Model | Acc@1 | Acc@5 |
---|---|---|
swin_t | 81.474 | 95.776 |
swin_s | 83.196 | 96.36 |
swin_b | 83.582 | 96.64 |
efficientnet_v2_s | 84.228 | 96.878 |
efficientnet_v2_m | 85.112 | 97.156 |
efficientnet_v2_l | 85.808 | 97.788 |
resnext101_64x4d | 83.246 | 96.454 |
resnext101_64x4d (quantized) | 82.898 | 96.326 |
shufflenet_v2_x1_5 | 72.996 | 91.086 |
shufflenet_v2_x1_5 (quantized) | 72.052 | 90.700 |
shufflenet_v2_x2_0 | 76.230 | 93.006 |
shufflenet_v2_x2_0 (quantized) | 75.354 | 92.488 |
mnasnet0_75 | 71.180 | 90.496 |
mnasnet1_3 | 76.506 | 93.522 |
We would like to thank Hu Ye for contributing the Swin Transformer implementation to TorchVision.
[BETA] Object Detection and Instance Segmentation
We have introduced 3 new model variants for RetinaNet, FasterRCNN and MaskRCNN that include several post-paper architectural optimizations and improved training recipes. All models can be used similarly:
import torch
from torchvision.models.detection import *
images = [torch.rand(3, 800, 600)]
model = retinanet_resnet50_fpn_v2(weights="DEFAULT")
# model = fasterrcnn_resnet50_fpn_v2(weights="DEFAULT")
# model = maskrcnn_resnet50_fpn_v2(weights="DEFAULT")
model.eval()
prediction = model(images)
Below we present the metrics of the new variants on COCO val2017. In parentheses we denote the improvement over the old variants:
Model | Box mAP | Mask mAP |
---|---|---|
retinanet_resnet50_fpn_v2 | 41.5 (+5.1) | - |
fasterrcnn_resnet50_fpn_v2 | 46.7 (+9.7) | - |
maskrcnn_resnet50_fpn_v2 | 47.4 (+9.5) | 41.8 (+7.2) |
We would like to thank Ross Girshick, Piotr Dollar, Vaibhav Aggarwal, Francisco Massa and Hu Ye for their past research and contributions to this work.
New pre-trained weights
SWAG weights
The ViT and RegNet model variants offer new pre-trained SWAG (Supervised Weakly from hashtAGs) weights. One of the biggest of these models achieves a whopping 88.6% accuracy on ImageNet-1K. We currently offer two versions of the weights: 1) fine-tuned end-to-end weights on ImageNet-1K (highest accuracy) and 2) frozen trunk weights with a linear classifier fit on ImageNet-1K (great for transfer learning). Below we see the detailed accuracies of each model variant:
Model Weights | Acc@1 | Acc@5 |
---|---|---|
RegNet_Y_16GF_Weights.IMAGENET1K_SWAG_E2E_V1 | 86.012 | 98.054 |
RegNet_Y_16GF_Weights.IMAGENET1K_SWAG_LINEAR_V1 | 83.976 | 97.244 |
RegNet_Y_32GF_Weights.IMAGENET1K_SWAG_E2E_V1 | 86.838 | 98.362 |
RegNet_Y_32GF_Weights.IMAGENET1K_SWAG_LINEAR_V1 | 84.622 | 97.48 |
RegNet_Y_128GF_Weights.IMAGENET1K_SWAG_E2E_V1 | 88.228 | 98.682 |
RegNet_Y_128GF_Weights.IMAGENET1K_SWAG_LINEAR_V1 | 86.068 | 97.844 |
ViT_B_16_Weights.IMAGENET1K_SWAG_E2E_V1 | 85.304 | 97.65 |
ViT_B_16_Weights.IMAGENET1K_SWAG_LINEAR_V1 | 81.886 | 96.18 |
ViT_L_16_Weights.IMAGENET1K_SWAG_E2E_V1 | 88.064 | 98.512 |
ViT_L_16_Weights.IMAGENET1K_SWAG_LINEAR_V1 | 85.146 | 97.422 |
ViT_H_14_Weights.IMAGENET1K_SWAG_E2E_V1 | 88.552 | 98.694 |
ViT_H_14_Weights.IMAGENET1K_SWAG_LINEAR_V1 | 85.708 | 97.73 |
The weights can be loaded normally as follows:
from torchvision.models import *
model1 = vit_h_14(weights="IMAGENET1K_SWAG_E2E_V1")
model2 = vit_h_14(weights="IMAGENET1K_SWAG_LINEAR_V1")
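As with the other builders in the new API, each weight enum bundles its own preprocessing transforms; the SWAG end-to-end variants in particular were fine-tuned at higher resolutions than the default ImageNet weights, so it is worth using them. A minimal sketch (the model choice is illustrative):
from torchvision.models import vit_h_14, ViT_H_14_Weights
weights = ViT_H_14_Weights.IMAGENET1K_SWAG_E2E_V1
model = vit_h_14(weights=weights).eval()
# The bundled transforms resize and crop inputs to the resolution these weights expect
preprocess = weights.transforms()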
The SWAG weights are released under the Attribution-NonCommercial 4.0 International license. We would like to thank Laura Gustafson, Mannat Singh and Aaron Adcock for their work and support in making the weights available to TorchVision.
Model Refresh
The release of the Multi-weight support API enabled us to refresh the most popular models and offer more accurate weights. On average, we improved the accuracy of each model by ~3 points. The new recipe was learned on top of ResNet50 and its details were covered in a previous blogpost.
Model | Old weights | New weights |
---|---|---|
efficientnet_b1 | 78.642 | 79.838 |
mobilenet_v2 | 71.878 | 72.154 |
mobilenet_v3_large | 74.042 | 75.274 |
regnet_y_400mf | 74.046 | 75.804 |
regnet_y_800mf | 76.42 | 78.828 |
regnet_y_1_6gf | 77.95 | 80.876 |
regnet_y_3_2gf | 78.948 | 81.982 |
regnet_y_8gf | 80.032 | 82.828 |
regnet_y_16gf | 80.424 | 82.886 |
regnet_y_32gf | 80.878 | 83.368 |
regnet_x_400mf | 72.834 | 74.864 |
regnet_x_800mf | 75.212 | 77.522 |
regnet_x_1_6gf | 77.04 | 79.668 |
regnet_x_3_2gf | 78.364 | 81.196 |
regnet_x_8gf | 79.344 | 81.682 |
regnet_x_16gf | 80.058 | 82.716 |
regnet_x_32gf | 80.622 | 83.014 |
resnet50 | 76.13 | 80.858 |
resnet50 (quantized) | 75.92 | 80.282 |
resnet101 | 77.374 | 81.886 |
resnet152 | 78.312 | 82.284 |
resnext50_32x4d | 77.618 | 81.198 |
resnext101_32x8d | 79.312 | 82.834 |
resnext101_32x8d (quantized) | 78.986 | 82.574 |
wide_resnet50_2 | 78.468 | 81.602 |
wide_resnet101_2 | 78.848 | 82.51 |
We would like to thank Piotr Dollar, Mannat Singh and Hugo Touvron for their past research and contributions to this work.
Ops and Transforms
New Augmentations, Layers and Losses
This release brings a bunch of new primitives which can be used to produce SOTA models. Some highlights include the addition of the AugMix data-augmentation method, the DropBlock layer, the cIoU/dIoU losses and many more. We would like to thank Aditya Oke, Abhijit Deo, Yassine Alouini and Hu Ye for contributing to the project and for helping us keep TorchVision relevant and fresh.
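A minimal sketch of how these primitives can be exercised, assuming the AugMix transform lives in torchvision.transforms and DropBlock2d plus the cIoU/dIoU losses live in torchvision.ops as introduced in this release:
import torch
from torchvision import transforms
from torchvision.ops import DropBlock2d, complete_box_iou_loss, distance_box_iou_loss
# AugMix operates on PIL images or uint8 tensors
augmix = transforms.AugMix()
img = torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8)
augmented = augmix(img)
# DropBlock2d drops contiguous spatial blocks from feature maps during training
drop_block = DropBlock2d(p=0.1, block_size=3)
features = torch.rand(1, 64, 56, 56)
regularized = drop_block(features)
# The cIoU/dIoU losses expect boxes in (x1, y1, x2, y2) format
boxes1 = torch.tensor([[0.0, 0.0, 10.0, 10.0]])
boxes2 = torch.tensor([[1.0, 1.0, 11.0, 11.0]])
ciou = complete_box_iou_loss(boxes1, boxes2, reduction="mean")
diou = distance_box_iou_loss(boxes1, boxes2, reduction="mean")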
Documentation
We completely revamped our models documentation to make it easier to browse, and added key information such as the supported image sizes and the image pre-processing steps of the pre-trained weights. We now have a main model page with various summary tables of available weights, and each model has a dedicated page. Each model builder is also documented in its own page, with more details about the available weights, including accuracy, minimal image size, lin...
TorchVision 0.12, including new Models, Datasets, GPU Video Decoding, and more
Highlights
New Models
Four new model families have been released in the latest version along with pre-trained weights for their variants: FCOS, RAFT, Vision Transformer (ViT) and ConvNeXt.
Object Detection
FCOS is a popular, fully convolutional, anchor-free model for object detection. In this release we include a community-contributed model implementation as well as pre-trained weights. The model was trained on COCO train2017 and can be used as follows:
import torch
from torchvision import models
x = [torch.rand(3, 224, 224)]
fcos = models.detection.fcos_resnet50_fpn(pretrained=True).eval()
predictions = fcos(x)
The box AP of the pre-trained model on COCO val2017 is 39.2 (see #4961 for more details).
We would like to thank Hu Ye and Zhiqiang Wang for contributing to the model implementation and initial training. This was the first community-contributed model in a long while, and given its success, we decided to use the learnings from this process and create new model contribution guidelines.
Optical Flow support and RAFT model
Torchvision now supports optical flow! Optical flow models try to predict movement in a video: given two consecutive frames, the model predicts where each pixel of the first frame ends up in the second frame. Check out our new tutorial on Optical Flow!
We implemented a torchscript-compatible RAFT model with pre-trained weights (both normal and “small” versions), and added support for training and evaluating optical flow models. Our training scripts support distributed training across processes and nodes, leading to much faster training time than the original implementation. We also added 5 new optical flow datasets: Flying Chairs, Flying Things, Sintel, Kitti, and HD1K.
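A minimal sketch of running the pre-trained RAFT model on a pair of frames (random tensors stand in for real frames here; in practice frames should be batched and preprocessed as described in the optical flow tutorial, and pretrained=True is assumed to download the weights):
import torch
from torchvision.models.optical_flow import raft_large
# Two consecutive frames, batched and scaled to roughly [-1, 1]
frame1 = torch.rand(1, 3, 224, 224) * 2 - 1
frame2 = torch.rand(1, 3, 224, 224) * 2 - 1
model = raft_large(pretrained=True).eval()
with torch.no_grad():
    # RAFT returns a list of flow predictions, one per refinement iteration;
    # the last element is the final estimate of shape (N, 2, H, W)
    flows = model(frame1, frame2)
predicted_flow = flows[-1]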
Image Classification
Vision Transformer (ViT) and ConvNeXt are two popular architectures which can be used as image classifiers or as backbones for downstream vision tasks. In this release we include 8 pre-trained weights for their classification variants. The models were trained on ImageNet and can be used as follows:
import torch
from torchvision import models
x = torch.rand(1, 3, 224, 224)
vit = models.vit_b_16(pretrained=True).eval()
convnext = models.convnext_tiny(pretrained=True).eval()
predictions1 = vit(x)
predictions2 = convnext(x)
The accuracies of the pre-trained models obtained on ImageNet val are seen below:
Model | Acc@1 | Acc@5 |
---|---|---|
vit_b_16 | 81.072 | 95.318 |
vit_b_32 | 75.912 | 92.466 |
vit_l_16 | 79.662 | 94.638 |
vit_l_32 | 76.972 | 93.07 |
convnext_tiny | 82.52 | 96.146 |
convnext_small | 83.616 | 96.65 |
convnext_base | 84.062 | 96.87 |
convnext_large | 84.414 | 96.976 |
The above models have been trained using an adjusted version of our new training recipe, and this allows us to offer models with accuracies significantly higher than the ones reported in the original papers.
GPU Video Decoding
In this release, we add support for GPU video decoding in the video reading API. To use hardware-accelerated decoding, we just need to pass a cuda device to the video reading API as shown below:
import torchvision
reader = torchvision.io.VideoReader(file_name, device='cuda:0')
for frame in reader:
print(frame)
We also support seeking to any frame or a keyframe in the video before reading, as shown below:
reader.seek(seek_time)
New Datasets
We have implemented 14 new classification datasets: CLEVR, GTSRB, FER2013, SUN397, Country211, Flowers102, FGVC-Aircraft, OxfordIIITPet, DTD, Food 101, Rendered SST2, Stanford cars, PCAM, and EuroSAT.
As part of our work on Optical Flow support (see above for more details), we also added 5 new optical flow datasets: Flying Chairs, Flying Things, Sintel, Kitti, and HD1K.
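All of the new datasets follow the usual torchvision.datasets interface; a minimal sketch with one of the new classification datasets (the root path is illustrative):
from torchvision import datasets, transforms
# Flowers102 is one of the newly added classification datasets
dataset = datasets.Flowers102(
    root="data",  # illustrative download location
    split="train",
    transform=transforms.ToTensor(),
    download=True,
)
image, label = dataset[0]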
Documentation
New documentation layout
We have updated our documentation pages to be more compact and easier to browse. Each function / class is now documented in a separate page, clearing up some space in the per-module pages, and easing the discovery of the proposed APIs. Compare e.g. our previous docs vs the new ones. Please let us know if you have any feedback!
Model contribution guidelines
New model contribution guidelines have been published following the success of the FCOS model which was contributed by the community. These guidelines aim to be an overview of the model contribution process for anyone who would like to suggest, implement and train a new model.
Upcoming Prototype APIs
We are currently working on a prototype API which adds Multi-weight support on all of our model builder methods. This will enable us to offer multiple pre-trained weights, associated with their meta-data and inference transforms. The API is still under review and thus was not included in the release but you can read more about it on our blogpost and provide your feedback on the dedicated Github issue.
Changes in our deprecation policy
Up until now, torchvision would almost never remove deprecated APIs. In order to be more aligned and consistent with pytorch core, we are updating our deprecation policy. We are now following a 2-release deprecation cycle: deprecated APIs will raise a warning for 2 versions, and will be removed after that. To reflect these changes and to smooth the transition, we have decided to:
- Remove all APIs that had been deprecated before or on v0.8, released 1.5 years ago.
- Update the removal timeline of all other deprecated APIs to v0.14, to reflect the new 2-cycle policy starting now in v0.12.
Backward-incompatible changes
[models.quantization] Removed the Quantized shufflenet_v2_x1_5 and shufflenet_v2_x2_0 model builders which had no associated weights, rendering them useless. Additionally we added pre-trained weights for the shufflenet_v2_x0_5 quantized variant. (#4854)
[ops] Change to stable sort in nms implementations - this change can lead to different behavior in rare cases therefore it has been flagged as backwards-incompatible (#4767)
[transforms] Changed the center and the parametrization of shear X/Y in Auto Augment transforms to align with the original papers (#5285) (#5384)
Deprecations
Note: in order to be more aligned with pytorch core, we are updating our deprecation policy. Please read more above in the “Highlights” section.
[ops] The ops.poolers.MultiScaleRoIAlign public methods setup_scales, convert_to_roi_format, and infer_scale have been deprecated and will be removed in 0.14 (#4951) (#4810)
New Features
[datasets] New optical flow datasets added: FlyingChairs, Kitti, Sintel, FlyingThings3D, and HD1K (#4860) (#4845) (#4858) (#4890) (#5004) (#4889) (#4888) (#4870)
[datasets] New classification datasets support for FLAVA: CLEVR, GTSRB, FER2013, SUN397, Country211, Flowers102, FGVC-Aircraft, OxfordIIITPet, DTD, Food 101, Rendered SST2, Stanford cars, PCAM, and EuroSAT (#5120) (#5130) (#5117) (#5132) (#5138) (#5177) (#5178) (#5116) (#5115) (#5119) (#5220) (#5166) (#5203) (#5114) (#5164) (#5280)
[models] Add VisionTransformer model (#5173) (#5210) (#5172) (#5085) (#5226) (#5025) (#5086) (#5159)
[models] Add ConvNeXt model (#5330) (#5253)
[models] Add RAFT models and support for optical flow model training (#5022) (#5070) (#5174) (#5381) (#5078) (#5076) (#5081) (#5079) (#5026) (#5027) (#5082) (#5060) (#4868) (#4657) (#4732)
[models] Add FCOS model (#4961) (#5267)
[utils] Add utility to convert optical flow to an image (#5134) (#5308)
[utils] Add utility to draw keypoints (#4216)
[video] Add video GPU decoder (#5019) (#5191) (#5215) (#5256) (#4474) (#3179) (#4878) (#5328) (#5327) (#5183) (#4947) (#5192)
Improvements
[datasets] Migrate mnist dataset from np.frombuffer (#4598)
[io, tests] Switch from np.frombuffer to torch.frombuffer (#4578)
[models] Update ResNet-50 accuracy with Repeated Augmentation (#5201)
[models] Add regnet_y_128gf factory function, and several regnet model weights (#5176) (#4530)
[models] Adding min_size to classification and video models (#5223)
[models] Remove in-place mutation in DefaultBoxGenerator (#5279)
[models] Added Dropout parameter to Models Constructors (#4580)
[models] Allow to use custom norm_layer (#4621)
[models] Add In...
Minor release
This is a minor release compatible with PyTorch 1.10.2 that includes a minor bug fix.
Highlights
Bug Fixes
- [CI] Skip jpeg comparison tests with PIL (#5232)
Minor bugfix release
This minor release bumps the pinned PyTorch version to v1.10.1 and contains some minor bug fixes.
Highlights
Bug Fixes
- [CI] Fix clang_format issue (#5061)
- [CI, MOBILE] Fix binary_libtorchvision_ops_android job (#5062)
- [CI] Add numpy as explicit dependency to build_cmake.sh (#5065)
- [MODELS] Amend the weights only if quantize=True. (#5066)
- [TRANSFORMS] Fix augmentation space to be uint8 compatible (#5067)
- [DATASETS] Fix WIDERFace download links (#5068)
- [BUILD, WINDOWS] Workaround for loading bundled DLLs (#5094)
Update dependency on wheels to match version in PyPI
Users were reporting issues installing torchvision on PyPI; this release contains an update to the wheel dependencies to point directly to torch==1.10.0.
RegNet, EfficientNet, FX Feature Extraction and more
This release introduces the RegNet and EfficientNet architectures, a new FX-based utility to perform Feature Extraction, new data augmentation techniques such as RandAugment and TrivialAugment, updated training recipes that support EMA, Label Smoothing, Learning-Rate Warmup, Mixup and Cutmix, and many more.
Highlights
New Models
RegNet and EfficientNet are two popular architectures that can be scaled to different computational budgets. In this release we include 22 pre-trained weights for their classification variants. The models were trained on ImageNet and can be used as follows:
import torch
from torchvision import models
x = torch.rand(1, 3, 224, 224)
regnet = models.regnet_y_400mf(pretrained=True)
regnet.eval()
predictions = regnet(x)
efficientnet = models.efficientnet_b0(pretrained=True)
efficientnet.eval()
predictions = efficientnet(x)
The accuracies of the pre-trained models obtained on ImageNet val are seen below (see #4403, #4530 and #4293 for more details):
Model | Acc@1 | Acc@5 |
---|---|---|
regnet_x_400mf | 72.834 | 90.95 |
regnet_x_800mf | 75.212 | 92.348 |
regnet_x_1_6gf | 77.04 | 93.44 |
regnet_x_3_2gf | 78.364 | 93.992 |
regnet_x_8gf | 79.344 | 94.686 |
regnet_x_16gf | 80.058 | 94.944 |
regnet_x_32gf | 80.622 | 95.248 |
regnet_y_400mf | 74.046 | 91.716 |
regnet_y_800mf | 76.42 | 93.136 |
regnet_y_1_6gf | 77.95 | 93.966 |
regnet_y_3_2gf | 78.948 | 94.576 |
regnet_y_8gf | 80.032 | 95.048 |
regnet_y_16gf | 80.424 | 95.24 |
regnet_y_32gf | 80.878 | 95.34 |
EfficientNet-B0 | 77.692 | 93.532 |
EfficientNet-B1 | 78.642 | 94.186 |
EfficientNet-B2 | 80.608 | 95.31 |
EfficientNet-B3 | 82.008 | 96.054 |
EfficientNet-B4 | 83.384 | 96.594 |
EfficientNet-B5 | 83.444 | 96.628 |
EfficientNet-B6 | 84.008 | 96.916 |
EfficientNet-B7 | 84.122 | 96.908 |
We would like to thank Ross Wightman and Luke Melas-Kyriazi for contributing the weights of the EfficientNet variants.
FX-based Feature Extraction
A new Feature Extraction method has been added to our utilities. It uses PyTorch FX and enables us to retrieve the outputs of intermediate layers of a network which is useful for feature extraction and visualization. Here is an example of how to use the new utility:
import torch
from torchvision.models import resnet50
from torchvision.models.feature_extraction import create_feature_extractor
x = torch.rand(1, 3, 224, 224)
model = resnet50()
return_nodes = {
"layer4.2.relu_2": "layer4"
}
model2 = create_feature_extractor(model, return_nodes=return_nodes)
intermediate_outputs = model2(x)
print(intermediate_outputs['layer4'].shape)
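To discover which node names can be requested, the same module also provides a helper that lists the traced graph's nodes; a minimal sketch:
from torchvision.models import resnet50
from torchvision.models.feature_extraction import get_graph_node_names
# Node names can differ between train and eval mode, so both lists are returned
train_nodes, eval_nodes = get_graph_node_names(resnet50())
print(eval_nodes[-5:])  # inspect the last few node names available in eval mode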
We would like to thank Alexander Soare for developing this utility.
New Data Augmentations
Two new Automatic Augmentation techniques were added: Rand Augment and Trivial Augment. Both methods can be used as drop-in replacements for the AutoAugment technique, as seen below:
from torchvision import transforms
t = transforms.RandAugment()
# t = transforms.TrivialAugmentWide()
transformed = t(image)
transform = transforms.Compose([
transforms.Resize(256),
transforms.RandAugment(), # transforms.TrivialAugmentWide()
transforms.ToTensor()])
We would like to thank Samuel G. Müller for contributing Trivial Augment and for his help on refactoring the AA package.
Updated Training Recipes
We have updated our training reference scripts to add support for Exponential Moving Average, Label Smoothing, Learning-Rate Warmup, Mixup, Cutmix and other SOTA primitives. The above enabled us to improve the classification Acc@1 of some pre-trained models by over 4 points. A major update of the existing pre-trained weights is expected in the next release.
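The reference scripts wire these primitives together; purely as an illustration (not the reference implementation itself), label smoothing and learning-rate warmup can be sketched in plain PyTorch as below, assuming PyTorch 1.10+ for label_smoothing and the LinearLR/SequentialLR schedulers:
import torch
model = torch.nn.Linear(10, 5)  # stand-in model, for illustration only
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.5, momentum=0.9)
# Linear warmup for the first 5 epochs, then cosine decay for the remaining ones
warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.01, total_iters=5)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=95)
scheduler = torch.optim.lr_scheduler.SequentialLR(optimizer, schedulers=[warmup, cosine], milestones=[5])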
Backward-incompatible changes
[models] Use torch instead of scipy for random initialization of inception and googlenet weights (#4256)
Deprecations
[models] Deprecate the C++ vision::models namespace (#4375)
New Features
[datasets] Add iNaturalist dataset (#4123)
[datasets] Download and Kinetics 400/600/700 Datasets (#3680)
[datasets] Added LFW Dataset (#4255)
[models] Add FX feature extraction as an alternative to intermediate_layer_getter (#4302) (#4418)
[models] Add RegNet Architecture in TorchVision (#4403) (#4530) (#4550)
[ops] Add new masks_to_boxes op (#4290) (#4469)
[ops] Add StochasticDepth implementation (#4301)
[reference scripts] Adding Mixup and Cutmix (#4379)
[transforms] Integration of TrivialAugment with the current AutoAugment Code (#4221)
[transforms] Adding RandAugment implementation (#4348)
[models] Add EfficientNet Architecture in TorchVision (#4293)
Improvements
Various documentation improvements (#4239) (#4251) (#4275) (#4342) (#3894) (#4159) (#4133) (#4138) (#4089) (#3944) (#4349) (#3754) (#4308) (#4352) (#4318) (#4244) (#4362) (#3863) (#4382) (#4484) (#4503) (#4376) (#4457) (#4505) (#4363) (#4361) (#4337) (#4546) (#4553) (#4565) (#4567) (#4574) (#4575) (#4383) (#4390) (#3409) (#4451) (#4340) (#3967) (#4072) (#4028) (#4132)
[build] Add CUDA-11.3 builds to torchvision (#4248)
[ci, tests] Skip some CPU-only tests on CircleCI machines with GPU (#4002) (#4025) (#4062)
[ci] New issue templates (#4299)
[ci] Various CI improvements, in particular putting back GPU testing on windows (#4421) (#4014) (#4053) (#4482) (#4475) (#3998) (#4388) (#4179) (#4394) (#4162) (#4065) (#3928) (#4081) (#4203) (#4011) (#4055) (#4074) (#4419) (#4067) (#4201) (#4200) (#4202) (#4496) (#3925)
[ci] ping maintainers in case a PR was not properly labeled (#3993) (#4012) (#4021) (#4501)
[datasets] Add bzip2 file compression support to datasets (#4097)
[datasets] Faster dataset indexing (#3939)
[datasets] Enable logging of internal dataset instantiations. (#4319) (#4090)
[datasets] Removed copy=False in torch.from_numpy in MNIST to avoid warning (#4184)
[io] Add warning for files with corrupt containers (#3961)
[models, tests] Add test to check that classification models are FX-compatible (#3662)
[tests] Speedup various tests (#3929) (#3933) (#3936)
[models] Allow custom activation in SqueezeExcitation of EfficientNet (#4448)
[models] Allow gradient backpropagation through GeneralizedRCNNTransform to inputs (#4327)
[ops, tests] Add JIT tests (#4472)
[ops] Make StochasticDepth FX-compatible (#4373)
[ops] Added backward pass on CPU and CUDA for interpolation with anti-alias option (#4208) (#4211)
[ops] Small refactoring to support opt mode for torchvision ops (fb internal specific) (#4080) (#4095)
[reference scripts] Added Exponential Moving Average support to classification reference script (#4381) (#4406) (#4407)
[reference scripts] Adding label smoothing on classification reference (#4335)
[reference scripts] Further enhance Classification Reference (#4444)
[reference scripts] Replaced to_tensor() with pil_to_tensor() + convert_image_dtype() (#4452)
[reference scripts] Update the metrics output on reference scripts (#4408)
[reference scripts] Warmup schedulers in References (#4411)
[tests] Add check for fx compatibility on segmentation and video models (#4131)
[tests] Mock redirection logic for tests (#4197)
[tests] Replace set_deterministic with non-deprecated spelling (#4212)
[tests] Skip building torchvision with ffmpeg when python==3.9 (#4417)
[tests] [jit] Make operation call accept Stack& instead Stack* (#63414) (#4380)
[tests] make tests that involve GDrive more robust (#4454)
[tests] remove dependency for dtype getters (#4291)
[transforms] Replaced example usage of ToTensor() by PILToTensor() + ConvertImageDtype() (#4494)
[transforms] Explicitly copying array in pil_to_tensor (#4566) (#4573)
[transforms] Make get_image_size and get_image_num_channels public. (#4321)
[transforms] adding gray images support for adjust_contrast and adjust_saturation (#4477) (#4480)
[utils] Support single color in utils.draw_bounding_boxes (#4075)
[video, documentation] Port the video_api.ipynb notebook to the example gallery (#4241)
[video, io, tests] Added check for invalid input file (#3932)
[video, io] remove deprecated function call (#3861) (#3989)
[video, tests] Removed test_audio_video_sync as it doesn't work as expected (#4050)
[video] Build torchvision with ffmpeg only on Linux and ignore ffmpeg on other platforms (#4413, #4410, #4041)
Bug Fixes
[build] Conda: Add numpy dependency (#4442)
[build] Explicitly exclude PIL 8.3.0 from compatible dependencies (#4148)
[build] More robust version check (#4285)
[ci] Fix broken clang format test. (#4320)
[ci] Remove mentions of conda-forge (#4082)
[ci] fixup '' -> '/./' for CI filter (#4059)
[datasets] Fix download from google drive which was downloading empty files in some cases (#4109)
[datasets] Fix splitting CelebA dataset (#4377)
[datasets] Add support for files with periods in name (#4099)
[io, tests] Don't check transparency channel for pil >= 8.3 in test_decode_png (#4167)
[io] Fix size_t issues across JPEG versions and platforms (#4439)
[io] Raise proper error when decoding 16-bits jpegs (#4101)
[io] Unpinned the libjpeg version and fixed jpeg_mem_dest's size type Wind… (#4288)
[io] deinterlacing PNG images with read_image (#4268)
[io] More robust ffmpeg version query in setup.py (#4254)
[io] Fixed read_image bug (#3948)
[models] Don't download backbone weights if pretrained=True (#4283)
[onnx, tests] Do not disable profiling executor in ...
Minor bugfix release
This release depends on pytorch 1.9.1
No functional changes other than minor updates to CI rules.
iOS support, GPU image decoding, SSDlite and more
This release improves support for mobile, with new mobile-friendly detection models based on SSD and SSDlite, CPU kernels for quantized NMS and quantized RoIAlign, pre-compiled binaries for iOS available in cocoapods and an iOS demo app. It also improves image IO by providing JPEG decoding on the GPU, and many more.
Highlights
[BETA] New models for detection
SSD and SSDlite are two popular object detection architectures which are efficient in terms of speed and provide good results for low resolution pictures. In this release, we provide implementations for the original SSD model with VGG16 backbone and for its mobile-friendly variant SSDlite with MobileNetV3-Large backbone. The models were pre-trained on COCO train2017 and can be used as follows:
import torch
import torchvision
# Original SSD variant
x = [torch.rand(3, 300, 300), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.ssd300_vgg16(pretrained=True)
m_detector.eval()
predictions = m_detector(x)
# Mobile-friendly SSDlite variant
x = [torch.rand(3, 320, 320), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.ssdlite320_mobilenet_v3_large(pretrained=True)
m_detector.eval()
predictions = m_detector(x)
The following accuracies can be obtained on COCO val2017 (full results available in #3403 and #3757):
Model | mAP | mAP@50 | mAP@75 |
---|---|---|---|
SSD300 VGG16 | 25.1 | 41.5 | 26.2 |
SSDlite320 MobileNetV3-Large | 21.3 | 34.3 | 22.1 |
[STABLE] Quantized kernels for object detection
The forward pass of the nms and roi_align operators now supports tensors with a quantized dtype, which can help lower the memory footprint of object detection models, particularly on mobile environments.
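For illustration, a minimal sketch of calling nms directly on quantized tensors (the quantization parameters below are arbitrary and only for demonstration):
import torch
from torchvision.ops import nms
boxes = torch.tensor([[0.0, 0.0, 100.0, 100.0],
                      [10.0, 10.0, 110.0, 110.0]])
scores = torch.tensor([0.9, 0.8])
# Arbitrary quantization parameters, for illustration only
qboxes = torch.quantize_per_tensor(boxes, scale=0.5, zero_point=0, dtype=torch.quint8)
qscores = torch.quantize_per_tensor(scores, scale=0.01, zero_point=0, dtype=torch.quint8)
keep = nms(qboxes, qscores, iou_threshold=0.5)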
[BETA] JPEG decoding on the GPU
Decoding jpegs is now possible on GPUs with the use of nvjpeg, which should be readily available in your CUDA setup. The decoding time of a single image should be about 2 to 3 times faster than with libjpeg on CPU. While the resulting tensor will be stored on the GPU device, the input raw tensor still needs to reside on the host (CPU), because the first stages of the decoding process take place on the host:
from torchvision.io.image import read_file, decode_jpeg
data = read_file('path_to_image.jpg') # raw data is on CPU
img = decode_jpeg(data, device='cuda') # decoded image is on GPU
[BETA] iOS support
TorchVision 0.10 now provides pre-compiled iOS binaries for its C++ operators, which means you can run Faster R-CNN and Mask R-CNN on iOS. An example app on how to build a program leveraging those ops can be found here.
[STABLE] Speed optimizations for Tensor transforms
The resize and flip transforms have been optimized and their runtime improved by up to 5x on the CPU. The corresponding PRs were sent to PyTorch in pytorch/pytorch#51653, pytorch/pytorch#54500 and pytorch/pytorch#56713
[STABLE] Documentation improvements
Significant improvements were made to the documentation. In particular, a new gallery of examples is available: see here for the latest version (the stable version is not released at the time of writing). These examples visually illustrate how each transform acts on an image, and also properly document and illustrate the output of the segmentation models.
The example gallery will be extended in the future to provide more comprehensive examples and serve as a reference for common torchvision tasks.
Backwards Incompatible Changes
- [transforms] Ensure input type of `normalize` is float. (#3621)
- [models] Use PyTorch `smooth_l1_loss` and remove private custom implementation (#3539)
New Features
- Added iOS binaries and test app (#3582)(#3629) (#3806)
- [datasets] Added KITTI dataset (#3640)
- [utils] Added utility to draw segmentation masks (#3330, #3824)
- [models] Added the SSD & SSDlite object detection models (#3403, #3757, #3766, #3855, #3896, #3818, #3799)
- [transforms] Added `antialias` option to `transforms.functional.resize` (#3761, #3810, #3842)
- [transforms] Add new `max_size` parameter to `Resize` (#3494)
- [io] Support for decoding jpegs on GPU with `nvjpeg` (#3792)
- [ci, rocm] Add ROCm to builds (#3840) (#3604) (#3575)
- [ops, models.quantization] Add quantized version of NMS (#3601)
- [ops, models.quantization] Add quantized version of RoIAlign (#3624, #3904)
Improvement
- [build] Various build improvements: (#3618) (#3622) (#3399) (#3794) (#3561)
- [ci] Various CI improvements (#3647) (#3609) (#3635) (#3599) (#3778) (#3636) (#3809) (#3625) (#3764) (#3679) (#3869) (#3871) (#3444) (#3445) (#3480) (#3768) (#3919) (#3641)(#3900)
- [datasets] Improve error handling in `make_dataset` (#3496)
- [datasets] Remove caching from MNIST and variants (#3420)
- [datasets] Make `DatasetFolder.find_classes` public (#3628)
- [datasets] Separate extraction and decompression logic in `datasets.utils.extract_archive` (#3443)
- [datasets, tests] Improve dataset test coverage and infrastructure (#3450) (#3457) (#3454) (#3447) (#3489) (#3661) (#3458) (#3705) (#3411) (#3461) (#3465) (#3543) (#3550) (#3665) (#3464) (#3595) (#3466) (#3468) (#3467) (#3486) (#3736) (#3730) (#3731) (#3477) (#3589) (#3503) (#3423) (#3492) (#3578) (#3605) (#3448) (#3864) (#3544)
- [datasets, tests] Fix lazy importing for dataset tests (#3481)
- [datasets, tests] Fix `test_extract(zip|tar|tar_xz|gzip)` on windows (#3542)
- [datasets, tests] Fix `kwargs` forwarding in fake data utility functions (#3459)
- [datasets, tests] Properly fix dataset test that passes by accident (#3434)
- [documentation] Improve the documentation infrastructure (#3868) (#3724) (#3834) (#3689) (#3700) (#3513) (#3671) (#3490) (#3660) (#3594)
- [documentation] Various documentation improvements (#3793) (#3715) (#3727) (#3838) (#3701) (#3923) (#3643) (#3537) (#3691) (#3453) (#3437) (#3732) (#3683) (#3853) (#3684) (#3576) (#3739) (#3530) (#3586) (#3744) (#3645) (#3694) (#3584) (#3615) (#3693) (#3706) (#3646) (#3780) (#3704) (#3774) (#3634)(#3591)(#3807)(#3663)
- [documentation, ci] Improve the CI infrastructure for documentation (#3734) (#3837) (#3796) (#3711)
- [io] remove deprecated function calls (#3859) (#3858)
- [documentation, io] Improve IO docs and expose `ImageReadMode` in `torchvision.io` (#3812)
- [onnx, models] Replace `reshape` with `flatten` in MobileNetV2 (#3462)
- [ops, tests] Added test for `aligned=True` (#3540)
- [ops, tests] Add onnx test for `batched_nms` (#3483)
- [tests] Various test improvements (#3548) (#3422) (#3435) (#3860) (#3479) (#3721) (#3872) (#3908) (#2916) (#3917) (#3920) (#3579)
- [transforms] add `__repr__` for `transforms.RandomErasing` (#3491)
- [transforms, documentation] Adds Documentation for AutoAugmentation (#3529)
- [transforms, documentation] Add illustrations of transforms with sphinx-gallery (#3652)
- [datasets] Remove pandas dependency for CelebA dataset (#3656, #3698)
- [documentation] Add docs for missing datasets (#3536)
- [referencescripts] Make reference scripts compatible with `submitit` (#3785)
- [referencescripts] Updated `all_gather()` to make use of `all_gather_object()` from PyTorch (#3857)
- [datasets] Added dataset download support in fbcode (#3823) (#3826)
Code quality
- Remove inconsistent FB copyright headers (#3741)
- Keep consistency in classes `ConvBNActivation` (#3750)
- Removed unused imports (#3738, #3740, #3639)
- Fixed `floor_divide` deprecation warnings seen in pytest output (#3672)
- Unify onnx and JIT `resize` implementations (#3654)
- Cleaned-up imports in test files related to datasets (#3720)
- [documentation] Remove old css file (#3839)
- [ci] Fix inconsistent version pinning across yaml files (#3790)
- [datasets] Remove redundant `path.join` in `Places365` (#3545)
- [datasets] Remove imprecise error handling in `PhotoTour` dataset (#3488)
- [datasets, tests] Remove obsolete `test_datasets_transforms.py` (#3867)
- [models] Making protected params of MobileNetV3 public (#3828)
- [models] Make target argument in `transform.py` truly optional (#3866)
- [models] Adding some references on MobileNetV3 implementation. (#3850)
- [models] Refactored `set_cell_anchors()` in `AnchorGenerator` (#3755)
- [ops] Minor cleanup of `roi_align_forward_kernel_impl` (#3619)
- [ops] Replace deprecated `AutoNonVariableTypeMode` with `AutoDispatchBelowADInplaceOrView`. (#3786, #3897)
- [tests] Port tests to use pytest (#3852, #3845, #3697, #3907, #3749)
- [ops, tests] simplify `get_script_fn` (#3541)
- [tests] Use torch.testing.assert_close in our test suite (#3886) (#3885) (#3883) (#3882) (#3881) (#3887) (#3880) (#3878) (#3877) (#3875) (#3888) (#3874) (#3884) (#3876) (#3879) (#3873)
- [tests] Clean up test accept behaviour (#3759)
- [tests] Remove unused `masks` variable in `test_image.py` (#3910)
- [transforms] use ternary if in `resize` (#3533)
- [transforms] replaced deprecated call to `ByteTensor` with `from_numpy` (#3813)
- [transforms] Remove unnecessary casting in `adjust_gamma` (#3472)
Bugfixes
- [ci] set empty cxx flags as default (#3474)
- [android][test_app] Cleanup duplicate dependency (#3428)
- Remove leftover exception (#3717)
- Corrected spelling in a `TypeError` (#3659)
- Add missing device info. (#3651)
- Moving tensors to the right device (#3870)
- Proper error message (#3725)
- [ci, io] Pin JPEG version to resolve the size_t issue on windows (#3787)
- [datasets] Make LSUN OS agnostic (#3455)
- [datasets] Update `squeezenet` urls (#3581)
- [datasets] Add `.item()` to the `target` variable in `fakedataset.py` (#3587)
- [datasets] Fix VOC da...
Dataset bugfixes
Highlights
This minor release bumps the pinned PyTorch version to v1.8.1, and brings a few bugfixes for datasets, including MNIST download not being available.
Bugfixes
Mobile support, AutoAugment, improved IO and more
This release introduces improved support for mobile, with new mobile-friendly models, pre-compiled binaries for Android available in maven and an android demo app. It also improves image IO and provides new data augmentations including AutoAugment.
Highlights
Better mobile support
torchvision 0.9 adds support for the MobileNetV3 architecture with pre-trained weights for Classification, Object Detection and Segmentation tasks.
It also improves C++ operators so that they can be compiled and run on Android, and we are providing pre-compiled torchvision artifacts published to jcenter. An example application on how to use the torchvision ops on an Android app can be found here.
Classification
We provide MobileNetV3 variants (including a quantized version) pre-trained on ImageNet 2012.
import torch
import torchvision
# Classification
x = torch.rand(1, 3, 224, 224)
m_classifier = torchvision.models.mobilenet_v3_large(pretrained=True)
# m_classifier = torchvision.models.mobilenet_v3_small(pretrained=True)
m_classifier.eval()
predictions = m_classifier(x)
# Quantized Classification
x = torch.rand(1, 3, 224, 224)
m_classifier = torchvision.models.quantization.mobilenet_v3_large(pretrained=True)
m_classifier.eval()
predictions = m_classifier(x)
The pre-trained models have the following accuracies on ImageNet 2012 val:
Model | Top-1 Acc | Top-5 Acc |
---|---|---|
MobileNetV3 Large | 74.042 | 91.340 |
MobileNetV3 Large (Quantized) | 73.004 | 90.858 |
MobileNetV3 Small | 67.620 | 87.404 |
Object Detection
We provide two variants of Faster R-CNN with MobileNetV3 backbone pre-trained on COCO train2017. They can be obtained as follows:
import torch
import torchvision
# Fast Low Resolution Model
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_320_fpn(pretrained=True)
m_detector.eval()
predictions = m_detector(x)
# Highly Accurate High Resolution Model
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_fpn(pretrained=True)
m_detector.eval()
predictions = m_detector(x)
These yield the following accuracies on COCO val2017 (full results available in #3265):
Model | mAP | mAP@50 | mAP@75 |
---|---|---|---|
Faster R-CNN MobileNetV3-Large 320 FPN | 22.8 | 38.0 | 23.2 |
Faster R-CNN MobileNetV3-Large FPN | 32.8 | 52.5 | 34.3 |
Semantic Segmentation
We also provide pre-trained models for semantic segmentation. The models have been trained on a subset of COCO train2017, which contains the same 20 categories as those from Pascal VOC.
import torch
import torchvision
# Fast Mobile Model
x = torch.rand(1, 3, 520, 520)
m_segmenter = torchvision.models.segmentation.lraspp_mobilenet_v3_large(pretrained=True)
m_segmenter.eval()
predictions = m_segmenter(x)
# Highly Accurate Mobile Model
x = torch.rand(1, 3, 520, 520)
m_segmenter = torchvision.models.segmentation.deeplabv3_mobilenet_v3_large(pretrained=True)
m_segmenter.eval()
predictions = m_segmenter(x)
The pre-trained models give the following results on the subset of COCO val2017 which contain the same 20 categories as those present in Pascal VOC (full results in #3276):
Model | mean IoU | global pixelwise accuracy |
---|---|---|
Lite R-ASPP with Dilated MobileNetV3 Large Backbone | 57.9 | 91.2 |
DeepLabV3 with Dilated MobileNetV3 Large Backbone | 60.3 | 91.2 |
Addition of the AutoAugment method
AutoAugment is a common Data Augmentation technique that can improve the accuracy of Scene Classification models. Though the data augmentation policies are directly linked to their trained dataset, empirical studies show that ImageNet policies provide significant improvements when applied to other datasets.
In TorchVision we implemented 3 policies learned on the following datasets: ImageNet, CIFAR10 and SVHN. The new transform can be used standalone or mixed-and-matched with existing transforms:
from torchvision import transforms
t = transforms.AutoAugment()
transformed = t(image)
transform=transforms.Compose([
transforms.Resize(256),
transforms.AutoAugment(),
transforms.ToTensor()])
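The policy can also be selected explicitly via the AutoAugmentPolicy enum; a minimal sketch (using the same image as above):
from torchvision import transforms
# Pick the policy learned on CIFAR10 (IMAGENET and SVHN are also available)
t = transforms.AutoAugment(transforms.AutoAugmentPolicy.CIFAR10)
transformed = t(image)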
Improved Image IO and on-the-fly image type conversions
All the read and decode methods of the `io.image` package have been updated to:
- Add support for Palette, Grayscale Alpha and RGB Alpha image types during PNG decoding.
- Allow the on-the-fly conversion of images from one type to another during read.
from torchvision.io.image import read_image, ImageReadMode
# keeps original type, channels unchanged
x1 = read_image("image.png")
# converts to grayscale, channels = 1
x2 = read_image("image.png", mode=ImageReadMode.GRAY)
# converts to grayscale with alpha transparency, channels = 2
x3 = read_image("image.png", mode=ImageReadMode.GRAY_ALPHA)
# converts to RGB, channels = 3
x4 = read_image("image.png", mode=ImageReadMode.RGB)
# converts to RGB with alpha transparency, channels = 4
x5 = read_image("image.png", mode=ImageReadMode.RGB_ALPHA)
Python 3.9 and CUDA 11.1
This release adds official support for Python 3.9 and CUDA 11.1 (#3341, #3418)
Backwards Incompatible Changes
- [Ops] Change default `eps` value of `FrozenBN` to better align with `nn.BatchNorm` (#2933)
- [Ops] Remove deprecated _new_empty_tensor. (#3156)
- [Transforms] `ColorJitter` gets its random params by calling `get_params()` (#3001)
- [Transforms] Change rounding of transforms on integer tensors (#2964)
- [Utils] Remove `normalize` from `save_image` (#3324)
New Features
- [Datasets] Add WiderFace dataset (#2883)
- [Models] Add MobileNetV3 architecture:
- [Models] Improve speed/accuracy of FasterRCNN by introducing a score threshold on RPN (#3205)
- [Mobile] Add Android gradle project with demo test app (#2897)
- [Transforms] Implemented AutoAugment, along with required new transforms + Policies (#3123)
- [Ops] Added support of Autocast in all Operators: #2938, #2926, #2922, #2928, #2905, #2906, #2907, #2898
- [Ops] Add modulation input for DeformConv2D (#2791)
- [IO] Improved `io.image` with on-the-fly image type conversions: (#3193, #3069, #3024, #2988, #2984)
- [IO] Add option to write audio to video file (#2304)
- [Utils] Added a utility to draw bounding boxes (#2785, #3296, #3075)
Improvements
Datasets
- Concatenate small tensors in video datasets to reduce the use of shared file descriptor (#1795)
- Improve testing for datasets (#3336, #3337, #3402, #3412, #3413, #3415, #3416, #3345, #3376, #3346, #3338)
- Check if dataset file is located on Google Drive before downloading it (#3245)
- Improve Coco implementation (#3417)
- Make download_url follow redirects (#3236)
- `make_dataset` as `staticmethod` of `DatasetFolder` (#3215)
- Add a warning if any clip can't be obtained from a video in `VideoClips`. (#2513)
Models
- Improve error message in `AnchorGenerator` (#2960)
- Disable pretrained backbone downloading if pretrained is True in segmentation models (#3325)
- Support for image with no annotations in RetinaNet (#3032)
- Change RoIHeads reshape to support empty batches. (#3031)
- Fixed typing exception throwing issues with JIT (#3029)
- Replace deprecated `functional.sigmoid` with `torch.sigmoid` in RetinaNet (#3307)
- Assert that inputs are floating point in Faster R-CNN normalize method (#3266)
- Speedup RetinaNet's postprocessing (#2828)
Ops
- Added eps in the `__repr__` of FrozenBN (#2852)
- Added `__repr__` to `MultiScaleRoIAlign` (#2840)
- Exposing LevelMapper params in `MultiScaleRoIAlign` (#3151)
- Enable autocast for all operators and let them use the dispatcher (#2926, #2922, #2928, #2898)
Transforms
- `adjust_hue` now accepts tensors with one channel (#3222)
- Add `fill` color support for tensor affine transforms (#2904)
- Remove torchscript workaround for `center_crop` (#3118)
- Improved error message for `RandomCrop` (#2816)
IO
- Enable importing `read_file` and the other methods from torchvision.io (#2918)
- Accept python bytes in `_read_video_from_memory()` (#3347)
- Enable rtmp timeout in decoder (#3076)
- Specify tls cert file to decoder through config (#3289, #3374)
- Add UUID in LOG() in decoder (#3080)
References
- Add weight averaging and storing methods in references utils (#3352)
- Adding Preset Transforms in reference scripts (#3317)
- Load variables when `--resume /path/to/checkpoint --test-only` (#3285)
- Updated video classification ref example with new transforms (#2935)
Misc
- Various documentation improvements (#3039, #3271, #2820, #2808, #3131, #3062, #3061, #3000, #3299, #3400, #2899, #2901, #2908, #2851, #2909, #3005, #2821, #2957, #3360, #3019, #3124, #3217, #2879, #3234, #3180, #3425, #2979, #2935, #3298, #3268, #3203, #3290, #3295, #3200, #2663, #3153, #3147, #3232)
- The documentation infrastructure was improved, in particular the docs are now built on every PR and uploaded to CircleCI (#3259, #3378, #3408, #3373, #3290)
- Avoid some deprecation warnings from PyTorch (#3348)
- Ensure operators are added in C++ (#2798, #3091, #3391)
- Fixed compilation warnings on C++ codebase (#3390)
- CI Improvements (#3401, #3329, #2990, #2978, #3189, #3230, #3254, #2844, #2872, #2825, #3144, #3137, #2827, #2848, #2914, #3419, #2895, #2837)
- Installation improvements (#3302, #2969, #3113, #3202)
- CMake improvemen...