PyTorch's torchvision 0.3 Comes With Segmentation and Detection Models, New Datasets and More

Release 0.3 of torchvision, PyTorch's computer vision library, brings several new features and improvements. It includes models for semantic segmentation, instance segmentation, object detection, and person keypoint detection.

Torchvision developers added reference training and evaluation scripts for several computer vision tasks. These scripts are meant to provide a convenient starting point for common computer vision problems: they serve as a base for training specific models and provide evaluation baselines.

The new release also contains torchvision ops: custom C++/CUDA operators used in computer vision, particularly for building object detection models. Examples of torchvision ops include roi_pool, roi_align, and box_area.
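To give a feel for what box operators of this kind compute, here is a minimal plain-Python sketch. It is not torchvision's implementation, which works on tensors and may run on the GPU. The helper names `area`, `iou`, and `greedy_nms` are hypothetical, and the sketch assumes boxes are given as (x1, y1, x2, y2) corner coordinates.

```python
# Plain-Python illustration of box area, intersection over union (IoU),
# and greedy non-maximum suppression. Hypothetical helpers, not torchvision code.
# Assumption: each box is an (x1, y1, x2, y2) tuple with x1 <= x2 and y1 <= y2.

def area(box):
    """Area of a single box."""
    x1, y1, x2, y2 = box
    return (x2 - x1) * (y2 - y1)


def iou(a, b):
    """Intersection over union of two boxes."""
    # Corners of the overlap rectangle (it may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores, iou_threshold=0.5)
print(kept)
```

The first two boxes overlap heavily, so only the higher-scoring one survives. The third box does not overlap either of them and is kept.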

As mentioned above, torchvision 0.3 includes many popular models for segmentation, detection, and classification. The models available in the new release include FCN and DeepLabV3 (with ResNet backbones) for segmentation; Faster R-CNN, Mask R-CNN, and Keypoint R-CNN for detection; and GoogLeNet, MobileNetV2, ShuffleNet V2, and ResNeXt for classification.

Five new datasets are also part of torchvision 0.3: Caltech101, Caltech256, CelebA, ImageNet, and the Semantic Boundaries Dataset are now available in the torchvision package. The developers also added VisionDataset, a base class for all datasets used for computer vision tasks.

The full release notes are available on the torchvision GitHub repository. Torchvision developers also added a tutorial, published as a Google Colab notebook, that shows how to fine-tune an instance segmentation model on a custom dataset in torchvision.