PyTorch Releases Version 1.12 With New Library “TorchArrow”



The PyTorch team announced the release of version 1.12 of the open-source deep-learning framework, which includes support for GPU-accelerated training on Apple silicon Macs and a new data preprocessing library, TorchArrow, as well as updates to other libraries and APIs.

The PyTorch team highlighted the significant features of the release in a recent blog post.

Support for training on Apple silicon GPUs using Apple’s Metal Performance Shaders (MPS) is released with “prototype” status, offering up to a 20x speedup over CPU-based training. In addition, the release includes official support for M1 builds of the Core and Domain PyTorch libraries. The TorchData library’s DataPipes are now backward compatible with the older DataLoader class, and the release adds an AWS S3 integration for TorchData. The TorchArrow library features a Pandas-style API and an in-memory data format based on Apache Arrow, and it plugs easily into other PyTorch data components, including DataLoader and DataPipe. The new release contains more than 3,100 commits from 433 contributors since version 1.11.
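A minimal sketch of the TorchArrow workflow, assuming the Pandas-style API described above; the column names and values below are illustrative only:

```python
import torcharrow as ta

# Build a dataframe backed by an Arrow-compatible in-memory columnar format;
# the columns here are hypothetical examples
df = ta.dataframe({"user_id": [1, 2, 3, 4], "score": [0.5, 0.9, 0.1, 0.7]})

# Pandas-style column access and eager, column-level arithmetic
scores = df["score"]
doubled = scores * 2.0

print(df)
print(doubled)
```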

Before the 1.12 release, PyTorch supported only CPU-based training on M1 Macs. With help from Apple’s Metal team, PyTorch now includes a backend based on MPS, with processor-specific kernels and a mapping of the PyTorch model computation graph onto the MPS Graph Framework. Apple silicon’s unified memory architecture gives the GPU direct access to the full memory store, improving overall performance and allowing training with larger batch sizes and larger models.
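A minimal sketch of opting into the new backend: selecting the "mps" device moves tensors and modules onto the Apple silicon GPU, falling back to CPU when MPS is unavailable. The model and tensor shapes below are placeholders:

```python
import torch

# Use the Metal Performance Shaders (MPS) backend when available
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

model = torch.nn.Linear(128, 10).to(device)
x = torch.randn(32, 128, device=device)

loss = model(x).sum()
loss.backward()  # forward and backward both run on the Apple silicon GPU
```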

Besides support for Apple silicon, PyTorch 1.12 includes several other performance enhancements. TorchScript, PyTorch’s intermediate representation of models for runtime portability, has a new layer-fusion backend called NVFuser, which is faster and supports more operations than the previous fuser, NNC. For computer vision (CV) models, the release implements the Channels Last data format on CPUs, improving inference performance by up to 1.8x over Channels First. The release also includes enhancements to the bfloat16 reduced-precision data type, which can provide up to a 2.2x performance improvement on Intel Xeon processors.
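A minimal sketch of switching a CV model to the Channels Last (NHWC) memory format for CPU inference; torchvision’s ResNet-50 stands in here for any convolutional model:

```python
import torch
import torchvision

model = torchvision.models.resnet50().eval()

# Convert both the model weights and the input to Channels Last (NHWC)
model = model.to(memory_format=torch.channels_last)
x = torch.randn(1, 3, 224, 224).to(memory_format=torch.channels_last)

with torch.inference_mode():
    out = model(x)  # inference dispatches to Channels Last CPU kernels
```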

The release includes several new features and APIs. PyTorch 1.12 supports complex convolutions and the complex32 data type, enabling reduced-precision computation for applications that require complex numbers. The release “significantly improves” support for forward-mode automatic differentiation, which eagerly computes directional derivatives in the forward pass. There is also a prototype implementation of a new class, DataLoader2, a lightweight data loader for executing a DataPipe graph.
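As an illustration of forward-mode automatic differentiation, the sketch below computes a directional derivative (a Jacobian-vector product) in a single forward pass using the torch.autograd.forward_ad API; the function and direction vector are arbitrary examples:

```python
import torch
import torch.autograd.forward_ad as fwAD

def f(x):
    return (x ** 2).sum()

x = torch.randn(3)
v = torch.ones(3)  # direction along which to differentiate

with fwAD.dual_level():
    # Pair the primal input with its tangent (the direction vector)
    dual_x = fwAD.make_dual(x, v)
    out = f(dual_x)
    # The output's tangent is the directional derivative J(x)·v,
    # obtained without a backward pass
    jvp = fwAD.unpack_dual(out).tangent

print(jvp)  # equals 2 * x.sum() for this f and v
```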

The Fully Sharded Data Parallel (FSDP) API moves from prototype to Beta in the new release. FSDP supports training large models by distributing model weights and gradients across a cluster of workers. New FSDP features in this release include faster model initialization, fine-grained control of mixed precision, enhanced training of Transformer models, and an API for changing the sharding strategy with a single line of code. The PyTorch 1.12 code and release notes are available on GitHub.
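A minimal sketch of wrapping a model with the FSDP API, assuming a multi-GPU host and a launch via torchrun so that the process-group environment variables are already set; the model choice is illustrative:

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Assumes launch via `torchrun --nproc_per_node=<num_gpus> train.py`
dist.init_process_group("nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = torch.nn.Transformer().cuda()

# Shard parameters and gradients across all workers in the process group
model = FSDP(model)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```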