PyTorch 2.0 Training

ByteIR is a well-optimized compiler for PyTorch 2.0. This doc introduces our practice of accelerating PyTorch 2.0 training on NVIDIA GPUs through compilation technology.


Overview of our compilation pipeline:

          FX Graph
       Torch Frontend
             |(emit mhlo)
       ByteIR Compiler
  |          |          |
Codegen  AITemplate  Runtime Library
  |          |
  |(linalg)  |(cutlass)
  |          |
 PTX    Tuned Kernel

Prepare Materials


We recommend the following environment for building:

  • cuda>=11.8
  • python>=3.9
  • gcc>=8.5 or clang>=7

Or use our Dockerfile to build a Docker image.
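If you build outside the Docker image, the prerequisites above can be sanity-checked from a shell before starting (a minimal sketch; the expected versions are those listed above):

```shell
# Sanity-check the build prerequisites listed above.
nvcc --version | grep release   # expect CUDA >= 11.8
gcc --version | head -n1        # expect GCC >= 8.5 (or run clang --version, expect >= 7)
python3 --version               # expect Python >= 3.9
```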

Build ByteIR Components

See each component’s README for instructions on building it.

When the build completes, it produces three Python wheel packages: torch_frontend*.whl, byteir*.whl, and brt*.whl.

Install PyTorch and ByteIR Components

Install torch-nightly:

  • cd byteir/frontends/torch-frontend
  • python3 -m pip install -r ./torch-requirements.txt

Install ByteIR Components:

  • python3 -m pip install /path_to/torch_frontend*.whl /path_to/byteir*.whl /path_to/brt*.whl

Training Acceleration Examples

ByteIR Backend for Torch 2.0


MLP Training Demo