PyTorch 2.0 Training

ByteIR is a well-optimized compiler for PyTorch 2.0. This doc introduces our practice of accelerating PyTorch 2.0 training on NVIDIA GPUs through compilation technology.


Overview of our compilation pipeline:

          FX Graph
       Torch Frontend
             |(emit mhlo)
       ByteIR Compiler
  |          |          |
Codegen  AITemplate  Runtime Library
  |          |
  |(linalg)  |(cutlass)
  |          |
 PTX    Tuned Kernel

Prepare Materials


We recommend the following environment for building:

  • cuda>=11.8
  • python>=3.9
  • gcc>=8.5 or clang>=7

Or use our Dockerfile to build a Docker image.
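If you build outside the Docker image, the prerequisites above can be sanity-checked from a shell before starting (a minimal sketch; the expected versions are those listed above):

```shell
# Sanity-check the build prerequisites listed above.
nvcc --version | grep release   # expect CUDA >= 11.8
gcc --version | head -n1        # expect GCC >= 8.5 (or run clang --version, expect >= 7)
python3 --version               # expect Python >= 3.9
```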

Build ByteIR Components

See each component’s README for instructions on building it.

When the build completes, it produces three Python wheel packages: torch_frontend*.whl, byteir*.whl, and brt*.whl.

Install PyTorch and ByteIR Components

Install torch-nightly:

  • cd byteir/frontends/torch-frontend
  • python3 -m pip install -r ./torch-requirements.txt

Install ByteIR Components:

  • python3 -m pip install /path_to/torch_frontend*.whl /path_to/byteir*.whl /path_to/brt*.whl

Training Acceleration Examples

ByteIR Backend for Torch 2.0


MLP Training Demo