PyTorch 2.0 Training
ByteIR is a well-optimized compiler for PyTorch 2.0. This doc introduces our practice of accelerating PyTorch 2.0 training on NVIDIA GPUs through compilation technology.
Overview
Overview of our compilation pipeline:
```
              PyTorch 2.0
                   |
           Dynamo / FuncTorch
                   |
                FX Graph
                   |
             Torch Frontend
                   |
                   | (emit mhlo)
                   |
            ByteIR Compiler
                   |
        +----------+----------+
        |          |          |
     Codegen   AITemplate   Runtime Library
        |          |
        | (linalg) | (cutlass)
        |          |
       PTX    Tuned Kernel
```
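To see the first hop of this pipeline in action, the snippet below registers a trivial `torch.compile` backend that prints the FX graph Dynamo captures and then falls back to eager execution. This inspection backend is purely illustrative and not part of ByteIR:

```python
import torch

def print_graph_backend(gm: torch.fx.GraphModule, example_inputs):
    # Dynamo hands every captured FX graph to the backend; a real
    # backend would compile it, here we just inspect and fall back.
    gm.graph.print_tabular()
    return gm.forward

@torch.compile(backend=print_graph_backend)
def f(x, y):
    return torch.relu(x @ y)

f(torch.randn(4, 8), torch.randn(8, 4))
```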
Prepare Materials
Environment
We recommend the following environment for building:
- cuda>=11.8
- python>=3.9
- gcc>=8.5 or clang>=7
Alternatively, use our Dockerfile to build a Docker image.
Build ByteIR Components
See each component's README for build instructions.
When the build completes, it produces three Python wheel packages: torch_frontend*.whl, byteir*.whl, and brt*.whl.
Install PyTorch and ByteIR Components
Install torch-nightly:
```bash
cd byteir/frontends/torch-frontend
python3 -m pip install -r ./torch-requirements.txt
```
Install ByteIR Components:
```bash
python3 -m pip install /path_to/torch_frontend*.whl /path_to/byteir*.whl /path_to/brt*.whl
```
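As a quick sanity check, the three packages should import cleanly. The module names below are assumed from the wheel names; check each component's README if an import fails:

```python
# Verify that the three installed wheels are importable.
# Module names are assumed from the wheel names.
import torch
import torch_frontend
import byteir
import brt

print(torch.__version__)
```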
Training Acceleration Examples
ByteIR Backend for Torch2.0
See byteir_backend.py.
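The real lowering logic lives in byteir_backend.py; the skeleton below is only a sketch of the control flow, with the ByteIR-specific steps described in comments and an eager fallback so the snippet stays runnable:

```python
import torch

def byteir_backend(gm: torch.fx.GraphModule, example_inputs):
    # Sketch only -- in the real backend (see byteir_backend.py) the
    # captured FX graph walks the pipeline from the Overview:
    #   1. torch_frontend lowers the FX graph and emits mhlo;
    #   2. the ByteIR compiler turns mhlo into device code
    #      (linalg codegen -> PTX, or AITemplate -> cutlass kernels);
    #   3. brt loads the compiled artifact and returns a callable.
    # Placeholder: run the graph eagerly so this sketch executes.
    return gm.forward

model = torch.nn.Linear(1024, 1024).cuda()
compiled_model = torch.compile(model, backend=byteir_backend)
out = compiled_model(torch.randn(32, 1024, device="cuda"))
```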
MLP Training Demo
See mlp.py.
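For flavor, here is a minimal MLP training loop in the same spirit. The model, data, and optimizer are illustrative, and importing `byteir_backend` from byteir_backend.py assumes that file exposes a callable by that name; see mlp.py for the actual demo:

```python
import torch
import torch.nn as nn

from byteir_backend import byteir_backend  # name assumed; see byteir_backend.py

model = nn.Sequential(
    nn.Linear(784, 512), nn.ReLU(),
    nn.Linear(512, 10),
).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

# torch.compile routes every captured graph through the ByteIR backend.
compiled_model = torch.compile(model, backend=byteir_backend)

for step in range(10):
    x = torch.randn(64, 784, device="cuda")             # dummy inputs
    target = torch.randint(0, 10, (64,), device="cuda")  # dummy labels
    loss = nn.functional.cross_entropy(compiled_model(x), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```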