# ByteIR Additional Ops
The ByteIR compiler introduces several coarse-grained ops to improve pattern-matching and rewriting during compilation.
ByteIR implements them by reusing the mhlo custom call op definition with a ByteIR prefix in `call_target_name`, instead of defining a new dialect.
ByteIR implements this conversion in the frontends, instead of putting it into the ByteIR compiler.
## Rationales
### Need of coarse-grained ops
Introducing coarse-grained ops provides several benefits:
- it simplifies pattern matching during rewriting, whether for optimization or lowering;
- it allows high-level information to be encoded with coarse-grained ops, helping optimization;
- it provides intuitive mapping from frontends to IR, helping debuggability;
- it provides flexible control, since coarse-grained ops can easily be decomposed into fine-grained ops, while the reverse is much harder.
### Implementation of reusing mhlo custom call
Reusing the mhlo custom call op with a ByteIR prefix in `call_target_name` provides several benefits:
- the original IR is still legal and well-defined, without introducing a new dialect or defining new ops in tablegen;
- it keeps backward compatibility with all existing passes and pattern matching, breaking nothing;
- with a proper definition, an unrecognized coarse-grained op can easily be mapped to a custom library or decomposed into fine-grained ops.
### Implementing coarse-grained op conversion in frontends
Implementing coarse-grained op conversion in frontends provides several benefits:
- it avoids N-to-1 rewriting in the ByteIR compiler; putting the corresponding rewriting into each frontend yields a much cleaner implementation;
- different frontends might already define their own dialects providing coarse-grained ops, making this conversion trivial and intuitive;
- it isolates the effects of each frontend's existing graph optimizations, which might change across versions of that frontend.
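For instance, a minimal sketch of such a frontend-side rewrite, assuming a TensorFlow frontend and the byteir.softmax op defined below (shapes are illustrative):

```mlir
// Hypothetical frontend rewrite, applied before handing the module to the ByteIR compiler.
// Before (TF dialect):
%0 = "tf.Softmax"(%arg0) : (tensor<4x64xf32>) -> tensor<4x64xf32>
// After conversion (softmax runs over the last dimension, axis 1 for rank 2):
%1 = "mhlo.custom_call"(%arg0) {byteir_attrs = {axis = 1 : i64}, call_target_name = "byteir.softmax", has_side_effect = false} : (tensor<4x64xf32>) -> tensor<4x64xf32>
```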
## Additional op definition
A coarse-grained op kind is defined through `call_target_name` with a prefix, e.g. `call_target_name = "byteir.softmax"` or `call_target_name = "tf.DynamicPartition"`.
If an op is generic across frontends, which is the common case, it uses the `byteir` prefix.
If an op is frontend-specific, it uses a frontend-specific prefix, such as `tf` or `pytorch`.

Any further information needed for a given coarse-grained op is encoded in a dictionary attribute called `byteir_attrs`, which includes all named attributes, e.g. `byteir_attrs = {approximate = "none"}`, or `byteir_attrs = {}` if there are none.
### byteir.layer_norm
- Operands:
  - input: Tensor
  - weight: Tensor
  - bias: Tensor
- Attrs:
  - epsilon: F64Attr
  - axis: I64ArrayAttr
  - eps_outside_sqrt: Optional<BoolAttr>
- Results (1 or 3):
  - output: Tensor
  - mean: Optional<Tensor>
  - inv_std_dev: Optional<Tensor>
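- Example (illustrative sketch; the shapes, axis, epsilon, and reduced result shapes are hypothetical, not taken from the ByteIR sources):
```mlir
// Illustrative only: 3-result form, normalizing over the last axis.
%output, %mean, %inv_std_dev = "mhlo.custom_call"(%input, %weight, %bias) {byteir_attrs = {axis = [2], epsilon = 1.000000e-05 : f64}, call_target_name = "byteir.layer_norm", has_side_effect = false} : (tensor<2x32x128xf32>, tensor<128xf32>, tensor<128xf32>) -> (tensor<2x32x128xf32>, tensor<2x32x1xf32>, tensor<2x32x1xf32>)
```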
### byteir.l2_norm
- Operands:
  - input: Tensor
- Attrs:
  - epsilon: F64Attr
  - axis: I64ArrayAttr
- Results:
  - output: Tensor
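- Example (illustrative sketch; shapes and attribute values are hypothetical):
```mlir
// Illustrative only: L2-normalize along axis 1.
%0 = "mhlo.custom_call"(%arg0) {byteir_attrs = {axis = [1], epsilon = 1.000000e-12 : f64}, call_target_name = "byteir.l2_norm", has_side_effect = false} : (tensor<4x64xf32>) -> tensor<4x64xf32>
```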
### byteir.softmax
- Operands:
  - input: Tensor
- Attrs:
  - axis: I64Attr
- Results:
  - output: Tensor
- Example:
%0 = "mhlo.custom_call"(%arg0) {api_version = 1 : i32, backend_config = "", byteir_attrs = {axis = 1 : i64}, call_target_name = "byteir.softmax", called_computations = [], has_side_effect = false} : (tensor<4x64xf32>) -> tensor<4x64xf32>
### byteir.log_softmax
- Operands:
  - input: Tensor
- Attrs:
  - axis: I64Attr
- Results:
  - output: Tensor
- Example:
%0 = "mhlo.custom_call"(%arg0) {api_version = 1 : i32, backend_config = "", byteir_attrs = {axis = 1 : i64}, call_target_name = "byteir.log_softmax", called_computations = [], has_side_effect = false} : (tensor<4x64xf32>) -> tensor<4x64xf32>
### byteir.gelu
- Operands:
  - input: Tensor
- Attrs:
  - approximate: str
    - none / erf
    - tanh
- Results:
  - output: Tensor
- Example:
%0 = "mhlo.custom_call"(%arg0) {api_version = 1 : i32, backend_config = "", byteir_attrs = {approximate = "none"}, call_target_name = "byteir.gelu", called_computations = [], has_side_effect = false} : (tensor<4x64xf32>) -> tensor<4x64xf32>
### byteir.arg_max / byteir.arg_min
- Operands:
  - input: Tensor
- Attrs:
  - axis: I64Attr
  - keep_dims: BoolAttr
  - select_last_index: BoolAttr
- Results:
  - output: Optional<Tensor>
  - indices: IntTensor
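- Example (illustrative sketch; shapes and attribute values are hypothetical):
```mlir
// Illustrative only: indices-only arg_max along axis 1.
%indices = "mhlo.custom_call"(%arg0) {byteir_attrs = {axis = 1 : i64, keep_dims = false, select_last_index = false}, call_target_name = "byteir.arg_max", has_side_effect = false} : (tensor<4x64xf32>) -> tensor<4xi64>
```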
### byteir.top_k
- Operands:
  - input: Tensor
- Attrs:
  - k: I64Attr
  - axis: I64ArrayAttr
  - sorted: BoolAttr
- Results:
  - output: Tensor
  - indices: IntTensor
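- Example (illustrative sketch; shapes and attribute values are hypothetical):
```mlir
// Illustrative only: top 8 values and their indices along axis 1.
%values, %indices = "mhlo.custom_call"(%arg0) {byteir_attrs = {axis = [1], k = 8 : i64, sorted = true}, call_target_name = "byteir.top_k", has_side_effect = false} : (tensor<4x64xf32>) -> (tensor<4x8xf32>, tensor<4x8xi64>)
```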
### byteir.erf
- Operands:
- input: Tensor
- Results:
- output: Tensor
- Example:
%0 = "mhlo.custom_call"(%arg0) {call_target_name = "byteir.erf", has_side_effect = false} : (tensor<?x64xf32>) -> tensor<?x64xf32>
### byteir.one_hot
- Operands:
- indices: IntTensor
- Attrs:
- depth: I64Attr
- axis: I64Attr
- on_value: AnyAttr
- off_value: AnyAttr
- Results:
- output: Tensor (ElementType same as on_value and off_value)
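- Example (illustrative sketch; shapes and attribute values are hypothetical):
```mlir
// Illustrative only: one-hot encode 4 indices into depth-16 float vectors.
%0 = "mhlo.custom_call"(%arg0) {byteir_attrs = {axis = 1 : i64, depth = 16 : i64, off_value = 0.000000e+00 : f32, on_value = 1.000000e+00 : f32}, call_target_name = "byteir.one_hot", has_side_effect = false} : (tensor<4xi64>) -> tensor<4x16xf32>
```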
### byteir.quantize
- Operands:
  - input: FloatTensor
  - scale: FloatTensor (rank=0 for per-tensor quantization, or rank=1 for per-channel quantization)
  - zero_point: Int8Tensor (shape same as scale)
- Attrs:
  - axis: I64Attr (optional, required only for per-channel quantization)
- Results:
  - output: Int8Tensor
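- Example (illustrative sketch; shapes are hypothetical):
```mlir
// Illustrative only: per-tensor quantization (rank-0 scale and zero_point, no axis attribute).
%0 = "mhlo.custom_call"(%arg0, %scale, %zero_point) {byteir_attrs = {}, call_target_name = "byteir.quantize", has_side_effect = false} : (tensor<4x64xf32>, tensor<f32>, tensor<i8>) -> tensor<4x64xi8>
```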
### byteir.dequantize
- Operands:
  - input: Int8Tensor
  - scale: FloatTensor (rank=0 for per-tensor dequantization, or rank=1 for per-channel dequantization)
  - zero_point: Int8Tensor (shape same as scale)
- Attrs:
  - axis: I64Attr (optional, channel axis index, required only for per-channel dequantization)
- Results:
  - output: FloatTensor
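- Example (illustrative sketch; shapes are hypothetical):
```mlir
// Illustrative only: per-channel dequantization along axis 1 (rank-1 scale and zero_point).
%0 = "mhlo.custom_call"(%arg0, %scale, %zero_point) {byteir_attrs = {axis = 1 : i64}, call_target_name = "byteir.dequantize", has_side_effect = false} : (tensor<4x64xi8>, tensor<64xf32>, tensor<64xi8>) -> tensor<4x64xf32>
```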
### byteir.resize
- Operands:
  - input: Tensor
  - target (scale or size): FloatTensor or IntTensor (respectively)
- Attrs:
  - target_mode: StringAttr
    - scale
    - size
  - mode: StringAttr
    - nearest
    - linear
  - coordinate_transformation_mode: StringAttr

Denoting scale = length_resized / length_original, the coordinate transformation can be described as follows:

| coordinate_transformation_mode | x_original = |
| --- | --- |
| asymmetric | x_resized / scale |
| pytorch_half_pixel | length_resized > 1 ? (x_resized + 0.5) / scale - 0.5 : 0 |
| half_pixel | (x_resized + 0.5) / scale - 0.5 |
| align_corners | x_resized * (length_original - 1) / (length_resized - 1) |

- Results:
  - output: Tensor
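- Example (illustrative sketch; the shapes, the per-dimension scale layout, and the attribute values are hypothetical):
```mlir
// Illustrative only: 2x spatial upscaling in scale mode with linear interpolation.
%scale = mhlo.constant dense<[1.000000e+00, 1.000000e+00, 2.000000e+00, 2.000000e+00]> : tensor<4xf32>
%0 = "mhlo.custom_call"(%arg0, %scale) {byteir_attrs = {coordinate_transformation_mode = "half_pixel", mode = "linear", target_mode = "scale"}, call_target_name = "byteir.resize", has_side_effect = false} : (tensor<1x3x32x32xf32>, tensor<4xf32>) -> tensor<1x3x64x64xf32>
```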
### byteir.rng_uniform
- Operands:
- low: 0dTensor
- high: 0dTensor
- seed: 0dTensor
- offset: 0dTensor
- shape: Optional<1dTensor>
- Results:
- out: Tensor
- Example:
```mlir
// Static shape case: the out tensor must have a static shape.
%high = mhlo.constant dense<1.000000e+00> : tensor<f32>
%low = mhlo.constant dense<0.000000e+00> : tensor<f32>
%seed = byre.compute @GetSeed() : tensor<i64>
%offset = byre.compute @NextOffset() : tensor<i64>
%0 = "mhlo.custom_call"(%low, %high, %seed, %offset) {call_target_name = "byteir.rng_uniform", has_side_effect = false} : (tensor<f32>, tensor<f32>, tensor<i64>, tensor<i64>) -> tensor<8x1024x768xf32>

// Dynamic shape case: the target shape is passed as an extra 1-d operand.
%high = mhlo.constant dense<1.000000e+00> : tensor<f32>
%low = mhlo.constant dense<0.000000e+00> : tensor<f32>
%seed = byre.compute @GetSeed() : tensor<i64>
%offset = byre.compute @NextOffset() : tensor<i64>
%shape = shape.shape_of %arg0 : tensor<?x?x?xf32> -> tensor<3xindex>
%0 = "mhlo.custom_call"(%low, %high, %seed, %offset, %shape) {call_target_name = "byteir.rng_uniform", has_side_effect = false} : (tensor<f32>, tensor<f32>, tensor<i64>, tensor<i64>, tensor<3xindex>) -> tensor<?x?x?xf32>
```