ByteIR Additional Ops

The ByteIR compiler introduces several coarse-grained ops to improve pattern-matching rewriting during compilation. Instead of defining a new dialect, ByteIR reuses the mhlo custom call op definition with a ByteIR prefix in call_target_name. This conversion is implemented in the frontends rather than in the ByteIR compiler.

Rationales

Need for coarse-grained ops

Introducing coarse-grained ops provides several benefits:

  • it simplifies pattern matching during rewriting, whether for optimization or lowering;
  • it allows high-level information to be encoded in coarse-grained ops, helping optimization;
  • it provides an intuitive mapping from frontends to IR, helping debuggability;
  • it provides flexible control, since coarse-grained ops can easily be decomposed into fine-grained ops, while the other way around is much harder.

Implementation of reusing mhlo custom call

Reusing the mhlo custom call op with a ByteIR prefix in call_target_name provides several benefits:

  • the resulting IR remains legal and well-defined without introducing a new dialect or defining new ops in tablegen;
  • it keeps all existing passes and pattern matching working, breaking nothing;
  • with a proper definition, an unrecognized coarse-grained op can easily be mapped to a custom library or decomposed into fine-grained ops.

Implementation of coarse-grained op conversion in frontends

Implementing coarse-grained op conversion in the frontends provides several benefits:

  • it avoids N-to-1 rewriting in the ByteIR compiler; moving the corresponding rewriting into each frontend yields a much cleaner implementation;
  • different frontends may already define their own dialects with coarse-grained ops, making this conversion trivial and intuitive;
  • it isolates the effects of existing frontend graph optimizations, which may change across versions of each frontend.

Additional op definition

A coarse-grained op kind is defined through call_target_name with a prefix.

call_target_name = "byteir.softmax" or "tf.DynamicPartition"

If an op is generic across frontends, which is the common case, it uses the byteir prefix. If an op is frontend-specific, it uses a frontend-specific prefix, such as tf or pytorch.

Any further information needed by a given coarse-grained op is encoded in a dictionary attribute called byteir_attrs, which includes all named attributes.

Op attribute: byteir_attrs = {approximate = "none"}, or byteir_attrs = {} if none
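A frontend-specific op is encoded the same way. The snippet below is an illustrative sketch only; the byteir_attrs contents and tensor shapes are assumptions, not a spec:

%0:2 = "mhlo.custom_call"(%arg0, %arg1) {byteir_attrs = {num_partitions = 2 : i64}, call_target_name = "tf.DynamicPartition", has_side_effect = false} : (tensor<8xf32>, tensor<8xi32>) -> (tensor<?xf32>, tensor<?xf32>)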

byteir.layer_norm

  • Operands:
    • input: Tensor
    • weight: Tensor
    • bias: Tensor
  • Attrs
    • epsilon: F64Attr
    • axis: I64ArrayAttr
    • eps_outside_sqrt: Optional<BoolAttr>
  • Results (1 or 3):
    • output: Tensor
    • mean: Optional<Tensor>
    • inv_std_dev: Optional<Tensor>
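  • Example (an illustrative single-result sketch; shapes and attribute values are assumptions):
%0 = "mhlo.custom_call"(%arg0, %arg1, %arg2) {byteir_attrs = {axis = [1], epsilon = 1.000000e-05 : f64}, call_target_name = "byteir.layer_norm", has_side_effect = false} : (tensor<4x64xf32>, tensor<64xf32>, tensor<64xf32>) -> tensor<4x64xf32>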

byteir.l2_norm

  • Operands:
    • input: Tensor
  • Attrs
    • epsilon: F64Attr
    • axis: I64ArrayAttr
  • Results:
    • output: Tensor
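  • Example (an illustrative sketch; shapes and attribute values are assumptions):
%0 = "mhlo.custom_call"(%arg0) {byteir_attrs = {axis = [1], epsilon = 1.000000e-12 : f64}, call_target_name = "byteir.l2_norm", has_side_effect = false} : (tensor<4x64xf32>) -> tensor<4x64xf32>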

byteir.softmax

  • Operands:
    • input: Tensor
  • Attrs
    • axis: I64Attr
  • Results:
    • output: Tensor
  • Example:
%0 = "mhlo.custom_call"(%arg0) {api_version = 1 : i32, backend_config = "", byteir_attrs = {axis = 1 : i64}, call_target_name = "byteir.softmax", called_computations = [], has_side_effect = false} : (tensor<4x64xf32>) -> tensor<4x64xf32>

byteir.log_softmax

  • Operands:
    • input: Tensor
  • Attrs
    • axis: I64Attr
  • Results:
    • output: Tensor
  • Example:
%0 = "mhlo.custom_call"(%arg0) {api_version = 1 : i32, backend_config = "", byteir_attrs = {axis = 1 : i64}, call_target_name = "byteir.log_softmax", called_computations = [], has_side_effect = false} : (tensor<4x64xf32>) -> tensor<4x64xf32>

byteir.gelu

  • Operands:
    • input: Tensor
  • Attr:
    • approximate: str
      • none / erf
      • tanh
  • Results:
    • output: Tensor
  • Example:
%0 = "mhlo.custom_call"(%arg0) {api_version = 1 : i32, backend_config = "", byteir_attrs = {approximate = "none"}, call_target_name = "byteir.gelu", called_computations = [], has_side_effect = false} : (tensor<4x64xf32>) -> tensor<4x64xf32>

byteir.arg_max/byteir.arg_min

  • Operands:
    • input: Tensor
  • Attrs
    • axis: I64Attr
    • keep_dims: BoolAttr
    • select_last_index: BoolAttr
  • Results:
    • output: Optional<Tensor>
    • indices: IntTensor
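  • Example (an illustrative indices-only sketch; shapes and the index element type are assumptions):
%0 = "mhlo.custom_call"(%arg0) {byteir_attrs = {axis = 1 : i64, keep_dims = false, select_last_index = false}, call_target_name = "byteir.arg_max", has_side_effect = false} : (tensor<4x64xf32>) -> tensor<4xi64>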

byteir.top_k

  • Operands:
    • input: Tensor
  • Attrs
    • k: I64Attr
    • axis: I64ArrayAttr
    • sorted: BoolAttr
  • Results:
    • output: Tensor
    • indices: IntTensor
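  • Example (an illustrative sketch; k, shapes, and the index element type are assumptions):
%0:2 = "mhlo.custom_call"(%arg0) {byteir_attrs = {axis = [1], k = 3 : i64, sorted = true}, call_target_name = "byteir.top_k", has_side_effect = false} : (tensor<4x64xf32>) -> (tensor<4x3xf32>, tensor<4x3xi64>)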

byteir.erf

  • Operands:
    • input: Tensor
  • Results:
    • output: Tensor
  • Example:
%0 = "mhlo.custom_call"(%arg0) {call_target_name = "byteir.erf", has_side_effect = false} : (tensor<?x64xf32>) -> tensor<?x64xf32>

byteir.one_hot

  • Operands:
    • indices: IntTensor
  • Attrs:
    • depth: I64Attr
    • axis: I64Attr
    • on_value: AnyAttr
    • off_value: AnyAttr
  • Results:
    • output: Tensor (ElementType same as on_value and off_value)
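  • Example (an illustrative sketch; depth, on/off values, and shapes are assumptions):
%0 = "mhlo.custom_call"(%arg0) {byteir_attrs = {axis = 1 : i64, depth = 16 : i64, off_value = 0.000000e+00 : f32, on_value = 1.000000e+00 : f32}, call_target_name = "byteir.one_hot", has_side_effect = false} : (tensor<4xi64>) -> tensor<4x16xf32>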

byteir.quantize

  • Operands:
    • input: FloatTensor
    • scale: FloatTensor (rank=0 for per-tensor quantization, or rank=1 for per-channel quantization)
    • zero_point: Int8Tensor (shape same as scale)
  • Attrs
    • axis: I64Attr (Optional, required only for per-channel quantization)
  • Results:
    • output: Int8Tensor
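  • Example (an illustrative per-tensor sketch, so byteir_attrs carries no axis; shapes are assumptions):
%0 = "mhlo.custom_call"(%arg0, %scale, %zero_point) {byteir_attrs = {}, call_target_name = "byteir.quantize", has_side_effect = false} : (tensor<4x64xf32>, tensor<f32>, tensor<i8>) -> tensor<4x64xi8>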

byteir.dequantize

  • Operands:
    • input: Int8Tensor
    • scale: FloatTensor (rank=0 for per-tensor dequantization, or rank=1 for per-channel dequantization)
    • zero_point: Int8Tensor (shape same as scale)
  • Attrs
    • axis: I64Attr (Optional, channel axis index, required only for per-channel dequantization)
  • Results:
    • output: FloatTensor
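  • Example (an illustrative per-tensor sketch; shapes are assumptions):
%0 = "mhlo.custom_call"(%arg0, %scale, %zero_point) {byteir_attrs = {}, call_target_name = "byteir.dequantize", has_side_effect = false} : (tensor<4x64xi8>, tensor<f32>, tensor<i8>) -> tensor<4x64xf32>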

byteir.resize

  • Operands:
    • input: Tensor
    • target (scale/size): FloatTensor/IntTensor (respectively)
  • Attrs:
    • target_mode: StringAttr
      • scale
      • size
    • mode: StringAttr
      • nearest
      • linear
    • coordinate_transformation_mode: StringAttr
      • Denoting scale = length_resized / length_original, x_original is computed as follows:
        • asymmetric: x_resized / scale
        • pytorch_half_pixel: length_resized > 1 ? (x_resized + 0.5) / scale - 0.5 : 0
        • half_pixel: (x_resized + 0.5) / scale - 0.5
        • align_corners: x_resized * (length_original - 1) / (length_resized - 1)
  • Results:
    • output: Tensor
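  • Example (an illustrative sketch in scale mode; shapes and the layout of the scale operand are assumptions):
%0 = "mhlo.custom_call"(%arg0, %scale) {byteir_attrs = {coordinate_transformation_mode = "asymmetric", mode = "nearest", target_mode = "scale"}, call_target_name = "byteir.resize", has_side_effect = false} : (tensor<1x3x32x32xf32>, tensor<4xf32>) -> tensor<1x3x64x64xf32>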

byteir.rng_uniform

  • Operands:
    • low: 0dTensor
    • high: 0dTensor
    • seed: 0dTensor
    • offset: 0dTensor
    • shape: Optional<1dTensor>
  • Results:
    • out: Tensor
  • Example:
// Static Shape Case: out tensor must have static shape
%high = mhlo.constant dense<1.000000e+00> : tensor<f32>
%low = mhlo.constant dense<0.000000e+00> : tensor<f32>
%seed = byre.compute @GetSeed() : tensor<i64>
%offset = byre.compute @NextOffset() : tensor<i64>
%0 = "mhlo.custom_call"(%low, %high, %seed, %offset) {call_target_name = "byteir.rng_uniform", has_side_effect = false} : (tensor<f32>, tensor<f32>, tensor<i64>, tensor<i64>) -> tensor<8x1024x768xf32>
// Dynamic Shape Case
%high = mhlo.constant dense<1.000000e+00> : tensor<f32>
%low = mhlo.constant dense<0.000000e+00> : tensor<f32>
%seed = byre.compute @GetSeed() : tensor<i64>
%offset = byre.compute @NextOffset() : tensor<i64>
%shape = shape.shape_of %arg0 : tensor<?x?x?xf32> -> tensor<3xindex>
%0 = "mhlo.custom_call"(%low, %high, %seed, %offset, %shape) {call_target_name = "byteir.rng_uniform", has_side_effect = false} : (tensor<f32>, tensor<f32>, tensor<i64>, tensor<i64>, tensor<3xindex>) -> tensor<?x?x?xf32>