Difference in operation ordering between --EmitONNXIR and --EmitONNXBasic #3010

Open
flemairen6 opened this issue Nov 13, 2024 · 3 comments

@flemairen6
Collaborator

I am trying to understand one of the optimizations that seems to run when using --EmitONNXIR compared to --EmitONNXBasic.
If we take the following example:

      <
          ir_version: 8,
          opset_import: ["" : 19]
      >
      main (float[1,64,8] input) => (float[1,64,4] out0) {
          splitCst = Constant <value: tensor = int64[2] {4, 4}> ()
          cst4 = Constant <value = float {1.0}> ()
          cst5 = Constant <value = float {0.5}> ()
          split1, split2 = Split <axis: int = 2> (input, splitCst)
          add = Add(split1, cst4)
          mul3 = Mul(split1, add)
          mul4 = Mul(mul3, cst5)
          out0 = Mul(mul4, split2)
      }

and run onnx-mlir --EmitONNXBasic MyModel.onnx, I get:

  func.func @main_graph(%arg0: tensor<1x64x8xf32> {onnx.name = "input"}) -> (tensor<1x64x4xf32> {onnx.name = "out0"}) {
    %0 = onnx.Constant dense<4> : tensor<2xi64>
    %1 = onnx.Constant dense<1.000000e+00> : tensor<f32>
    %2 = onnx.Constant dense<5.000000e-01> : tensor<f32>
    %3:2 = "onnx.Split"(%arg0, %0) {axis = 2 : si64, onnx_node_name = "Split3"} : (tensor<1x64x8xf32>, tensor<2xi64>) -> (tensor<1x64x4xf32>, tensor<1x64x4xf32>)
    %4 = "onnx.Add"(%3#0, %1) {onnx_node_name = "Add4"} : (tensor<1x64x4xf32>, tensor<f32>) -> tensor<1x64x4xf32>
    %5 = "onnx.Mul"(%3#0, %4) {onnx_node_name = "Mul5"} : (tensor<1x64x4xf32>, tensor<1x64x4xf32>) -> tensor<1x64x4xf32>
    %6 = "onnx.Mul"(%5, %2) {onnx_node_name = "Mul6"} : (tensor<1x64x4xf32>, tensor<f32>) -> tensor<1x64x4xf32>
    %7 = "onnx.Mul"(%6, %3#1) {onnx_node_name = "Mul7"} : (tensor<1x64x4xf32>, tensor<1x64x4xf32>) -> tensor<1x64x4xf32>
    onnx.Return %7 : tensor<1x64x4xf32>
  }

Here the order of the Mul ops is the same as in the textual (or binary) version of the model (notice that the results of the Split are used in the first and last Mul).
Now, if I run onnx-mlir --EmitONNXIR MyModel.onnx, I get:

  func.func @main_graph(%arg0: tensor<1x64x8xf32> {onnx.name = "input"}) -> (tensor<1x64x4xf32> {onnx.name = "out0"}) {
    %0 = onnx.Constant dense<4> : tensor<2xi64>
    %1 = onnx.Constant dense<1.000000e+00> : tensor<f32>
    %2 = onnx.Constant dense<5.000000e-01> : tensor<f32>
    %3:2 = "onnx.Split"(%arg0, %0) {axis = 2 : si64, onnx_node_name = "Split3"} : (tensor<1x64x8xf32>, tensor<2xi64>) -> (tensor<1x64x4xf32>, tensor<1x64x4xf32>)
    %4 = "onnx.Add"(%3#0, %1) {onnx_node_name = "Add4"} : (tensor<1x64x4xf32>, tensor<f32>) -> tensor<1x64x4xf32>
    %5 = "onnx.Mul"(%3#0, %4) {onnx_node_name = "Mul5"} : (tensor<1x64x4xf32>, tensor<1x64x4xf32>) -> tensor<1x64x4xf32>
    %6 = "onnx.Mul"(%5, %3#1) {onnx_node_name = "Mul7-Constant2-Mul6_0"} : (tensor<1x64x4xf32>, tensor<1x64x4xf32>) -> tensor<1x64x4xf32>
    %7 = "onnx.Mul"(%6, %2) {onnx_node_name = "Mul7-Constant2-Mul6_1"} : (tensor<1x64x4xf32>, tensor<f32>) -> tensor<1x64x4xf32>
    return %7 : tensor<1x64x4xf32>
  }

In this case, the last Mul (the one consuming the second Split result) has been moved up compared to the previous IR.

I am trying to understand what pass or optimization could be causing this behaviour, and if there is a way to disable it without disabling other optimizations. Could someone help me with that?

Thanks a lot in advance!

@AlexandreEichenberger
Collaborator

That feels like it could be constant propagation. Have you tried running with -mlir-print-ir-after-all? It may reveal the offending optimization.
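
For reference, a minimal sketch of that suggestion (assuming onnx-mlir forwards MLIR's standard pass-manager options and that MyModel.onnx is the model above):

    # Dump the IR after every pass (MLIR writes these dumps to stderr),
    # so the pass that reorders the Muls can be spotted.
    onnx-mlir --EmitONNXIR -mlir-print-ir-after-all MyModel.onnx 2> passes.log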

Also, if it points at the hybrid pass, where multiple optimizations are mashed together, we used to have an alternative to that pass, but it might have been yanked.

My recollection is that --EmitONNXBasic emits before anything is done, and --EmitONNXIR emits after shape inference. So is there a reason you want to run with some, but not all, of the optimizations present in --EmitONNXIR?

We can probably adapt when the code is emitted for that later target, and/or add another target that emits after only the passes you want.

@flemairen6
Collaborator Author

Hi @AlexandreEichenberger, thanks for the answer! Indeed, after looking into it, both constant propagation and the hybrid pass have this effect.
My use case is that I'm trying to recompose a decomposed operator, but the decomposed operations get moved around and "lost". Disabling both constant propagation and the hybrid pass doesn't sound like a good idea though, so I'll probably need to find another solution.
Thanks for the help!

@AlexandreEichenberger
Collaborator

AlexandreEichenberger commented Nov 14, 2024

@flemairen6 feel free to modify the order of optimizations leading to --EmitONNXIR. It's not a path that we use much, so if you want a larger (or smaller, or different) set of optimizations leading to that emit goal, feel free to experiment. From our perspective, what matters is the code that goes all the way through LLVM.

If there are different patterns, feel free to play with the rule priorities as well.
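
As a concrete starting point for that kind of experiment (a sketch only; the exact pass names accepted by onnx-mlir-opt are assumptions here and may differ between versions, so check onnx-mlir-opt --help), one can emit the basic IR and then replay selected ONNX-level passes on it to see which one moves the Muls:

    # Emit the unoptimized ONNX-level IR (typically written to MyModel.onnx.mlir).
    onnx-mlir --EmitONNXBasic MyModel.onnx
    # Replay individual ONNX-level passes on that file; the pass names below
    # are assumptions, verify them against onnx-mlir-opt --help.
    onnx-mlir-opt MyModel.onnx.mlir --shape-inference --constprop-onnx -o MyModel.experiment.mlir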
