Difference in operation ordering between --EmitONNXIR and --EmitONNXBasic #3010

Open
flemairen6 opened this issue Nov 13, 2024 · 3 comments

@flemairen6
Collaborator

I am trying to understand one of the optimizations that seems to run when using --EmitONNXIR compared to --EmitONNXBasic.
If we take the following example:

      <
          ir_version: 8,
          opset_import: ["" : 19]
      >
      main (float[1,64,8] input) => (float[1,64,4] out0) {
          splitCst = Constant <value: tensor = int64[2] {4, 4}> ()
          cst4 = Constant <value = float {1.0}> ()
          cst5 = Constant <value = float {0.5}> ()
          split1, split2 = Split <axis: int = 2> (input, splitCst)
          add = Add(split1, cst4)
          mul3 = Mul(split1, add)
          mul4 = Mul(mul3, cst5)
          out0 = Mul(mul4, split2)
      }

and run onnx-mlir --EmitONNXBasic MyModel.onnx, I get:

  func.func @main_graph(%arg0: tensor<1x64x8xf32> {onnx.name = "input"}) -> (tensor<1x64x4xf32> {onnx.name = "out0"}) {
    %0 = onnx.Constant dense<4> : tensor<2xi64>
    %1 = onnx.Constant dense<1.000000e+00> : tensor<f32>
    %2 = onnx.Constant dense<5.000000e-01> : tensor<f32>
    %3:2 = "onnx.Split"(%arg0, %0) {axis = 2 : si64, onnx_node_name = "Split3"} : (tensor<1x64x8xf32>, tensor<2xi64>) -> (tensor<1x64x4xf32>, tensor<1x64x4xf32>)
    %4 = "onnx.Add"(%3#0, %1) {onnx_node_name = "Add4"} : (tensor<1x64x4xf32>, tensor<f32>) -> tensor<1x64x4xf32>
    %5 = "onnx.Mul"(%3#0, %4) {onnx_node_name = "Mul5"} : (tensor<1x64x4xf32>, tensor<1x64x4xf32>) -> tensor<1x64x4xf32>
    %6 = "onnx.Mul"(%5, %2) {onnx_node_name = "Mul6"} : (tensor<1x64x4xf32>, tensor<f32>) -> tensor<1x64x4xf32>
    %7 = "onnx.Mul"(%6, %3#1) {onnx_node_name = "Mul7"} : (tensor<1x64x4xf32>, tensor<1x64x4xf32>) -> tensor<1x64x4xf32>
    onnx.Return %7 : tensor<1x64x4xf32>
  }

Here the order of the Mul ops is the same as in the textual (or binary) version of the model (notice that the results of the Split are used in the first and last Mul).
Now, if I run onnx-mlir --EmitONNXIR MyModel.onnx, I get:

  func.func @main_graph(%arg0: tensor<1x64x8xf32> {onnx.name = "input"}) -> (tensor<1x64x4xf32> {onnx.name = "out0"}) {
    %0 = onnx.Constant dense<4> : tensor<2xi64>
    %1 = onnx.Constant dense<1.000000e+00> : tensor<f32>
    %2 = onnx.Constant dense<5.000000e-01> : tensor<f32>
    %3:2 = "onnx.Split"(%arg0, %0) {axis = 2 : si64, onnx_node_name = "Split3"} : (tensor<1x64x8xf32>, tensor<2xi64>) -> (tensor<1x64x4xf32>, tensor<1x64x4xf32>)
    %4 = "onnx.Add"(%3#0, %1) {onnx_node_name = "Add4"} : (tensor<1x64x4xf32>, tensor<f32>) -> tensor<1x64x4xf32>
    %5 = "onnx.Mul"(%3#0, %4) {onnx_node_name = "Mul5"} : (tensor<1x64x4xf32>, tensor<1x64x4xf32>) -> tensor<1x64x4xf32>
    %6 = "onnx.Mul"(%5, %3#1) {onnx_node_name = "Mul7-Constant2-Mul6_0"} : (tensor<1x64x4xf32>, tensor<1x64x4xf32>) -> tensor<1x64x4xf32>
    %7 = "onnx.Mul"(%6, %2) {onnx_node_name = "Mul7-Constant2-Mul6_1"} : (tensor<1x64x4xf32>, tensor<f32>) -> tensor<1x64x4xf32>
    return %7 : tensor<1x64x4xf32>
  }

In this case, the last Mul (the one consuming the second Split result) has been moved up compared to the previous IR.

I am trying to understand what pass or optimization could be causing this behaviour, and if there is a way to disable it without disabling other optimizations. Could someone help me with that?

Thanks a lot in advance!

@AlexandreEichenberger
Collaborator

That feels like it could be constant propagation. Have you tried running with -mlir-print-ir-after-all? It may reveal the offending optimization.
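
For reference, a minimal sketch of that suggestion (assuming onnx-mlir forwards MLIR's standard pass-manager options and that MyModel.onnx is the model above):

    # Dump the IR after every pass (MLIR writes these dumps to stderr),
    # so the pass that reorders the Muls can be spotted.
    onnx-mlir --EmitONNXIR -mlir-print-ir-after-all MyModel.onnx 2> passes.log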

Also, if it points at the hybrid pass, where multiple optimizations are mashed together, we used to have an alternative to that pass, but it might have been yanked.

My recollection is that --EmitONNXBasic emits before anything is done, and --EmitONNXIR emits after shape inference. So is there a reason you want to run with some, but not all, of the optimizations present in --EmitONNXIR?

We can probably adapt when the code is emitted for that later target, and/or add another target that emits after only the passes you want.

@flemairen6
Collaborator Author

Hi @AlexandreEichenberger, thanks for the answer! Indeed, after looking into it, both constant propagation and the hybrid pass have this effect.
My use case is that I'm trying to recompose a decomposed operator, but the decomposed operations get moved around and "lost". Disabling both constant propagation and the hybrid pass doesn't sound like a good idea though, so I'll probably need to find another solution.
Thanks for the help!

@AlexandreEichenberger
Collaborator

AlexandreEichenberger commented Nov 14, 2024

@flemairen6 feel free to modify the order of optimizations leading to --EmitONNXIR. It's not a path that we use much, so if you want a larger (or smaller, or different) set of optimizations leading to that emit goal, feel free to experiment. From our perspective, what matters is the code that goes all the way through LLVM.

If there are different patterns, feel free to play with the rule priorities as well.
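
As a concrete starting point for that kind of experiment (a sketch only; the exact pass names accepted by onnx-mlir-opt are assumptions here and may differ between versions, so check onnx-mlir-opt --help), one can emit the basic IR and then replay selected ONNX-level passes on it to see which one moves the Muls:

    # Emit the unoptimized ONNX-level IR (typically written to MyModel.onnx.mlir).
    onnx-mlir --EmitONNXBasic MyModel.onnx
    # Replay individual ONNX-level passes on that file; the pass names below
    # are assumptions, verify them against onnx-mlir-opt --help.
    onnx-mlir-opt MyModel.onnx.mlir --shape-inference --constprop-onnx -o MyModel.experiment.mlir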
