
[BUG] Error in Shape inference of Constant Op #3014

Open
amd-abhikulk opened this issue Nov 18, 2024 · 5 comments

amd-abhikulk commented Nov 18, 2024

I was trying to lower Mistral-7B-v0.1.onnx model, when I get the following error.

onnx-mlir: /home/amd/Workspace/Abhishek/onnx-mlir/src/Support/Diagnostic.hpp:40: onnx_mlir::Diagnostic::Range<T>::Range(T, T) [with T = long int]: Assertion `min <= max && "Illegal range"' failed.
Aborted (core dumped)

This error doesn't give much context into what went wrong.

I did some debugging and found that the Concat op was the one producing this error, here:

https://github.com/onnx/onnx-mlir/blob/a5ae8baff62209567eec4cc1e8a621cd36292cce/src/Dialect/ONNX/ONNXOps/Tensor/Concat.cpp#L99C1-L103C1

This happens because Concat received tensors with different ranks, and the shape-inference logic broke down. IMO the verify function should check whether all the operands have the same rank. (I will send a PR for this soon.)
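A minimal sketch of the proposed verifier rank check (Python rather than the actual C++ verifier; the function name and message wording are hypothetical):

```python
# Hypothetical sketch of a Concat verifier rank check.
# Shapes are tuples; len(shape) is the tensor's rank.
def verify_concat_ranks(input_shapes):
    """Return an error message if operand ranks differ, else None."""
    if not input_shapes:
        return "Concat expects at least one input"
    ranks = {len(shape) for shape in input_shapes}
    if len(ranks) > 1:
        return f"Concat operands must all have the same rank, got ranks {sorted(ranks)}"
    return None

# A rank-1 tensor plus a rank-0 scalar should be rejected up front:
print(verify_concat_ranks([(4,), ()]))    # mismatch reported
print(verify_concat_ranks([(4,), (1,)]))  # None: ranks agree
```

Rejecting the rank mismatch at verification time would surface the root cause instead of the downstream axis-range assertion.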

Now as to how it got different ranked tensors, that's the actual cause of this error.

[Image: "Screenshot (17)_edited" — the relevant portion of the model graph around the Concat node]

The Concat op has a constant [-1] tensor to be concatenated with the input tensor. But for some reason onnx-mlir lowers it to:

%298 = onnx.Constant {onnx_node_name = "/model/constant_nodes/TensorProto.INT64/1D/-1", value = dense<-1> : tensor<1xi64>} : tensor<i64>

The Constant op strips the constant tensor of its dimensionality (the attribute value is a tensor<1xi64>, but the result type is the rank-0 tensor<i64>), and I can't understand why.
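The practical effect of that rank-stripping can be reproduced with NumPy: a rank-0 scalar (like the lowered constant) cannot be concatenated with a rank-1 tensor, while the properly shaped rank-1 constant can:

```python
import numpy as np

vec = np.array([1, 2, 3], dtype=np.int64)   # rank-1, like tensor<3xi64>
kept = np.array([-1], dtype=np.int64)       # rank-1, like tensor<1xi64>
scalar = np.array(-1, dtype=np.int64)       # rank-0, like tensor<i64>

# Concatenating the properly shaped rank-1 constant works:
print(np.concatenate([vec, kept]))

# Concatenating the rank-stripped constant fails:
try:
    np.concatenate([vec, scalar])
except ValueError as e:
    print("concat failed:", e)
```

This mirrors the mismatch Concat saw in the lowered model.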

@amd-abhikulk (Author)

@AlexandreEichenberger, can you please look into this issue?

@AlexandreEichenberger (Collaborator)

Let me look into this.

@AlexandreEichenberger (Collaborator)

Ok, so very simply: we attempt to print an error because the axis range for an operation is wrong. The valid range is [-rank, rank-1], but since rank is zero (possibly after earlier errors in our code), the range becomes [0, -1] and our error printer asserts on the illegal range. So it defeats the purpose of having a nice printout for the error. I have fixed that, but I still don't get an error message, so I'm looking into why.
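The failure mode described above can be sketched as follows (hypothetical Python mirroring the [-rank, rank-1] check; the real code is in src/Support/Diagnostic.hpp and the Concat verifier):

```python
def check_concat_axis(axis, rank):
    """Valid Concat axes lie in [-rank, rank-1] per the ONNX spec.
    Returns an error message instead of asserting on an illegal range."""
    lo, hi = -rank, rank - 1
    if lo > hi:  # rank == 0: the range [0, -1] is empty, hence "illegal"
        return f"illegal axis range [{lo}, {hi}] for rank-{rank} operand"
    if not (lo <= axis <= hi):
        return f"axis {axis} outside [{lo}, {hi}]"
    return None

print(check_concat_axis(-1, 2))  # None: valid axis
print(check_concat_axis(0, 0))   # illegal range reported, no assert
```

With rank 0 the range collapses to [0, -1], which is exactly what tripped the `min <= max` assertion in the diagnostic code.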

At a higher level, we will have to figure out why the rank is perceived as zero. I have seen (in other places) that Microsoft ONNX models sometimes make assumptions that differ from our reading of the ONNX specs. To be determined.

I will have a PR to fix at least the error messaging, as it is truly unacceptable to assert like this on erroneous model inputs.

If you have the time to determine why we see a rank of zero there, it would be much appreciated, keeping in mind that the model is possibly non-conforming (but a simple fix, i.e. a normalization rule, may handle it).

@AlexandreEichenberger (Collaborator)

Ok, there were a few things wrong. We now get the diagnostic again, and we issue a warning instead of asserting (the code was for reporting an error to begin with, so it's better that way).

Fixes are in PR #3020.

@AlexandreEichenberger (Collaborator)

> Now as to how it got different ranked tensors, that's the actual cause of this error.

@amd-abhikulk, do you mind looking at the root cause of the error and submitting a PR for it? That means adding code in the verifier to check the spec, namely:

> All input tensors must have the same shape, except for the dimension size of the axis to concatenate on.

If that is the case in the model, it's better to report the root cause of the error, not the subsequent axis issue that seems to be derived from that root error (I did not personally verify that claim, but it seems very plausible).
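A sketch of such a verifier check (Python, hypothetical names), enforcing the spec sentence quoted above: all inputs share a rank and agree on every dimension except the concatenation axis.

```python
def verify_concat_shapes(shapes, axis):
    """Per the ONNX Concat spec, all inputs must have the same shape
    except along `axis`. Returns an error message, or None if valid."""
    if not shapes:
        return "Concat expects at least one input"
    rank = len(shapes[0])
    if any(len(s) != rank for s in shapes):
        return "operands have differing ranks"
    axis = axis % rank if rank else axis  # normalize a negative axis
    for dim in range(rank):
        if dim == axis:
            continue
        sizes = {s[dim] for s in shapes}
        if len(sizes) > 1:
            return f"dimension {dim} differs across operands: {sorted(sizes)}"
    return None

print(verify_concat_shapes([(2, 3), (4, 3)], axis=0))  # None: only the axis dim differs
print(verify_concat_shapes([(2, 3), (2, 5)], axis=0))  # mismatch on dim 1 reported
```

Reporting the shape mismatch here would surface the root cause before the axis-range diagnostic ever runs.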

Thanks
