
[BUG] Error in Shape inference of Constant Op #3014

Open
amd-abhikulk opened this issue Nov 18, 2024 · 5 comments

amd-abhikulk commented Nov 18, 2024

I was trying to lower Mistral-7B-v0.1.onnx model, when I get the following error.

onnx-mlir: /home/amd/Workspace/Abhishek/onnx-mlir/src/Support/Diagnostic.hpp:40: onnx_mlir::Diagnostic::Range<T>::Range(T, T) [with T = long int]: Assertion `min <= max && "Illegal range"' failed.
Aborted (core dumped)

This error doesn't give much context into what went wrong.

I did some debugging and found that the Concat op was the one producing this error, here:

https://github.com/onnx/onnx-mlir/blob/a5ae8baff62209567eec4cc1e8a621cd36292cce/src/Dialect/ONNX/ONNXOps/Tensor/Concat.cpp#L99C1-L103C1

This happens because Concat received tensors with different ranks, and the shape-inference logic broke down. IMO the verify function should check whether all the operands have the same rank. (I will send a PR for this soon.)
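A minimal sketch of the proposed verifier rank check (Python rather than the actual C++ verifier; the function name and message wording are hypothetical):

```python
# Hypothetical sketch of a Concat verifier rank check.
# Shapes are tuples; len(shape) is the tensor's rank.
def verify_concat_ranks(input_shapes):
    """Return an error message if operand ranks differ, else None."""
    if not input_shapes:
        return "Concat expects at least one input"
    ranks = {len(shape) for shape in input_shapes}
    if len(ranks) > 1:
        return f"Concat operands must all have the same rank, got ranks {sorted(ranks)}"
    return None

# A rank-1 tensor plus a rank-0 scalar should be rejected up front:
print(verify_concat_ranks([(4,), ()]))    # mismatch reported
print(verify_concat_ranks([(4,), (1,)]))  # None: ranks agree
```

Rejecting the rank mismatch at verification time would surface the root cause instead of the downstream axis-range assertion.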

Now as to how it got different ranked tensors, that's the actual cause of this error.

[Image: "Screenshot (17)_edited" — the relevant portion of the model graph around the Concat node]

The Concat op has a constant [-1] tensor to be concatenated with the input tensor. But for some reason onnx-mlir lowers it to:

%298 = onnx.Constant {onnx_node_name = "/model/constant_nodes/TensorProto.INT64/1D/-1", value = dense<-1> : tensor<1xi64>} : tensor<i64>

The Constant op strips the constant tensor of its dimensionality (the attribute value is a tensor<1xi64>, but the result type is the rank-0 tensor<i64>), and I can't understand why.
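The practical effect of that rank-stripping can be reproduced with NumPy: a rank-0 scalar (like the lowered constant) cannot be concatenated with a rank-1 tensor, while the properly shaped rank-1 constant can:

```python
import numpy as np

vec = np.array([1, 2, 3], dtype=np.int64)   # rank-1, like tensor<3xi64>
kept = np.array([-1], dtype=np.int64)       # rank-1, like tensor<1xi64>
scalar = np.array(-1, dtype=np.int64)       # rank-0, like tensor<i64>

# Concatenating the properly shaped rank-1 constant works:
print(np.concatenate([vec, kept]))

# Concatenating the rank-stripped constant fails:
try:
    np.concatenate([vec, scalar])
except ValueError as e:
    print("concat failed:", e)
```

This mirrors the mismatch Concat saw in the lowered model.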

@amd-abhikulk (Author)

@AlexandreEichenberger, can you please look into this issue?

@AlexandreEichenberger (Collaborator)

Let me look into this.

@AlexandreEichenberger (Collaborator)

Ok, so very simply: we attempt to print an error because the axis range for an operation is wrong. The valid range is [-rank, rank-1], but since rank is zero (possibly after earlier errors in our code), the range becomes [0, -1] and our error printer asserts on the illegal range. So it defeats the purpose of having a nice printout for the error. I have fixed that, but I still don't get an error message, so I'm looking into why.
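The failure mode described above can be sketched as follows (hypothetical Python mirroring the [-rank, rank-1] check; the real code is in src/Support/Diagnostic.hpp and the Concat verifier):

```python
def check_concat_axis(axis, rank):
    """Valid Concat axes lie in [-rank, rank-1] per the ONNX spec.
    Returns an error message instead of asserting on an illegal range."""
    lo, hi = -rank, rank - 1
    if lo > hi:  # rank == 0: the range [0, -1] is empty, hence "illegal"
        return f"illegal axis range [{lo}, {hi}] for rank-{rank} operand"
    if not (lo <= axis <= hi):
        return f"axis {axis} outside [{lo}, {hi}]"
    return None

print(check_concat_axis(-1, 2))  # None: valid axis
print(check_concat_axis(0, 0))   # illegal range reported, no assert
```

With rank 0 the range collapses to [0, -1], which is exactly what tripped the `min <= max` assertion in the diagnostic code.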

At a higher level, we will have to figure out why the rank is perceived as zero. I have seen (in other places) that Microsoft ONNX models sometimes make assumptions that differ from our reading of the ONNX specs. To be determined.

I will have a PR to fix at least the error messaging, as it is truly unacceptable to assert like this on erroneous model inputs.

If you have the time to determine why we see a rank of zero there, it would be much appreciated, keeping in mind that the model is possibly non-conforming (but a simple fix, i.e. a normalization rule, may handle it).

@AlexandreEichenberger (Collaborator)

Ok, there were a few things wrong. We now get the diagnostic again, and we issue a warning instead of asserting (the code was for reporting an error to begin with, so it's better that way).

Fixes are in PR #3020.

@AlexandreEichenberger (Collaborator)

> Now as to how it got different ranked tensors, that's the actual cause of this error.

@amd-abhikulk, do you mind looking at the root cause of the error and submitting a PR for it? That means adding code in the verifier to check the spec, namely:

> All input tensors must have the same shape, except for the dimension size of the axis to concatenate on.

If that is the case in the model, it's better to report the root cause of the error, not the subsequent axis issue that seems to be derived from that root error (I did not personally verify that claim, but it seems very plausible).
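A sketch of such a verifier check (Python, hypothetical names), enforcing the spec sentence quoted above: all inputs share a rank and agree on every dimension except the concatenation axis.

```python
def verify_concat_shapes(shapes, axis):
    """Per the ONNX Concat spec, all inputs must have the same shape
    except along `axis`. Returns an error message, or None if valid."""
    if not shapes:
        return "Concat expects at least one input"
    rank = len(shapes[0])
    if any(len(s) != rank for s in shapes):
        return "operands have differing ranks"
    axis = axis % rank if rank else axis  # normalize a negative axis
    for dim in range(rank):
        if dim == axis:
            continue
        sizes = {s[dim] for s in shapes}
        if len(sizes) > 1:
            return f"dimension {dim} differs across operands: {sorted(sizes)}"
    return None

print(verify_concat_shapes([(2, 3), (4, 3)], axis=0))  # None: only the axis dim differs
print(verify_concat_shapes([(2, 3), (2, 5)], axis=0))  # mismatch on dim 1 reported
```

Reporting the shape mismatch here would surface the root cause before the axis-range diagnostic ever runs.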

Thanks
