Bugs in loop shape inference #163
Comments
Thanks for reporting the issue! The point around version 19 is certainly a bug. We have custom shape and type inference logic for a few operators, but try to upstream them into ONNX.

I'm afraid I don't see the issue with the first example, though. Where would that …
Sorry, made a typo in the test. It should be `num_iters = op.const(5)`.
I would avoid trying to resolve overly complex type inference problems by assuming what the result should be. The bug here is that it actually says `(1,)`. I think the 'more correct' behaviour could be a more robust unification between the known input and output types.
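For illustration (the original snippet is not preserved here), a rough sketch of such a dimension-wise unification, with hypothetical helper names rather than Spox's actual API:

```python
from typing import Optional, Tuple

Shape = Optional[Tuple[Optional[int], ...]]  # None means "unknown rank"

def unify_dims(x: Optional[int], y: Optional[int]) -> Optional[int]:
    """Unify two dimensions: keep them if they agree, otherwise mark as unknown."""
    return x if x == y else None

def unify_shapes(a: Shape, b: Shape) -> Shape:
    """Unify two static shapes: mismatched dimensions become dynamic (None),
    mismatched ranks collapse to a fully unknown shape."""
    if a is None or b is None or len(a) != len(b):
        return None
    return tuple(unify_dims(x, y) for x, y in zip(a, b))

# A loop state variable starting at (1,) whose body produces (2,) would then
# be reported with a dynamic first dimension, (None,), rather than a fixed (1,).
assert unify_shapes((1,), (2,)) == (None,)
assert unify_shapes((1, 3), (1, 3)) == (1, 3)
```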
And preferably this approach should be upstreamed, but ONNX's control flow has always been lacking in such things. I think it does have something for unification implemented, though.

Workaround: where a loop accumulator is known to have dynamic shape across iterations, use …

Hope this helps!
I think there may also be an implicit assumption on the semantics of Loop. Namely, that the outputs of the subgraph have to have the same type and shape in each iteration. The documentation for Scan spells this requirement out explicitly:
While the Loop documentation does not state this, we may have assumed it in the past.
I found a comment in the inference code in ONNX that says it can change: https://github.com/onnx/onnx/blob/ea4afc74f8f84646973e132ed471a82586a8aee1/onnx/defs/controlflow/utils.cc#L96
Another thing I realised: since the loop constructor takes a function, you could pass a function which builds a different graph depending on the static shape of the inputs. This would not have the effect you would intuitively expect (repeatedly calling the function in a loop), because only the first iteration determines the graph. Is this not a problem? Should the user just be careful about the function they pass in? I think ideally the user should specify what shape information the function should be called with (or just provide the actual subgraph).
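To illustrate the pitfall, a hypothetical body callable (names invented for the example; assumes Spox's `Var.unwrap_tensor()` type accessor and the v17 opset module):

```python
import spox.opset.ai.onnx.v17 as op  # assumed opset module

def build_body(iteration, cond, state):
    # A body callable that branches on the *static* shape of `state`.
    # Only the first trace determines the exported subgraph, so the `else`
    # branch below never becomes part of the loop body, even though later
    # iterations would change the runtime shape of `state`.
    if state.unwrap_tensor().shape == (1,):
        new_state = op.concat([state, state], axis=0)
    else:
        new_state = state
    return cond, new_state
```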
Yes, while Spox exposes the type information (effectively allowing ad-hoc polymorphism in the DSL; this is driven by the metalanguage, here Python), the user has to be careful about what is done with that information.
It is a problem, but what can be done about it? We can't stop this at the language level in Python; it's always up to the user in the end.
Specifying shape information amounts to getting rid of type inference. If you like, you can implement a wrapper around …
I don't see how you would get rid of type inference. The difference is only in how you construct the subgraph. Instead of taking the input shapes from the initial inputs (which is like assuming they will be unchanged), you take them from the user. You would then unify the initial inputs, body inputs, and body outputs to infer the loop output shapes. By the actual subgraph I mean any representation of a fixed subgraph: something that has a list of arguments and a list of outputs.
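As a hypothetical sketch of that unification step, reusing the `unify_shapes` helper sketched above (none of this is Spox's actual API):

```python
def infer_loop_state_shape(initial: Shape, body_in: Shape, body_out: Shape) -> Shape:
    # The loop-carried state has to be compatible with the initial value, with
    # what the body is declared to consume, and with what the body produces;
    # unifying all three gives a sound (if less precise) loop output shape.
    return unify_shapes(unify_shapes(initial, body_in), body_out)

# e.g. initial value (1,), user-specified body input (None,), body output (2,):
assert infer_loop_state_shape((1,), (None,), (2,)) == (None,)
```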
We could also be stricter than the standard and immediately raise an error if the type/shape information of the initial values differs from what we get after running the …
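That stricter check, as a small sketch in the same hypothetical terms as above:

```python
def check_loop_state_shape(initial: Shape, body_out: Shape) -> None:
    # Stricter-than-standard option: refuse to build the loop at all if the
    # body changes the shape of a loop-carried state variable.
    if initial != body_out:
        raise ValueError(
            f"Loop state shape changes across iterations: {initial} -> {body_out}"
        )
```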
@cbourjau you can't use …
I have spent some time trying to implement different ideas for propagating the shape info; however, most of them are either inconsistent or too slow. Currently we have the problem that we cannot reason about the shapes of the state variables. @jbachurski suggested we could repeatedly propagate through the body and unify the types until a fixed point is reached, for example:

```
initial_a: Int[1]
initial_b: Int[1]
initial_c: Int[1]
for ...:
    a, b, c = concat(a, a, axis=0), a, b
    # propagation iter=0
    # a: Int[1]
    # b: Int[1]
    # c: Int[1]
    # propagation iter=1
    # a: Int[1] U Int[2] = Int[None]
    # b: Int[1] U Int[1] = Int[1]
    # c: Int[1] U Int[1] = Int[1]
    # propagation iter=2
    # a: Int[None] U Int[None] = Int[None]
    # b: Int[1] U Int[None] = Int[None]
    # c: Int[1] U Int[1] = Int[1]
    # propagation iter=3
    # a: Int[None] U Int[None] = Int[None]
    # b: Int[None] U Int[None] = Int[None]
    # c: Int[1] U Int[None] = Int[None]
    # propagation iter=4
    # a: Int[None] U Int[None] = Int[None]
    # b: Int[None] U Int[None] = Int[None]
    # c: Int[None] U Int[None] = Int[None]
    # Fixed point -> we can stop
```

One can see this procedure can be extended to be arbitrarily long, and when nested the …
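Concretely, the fixed-point procedure traced above could look something like the following sketch (hypothetical; it reuses the `unify_shapes` helper from earlier and stands in for real subgraph inference with a plain Python callable):

```python
def propagate_to_fixed_point(initial_shapes, body, max_iters=100):
    """Repeatedly run shape propagation through the body and unify the result
    with the current estimate, until the estimate stops changing."""
    shapes = list(initial_shapes)
    for _ in range(max_iters):
        out_shapes = body(shapes)  # one symbolic pass over the loop body
        new_shapes = [unify_shapes(s, o) for s, o in zip(shapes, out_shapes)]
        if new_shapes == shapes:   # fixed point reached
            return new_shapes
        shapes = new_shapes
    return [None] * len(shapes)    # safety net: give up, fully dynamic

# The a/b/c example above: a <- concat(a, a, axis=0), b <- a, c <- b
def example_body(shapes):
    a, b, c = shapes
    if a is None:
        grown = None
    elif a[0] is None:
        grown = (None,)
    else:
        grown = (2 * a[0],)        # concat doubles the known leading dim
    return [grown, a, b]

print(propagate_to_fixed_point([(1,), (1,), (1,)], example_body))
# -> [(None,), (None,), (None,)] after a few passes, matching the trace above
```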
In my opinion we should strive to hit the middle ground mentioned in #198, where we try to do the inference once. If we are already at the fixed point (i.e. the input and output types of the subgraph are the same), we use those; if, however, some type changes, we give up and do not annotate any of the state variables.
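That middle ground, as a small sketch in the same hypothetical terms:

```python
def infer_once(initial_shapes, body):
    # Run body inference a single time with the initial shapes.
    out_shapes = list(body(list(initial_shapes)))
    if out_shapes == list(initial_shapes):
        return out_shapes                    # already a fixed point: keep the shapes
    return [None] * len(initial_shapes)      # shapes change: don't annotate (unknown)
```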
Thanks for writing this out. I definitely agree that predictable shape inference performance is too important to give up without a really compelling use case. Did you already do the work to construct a bound on the number of iterations with respect to the loop properties (I assume …
The process strictly loses information, so it must terminate. I suspect an argument could be made for the number of iterations to be …

```
a: Int[][][][]
for ...:
    a = sum(a, axis=0)
```
(v17) Custom inference fails when inputs and outputs of the body have mismatched shapes.
The following code should generate a loop that produces a tensor `[1 1 1 1 1]` by appending `1` in each iteration, but the inferred shape is `(1,)`.

Result:
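The original snippet and its output are not preserved in this extraction; a rough, hypothetical reconstruction of the described repro (assuming Spox's v17 opset module and that `op.loop` accepts a Python callable as the body) might look roughly like this:

```python
import numpy as np
import spox.opset.ai.onnx.v17 as op

num_iters = op.const(np.array(5, dtype=np.int64))
cond = op.const(np.array(True))
init = op.const(np.array([1], dtype=np.int64))

def body(iteration, cond_in, x):
    # Append a single 1 to the loop-carried state on every iteration.
    one = op.const(np.array([1], dtype=np.int64))
    return cond_in, op.concat([x, one], axis=0)

(final,) = op.loop(num_iters, cond, [init], body=body)
# The state grows by one element per iteration, so its length is dynamic;
# the report is that the inferred shape nevertheless stays (1,).
```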
(v19+) Custom inference is missing completely
The fix from v17 is completely missing in all newer versions. The error can be reproduced using the following code:
Result:
Note: this test succeeds with v17