-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Concretization error from communication segment followed by compute segment. #2481
Comments
Can you give me a reproducer? I guess it involves some local changes to your test, so I wasn't sure how. |
Yes, here is a minimal test to reproduce: 00d93fa |
I haven't run the repro yet, but it looks like the issue is that we cannot determine the extent |
Sorry, should have clarified. Running just the matmul+allreduce works (in fact we have this test https://github.com/NVIDIA/Fuser/blob/main/tests/cpp/test_multidevice_matmul.cpp#L274) this failing test just tags an add at the end. The binding error only occurs on the bias add segment. IIUC in that segment, concretizing i0 isn't even necessary since it's reduced iter domain? Also, I bolded the wrong iter domain in the example which is now fixed! |
👍
Great! I hadn't noticed that.
Ah! I think I understand now. Thank you for the explanation! One way to avoid errors like this is for us to strip off that annoying reduction domain for the segment input T5 when we do |
Yes, that is the exact same error! Thank you!! |
Currently, when we segment at the output of a reduction, the consumer segment will have an input tensor that has a `Reduction` axis in it. This can be problematic; see #2481 and #2317. This PR strips reduction axes from the root and allocation domain in these cases. --------- Co-authored-by: Jingyue Wu <[email protected]>
Verified the PR worked for this test! Thank you @jacobhinkle! |
I am seeing a concretization error where an IterDomain extent is not bound to a constant. This error only occurs when we have a reduction based collective (like allreduce/reduce scatter) followed by computation in another segment.
An example looks like this:
---- local matmul ----
T4_g[ ideviceIdx.x10{i0}, iS11{i3}, iS12{i6}, rS13{i2} ] (DeviceMesh{0 1})
= matmul(T3_l[ ideviceIdx.x7{i0}, iS9{i3}, iS8{i2} ] (DeviceMesh{0 1}),
T1_g[ ideviceIdx.x3{i4}, iS4{i5}, iS5{i6} ] (DeviceMesh{0 1}))
---- allreduce --- (dispatched to ProcessGroup)
T5_g[ rS14{i0}, iS15{i3}, iS16{i6} ] (DeviceMesh{0 1} = reduce([ideviceIdx.x3{i4}, iS4{i5}, iS5{i6} DeviceMesh{0 1}))
---- bias add --- Note: rS14{i0} is the Val that triggers the error.
%kernel {
T7_l[ iS19{i3}, iS20{i6} ] (DeviceMesh{0 1})
= __half2float(T5_g[ rS14{i0}, iS15{i3}, iS16{i6} ] (DeviceMesh{0 1}));
T6_l[ bS17{1}, iS18{i7} ] (DeviceMesh{0 1})
= broadcast( T2_l[ iS6{i7} ] (DeviceMesh{0 1}) )
T8_l[ bS21{1}, iS22{i7} ] (DeviceMesh{0 1})
= __half2float(T6_l[ bS17{1}, iS18{i7} ] (DeviceMesh{0 1}));
T9_g[ iS23{i3}, iS24{i6} ] (DeviceMesh{0 1})
= T7_l[ iS19{i3}, iS20{i6} ] (DeviceMesh{0 1})
Error:
C++ exception with description "ext_opt.hasValue() INTERNAL ASSERT FAILED at "/csrc/dynamic_transform.cpp":279, please report a bug with repro script to NVFuser at https://github.com/NVIDIA/Fuser/issues. Could not evaluate dynamic extent: i0
Exception raised from DynamicTransformConcretizationInfo at csrc/dynamic_transform.cpp:279 (most recent call first):
The allreduce is handled outside of FusionExecutorCache, so I assume this is an issue of piping information about i0's extent to the bias add segment since it cannot be inferred from the segment inputs? If something similar was already implemented for compute segments that take an input with a reduced IterDomain a pointer to it would be greatly appreciated :)
The text was updated successfully, but these errors were encountered: