-
Notifications
You must be signed in to change notification settings - Fork 74
skip unsupported arches in python tests #5760
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
4b7d91b
235f161
40f0ef7
f9fdc32
55b5572
bd58ae9
e3e0b6a
835c7a8
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -16,6 +16,8 @@ | |||||||||||||||||||||||||
| pytorch_nvfp4_quantize, | ||||||||||||||||||||||||||
| is_pre_blackwell, | ||||||||||||||||||||||||||
| microarchitecture_is_pre, | ||||||||||||||||||||||||||
| is_blackwell, | ||||||||||||||||||||||||||
| microarchitecture_is, | ||||||||||||||||||||||||||
| linear_to_swizzled_128_4, | ||||||||||||||||||||||||||
| round_up, | ||||||||||||||||||||||||||
| activation_scale_to_nvfp4, | ||||||||||||||||||||||||||
|
|
@@ -241,10 +243,8 @@ def nvfuser_fusion_id0(fd: FusionDefinition): | |||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| # cannot use opinfo test, because the input tensor dtype and fusion definition dtype doesn't match | ||||||||||||||||||||||||||
| @pytest.mark.skipif( | ||||||||||||||||||||||||||
| is_pre_blackwell(), reason="Only supported on blackwell and newer devices." | ||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||
| @pytest.mark.skipif( | ||||||||||||||||||||||||||
| not microarchitecture_is_pre(12), reason="Does not support blackwell compute 12.0" | ||||||||||||||||||||||||||
| not microarchitecture_is(10, 0), | ||||||||||||||||||||||||||
| reason="Does not support blackwell compute 12.0, other arches are not tested.", | ||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||
|
Comment on lines
245
to
248
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The skip condition logic has changed significantly. The old conditions were:
This ran tests on architectures where: 10 <= major < 12 (i.e., 10.0, 10.3, and any 11.x devices) The new condition If the intent is to support 10.3 (B300/GB300) in addition to 10.0, the condition should be:
Suggested change
However, if 10.3 is deliberately excluded because it hasn't been tested, the current change is correct but note that this is more restrictive than the original logic. |
||||||||||||||||||||||||||
| @pytest.mark.parametrize("config", [[128, 256, 512], [128, 256, 512]]) | ||||||||||||||||||||||||||
| @pytest.mark.parametrize("out_dtype", [torch.bfloat16]) | ||||||||||||||||||||||||||
|
|
@@ -323,9 +323,7 @@ def nvfuser_fusion_id0(fd: FusionDefinition) -> None: | |||||||||||||||||||||||||
| torch.testing.assert_close(outputs[0], ref_outputs, rtol=1e-1, atol=1e-2) | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| @pytest.mark.skipif( | ||||||||||||||||||||||||||
| is_pre_blackwell(), reason="Only supported on blackwell and newer devices." | ||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||
| @pytest.mark.skipif(not is_blackwell(), reason="Only supported on blackwell.") | ||||||||||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Inconsistent architecture check: this test uses
This test uses
Suggested change
If this test is genuinely intended to support all Blackwell variants (10.0, 10.3, 12.0, 12.1) because |
||||||||||||||||||||||||||
| @pytest.mark.parametrize("config", [[1024, 1024, 1024]]) | ||||||||||||||||||||||||||
| @pytest.mark.parametrize("out_dtype", [torch.bfloat16]) | ||||||||||||||||||||||||||
| def test_scaled_mm_nv_quantized( | ||||||||||||||||||||||||||
|
|
@@ -454,10 +452,8 @@ def fusion_baseline(fd: FusionDefinition) -> None: | |||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| @pytest.mark.skipif( | ||||||||||||||||||||||||||
| is_pre_blackwell(), reason="Only supported on blackwell and newer devices." | ||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||
| @pytest.mark.skipif( | ||||||||||||||||||||||||||
| not microarchitecture_is_pre(12), reason="Does not support blackwell compute 12.0" | ||||||||||||||||||||||||||
| not microarchitecture_is(10, 0), | ||||||||||||||||||||||||||
| reason="Does not support blackwell compute 12.0, other arches are not tested.", | ||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||
|
Comment on lines
454
to
457
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. style: Same inconsistency: Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Comment on lines
454
to
457
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same issue as test_scaled_mm: the skip condition logic has changed from supporting 10.x and 11.x architectures to only supporting exactly 10.0. Old: This test uses |
||||||||||||||||||||||||||
| @pytest.mark.parametrize("config", [[1024, 128, 256]]) | ||||||||||||||||||||||||||
| @pytest.mark.parametrize("tokens_per_expert_neg_one", [[115, 144, 8]]) | ||||||||||||||||||||||||||
|
|
@@ -661,10 +657,8 @@ def nvfuser_fusion_id0(fd: FusionDefinition) -> None: | |||||||||||||||||||||||||
| # 1. inputs data needs to be changed from `torch.testing.make_tensor` to `torch.randn`; | ||||||||||||||||||||||||||
| # 2. output errors are much more relaxed. | ||||||||||||||||||||||||||
| @pytest.mark.skipif( | ||||||||||||||||||||||||||
| is_pre_blackwell(), reason="Only supported on blackwell and newer devices." | ||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||
| @pytest.mark.skipif( | ||||||||||||||||||||||||||
| not microarchitecture_is_pre(12), reason="Does not support blackwell compute 12.0" | ||||||||||||||||||||||||||
| not microarchitecture_is(10, 0), | ||||||||||||||||||||||||||
| reason="Does not support blackwell compute 12.0, other arches are not tested.", | ||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||
|
Comment on lines
659
to
662
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. style: Same pattern: Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Comment on lines
659
to
662
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same issue: the skip condition has become more restrictive, changing from supporting 10.x and 11.x to only 10.0. This test also uses |
||||||||||||||||||||||||||
| @pytest.mark.parametrize("config", [[1024, 128, 256]]) | ||||||||||||||||||||||||||
| @pytest.mark.parametrize("tokens_per_expert_neg_one", [[115, 144, 8]]) | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -15,8 +15,7 @@ | |
| FLOAT8_E4M3_EPS, | ||
| FLOAT8_E4M3_MAX, | ||
| pytorch_nvfp4_quantize, | ||
| is_pre_blackwell, | ||
| microarchitecture_is_pre, | ||
| microarchitecture_is, | ||
| linear_to_swizzled_128_4, | ||
| round_up, | ||
| activation_scale_to_nvfp4, | ||
|
|
@@ -31,10 +30,8 @@ | |
| # assertion. Having this as a separate test file would avoid environment | ||
| # variable contamination from others. | ||
| @pytest.mark.skipif( | ||
| is_pre_blackwell(), reason="Only supported on blackwell and newer devices." | ||
| ) | ||
| @pytest.mark.skipif( | ||
| not microarchitecture_is_pre(12), reason="Does not support blackwell compute 12.0" | ||
| not microarchitecture_is(10, 0), | ||
| reason="Does not support blackwell compute 12.0, other arches are not tested.", | ||
| ) | ||
|
Comment on lines
32
to
35
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The skip condition has changed from supporting 10.x and 11.x architectures (via |
||
| @pytest.mark.parametrize("config", [[1024, 128, 256]]) | ||
| @pytest.mark.parametrize("tokens_per_expert_neg_one", [[115, 144, 8]]) | ||
|
|
||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -35,6 +35,19 @@ def is_pre_blackwell(): | |||||||||||||||||||||||||
| return microarchitecture_is_pre(10) | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| # 10.0 (B200 and GB200) | ||||||||||||||||||||||||||
| # 10.3 (B300 and GB300) | ||||||||||||||||||||||||||
| # 12.0 (RTX PRO 6000 and RTX 50XX) | ||||||||||||||||||||||||||
| # 12.1 (DGX Spark) | ||||||||||||||||||||||||||
| def is_blackwell(): | ||||||||||||||||||||||||||
| return ( | ||||||||||||||||||||||||||
| microarchitecture_is(10, 0) | ||||||||||||||||||||||||||
| or microarchitecture_is(10, 3) | ||||||||||||||||||||||||||
| or microarchitecture_is(12, 0) | ||||||||||||||||||||||||||
| or microarchitecture_is(12, 1) | ||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||
|
Comment on lines
+42
to
+48
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The if (device_prop->major != 10 ||
!(device_prop->minor == 0 || device_prop->minor == 3)) {
// error: "Cutlass scheduler only supports GB200 and GB300 (cc 10.0 or 10.3)"
}Compute capabilities 12.0 (RTX PRO 6000 and RTX 50XX) and 12.1 (DGX Spark) are NOT Blackwell architectures and are NOT supported by the cutlass scheduler. This function should only return true for 10.0 and 10.3:
Suggested change
Alternatively, if 12.0 and 12.1 are legitimately Blackwell variants, the function should be renamed to clarify it's broader than what cutlass supports, and tests should use more specific checks. |
||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| # Get string representation for FusionDefinition | ||||||||||||||||||||||||||
| # Run captured python definition | ||||||||||||||||||||||||||
| # Check that the result of captured python definition matches original results | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Inconsistent architecture filtering within same file: this test uses
microarchitecture_is(10, 0)(only supports 10.0), but other tests in this file (lines 66, 173, 320) useis_blackwell()which supports all Blackwell variants (10.0, 10.3, 12.0, 12.1). Are the_scaled_mmtests specifically incompatible with 12.x architectures while quantization tests support them? Do scaled_mm operations have specific incompatibilities with 12.x compute capabilities that quantization operations don't have?