-
Notifications
You must be signed in to change notification settings - Fork 74
skip unsupported arches in python tests #5760
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
!test |
Greptile OverviewGreptile SummaryOverviewThis PR updates architecture checks in Python tests to skip unsupported GPU architectures for FP4 quantization and GEMM operations. The changes replace module-level skip conditions with per-test Key Changes
Critical Issues Found1. Incorrect Architecture Definition in
|
| Filename | Score | Overview |
|---|---|---|
| tests/python/direct_utils/utils.py | 3/5 | Added is_blackwell() helper but incorrectly includes compute 12.0 and 12.1 which are not supported by cutlass scheduler |
| tests/python/direct/test_cutlass_nvfp4_gemm.py | 5/5 | Correctly replaced module-level skip with per-test skipif decorators using microarchitecture_is(10, 0) |
| tests/python/direct/test_narrow_precision.py | 2/5 | Inconsistent skip conditions: test_scaled_mm_nv_quantized uses is_blackwell() (includes unsupported 12.0/12.1) while other tests correctly use microarchitecture_is(10, 0) |
| tests/python/direct/test_with_id_model_indexer.py | 5/5 | Correctly consolidated dual skip conditions into single microarchitecture_is(10, 0) check |
Sequence Diagram
sequenceDiagram
participant Test as Test Runner
participant Skip as Skip Condition Check
participant Device as GPU Device
participant Cutlass as Cutlass Scheduler
participant Nvfuser as Nvfuser Operations
Test->>Skip: Evaluate @pytest.mark.skipif
Skip->>Device: Query compute capability
Device-->>Skip: Returns (major, minor)
alt Old Logic (is_pre_blackwell + microarchitecture_is_pre)
Skip->>Skip: Check major >= 10 AND major < 12
Note over Skip: Runs on 10.0, 10.3, 11.x
else New Logic (microarchitecture_is)
Skip->>Skip: Check major == 10 AND minor == 0
Note over Skip: Only runs on 10.0
else New Logic (is_blackwell - problematic)
Skip->>Skip: Check 10.0, 10.3, 12.0, 12.1
Note over Skip: Includes unsupported 12.0/12.1
end
alt Test NOT Skipped
Test->>Nvfuser: Execute fusion (scaled_mm, cutlass ops)
Nvfuser->>Cutlass: Request scheduling
Cutlass->>Device: Verify compute capability
alt Device is 10.0 or 10.3
Cutlass-->>Nvfuser: Schedule operations
Nvfuser-->>Test: Success
else Device is 12.0 or 12.1
Cutlass-->>Nvfuser: Error: Unsupported architecture
Nvfuser-->>Test: FAIL
end
else Test Skipped
Skip-->>Test: Skip test
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5 files reviewed, 1 comment
| @pytest.mark.skipif( | ||
| is_pre_blackwell(), reason="Only supported on blackwell and newer devices." | ||
| ) | ||
| @pytest.mark.skipif(not is_blackwell(), reason="Only supported on blackwell.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Inconsistent architecture filtering: this test uses is_blackwell() (supports 10.0, 10.3, 12.0, 12.1) while other tests in this same file use microarchitecture_is(10, 0) (only supports 10.0). This creates different test coverage across Blackwell variants.
|
Review updated until commit 835c7a8 Description
|
| Relevant files | |||||||
|---|---|---|---|---|---|---|---|
| Tests |
| ||||||
| Enhancement |
|
PR Reviewer Guide
Here are some key observations to aid the review process:
| 🧪 PR contains tests |
| ⚡ Recommended focus areas for review |
Architecture Coverage Inconsistency
microarchitecture_is(10, 0) which only allows compute capability 10.0, but the new is_blackwell() function in utils.py includes 10.3, 12.0, and 12.1 as Blackwell architectures. This suggests the tests might be too restrictive and not testing all intended Blackwell architectures. |
Test failures
-
(Low, 1)
Minor numerical mismatch in thunder instance_norm nvFuser CUDA test on A100Test Name A100 Source thunder.tests.test_ops.test_core_vs_torch_consistency_instance_norm_nvfuser_cuda_thunder.dtypes.float32 ❌
|
The title suggests this has something to do with indexing. Can you clarify what it is? |
test_with_id_model_indexer.py
The title was not accurate, failure is not realated to indexing. Revised to |
|
@greptileai review this draft |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4 files reviewed, 4 comments
| @pytest.mark.skipif( | ||
| is_pre_blackwell(), reason="Only supported on blackwell and newer devices." | ||
| ) | ||
| @pytest.mark.skipif( | ||
| not microarchitecture_is_pre(12), reason="Does not support blackwell compute 12.0" | ||
| not microarchitecture_is(10, 0), | ||
| reason="Does not support blackwell compute 12.0, other arches are not tested.", | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Inconsistent architecture filtering within same file: this test uses microarchitecture_is(10, 0) (only supports 10.0), but other tests in this file (lines 66, 173, 320) use is_blackwell() which supports all Blackwell variants (10.0, 10.3, 12.0, 12.1). Are the _scaled_mm tests specifically incompatible with 12.x architectures while quantization tests support them? Do scaled_mm operations have specific incompatibilities with 12.x compute capabilities that quantization operations don't have?
| @pytest.mark.skipif( | ||
| is_pre_blackwell(), reason="Only supported on blackwell and newer devices." | ||
| ) | ||
| @pytest.mark.skipif( | ||
| not microarchitecture_is_pre(12), reason="Does not support blackwell compute 12.0" | ||
| not microarchitecture_is(10, 0), | ||
| reason="Does not support blackwell compute 12.0, other arches are not tested.", | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Same inconsistency: microarchitecture_is(10, 0) here vs is_blackwell() at line 320 for similar grouped matmul tests
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| @pytest.mark.skipif( | ||
| is_pre_blackwell(), reason="Only supported on blackwell and newer devices." | ||
| ) | ||
| @pytest.mark.skipif( | ||
| not microarchitecture_is_pre(12), reason="Does not support blackwell compute 12.0" | ||
| not microarchitecture_is(10, 0), | ||
| reason="Does not support blackwell compute 12.0, other arches are not tested.", | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Same pattern: microarchitecture_is(10, 0) here vs is_blackwell() elsewhere in file
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| @pytest.mark.skipif( | ||
| is_pre_blackwell(), reason="Only supported on blackwell and newer devices." | ||
| ) | ||
| @pytest.mark.skipif( | ||
| not microarchitecture_is_pre(12), reason="Does not support blackwell compute 12.0" | ||
| not microarchitecture_is(10, 0), | ||
| reason="Does not support blackwell compute 12.0, other arches are not tested.", | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Same pattern: microarchitecture_is(10, 0) here vs is_blackwell() elsewhere in file
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
test_with_id_model_indexer.py|
!test |
|
Summary of failure reasons:
|
|
!test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
6 files reviewed, 6 comments
| def is_blackwell(): | ||
| return ( | ||
| microarchitecture_is(10, 0) | ||
| or microarchitecture_is(10, 3) | ||
| or microarchitecture_is(12, 0) | ||
| or microarchitecture_is(12, 1) | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The is_blackwell() function incorrectly includes compute capability 12.0 and 12.1 as "Blackwell" architectures. However, the cutlass scheduler (csrc/scheduler/cutlass.cpp lines 75-84) explicitly only supports compute capabilities 10.0 and 10.3:
if (device_prop->major != 10 ||
!(device_prop->minor == 0 || device_prop->minor == 3)) {
// error: "Cutlass scheduler only supports GB200 and GB300 (cc 10.0 or 10.3)"
}Compute capabilities 12.0 (RTX PRO 6000 and RTX 50XX) and 12.1 (DGX Spark) are NOT Blackwell architectures and are NOT supported by the cutlass scheduler.
This function should only return true for 10.0 and 10.3:
| def is_blackwell(): | |
| return ( | |
| microarchitecture_is(10, 0) | |
| or microarchitecture_is(10, 3) | |
| or microarchitecture_is(12, 0) | |
| or microarchitecture_is(12, 1) | |
| ) | |
| def is_blackwell(): | |
| return ( | |
| microarchitecture_is(10, 0) | |
| or microarchitecture_is(10, 3) | |
| ) |
Alternatively, if 12.0 and 12.1 are legitimately Blackwell variants, the function should be renamed to clarify it's broader than what cutlass supports, and tests should use more specific checks.
| @pytest.mark.skipif( | ||
| is_pre_blackwell(), reason="Only supported on blackwell and newer devices." | ||
| ) | ||
| @pytest.mark.skipif(not is_blackwell(), reason="Only supported on blackwell.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent architecture check: this test uses is_blackwell() which includes compute capabilities 10.0, 10.3, 12.0, and 12.1. However:
- Other similar tests in this file that use
fd.ops.scaled_mm(liketest_scaled_mmat line 245) usemicroarchitecture_is(10, 0)- only 10.0 - The cutlass scheduler only supports 10.0 and 10.3 (not 12.0 or 12.1)
This test uses fd.ops.scaled_mm and fd.ops.nv_block_quantize, which may rely on cutlass scheduling. If so, it should use the same skip condition as other tests:
| @pytest.mark.skipif(not is_blackwell(), reason="Only supported on blackwell.") | |
| @pytest.mark.skipif(not microarchitecture_is(10, 0), reason="Only supported on blackwell.") |
If this test is genuinely intended to support all Blackwell variants (10.0, 10.3, 12.0, 12.1) because scaled_mm uses a different scheduler (nvmmh), then the is_blackwell() function needs to be fixed to exclude 12.0 and 12.1, OR this needs documentation explaining why it's different.
| @pytest.mark.skipif( | ||
| is_pre_blackwell(), reason="Only supported on blackwell and newer devices." | ||
| ) | ||
| @pytest.mark.skipif( | ||
| not microarchitecture_is_pre(12), reason="Does not support blackwell compute 12.0" | ||
| not microarchitecture_is(10, 0), | ||
| reason="Does not support blackwell compute 12.0, other arches are not tested.", | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The skip condition logic has changed significantly. The old conditions were:
is_pre_blackwell()→ skip if major < 10not microarchitecture_is_pre(12)→ skip if major >= 12
This ran tests on architectures where: 10 <= major < 12 (i.e., 10.0, 10.3, and any 11.x devices)
The new condition not microarchitecture_is(10, 0) runs tests ONLY on exactly 10.0.
If the intent is to support 10.3 (B300/GB300) in addition to 10.0, the condition should be:
| @pytest.mark.skipif( | |
| is_pre_blackwell(), reason="Only supported on blackwell and newer devices." | |
| ) | |
| @pytest.mark.skipif( | |
| not microarchitecture_is_pre(12), reason="Does not support blackwell compute 12.0" | |
| not microarchitecture_is(10, 0), | |
| reason="Does not support blackwell compute 12.0, other arches are not tested.", | |
| ) | |
| @pytest.mark.skipif( | |
| not (microarchitecture_is(10, 0) or microarchitecture_is(10, 3)), | |
| reason="Does not support blackwell compute 12.0, other arches are not tested.", | |
| ) |
However, if 10.3 is deliberately excluded because it hasn't been tested, the current change is correct but note that this is more restrictive than the original logic.
| @pytest.mark.skipif( | ||
| is_pre_blackwell(), reason="Only supported on blackwell and newer devices." | ||
| ) | ||
| @pytest.mark.skipif( | ||
| not microarchitecture_is_pre(12), reason="Does not support blackwell compute 12.0" | ||
| not microarchitecture_is(10, 0), | ||
| reason="Does not support blackwell compute 12.0, other arches are not tested.", | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue as test_scaled_mm: the skip condition logic has changed from supporting 10.x and 11.x architectures to only supporting exactly 10.0.
Old: is_pre_blackwell() AND not microarchitecture_is_pre(12) → run on 10 <= major < 12
New: not microarchitecture_is(10, 0) → run only on major==10 and minor==0
This test uses fd.ops.cutlass_nvfp4_grouped_mm which is explicitly a cutlass operation. The cutlass scheduler supports both 10.0 and 10.3 according to csrc/scheduler/cutlass.cpp. Consider updating to support both tested architectures if 10.3 should be included.
| @pytest.mark.skipif( | ||
| is_pre_blackwell(), reason="Only supported on blackwell and newer devices." | ||
| ) | ||
| @pytest.mark.skipif( | ||
| not microarchitecture_is_pre(12), reason="Does not support blackwell compute 12.0" | ||
| not microarchitecture_is(10, 0), | ||
| reason="Does not support blackwell compute 12.0, other arches are not tested.", | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue: the skip condition has become more restrictive, changing from supporting 10.x and 11.x to only 10.0. This test also uses fd.ops.cutlass_nvfp4_grouped_mm which should support both 10.0 and 10.3 based on the cutlass scheduler implementation.
| @pytest.mark.skipif( | ||
| is_pre_blackwell(), reason="Only supported on blackwell and newer devices." | ||
| ) | ||
| @pytest.mark.skipif( | ||
| not microarchitecture_is_pre(12), reason="Does not support blackwell compute 12.0" | ||
| not microarchitecture_is(10, 0), | ||
| reason="Does not support blackwell compute 12.0, other arches are not tested.", | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The skip condition has changed from supporting 10.x and 11.x architectures (via is_pre_blackwell() AND not microarchitecture_is_pre(12)) to only supporting exactly 10.0. This is consistent with the changes in test_narrow_precision.py but represents a more restrictive test coverage than before. Verify this is intentional.
skip unsupported arches in grouped and scaled matmuls