@pggPL pggPL commented Oct 13, 2025

Description

This PR fixes 2 issues with debug tests:

  • The distributed debug tests were failing on Ampere because test_log in test_distributed requires FP8 but was not skipped on devices without FP8 support.
  • test_perf used to check whether the CPU time of debug layers (with no features active) matched that of non-debug layers. This test was unstable and sometimes failed. I replaced it with a deterministic test that verifies inspect_tensor_enabled is called only once when the feature is never used. Previously, calling this API every iteration caused heavy CPU overhead; that issue was fixed in [PyTorch debug] Improve precision debug tools performance #1909.
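
For illustration, the skip described in the first bullet could look roughly like the sketch below. It uses only the standard-library unittest module; supports_fp8 and DEVICE_CAPABILITY are hypothetical stand-ins (the actual test suite queries the real device through the library's own FP8 availability check), and the capability threshold is a simplification.

```python
import unittest


def supports_fp8(compute_capability):
    """Hypothetical, simplified check: treat compute capability 8.9+ as FP8-capable.

    Ampere devices (8.0, 8.6) fall below the threshold and are reported as unsupported.
    """
    return tuple(compute_capability) >= (8, 9)


# Assumed value for illustration only; a real test would query the GPU.
DEVICE_CAPABILITY = (8, 0)  # Ampere


class TestLog(unittest.TestCase):
    @unittest.skipUnless(
        supports_fp8(DEVICE_CAPABILITY), "FP8 is not available on this device"
    )
    def test_log(self):
        # Placeholder body: the real test exercises FP8 logging features.
        self.assertTrue(supports_fp8(DEVICE_CAPABILITY))
```

With the assumed Ampere capability, the test is reported as skipped rather than failed.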

I added an inspect_only_once option to TestDummyFeature. When it is set, inspect_tensor_enabled returns True, None, which means "invoke inspect_tensor() this iteration, but the feature will never be invoked again".
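
A minimal, self-contained sketch of the idea behind the deterministic test is shown below. DummyFeature and DebugLayer are hypothetical stand-ins, not the actual TestDummyFeature or debug layer code. The sketch also assumes that the second element of the return value is the next iteration at which the feature may need to be checked; only the "None means never again" case is taken from the description above.

```python
class DummyFeature:
    """Hypothetical stand-in for a debug feature that counts API calls."""

    def __init__(self, inspect_only_once=False):
        self.inspect_only_once = inspect_only_once
        self.enabled_calls = 0
        self.inspect_calls = 0

    def inspect_tensor_enabled(self, iteration):
        self.enabled_calls += 1
        if self.inspect_only_once:
            # Inspect this iteration; None signals "never ask again".
            return True, None
        # Assumption for this sketch: ask again on the next iteration.
        return True, iteration + 1

    def inspect_tensor(self, tensor):
        self.inspect_calls += 1


class DebugLayer:
    """Hypothetical layer that caches the answer of inspect_tensor_enabled."""

    def __init__(self, feature):
        self.feature = feature
        self.iteration = 0
        self.next_check_iter = 0  # None means the feature is never queried again

    def forward(self, x):
        if self.next_check_iter is not None and self.iteration >= self.next_check_iter:
            enabled, next_iter = self.feature.inspect_tensor_enabled(self.iteration)
            if enabled:
                self.feature.inspect_tensor(x)
            self.next_check_iter = next_iter
        self.iteration += 1
        return x


def run(inspect_only_once, steps=10):
    """Run the layer for `steps` iterations and return (enabled_calls, inspect_calls)."""
    feature = DummyFeature(inspect_only_once)
    layer = DebugLayer(feature)
    for step in range(steps):
        layer.forward(step)
    return feature.enabled_calls, feature.inspect_calls
```

Asserting on call counts instead of CPU time makes the check independent of machine load: under these assumptions, run(True) queries the feature once regardless of the number of steps, while run(False) queries it on every iteration.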

Type of change

  • test fix

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

pggPL and others added 4 commits October 13, 2025 09:05
Signed-off-by: Pawel Gadzinski <[email protected]>
Signed-off-by: Pawel Gadzinski <[email protected]>
Signed-off-by: Pawel Gadzinski <[email protected]>
@pggPL pggPL force-pushed the fix_performance_test_for_nvinspect branch from c2fa0ec to ea3d54e Compare October 13, 2025 09:38

pggPL commented Oct 13, 2025

/te-ci pytorch L1

pggPL and others added 6 commits October 17, 2025 11:47
Signed-off-by: Pawel Gadzinski <[email protected]>
Signed-off-by: Pawel Gadzinski <[email protected]>
Signed-off-by: Pawel Gadzinski <[email protected]>
Signed-off-by: Pawel Gadzinski <[email protected]>

pggPL commented Oct 22, 2025

/te-ci pytorch
