Use toPolymorphicValue to convert py::handle directly to PolymorphicValue (#5771)
Conversation
Review updated until commit a49a6c1

Relevant files: Enhancement
PR Reviewer Guide
Here are some key observations to aid the review process:
- 🧪 PR contains tests
- ⚡ Recommended focus areas for review

Type Coverage

The `toPolymorphicValue` function handles only five specific types (`Tensor`, `bool`, `int64_t`, `double`, `complex`), whereas the previous `torch::jit::toIValue` call with `c10::AnyType::get()` would accept any type. Verify that every type that was previously supported and actually used in practice is covered by the new implementation.
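The ordering concern here can be illustrated with a hedged Python sketch of the same check sequence (function name and return format are hypothetical, mirroring the C++ dispatch described in this PR; the `torch.Tensor` branch is omitted to keep the sketch self-contained):

```python
def to_polymorphic_value(obj):
    """Sketch of the dispatch order. bool must be checked before int:
    in Python, isinstance(True, int) is True, so reversing the order
    would misclassify booleans as integers."""
    if isinstance(obj, bool):      # must precede the int check
        return ("bool", obj)
    if isinstance(obj, int):
        return ("int64_t", obj)
    if isinstance(obj, float):
        return ("double", obj)
    if isinstance(obj, complex):
        return ("complex<double>", obj)
    # Analogue of the NVF_THROW fallback for unsupported types.
    raise TypeError(f"Unsupported type: {type(obj).__name__}")

print(to_polymorphic_value(True))   # → ('bool', True)
print(to_polymorphic_value(42))     # → ('int64_t', 42)
```

Anything outside these five categories raises, matching the reviewer's point that the new function is stricter than the old catch-all conversion.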
Test failures

- (High, 49) NCCL NVLS multicast binding failures across multidevice/nvfuser test suites on dlcluster_viking_ci. Failing on H100 (dist.):
  - tests.python.multidevice.test_communication.test_allgather
  - tests.python.multidevice.test_communication.test_allgather_expanded_broadcast
  - tests.python.multidevice.test_communication.test_allreduce
  - tests.python.multidevice.test_communication.test_reduce_scatter
  - tests.python.multidevice.test_communication.test_reduce_scatter_noncontiguous
  - tests.python.multidevice.test_dtensor.test_column_parallel_linear
  - tests.python.multidevice.test_dtensor.test_plus_one
  - tests.python.multidevice.test_dtensor.test_row_parallel_linear
  - tests.python.multidevice.test_expert_parallel.test_dispatch_and_combine
  - tests.python.multidevice.test_matmul.test_column_parallel_grouped_mm
  - ... with 39 more test failures omitted. Check internal logs.
- (Medium, 1) NCCL invalid usage in multidevice overlap all-gather tests (tests/python/multidevice/test_overlap.py). Failing on H100 (dist.):
  - tests.python.multidevice.test_overlap.test_overlap_allgather_matmul_shard_outermost[backend_type=CommunicatorBackend.cuda]
Sequence Diagram

```mermaid
sequenceDiagram
    participant Python as Python Code
    participant Converter as toPolymorphicValue
    participant Holder as KernelArgumentHolder
    Python->>Converter: py::handle (tensor/scalar)
    alt torch.Tensor
        Converter->>Converter: py::cast<at::Tensor>
    else bool
        Converter->>Converter: py::cast<bool>
    else int
        Converter->>Converter: py::cast<int64_t>
    else float
        Converter->>Converter: py::cast<double>
    else complex
        Converter->>Converter: py::cast<std::complex<double>>
    end
    Converter->>Holder: PolymorphicValue
    Note over Converter,Holder: Direct conversion without IValue intermediate
```
jjsjann123 left a comment
Sweet improvements for such a trivial change.
Looking forward to seeing how nanobind can help with latency!
Force-pushed from 1605983 to 761abfa (Compare)
!test
Greptile Overview
Greptile Summary
This PR optimizes the Python to C++ type conversion process in the direct Python API by replacing torch::jit::toIValue with a custom toPolymorphicValue function. The new implementation eliminates the intermediate IValue conversion step and directly converts Python objects to PolymorphicValue, resulting in a 7% performance improvement across all Python tests.
Key Changes:
- Added a `toPolymorphicValue` helper function in an anonymous namespace that handles direct conversion from `py::handle` to `PolymorphicValue`
- Supports all PyTorch-compatible scalar types: `torch.Tensor`, `bool`, `int64_t`, `double`, and `std::complex<double>`
- Type checking correctly orders `bool` before `int` to handle Python's type hierarchy, where `bool` is a subclass of `int`
- Uses static initialization for efficient repeated `torch.Tensor` type checking
- Maintains the same error handling semantics, throwing `NVF_THROW` for unsupported types
- The change is isolated to the direct Python API (`python_direct`) and does not affect public interfaces

Compatibility:
- The public API signature of `from_pyiterable` remains unchanged
- All types supported by the old implementation are supported by the new implementation
- Error handling is consistent with existing codebase patterns
- This change is a prerequisite for the future nanobind migration noted in the PR description
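The bool-before-int ordering can be verified directly in Python, where a reversed dispatch would silently misclassify booleans (the classifier below is a hypothetical broken variant, not code from this PR):

```python
# bool is a subclass of int, so an int check alone matches True and False.
assert issubclass(bool, int)
assert isinstance(True, int)

def classify_int_first(obj):
    # Hypothetical broken dispatch with the int check first.
    if isinstance(obj, int):
        return "int64_t"
    if isinstance(obj, bool):
        return "bool"  # unreachable: every bool already matched the int check
    return "other"

print(classify_int_first(True))  # → 'int64_t' (misclassified)
```

This is why the review calls out the check order as a correctness point rather than a style choice.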
Confidence Score: 5/5
- This PR is safe to merge with minimal risk - the implementation correctly handles all supported types with proper error handling and maintains API compatibility.
- Score of 5 reflects: (1) Correct type checking order (bool before int) handling Python's type hierarchy, (2) All supported types properly handled (Tensor, bool, int64_t, double, complex), (3) Proper error handling with NVF_THROW for unsupported types, (4) Efficient static initialization for torch.Tensor type checking, (5) Unchanged public API signature maintaining backward compatibility, (6) Performance improvement of 7% without behavioral changes, (7) Isolated change to python_direct module with no side effects on other code, (8) Includes are properly handled through existing header dependencies.
- No files require special attention
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| python/python_direct/direct_utils.cpp | 5/5 | Replaced torch::jit::toIValue with a custom toPolymorphicValue function that directly converts py::handle to PolymorphicValue, improving performance by avoiding intermediate IValue conversion. Correctly handles all supported types (torch.Tensor, bool, int64_t, double, complex) with proper type checking order (bool before int) and appropriate error handling. |
Sequence Diagram

```mermaid
sequenceDiagram
    participant Python
    participant PyIterable as from_pyiterable
    participant OldPath as Old Path
    participant NewPath as New Path
    participant KAH as KernelArgumentHolder
    Python->>PyIterable: py::iterable(scalars, tensors)
    rect rgb(200, 220, 255)
        Note over PyIterable,OldPath: Old Conversion: torch::jit::toIValue
        PyIterable->>OldPath: py::handle to IValue
        OldPath->>OldPath: IValueToPolymorphicValue
        OldPath->>KAH: push PolymorphicValue
    end
    rect rgb(220, 255, 220)
        Note over PyIterable,NewPath: New Conversion: toPolymorphicValue
        PyIterable->>NewPath: py::handle to PolymorphicValue
        NewPath->>NewPath: Direct type checks
        NewPath->>KAH: push PolymorphicValue
    end
```
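The new path in the diagram can be sketched in Python (all names here are stand-ins; the real implementation operates on `py::handle` and `KernelArgumentHolder` in C++, and the `torch.Tensor` branch is omitted to keep the sketch self-contained):

```python
def to_polymorphic_value(obj):
    # Direct type checks in the diagram's order (bool before int).
    for py_type, name in ((bool, "bool"), (int, "int64_t"),
                          (float, "double"), (complex, "complex<double>")):
        if isinstance(obj, py_type):
            return (name, obj)
    raise TypeError(f"Unsupported type: {type(obj).__name__}")

def from_pyiterable(iterable):
    # Stand-in for KernelArgumentHolder: each element is converted and
    # pushed directly, with no IValue intermediate.
    return [to_polymorphic_value(obj) for obj in iterable]

print(from_pyiterable([True, 7, 2.5]))
# → [('bool', True), ('int64_t', 7), ('double', 2.5)]
```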
Force-pushed from 761abfa to a49a6c1 (Compare)
!test
No files reviewed, no comments
`torch::jit::toIValue` converts from `py::handle` to `IValue`; `KernelArgumentHolder` then converts the `IValue` to `PolymorphicValue`. The PyTorch function handles several types beyond what NvFuser supports, resulting in unnecessary latency. `toPolymorphicValue` converts `py::handle` directly to `PolymorphicValue` to save latency, and is also a step toward the nanobind migration.

Tests: `tests/python/direct/test_python_frontend.py`

Benchmark (`toPolymorphicValue` vs. `torch::jit::toIValue`): all Python tests show a ~7% improvement.