[DLPack] C Functions for DLPack Speed Exchange and Stream Handling #96
Conversation
@tqchen comments are addressed. I think this PR is in good shape now; let's separate the unit tests into another PR.

/gemini Review
Code Review
This pull request refactors the DLPack integration to use the new DLPackExchangeAPI struct, replacing the previous mechanism of separate function pointers. This is a significant improvement that aligns with the latest DLPack standard, enhances maintainability, and centralizes the DLPack C-API functions. The changes are extensive, touching C++ JIT code, Cython FFI bindings, and tests. The implementation is well-executed and consistent across the codebase. I've found one minor issue regarding device type handling that could affect non-CUDA GPU backends.
tqchen left a comment
final comment, we are close, please fix the lint
/gemini summary
…111) As a follow-up to PR #96, this PR adds comprehensive unit tests for `torch.Tensor.__c_dlpack_exchange_api__` using inline C++. It validates PyTorch's implementation of the `DLPackExchangeAPI` struct-based fast exchange protocol. Unlike the ctypes-based tests, these tests use `torch.utils.cpp_extension.load_inline` to avoid GIL release issues when calling `THPVariable_Wrap`.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
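For concreteness, here is a minimal sketch of the `load_inline` pattern those tests rely on. This is not the actual test code from PR #111: the C++ body merely checks that the attribute exists and holds a non-zero address, and treating the attribute as a plain integer is an assumption of this sketch.

```python
# Minimal sketch of the load_inline pattern (not the actual PR #111 tests).
# Inline extensions hold the GIL while the C++ function runs, which is how
# they avoid the GIL-release issues a ctypes call hitting THPVariable_Wrap
# would run into.
import torch
from torch.utils.cpp_extension import load_inline

cpp_source = r"""
#include <torch/extension.h>
#include <cstdint>

// Hypothetical check: fetch the exchange-API attribute from the Python
// side of the tensor and verify it is a non-zero address. The attribute
// name matches the PR; reading it as an integer is an assumption.
bool has_exchange_api(torch::Tensor t) {
  py::object obj = py::cast(t);
  py::object api = obj.attr("__c_dlpack_exchange_api__");
  return py::cast<std::uintptr_t>(api) != 0;
}
"""

ext = load_inline(
    name="dlpack_exchange_check",
    cpp_sources=cpp_source,
    functions=["has_exchange_api"],
)

print(ext.has_exchange_api(torch.randn(2, 3)))
```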
Summary of Changes

This PR introduces a unified `DLPackExchangeAPI` struct as described in proposal 175. This new convention replaces the previous mechanism of separate function pointers and aligns with the latest DLPack standard, as shown in PR 174.

The new `DLPackExchangeAPI` struct also includes a `current_work_stream` function pointer that allows more robust and integrated querying of the current device stream (e.g., the CUDA stream) during DLPack tensor exchanges. All conversions from/to DLPack have been updated to `_no_sync` variants, meaning callers should use `current_work_stream` to handle stream synchronization explicitly. The struct also includes a non-owning DLTensor conversion to avoid unnecessary reference counting.
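To make the shape of the change concrete, here is a rough ctypes mirror of what such a struct could look like. Every field name and the ordering below are guesses reconstructed from the description above; the authoritative layout is the one defined in proposal 175 / DLPack PR 174.

```python
# Illustrative ctypes mirror of a DLPackExchangeAPI-style struct. The
# field names and ordering are assumptions based on the prose above;
# the real layout lives in the DLPack header from proposal 175.
import ctypes

class HypotheticalDLPackExchangeAPI(ctypes.Structure):
    _fields_ = [
        # Producer -> DLPack conversion without implicit stream sync;
        # callers synchronize explicitly via current_work_stream.
        ("managed_tensor_from_py_no_sync", ctypes.c_void_p),
        # DLPack -> producer conversion, also without implicit sync.
        ("py_from_managed_tensor_no_sync", ctypes.c_void_p),
        # Non-owning DLTensor view, avoiding refcount traffic.
        ("dltensor_from_py_no_sync", ctypes.c_void_p),
        # Query the current work stream for a given device.
        ("current_work_stream", ctypes.c_void_p),
    ]
```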
Following this change, the Python FFI for PyTorch has been updated to expose the new `DLPackExchangeAPI` struct via `__c_dlpack_exchange_api__` on `torch.Tensor`.
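Under the assumption that this attribute exposes the struct's address as a plain Python integer (an assumption here, though consistent with the ctypes-based tests mentioned above), a consumer could reinterpret it like this:

```python
# Hypothetical consumer: read the attribute and peek at the first
# pointer-sized slot of the struct. Assumes the attribute returns the
# struct address as an int; with a full ctypes mirror (see the sketch
# above) one would cast to POINTER(HypotheticalDLPackExchangeAPI) instead.
import ctypes
import torch

t = torch.empty(4)
addr = t.__c_dlpack_exchange_api__  # attribute added by this PR
first_fn = ctypes.cast(addr, ctypes.POINTER(ctypes.c_void_p))[0]
print(hex(first_fn or 0))  # non-zero if a function pointer lives there
```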
`3rdparty/dlpack` has been updated to incorporate the latest commit.

Benchmark Results
The benchmark results show that the overhead of the API (passing in arguments, performing the conversion at the C++ extension level, and returning the result) matches the expected speedup described in proposal 175.
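As a point of reference, the per-call overhead of the standard capsule path can be measured in a few lines. This is a generic micro-benchmark sketch, not the benchmark behind the numbers above:

```python
# Generic micro-benchmark sketch (not the PR's benchmark): time the
# standard __dlpack__ capsule path as a per-call baseline; the
# struct-based fast path would be timed analogously from C++.
import timeit
import torch

t = torch.empty(8)
n = 100_000
secs = timeit.timeit(lambda: t.__dlpack__(), number=n)
print(f"__dlpack__: {secs / n * 1e9:.1f} ns/call")
```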
Followup in a later PR

This PR does not contain the unit tests for the C++ conversion. A follow-up PR will add the tests.