
Conversation

@Kathryn-cat
Contributor

@Kathryn-cat Kathryn-cat commented Oct 8, 2025

Summary of Changes

This PR introduces a unified DLPackExchangeAPI struct as described in proposal 175. The new convention replaces the previous mechanism of separate function pointers and aligns with the latest DLPack standard, as shown in PR 174.

The new DLPackExchangeAPI struct also includes a current_work_stream function pointer that allows more robust, integrated querying of the current device stream (e.g., the CUDA stream) during DLPack tensor exchanges. All conversions from/to DLPack have been updated to _no_sync variants, meaning callers should use current_work_stream to handle stream synchronization explicitly. The struct also provides a non-owning DLTensor conversion to avoid unnecessary reference counting.
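The layout described above can be sketched in ctypes as follows. This is a minimal illustration, not the upstream definition: the field names, ordering, and signatures here are assumptions inferred from this PR's description (the real signatures involve PyObject* and DLManagedTensorVersioned*, which are left opaque).

```python
import ctypes

# Opaque stand-ins for the real function-pointer signatures (assumed).
FromPyObjectFn = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_void_p, ctypes.c_void_p)
ToPyObjectFn = ctypes.CFUNCTYPE(ctypes.c_void_p, ctypes.c_void_p)
CurrentWorkStreamFn = ctypes.CFUNCTYPE(ctypes.c_void_p, ctypes.c_int, ctypes.c_int)

class DLPackExchangeAPI(ctypes.Structure):
    """Hypothetical mirror of the C struct: one table of function pointers
    instead of passing each pointer separately."""
    _fields_ = [
        ("from_pyobject", FromPyObjectFn),             # PyObject -> DLPack, no sync
        ("to_pyobject", ToPyObjectFn),                 # DLPack -> PyObject, no sync
        ("current_work_stream", CurrentWorkStreamFn),  # query current device stream
    ]
```

Consolidating the pointers into one struct means a consumer needs only a single address to reach every exchange function, rather than one attribute per function.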

Following this change, the Python FFI for PyTorch has been updated to expose the new DLPackExchangeAPI struct via __c_dlpack_exchange_api__ on torch.Tensor.
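As an illustration of how a consumer might use this attribute, the sketch below assumes `__c_dlpack_exchange_api__` exposes the struct's address as an integer; a FakeTensor stands in for torch.Tensor so the example is self-contained. Both assumptions are hypothetical, not the real torch binding.

```python
import ctypes

class FakeTensor:
    """Stand-in for torch.Tensor carrying a hypothetical API address."""
    def __init__(self, api_addr: int):
        self.__c_dlpack_exchange_api__ = api_addr

# Pretend this buffer is the static C struct owned by the framework.
buf = ctypes.create_string_buffer(24)
tensor = FakeTensor(ctypes.addressof(buf))

# A consumer reads the address once and treats it as an opaque pointer.
api_ptr = ctypes.c_void_p(tensor.__c_dlpack_exchange_api__)
assert api_ptr.value == ctypes.addressof(buf)
```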

The 3rdparty/dlpack submodule has been updated to the latest commit.

Benchmark Results

The benchmark results show that the overhead of the API (passing in arguments, performing the conversion at the C++ extension level, and returning) matches the expected speedup described in proposal 175.

[Benchmark results screenshot, 2025-10-11]

Followup in later PR

This PR does not include unit tests for the C++ conversion; a follow-up PR will add them.

@Kathryn-cat Kathryn-cat force-pushed the kathy/dlpack-c branch 2 times, most recently from f6895ff to 4060e27 Compare October 10, 2025 04:26
@Kathryn-cat Kathryn-cat marked this pull request as ready for review October 11, 2025 00:31
@Kathryn-cat Kathryn-cat changed the title wip: C Functions for DLPack Speed Exchange and Stream Handling C Functions for DLPack Speed Exchange and Stream Handling Oct 11, 2025
@Kathryn-cat Kathryn-cat changed the title C Functions for DLPack Speed Exchange and Stream Handling [DLPack] C Functions for DLPack Speed Exchange and Stream Handling Oct 11, 2025
@Kathryn-cat Kathryn-cat requested a review from tqchen October 11, 2025 02:47
@Kathryn-cat
Contributor Author

@tqchen comments are addressed. I think this PR is in good shape now; let's separate the unit tests into another PR.

@Kathryn-cat
Contributor Author

/gemini Review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the DLPack integration to use the new DLPackExchangeAPI struct, replacing the previous mechanism of separate function pointers. This is a significant improvement that aligns with the latest DLPack standard, enhances maintainability, and centralizes the DLPack C-API functions. The changes are extensive, touching C++ JIT code, Cython FFI bindings, and tests. The implementation is well-executed and consistent across the codebase. I've found one minor issue regarding device type handling that could affect non-CUDA GPU backends.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Member

@tqchen tqchen left a comment


Final comment; we are close. Please fix the lint.

@Kathryn-cat
Contributor Author

/gemini summary

@gemini-code-assist
Contributor

Summary of Changes

This pull request significantly refactors the DLPack integration by introducing a unified DLPackExchangeAPI struct. This change streamlines the process of exchanging tensors and handling device streams between different frameworks, particularly PyTorch, by consolidating related C functions into a single, more manageable interface. It also adds support for non-owning DLTensor conversions to improve efficiency.

Highlights

  • Unified DLPack Exchange API: A new DLPackExchangeAPI C struct has been introduced to consolidate all DLPack-related function pointers (for converting to/from PyObject, allocating tensors, and querying current work streams) into a single, streamlined interface. This replaces the previous method of passing individual function pointers.
  • Enhanced Stream Handling: The new DLPackExchangeAPI now includes a current_work_stream function pointer, which allows for more robust and integrated querying of the current device stream (e.g., CUDA stream) during DLPack tensor exchanges, particularly for non-CPU devices.
  • Non-Owning DLTensor Conversion: A new toDLPackNonOwningImpl function has been added to facilitate the creation of non-owning DLTensor instances directly from PyTorch tensors. This can lead to performance improvements by avoiding unnecessary reference counting overhead.
  • Refactored PyTorch Integration: The Python FFI for PyTorch has been updated to expose the new DLPackExchangeAPI struct via __c_dlpack_exchange_api__ on torch.Tensor, simplifying the integration and interaction with DLPack functionalities.
  • DLPack Submodule Update: The 3rdparty/dlpack submodule has been updated to a newer commit, which likely incorporates the necessary definitions and changes for the new DLPackExchangeAPI structure.
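The stream-handling contract highlighted above — conversions are _no_sync, so the caller queries current_work_stream and orders work explicitly — can be sketched as below. The stream callback is a stub standing in for a real CUDA stream query, and the handle value and device codes beyond kDLCUDA are illustrative assumptions.

```python
import ctypes

kDLCUDA = 2  # DLPack device type code for CUDA

CurrentWorkStreamFn = ctypes.CFUNCTYPE(ctypes.c_void_p, ctypes.c_int, ctypes.c_int)

@CurrentWorkStreamFn
def fake_current_work_stream(device_type, device_id):
    # Stub: pretend stream handle 0xBEEF is current on CUDA devices.
    return 0xBEEF if device_type == kDLCUDA else None

def exchange_with_sync(device_type, device_id):
    """Query the current stream, do a no-sync conversion, then let the
    caller synchronize explicitly for non-CPU devices."""
    stream = fake_current_work_stream(device_type, device_id)
    # ... perform the no-sync DLPack conversion here ...
    if stream:
        # In real code the caller would order work against this stream,
        # e.g. via cudaStreamSynchronize or a stream-wait event.
        pass
    return stream
```

On CPU (where there is no work stream) the callback returns NULL, so the synchronization branch is skipped entirely.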
Changelog
  • 3rdparty/dlpack
    • Updated the DLPack submodule to a newer commit.
  • python/tvm_ffi/_optional_torch_c_dlpack.py
    • Introduced toDLPackNonOwningImpl for non-owning DLTensor conversion.
    • Refactored individual DLPack C functions into a unified TorchDLPackExchangeAPI struct.
    • Added TorchDLPackCurrentWorkStream for querying device streams.
    • Updated torch.Tensor to expose the new __c_dlpack_exchange_api__ instead of individual function pointers.
  • python/tvm_ffi/core.pyi
    • Updated type hints to reflect the change to __c_dlpack_exchange_api__.
  • python/tvm_ffi/cython/base.pxi
    • Added DLPACK_MAJOR_VERSION and DLPACK_MINOR_VERSION macros.
    • Defined the DLPackExchangeAPI struct and its associated function pointer types.
    • Updated TVMFFIPyCallContext and TVMFFIPyArgSetter to use the new const DLPackExchangeAPI* c_dlpack_exchange_api member.
  • python/tvm_ffi/cython/function.pxi
    • Modified make_ret to accept the DLPackExchangeAPI pointer.
    • Refactored TVMFFIPyArgSetterDLPackCExporter_ to TVMFFIPyArgSetterDLPackExchangeAPI_ to utilize the new API struct.
    • Introduced GetTorchFallbackExchangeAPI for handling fallback scenarios.
  • python/tvm_ffi/cython/tensor.pxi
    • Refactored DLTensorTestWrapper's DLPack functions into separate _dltensor_test_wrapper_from_pyobject, _dltensor_test_wrapper_to_pyobject, and _dltensor_test_wrapper_current_work_stream functions.
    • Implemented _dltensor_test_wrapper_get_exchange_api to provide a static DLPackExchangeAPI instance for testing purposes.
    • Updated DLTensorTestWrapper to use __c_dlpack_exchange_api__.
  • python/tvm_ffi/cython/tvm_ffi_python_helpers.h
    • Updated TVMFFIPyCallContext and TVMFFIPyArgSetter to use a single const DLPackExchangeAPI* member instead of individual function pointers.
    • Modified TVMFFIPyFuncCall to correctly pass and receive the DLPackExchangeAPI pointer.
  • tests/python/test_load_inline.py
    • Updated test_load_inline_with_env_tensor_allocator to check for the presence of __c_dlpack_exchange_api__.
Activity
  • Kathryn-cat addressed comments from tqchen and indicated that the PR is in good shape, suggesting that unit tests be moved to a separate PR.
  • tqchen provided several review comments, including suggestions for refactoring context fields, clarifying static member initialization, moving implementations to the aten:: namespace, ensuring consistent naming, and updating the DLPack API passing mechanism.
  • Kathryn-cat updated the code multiple times in response to tqchen's feedback.
  • A bot (gemini-code-assist[bot]) identified a broad condition for device type checks in CUDA stream handling and suggested making it specific to kDLCUDA or kDLROCM, which was subsequently incorporated into the code.

@tqchen tqchen merged commit 22a7894 into apache:main Oct 11, 2025
7 checks passed
tqchen pushed a commit that referenced this pull request Oct 13, 2025
…111)

As a followup to PR #96, this PR adds comprehensive unit tests for
`torch.Tensor.__c_dlpack_exchange_api__` using inline C++. It validates
PyTorch's implementation of the `DLPackExchangeAPI` struct-based fast
exchange protocol.

Unlike the ctypes-based tests, these tests use
`torch.utils.cpp_extension.load_inline` to avoid GIL release issues
when calling `THPVariable_Wrap`.

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
SigureMo added a commit to cattidea/Paddle that referenced this pull request Oct 14, 2025
