@nieubank
Contributor

@nieubank nieubank commented Dec 18, 2025

Description

Introduces the OrtExternalResourceImporter API enabling execution providers to import D3D12 shared resources and timeline fences for zero-copy GPU-to-GPU data sharing with ORT inference.

Public API additions:

  • OrtExternalResourceImporter capability object
  • OrtExternalMemoryHandle for imported D3D12 allocations
  • OrtExternalSemaphoreHandle for imported D3D12 timeline fences
  • SessionGetEpDeviceForOutputs to query output EP device placement

EP Plugin API:

  • OrtExternalResourceImporterImpl interface for EP implementations
  • OrtEpFactory::CreateExternalResourceImporterForDevice extension

Design:

  • No GPU virtual addresses in public API
  • EP-agnostic design allows any EP to implement import
  • Capability discovery with explicit ORT_NOT_IMPLEMENTED
  • Follows existing patterns (Allocator, DataTransfer, SyncStream)

Includes example_plugin_ep mock implementation and autoep tests.
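
For readers less familiar with timeline fences, the following is a CPU-only sketch of the semantics this description relies on, assuming the usual timeline model in which the fence value only moves forward and a wait completes once the value reaches the target. `MockTimelineFence` and its methods are hypothetical names for illustration; they are not D3D12 or ORT types, and this is not how the example_plugin_ep mock is implemented.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>

// CPU-only mock of timeline-fence semantics (hypothetical; not D3D12, not ORT).
class MockTimelineFence {
 public:
  // Advance the fence. Assumption: values only move forward, so a lower
  // value than the current one is ignored.
  void Signal(uint64_t value) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (value > completed_) completed_ = value;
    }
    cv_.notify_all();
  }

  // Block until the fence has reached at least `value`.
  void Wait(uint64_t value) {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [&] { return completed_ >= value; });
  }

  // Last value the fence has reached.
  uint64_t CompletedValue() {
    std::lock_guard<std::mutex> lock(mutex_);
    return completed_;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  uint64_t completed_ = 0;
};
```

In a real interop flow the producer (for example a D3D12 queue) would signal the fence after writing the shared resource, and the consumer would wait on that value before reading it.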

Motivation and Context

#26821

Introduces the OrtExternalResourceImporter API enabling execution providers
to import D3D12 shared resources and timeline fences for zero-copy GPU-to-GPU
data sharing with ORT inference.

Public API additions:
- OrtExternalResourceImporter capability object
- OrtExternalMemoryHandle for imported D3D12 allocations
- OrtExternalSemaphoreHandle for imported D3D12 timeline fences
- SessionGetEpDeviceForOutputs to query output EP device placement
- RunOptions_SetSyncStream to associate sync stream for async execution

EP Plugin API:
- OrtExternalResourceImporterImpl interface for EP implementations
- OrtEpFactory::CreateExternalResourceImporterForDevice extension

Design:
- No GPU virtual addresses in public API
- EP-agnostic design allows any EP to implement import
- Capability discovery with explicit ORT_NOT_IMPLEMENTED
- Follows existing patterns (Allocator, DataTransfer, SyncStream)

Includes example_plugin_ep mock implementation and autoep tests.
@nieubank nieubank requested a review from skottmckay December 18, 2025 21:04
- Deleted the sync_stream member from OrtRunOptions structure.
- Removed the RunOptions_SetSyncStream API and its implementation.
- Updated related C++ API and example implementations to reflect the removal of sync stream functionality.
- Adjusted tests to remove references to RunOptions_SetSyncStream.
- Introduced new structures for external memory and semaphore handles to improve resource management.
- Ensured backward compatibility by checking EP version support for external resource import.
@nieubank nieubank marked this pull request as ready for review December 19, 2025 23:04
yuslepukhin previously approved these changes Dec 22, 2025
Member

@yuslepukhin yuslepukhin left a comment

:shipit:

@nieubank
Contributor Author

nieubank commented Jan 5, 2026

@skottmckay / @yuslepukhin - any concerns with getting this in this week prior to the 1.24 snap?

I think this provides a sufficient baseline for achieving the core of the asks around shared resources, and we can continue iterating on specific EP implementations with this foundation.

@yuslepukhin
Member

Looks great to me

@nieubank nieubank enabled auto-merge (squash) January 5, 2026 23:39
@nieubank nieubank requested a review from skottmckay January 5, 2026 23:40
@skottmckay
Contributor

@gaugarg-nv and @gedoensmax can you please take a look and let us know if there are any gaps/issues with this approach?

@nieubank nieubank disabled auto-merge January 6, 2026 22:38
- Added `ep_interop_api.h` to define the Interop API for external resource importers.
- Implemented functions for creating and managing external resource importers, including memory and semaphore import capabilities.
- Updated `onnxruntime_c_api.cc` to integrate the new Interop API, replacing previous external resource importer implementations.
- Modified `ort_apis.h` to declare the new Interop API functions.
- Refactored tests in `test_external_resource_importer.cc` to utilize the new Interop API for external resource importer operations.
Resolved API conflicts by placing KernelInfo APIs before Interop APIs
- Return ORT_NOT_IMPLEMENTED status instead of nullptr when EP doesn't support external resource import
- Rename ep_interop_api.{cc,h} to interop_api.{cc,h} to match the generic OrtInteropApi naming
- Update documentation to reflect the new error handling behavior
…eExternalResourceImporterForDevice

Capability discovery APIs should return success with nullptr output when a feature
is unsupported, rather than an error status. This allows simple "if (out != nullptr)"
checks without needing to distinguish ORT_NOT_IMPLEMENTED from real errors.

- Update tests to assert status is nullptr and skip when importer is nullptr
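
The convention described in this commit message can be sketched roughly as follows. This is a minimal illustration with hypothetical stand-in types (`Status`, `Importer`, `Device`, `CreateImporterForDevice`, `TryGetImporter`), not the actual ORT signatures; the only things taken from the discussion are that a null status means success and that a null output means the feature is unsupported.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-ins; these are NOT the ONNX Runtime types.
struct Status {
  std::string message;  // a null Status* means success
};

struct Importer {};  // placeholder for an external resource importer

struct Device {
  bool supports_import;  // assumption: the EP for this device implements import
  bool valid;            // assumption: false models a genuine failure
};

// Capability discovery: success with a null output means "not supported";
// a non-null Status is reserved for real errors.
Status* CreateImporterForDevice(const Device& device, Importer** out) {
  *out = nullptr;
  if (!device.valid) {
    return new Status{"invalid device"};
  }
  if (device.supports_import) {
    *out = new Importer{};
  }
  return nullptr;
}

// Caller-side helper: returns true only when an importer was produced.
// `error` is filled in only for real failures.
bool TryGetImporter(const Device& device, std::unique_ptr<Importer>& importer,
                    std::string& error) {
  Importer* raw = nullptr;
  std::unique_ptr<Status> status(CreateImporterForDevice(device, &raw));
  importer.reset(raw);
  if (status != nullptr) {
    error = status->message;
    return false;
  }
  return importer != nullptr;
}
```

With this shape, the caller needs only a null check on the output to detect an unsupported feature and never has to inspect an error code to tell "unsupported" apart from a failure.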
@gedoensmax
Contributor

@skottmckay thanks for tagging, this looks good to me. One question I have is whether we could handle, e.g., overallocation with a callback. Usually one can simply bin a memory info and rely on ORT to handle the correct allocation size. With this approach, I assume the memory has to be preallocated and imported beforehand with the correct shape.
I think this is an edge case; I just wanted to hear your thoughts.

@nieubank
Contributor Author

nieubank commented Jan 8, 2026

> @skottmckay thanks for tagging, this looks good to me. One question I have is whether we could handle, e.g., overallocation with a callback. Usually one can simply bin a memory info and rely on ORT to handle the correct allocation size. With this approach, I assume the memory has to be preallocated and imported beforehand with the correct shape. I think this is an edge case; I just wanted to hear your thoughts.

Right, good callout. The current design is intentionally kept simple and focused on the core import-existing-memory scenario for basic zero-copy interop.

The dual-offset pattern (`OrtExternalMemoryDescriptor::offset_bytes` + `OrtExternalTensorDescriptor::offset_bytes`) does allow importing a larger buffer and carving out regions, but I agree it's not quite the same as ORT-driven binning strategies.

This is definitely something we can build on top of in the future, through allocator callbacks or other extensions.
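
To make the dual-offset idea above concrete, here is a rough sketch of how a tensor region could be resolved inside a larger imported buffer. The structs are simplified, hypothetical stand-ins: only the two `offset_bytes` fields are named in this thread, while the `size_bytes` fields, `ByteRange`, and `ResolveTensorRange` are assumptions made for illustration and do not reflect the actual ORT definitions.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-ins for the two descriptors.
struct MemoryDescriptor {
  uint64_t offset_bytes;  // where the imported region starts in the shared allocation
  uint64_t size_bytes;    // assumed field: size of the imported region
};

struct TensorDescriptor {
  uint64_t offset_bytes;  // where the tensor starts inside the imported region
  uint64_t size_bytes;    // assumed field: bytes occupied by the tensor
};

// Half-open range [begin, end) measured from the start of the shared allocation.
struct ByteRange {
  uint64_t begin;
  uint64_t end;
};

// Combine both offsets and reject tensors that do not fit in the imported region.
bool ResolveTensorRange(const MemoryDescriptor& mem, const TensorDescriptor& tensor,
                        ByteRange* out) {
  // Compare against the remaining space so the bounds check itself cannot wrap.
  if (tensor.offset_bytes > mem.size_bytes) return false;
  if (tensor.size_bytes > mem.size_bytes - tensor.offset_bytes) return false;
  out->begin = mem.offset_bytes + tensor.offset_bytes;
  out->end = out->begin + tensor.size_bytes;
  return true;
}
```

Under these assumptions, several tensors can share one import by using different tensor offsets, but the caller still decides the layout up front, which is the difference from ORT-driven binning noted above.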

Contributor

@skottmckay skottmckay left a comment

:shipit:

@nieubank nieubank merged commit c54be3c into main Jan 9, 2026
102 of 103 checks passed
@nieubank nieubank deleted the nieubank/ext_importer branch January 9, 2026 08:04