Summary
Adopt TVM FFI as PegaInfer's official DSL interface, starting with a bidirectional interoperability test that proves PegaInfer can call into a TVM FFI-exported object and that a TVM FFI-side caller can call back into a PegaInfer-exported test function.
The first milestone should be a validation scaffold. Once that is stable, TVM FFI becomes the intended boundary for DSL-produced kernel artifacts that need to integrate with Rust-side PegaInfer runtime code.
Motivation
PegaInfer already has several FFI boundaries in pegainfer-kernels for CUDA, cuBLAS, FlashInfer, Triton AOT, and generated DeepSeek V4 kernels. The CuTe DSL generator at pegainfer-kernels/tools/cutedsl/deepseek_v4/generate.py currently discovers runtime libraries with enable_tvm_ffi=False.
Before using TVM FFI as the official DSL interface in production paths, add a small bidirectional test so that dependency handling, symbol loading, object lifetime, tensor handoff, error propagation, and CI/build behavior are all understood.
Another reason to validate this path early: TVM FFI can be a route for launching Triton-generated CUBINs from Rust. A minimal interop test gives us a controlled place to verify that launch path before wiring it into real generated-kernel or model execution code.
Proposed Scope
Define TVM FFI as the official interface between PegaInfer Rust code and DSL-generated kernel artifacts, then add a feature-gated test path, for example tvm-ffi-interop, under pegainfer-kernels.
The first test should cover two directions:
- PegaInfer -> TVM FFI
  - Build or load a tiny TVM FFI-exported fixture.
  - Call it from Rust test code.
  - Validate simple scalar or CPU tensor behavior, such as add_one([1, 2, 3]) -> [2, 3, 4].
- TVM FFI -> PegaInfer
  - Export a tiny PegaInfer-owned test function through the same interop boundary.
  - Call it from the TVM FFI side.
  - Validate return values and error handling.
Start CPU-only unless CUDA tensor handoff is straightforward. CUDA/DLPack coverage can be a follow-up once the host-side ABI is stable.
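The shape of the two call directions can be sketched without any TVM FFI dependency. The sketch below is only a stand-in: it uses a plain C ABI with function pointers, not the TVM FFI API, and every name in it (fixture_add_one, pegainfer_test_double, fixture_call_back, the 0/-1 status convention) is hypothetical. The real test would replace the fixture functions with symbols loaded from a TVM FFI-exported object.

```rust
// Hypothetical stand-in for the interop test: plain C ABI, NOT the TVM FFI API.
use std::os::raw::c_int;

/// Callback type the foreign side would use to call into PegaInfer-owned code.
type HostCallback = extern "C" fn(value: i64, out: *mut i64) -> c_int;

/// Stands in for a fixture exported by the foreign side: adds one in place.
/// Returns 0 on success and -1 on a null pointer (assumed convention).
extern "C" fn fixture_add_one(data: *mut i64, len: usize) -> c_int {
    if data.is_null() {
        return -1;
    }
    // SAFETY: the caller guarantees `data` points to `len` valid i64 values
    // that are not aliased for the duration of the call.
    let values = unsafe { std::slice::from_raw_parts_mut(data, len) };
    for v in values.iter_mut() {
        *v = v.wrapping_add(1);
    }
    0
}

/// Stands in for a PegaInfer-exported test function: doubles a value and
/// reports an error for negative input, a null output, or overflow.
extern "C" fn pegainfer_test_double(value: i64, out: *mut i64) -> c_int {
    if out.is_null() || value < 0 {
        return -1;
    }
    match value.checked_mul(2) {
        Some(doubled) => {
            // SAFETY: `out` was checked for null and the caller owns the slot.
            unsafe { *out = doubled };
            0
        }
        None => -1,
    }
}

/// Stands in for the foreign-side caller that invokes a host callback.
extern "C" fn fixture_call_back(cb: HostCallback, value: i64, out: *mut i64) -> c_int {
    cb(value, out)
}

/// Direction 1: Rust test code calls the fixture.
fn call_add_one(mut input: Vec<i64>) -> Result<Vec<i64>, String> {
    let rc = fixture_add_one(input.as_mut_ptr(), input.len());
    if rc == 0 {
        Ok(input)
    } else {
        Err(format!("add_one failed with status {rc}"))
    }
}

/// Direction 2: the fixture calls back into the PegaInfer-owned function.
fn call_back_through_fixture(value: i64) -> Result<i64, String> {
    let mut out = 0i64;
    let rc = fixture_call_back(pegainfer_test_double, value, &mut out);
    if rc == 0 {
        Ok(out)
    } else {
        Err(format!("callback failed with status {rc}"))
    }
}
```

A test built on this shape would check a successful round trip in each direction and at least one failing callback, so that error propagation is exercised rather than just library loading.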
After the test path passes, use it to document the expected DSL integration contract:
- how DSL-generated artifacts are discovered or linked;
- how Rust launches exported functions or kernels;
- how DSL-side code calls back into PegaInfer-owned test/runtime functions;
- how tensor handles, stream handles, and errors cross the boundary;
- how Triton-generated CUBIN launch support should be represented.
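As one illustration of the tensor, stream, and error items in that contract, the sketch below passes a borrowed CPU tensor view and an opaque stream handle across a C ABI and returns a status code. This is an assumption-laden sketch, not the TVM FFI or DLPack layout: TensorView, StreamHandle, Status, and scale_in_place are all hypothetical names, and the real contract would follow whatever descriptor TVM FFI actually uses.

```rust
// Hypothetical boundary types; field layout is illustrative, not DLPack's.
use std::os::raw::c_void;

/// Status code returned across the boundary instead of a Rust panic or Result.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Status {
    Ok = 0,
    NullPointer = 1,
    ShapeMismatch = 2,
}

/// Borrowed view of a contiguous f32 CPU buffer; the caller keeps ownership.
#[repr(C)]
struct TensorView {
    data: *mut f32,
    len: usize,
    shape: *const i64,
    ndim: i32,
}

/// Opaque stream handle; null in this CPU-only sketch.
#[repr(transparent)]
struct StreamHandle(*mut c_void);

/// Example boundary function: scales the tensor in place after validating it.
extern "C" fn scale_in_place(view: *const TensorView, _stream: StreamHandle, factor: f32) -> Status {
    if view.is_null() {
        return Status::NullPointer;
    }
    // SAFETY: `view` is non-null and the caller guarantees it is valid.
    let view = unsafe { &*view };
    if view.data.is_null() || view.shape.is_null() {
        return Status::NullPointer;
    }
    if view.ndim < 0 {
        return Status::ShapeMismatch;
    }
    // SAFETY: the caller guarantees `shape` holds `ndim` entries.
    let shape = unsafe { std::slice::from_raw_parts(view.shape, view.ndim as usize) };
    let mut numel: usize = 1;
    for &dim in shape {
        let Ok(dim) = usize::try_from(dim) else {
            return Status::ShapeMismatch;
        };
        let Some(next) = numel.checked_mul(dim) else {
            return Status::ShapeMismatch;
        };
        numel = next;
    }
    if numel != view.len {
        return Status::ShapeMismatch;
    }
    // SAFETY: the caller guarantees `data` holds `len` valid f32 values.
    let data = unsafe { std::slice::from_raw_parts_mut(view.data, view.len) };
    for x in data.iter_mut() {
        *x *= factor;
    }
    Status::Ok
}

/// Safe Rust wrapper that builds the view from slices and maps the status.
fn scale(data: &mut [f32], shape: &[i64], factor: f32) -> Result<(), Status> {
    let view = TensorView {
        data: data.as_mut_ptr(),
        len: data.len(),
        shape: shape.as_ptr(),
        ndim: shape.len() as i32,
    };
    match scale_in_place(&view, StreamHandle(std::ptr::null_mut()), factor) {
        Status::Ok => Ok(()),
        err => Err(err),
    }
}
```

Keeping validation on the callee side and a safe wrapper on the Rust side means a malformed handle is reported as a status value instead of causing undefined behavior at the boundary.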
Acceptance Criteria
- The issue establishes TVM FFI as PegaInfer's official DSL interface direction.
- A test-only dependency path exists and is off by default.
- The normal default build does not pull in TVM FFI.
- A focused command is documented in the test or issue follow-up, for example:
  cargo test --release -p pegainfer-kernels --features tvm-ffi-interop tvm_ffi_bidirectional -- --ignored
- The test verifies both call directions, not just loading a library.
- Failures surface useful diagnostics for missing TVM FFI runtime libraries, version mismatch, symbol lookup failure, and callback error propagation.
- The draft integration contract covers DSL-generated artifacts and the Triton CUBIN launch path.
- The implementation does not replace the existing handwritten CUDA FFI surface.
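The diagnostics criterion names four failure classes. One way to keep them distinguishable in test output is a dedicated error type, sketched below. The type name InteropError, its fields, the message wording, and the major-version-only comparison in check_version are all assumptions made for illustration, not an existing PegaInfer or TVM FFI API.

```rust
// Hypothetical error type for the interop test; names and messages are illustrative.
use std::fmt;

#[derive(Debug, PartialEq, Eq)]
enum InteropError {
    /// No TVM FFI runtime library was found in any searched location.
    RuntimeLibraryMissing { searched: Vec<String> },
    /// The loaded runtime reports a version the test was not built for.
    VersionMismatch { expected: String, found: String },
    /// The library loaded, but an expected symbol was absent.
    SymbolNotFound { library: String, symbol: String },
    /// A callback into PegaInfer-owned code reported an error.
    CallbackFailed { function: String, message: String },
}

impl fmt::Display for InteropError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::RuntimeLibraryMissing { searched } => write!(
                f,
                "TVM FFI runtime library not found; searched: {}",
                searched.join(", ")
            ),
            Self::VersionMismatch { expected, found } => write!(
                f,
                "TVM FFI version mismatch: expected {expected}, found {found}"
            ),
            Self::SymbolNotFound { library, symbol } => {
                write!(f, "symbol `{symbol}` not found in {library}")
            }
            Self::CallbackFailed { function, message } => {
                write!(f, "callback `{function}` failed: {message}")
            }
        }
    }
}

impl std::error::Error for InteropError {}

/// Assumed policy: versions are compatible when their major components match.
fn check_version(expected: &str, found: &str) -> Result<(), InteropError> {
    if expected.split('.').next() == found.split('.').next() {
        Ok(())
    } else {
        Err(InteropError::VersionMismatch {
            expected: expected.to_owned(),
            found: found.to_owned(),
        })
    }
}
```

Carrying the searched paths, library name, and symbol name in the error lets a CI failure say what was missing without rerunning the test under a debugger.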
Non-Goals
- Do not migrate existing CUDA kernel calls to TVM FFI in the first validation milestone.
- Do not make TVM FFI a default workspace dependency.
- Do not add model-runtime behavior or scheduler changes.
- Do not require GPU hardware for the first interoperability check.
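The off-by-default requirement could be met with a Cargo feature gate plus an ignored test, which matches the `--features tvm-ffi-interop` and `-- --ignored` parts of the example command. The sketch below assumes the feature is named tvm-ffi-interop as proposed and that the test is named tvm_ffi_bidirectional; the helper interop_enabled and the module name are hypothetical, and the test body is left as a placeholder.

```rust
/// True only when the crate is built with `--features tvm-ffi-interop`.
/// In a default build this is false and no interop code is compiled.
pub const fn interop_enabled() -> bool {
    cfg!(feature = "tvm-ffi-interop")
}

// The whole module is compiled out unless both `cargo test` and the
// feature flag are active, so the default build never references TVM FFI.
#[cfg(all(test, feature = "tvm-ffi-interop"))]
mod tvm_ffi_interop_tests {
    /// Ignored by default, so it runs only when `-- --ignored` is passed.
    #[test]
    #[ignore = "requires the TVM FFI runtime libraries"]
    fn tvm_ffi_bidirectional() {
        assert!(super::interop_enabled());
        // Placeholder: load the fixture and exercise both call directions here.
    }
}
```

With this layout the TVM FFI dependency itself would also be declared optional and tied to the same feature, so neither the default build nor a default `cargo test` run pulls it in.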
Notes
- Keep the implementation in the kernels/build-test boundary unless a later production use case requires a wider runtime API.
- Treat Rust package/version details as part of the spike. If TVM FFI Rust support needs a git/path dependency or build-time environment setup, document that explicitly before expanding scope.
- Useful references: