2 changes: 1 addition & 1 deletion docs/.rstcheck.cfg
@@ -1,5 +1,5 @@
[rstcheck]
report_level = warning
ignore_directives = automodule, autosummary, currentmodule, toctree, ifconfig, tab-set, collapse, tabs, dropdown
ignore_roles = ref, cpp:class, cpp:func, py:func, c:macro, external+data-api:doc, external+scikit_build_core:doc
ignore_roles = ref, cpp:class, cpp:func, py:func, c:macro, external+data-api:doc, external+scikit_build_core:doc, external+dlpack:doc
ignore_languages = cpp, python
2 changes: 2 additions & 0 deletions docs/concepts/abi_overview.md
@@ -184,6 +184,8 @@ and hash TVMFFIAny in bytes for quick equality checks without going through
type index switching.
:::

(object-storage-format)=

## Object Storage Format

When TVMFFIAny points to a heap-allocated object (such as n-dimensional arrays),
483 changes: 483 additions & 0 deletions docs/concepts/tensor.rst

Large diffs are not rendered by default.

78 changes: 39 additions & 39 deletions docs/get_started/quickstart.rst
@@ -20,16 +20,16 @@ Quick Start

.. note::

All the code in this tutorial can be found under `examples/quickstart <https://github.com/apache/tvm-ffi/tree/main/examples/quickstart>`_ in the repository.
All the code in this tutorial is under `examples/quickstart <https://github.com/apache/tvm-ffi/tree/main/examples/quickstart>`_ in the repository.

This guide walks through shipping a minimal ``add_one`` function that computes
``y = x + 1`` in C++ and CUDA.
TVM-FFI's Open ABI and FFI make it possible to **ship one library** for multiple frameworks and languages.
We can build a single shared library that works across:

- **ML frameworks**, e.g. PyTorch, JAX, NumPy, CuPy, etc., and
- **Languages**, e.g. C++, Python, Rust, etc.,
- **Python ABI versions**, e.g. ship one wheel to support all Python versions, including free-threaded ones.
- **ML frameworks**, e.g. PyTorch, JAX, NumPy, CuPy, and others;
- **Languages**, e.g. C++, Python, Rust, and others;
- **Python ABI versions**, e.g. one wheel that supports all Python versions, including free-threaded ones.

.. admonition:: Prerequisite
:class: hint
@@ -39,7 +39,7 @@ We can build a single shared library that works across:
- Compiler: C++17-capable toolchain (GCC/Clang/MSVC)
- Optional ML frameworks for testing: NumPy, PyTorch, JAX, CuPy
- CUDA: Any modern version (if you want to try the CUDA part)
- TVM-FFI installed via
- TVM-FFI installed via:

.. code-block:: bash

@@ -52,7 +52,7 @@ Write a Simple ``add_one``
Source Code
~~~~~~~~~~~

Suppose we implement a C++ function ``AddOne`` that performs elementwise ``y = x + 1`` for a 1-D ``float32`` vector. The source code (C++, CUDA) is:
Suppose we implement a C++ function ``AddOne`` that performs elementwise ``y = x + 1`` for a 1-D ``float32`` vector. The source code (C++ and CUDA) is:

.. hint::

@@ -84,23 +84,23 @@ Suppose we implement a C++ function ``AddOne`` that performs elementwise ``y = x


The macro :c:macro:`TVM_FFI_DLL_EXPORT_TYPED_FUNC` exports the C++ function ``AddOne``
as a TVM FFI compatible symbol ``__tvm_ffi_add_one_cpu/cuda``. If :c:macro:`TVM_FFI_DLL_EXPORT_INCLUDE_METADATA` is set to 1,
as a TVM-FFI-compatible symbol ``__tvm_ffi_add_one_cpu/cuda``. If :c:macro:`TVM_FFI_DLL_EXPORT_INCLUDE_METADATA` is set to 1,
it also exports the function's metadata as a symbol ``__tvm_ffi__metadata_add_one_cpu/cuda`` for type checking and stub generation.
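
Since the quickstart sources are not rendered in this diff, here is a minimal
sketch of the export pattern (the header paths and the ``TensorView`` accessors
``data_ptr``/``shape`` are assumptions; see ``examples/quickstart`` for the
real code — ``TensorView`` itself is introduced just below):

.. code-block:: cpp

   // add_one_cpu.cc -- a sketch only, mirroring examples/quickstart.
   #include <tvm/ffi/container/tensor.h>
   #include <tvm/ffi/function.h>

   namespace ffi = tvm::ffi;

   // Elementwise y = x + 1 for a 1-D float32 tensor.
   void AddOne(ffi::TensorView x, ffi::TensorView y) {
     const float* x_data = static_cast<const float*>(x.data_ptr());
     float* y_data = static_cast<float*>(y.data_ptr());
     for (int64_t i = 0; i < x.shape(0); ++i) {
       y_data[i] = x_data[i] + 1.0f;
     }
   }

   // Exports AddOne as the C symbol __tvm_ffi_add_one_cpu.
   TVM_FFI_DLL_EXPORT_TYPED_FUNC(add_one_cpu, AddOne);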

The class :cpp:class:`tvm::ffi::TensorView` allows zero-copy interop with tensors from different ML frameworks:
The class :cpp:class:`tvm::ffi::TensorView` enables zero-copy interop with tensors from different ML frameworks:

- NumPy, CuPy,
- PyTorch, JAX, or
- any array type that supports the standard :external+data-api:doc:`DLPack protocol <design_topics/data_interchange>`.

Finally, :cpp:func:`TVMFFIEnvGetStream` can be used in the CUDA code to launch a kernel on the caller's stream.
Finally, :cpp:func:`TVMFFIEnvGetStream` can be used in the CUDA code to launch kernels on the caller's stream.
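
A corresponding CUDA-side sketch (same assumptions as above; the exact header
and signature of :cpp:func:`TVMFFIEnvGetStream` are in the C environment API):

.. code-block:: cpp

   // add_one_cuda.cu -- a sketch only, mirroring examples/quickstart.
   #include <cuda_runtime.h>
   #include <tvm/ffi/container/tensor.h>
   #include <tvm/ffi/extra/c_env_api.h>
   #include <tvm/ffi/function.h>

   namespace ffi = tvm::ffi;

   __global__ void AddOneKernel(const float* x, float* y, int64_t n) {
     int64_t i = blockIdx.x * static_cast<int64_t>(blockDim.x) + threadIdx.x;
     if (i < n) y[i] = x[i] + 1.0f;
   }

   void AddOneCUDA(ffi::TensorView x, ffi::TensorView y) {
     int64_t n = x.shape(0);
     // Launch on the stream the caller (e.g. PyTorch) is currently using.
     cudaStream_t stream = static_cast<cudaStream_t>(
         TVMFFIEnvGetStream(x.device().device_type, x.device().device_id));
     int64_t blocks = (n + 255) / 256;
     AddOneKernel<<<blocks, 256, 0, stream>>>(
         static_cast<const float*>(x.data_ptr()),
         static_cast<float*>(y.data_ptr()), n);
   }

   // Exports AddOneCUDA as the C symbol __tvm_ffi_add_one_cuda.
   TVM_FFI_DLL_EXPORT_TYPED_FUNC(add_one_cuda, AddOneCUDA);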

.. _sec-cpp-compile-with-tvm-ffi:

Compile with TVM-FFI
~~~~~~~~~~~~~~~~~~~~

**Raw command.** We can use the following minimal commands to compile the source code:
**Raw command.** Use the following minimal commands to compile the source code:

.. tabs::

@@ -118,16 +118,16 @@ Compile with TVM-FFI
:start-after: [cuda_compile.begin]
:end-before: [cuda_compile.end]

This step produces a shared library ``add_one_cpu.so`` and ``add_one_cuda.so`` that can be used across languages and frameworks.
These steps produce shared libraries ``add_one_cpu.so`` and ``add_one_cuda.so`` that can be used across languages and frameworks.

.. hint::

For a single-file C++/CUDA project, a convenient method :py:func:`tvm_ffi.cpp.load_inline`
is provided to minimize boilerplate code in compilation, linking, and loading.
For a single-file C++/CUDA project, :py:func:`tvm_ffi.cpp.load_inline`
minimizes boilerplate for compilation, linking, and loading.
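
A sketch of that inline flow (the parameter names below are assumptions
modeled on similar ``load_inline`` APIs; consult the
:py:func:`tvm_ffi.cpp.load_inline` reference for the actual signature):

.. code-block:: python

   import tvm_ffi.cpp

   cpp_source = r"""
   #include <tvm/ffi/container/tensor.h>
   #include <tvm/ffi/function.h>
   void AddOne(tvm::ffi::TensorView x, tvm::ffi::TensorView y) { /* ... */ }
   TVM_FFI_DLL_EXPORT_TYPED_FUNC(add_one_cpu, AddOne);
   """

   # compile, link, and load in one call
   mod = tvm_ffi.cpp.load_inline(
       name="add_one",
       cpp_sources=cpp_source,
       functions=["add_one_cpu"],
   )
   mod.add_one_cpu(x, y)  # x, y: any DLPack-compatible tensors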


**CMake.** CMake is the preferred approach for building across platforms.
TVM-FFI natively integrates with CMake via ``find_package`` as demonstrated below:
TVM-FFI integrates with CMake via ``find_package`` as demonstrated below:

.. tabs::

@@ -158,19 +158,19 @@ TVM-FFI natively integrates with CMake via ``find_package`` as demonstrated belo
add_library(add_one_cuda SHARED compile/add_one_cuda.cu)
tvm_ffi_configure_target(add_one_cuda)

**Artifact.** The resulting ``add_one_cpu.so`` and ``add_one_cuda.so`` are minimal libraries that are agnostic to:
**Artifact.** The resulting ``add_one_cpu.so`` and ``add_one_cuda.so`` are small libraries that are agnostic to:

- Python version/ABI. It is not compiled/linked with Python and depends only on TVM-FFI's stable C ABI;
- Languages, including C++, Python, Rust or any other language that can interop with C ABI;
- ML frameworks, such as PyTorch, JAX, NumPy, CuPy, or anything with standard :external+data-api:doc:`DLPack protocol <design_topics/data_interchange>`.
- Python version/ABI. They are not compiled or linked with Python and depend only on TVM-FFI's stable C ABI;
- Languages, including C++, Python, Rust, or any other language that can interop with the C ABI;
- ML frameworks, such as PyTorch, JAX, NumPy, CuPy, or any array library that implements the standard :external+data-api:doc:`DLPack protocol <design_topics/data_interchange>`.

.. _sec-use-across-framework:

Ship Across ML Frameworks
-------------------------

TVM-FFI's Python package provides :py:func:`tvm_ffi.load_module`, which can load either
the ``add_one_cpu.so`` or ``add_one_cuda.so`` into :py:class:`tvm_ffi.Module`.
TVM-FFI's Python package provides :py:func:`tvm_ffi.load_module` to load either
``add_one_cpu.so`` or ``add_one_cuda.so`` into a :py:class:`tvm_ffi.Module`.

.. code-block:: python

@@ -179,7 +179,7 @@ the ``add_one_cpu.so`` or ``add_one_cuda.so`` into :py:class:`tvm_ffi.Module`.
func : tvm_ffi.Function = mod.add_one_cpu

``mod.add_one_cpu`` retrieves a callable :py:class:`tvm_ffi.Function` that accepts tensors from host frameworks
directly. This process is done zero-copy, without any boilerplate code, under extremely low latency.
directly. This is zero-copy, requires no boilerplate code, and adds very little overhead.
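
Concretely, with PyTorch (a sketch; the library path follows the build step
above, and any DLPack-compatible framework works the same way):

.. code-block:: python

   import torch
   import tvm_ffi

   mod = tvm_ffi.load_module("add_one_cpu.so")

   x = torch.arange(5, dtype=torch.float32)
   y = torch.empty_like(x)   # destination-passing style: y = x + 1
   mod.add_one_cpu(x, y)     # tensors cross the boundary zero-copy
   print(y)                  # tensor([1., 2., 3., 4., 5.])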

We can then use these functions in the following ways:

@@ -198,13 +198,13 @@ PyTorch
JAX
~~~

Support via `nvidia/jax-tvm-ffi <https://github.com/nvidia/jax-tvm-ffi>`_. This can be installed via
Support is provided via `nvidia/jax-tvm-ffi <https://github.com/nvidia/jax-tvm-ffi>`_. Install it with:

.. code-block:: bash

pip install jax-tvm-ffi

After installation, ``add_one_cuda`` can be registered as a target to JAX's ``ffi_call``.
After installation, ``add_one_cuda`` can be registered as a target for JAX's ``ffi_call``.

.. code-block:: python

@@ -248,26 +248,26 @@ NumPy/CuPy
Ship Across Languages
---------------------

TVM-FFI's core loading mechanism is ABI stable and works across language boundaries.
A single library can be loaded in every language TVM-FFI supports,
without having to recompile different libraries targeting different ABIs or languages.
TVM-FFI's core loading mechanism is ABI-stable and works across language boundaries.
A single library can be loaded in any language TVM-FFI supports,
without recompiling for different ABIs or languages.

.. _ship-to-python:

Python
~~~~~~

As shown in the :ref:`previous section<sec-use-across-framework>`, :py:func:`tvm_ffi.load_module` loads a language-
and framework-independent ``add_one_cpu.so`` or ``add_one_cuda.so`` and can be used to incorporate it into all Python
array frameworks that implement the standard :external+data-api:doc:`DLPack protocol <design_topics/data_interchange>`.
and framework-independent ``add_one_cpu.so`` or ``add_one_cuda.so`` and can be used with any Python
array framework that implements the standard :external+data-api:doc:`DLPack protocol <design_topics/data_interchange>`.

.. _ship-to-cpp:

C++
~~~

TVM-FFI's C++ API :cpp:func:`tvm::ffi::Module::LoadFromFile` loads ``add_one_cpu.so`` or ``add_one_cuda.so`` and
can be used directly in C/C++ with no Python dependency.
can be used directly from C/C++ without a Python dependency.

.. literalinclude:: ../../examples/quickstart/load/load_cpp.cc
:language: cpp
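
Since the included file is not rendered in this diff, a rough sketch of what
``load_cpp.cc`` does (the header paths and the ``GetFunction`` accessor are
assumptions; tensor construction is elided):

.. code-block:: cpp

   #include <tvm/ffi/function.h>
   #include <tvm/ffi/module.h>

   namespace ffi = tvm::ffi;

   int main() {
     // Step 1: load the shared library.
     ffi::Module mod = ffi::Module::LoadFromFile("add_one_cpu.so");
     // Step 2: look up the function exported by the macro.
     ffi::Function add_one = mod->GetFunction("add_one_cpu").value();
     // Step 3: call it with DLPack tensors (construction elided).
     // add_one(x, y);
     return 0;
   }
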
@@ -290,13 +290,13 @@ Compile and run it with:

.. note::

Don't like loading shared libraries? Static linking is also supported.
Prefer not to load shared libraries? Static linking is also supported.

In such cases, we can use :cpp:func:`tvm::ffi::Function::FromExternC` to create a
In such cases, use :cpp:func:`tvm::ffi::Function::FromExternC` to create a
:cpp:class:`tvm::ffi::Function` from the exported symbol, or directly use
:cpp:func:`tvm::ffi::Function::InvokeExternC` to invoke the function.

This feature can be useful on iOS, or when the exported module is generated by another DSL compiler matching the ABI.
This feature can be useful on iOS, or when the exported module is generated by another DSL compiler targeting the ABI.

.. code-block:: cpp

@@ -321,7 +321,7 @@ Rust

TVM-FFI's Rust API ``tvm_ffi::Module::load_from_file`` loads ``add_one_cpu.so`` or ``add_one_cuda.so`` and
then retrieves a function ``add_one_cpu`` or ``add_one_cuda`` from it.
This procedure is identical to those in C++ and Python:
This mirrors the C++ and Python flows:

.. code-block:: rust

@@ -336,8 +336,8 @@

.. hint::

We can also use the Rust API to target the TVM FFI ABI. This means we can use Rust to write the function
implementation and export to Python/C++ in the same fashion.
You can also use the Rust API to target the TVM-FFI ABI. This lets you write the function
implementation in Rust and export it to Python/C++ in the same way.


Troubleshooting
@@ -351,7 +351,7 @@ Troubleshooting
Further Reading
---------------

- :doc:`Python Packaging <../packaging/python_packaging>` provides details on ABI-agnostic Python wheel building, as well as
exposing functions, classes and C symbols from TVM-FFI modules.
- :doc:`Stable C ABI <stable_c_abi>` explains the ABI in depth and how it enables stability guarantee. Its C examples demonstrate
how to interoperate through the stable C ABI from both callee and caller sides.
- :doc:`Python Packaging <../packaging/python_packaging>` provides details on ABI-agnostic Python wheel builds and on
exposing functions, classes, and C symbols from TVM-FFI modules.
- :doc:`Stable C ABI <stable_c_abi>` explains the ABI in depth and the stability guarantees it enables. Its C examples demonstrate
how to interoperate through the stable C ABI from both the callee and caller sides.
35 changes: 17 additions & 18 deletions docs/get_started/stable_c_abi.rst
@@ -20,7 +20,7 @@ Stable C ABI

.. note::

All code used in this guide lives under
All code used in this guide is under
`examples/stable_c_abi <https://github.com/apache/tvm-ffi/tree/main/examples/stable_c_abi>`_.

.. admonition:: Prerequisite
@@ -34,11 +34,10 @@ Stable C ABI

pip install --upgrade --force-reinstall apache-tvm-ffi

This guide introduces TVM-FFI's stable C ABI: a single, minimal and stable
ABI that represents any cross-language calls, with DSL and ML compiler codegen
in mind.
This guide introduces TVM-FFI's stable C ABI: a single, minimal ABI that represents
cross-language calls and is designed for DSL and ML compiler codegen.

TVM-FFI builds on the following key idea:
TVM-FFI is built around the following key idea:

.. _tvm_ffi_c_abi:

@@ -56,27 +55,27 @@ TVM-FFI builds on the following key idea:
TVMFFIAny* result, // output: *result
);

where :cpp:class:`TVMFFIAny`, is a tagged union of all supported types, e.g. integers, floats, Tensors, strings, etc., and can be further extended to arbitrary user-defined types.
where :cpp:class:`TVMFFIAny` is a tagged union of all supported types, e.g. integers, floats, tensors, strings, and more, and can be extended to user-defined types.

Built on top of this stable C ABI, TVM-FFI defines a common C ABI protocol for all functions, and further provides an extensible, performant, and ecosystem-friendly open solution for all.
Built on top of this stable C ABI, TVM-FFI defines a common C ABI protocol for all functions and provides an extensible, performant, and ecosystem-friendly solution.

The rest of this guide covers:

- The stable C layout and calling convention of ``tvm_ffi_c_abi``;
- C examples from both callee and caller side of this ABI.
- C examples from both the callee and caller side of this ABI.

Stable C Layout
---------------

TVM-FFI's :ref:`C ABI <tvm_ffi_c_abi>` uses a stable layout for all the input and output arguments.
TVM-FFI's :ref:`C ABI <tvm_ffi_c_abi>` uses a stable layout for all input and output arguments.

Layout of :cpp:class:`TVMFFIAny`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:cpp:class:`TVMFFIAny` is a fixed-size (128-bit) tagged union that represents all supported types.

- First 32 bits: type index indicating which value is stored (supports up to 2^32 types).
- Next 32 bits: reserved (used for flags in rare cases, e.g. small-string optimization).
- Next 32 bits: reserved (used for flags in rare cases, e.g., small-string optimization).
- Last 64 bits: payload that is either a 64-bit integer, a 64-bit floating-point number, or a pointer to a heap-allocated object.
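
In C, these three fields correspond roughly to the struct below (a sketch;
the authoritative definition is in the TVM-FFI C API header, and the field
and union-member names here are illustrative assumptions):

.. code-block:: c

   #include <stdint.h>

   typedef struct {
     int32_t type_index;     /* first 32 bits: which type is stored */
     uint32_t zero_padding;  /* next 32 bits: reserved flag bits */
     union {                 /* last 64 bits: the payload */
       int64_t v_int64;      /* 64-bit integer */
       double v_float64;     /* 64-bit floating point */
       void* v_obj;          /* pointer to a heap-allocated object */
     };
   } TVMFFIAnySketch;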

.. figure:: https://raw.githubusercontent.com/tlc-pack/web-data/main/images/tvm-ffi/stable-c-abi-layout-any.svg
@@ -137,9 +136,9 @@ Stable ABI in C Code
You can build and run the examples either with raw compiler commands or with CMake.
Both approaches are demonstrated below.

TVM FFI's :ref:`C ABI <tvm_ffi_c_abi>` is designed with DSL and ML compilers in mind. DSL codegen usually relies on MLIR, LLVM or low-level C as the compilation target, where no access to C++ features is available, and where stable C ABIs are preferred for simplicity and stability.
TVM-FFI's :ref:`C ABI <tvm_ffi_c_abi>` is designed with DSL and ML compilers in mind. DSL codegen often targets MLIR, LLVM, or low-level C, where C++ features are unavailable and stable C ABIs are preferred for simplicity and stability.

This section shows how to write C code that follows the stable C ABI. Specifically, we provide two examples:
This section shows how to write C code that follows the stable C ABI using two examples:

- Callee side: A CPU ``add_one_cpu`` kernel in C that is equivalent to the :ref:`C++ example <cpp_add_one_kernel>`.
- Caller side: A loader and runner in C that invokes the kernel, a direct C translation of the :ref:`C++ example <ship-to-cpp>`.
@@ -149,11 +148,11 @@ The C code is minimal and dependency-free, so it can serve as a direct reference
Callee: ``add_one_cpu`` Kernel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Below is a minimal ``add_one_cpu`` kernel in C that follows the stable C ABI. It has three steps:
Below is a minimal ``add_one_cpu`` kernel in C that follows the stable C ABI in three steps:

- **Step 1**. Extract input ``x`` and output ``y`` as DLPack tensors;
- **Step 2**. Implement the kernel ``y = x + 1`` on CPU with a simple for-loop;
- **Step 3**. Set the output result to ``result``.
- **Step 3**. Set the output result in ``result``.

.. literalinclude:: ../../examples/stable_c_abi/src/add_one_cpu.c
:language: c
@@ -188,7 +187,7 @@ Build it with either approach:
**C vs. C++.** Compared to the :ref:`C++ example <cpp_add_one_kernel>`, there are a few key differences:

- The explicit marshalling in **Step 1** is only needed in C. In C++, templates hide these details.
- The C++ macro :c:macro:`TVM_FFI_DLL_EXPORT_TYPED_FUNC` (used to export ``add_one_cpu``) is not needed in C, because this example directly defines the exported C symbol ``__tvm_ffi_add_one_cpu``.
- The C++ macro :c:macro:`TVM_FFI_DLL_EXPORT_TYPED_FUNC` (used to export ``add_one_cpu``) is not needed in C, since this example directly defines the exported C symbol ``__tvm_ffi_add_one_cpu``.

.. hint::

@@ -200,7 +199,7 @@ Build it with either approach:
Caller: Kernel Loader
~~~~~~~~~~~~~~~~~~~~~

Next, a minimal C loader invokes the ``add_one_cpu`` kernel. It is functionally identical to the :ref:`C++ example <ship-to-cpp>` and performs:
Next, a minimal C loader invokes the ``add_one_cpu`` kernel. It mirrors the :ref:`C++ example <ship-to-cpp>` and performs:

- **Step 1**. Load the shared library ``build/add_one_cpu.so`` that contains the kernel;
- **Step 2**. Get function ``add_one_cpu`` from the library;
@@ -238,7 +237,7 @@ Build and run the loader with either approach:
cmake --build build --config RelWithDebInfo
build/load

To call a function via the stable C ABI in C, idiomatically:
In C, the idiomatic steps to call a function via the stable C ABI are:

- Convert input arguments to the :cpp:class:`TVMFFIAny` type;
- Call the target function (e.g., ``add_one_cpu``) via :cpp:func:`TVMFFIFunctionCall`;
@@ -247,7 +246,7 @@ To call a function via the stable C ABI in C, idiomatically:
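
Concretely, the call site has roughly the following shape (a sketch; the
header path, the handle type, and the argument order of
:cpp:func:`TVMFFIFunctionCall` are assumptions, and error handling is elided):

.. code-block:: c

   #include <string.h>
   #include <tvm/ffi/c_api.h>  /* assumed header for the C ABI types */

   /* x_any and y_any are assumed to already wrap the two tensors. */
   int CallAddOne(TVMFFIObjectHandle func, TVMFFIAny x_any, TVMFFIAny y_any) {
     TVMFFIAny args[2];
     TVMFFIAny result;
     memset(&result, 0, sizeof(result));
     args[0] = x_any;  /* input tensor x */
     args[1] = y_any;  /* output tensor y */
     /* A nonzero return code means the callee raised an error. */
     return TVMFFIFunctionCall(func, args, 2, &result);
   }
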
What's Next
-----------

**ABI specification.** See the complete ABI specification in :doc:`../concepts/abi_overview`.
**ABI specification.** See the full ABI specification in :doc:`../concepts/abi_overview`.

**Convenient compiler target.** The stable C ABI is a simple, portable codegen target for DSL compilers. Emit C that follows this ABI to integrate with TVM-FFI and call the result from multiple languages and frameworks. See :doc:`../concepts/abi_overview`.

1 change: 1 addition & 0 deletions docs/index.rst
@@ -59,6 +59,7 @@ Table of Contents
:caption: Concepts

concepts/abi_overview.md
concepts/tensor.rst

.. toctree::
:maxdepth: 1