2 changes: 1 addition & 1 deletion docs/.rstcheck.cfg
@@ -1,5 +1,5 @@
[rstcheck]
report_level = warning
ignore_directives = automodule, autosummary, currentmodule, toctree, ifconfig, tab-set, collapse, tabs, dropdown
ignore_roles = ref, cpp:class, cpp:func, py:func, c:macro, external+data-api:doc, external+scikit_build_core:doc
ignore_roles = ref, cpp:class, cpp:func, py:func, c:macro, external+data-api:doc, external+scikit_build_core:doc, external+dlpack:doc
ignore_languages = cpp, python
2 changes: 2 additions & 0 deletions docs/concepts/abi_overview.md
@@ -184,6 +184,8 @@ and hash TVMFFIAny in bytes for quick equality checks without going through
type index switching.
:::

(object-storage-format)=

## Object Storage Format

When TVMFFIAny points to a heap-allocated object (such as n-dimensional arrays),
483 changes: 483 additions & 0 deletions docs/concepts/tensor.rst

Large diffs are not rendered by default.

78 changes: 39 additions & 39 deletions docs/get_started/quickstart.rst
@@ -20,16 +20,16 @@ Quick Start

.. note::

All the code in this tutorial can be found under `examples/quickstart <https://github.com/apache/tvm-ffi/tree/main/examples/quickstart>`_ in the repository.
All the code in this tutorial is under `examples/quickstart <https://github.com/apache/tvm-ffi/tree/main/examples/quickstart>`_ in the repository.

This guide walks through shipping a minimal ``add_one`` function that computes
``y = x + 1`` in C++ and CUDA.
TVM-FFI's Open ABI and FFI make it possible to **ship one library** for multiple frameworks and languages.
We can build a single shared library that works across:

- **ML frameworks**, e.g. PyTorch, JAX, NumPy, CuPy, etc., and
- **Languages**, e.g. C++, Python, Rust, etc.,
- **Python ABI versions**, e.g. ship one wheel to support all Python versions, including free-threaded ones.
- **ML frameworks**, e.g. PyTorch, JAX, NumPy, CuPy, and others;
- **Languages**, e.g. C++, Python, Rust, and others;
- **Python ABI versions**, e.g. one wheel that supports all Python versions, including free-threaded ones.

.. admonition:: Prerequisite
:class: hint
@@ -39,7 +39,7 @@ We can build a single shared library that works across:
- Compiler: C++17-capable toolchain (GCC/Clang/MSVC)
- Optional ML frameworks for testing: NumPy, PyTorch, JAX, CuPy
- CUDA: Any modern version (if you want to try the CUDA part)
- TVM-FFI installed via
- TVM-FFI installed via:

.. code-block:: bash

@@ -52,7 +52,7 @@ Write a Simple ``add_one``
Source Code
~~~~~~~~~~~

Suppose we implement a C++ function ``AddOne`` that performs elementwise ``y = x + 1`` for a 1-D ``float32`` vector. The source code (C++, CUDA) is:
Suppose we implement a C++ function ``AddOne`` that performs elementwise ``y = x + 1`` for a 1-D ``float32`` vector. The source code (C++ and CUDA) is:

.. hint::

@@ -84,23 +84,23 @@ Suppose we implement a C++ function ``AddOne`` that performs elementwise ``y = x


The macro :c:macro:`TVM_FFI_DLL_EXPORT_TYPED_FUNC` exports the C++ function ``AddOne``
as a TVM FFI compatible symbol ``__tvm_ffi_add_one_cpu/cuda``. If :c:macro:`TVM_FFI_DLL_EXPORT_INCLUDE_METADATA` is set to 1,
as a TVM-FFI-compatible symbol ``__tvm_ffi_add_one_cpu/cuda``. If :c:macro:`TVM_FFI_DLL_EXPORT_INCLUDE_METADATA` is set to 1,
it also exports the function's metadata as a symbol ``__tvm_ffi__metadata_add_one_cpu/cuda`` for type checking and stub generation.
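
Since the quickstart sources are not rendered in this diff, here is a minimal
sketch of the export pattern (the header paths and the ``TensorView`` accessors
``data_ptr``/``shape`` are assumptions; see ``examples/quickstart`` for the
real code — ``TensorView`` itself is introduced just below):

.. code-block:: cpp

   // add_one_cpu.cc -- a sketch only, mirroring examples/quickstart.
   #include <tvm/ffi/container/tensor.h>
   #include <tvm/ffi/function.h>

   namespace ffi = tvm::ffi;

   // Elementwise y = x + 1 for a 1-D float32 tensor.
   void AddOne(ffi::TensorView x, ffi::TensorView y) {
     const float* x_data = static_cast<const float*>(x.data_ptr());
     float* y_data = static_cast<float*>(y.data_ptr());
     for (int64_t i = 0; i < x.shape(0); ++i) {
       y_data[i] = x_data[i] + 1.0f;
     }
   }

   // Exports AddOne as the C symbol __tvm_ffi_add_one_cpu.
   TVM_FFI_DLL_EXPORT_TYPED_FUNC(add_one_cpu, AddOne);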

The class :cpp:class:`tvm::ffi::TensorView` allows zero-copy interop with tensors from different ML frameworks:
The class :cpp:class:`tvm::ffi::TensorView` enables zero-copy interop with tensors from different ML frameworks:

- NumPy, CuPy,
- PyTorch, JAX, or
- any array type that supports the standard :external+data-api:doc:`DLPack protocol <design_topics/data_interchange>`.

Finally, :cpp:func:`TVMFFIEnvGetStream` can be used in the CUDA code to launch a kernel on the caller's stream.
Finally, :cpp:func:`TVMFFIEnvGetStream` can be used in the CUDA code to launch kernels on the caller's stream.
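
A corresponding CUDA-side sketch (same assumptions as above; the exact header
and signature of :cpp:func:`TVMFFIEnvGetStream` are in the C environment API):

.. code-block:: cpp

   // add_one_cuda.cu -- a sketch only, mirroring examples/quickstart.
   #include <cuda_runtime.h>
   #include <tvm/ffi/container/tensor.h>
   #include <tvm/ffi/extra/c_env_api.h>
   #include <tvm/ffi/function.h>

   namespace ffi = tvm::ffi;

   __global__ void AddOneKernel(const float* x, float* y, int64_t n) {
     int64_t i = blockIdx.x * static_cast<int64_t>(blockDim.x) + threadIdx.x;
     if (i < n) y[i] = x[i] + 1.0f;
   }

   void AddOneCUDA(ffi::TensorView x, ffi::TensorView y) {
     int64_t n = x.shape(0);
     // Launch on the stream the caller (e.g. PyTorch) is currently using.
     cudaStream_t stream = static_cast<cudaStream_t>(
         TVMFFIEnvGetStream(x.device().device_type, x.device().device_id));
     int64_t blocks = (n + 255) / 256;
     AddOneKernel<<<blocks, 256, 0, stream>>>(
         static_cast<const float*>(x.data_ptr()),
         static_cast<float*>(y.data_ptr()), n);
   }

   // Exports AddOneCUDA as the C symbol __tvm_ffi_add_one_cuda.
   TVM_FFI_DLL_EXPORT_TYPED_FUNC(add_one_cuda, AddOneCUDA);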

.. _sec-cpp-compile-with-tvm-ffi:

Compile with TVM-FFI
~~~~~~~~~~~~~~~~~~~~

**Raw command.** We can use the following minimal commands to compile the source code:
**Raw command.** Use the following minimal commands to compile the source code:

.. tabs::

@@ -118,16 +118,16 @@ Compile with TVM-FFI
:start-after: [cuda_compile.begin]
:end-before: [cuda_compile.end]

This step produces a shared library ``add_one_cpu.so`` and ``add_one_cuda.so`` that can be used across languages and frameworks.
These steps produce shared libraries ``add_one_cpu.so`` and ``add_one_cuda.so`` that can be used across languages and frameworks.

.. hint::

For a single-file C++/CUDA project, a convenient method :py:func:`tvm_ffi.cpp.load_inline`
is provided to minimize boilerplate code in compilation, linking, and loading.
For a single-file C++/CUDA project, :py:func:`tvm_ffi.cpp.load_inline`
minimizes boilerplate for compilation, linking, and loading.
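
A sketch of that inline flow (the parameter names below are assumptions
modeled on similar ``load_inline`` APIs; consult the
:py:func:`tvm_ffi.cpp.load_inline` reference for the actual signature):

.. code-block:: python

   import tvm_ffi.cpp

   cpp_source = r"""
   #include <tvm/ffi/container/tensor.h>
   #include <tvm/ffi/function.h>
   void AddOne(tvm::ffi::TensorView x, tvm::ffi::TensorView y) { /* ... */ }
   TVM_FFI_DLL_EXPORT_TYPED_FUNC(add_one_cpu, AddOne);
   """

   # compile, link, and load in one call
   mod = tvm_ffi.cpp.load_inline(
       name="add_one",
       cpp_sources=cpp_source,
       functions=["add_one_cpu"],
   )
   mod.add_one_cpu(x, y)  # x, y: any DLPack-compatible tensors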


**CMake.** CMake is the preferred approach for building across platforms.
TVM-FFI natively integrates with CMake via ``find_package`` as demonstrated below:
TVM-FFI integrates with CMake via ``find_package`` as demonstrated below:

.. tabs::

@@ -158,19 +158,19 @@ TVM-FFI natively integrates with CMake via ``find_package`` as demonstrated belo
add_library(add_one_cuda SHARED compile/add_one_cuda.cu)
tvm_ffi_configure_target(add_one_cuda)

**Artifact.** The resulting ``add_one_cpu.so`` and ``add_one_cuda.so`` are minimal libraries that are agnostic to:
**Artifact.** The resulting ``add_one_cpu.so`` and ``add_one_cuda.so`` are small libraries that are agnostic to:

- Python version/ABI. It is not compiled/linked with Python and depends only on TVM-FFI's stable C ABI;
- Languages, including C++, Python, Rust or any other language that can interop with C ABI;
- ML frameworks, such as PyTorch, JAX, NumPy, CuPy, or anything with standard :external+data-api:doc:`DLPack protocol <design_topics/data_interchange>`.
- Python version/ABI. They are not compiled or linked with Python and depend only on TVM-FFI's stable C ABI;
- Languages, including C++, Python, Rust, or any other language that can interop with the C ABI;
- ML frameworks, such as PyTorch, JAX, NumPy, CuPy, or any array library that implements the standard :external+data-api:doc:`DLPack protocol <design_topics/data_interchange>`.

.. _sec-use-across-framework:

Ship Across ML Frameworks
-------------------------

TVM-FFI's Python package provides :py:func:`tvm_ffi.load_module`, which can load either
the ``add_one_cpu.so`` or ``add_one_cuda.so`` into :py:class:`tvm_ffi.Module`.
TVM-FFI's Python package provides :py:func:`tvm_ffi.load_module` to load either
``add_one_cpu.so`` or ``add_one_cuda.so`` into a :py:class:`tvm_ffi.Module`.

.. code-block:: python

@@ -179,7 +179,7 @@ the ``add_one_cpu.so`` or ``add_one_cuda.so`` into :py:class:`tvm_ffi.Module`.
func : tvm_ffi.Function = mod.add_one_cpu

``mod.add_one_cpu`` retrieves a callable :py:class:`tvm_ffi.Function` that accepts tensors from host frameworks
directly. This process is done zero-copy, without any boilerplate code, under extremely low latency.
directly. This is zero-copy, requires no boilerplate code, and adds very little overhead.
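
Concretely, with PyTorch (a sketch; the library path follows the build step
above, and any DLPack-compatible framework works the same way):

.. code-block:: python

   import torch
   import tvm_ffi

   mod = tvm_ffi.load_module("add_one_cpu.so")

   x = torch.arange(5, dtype=torch.float32)
   y = torch.empty_like(x)   # destination-passing style: y = x + 1
   mod.add_one_cpu(x, y)     # tensors cross the boundary zero-copy
   print(y)                  # tensor([1., 2., 3., 4., 5.])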

We can then use these functions in the following ways:

@@ -198,13 +198,13 @@ PyTorch
JAX
~~~

Support via `nvidia/jax-tvm-ffi <https://github.com/nvidia/jax-tvm-ffi>`_. This can be installed via
Support is provided via `nvidia/jax-tvm-ffi <https://github.com/nvidia/jax-tvm-ffi>`_. Install it with:

.. code-block:: bash

pip install jax-tvm-ffi

After installation, ``add_one_cuda`` can be registered as a target to JAX's ``ffi_call``.
After installation, ``add_one_cuda`` can be registered as a target for JAX's ``ffi_call``.

.. code-block:: python

@@ -248,26 +248,26 @@ NumPy/CuPy
Ship Across Languages
---------------------

TVM-FFI's core loading mechanism is ABI stable and works across language boundaries.
A single library can be loaded in every language TVM-FFI supports,
without having to recompile different libraries targeting different ABIs or languages.
TVM-FFI's core loading mechanism is ABI-stable and works across language boundaries.
A single library can be loaded in any language TVM-FFI supports,
without recompiling for different ABIs or languages.

.. _ship-to-python:

Python
~~~~~~

As shown in the :ref:`previous section<sec-use-across-framework>`, :py:func:`tvm_ffi.load_module` loads a language-
and framework-independent ``add_one_cpu.so`` or ``add_one_cuda.so`` and can be used to incorporate it into all Python
array frameworks that implement the standard :external+data-api:doc:`DLPack protocol <design_topics/data_interchange>`.
and framework-independent ``add_one_cpu.so`` or ``add_one_cuda.so`` and can be used with any Python
array framework that implements the standard :external+data-api:doc:`DLPack protocol <design_topics/data_interchange>`.

.. _ship-to-cpp:

C++
~~~

TVM-FFI's C++ API :cpp:func:`tvm::ffi::Module::LoadFromFile` loads ``add_one_cpu.so`` or ``add_one_cuda.so`` and
can be used directly in C/C++ with no Python dependency.
can be used directly from C/C++ without a Python dependency.

.. literalinclude:: ../../examples/quickstart/load/load_cpp.cc
:language: cpp
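
Since the included file is not rendered in this diff, a rough sketch of what
``load_cpp.cc`` does (the header paths and the ``GetFunction`` accessor are
assumptions; tensor construction is elided):

.. code-block:: cpp

   #include <tvm/ffi/function.h>
   #include <tvm/ffi/module.h>

   namespace ffi = tvm::ffi;

   int main() {
     // Step 1: load the shared library.
     ffi::Module mod = ffi::Module::LoadFromFile("add_one_cpu.so");
     // Step 2: look up the function exported by the macro.
     ffi::Function add_one = mod->GetFunction("add_one_cpu").value();
     // Step 3: call it with DLPack tensors (construction elided).
     // add_one(x, y);
     return 0;
   }
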
@@ -290,13 +290,13 @@ Compile and run it with:

.. note::

Don't like loading shared libraries? Static linking is also supported.
Prefer not to load shared libraries? Static linking is also supported.

In such cases, we can use :cpp:func:`tvm::ffi::Function::FromExternC` to create a
In such cases, use :cpp:func:`tvm::ffi::Function::FromExternC` to create a
:cpp:class:`tvm::ffi::Function` from the exported symbol, or directly use
:cpp:func:`tvm::ffi::Function::InvokeExternC` to invoke the function.

This feature can be useful on iOS, or when the exported module is generated by another DSL compiler matching the ABI.
This feature can be useful on iOS, or when the exported module is generated by another DSL compiler targeting the ABI.

.. code-block:: cpp

@@ -321,7 +321,7 @@ Rust

TVM-FFI's Rust API ``tvm_ffi::Module::load_from_file`` loads ``add_one_cpu.so`` or ``add_one_cuda.so`` and
then retrieves a function ``add_one_cpu`` or ``add_one_cuda`` from it.
This procedure is identical to those in C++ and Python:
This mirrors the C++ and Python flows:

.. code-block:: rust

@@ -336,8 +336,8 @@

.. hint::

We can also use the Rust API to target the TVM FFI ABI. This means we can use Rust to write the function
implementation and export to Python/C++ in the same fashion.
You can also use the Rust API to target the TVM-FFI ABI. This lets you write the function
implementation in Rust and export it to Python/C++ in the same way.


Troubleshooting
@@ -351,7 +351,7 @@ Troubleshooting
Further Reading
---------------

- :doc:`Python Packaging <../packaging/python_packaging>` provides details on ABI-agnostic Python wheel building, as well as
exposing functions, classes and C symbols from TVM-FFI modules.
- :doc:`Stable C ABI <stable_c_abi>` explains the ABI in depth and how it enables stability guarantee. Its C examples demonstrate
how to interoperate through the stable C ABI from both callee and caller sides.
- :doc:`Python Packaging <../packaging/python_packaging>` provides details on ABI-agnostic Python wheel builds and on
exposing functions, classes, and C symbols from TVM-FFI modules.
- :doc:`Stable C ABI <stable_c_abi>` explains the ABI in depth and the stability guarantees it enables. Its C examples demonstrate
how to interoperate through the stable C ABI from both the callee and caller sides.
35 changes: 17 additions & 18 deletions docs/get_started/stable_c_abi.rst
@@ -20,7 +20,7 @@ Stable C ABI

.. note::

All code used in this guide lives under
All code used in this guide is under
`examples/stable_c_abi <https://github.com/apache/tvm-ffi/tree/main/examples/stable_c_abi>`_.

.. admonition:: Prerequisite
@@ -34,11 +34,10 @@ Stable C ABI

pip install --upgrade --force-reinstall apache-tvm-ffi

This guide introduces TVM-FFI's stable C ABI: a single, minimal and stable
ABI that represents any cross-language calls, with DSL and ML compiler codegen
in mind.
This guide introduces TVM-FFI's stable C ABI: a single, minimal ABI that represents
cross-language calls and is designed for DSL and ML compiler codegen.

TVM-FFI builds on the following key idea:
TVM-FFI is built around the following key idea:

.. _tvm_ffi_c_abi:

@@ -56,27 +55,27 @@ TVM-FFI builds on the following key idea:
TVMFFIAny* result, // output: *result
);

where :cpp:class:`TVMFFIAny`, is a tagged union of all supported types, e.g. integers, floats, Tensors, strings, etc., and can be further extended to arbitrary user-defined types.
where :cpp:class:`TVMFFIAny` is a tagged union of all supported types, e.g. integers, floats, tensors, strings, and more, and can be extended to user-defined types.

Built on top of this stable C ABI, TVM-FFI defines a common C ABI protocol for all functions, and further provides an extensible, performant, and ecosystem-friendly open solution for all.
Built on top of this stable C ABI, TVM-FFI defines a common C ABI protocol for all functions and provides an extensible, performant, and ecosystem-friendly solution.

The rest of this guide covers:

- The stable C layout and calling convention of ``tvm_ffi_c_abi``;
- C examples from both callee and caller side of this ABI.
- C examples from both the callee and caller side of this ABI.

Stable C Layout
---------------

TVM-FFI's :ref:`C ABI <tvm_ffi_c_abi>` uses a stable layout for all the input and output arguments.
TVM-FFI's :ref:`C ABI <tvm_ffi_c_abi>` uses a stable layout for all input and output arguments.

Layout of :cpp:class:`TVMFFIAny`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:cpp:class:`TVMFFIAny` is a fixed-size (128-bit) tagged union that represents all supported types.

- First 32 bits: type index indicating which value is stored (supports up to 2^32 types).
- Next 32 bits: reserved (used for flags in rare cases, e.g. small-string optimization).
- Next 32 bits: reserved (used for flags in rare cases, e.g., small-string optimization).
- Last 64 bits: payload that is either a 64-bit integer, a 64-bit floating-point number, or a pointer to a heap-allocated object.
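
In C, these three fields correspond roughly to the struct below (a sketch;
the authoritative definition is in the TVM-FFI C API header, and the field
and union-member names here are illustrative assumptions):

.. code-block:: c

   #include <stdint.h>

   typedef struct {
     int32_t type_index;     /* first 32 bits: which type is stored */
     uint32_t zero_padding;  /* next 32 bits: reserved flag bits */
     union {                 /* last 64 bits: the payload */
       int64_t v_int64;      /* 64-bit integer */
       double v_float64;     /* 64-bit floating point */
       void* v_obj;          /* pointer to a heap-allocated object */
     };
   } TVMFFIAnySketch;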

.. figure:: https://raw.githubusercontent.com/tlc-pack/web-data/main/images/tvm-ffi/stable-c-abi-layout-any.svg
@@ -137,9 +136,9 @@ Stable ABI in C Code
You can build and run the examples either with raw compiler commands or with CMake.
Both approaches are demonstrated below.

TVM FFI's :ref:`C ABI <tvm_ffi_c_abi>` is designed with DSL and ML compilers in mind. DSL codegen usually relies on MLIR, LLVM or low-level C as the compilation target, where no access to C++ features is available, and where stable C ABIs are preferred for simplicity and stability.
TVM-FFI's :ref:`C ABI <tvm_ffi_c_abi>` is designed with DSL and ML compilers in mind. DSL codegen often targets MLIR, LLVM, or low-level C, where C++ features are unavailable and stable C ABIs are preferred for simplicity and stability.

This section shows how to write C code that follows the stable C ABI. Specifically, we provide two examples:
This section shows how to write C code that follows the stable C ABI using two examples:

- Callee side: A CPU ``add_one_cpu`` kernel in C that is equivalent to the :ref:`C++ example <cpp_add_one_kernel>`.
- Caller side: A loader and runner in C that invokes the kernel, a direct C translation of the :ref:`C++ example <ship-to-cpp>`.
@@ -149,11 +148,11 @@ The C code is minimal and dependency-free, so it can serve as a direct reference
Callee: ``add_one_cpu`` Kernel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Below is a minimal ``add_one_cpu`` kernel in C that follows the stable C ABI. It has three steps:
Below is a minimal ``add_one_cpu`` kernel in C that follows the stable C ABI in three steps:

- **Step 1**. Extract input ``x`` and output ``y`` as DLPack tensors;
- **Step 2**. Implement the kernel ``y = x + 1`` on CPU with a simple for-loop;
- **Step 3**. Set the output result to ``result``.
- **Step 3**. Set the output result in ``result``.

.. literalinclude:: ../../examples/stable_c_abi/src/add_one_cpu.c
:language: c
@@ -188,7 +187,7 @@ Build it with either approach:
**C vs. C++.** Compared to the :ref:`C++ example <cpp_add_one_kernel>`, there are a few key differences:

- The explicit marshalling in **Step 1** is only needed in C. In C++, templates hide these details.
- The C++ macro :c:macro:`TVM_FFI_DLL_EXPORT_TYPED_FUNC` (used to export ``add_one_cpu``) is not needed in C, because this example directly defines the exported C symbol ``__tvm_ffi_add_one_cpu``.
- The C++ macro :c:macro:`TVM_FFI_DLL_EXPORT_TYPED_FUNC` (used to export ``add_one_cpu``) is not needed in C, since this example directly defines the exported C symbol ``__tvm_ffi_add_one_cpu``.

.. hint::

@@ -200,7 +199,7 @@ Build it with either approach:
Caller: Kernel Loader
~~~~~~~~~~~~~~~~~~~~~

Next, a minimal C loader invokes the ``add_one_cpu`` kernel. It is functionally identical to the :ref:`C++ example <ship-to-cpp>` and performs:
Next, a minimal C loader invokes the ``add_one_cpu`` kernel. It mirrors the :ref:`C++ example <ship-to-cpp>` and performs:

- **Step 1**. Load the shared library ``build/add_one_cpu.so`` that contains the kernel;
- **Step 2**. Get function ``add_one_cpu`` from the library;
@@ -238,7 +237,7 @@ Build and run the loader with either approach:
cmake --build build --config RelWithDebInfo
build/load

To call a function via the stable C ABI in C, idiomatically:
In C, the idiomatic steps to call a function via the stable C ABI are:

- Convert input arguments to the :cpp:class:`TVMFFIAny` type;
- Call the target function (e.g., ``add_one_cpu``) via :cpp:func:`TVMFFIFunctionCall`;
@@ -247,7 +246,7 @@ To call a function via the stable C ABI in C, idiomatically:
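
Concretely, the call site has roughly the following shape (a sketch; the
header path, the handle type, and the argument order of
:cpp:func:`TVMFFIFunctionCall` are assumptions, and error handling is elided):

.. code-block:: c

   #include <string.h>
   #include <tvm/ffi/c_api.h>  /* assumed header for the C ABI types */

   /* x_any and y_any are assumed to already wrap the two tensors. */
   int CallAddOne(TVMFFIObjectHandle func, TVMFFIAny x_any, TVMFFIAny y_any) {
     TVMFFIAny args[2];
     TVMFFIAny result;
     memset(&result, 0, sizeof(result));
     args[0] = x_any;  /* input tensor x */
     args[1] = y_any;  /* output tensor y */
     /* A nonzero return code means the callee raised an error. */
     return TVMFFIFunctionCall(func, args, 2, &result);
   }
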
What's Next
-----------

**ABI specification.** See the complete ABI specification in :doc:`../concepts/abi_overview`.
**ABI specification.** See the full ABI specification in :doc:`../concepts/abi_overview`.

**Convenient compiler target.** The stable C ABI is a simple, portable codegen target for DSL compilers. Emit C that follows this ABI to integrate with TVM-FFI and call the result from multiple languages and frameworks. See :doc:`../concepts/abi_overview`.

1 change: 1 addition & 0 deletions docs/index.rst
@@ -59,6 +59,7 @@ Table of Contents
:caption: Concepts

concepts/abi_overview.md
concepts/tensor.rst

.. toctree::
:maxdepth: 1