ptoas is a specialized compiler toolchain for PTO Bytecode (Programming Tiling Operator Bytecode), built on top of the LLVM 21 VPTO branch (vpto-dev/llvm-project:feature-vpto-llvm21).
Acting as the bridge between upper-level AI frameworks and underlying NPU/GPGPU/CPU hardware, ptoas is built in an Out-of-Tree architecture and provides complete C++ and Python interfaces. Its primary responsibilities include:
- IR Parsing & Verification: Parses .pto input files and verifies the semantic correctness of PTO Dialect operations (Ops).
- Compilation & Optimization (Passes): Executes optimization passes targeting the Da Vinci Architecture, such as operator fusion and automatic synchronization insertion.
- Code Generation (Lowering): Supports lowering PTO IR to EmitC/Linalg dialects, ultimately generating code that calls the pto-isa C++ library.
- Python Bindings: Provides seamlessly integrated Python modules. Through integration with MLIR Core bindings, frameworks such as PyPTO, TileLang, and CuTile can build, manipulate, and compile PTO Bytecode directly from Python.
PTOAS/
├── include/
│ └── PTO/ # PTO Dialect headers and TableGen definitions (.td)
├── lib/
│ ├── PTO/ # Dialect core implementation (IR) and Pass logic (Transforms)
│ ├── CAPI/ # C language interface exposure
│ └── Bindings/Python/ # Python Binding C++ implementation (Pybind11)
├── python/ # Python module build scripts and helper code
├── test/
│ └── samples/ # Test cases
├── tools/
│ ├── ptoas/ # ptoas command-line tool entry point (Output: ptoas)
│ └── ptobc/ # ptobc command-line tool entry point (Output: ptobc)
└── CMakeLists.txt # Top-level build configuration
PTOAS depends on the VPTO-adapted LLVM 21 branch: vpto-dev/llvm-project:feature-vpto-llvm21.
To simplify the build, first adjust the following variables for your environment and run the commands. Subsequent steps reference these variables directly.
# ================= Configuration (edit here) =================
# Set your workspace root directory
# (recommended: a dedicated directory for LLVM and PTOAS)
export WORKSPACE_DIR=$HOME/llvm-workspace
# LLVM source and build paths
export LLVM_SOURCE_DIR=$WORKSPACE_DIR/llvm-project
export LLVM_BUILD_DIR=$LLVM_SOURCE_DIR/build-shared
# PTOAS source and install paths
export PTO_SOURCE_DIR=$WORKSPACE_DIR/PTOAS
export PTO_INSTALL_DIR=$PTO_SOURCE_DIR/install
# =============================================================
# Create the workspace directory
mkdir -p $WORKSPACE_DIR
- OS: Linux (Ubuntu 20.04+ recommended)
- Compiler: GCC >= 9 or Clang (C++17 support required)
- Build System: CMake >= 3.20, Ninja
- Python: 3.8+
- Python Packages: pybind11<3, nanobind, numpy
python3 -m pip install "pybind11<3" nanobind numpy
Note: The current LLVM/MLIR Python bindings are not compatible with pybind11 3.x. If you encounter errors like "def_property family does not currently support keep_alive" when building LLVM, run the downgrade command above first.
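The prerequisites listed above can be checked before starting the long LLVM build. The following is a minimal sketch, not part of PTOAS: the function names are hypothetical, and it only checks that cmake and ninja are on PATH, without verifying the CMake >= 3.20 requirement.

```python
import shutil
import sys
from importlib import metadata


def pybind11_is_compatible(version: str) -> bool:
    """Return True when the major version is below 3 (the "pybind11<3" pin)."""
    major = int(version.split(".")[0])
    return major < 3


def check_prerequisites() -> list:
    """Collect human-readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    # Only presence on PATH is checked here, not the tool versions.
    for tool in ("cmake", "ninja"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    for package in ("pybind11", "nanobind", "numpy"):
        try:
            version = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if package == "pybind11" and not pybind11_is_compatible(version):
            problems.append(f"pybind11 {version} found; install 'pybind11<3'")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

Running the script prints one warning per unmet prerequisite and prints nothing when all checks pass.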
Download the VPTO-adapted LLVM source, check out the feature-vpto-llvm21 branch, and build with shared libraries to ensure correct linking for Python bindings.
# 1. Clone LLVM
cd $WORKSPACE_DIR
git clone https://github.com/vpto-dev/llvm-project.git
cd $LLVM_SOURCE_DIR
# 2. [Critical] Check out the VPTO adaptation branch
git checkout feature-vpto-llvm21
# 3. Configure CMake (build shared libs with Python bindings enabled)
cmake -G Ninja -S llvm -B $LLVM_BUILD_DIR \
-DLLVM_ENABLE_PROJECTS="mlir;clang" \
-DBUILD_SHARED_LIBS=ON \
-DMLIR_ENABLE_BINDINGS_PYTHON=ON \
-DPython3_EXECUTABLE=$(which python3) \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_TARGETS_TO_BUILD="host"
# 4. Build LLVM (this step takes a long time)
ninja -C $LLVM_BUILD_DIR
Clone the PTOAS source and build against the LLVM 21 you just compiled.
# 1. Clone PTOAS
cd $WORKSPACE_DIR
git clone https://gitcode.com/cann/pto-as.git PTOAS
cd $PTO_SOURCE_DIR
# 2. Build and install via pip
# The build backend (pyproject.toml) drives CMake + Ninja automatically.
pip install .
This produces the same artifacts as a manual CMake build:
# CLI tools
$PTO_SOURCE_DIR/build/tools/ptoas/ptoas
$PTO_SOURCE_DIR/build/tools/ptobc/ptobc
# Native extension installed into the MLIR Python package
$LLVM_BUILD_DIR/tools/mlir/python_packages/mlir_core/
└── mlir
└── _mlir_libs
└── _pto.cpython-*.so
# Python dialect files
$PTO_INSTALL_DIR/
└── mlir
└── dialects
├── pto.py
└── _pto_ops_gen.py
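After the install finishes, it can be useful to confirm that the artifacts listed above actually exist. The sketch below is an illustration only: missing_artifacts is a hypothetical helper, and the paths are taken from the layout shown above, assuming the three directories correspond to PTO_SOURCE_DIR, LLVM_BUILD_DIR, and PTO_INSTALL_DIR.

```python
from pathlib import Path


def missing_artifacts(pto_source_dir, llvm_build_dir, pto_install_dir):
    """Return the expected artifacts (as strings) that are not found on disk."""
    src = Path(pto_source_dir)
    mlir_libs = (Path(llvm_build_dir) / "tools" / "mlir" / "python_packages"
                 / "mlir_core" / "mlir" / "_mlir_libs")
    dialects = Path(pto_install_dir) / "mlir" / "dialects"
    # (directory, file name or glob pattern) pairs from the artifact layout.
    expected = [
        (src / "build" / "tools" / "ptoas", "ptoas"),
        (src / "build" / "tools" / "ptobc", "ptobc"),
        (mlir_libs, "_pto.cpython-*.so"),
        (dialects, "pto.py"),
        (dialects, "_pto_ops_gen.py"),
    ]
    missing = []
    for directory, pattern in expected:
        # A missing directory counts as a missing artifact.
        found = directory.is_dir() and any(directory.glob(pattern))
        if not found:
            missing.append(str(directory / pattern))
    return missing
```

Calling missing_artifacts with the values of the three environment variables returns an empty list when every expected file is present.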
If you want to develop and test Python code against the in-tree build without reinstalling after every C++ change, use an editable install.
pip install -e . --no-build-isolation
Why --no-build-isolation? Without this flag, pip uses a temporary virtual environment for the build, records its pybind11 path in CMakeCache.txt, then deletes the venv, breaking any subsequent ninja reconfigure.
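The failure mode described above leaves a pybind11_DIR entry in CMakeCache.txt that points at a directory that no longer exists. The following sketch shows one way to detect that situation. The helper names are hypothetical; the only assumption about the file is CMake's standard KEY:TYPE=VALUE cache entry format.

```python
from pathlib import Path
from typing import Optional


def cached_pybind11_dir(cache_text: str) -> Optional[str]:
    """Extract the pybind11_DIR value from CMakeCache.txt contents, if present."""
    for line in cache_text.splitlines():
        # Cache entries have the form KEY:TYPE=VALUE.
        if line.startswith("pybind11_DIR:") and "=" in line:
            return line.split("=", 1)[1].strip()
    return None


def pybind11_dir_is_stale(cache_file) -> bool:
    """Return True when the cache records a pybind11_DIR that no longer exists."""
    cache_path = Path(cache_file)
    if not cache_path.is_file():
        return False  # no cache yet, so nothing can be stale
    recorded = cached_pybind11_dir(cache_path.read_text())
    return recorded is not None and not Path(recorded).is_dir()
```

For example, pybind11_dir_is_stale("build/CMakeCache.txt") returning True suggests the cache was written by an isolated build whose temporary environment has since been deleted.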
If you previously ran pip install -e . without the flag and your build is now broken, fix the existing CMakeCache.txt with:
cmake -B build -Dpybind11_DIR=$(python3 -m pybind11 --cmakedir)
# Parse and print PTO IR
ptoas test/lit/pto/empty_func.pto
# Run the AutoSyncInsert pass
ptoas test/lit/pto/empty_func.pto --enable-insert-sync -o outputfile.cpp
# Specify target hardware architecture (A3 / A5)
ptoas test/lit/pto/empty_func.pto --pto-arch=a5 -o outputfile.cpp
# Specify build level (level3 disables PlanMemory/InsertSync)
ptoas test/lit/pto/empty_func.pto --pto-level=level3 -o outputfile.cpp
# Print the current ptoas release version
ptoas --version
After configuring the environment variables, the PTO Dialect is loaded as part of mlir.dialects.
from mlir.ir import Context, Module, Location
# [Key] Import pto from mlir.dialects, the standard pattern for out-of-tree bindings
from mlir.dialects import pto

with Context() as ctx, Location.unknown():
    pto.register_dialect(ctx, load=True)
    module = Module.create()
    print("PTO Dialect registered successfully!")

# Run Python binding tests
cd $PTO_SOURCE_DIR/test/samples/MatMul/
python3 ./tmatmulk.py > ./tmatmulk.pto
# Run ptoas tests
$PTO_SOURCE_DIR/build/tools/ptoas/ptoas ./tmatmulk.pto -o ./tmatmulk.cpp
This flow generates NPU validation test cases from the .cpp files produced by ptoas (under test/samples/) and runs them on an NPU. The example below reuses MatMul/tmatmulk.cpp generated in section 4.3.
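The tmatmulk.cpp file mentioned above comes from two steps: running the sample script to emit a .pto file, then running ptoas on it. A minimal sketch of chaining those steps from Python follows. The function names are hypothetical, the flags are only those shown in this README, and the sample script is assumed to print PTO IR to stdout as tmatmulk.py does.

```python
import subprocess
import sys
from pathlib import Path


def ptoas_command(ptoas, pto_file, output, arch=None, level=None, insert_sync=False):
    """Assemble a ptoas argument list from the flags shown in this README."""
    cmd = [str(ptoas), str(pto_file)]
    if insert_sync:
        cmd.append("--enable-insert-sync")
    if arch is not None:
        cmd.append(f"--pto-arch={arch}")
    if level is not None:
        cmd.append(f"--pto-level={level}")
    cmd += ["-o", str(output)]
    return cmd


def build_sample(ptoas, sample_script, **options):
    """Run a sample script to emit <name>.pto, then ptoas to emit <name>.cpp."""
    script = Path(sample_script)
    pto_file = script.with_suffix(".pto")
    cpp_file = script.with_suffix(".cpp")
    # Step 1: the sample prints PTO IR on stdout; capture it into the .pto file.
    with open(pto_file, "w") as out:
        subprocess.run([sys.executable, script.name], cwd=script.parent,
                       stdout=out, check=True)
    # Step 2: lower the .pto file to C++ with ptoas.
    subprocess.run(ptoas_command(ptoas, pto_file, cpp_file, **options), check=True)
    return cpp_file
```

For instance, ptoas_command("ptoas", "tmatmulk.pto", "tmatmulk.cpp", arch="a5") yields the same invocation as the --pto-arch example in the CLI section. build_sample requires the mlir Python package and the ptoas binary to be available, so it only runs in a configured environment.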
For compile-only validation on a machine without an NPU card, see docs/no_npu_compile_only_guide_zh.md.
# 1) Generate the npu_validation test directory
# (creates npu_validation/ under the current sample directory)
# A2/A3 example:
python3 test/npu_validation/scripts/generate_testcase.py \
--input test/samples/MatMul/tmatmulk.cpp \
--run-mode npu \
--soc-version Ascend910B1
# A5 example:
python3 test/npu_validation/scripts/generate_testcase.py \
--input test/samples/MatMul/tmatmulk.cpp \
--run-mode npu \
--soc-version Ascend950
# 2) Run validation (run.sh requires no additional arguments)
test/samples/MatMul/npu_validation/tmatmulk/run.sh
Notes:
- test/samples/MatMul/npu_validation/tmatmulk/ will contain tmatmulk_kernel.cpp, main.cpp, golden.py, compare.py, run.sh, and CMakeLists.txt.
- golden.py generates random inputs by default; outputs default to all zeros (only the count, shape, and data type of inputs/outputs match the kernel parameters).
- compare.py compares golden*.bin against output*.bin and reports an error if they differ.
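To make the comparison step concrete, the sketch below shows one way such a check could work. It is not the generated compare.py: the pairing rule (golden_x.bin against output_x.bin) and the byte-exact comparison are assumptions made for illustration, and the real script may compare with a numeric tolerance instead.

```python
from pathlib import Path


def compare_outputs(directory) -> list:
    """Return a mismatch description for each golden*.bin / output*.bin pair."""
    directory = Path(directory)
    mismatches = []
    for golden in sorted(directory.glob("golden*.bin")):
        # Assumed pairing rule: golden_x.bin is checked against output_x.bin.
        output = directory / golden.name.replace("golden", "output", 1)
        if not output.is_file():
            mismatches.append(f"{output.name}: missing")
        elif golden.read_bytes() != output.read_bytes():
            # Byte-exact comparison; no floating-point tolerance is applied.
            mismatches.append(f"{output.name}: differs from {golden.name}")
    return mismatches
```

An empty return value means every golden file had a byte-identical output file. Since golden.py writes all-zero outputs by default, a check like this only becomes meaningful once golden.py computes real expected values.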