PTO Runtime Test Framework

This directory contains scripts and tools for running PTO Runtime tests.

Overview

The PTO Runtime test framework provides a simplified interface for testing runtime implementations. Users only need to provide:

  1. Kernel configuration (kernel_config.py) - Defines kernels and orchestration function
  2. Golden script (golden.py) - Defines input generation and expected output computation

The test framework automatically handles compilation, execution, and result validation.

Quick Start

Basic Usage

python examples/scripts/run_example.py \
  --kernels <kernels_directory> \
  --golden <golden_script_path> \
  --platform <platform_name>

Examples

Running Hardware Platform Tests (Requires Ascend Device)

python examples/scripts/run_example.py \
  -k examples/a2a3/host_build_graph/vector_example/kernels \
  -g examples/a2a3/host_build_graph/vector_example/golden.py \
  -p a2a3

Running Simulation Platform Tests (No Hardware Required)

python examples/scripts/run_example.py \
  -k examples/a2a3/host_build_graph/vector_example/kernels \
  -g examples/a2a3/host_build_graph/vector_example/golden.py \
  -p a2a3sim
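To drive both platforms from a script (for example in CI), the same invocation can be assembled in Python. This is a minimal sketch: `build_command` and `run` are hypothetical helpers, not part of the framework, and the sketch assumes it is launched from the repository root so the relative script path resolves.

```python
import subprocess
import sys
from typing import List, Optional

# Script path as used in the examples above (relative to the repository root).
RUN_EXAMPLE = "examples/scripts/run_example.py"


def build_command(kernels: str, golden: str, platform: str = "a2a3",
                  log_level: Optional[str] = None) -> List[str]:
    """Assemble the argv for one run_example.py invocation (hypothetical helper)."""
    cmd = [sys.executable, RUN_EXAMPLE, "-k", kernels, "-g", golden, "-p", platform]
    if log_level is not None:
        cmd += ["--log-level", log_level]
    return cmd


def run(kernels: str, golden: str, platform: str) -> int:
    """Run one test and return the script's exit code."""
    return subprocess.run(build_command(kernels, golden, platform)).returncode
```

This README does not state whether a failed comparison makes run_example.py exit with a non-zero code, so confirm that before gating a CI job on the value returned by `run`.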

Command Line Arguments

run_example.py Parameters

| Argument | Short | Description | Default |
|---|---|---|---|
| --kernels | -k | Kernels directory path (contains kernel_config.py) | Required |
| --golden | -g | Path to the golden.py script | Required |
| --platform | -p | Platform name: a2a3 or a2a3sim | a2a3 |
| --device | -d | Device ID | From env var or 0 |
| --runtime | -r | Runtime implementation name | host_build_graph |
| --verbose | -v | Enable verbose output (equivalent to --log-level debug) | False |
| --silent | | Enable silent mode (equivalent to --log-level error) | False |
| --log-level | | Set log level: error, warn, info, debug | info |
| --clone-protocol | | Git protocol for cloning pto-isa: ssh or https | ssh |

Platform Description

  • a2a3: Hardware platform, requires Ascend device and CANN toolkit
  • a2a3sim: Simulation platform, uses thread-based simulation, only requires gcc/g++

Logging Control

The test framework supports unified logging across Python and C++ Host code with four levels:

Log Levels

  • error (ERROR): Only show errors, suitable for CI/CD or production
  • warn (WARNING): Show warnings and errors
  • info (INFO, default): Show progress and status information
  • debug (DEBUG): Show detailed debug information including:
    • Compiler commands
    • Compiler stdout/stderr output
    • Detailed tensor data
    • Intermediate step information

Usage Examples

# Error level - only show errors
python examples/scripts/run_example.py -k ./kernels -g ./golden.py --silent
# or explicitly:
python examples/scripts/run_example.py -k ./kernels -g ./golden.py --log-level error

# Info level (default) - show progress
python examples/scripts/run_example.py -k ./kernels -g ./golden.py

# Debug level - show all debug info
python examples/scripts/run_example.py -k ./kernels -g ./golden.py --verbose
# or explicitly:
python examples/scripts/run_example.py -k ./kernels -g ./golden.py --log-level debug

# Warning level - show warnings and errors
python examples/scripts/run_example.py -k ./kernels -g ./golden.py --log-level warn

Environment Variable

You can also control logging via the PTO_LOG_LEVEL environment variable, which has lower priority than CLI arguments:

export PTO_LOG_LEVEL=debug  # Options: error, warn, info, debug
python examples/scripts/run_example.py -k ./kernels -g ./golden.py

Priority

Log level is determined by (highest to lowest priority):

  1. CLI arguments (--log-level, --verbose, --silent)
  2. Environment variable (PTO_LOG_LEVEL)
  3. Default value (info / INFO level)
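The precedence rules above can be sketched as a small resolver that maps the four level names onto Python's `logging` constants. `resolve_log_level` is a hypothetical name, and how the real script handles conflicting CLI flags (for example `--verbose` together with `--log-level error`) is an assumption here: this sketch lets `--log-level` win.

```python
import logging
import os
from typing import Mapping, Optional

# Level names accepted by --log-level / PTO_LOG_LEVEL, mapped to logging constants.
LEVELS = {
    "error": logging.ERROR,
    "warn": logging.WARNING,
    "info": logging.INFO,
    "debug": logging.DEBUG,
}


def resolve_log_level(cli_level: Optional[str] = None,
                      verbose: bool = False,
                      silent: bool = False,
                      env: Mapping[str, str] = os.environ) -> int:
    """Pick the effective level: CLI > PTO_LOG_LEVEL > default (info)."""
    # 1. CLI arguments: explicit --log-level first (assumption), then the shortcuts.
    if cli_level is not None:
        return LEVELS[cli_level]
    if verbose:
        return logging.DEBUG
    if silent:
        return logging.ERROR
    # 2. Environment variable; unknown values fall through to the default.
    env_level = env.get("PTO_LOG_LEVEL")
    if env_level in LEVELS:
        return LEVELS[env_level]
    # 3. Default value.
    return logging.INFO
```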

Compiler Output Behavior

  • error/warn/info levels: Compiler output hidden unless compilation fails
  • debug level: All compiler output displayed via DEBUG level
  • Compilation failures: Always show error messages regardless of log level
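One way to get this behavior is to capture compiler output and route it through the DEBUG level, escalating to ERROR only when the compiler fails. This is a minimal sketch with a hypothetical `run_compiler` helper and logger name; it is not the framework's actual build code.

```python
import logging
import subprocess
import sys
from typing import List

log = logging.getLogger("pto.compile")  # hypothetical logger name


def run_compiler(cmd: List[str]) -> None:
    """Run a compiler command; show its output only at debug level or on failure."""
    log.debug("Compiler command: %s", " ".join(cmd))
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # ERROR is the strictest level, so failures show at every log level.
        log.error("Compilation failed (exit %d):\n%s%s",
                  result.returncode, result.stdout, result.stderr)
        raise RuntimeError(f"compiler exited with {result.returncode}")
    # Successful output goes through DEBUG, so it is hidden at error/warn/info.
    if result.stdout:
        log.debug("stdout:\n%s", result.stdout)
    if result.stderr:
        log.debug("stderr:\n%s", result.stderr)
```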

File Structure Requirements

1. Kernels Directory Structure

The kernels directory must contain a kernel_config.py file:

kernels/
├── kernel_config.py          # Required: kernel configuration
├── orchestration/
│   └── example_orch.cpp      # Orchestration function implementation
└── aiv/                      # or aic/, depending on core type
    ├── kernel_add.cpp
    ├── kernel_mul.cpp
    └── ...

2. kernel_config.py Format

from pathlib import Path

KERNELS_DIR = Path(__file__).parent

# Kernel list
KERNELS = [
    {
        "func_id": 0,                                      # Kernel ID
        "core_type": "aiv",                                # Core type: aiv or aic
        "source": str(KERNELS_DIR / "aiv/kernel_add.cpp") # Kernel source file path
    },
    # More kernels...
]

# Orchestration function configuration
ORCHESTRATION = {
    "source": str(KERNELS_DIR / "orchestration/example_orch.cpp"),
    "function_name": "BuildExampleGraph"  # Orchestration function name
}
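Because kernel_config.py is an ordinary Python module, it can be loaded and sanity-checked before compiling anything. The sketch below (hypothetical `load_kernel_config`) imports the file by path and checks the fields shown above; the real framework may validate differently or not at all.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def load_kernel_config(kernels_dir: str) -> ModuleType:
    """Import <kernels_dir>/kernel_config.py and check KERNELS and ORCHESTRATION."""
    path = Path(kernels_dir) / "kernel_config.py"
    if not path.is_file():
        raise FileNotFoundError(f"missing {path}")
    spec = importlib.util.spec_from_file_location("kernel_config", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # sets __file__, so KERNELS_DIR resolves

    for kernel in module.KERNELS:
        missing = {"func_id", "core_type", "source"} - set(kernel)
        if missing:
            raise ValueError(f"kernel entry {kernel} is missing {sorted(missing)}")
        if kernel["core_type"] not in ("aiv", "aic"):
            raise ValueError(f"unknown core_type {kernel['core_type']!r}")
        if not Path(kernel["source"]).is_file():
            raise FileNotFoundError(kernel["source"])
    func_ids = [k["func_id"] for k in module.KERNELS]
    if len(func_ids) != len(set(func_ids)):
        raise ValueError("duplicate func_id in KERNELS")
    for key in ("source", "function_name"):
        if key not in module.ORCHESTRATION:
            raise ValueError(f"ORCHESTRATION is missing {key!r}")
    return module
```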

3. golden.py Format

import torch

# Output tensor names list (optional, or use 'out_' prefix convention)
__outputs__ = ["f"]

# Tensor order (required, must match orchestration function parameter order)
TENSOR_ORDER = ["a", "b", "f"]

# Comparison tolerances
RTOL = 1e-5
ATOL = 1e-5

def generate_inputs(params: dict) -> dict:
    """
    Generate input and output tensors.

    Args:
        params: Parameter dictionary (from ALL_CASES)

    Returns:
        Dictionary containing all tensors (inputs + outputs)
    """
    SIZE = 16384
    return {
        "a": torch.full((SIZE,), 2.0, dtype=torch.float32),
        "b": torch.full((SIZE,), 3.0, dtype=torch.float32),
        "f": torch.zeros(SIZE, dtype=torch.float32),  # Output tensor
    }

def compute_golden(tensors: dict, params: dict) -> None:
    """
    Compute expected output (in-place modification).

    Args:
        tensors: Dictionary containing all tensors
        params: Parameter dictionary
    """
    a = tensors["a"]
    b = tensors["b"]
    tensors["f"][:] = (a + b + 1) * (a + b + 2)

# Optional: Multiple test cases
ALL_CASES = {
    "Default": {},
    # "Large": {"size": 1024},  # Other test cases
}
DEFAULT_CASE = "Default"

Golden Script Interface Description

Required Functions

  1. generate_inputs(params: dict) -> dict

    • Generate input and output tensors
    • Returns: Dictionary with tensor names as keys and torch tensors as values
  2. compute_golden(tensors: dict, params: dict) -> None

    • Compute expected output values
    • Modifies output tensors in tensors dictionary in-place

Required Configuration

  • TENSOR_ORDER: List specifying the order of tensors passed to the orchestration function
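TENSOR_ORDER turns the dictionary returned by generate_inputs into a positional list, where position i becomes orchestration argument i. A minimal sketch with a hypothetical `order_tensors`; rejecting tensors that are missing from TENSOR_ORDER is an assumption, not documented behavior.

```python
from typing import Any, Dict, List, Sequence


def order_tensors(tensors: Dict[str, Any], tensor_order: Sequence[str]) -> List[Any]:
    """Arrange tensors by TENSOR_ORDER; index i becomes orchestration argument i."""
    missing = [name for name in tensor_order if name not in tensors]
    if missing:
        raise KeyError(f"TENSOR_ORDER names not returned by generate_inputs: {missing}")
    extra = sorted(set(tensors) - set(tensor_order))
    if extra:  # assumption: unlisted tensors are treated as a configuration error
        raise KeyError(f"tensors not listed in TENSOR_ORDER: {extra}")
    return [tensors[name] for name in tensor_order]
```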

Optional Configuration

  • __outputs__: Output tensor names list (or use the out_ prefix convention)
  • ALL_CASES: Dict of named parameter sets for parameterized tests
  • DEFAULT_CASE: Name of the default case to run
  • RTOL: Relative tolerance (default 1e-5)
  • ATOL: Absolute tolerance (default 1e-5)
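Reading these settings reduces to `getattr` with the documented defaults. In this sketch (hypothetical `read_golden_config`), RTOL and ATOL fall back to 1e-5 as stated above; the fallbacks for a missing ALL_CASES (a single empty "Default" case) and a missing DEFAULT_CASE (the first case) are assumptions.

```python
from types import ModuleType, SimpleNamespace
from typing import Any, Dict, Union


def read_golden_config(golden: Union[ModuleType, SimpleNamespace]) -> Dict[str, Any]:
    """Collect golden.py settings, falling back to defaults for optional ones."""
    # Assumption: no ALL_CASES means one case with empty params.
    cases = getattr(golden, "ALL_CASES", None) or {"Default": {}}
    return {
        "tensor_order": list(golden.TENSOR_ORDER),  # required
        "rtol": getattr(golden, "RTOL", 1e-5),
        "atol": getattr(golden, "ATOL", 1e-5),
        "all_cases": cases,
        # Assumption: no DEFAULT_CASE means the first case listed.
        "default_case": getattr(golden, "DEFAULT_CASE", next(iter(cases))),
    }
```

Passing a loaded golden.py module works the same as the `SimpleNamespace` stand-in, since both expose attributes.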

Output Tensor Identification

The test framework supports two methods for identifying output tensors:

Method 1: Explicit Declaration (Recommended)

__outputs__ = ["f", "result", ...]

Method 2: Naming Convention

Use the out_ prefix to name output tensors:

def generate_inputs(params: dict) -> dict:
    return {
        "a": torch.randn(1024),      # Input
        "b": torch.randn(1024),      # Input
        "out_f": torch.zeros(1024),  # Output (auto-detected)
    }
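The two identification methods can be combined into one lookup. This sketch (hypothetical `find_outputs`) assumes an explicit `__outputs__` takes precedence over the naming convention, consistent with Method 1 being the recommended one.

```python
from typing import Iterable, List, Optional


def find_outputs(tensor_names: Iterable[str],
                 declared: Optional[List[str]] = None) -> List[str]:
    """Return output names: __outputs__ if declared, else names starting with out_."""
    names = list(tensor_names)
    if declared is not None:
        unknown = [n for n in declared if n not in names]
        if unknown:
            raise KeyError(f"__outputs__ names not found in tensors: {unknown}")
        return list(declared)
    return [n for n in names if n.startswith("out_")]
```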

Orchestration Function Interface

For host_build_graph, orchestration sources should include orchestration_api.h and use ChipStorageTaskArgs:

// Assume TENSOR_ORDER = ["a", "b", "f"]
#include "orchestration_api.h"

int BuildExampleGraph(OrchestrationRuntime* runtime, const ChipStorageTaskArgs &orch_args) {
    void* ptr_a = orch_args.tensor(0).data_as<void>();
    void* ptr_b = orch_args.tensor(1).data_as<void>();
    void* ptr_f = orch_args.tensor(2).data_as<void>();

    size_t size_a = orch_args.tensor(0).nbytes();
    size_t size_b = orch_args.tensor(1).nbytes();
    size_t size_f = orch_args.tensor(2).nbytes();

    void* dev_a = device_malloc(runtime, size_a);
    void* dev_b = device_malloc(runtime, size_b);
    void* dev_f = device_malloc(runtime, size_f);
    copy_to_device(runtime, dev_a, ptr_a, size_a);
    copy_to_device(runtime, dev_b, ptr_b, size_b);
    record_tensor_pair(runtime, ptr_f, dev_f, size_f);

    // Build task graph...
    return 0;
}
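For the golden.py example above (three float32 tensors of 16384 elements, TENSOR_ORDER = ["a", "b", "f"]), the index-to-name mapping and the sizes passed to device_malloc work out as follows. This assumes nbytes() reports element count times element size; `argument_layout` is a hypothetical helper for the arithmetic only.

```python
import struct
from typing import List, Sequence, Tuple

FLOAT32_BYTES = struct.calcsize("<f")  # IEEE 754 single precision: 4 bytes


def argument_layout(tensor_order: Sequence[str], numel: int) -> List[Tuple[int, str, int]]:
    """Map each TENSOR_ORDER entry to its orch_args index and float32 byte size."""
    return [(i, name, numel * FLOAT32_BYTES) for i, name in enumerate(tensor_order)]


for index, name, nbytes in argument_layout(["a", "b", "f"], 16384):
    print(f"orch_args.tensor({index}) -> {name}: {nbytes} bytes")  # 65536 bytes each
```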

Environment Variables

Logging Configuration (All Platforms)

# Set log level (optional, CLI arguments take priority)
export PTO_LOG_LEVEL=debug  # Options: error, warn, info, debug

# Optional: Output C++ Host logs to file
export PTO_LOG_FILE=/tmp/pto_runtime.log

a2a3 Platform (Hardware)

# Required
export ASCEND_HOME_PATH=/usr/local/Ascend/cann-8.5.0

# PTO_ISA_ROOT is auto-detected (auto-cloned to examples/scripts/_deps/pto-isa on first run)
# Override if needed:
# export PTO_ISA_ROOT=/path/to/pto-isa

# Optional: choose device via CLI, e.g. `-d 0`
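The path lookup described in the comments above can be sketched as an override-else-default rule, plus an early check that ASCEND_HOME_PATH is set when the hardware platform is selected. Both helpers are hypothetical; the framework's own auto-detection and cloning (including --clone-protocol) are not reproduced here.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_pto_isa_root(repo_root: Path, env: Mapping[str, str] = os.environ) -> Path:
    """Return PTO_ISA_ROOT if set, else the auto-clone location under _deps."""
    override = env.get("PTO_ISA_ROOT")
    if override:
        return Path(override)
    return repo_root / "examples" / "scripts" / "_deps" / "pto-isa"


def check_platform_env(platform: str, env: Mapping[str, str] = os.environ) -> None:
    """Fail early if the hardware platform is selected without a CANN toolkit path."""
    if platform == "a2a3" and not env.get("ASCEND_HOME_PATH"):
        raise EnvironmentError("ASCEND_HOME_PATH must be set for -p a2a3")
```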

a2a3sim Platform (Simulation)

No special platform-specific environment variables required.

Complete Example

Directory Structure

my_test/
├── kernels/
│   ├── kernel_config.py
│   ├── orchestration/
│   │   └── my_orch.cpp
│   └── aiv/
│       ├── kernel_add.cpp
│       └── kernel_mul.cpp
└── golden.py

Running Tests

# Hardware platform
python examples/scripts/run_example.py -k my_test/kernels -g my_test/golden.py -p a2a3

# Simulation platform
python examples/scripts/run_example.py -k my_test/kernels -g my_test/golden.py -p a2a3sim

# Verbose output
python examples/scripts/run_example.py -k my_test/kernels -g my_test/golden.py -p a2a3sim -v

Test Output

Success Example

=== Building Runtime: host_build_graph (platform: a2a3sim) ===
...
=== Compiling and Registering Kernels ===
Compiling kernel: kernels/aiv/kernel_add.cpp (func_id=0)
...
=== Generating Input Tensors ===
Inputs: ['a', 'b']
Outputs: ['f']
...
=== Launching Runtime ===
...
=== Comparing Results ===
Comparing f: shape=(16384,), dtype=float32
  First 10 actual:   [42. 42. 42. 42. 42. 42. 42. 42. 42. 42.]
  First 10 expected: [42. 42. 42. 42. 42. 42. 42. 42. 42. 42.]
  f: PASS (16384/16384 elements matched)

============================================================
TEST PASSED
============================================================

Failure Example

=== Comparing Results ===
Comparing f: shape=(16384,), dtype=float32
  First 10 actual:   [40. 40. 40. 40. 40. 40. 40. 40. 40. 40.]
  First 10 expected: [42. 42. 42. 42. 42. 42. 42. 42. 42. 42.]

TEST FAILED: Output 'f' does not match golden
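The "N/M elements matched" count can be illustrated with an allclose-style rule, |actual - expected| <= ATOL + RTOL * |expected|, using the RTOL/ATOL values from golden.py. That the framework uses exactly this formula is an assumption (it is the rule `torch.allclose` applies); `compare` is hypothetical and ignores NaN handling.

```python
from typing import Sequence, Tuple


def compare(actual: Sequence[float], expected: Sequence[float],
            rtol: float = 1e-5, atol: float = 1e-5) -> Tuple[bool, int]:
    """Count elements with |actual - expected| <= atol + rtol * |expected|."""
    if len(actual) != len(expected):
        raise ValueError("shape mismatch")
    matched = sum(
        abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected)
    )
    return matched == len(expected), matched


ok, matched = compare([42.0] * 16384, [42.0] * 16384)
print(f"f: {'PASS' if ok else 'FAIL'} ({matched}/16384 elements matched)")
```

In the failure example, every element is off by 2.0 against a tolerance of about 4.3e-4 (1e-5 + 1e-5 × 42), so no element matches.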

Reference Examples

FAQ

Q: How do I debug test failures?

Use the -v flag to enable verbose output:

python examples/scripts/run_example.py -k ... -g ... -p ... -v

Q: Why do I get a "binary_data cannot be empty" error?

This usually happens when:

  • Using wrong platform (a2a3 vs a2a3sim)
  • Kernel compilation failed silently

Solutions:

  1. Verify that the correct -p value is used
  2. Check that the kernel source files exist
  3. Use -v to view detailed compilation logs

Q: How do I add multiple test cases?

Define ALL_CASES and DEFAULT_CASE in golden.py:

ALL_CASES = {
    "Small": {"size": 1024},
    "Medium": {"size": 2048},
    "Large": {"size": 4096},
}
DEFAULT_CASE = "Small"

def generate_inputs(params: dict) -> dict:
    size = params["size"]
    return {
        "a": torch.randn(size, dtype=torch.float32),
        "b": torch.randn(size, dtype=torch.float32),
        "out_f": torch.zeros(size, dtype=torch.float32),
    }

Then use --all to run all cases or --case Medium to run a specific one.
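The selection logic behind these flags can be sketched as follows: --all runs every entry of ALL_CASES in order, --case picks one by name, and otherwise DEFAULT_CASE runs. `select_cases` is hypothetical, and rejecting an unknown case name is an assumption about the real script.

```python
from typing import Dict, List, Optional, Tuple

Params = Dict[str, object]


def select_cases(all_cases: Dict[str, Params], default_case: str,
                 case: Optional[str] = None,
                 run_all: bool = False) -> List[Tuple[str, Params]]:
    """Return (name, params) pairs to run, mirroring --all / --case / DEFAULT_CASE."""
    if run_all:
        return list(all_cases.items())  # dicts keep insertion order
    name = case if case is not None else default_case
    if name not in all_cases:
        raise KeyError(f"unknown case {name!r}; available: {sorted(all_cases)}")
    return [(name, all_cases[name])]
```

Each selected `params` dict is what generate_inputs(params) and compute_golden(tensors, params) receive.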

Q: Are PyTorch tensors supported?

Yes. The test framework uses PyTorch tensors by default:

import torch

def generate_inputs(params: dict) -> dict:
    return {
        "a": torch.randn(1024),
        "b": torch.randn(1024),
        "out_f": torch.zeros(1024),
    }

Q: How do I control log output verbosity?

Use the --log-level argument or --verbose/--silent flags:

# Show detailed debug information
python examples/scripts/run_example.py -k ... -g ... --verbose

# Only show errors
python examples/scripts/run_example.py -k ... -g ... --silent

# Or use explicit log level
python examples/scripts/run_example.py -k ... -g ... --log-level debug

# Show warnings and errors
python examples/scripts/run_example.py -k ... -g ... --log-level warn

Q: How do I hide compiler warnings?

Use the info level (the default). Compiler output is automatically hidden at the error, warn, and info levels unless compilation fails:

# Default behavior - hides compiler output
python examples/scripts/run_example.py -k ... -g ...

To see compiler output for debugging, use debug level:

python examples/scripts/run_example.py -k ... -g ... --verbose
# or
python examples/scripts/run_example.py -k ... -g ... --log-level debug

Q: How do I save C++ logs to a file?

Set the PTO_LOG_FILE environment variable:

export PTO_LOG_FILE=/tmp/pto_runtime.log
python examples/scripts/run_example.py -k ... -g ...

# View the logs
cat /tmp/pto_runtime.log

Advanced Usage

Custom Runtime Implementation

If you have a custom runtime implementation:

python examples/scripts/run_example.py \
  -k my_test/kernels \
  -g my_test/golden.py \
  -r my_custom_runtime \
  -p a2a3sim

The runtime implementation should be located at: src/{arch}/runtime/<runtime_name>/
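A quick pre-flight check that a custom runtime sits where -r expects it can be sketched with pathlib. `runtime_dir` is hypothetical, and which {arch} directory a given -p platform maps to is not stated in this README, so arch is left as an explicit parameter.

```python
from pathlib import Path


def runtime_dir(repo_root: Path, arch: str, runtime_name: str = "host_build_graph") -> Path:
    """Return src/<arch>/runtime/<runtime_name>/, raising if it does not exist."""
    path = repo_root / "src" / arch / "runtime" / runtime_name
    if not path.is_dir():
        raise FileNotFoundError(f"runtime {runtime_name!r} not found at {path}")
    return path
```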

Programmatic Usage

You can use create_code_runner directly in Python scripts. It creates a CodeRunner configured from the RUNTIME_CONFIG in your kernel_config.py:

from code_runner import create_code_runner

runner = create_code_runner(
    kernels_dir="my_test/kernels",
    golden_path="my_test/golden.py",
    platform="a2a3sim",
    device_id=0,
)

runner.run()  # Execute test

Related Documentation

Contributing

For issues or suggestions, please submit an Issue or Pull Request.