Add PTODSL A5 DSL ST coverage #886

Open
jimmychou0 wants to merge 3 commits into
hw-native-sys:main from
jimmychou0:codex/ptodsl-a5-dsl-st-validation

@jimmychou0 jimmychou0 commented Jun 30, 2026

Abstract

This PR adds the first PTODSL-authored A5 DSL ST coverage and updates the PTODSL simulator CI path so the new cases are actually built and run.

The branch was rebuilt after dropping the earlier broad backend workaround. The remaining backend change is intentionally narrow: FoldTileBufIntrinsics now performs fixpoint cleanup of dead view chains exposed after tile intrinsic folding. It deletes only use-empty bridge casts, memref view ops, pto.make_tensor_view / pto.partition_view, and dead tile allocations; it does not rerun full PTOViewToMemref, does not broaden ExpandTileOp, and does not change live view lowering semantics.
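The fixpoint cleanup described above can be pictured with a toy model. This is a minimal Python sketch, not the C++ pass: the `Op` class, the op-kind strings (including `pto.alloc_tile` and `bridge_cast`), and the function name are hypothetical stand-ins chosen for illustration. It shows why a single sweep is not enough: erasing a dead `partition_view` is what makes its producer use-empty.

```python
from dataclasses import dataclass, field

# Op kinds treated as erasable once they have no users. The categories follow
# the PR description; the exact strings are illustrative assumptions.
ERASABLE_KINDS = {
    "bridge_cast",
    "memref_view",
    "pto.make_tensor_view",
    "pto.partition_view",
    "pto.alloc_tile",
}


@dataclass
class Op:
    name: str
    kind: str
    operands: list = field(default_factory=list)  # names of producer ops


def erase_dead_view_chains(ops):
    """Drop use-empty erasable ops repeatedly until a sweep changes nothing."""
    live = list(ops)
    changed = True
    while changed:
        changed = False
        used = {operand for op in live for operand in op.operands}
        kept = [op for op in live
                if op.kind not in ERASABLE_KINDS or op.name in used]
        if len(kept) != len(live):
            live, changed = kept, True
    return live


ops = [
    Op("gm_view", "pto.make_tensor_view"),
    Op("part", "pto.partition_view", ["gm_view"]),   # dead: nothing uses it
    Op("tile", "pto.alloc_tile"),
    Op("live_view", "pto.make_tensor_view"),
    Op("load", "pto.tload", ["live_view", "tile"]),  # keeps its operands alive
]
remaining = [op.name for op in erase_dead_view_chains(ops)]
print(remaining)
```

In this model the first sweep removes only `part`; `gm_view` becomes use-empty afterwards and is removed by the second sweep, while ops that still feed `load` are never touched.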

Problem scenarios covered:

  • PTODSL A5 vector tile-op coverage: tadd validates a basic tload + tadd + tstore path outside the old tilelang_st harness.
  • PTODSL data movement coverage: tload_store validates GM view construction, tload, tstore, and layout variants.
  • PTODSL broadcast/reduction-style tile coverage: tcolexpand and tcolsum cover non-trivial tile shapes, valid rows/cols, and tile-op expansion/runtime behavior.
  • PTODSL cube coverage: tmatmul validates a cube tile matmul path, while the existing cube_matrix_pipeline.py and gemv_mx_pipeline.py remain part of the simulator suite.
  • Explicit native build policy: mode="explicit" kernels need to compile through PTOAS level3 and should not implicitly enable sync insertion.
  • Native build cache correctness: changing effective compile policy, such as PTO level or insert-sync behavior, must invalidate cached .so artifacts.
  • PTODSL helper/container shape: explicit same-kind @pto.simd / @pto.cube should not create redundant section wrappers; explicit kind mismatches should fail early with a clear diagnostic.
  • Dead PTODSL view chains: after tile-op expansion/folding, dead make_tensor_view / partition_view chains can otherwise leave high-level or memref view ops that later VPTO emission validation rejects.

Implementation changes:

  • Add test/dsl-st/npu_a5 cases for tadd, tload_store, tcolexpand, tcolsum, and tmatmul.
  • Update existing DSL ST cases for the validated simulator flow, including pointer-form vector load/store operands in predicate_pack.py and vmulscvt.py to avoid level3 live memref subviews.
  • Track whether kernel_kind was explicitly authored in @pto.jit while preserving the historical default effective kind of vector when omitted.
  • Lower same-kind explicit subkernel scopes using function/kernel-kind metadata, and diagnose explicit kernel-kind/subkernel-kind mismatches.
  • Map PTODSL native mode="explicit" to ptoas --pto-level=level3; keep explicit mode from implicitly enabling insert-sync.
  • Add compile-configuration hashing to the PTODSL native build cache manifest.
  • Add fixpoint dead-chain cleanup to FoldTileBufIntrinsics and a focused VPTO lit regression.
  • Handle single-child VPTO backend container compile units in tools/ptoas/driver.cpp for non-debug output paths.
  • Update PTODSL simulator CI to use an existing torch / torch_npu runtime instead of installing them each run, isolate PTODSL build artifacts, and ensure test/dsl-st/npu_a5 is covered.
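The cache-invalidation idea behind the compile-configuration hash can be sketched as follows. This is a hedged illustration only: `effective_compile_config`, `cache_key`, and the dictionary field names are hypothetical and are not the PTODSL manifest format. The one behavior taken from the description is that `mode="explicit"` resolves to level3 and does not turn on sync insertion by itself.

```python
import hashlib
import json


def effective_compile_config(mode, insert_sync=None):
    """Resolve the policy that actually reaches the compiler.

    Assumption: only mode="explicit" is modeled here. It selects level3 and
    leaves sync insertion off unless the caller asks for it explicitly.
    """
    if mode != "explicit":
        raise ValueError(f"unmodeled mode: {mode!r}")
    return {"pto_level": "level3", "insert_sync": bool(insert_sync)}


def cache_key(source_text, config):
    """Hash the kernel source together with the effective compile config."""
    payload = json.dumps({"source": source_text, "config": config},
                         sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


src = "kernel body"
key_default = cache_key(src, effective_compile_config("explicit"))
key_sync = cache_key(src, effective_compile_config("explicit", insert_sync=True))
key_repeat = cache_key(src, effective_compile_config("explicit"))
```

Because the key covers the resolved policy rather than the raw user arguments, two builds share a cached artifact only when both the source and the effective configuration match.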

Validation

Validated on the 144 simulator environment under /home/zhoujiaming/ptoas-sim-ci/pr886-cleanup using the LLVM21 VPTO build and CANN simulator:

ninja -C build-sim ptoas PTOPythonModules
/home/zhoujiaming/ptoas-sim-ci/venv-llvm21-build/bin/python \
  /home/zhoujiaming/ptoas-sim-ci/llvm-project-vpto21/build-assert/bin/llvm-lit -sv build-sim/test/lit
/home/zhoujiaming/ptoas-sim-ci/PTOAS-ptodsl-main/.venv/bin/python ptodsl/tests/test_jit_compile.py
scripts/sim_dsl.sh --soc-version Ascend950PR_9599 test/dsl-st

Results:

  • llvm-lit: 764/764 passed.
  • ptodsl/tests/test_jit_compile.py: passed.
  • scripts/sim_dsl.sh --soc-version Ascend950PR_9599 test/dsl-st: all cases passed, including cube_matrix_pipeline, gemv_mx_pipeline, predicate_pack, simt_gm_memory_core, vmulscvt, and the new npu_a5 directory coverage.

Local checks:

git diff --check HEAD~1..HEAD

Also checked that the final diff no longer touches the old broad-workaround files such as ExpandTileOp.cpp, PTOInstantiateAndInlineOpLib.cpp, Passes.td, tools/ptoas/ptoas.cpp, VPTOOps.td, VPTO.cpp, or VPTOPtrNormalize.cpp.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces native tensor_view and partition_tensor_view folding support in the FoldTileBufIntrinsics pass, updates ExpandTileOp to include view shape and strides in the specialization key, and adds a pto_level parameter to @pto.jit to forward build-level overrides to ptoas. Additionally, VPTOSplitCVModule is updated to normalize sections in-place for pre-annotated modules. Feedback on the changes highlights a concurrency violation in FoldTileBufIntrinsics where a FuncOp pass queries the parent module's symbol table, a limitation in traceViewChain that fails on nested partitions, and an inefficient cleanup loop that should be optimized using a worklist-based dead code elimination approach.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread lib/PTO/Transforms/FoldTileBufIntrinsics.cpp Outdated
Comment thread lib/PTO/Transforms/FoldTileBufIntrinsics.cpp Outdated
Comment thread lib/PTO/Transforms/FoldTileBufIntrinsics.cpp Outdated
@reedhecre

reedhecre commented Jun 30, 2026

Codex Review

This comment is updated automatically by the review bot.

  • PR: #886 Add PTODSL A5 DSL ST coverage
  • Author: jimmychou0
  • Base/Head: main / codex/ptodsl-a5-dsl-st-validation
  • Head SHA: 26da00910468
  • Trigger: new commits were pushed to the PR
  • Generated At: 2026-07-03T15:00:50Z
  • Previous Head SHA: 4b25c384bf6e
  • Status: failed at codex-review (exit=1)

Summary

Review failed at stage codex-review: exit=1

Findings

No structured findings were generated because the review process failed early.

Log Tail

 ptodsl/ptodsl/_runtime/toolchain.py                |  12 +
 ptodsl/ptodsl/_tracing/module_builder.py           |   1 +
 ptodsl/ptodsl/_tracing/session.py                  |  47 ++-
 ptodsl/tests/test_jit_compile.py                   |  98 ++++-
 ptodsl/tests/test_runtime_toolchain.py             |  53 +++
 scripts/ptoas_env.sh                               |   9 +-
 scripts/sim_dsl.sh                                 |  45 ++-
 test/dsl-st/cube_matrix_pipeline.py                | 113 +++---
 test/dsl-st/gemv_mx_pipeline.py                    |  16 -
 test/dsl-st/npu_a5/__main__.py                     |  32 ++
 test/dsl-st/npu_a5/tadd.py                         | 128 +++++++
 test/dsl-st/npu_a5/tcolexpand.py                   |  91 +++++
 test/dsl-st/npu_a5/tcolsum.py                      |  90 +++++
 test/dsl-st/npu_a5/tload_store.py                  | 147 +++++++
 test/dsl-st/npu_a5/tmatmul.py                      | 199 ++++++++++
 test/dsl-st/predicate_pack.py                      |  51 +--
 .../fold_tile_buf_intrinsics_dead_view_cleanup.pto |  45 +++
 test/vpto/scripts/run_host_vpto_validation.sh      |  51 ++-
 .../scripts/run_host_vpto_validation_parallel.sh   |  19 +
 tools/ptoas/driver.cpp                             |  36 +-
 28 files changed, 1674 insertions(+), 226 deletions(-)
===== END STAGE clone rc=0 @ 2026-07-03 23:00:40 =====

===== STAGE codex-review @ 2026-07-03 23:00:40 =====
set -euo pipefail
cd '/tmp/ptoas-pr-review-monitor/runs/20260703_230032_pr886/repo'
'codex' exec -C '/tmp/ptoas-pr-review-monitor/runs/20260703_230032_pr886/repo' -s read-only -c 'model_provider="codereview"' -c 'model="gpt-5.4"' -c 'model_reasoning_effort="xhigh"' --output-schema '/tmp/ptoas-pr-review-monitor/runs/20260703_230032_pr886/review_schema.json' -o '/tmp/ptoas-pr-review-monitor/runs/20260703_230032_pr886/codex_last_message.json' --color never - < '/tmp/ptoas-pr-review-monitor/runs/20260703_230032_pr886/review_prompt.txt'
[monitor] stage timeout: 1800s
OpenAI Codex v0.115.0 (research preview)
--------
workdir: /tmp/ptoas-pr-review-monitor/runs/20260703_230032_pr886/repo
model: gpt-5.4
provider: codereview
approval: never
sandbox: read-only
reasoning effort: xhigh
reasoning summaries: none
session id: 019f287f-074e-7e71-be9b-fe240db96f6e
--------
user
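You are now reviewing a GitHub PR.

Repository: hw-native-sys/PTOAS
PR: #886 Add PTODSL A5 DSL ST coverage
Author: jimmychou0
base branch: origin/main
head branch: HEAD (already checked out to the PR head)

Requirements:
1. Review only this PR's changes relative to origin/main; look at context files when necessary.
2. Focus on real correctness / regression / contract mismatch / CI / runtime / compatibility problems.
3. Do not raise purely stylistic suggestions or low-value guesses.
4. Output strictly by priority:
   - P1: highly likely to cause wrong results, compile/runtime failures, severe regressions, or release blockers
   - P2: important defects, behavior regressions, missing validation/tests, significant compatibility problems
   - P3: minor but clearly fixable problems
5. If there are no problems, write the summary as exactly: 未检查到 PR #886 存在问题 ("No problems found in PR #886"), and return findings=[].
6. If there are problems, summarize them concisely in summary; each entry in findings must give:
   - severity
   - title
   - body (explain why it is a problem, as specifically as possible)
   - file (a relative path where possible)
   - line (an integer if it can be determined, otherwise null)

Suggested first steps:
- git status --short
- git diff --stat origin/main...HEAD
- git diff --unified=80 origin/main...HEAD

The final output must strictly match the JSON schema.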

mcp startup: no servers
Reconnecting... 1/5 (unexpected status 403 Forbidden: {"code":"INSUFFICIENT_BALANCE","message":"Insufficient account balance"}, url: https://codex.0u0o.com/responses, cf-ray: a156c2e13c814747-LAX, request id: 4176ba95-f6a9-4085-a682-ed3983e0d448)
Reconnecting... 2/5 (unexpected status 403 Forbidden: {"code":"INSUFFICIENT_BALANCE","message":"Insufficient account balance"}, url: https://codex.0u0o.com/responses, cf-ray: a156c2e4ee0eb75a-LAX, request id: dd8010ef-bc77-41e1-9013-e177e6b27484)
Reconnecting... 3/5 (unexpected status 403 Forbidden: {"code":"INSUFFICIENT_BALANCE","message":"Insufficient account balance"}, url: https://codex.0u0o.com/responses, cf-ray: a156c2e9b84c6c4b-LAS, request id: 7d0fa252-6b49-42a7-91c1-67ec758facab)
Reconnecting... 4/5 (unexpected status 403 Forbidden: {"code":"INSUFFICIENT_BALANCE","message":"Insufficient account balance"}, url: https://codex.0u0o.com/responses, cf-ray: a156c2f0fb1de172-LAX, request id: 2debfcdc-1904-4b29-81b1-e47823b6ce40)
Reconnecting... 5/5 (unexpected status 403 Forbidden: {"code":"INSUFFICIENT_BALANCE","message":"Insufficient account balance"}, url: https://codex.0u0o.com/responses, cf-ray: a156c2fdc99e17c8-LAX, request id: fb2ae680-5a00-40f0-ad2d-59b7e74431c5)
ERROR: unexpected status 403 Forbidden: {"code":"INSUFFICIENT_BALANCE","message":"Insufficient account balance"}, url: https://codex.0u0o.com/responses, cf-ray: a156c315b85a3385-LAX, request id: 41be1570-7c35-488b-a23a-c2c8ef4b270d
Warning: no last agent message; wrote empty content to /tmp/ptoas-pr-review-monitor/runs/20260703_230032_pr886/codex_last_message.json
===== END STAGE codex-review rc=1 @ 2026-07-03 23:00:50 =====

@jimmychou0 jimmychou0 force-pushed the codex/ptodsl-a5-dsl-st-validation branch 9 times, most recently from 9f6aa25 to 93ec308 Compare July 3, 2026 04:20
@jimmychou0 jimmychou0 marked this pull request as ready for review July 3, 2026 06:21
L0C_ADDR = 0


@pto.cube

These cases were already written in the DSL. Why change them?

@jimmychou0 jimmychou0 Jul 3, 2026

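  • test/dsl-st/cube_matrix_pipeline.py: the original cube case had drifted from the current PTODSL surface and failed on the compile path in CI. It is now written in the cube pipeline style that main currently supports stably, using explicit L1/L0 data movement, matmul, and writeback to cover the same class of functionality.

  • test/dsl-st/gemv_mx_pipeline.py: an intermediate revision hand-inserted _pto.TGetScaleAddrOp(...) to bind the MX scale, but that made CI's build-ptodsl path send pto.tget_scale_addr into the wrong ExpandTileOp template instantiation and fail. This reverts to the pure PTODSL pto.tile.gemv_mx* form and avoids hand-written raw IR.

  • test/dsl-st/predicate_pack.py: the original version placed psts/ppack/punpack inside an @pto.simd helper and took addresses through the tile handle inside that helper, which triggered unstable helper ABI/lowering problems in CI. Predicate materialization is moved back to the top-level vector body, and vlds/psts now use explicit UB ptrs,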

Comment thread test/dsl-st/npu_a5/tadd.py Outdated
Comment on lines +51 to +55

def compile(self, **constexpr_bindings):
    compiled = self._compiler.compile(**constexpr_bindings)
    _attach_flat_vpto_attrs(compiled.build(), self._compiler._module_spec)
    return compiled

What are these? Why write such complicated things in a test case?

Earlier, to work around the nested-container problem, I wrote _flat_jit and called the internal KernelCompiler/KernelModuleSpec directly.
These have now all been changed back to @pto.jit.


for (pto::AllocTileOp alloc : llvm::reverse(deadAllocs))
  alloc.erase();
return !deadAllocs.empty();

Please split the change here and the lit test below into a separate commit, and keep each commit's changes as clean as possible.

The commits have been reorganized.

mkdir -p "${WORK_SPACE}"
WORK_SPACE="$(cd "${WORK_SPACE}" && pwd)"

has_torch_npu_packages() {

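I don't think we should write fallback logic for these environment problems in every script. If the CI environment has a problem, please ask 王淼 (Wang Miao) to fix it. Scripts should assume the whole environment is ready; the environment can be set up once at the CI entry point.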

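PTODSL source-backed cases now use only the PTO_PYTHON_BIN / PYTHON_BIN / PTO_DSL_ST_PYTHON_BIN explicitly passed in by CI or the caller; the CI entry point is responsible for selecting and exporting a usable Python environment.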
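A minimal sketch of that "no fallback" policy, written in Python for illustration only (the real logic lives in the shell scripts). The function name and the lookup order of the three variables are assumptions; the point is that a missing interpreter is reported as an error instead of being installed or guessed.

```python
import os

# Lookup order is an assumption made for this sketch, not taken from the PR.
ENV_VARS = ("PTO_DSL_ST_PYTHON_BIN", "PTO_PYTHON_BIN", "PYTHON_BIN")


def resolve_python(env=None):
    """Return the interpreter the caller exported, or fail loudly."""
    env = os.environ if env is None else env
    for name in ENV_VARS:
        value = env.get(name)
        if value:
            return value
    raise RuntimeError(
        "set one of " + ", ".join(ENV_VARS) + " before running PTODSL cases"
    )
```

For example, `resolve_python({"PYTHON_BIN": "/usr/bin/python3"})` returns that path, while `resolve_python({})` raises, which leaves environment setup to the CI entry point.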

Comment thread tools/ptoas/driver.cpp
isBackendPartitionedContainer(op) &&
children.front()->hasAttr(mlir::pto::FunctionKernelKindAttr::name)) {
FailureOr<OwningOpRef<ModuleOp>> jobModuleOr =
buildBackendChildCompileUnit(op, children.front());

What is this logic doing? --mlir-print-ir-after can dump the output of any pass, so there should be no need to write a dedicated debug entry point.

@jimmychou0 jimmychou0 Jul 3, 2026

This change addresses the single-child backend container generated by PTODSL @pto.jit:

module attributes {pto.target_arch = "a5"} {
  module attributes {pto.backend = "vpto", pto.kernel_kind = #pto.kernel_kind, ...} {
    func.func @tadd_f32_16x64(...) attributes {pto.entry} {
      ...
      pto.tload ...
      pto.tstore ...
    }
  }
}

PTODSL generates a backend-partitioned container made of an outer module plus one child module. For a single-child backend-partitioned container, the ptoas driver did not treat the child module as the actual VPTO compile unit during object compilation, so compilation failed in ExpandTileOp.
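The container shape above can be modeled with a small Python sketch. It is only a toy under stated assumptions: `Module`, `select_compile_units`, and the rule of copying outer attributes onto the child are hypothetical and do not describe what `buildBackendChildCompileUnit` in driver.cpp actually does. It illustrates the intended outcome, namely that a container with exactly one backend child still yields the child, not the wrapper, as the unit to compile.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)  # nested modules only


def select_compile_units(top):
    """Pick the modules to compile; a plain module is its own unit."""
    backend_children = [c for c in top.children if "pto.backend" in c.attrs]
    if not backend_children:
        return [top]
    units = []
    for child in backend_children:
        # Assumption: outer attributes such as the target arch carry over.
        merged = {**top.attrs, **child.attrs}
        units.append(Module(attrs=merged, children=child.children))
    return units


container = Module(
    attrs={"pto.target_arch": "a5"},
    children=[Module(attrs={"pto.backend": "vpto"})],
)
units = select_compile_units(container)
flat_units = select_compile_units(Module(attrs={"pto.target_arch": "a5"}))
```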

@jimmychou0 jimmychou0 force-pushed the codex/ptodsl-a5-dsl-st-validation branch from 93ec308 to 99f140f Compare July 3, 2026 07:45
@jimmychou0 jimmychou0 force-pushed the codex/ptodsl-a5-dsl-st-validation branch from 4b25c38 to 26da009 Compare July 3, 2026 14:59