A kernel using the hard (FFTS) form of pl.system.syncall deadlocks on device with RuntimeError: run_prepared failed with code 507018 (AICore timeout) when the enclosing pl.spmd(N) launch does not fill all physical cores of the barrier's core_type. The hard barrier (SYNC_AIV_ONLY_ALL / mix) waits for every physical core of that type to arrive; the unlaunched cores never reach the barrier, so the FFTS wait never completes.
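The failure mode can be illustrated on the host with a plain thread barrier. This is only an analogy, not the FFTS mechanism itself: `threading.Barrier` stands in for the hard barrier, threads stand in for launched blocks, and the function name `hard_barrier_completes` and the small core counts are hypothetical, chosen to keep the example fast.

```python
import threading


def hard_barrier_completes(physical_cores, launched_blocks, timeout_s=0.2):
    """Return True if every launched block gets past the barrier.

    Host-side analogy only: the barrier expects `physical_cores` arrivals,
    but only `launched_blocks` threads ever call wait().
    """
    barrier = threading.Barrier(physical_cores)
    results = []
    lock = threading.Lock()

    def block():
        try:
            # Blocks until all parties arrive, or breaks the barrier on timeout.
            barrier.wait(timeout=timeout_s)
            passed = True
        except threading.BrokenBarrierError:
            passed = False
        with lock:
            results.append(passed)

    threads = [threading.Thread(target=block) for _ in range(launched_blocks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(results) > 0 and all(results)


if __name__ == "__main__":
    print(hard_barrier_completes(physical_cores=4, launched_blocks=4))
    print(hard_barrier_completes(physical_cores=4, launched_blocks=2))
```

With full occupancy every thread arrives and the barrier releases. With partial occupancy the missing parties never arrive, so the wait ends only through the timeout, which is the host-side counterpart of the 507018 AICore timeout described above.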
There is no compile-time diagnostic — the mistake only surfaces at runtime as a timeout, and it leaves the device draining/reset (subsequent runs on that device fail with code 13 until reset). The compiler already knows both the launch block count (pl.spmd(N)) and the target SoC's physical core count per core type, so this is statically checkable.
Steps to Reproduce
Minimal on-device repro (Ascend 910B = 48 physical AIV cores): an SPMD elementwise add with a hard pl.system.syncall(core_type="aiv_only") between the loads and the add, launched at partial occupancy. It mirrors tests/st/runtime/cross_core/test_syncall.py (whose docstring already documents that hard SYNCALL needs full AIV occupancy).
Run on a2a3. pl.spmd(24) (partial) → 507018. Changing only pl.spmd(24) → pl.spmd(48) (full occupancy) → PASS. Occupancy is the sole variable.
Expected Behavior
The compiler rejects a hard-mode syncall whose enclosing pl.spmd launch does not fill all physical cores of the barrier's core_type, with a clear compile-time error, e.g.:
hard pl.system.syncall(core_type="aiv_only") requires the spmd launch to fill all 48 AIV cores, but pl.spmd(24) launches 24 blocks. Use mode="soft" (GM-polling) for partial occupancy.
At minimum, a documented static diagnostic instead of a silent runtime 507018 + device reset.
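A minimal sketch of what such a check could look like, written as standalone Python rather than as actual pypto compiler code. Everything named here is hypothetical: `PHYSICAL_CORES`, `SyncallOccupancyError`, `check_hard_syncall`, the `"Ascend910B"` key, and the `"hard"` mode label are illustration only. The single table entry (48 AIV cores on Ascend 910B) is the figure quoted in this issue; a real implementation would read the count from the compiler's target description.

```python
# Hypothetical per-SoC core table. Only the 910B AIV count comes from the issue.
PHYSICAL_CORES = {
    ("Ascend910B", "aiv_only"): 48,
}


class SyncallOccupancyError(ValueError):
    """Raised when a hard syncall is used under a partial-occupancy launch."""


def check_hard_syncall(soc, core_type, spmd_blocks, mode="hard"):
    """Reject a hard syncall whose spmd launch does not fill all cores.

    `mode="hard"` is an assumed label for the default (FFTS) form; the soft
    form is exempt because it supports partial occupancy.
    """
    if mode != "hard":
        return
    physical = PHYSICAL_CORES.get((soc, core_type))
    if physical is None:
        # Unknown target or core type: nothing to check statically.
        return
    if spmd_blocks < physical:
        raise SyncallOccupancyError(
            f'hard pl.system.syncall(core_type="{core_type}") requires the '
            f"spmd launch to fill all {physical} cores, but "
            f"pl.spmd({spmd_blocks}) launches {spmd_blocks} blocks. "
            'Use mode="soft" (GM-polling) for partial occupancy.'
        )


if __name__ == "__main__":
    check_hard_syncall("Ascend910B", "aiv_only", 48)  # full occupancy: accepted
    try:
        check_hard_syncall("Ascend910B", "aiv_only", 24)
    except SyncallOccupancyError as err:
        print(err)
```

The sketch only covers the under-filled case reported here. How a launch with more blocks than physical cores should be treated is not addressed in this issue and is left out.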
Actual Behavior
Compiles silently; on device the run hangs until the FFTS wait times out and the device is force-reset:
[ERROR] sync_run_streams: aclrtSynchronizeStreamWithTimeout (AICPU) failed: 507018
[ERROR] recover_device_or_mark_unusable: AICore error 507018: bounded device drain failed (force reset will follow in finalize)
RuntimeError: run_prepared failed with code 507018
The device is left unusable afterwards (later runs fail with code 13 until it recovers).
Environment
Component | Version
pypto | d598b41b (branch: main)
pypto runtime (submodule) | 02bd0c4f
pto-isa | e722679
ptoas | 0.48
CANN | not detected
Host Platform
Linux (aarch64)
NPU Kind
Ascend 910B
Additional Context
The soft form already works at partial occupancy: pl.system.syncall(mode="soft", core_type="aiv_only", gm_workspace=ws, used_cores=N) (see the *Soft* case in tests/st/runtime/cross_core/test_syncall.py). A compile-time check for the hard form would make the hard-vs-soft occupancy contract explicit and catch the footgun early.
Component
Codegen