[Profiling Report] TraCR trace capture on DeepSeek V4 CSA decode-attention kernel #590

@noabauma

Summary

Captured AICPU/host-runtime-level traces for the standalone decode_attention_csa.py (DeepSeek V4 CSA decode-attention) kernel using the TraCR-enabled simpler runtime.

Setup

Kernel ran standalone on a single Ascend 910B2 NPU via the pypto-lib JIT harness (run_jit).
Simpler built from the TraCR branch with BUILD_TRACR=ON.

Result

Kernel passed golden validation against the torch reference — kv_cache and x_out both PASS, so the traces reflect a correct, complete run.
TraCR recorded the simpler task-graph scheduler across the 4 AICPU threads.

Profile view

Image

Trace file to open in Perfetto:
decode_attention_csa.json
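For a quick look at the trace without opening Perfetto, the JSON can also be summarized from Python. This is a minimal sketch, assuming the file follows the Chrome Trace Event JSON layout that Perfetto accepts (a bare event array or an object with a "traceEvents" key) and that tasks are recorded as complete ("X") events with a "dur" field. The exact schema TraCR emits is not confirmed here, and the helper names are hypothetical.

```python
import json
from collections import defaultdict


def load_trace_events(path):
    """Return the list of events from a Chrome/Perfetto-style JSON trace."""
    with open(path) as f:
        data = json.load(f)
    # The JSON trace format allows either a bare array of events or an
    # object that holds them under "traceEvents".
    if isinstance(data, dict):
        return data.get("traceEvents", [])
    return data


def busy_time_per_thread(events):
    """Sum the duration of complete ("X") events per (pid, tid).

    Assumption: tasks are emitted as "X" events with a "dur" field.
    Begin/end ("B"/"E") pairs are ignored by this sketch.
    """
    totals = defaultdict(float)
    for ev in events:
        if ev.get("ph") == "X":
            totals[(ev.get("pid"), ev.get("tid"))] += ev.get("dur", 0)
    return dict(totals)
```

Calling busy_time_per_thread(load_trace_events("decode_attention_csa.json")) would then give one total per traced thread, which should be easy to compare against the 4 AICPU threads mentioned above if the assumptions hold.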

How to recreate the plot

To recreate these profiles, use the Simpler TraCR branch and set the environment variable BUILD_TRACR=ON when compiling Simpler:

BUILD_TRACR=ON pip install --no-build-isolation -e .

TraCR is a low-level profiler that captures traces on the Ascend device. Also set _DEFAULT_RUNTIME = "tensormap_and_ringbuffer" in pypto/runtime/worker.py, because TraCR is currently built only on top of this Simpler runtime scheduler. PyPTO must also be built against this Simpler version; otherwise, PyPTO-based examples will not capture TraCR traces. When running PyPTO or Simpler examples, TraCR writes a tracr_0/ directory to ~/ascend/. Post-process it by running this command:

./pypto/runtime/build/output/bin/tracr_process ~/ascend/tracr_0/

It will generate a tracr_0/perfetto.json file, which can be viewed in Perfetto.
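When the post-processing step is run repeatedly, it can be wrapped in a small script. The sketch below uses only the paths quoted above (the tracr_process binary and ~/ascend/tracr_0/). The function name and the checks are hypothetical, and it assumes tracr_process takes the trace directory as its only argument, as in the command shown.

```python
import subprocess
from pathlib import Path

# Paths taken from the instructions above; adjust them to your checkout.
TRACR_PROCESS = Path("pypto/runtime/build/output/bin/tracr_process")
TRACE_DIR = Path.home() / "ascend" / "tracr_0"


def postprocess_trace(tool=TRACR_PROCESS, trace_dir=TRACE_DIR):
    """Run the post-processing tool on trace_dir and return perfetto.json."""
    trace_dir = Path(trace_dir)
    if not trace_dir.is_dir():
        raise FileNotFoundError(f"no TraCR output directory at {trace_dir}")
    # Same invocation as the manual command: <tool> <trace_dir>
    subprocess.run([str(tool), str(trace_dir)], check=True)
    out = trace_dir / "perfetto.json"
    if not out.is_file():
        raise RuntimeError(f"{tool} finished but {out} was not created")
    return out
```

Failing early on a missing tracr_0/ directory makes it easier to tell a run that never produced traces (for example, a build without BUILD_TRACR=ON) from a post-processing failure.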

Motivation / Use Case

Show the current state of the decode_attention_csa.py (DeepSeek V4 CSA decode-attention) kernel as profiled with TraCR.

Validate that TraCR (simpler tracr branch, BUILD_TRACR=ON) can profile a real pypto-lib kernel end-to-end on actual NPU hardware — capturing AICPU/host-runtime traces from a standalone, single-device kernel run — as a first step toward profiling larger models.
