[Code Health] Warn when task fanin/fanout exceeds 16 during dependency construction #1261

@ChaoZheng109

Category

Robustness (potential edge-case failure)

Component

AICPU Scheduler

Description

When the orchestrator submit path and scheduler thread 0 build task dependencies, the runtime constructs each task's fanin/fanout relationships. A very large fanin or fanout makes the dependency wiring harder to reason about and may indicate an unexpectedly broad dependency shape.

Add a warning diagnostic when a task's fanin or fanout is greater than 16. The warning should identify the task and whether the high-water value is fanin or fanout, so workload authors can spot unusually dense dependency graphs early.

Because this is on the AICPU scheduler/orchestration path, the warning should avoid hot-path log flooding. Prefer logging only when the threshold is crossed for a task, or as a high-water diagnostic, rather than logging unconditionally in an inner loop.
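One way to log only when the threshold is crossed is to make the check edge-triggered: it fires on the single increment that moves a count from 16 to 17, and stays silent on every later increment. The sketch below is illustrative only. `DepCounters`, `kDepWarnThreshold`, and `bump_and_check_crossing` are hypothetical names, not identifiers from the runtime, and it assumes each counter is updated by one thread at a time.

```cpp
#include <cstdint>

// Threshold taken from the issue text: warn when a count is greater than 16.
constexpr uint32_t kDepWarnThreshold = 16;

// Hypothetical per-task counters; the real task descriptor will differ.
struct DepCounters {
    uint32_t fanin = 0;
    uint32_t fanout = 0;
};

// Increments the counter and returns true only on the one increment that
// takes it from the threshold to threshold + 1. All later increments
// return false, so a caller that logs on `true` logs at most once per
// counter, however many edges are added afterwards.
inline bool bump_and_check_crossing(uint32_t &count) {
    ++count;
    return count == kDepWarnThreshold + 1;
}
```

A caller would invoke `bump_and_check_crossing(counters.fanin)` (or `.fanout`) where an edge is added and emit the warning only when it returns `true`. The cost on the common path is one increment and one comparison.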

Related: #959

Location

  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/pto_scheduler.h

Proposed Fix

During dependency construction, track the computed fanin/fanout count for each task. If either count is greater than 16, emit a bounded warning log that includes at least:

  • task id / slot id if available
  • fanin count
  • fanout count
  • the dependency-construction phase where the value was observed

Keep the diagnostic bounded to avoid per-edge or per-iteration AICPU logging.

Priority

Medium (minor risk, should fix in next few releases)

Labels

code health (Technical debt, robustness, code quality)
