[BUG] memtrace multicore: DFE no-forward-progress assert #21

@iconstan01

Title

memtrace multicore: DFE no-forward-progress assert, then teardown asserts (FTQ/Lookahead not drained)

Body

Hi all,

I found a reproducible multicore memtrace liveness failure.

Baseline repro (first issue observed)

  • Commit tested: cc8a749
  • Frontend: memtrace
  • Cores: 2

Command:

./src/scarab --frontend memtrace --num_cores=2 \
  --cbp_trace_r0=/home/iconst01/SCARAB_traces/geekbench/single_core/navigation/traces/simp/4190.zip \
  --cbp_trace_r1=/home/iconst01/SCARAB_traces/geekbench/single_core/html5_browser/traces/simp/780.zip \
  --inst_limit=100000000

Observed:

No forward progress for 1000000 cycles
src/decoupled_frontend.cc:368: ASSERT FAILED ... : 0

Immediately after the abort, teardown asserts also fire:

src/lookahead_buffer.cc:45: ASSERT FAILED ... : ft_buffer_count == 0
src/decoupled_frontend.cc:190: ASSERT FAILED ... : ftq.empty()

Follow-up investigation

  1. I changed DFE forward-progress accounting from global to per-core.
    The failure persisted.

  2. I enabled debug prints and saw the DFE frequently break because the FTQ was full (debug message: Break due to full FTQ), i.e. it made no effective progress.

  3. I then removed the DFE break on FTQ-full just for debugging.
    This avoids the original DFE forward-progress assert path, but the run still fails later with:

    • Exact message: What prevents proceeding? Node stage is empty!
    • src/sim.c:344 forward-progress assert (last_forward_progress:0, node stage empty)
    • and teardown asserts (lookahead_buffer.cc:45, decoupled_frontend.cc:190)
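
To make step 1 concrete, here is a toy model of the difference between global and per-core forward-progress accounting. It is not Scarab code: the class and method names are hypothetical, and it only illustrates why a global counter can hide a single stalled core while per-core counters expose it.

```python
from dataclasses import dataclass, field


@dataclass
class ProgressWatchdog:
    """Toy forward-progress tracker (hypothetical, not Scarab's implementation)."""

    num_cores: int
    limit: int  # cycles a core may go without progress before it counts as stalled
    last_progress: list = field(init=False)

    def __post_init__(self):
        # Cycle at which each core last made progress.
        self.last_progress = [0] * self.num_cores

    def note_progress(self, core, cycle):
        self.last_progress[core] = cycle

    def stalled_cores(self, cycle):
        # Per-core accounting: every core is checked against the limit.
        return [
            core
            for core in range(self.num_cores)
            if cycle - self.last_progress[core] >= self.limit
        ]

    def globally_stalled(self, cycle):
        # Global accounting: progress on any core resets the watchdog.
        return cycle - max(self.last_progress) >= self.limit
```

In this model, if core 0 keeps progressing while core 1 never does, `globally_stalled` stays false but `stalled_cores` reports core 1. In my runs the failure persisted with per-core accounting, so the accounting granularity alone does not explain it.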

Interpretation

The initial user-visible failure is DFE no-forward-progress.
The later lookahead_buffer / ftq.empty() asserts appear to be fallout during shutdown after the liveness failure.
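
A small triage helper can separate the first liveness message from the asserts that follow it. This is a minimal sketch that assumes log lines look like the ones quoted in this issue (`file:line: ASSERT FAILED ...`); real logs may carry extra prefixes, and the function name is my own.

```python
import re

# Messages that mark the primary liveness failure, taken from this issue.
PRIMARY_PATTERNS = [
    re.compile(r"No forward progress for \d+ cycles"),
    re.compile(r"What prevents proceeding\?"),
]

# Assumed assert line shape: "<file>:<line>: ASSERT FAILED ..."
ASSERT_RE = re.compile(r"^(?P<file>\S+):(?P<line>\d+): ASSERT FAILED")


def triage(log_text):
    """Return (first liveness message or None, list of (file, line) asserts in order)."""
    primary = None
    asserts = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if primary is None and any(p.search(line) for p in PRIMARY_PATTERNS):
            primary = line
        match = ASSERT_RE.match(line)
        if match:
            asserts.append((match.group("file"), int(match.group("line"))))
    return primary, asserts
```

The first assert in the returned list is the one to investigate; later entries such as `lookahead_buffer.cc:45` are likely shutdown fallout.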

I can provide:

  • full logs
  • debug log snippets
  • temporary debug-only patch (not proposed as final fix)

Suggestion (process)

To catch this class of issue (or any multicore issue) earlier, it may help to add a small multicore memtrace regression to CI
(for example, 2 cores with a short inst_limit) and fail the run if forward progress stops.
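
A rough sketch of such a check, using only the flags from the repro command above. The function names, the default inst_limit and timeout, and the assumption that Scarab exits non-zero on an assert are mine, and the trace paths would need to point at traces available in CI.

```python
import subprocess

# Substrings that indicate a liveness or assert failure in the run output.
FAILURE_MARKERS = ("No forward progress", "ASSERT FAILED")


def build_command(scarab, traces, inst_limit):
    """Build a multicore memtrace command with one trace per core."""
    cmd = [scarab, "--frontend", "memtrace", f"--num_cores={len(traces)}"]
    cmd += [f"--cbp_trace_r{i}={trace}" for i, trace in enumerate(traces)]
    cmd.append(f"--inst_limit={inst_limit}")
    return cmd


def check_run(returncode, output):
    """Return (ok, offending_lines) for a finished run."""
    hits = [
        line
        for line in output.splitlines()
        if any(marker in line for marker in FAILURE_MARKERS)
    ]
    return returncode == 0 and not hits, hits


def run_regression(scarab, traces, inst_limit=1_000_000, timeout_s=600):
    """Run Scarab once; a timeout is treated as a failure as well."""
    try:
        proc = subprocess.run(
            build_command(scarab, traces, inst_limit),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, [f"timed out after {timeout_s}s"]
    return check_run(proc.returncode, proc.stdout)
```

A CI job could call `run_regression("./src/scarab", [trace0, trace1])` and fail the build when the first element of the result is false, printing the offending lines.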
