[BUG] memtrace multicore: DFE no-forward-progress assert #21
Description
Title
memtrace multicore: DFE no-forward-progress assert, then teardown asserts (FTQ/Lookahead not drained)
Body
Hi all,
I found a reproducible multicore memtrace liveness failure.
Baseline repro (first issue observed)
- Commit tested: cc8a749
- Frontend: memtrace
- Cores: 2
Command:
./src/scarab --frontend memtrace --num_cores=2 \
--cbp_trace_r0=/home/iconst01/SCARAB_traces/geekbench/single_core/navigation/traces/simp/4190.zip \
--cbp_trace_r1=/home/iconst01/SCARAB_traces/geekbench/single_core/html5_browser/traces/simp/780.zip \
--inst_limit=100000000
Observed:
No forward progress for 1000000 cycles
src/decoupled_frontend.cc:368: ASSERT FAILED ... : 0
Immediately after the abort, teardown asserts also fire:
src/lookahead_buffer.cc:45: ASSERT FAILED ... : ft_buffer_count == 0
src/decoupled_frontend.cc:190: ASSERT FAILED ... : ftq.empty()
Follow-up investigation
- I changed DFE forward-progress accounting from global to per-core. The failure persisted.
- I enabled debug prints and saw the DFE frequently breaking out because the FTQ was full (Break due to full FTQ), i.e. no effective progress.
- I then removed the DFE break on FTQ-full, for debugging only. This avoids the original DFE forward-progress assert path, but the run still fails later with:
  - Exact message: What prevents proceeding? Node stage is empty!
  - src/sim.c:344 forward-progress assert (last_forward_progress:0, node stage empty)
  - teardown asserts (lookahead_buffer.cc:45, decoupled_frontend.cc:190)
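For reference, a minimal sketch of what "per-core forward-progress accounting" means in the first bullet. All names here (`ProgressWatchdog`, `note_progress`, `stalled_core`, `demo_stalled_core`) are hypothetical and this is not the code in decoupled_frontend.cc; it assumes "progress" means a core got new work into its FTQ, and reuses the 1000000-cycle limit from the observed message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-core forward-progress watchdog. Illustrative names only;
// this is NOT Scarab's decoupled_frontend.cc implementation.
class ProgressWatchdog {
 public:
  ProgressWatchdog(int num_cores, uint64_t limit_cycles)
      : limit_(limit_cycles), last_progress_(num_cores, 0) {}

  // Record that `core` made progress at `cycle` (e.g. it queued new FTQ work).
  void note_progress(int core, uint64_t cycle) {
    assert(core >= 0 && static_cast<std::size_t>(core) < last_progress_.size());
    last_progress_[core] = cycle;
  }

  // Returns the first core that has gone more than `limit_` cycles without
  // progress, or -1 if every core is still within budget.
  int stalled_core(uint64_t now) const {
    for (std::size_t c = 0; c < last_progress_.size(); ++c) {
      if (now - last_progress_[c] > limit_) return static_cast<int>(c);
    }
    return -1;
  }

 private:
  uint64_t limit_;                       // allowed cycles without progress
  std::vector<uint64_t> last_progress_;  // last progress cycle, per core
};

// Scenario: core 0 progresses every cycle, core 1 stops after cycle 10
// (as if its FTQ stayed full). Returns the watchdog's verdict at `now`.
int demo_stalled_core(uint64_t now) {
  ProgressWatchdog wd(/*num_cores=*/2, /*limit_cycles=*/1000000);
  for (uint64_t cycle = 0; cycle <= now; ++cycle) {
    wd.note_progress(0, cycle);
    if (cycle <= 10) wd.note_progress(1, cycle);
  }
  return wd.stalled_core(now);
}
```

In this sketch, a single global counter would be reset by core 0 every cycle, so a stuck core 1 could go unnoticed while another core keeps moving; per-core counters catch it and also report which core is stuck (`demo_stalled_core(1000011)` returns 1).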
Interpretation
The initial user-visible failure is DFE no-forward-progress.
The later lookahead_buffer / ftq.empty() asserts appear to be fallout during shutdown after the liveness failure.
I can provide:
- full logs
- debug log snippets
- temporary debug-only patch (not proposed as final fix)
Suggestion (process)
To catch this class of issue (or any multicore issue) earlier, it may help to add a small multicore memtrace regression to CI
(for example, 2 cores with a short inst_limit) that fails the run if forward progress stops.
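One possible shape for the pass/fail check, as a sketch under assumptions: scan the simulator's combined stdout/stderr for the failure messages quoted in this issue and fail the job if any appear. `scan_for_liveness_failure` and `scan_log_text` are hypothetical helpers, not an existing Scarab tool, and a real CI gate should also check Scarab's exit status.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical CI gate (not part of Scarab). Returns 1 if any failure marker
// quoted in this issue appears in the simulator output, else 0.
int scan_for_liveness_failure(std::istream& in) {
  static const char* const kMarkers[] = {
      "No forward progress",        // DFE no-forward-progress message
      "What prevents proceeding?",  // sim.c forward-progress message
      "ASSERT FAILED",              // any assert, including teardown asserts
  };
  std::string line;
  while (std::getline(in, line)) {
    for (const char* marker : kMarkers) {
      if (line.find(marker) != std::string::npos) {
        std::cerr << "liveness failure: " << line << '\n';
        return 1;
      }
    }
  }
  return 0;
}

// Convenience wrapper for unit tests: scan an in-memory log.
int scan_log_text(const std::string& text) {
  std::istringstream in(text);
  return scan_for_liveness_failure(in);
}

// A CI binary could be: int main() { return scan_for_liveness_failure(std::cin); }
// fed by piping the short 2-core memtrace run's output (2>&1) into it.
```

Matching on message text is brittle if the wording changes, so this is only a stopgap next to Scarab's own non-zero exit on assert.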