perf: optimize CGRANode::isOccupied with O(1) periodic lookup (~7.5x speedup) by shiyunyao · Pull Request #70 · tancheng/CGRA-Mapper

shiyunyao · 2026-01-14T12:10:25Z

Profiling the mapper using perf (with OpenMP disabled to isolate algorithmic costs) revealed that CGRANode::isOccupied consumes approximately 70-80% of the total execution time.

I replaced the O(N) linear loop with an O(1) direct index lookup. This change keeps bit-exact consistency with the original logic while eliminating the loop overhead.
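The change can be illustrated with a minimal sketch over a toy occupancy table. The types here (`OccupyStatus`, `Slot`, a plain `int` node ID) and the function names `isOccupiedScan` and `isOccupiedDirect` are hypothetical stand-ins, not the CGRA-Mapper API. Only the blocking condition and the index `II + cycle % II` are taken from the diff in this PR. The scan visits up to `boundary / II` slots per query, while the direct version always reads exactly one.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real CGRA-Mapper types (assumption).
enum OccupyStatus { SINGLE_OCCUPY, START_PIPE_OCCUPY, IN_PIPE_OCCUPY, END_PIPE_OCCUPY };
using Slot = std::vector<std::pair<int, OccupyStatus>>;  // (dfgNodeId, status)

// Blocking condition copied from the removed `if` in the diff.
bool slotBlocks(const Slot& slot, bool supportDVFS) {
  for (const auto& p : slot) {
    if (p.second == START_PIPE_OCCUPY || p.second == SINGLE_OCCUPY || supportDVFS) {
      return true;
    }
  }
  return false;
}

// Original-style check: scan cycle, cycle + II, cycle + 2*II, ... below the table size.
bool isOccupiedScan(const std::vector<Slot>& table, int cycle, int II, bool dvfs) {
  for (int c = cycle; c < static_cast<int>(table.size()); c += II) {
    if (slotBlocks(table[c], dvfs)) return true;
  }
  return false;
}

// PR-style check: read one representative slot, II + cycle % II.
bool isOccupiedDirect(const std::vector<Slot>& table, int cycle, int II, bool dvfs) {
  return slotBlocks(table[II + cycle % II], dvfs);
}
```

If the table is filled periodically (as `setDFGNode()` is described as doing), both functions return the same answer for every cycle below the table size; the direct version simply skips the redundant reads.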

To demonstrate scalability, I tested the fir kernel with an enlarged CGRA configuration (rows=16, cols=16 in param.json):

| Metric | Original | This PR |
|---|---|---|
| Time | ~197s | ~26s |
| Speedup | - | ~7.5x |
| Result (II) | 16 | 16 |

After removing this bottleneck, the overhead of OpenMP thread management has become significant relative to the reduced computation time. We may need to reconsider the current OpenMP strategy in future optimizations.

tancheng · 2026-01-14T17:26:05Z

src/CGRANode.cpp

```diff
-      if (p.second == START_PIPE_OCCUPY or p.second == SINGLE_OCCUPY or m_supportDVFS) {
-        return true;
-      }
+  for (pair<DFGNode*, int> p: *(m_dfgNodesWithOccupyStatus[t_II+(t_cycle)%t_II])){
```


Is removing this correct? Without the loop, only one specific cycle is checked. However, we need to check cycle, cycle + II, cycle + 2 * II, cycle + 3 * II, etc. WDYT?

Hi @tancheng, thanks for the review!

You are completely right that modulo scheduling theoretically requires verifying all equivalent cycles.

I double-checked the implementation of setDFGNode and confirmed that m_dfgNodesWithOccupyStatus is only modified in that function. Crucially, the population logic there is strictly periodic:

```cpp
// In setDFGNode:
for (int cycle = t_cycle % interval; cycle < m_cycleBoundary; cycle += interval) {
  // This loop ensures the occupancy status is identical for ALL modulo cycles.
  m_dfgNodesWithOccupyStatus[cycle]->push_back(...);
}
```
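A minimal sketch of that population pattern, under simplifying assumptions: the table is a hypothetical `std::vector<std::vector<int>>` of node IDs, its size stands in for `m_cycleBoundary`, and `markPeriodic` is an illustrative name rather than a CGRA-Mapper function. It shows that, because the loop starts from `t_cycle % interval`, it also fills slots earlier than `t_cycle`, so every slot in the same residue class ends up holding the same entry.

```cpp
#include <cassert>
#include <vector>

// Hypothetical simplification of the setDFGNode() loop quoted above:
// record node `id` in every slot congruent to t_cycle modulo `interval`.
void markPeriodic(std::vector<std::vector<int>>& table, int id, int t_cycle, int interval) {
  const int boundary = static_cast<int>(table.size());  // stands in for m_cycleBoundary
  for (int cycle = t_cycle % interval; cycle < boundary; cycle += interval) {
    table[cycle].push_back(id);
  }
}
```

With 20 slots, `t_cycle = 6` and `interval = 4`, the node lands in slots 2, 6, 10, 14 and 18, including slot 2, which precedes `t_cycle`.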

Since the data is already populated identically for every cycle + k*II, checking a single valid cycle (like t_II + t_cycle % t_II) is mathematically equivalent to checking the entire loop, but much faster.
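The equivalence can be spelled out under the PR's own assumption that every write goes through the periodic loop in `setDFGNode()`. The symbols are introduced here only for illustration: $B$ denotes `m_cycleBoundary`, $O[c]$ the contents of `m_dfgNodesWithOccupyStatus[c]`, $t$ the queried `t_cycle`, and $\mathrm{blk}$ the blocking predicate from the removed `if`. Writing it out also makes the preconditions explicit.

```latex
\begin{aligned}
&O[c] = O[c'] \quad \text{whenever } c \equiv c' \pmod{II},\ 0 \le c, c' < B \\
&\underbrace{\exists k \ge 0:\ t + k\,II < B \,\wedge\, \mathrm{blk}(O[t + k\,II])}_{\text{original loop}}
\iff \mathrm{blk}(O[t])
\iff \underbrace{\mathrm{blk}\big(O[II + (t \bmod II)]\big)}_{\text{this PR}}
\end{aligned}
```

The first equivalence needs $0 \le t < B$, so that the $k = 0$ term exists. The second needs $II + (t \bmod II) < B$, which holds whenever $B \ge 2\,II$.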

You mean there is no need to check cycle + k*II? Could you go through the other kernels' mapping results to ensure correctness?

Then please leave a comment there mentioning that the DFG node mapping has already been materialized across all cycles at II intervals in setDFGNode().

@MeowMJ WDYT?

@MeowMJ I have performed a regression test on 5 kernels (fir, conv, nonlinear, multicycle, dvfs) to verify the correctness.

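| Kernel | Baseline II | Optimized II | Result Status | Routing |
|---|---|---|---|---|
| `fir` | 6 | 6 | Pass | Identical |
| `conv` | 4 | 4 | Pass | Identical |
| `nonlinear` | 2 | 2 | Pass | Identical |
| `multicycle` | 4 | 4 | Pass | Identical |
| `dvfs` | 4 | 4 | Pass | Identical |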

The optimization is strictly bit-exact. For all tested kernels, not only did we achieve the same Initiation Interval (II), but the final Placement and Routing layouts were also identical to the baseline.

@tancheng, I have checked the logic of setDFGNode and m_dfgNodesWithOccupyStatus. This change is correct and helpful. We can even update all functions related to setDFGNode and m_dfgNodesWithOccupyStatus to a simpler implementation.

> We can even update all functions related to setDFGNode and m_dfgNodesWithOccupyStatus to a simpler implementation.

@MeowMJ can you please elaborate on this, and let @shiyunyao try your idea?

tancheng · 2026-01-15T02:56:56Z

@MeowMJ WDYT?

MeowMJ · 2026-01-31T07:02:08Z

@shiyunyao I found a problem. In setDFGNode, the duplicated data is placed into m_dfgNodesWithOccupyStatus[k*t_II], where k < CGRANode*t_II, or into m_dfgNodesWithOccupyStatus[t_cycle % t_II + k'*t_II], where k' < CGRANode*t_II - (t_cycle % t_II) / t_II. However, the functions that read m_dfgNodesWithOccupyStatus (isOccupied, isStartOrInPipe, isInOrEndPipe, and isEndPipe) fetch data from m_dfgNodesWithOccupyStatus[t_cycle + k''*t_II], where k'' < CGRANode*t_II - t_cycle / t_II. Note that t_cycle is a parameter of these functions.

Though we could update the data-fetch pattern in the four functions related to m_dfgNodesWithOccupyStatus, doing so may raise other issues when t_cycle changes in their callers. WDYT? Can you check the values of t_cycle passed to these four functions?
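One way to approach the question about `t_cycle` is to model the periodic write pattern (marking from `t_cycle % t_II`) against the two read patterns: the original scan from `t_cycle` in steps of `t_II`, and the PR's single index `t_II + t_cycle % t_II`. The sketch below is a hypothetical toy model with one DFG node and a boolean occupancy table. `lookupsAgree`, `boundary`, `tw` and `tr` are illustrative names, and the bounds guard on the direct index exists only in the toy to avoid an out-of-range read. In this model the two reads agree whenever `0 <= tr < boundary` and `II + tr % II < boundary`, and can diverge otherwise. That suggests the values of `t_cycle` that callers actually pass are what decide whether the shortcut is safe.

```cpp
#include <cassert>
#include <vector>

// Toy model (assumption): one node written at cycle `tw` with the periodic
// setDFGNode()-style pattern, then read at cycle `tr` both ways.
// Returns true when the original scan and the O(1) lookup agree.
bool lookupsAgree(int boundary, int II, int tw, int tr) {
  std::vector<bool> occupied(boundary, false);
  for (int c = tw % II; c < boundary; c += II) occupied[c] = true;  // write pattern

  bool scan = false;  // original read pattern: tr, tr + II, tr + 2*II, ...
  for (int c = tr; c < boundary; c += II) scan = scan || occupied[c];

  const int direct = II + tr % II;                          // this PR's read index
  const bool fast = direct < boundary && occupied[direct];  // guard exists only in the toy
  return scan == fast;
}
```

For example, with `boundary = 20` and `II = 4` the two agree for every read cycle in `[0, 20)`, but a read at `tr = 20` (past the boundary), or a table with `boundary = 6`, produces a mismatch.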

optimize CGRANode::isOccupied

c4e8214

tancheng reviewed Jan 14, 2026

tancheng requested a review from MeowMJ January 14, 2026 17:26

docs: add comment explaining the O(1) check logic

177ec85

tancheng approved these changes Jan 15, 2026

fix operator

24a3ee7

MeowMJ approved these changes Jan 17, 2026

shiyunyao wants to merge 3 commits into tancheng:master from shiyunyao:master

3 participants
