
Allocate cgra to task #307

Open
guosran wants to merge 7 commits into main from feature/allocate-cgra-to-task

Collaborator

@guosran commented Mar 31, 2026

AllocateCgraToTask Pass

Summary

Adds a new AllocateCgraToTask compiler pass that maps tasks onto a 2D CGRA grid, replacing the previous MapTaskOnCgra pass. The key addition is multi-CGRA support: a task can now be assigned multiple contiguous grid positions based on a cgra_count attribute already present in the IR (set manually or by an upstream optimization pass).

Changes

New pass

The old 600-line monolithic MapTaskOnCgraPass.cpp is replaced by:

  • A thin pass wrapper (~50 lines) in lib/TaskflowDialect/Transforms/.
  • A mapper implementation and shared utilities in a new lib/TaskflowDialect/Util/ library, making the placement logic reusable by other passes.

Multi-CGRA task placement

Previously, each task was pinned to a single CGRA cell. Now the placer reads cgra_count per task and finds a connected cluster of that many cells on the grid. Candidate shapes are enumerated with rectangles first, falling back to a DFS search for non-rectangular clusters when no rectangle fits, and candidates are ranked by a proximity score.
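To make the "rectangles first" step concrete, here is a minimal sketch of how rectangular candidates could be enumerated for a given cgra_count. The helper names enumerateRectShapes and findFreeOrigins are hypothetical and are not taken from this PR; the non-rectangular DFS fallback and the proximity ranking are left out.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper (not from the PR): lists every (height, width)
// rectangle with exactly `cgra_count` cells that fits on a rows x cols grid.
std::vector<std::pair<int, int>> enumerateRectShapes(int cgra_count, int rows,
                                                     int cols) {
  assert(cgra_count > 0 && rows > 0 && cols > 0);
  std::vector<std::pair<int, int>> shapes;
  for (int h = 1; h <= rows; ++h) {
    if (cgra_count % h != 0) continue;  // Height must divide the cell count.
    int w = cgra_count / h;
    if (w <= cols) shapes.emplace_back(h, w);
  }
  return shapes;
}

// Hypothetical helper: returns the (row, col) origins at which an h x w
// rectangle covers only free cells of `occupied`.
std::vector<std::pair<int, int>> findFreeOrigins(
    const std::vector<std::vector<bool>> &occupied, int h, int w) {
  std::vector<std::pair<int, int>> origins;
  int rows = static_cast<int>(occupied.size());
  int cols = rows > 0 ? static_cast<int>(occupied[0].size()) : 0;
  for (int r = 0; r + h <= rows; ++r) {
    for (int c = 0; c + w <= cols; ++c) {
      bool fits = true;
      for (int dr = 0; dr < h && fits; ++dr)
        for (int dc = 0; dc < w && fits; ++dc)
          if (occupied[r + dr][c + dc]) fits = false;
      if (fits) origins.emplace_back(r, c);
    }
  }
  return origins;
}
```

With these helpers, a cgra_count of 3 on a 2x2 grid yields no rectangular shape at all, which is the kind of case where a non-rectangular fallback would be needed.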

Placement algorithm

  1. Critical-path-first ordering: tasks with longer downstream dependency chains are placed first, giving their successors the best chance of landing on adjacent cells.
  2. Scoring: a candidate position is scored by Manhattan distance to already-placed SSA producers/consumers and to assigned SRAM locations.
  3. Fixed-point SRAM assignment: after placing all tasks, each MemRef is assigned to the SRAM at the centroid of its accessing tasks. Task placement is then re-run with the updated SRAM positions; this repeats until assignments converge.
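The fixed-point loop in step 3 can be sketched as follows. This is a simplified illustration under several assumptions, not the code in this PR: every task occupies a single cell, the task list is already sorted by dependency depth, the score uses only Manhattan distance to SRAM positions (the SSA producer/consumer term is omitted), all SRAMs start at cell (0, 0), and a max_iters bound guards against non-convergence. All type and function names (Pos, Task, Allocation, placeTasks, assignSram, allocate) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

struct Pos { int row; int col; };

int manhattan(Pos a, Pos b) {
  return std::abs(a.row - b.row) + std::abs(a.col - b.col);
}

// Hypothetical, simplified task: one cell, plus indices of accessed memrefs.
struct Task { std::vector<int> memrefs; };

struct Allocation { std::vector<Pos> task_pos; std::vector<Pos> sram_pos; };

// Greedy placement: each task takes the free cell with the smallest summed
// Manhattan distance to the SRAM positions of its memrefs.
std::vector<Pos> placeTasks(const std::vector<Task> &tasks,
                            const std::vector<Pos> &sram, int rows, int cols) {
  assert(static_cast<int>(tasks.size()) <= rows * cols);
  std::vector<std::vector<bool>> used(rows, std::vector<bool>(cols, false));
  std::vector<Pos> placed;
  for (const Task &task : tasks) {
    Pos best{-1, -1};
    int best_cost = -1;
    for (int r = 0; r < rows; ++r) {
      for (int c = 0; c < cols; ++c) {
        if (used[r][c]) continue;
        int cost = 0;
        for (int m : task.memrefs) cost += manhattan(Pos{r, c}, sram[m]);
        if (best_cost < 0 || cost < best_cost) {
          best = Pos{r, c};
          best_cost = cost;
        }
      }
    }
    used[best.row][best.col] = true;
    placed.push_back(best);
  }
  return placed;
}

// Moves each memref to the integer centroid of the tasks that access it.
// Memrefs with no accessing task keep their previous position.
std::vector<Pos> assignSram(const std::vector<Task> &tasks,
                            const std::vector<Pos> &placed,
                            const std::vector<Pos> &prev) {
  std::vector<Pos> next = prev;
  for (std::size_t m = 0; m < prev.size(); ++m) {
    int sum_r = 0, sum_c = 0, n = 0;
    for (std::size_t t = 0; t < tasks.size(); ++t)
      for (int accessed : tasks[t].memrefs)
        if (accessed == static_cast<int>(m)) {
          sum_r += placed[t].row;
          sum_c += placed[t].col;
          ++n;
        }
    if (n > 0) next[m] = Pos{sum_r / n, sum_c / n};
  }
  return next;
}

// Alternates task placement and SRAM assignment until no SRAM moves.
Allocation allocate(const std::vector<Task> &tasks, int num_memrefs, int rows,
                    int cols, int max_iters) {
  Allocation out;
  out.sram_pos.assign(num_memrefs, Pos{0, 0});  // Assumed initial position.
  for (int iter = 0; iter < max_iters; ++iter) {
    out.task_pos = placeTasks(tasks, out.sram_pos, rows, cols);
    std::vector<Pos> next = assignSram(tasks, out.task_pos, out.sram_pos);
    bool sram_moved = false;
    for (std::size_t m = 0; m < next.size(); ++m)
      if (next[m].row != out.sram_pos[m].row ||
          next[m].col != out.sram_pos[m].col)
        sram_moved = true;
    out.sram_pos = next;
    if (!sram_moved) break;  // Converged: SRAM assignment is stable.
  }
  return out;
}
```

The iteration bound is a design choice of this sketch: alternating greedy placement and centroid assignment is not guaranteed to settle, so a cap keeps the loop finite even when assignments oscillate.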

Input / output contract

  • Input: task IR with cgra_count attribute on each TaskflowTaskOp.
  • Output: task_mapping_info attribute on each task, containing cgra_positions, read_sram_locations, and write_sram_locations.

What Is Not In This PR

The ResourceAwareTaskOptimizationPass (which decides how many CGRAs each task should use) is not included. That integration lives on a separate branch.

@guosran requested a review from ShangkunLi on March 31, 2026 02:09
Comment on lines 70 to 71
Fusion candidates (same-header SSA dependencies) are placed on adjacent
CGRAs to enable direct data forwarding.
Collaborator

What do fusion candidates mean?

}

void runOnOperation() override {
runAllocateCgraToTask(getOperation(), kCgraGridRows, kCgraGridCols);
Collaborator

I think we can maintain an Allocation class in the include or lib/TaskflowDialect/Allocation folder, and make this function a virtual function of this pass so that it can be overridden by different task allocation algorithms. Please refer to https://github.com/coredac/dataflow/blob/main/include/NeuraDialect/Mapping/Mapping.h and https://github.com/coredac/dataflow/tree/main/include/NeuraDialect/Mapping/HeuristicMapping

Collaborator

I think you can put the definition of this function in allocation_utils.cpp in this PR, and open a separate PR for the code refactoring described above.

};

//===----------------------------------------------------------------------===//
/// Maps a task-memory graph onto a 2D CGRA grid.
Collaborator

2D multi-CGRA

// successors the best chance of landing on adjacent grid cells.
computeDependencyDepth(graph);

// Sorts tasks by dependency depth (Critical Path First).
Collaborator

I think we should rename the "Critical Path" to "Routing-Critical Path", because there might be different critical paths in a compiler optimization pipeline, and we should distinguish them.

});

// Fixed-point iteration: task placement scoring depends on SRAM
// positions (memory proximity), and SRAM assignment depends on task
Collaborator

Will you randomly distribute the memrefs on the multi-cgra grid initially?

task_nodes.push_back(std::move(node));
});

// Phase 2: Create MemoryNodes using ORIGINAL memrefs (canonical identity).
Collaborator

Creates

DenseMap<Operation *, TaskNode *> op_to_node;

void build(func::FuncOp func) {
// Phase 1: Create a TaskNode for every TaskflowTaskOp in the function.
Collaborator

Creates

Comment on lines +289 to +291
if (iter > 0 && !sram_moved) {
break;
}
Collaborator

This is for early exit, right?

}

// Finds the best placement for `task_node` requiring exactly `cgra_count`
// CGRAs. Strategy:
Collaborator

I think the cgra_count for a task is tightly coupled with the shape.

That means, this allocation function will only take the task with a determined cgra_count + a determined shape as input and generate the output.

Both cgra_count determination and shape determination should be handled by an upstream pass (e.g., resource binding).

WDYT?

Collaborator

And the rotation of a binding shape should be handled in this allocate-cgra-to-task pass (i.e., we should consider different rotations for a non-rectangular shape).

// canAllTasksFitOnGrid
//===----------------------------------------------------------------------===//

bool mlir::taskflow::canAllTasksFitOnGrid(ArrayRef<int> task_cgra_counts) {
Collaborator

So this function is trying to map tasks onto the multi-cgra grid without considering memory placement?

@ShangkunLi added the "new feature" (New feature or request) label on Mar 31, 2026