…ng multi-CGRA task placement
| Fusion candidates (same-header SSA dependencies) are placed on adjacent | ||
| CGRAs to enable direct data forwarding. |
What does "fusion candidates" mean here?
| } | ||
|
|
||
| void runOnOperation() override { | ||
| runAllocateCgraToTask(getOperation(), kCgraGridRows, kCgraGridCols); |
I think we can maintain an Allocation class in the include or lib/TaskflowDialect/Allocation folder. And make this function a virtual function of this pass, which can be overridden by different task allocation algorithms. Please refer to https://github.com/coredac/dataflow/blob/main/include/NeuraDialect/Mapping/Mapping.h and https://github.com/coredac/dataflow/tree/main/include/NeuraDialect/Mapping/HeuristicMapping
I think you can put the definition of this function in allocation_utils.cpp in this PR, and open another PR for the code refactoring described above.
| }; | ||
|
|
||
| //===----------------------------------------------------------------------===// | ||
| /// Maps a task-memory graph onto a 2D CGRA grid. |
| // successors the best chance of landing on adjacent grid cells. | ||
| computeDependencyDepth(graph); | ||
|
|
||
| // Sorts tasks by dependency depth (Critical Path First). |
I think we should rename the "Critical Path" to "Routing-Critical Path", because there might be different critical paths in a compiler optimization pipeline, and we should distinguish them.
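For reference, a minimal sketch of the depth-based ordering under discussion. It assumes "dependency depth" means the longest path (in edges) from a task to any sink of an acyclic task graph. The `Task` struct and both function names are hypothetical stand-ins, not the PR's actual types.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the PR's task node; successors are task indices.
struct Task {
  std::vector<std::size_t> successors;
  int depth = -1;  // -1 means "not yet computed".
};

// Longest path (in edges) from task `i` to any sink, memoized in `depth`.
// Assumes the task graph is acyclic.
int computeDepth(std::vector<Task> &tasks, std::size_t i) {
  if (tasks[i].depth >= 0) return tasks[i].depth;
  int best = 0;
  for (std::size_t s : tasks[i].successors)
    best = std::max(best, 1 + computeDepth(tasks, s));
  tasks[i].depth = best;
  return best;
}

// Returns task indices ordered by decreasing depth, so the tasks heading the
// longest dependency chains are placed first. Ties keep their original order.
std::vector<std::size_t> sortByDepth(std::vector<Task> &tasks) {
  std::vector<std::size_t> order(tasks.size());
  for (std::size_t i = 0; i < tasks.size(); ++i) {
    order[i] = i;
    computeDepth(tasks, i);
  }
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) {
                     return tasks[a].depth > tasks[b].depth;
                   });
  return order;
}
```

Using `std::stable_sort` keeps the order deterministic for tasks of equal depth, which makes placement results reproducible across runs.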
| }); | ||
|
|
||
| // Fixed-point iteration: task placement scoring depends on SRAM | ||
| // positions (memory proximity), and SRAM assignment depends on task |
Will you randomly distribute the memrefs on the multi-cgra grid initially?
| task_nodes.push_back(std::move(node)); | ||
| }); | ||
|
|
||
| // Phase 2: Create MemoryNodes using ORIGINAL memrefs (canonical identity). |
| DenseMap<Operation *, TaskNode *> op_to_node; | ||
|
|
||
| void build(func::FuncOp func) { | ||
| // Phase 1: Create a TaskNode for every TaskflowTaskOp in the function. |
| if (iter > 0 && !sram_moved) { | ||
| break; | ||
| } |
This is for early exit, right?
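To make the early-exit condition concrete, here is a minimal sketch of a fixed-point loop with the same guard. The function `runFixedPoint` and its two callbacks are hypothetical. It assumes the SRAM-assignment step reports whether any SRAM moved, and that task placement is deterministic given the SRAM positions.

```cpp
#include <functional>

// Alternates two interdependent steps until the second stops changing state
// or `max_iters` is reached. `place_tasks` and `assign_srams` are hypothetical
// callbacks; `assign_srams` returns true if any SRAM moved during that call.
int runFixedPoint(int max_iters, const std::function<void()> &place_tasks,
                  const std::function<bool()> &assign_srams) {
  bool sram_moved = false;
  int iter = 0;
  for (; iter < max_iters; ++iter) {
    // After the first round, an unchanged SRAM layout means the next task
    // placement would see the same inputs, so the loop has converged.
    if (iter > 0 && !sram_moved) {
      break;
    }
    place_tasks();
    sram_moved = assign_srams();
  }
  return iter;  // Number of rounds actually executed.
}
```

The `iter > 0` part of the guard is what lets the first round run unconditionally, since `sram_moved` has no meaningful value before any SRAM assignment has happened.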
| } | ||
|
|
||
| // Finds the best placement for `task_node` requiring exactly `cgra_count` | ||
| // CGRAs. Strategy: |
I think the cgra_count for a task is tightly coupled with the shape.
That means, this allocation function will only take the task with a determined cgra_count + a determined shape as input and generate the output.
Both cgra_count determination and shape determination should be handled by an upstream pass (e.g., resource binding).
WDYT?
And the rotation of a binding shape should be handled in this allocate-cgra-to-task pass (i.e., we should consider different rotations for a non-rectangular shape).
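As an illustration of the rotation handling suggested here, a minimal sketch that enumerates the distinct 90-degree rotations of a shape. It assumes a shape is represented as a list of (row, col) cell offsets; the `Cell` and `Shape` aliases and both function names are hypothetical and do not come from the PR.

```cpp
#include <algorithm>
#include <set>
#include <utility>
#include <vector>

using Cell = std::pair<int, int>;  // (row, col) offset within a shape.
using Shape = std::vector<Cell>;

// Shifts a shape so its minimum row and column are 0 and sorts its cells,
// so that two equal shapes compare equal regardless of origin or cell order.
Shape normalize(Shape shape) {
  if (shape.empty()) return shape;
  int min_row = shape[0].first;
  int min_col = shape[0].second;
  for (const Cell &c : shape) {
    min_row = std::min(min_row, c.first);
    min_col = std::min(min_col, c.second);
  }
  for (Cell &c : shape) {
    c.first -= min_row;
    c.second -= min_col;
  }
  std::sort(shape.begin(), shape.end());
  return shape;
}

// Returns the distinct 90-degree rotations of `shape` (at most four).
std::vector<Shape> uniqueRotations(const Shape &shape) {
  std::set<Shape> seen;
  std::vector<Shape> result;
  Shape current = normalize(shape);
  for (int i = 0; i < 4; ++i) {
    if (seen.insert(current).second) result.push_back(current);
    // Rotate by 90 degrees: (row, col) -> (col, -row), then re-anchor at 0.
    for (Cell &c : current) c = Cell(c.second, -c.first);
    current = normalize(current);
  }
  return result;
}
```

Deduplicating through the normalized form means symmetric shapes (for example a square) yield a single candidate, so the placer does not score the same footprint several times.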
| // canAllTasksFitOnGrid | ||
| //===----------------------------------------------------------------------===// | ||
|
|
||
| bool mlir::taskflow::canAllTasksFitOnGrid(ArrayRef<int> task_cgra_counts) { |
So this function is trying to map tasks onto the multi-cgra grid without considering memory placement?
AllocateCgraToTask Pass
Summary
Adds a new AllocateCgraToTask compiler pass that maps tasks onto a 2D CGRA grid, replacing the previous MapTaskOnCgra pass. The key addition is multi-CGRA support: a task can now be assigned multiple contiguous grid positions based on a cgra_count attribute already present in the IR (set manually or by an upstream optimization pass).
Changes
New pass
The old 600-line monolithic MapTaskOnCgraPass.cpp is replaced by code in lib/TaskflowDialect/Transforms/ and a lib/TaskflowDialect/Util/ library, making the placement logic reusable by other passes.
Multi-CGRA task placement
Previously each task was pinned to a single CGRA cell. Now the placer reads cgra_count per task and finds a connected cluster of that many cells on the grid. Placement shapes are enumerated (rectangles first, then a non-rectangular DFS fallback) and ranked by a proximity score.
Placement algorithm
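A minimal sketch of the rectangle-first part of the shape enumeration described above. The function names, the (rows, cols) representation, and the `occupied` grid are assumptions made for illustration; the DFS fallback and the proximity scoring are not shown.

```cpp
#include <utility>
#include <vector>

// Enumerates rectangle dimensions (rows, cols) whose area is exactly
// `cgra_count` and which fit inside a `grid_rows` x `grid_cols` grid.
std::vector<std::pair<int, int>> rectangleShapes(int cgra_count, int grid_rows,
                                                 int grid_cols) {
  std::vector<std::pair<int, int>> shapes;
  for (int rows = 1; rows <= grid_rows && rows <= cgra_count; ++rows) {
    if (cgra_count % rows != 0) continue;  // Area must match exactly.
    int cols = cgra_count / rows;
    if (cols <= grid_cols) shapes.emplace_back(rows, cols);
  }
  return shapes;
}

// Returns true if the rows x cols rectangle anchored at (r0, c0) lies fully
// on the grid and covers only free cells. `occupied[r][c]` marks used cells.
bool rectangleIsFree(const std::vector<std::vector<bool>> &occupied, int r0,
                     int c0, int rows, int cols) {
  int grid_rows = static_cast<int>(occupied.size());
  int grid_cols = grid_rows > 0 ? static_cast<int>(occupied[0].size()) : 0;
  if (r0 < 0 || c0 < 0 || r0 + rows > grid_rows || c0 + cols > grid_cols)
    return false;
  for (int r = r0; r < r0 + rows; ++r)
    for (int c = c0; c < c0 + cols; ++c)
      if (occupied[r][c]) return false;
  return true;
}
```

For example, a task with cgra_count = 3 has no rectangular shape on a 2x2 grid, which is the kind of case where a non-rectangular fallback is needed.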
Input / output contract
Input: cgra_count attribute on each TaskflowTaskOp.
Output: task_mapping_info attribute on each task with cgra_positions, read_sram_locations, and write_sram_locations.
What Is Not In This PR
The ResourceAwareTaskOptimizationPass (which decides how many CGRAs each task should use) is not included. That integration lives on a separate branch.