Refactor the Ctrl to Data Flow Implementation Logic #60
ShangkunLi merged 7 commits into coredac:main from
Conversation
Excellent summary, @ShangkunLi.
Shouldn't we grant_predicate all live-out results of
Ah, I got it.
You mean grant_predicate all the live-in values used in the block? But in many benchmarks, most live-ins are granted always. Like |
I believe we can
How does this sound?
Sure, this sounds more robust! |
Fix the transform logic for forward cond_br edges without values. |
1ed1e8c to 47ca8f9
Hi @ShangkunLi, I just noticed you disabled these tests. Can you please help restore them?
Sure, I will re-enable them in the next PR. I am working on the ctrl-flow fusion now.
Hmmm, --insert-data-mov works and generates correct intermediate IR, but --map-to-accelerator doesn't work now... I will try to solve this.
Thanks for the investigation.
- We'd better avoid disabling tests, especially when there is more than one contributor and others are using the repo to ramp up.
- We prefer "tiny" PRs. One PR should target a single functionality or function; it might touch a lot of tests, which is fine, but functionality-wise it should be "tiny".
- If it is a large project, you could make a chain of branches and open PRs gradually.
--insert-data-mov can work and can generate correct intermediate ir
Do you mean the intermediate IR is exactly the same with or without your changes (this PR or your current implementation)?
but the --map-to-accelerator doesn't work now
Is it hanging, crashing, or some other issue?
- We'd better avoid disabling tests, especially when there is more than one contributor and others are using the repo to ramp up.
Got it! Sorry for the problem caused by my unorthodox development process.
The intermediate IR generated by --insert-data-mov is now:
func.func @loop_test() -> f32 attributes {accelerator = "neura"} {
%0 = "neura.constant"() <{predicate = true, value = 10 : i64}> : () -> !neura.data<i64, i1>
%1 = "neura.data_mov"(%0) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
%2 = "neura.grant_always"(%1) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
%3 = "neura.constant"() <{predicate = true, value = 0 : i64}> : () -> !neura.data<i64, i1>
%4 = "neura.data_mov"(%3) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
%5 = "neura.grant_once"(%4) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
%6 = "neura.constant"() <{predicate = true, value = 1 : i64}> : () -> !neura.data<i64, i1>
%7 = "neura.data_mov"(%6) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
%8 = "neura.grant_always"(%7) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
%9 = "neura.constant"() <{predicate = true, value = 3.000000e+00 : f32}> : () -> !neura.data<f32, i1>
%10 = "neura.data_mov"(%9) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
%11 = "neura.grant_always"(%10) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
%12 = "neura.constant"() <{predicate = true, value = 0.000000e+00 : f32}> : () -> !neura.data<f32, i1>
%13 = "neura.data_mov"(%12) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
%14 = "neura.grant_once"(%13) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
%15 = neura.reserve : !neura.data<f32, i1>
%16 = "neura.data_mov"(%14) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
%17 = "neura.phi"(%15, %16) : (!neura.data<f32, i1>, !neura.data<f32, i1>) -> !neura.data<f32, i1>
%18 = neura.reserve : !neura.data<i64, i1>
%19 = "neura.data_mov"(%5) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
%20 = "neura.phi"(%18, %19) : (!neura.data<i64, i1>, !neura.data<i64, i1>) -> !neura.data<i64, i1>
%21 = "neura.data_mov"(%17) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
%22 = "neura.data_mov"(%11) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
%23 = "neura.fadd"(%21, %22) : (!neura.data<f32, i1>, !neura.data<f32, i1>) -> !neura.data<f32, i1>
%24 = "neura.data_mov"(%20) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
%25 = "neura.data_mov"(%8) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
%26 = "neura.add"(%24, %25) : (!neura.data<i64, i1>, !neura.data<i64, i1>) -> !neura.data<i64, i1>
%27 = "neura.data_mov"(%26) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
%28 = "neura.data_mov"(%2) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
%29 = "neura.icmp"(%27, %28) <{cmpType = "slt"}> : (!neura.data<i64, i1>, !neura.data<i64, i1>) -> !neura.data<i1, i1>
%30 = "neura.data_mov"(%26) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
%31 = "neura.data_mov"(%29) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
%32 = neura.grant_predicate %30, %31 : !neura.data<i64, i1>, !neura.data<i1, i1> -> !neura.data<i64, i1>
neura.ctrl_mov %32 -> %18 : !neura.data<i64, i1> !neura.data<i64, i1>
%33 = "neura.data_mov"(%23) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
%34 = "neura.data_mov"(%29) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
%35 = neura.grant_predicate %33, %34 : !neura.data<f32, i1>, !neura.data<i1, i1> -> !neura.data<f32, i1>
neura.ctrl_mov %35 -> %15 : !neura.data<f32, i1> !neura.data<f32, i1>
%36 = "neura.data_mov"(%29) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
%37 = "neura.not"(%36) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
%38 = "neura.data_mov"(%23) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
%39 = "neura.data_mov"(%37) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
%40 = neura.grant_predicate %38, %39 : !neura.data<f32, i1>, !neura.data<i1, i1> -> !neura.data<f32, i1>
%41 = "neura.data_mov"(%40) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
"neura.return"(%41) : (!neura.data<f32, i1>) -> ()
}
compared with the former one:
// MOV: func.func @loop_test() -> f32 attributes {accelerator = "neura"} {
// MOV-NEXT: %0 = "neura.constant"() <{predicate = true, value = 10 : i64}> : () -> !neura.data<i64, i1>
// MOV-NEXT: %1 = "neura.data_mov"(%0) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %2 = "neura.grant_always"(%1) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %3 = "neura.constant"() <{predicate = true, value = 0 : i64}> : () -> !neura.data<i64, i1>
// MOV-NEXT: %4 = "neura.data_mov"(%3) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %5 = "neura.grant_once"(%4) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %6 = "neura.constant"() <{predicate = true, value = 1 : i64}> : () -> !neura.data<i64, i1>
// MOV-NEXT: %7 = "neura.data_mov"(%6) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %8 = "neura.grant_always"(%7) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %9 = "neura.constant"() <{predicate = true, value = 3.000000e+00 : f32}> : () -> !neura.data<f32, i1>
// MOV-NEXT: %10 = "neura.data_mov"(%9) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT: %11 = "neura.grant_always"(%10) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT: %12 = "neura.constant"() <{predicate = true, value = 0.000000e+00 : f32}> : () -> !neura.data<f32, i1>
// MOV-NEXT: %13 = "neura.data_mov"(%12) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT: %14 = "neura.grant_once"(%13) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT: %15 = neura.reserve : !neura.data<i64, i1>
// MOV-NEXT: %16 = "neura.data_mov"(%5) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %17 = "neura.phi"(%16, %15) : (!neura.data<i64, i1>, !neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %18 = neura.reserve : !neura.data<f32, i1>
// MOV-NEXT: %19 = "neura.data_mov"(%14) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT: %20 = "neura.phi"(%19, %18) : (!neura.data<f32, i1>, !neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT: %21 = "neura.data_mov"(%20) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT: %22 = "neura.data_mov"(%11) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT: %23 = "neura.fadd"(%21, %22) : (!neura.data<f32, i1>, !neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT: %24 = "neura.data_mov"(%17) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %25 = "neura.data_mov"(%8) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %26 = "neura.add"(%24, %25) : (!neura.data<i64, i1>, !neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %27 = "neura.data_mov"(%26) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %28 = "neura.data_mov"(%2) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %29 = "neura.icmp"(%27, %28) <{cmpType = "slt"}> : (!neura.data<i64, i1>, !neura.data<i64, i1>) -> !neura.data<i1, i1>
// MOV-NEXT: %30 = "neura.data_mov"(%29) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
// MOV-NEXT: %31 = "neura.not"(%30) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
// MOV-NEXT: %32 = "neura.data_mov"(%23) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT: %33 = "neura.data_mov"(%31) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
// MOV-NEXT: %34 = neura.grant_predicate %32, %33 : !neura.data<f32, i1>, !neura.data<i1, i1> -> !neura.data<f32, i1>
// MOV-NEXT: %35 = "neura.data_mov"(%23) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT: %36 = "neura.data_mov"(%29) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
// MOV-NEXT: %37 = neura.grant_predicate %35, %36 : !neura.data<f32, i1>, !neura.data<i1, i1> -> !neura.data<f32, i1>
// MOV-NEXT: neura.ctrl_mov %37 -> %18 : !neura.data<f32, i1> !neura.data<f32, i1>
// MOV-NEXT: %38 = "neura.data_mov"(%26) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %39 = "neura.data_mov"(%29) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
// MOV-NEXT: %40 = neura.grant_predicate %38, %39 : !neura.data<i64, i1>, !neura.data<i1, i1> -> !neura.data<i64, i1>
// MOV-NEXT: neura.ctrl_mov %40 -> %15 : !neura.data<i64, i1> !neura.data<i64, i1>
// MOV-NEXT: %41 = "neura.data_mov"(%34) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT: "neura.return"(%41) : (!neura.data<f32, i1>) -> ()
// MOV-NEXT: }
The difference is that the two phi operations and the corresponding reserve operations appear in different orders, but from the IR's point of view this does not affect the DFG dependencies.
And now, when I use --map-to-accelerator, it will reach the maxII and exit without generating a legal mapping result.
Got it! Sorry for the problem caused by my unorthodox development process.
No worries :-)
it will reach the maxII and exit without generating a legal mapping result.
It is probably due to the unoptimized mapping strategy:
- It only picks the lowest-cost (highest-award) tile for mapping and ignores all other candidate tiles: https://github.com/coredac/dataflow/blob/c58754e23fb1c4349aefd99054b107342e703167/lib/NeuraDialect/Mapping/mapping_util.cpp#L378-L390
- There is no backtracking, i.e., if that placement or route fails, it just returns false for that specific II, without backtracking to another tile and retrying placement and routing.
Are you able to make a clean PR that just makes the backtrackable mapping work, to restore this test? It doesn't need to be super sophisticated though, #59
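The backtracking idea described above could look roughly like the sketch below. This is only an illustrative sketch, not the actual mapping_util.cpp API: `candidates_for`, `try_place_and_route`, and `unplace` are hypothetical callbacks standing in for the real candidate-tile enumeration, placement/routing, and undo logic.

```python
def map_with_backtracking(ops, candidates_for, try_place_and_route, unplace):
    """Try candidate tiles in award order and backtrack on failure.

    candidates_for(op) is assumed to return candidate tiles sorted best
    first; try_place_and_route(op, tile) attempts placement plus routing
    and returns True on success; unplace(op, tile) undoes a placement.
    All three names are hypothetical.
    """
    def backtrack(i):
        if i == len(ops):
            return True  # every op placed and routed for this II
        op = ops[i]
        for tile in candidates_for(op):  # not just the single best tile
            if try_place_and_route(op, tile):
                if backtrack(i + 1):
                    return True
                unplace(op, tile)  # undo, then retry the next candidate
        return False  # caller may bump the II and retry

    return backtrack(0)
```

The key difference from the greedy strategy is the loop over all candidates plus the `unplace` call: a failed downstream placement no longer dooms the whole II.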
Sure! I will do it ASAP.
Refactor the Ctrl to Data Flow Implementation Logic
In this PR, an edge-based control-flow-to-data-flow transform pass is implemented.
Basically, we can categorize all the edges in the CFG into the following 8 cases:
Cases 3 and 4 do not appear in the current benchmarks. Since they correspond to control-flow jumps for statements like goto, we do not consider these cases for now. The transform is implemented based on the remaining six cases.
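The 8-case list itself did not survive formatting here. One plausible reading of the taxonomy (an assumption on my part, consistent with the commit message "forward cond_br edges without values", not necessarily the pass's actual numbering) is the cross product of edge direction, branch kind, and whether the edge carries values:

```python
from itertools import product

def classify_edge(is_backward: bool, is_conditional: bool, has_values: bool) -> str:
    """Hypothetical 2x2x2 taxonomy of CFG edges (8 cases in total).

    This only illustrates the cross product implied by phrases like
    "forward cond_br edges without values"; the real case numbering
    in the pass may differ.
    """
    direction = "backward" if is_backward else "forward"
    branch = "cond_br" if is_conditional else "br"
    values = "with values" if has_values else "without values"
    return f"{direction} {branch} {values}"

# Enumerating every flag combination yields exactly 8 distinct cases.
all_cases = {classify_edge(*flags) for flags in product([False, True], repeat=3)}
```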
For the target block of the edge in case 7, we chose to grant_predicate all results in this block based on the condition, to ensure correctness. For example:
target block bb2:
transformed ir:
We grant_predicate the result of bb2 (%13) with the condition of its pred block.
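The bb2 code blocks referenced above were lost in formatting. As a purely hypothetical sketch of the idea (value names invented; only the neura.grant_predicate syntax is taken verbatim from the IR shown earlier in this thread):

```mlir
// Hypothetical: %13 is a result computed in bb2, and %cond is the branch
// condition of bb2's predecessor block. The transform wraps the result in
// grant_predicate, so it only carries a valid predicate when the
// predecessor actually branched into bb2.
%14 = neura.grant_predicate %13, %cond : !neura.data<f32, i1>, !neura.data<i1, i1> -> !neura.data<f32, i1>
```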