Enable nested loop ctrl2data flow transforms #57
ShangkunLi wants to merge 1 commit into coredac:main
// operation.
Location loc =
    block->empty() ? block->getParent()->getLoc() : block->front().getLoc();
if (has_block_args) {
What are we specializing here with has_block_args?
Because we need to handle those blocks that do not have block arguments, like bb4 in this example.
module {
  func.func @_Z10bert_node1PA1_A1_A1_A1_A128_bPA1_A128_S1_(%arg0: memref<?x1x1x1x1x128xi8>, %arg1: memref<?x1x128x1x1x128xi8>) attributes {accelerator = "neura", llvm.linkage = #llvm.linkage<external>} {
    %0 = "neura.constant"() <{value = 1 : index}> : () -> index
    %1 = "neura.constant"() <{value = 128 : index}> : () -> index
    %2 = "neura.constant"() <{value = 0 : index}> : () -> index
    %3 = "neura.cast"(%2) <{cast_type = "index_to_int"}> : (index) -> i64
    neura.br %3 : i64 to ^bb1
  ^bb1(%4: i64): // 2 preds: ^bb0, ^bb5
    %5 = "neura.cast"(%4) <{cast_type = "int_to_index"}> : (i64) -> index
    %6 = "neura.icmp"(%5, %1) <{cmpType = "slt"}> : (index, index) -> i1
    neura.cond_br %6 : i1 then to ^bb2 else to ^bb6
  ^bb2: // pred: ^bb1
    %7 = "neura.cast"(%2) <{cast_type = "index_to_int"}> : (index) -> i64
    neura.br %7 : i64 to ^bb3
  ^bb3(%8: i64): // 2 preds: ^bb2, ^bb4
    %9 = "neura.cast"(%8) <{cast_type = "int_to_index"}> : (i64) -> index
    %10 = "neura.icmp"(%9, %1) <{cmpType = "slt"}> : (index, index) -> i1
    neura.cond_br %10 : i1 then to ^bb4 else to ^bb5
  ^bb4: // pred: ^bb3
    %11 = neura.load_indexed %arg0[%2, %2, %2, %2, %2, %9 : index, index, index, index, index, index] memref<?x1x1x1x1x128xi8> : i8
    neura.store_indexed %11 to %arg1[%2, %2, %5, %2, %2, %9 : index, index, index, index, index, index] memref<?x1x128x1x1x128xi8> : i8
    %12 = "neura.add"(%9, %0) : (index, index) -> index
    %13 = "neura.cast"(%12) <{cast_type = "index_to_int"}> : (index) -> i64
    neura.br %13 : i64 to ^bb3
  ^bb5: // pred: ^bb3
    %14 = "neura.add"(%5, %0) : (index, index) -> index
    %15 = "neura.cast"(%14) <{cast_type = "index_to_int"}> : (index) -> i64
    neura.br %15 : i64 to ^bb1
  ^bb6: // pred: ^bb1
    "neura.return"() : () -> ()
  }
}
The pred block of bb4 is bb3, and control reaches bb4 from bb3 through the cond_br. So in this implementation, we apply grant_predicate to each result in bb4, using the cond of bb3 (i.e., %10) as the predicate. The transformed code looks like:
%18 = "neura.icmp"(%17, %3) <{cmpType = "slt"}> : (!neura.data<index, i1>, !neura.data<index, i1>) -> !neura.data<i1, i1>
%19 = "neura.not"(%18) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
%20 = neura.load_indexed %arg0[%5, %5, %5, %5, %5, %17 : !neura.data<index, i1>, !neura.data<index, i1>, !neura.data<index, i1>, !neura.data<index, i1>, !neura.data<index, i1>, !neura.data<index, i1>] memref<?x1x1x1x1x128xi8> : !neura.data<i8, i1>
%21 = neura.grant_predicate %20, %18 : !neura.data<i8, i1>, !neura.data<i1, i1> -> !neura.data<i8, i1>
neura.store_indexed %21 to %arg1[%5, %5, %10, %5, %5, %17 : !neura.data<index, i1>, !neura.data<index, i1>, !neura.data<index, i1>, !neura.data<index, i1>, !neura.data<index, i1>, !neura.data<index, i1>] memref<?x1x128x1x1x128xi8> : !neura.data<i8, i1>
%22 = "neura.add"(%17, %1) : (!neura.data<index, i1>, !neura.data<index, i1>) -> !neura.data<index, i1>
%23 = neura.grant_predicate %22, %18 : !neura.data<index, i1>, !neura.data<i1, i1> -> !neura.data<index, i1>
%24 = "neura.cast"(%23) <{cast_type = "index_to_int"}> : (!neura.data<index, i1>) -> !neura.data<i64, i1>
%25 = neura.grant_predicate %24, %18 : !neura.data<i64, i1>, !neura.data<i1, i1> -> !neura.data<i64, i1>
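To make the rule above concrete, here is a minimal sketch of it on a toy model. It is not the pass in this PR and does not use the MLIR API: `Op`, `Block`, and `predicateResults` are hypothetical names introduced only for illustration, and the predecessor's condition is passed in as a plain string.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model, not the MLIR API.
struct Op {
  std::string result;  // empty when the op produces no value (e.g. a store)
  std::string name;    // mnemonic, kept only for readability
};

struct Block {
  std::vector<std::string> args;  // block arguments, e.g. {"%8"} for ^bb3
  std::vector<Op> ops;
};

// For a block WITHOUT block arguments, emit one grant_predicate line per
// produced result, using the predecessor's branch condition as predicate.
std::vector<std::string> predicateResults(const Block &block,
                                          const std::string &pred_cond) {
  assert(!pred_cond.empty() && "a predecessor condition is required");
  std::vector<std::string> emitted;
  if (!block.args.empty())
    return emitted;  // blocks with arguments take a different path
  for (const Op &op : block.ops) {
    if (op.result.empty())
      continue;  // nothing to predicate
    emitted.push_back("neura.grant_predicate " + op.result + ", " + pred_cond);
  }
  return emitted;
}
```

Applied to bb4 with bb3's condition, this toy version predicates every value-producing op in the block, which matches the shape of the transformed code above.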
- Your test contains store_indexed, which is derived from the bert_nodexx.mlir, right? We didn't have a test with store_indexed (except those bert xxx).
- Is has_block_args robust enough? What about a BB that has block args and also has a non-block-arg live-in?
- Why is there %23 = neura.grant_predicate %22 -> neura.cast(%23)? The dataflow within a BB shouldn't need that grant_predicate, right?
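The last two questions both come down to which values are live-in to a block. A minimal sketch of that computation on a toy model follows; `Op`, `Block`, and `nonArgLiveIns` are hypothetical names, not the MLIR API or the code in this PR.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model, not the MLIR API.
struct Op {
  std::string result;                 // empty if no value is produced
  std::vector<std::string> operands;  // SSA names used by the op
};

struct Block {
  std::vector<std::string> args;
  std::vector<Op> ops;
};

// Values used in the block that are neither block arguments nor defined by
// an earlier op in the same block, i.e. the non-block-arg live-ins.
std::set<std::string> nonArgLiveIns(const Block &block) {
  std::set<std::string> defined(block.args.begin(), block.args.end());
  std::set<std::string> live_ins;
  for (const Op &op : block.ops) {
    for (const std::string &operand : op.operands)
      if (!defined.count(operand))
        live_ins.insert(operand);
    if (!op.result.empty()) {
      bool inserted = defined.insert(op.result).second;
      assert(inserted && "SSA values are defined exactly once");
      (void)inserted;
    }
  }
  return live_ins;
}
```

On the example IR, bb3 has the block argument %8 and also uses %1, so it has both kinds of live-in. In bb4, %12 and %13 are defined inside the block and are therefore not live-ins.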
Hmmm, I see the problem. Will fix it soon.
One additional line is enough.
Can you use an example to explain this in the PR description?
Sure! In the previous implementation, the transformed
You can see that
In the new implementation, the transformed IR looks like
We grant predicate the result
In this PR:
grant_predicate operation. (Specifically, the previous code added the grant_predicate operation on the grant_always value.)
CMakeLists.txt