Refactor the Ctrl to Data Flow Implementation Logic by ShangkunLi · Pull Request #60 · coredac/neura

ShangkunLi · 2025-06-29T08:44:45Z

In this PR, an edge-based control flow to data flow transform pass is implemented.

Basically, we can categorize all the edges in CFG into the following 8 cases:

Backward cond_br edges with values.
Backward br edges with values.
Backward cond_br without values.
Backward br without value.
Forward cond_br edges with values.
Forward br edges with values.
Forward cond_br edges without values.
Forward br edges without values.

Cases 3 and 4 do not appear in the current benchmarks. Since they correspond to control flow jumps for statements like goto, we do not consider these cases for now.

The transform is implemented based on the remaining six edges.

For the target block of the edge in case 7, we chose to grant_predicate all results in this block based on condition to ensure correctness. For example:

target block bb2:

^bb1(%4: !neura.data<i64, i1>):  // 2 preds: ^bb0, ^bb5
    %5 = "neura.cast"(%4) <{cast_type = "int_to_index"}> : (!neura.data<i64, i1>) -> !neura.data<index, i1>
    %6 = "neura.icmp"(%5, %1) <{cmpType = "slt"}> : (!neura.data<index, i1>, !neura.data<index, i1>) -> !neura.data<i1, i1>
    neura.cond_br %6 : !neura.data<i1, i1> then to ^bb2 else to ^bb6
^bb2:  // pred: ^bb1
    %7 = "neura.cast"(%2) <{cast_type = "index_to_int"}> : (!neura.data<index, i1>) -> !neura.data<i64, i1>
    neura.br %7 : !neura.data<i64, i1> to ^bb3

transformed ir:

%11 = "neura.icmp"(%10, %3) <{cmpType = "slt"}> : (!neura.data<index, i1>, !neura.data<index, i1>) -> !neura.data<i1, i1>
%12 = "neura.not"(%11) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
%13 = "neura.cast"(%5) <{cast_type = "index_to_int"}> : (!neura.data<index, i1>) -> !neura.data<i64, i1>
%14 = neura.grant_predicate %13, %11 : !neura.data<i64, i1>, !neura.data<i1, i1> -> !neura.data<i64, i1>

We grant_predicate the result of bb2 -- %13 with the condition of its pred block.

tancheng · 2025-06-29T15:17:34Z

Excellent summarization @ShangkunLi.

For the target block of the edge in case 7, we chose to grant_predicate all results in this block based on condition to ensure correctness.

Shouldn't we grant_predicate all live-out results of bb1? Your example shows the bb2's live-out is predicated based on bb1's condition. This is some kind of indirect? WDYT?

tancheng · 2025-06-29T15:22:04Z

Ah I got it. bb2 is dominated by bb1 but the its live-in is from another block. In such case, shouldn't we grant_predicate to bb2's live-in?

ShangkunLi · 2025-06-29T15:33:01Z

Ah I got it. bb2 is dominated by bb1 but the its live-in is from another block. In such case, shouldn't we grant_predicate to bb2's live-in?

You mean grant_predicate all the live-in values used in the block? But in many benchmarks, most live-ins are granted always. Like %2 in bb2, %2 and %0 in bb4. So do we need to grant_predicate such grant_always values? Or just grant_predicat each result of operations in such block?

func.func @_Z10bert_node1PA1_A1_A1_A1_A128_bPA1_A128_S1_(%arg0: memref<?x1x1x1x1x128xi8>, %arg1: memref<?x1x128x1x1x128xi8>) attributes {accelerator = "neura", llvm.linkage = #llvm.linkage<external>} {
    %0 = "neura.constant"() <{value = 1 : index}> : () -> index
    %1 = "neura.constant"() <{value = 128 : index}> : () -> index
    %2 = "neura.constant"() <{value = 0 : index}> : () -> index
    %3 = "neura.cast"(%2) <{cast_type = "index_to_int"}> : (index) -> i64
    neura.br %3 : i64 to ^bb1
^bb1(%4: i64):  // 2 preds: ^bb0, ^bb5
    %5 = "neura.cast"(%4) <{cast_type = "int_to_index"}> : (i64) -> index
    %6 = "neura.icmp"(%5, %1) <{cmpType = "slt"}> : (index, index) -> i1
    neura.cond_br %6 : i1 then to ^bb2 else to ^bb6
^bb2:  // pred: ^bb1
    %7 = "neura.cast"(%2) <{cast_type = "index_to_int"}> : (index) -> i64
    neura.br %7 : i64 to ^bb3
^bb3(%8: i64):  // 2 preds: ^bb2, ^bb4
    %9 = "neura.cast"(%8) <{cast_type = "int_to_index"}> : (i64) -> index
    %10 = "neura.icmp"(%9, %1) <{cmpType = "slt"}> : (index, index) -> i1
    neura.cond_br %10 : i1 then to ^bb4 else to ^bb5
^bb4:  // pred: ^bb3
    %11 = neura.load_indexed %arg0[%2, %2, %2, %2, %2, %9 : index, index, index, index, index, index] memref<?x1x1x1x1x128xi8> : i8
    neura.store_indexed %11 to %arg1[%2, %2, %5, %2, %2, %9 : index, index, index, index, index, index] memref<?x1x128x1x1x128xi8> : i8
    %12 = "neura.add"(%9, %0) : (index, index) -> index
    %13 = "neura.cast"(%12) <{cast_type = "index_to_int"}> : (index) -> i64
    neura.br %13 : i64 to ^bb3
^bb5:  // pred: ^bb3
    %14 = "neura.add"(%5, %0) : (index, index) -> index
    %15 = "neura.cast"(%14) <{cast_type = "index_to_int"}> : (index) -> i64
    neura.br %15 : i64 to ^bb1
^bb6:  // pred: ^bb1
    "neura.return"() : () -> ()
}

tancheng · 2025-06-29T15:56:00Z

Ah I got it. bb2 is dominated by bb1 but the its live-in is from another block. In such case, shouldn't we grant_predicate to bb2's live-in?

You mean grant_predicate all the live-in values used in the block? But in many benchmarks, most live-ins are granted always. Like %2 in bb2, %2 and %0 in bb4. So do we need to grant_predicate such grant_always values? Or just grant_predicat each result of operations in such block?

func.func @_Z10bert_node1PA1_A1_A1_A1_A128_bPA1_A128_S1_(%arg0: memref<?x1x1x1x1x128xi8>, %arg1: memref<?x1x128x1x1x128xi8>) attributes {accelerator = "neura", llvm.linkage = #llvm.linkage<external>} {
    %0 = "neura.constant"() <{value = 1 : index}> : () -> index
    %1 = "neura.constant"() <{value = 128 : index}> : () -> index
    %2 = "neura.constant"() <{value = 0 : index}> : () -> index
    %3 = "neura.cast"(%2) <{cast_type = "index_to_int"}> : (index) -> i64
    neura.br %3 : i64 to ^bb1
^bb1(%4: i64):  // 2 preds: ^bb0, ^bb5
    %5 = "neura.cast"(%4) <{cast_type = "int_to_index"}> : (i64) -> index
    %6 = "neura.icmp"(%5, %1) <{cmpType = "slt"}> : (index, index) -> i1
    neura.cond_br %6 : i1 then to ^bb2 else to ^bb6
^bb2:  // pred: ^bb1
    %7 = "neura.cast"(%2) <{cast_type = "index_to_int"}> : (index) -> i64
    neura.br %7 : i64 to ^bb3
^bb3(%8: i64):  // 2 preds: ^bb2, ^bb4
    %9 = "neura.cast"(%8) <{cast_type = "int_to_index"}> : (i64) -> index
    %10 = "neura.icmp"(%9, %1) <{cmpType = "slt"}> : (index, index) -> i1
    neura.cond_br %10 : i1 then to ^bb4 else to ^bb5
^bb4:  // pred: ^bb3
    %11 = neura.load_indexed %arg0[%2, %2, %2, %2, %2, %9 : index, index, index, index, index, index] memref<?x1x1x1x1x128xi8> : i8
    neura.store_indexed %11 to %arg1[%2, %2, %5, %2, %2, %9 : index, index, index, index, index, index] memref<?x1x128x1x1x128xi8> : i8
    %12 = "neura.add"(%9, %0) : (index, index) -> index
    %13 = "neura.cast"(%12) <{cast_type = "index_to_int"}> : (index) -> i64
    neura.br %13 : i64 to ^bb3
^bb5:  // pred: ^bb3
    %14 = "neura.add"(%5, %0) : (index, index) -> index
    %15 = "neura.cast"(%14) <{cast_type = "index_to_int"}> : (index) -> i64
    neura.br %15 : i64 to ^bb1
^bb6:  // pred: ^bb1
    "neura.return"() : () -> ()
}

I believe we can grant_predicate further on the already grant_always, i.e., all the live-in of bb2, rather than the result of bb2.

I think the rule is to (use bb1 with cond_br as an example) grant_predicate all bb1's live-out and all true-successor bb's (i.e., bb2) live-ins, WDYT? (and grant_predicate(NOT) on false-successor, i.e., bb6, though nothing to predicate for it as it has no argument in this IR example)
If one const/grant_always is used by multiple BBs being dominated by different conditions, we can grant_predicate them based on the specific condition
For now, we blindly give grant_always for all the const inside the entry block, we later can fuse the grant_always -> grant_predicate

How does this sound?

ShangkunLi · 2025-06-29T15:58:14Z

Ah I got it. bb2 is dominated by bb1 but the its live-in is from another block. In such case, shouldn't we grant_predicate to bb2's live-in?

You mean grant_predicate all the live-in values used in the block? But in many benchmarks, most live-ins are granted always. Like %2 in bb2, %2 and %0 in bb4. So do we need to grant_predicate such grant_always values? Or just grant_predicat each result of operations in such block?
func.func @_Z10bert_node1PA1_A1_A1_A1_A128_bPA1_A128_S1_(%arg0: memref<?x1x1x1x1x128xi8>, %arg1: memref<?x1x128x1x1x128xi8>) attributes {accelerator = "neura", llvm.linkage = #llvm.linkage<external>} {
    %0 = "neura.constant"() <{value = 1 : index}> : () -> index
    %1 = "neura.constant"() <{value = 128 : index}> : () -> index
    %2 = "neura.constant"() <{value = 0 : index}> : () -> index
    %3 = "neura.cast"(%2) <{cast_type = "index_to_int"}> : (index) -> i64
    neura.br %3 : i64 to ^bb1
^bb1(%4: i64):  // 2 preds: ^bb0, ^bb5
    %5 = "neura.cast"(%4) <{cast_type = "int_to_index"}> : (i64) -> index
    %6 = "neura.icmp"(%5, %1) <{cmpType = "slt"}> : (index, index) -> i1
    neura.cond_br %6 : i1 then to ^bb2 else to ^bb6
^bb2:  // pred: ^bb1
    %7 = "neura.cast"(%2) <{cast_type = "index_to_int"}> : (index) -> i64
    neura.br %7 : i64 to ^bb3
^bb3(%8: i64):  // 2 preds: ^bb2, ^bb4
    %9 = "neura.cast"(%8) <{cast_type = "int_to_index"}> : (i64) -> index
    %10 = "neura.icmp"(%9, %1) <{cmpType = "slt"}> : (index, index) -> i1
    neura.cond_br %10 : i1 then to ^bb4 else to ^bb5
^bb4:  // pred: ^bb3
    %11 = neura.load_indexed %arg0[%2, %2, %2, %2, %2, %9 : index, index, index, index, index, index] memref<?x1x1x1x1x128xi8> : i8
    neura.store_indexed %11 to %arg1[%2, %2, %5, %2, %2, %9 : index, index, index, index, index, index] memref<?x1x128x1x1x128xi8> : i8
    %12 = "neura.add"(%9, %0) : (index, index) -> index
    %13 = "neura.cast"(%12) <{cast_type = "index_to_int"}> : (index) -> i64
    neura.br %13 : i64 to ^bb3
^bb5:  // pred: ^bb3
    %14 = "neura.add"(%5, %0) : (index, index) -> index
    %15 = "neura.cast"(%14) <{cast_type = "index_to_int"}> : (index) -> i64
    neura.br %15 : i64 to ^bb1
^bb6:  // pred: ^bb1
    "neura.return"() : () -> ()
}
I believe we can grant_predicate further on the already grant_always, i.e., all the live-in of bb2, rather than the result of bb2.

I think the rule is to (use bb1 with cond_br as an example) grant_predicate all bb1's live-out and all true-successor bb's live-in, WDYT?

If one const/grant_always is used by multiple BBs being dominated by different conditions, we can grant_predicate them based on the specific condition

For now, we blindly give grant_always for all the const inside the entry block, we later can fuse the grant_always -> grant_predicate

How does this sound?

Sure, this sounds more robust!

ShangkunLi · 2025-06-29T17:56:38Z

Fix the transform logic for forward cond_br edges without values.

lib/NeuraDialect/Transforms/TransformCtrlToDataFlowPass.cpp

test/neura/ctrl/branch_for.mlir

test/neura/ctrl/branch_with_and_without_arg.mlir

test/neura/ctrl/branch_without_arg.mlir

lib/NeuraDialect/Transforms/TransformCtrlToDataFlowPass.cpp

tancheng · 2025-07-05T07:51:22Z

test/neura/ctrl/branch_for.mlir

Hi @ShangkunLi, I just noticed you disabled these tests, can you plz help restore them?

Sure, will enable in the next pr. I am coding on the ctrl-flow fusion now.

Hmmm, the --insert-data-mov can work and can generate correct intermediate ir, but the --map-to-accelerator doesn't work now.... I may try to solve this.

Thanks for investigation.

We'd better avoid disabling tests, especially when there are more than one contributor, and some others are using the repo to ramp up

We prefer "tiny" PRs. One PR could only target single functionality or function, it might touch a lot of tests, which is fine, but the funtionality-wise, it should be "tiny"

If it is a large project, you could make a chain of branches, make PRs gradually

--insert-data-mov can work and can generate correct intermediate ir

You mean the intermediate ir is exactly the same with or without your changes (this PR or your current implementation)?

but the --map-to-accelerator doesn't work now

Hanging or crash or other issue?

We'd better avoid disabling tests, especially when there are more than one contributor, and some others are using the repo to ramp up

Got it! Sorry for the problem caused by my unorthodox development process.

For the intermediate ir generated by the --insert-data-mov now is

func.func @loop_test() -> f32 attributes {accelerator = "neura"} { %0 = "neura.constant"() <{predicate = true, value = 10 : i64}> : () -> !neura.data<i64, i1> %1 = "neura.data_mov"(%0) : (!neura.data<i64, i1>) -> !neura.data<i64, i1> %2 = "neura.grant_always"(%1) : (!neura.data<i64, i1>) -> !neura.data<i64, i1> %3 = "neura.constant"() <{predicate = true, value = 0 : i64}> : () -> !neura.data<i64, i1> %4 = "neura.data_mov"(%3) : (!neura.data<i64, i1>) -> !neura.data<i64, i1> %5 = "neura.grant_once"(%4) : (!neura.data<i64, i1>) -> !neura.data<i64, i1> %6 = "neura.constant"() <{predicate = true, value = 1 : i64}> : () -> !neura.data<i64, i1> %7 = "neura.data_mov"(%6) : (!neura.data<i64, i1>) -> !neura.data<i64, i1> %8 = "neura.grant_always"(%7) : (!neura.data<i64, i1>) -> !neura.data<i64, i1> %9 = "neura.constant"() <{predicate = true, value = 3.000000e+00 : f32}> : () -> !neura.data<f32, i1> %10 = "neura.data_mov"(%9) : (!neura.data<f32, i1>) -> !neura.data<f32, i1> %11 = "neura.grant_always"(%10) : (!neura.data<f32, i1>) -> !neura.data<f32, i1> %12 = "neura.constant"() <{predicate = true, value = 0.000000e+00 : f32}> : () -> !neura.data<f32, i1> %13 = "neura.data_mov"(%12) : (!neura.data<f32, i1>) -> !neura.data<f32, i1> %14 = "neura.grant_once"(%13) : (!neura.data<f32, i1>) -> !neura.data<f32, i1> %15 = neura.reserve : !neura.data<f32, i1> %16 = "neura.data_mov"(%14) : (!neura.data<f32, i1>) -> !neura.data<f32, i1> %17 = "neura.phi"(%15, %16) : (!neura.data<f32, i1>, !neura.data<f32, i1>) -> !neura.data<f32, i1> %18 = neura.reserve : !neura.data<i64, i1> %19 = "neura.data_mov"(%5) : (!neura.data<i64, i1>) -> !neura.data<i64, i1> %20 = "neura.phi"(%18, %19) : (!neura.data<i64, i1>, !neura.data<i64, i1>) -> !neura.data<i64, i1> %21 = "neura.data_mov"(%17) : (!neura.data<f32, i1>) -> !neura.data<f32, i1> %22 = "neura.data_mov"(%11) : (!neura.data<f32, i1>) -> !neura.data<f32, i1> %23 = "neura.fadd"(%21, %22) : (!neura.data<f32, i1>, !neura.data<f32, i1>) -> !neura.data<f32, i1> %24 = "neura.data_mov"(%20) : (!neura.data<i64, i1>) -> !neura.data<i64, i1> %25 = "neura.data_mov"(%8) : (!neura.data<i64, i1>) -> !neura.data<i64, i1> %26 = "neura.add"(%24, %25) : (!neura.data<i64, i1>, !neura.data<i64, i1>) -> !neura.data<i64, i1> %27 = "neura.data_mov"(%26) : (!neura.data<i64, i1>) -> !neura.data<i64, i1> %28 = "neura.data_mov"(%2) : (!neura.data<i64, i1>) -> !neura.data<i64, i1> %29 = "neura.icmp"(%27, %28) <{cmpType = "slt"}> : (!neura.data<i64, i1>, !neura.data<i64, i1>) -> !neura.data<i1, i1> %30 = "neura.data_mov"(%26) : (!neura.data<i64, i1>) -> !neura.data<i64, i1> %31 = "neura.data_mov"(%29) : (!neura.data<i1, i1>) -> !neura.data<i1, i1> %32 = neura.grant_predicate %30, %31 : !neura.data<i64, i1>, !neura.data<i1, i1> -> !neura.data<i64, i1> neura.ctrl_mov %32 -> %18 : !neura.data<i64, i1> !neura.data<i64, i1> %33 = "neura.data_mov"(%23) : (!neura.data<f32, i1>) -> !neura.data<f32, i1> %34 = "neura.data_mov"(%29) : (!neura.data<i1, i1>) -> !neura.data<i1, i1> %35 = neura.grant_predicate %33, %34 : !neura.data<f32, i1>, !neura.data<i1, i1> -> !neura.data<f32, i1> neura.ctrl_mov %35 -> %15 : !neura.data<f32, i1> !neura.data<f32, i1> %36 = "neura.data_mov"(%29) : (!neura.data<i1, i1>) -> !neura.data<i1, i1> %37 = "neura.not"(%36) : (!neura.data<i1, i1>) -> !neura.data<i1, i1> %38 = "neura.data_mov"(%23) : (!neura.data<f32, i1>) -> !neura.data<f32, i1> %39 = "neura.data_mov"(%37) : (!neura.data<i1, i1>) -> !neura.data<i1, i1> %40 = neura.grant_predicate %38, %39 : !neura.data<f32, i1>, !neura.data<i1, i1> -> !neura.data<f32, i1> %41 = "neura.data_mov"(%40) : (!neura.data<f32, i1>) -> !neura.data<f32, i1> "neura.return"(%41) : (!neura.data<f32, i1>) -> () }

compared with the former one

// MOV: func.func @loop_test() -> f32 attributes {accelerator = "neura"} { // MOV-NEXT: %0 = "neura.constant"() <{predicate = true, value = 10 : i64}> : () -> !neura.data<i64, i1> // MOV-NEXT: %1 = "neura.data_mov"(%0) : (!neura.data<i64, i1>) -> !neura.data<i64, i1> // MOV-NEXT: %2 = "neura.grant_always"(%1) : (!neura.data<i64, i1>) -> !neura.data<i64, i1> // MOV-NEXT: %3 = "neura.constant"() <{predicate = true, value = 0 : i64}> : () -> !neura.data<i64, i1> // MOV-NEXT: %4 = "neura.data_mov"(%3) : (!neura.data<i64, i1>) -> !neura.data<i64, i1> // MOV-NEXT: %5 = "neura.grant_once"(%4) : (!neura.data<i64, i1>) -> !neura.data<i64, i1> // MOV-NEXT: %6 = "neura.constant"() <{predicate = true, value = 1 : i64}> : () -> !neura.data<i64, i1> // MOV-NEXT: %7 = "neura.data_mov"(%6) : (!neura.data<i64, i1>) -> !neura.data<i64, i1> // MOV-NEXT: %8 = "neura.grant_always"(%7) : (!neura.data<i64, i1>) -> !neura.data<i64, i1> // MOV-NEXT: %9 = "neura.constant"() <{predicate = true, value = 3.000000e+00 : f32}> : () -> !neura.data<f32, i1> // MOV-NEXT: %10 = "neura.data_mov"(%9) : (!neura.data<f32, i1>) -> !neura.data<f32, i1> // MOV-NEXT: %11 = "neura.grant_always"(%10) : (!neura.data<f32, i1>) -> !neura.data<f32, i1> // MOV-NEXT: %12 = "neura.constant"() <{predicate = true, value = 0.000000e+00 : f32}> : () -> !neura.data<f32, i1> // MOV-NEXT: %13 = "neura.data_mov"(%12) : (!neura.data<f32, i1>) -> !neura.data<f32, i1> // MOV-NEXT: %14 = "neura.grant_once"(%13) : (!neura.data<f32, i1>) -> !neura.data<f32, i1> // MOV-NEXT: %15 = neura.reserve : !neura.data<i64, i1> // MOV-NEXT: %16 = "neura.data_mov"(%5) : (!neura.data<i64, i1>) -> !neura.data<i64, i1> // MOV-NEXT: %17 = "neura.phi"(%16, %15) : (!neura.data<i64, i1>, !neura.data<i64, i1>) -> !neura.data<i64, i1> // MOV-NEXT: %18 = neura.reserve : !neura.data<f32, i1> // MOV-NEXT: %19 = "neura.data_mov"(%14) : (!neura.data<f32, i1>) -> !neura.data<f32, i1> // MOV-NEXT: %20 = "neura.phi"(%19, %18) : (!neura.data<f32, i1>, !neura.data<f32, i1>) -> !neura.data<f32, i1> // MOV-NEXT: %21 = "neura.data_mov"(%20) : (!neura.data<f32, i1>) -> !neura.data<f32, i1> // MOV-NEXT: %22 = "neura.data_mov"(%11) : (!neura.data<f32, i1>) -> !neura.data<f32, i1> // MOV-NEXT: %23 = "neura.fadd"(%21, %22) : (!neura.data<f32, i1>, !neura.data<f32, i1>) -> !neura.data<f32, i1> // MOV-NEXT: %24 = "neura.data_mov"(%17) : (!neura.data<i64, i1>) -> !neura.data<i64, i1> // MOV-NEXT: %25 = "neura.data_mov"(%8) : (!neura.data<i64, i1>) -> !neura.data<i64, i1> // MOV-NEXT: %26 = "neura.add"(%24, %25) : (!neura.data<i64, i1>, !neura.data<i64, i1>) -> !neura.data<i64, i1> // MOV-NEXT: %27 = "neura.data_mov"(%26) : (!neura.data<i64, i1>) -> !neura.data<i64, i1> // MOV-NEXT: %28 = "neura.data_mov"(%2) : (!neura.data<i64, i1>) -> !neura.data<i64, i1> // MOV-NEXT: %29 = "neura.icmp"(%27, %28) <{cmpType = "slt"}> : (!neura.data<i64, i1>, !neura.data<i64, i1>) -> !neura.data<i1, i1> // MOV-NEXT: %30 = "neura.data_mov"(%29) : (!neura.data<i1, i1>) -> !neura.data<i1, i1> // MOV-NEXT: %31 = "neura.not"(%30) : (!neura.data<i1, i1>) -> !neura.data<i1, i1> // MOV-NEXT: %32 = "neura.data_mov"(%23) : (!neura.data<f32, i1>) -> !neura.data<f32, i1> // MOV-NEXT: %33 = "neura.data_mov"(%31) : (!neura.data<i1, i1>) -> !neura.data<i1, i1> // MOV-NEXT: %34 = neura.grant_predicate %32, %33 : !neura.data<f32, i1>, !neura.data<i1, i1> -> !neura.data<f32, i1> // MOV-NEXT: %35 = "neura.data_mov"(%23) : (!neura.data<f32, i1>) -> !neura.data<f32, i1> // MOV-NEXT: %36 = "neura.data_mov"(%29) : (!neura.data<i1, i1>) -> !neura.data<i1, i1> // MOV-NEXT: %37 = neura.grant_predicate %35, %36 : !neura.data<f32, i1>, !neura.data<i1, i1> -> !neura.data<f32, i1> // MOV-NEXT: neura.ctrl_mov %37 -> %18 : !neura.data<f32, i1> !neura.data<f32, i1> // MOV-NEXT: %38 = "neura.data_mov"(%26) : (!neura.data<i64, i1>) -> !neura.data<i64, i1> // MOV-NEXT: %39 = "neura.data_mov"(%29) : (!neura.data<i1, i1>) -> !neura.data<i1, i1> // MOV-NEXT: %40 = neura.grant_predicate %38, %39 : !neura.data<i64, i1>, !neura.data<i1, i1> -> !neura.data<i64, i1> // MOV-NEXT: neura.ctrl_mov %40 -> %15 : !neura.data<i64, i1> !neura.data<i64, i1> // MOV-NEXT: %41 = "neura.data_mov"(%34) : (!neura.data<f32, i1>) -> !neura.data<f32, i1> // MOV-NEXT: "neura.return"(%41) : (!neura.data<f32, i1>) -> () // MOV-NEXT: }

The difference is that the two phi operations and the corresponding reserve operations appear in different orders. But from ir's view, this does not affect the DFG dependencies.

And now, when I try to use the --map-to-accelerator, it will reach the maxII and exit without generating a legal mapping result.

Got it! Sorry for the problem caused by my unorthodox development process.

No worry :-)

it will reach the maxII and exit without generating a legal mapping result.

It is probably due to unoptimized mapping strategy:

Only pick the lowest cost (highest award) tile to map and ignore all other candidate tiles: https://github.com/coredac/dataflow/blob/c58754e23fb1c4349aefd99054b107342e703167/lib/NeuraDialect/Mapping/mapping_util.cpp#L378-L390

No backtracking, i.e., if that placement or route failed, it just return false for that specific II, without back track to another tile and retry placement and route

Are you able to make a clean PR to just try to make the backtrackable mapping work to restore this test? No need to be super sophisticated though, #59

Sure! I will do it ASAP.

Refactor the Ctrl to Data Flow Implementation Logic

ShangkunLi marked this pull request as ready for review June 29, 2025 15:14

ShangkunLi requested a review from tancheng June 29, 2025 15:14

ShangkunLi closed this Jun 29, 2025

ShangkunLi reopened this Jun 29, 2025

tancheng reviewed Jun 29, 2025

View reviewed changes

lib/NeuraDialect/Transforms/TransformCtrlToDataFlowPass.cpp Show resolved Hide resolved

tancheng reviewed Jul 1, 2025

View reviewed changes

ShangkunLi added 5 commits July 1, 2025 18:59

Enbale nested loop ctrl2data & non-argument cond_br

2f39625

Refactor the ctrl2data logic

fb88dc5

[fix] fix the logic for no-value cond_br edges

dc6b65c

add live-out assertion code & refactor the test scripts

bd26412

temporarily disable the mapping and codegen in branch_for

47ca8f9

ShangkunLi force-pushed the refactor-ctrl2data branch from 1ed1e8c to 47ca8f9 Compare July 1, 2025 11:23

tancheng approved these changes Jul 1, 2025

View reviewed changes

lib/NeuraDialect/Transforms/TransformCtrlToDataFlowPass.cpp Show resolved Hide resolved

lib/NeuraDialect/Transforms/TransformCtrlToDataFlowPass.cpp Outdated Show resolved Hide resolved

ShangkunLi mentioned this pull request Jul 1, 2025

[P1] Enbale a deterministic ctrol to data flow conversion #64

Closed

ShangkunLi added 2 commits July 1, 2025 22:56

[fix] typo

49f805f

add issue link

adf5b20

ShangkunLi merged commit 1d30fcf into coredac:main Jul 1, 2025
1 check passed

ShangkunLi mentioned this pull request Jul 2, 2025

[P1] Fix the problem when lowering nested loop into data flow #54

Closed

tancheng reviewed Jul 5, 2025

View reviewed changes

ShangkunLi added a commit that referenced this pull request Mar 12, 2026

Merge pull request #60 from ShangkunLi/refactor-ctrl2data

394efff

Refactor the Ctrl to Data Flow Implementation Logic

Conversation

ShangkunLi commented Jun 29, 2025

Uh oh!

tancheng commented Jun 29, 2025

Uh oh!

tancheng commented Jun 29, 2025

Uh oh!

ShangkunLi commented Jun 29, 2025

Uh oh!

tancheng commented Jun 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ShangkunLi commented Jun 29, 2025

Uh oh!

ShangkunLi commented Jun 29, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tancheng Jul 5, 2025

Choose a reason for hiding this comment

Uh oh!

ShangkunLi Jul 5, 2025

Choose a reason for hiding this comment

Uh oh!

ShangkunLi Jul 5, 2025

Choose a reason for hiding this comment

Uh oh!

tancheng Jul 5, 2025

Choose a reason for hiding this comment

Uh oh!

ShangkunLi Jul 5, 2025

Choose a reason for hiding this comment

Uh oh!

tancheng Jul 5, 2025

Choose a reason for hiding this comment

Uh oh!

ShangkunLi Jul 5, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tancheng commented Jun 29, 2025 •

edited

Loading