Skip to content

Refactor ctrl2data pass#114

Closed
ShangkunLi wants to merge 1 commit intocoredac:mainfrom
ShangkunLi:revise-ctrl2data
Closed

Refactor ctrl2data pass#114
ShangkunLi wants to merge 1 commit intocoredac:mainfrom
ShangkunLi:revise-ctrl2data

Conversation

@ShangkunLi
Copy link
Copy Markdown
Collaborator

In the existing implementation, we can't ensure the correctness of the value in blocks that are behind the nested loop control/branch control.

There are two examples here:
Example 1 (nested loop)

  func.func @_Z10bert_node1PA1_A1_A1_A1_A128_bPA1_A128_S1_(%arg0: memref<?x1x1x1x1x128xi8>, %arg1: memref<?x1x128x1x1x128xi8>) attributes {accelerator = "neura"} {
    %0 = "neura.constant"() <{predicate = true, value = "%arg0"}> : () -> memref<?x1x1x1x1x128xi8>
    %1 = "neura.constant"() <{predicate = true, value = "%arg1"}> : () -> memref<?x1x128x1x1x128xi8>
    %2 = "neura.constant"() <{value = 1 : i64}> : () -> i64
    %3 = "neura.constant"() <{value = 128 : i64}> : () -> i64
    %4 = "neura.constant"() <{value = 0 : i64}> : () -> i64
    neura.br %4, %3 : i64, i64 to ^bb1
  ^bb1(%5: i64, %6: i64):  // 2 preds: ^bb0, ^bb5
    %7 = "neura.icmp"(%5, %6) <{cmpType = "slt"}> : (i64, i64) -> i1
    neura.cond_br %7 : i1 then %4 : i64 to ^bb2 else to ^bb6
  ^bb2(%8: i64):  // pred: ^bb1
    neura.br %8, %3 : i64, i64 to ^bb3
  ^bb3(%9: i64, %10: i64):  // 2 preds: ^bb2, ^bb4
    %11 = "neura.icmp"(%9, %10) <{cmpType = "slt"}> : (i64, i64) -> i1
    neura.cond_br %11 : i1 then %0, %4, %9, %1, %5, %2, %3 : memref<?x1x1x1x1x128xi8>, i64, i64, memref<?x1x128x1x1x128xi8>, i64, i64, i64 to ^bb4 else %5, %2, %3 : i64, i64, i64 to ^bb5
  ^bb4(%12: memref<?x1x1x1x1x128xi8>, %13: i64, %14: i64, %15: memref<?x1x128x1x1x128xi8>, %16: i64, %17: i64, %18: i64):  // pred: ^bb3
    %19 = neura.load_indexed %12[%13, %13, %13, %13, %13, %14 : i64, i64, i64, i64, i64, i64] memref<?x1x1x1x1x128xi8> : i8
    neura.store_indexed %19 to %15[%13, %13, %16, %13, %13, %14 : i64, i64, i64, i64, i64, i64] memref<?x1x128x1x1x128xi8> : i8
    %20 = "neura.add"(%14, %17) : (i64, i64) -> i64
    neura.br %20, %18 : i64, i64 to ^bb3
  ^bb5(%21: i64, %22: i64, %23: i64):  // pred: ^bb3
    %24 = "neura.add"(%21, %22) : (i64, i64) -> i64
    neura.br %24, %23 : i64, i64 to ^bb1
  ^bb6:  // pred: ^bb1
    "neura.return"() : () -> ()
  }

The predicate bit of the upper bound (%10) of the second loop compare %11 = "neura.icmp"(%9, %10) <{cmpType = "slt"}> : (i64, i64) -> i1 should be determined by the outer loop. If the outer loop is not valid, %10 should be invalid. But current implementation simply uses the %3 without considering %7 = "neura.icmp"(%5, %6) <{cmpType = "slt"}> : (i64, i64) -> i1.

Example 2 (for + if):

func.func @_Z11for_with_ifPi(%arg0: memref<?xi32>) -> i32 attributes {accelerator = "neura", llvm.linkage = #llvm.linkage<external>} {
    %0 = "neura.constant"() <{predicate = true, value = 0 : i32}> : () -> i32
    %1 = "neura.constant"() <{predicate = true, value = 1000 : i32}> : () -> i32
    %2 = "neura.constant"() <{predicate = true, value = 2 : i32}> : () -> i32
    %3 = "neura.constant"() <{predicate = true, value = 1 : i32}> : () -> i32
    %4 = "neura.constant"() <{predicate = true, value = -5 : i32}> : () -> i32
    %5 = "neura.constant"() <{predicate = true, value = 1 : index}> : () -> index
    %6 = "neura.constant"() <{predicate = true, value = 128 : index}> : () -> index
    %7 = "neura.constant"() <{predicate = true, value = 0 : index}> : () -> index
    %8 = "neura.cast"(%7) <{cast_type = "index_to_int"}> : (index) -> i64
    neura.br %8, %0 : i64, i32 to ^bb1
  ^bb1(%9: i64, %10: i32):  // 2 preds: ^bb0, ^bb6
    %11 = "neura.cast"(%9) <{cast_type = "int_to_index"}> : (i64) -> index
    %12 = "neura.icmp"(%11, %6) <{cmpType = "slt"}> : (index, index) -> i1
    neura.cond_br %12 : i1 then to ^bb2 else to ^bb7
  ^bb2:  // pred: ^bb1
    %13 = "neura.icmp"(%10, %1) <{cmpType = "sge"}> : (i32, i32) -> i1
    neura.cond_br %13 : i1 then to ^bb3 else to ^bb4
  ^bb3:  // pred: ^bb2
    %14 = "neura.add"(%10, %4) : (i32, i32) -> i32
    neura.br %14 : i32 to ^bb5
  ^bb4:  // pred: ^bb2
    neura.br %10 : i32 to ^bb5
  ^bb5(%15: i32):  // 2 preds: ^bb3, ^bb4
    neura.br to ^bb6
  ^bb6:  // pred: ^bb5
    %16 = neura.load_indexed %arg0[%11 : index] memref<?xi32> : i32
    %17 = "neura.mul"(%16, %2) : (i32, i32) -> i32
    %18 = "neura.add"(%17, %3) : (i32, i32) -> i32
    %19 = "neura.add"(%15, %18) : (i32, i32) -> i32
    %20 = "neura.add"(%11, %5) : (index, index) -> index
    %21 = "neura.cast"(%20) <{cast_type = "index_to_int"}> : (index) -> i64
    neura.br %21, %19 : i64, i32 to ^bb1
  ^bb7:  // pred: ^bb1
    "neura.return"(%10) : (i32) -> ()
  }

Current implementation cannot grant_predicate live-in values in bb6 because the execution condition of bb6 is the same as bb5. And bb5 has two predecessor blocks; these two predecessors have different execution conditions (e.g., bb3 -> %12 AND %13, bb4 -> %12 AND !%13).

In this pr:

  • We add a calculateBlockExecuteConditions() function to calculate each blocks execution condition
  • Then we can grant_predicate all the live-ins (except live-ins operands in cond_br, because they are guranteed by the condition in cond_br) in each block based on the execution condition

@ShangkunLi ShangkunLi marked this pull request as ready for review August 19, 2025 07:31
@ShangkunLi ShangkunLi added bug Something isn't working new feature New feature or request labels Aug 19, 2025
@ShangkunLi ShangkunLi self-assigned this Aug 19, 2025
@ShangkunLi ShangkunLi linked an issue Aug 19, 2025 that may be closed by this pull request
@ShangkunLi ShangkunLi requested a review from tancheng August 19, 2025 10:26
@tancheng
Copy link
Copy Markdown
Contributor

Hi @ShangkunLi,

  • In example 1, why do we have ^bb2(%8: i64): // pred: ^bb1? Shouldn't we apply canonicalization pass to make bb2 have two arguments?
  • Similarly to example 2, many bbs have no arguments.

let summary = "Bitwise OR operation";
let arguments = (ins AnySignlessInteger:$lhs, AnySignlessInteger:$rhs, Optional<AnyType>:$predicate);
let results = (outs AnySignlessInteger:$result);
let arguments = (ins AnyType:$lhs, AnyType:$rhs, Optional<AnyType>:$predicate);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned in another PR, i.e., do you think we still need this Optional<AnyType>:$predicate? our data now has the predicate bit.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need such Optional<AnyType>:$predicate anymore, we can clean them up together with the predicate attribute in neura.constant.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we may only need grant_predicate has a predicate as operand, which is not optional there. Please file an issue for this cleanup, thanks~!

@ShangkunLi
Copy link
Copy Markdown
Collaborator Author

Hi @ShangkunLi,

  • In example 1, why do we have ^bb2(%8: i64): // pred: ^bb1? Shouldn't we apply canonicalization pass to make bb2 have two arguments?
  • Similarly to example 2, many bbs have no arguments.

Sorry, there is a bug in the current --canonicalize-live-in pass. Will fix it soon.

@tancheng
Copy link
Copy Markdown
Contributor

Hi @ShangkunLi,

  • In example 1, why do we have ^bb2(%8: i64): // pred: ^bb1? Shouldn't we apply canonicalization pass to make bb2 have two arguments?
  • Similarly to example 2, many bbs have no arguments.

Sorry, there is a bug in the current --canonicalize-live-in pass. Will fix it soon.

Shouldn't we fix --canonicalize-live-in pass first, then start from there to resolve any issue/bug during ctrl2dataflow? I just feel your description "The predicate bit of the upper bound (%10) of the second loop compare %11 = "neura.icmp"(%9, %10) <{cmpType = "slt"}> : (i64, i64) -> i1 should be determined by the outer loop." sounds would change if --canonicalize-live-in checked in? Still a bit confused about the problem there.

@ShangkunLi
Copy link
Copy Markdown
Collaborator Author

ShangkunLi commented Aug 19, 2025

Hi @ShangkunLi,

  • In example 1, why do we have ^bb2(%8: i64): // pred: ^bb1? Shouldn't we apply canonicalization pass to make bb2 have two arguments?
  • Similarly to example 2, many bbs have no arguments.

Sorry, there is a bug in the current --canonicalize-live-in pass. Will fix it soon.

Shouldn't we fix --canonicalize-live-in pass first, then start from there to resolve any issue/bug during ctrl2dataflow? I just feel your description "The predicate bit of the upper bound (%10) of the second loop compare %11 = "neura.icmp"(%9, %10) <{cmpType = "slt"}> : (i64, i64) -> i1 should be determined by the outer loop." sounds would change if --canonicalize-live-in checked in? Still a bit confused about the problem there.

Okay, I will split this pr as two prs. Fix the canonicalize-live-in and then enable this more robust transform.

@tancheng
Copy link
Copy Markdown
Contributor

Where is the udpated IR? ^bb2(%8: i64) still the same.

@ShangkunLi
Copy link
Copy Markdown
Collaborator Author

Where is the udpated IR? ^bb2(%8: i64) still the same.

Sorry, typo. Haven't updated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working new feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[P1] Transform Ctrl to Data Flow Error

2 participants