
Remove non-sense neura::cast ops#91

Merged
ShangkunLi merged 1 commit into coredac:main from ShangkunLi:removecast on Jul 22, 2025

Conversation

@ShangkunLi
Collaborator

In this PR:

  • Add a pass to convert all index types to i64
  • Remove non-sense (redundant) index_to_i64 / i64_to_index cast operations
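As a rough illustration of these two steps, here is a toy Python sketch (the op/type encoding here is hypothetical and not the actual MLIR pass): rewrite every index-typed value to i64, then drop casts that have become identities.

```python
def convert_index_to_i64(ops):
    """Rewrite all 'index' types to 'i64' in a list of toy
    (name, operand_types, result_type) ops."""
    out = []
    for name, operand_types, result_type in ops:
        operand_types = ["i64" if t == "index" else t for t in operand_types]
        result_type = "i64" if result_type == "index" else result_type
        out.append((name, operand_types, result_type))
    return out

def drop_identity_casts(ops):
    """Remove cast ops whose source and destination types are now equal."""
    return [op for op in ops
            if not (op[0] == "neura.cast" and op[1][0] == op[2])]
```

Note that in this model an index->i32 cast survives as i64->i32 after conversion (it is not an identity), which matches the behavior described later in the thread.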

@ShangkunLi ShangkunLi marked this pull request as ready for review July 22, 2025 09:58
@ShangkunLi ShangkunLi requested a review from tancheng July 22, 2025 09:58
@tancheng tancheng added the new feature New feature or request label Jul 22, 2025
@tancheng
Contributor

LGTM. But can we try to figure out which pass adds this cast, and why? One reason might be to guarantee the data/value range won't exceed the representable range, especially for a large tensor; however, that range is statically known. But MLIR can support dynamic shapes, in which case the range is unknown, so the cast is reasonable.

@ShangkunLi
Collaborator Author

> LGTM. But can we try to figure out which pass adds this cast, and why? One reason might be to guarantee the data/value range won't exceed the representable range, especially for a large tensor; however, that range is statically known. But MLIR can support dynamic shapes, in which case the range is unknown, so the cast is reasonable.

The index->i64 and i64->index cast ops are introduced by the --convert-cf-to-llvm pass. These casts exist because MLIR by default treats index and i64 as distinct types, so whenever we use a loop index to compute a result, we must explicitly convert the index type to i64, i32, etc.

The reason I can remove the non-sense cast ops is that our CGRA realizes all index-typed values as i64, so most cast ops become identities and can be removed.

Other cast ops such as index->i32 are rewritten as i64->i32 to explicitly denote the type conversion. A test example is added in test/controlflow_fuse/simpleloop/simpleloop.mlir:

```mlir
module {
  func.func @_Z10simpleloopv() -> i32 attributes {accelerator = "neura", llvm.linkage = #llvm.linkage<external>} {
    %0 = "neura.constant"() <{predicate = true, value = 1 : i64}> : () -> i64
    %1 = "neura.constant"() <{predicate = true, value = 128 : i64}> : () -> i64
    %2 = "neura.constant"() <{predicate = true, value = 0 : i32}> : () -> i32
    %3 = "neura.constant"() <{predicate = true, value = 0 : i64}> : () -> i64
    neura.br %3, %2 : i64, i32 to ^bb1
  ^bb1(%4: i64, %5: i32):  // 2 preds: ^bb0, ^bb2
    %6 = "neura.icmp"(%4, %1) <{cmpType = "slt"}> : (i64, i64) -> i1
    neura.cond_br %6 : i1 then to ^bb2 else to ^bb3
  ^bb2:  // pred: ^bb1
    %7 = "neura.cast"(%4) <{cast_type = "i64_to_i32"}> : (i64) -> i32
    %8 = "neura.add"(%5, %7) : (i32, i32) -> i32
    %9 = "neura.add"(%4, %0) : (i64, i64) -> i64
    neura.br %9, %8 : i64, i32 to ^bb1
  ^bb3:  // pred: ^bb1
    "neura.return"(%5) : (i32) -> ()
  }
}
```
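
For reference, here is a small Python model of what an i64_to_i32 cast like %7 above computes, assuming standard two's-complement truncation semantics (an assumption for illustration; the neura.cast definition may differ):

```python
def i64_to_i32(x):
    """Model an i64 -> i32 cast as two's-complement truncation:
    keep the low 32 bits, then reinterpret as a signed i32."""
    x &= 0xFFFFFFFF               # keep the low 32 bits
    return x - 2**32 if x >= 2**31 else x  # reinterpret as signed
```

For loop indices bounded by 128, as in the example, the truncation never changes the value.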

@tancheng
Contributor


Thanks @ShangkunLi. However, if we want to support i64 or any type wider than 32 bits, our RTL must be parameterized at 64 bits, which wastes hardware for the (more common) 32-bit case. Do you think we need a truncateDataTypePass (assuming it is safe to statically change i64 to i32)?

@ShangkunLi
Collaborator Author

ShangkunLi commented Jul 22, 2025

> Thanks @ShangkunLi. However, if we want to support i64 or any type wider than 32 bits, our RTL must be parameterized at 64 bits, which wastes hardware for the (more common) 32-bit case. Do you think we need a truncateDataTypePass (assuming it is safe to statically change i64 to i32)?

It depends on the hardware and the benchmarks. For the simpleloop.mlir case shown above, most types are i64, so it is not safe/realistic to convert all the data to i32. But for other cases where i32 types are dominant, we can do such a conversion.
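
A minimal sketch of the safety check such a truncateDataTypePass could apply (hypothetical, assuming the pass only truncates values whose possible range is statically known): an i64 value may be narrowed to i32 only if every value it can take fits in a signed 32-bit integer.

```python
I32_MIN, I32_MAX = -2**31, 2**31 - 1

def safe_to_truncate(known_values):
    """True if all statically known i64 values fit in a signed i32,
    so narrowing i64 -> i32 cannot change any result."""
    return all(I32_MIN <= v <= I32_MAX for v in known_values)
```

For the simpleloop example, the loop bound 128 and the constants 0 and 1 all fit, so truncation would be safe there; for dynamically shaped inputs the range is unknown and the pass would have to bail out.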

We can create such a pass and leave the choice to the user. Filed an issue: #92.

@ShangkunLi ShangkunLi merged commit 0e5be5c into coredac:main Jul 22, 2025
1 check passed
ShangkunLi added a commit that referenced this pull request Mar 12, 2026
Remove non-sense neura::cast ops
