
Remove non-sense neura::cast ops#91

Merged
ShangkunLi merged 1 commit into coredac:main from ShangkunLi:removecast on Jul 22, 2025

Conversation

@ShangkunLi
Collaborator

In this PR:

  • Add a pass to convert all index types to i64
  • Remove non-sense (redundant) index_to_i64 / i64_to_index cast operations
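As a rough illustration of these two steps, here is a toy Python sketch (the op/type encoding here is hypothetical and not the actual MLIR pass): rewrite every index-typed value to i64, then drop casts that have become identities.

```python
def convert_index_to_i64(ops):
    """Rewrite all 'index' types to 'i64' in a list of toy
    (name, operand_types, result_type) ops."""
    out = []
    for name, operand_types, result_type in ops:
        operand_types = ["i64" if t == "index" else t for t in operand_types]
        result_type = "i64" if result_type == "index" else result_type
        out.append((name, operand_types, result_type))
    return out

def drop_identity_casts(ops):
    """Remove cast ops whose source and destination types are now equal."""
    return [op for op in ops
            if not (op[0] == "neura.cast" and op[1][0] == op[2])]
```

Note that in this model an index->i32 cast survives as i64->i32 after conversion (it is not an identity), which matches the behavior described later in the thread.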

@ShangkunLi ShangkunLi marked this pull request as ready for review July 22, 2025 09:58
@ShangkunLi ShangkunLi requested a review from tancheng July 22, 2025 09:58
@tancheng tancheng added the new feature New feature or request label Jul 22, 2025
@tancheng
Contributor

LGTM. But can we try to figure out which pass adds this cast, and why? One reason might be to guarantee the data/value range won't exceed the representable range, especially for a large tensor; however, that range is statically known. But MLIR can support dynamic shapes, in which case the range is unknown, so the cast is reasonable.

@ShangkunLi
Collaborator Author

> LGTM. But can we try to figure out which pass adds this cast, and why? One reason might be to guarantee the data/value range won't exceed the representable range, especially for a large tensor; however, that range is statically known. But MLIR can support dynamic shapes, in which case the range is unknown, so the cast is reasonable.

The index->i64 and i64->index cast ops are introduced by the --convert-cf-to-llvm pass. These casts exist because MLIR by default treats index and i64 as distinct types, so whenever we use a loop index to compute a result, we must explicitly convert the index type to i64, i32, etc.

The reason I can remove the non-sense cast ops is that our CGRA realizes all index-typed values as i64, so most cast ops become identities and can be removed.

Other cast ops such as index->i32 are rewritten as i64->i32 to explicitly denote the type conversion. A test example is added in test/controlflow_fuse/simpleloop/simpleloop.mlir:

```mlir
module {
  func.func @_Z10simpleloopv() -> i32 attributes {accelerator = "neura", llvm.linkage = #llvm.linkage<external>} {
    %0 = "neura.constant"() <{predicate = true, value = 1 : i64}> : () -> i64
    %1 = "neura.constant"() <{predicate = true, value = 128 : i64}> : () -> i64
    %2 = "neura.constant"() <{predicate = true, value = 0 : i32}> : () -> i32
    %3 = "neura.constant"() <{predicate = true, value = 0 : i64}> : () -> i64
    neura.br %3, %2 : i64, i32 to ^bb1
  ^bb1(%4: i64, %5: i32):  // 2 preds: ^bb0, ^bb2
    %6 = "neura.icmp"(%4, %1) <{cmpType = "slt"}> : (i64, i64) -> i1
    neura.cond_br %6 : i1 then to ^bb2 else to ^bb3
  ^bb2:  // pred: ^bb1
    %7 = "neura.cast"(%4) <{cast_type = "i64_to_i32"}> : (i64) -> i32
    %8 = "neura.add"(%5, %7) : (i32, i32) -> i32
    %9 = "neura.add"(%4, %0) : (i64, i64) -> i64
    neura.br %9, %8 : i64, i32 to ^bb1
  ^bb3:  // pred: ^bb1
    "neura.return"(%5) : (i32) -> ()
  }
}
```
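
For reference, here is a small Python model of what an i64_to_i32 cast like %7 above computes, assuming standard two's-complement truncation semantics (an assumption for illustration; the neura.cast definition may differ):

```python
def i64_to_i32(x):
    """Model an i64 -> i32 cast as two's-complement truncation:
    keep the low 32 bits, then reinterpret as a signed i32."""
    x &= 0xFFFFFFFF               # keep the low 32 bits
    return x - 2**32 if x >= 2**31 else x  # reinterpret as signed
```

For loop indices bounded by 128, as in the example, the truncation never changes the value.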

@tancheng
Contributor


Thanks @ShangkunLi. However, if we want to support i64 or any type wider than 32 bits, our RTL must be parameterized at 64 bits, which wastes hardware for the (more common) 32-bit case. Do you think we need a truncateDataTypePass (assuming it is safe to statically change i64 to i32)?

@ShangkunLi
Collaborator Author

ShangkunLi commented Jul 22, 2025

> Thanks @ShangkunLi. However, if we want to support i64 or any type wider than 32 bits, our RTL must be parameterized at 64 bits, which wastes hardware for the (more common) 32-bit case. Do you think we need a truncateDataTypePass (assuming it is safe to statically change i64 to i32)?

It depends on the hardware and the benchmarks. For the simpleloop.mlir case shown above, most types are i64, so it is not safe/realistic to convert all the data to i32. But for other cases where i32 types are dominant, we can do such a conversion.
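
A minimal sketch of the safety check such a truncateDataTypePass could apply (hypothetical, assuming the pass only truncates values whose possible range is statically known): an i64 value may be narrowed to i32 only if every value it can take fits in a signed 32-bit integer.

```python
I32_MIN, I32_MAX = -2**31, 2**31 - 1

def safe_to_truncate(known_values):
    """True if all statically known i64 values fit in a signed i32,
    so narrowing i64 -> i32 cannot change any result."""
    return all(I32_MIN <= v <= I32_MAX for v in known_values)
```

For the simpleloop example, the loop bound 128 and the constants 0 and 1 all fit, so truncation would be safe there; for dynamically shaped inputs the range is unknown and the pass would have to bail out.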

We can create such a pass and leave the choice to the user. Filed an issue: #92.

@ShangkunLi ShangkunLi merged commit 0e5be5c into coredac:main Jul 22, 2025
1 check passed
ShangkunLi added a commit that referenced this pull request Mar 12, 2026
Remove non-sense neura::cast ops
