
add interpreter dataflow mode #105

Merged
tancheng merged 6 commits into main from interpreter-dataflow-mode
Aug 21, 2025
Conversation

Collaborator

@itemkelvin itemkelvin commented Aug 6, 2025

The dataflow execution process (from the provided code context) is as follows:

  1. Set up a dependency graph (value_users) tracking which operations depend on each value.
  2. Initialize a worklist. Operations with no dependencies (e.g., constants) are added to the worklist.
  3. Execute operations from the worklist:
  • Fetch an operation and validate that all its input operands in value_map are valid (via the predicate).
  • Execute the operation (e.g., neura.or, neura.sel) using operand values from value_map.
  • Store the result (with a combined validity predicate) back into value_map.
  • Add dependent operations (from value_users) to the worklist for subsequent processing.
  4. The loop ends when the worklist is empty.
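The worklist loop described above can be sketched as a small self-contained program. This is a simplified, MLIR-free sketch: the PredicatedData and Op structs and the run() helper are illustrative stand-ins for the interpreter's actual types, not its real API.

```cpp
#include <deque>
#include <functional>
#include <map>
#include <vector>

// Stand-in for the interpreter's predicated values, mirroring !neura.data<T, i1>:
// each value carries data plus a validity predicate.
struct PredicatedData {
  double value = 0.0;
  bool predicate = false;
};

// Stand-in for an operation: reads some value ids, writes one value id.
struct Op {
  std::vector<int> inputs;  // Value ids this op reads.
  int output;               // Value id this op writes.
  std::function<double(const std::vector<double>&)> compute;
};

// Runs the worklist loop: ops whose operands are all valid execute and
// propagate to their users; ops with missing operands are skipped and
// retried once their producers fire.
std::map<int, PredicatedData> run(std::vector<Op>& ops) {
  std::map<int, PredicatedData> value_map;
  std::map<int, std::vector<Op*>> value_users;  // Value id -> consumer ops.
  std::deque<Op*> worklist;
  for (Op& op : ops) {
    for (int in : op.inputs) value_users[in].push_back(&op);
    if (op.inputs.empty()) worklist.push_back(&op);  // E.g., constants.
  }
  while (!worklist.empty()) {
    Op* op = worklist.front();
    worklist.pop_front();
    std::vector<double> args;
    bool ready = true;
    for (int in : op->inputs) {
      auto it = value_map.find(in);
      if (it == value_map.end() || !it->second.predicate) { ready = false; break; }
      args.push_back(it->second.value);
    }
    if (!ready) continue;  // Not all operands valid yet; a producer re-enqueues us.
    value_map[op->output] = {op->compute(args), true};
    for (Op* user : value_users[op->output]) worklist.push_back(user);
  }
  return value_map;
}
```

Constants seed the worklist, and each successful execution enqueues the users of the value it produced, which is the propagation step in point 3 above.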
```
FAIL: Neura Dialect Tests :: neura/interpreter/lower_and_interpret.mlir (30 of 31)
FAIL: Neura Dialect Tests :: neura/interpreter/lower_and_interpret_subf.mlir (31 of 31)
********************
Unresolved Tests (2):
  Neura Dialect Tests :: neura/interpreter/Output/lower_and_interpret.mlir.tmp-lowered-to-llvm.mlir
  Neura Dialect Tests :: neura/interpreter/Output/lower_and_interpret_subf.mlir.tmp-lowered-to-llvm.mlir
********************
Failed Tests (2):
  Neura Dialect Tests :: neura/interpreter/lower_and_interpret.mlir
  Neura Dialect Tests :: neura/interpreter/lower_and_interpret_subf.mlir

Testing Time: 0.23s

Total Discovered Tests: 31
  Passed    : 27 (87.10%)
  Unresolved:  2 (6.45%)
  Failed    :  2 (6.45%)

1 warning(s) in tests
```

There are still two test cases that fail, and they weren't written by me.

Contributor

tancheng commented Aug 6, 2025

Thanks @itemkelvin for the prototyping~!

  • Plz provide a brief explanation of your design in this PR's description. I am most interested in how dataflow is handled differently from ctrlflow, what additional stuff is needed, and whether it is possible to merge some common logic in the code.
  • Plz add a comment (descriptive style, third-person verb, ending with a period, i.e., // Explains sth as comment.) above each newly introduced func.
  • Plz use snake_case for all the variables.

I appreciate the effort~!

@tancheng tancheng added the new feature (New feature or request) label Aug 6, 2025
@itemkelvin
Collaborator Author

> Thanks @itemkelvin for the prototyping~!
>
> • Plz provide brief explanation about your design in this PR's description. I am most interested in how dataflow is handled different from ctrlflow, what additional stuff is needed, and is it possible to merge some common logic in the code.
> • Plz add comment (descriptive style, third person verb, and end with period, i.e., // Explains sth as comment.) above each newly introduced func.
> • Plz use snake_case for all the variables.
>
> I appreciate the effort~!

The dataflow execution process (from the provided code context) is as follows:

  1. Set up a dependency graph (value_users) tracking which operations depend on each value.
  2. Initialize a worklist. Operations with no dependencies (e.g., constants) are added to the worklist.
  3. Execute operations from the worklist:
  • Fetch an operation and validate that all its input operands in value_map are valid (via the predicate).
  • Execute the operation (e.g., neura.or, neura.sel) using operand values from value_map.
  • Store the result (with a combined validity predicate) back into value_map.
  • Add dependent operations (from value_users) to the worklist for subsequent processing.
  4. The loop ends when the worklist is empty.

Contributor

tancheng commented Aug 7, 2025

Thanks @itemkelvin for the summarization; I put your summary into the PR's description.

> Initialize a worklist. Operations with no dependencies (e.g., constants) are added to the worklist.

This sounds like a topological sort first, then putting the sorted stuff into a map. So can we reuse:

```cpp
std::vector<Operation *>
mlir::neura::getTopologicallySortedOps(Operation *func_op) {
  std::vector<Operation *> sorted_ops;
  llvm::DenseMap<Operation *, int> pending_deps;
  std::deque<Operation *> ready_queue;

  // Collects recurrence cycle ops.
  auto recurrence_cycles = collectRecurrenceCycles(func_op);
  llvm::DenseSet<Operation *> recurrence_ops;
  for (const auto &cycle : recurrence_cycles)
    for (Operation *op : cycle.operations)
      recurrence_ops.insert(op);

  // Counts unresolved dependencies for each op.
  func_op->walk([&](Operation *op) {
    if (op == func_op) {
      return;
    }
    int dep_count = 0;
    for (Value operand : op->getOperands()) {
      if (operand.getDefiningOp()) {
        ++dep_count;
      }
    }
    pending_deps[op] = dep_count;
    if (dep_count == 0) {
      // TODO: Prioritize recurrence ops. But cause compiled II regression.
      // https://github.com/coredac/dataflow/issues/59.
      if (recurrence_ops.contains(op)) {
        // ready_queue.push_front(op);
        ready_queue.push_back(op);
      } else {
        ready_queue.push_back(op);
      }
    }
  });

  // BFS-style topological sort with recurrence priority.
  while (!ready_queue.empty()) {
    Operation *op = ready_queue.front();
    ready_queue.pop_front();
    sorted_ops.push_back(op);
    for (Value result : op->getResults()) {
      for (Operation *user : result.getUsers()) {
        if (--pending_deps[user] == 0) {
          // TODO: Prioritize recurrence ops. But cause compiled II regression.
          // https://github.com/coredac/dataflow/issues/59.
          if (recurrence_ops.contains(user)) {
            // ready_queue.push_front(user);
            ready_queue.push_back(user);
          } else {
            ready_queue.push_back(user);
          }
        }
      }
    }
  }
  return sorted_ops;
}
```
i.e., create a util/op_util.cc file, move that getTopologicallySortedOps() into that file, and use it in your code.
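For illustration, the same BFS-style (Kahn) topological sort can be sketched without the MLIR dependencies. The topo_sort function and its int-based graph encoding here are hypothetical simplifications, not part of the PR.

```cpp
#include <deque>
#include <map>
#include <vector>

// Simplified, MLIR-free sketch of the BFS-style (Kahn) topological sort above:
// ops are ints 0..num_ops-1, and users[a] lists the ops that consume a's result.
std::vector<int> topo_sort(int num_ops,
                           const std::map<int, std::vector<int>>& users) {
  std::vector<int> sorted_ops;
  std::vector<int> pending_deps(num_ops, 0);  // Unresolved dependency counts.
  std::deque<int> ready_queue;
  // Counts unresolved dependencies for each op.
  for (const auto& [op, consumers] : users)
    for (int user : consumers) ++pending_deps[user];
  // Ops with no dependencies seed the ready queue.
  for (int op = 0; op < num_ops; ++op)
    if (pending_deps[op] == 0) ready_queue.push_back(op);
  // Pops ready ops and releases their users as dependencies resolve.
  while (!ready_queue.empty()) {
    int op = ready_queue.front();
    ready_queue.pop_front();
    sorted_ops.push_back(op);
    auto it = users.find(op);
    if (it == users.end()) continue;
    for (int user : it->second)
      if (--pending_deps[user] == 0) ready_queue.push_back(user);
  }
  return sorted_ops;
}
```

The recurrence-cycle prioritization in the real getTopologicallySortedOps() is omitted here; this sketch only shows the core ready-queue mechanics being suggested for reuse.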

value_map & value_users

These two are a little bit confusing. I would suggest renaming value_map to value_to_predicated_data_map, and value_users to value_to_users_map; how does this sound? We would then know what they are supposed to serve/do from the naming.

> The loop ends when the worklist is empty.

Can you explain when the worklist would be empty? From your description, "Add dependent operations (from value_users) to the worklist", I didn't see when we skip adding dependent users into the worklist, so it sounds like it would never end.

@itemkelvin
Collaborator Author

> Can you explain when the worklist would be empty? From your description, "Add dependent operations (from value_users) to the worklist", I didn't see when we skip adding dependent users into the worklist, so it sounds like it would never end.

In the executeOperation function, the interpreter determines whether an operation's result (including both the value and the predicate) has changed. Only when the result is actually updated (is_updated = 1) does it propagate to downstream users by adding them to the worklist.

For example, when executing ctrl_mov, if the input predicate is true and the target value is updated, the interpreter logs:

```
[neura-interpreter]  Executing neura.ctrl_mov(dataflow):
[neura-interpreter]  ├─ Source: 1.000000e+00 | 1
[neura-interpreter]  ├─ Target (after): 1.000000e+00 | 1 | is_updated=1
[neura-interpreter]  └─ Execution succeeded
[neura-interpreter]  Operation updated, propagating to users...
[neura-interpreter]  Added user to next work_list: %14 = "neura.phi"(%13, %1) : (!neura.data<i64, i1>, !neura.data<i64, i1>) -> !neura.data<i64, i1>
[neura-interpreter]  Added user to next work_list: neura.ctrl_mov %18 -> %13 : !neura.data<i64, i1> !neura.data<i64, i1>
```

Otherwise, if there is no meaningful change, dependent operations are not added to the worklist:

```
[neura-interpreter]  Executing neura.ctrl_mov(dataflow):
[neura-interpreter]  ├─ Skip update: Source predicate invalid (pred=0)
[neura-interpreter]  ├─ Source: 1.000000e+01 | 0
[neura-interpreter]  ├─ Target (after): 1.000000e+01 | 1 | is_updated=0
[neura-interpreter]  └─ Execution succeeded
[neura-interpreter]  No update for ctrl_mov target: %5 = neura.reserve : !neura.data<i64, i1>
```

This selective propagation mechanism ensures that only meaningful changes trigger re-execution, which prevents unnecessary work and guarantees that the worklist eventually empties, allowing interpretation to terminate.
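The is_updated gating described above can be sketched as follows. Note that store_result and this PredicatedData struct are hypothetical stand-ins for the interpreter's write-back logic, shown only to make the termination argument concrete.

```cpp
#include <map>

// Stand-in for the interpreter's predicated values (value + validity predicate).
struct PredicatedData {
  double value = 0.0;
  bool predicate = false;
  bool operator==(const PredicatedData& other) const {
    return value == other.value && predicate == other.predicate;
  }
};

// Stores `result` for `value_id`; returns true (is_updated) only when the
// stored data actually changed, so the caller knows whether to enqueue users.
bool store_result(std::map<int, PredicatedData>& value_map, int value_id,
                  const PredicatedData& result) {
  auto it = value_map.find(value_id);
  if (it != value_map.end() && it->second == result) {
    return false;  // No meaningful change: users are not re-enqueued.
  }
  value_map[value_id] = result;
  return true;  // Changed: caller propagates to downstream users.
}
```

Because an unchanged result returns false and enqueues nothing, re-execution stops once values reach a fixed point, which is why the worklist eventually drains.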

Contributor

@tancheng tancheng left a comment

Thanks, LGTM :-)

Collaborator Author

itemkelvin commented Aug 21, 2025 via email

@tancheng
Contributor

> This queue only handles data-flow execution. Jeromer
>
> ------------------ Original message ------------------
> From: "Cheng"
> Sent: Friday, August 22, 2025, 00:00
> Subject: Re: [coredac/dataflow] add interpreter dataflow mode (PR #105)
>
> @tancheng commented on this pull request, in tools/neura-interpreter/neura-interpreter.cpp (on the block that adds users of affected values to next_pending_operation_queue): "We are also leveraging this queue for ctrl-flow execution, right? I am wondering how it could correctly handle if/else or certain ctrl flow, i.e., an operation/value has multiple users: would both if/else users be inserted, or can we correctly insert only the chosen path? Or will that scenario never occur, as ctrl-flow execution would have a br to make the user op be identified correctly?"

Thanks, plz reply to the pending comments in this PR, so that I can "resolve" them and merge this PR.

@itemkelvin
Collaborator Author

> Thanks, plz reply on the pending comments in this PR, so then I can "resolve" them, and merge this PR.

All pending comments have been replied to.

@tancheng tancheng merged commit 7502a45 into main Aug 21, 2025
1 check passed
ShangkunLi pushed a commit that referenced this pull request Mar 12, 2026

Labels

new feature (New feature or request)
