Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combinational loops created when invoked component has single group in control sequence #2198

Open
nathanielnrn opened this issue Jul 5, 2024 · 8 comments
Labels
S: Discussion needed Issues blocked on discussion Type: Bug Bug in the implementation

Comments

@nathanielnrn
Copy link
Contributor

nathanielnrn commented Jul 5, 2024

I think the title describes what is happening, but I'm not sure. Any input people have is appreciated!

I’ve tried to pin down where some combinational loops are being generated when using invokes. I think it’s a symptom of the invoke getting “hoisted” in a way such that the invoked component’s done signal relies on an input signal, after hoisting.

It seems like a main with a group that writes to a “concrete” register, followed by an invoke to a sub-component that writes to that same register, as a ref, creates a combinational loop. Something like this:

component invokee() -> () {
  cells {
    ref ref_reg = std_reg(32);
  }
  wires {
    //writes to the ref register
    group invokee_ref_reg_write {
      ref_reg.in = 32'd0;
      ref_reg.write_en = 1'd1;
      invokee_ref_reg_write[done] = ref_reg.done;
    }
  }
  control {
    invokee_ref_reg_write;
  }
}
component wrapper<"toplevel"=1>() -> () {
  cells {
    concrete_reg = std_reg(32);
    invokee = invokee();
  }
  wires {
    //writes to the concrete register
    group invoker_reg_write {
      concrete_reg.in = 32'd0;
      concrete_reg.write_en = 1'd1;
      invoker_reg_write[done] = concrete_reg.done;
    }
  }
  control {
    invoker_reg_write;
    invoke invokee[ref_reg=concrete_reg]()();
  }
}

Looking at the the flattened version of the above program, I think this is a symptom of #621. The empty control block of invokee, and the combinational dependence of invokee.done on an input to the component, I think is what @rachitnigam describes in this comment coming into play.

In terms of an actionable way forward, one approach could modify the compile_invoke pass to flatten such components that only have a single group in their control block into the component doing the invoking itself? So in the flattened example, we would remove invokee as a component and have invokee's sole group invokee_ref_reg_write live in the wires block of wrapper. In this way we would avoid the combinational-groups that get produced with the current pass.

Happy to hear any thoughts about other possible ways forward.

@nathanielnrn nathanielnrn added S: Discussion needed Issues blocked on discussion Type: Bug Bug in the implementation labels Jul 5, 2024
@nathanielnrn
Copy link
Contributor Author

In case it's of interest, the verilog produced can be found here

And the error message verilator produces is

%Warning-UNOPTFLAT: dumpvars.v:380:7: Signal unoptimizable: Circular combinational logic: 'wrapper.invokee_ref_reg_done'
  380 | logic invokee_ref_reg_done;
      |       ^~~~~~~~~~~~~~~~~~~~
                    ... For warning description see https://verilator.org/warn/UNOPTFLAT?v=5.006
                    ... Use "/* verilator lint_off UNOPTFLAT */" and lint_on around source to disable this message.
                    dumpvars.v:380:7:      Example path: wrapper.invokee_ref_reg_done
                    dumpvars.v:532:22:      Example path: ASSIGNW
                    dumpvars.v:391:7:      Example path: wrapper.invoke1_go_in
                    dumpvars.v:517:29:      Example path: ASSIGNW
                    dumpvars.v:380:7:      Example path: wrapper.invokee_ref_reg_done
dot -Tpdf -o ~/a.pdf sim_build//Vwrapper_059_unoptflat.dot
%Error: Exiting due to 1 warning(s)

@nathanielnrn nathanielnrn changed the title invokes create combinational loops when invoked component has single group in control sequence Combinational loops created when invoked component has single group in control sequence Jul 5, 2024
@calebmkim
Copy link
Contributor

calebmkim commented Jul 9, 2024

Sorry if this is a dumb question, but what is the easiest way to reproduce this error?

As in, what command should I run?

@nathanielnrn
Copy link
Contributor Author

Not dumb! If the example above is named minimized.futil then

fud2 minimized.futil --from calyx --to verilog-noverify -o minimized-verilog.v
then
verilator -cc --exe -Mdir sim_build/ -DCOCOTB_SIM=1 --top-module wrapper --timing --timescale 1ns/1ps ./minimized-verilog.v
should reproduce.

@sampsyo
Copy link
Contributor

sampsyo commented Jul 9, 2024

FWIW, while I continue to dig into this a bit, here's a one-liner for detecting the problem:

verilator --lint-only --top-module wrapper =(fud2 minimized.futil --to verilog)

=(...) is a bash-ism that puts a command's stdout into a temporary file and passes along that filename.

@sampsyo
Copy link
Contributor

sampsyo commented Jul 9, 2024

This one's a doozy!! Thanks for the compact reproduction.

understanding the problem

I tried to pull out the core of the problem in the lowered Calyx. The offending lines are:

In wrapper:

invokee.go = invoke0_go.out ? 1'd1;
invokee.ref_reg_done = invoke0_go.out ? concrete_reg.done;
invoke0_go.in = !invoke0_done.out & fsm.out == 2'd1 & tdcc_go.out ? 1'd1;
invoke0_done.in = invokee.done;

In invokee:

done = invokee_ref_reg_write_done.out ? 1'd1;
invokee_ref_reg_write_done.in = ref_reg_done;
invokee_ref_reg_write_go.in = go;

Simplifying all the std_wires and stuff down, the combinational cycle looks sorta like this:

invokee.ref_reg_done = !invokee.done & (some FSM stuff) ? concrete_reg.done;
invokee.done = invokee.ref_reg_done;

That is, the compiler is trying to make sure ref_reg_done (like all the assignments involved in the invocation) only gets assigned when the invocation is currently ongoing, i.e., the invokee is not yet done. But the invocation wants to determine whether or not it is done using ref_reg_done. Hence the cycle.

Another way of seeing the problem is to look at the result of compile-invoke, which generates this group:

group invoke0 {
  concrete_reg.in = invokee.ref_reg_in;
  concrete_reg.write_en = invokee.ref_reg_write_en;
  invokee.ref_reg_out = concrete_reg.out;
  invokee.ref_reg_done = concrete_reg.done;
  invokee.go = 1'd1;
  invoke0[done] = invokee.done;
}

These lines cause the problem:

invokee.ref_reg_done = concrete_reg.done;
invoke0[done] = invokee.done;

The assignment to ref_reg_done needs to be guarded on the group's doneness. But of course this assignment also determines the group's doneness. The compiler can't know that only by looking at wrapper; divining this dependency requires a whole-program analysis.

potential solutions

Way back when, in #621 (comment), @rachitnigam distilled one option:

Another ponderous solution to this problem, or rather a guarantee that should possibly be provided by the compiler, is that there is no combinational path between a component's go and done signals.

Which we would have to generalize (disallow combinational paths from any input to the done port). That would do it; the only problem is that this perfectly innocuous-looking ref generates a component that does just that (i.e., creates a combinational path from an input to done where one did not otherwise exist).

While this requirement seems awkward, maybe it is necessary to make it feasible to implement invokes correctly. We would then have the problem of figuring out what to do with ref cells that are wired up to done ports… disallowing them seems pretty annoying, but maybe we can do something better, such as marking the relevant input port to say "hey, this one is actually my done signal."

And @nathanielnrn suggests:

In terms of an actionable way forward, one approach could modify the [compile_invoke pass](https://github.com/calyxir/calyx/blob/ed4ce6e5980dfb61a289354231dd514105248c7e/calyx-opt/src/passes/compile_invoke.rs) to flatten such components that only have a single group in their control block into the component doing the invoking itself?

This would amount to inlining the invoke. Which is a perfectly reasonable thing to do, but it feels kinda bad to require inlining for correctness. Deciding whether or not to inline based on the control program is also a little problematic: it seems to imply that there is an underlying criterion that makes a given component require inlining for correctness, but that we can only detect this property if it has not had its control compiled away yet. So we would still have problems with components that behave the same way but are fully structural (e.g., the result of TDCC'ing a single-statement component).

an aside

As an aside, it would be very handy to have (but probably not worth it to build) a tool that can detect combinational cycles at the Calyx level, just for debugging compiler bugs, without emitting Verilog...

@rachitnigam
Copy link
Contributor

In terms of an actionable way forward, one approach could modify the compile_invoke pass to flatten such components that only have a single group in their control block into the component doing the invoking itself?

Another possible approach I've thought of in the past is making sure that every component gets an FSM, even if there is exactly one group in it. This would allow us to redirect the go and done signals and make sure that we do not get combinational loops.

For assignments of this form:

 invokee.ref_reg_done = concrete_reg.done;

There is some sort of "done-forwarding" being performed that should not be treated as a normal assignment. This is possibly related to #1034 where we have a similar problem with clk and reset signals needing special-cased forwarding compared to normal data signals.

I think there is some higher-level problem/question about how to think about control vs. data signals (which keeps coming up) and something that we should address with a possible redesign.


=(fud2 minimized.futil --to verilog)

I did not know this! Super useful!

@nathanielnrn
Copy link
Contributor Author

Having thought about this a bit more, I think it can be useful to have a component's/group's done signal be driven by an input. For example an AXI write channel can be considered done when WLAST is high (To be precise, it would actually be done in the cycle after WLAST is high, but I think this gets the point across). So I think that

making sure that every component gets an FSM, even if there is exactly one group in it

might be the right way forward.

@sampsyo
Copy link
Contributor

sampsyo commented Jul 11, 2024

Thanks for the discussion. To distill the option space, one way to think about it is that we need to respect the Calyx requirement that every (non-combinational) component takes at least 1 cycle to run. I think there are two possible directions:

  • prohibit, distinguish, or specially handle component inputs that are wired up to their @done signal
  • compile components like the one in this example so that, even when an input is wired up to @done, they still take one cycle to run (i.e., they ignore the @done-wired input until the cycle after their @go goes high)

I think "making sure every component gets an FSM" amounts to a pretty good approach to the second path.

nathanielnrn added a commit that referenced this issue Jul 24, 2024
This works around #2198 causing combinational loops
nathanielnrn added a commit that referenced this issue Aug 28, 2024
…ctually used in an `invoke` (#2216)

These changes make it so that only signals used by a component are
turned into assignments when said component is `invoke`d.

I added some comments trying to point out what changes were made as the
naming and keeping track of what is held in which data structure can be
a bit confusing.

For example, consider this reader component, which manually drives a
`manually_driven_ref_reg_out` signal

```
component reader() -> (manually_driven_ref_reg_out : 32) {
  cells {
    ref ref_reg = std_reg(32);
  }
  wires {
    //reads from the ref register
    group read_from_ref_reg {
      manually_driven_ref_reg_out = ref_reg.out;
      read_from_ref_reg[done] = ref_reg.done;
    }
  }
  control {
    read_from_ref_reg;
  }
}
```

If we invoke this component in main and pass in a `concrete_cell`:
`invoke reader[ref_reg=concrete_reg]()();`

This control statement will now get compiled to a group that only
assigns to the ports used in `reader`. So in this case,
`concrete_reg.out`.
Notably, `concrete_reg`'s `write_en` and `in` signals do not appear
here. Before this PR, they would have.
```
group invoke0 {
      reader.manually_driven_ref_reg_out = concrete_reg.out;
      reader.go = 1'd1;
      invoke0[done] = reader.done;
    }
```

This partially address the issue of [passing in `ref` cells in
parallel](https://calyx.zulipchat.com/#narrow/stream/423433-general/topic/parallel.20ref.20passing).
In particular, correctly models behavior in verilog in cases where ports
driven/accessed by components that are passed in the same concrete cell
in parallel are mutually exclusive. (i.e. a write component that only
drive a register's `in` and `write_en` ports.)

So one could now imagine the following control block valid:

```
control {
  par {
    invoke reader[ref_reg = concrete_reg]()();
    invoke writer[ref_reg = concrete_reg]()();
  }
}
```

---
PS: Worth noting that the `reader` component as written above would
currently run into issues due to #2198
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S: Discussion needed Issues blocked on discussion Type: Bug Bug in the implementation
Projects
None yet
Development

No branches or pull requests

4 participants