Questions about msp_done, pd, and comb_reg #1990

jmsguevara · 2024-03-28T15:16:12Z


Hi! I was looking at the Calyx and Verilog code outputs from the systolic array generator and I was confused by some of the additional registers/FSMs that the Calyx compiler generates in the Verilog code. If my understanding is correct, these components I observed in the systolic array generator are not specific to that application:

1. For some instructions that are executed in parallel, Calyx generates an msp_done wire with logic tied to the done signals of the associated registers. For other instructions, Calyx generates pd registers which receive done signals instead. Inserting pd registers instead of relying on purely combinational logic as in the msp_done scenario adds latency, so why does this happen?
2. For instructions that involve combinational logic with a registered output (for example incrementing a memory index register), the Calyx compiler splits these across two FSM states in Verilog with a register designated as "comb_reg" inserted in between. Is there a drawback of some sort for keeping those signals within the same FSM state?
3. Regarding the previous point, I also saw that executing instructions of this type in parallel with other instructions results in Calyx generating additional FSMs for the aforementioned instructions (because incrementing a memory index register requires multiple states). Generating additional FSMs incurs a latency penalty because the top-level FSM has to wait for the done state before it can proceed, but if it's feasible to bypass the comb_reg insertion then would it also be feasible to bypass this additional FSM generation?

I'm not quite sure, but it seems like the insertion of these registers is tied to particular compiler passes. If that's the case then I'd appreciate your help in understanding the importance of those compiler passes, what they do, and whether or not they can be removed safely (for a particular definition of "safely").

Sorry if the questions seem obvious; I was going through the Verilog code and couldn't figure out why these components were being inserted. Thank you very much for your time and help!

rachitnigam (Maintainer) · 2024-04-15T15:23:14Z

Hi @jmsguevara! Thanks for starting a discussion. By and large, these registers are added (and sometimes eliminated) during the compilation process. If you're interested in seeing how the compiler transforms the program pass by pass, you can run it with the --dump-ir flag: calyx <file> -b verilog --dump-ir.
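If the per-pass output is long, it can help to capture it in a file and search it for names such as pd or comb_reg. Here is a minimal sketch of a wrapper, assuming the compiler binary is on PATH as calyx. The helper names dump_ir_command and run_dump_ir are hypothetical; only the flags quoted above are used.

```python
import shutil
import subprocess


def dump_ir_command(futil_path):
    """Build the argv for the command quoted above (flags copied verbatim)."""
    return ["calyx", futil_path, "-b", "verilog", "--dump-ir"]


def run_dump_ir(futil_path, log_path):
    """Run the compiler and save everything it prints to log_path.

    Returns the exit code, or None when no `calyx` binary is on PATH.
    """
    if shutil.which("calyx") is None:
        return None
    # stderr is merged into stdout because this sketch makes no assumption
    # about which stream the per-pass dump is written to.
    with open(log_path, "w") as log:
        proc = subprocess.run(
            dump_ir_command(futil_path),
            stdout=log,
            stderr=subprocess.STDOUT,
        )
    return proc.returncode
```

Calling run_dump_ir("systolic.futil", "passes.log") would then leave the full output in passes.log for inspection with a text editor or grep.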

For the most part, the answer to the question is that "the compiler determined this is the best trade-off to make". We don't do any sophisticated delay modeling, so some of these trade-offs might be sub-optimal. Because Calyx aims to support more than just systolic microarchitectures, some of these optimizations are unavailable; it should be fairly straightforward to write a systolic array that is implemented more efficiently. The real challenge with the systolic array generator is all the post-ops that we generate and optimize the control code for. If you're trying to get a baseline working, I would recommend understanding the new implementation as described in Section 7 of our new paper.

I'll tag @calebmkim to discuss the specific details here since he worked on the systolic array generator.


calebmkim (Collaborator) · 2024-04-16T18:31:04Z

Hi @jmsguevara, many thanks for the detailed questions.

Are you using an older commit for the systolic array stuff? For example, I ran

./frontends/systolic-lang/gen-systolic.py -tl 8 -td 8 -ll 8 -ld 8 > systolic.futil
fud e --to synth-verilog --from calyx systolic.futil

The result contained no msp_done or comb_reg. If you want to take a look at the most up-to-date systolic array, you can try running git pull on main and retrying. If you do that, you'll get some warnings from the Calyx compiler; this is fine, but if they are too annoying to deal with, you can add -s calyx.flags "-d well-formed" to the end of the fud commands.

Either way though, I'll try to answer your questions, based on the commit I think you are using:

1. msp is, I think, generated by the merge-static-par pass, which we have since removed from the compiler passes.
2. My guess for this (and number 3) is that it has something to do with how we compile the if __ with { ... } statements. If this is the case, you're right to point out that it adds latency; we've had discussions about this problem.
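For the latency point raised in the first question (a combinational msp_done wire versus pd registers that receive the done signals), a toy cycle-level model may help. This is a sketch under stated assumptions, not what the Calyx compiler actually emits: it assumes each parallel child raises done for exactly one cycle, and simulate_par and its parameters are hypothetical names.

```python
def simulate_par(latencies, registered):
    """Return the first cycle at which the parallel block sees "all done".

    latencies:  cycle at which each child pulses its done signal (assumed).
    registered: True models 1-bit registers that latch each done pulse;
                False models a purely combinational AND of the done pulses.
    Returns None if "all done" is never observed within the horizon.
    """
    pd = [False] * len(latencies)  # hypothetical latched done bits
    horizon = max(latencies) + 2
    for cycle in range(1, horizon + 1):
        # One-cycle done pulse per child.
        done_pulse = [cycle == lat for lat in latencies]
        if registered:
            if all(pd):
                return cycle
            # Register writes become visible on the next cycle.
            pd = [p or d for p, d in zip(pd, done_pulse)]
        elif all(done_pulse):
            return cycle
    return None


print(simulate_par([1, 1], registered=False))  # 1
print(simulate_par([1, 1], registered=True))   # 2
print(simulate_par([1, 3], registered=False))  # None
print(simulate_par([1, 3], registered=True))   # 4
```

In this model the combinational AND only works when every child finishes in the same cycle, which is where it saves a cycle. When the children finish at different times, the done pulses never overlap, so something has to remember the earlier ones, and the registered version pays one extra cycle for that.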

Let me know if you'd like to expand on any of these points
