Questions about msp_done, pd, and comb_reg #1990
Replies: 2 comments
Hi @jmsguevara! Thanks for starting a discussion. By and large these registers are added (and sometimes eliminated) during the compilation process. If you're interested in figuring out how the compiler is transforming the program pass-by-pass, you can run it with the

For the most part, the answer to the question is that "the compiler determined this is the best trade-off to make". We don't do any sophisticated delay modeling, so some of these trade-offs might be sub-optimal. Because Calyx aims to support more than just systolic microarchitectures, some of these optimizations are unavailable; it should be fairly straightforward to write a systolic array that is implemented more efficiently. The real challenge with the systolic array generator is all the post-ops that we generate and optimize the control code for.

If you're trying to get a baseline working, I would recommend understanding the new implementation as described in Section 7 of our new paper. I'll tag @calebmkim to discuss the specific details here since he worked on the systolic array generator.
Hi @jmsguevara, many thanks for the detailed questions. Are you using an older commit for the systolic array stuff? For example, I ran
In the result, there were no

Either way though, I'll try to answer your questions, based on the commit I think you are using:
Let me know if you'd like to expand on any of these points.
Hi! I was looking at the Calyx and Verilog code outputs from the systolic array generator and I was confused by some of the additional registers/FSMs that the Calyx compiler generates in the Verilog code. If my understanding is correct, these components I observed in the systolic array generator are not specific to that application:
1. For some instructions that are executed in parallel, Calyx generates an msp_done wire with logic tied to the done signals of the associated registers. For other instructions, Calyx generates pd registers which receive done signals instead. Inserting pd registers instead of relying on purely combinational logic as in the msp_done scenario adds latency, so why does this happen?
2. For instructions that involve combinational logic with a registered output (for example incrementing a memory index register), the Calyx compiler splits these across two FSM states in Verilog with a register designated as "comb_reg" inserted in between. Is there a drawback of some sort for keeping those signals within the same FSM state?
3. Regarding the previous point, I also saw that executing instructions of this type in parallel with other instructions results in Calyx generating additional FSMs for the aforementioned instructions (because incrementing a memory index register requires multiple states). Generating additional FSMs incurs a latency penalty because the top-level FSM has to wait for the done state before it can proceed, but if it's feasible to bypass the comb_reg insertion then would it also be feasible to bypass this additional FSM generation?
4. I'm not quite sure, but it seems like the insertion of these registers is tied to particular compiler passes. If that's the case then I'd appreciate your help in understanding the importance of those compiler passes, what they do, and whether or not they can be removed safely (for a particular definition of "safely").
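To make the latency trade-off in the first question concrete, here is a toy cycle-level model in Python. It is only a sketch and not how the Calyx compiler lowers anything: the function names are hypothetical, and it assumes each child raises its done signal for exactly one cycle and that a register written in cycle c is readable in cycle c+1. Under those assumptions, ANDing the raw done signals (the msp_done style) only fires when every child finishes in the same cycle, while latching each done signal into a 1-bit register (the pd style) tolerates different finish times at the cost of one extra cycle.

```python
from typing import List, Optional


def done_pulses(latencies: List[int], horizon: int) -> List[List[bool]]:
    """Assumed behaviour: child i raises done only in cycle latencies[i]."""
    return [[cycle == lat for cycle in range(horizon)] for lat in latencies]


def combinational_join(latencies: List[int], horizon: int) -> Optional[int]:
    """AND the raw done pulses; return the first cycle where all are high."""
    pulses = done_pulses(latencies, horizon)
    for cycle in range(horizon):
        if all(p[cycle] for p in pulses):
            return cycle
    return None  # the pulses never overlap, so the join never fires


def registered_join(latencies: List[int], horizon: int) -> Optional[int]:
    """Latch each done pulse in a 1-bit register, then AND the registers."""
    pulses = done_pulses(latencies, horizon)
    latched = [False] * len(latencies)
    for cycle in range(horizon):
        # Register outputs reflect writes from earlier cycles only.
        if all(latched):
            return cycle
        # Writes made in this cycle become visible in the next one.
        latched = [l or p[cycle] for l, p in zip(latched, pulses)]
    return None


if __name__ == "__main__":
    print(combinational_join([3, 3], 10))  # children finish together
    print(combinational_join([2, 5], 10))  # children finish apart
    print(registered_join([2, 5], 10))
    print(registered_join([3, 3], 10))
```

In this model the combinational join is a cycle faster when both children take three cycles, but it never fires when they take two and five cycles, which is the kind of case where a latched done bit is needed.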
Sorry if the questions seem obvious, I was going through the Verilog code and couldn't figure out why these components were being inserted. Thank you very much for your time and help!