[fud] Get designs to properly execute on FPGA boards: lab notebook #1022
-
Wednesday June 8: This morning was spent setting things up. Finally was able to execute dot product on havarti's FPGA. Output is still 0. The next step should be writing an AXI interface manually. The rest of today and the coming few days will likely be spent familiarizing myself with AXI, which will probably leave me with a lot of questions (and hopefully progress).
-
Friday June 10: Working on #1020. Seems like the tasks currently listed there may not be the right way forward.
It might be more useful to examine the waveform produced by hardware emulation (would be happy to hear others' opinions on this). That is the main way forward for getting designs to run on FPGAs. Beyond that, I have recently been working on getting fud to output waveform (.wdb) files during hardware emulation. Currently the process is hardcoded into various Python scripts in [fud stages](https://github.com/cucapra/calyx/tree/master/fud/fud/stages/xilinx). How much time should be devoted to making this an easier process? Probably by passing in a
Next week I'll be examining the produced waveform, which will hopefully help us understand what's going wrong during hardware emulation. Fingers crossed. 🤞
-
Monday June 13: Working on #1020. Spent today wrangling with fud stages and trying to get
to work. My stubbornness probably led me to spend a bit too much time on this, but hopefully in the future I'll learn to let go. As of right now a properly passed flag creates a directory
My attempts at making the above work accidentally deleted my initial .wdb file, so I haven't spent any meaningful time examining the waveforms. That will happen tomorrow and in the following days. Hope to find something useful.
-
Woo! I’d recommend opening a PR for the new fud options once you have a minimal version working.
-
Wednesday June 15: Recently created #1036, which does a very minimal job of getting the .wdb files we can use to debug. Spent the rest of my time staring at said waveform, Vitis documentation, old issues (#853, #958, #367, #876, etc.), and whatever else I could get my hands on. I'm a bit at a loss for the best direction to pursue from here in order to make concrete progress. I've contemplated writing an AXI controller from scratch and then generalizing that, but that seems like redoing work that others have already done (and that generally works). Basically, there are bits and pieces I understand, and a whole web of things connecting them that I don't. This in turn makes it hard to know the best way forward. If @sampsyo, @rachitnigam, or anyone else has any recommendations I would be very grateful!
-
Friday June 17: The last two days have been spent reading Xilinx docs on AXI controllers, Vitis, and Vivado, and trying to get one of their examples to produce a waveform, probably by using commands similar to the ones at the bottom of this page. I hope to compare this working waveform to the one generated by fud's simulation of dot product and see if any patterns or problems can be recognized from the comparison. No real blockers/questions fortunately. Unfortunately, the going is a bit slow, but I hope things will pick up pace on Monday.
-
Pair-programming today, we had a grand ol' time trying to get PYNQ to run Xilinx's own example so we have a reference of something that actually works! Namely, our target was to take the Vitis
We ran into a problem where PYNQ got confused and crashed when it saw that one of the kernel parameters had type
…and indeed,
So our next steps are:
If PYNQ continues to fail us by trying too hard to be easy to use, my next recommendation is to fall back to using PyXRT directly. It's not fancy but it seems to work. There is even a good-looking example of the host code available, albeit for an HLS (not RTL) kernel. The only trouble is that the current version of XRT that's on Havarti is too old to include PyXRT, so we would need to get a newer version. But thankfully, XRT (being open source) is way easier to install than the "main" Xilinx tools like Vitis and stuff.
-
Friday June 24: Got PYNQ to work with both Calyx-generated RTL and Xilinx-provided RTL. In turn, spent some time looking at waveforms produced from emulation. The waveforms have not yet proved terribly insightful, but I have a feeling I will be revisiting them as I make my way through the examples seen here and compare them to the code Calyx generates. Unfortunately the Calyx-generated code is a bit tough to parse. Any insight anyone has there would be immensely helpful!
Beyond these comparisons, I was surprised that in the example given, which uses 2 separate adders to compute the sum of 3 vectors, both IPs have their very own AXI controllers. This seems wasteful to me; I would have assumed that the AXI controller could be shared, especially because the adders are identical as far as I can tell. (Unless something happens during compilation to optimize this redundancy away?) Could also be related to #853.
This also ties into some confusion from this documentation, which is also vector addition (although not necessarily identical to the Vitis examples above). Specifically, scrolling down to the RTL viewer images, I don't understand why Axi manager 02 has an adder within it. I would have assumed that the AXI controller would be on the same level as the actual kernel (i.e., the adder).
Finally, I found a very friendly blog post that I have a feeling I will be coming back to.
Next week feels like it might be a collaboration-heavy one, trying to find what exactly might be wrong. I'm also considering simply writing things from scratch, probably with the help of the blog post above, and trying to create something that works from that. As always I love hearing ideas/thoughts/suggestions on anything and everything and am super appreciative of them!
-
Friday July 1: This week we pivoted to trying to get a single-memory Calyx program to work. The generated waveforms indicated a number of errors in the generated Verilog code. I've been addressing the issues by manually modifying the Verilog and emulating it through PYNQ. Eventually these fixes will need to be transferred over to the generation code.
Working through the issues incrementally, some progress has been made. Specifically, the computational kernel (the generated main.sv file) can now correctly access the internal BRAM within the AXI controller module to compute 8*4.
Side note: @sampsyo, it seems your theory was right that mismatched signal widths between SystemVerilog and Verilog lead to 'z's being produced. Fixing the width fixed this, and the signal is now driven correctly. This also means that we need to match the width of the Calyx memory to the width of the generated/expected AXI memory controller.
Additionally, a DONE signal on the BRAM wasn't being driven, which caused the toplevel not to write anything to memory. This is now fixed, and a series of writes occurs at the end of the iterator's computational sequence. The final memory value is still 0, which is incorrect, but I feel like we're pretty close.
For those interested, a vcd file of the most recent manually-altered main.sv and toplevel.v files lives here. Here is a screenshot of the relevant part of the simulation.
At the beginning of next week I will explore @andrewb1999's #1071 and try to make sense of it. From here we need to see why the writes are still writing 0 to memory, and why both reads and writes occur 32 times. Fixing both of those should bring us pretty close to a working model of the iterator program.
As always, very appreciative of any comments, ideas, and help.
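For the question of why reads and writes occur 32 times, one option is to count completed handshakes per channel straight from the waveform. The sketch below is only an illustration: it assumes the VALID and READY signals of a channel have already been sampled once per rising clock edge into Python lists (how they get out of the vcd file is left open), and `transfer_cycles`, `count_transfers`, and the sample traces are made-up names and data. It relies only on the AXI rule that a transfer happens on a cycle where both VALID and READY are high.

```python
from typing import List, Sequence


def transfer_cycles(valid: Sequence[int], ready: Sequence[int]) -> List[int]:
    """Return the cycle indices where a handshake completes.

    Both traces are assumed to be sampled once per rising clock edge.
    AXI transfers a beat on any cycle where VALID and READY are both 1.
    """
    if len(valid) != len(ready):
        raise ValueError("valid and ready traces must have the same length")
    return [i for i, (v, r) in enumerate(zip(valid, ready)) if v and r]


def count_transfers(valid: Sequence[int], ready: Sequence[int]) -> int:
    """Count the completed beats on one channel."""
    return len(transfer_cycles(valid, ready))


# Hypothetical W-channel trace: VALID waits on READY during cycles 0, 1 and 4,
# so only cycles 2 and 5 count as beats.
wvalid = [1, 1, 1, 0, 1, 1, 0]
wready = [0, 0, 1, 1, 0, 1, 1]
print(transfer_cycles(wvalid, wready))
print(count_transfers(wvalid, wready))
```

Lining the returned cycle indices up against the address channel would show whether the 32 reads and writes hit distinct addresses or the same one repeatedly.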
-
Finally back in Austin so I had access to my AXI experiments. I threw what I had in a repo: https://github.com/sgpthomas/axi-playground. You may already be past the need for these things, but here they are anyway.
I had a simple Verilator AXI driver here. Very basic but maybe something to build off of for testing. It implements a simple state machine using the
There's also a barebones AXI implementation in that repo: file.
It looks like you already found the ZipCPU resource. That was essential in my understanding of things. The other thing I did was dig through the VivadoHLS generated Verilog for some Dahlia programs and look at the interfaces it was generating.
-
Monday July 11: The past week was spent using @andrewb1999's #1071 to make changes to the Verilog generation code, located at #1072. Specifically, I am currently working on #1078, getting the ap_idle signal to work correctly. It has taken me a bit of time to comprehend the use of flags and the difference between write and read ports in the AXI address space generator. Currently thinking about ways to implement the ap_idle section; the block we need doesn't seem to work well with any of the flags we currently have. It feels wrong to me to have an entire flag that would only be used once for this particular signal, but on the other hand a new flag would be in keeping with the current design of the generator. Would appreciate any thoughts on this. My time this week will be spent working through the issues in #1072, hopefully finishing by the end of the week.
-
Friday July 15: Finished #1072 today, thanks to @rachitnigam, @sampsyo, and especially @andrewb1999. When trying to get the generated code to work on Calyx's vectorized add, only index 0 of our output array is correctly written to (from PYNQ's perspective). One strange thing is that PYNQ seemed to think it was writing 8 times, implying that it was either writing 0s to certain indices or writing to index 0 multiple times (which would be strange considering that WADDR changes in the waveform below).
For inputs of [0,1,2,3,4,5,6,7] and [300,300,300,300,300,300,300,300], the waveform of the axi-memory-controller for our output vector looks like this (almost everything is decimal):
I briefly talked to @sampsyo about the waveform, and we struggled to see what might be wrong. WDATA is properly assigned, handshakes occur on all channels,
In case anyone is interested,
Possible ways forward:
Happy to be so close, but still have a bit more work to do. P.S. @andrewb1999 If you're free and have a chance to look into this, that could be super helpful. You did in a day what would have taken me a month, so maybe something will jump out at you. Of course, no worries if not. As always, happy to hear suggestions/comments/feedback/etc!
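One way to pin down what PYNQ should see for the vectorized add is to replay the write beats from the waveform into a small memory model and compare against the expected sum. This is only a sketch under stated assumptions: 32-bit words, byte addresses on the write-address channel, and (address, data) pairs read off the waveform by hand. `replay_writes`, the base address `0x1000`, and both lists of beats are hypothetical and not taken from the actual trace.

```python
from typing import Iterable, List, Tuple

WORD_BYTES = 4  # assumption: 32-bit data words


def replay_writes(
    beats: Iterable[Tuple[int, int]], base_addr: int, length: int
) -> List[int]:
    """Apply (byte_address, data) write beats to a zero-initialized word array."""
    mem = [0] * length
    for addr, data in beats:
        offset = addr - base_addr
        if offset % WORD_BYTES != 0:
            raise ValueError(f"unaligned write address {addr:#x}")
        index = offset // WORD_BYTES
        if not 0 <= index < length:
            raise ValueError(f"write address {addr:#x} is outside the buffer")
        mem[index] = data & 0xFFFFFFFF  # keep only the low 32 bits
    return mem


a = [0, 1, 2, 3, 4, 5, 6, 7]
b = [300] * 8
expected = [x + y for x, y in zip(a, b)]

# Hypothetical beats where the address advances by one word per write.
good = [(0x1000 + WORD_BYTES * i, v) for i, v in enumerate(expected)]
# Hypothetical failure mode for comparison: every beat lands on the base address.
stuck = [(0x1000, v) for v in expected]

print(replay_writes(good, 0x1000, 8) == expected)
print(replay_writes(stuck, 0x1000, 8))
```

If the replayed memory matches the expected vector while PYNQ still reads only index 0, that would suggest looking at how the writes reach the host buffer instead of at the WDATA and WADDR values themselves.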
-
Monday July 18: As of today, once #1109 is merged, #1072 should be done and we should be able to perform correct hardware emulation for Calyx programs (at least the ones we have as examples: vector addition, dot product, and iteration). Talked with @rachitnigam about the steps forward, which are as follows:
The eventual goal is to have CI testing with cocotb to ensure nothing breaks in our generated AXI code. Additionally, a parallel existing issue to tackle is #1084, which shouldn't be complex but might be a little tedious. I plan to work on this in spare time or when I need a break from whatever I'm doing.
-
Friday July 22: Tried to use a combination of cocotb documentation, cocotbext-axi code, Andrew's advice, and Rachit's code to create the outline of a cocotb harness. It is very messy and still very much an outline; I'm still figuring out cocotb and trying to understand a "standard", best-practices form of writing these from the examples I linked above. Luckily I'm not really stuck on anything, these things just take me a bit of time. Next week I am unfortunately not going to be very available, but in general the current plan is to continue working on this until it is finished.
-
Friday August 5th: Since last time, a bug in creating vcd files was found and fixed (#1127). Determining the control signals that need to be sent to the kernel's subordinate-control-module is the next step. This will probably end up tying into #1138. Determining the reads/writes to perform on our manager memory-controllers will probably be easier, and similar to setting up the RAMs that currently appear in the prototype test bench. I feel like I'm making slow progress, yet can't put my finger on any one thing that is blocking me. Might just be a matter of taking time to understand things better. As always, any suggestions/tips regarding anything here are appreciated.
-
Thursday August 11: Happy to report that some progress has been made. A very messy axi_test.py succeeds in writing the output of our computational kernel to a
Lots of stuff in this minimal working version is hard-coded, so from here we need to generalize. However, I'm hoping that this will end up similar to what I had to do for PYNQ. Once a generalized version works on multiple Calyx programs, I will need to integrate it into our CI flow. Hoping that's not too difficult. Happily nothing is blocking me at the moment.
-
Goal is to create a generalized way for Verilog designs to execute on FPGA boards using Calyx.
Broadly split into the following parts:
Broad Steps
This issue will also be a place for weekly high level notes and updates consisting of the following: