-
Notifications
You must be signed in to change notification settings - Fork 266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OpenSBI Bug #1264
Comments
I found what I think the root cause is. OpenSBI makes two calls to fdt_open_into with the length of the device tree + 32 and then again with the length of the device tree + 1024. This function modifies and moves the device tree to another location but appears to incorrectly either move the wrong number of bytes or set the size of the device tree incorrectly. The file in question is buildroot/output/build/opensbi-1.6/lib/utils/libfdt/fdt_rw.c. The third argument to the function fdt_open_into is int bufsize. The memory move however is called with a different length newsize which is smaller than bufsize. memmove(buf, tmp, newsize); Later in the code the length is set with fdt_set_totalsize(buf, bufsize); One of two changes should be made to either set the length to match newsize or have memmove copy the how bufsize. I am testing the first option. |
That’s interesting and sounds like a bug in the source code for openSBI itself. If so, I wonder why no one else has run into this. |
I wouldn't be surprised if Linux just ignores the crc32 error. |
This definitely fixed the issue and lets the Linux boot continue. I found another issue immediately after. The MCAUSE, MEPC, and several other CSRs are continuously enabled while StallW is asserted during a Trap this causes a problem for the hardware tracer as the state will update immediately and be associated with the previous instruction. I assume this will also happen with wallyTracer.sv. This isn't really a hardware bug in Wally, but would have a small power cost. However given the tracer implication I think we want these changes. Once I've confirmed this will impact wallyTracer I will make a separate pull request. Linux got to 2.2B instructions. The current problem is the interrupt timer is too fast and boot isn't making any progress. I'll have to adjust the device tree. |
Whoo! |
Great! Should this be opened as an issue against OpenSBI? And until the fix is upstreamed, do we want/need to carry that patch for our build of OpenSBI? Buildroot has strong support for custom patches if so. |
Yes we should do open an issue with OpenSBI. I'm going to be busy for the next few hours, but when I get back I'll separate out the code for OpenSBI and make a separate patch. It's possible there is a better solution than mine as I might not fully understand the intention of their code. |
My solution is much uglier than I like and is probably not the correct solution. ftd_open_into(const void *fdt, void *buf, int bufsize) does not initialize buf and has multiple path ways through the code to set the new size of the device tree. I am just initializing buf with all 0's at the beginning and this corrects the ImperasDV mismatch. However, I doubt this is the solution the OpenSBI maintainers will want. However, lockstep is now working for billions of instructions! Unfortunately the FPGA kernel panics but lockstep passes. |
I'm opening a bug to track progress on an issue my FPGA/ImperasDV lockstep emulator discovered.
The bug becomes visible when checking the crc32 of the device tree in the Linux kernel. The length of the crc32 is too long, 0xda5 bytes rather than the correct 0x9a5. Interesting, this passes simulation because our RAMs model un-initialized memory to read 0 and so does ImperasDV. Thus ImperasDV sees agreement between the two and the crc32 computes 0 when the input is 0 so the computation passes. However the FPGA is real hardware and reads a random value past the length of the device tree so ImperasDV mismatches.
I've repeated this on both the old and new (Kernel 6.12.8) versions of buildroot.
OpenSBI copies the length 0x9a5 twice to the same location in memory (PA 0x822004). The first time is the correct value 0x9a5, but the second time is the wrong value 0xda5.
The text was updated successfully, but these errors were encountered: