Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenSBI Bug #1264

Open
rosethompson opened this issue Feb 6, 2025 · 8 comments
Open

OpenSBI Bug #1264

rosethompson opened this issue Feb 6, 2025 · 8 comments

Comments

@rosethompson
Copy link
Contributor

I'm opening a bug to track progress on an issue my FPGA/ImperasDV lockstep emulator discovered.

The bug becomes visible when checking the crc32 of the device tree in the Linux kernel. The length of the crc32 is too long, 0xda5 bytes rather than the correct 0x9a5. Interesting, this passes simulation because our RAMs model un-initialized memory to read 0 and so does ImperasDV. Thus ImperasDV sees agreement between the two and the crc32 computes 0 when the input is 0 so the computation passes. However the FPGA is real hardware and reads a random value past the length of the device tree so ImperasDV mismatches.

I've repeated this on both the old and new (Kernel 6.12.8) versions of buildroot.

OpenSBI copies the length 0x9a5 twice to the same location in memory (PA 0x822004). The first time is the correct value 0x9a5, but the second time is the wrong value 0xda5.

@rosethompson
Copy link
Contributor Author

I found what I think the root cause is. OpenSBI makes two calls to fdt_open_into with the length of the device tree + 32 and then again with the length of the device tree + 1024. This function modifies and moves the device tree to another location but appears to incorrectly either move the wrong number of bytes or set the size of the device tree incorrectly.

The file in question is buildroot/output/build/opensbi-1.6/lib/utils/libfdt/fdt_rw.c. The third argument to the function fdt_open_into is int bufsize. The memory move however is called with a different length newsize which is smaller than bufsize.

memmove(buf, tmp, newsize);

Later in the code the length is set with

fdt_set_totalsize(buf, bufsize);

One of two changes should be made to either set the length to match newsize or have memmove copy the how bufsize. I am testing the first option.

@jordancarlin
Copy link
Member

That’s interesting and sounds like a bug in the source code for openSBI itself. If so, I wonder why no one else has run into this.

@rosethompson
Copy link
Contributor Author

I wouldn't be surprised if Linux just ignores the crc32 error.

@rosethompson
Copy link
Contributor Author

This definitely fixed the issue and lets the Linux boot continue. I found another issue immediately after. The MCAUSE, MEPC, and several other CSRs are continuously enabled while StallW is asserted during a Trap this causes a problem for the hardware tracer as the state will update immediately and be associated with the previous instruction. I assume this will also happen with wallyTracer.sv. This isn't really a hardware bug in Wally, but would have a small power cost. However given the tracer implication I think we want these changes. Once I've confirmed this will impact wallyTracer I will make a separate pull request.

Linux got to 2.2B instructions. The current problem is the interrupt timer is too fast and boot isn't making any progress. I'll have to adjust the device tree.

@davidharrishmc
Copy link
Contributor

Whoo!

@jordancarlin
Copy link
Member

Great! Should this be opened as an issue against OpenSBI? And until the fix is upstreamed, do we want/need to carry that patch for our build of OpenSBI? Buildroot has strong support for custom patches if so.

@rosethompson
Copy link
Contributor Author

Yes we should do open an issue with OpenSBI. I'm going to be busy for the next few hours, but when I get back I'll separate out the code for OpenSBI and make a separate patch. It's possible there is a better solution than mine as I might not fully understand the intention of their code.

@rosethompson
Copy link
Contributor Author

My solution is much uglier than I like and is probably not the correct solution. ftd_open_into(const void *fdt, void *buf, int bufsize) does not initialize buf and has multiple path ways through the code to set the new size of the device tree. I am just initializing buf with all 0's at the beginning and this corrects the ImperasDV mismatch. However, I doubt this is the solution the OpenSBI maintainers will want.

However, lockstep is now working for billions of instructions! Unfortunately the FPGA kernel panics but lockstep passes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants