
Enable all possible leon3 bare-metal tests #33

Open · wants to merge 3 commits into master
Conversation

zzhu35
Contributor

@zzhu35 zzhu35 commented Apr 17, 2020

The following multicore bare-metal tests from GrLib were enabled and tested during the development of the Spandex LLC at UIUC, using a quad-core Leon3 configuration. The enabled tests were also verified on an original ESP quad-core system. When base_test() is called from systest.c in the design folder, CPU 0 should print a report of all tasks successfully executed by each CPU.

@davide-giri davide-giri self-requested a review April 17, 2020 03:51
@davide-giri davide-giri self-assigned this Apr 17, 2020
@davide-giri davide-giri added the enhancement New feature or request label Apr 17, 2020
@davide-giri
Member

Hi, thanks for submitting the pull request!

I ran the RTL simulation of base_test() from the pull request with 4 Leon3 cores, but the simulation either got stuck or needs more than two days to run. When you verified it on the original ESP quad-core system, approximately how long did your simulation take? Did you test the app with the latest version of ESP?

If the simulation really runs that long, I suggest shortening it as much as possible (e.g., do not repeat the more time-consuming tests multiple times). Additionally, it would be useful to add some prints done only by CPU 0, to keep track of the state of the simulation. This can be done at the granularity of each test in leon3_test.c or, in some cases, at a finer granularity.

Thank you!

@zzhu35
Contributor Author

zzhu35 commented Apr 20, 2020

Hi Davide, I did not run these tests in the simulator because I found that they take too long. Could you try running them on the FPGA to see whether they actually get stuck? They were working fine for me.
Once we have verified that they work on the FPGA, we can reduce the iteration counts for the simulators.

Thank you!

@davide-giri
Member

I tested on a Xilinx VC707 with 4 CPU tiles, but it still gets stuck, with or without the new prints you added. This is the behavior I observe; what is your setup for this test?

  • Without new prints:
    Start testing on 4 CPUs.
  • With new prints:
    Start testing on 4 CPUs.
    Finished multest.
    Finished divtest.
    Finished cache_fill with BYTE granularity.
    Finished cache_fill with HALFWORD granularity.
    Finished cache_fill with WORD granularity.

@zzhu35
Contributor Author

zzhu35 commented Apr 22, 2020

Thank you.

I am using the same board as you are.

I do not recall which commit I ran these tests on. I'll re-do the test on the latest version and see what happens.

Does your FPGA test hang every time you run it? Does it ever finish?

@davide-giri
Member

It always hangs, and judging from the terminal output it may always be getting stuck in the same place.

@zzhu35
Contributor Author

zzhu35 commented Apr 22, 2020

Thank you for that info. I do not have an ESP implementation on hand, and I'm compiling one as I type. However, I just ran the test with Spandex and it worked fine. Are you using the ESP RTL cache or the ESP SystemC cache?

@davide-giri
Member

I'm actually using the RTL cache; I can try with the SystemC cache and see if there are any differences.

@zzhu35
Contributor Author

zzhu35 commented Apr 22, 2020

I remember testing this in the past with the ESP SystemC cache. I have never tried the ESP RTL cache, but I'll give it a try now to see whether anything goes wrong.

@zzhu35
Contributor Author

zzhu35 commented Apr 23, 2020

I just ran the tests with the ESP RTL caches and got the same hanging behavior you saw. I'm now compiling a new design with SystemC caches.

@zzhu35
Contributor Author

zzhu35 commented Apr 23, 2020

I ran the test with SystemC caches and got the following output:

Start testing on 4 CPUs.
Finished multest.
Finished divtest.

I am afraid some other bugs are still present in the system. As a sanity check, I will merge the HEAD of ESP into Spandex and see whether it hangs.

@davide-giri
Member

davide-giri commented Apr 23, 2020

OK, I won't merge this pull request for now, since the tests do not work in ESP. Still, this is useful: it may help us find a bug in the system.

We'll take a look on our side to see if we can find the problem. By the way, are you sure about the position of if (!pid) data_structures_setup();? It seems to me it should be called earlier, before the other cores wake up.

@zzhu35
Contributor Author

zzhu35 commented Apr 23, 2020

Yes, the data_structures_setup routine calls malloc for the buffers used by the later cache_fill tests.

I ran the tests with Spandex and it is working for me.

How many ways does the L2 cache in your configuration have? I just realized that the "ways" parameter passed to cache_fill should not be hardcoded to 4. If your configuration has more or fewer than 4 ways, could you rerun the modified test?

Thank you!

@davide-giri
Member

We reproduced the issue, and some debugging revealed a potential corner case that is not handled correctly: two consecutive casa instructions targeting two different words of the same cache line.

We're working on a bug fix and will post here when it's done.

@zzhu35
Contributor Author

zzhu35 commented May 4, 2020

Thank you for the update! Which level of cache does this bug occur in? I'm concerned that if it's in the L2, Spandex might also be affected.

@davide-giri
Member

At the moment it appears to be in the private L2. If that's the case, Spandex may have the same problem, which may simply not manifest because of different timing. We'll know more once we confirm and fix the bug.

@zzhu35
Contributor Author

zzhu35 commented May 4, 2020

I see. Some of the ESP L2 states are unreachable in Spandex; that could also explain why the bug was not triggered there.


3 participants