Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestion: Refactor the EAE (hardware multiplier and divider) #39

Open
MJoergen opened this issue Dec 31, 2023 · 4 comments
Open

Suggestion: Refactor the EAE (hardware multiplier and divider) #39

MJoergen opened this issue Dec 31, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@MJoergen
Copy link
Collaborator

MJoergen commented Dec 31, 2023

The refactoring will ultimately reduce the complex logic tree generated.

The idea is to use simpler (and slightly slower) algorithms, but reduce complexity (and maybe even synthesis time).

Currently, the EAE uses approx 1339 LUTs and 410 slices. This can probably be reduced by a factor of 5 or more.

Currently, a calculation takes 3 clock cycles. After refactoring I estimate a calculation to take around 7 clock cycles.

To accommodate the slower calculation, a CPU_WAIT_FOR_DATA signal should be added.

@MJoergen MJoergen added the enhancement New feature or request label Dec 31, 2023
@MJoergen
Copy link
Collaborator Author

We might also want to include additional functions, e.g. square root. This can be done in around 7 clock cycles too. See this link for an implementation of the square root function: https://github.com/MJoergen/math/tree/master/fast_sqrt

@MJoergen
Copy link
Collaborator Author

Here is an implementation of the division algorithm: https://github.com/MJoergen/nexys4ddr/blob/master/offloader/Episodes/ep10_-_Fact/math/divmod.vhd

Note, all these optimized algorithms are expected to run at a much faster clock rate (> 200 MHz). This is possible because the bit shifting logic is very simple. As long as the fast clock is an integer multiple of the CPU clock, then there should be no Clock Domain Crossing issues.

@sy2002
Copy link
Owner

sy2002 commented Dec 31, 2023

@MJoergen Sounds like a good idea. We could do this in the "dev-V1.61" branch of QNICE, which is the "MiSTer2MEGA65" branch when time is ripe.

Instead of implementing a CPU_WAIT_FOR_DATA, we might also consider to use the busy flag in the EAE's CSR register, which was meant to signal to the Monitor's math library that the calculation is still runnning:

https://github.com/sy2002/QNICE-FPGA/blob/dev-V1.61/vhdl/EAE.vhd#L108C27-L108C27

The math library already has code that waits for the calculation to finish as shown here, for example:

https://github.com/sy2002/QNICE-FPGA/blob/master/monitor/math_library.asm#L19

We then would need to ensure, that our MiSTer2MEGA65 version of monitor.asm has NOT defined EAE_NO_WAIT:

https://github.com/sy2002/QNICE-FPGA/blob/master/monitor/monitor.asm#L4

So by going down this route, the refactoring would be rather seamless to the QNICE Monitor code.

@MJoergen
Copy link
Collaborator Author

MJoergen commented Jan 1, 2024

Interesting alternative. I had indeed forgotten about that "busy" status :-) However, I still prefer the CPU_WAIT_FOR_DATA approach.

My reasoning is as follows: The "check for busy" code in math_library.asm takes three CPU instructions (MOVE, AND, RBRA), for a total of circa 7 (?) clock cycles in each iteration. That approach seems optimal for "long" waits, i.e. if the hardware is MANY clock cycles in completing. If, however, we reach the target of 7 clock cycles for a single calculation then just stalling the CPU for this many clock cycles is "simpler". If nothing else, it saves three instructions in the firmware.

Plus, the stalling will only ever occur during the read from the EAE. Notice line 24: https://github.com/sy2002/QNICE-FPGA/blob/master/monitor/math_library.asm#L24C1-L24C1 This is a "useful" instruction taking place simultaneously with the calculation. Therefore, the subsequent read from EAE in the next line will stall for only circa 4 clock cycles.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants