Replies: 8 comments 15 replies
-
Is the critical path of the FPU through the multiplier?
-
@povik any thoughts?
-
I don't know how you've implemented your FPU, but frequently the problem in FPUs is that you can't pipeline through operators such as +, -, /, <, >. Chisel and other HLS-type tools don't lower all the way to gates, so you can't insert registers inside those operators. It's usually the rounding and ULP stuff that kills you in FPUs.
-
@QuantamHD The main motivation for using retiming is to identify performance left on the table by a lack of architectural/manual retiming, and also to improve ranking accuracy with a minimum of effort (i.e. don't manually retime architectural dead ends).
-
Have you looked at the paths coming into core/095297 and out of core/091420 to see if there is slack that could have been reapportioned by moving the register?
-
It looks like retiming isn't able to attain the optimum if the starting point is too far from it. E.g. take this synthetic example:
I'm skipping any optimizations which would fuse the additions. Theoretically, the 100 registers at the end can be evenly distributed to get a 100x reduction in the critical path. ABC's
and it achieves a final depth of 20.
If I iterate it 7 times (I need to use
At the same time, another option on the retime command (
This option looks to be informational and doesn't seem to support actually transforming the netlist to attain a depth of 3.
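
To make the "evenly distributed" argument concrete, here is a minimal sketch in Python. It assumes a chain of 100 identical adders (the chain length is my assumption; only the 100 registers and the 100x figure come from the comment above) and measures depth in adders per stage, which is not the same unit as the ABC depths of 20 and 3 quoted above. The names `stage_depths` and `critical_path` are hypothetical helpers, not part of ABC or Yosys.

```python
def stage_depths(num_ops: int, num_regs: int) -> list[int]:
    """Operators per pipeline stage when num_regs registers split a chain of
    num_ops operators as evenly as possible, with the last register kept at
    the output of the chain."""
    if num_ops < 0 or num_regs < 1:
        raise ValueError("need num_ops >= 0 and num_regs >= 1")
    base, extra = divmod(num_ops, num_regs)
    # The first `extra` stages take one additional operator.
    return [base + 1 if i < extra else base for i in range(num_regs)]


def critical_path(num_ops: int, num_regs: int, retimed: bool) -> int:
    """Longest register-to-register path, counted in operators."""
    if not retimed:
        # All registers sit at the end, so the whole chain is one stage.
        return num_ops
    return max(stage_depths(num_ops, num_regs))


if __name__ == "__main__":
    # Assumed chain length of 100 adders followed by 100 registers.
    print("before retiming:", critical_path(100, 100, retimed=False))
    print("ideal retiming: ", critical_path(100, 100, retimed=True))
```

This only models the ideal end state; it says nothing about whether an incremental retiming algorithm can reach it from the unbalanced starting point, which is the issue raised above.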
-
Why, as I add more pipeline stages, is the asymptotic minimum clock period 600ps for an FPU and 200ps for a 64x64 bit multiplier?
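
One way to frame the question is the first-order textbook model in which the clock period is the combinational delay divided evenly across the stages plus a fixed per-stage register overhead (clock-to-Q plus setup). The sketch below is only that model, not a diagnosis of this design: the function name `min_clock_period_ps` is hypothetical and the numbers are placeholders, not measurements from the FPU or the multiplier.

```python
def min_clock_period_ps(logic_delay_ps: float, stages: int,
                        reg_overhead_ps: float) -> float:
    """First-order model: logic split evenly over `stages`, plus a fixed
    per-stage register overhead that pipelining cannot remove."""
    if stages < 1:
        raise ValueError("stages must be >= 1")
    return logic_delay_ps / stages + reg_overhead_ps


if __name__ == "__main__":
    # Placeholder values for illustration only (NOT measured data).
    LOGIC_PS = 3000.0
    OVERHEAD_PS = 100.0
    for n in (1, 2, 5, 10, 20, 50):
        print(n, min_clock_period_ps(LOGIC_PS, n, OVERHEAD_PS))
```

In this model the period approaches the register overhead as the stage count grows. A floor well above that overhead would suggest that some logic is not being split evenly, for example an operator that registers cannot be moved into.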
Change 10 below to the desired number of pipeline stages.
FPU:
bazelisk run //multiplier:FloatingPointUnit_10_1_retimed_synth /tmp/synth gui_synth
A 10 pipeline stage retimed Berkeley hardfloat.
64x64 bit integer multiplier with 10 pipeline stages has ca. 200ps lower limit
To create a plot: