Timing constraints and 'report_timing' in CologneChip proprietary PNR?! #18

Closed
chili-chips-ba opened this issue Jul 11, 2024 · 10 comments

Comments

@chili-chips-ba
Owner

Here is a good question from @TurboVega:

"... does anyone know how (command line option, maybe) to get more detailed information out of the P/R tool, such that we can determine exactly what "paths" are responsible for the maximum clock rates computed by the tool? In its output, it just labels the clocks with generated names, as it does when referring to various FPGA internal components, making it hard to know what Verilog source entities are involved. Knowing what parts of the source affect long paths helps in optimizing speed..."

@chili-chips-ba chili-chips-ba added the documentation Improvements or additions to documentation label Jul 11, 2024
@pu-cc
Collaborator

pu-cc commented Jul 11, 2024

Due to mapping and various optimizations during implementation in P&R, it is not possible to keep all signals and names for cross-referencing. However, registers remain identical, and can be found in the *.crf output. The file is generated automatically after P&R if the +crf flag is set.

Here is an example: You find the critical path information with highlighted start and end in the P&R log:
image

The CPE names are made up of the component number (_a) and the CPE part (/1) or (/2). In this example, the starting flip-flop has component number 110, part 2 (110/2 or _a110/OUT2). You find it in the CRF file as follows:
image

In the post-synthesis netlist (*_synth.v), it has the instance name _3208_, and you will find the flip-flop with reference to downsampler_inst.generalcounter[15] in your code:
image

Similarly, we also find the target flip-flop (100/1 or _a100/OUT1) in the CRF:
image

In the post-synthesis netlist, you find it as instance _3199_:
image

To optimize your critical path, you can now examine the path between the generalcounter[{15,6}] registers in your code and rework it if necessary.
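
The naming convention described above (component number plus CPE part) can be sketched as a small helper for searching the CRF file and the post-synthesis netlist. This is only an illustration: the function names `cpe_aliases` and `find_mentions` are hypothetical, the sample lines are made up, and nothing is assumed about the real *.crf layout beyond a plain text search.

```python
import re

def cpe_aliases(short_name: str) -> tuple[str, str]:
    """Turn a short CPE name such as '110/2' into ('110/2', '_a110/OUT2')."""
    match = re.fullmatch(r"(\d+)/([12])", short_name.strip())
    if match is None:
        raise ValueError(f"expected <component>/<1|2>, got {short_name!r}")
    component, part = match.groups()
    return f"{component}/{part}", f"_a{component}/OUT{part}"

def find_mentions(text: str, needle: str) -> list[str]:
    """Return the lines of `text` that contain `needle` as a whole token."""
    # The lookarounds stop '_a110/OUT2' from matching inside '_a1100/OUT2'.
    pattern = re.compile(rf"(?<![\w/]){re.escape(needle)}(?![\w/])")
    return [line for line in text.splitlines() if pattern.search(line)]

# Made-up lines for illustration only; this is NOT the real CRF layout.
sample = "\n".join([
    "_a110/OUT2 example entry",
    "_a1100/OUT2 unrelated entry",
])
short, full = cpe_aliases("110/2")
print(full)                         # _a110/OUT2
print(find_mentions(sample, full))  # ['_a110/OUT2 example entry']
```

The same `find_mentions` call could then be pointed at the *_synth.v text with the instance name (for example `_3208_`) to locate the register in the netlist.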

@TurboVega
Collaborator

TurboVega commented Jul 11, 2024 via email

@chili-chips-ba
Owner Author

@pu-cc good tips 💯

Still, how do we do random timing queries (such as report_timing) in the CologneChip framework?

Is there a document that describes scripts and procedures to use if they are not based on the de-facto industry standard SDC?!

@chili-chips-ba
Owner Author

chili-chips-ba commented Jul 24, 2024

@pu-cc - How do we go about specifying timing constraints for GateMate?

  • primary clock period
  • generated clocks (such as from internal PLL) or RTL dividers
  • exceptions: MCP, FP
  • I/O delays

From earlier experience (see this), nextpnr is not very timing-savvy, if at all (@MikeReznikov for additional comment).

While we expect CologneChip proprietary P_R tool to be better than nextpnr in terms of timing awareness, this is to seek additional info on that topic.

@chili-chips-ba chili-chips-ba added the question Further information is requested label Jul 25, 2024
@chili-chips-ba
Owner Author

chili-chips-ba commented Sep 9, 2024

@pu-cc , @DadoCCAG -- Your answers to the above questions have become uber-critical at this point!

We are seeing that PicoRV32, which is the essential element of our TetriSaraj application, does not work properly at 100MHz. There are timing violations in hardware. They are not reported, which is expected, as we currently don't have any clock constraints in the build.

While we have blindly reduced the PicoRV32 clock to 10MHz to "make it work" (or at least appear to) without any timing constraints, we don't know for a fact whether that is sufficiently slow.

Builds without timing constraints are not acceptable in the long run. Moreover, inability to specify timing constraints is simply a showstopper for commercial / professional projects and settings.

@chili-chips-ba chili-chips-ba added the bug Something isn't working label Sep 9, 2024
@chili-chips-ba chili-chips-ba changed the title from "How to 'report_timing' in CologneChip proprietary PNR tool?" to "Timing constraints and 'report_timing' in CologneChip proprietary PNR?!" Sep 9, 2024
@TarikHamedovic
Collaborator

TarikHamedovic commented Sep 10, 2024

I went through the GateMate documentation and found this line:

"Furthermore, the netlist is passed to the Place & Route tool for architecture-specific implementation and bitstream generation. A netlist converter generates a generic netlist from the Yosys or legacy netlist. The first steps of Place & Route comprise procedures for speed or area optimization before mapping. After placement and routing, the static timing analysis (STA) might lead to further optimization steps and makes the Place & Route software an iterative process of constraint-driven re-placement and re-routing steps to finally achieve user requirements."

This says that an STA runs after P&R. However, pages 80-86 of the GateMate FPGA Datasheet list no option for specifying a clock constraint, as other FPGA vendors provide, and the workflow diagram below does not mention clock constraints either.

image

@chili-chips-ba
Owner Author

... this calls for some questions:

  1. What criteria are used for *constraints-driven placement* and *constraints-driven routing* in the situation when even the elementary clock period cannot be specified?!
  2. What's the scope of *STA implementation step* in this context, w/o timing constraints whatsoever?

@chili-chips-ba
Owner Author

@pu-cc it's interesting that your own PicoRV32 constraints for GateMate also allude to a 10MHz clock. Granted, even your CCF has it only as a comment, as opposed to an actual clock constraint.
image

  • How did you arrive at, and validate, that 10MHz number?

Is it that you simply "feel comfortable" with 10MHz, based on your extensive empirical trial-and-error?! Note that PicoRV32 in both Xilinx and Gowin ports of TetriSaraj runs reliably at 100MHz+.

@DadoCCAG, in order for us to compare eduBOS5 GateMate timing performance to that of Xilinx and Gowin, we absolutely need to have a reliable way for specifying timing constraints, i.e. validating timing closure.

@pu-cc
Collaborator

pu-cc commented Sep 11, 2024

Is it that you simply "feel comfortable" with 10MHz [...]

No, not at all. Let me briefly address the most important points:

Placement takes place using the quadratic placement algorithm. After all signals have been routed, p_r always runs an STA. This can also be seen in the log file:

[...]
Static Timing Analysis

Skew violation report using only 80% delay of data path
[...]

STA takes the current placement as a basis and calculates the maximum achievable frequency for all clocks, as I have shown in my first answer. For each clock, it reports a maximum clock frequency and its critical path.

Moreover, STA checks for clock skew and applies measures to reduce it.

Once the STA has finished, you should check that the maximum frequency reported for each clock is at least the frequency that clock actually runs at in your design.

In my experiments, PicoRV32 and VexRiscv reached about 30-50 MHz (worst corner).
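
To put the 30-50 MHz figure next to the clocks discussed in this thread: a clock running at frequency f leaves a period of 1/f for the slowest register-to-register path. Below is a rough sketch of that arithmetic, taking the 30 MHz lower bound as the reported Fmax and the 100 MHz and 10 MHz clocks mentioned earlier as targets. The helper names are hypothetical, and the calculation deliberately ignores skew, jitter, and I/O timing.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz

def slack_ns(target_mhz: float, fmax_mhz: float) -> float:
    """Target period minus the period implied by Fmax; negative means too fast."""
    return period_ns(target_mhz) - period_ns(fmax_mhz)

for target in (100.0, 10.0):
    print(f"target {target:.0f} MHz, Fmax 30 MHz: slack {slack_ns(target, 30.0):+.1f} ns")
# target 100 MHz, Fmax 30 MHz: slack -23.3 ns
# target 10 MHz, Fmax 30 MHz: slack +66.7 ns
```

Under these assumptions, a 100 MHz clock falls short by roughly 23 ns per cycle, while 10 MHz leaves roughly 67 ns of margin.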

@chili-chips-ba
Owner Author

@pu-cc given that the necessary timing information is available in the P_R database, what would it take to bring the flow from its current reactive* timing closure methodology up to something that, at least on the surface, resembles the mainstream proactive approach?!

Here is an idea:

  1. allow declaration of the basic clock constraint in the CCF
  2. provide post-processing script that would extract all Fmax reports from the P_R log and compare them to the declared input clock frequencies, flagging violations when below, and displaying the extent of headroom when met
  3. in the next phase, build on top of it to add support for generated clocks
  4. eventually add ability to parse the database and support report_timing command
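
Step 2 of the list above could be sketched roughly as follows. Everything format-specific here is an assumption: the log line shape matched by `FMAX_LINE`, the clock names, and the `declared_mhz` dictionary are hypothetical stand-ins, since neither the P_R log layout nor a CCF clock-constraint syntax is given in this thread. The regular expression would need to be adapted to the real log.

```python
import re

# HYPOTHETICAL log line shape: "CLOCK <name> FMAX <value> MHz".
# The real P_R log format is not shown in this thread; adapt the pattern to it.
FMAX_LINE = re.compile(
    r"^CLOCK\s+(?P<clock>\S+)\s+FMAX\s+(?P<mhz>\d+(?:\.\d+)?)\s*MHz",
    re.MULTILINE,
)

def extract_fmax(log_text: str) -> dict[str, float]:
    """Collect {clock name: reported Fmax in MHz} from the log text."""
    return {m["clock"]: float(m["mhz"]) for m in FMAX_LINE.finditer(log_text)}

def check_timing(reported: dict[str, float], declared_mhz: dict[str, float]) -> list[str]:
    """Compare reported Fmax with declared targets; one status line per declared clock."""
    lines = []
    for clock, target in declared_mhz.items():
        fmax = reported.get(clock)
        if fmax is None:
            lines.append(f"{clock}: NOT FOUND in log (target {target:.2f} MHz)")
        elif fmax < target:
            lines.append(f"{clock}: VIOLATED, Fmax {fmax:.2f} MHz < target {target:.2f} MHz")
        else:
            headroom = (fmax - target) / target * 100.0
            lines.append(f"{clock}: MET, Fmax {fmax:.2f} MHz, headroom {headroom:.1f}%")
    return lines

# Made-up log excerpt and targets, for illustration only.
log = "CLOCK clk_sys FMAX 41.50 MHz\nCLOCK clk_pix FMAX 120 MHz\n"
for line in check_timing(extract_fmax(log), {"clk_sys": 100.0, "clk_pix": 100.0}):
    print(line)
```

A script like this only reports after the fact; it does not make placement or routing timing-driven, which is why it is listed as a first step rather than the end goal.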

(*) The current P_R is apparently not timing-driven. We understand that the P_R uses a quadratic placement algorithm.

  • Where can we find more about its constraints-driven placement and constraints-driven routing properties?
  • What exact constraints are tapped into to drive that process?
