Limited support for Distributed Memory / LUTRAM #20

Open
hansemro opened this issue Dec 20, 2023 · 35 comments

@hansemro

hansemro commented Dec 20, 2023

Issue Description

The memory_libmap pass in Yosys 0.18 and newer can synthesize LUTRAMs that nextpnr does not support, including:

  • RAMS32 (manually instantiated)
  • RAMD32 (manually instantiated)
  • RAMS64E (manually instantiated)
  • RAMD64E (manually instantiated)
  • RAM32X1S
  • RAM64X1S
  • RAM64X1S_1 (same as RAM64X1S with inverted clock)
  • RAM128X1S
  • RAM256X1S

Part of the issue stems from nextpnr not fully supporting all LUTRAMs in the Distributed RAM packer in xilinx/pack_dram.cc.

Resolving this should also address openXC7/demo-projects#6.

Tasks/Status

TODO: rewrite tasks

Development Branches

  • experimental branch with support for RAMS32, RAMS64E, RAM32X1S, RAM64X1S, RAM128X1S, RAM256X1S: https://github.com/hansemro/nextpnr-xilinx/commits/xc7-lutram-dev/
    • use with yosys 0.18 or newer to test
    • rebuild chipdb after building
    • currently broken, because this branch makes several incorrect assumptions:
      • does not check negative z height for newly supported cells, possibly breaking projects with several newly supported cells.
      • missing RAMS32/RAMD32 to LUT_OR_MEM transformations with DI1 and O6 ports used.

References

See 018-clb-ram minitest. Build and view design checkpoint in Vivado.

https://f4pga.readthedocs.io/projects/prjxray/en/latest/architecture/dram_configuration.html

https://docs.xilinx.com/v/u/en-US/ug474_7Series_CLB

https://docs.xilinx.com/v/u/en-US/ug574-ultrascale-clb

https://docs.xilinx.com/r/en-US/ug953-vivado-7series-libraries

https://docs.xilinx.com/r/en-US/ug974-vivado-ultrascale-libraries

https://www.xilinx.com/content/dam/xilinx/support/documents/sw_manuals/xilinx14_7/7series_hdl.pdf

https://docs.amd.com/v/u/en-US/7series_hdl

https://github.com/Xilinx/XilinxUnisimLibrary/tree/master/verilog/src/unisims

@hansfbaier
Collaborator

Very good, thanks!

@hansemro
Author

Added support for RAM128X1S and RAM256X1S though not sufficiently tested. I was able to build litex-ddr-kc705 after latest changes. However, I am now running into DDR memtest issues it seems: https://gist.github.com/hansemro/5f48f4098e59f9db2e34ae25cb0b6ecd

@hansfbaier
Collaborator

@hansemro Wow, that was quick! We might want to write some basic tests here:
https://github.com/openXC7/primitive-tests
Debugging the issue inside that complex design is probably too cumbersome.

@hansemro
Author

@hansfbaier Sounds good. I'll try to write some tests soon. DDR issue could be totally unrelated to this.

@hansfbaier
Collaborator

@hansemro Yes, very likely your code works. I never got 8 modules working nicely with OpenXC7. At some point the timing just falls apart, because of congestion, I suppose. See my comment on your gist.

@hansemro
Author

Rebased experimental branch on 8120acd (current stable-backports)

@hansfbaier
Collaborator

@hansemro great, thanks

@hansemro
Author

hansemro commented Dec 20, 2023

Fixed up RAM32X1D not creating RAMD32 instances (was creating RAMD64 instances previously) and made sure RAM32X1S is handled in pack_dram (forgot this one apparently).

@hansemro
Author

hansemro commented Dec 20, 2023

Made initial tests: https://github.com/hansemro/primitive-tests/commits/lutram-tests/

  • Targets KC705 with its 200 MHz differential clock.
  • Uses basic clock division and reset propagation so that I can visibly check patterns via LED.
  • Basic 1,0,1,0,... write pattern.
  • Simple FSM: Reset -> Clear -> Write -> Read -> Finish
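To make the test sequence above concrete, here is a minimal software sketch of the same FSM run against a behavioral 1-bit RAM. The names `ListRam`, `expected_bit` and `run_pattern_test` are hypothetical and do not come from the linked test branch; the real tests are Verilog running on the KC705.

```python
class ListRam:
    """Behavioral stand-in for a 1-bit-wide LUTRAM (hypothetical helper)."""

    def __init__(self, depth):
        self.cells = [1] * depth  # arbitrary non-zero power-on contents

    def write(self, addr, bit):
        self.cells[addr] = bit

    def read(self, addr):
        return self.cells[addr]


def expected_bit(addr):
    # 1,0,1,0,... pattern: even addresses hold 1, odd addresses hold 0.
    return 1 - (addr % 2)


def run_pattern_test(ram, depth):
    """Walk Reset -> Clear -> Write -> Read -> Finish; return (passed, visited states)."""
    state, addr, passed = "RESET", 0, True
    visited = []
    while state != "FINISH":
        if not visited or visited[-1] != state:
            visited.append(state)
        if state == "RESET":
            addr, passed, state = 0, True, "CLEAR"
        elif state == "CLEAR":
            ram.write(addr, 0)
            addr += 1
            if addr == depth:
                addr, state = 0, "WRITE"
        elif state == "WRITE":
            ram.write(addr, expected_bit(addr))
            addr += 1
            if addr == depth:
                addr, state = 0, "READ"
        else:  # READ: compare every cell against the written pattern
            if ram.read(addr) != expected_bit(addr):
                passed = False
            addr += 1
            if addr == depth:
                state = "FINISH"
    visited.append("FINISH")
    return passed, visited
```

One thing this model makes visible: an alternating pattern cannot reveal aliasing between addresses that differ only in upper address bits, because such addresses share the same parity.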

Notably nextpnr-xilinx hangs with RAM256X1S. Seems #10 made this observation months ago. I'll try to spend some time debugging this.

@hansemro
Author

RAM256X1S and RAM256X1D were not handled correctly since their address pins are in an array A[N:0] rather than specified individually (A0, A1, A2, ... ). I'll need to validate every transform rule anyway...
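As a rough illustration of the two naming styles, here is a sketch of address port detection that accepts both `A[3]` and `A3`. It assumes a cell's ports are available as a plain dict of port name to net name, which is a simplification of mine and not how nextpnr stores them; `collect_address_nets` is a hypothetical helper.

```python
import re

# Matches "BASE[idx]" (array style) or "BASEidx" (individual pin style).
_INDEXED = re.compile(r"^(?P<base>[A-Za-z_]+?)(?:\[(?P<idx1>\d+)\]|(?P<idx2>\d+))$")


def collect_address_nets(ports, base="A"):
    """Return the nets on the `base` address pins, ordered by bit index (LSB first)."""
    found = {}
    for name, net in ports.items():
        m = _INDEXED.match(name)
        if not m or m.group("base") != base:
            continue  # not an indexed pin, or belongs to another port such as DPRA
        idx = m.group("idx1") if m.group("idx1") is not None else m.group("idx2")
        found[int(idx)] = net
    return [found[i] for i in sorted(found)]
```

With this, a RAM256X1S-style port list (`A[0]`..`A[7]`) and a RAM64X1S-style port list (`A0`..`A5`) go through the same code path.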

@hansfbaier
Collaborator

@hansemro I pushed an MMCM-fix to stable-backports. The MMCM should work now, if you have time to try.

@hansemro
Author

hansemro commented Dec 21, 2023

@hansfbaier Thanks for the heads up. I will get to it at some point. I decided it is better for me to fix dual-port LUTRAMs before moving forward, because everything is broken (for at least xc7, not sure about ultrascale).

For example, while tracing how RAM128X1D is handled, twice as many RAMD64Es are created with z/height decremented for each one. We end up with negative z?! Not sure what the intention was but this does not seem right to me.

@hansfbaier
Collaborator

Yes definitely, that is more important. Thanks for working on this.

@hansemro
Author

I misspoke since I was testing RAM256X1D (thought I was testing RAM256X1S) which does not fit in xc7 anyhow. Still, yosys should not allow unsupported LUTRAMs to be synthesized!

@hansemro
Author

hansemro commented Dec 21, 2023

Resolved two issues:

  • Fixed address port detection for single port LUTRAM with c073116
  • Fixed MUXF tree z offset for RAM256X1S with 4792453
    • This fixes placer stalling when handling designs with RAM256X1S
    • Will need to revisit for Ultrascale since they have different offsets (more LUTs in a slice).

@hansfbaier
Collaborator

@hansemro How are things going? Have you been able to test the changes?

@hansemro
Author

hansemro commented Feb 6, 2024

@hansfbaier

Have you been able to test the changes?

MMCM is confirmed working with multiple clock outputs on KC705 though with a BUFG on all clock outputs. fasm2frames would throw segment DB errors if I didn't have them. Test branch: https://github.com/hansemro/primitive-tests/commits/mmcm-blinky-kc705-db-error/

Interestingly, the first clock output didn't require me to place BUFG buffer, though it should probably have one.

How are things going?

Things were going well until I had to handle each LUTRAM as an edge case. Initially, I was less bothered to write things down, but now I feel it is appropriate to actually spend time documenting the port/parameter transformations for all cells (including ultrascale-only cells). I intend to resume work on validation and get xc7 cells covered.

Anyhow, it turns out I made some incorrect assumptions about some things. Here are some TODOs:

  1. create_dram32_lut should be able to map LUT5 or LUT6 BEL site, but currently doesn't
  2. check whether address ports are connected to both A{6:1} and WA{8:1} (WA{9:1} for ultrascale) ports in LUT6/LUT5 BEL for single-port LUTRAM

I will follow up with a post describing in more detail how LUTRAMs are transformed to RAMS/RAMD primitive cells, and how those primitives map to LUT5/LUT6 BELs.

@hansfbaier
Collaborator

hansfbaier commented Feb 6, 2024

Yes, I had similar observations.
CLKOUT 1-3 had missing pips. CLKOUT 0,5,6 worked fine out of the box:
https://github.com/openXC7/primitive-tests/blob/main/mmcm-blinky-kintex/blinky.v
But only on Kintex. On other series all CLKOUT ports were fine.

@hansfbaier
Collaborator

Thanks for the update!

@hansemro
Author

hansemro commented Feb 8, 2024

Naming Notation:

I'll define the following name notation to fold port/parameter names:

  • A comma-separated list or range enclosed by curly brackets {} maps to individual names with an entry from the list/range.
    • Curly brackets may be inserted anywhere in the name.
    • List example: ADR{3, 2, 1, 0} maps to ADR3, ADR2, ADR1, ADR0
    • Range example: {D:A}_O maps to D_O, C_O, B_O, A_O
  • Verilog-style array with a range enclosed by square brackets [].
    • The range indicates the bit width of the signal/parameter. If there are no square brackets, assume 1-bit width.
    • Square brackets can only be placed at the end of a name.
    • Example: ADR[3:0] maps to ADR[3], ADR[2], ADR[1], ADR[0]
  • Combined example:
    • ADDR{D:A}[4:0] maps to ADDRD[4:0], ADDRC[4:0], ADDRB[4:0], ADDRA[4:0]
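The notation above is mechanical enough to expand with a few lines of code. The following is only a sketch under my own assumptions: range endpoints are either decimal numbers or single letters, and `expand_braces` / `expand_array` are hypothetical helper names.

```python
import re
from itertools import product

_BRACE = re.compile(r"\{([^{}]*)\}")
_ARRAY = re.compile(r"\[(\d+):(\d+)\]$")


def _expand_item(item):
    """Expand one brace entry: 'D:A' or '4:0' is a range, anything else is a literal."""
    item = item.strip()
    if ":" in item:
        hi, lo = (s.strip() for s in item.split(":"))
        if hi.isdigit() and lo.isdigit():
            a, b = int(hi), int(lo)
            step = -1 if a >= b else 1
            return [str(i) for i in range(a, b + step, step)]
        # Assumption: non-numeric range endpoints are single letters.
        a, b = ord(hi), ord(lo)
        step = -1 if a >= b else 1
        return [chr(c) for c in range(a, b + step, step)]
    return [item]


def expand_braces(name):
    """Expand every {...} group in a folded name, leftmost group varying slowest."""
    parts = _BRACE.split(name)  # alternates: literal, group body, literal, ...
    choices = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            choices.append([part])
        else:
            entries = []
            for item in part.split(","):
                entries.extend(_expand_item(item))
            choices.append(entries)
    return ["".join(combo) for combo in product(*choices)]


def expand_array(name):
    """Expand a trailing Verilog-style [hi:lo] range into individual bits."""
    m = _ARRAY.search(name)
    if not m:
        return [name]  # no square brackets: 1-bit signal
    base = name[: m.start()]
    hi, lo = int(m.group(1)), int(m.group(2))
    step = -1 if hi >= lo else 1
    return [f"{base}[{i}]" for i in range(hi, lo + step, step)]
```

For the combined example, `expand_braces` is applied first and `expand_array` can then be applied to each resulting name.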

LUTRAM Cell Table:

Additional notes:

  • LUT_OR_MEM5/LUT_OR_MEM6 BEL cannot be instantiated manually. Port and parameter details of these BELs are purely behavioral.
  • LUT_OR_MEM6 BEL has SIN and MC31 pins which are used for shift register functionality and can be ignored for LUTRAM.

| Cell | Cell Type | US-only | *INIT* Parameter | *CLK_INVERTED Parameter | Clock Input | Write Enable Input | Write Data Input | Write Select Input | Write Address Input | Read Address Input | Read Data Output |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LUT_OR_MEM5 | BEL | false | INIT[31:0] | N/A | CLK | WE | DI1 | N/A | WA{5:1} | A{5:1} | O5 |
| LUT_OR_MEM6 | BEL | false | INIT[63:0] | N/A | CLK | WE | DI2 | N/A | US ? WA{9:1} : WA{8:1} | A{6:1} | O6 |
| RAMS32 | LUTRAM Primitive | false | INIT[31:0] | IS_CLK_INVERTED | CLK | WE | I | N/A | ADR{4:0} | ADR{4:0} | O |
| RAMD32 | LUTRAM Primitive | false | INIT[31:0] | IS_CLK_INVERTED | CLK | WE | I | N/A | WADR{4:0} | RADR{4:0} | O |
| RAMD32M64 | LUTRAM Primitive | true | INIT[63:0] | IS_CLK_INVERTED | CLK | WE | I | N/A | WADR{4:0} | RADR{5:0} | O |
| RAM32X1S | LUTRAM | false | INIT[31:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A{4:0} | A{4:0} | O |
| RAM32X1D | LUTRAM | false | INIT[31:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A{4:0} | A{4:0}; DPRA{4:0} | SPO; DPO |
| RAM32X16DR8 | Asymmetric LUTRAM | true | N/A? | IS_WCLK_INVERTED | WCLK | WE | DI{H:A}[1:0] | N/A | ADDRH[4:0]; ADDR{G:A}[5:0] | ADDRH[4:0]; ADDR{G:A}[5:0] | DOH[1:0]; DO{G:A} |
| RAM32M | SelectRAM | false | INIT_{D:A}[63:0] | IS_WCLK_INVERTED | WCLK | WE | DI{D:A}[1:0] | N/A | ADDR{D:A}[4:0] | ADDR{D:A}[4:0] | DO{D:A}[1:0] |
| RAM32M16 | SelectRAM | true | INIT_{H:A}[63:0] | IS_WCLK_INVERTED | WCLK | WE | DI{H:A}[1:0] | N/A | ADDR{H:A}[4:0] | ADDR{H:A}[4:0] | DO{H:A}[1:0] |
| RAMS64E | LUTRAM Primitive | false | INIT[63:0] | IS_CLK_INVERTED | CLK | WE | I | N/A | (WADR{7:6}), ADR{5:0} | ADR{5:0} | O |
| RAMS64E1 | LUTRAM Primitive | true? | INIT[63:0] | IS_CLK_INVERTED | CLK | WE | I | N/A | (WADR{8:6}), ADR{5:0} | ADR{5:0} | O |
| RAMD64E | LUTRAM Primitive | false | INIT[63:0] | IS_CLK_INVERTED | CLK | WE | I | N/A | WADR{7:0} | RADR{5:0} | O |
| RAM64X1S | LUTRAM | false | INIT[63:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A{5:0} | A{5:0} | O |
| RAM64X1D | LUTRAM | false | INIT[63:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A{5:0} | A{5:0}; DPRA{5:0} | SPO; DPO |
| RAM64X8SW | SelectRAM | true | INIT_{H:A}[63:0] | IS_WCLK_INVERTED | WCLK | WE | D | WSEL[2:0] | A[5:0] | A[5:0] | O[7:0] |
| RAM64M | SelectRAM | false | INIT_{D:A}[63:0] | IS_WCLK_INVERTED | WCLK | WE | DI{D:A} | N/A | ADDR{D:A}[5:0] | ADDR{D:A}[5:0] | DO{D:A} |
| RAM64M8 | SelectRAM | true | INIT_{H:A}[63:0] | IS_WCLK_INVERTED | WCLK | WE | DI{H:A} | N/A | ADDR{H:A}[5:0] | ADDR{H:A}[5:0] | DO{H:A} |
| RAM128X1S | LUTRAM | false | INIT[127:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A{6:0} | A{6:0} | O |
| RAM128X1D | LUTRAM | false | INIT[127:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A[6:0] | A[6:0]; DPRA[6:0] | SPO; DPO |
| RAM256X1S | LUTRAM | false | INIT[255:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A[7:0] | A[7:0] | O |
| RAM256X1D | LUTRAM | true | INIT[255:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A[7:0] | A[7:0]; DPRA[7:0] | SPO; DPO |
| RAM512X1S | LUTRAM | true | INIT[511:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A[8:0] | A[8:0] | O |

@hansemro
Author

hansemro commented Feb 8, 2024

XC7 LUTRAM to LUTRAM Primitive Transformations:

LUTRAMs are broken down to primitive cell(s) that will eventually map to SLICEM LUT_OR_MEM6/LUT_OR_MEM5 BEL site(s) once placed.

Convention:

  • Title: Source Cell to Destination Cell
  • Transformations: Source Name to Destination Name (if owner is not specified)

RAM32X1S -> 1x RAMS32

Cell Rules:

  • RAMS32 (6LUT) with /SP appended to name
  • No MUXF cells
  • 1 output

Parameter Rules:

  • IS_WCLK_INVERTED to IS_CLK_INVERTED

Port Rules:

  • A{4:0} to /SP's ADR{4:0}
  • D to I
  • O to O
  • WCLK to CLK
  • WE to WE
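A minimal sketch of how the RAM32X1S rules could be applied as a table-driven rename. The dict-based cell representation and the function name are hypothetical, not nextpnr's data model. Parameters that have no rule (such as INIT) are carried over unchanged, which is my assumption.

```python
# Port and parameter renames taken from the RAM32X1S -> RAMS32 rules.
PORT_RULES = {f"A{i}": f"ADR{i}" for i in range(5)}
PORT_RULES.update({"D": "I", "O": "O", "WCLK": "CLK", "WE": "WE"})
PARAM_RULES = {"IS_WCLK_INVERTED": "IS_CLK_INVERTED"}


def ram32x1s_to_rams32(cell):
    """Return a new RAMS32 cell dict named <name>/SP; the input cell is not modified."""
    if cell["type"] != "RAM32X1S":
        raise ValueError(f"expected RAM32X1S, got {cell['type']}")
    return {
        "name": cell["name"] + "/SP",
        "type": "RAMS32",
        # A port with no rule raises KeyError, so unexpected pins are not silently dropped.
        "ports": {PORT_RULES[p]: net for p, net in cell["ports"].items()},
        # Assumption: parameters without a rule (e.g. INIT) pass through unchanged.
        "params": {PARAM_RULES.get(k, k): v for k, v in cell["params"].items()},
    }
```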

RAM32X1D -> 2x RAMD32

Cell Rules:

  • RAMD32 (D6LUT/B6LUT) with /SP appended to name
  • RAMD32 (C6LUT/A6LUT) with /DP appended to name
  • No MUXF cells
  • 2 outputs

Parameter Rules:

  • IS_WCLK_INVERTED to IS_CLK_INVERTED

Port Rules:

  • A{4:0} to /SP's RADR{4:0}
  • A{4:0} to /SP's WADR{4:0}
  • DPRA{4:0} to /DP's RADR{4:0}
  • A{4:0} to /DP's WADR{4:0}
  • D to all I
  • SPO to /SP's O
  • DPO to /DP's O
  • WCLK to all CLK
  • WE to all WE

RAM32M -> 2x RAMS32 + 6x RAMD32

Cell Rules:

  • RAMS32 (D5LUT) with /RAMD appended to name
  • RAMS32 (D6LUT) with /RAMD_D1 appended to name
  • RAMD32 (C5LUT) with /RAMC appended to name
  • RAMD32 (C6LUT) with /RAMC_D1 appended to name
  • RAMD32 (B5LUT) with /RAMB appended to name
  • RAMD32 (B6LUT) with /RAMB_D1 appended to name
  • RAMD32 (A5LUT) with /RAMA appended to name
  • RAMD32 (A6LUT) with /RAMA_D1 appended to name
  • No MUXF cells
  • 8 outputs

Parameter Rules:

  • IS_WCLK_INVERTED to IS_CLK_INVERTED

Port Rules:

  • WCLK to all CLK
  • WE to all WE
  • DIA[0] to /RAMA's I
  • DIA[1] to /RAMA_D1's I
  • DIB[0] to /RAMB's I
  • DIB[1] to /RAMB_D1's I
  • DIC[0] to /RAMC's I
  • DIC[1] to /RAMC_D1's I
  • DID[0] to /RAMD's I
  • DID[1] to /RAMD_D1's I
  • DOA[0] to /RAMA's O
  • DOA[1] to /RAMA_D1's O
  • DOB[0] to /RAMB's O
  • DOB[1] to /RAMB_D1's O
  • DOC[0] to /RAMC's O
  • DOC[1] to /RAMC_D1's O
  • DOD[0] to /RAMD's O
  • DOD[1] to /RAMD_D1's O
  • ADDRA[4:0] to /RAMA and /RAMA_D1's RADR{4:0}
  • ADDRB[4:0] to /RAMB and /RAMB_D1's RADR{4:0}
  • ADDRC[4:0] to /RAMC and /RAMC_D1's RADR{4:0}
  • ADDRD[4:0] to all WADR{4:0}
  • ADDRD[4:0] to /RAMD and /RAMD_D1's ADR{4:0}

RAM64X1S -> RAMS64E

Cell Rules:

  • RAMS64E (6LUT)
  • No MUXF cells
  • 1 output

Parameter Rules:

  • IS_WCLK_INVERTED to IS_CLK_INVERTED

Port Rules:

  • WCLK to CLK
  • WE to WE
  • D to I
  • O to O
  • A{5:0} to ADR{5:0}
  • RAMS64E's WADR6 and WADR7 ports not connected

RAM64X1D -> 2x RAMD64E

Cell Rules:

  • RAMD64E (D6LUT/B6LUT) with /SP appended to name
  • RAMD64E (C6LUT/A6LUT) with /DP appended to name
  • No MUXF cells
  • 2 outputs

Parameter Rules:

  • IS_WCLK_INVERTED to IS_CLK_INVERTED

Port Rules:

  • WCLK to all CLK
  • WE to all WE
  • A{5:0} to /SP's RADR{5:0}
  • A{5:0} to all WADR{5:0}
  • DPRA{5:0} to /DP's RADR{5:0}
  • D to all I
  • SPO to /SP's O
  • DPO to /DP's O

RAM64M -> 4x RAMD64E

Cell Rules:

  • RAMD64E (A6LUT) with /RAMA appended to name
  • RAMD64E (B6LUT) with /RAMB appended to name
  • RAMD64E (C6LUT) with /RAMC appended to name
  • RAMD64E (D6LUT) with /RAMD appended to name
  • No MUXF cells
  • 4 outputs

Parameter Rules:

  • IS_WCLK_INVERTED to IS_CLK_INVERTED

Port Rules:

  • WCLK to all CLK
  • WE to all WE
  • DI{A:D} to /RAM{A:D}'s I
  • DO{A:D} to /RAM{A:D}'s O
  • ADDR{A:D}[5:0] to /RAM{A:D}'s RADR{5:0}
  • ADDRD[5:0] to all WADR{5:0}
  • all WADR6 and WADR7 RAMD64E ports are not connected

RAM128X1S -> 2x RAMS64E

Cell Rules:

  • RAMS64E (D6LUT/B6LUT) with /LOW appended to name
  • RAMS64E (C6LUT/A6LUT) with /HIGH appended to name
  • MUXF7 cell with /F7 for single output

Parameter Rules:

  • IS_WCLK_INVERTED to IS_CLK_INVERTED

Port Rules:

  • WCLK to all CLK
  • WE to all WE
  • D to all I
  • A{5:0} to all ADR{5:0}
  • A6 to all WADR6
  • A6 to /F7's S
  • all WADR7 ports are not connected
  • /LOW's O to /F7's I0
  • /HIGH's O to /F7's I1
  • O to /F7's O

RAM128X1D -> 4x RAMD64E

Cell Rules:

  • RAMD64E (D6LUT) with /SP.LOW appended to name
  • RAMD64E (C6LUT) with /SP.HIGH appended to name
  • RAMD64E (B6LUT) with /DP.LOW appended to name
  • RAMD64E (A6LUT) with /DP.HIGH appended to name
  • MUXF7 with /F7.SP appended to name for single port output
  • MUXF7 with /F7.DP appended to name for dual port output
  • 2 outputs

Parameter Rules:

  • IS_WCLK_INVERTED to IS_CLK_INVERTED

Port Rules:

  • WCLK to all CLK
  • WE to all WE
  • D to all I
  • A[6:0] to all WADR{6:0}
  • A[5:0] to /SP.LOW's RADR{5:0}
  • A[5:0] to /SP.HIGH's RADR{5:0}
  • A[6] to /F7.SP's S
  • DPRA[5:0] to /DP.LOW's RADR{5:0}
  • DPRA[5:0] to /DP.HIGH's RADR{5:0}
  • DPRA[6] to /F7.DP's S
  • /SP.LOW's O to /F7.SP's I0
  • /SP.HIGH's O to /F7.SP's I1
  • /DP.LOW's O to /F7.DP's I0
  • /DP.HIGH's O to /F7.DP's I1
  • SPO to /F7.SP's O
  • DPO to /F7.DP's O

RAM256X1S -> 4x RAMS64E

Cell Rules:

  • RAMS64E (D6LUT) with /RAMS64E_D appended to name
  • RAMS64E (C6LUT) with /RAMS64E_C appended to name
  • RAMS64E (B6LUT) with /RAMS64E_B appended to name
  • RAMS64E (A6LUT) with /RAMS64E_A appended to name
  • MUXF7 with /F7.A appended to name
  • MUXF7 with /F7.B appended to name
  • MUXF8 with /F8 appended to name for single output

Parameter Rules:

  • IS_WCLK_INVERTED to IS_CLK_INVERTED

Port Rules:

  • WCLK to all CLK
  • WE to all WE
  • D to all I
  • A[5:0] to all ADR{5:0}
  • A[7:6] to all WADR{7:6}
  • A[6] to /F7.A's S
  • A[6] to /F7.B's S
  • A[7] to /F8's S
  • /RAMS64E_D's O to /F7.B's I0
  • /RAMS64E_C's O to /F7.B's I1
  • /RAMS64E_B's O to /F7.A's I0
  • /RAMS64E_A's O to /F7.A's I1
  • /F7.B's O to /F8's I0
  • /F7.A's O to /F8's I1
  • O to /F8's O
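To sanity-check the mux wiring above, here is a small behavioral model of the read path only. It assumes the usual 2:1 mux semantics (S=0 selects I0, S=1 selects I1); the function names are hypothetical. One consequence of the port rules is that A[7:6] = 00, 01, 10, 11 select the D, C, B and A LUTs respectively.

```python
def muxf(i0, i1, s):
    """2:1 mux as used for MUXF7/MUXF8: S=0 selects I0, S=1 selects I1."""
    return i1 if s else i0


def ram256x1s_read(luts, addr):
    """luts maps 'A'..'D' to a list of 64 bits; addr is the 8-bit read address."""
    low = addr & 0x3F        # A[5:0] goes to every RAMS64E
    a6 = (addr >> 6) & 1     # select of both MUXF7s
    a7 = (addr >> 7) & 1     # select of the MUXF8
    f7b = muxf(luts["D"][low], luts["C"][low], a6)  # /F7.B: I0 = D, I1 = C
    f7a = muxf(luts["B"][low], luts["A"][low], a6)  # /F7.A: I0 = B, I1 = A
    return muxf(f7b, f7a, a7)                       # /F8:  I0 = F7.B, I1 = F7.A


def selected_lut(addr):
    """Which LUT supplies the output for a given address, per the rules above."""
    return "DCBA"[(addr >> 6) & 3]
```

If the flat 256-bit contents are to read back in address order, this implies the D LUT holds bits 63:0 and the A LUT holds bits 255:192. That split is derived from the mux wiring here, not taken from the branch.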

@hansemro
Author

hansemro commented Feb 8, 2024

LUTRAM Primitive to BEL Transformations:

Additional notes:

  • Unclear to me how CLKINV bit is set for SLICE_LUTX in nextpnr/fasm.
    • Also unclear to me how CLKINV status for all BELs in the SLICE site are checked to match.
  • Some of the transformation details have yet to be implemented or verified in my development branch.

RAMD64E -> LUT_OR_MEM6 BEL

  • nextpnr-specific rules:
    • set type to id_SLICE_LUTX
    • transform parameter IS_CLK_INVERTED to IS_WCLK_INVERTED
    • set attribute X_LUT_AS_DRAM to 1
  • RADR{5:0} to A{6:1}
  • WADR{7:0} to WA{8:1}
  • I to DI1
  • O to O6
  • WE to WE
  • CLK to CLK

RAMS64E -> LUT_OR_MEM6 BEL

  • nextpnr-specific rules:
    • set type to id_SLICE_LUTX
    • transform parameter IS_CLK_INVERTED to IS_WCLK_INVERTED
    • set attribute X_LUT_AS_DRAM to 1
  • ADR{5:0} to A{6:1}
  • ADR{5:0} to WA{6:1}
  • WADR{7:6} to WA{8:7}
  • I to DI1
  • O to O6
  • WE to WE
  • CLK to CLK
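The two 64-deep rule sets above differ mainly in how addresses fan out, so here is a sketch that generates both pin maps. Each primitive port maps to a tuple of BEL pins, because the single-port case drives both the read and the write address pins from the same port. The function names and the tuple representation are mine, not nextpnr's.

```python
# Pins shared by both primitives, from the rules above.
COMMON = {"I": ("DI1",), "O": ("O6",), "WE": ("WE",), "CLK": ("CLK",)}


def ramd64e_pin_map():
    """RAMD64E: separate read (RADR) and write (WADR) address ports."""
    pins = {f"RADR{i}": (f"A{i + 1}",) for i in range(6)}          # RADR{5:0} -> A{6:1}
    pins.update({f"WADR{i}": (f"WA{i + 1}",) for i in range(8)})   # WADR{7:0} -> WA{8:1}
    pins.update(COMMON)
    return pins


def rams64e_pin_map():
    """RAMS64E: ADR drives both A and WA; only WADR{7:6} are write-only."""
    pins = {f"ADR{i}": (f"A{i + 1}", f"WA{i + 1}") for i in range(6)}
    pins.update({f"WADR{i}": (f"WA{i + 1}",) for i in range(6, 8)})  # WADR{7:6} -> WA{8:7}
    pins.update(COMMON)
    return pins
```

Note the off-by-one between primitive and BEL numbering: primitive address bits start at 0, BEL pins start at 1.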

RAMD32 -> LUT_OR_MEM6 BEL

  • nextpnr-specific rules:
    • set type to id_SLICE_LUTX
    • transform parameter IS_CLK_INVERTED to IS_WCLK_INVERTED
    • set attribute X_LUT_AS_DRAM to 1
  • RADR{4:0} to A{5:1}
  • WADR{4:0} to WA{5:1}
  • I to DI1 or DI2 if DI1 already used by LUT_OR_MEM5
  • O to O6
  • WE to WE
  • CLK to CLK

RAMD32 -> LUT_OR_MEM5 BEL

  • nextpnr-specific rules:
    • set type to id_SLICE_LUTX
    • transform parameter IS_CLK_INVERTED to IS_WCLK_INVERTED
    • set attribute X_LUT_AS_DRAM to 1
  • RADR{4:0} to A{5:1}
  • WADR{4:0} to WA{5:1}
  • I to DI1
  • O to O5
  • WE to WE
  • CLK to CLK

RAMS32 -> LUT_OR_MEM6 BEL

  • nextpnr-specific rules:
    • set type to id_SLICE_LUTX
    • transform parameter IS_CLK_INVERTED to IS_WCLK_INVERTED
    • set attribute X_LUT_AS_DRAM to 1
  • ADR{4:0} to A{5:1}
  • ADR{4:0} to WA{5:1}
  • I to DI1 or DI2 if DI1 already used by LUT_OR_MEM5
  • O to O6
  • WE to WE
  • CLK to CLK

RAMS32 -> LUT_OR_MEM5 BEL

  • nextpnr-specific rules:
    • set type to id_SLICE_LUTX
    • transform parameter IS_CLK_INVERTED to IS_WCLK_INVERTED
    • set attribute X_LUT_AS_DRAM to 1
  • ADR{4:0} to A{5:1}
  • ADR{4:0} to WA{5:1}
  • I to DI1
  • O to O5
  • WE to WE
  • CLK to CLK

@hansfbaier
Collaborator

Thanks for the effort! I am looking forward to what you will come up with!

@hansemro
Author

hansemro commented Feb 13, 2024

Issue: nextpnr does nothing with IS_*CLK_INVERTED property for LUTRAM cells

While working on CLKINV property test, I noticed that nextpnr does not set the CLKINV bit when IS_*CLK_INVERTED property for a LUTRAM is set. Instead, nextpnr fasm writer ignores the property and sets NOCLKINV for the SLICEM site.

Also note that, on XC7, LUTRAMs and FFs share the same CLKINV routing BEL. However, on Ultrascale(+), LUTRAMs have their own dedicated clock inverter provided by the LCLKINV routing BEL.
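For the XC7 case, the sharing described above suggests something like the following check when choosing between CLKINV and NOCLKINV for a site. This is only a sketch of the idea: the list-of-dicts input and the function name are hypothetical, and it is not how the nextpnr fasm writer is structured.

```python
def slice_clkinv_feature(cells):
    """Pick the site-wide clock inversion setting for one XC7 SLICEM.

    `cells` is an iterable of dicts with a boolean 'clk_inverted' entry, one per
    clocked cell (LUTRAM or FF) placed in the site. Because LUTRAMs and FFs share
    one CLKINV on XC7, all of them must agree.
    """
    wanted = {bool(c["clk_inverted"]) for c in cells}
    if len(wanted) > 1:
        raise ValueError("conflicting clock inversion requested within one SLICEM")
    return "CLKINV" if wanted == {True} else "NOCLKINV"
```

On Ultrascale(+) the LUTRAM side would presumably need its own decision, since LCLKINV is separate from the FF inverter.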

WIP CLKINV property test: https://github.com/hansemro/primitive-tests/tree/xc7-lutram-tests/lutram-tests/clkinv-test

@hansfbaier
Collaborator

Great work!

@hansemro
Author

Something that bothered me about how RAM{S,D}64E maps to the LUT_OR_MEM6 BEL is that only the DI1 data input is used to write to both internal LUT_OR_MEM5 BELs. Somehow one of the LUT_OR_MEM5 BELs can select between the DI1 and DI2 data inputs, but this is not well documented.

Recently, I stumbled on the following physical design rules (in $VIVADO_2017.2_ROOT/ids_lite/ISE/msg/usenglish/PhysDesignRules.msg) that seem to correspond to RAM.SMALL configuration bit being what controls data input selection:

1383
Issue with pin connections and/or configuration on block:<%s>:<%s>.  For RAMMODE programming set with DPRAM64 or SPRAM64 or SRL32 the DI1 input pin must be connected.\n
Issue with pin connections and/or configuration on block:<!%1!>:<!%2!>.  For RAMMODE programming set with DPRAM64 or SPRAM64 or SRL32 the DI1 input pin must be connected.\n
;;
1384
Issue with pin connections and/or configuration on block:<%s>:<%s>.  For RAMMODE programming set with DPRAM64 or SPRAM64 or SRL32 the DI2 input pin cannot be connected.\n
Issue with pin connections and/or configuration on block:<!%1!>:<!%2!>.  For RAMMODE programming set with DPRAM64 or SPRAM64 or SRL32 the DI2 input pin cannot be connected.\n
;;
1385
Issue with pin connections and/or configuration on block:<%s>:<%s>.  For RAMMODE programming set with DPRAM32 or SPRAM32 or SRL16 the DI2 input pin must be connected.\n
Issue with pin connections and/or configuration on block:<!%1!>:<!%2!>.  For RAMMODE programming set with DPRAM32 or SPRAM32 or SRL16 the DI2 input pin must be connected.\n
;;
1386
Issue with pin connections and/or configuration on block:<%s>:<%s>.  For RAMMODE programming set with DPRAM32 or SPRAM32 or SRL16 the DI1 input pin cannot be connected.\n
Issue with pin connections and/or configuration on block:<!%1!>:<!%2!>.  For RAMMODE programming set with DPRAM32 or SPRAM32 or SRL16 the DI1 input pin cannot be connected.\n

Coincidentally, the 018-clb-ram prjxray fuzzer found that the RAM.SMALL configuration bit is set for RAM32M/RAM32X1{S,D} and SRL16E, but not for RAM64M/RAM{64,128}X1{S,D}/RAM256X1S and SRLC32E. This all seems to indicate that the RAM.SMALL bit selects the data input of the upper LUT_OR_MEM5 BEL (the one with the DI2 input, initialized with INIT[63:32]).
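Read together, the four quoted rules amount to a small per-block check. Here is a sketch of that reading; the RAMMODE names and rule numbers come from the messages above, while the function and its boolean arguments are hypothetical.

```python
# RAMMODE groups as named in rules 1383-1386.
LARGE_MODES = {"DPRAM64", "SPRAM64", "SRL32"}
SMALL_MODES = {"DPRAM32", "SPRAM32", "SRL16"}


def check_di_pins(rammode, di1_connected, di2_connected):
    """Return the list of violated rule numbers for one block."""
    violations = []
    if rammode in LARGE_MODES:
        if not di1_connected:
            violations.append(1383)  # DI1 must be connected
        if di2_connected:
            violations.append(1384)  # DI2 cannot be connected
    elif rammode in SMALL_MODES:
        if not di2_connected:
            violations.append(1385)  # DI2 must be connected
        if di1_connected:
            violations.append(1386)  # DI1 cannot be connected
    return violations
```

Other RAMMODE values are not covered by the quoted rules, so the sketch reports nothing for them.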

Here is a block diagram to help visualize what I see of LUT_OR_MEM6 BEL:

[Image: LUT_OR_MEM6 R1 block diagram]

@hansfbaier
Collaborator

Good to see you making progress!

@lehaifeng000

@hansemro DI2 can't be used; it causes an infinite loop during placement. There is a check in the placer: if a LUTRAM uses the DI2 port and the WA7/WA8 ports are not configured, placement cannot complete.

@hansemro
Author

hansemro commented Sep 2, 2024

@lehaifeng000 Yes, nextpnr does not currently have the capacity to pack/place RAM32X1S/RAMS32 which would occupy and utilize both LUT_OR_MEM5 BELs. As you say, placer will get stuck because LUT_OR_MEM5/RAM32* with DI2 is not yet accepted as legal:

    else if (i_net != lut5->lutInfo.di1_net) {
        DBG();
        return false; // Memory and SRLs only valid in SLICEMs
    }

Additionally, legalization should also consider how the {C,B}X pins are connected to the {C,B}LUTs and the WA7USED/WA8USED BELs. I believe Vivado avoids this by making it illegal to place mismatched LUTRAM types in the same CLB SLICEM site. However, I will need to look into this further.

If we update the packer to utilize DI2, we will also need to update the placer and legalization accordingly.

@lehaifeng000

I intend to study the placer and try to modify it, inserting rules into the placement process.

@lehaifeng000

@hansfbaier @hansemro
I'm wondering if you could provide an email or other contact information. Sometimes, materials that aren't suitable for public sharing can be sent to you privately.

@hansfbaier
Collaborator

@lehaifeng000 my email address [email protected] should be visible in every git commit. Same for @hansemro

@lehaifeng000

@lehaifeng000 my email address [email protected] should be visible in every git commit. Same for @hansemro
OK, I got it
