how to integrate NVDLA large? #132

12ff7a6 · 2021-11-05T11:57:17Z

Hello,
I have already integrated NVDLA small and tested it on the VC707, and it works perfectly. Now I want to try NVDLA large. It seems that NVDLA large cannot be integrated by following the guide https://www.esp.cs.columbia.edu/docs/thirdparty_acc/, because NVDLA large needs an extra SRAM and a microcontroller.

Adding an extra SRAM manually to the NVDLA small project should be easy. But how can I add a microcontroller?
The NVDLA documentation says:

Requirements for the NVDLA coprocessor are fairly typical; as such, there are many general purpose processors that would be appropriate (e.g., RISC-V-based PicoRV32 processors, ARM Cortex-M or Cortex-R processors, or even in-house microcontroller designs).

Actually, I have a naive idea: can I just instantiate a 2-core CPU (Ariane) instead of adding another microcontroller? Can someone give me a hint? Any reply will be appreciated.

davide-giri · 2021-11-07T03:57:30Z

Hi! We have not yet explored the integration of NVDLA large (or full) in ESP, although it would be very nice to have.

I agree that adding the SRAM should not be a big deal. There may also be some work needed on the main memory interface, where the AXI data bitwidth will increase significantly, but here too I don't see any major roadblock.

Regarding the additional microcontroller, I wonder if that's more of a suggested system structure than an actual strict requirement. From what I know about the NVDLA software stack, I think it will be able to run on a single CPU also in the case of NVDLA full and NVDLA large, but I may be wrong. If you definitely need 2 CPUs, I think 2 Ariane cores will probably do. We're going to release support for multicore Ariane ESP systems to master very soon; the feature is already available in the ariane-smp branch. So you can definitely have an SoC with 2 Ariane cores.

What do you think?

Thanks!
Davide

12ff7a6 · 2021-11-07T14:16:10Z

Hi Davide, thank you for your reply. I have also compared the RTL for both NVDLA small and NVDLA large from Chipyard (they only support the VCU118). In any case, I didn't find any microcontroller there, so the microcontroller is not a must. I will try to implement NVDLA large in the next few days; I hope the VC707 has enough LUTs for NVDLA large. I will leave an update if I have something new. Thanks again for your help!

davide-giri · 2021-11-09T04:22:17Z

Great, that's what I thought. The problem is that I don't think NVDLA large will fit on the VC707. Let us know how it goes and if you have any further questions, thanks!

12ff7a6 · 2021-11-09T11:50:30Z

Hi Davide, I was wondering: since the microcontroller is not necessary, can I still use the guide "How to: integrate a third-party accelerator (e.g. NVDLA)" to integrate NVDLA large, and then integrate the SRAM manually? In this guide I found the following:

Currently ESP can host third-party accelerators with an AXI4 master interface to send out memory requests (32-bit or 64-bit), an APB slave interface (32-bit) to be configured by a CPU and an interrupt signal.

NVDLA large needs a wider data bus than ESP supports. Does that mean I should convert NVDLA small to large myself and modify the bus width, rather than follow this guide?
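As a rough illustration of the constraint quoted above, here is a small Python check of an accelerator's interface against the widths listed in the guide. This is only a sketch: `AcceleratorInterface` and `esp_interface_issues` are hypothetical names, not part of ESP, and the 512-bit value in the usage lines is just a placeholder for "wider than 64 bits", not a confirmed NVDLA large parameter.

```python
from dataclasses import dataclass
from typing import List

# Widths taken from the guide text quoted above.
ESP_AXI_DATA_BITS = (32, 64)
ESP_APB_DATA_BITS = 32


@dataclass
class AcceleratorInterface:
    """Hypothetical description of a third-party accelerator's ports."""
    axi_data_bits: int
    apb_data_bits: int
    has_interrupt: bool


def esp_interface_issues(acc: AcceleratorInterface) -> List[str]:
    """Return a list of mismatches against the quoted ESP constraints."""
    issues = []
    if acc.axi_data_bits not in ESP_AXI_DATA_BITS:
        issues.append(f"AXI data width {acc.axi_data_bits} is not 32 or 64 bits")
    if acc.apb_data_bits != ESP_APB_DATA_BITS:
        issues.append(f"APB data width {acc.apb_data_bits} is not 32 bits")
    if not acc.has_interrupt:
        issues.append("no interrupt signal")
    return issues


# A 64-bit AXI interface passes; a wider (placeholder) one is flagged.
print(esp_interface_issues(AcceleratorInterface(64, 32, True)))
print(esp_interface_issues(AcceleratorInterface(512, 32, True)))
```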

davide-giri · 2021-11-09T17:30:14Z

Yes, that is exactly what I was referring to when I mentioned there is some work needed on the main memory interface, where the AXI data bitwidth will increase significantly.

The lowest-effort option would be to insert a data bitwidth adapter in the NVDLA wrapper to bridge from the current 64-bit data bitwidth to the bitwidth of NVDLA large (512 bits, I think). When I looked into this some time ago, I identified some modules in the pulp-platform axi repository that do exactly that. Specifically, I'm referring to axi_dw_converter.sv, axi_dw_upsizer.sv, and axi_dw_downsizer.sv. Read the description at the top of each of those modules carefully, as they have some limitations.
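To give a sense of what such an adapter does to the data, here is a minimal behavioral sketch in Python. It is not the pulp-platform implementation and it ignores bursts, write strobes, and handshaking entirely. The function names are hypothetical, the 512-bit width is the unconfirmed figure mentioned above, and the lane ordering (lowest-order 64 bits first) is an assumption made for illustration.

```python
from typing import List


def downsize_beat(wide_beat: int, wide_bits: int = 512,
                  narrow_bits: int = 64) -> List[int]:
    """Split one wide data beat into narrow beats, lowest-order lane first."""
    if wide_bits % narrow_bits != 0:
        raise ValueError("wide width must be a multiple of the narrow width")
    if not 0 <= wide_beat < (1 << wide_bits):
        raise ValueError("beat does not fit in wide_bits")
    mask = (1 << narrow_bits) - 1
    lanes = wide_bits // narrow_bits
    return [(wide_beat >> (i * narrow_bits)) & mask for i in range(lanes)]


def upsize_beats(narrow_beats: List[int], narrow_bits: int = 64) -> int:
    """Reassemble narrow beats (lowest-order lane first) into one wide beat."""
    wide = 0
    for i, beat in enumerate(narrow_beats):
        if not 0 <= beat < (1 << narrow_bits):
            raise ValueError("beat does not fit in narrow_bits")
        wide |= beat << (i * narrow_bits)
    return wide


# One 512-bit beat becomes eight 64-bit beats and round-trips unchanged.
beat = int.from_bytes(bytes(range(64)), "little")
parts = downsize_beat(beat)
print(len(parts), upsize_beats(parts) == beat)
```

The point of the sketch is only the ratio: each wide transfer on the NVDLA side turns into several narrow transfers on the ESP side, which is where the possible bandwidth limitation comes from.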

Integrating NVDLA large this way would allow you to reuse the ESP guide you mentioned in its entirety, as all your modifications will be internal to NVDLA and its wrapper. From an SoC perspective nothing changes.

Of course, the tile containing NVDLA will still have a 64-bit data bitwidth, which may or may not limit the NVDLA large performance depending on the workload. If the performance ends up being drastically limited by the main memory access bandwidth, we can discuss possible solutions, which we may be able to help with on our end. It will pretty much boil down to increasing the bitwidth of the ESP NoC planes used by NVDLA and/or of the memory controller. Changing the bitwidth of the ESP NoC is seamless, but a couple of patches may be required in some of the bridges connected to the NoC. In addition, with ESP you can easily have multiple memory tiles (and therefore multiple memory links), so that's another way to further parallelize the memory accesses. This is probably a discussion for the future, but I just wanted to give you a sense of what can be done with a modest effort.
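For a back-of-the-envelope feel of the bandwidth tradeoff, the Python sketch below computes an idealized upper bound. It assumes one data beat per cycle per memory link and no protocol or NoC overhead, which real systems will not reach. The function names, the 512-bit width, and the choice of 4 memory tiles are all hypothetical values picked for illustration.

```python
def peak_bytes_per_cycle(data_bits: int, links: int = 1) -> int:
    """Idealized peak throughput: one full-width beat per link per cycle."""
    if data_bits <= 0 or data_bits % 8 != 0:
        raise ValueError("data_bits must be a positive multiple of 8")
    if links <= 0:
        raise ValueError("links must be positive")
    return (data_bits // 8) * links


def cycles_to_move(num_bytes: int, data_bits: int, links: int = 1) -> int:
    """Minimum cycles to transfer num_bytes under the idealized model."""
    per_cycle = peak_bytes_per_cycle(data_bits, links)
    return -(-num_bytes // per_cycle)  # ceiling division


one_mib = 1 << 20
print(cycles_to_move(one_mib, 64))           # 64-bit tile, one memory link
print(cycles_to_move(one_mib, 512))          # assumed 512-bit path
print(cycles_to_move(one_mib, 64, links=4))  # 64-bit links, 4 memory tiles
```

Under this model a 64-bit link needs 8 times as many cycles as a 512-bit one for the same transfer, and adding memory links recovers part of that gap only if the accesses actually spread across the tiles.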

I hope this helps!
Davide

jzuckerman added the "enhancement" label · Apr 5, 2023