how to integrate NVDLA large? #132

Open
12ff7a6 opened this issue Nov 5, 2021 · 5 comments
Labels
enhancement New feature or request

Comments

@12ff7a6

12ff7a6 commented Nov 5, 2021

Hello,
I have already integrated NVDLA small and tested it on the VC707; it works perfectly. Now I want to try NVDLA large. It seems one cannot integrate NVDLA large using the guide https://www.esp.cs.columbia.edu/docs/thirdparty_acc/, because NVDLA large needs an extra SRAM and a microcontroller.
[Figure: nvdla-primer-system-comparison]
Adding an extra SRAM manually to the NVDLA small project should be easy. But how can I add a microcontroller?
The NVDLA documentation says:

Requirements for the NVDLA coprocessor are fairly typical; as such, there are many general purpose processors that would be appropriate (e.g., RISC-V-based PicoRV32 processors, ARM Cortex-M or Cortex-R processors, or even in-house microcontroller designs).

Actually, I have a stupid idea: can I just implement a dual-core CPU (Ariane) instead of adding a separate microcontroller? Can someone give me a hint? Any reply will be appreciated.

@davide-giri
Member

Hi! We have not yet explored the integration of NVDLA large (or full) in ESP, although it would be very nice to have.

I agree that adding the SRAM should not be a big deal. There may also be some work needed on the main memory interface where the AXI data bitwidth will increase significantly, but also in this case I don't see any major roadblock.

Regarding the additional microcontroller, I wonder if that's more of a suggested system structure than a strict requirement. From what I know about the NVDLA software stack, I think it will be able to run on a single CPU for NVDLA full and NVDLA large as well, but I may be wrong. If you definitely need 2 CPUs, I think 2 Ariane cores will probably do. We're going to release support for multicore Ariane ESP systems to master very soon; the feature is already available in the ariane-smp branch. So you can definitely have an SoC with 2 Ariane cores.

What do you think?

Thanks!
Davide

@12ff7a6
Author

12ff7a6 commented Nov 7, 2021

Hi Davide, thank you for your reply. I have also compared the RTL for both NVDLA small and large from Chipyard, which only supports the VCU118. In any case, I didn't find any microcontroller there, so the microcontroller is not a must. I will try to implement NVDLA large in the next few days; I hope the VC707 has enough LUTs for NVDLA large. I will leave an update if I have something new. Thanks again for your help!

@davide-giri
Member

Great, that's what I thought. The problem is that I don't think NVDLA large will fit on the VC707. Let us know how it goes and if you have any further questions, thanks!

@12ff7a6
Author

12ff7a6 commented Nov 9, 2021

Hi Davide, I was wondering: since the microcontroller is not necessary, can I still use the guide "How to: integrate a third-party accelerator (e.g. NVDLA)" to integrate NVDLA large, and then integrate the SRAM manually? In this guide, however, I found the following:

Currently ESP can host third-party accelerators with an AXI4 master interface to send out memory requests (32-bit or 64-bit), an APB slave interface (32-bit) to be configured by a CPU and an interrupt signal.

NVDLA large needs a wider data bus than ESP supports. Does that mean I should convert NVDLA small to large myself and modify the bus width, rather than follow this guide?

@davide-giri
Member

Yes, that is exactly what I was referring to when I mentioned there is some work needed on the main memory interface, where the AXI data bitwidth will increase significantly.

The lowest-effort option would be to insert a data bitwidth adapter in the NVDLA wrapper to bridge from the current 64-bit data bitwidth to the bitwidth of NVDLA large (512 bits, I think). When I looked into this some time ago, I identified some modules in the pulp-platform axi repository that do exactly that. Specifically, I'm referring to axi_dw_converter.sv, axi_dw_upsizer.sv, and axi_dw_downsizer.sv. Read the description at the top of those modules carefully, as they have some limitations.
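To make the idea of a data bitwidth adapter concrete, here is a minimal behavioral sketch in Python. It is not the pulp-platform implementation and ignores bursts, strobes, and handshaking; it only shows how one wide beat maps onto several narrow beats and back. The 512-bit and 64-bit widths are the ones mentioned above (the 512-bit figure is unconfirmed), and the function names `split_beat` and `merge_beats` are hypothetical.

```python
from typing import List


def split_beat(wide_data: int, wide_bits: int = 512, narrow_bits: int = 64) -> List[int]:
    """Split one wide data beat into narrow beats, lowest-order bits first.

    Assumption: the lowest-addressed bytes sit on the lowest-order bits,
    so the first narrow beat carries the lowest-addressed bytes.
    """
    if wide_bits % narrow_bits != 0:
        raise ValueError("wide width must be a multiple of the narrow width")
    mask = (1 << narrow_bits) - 1
    count = wide_bits // narrow_bits
    return [(wide_data >> (i * narrow_bits)) & mask for i in range(count)]


def merge_beats(narrow_beats: List[int], narrow_bits: int = 64) -> int:
    """Pack narrow beats back into one wide beat (inverse of split_beat)."""
    mask = (1 << narrow_bits) - 1
    wide = 0
    for i, beat in enumerate(narrow_beats):
        wide |= (beat & mask) << (i * narrow_bits)
    return wide


# One 512-bit beat holding bytes 0..63 becomes eight 64-bit beats.
wide = int.from_bytes(bytes(range(64)), "little")
beats = split_beat(wide)
restored = merge_beats(beats)
```

In this model each 512-bit transfer on the NVDLA side costs eight 64-bit transfers on the ESP side, which is where the potential bandwidth limit comes from.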

Integrating NVDLA large this way would allow you to reuse the ESP guide you mentioned in its entirety, as all your modifications will be internal to NVDLA and its wrapper. From an SoC perspective nothing changes.

Of course, the tile containing NVDLA will still have a 64-bit data bitwidth, which may or may not limit NVDLA large performance depending on the workload. If performance ends up being drastically limited by main memory access bandwidth, we can discuss possible solutions, which we may be able to help with on our end. It will pretty much boil down to increasing the bitwidth of the ESP NoC planes used by NVDLA and/or of the memory controller. Changing the bitwidth of the ESP NoC is seamless, but a couple of patches may be required in some of the bridges connected to the NoC. Also, with ESP you can easily have multiple memory tiles (and therefore multiple memory links), so that's another way to further parallelize memory accesses. This is probably a discussion for the future, but I just wanted to give you a sense of what can be done with a modest effort.
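As a rough back-of-the-envelope check of the bandwidth trade-off described above, the following Python sketch computes ideal peak figures only. The 100 MHz clock and the 4-tile configuration are purely illustrative assumptions, not ESP or VC707 numbers, and the function names are hypothetical. Real throughput will be lower because of protocol overhead and DRAM behavior.

```python
def peak_bandwidth_bytes_per_s(data_bits: int, clock_hz: float, links: int = 1) -> float:
    """Ideal peak bandwidth: one full-width beat per cycle on each link."""
    return data_bits / 8 * clock_hz * links


def narrow_beats_per_wide_beat(wide_bits: int, narrow_bits: int) -> int:
    """Number of narrow beats needed to carry one wide beat (rounded up)."""
    return -(-wide_bits // narrow_bits)


CLOCK_HZ = 100e6  # assumed clock, for illustration only

narrow_bw = peak_bandwidth_bytes_per_s(64, CLOCK_HZ)           # 64-bit tile interface
wide_bw = peak_bandwidth_bytes_per_s(512, CLOCK_HZ)            # 512-bit interface
four_tiles_bw = peak_bandwidth_bytes_per_s(64, CLOCK_HZ, links=4)  # 4 memory links (upper bound)
slowdown = narrow_beats_per_wide_beat(512, 64)
```

Under these assumptions the 64-bit path offers one eighth of the peak bandwidth of a 512-bit path at the same clock, and adding memory tiles scales the aggregate upper bound linearly with the number of links.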

I hope this helps!
Davide

@jzuckerman jzuckerman added the enhancement New feature or request label Apr 5, 2023