-
Hi @gj-raza, first I want to note that LCE runs only on the ARM CPU; it does not use any of the deep learning accelerators on the Jetson. Nevertheless, since these cores support the ARMv8.2 instruction set, they should indeed use LCE's optimized kernel implementations. Our docs report a time of 42 ms on the Pixel 1 phone, whereas you found 69 ms, assuming it's exactly the same network being benchmarked. I tried searching for some info to compare the CPUs: according to general benchmarks, the Nvidia Carmel CPU should be faster even though its clock speed is slightly lower. It is possible that the Carmel cores are heavily optimized for the particular tasks that appear in those benchmarks but not for the binary operations we employ in LCE. For example, the popcount instruction that we use may not be pipelined as efficiently as it is on the CPU in the Pixel 1. It is hard to say more without knowing the internals of the Nvidia Carmel core and without doing extensive profiling.
-
@gj-raza, did you run the Xavier NX in the MAXP power mode so that the cores can reach their limit of 1.9 GHz? Theoretically it should also be possible to build the kernel to exceed the official clock limits (at least for one core, so that the system remains stable for your single-core tests with LCE).
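For reference, on a stock JetPack install the power mode is controlled with `nvpmodel` and the clocks can be pinned with `jetson_clocks`. Mode IDs and names vary between Jetson models, so query first; this is a generic sketch, not NX-specific advice:

```shell
# List/query the current power mode (IDs and names differ per Jetson model)
sudo nvpmodel -q

# Select the maximum-performance mode; on many Jetson boards this is mode 0,
# but verify against the output of the query above
sudo nvpmodel -m 0

# Pin CPU/GPU/EMC clocks to the maximum allowed by the selected mode
sudo jetson_clocks
```

Benchmarking without this step can easily leave the cores well below 1.9 GHz due to DVFS, which alone could explain a large gap against the Pixel 1 numbers.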
-
I wanted to benchmark the performance of Larq CE on a Jetson Xavier NX (which has an ARMv8.2-based Carmel CPU). For that I compiled CE from source natively as described in the docs, and here is the result for the BinaryResNetE18 model from the zoo:
The concern is that the inference-time numbers don't even beat the Pixel benchmarks reported on your zoo page, despite the Carmel being a better processor than the Pixel's. Can you please shed some light on that?
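For anyone reproducing this: LCE ships a benchmark binary built on TensorFlow Lite's benchmark tool. A typical single-threaded run looks roughly like the following (the exact binary path depends on your build, and `BinaryResNetE18.tflite` here stands in for whatever converted model file you exported from the zoo):

```shell
# Benchmark a converted .tflite model with one thread; flags follow the
# standard TFLite benchmark tool (num_runs, warmup_runs, etc.)
./lce_benchmark_model \
  --graph=BinaryResNetE18.tflite \
  --num_threads=1 \
  --num_runs=50
```

Comparing single-threaded runs at a pinned clock frequency is the fairest way to line these numbers up against the published Pixel 1 results.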