Interest in getting these benchmarks easier to run across systems / NPUs #1
Hey! Yes, these scripts were written — evolved really, over the course of a number of articles — and aren't really set up to be automated, are fiddly to run, and have no tests. I wrote them at a time when there were no standard benchmarks for models, and honestly I still think that the real world(ish) scenario I use here is far more illustrative of performance than some of the fancier and (in theory) more complete benchmarks. One thing I'd desperately want to keep is simplicity; one of the things these benchmarks don't do is (much) optimisation. They take an image, throw it at a model, and measure the result. The code is simple, and what it measures at that point is comparable to the performance an average developer doing the task might get, rather than a machine learning researcher who understands the complexities and limitations of the models and how to adapt them to individual platforms.
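For anyone new to the repo, the core of that "throw an image at a model and measure it" loop looks roughly like the minimal sketch below, using TensorFlow Lite as an example backend. The model path, image path, and run count are placeholders rather than the repository's actual files.

```python
# Minimal sketch of the benchmark core: load an image, run inference, time it.
# MODEL_PATH and IMAGE_PATH are placeholders, not the repo's actual files.
import time
import numpy as np
from PIL import Image
import tensorflow as tf

MODEL_PATH = "model.tflite"   # placeholder
IMAGE_PATH = "test_image.jpg" # placeholder

interpreter = tf.lite.Interpreter(model_path=MODEL_PATH)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]

# Resize the test image to the model's expected input shape.
_, height, width, _ = input_details["shape"]
image = Image.open(IMAGE_PATH).resize((width, height))
frame = np.expand_dims(np.asarray(image, dtype=input_details["dtype"]), axis=0)

interpreter.set_tensor(input_details["index"], frame)
interpreter.invoke()  # warm-up run, excluded from timing

times = []
for _ in range(100):
    start = time.perf_counter()
    interpreter.invoke()
    times.append((time.perf_counter() - start) * 1000.0)

print(f"mean inference time: {np.mean(times):.1f} ms over {len(times)} runs")
```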
I think the first step is to make a decision about how to structure things. Right now the scripts are split by platform — Coral, Movidius, TensorFlow, TensorFlow Lite, OpenVINO (aka Nvidia), and Xnor. Unfortunately the Xnor models are no longer available (thanks for that, Apple!) so we can drop that platform. Some of the earlier scripts have a confusing platform decision tree right there at the start in the Python; possibly that needs to be dropped, and we should break things out into separate scripts, one per platform. Then we can write a bash script to figure out the platform and run the right script. This will probably simplify the main code in the scripts so that it is more easily read and understood by folks who want to come in and understand what's going on.
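For illustration, here's the same "figure out the platform, run the right script" wrapper idea sketched in Python rather than bash. The probes and the per-platform script names are assumptions, not the repository's actual layout.

```python
# Rough sketch of a platform-detecting wrapper. The device probe, package
# checks, and script names below are assumptions, not the repo's real files.
import importlib.util
import os
import subprocess
import sys

def detect_platform() -> str:
    """Best-effort guess at which accelerator (if any) is present."""
    if os.path.exists("/dev/apex_0"):           # PCIe/M.2 Coral Edge TPU device node
        return "coral"
    if importlib.util.find_spec("openvino"):    # Intel OpenVINO / Movidius stack
        return "openvino"
    if importlib.util.find_spec("tflite_runtime"):
        return "tflite"
    return "tensorflow"                         # unaccelerated fallback

SCRIPTS = {
    "coral":      "benchmark_coral.py",      # hypothetical per-platform scripts
    "openvino":   "benchmark_openvino.py",
    "tflite":     "benchmark_tf_lite.py",
    "tensorflow": "benchmark_tf.py",
}

if __name__ == "__main__":
    platform = detect_platform()
    print(f"Detected platform: {platform}")
    subprocess.run([sys.executable, SCRIPTS[platform]], check=True)
```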
Ping (perhaps?) @petewarden and @dansitu for their comments? Pete, Dan! These are the scripts (and models and the original image) I used when I benchmarked accelerator hardware back in 2019. The code is slightly out of date, but the changes to get it working again are probably fairly minimal — although installation of …
I guess the first step is getting the code working again before a proper refactor.
So, dropping AI2GO as a framework since it's no longer available, the frameworks used would be:
Hardware to test against would be:
What new hardware should we be looking at, and what frameworks does it need? Presumably Hailo, what else?
I'd also like to see in general how other new platforms touting built-in NPUs fare — so Apple M4, Intel Lunar Lake, Snapdragon X, and AMD Strix Point... some of these platforms are hard to come by (and may also be a bit weird), but hopefully have bindings we could use. I think the shorter-term goal is just making the tests simple and reproducible (make it really easy to run, maybe even a convenience script), then growing the depth of support once that first goal's met!
That's going to heavily depend on software support. Just running TensorFlow on a lot of these platforms will work, but it will be unaccelerated by whatever built-in NPU the platform has onboard. I had started to look at the BeagleBone AI board, which had just been pre-announced as I went to work for Pi, but the software just wasn't there, and it seemed unfair to run unaccelerated TensorFlow models on the board and get a really bad result. Things seem more mature there now, and there does seem to be software, but it's all based on TI's TIDL framework.

Poking around the Hailo docs, it looks like there are Python bindings to the HailoRT framework, so we should be able to take the benchmark's TensorFlow model and convert it to HEF format for that family of hardware at least.

But this is the thing that really annoys me. Every manufacturer feels like they need to reinvent the wheel: every new hardware platform has a new software framework. Even Google created a whole framework (two, first edgetpu and then pycoral) on top of TensorFlow Lite — which is their own software! — before you can use their Coral hardware. That means you have to jump through hoops to convert a normal TensorFlow model into whatever weird format you need this time around. Getting things working on Intel hardware with OpenVINO was especially hard!
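To make those hoops concrete, here's roughly what just the Coral path involves — a sketch assuming a TensorFlow 2.x SavedModel; the paths and the representative dataset are placeholders, and each other vendor (Hailo, OpenVINO, TIDL) wants its own different dance.

```python
# Sketch of the Coral conversion hoops: a normal TensorFlow SavedModel has to
# be fully integer-quantized, exported to TFLite, and then passed through
# Google's edgetpu_compiler before the Edge TPU will touch it.
import numpy as np
import tensorflow as tf

def representative_dataset():
    # A handful of real input images should go here; random data is a stand-in.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

with open("model_quant.tflite", "wb") as f:
    f.write(converter.convert())

# Then, outside Python:  edgetpu_compiler model_quant.tflite
```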
Agreed. I'm trying to decide whether the "what platform am I on" decision should be made as close to inferencing as possible, or right at the start. The obvious architecture would be to ingest a TensorFlow model and then do the model conversion automagically, so the user just throws an arbitrary model at the script and it figures out what architecture it's on, converts the model to the right format, and runs the benchmark. But that's almost certainly going to be impossible; converting a model is a very manual affair in most cases, as it takes knowledge of the model internals.
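One way to frame that trade-off (a sketch only; the function names are hypothetical): the dispatch table itself is trivial to write, but every entry hides manual, model-specific work — quantization parameters, unsupported ops, and so on — which is exactly why the fully automagic version is unlikely to happen.

```python
# Hypothetical per-platform converter registry. The point is that the table is
# easy; filling in each converter is the hard, manual, model-specific part.
def convert_for_coral(saved_model_dir: str) -> str:
    raise NotImplementedError("needs full integer quantization + edgetpu_compiler")

def convert_for_hailo(saved_model_dir: str) -> str:
    raise NotImplementedError("needs the Hailo toolchain to emit a HEF")

def convert_for_openvino(saved_model_dir: str) -> str:
    raise NotImplementedError("needs the OpenVINO model conversion step")

CONVERTERS = {
    "coral": convert_for_coral,
    "hailo": convert_for_hailo,
    "openvino": convert_for_openvino,
}

def prepare_model(platform: str, saved_model_dir: str) -> str:
    """Return a path to a platform-specific model, converting only if we know how."""
    if platform not in CONVERTERS:
        return saved_model_dir  # fall back to running the plain TensorFlow model
    return CONVERTERS[platform](saved_model_dir)
```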
Spun up an issue #2 to keep track of Hailo support.
I wonder if there's any value in working with tinymlperf on merging/extending these benchmarks? https://mlcommons.org/2021/06/mlperf-tiny-inference-benchmark/
Apple M4 uses Arm's SME, so it should be possible to test that with CPU code that has appropriate optimisations — maybe using something like tinygrad or llama.cpp.
There's now also a BeagleBone AI-64. The Jetson Nano is also now EOL, replaced by the Jetson Orin Nano.
Does it make sense to add a generic OpenCL/Vulkan compute test framework? Most small edge devices that have a GPU have the potential to do OpenCL if the drivers support it.
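A quick way to check that precondition on a given board — a sketch assuming the pyopencl package is installed — is just to enumerate whatever OpenCL platforms and devices the drivers actually expose:

```python
# Probe the OpenCL drivers on a board: if nothing shows up here, a generic
# OpenCL compute test has nothing to run against. Assumes pyopencl is installed.
import pyopencl as cl

for platform in cl.get_platforms():
    print(f"Platform: {platform.name} ({platform.version})")
    for device in platform.get_devices():
        print(f"  Device: {device.name}")
        print(f"    compute units: {device.max_compute_units}")
        print(f"    global memory: {device.global_mem_size // (1024 * 1024)} MB")
```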
Just today, Intel's CEO said:
Just marking my interest in helping get these benchmarks to run well across a variety of systems. This month my concrete goal is to try to get a benchmark to compare the Coral (supposedly 4 TOPS, tops) to the Hailo-8L (supposedly 13 TOPS).
It'd be nice if there were some benchmarks we could run that compare different NPUs somewhat fairly using real-world scenarios, like what this repository does. Geekbench, Cinebench, etc. are decent for what they are, and this could be added to the testbench for all the upcoming 'AI PCs', 'Copilot+' PCs, edge AI boxes, whatever.