Inference c++ jetson xavier AGX #782

christophezeinaty · 2023-07-31T14:37:46Z

christophezeinaty
Jul 31, 2023

Hello I trained my model using larq and saved the model using this

with open("bnn.tflite", 'wb') as flatbuffer_file: flatbuffer_bytes = lce.convert_keras_model(model_b) flatbuffer_file.write(flatbuffer_bytes)

I worte a c++ code to do the inference using the LCE engine and compiled the code directly on the jetson xavier which is arm64 based processor. I am not sure if my inference uses matrix operations or is it really based only on bitwise operations. Whats bothering me is that this model has 76 000 params and do a 60 ms inference time per image, if I compare this model to a smaller one with 12 000 params and quantized weights int 8 the BNN is slower. The int8 models takes only 5ms per image is it normal ?

My hypothesis was even if I have a model that is a little bit larger but is binarized it should be faster (due to removing the mat mult operations) , am I wrong ?

Tombana · 2023-07-31T14:43:51Z

Tombana
Jul 31, 2023
Maintainer

Hi @christophezeinaty, could you open the bnn.tflite file in https://netron.app/ and show a screenshot of it? Then we can check if it was converted correctly. (And maybe also a screenshot of the int8 tflite file, for comparison)
Since the Jetson is designed for AI, it is possible that the int8 model is accelerated by either the GPU or by special int8 SIMD instructions on the CPU, which might not exist for binary operations. So although the binary model might be faster in theory, it could be slower on hardware that was specifically designed for int8 instead of binary.

45 replies

Tombana Aug 4, 2023
Maintainer

I should've mentioned, you have to initialize submodules: git submodule update --init --recursive. I'll edit my post above as well. (EDIT: no -- in front of update)

christophezeinaty Aug 4, 2023
Author

ok It worked, seems like there is something wrong with the way I build LCE, thank you a lot. And to clarify we should always look at the line were its written Inference time not the time we see in the end of the file ? (that's why its 2.9ms instead of 0.8ms?) and for you is it a reasonable speedup ?

Tombana Aug 4, 2023
Maintainer

And to clarify we should always look at the line were its written Inference time not the time we see in the end of the file ? (that's why its 2.9ms instead of 0.8ms?)

Correct.
I was confused because the output of this tool is different from what it was in previous tensorflow versions. In previous versions it was always shown at the end of the file but not anymore. I think the 0.8 ms is only the part that was done by the "XNNPack Delegate" (part of the network that is accelerated by the XNNPack library) and 2.9 ms is the full model.

and for you is it a reasonable speedup ?

I think this is reasonable for this model yes. To get bigger speedups, the layers would need a higher number of channels, and always multiples of 32 channels.

christophezeinaty Aug 4, 2023
Author

ok well noted, but higher channel means bigger model no ? so if we have memory constraint it wont be the best idea (I guess it should be a comprise between model size and inference time)

Tombana Aug 4, 2023
Maintainer

That's right.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inference c++ jetson xavier AGX #782

{{title}}

Replies: 1 comment 45 replies

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Inference c++ jetson xavier AGX #782

christophezeinaty Jul 31, 2023

Replies: 1 comment · 45 replies

Tombana Jul 31, 2023 Maintainer

Tombana Aug 4, 2023 Maintainer

christophezeinaty Aug 4, 2023 Author

Tombana Aug 4, 2023 Maintainer

christophezeinaty Aug 4, 2023 Author

Tombana Aug 4, 2023 Maintainer

christophezeinaty
Jul 31, 2023

Replies: 1 comment 45 replies

Tombana
Jul 31, 2023
Maintainer

Tombana Aug 4, 2023
Maintainer

christophezeinaty Aug 4, 2023
Author

Tombana Aug 4, 2023
Maintainer

christophezeinaty Aug 4, 2023
Author

Tombana Aug 4, 2023
Maintainer