Inference in C++ on Jetson Xavier AGX #782
christophezeinaty started this conversation in General
-
Hello, I trained my model using Larq and saved it with the following:
```python
import larq_compute_engine as lce

with open("bnn.tflite", "wb") as flatbuffer_file:
    flatbuffer_bytes = lce.convert_keras_model(model_b)
    flatbuffer_file.write(flatbuffer_bytes)
```
I wrote C++ code to run inference with the LCE engine and compiled it directly on the Jetson Xavier, which has an arm64-based processor. I am not sure whether my inference really uses only bitwise operations or falls back to regular matrix operations. What bothers me is that this model has 76,000 parameters and takes 60 ms per image, whereas a smaller model with 12,000 parameters and int8-quantized weights takes only 5 ms per image, so the BNN is slower. Is that normal?
My hypothesis was that even if the binarized model is a bit larger, it should still be faster, since the floating-point matrix multiplications are replaced by bitwise operations. Am I wrong?
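For context, this is roughly the setup I mean (a minimal sketch, not my exact code, assuming the standard TFLite C++ interpreter API with the LCE custom ops registered; the header path and the `RegisterLCECustomOps` name follow the LCE docs and may differ between versions):

```cpp
#include <chrono>
#include <cstdio>
#include <memory>

#include "larq_compute_engine/tflite/kernels/lce_ops_register.h"
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main() {
  // Load the converted Larq model.
  auto model = tflite::FlatBufferModel::BuildFromFile("bnn.tflite");
  if (!model) { std::fprintf(stderr, "Failed to load model\n"); return 1; }

  // Register the built-in TFLite ops plus the LCE custom ops
  // (e.g. LceBconv2d), which provide the optimized binary kernels.
  tflite::ops::builtin::BuiltinOpResolver resolver;
  compute_engine::tflite::RegisterLCECustomOps(&resolver);

  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);
  if (!interpreter) { std::fprintf(stderr, "Failed to build interpreter\n"); return 1; }

  interpreter->SetNumThreads(4);  // number of CPU threads to use
  interpreter->AllocateTensors();

  // Fill the input tensor with image data here.
  float* input = interpreter->typed_input_tensor<float>(0);
  (void)input;

  // One warm-up run, then average latency over repeated invocations.
  interpreter->Invoke();
  constexpr int kRuns = 50;
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < kRuns; ++i) interpreter->Invoke();
  auto end = std::chrono::steady_clock::now();
  double ms = std::chrono::duration<double, std::milli>(end - start).count() / kRuns;
  std::printf("Average latency: %.2f ms per image\n", ms);

  float* output = interpreter->typed_output_tensor<float>(0);
  (void)output;
  return 0;
}
```

The timing loop does one warm-up `Invoke()` and then averages over repeated runs, and `SetNumThreads` controls how many CPU cores the interpreter uses; the 60 ms and 5 ms numbers above were measured the same way for both models.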