Bool input tensor #771
I think the MLIR converter can not directly create bool input tensors. The LCE converter does have a utility function for getting bitpacked boolean output tensors: first you have to create a regular tflite file that ends with a binary quantizer, then run:

```python
from larq_compute_engine.mlir.python.util import strip_lcedequantize_ops

strip_lcedequantize_ops(tflite_file_bytes)
```

That should result in a tflite file that has an int32 output tensor (again, it's a good idea to verify it in Netron). It does not actually use 32-bit integers though: these numbers represent bitpacked booleans where every integer contains 32 booleans.
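For context, a minimal end-to-end sketch of how this might look (the `model` variable, the output file name, and the assumption that both `convert_keras_model` and `strip_lcedequantize_ops` return flatbuffer bytes are mine, not from the original comment; check your LCE version):

```python
import larq_compute_engine as lce
from larq_compute_engine.mlir.python.util import strip_lcedequantize_ops

# Convert the Keras model to a regular tflite flatbuffer first.
tflite_file_bytes = lce.convert_keras_model(model)

# Strip the trailing LceDequantize op so the output stays bitpacked
# (int32, 32 booleans per integer). Assumes the function returns the
# modified flatbuffer bytes.
stripped_bytes = strip_lcedequantize_ops(tflite_file_bytes)

with open("toy_model_bitpacked_output.tflite", "wb") as f:
    f.write(stripped_bytes)
```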
Thanks for your reply. It helps me a lot. Besides, how can I use an int8 input in the LCE benchmark? When I directly modify the dtype of the input tensor, it raises a TypeError:

```python
X_in = Input(shape=(1, 1, 1024,), batch_size=1024, dtype=tf.int8)
```

```
TypeError: Value passed to parameter 'x' has DataType int8 not in list of allowed values: bfloat16, float16, float32, float64, int32, int64, complex64, complex128
```
For int8 tflite files you have to do either int8 quantization-aware training or use post-training quantization. The tensor in TensorFlow/Keras stays float32 and during conversion it becomes int8.
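As an illustration of the post-training route, here is a minimal sketch using the standard TensorFlow Lite converter (the `model` variable, the representative dataset, and the shapes are placeholders I added; for Larq models you would normally go through the LCE converter instead, which has its own int8 options):

```python
import numpy as np
import tensorflow as tf

def representative_dataset():
    # Placeholder calibration data; replace with real samples.
    for _ in range(100):
        yield [np.random.rand(1, 1, 1, 1024).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full-integer quantization so the input/output tensors become int8.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_int8_bytes = converter.convert()
```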
I get it. Thank you!
Hi @Tombana , sorry for reopening this issue. After learning more about larq compute engine, I have two new questions.

Q1. I quantize the input like this:

```python
X_in = Input(shape=(1, 1, 1024), batch_size=1024, dtype=tf.float32)
X_in_quantized = lq.quantizers.SteSign()(X_in)
```

I want to confirm that `X_in_quantized` stays a float tensor during training and only becomes a bitpacked binary tensor after conversion to tflite. Is that right?

Q2. Is there any possibility to add a custom operation, e.g.

```python
X_in = Input(shape=(1, 1, 32), batch_size=1024, dtype=tf.float32)
X_in_quantized = custom_converter()(X_in)
```

If the answer is yes, how can I add a custom operation and build it? I'm not familiar with tflite and mlir, so I'd appreciate it if you could go into more detail. Thank you!
This is correct. During training everything is float so that you can compute gradients. Once you convert the model into a tflite model for inference, it becomes a bitpacked tensor. What exactly is your question about this? This conversion between the models happens in the MLIR converter. In the tflite model, there is an operator, `LceQuantize`, that converts the float input tensor into the bitpacked tensor.
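To see this for yourself, you could inspect the converted flatbuffer's tensor types with the TFLite Python interpreter (a small sketch I added; `tflite_file_bytes` is assumed to hold the converted model, and note that the stock interpreter cannot run LCE's custom ops, but reading the input/output details does not require running the model):

```python
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_content=tflite_file_bytes)

# Before stripping, input/output are float32; after strip_lcedequantize_ops
# the output tensor shows up as int32 (bitpacked booleans, 32 per integer).
print(interpreter.get_input_details()[0]["dtype"])
print(interpreter.get_output_details()[0]["dtype"])
```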
If you want the input tensor to be bitpacked as well, you would have to remove that `LceQuantize` op from the graph, similar to how `strip_lcedequantize_ops` removes the dequantize ops at the output.
@Tombana , thanks for your suggestion! I have successfully removed the `LceQuantize` op at the input.
Hi all,
I’m using larq to benchmark the real performance of my bnn. It’s very convenient, but I’m having some trouble. I use a toy model as an example.
This model only has a QuantDense layer. Then, I benchmark it on a Raspberry Pi 4B (64-bit OS).
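The model code was not included in the post; a minimal sketch of such a toy model (the layer width and hyperparameters are my guesses, based on the 1024-wide input mentioned earlier in the thread) could be:

```python
import tensorflow as tf
import larq as lq

# A single binarized dense layer as the whole "toy" model.
x_in = tf.keras.Input(shape=(1, 1, 1024), batch_size=1024)
x = lq.layers.QuantDense(
    256,  # hypothetical number of units
    input_quantizer="ste_sign",
    kernel_quantizer="ste_sign",
    kernel_constraint="weight_clip",
)(x_in)
model = tf.keras.Model(x_in, x)
```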
```
STARTING!
Log parameter values verbosely: [0]
Min num runs: [50]
Num threads: [1]
Graph: [toy1.tflite]
#threads used for CPU inference: [1]
Loaded model toy1.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
The input model file size (MB): 0.010452
Initialized session in 3.889ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=121 first=4357 curr=2186 min=2143 max=4357 avg=2277.17 std=302
Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=253 first=2206 curr=2179 min=2141 max=2248 avg=2168.41 std=18
Inference timings in us: Init: 3889, First inference: 4357, Warmup (avg): 2277.17, Inference (avg): 2168.41
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=0.0507812 overall=10.5469
```
I think 10.5469 MB is the memory used, which includes the floating-point input tensor. I am wondering how to use a quantized (binarized) tensor as input to save memory; it is effectively a bool input tensor. How can I test the memory consumption with a bool input tensor?
If you can give me some hints, I would be very grateful.