Object detection STM32 model quantization

Post-training quantization is a good way to optimize your neural network models before deploying them on a target. It makes deployment on your embedded devices more efficient by reducing the required memory (Flash/RAM) and the inference time, with little to no degradation of model accuracy.

This tutorial shows how to quantize a floating point model with real data.

1. Configure the YAML file

All the sections of the YAML file must be set as described in the README.md, with the operation_mode attribute set to quantization. Alternatively, you can specify the operation mode later, when launching the run, with python stm32ai_main.py operation_mode=quantization.
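
For reference, the operation mode can also be set directly in the configuration file. In most Model Zoo configuration files it is a top-level attribute (this placement is an assumption here; check your own configuration file):

operation_mode: quantization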

    1.1 Prepare the dataset

    Information about the dataset you want to use for activation calibration is provided in the dataset section of the configuration file, as shown in the YAML code below.

    dataset:
      name: COCO_2017_person
      class_names: [ person ]
      test_path:
      quantization_path: ../datasets/COCO_2017_person
      quantization_split: 0.4
      seed: 0

    In this example, the only path provided is quantization_path. It can point to the full training set or to a specific set dedicated to activation calibration. If you only want to calibrate the model on a random fraction of your quantization set, simply set that fraction in the quantization_split parameter (here, 0.4 means that 40% of the images are used).
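
    As an illustration only, the Python sketch below shows one way such a 40% calibration subset could be drawn deterministically from the quantization path using the seed. The Model Zoo's actual sampling code may differ; the path and values are simply reused from the YAML example above.

    import random
    from pathlib import Path

    # Illustrative only: deterministic 40% sample of the calibration images
    # (quantization_split: 0.4, seed: 0 in the YAML example above).
    image_paths = sorted(Path("../datasets/COCO_2017_person").glob("**/*.jpg"))
    rng = random.Random(0)                                # seed
    n_calib = int(0.4 * len(image_paths))                 # quantization_split
    calibration_paths = rng.sample(image_paths, n_calib)  # random subset used for calibration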

    1.2 Apply preprocessing

    The images from the dataset need to be preprocessed before they are presented to the network for quantization. This includes rescaling and resizing. In particular, they need to be rescaled exactly as they were during training. This is illustrated in the YAML code below:

    preprocessing:
      rescaling: { scale: 1/127.5, offset: -1 }
      resizing:
        aspect_ratio: "fit"
        interpolation: nearest
      color_mode: rgb

    In this example, the pixels of the input images read from the dataset are in the interval [0, 255], that is UINT8. If you set scale to 1/255 and offset to 0, they will be rescaled to the interval [0.0, 1.0]. If you set scale to 1/127.5 and offset to -1, they will be rescaled to the interval [-1.0, 1.0].
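
    As a quick arithmetic check, rescaling applies rescaled_pixel = pixel * scale + offset:

    # pixel * scale + offset, for the two settings mentioned above
    print(0 * (1 / 255) + 0, 255 * (1 / 255) + 0)      # 0.0 1.0
    print(0 * (1 / 127.5) - 1, 255 * (1 / 127.5) - 1)  # -1.0 1.0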

    The resizing attribute specifies the image resizing methods you want to use:

    • The value of interpolation must be one of {"bilinear", "nearest", "bicubic", "area", "lanczos3", "lanczos5", "gaussian", "mitchellcubic"}.
    • The value of aspect_ratio must be "fit", as other values such as "crop" are not supported. With "fit", the resized images will be distorted if their original aspect ratio differs from that of the resizing target size.

    The color_mode attribute must be one of "grayscale", "rgb" or "rgba".

    1.3 Set the model and quantization parameters

    As mentioned previously, all the sections of the YAML file must be set in accordance with this README.md. In particular, operation_mode should be set to quantization and the quantization section should be filled as in the following example:

    general:
      model_path: ../pretrained_models/st_ssd_mobilenet_v1/ST_pretrainedmodel_public_dataset/COCO/ssd_mobilenet_v1_0.25_224/ssd_mobilenet_v1_0.25_224.h5
    quantization:
      quantizer: TFlite_converter
      quantization_type: PTQ
      quantization_input_type: uint8
      quantization_output_type: float
      granularity: per_channel  # Optional, defaults to "per_channel".
      optimize: False           # Optional, defaults to False.
      export_dir: quantized_models

    where:

    • model_path - String, specifies the path of the model to be quantized.
    • quantizer - String, the only option is "TFlite_converter", which converts the model's trained weights from float to integer values. The quantized model is saved in the TensorFlow Lite format (a rough sketch of this conversion is shown after this list).
    • quantization_type - String, the only option is "PTQ", i.e. "Post-Training Quantization".
    • quantization_input_type - String, can be "int8", "uint8" or "float", represents the quantization type for the model input.
    • quantization_output_type - String, can be "int8", "uint8" or "float", represents the quantization type for the model output.
    • granularity - String, can be "per_tensor" or "per_channel", defines the quantization granularity.
    • optimize - Boolean, can be either True or False, controls whether the model is optimized before attempting "per_tensor" quantization.
    • export_dir - String, refers to the directory name where the quantized model is saved.
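
    As a rough illustration of what the "TFlite_converter" quantizer does with these settings, the sketch below uses the public TensorFlow Lite converter API with a representative dataset built from the calibration images. This is not the Model Zoo's actual implementation: the model path, image size and preprocessing values are simply reused from the examples above, the images are assumed to be JPEG, and loading the SSD model may require additional custom objects.

    import numpy as np
    import tensorflow as tf
    from pathlib import Path

    # Paths and preprocessing values reused from the examples above (illustrative only).
    model = tf.keras.models.load_model(
        "../pretrained_models/st_ssd_mobilenet_v1/ST_pretrainedmodel_public_dataset/COCO/"
        "ssd_mobilenet_v1_0.25_224/ssd_mobilenet_v1_0.25_224.h5", compile=False)
    calibration_paths = sorted(Path("../datasets/COCO_2017_person").glob("**/*.jpg"))[:200]

    def representative_dataset():
        # Yield preprocessed calibration images one at a time (batch size 1),
        # using the same rescaling and resizing as at training time.
        for path in calibration_paths:
            image = tf.io.decode_jpeg(tf.io.read_file(str(path)), channels=3)
            image = tf.image.resize(image, (224, 224), method="nearest")
            image = image / 127.5 - 1.0
            yield [np.expand_dims(image.numpy().astype(np.float32), axis=0)]

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]              # enables post-training quantization
    converter.representative_dataset = representative_dataset         # activation calibration data
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8                         # quantization_input_type: uint8
    converter.inference_output_type = tf.float32                      # quantization_output_type: float
    tflite_model = converter.convert()

    Path("quantized_models").mkdir(exist_ok=True)
    Path("quantized_models/ssd_mobilenet_v1_0.25_224_quant.tflite").write_bytes(tflite_model)
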
    1.4 Random quantization

    When no path is specified in either quantization_path or training_path, the model is quantized after calibration on random data. Evaluating the accuracy is of no interest in this case. However, this random quantization can be useful to quickly estimate the model footprint on a target after quantization. How to proceed is shown in section 2 below.
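
    For instance, leaving both paths empty in the dataset section is enough to trigger calibration on random data:

    dataset:
      name: COCO_2017_person
      class_names: [ person ]
      training_path:
      quantization_path: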

    1.5 Hydra and MLflow settings

    The mlflow and hydra sections must always be present in the YAML configuration file. The hydra section can be used to specify the name of the directory where experiment directories are saved and/or the pattern used to name experiment directories. With the YAML code below, every time you run the Model Zoo, an experiment directory is created that contains all the directories and files created during the run. The names of experiment directories are all unique as they are based on the date and time of the run.

    hydra:
      run:
        dir: ./experiments_outputs/${now:%Y_%m_%d_%H_%M_%S}
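
    For example, a run started on November 7th 2023 at 14:05:32 would store its outputs under ./experiments_outputs/2023_11_07_14_05_32.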

    The mlflow section is used to specify the location and name of the directory where MLflow files are saved, as shown below:

    mlflow:
      uri: ./experiments_outputs/mlruns
2. Quantize your model

To launch your model quantization using a real dataset, run the following command from the src/ folder:

python stm32ai_main.py --config-path ./config_file_examples/ --config-name quantization_config.yaml

The quantized TFLite model can be found in the corresponding experiments_outputs/ folder.

In case you want to evaluate the accuracy of the quantized model, you can either launch the evaluation operation mode on the generated quantized model (please refer to the evaluation README.md, which describes in detail how to proceed) or use chained services, for example by launching the chain_eqe example with the command below:

python stm32ai_main.py --config-path ./config_file_examples/ --config-name chain_eqe_config.yaml

In case you want to evaluate your quantized model footprints, you can either launch the benchmark operation mode on the generated quantized model (please refer to the benchmarking README.md, which describes in detail how to proceed) or use chained services, for example by launching the chain_qb example with the command below:

python stm32ai_main.py --config-path ./config_file_examples/ --config-name chain_qb_config.yaml

Chained services work whether you specify a quantization dataset or not (random quantization).