YOLOv8_on_WE2

This repository teaches you, step by step, how to use Ultralytics HUB to train your own yolov8n model and deploy it to the Corstone SSE-300 FVP.

Yolov8n object detection

Web GUI training

  • Web GUI Overview
  • Home page
  • Provided public datasets (detection datasets only)
  • You can also upload your own dataset
  • Integration with Roboflow, which lets you upload your own data and export it in the yolov8 data format
  • Create project page
  • Models you have trained, plus pre-trained models
  • GUI training on your own
  • Create your own project
  • Create a model
  • Select a provided dataset
  • Set the model settings
    • Choose Yolov8n (n: nano, the smallest model in the Yolov8 series) and use the pre-trained model
  • Set the training settings
    • Scroll down to the advanced settings
      • Set the input size to 192, which fits the HIMAX WE2 setting
    • Connect your own training resource
      • Google Colab
        • Copy the Colab code
        • Open Google Colab and paste the code there
      • Bring your own agent
        • pip install the Ultralytics package
        • Create a Python file and paste in the Python code (a sketch is shown after this list)
        • Execute the Python file
  • See the training progress on the web
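  • For the "bring your own agent" path, the Python file you create looks roughly like the sketch below. This is a minimal sketch, assuming the ultralytics package is installed; API_KEY and MODEL_ID are placeholders copied from your own Ultralytics HUB account and model page.

    # Minimal sketch of a HUB training script; API_KEY and MODEL_ID are
    # placeholders copied from your Ultralytics HUB account and model page.
    from ultralytics import YOLO, hub

    # Authenticate with Ultralytics HUB
    hub.login("API_KEY")

    # Pull the model you configured on HUB (yolov8n, input size 192, etc.)
    model = YOLO("https://hub.ultralytics.com/models/MODEL_ID")

    # Training settings come from the HUB project, so no arguments are needed
    results = model.train()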

Export yolov8n object detection pytorch model to int8 tflite

  • After the training process is done, you will get best.pt, which is the pytorch model.
  • Use the Python code below to export it to int8 tflite (full integer quantization) via the Ultralytics export API.
    • If you trained on a different dataset, change the data yaml file accordingly.
      from ultralytics import YOLO

      # Load the trained model
      image_size = 192

      model = YOLO("best.pt")

      # Export to int8 tflite (full integer quantization)
      model.export(format="tflite", imgsz=image_size, int8=True, data="SKU-110K.yaml")

    • We use the provided SKU-110K dataset for retail detection.
    • The output int8 tflite model will be named *_full_integer_quant.tflite (a sanity-check sketch follows the next code block).
    • You can also use the yolov8n object detection pre-trained weights, which output the 80 COCO classes.
      from ultralytics import YOLO

      # Load the pre-trained model
      image_size = 192

      model = YOLO("yolov8n.pt")

      # Export to int8 tflite (full integer quantization)
      model.export(format="tflite", imgsz=image_size, int8=True, data="coco128.yaml")

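  • As a quick sanity check, you can confirm that the export really is fully integer quantized by listing the model's input and output tensor types. A minimal sketch, assuming TensorFlow is installed; adjust the file name to your exported model.

    import tensorflow as tf

    # Load the exported tflite model and inspect its I/O tensors
    interpreter = tf.lite.Interpreter(model_path="yolov8n_full_integer_quant.tflite")
    interpreter.allocate_tensors()

    # For a full-integer-quant model, these dtypes should be int8 (or uint8)
    for detail in interpreter.get_input_details() + interpreter.get_output_details():
        print(detail["name"], detail["dtype"], detail["shape"])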

How to use HIMAX config file to generate vela model

  • You can reference here.
  • Go under the vela folder
    cd vela
    
  • Install the necessary package:
    pip install ethos-u-vela
    
  • Run vela with the himax config ini file, mac=64, and the yolov8n object detection example tflite model
    vela --accelerator-config ethos-u55-64 --config himax_vela.ini --system-config My_Sys_Cfg --memory-mode My_Mem_Mode_Parent --output-dir ./img_yolov8_192 ./img_yolov8_192/yolov8n_full_integer_quant_size_192.tflite
    
  • You will see the vela report in the terminal. (Note: total SRAM usage under 1MB is better.)
    • There are 5 transpose ops that leak out of the vela compiler; they fall back to run on the Cortex-M55 CPU (a sketch for inspecting the ops yourself follows).
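  • To see for yourself which ops a tflite file contains, you can dump its operator list with TensorFlow's model analyzer. A minimal sketch, assuming TensorFlow 2.9 or newer; on the pre-vela model you should find the TRANSPOSE ops, while in the vela output most other ops are folded into a single ethos-u custom op.

    import tensorflow as tf

    # Print every operator in the model, including any TRANSPOSE ops
    # that would fall back to the Cortex-M55 CPU
    tf.lite.experimental.Analyzer.analyze(
        model_path="./img_yolov8_192/yolov8n_full_integer_quant_size_192.tflite"
    )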

Export yolov8n object detection pytorch model to int8 tflite and delete 4 transpose ops

  • If many transpose ops run on the CPU, inference will be slow.
  • The Ultralytics repository uses PINTO0309's onnx2tf tool to convert the onnx model to int8 tflite.
  • First, convert the pytorch model to an onnx model.
  • Second, add the following flags in the Ultralytics repository at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/engine/exporter.py#L709C2-L709C76 (please reference the tutorial here).
     -prf {param_replacement.json} -rtpo 
    
    • We give an example here, replace_192_80cls_transpose_op.json, but the following may need to change:
      1. Be careful: the op_name and param_name should be changed if needed, depending on the op_name and param_name in your converted onnx file.
      2. If your model does not output 80 classes, modify replace_192_80cls_transpose_op.json to replace the value 144 with 64+{class_num}. For example, if class_num is 1, replace 144 with 65.
  • Finally, you can delete the 4 transpose ops in the final int8 tflite model (a sketch of the param-replacement file format follows).
    • Original tflite passed through vela
    • Tflite with the 4 transpose ops deleted, passed through vela
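  • The file passed via -prf is plain JSON in onnx2tf's param_replacement format. Below is a hedged sketch of how such a file might be generated; the op_name and param_name values are hypothetical placeholders that must match the node names in your own converted onnx graph, and the arithmetic illustrates the 64+{class_num} rule described above.

    import json

    # The 144 in replace_192_80cls_transpose_op.json is 64 + class_num
    # (64 box-regression channels + 80 COCO classes); with 1 class it is 65
    class_num = 80
    head_channels = 64 + class_num
    print(head_channels)  # 144

    # Skeleton of onnx2tf's param_replacement format; the names below are
    # hypothetical and must be taken from your own onnx file
    replacement = {
        "format_version": 1,
        "operations": [
            {
                "op_name": "/model.22/Transpose",
                "param_target": "outputs",
                "param_name": "/model.22/Transpose_output_0",
                "post_process_transpose_perm": [0, 2, 1],
            }
        ],
    }

    with open("my_param_replacement.json", "w") as f:
        json.dump(replacement, f, indent=2)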

Yolov8n pose

Export yolov8n pose pytorch model to int8 tflite

  • We use DeGirum/ultralytics_yolov8 to convert the yolov8n pose pre-trained weights to int8 tflite.
    • Prerequisites
    #create python virtual environment
    python3 -m venv ultralytics_yolov8_venv
    
    #activate ultralytics_yolov8_venv
    source ultralytics_yolov8_venv/bin/activate
    
    pip install tensorflow==2.13.1
    pip install onnx2tf==1.15.4
    pip install -U onnx==1.15.0 \
    && pip install -U nvidia-pyindex \
    && pip install -U onnx-graphsurgeon \
    && pip install -U onnxruntime==1.16.3 \
    && pip install -U onnxsim==0.4.33 \
    && pip install -U simple_onnx_processing_tools \
    && pip install -U onnx2tf \
    && pip install -U h5py==3.7.0 \
    && pip install -U psutil==5.9.5 \
    && pip install -U ml_dtypes==0.2.0
    
    git clone https://github.com/DeGirum/ultralytics_yolov8
    cd ultralytics_yolov8
    
    #install ultralytics_yolov8 package
    pip install -e .
    
    cd ..
    
    • Export command
    python dg_export_int8_output.py --weights="yolov8n-pose.pt"  --img=192
    #or
    python dg_export_int8_output.py --weights="yolov8n-pose.pt"  --img=256
    
  • Exporting the yolov8n pose model with 7 separate outputs, as the DeGirum/ultralytics_yolov8 repository does, improves the performance of the quantized model (a sketch to verify the outputs follows).
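  • You can confirm that the exported pose model really has 7 separate outputs by listing its output tensors. A minimal sketch, assuming TensorFlow is installed; the file name matches the one used in the vela step below, so adjust it to your actual export.

    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path="yolov8n-pose_full_integer_quant.tflite")
    interpreter.allocate_tensors()

    # Expect 7 separate output tensors for the DeGirum-style pose export
    outputs = interpreter.get_output_details()
    print(len(outputs), "outputs")
    for detail in outputs:
        print(detail["name"], detail["dtype"], detail["shape"])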

Retrain yolov8n pose pytorch model and export it to int8 tflite

  • You can train the yolov8n pose model on your own PC.
    python dg_train_pose.py --weights="yolov8n-pose.pt"  --img=256
    
  • After the training process is done, you will get best.pt, which is the pytorch model. It will then automatically generate best_save_model/best_full_integer_quant.tflite. Generate the vela model by passing best_save_model/best_full_integer_quant.tflite to the vela compiler, and you can run your retrained model on WE2.

How to use HIMAX config file to generate vela model

  • You can reference here.
  • Go under the vela folder
    cd vela
    
  • Install the necessary package:
    pip install ethos-u-vela
    
  • Run vela with the himax config ini file, mac=64, and the yolov8n pose example tflite model
    vela --accelerator-config ethos-u55-64 --config himax_vela.ini --system-config My_Sys_Cfg --memory-mode My_Mem_Mode_Parent --output-dir ./img_yolov8_pose_192 ./img_yolov8_pose_192/yolov8n-pose_full_integer_quant.tflite
    
  • You will see the vela report in the terminal. (Note: total SRAM usage under 1MB is better.)
    • There are 4 transpose ops that leak out of the vela compiler; they fall back to run on the Cortex-M55 CPU.

Deploy YOLOv8n to FVP

Prerequisites

  • To run evaluations using this software, we suggest an Ubuntu 20.04 LTS environment.
  • Install the toolkits listed below:
    • Install the necessary packages:

      sudo apt-get update
      
      sudo apt-get install cmake
      
      sudo apt-get install curl
      
      sudo apt install xterm
      
      sudo apt install python3
      
      sudo apt install python3.8-venv
      
      sudo apt-get install libpython3.8-dev
      
    • Corstone SSE-300 FVP: aligned with the Arm MPS3 development platform and includes both the Cortex-M55 and the Ethos-U55 processors.

      # Fetch Corstone SSE-300 FVP
      wget https://developer.arm.com/-/media/Arm%20Developer%20Community/Downloads/OSS/FVP/Corstone-300/MPS3/FVP_Corstone_SSE-300_Ethos-U55_11.14_24.tgz
      


      # Create folder to be extracted
      mkdir temp
      # Extract the archive
      tar -C temp -xvzf FVP_Corstone_SSE-300_Ethos-U55_11.14_24.tgz
      


      # Execute the self-install script
      temp/FVP_Corstone_SSE-300_Ethos-U55.sh --i-agree-to-the-contained-eula --no-interactive -d CS300FVP
      


    • GNU Arm Embedded Toolchain 10-2020-q4-major is the only version that supports the Cortex-M55.

      # fetch the arm gcc toolchain.
      wget https://developer.arm.com/-/media/Files/downloads/gnu-rm/10-2020q4/gcc-arm-none-eabi-10-2020-q4-major-x86_64-linux.tar.bz2
      
      # Extract the archive
      tar -xjf gcc-arm-none-eabi-10-2020-q4-major-x86_64-linux.tar.bz2
      
      # Add gcc-arm-none-eabi/bin into PATH environment variable.
      export PATH="${PATH}:/[location of your GCC_ARM_NONE_EABI_TOOLCHAIN_ROOT]/gcc-arm-none-eabi/bin"
      
    • Arm ML embedded evaluation kit: Machine Learning (ML) applications targeted at the Arm Cortex-M55 and Arm Ethos-U55 NPU.

      • We use the Arm ML embedded evaluation kit to run the Yolov8n FVP example.
        # Fetch Arm ML embedded evaluation kit
        wget https://review.mlplatform.org/plugins/gitiles/ml/ethos-u/ml-embedded-evaluation-kit/+archive/refs/tags/22.02.tar.gz
        
        mkdir ml-embedded-evaluation-kit
        tar -C ml-embedded-evaluation-kit  -xvzf 22.02.tar.gz
        cp -r ./source/application/main/include ./ml-embedded-evaluation-kit/source/application/main
        cp -r ./source/application/tensorflow-lite-micro/include ./ml-embedded-evaluation-kit/source/application/tensorflow-lite-micro
        cp -r ./source/profiler/include ./ml-embedded-evaluation-kit/source/profiler
        cp -r ./source/use_case/ad/include ./ml-embedded-evaluation-kit/source/use_case/ad
        cp -r ./source/use_case/asr/include ./ml-embedded-evaluation-kit/source/use_case/asr
        cp -r ./source/use_case/img_class/include ./ml-embedded-evaluation-kit/source/use_case/img_class
        cp -r ./source/use_case/inference_runner/include ./ml-embedded-evaluation-kit/source/use_case/inference_runner
        cp -r ./source/use_case/kws/include ./ml-embedded-evaluation-kit/source/use_case/kws
        cp -r ./source/use_case/kws_asr/include ./ml-embedded-evaluation-kit/source/use_case/kws_asr
        cp -r ./source/use_case/noise_reduction/include ./ml-embedded-evaluation-kit/source/use_case/noise_reduction
        cp -r ./source/use_case/object_detection/include ./ml-embedded-evaluation-kit/source/use_case/object_detection
        cp -r ./source/use_case/vww/include ./ml-embedded-evaluation-kit/source/use_case/vww
        cp -r download_dependencies.py ./ml-embedded-evaluation-kit/
        cp -r set_up_default_resources.py ./ml-embedded-evaluation-kit/
        cp -r gen_rgb_cpp.py ./ml-embedded-evaluation-kit/scripts/py/
        cp -r requirements.txt ./ml-embedded-evaluation-kit/scripts/py/
        cd ml-embedded-evaluation-kit/
        rm -rf ./dependencies
        python3 ./download_dependencies.py
        ./build_default.py --npu-config-name ethos-u55-64
        # leave the ml-embedded-evaluation-kit folder and copy the example resources into the ML embedded evaluation kit
        cd ..
        cp -r ./resources/img_yolov8_192 ./ml-embedded-evaluation-kit/resources
        cp -r ./source/use_case/img_yolov8_192 ./ml-embedded-evaluation-kit/source/use_case
        cp -r ./vela/img_yolov8_192 ./ml-embedded-evaluation-kit/resources_downloaded/
        
        cp -r ./resources/img_yolov8_192_delete_transpose ./ml-embedded-evaluation-kit/resources
        cp -r ./source/use_case/img_yolov8_192_delete_transpose ./ml-embedded-evaluation-kit/source/use_case
        cp -r ./vela/img_yolov8_192_delete_transpose ./ml-embedded-evaluation-kit/resources_downloaded/
        
        cp -r ./resources/img_yolov8_pose_192 ./ml-embedded-evaluation-kit/resources
        cp -r ./source/use_case/img_yolov8_pose_192 ./ml-embedded-evaluation-kit/source/use_case
        cp -r ./vela/img_yolov8_pose_192 ./ml-embedded-evaluation-kit/resources_downloaded/
        
        cp -r ./resources/img_yolov8_pose_256 ./ml-embedded-evaluation-kit/resources
        cp -r ./source/use_case/img_yolov8_pose_256 ./ml-embedded-evaluation-kit/source/use_case
        cp -r ./vela/img_yolov8_pose_256 ./ml-embedded-evaluation-kit/resources_downloaded/
        

Build with the Yolov8n Object detection tflite model that passed vela

  • Go under the ml-embedded-evaluation-kit folder

    cd ml-embedded-evaluation-kit
    
  • First, create the build folder and go into it

    mkdir build_img_yolov8_192 && cd build_img_yolov8_192
    
  • Second, configure the Yolov8n Object detection example with ETHOS_U_NPU_ENABLED set to ON, so it runs on the Ethos-U55 NPU.

    cmake ../ -DUSE_CASE_BUILD=img_yolov8_192 -DETHOS_U_NPU_ENABLED=ON
    
  • Compile the Yolov8n Object detection example

    make -j8
    

Run with the Yolov8n Object detection tflite model and inference using the Ethos-U55 NPU and Cortex-M55

  • Go back out to the YOLOv8_on_WE2 folder
    cd ../../
    
  • Run the command below
    CS300FVP/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 -C ethosu.num_macs=64 ml-embedded-evaluation-kit/build_img_yolov8_192/bin/ethos-u-img_yolov8_192.axf
    
    Be careful with the ethosu.num_macs value in the command: if the MACs number does not match the one the vela model was compiled for, the invoke will fail.
  • You will see the FVP telnet terminal result below:
  • Start inference:
    • You will see the input size, output tensor size, and MACs size on the telnet terminal.
    • The tflite ops run within the ethos-u op.
  • Run inference:
    • Key in 1 on the telnet terminal to start inferencing the first image with the Ethos-U55 NPU and Cortex-M55.
    • First, you will see the input image on the screen.
    • Then, you will see the detection result with bbox and class on the screen.

Build with the Yolov8n Object detection delete-transpose tflite model that passed vela

  • Go under the ml-embedded-evaluation-kit folder

    cd ml-embedded-evaluation-kit
    
  • First, create the build folder and go into it

    mkdir build_img_yolov8_192_delete_transpose && cd build_img_yolov8_192_delete_transpose
    
  • Second, configure the Yolov8n Object detection delete-transpose example with ETHOS_U_NPU_ENABLED set to ON, so it runs on the Ethos-U55 NPU.

    cmake ../ -DUSE_CASE_BUILD=img_yolov8_192_delete_transpose -DETHOS_U_NPU_ENABLED=ON
    
  • Compile the Yolov8n Object detection delete-transpose example

    make -j8
    

Run with the Yolov8n Object detection delete-transpose tflite model and inference using the Ethos-U55 NPU and Cortex-M55

  • Go back out to the YOLOv8_on_WE2 folder
    cd ../../
    
  • Run the command below
    CS300FVP/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 -C ethosu.num_macs=64 ml-embedded-evaluation-kit/build_img_yolov8_192_delete_transpose/bin/ethos-u-img_yolov8_192_delete_transpose.axf
    
    Be careful with the ethosu.num_macs value in the command: if the MACs number does not match the one the vela model was compiled for, the invoke will fail.
  • You will see the FVP telnet terminal result below:
  • Start inference:
    • You will see the input size, output tensor size, and MACs size on the telnet terminal.
    • The tflite ops run within the ethos-u op.
  • Run inference:
    • Key in 1 on the telnet terminal to start inferencing the first image with the Ethos-U55 NPU and Cortex-M55.
    • First, you will see the input image on the screen.
    • Then, you will see the detection result with bbox and class on the screen.

Build with the Yolov8n pose tflite model that passed vela

  • Go under the ml-embedded-evaluation-kit folder

    cd ml-embedded-evaluation-kit
    
  • First, create the build folder and go into it

    mkdir build_img_yolov8_pose_192 && cd build_img_yolov8_pose_192
    
  • Second, configure the Yolov8n pose example with ETHOS_U_NPU_ENABLED set to ON, so it runs on the Ethos-U55 NPU.

    cmake ../ -DUSE_CASE_BUILD=img_yolov8_pose_192 -DETHOS_U_NPU_ENABLED=ON
    
  • Compile the Yolov8n pose example

    make -j8
    

Run with the Yolov8n pose tflite model and inference using the Ethos-U55 NPU and Cortex-M55

  • Go back out to the YOLOv8_on_WE2 folder
    cd ../../
    
  • Run the command below
    CS300FVP/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 -C ethosu.num_macs=64 ml-embedded-evaluation-kit/build_img_yolov8_pose_192/bin/ethos-u-img_yolov8_pose_192.axf
    
    Be careful with the ethosu.num_macs value in the command: if the MACs number does not match the one the vela model was compiled for, the invoke will fail.
  • You will see the FVP telnet terminal result below:
  • Start inference:
    • You will see the input size, output tensor size, and MACs size on the telnet terminal.
    • The tflite ops run within the ethos-u op.
  • Run inference:
    • Key in 1 on the telnet terminal to start inferencing the first image with the Ethos-U55 NPU and Cortex-M55.
    • First, you will see the input image on the screen.
    • Then, you will see the detection result with bbox and pose key-points on the screen.

Build with the Yolov8n pose tflite model (input size = 256) that passed vela

  • Go under the ml-embedded-evaluation-kit folder

    cd ml-embedded-evaluation-kit
    
  • First, create the build folder and go into it

    mkdir build_img_yolov8_pose_256 && cd build_img_yolov8_pose_256
    
  • Second, configure the Yolov8n pose example with ETHOS_U_NPU_ENABLED set to ON, so it runs on the Ethos-U55 NPU.

    cmake ../ -DUSE_CASE_BUILD=img_yolov8_pose_256 -DETHOS_U_NPU_ENABLED=ON
    
  • Compile the Yolov8n pose example

    make -j8
    

Run with the Yolov8n pose tflite model (input size = 256) and inference using the Ethos-U55 NPU and Cortex-M55

  • Go back out to the YOLOv8_on_WE2 folder
    cd ../../
    
  • Run the command below
    CS300FVP/models/Linux64_GCC-6.4/FVP_Corstone_SSE-300_Ethos-U55 -C ethosu.num_macs=64 ml-embedded-evaluation-kit/build_img_yolov8_pose_256/bin/ethos-u-img_yolov8_pose_256.axf
    
    Be careful with the ethosu.num_macs value in the command: if the MACs number does not match the one the vela model was compiled for, the invoke will fail.

Reference

  1. https://github.com/ultralytics/ultralytics
  2. https://github.com/PINTO0309/onnx2tf
  3. https://github.com/DeGirum/ultralytics_yolov8
