LAMP-FPGA

This is an FPGA implementation of the Learned Approximate Matrix Profile (LAMP) algorithm on the Ultra96-V2 board. LAMP-FPGA takes a time series as input and predicts its matrix profile values for a particular window size. You can read more at the Matrix Profile Homepage.

Folder Structure

.
├── script                  # Scripts for generating the compiled model for use on FPGA and evaluation
├── files                   # Input dataset sample, calibration data, pre-trained LAMP model
├── src                     # Source files for executing the model on FPGA
├── LICENSE
└── README.md

Build Instructions

In this section we show how to deploy LAMP on the Xilinx Deep Learning Processor Unit (DPU) and implement it on the FPGA, given a model pre-trained on the CPU. The instructions require Vitis-AI to be installed on the system.

1. Freezing the TensorFlow graph

The Vitis-AI flow requires a frozen model for the quantization and optimization steps. A frozen model contains both the graph definition and the checkpoint variables, with the weights saved as constants inside the graph structure. This allows some of the layers to be fused together for deployment on the DPU. We can generate a binary protobuf (.pb) file by running the freeze_graph.py script

python freeze_graph.py input_model

where input_model is the pre-trained LAMP model.
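For reference, the sketch below shows what a typical freezing step looks like with the TensorFlow 1.x graph utilities. It is not the repository's freeze_graph.py; the model format and output node are assumptions.

import sys
import tensorflow as tf

# Hypothetical sketch of a freezing step (not the repository's freeze_graph.py).
tf.compat.v1.keras.backend.set_learning_phase(0)      # inference mode
model = tf.keras.models.load_model(sys.argv[1])       # pre-trained LAMP model (assumed Keras .h5)

sess = tf.compat.v1.keras.backend.get_session()
frozen = tf.compat.v1.graph_util.convert_variables_to_constants(
    sess, sess.graph.as_graph_def(), [model.output.op.name])
tf.io.write_graph(frozen, ".", "frozen_graph.pb", as_text=False)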

2. Quantization

We will quantize the model's weights, biases, and activations to improve inference performance on the FPGA. The Xilinx DPU currently supports only 8-bit models, so we quantize everything to 8 bits.

vai_q_tensorflow quantize \
                 --input_frozen_graph frozen_graph.pb \
                 --input_fn input_func.calib_input \
                 --output_dir quantized \
                 --input_nodes input_1 \
                 --output_nodes reshape_1/Reshape \
                 --input_shapes ?,256,1,32 \
                 --calib_iter 32

frozen_graph.pb is the frozen model generated in the previous step. input_func is the Python file that generates the input data for the quantizer; since there is no backpropagation step here, an unlabeled dataset is sufficient. calib_iter is the number of iterations used to calibrate the activations; we noticed that values larger than 32 do not improve the quantizer accuracy by a significant degree.
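As an illustration, the input function can be as simple as returning one batch of calibration data per iteration, keyed by the input node name. The sketch below is a hypothetical input_func.py; the calibration file name and batch size are assumptions.

import numpy as np

# Hypothetical sketch of input_func.py (file name and batch size are assumed).
calib_data = np.load("files/calibration_data.npy")
batch_size = 32

def calib_input(iter):
    # Return one unlabeled batch per calibration iteration,
    # keyed by the name of the input node (input_1).
    batch = calib_data[iter * batch_size:(iter + 1) * batch_size]
    return {"input_1": batch.reshape(-1, 256, 1, 32)}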

3. Evaluation

We will test the accuracy of the generated quantized model before deploying it to the FPGA.

python evaluate.py

evaluate.py reads in the TensorFlow frozen binary graph, runs inference, and reports the mean squared error and the mean absolute percentage error by comparing the model output with the labels (matrix profile values).
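The two metrics are straightforward to compute once the predictions and labels are available; a minimal sketch, not the full evaluate.py:

import numpy as np

# Minimal sketch of the reported metrics.
def report_errors(predictions, labels):
    mse = np.mean((predictions - labels) ** 2)
    mape = np.mean(np.abs((labels - predictions) / labels)) * 100.0
    print(f"MSE:  {mse:.6f}")
    print(f"MAPE: {mape:.2f}%")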

4. Compilation

The Vitis-AI Docker image does not support the Ultra96-V2 board, so we need to generate the DPU configuration file (Ultra96.dcf) required in the compile step. This is done by first downloading the DPU Hardware Handoff file dpu.hwh and then running the following command

dlet -f dpu.hwh

dlet is a host tool that extracts the DPU information from the input file and generates the configuration file.

Next, we will compile the model for the target hardware

vai_c_tensorflow --frozen_pb quantized/deploy_model.pb \
                 --arch /opt/vitis_ai/compiler/arch/DPUCZDX8G/ultra96/arch.json \
                 --output_dir . \
                 --net_name lamp

arch.json is located in the files directory. Since the Sigmoid and Global Average Pooling layers are not supported by the DPU, the command generates four kernels: two will be deployed to the FPGA and the other two will be implemented on the host CPU. The Xilinx DNNDK API has an issue loading static libraries whose names start with the same prefix; to work around this, we run the same command a second time with the output model name set to dense_lamp. Later, we will use lamp_0.elf and dense_lamp_2.elf when loading the kernels.
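For illustration, the two unsupported layers are cheap to run on the host with NumPy once the DPU kernels have produced their feature maps. The sketch below assumes an NHWC layout and is not the repository's exact post-processing code.

import numpy as np

# Hypothetical sketch of the CPU-side layers (assumes an NHWC feature map).
def global_average_pool(x):
    # x: (batch, height, width, channels) output of a DPU kernel
    return x.mean(axis=(1, 2))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))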

5. Running inference

First, we need to install DPU-PYNQ on the Ultra96 board. Open a terminal and run

git clone --recursive --shallow-submodules https://github.com/Xilinx/DPU-PYNQ
cd DPU-PYNQ/upgrade
make

pip3 install pynq-dpu

The build process might take up to an hour. After that, we can run the models on the board by executing lamp_dpu.ipynb. The notebook takes a time series dataset as input and writes the predictions to the predict.txt file.
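For orientation, loading the bitstream and a compiled kernel from Python looks roughly like the sketch below. It assumes the DNNDK-era DPU-PYNQ release that accepts .elf kernels; the complete inference flow is in lamp_dpu.ipynb.

from pynq_dpu import DpuOverlay

# Hypothetical, abridged sketch; the full flow lives in lamp_dpu.ipynb.
overlay = DpuOverlay("dpu.bit")      # program the PL with the DPU bitstream
overlay.load_model("lamp_0.elf")     # load the first kernel produced by vai_c_tensorflow
# The notebook then creates DPU tasks for the compiled kernels, feeds them windows
# of the input time series, applies the CPU-side Sigmoid and Global Average Pooling,
# and writes the predicted matrix profile values to predict.txt.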
