some intro
- four classes: debris, forest, water, other
also some intro
Only for a CUDA-enabled host machine:

`docker pull tumbgd/vai-pt-cuda`
Xilinx Kria KV260
Download the dataset here, then unzip it.
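The h5 generation step has to pair every image with one of the four class labels. Below is a minimal sketch of that indexing step. It assumes, hypothetically, that the archive unzips into one sub-folder per class (e.g. `<root>/debris/*.jpg`) and that labels follow the class order listed above. The repo's actual generator script, folder layout and label order may differ, and `index_dataset` is an illustrative helper, not part of this project.

```python
from pathlib import Path

# The four classes of this project; using this list order as label index is an assumption.
CLASSES = ["debris", "forest", "water", "other"]
# Assumed image formats in the unzipped dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_dataset(root):
    """Return (path, label) pairs for images stored under root/<class>/ (hypothetical layout)."""
    samples = []
    for label, name in enumerate(CLASSES):
        class_dir = Path(root) / name
        if not class_dir.is_dir():
            continue  # tolerate a missing class folder instead of failing
        for path in sorted(class_dir.iterdir()):
            # Keep only image files, compared case-insensitively by extension.
            if path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((str(path), label))
    return samples
```

The resulting pairs could then be decoded and written into an h5 file (for example with `h5py`); that part is left out of the sketch.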
- Generate the h5 dataset
- Train the model
- Prune the trained model
- Quantize the pruned trained model
- Compile the quantized, pruned trained model (for on-board deployment):
  `vai_c_xir -x ./quantize_result/Model_int.xmodel -a /opt/vitis_ai/compiler/arch/DPUCZDX8G/KV260/arch.json -o dwc_ob -n dwc_ob`
Please follow the README in `onboard`.
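The quantization step converts the float model to 8-bit fixed point for the DPU. As we understand it, the DPU expects power-of-two scales: each tensor is stored as int8 together with a number of fractional bits. The sketch below illustrates only that idea on a plain list of weights. It is not the Vitis AI quantizer's algorithm (which also calibrates activations on sample data), and `choose_frac_bits`, `quantize` and `dequantize` are hypothetical helpers.

```python
import math


def choose_frac_bits(values, bits=8):
    """Largest fractional-bit count f such that max|x| * 2**f still fits in signed `bits`."""
    max_abs = max(abs(v) for v in values)
    qmax = 2 ** (bits - 1) - 1  # 127 for int8
    if max_abs == 0:
        return bits - 1  # all zeros: any scale works, pick the finest one
    # Need max_abs * 2**f <= qmax, i.e. f <= log2(qmax / max_abs).
    return math.floor(math.log2(qmax / max_abs))


def quantize(values, frac_bits, bits=8):
    """Scale by 2**frac_bits, round to the nearest integer and clamp to the signed range."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    scale = 2.0 ** frac_bits
    return [max(qmin, min(qmax, round(v * scale))) for v in values]


def dequantize(q, frac_bits):
    """Map the integers back to floats with the same power-of-two scale."""
    return [x / 2.0 ** frac_bits for x in q]


weights = [0.5, -1.2, 0.031]
f = choose_frac_bits(weights)
q = quantize(weights, f)
print(f, q, dequantize(q, f))
```

Because `choose_frac_bits` keeps the largest magnitude inside the int8 range, no value is clamped, and each value's rounding error is at most `2 ** -(f + 1)`.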
Model | Acc. | # param. | Size |
---|---|---|---|
non-opt | 86.0% | 205.28k | 806kB |
opt | 85.2% | 76.65k (62.6% smaller) | 334kB (58.6% smaller) |
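The "smaller" percentages are relative reductions, computed as 1 - opt / non-opt. A quick check against the table values (a sketch; the inputs are the rounded figures shown above, not the exact parameter counts):

```python
def reduction_pct(before, after):
    """Relative reduction of `after` versus `before`, in percent."""
    return (1 - after / before) * 100


params_pct = reduction_pct(205.28, 76.65)  # number of parameters, in thousands
size_pct = reduction_pct(806, 334)         # model size, in kB
print(f"# param.: {params_pct:.2f}% smaller")
print(f"Size: {size_pct:.2f}% smaller")
```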
Device & Model | Inference Speed (FPS) |
---|---|
opt on KV260 | 211.07 |
non-opt on KV260 | 87.71 |
non-opt on laptop (CPU) | 255.59 |
non-opt on laptop (GPU) | 308.75 |
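Throughput converts to an average time per frame as 1000 / FPS (in ms). This assumes the FPS figures are single-stream averages, which the table does not state. The sketch also computes the speedup of the optimized model over the non-optimized one on the KV260.

```python
# Inference throughput from the table above, in frames per second.
fps = {
    "opt on KV260": 211.07,
    "non-opt on KV260": 87.71,
    "non-opt on laptop (CPU)": 255.59,
    "non-opt on laptop (GPU)": 308.75,
}


def latency_ms(frames_per_second):
    """Average time per frame in milliseconds."""
    return 1000.0 / frames_per_second


for name, f in fps.items():
    print(f"{name}: {latency_ms(f):.2f} ms/frame")

speedup = fps["opt on KV260"] / fps["non-opt on KV260"]
print(f"opt vs. non-opt on KV260: {speedup:.2f}x")
```

On these numbers the optimized model is roughly 2.4× faster on the KV260 (about 4.7 ms vs. 11.4 ms per frame).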
- Set up VART on Ubuntu 22.04 (currently running on the pre-built images, which come with a poor GUI).
  - A lower VART version seems to work, but it supports far fewer functions.
- `bias_corr` is `None`. This does not seem to cause an error (accuracy hardly drops), but why is it `None` rather than 0?
- `ReLU` should be supported according to the Xilinx documentation, but it is not in practice.
  - Maybe `torch.nn.ReLU` is not supported, but `torch.nn.functional.relu` is. This needs testing.
  - Or it is simply not supported. In that case, try all available activation functions to avoid multiple subgraphs. Our aim is to run the whole model on a single DPU subgraph to avoid data copies between the DPU and the CPU.
- Maybe