Neural network accelerators are increasingly adopted because of their competitive advantages in performance and energy efficiency. Their reliability is critical to the high-level applications built on top of them and must be comprehensively evaluated and verified to ensure application safety, which calls for neural network reliability analysis tools. Since reliability analysis accuracy and overhead are always in tension, we present a neural network reliability analysis toolbox that serves various reliability analysis requirements for fault-tolerant design at different abstraction levels of neural network applications. Specifically, we provide five different fault analysis tools or methods: FPGA-based fault emulation [1][2], architecture-level fault simulation [7], operation-level fault simulation [3], neuron-level fault simulation [4], and statistical model based reliability analysis [5]. We also provide application-level fault simulation for autonomous driving scenarios [6]. In particular, the architecture-level fault simulation in [7] is developed by Prof. Tan's group; please refer to her homepage for more details.
The FPGA-based fault emulation developed in [1] essentially has the neural network accelerator implemented on FPGAs and injects stuck-at faults into the FPGA bitstream through vendor-specific fault injection mechanisms. It emulates the hardware faults of FPGAs directly and enables accurate fault analysis whenever it is required. The fault emulation is fast and limited only by the FPGA platform.
The fault simulation in [7] is essentially a microarchitecture-level fault simulation framework that can be utilized to analyze the reliability of systolic-array-based CNN accelerators. Specifically, it is built on top of the SCALE-Sim simulator [8].
Operation-level fault simulation proposed in [3] essentially has bit-flip errors injected into the outputs of primitive operations such as addition and multiplication in neural network processing. It reflects the influence of bit-flip errors in the underlying computing engines, but it is not bound to any specific computing architecture. Compared to architecture-specific fault injection, it is more general and much faster, at the cost of some fault simulation accuracy. Moreover, it also takes non-linear functions and different convolution algorithms such as Winograd into consideration, which makes it more accurate than the widely used neuron-level fault injection tools with little fault simulation speed penalty. A minimal sketch of this style of injection is shown below.
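As an illustration of the idea only (not the actual implementation in [3]), the sketch below injects random bit flips into the outputs of the multiply and add primitives of a MAC operation; the `ber` bit-error-rate parameter and the 32-bit floating-point format are assumptions made for this example.

```python
import random
import struct

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of the 32-bit IEEE-754 representation of `value`."""
    packed = struct.unpack("<I", struct.pack("<f", value))[0]
    packed ^= 1 << bit
    return struct.unpack("<f", struct.pack("<I", packed))[0]

def faulty_mac(a: float, b: float, acc: float, ber: float = 1e-6) -> float:
    """Multiply-accumulate where each primitive result may suffer a bit flip."""
    prod = a * b
    if random.random() < ber:                 # fault at the multiplier output
        prod = flip_bit(prod, random.randrange(32))
    out = acc + prod
    if random.random() < ber:                 # fault at the adder output
        out = flip_bit(out, random.randrange(32))
    return out
```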
Neuron-level fault simulation has random bit-flip errors injected into neurons; it is also general and neglects the underlying micro-architecture of the computing engines. This is the most widely used fault injection approach because of its convenient portability and fast fault simulation speed. However, existing neuron-level fault injection tools such as PyTorchFI, TensorFI, and Ares usually only provide coarse-grained reliability analysis such as layer-wise reliability, which is insufficient for in-depth reliability analysis of neural network processing. Worse still, most of these tools have the fault injection mechanism coupled with the neural network computing engines, e.g. PyTorch, which makes fault simulation across different models and scenarios complex. To address the above problems, we enable reliability analysis at various granularities ranging from bit level to neuron level, channel level, and layer level, as detailed below. In addition, the error injection mechanism is decoupled from the model processing as much as possible through internal observation hooks, which makes the neural network reliability analysis convenient (a minimal hook-based sketch follows the list).
- Bit level: Distinguish the error impact of each binary bit.
- Per neuron: Distinguish the error impact of each neuron value.
- Channel-wise or kernel-wise: Distinguish the error impact between convolution channels or kernels.
- Layer-wise or block-wise: Distinguish the error impact between layers or blocks.
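The hook-based decoupling can be illustrated with a few lines of PyTorch. This is only a conceptual sketch, not the toolbox's actual API; the error rate value and the choice of hooking every `Conv2d` layer are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torchvision.models as models

def make_bitflip_hook(neuron_error_rate: float):
    """Forward hook that flips one random bit of randomly selected output neurons."""
    def hook(module, inputs, output):
        out = output.clone()
        faulty = torch.rand_like(out) < neuron_error_rate            # which neurons fail
        bits = torch.randint(0, 32, out.shape, device=out.device)    # which bit flips
        masks = torch.bitwise_left_shift(
            torch.ones(out.shape, dtype=torch.int32, device=out.device),
            bits.to(torch.int32))
        flipped = (out.view(torch.int32) ^ masks).view(torch.float32)
        return torch.where(faulty, flipped, out)                     # replaces layer output
    return hook

# The injection is attached from the outside, without modifying the model itself.
model = models.resnet18(weights=None).eval()
handles = [m.register_forward_hook(make_bitflip_hook(1e-5))
           for m in model.modules() if isinstance(m, nn.Conv2d)]

with torch.no_grad():
    faulty_out = model(torch.randn(1, 3, 224, 224))

for h in handles:       # remove the hooks to restore fault-free inference
    h.remove()
```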
The basic idea of statistical model based analysis is to observe the statistical trend when errors occur and propagate over a large number of neurons. Although it is not as accurate as fault simulation, the analysis under different configurations, such as the reliability of the model under different error injection rates, quantization methods, and protection strategies, can be obtained rapidly. Moreover, statistical model based analysis is usually more general than fault injection based analysis.
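As a toy illustration of the statistical flavor (not the actual model in [5]), the snippet below estimates how many neurons of a layer are expected to be perturbed from a per-bit error rate alone, without running any fault simulation; the word width, error rate, and layer size are made-up example values.

```python
def perturbed_neuron_stats(bit_error_rate: float, bits_per_value: int, num_neurons: int):
    """Closed-form estimate: probability a neuron is hit, and expected hits per layer."""
    p_neuron = 1.0 - (1.0 - bit_error_rate) ** bits_per_value   # at least one bit flipped
    return p_neuron, p_neuron * num_neurons

# Example: an 8-bit quantized layer with 1e6 output neurons at a bit error rate of 1e-7
print(perturbed_neuron_stats(1e-7, 8, 1_000_000))   # -> (~8.0e-07, ~0.8)
```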
Besides reliability analysis of neural network processing itself, we also investigate reliability analysis of high-level applications such as autonomous driving tasks, so that reliability issues can be estimated from the application's perspective, which provides a more comprehensive view of the impact of hardware errors on the entire autonomous system. This is important because an autonomous driving system usually involves multiple processing stages: some stages may mask hardware errors, while minor computing errors in others may induce critical failures of the autonomous driving task.
Method | Feature | Speed |
---|---|---|
FPGA-based reliability emulation | Emulation on hardware, accurate for the specific hardware | Slow, depends on a specific architecture |
Architecture-level fault simulation | Fault injection on a cycle-accurate accelerator simulator | Slow, depends on a specific architecture |
Operation-level fault simulation | Fault injection on primitive operations | Fast, independent of any specific architecture |
Neuron-level fault simulation | Fault injection on neurons | Fast, independent of any specific architecture |
Statistical model based analysis | Based on statistical trends | Orders of magnitude faster, independent of any specific architecture |