(NIPS 2018 talk on ML on device)
May 2019
tl;dr: Layer-wise pruning, but with layer-compensated loss.
Previous methods approximate the increase in loss from pruning a filter with the filter's L1 or L2 norm, but this approximation does not hold in general. LcP first estimates a layer-wise compensation term for this approximation error and then applies naive pruning (a global greedy pruning algorithm) to prune the network.
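For concreteness, here is a minimal sketch of the naive-pruning baseline, assuming a PyTorch model and an L1-norm ranking metric; the function names are mine, not the paper's. Every filter in the network is scored, and the globally lowest-ranked ones are greedily selected, with no layer scheduling.

```python
# Sketch of the "naive pruning" baseline: rank every filter in the network
# globally by its L1 norm and greedily pick the lowest-ranked ones.
# Names and the choice of metric are illustrative, not from the paper.
import torch.nn as nn

def l1_filter_scores(model):
    """Return (layer_idx, filter_idx, score) for every conv filter in the model."""
    scores = []
    convs = [m for m in model.modules() if isinstance(m, nn.Conv2d)]
    for l, conv in enumerate(convs):
        # conv.weight: (out_channels, in_channels, kH, kW); one filter per output channel
        per_filter = conv.weight.detach().abs().sum(dim=(1, 2, 3))
        for f, s in enumerate(per_filter.tolist()):
            scores.append((l, f, s))
    return scores

def naive_global_greedy_prune(model, num_filters_to_prune):
    """Select the globally lowest-scoring filters; the actual surgery/masking is a separate step."""
    scores = l1_filter_scores(model)
    scores.sort(key=lambda t: t[2])          # ascending: smallest L1 norm first
    return scores[:num_filters_to_prune]     # (layer, filter, score) triples chosen for removal
```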
- Two problems in pruning a network: how many filters to prune in each layer and which filters to prune. The first is also called layer scheduling.
- Naive pruning algorithm: global iterative pruning without layer scheduling.
- Two approximations in prior art on multi-filter pruning:
- Approximate the loss change with a ranking metric (the issue this paper addresses)
- Approximate the effect of pruning multiple filters by summing the effects of pruning each filter individually.
- The paper assumes the approximation error is identical for all filters in the same layer; therefore only $L$ latent variables $\beta_l,\ l = 1, \dots, L$ (one per layer) need to be estimated (see the sketch below).
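A minimal sketch of how those $L$ latent variables would plug into the same pipeline, under my reading: each filter's metric score is shifted by its layer's $\beta_l$ before the global greedy selection. How the paper actually estimates $\beta$ is not shown here, and the names and the L1 metric are again illustrative assumptions.

```python
# Hedged sketch of layer-compensated scoring: the cheap per-filter metric
# (L1 norm here) is offset by one compensation term beta_l shared by every
# filter in layer l, then the usual global greedy selection is reused.
import torch.nn as nn

def lcp_global_greedy_prune(model, beta, num_filters_to_prune):
    """beta: sequence of length L, one compensation term per conv layer."""
    convs = [m for m in model.modules() if isinstance(m, nn.Conv2d)]
    assert len(beta) == len(convs), "need exactly one beta per conv layer"
    scores = []
    for l, conv in enumerate(convs):
        per_filter = conv.weight.detach().abs().sum(dim=(1, 2, 3))
        for f, s in enumerate(per_filter.tolist()):
            scores.append((l, f, s + beta[l]))   # identical offset for all filters in layer l
    scores.sort(key=lambda t: t[2])              # ascending: prune the lowest compensated scores
    return scores[:num_filters_to_prune]
```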
- Summary of technical details
- Questions and notes on how to improve/revise the current work