Data-Free Parameter Pruning and Quantization
For many applications, such as when transfer learning is used to retrain an image classification network for a new task, or when a new network is trained from scratch, the optimal network architecture is not known. The network can therefore end up overparameterized, with redundant connections. Pruning aims to identify these redundant, unnecessary connections so they can be removed without affecting final network accuracy.
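A common way to score a connection's importance is by the absolute value of its weight: small-magnitude weights contribute little to the output and are candidates for removal. The following is a minimal, language-agnostic NumPy sketch of this idea (the function name `magnitude_prune` is hypothetical, and this is not the MATLAB implementation used in the demo):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude entries of `weights`.

    `sparsity` is the fraction of entries to remove (0..1).
    Returns the pruned copy and a boolean keep-mask.
    """
    flat = np.abs(weights).ravel()
    k = int(np.floor(sparsity * flat.size))  # number of entries to prune
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    # Partial sort: the k-th smallest magnitude becomes the prune threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8))
pruned, mask = magnitude_prune(w, sparsity=0.5)
print(f"achieved sparsity: {1 - mask.mean():.2f}")
```

In an iterative pruning loop, this step would be alternated with fine-tuning, raising the target sparsity each round until the accuracy/sparsity trade-off is acceptable.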
This demonstration shows how to implement unstructured pruning in MATLAB®. Magnitude pruning is an intuitive starting point: the absolute value of each parameter is used as a measure of its relative importance to the network. A classification network is pruned iteratively using these magnitude scores until a target sparsity is reached, and the best solution is then selected by examining the accuracy of the pruned network as a function of sparsity.

Unstructured pruning alone does not yield any particular memory or inference speedup unless the final deployed solution uses sparse matrix optimizations. Combined with network quantization, however, it can lead to a more efficient deployment. This demo therefore also covers the quantization workflow, and shows that quantization affects the accuracy of the pruned network in much the same way it affects the accuracy of the original floating-point network.
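One reason pruning and quantization combine well is that pruned-away weights are exactly zero, and zero is represented exactly by common integer quantization schemes, so the sparsity survives quantization. The following is a minimal NumPy sketch of symmetric per-tensor int8 quantization applied to a pruned weight matrix (the helper names `quantize_int8` and `dequantize` are hypothetical, not the MATLAB quantization API):

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization: x is approximated by scale * q."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.standard_normal((8, 8)).astype(np.float32)
w[np.abs(w) < 0.5] = 0.0  # stand-in for a magnitude-pruned weight matrix
q, scale = quantize_int8(w)
# Pruned zeros round to integer 0, so the sparsity pattern is preserved.
err = np.abs(dequantize(q, scale) - w).max()
print(f"max quantization error: {err:.4f} (scale = {scale:.4f})")
```

The maximum round-trip error is bounded by half the quantization step (`scale / 2`), which is the sense in which quantization perturbs a pruned network no more than it perturbs the original floating-point network.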
Join us in our Facebook group:
https://www.facebook.com/groups/matlabcodes