National Key R&D Program of China (2017YFC1703506)
Key Program of the National Natural Science Foundation of China (61632004, 61832002, 61672518)
[1] LeCun Y. Generalization and network design strategies. Connectionism in Perspective, 1989, 19: 143--155.
[2] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems, 2012. 1097--1105.
[3] Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015. 1--9.
[4] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. 770--778.
[5] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014. 580--587.
[6] Ren S, He K, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks. In: Proceedings of Advances in Neural Information Processing Systems, 2015. 91--99.
[7] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015. 3431--3440.
[8] Chen L C, Papandreou G, Kokkinos I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell, 2018, 40: 834--848.
[9] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521: 436--444.
[10] Ji R R, Lin S H, Chao F, et al. A review of deep neural network compression and acceleration. Journal of Computer Research and Development, 2018, 55(9): 1871--1888.
[11] Li H, Kadav A, Durdanovic I, et al. Pruning filters for efficient ConvNets. arXiv preprint, 2016.
[12] Han S, Pool J, Tran J, et al. Learning both weights and connections for efficient neural network. In: Proceedings of Advances in Neural Information Processing Systems, 2015. 1135--1143.
[13] Chen W, Wilson J, Tyree S, et al. Compressing neural networks with the hashing trick. In: Proceedings of International Conference on Machine Learning, 2015. 2285--2294.
[14] Denton E L, Zaremba W, Bruna J, et al. Exploiting linear structure within convolutional networks for efficient evaluation. In: Proceedings of Advances in Neural Information Processing Systems, 2014. 1269--1277.
[15] Buciluǎ C, Caruana R, Niculescu-Mizil A. Model compression. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006. 535--541.
[16] Iandola F N, Han S, Moskewicz M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint, 2016.
[17] Howard A G, Zhu M, Chen B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint, 2017.
[18] Schulz K, Sixt L, Tombari F, et al. Restricting the flow: information bottlenecks for attribution. arXiv preprint, 2020.
[19] Selvaraju R R, Cogswell M, Das A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, 2017. 618--626.
[20] Molchanov P, Tyree S, Karras T, et al. Pruning convolutional neural networks for resource efficient inference. arXiv preprint, 2016.
[21] Springenberg J T, Dosovitskiy A, Brox T, et al. Striving for simplicity: the all convolutional net. arXiv preprint, 2014.
[22] Molchanov P, Mallya A, Tyree S, et al. Importance estimation for neural network pruning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 11264--11272.
[23] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint, 2014.
[24] Nilsback M E, Zisserman A. Automated flower classification over a large number of classes. In: Proceedings of the 6th Indian Conference on Computer Vision, Graphics & Image Processing, 2008. 722--729.
[25] Krizhevsky A, Hinton G. Learning multiple layers of features from tiny images. Technical Report, University of Toronto, 2009.
[26] LeCun Y, Denker J S, Solla S A. Optimal brain damage. In: Proceedings of Advances in Neural Information Processing Systems, 1990. 598--605.
[27] Hassibi B, Stork D G. Second order derivatives for network pruning: optimal brain surgeon. In: Proceedings of Advances in Neural Information Processing Systems, 1993. 164--171.
[28] Han S, Mao H, Dally W J. Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint, 2015.
[29] Guo Y, Yao A, Chen Y. Dynamic network surgery for efficient DNNs. In: Proceedings of Advances in Neural Information Processing Systems, 2016. 1379--1387.
[30] He Y, Liu P, Wang Z, et al. Filter pruning via geometric median for deep convolutional neural networks acceleration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 4340--4349.
[31] Hu H, Peng R, Tai Y W, et al. Network trimming: a data-driven neuron pruning approach towards efficient deep architectures. arXiv preprint, 2016.
[32] Lin S, Ji R, Li Y, et al. Accelerating convolutional networks via global & dynamic filter pruning. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, 2018. 2425--2432.
[33] Lin M, Ji R, Wang Y, et al. HRank: filter pruning using high-rank feature map. arXiv preprint, 2020.
[34] Wang D, Zhou L, Zhang X, et al. Exploring linear relationship in feature map subspace for ConvNets compression. arXiv preprint, 2018.
[35] Lin S, Ji R, Yan C, et al. Towards optimal structured CNN pruning via generative adversarial learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 2790--2799.
[36] Gao X, Zhao Y, Dudziak L, et al. Dynamic channel pruning: feature boosting and suppression. arXiv preprint, 2018.
[37] Huang Z, Wang N. Data-driven sparse structure selection for deep neural networks. In: Proceedings of the European Conference on Computer Vision (ECCV), 2018. 304--320.
[38] He Y, Kang G, Dong X, et al. Soft filter pruning for accelerating deep convolutional neural networks. arXiv preprint, 2018.
[39] Liu Z, Li J, Shen Z, et al. Learning efficient convolutional networks through network slimming. In: Proceedings of the IEEE International Conference on Computer Vision, 2017. 2736--2744.
[40] Zhao C, Ni B, Zhang J, et al. Variational convolutional neural network pruning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 2780--2789.
[41] Zhuo H, Qian X, Fu Y, et al. SCSP: spectral clustering filter pruning with soft self-adaption manners. arXiv preprint, 2018.
[42] Paszke A, Gross S, Chintala S, et al. Automatic differentiation in PyTorch. In: Proceedings of the NIPS 2017 Autodiff Workshop, 2017.
Figure 1
(Color online) (a) Attribution characteristics of the model; (b) attribution characteristics of the filter
Figure 2
(Color online) Illustration of the attribution pruning method
1. Initialize $\varphi = 1$, where $\varphi$ denotes the ratio of currently retained filters to the number of original filters;
2. Input the dataset $D$ into $\mathrm{MODEL}$ and perform a forward pass;
3. Evaluate the importance of each filter in $\mathrm{MODEL}$ with the attribution or Taylor-guided pruning criterion;
4. Normalize the importance scores with the L2 norm;
5. Sort the normalized scores in ascending order and set the threshold $T = \mathrm{valuate}_{\tau}$ or $T = |\Delta L(o)|_{\tau}$;
6. Update the pruning mask $\delta$ according to $T$;
7. Prune the filters indicated by $\delta$ and fine-tune $\mathrm{MODEL}$;
8. Update $\varphi = \varphi - \frac{\tau}{N}$ and compute the accuracy $P_{\rm com}$ of the compressed model.
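One iteration of the loop above (steps 4--6) can be sketched in NumPy. This is a minimal illustration, not the paper's code: the helper name `prune_step`, the synthetic scores, and the choice of `tau = 2` are all assumptions, and the importance evaluation and fine-tuning steps are left out.

```python
import numpy as np

def prune_step(importance, tau):
    """One threshold-based pruning iteration.

    importance : per-filter importance scores (from the attribution or
                 Taylor-guided criterion, computed elsewhere).
    tau        : number of filters to remove in this round.
    Returns the keep-mask delta (True = filter retained).
    """
    # Step 4: normalize the importance scores with the L2 norm.
    norm = importance / np.linalg.norm(importance)
    # Step 5: sort ascending; the tau-th smallest score is the threshold T.
    T = np.sort(norm)[tau - 1]
    # Step 6: delta keeps filters whose normalized score exceeds T
    # (ties at T are pruned as well in this simple sketch).
    return norm > T

# Synthetic example: 8 filters, prune the 2 least important.
scores = np.array([0.9, 0.1, 0.5, 0.05, 0.7, 0.3, 0.8, 0.6])
delta = prune_step(scores, tau=2)
phi = delta.sum() / scores.size  # ratio of retained filters, as in step 8
```

In a real run, `delta` would then index into the convolution layer's filters before fine-tuning resumes.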
| Model | Top-1 (%) | FLOPs (PR (%)) | Parameters (PR (%)) |
| --- | --- | --- | --- |
| VGG-16 | 76.86 | $1.56\times 10^{10}$ (0.0) | $1.35\times 10^{8}$ (0.0) |
| Attribution (low compression ratio) | 76.62 | $3.86\times 10^{9}$ (75.26) | $4.42\times 10^{7}$ (67.26) |
| Taylor-guided (low compression ratio) | 75.76 | $4.40\times 10^{9}$ (71.79) | $1.09\times 10^{8}$ (19.26) |
| L1 | 74.23 | $2.03\times 10^{9}$ (86.99) | $4.20\times 10^{7}$ (68.89) |
| Taylor | 71.00 | $1.07\times 10^{9}$ (93.14) | $2.66\times 10^{7}$ (80.30) |
| Taylor-guided (high compression ratio) | 72.36 | $1.11\times 10^{9}$ (92.88) | $3.36\times 10^{7}$ (75.11) |
| Attribution (high compression ratio) | 74.90 | $5.55\times 10^{8}$ (96.44) | $2.34\times 10^{7}$ (83.04) |
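The pruning-rate (PR) columns in these tables follow directly from the raw FLOPs and parameter counts as $(1 - \text{compressed}/\text{original}) \times 100$. A quick sanity check against the attribution (low compression ratio) row; the helper name `pruning_rate` is illustrative, not from the paper:

```python
def pruning_rate(original, compressed):
    """Pruning rate (PR, %) rounded to two decimals, as in the tables."""
    return round((1 - compressed / original) * 100, 2)

# Attribution (low compression ratio) vs. the VGG-16 baseline:
flops_pr = pruning_rate(1.56e10, 3.86e9)   # FLOPs column -> 75.26
params_pr = pruning_rate(1.35e8, 4.42e7)   # parameters column -> 67.26
```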
| Model | Top-1 (%) | FLOPs (PR (%)) | Parameters (PR (%)) |
| --- | --- | --- | --- |
| ResNet-18/ResNet-50 | 75.39/85.68 | $1.88\times 10^{9}$ (0.0)/$6.59\times 10^{9}$ (0.0) | $1.19\times 10^{7}$ (0.0)/$4.02\times 10^{7}$ (0.0) |
| Taylor | 70.62/81.67 | $7.06\times 10^{8}$ (62.45)/$2.27\times 10^{9}$ (65.55) | $2.46\times 10^{6}$ (79.33)/$1.29\times 10^{7}$ (67.91) |
| Taylor-guided | 73.86/82.96 | $7.51\times 10^{8}$ (60.05)/$2.30\times 10^{9}$ (65.10) | $2.07\times 10^{6}$ (82.61)/$8.73\times 10^{6}$ (78.28) |
| Attribution | 74.53/82.95 | $6.03\times 10^{8}$ (67.93)/$2.11\times 10^{9}$ (67.98) | $2.58\times 10^{6}$ (78.32)/$9.48\times 10^{6}$ (76.42) |
| Model | Top-1 (%) | FLOPs (PR (%)) | Parameters (PR (%)) |
| --- | --- | --- | --- |
| VGG-16 | 93.96 | $1.56\times 10^{10}$ (0.0) | $1.35\times 10^{8}$ (0.0) |
| L1 | 93.40 | $2.06\times 10^{8}$ (34.39) | $5.04\times 10^{6}$ (65.71) |
| SSS | 93.02 | $1.83\times 10^{8}$ (41.72) | $3.95\times 10^{6}$ (73.13) |
| Zhao et al. | 93.18 | $1.90\times 10^{8}$ (39.49) | $3.92\times 10^{6}$ (73.33) |
| Taylor | 93.20 | $1.28\times 10^{8}$ (59.24) | $4.20\times 10^{6}$ (71.43) |
| Taylor-guided | 93.21 | $6.50\times 10^{7}$ (79.30) | $2.05\times 10^{6}$ (86.05) |
| Class | Original model top-1 (%) | Compressed model top-1 (%) |
| --- | --- | --- |
| Plane | 92.86 | 89.29 |
| Car | 94.00 | 94.00 |
| Bird | 87.34 | 79.75 |
| Cat | 82.19 | 80.82 |
| Deer | 85.45 | 89.09 |
| Dog | 84.75 | 79.66 |
| Frog | 92.86 | 87.50 |
| Horse | 98.44 | 93.75 |
| Ship | 94.83 | 93.60 |
| Truck | 93.59 | 92.31 |
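Assuming the ten classes are weighted equally (per-class sample counts are not given in the table), the macro-averaged top-1 before and after compression can be computed directly from the rows above:

```python
# Per-class top-1 accuracies copied from the table (plane ... truck).
original = [92.86, 94.00, 87.34, 82.19, 85.45, 84.75, 92.86, 98.44, 94.83, 93.59]
compressed = [89.29, 94.00, 79.75, 80.82, 89.09, 79.66, 87.50, 93.75, 93.60, 92.31]

# Macro average: unweighted mean over classes.
macro_orig = round(sum(original) / len(original), 2)
macro_comp = round(sum(compressed) / len(compressed), 2)
drop = round(macro_orig - macro_comp, 2)  # overall degradation in points
```

Note that two classes (car, deer) hold or improve their accuracy after compression, so the aggregate drop is not uniform across classes.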