
SCIENTIA SINICA Informationis, Volume 51, Issue 1: 13 (2021) https://doi.org/10.1360/SSI-2020-0186

Convolution network pruning based on the evaluation of the importance of characteristic attributions

  • Received: Jun 19, 2020
  • Accepted: Aug 5, 2020
  • Published: Dec 29, 2020

Abstract


Funded by

National Key Research and Development Program of China (2017YFC1703506)

Key Program of the National Natural Science Foundation of China (61632004, 61832002, 61672518)


References

[1] LeCun Y. Generalization and network design strategies. Connectionism in Perspective, 1989, 19: 143--155.

[2] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems, 2012. 1097--1105.

[3] Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015. 1--9.

[4] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. 770--778.

[5] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014. 580--587.

[6] Ren S, He K, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks. In: Proceedings of Advances in Neural Information Processing Systems, 2015. 91--99.

[7] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015. 3431--3440.

[8] Chen L C, Papandreou G, Kokkinos I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell, 2018, 40: 834--848.

[9] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521: 436--444.

[10] Ji R R, Lin S H, Chao F, et al. A review of deep neural network compression and acceleration. Journal of Computer Research and Development, 2018, 55(9): 1871--1888. doi: 10.7544/issn1000-1239.2018.20180129.

[11] Li H, Kadav A, Durdanovic I, et al. Pruning filters for efficient ConvNets. arXiv preprint, 2016.

[12] Han S, Pool J, Tran J, et al. Learning both weights and connections for efficient neural network. In: Proceedings of Advances in Neural Information Processing Systems, 2015. 1135--1143.

[13] Chen W, Wilson J, Tyree S, et al. Compressing neural networks with the hashing trick. In: Proceedings of International Conference on Machine Learning, 2015. 2285--2294.

[14] Denton E L, Zaremba W, Bruna J, et al. Exploiting linear structure within convolutional networks for efficient evaluation. In: Proceedings of Advances in Neural Information Processing Systems, 2014. 1269--1277.

[15] Buciluǎ C, Caruana R, Niculescu-Mizil A. Model compression. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006. 535--541.

[16] Iandola F N, Han S, Moskewicz M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and $<$0.5 MB model size. arXiv preprint, 2016.

[17] Howard A G, Zhu M, Chen B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint, 2017.

[18] Schulz K, Sixt L, Tombari F, et al. Restricting the flow: information bottlenecks for attribution. arXiv preprint, 2020.

[19] Selvaraju R R, Cogswell M, Das A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, 2017. 618--626.

[20] Molchanov P, Tyree S, Karras T, et al. Pruning convolutional neural networks for resource efficient inference. arXiv preprint, 2016.

[21] Springenberg J T, Dosovitskiy A, Brox T, et al. Striving for simplicity: the all convolutional net. arXiv preprint, 2014.

[22] Molchanov P, Mallya A, Tyree S, et al. Importance estimation for neural network pruning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 11264--11272.

[23] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint, 2014.

[24] Nilsback M E, Zisserman A. Automated flower classification over a large number of classes. In: Proceedings of the 6th Indian Conference on Computer Vision, Graphics & Image Processing, 2008. 722--729.

[25] Krizhevsky A, Hinton G. Learning multiple layers of features from tiny images. Technical Report, University of Toronto, 2009.

[26] LeCun Y, Denker J S, Solla S A. Optimal brain damage. In: Proceedings of Advances in Neural Information Processing Systems, 1990. 598--605.

[27] Hassibi B, Stork D G. Second order derivatives for network pruning: optimal brain surgeon. In: Proceedings of Advances in Neural Information Processing Systems, 1993. 164--171.

[28] Han S, Mao H, Dally W J. Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint, 2015.

[29] Guo Y, Yao A, Chen Y. Dynamic network surgery for efficient DNNs. In: Proceedings of Advances in Neural Information Processing Systems, 2016. 1379--1387.

[30] He Y, Liu P, Wang Z, et al. Filter pruning via geometric median for deep convolutional neural networks acceleration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 4340--4349.

[31] Hu H, Peng R, Tai Y W, et al. Network trimming: a data-driven neuron pruning approach towards efficient deep architectures. arXiv preprint, 2016.

[32] Lin S, Ji R, Li Y, et al. Accelerating convolutional networks via global & dynamic filter pruning. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, 2018. 2425--2432.

[33] Lin M, Ji R, Wang Y, et al. HRank: filter pruning using high-rank feature map. arXiv preprint, 2020.

[34] Wang D, Zhou L, Zhang X, et al. Exploring linear relationship in feature map subspace for ConvNets compression. arXiv preprint, 2018.

[35] Lin S, Ji R, Yan C, et al. Towards optimal structured CNN pruning via generative adversarial learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 2790--2799.

[36] Gao X, Zhao Y, Dudziak L, et al. Dynamic channel pruning: feature boosting and suppression. arXiv preprint, 2018.

[37] Huang Z, Wang N. Data-driven sparse structure selection for deep neural networks. In: Proceedings of the European Conference on Computer Vision (ECCV), 2018. 304--320.

[38] He Y, Kang G, Dong X, et al. Soft filter pruning for accelerating deep convolutional neural networks. arXiv preprint, 2018.

[39] Liu Z, Li J, Shen Z, et al. Learning efficient convolutional networks through network slimming. In: Proceedings of the IEEE International Conference on Computer Vision, 2017. 2736--2744.

[40] Zhao C, Ni B, Zhang J, et al. Variational convolutional neural network pruning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 2780--2789.

[41] Zhuo H, Qian X, Fu Y, et al. SCSP: spectral clustering filter pruning with soft self-adaption manners. arXiv preprint, 2018.

[42] Paszke A, Gross S, Chintala S, et al. Automatic differentiation in PyTorch. In: Proceedings of Conference and Workshop on Neural Information Processing Systems, 2017.

  • Figure 1

    (Color online) (a) Attribution characteristics of the model; (b) attribution characteristics of the filter

  • Figure 2

    (Color online) Illustration of attribution pruning method

  •   

    Algorithm 1 Pruning algorithm for convolutional neural networks

    Require: Dataset $D$, converged $\mathrm{MODEL}$, accuracy-drop bound $\varepsilon$, number of filters $\tau$ pruned per iteration, and minimum proportion of retained filters $\beta_{\min}$.

    $\varphi=1$ denotes the ratio of currently retained filters to the original number of filters;

    while $P_{\mathrm{ori}}-P_{\mathrm{com}}\leq \varepsilon$ and $\varphi \geq \beta_{\min}$ do

    Feed dataset $D$ into $\mathrm{MODEL}$ and run a forward pass;

    Evaluate the importance of each filter in $\mathrm{MODEL}$ with the attribution or Taylor-guided pruning criterion;

    Normalize the importance scores with the L2 norm;

    Sort the normalized scores in ascending order and set the threshold $T=\mathrm{value}_{\tau}$ or $T=|\Delta L(o)|_{\tau}$, i.e., the $\tau$-th smallest score;

    Update the pruning mask $\delta$ according to $T$;

    Prune the filters selected by $\delta$ and fine-tune $\mathrm{MODEL}$;

    $\varphi=\varphi-\frac{\tau}{N}$ and compute the accuracy $P_{\mathrm{com}}$ of the compressed model.

    end while
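The importance-evaluation and filter-selection steps of Algorithm 1 can be sketched in plain NumPy. This is a minimal sketch, not the paper's implementation: the helper names (`taylor_importance`, `prune_mask`), the tensor layout `(batch, filters, h, w)`, and the use of the first-order Taylor criterion $|{\rm activation}\times{\rm gradient}|$ from [20] are our assumptions.

```python
import numpy as np

def taylor_importance(activations, gradients):
    """Per-filter importance via a first-order Taylor criterion.

    Assumes the activations and gradients of one convolutional layer,
    both shaped (batch, filters, h, w); names are illustrative.
    """
    # Average |activation * gradient| over batch and spatial dimensions,
    # yielding one score per filter.
    scores = np.abs(activations * gradients).mean(axis=(0, 2, 3))
    # L2-normalize the scores, as in Algorithm 1.
    return scores / (np.linalg.norm(scores) + 1e-12)

def prune_mask(scores, tau):
    """Boolean keep-mask dropping the tau filters with the smallest scores."""
    drop = np.argsort(scores)[:tau]   # ascending sort: least important first
    mask = np.ones_like(scores, dtype=bool)
    mask[drop] = False
    return mask
```

In the full loop these scores would be recomputed after every pruning-and-fine-tuning round, stopping once the accuracy drop exceeds $\varepsilon$ or the retained proportion falls below $\beta_{\min}$.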

  • Table 1   Pruning results of VGG-16 on flower-102

    | Model | Top-1 (%) | FLOPs (PR (%)) | Parameters (PR (%)) |
    |---|---|---|---|
    | VGG-16 | 76.86 | $1.56\times 10^{10}$ (0.0) | $1.35\times 10^{8}$ (0.0) |
    | Attribution (low compression ratio) | 76.62 | $3.86\times 10^{9}$ (75.26) | $4.42\times 10^{7}$ (67.26) |
    | Taylor-guided (low compression ratio) | 75.76 | $4.40\times 10^{9}$ (71.79) | $1.09\times 10^{8}$ (19.26) |
    | L1 [11] | 74.23 | $2.03\times 10^{9}$ (86.99) | $4.20\times 10^{7}$ (68.89) |
    | Taylor [20] | 71.00 | $1.07\times 10^{9}$ (93.14) | $2.66\times 10^{7}$ (80.30) |
    | Taylor-guided (high compression ratio) | 72.36 | $1.11\times 10^{9}$ (92.88) | $3.36\times 10^{7}$ (75.11) |
    | Attribution (high compression ratio) | 74.90 | $5.55\times 10^{8}$ (96.44) | $2.34\times 10^{7}$ (83.04) |
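The PR (%) columns report the fraction of FLOPs or parameters removed relative to the unpruned model. A one-line check (the function name is ours, not from the paper) reproduces, for example, the 75.26% FLOPs reduction of the low-compression attribution model in Table 1:

```python
def pruning_ratio(original, compressed):
    # PR (%): share of FLOPs or parameters removed by pruning.
    return 100.0 * (1.0 - compressed / original)

# Table 1, attribution (low compression ratio), FLOPs column:
print(round(pruning_ratio(1.56e10, 3.86e9), 2))  # 75.26
```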
  • Table 2   Pruning results of ResNet-18/ResNet-50 on flower-102

    | Model | Top-1 (%) | FLOPs (PR (%)) | Parameters (PR (%)) |
    |---|---|---|---|
    | ResNet-18/ResNet-50 | 75.39/85.68 | $1.88\times 10^{9}$ (0.0) / $6.59\times 10^{9}$ (0.0) | $1.19\times 10^{7}$ (0.0) / $4.02\times 10^{7}$ (0.0) |
    | Taylor [20] | 70.62/81.67 | $7.06\times 10^{8}$ (62.45) / $2.27\times 10^{9}$ (65.55) | $2.46\times 10^{6}$ (79.33) / $1.29\times 10^{7}$ (67.91) |
    | Taylor-guided | 73.86/82.96 | $7.51\times 10^{8}$ (60.05) / $2.30\times 10^{9}$ (65.10) | $2.07\times 10^{6}$ (82.61) / $8.73\times 10^{6}$ (78.28) |
    | Attribution | 74.53/82.95 | $6.03\times 10^{8}$ (67.93) / $2.11\times 10^{9}$ (67.98) | $2.58\times 10^{6}$ (78.32) / $9.48\times 10^{6}$ (76.42) |
  • Table 3   Pruning results of VGGNet on CIFAR-10

    | Model | Top-1 (%) | FLOPs (PR (%)) | Parameters (PR (%)) |
    |---|---|---|---|
    | VGG-16 | 93.96 | $1.56\times 10^{10}$ (0.0) | $1.35\times 10^{8}$ (0.0) |
    | L1 [11] | 93.40 | $2.06\times 10^{8}$ (34.39) | $5.04\times 10^{6}$ (65.71) |
    | SSS [37] | 93.02 | $1.83\times 10^{8}$ (41.72) | $3.95\times 10^{6}$ (73.13) |
    | Zhao et al. [40] | 93.18 | $1.90\times 10^{8}$ (39.49) | $3.92\times 10^{6}$ (73.33) |
    | Taylor [20] | 93.20 | $1.28\times 10^{8}$ (59.24) | $4.20\times 10^{6}$ (71.43) |
    | Taylor-guided | 93.21 | $6.50\times 10^{7}$ (79.30) | $2.05\times 10^{6}$ (86.05) |
  • Table 4   Per-class accuracy comparison of Taylor-guided pruning and the original model on CIFAR-10

    | Class | Original model top-1 (%) | Compressed model top-1 (%) |
    |---|---|---|
    | Plane | 92.86 | 89.29 |
    | Car | 94.00 | 94.00 |
    | Bird | 87.34 | 79.75 |
    | Cat | 82.19 | 80.82 |
    | Deer | 85.45 | 89.09 |
    | Dog | 84.75 | 79.66 |
    | Frog | 92.86 | 87.50 |
    | Horse | 98.44 | 93.75 |
    | Ship | 94.83 | 93.60 |
    | Truck | 93.59 | 92.31 |