SCIENTIA SINICA Informationis, Volume 49, Issue 6: 739(2019) https://doi.org/10.1360/N112017-00268

Fusion of front-end and back-end learning based on layer-by-layer data re-representation

More info
  • ReceivedDec 7, 2017
  • AcceptedJan 28, 2018
  • PublishedJun 11, 2019


Based on two research contents of machine learning, a two-element layered model of machine learning is proposed. In addition, the concepts of front-end learning, back-end learning, a combination of front-end and back-end learning, and the fusion of front-end and back-end leaning are presented. Specifically, a framework and optimization model for the fusion of front-end and back-end learning is constructed. For front-end learning, which is a simulated hierarchical working mechanism of the brain, we present a layer-by-layer data re-representation method, which is driven by both data and a model. In addition, we propose a specific implementation of the data re-representation method for visual learning.

Funded by



[1] Mitchell T M. Machine Learning. New York: McGraw-Hill Science, 1997. Google Scholar

[2] Sabour S, Frosst N, Hinton G E. Dynamic routing between capsules. In: Proceedings of the 31st Conference on Neural Information Processing System, Long Beac, 2017. Google Scholar

[3] Xie J B, Xing J L, Zhang L N, et al. 20 Lectures on Visual Machine Learning. Beijing: Tsinghua University Press, 2015 [谢剑斌, 兴军亮, 张立宁, 等. 视觉机器学习20讲. 北京: 清华大学出版社, 2015]. Google Scholar

[4] Wu W, Yang J. $L_{1/2}$ regularization methods for weights sparsification of neural networks. Sci Sin Math, 2015, 45: 1487--1504. Google Scholar

[5] Liu J W, Liu Y, Luo X L. Semi-supervised learning methods. Chinese J Comput, 2015, 38: 1592--1617. Google Scholar

[6] Ma L R, Song D D, Liao L J. PSVM: a preference-enhanced SVM model using preference data for classification. Sci China Inf Sci, 2017, 60: 122103 CrossRef Google Scholar

[7] Deng C W, Huang G B, Xu J. Extreme learning machines: new trends and applications. Sci China Inf Sci, 2015, 58: 020301 CrossRef Google Scholar

[8] Feng X D, He X M. Robust low-rank data matrix approximations. Sci China Math, 2017, 60: 189-200 CrossRef ADS Google Scholar

[9] Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks. Science, 2006, 313: 504-507 CrossRef PubMed ADS Google Scholar

[10] Qu W, Wang D L, Feng S. A novel cross-modal hashing algorithm based on multimodal deep learning. Sci China Inf Sci, 2017, 60: 092104 CrossRef Google Scholar

[11] Gao W, Zhou Z H. Dropout rademacher complexity of deep neural networks. Sci China Inf Sci, 2016, 59: 072104 CrossRef Google Scholar

[12] Shao G Q, Wu Y P, Yong A . Fingerprint compression based on sparse representation. IEEE Trans Image Process, 2014, 23: 489-501 CrossRef PubMed ADS Google Scholar

[13] Shao G Q, Han C Y, Guo T D, et al. An NMF-based method for the fingerprint orientation field estimation. In: Proceedings of Computer and Information Science, Warsaw, 2012. 93--104. Google Scholar

[14] Li M Q, Han C Y, Guo T D. New gradient algorithms for optimization problems constrained by a cartesian product of unit balls. Acta Math Appl Sin, 2018, 41: 43--54. Google Scholar

[15] Li M Q, Han C Y, Wang R X. Shrinking gradient descent algorithms for total variation regularized image denoising. Comput Opt Appl, 2017, 68: 643-660 CrossRef Google Scholar

[16] Donoho D L. Compressed sensing. IEEE Trans Inf Theory, 2006, 52: 1289-1306 CrossRef Google Scholar

[17] Candes E J, Romberg J, Tao T. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans Inf Theory, 2006, 52: 489-509 CrossRef Google Scholar

[18] Olshausen B A, Field D J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 1996, 381: 607-609 CrossRef PubMed ADS Google Scholar

[19] Olshausen B A, Field D J. Sparse coding with an overcomplete basis set: a strategy employed by V1? Vision Res, 1997, 37: 3311--3325. Google Scholar

[20] Olshausen B A, Field D J. Natural image statistics and efficient coding. Netw-Comput Neural Syst, 1996, 7: 333-339 CrossRef Google Scholar

[21] Li M Q. Optimization theory and algorithms for image denoising and representation layer by layer. Dissertation for Ph.D. Degree. Beijing: University of Chinese Academy of Sciences, 2017 [李明强. 图像去噪与逐层表达的优化理论与算法研究. 博士学位论文. 北京: 中国科学院大学, 2017]. Google Scholar

[22] Conte D, Foggia P, Sansone C. Thirty years of graph matching in pattern recognition. Int J Pattern Recogn Artif Intel, 2004, 18: 265-298 CrossRef Google Scholar

[23] Bourgeois F, Lassalle J C. An extension of the munkres algorithm for the assignment problem to rectangular matrices. Commun ACM, 1971, 14: 802-804 CrossRef Google Scholar

[24] Birgin E G, Mario M J. Large-scale active-set box constrained optimization method with spectral projected gradients. Comput Opt Appl, 2002, 23: 101-125 CrossRef Google Scholar

[25] Beck A, Teboulle M. Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Trans Image Process, 2009, 18: 2419-2434 CrossRef PubMed ADS Google Scholar

  • Figure 1

    Combination of front-end and back-end learning

  • Figure 2

    Two-element layered model of machine learning

  • Figure 3

    Framework of front-end and back-end fusion learning

  • Figure 4

    Natural scenery image and its image block


    Algorithm 1 Initial image blocks election algorithm

    Input $N$ initial image blocks $x_i$, $i=1,2,\ldots,~N$ for being trained and choose $K$ image blocks randomly as clustering centers;

    while it is not convergent do

    Fix clustering centers $C$, computing (21) for every $i=1,2,\ldots,~N$, beginflalign&Z_ki= begincases 1,&textif $k=I_i$ 0,&textotherwise; endcases& endflalign

    Fix $Z$, solving (22) for every $k=1,2,\ldots,~K$: beginflalign &∑łimits_i=1^N Z_kik'(x_i,c_k)=0;& endflalign

    Construct bipartite graph matching as shown in 20, and obtain $Q$ using Kuhn-Munkres algorithm [23];


    end while


    Algorithm 2 FISTA algorithm

    R(B_k)right),&endflalign beginflalign & t_k+1=frac

    12łeft(1+sqrt1+4t_k²right),&endflalign beginflalign & B_k+1=B_k+frac


    Input elected image matrix $C$, and the Lipschitz continuous constant $L$ of $\nabla~R(A)$;

    Choose initial point $A_0$, set $B_1=A_0$, $t_1=1$, $k=1$;

    while it is not convergent do

    beginflalign & A_k=T_σ,λłeft(B_k-frac1L

Copyright 2019 Science China Press Co., Ltd. 《中国科学》杂志社有限责任公司 版权所有