
SCIENTIA SINICA Informationis, Volume 49, Issue 6: 739-759(2019) https://doi.org/10.1360/N112017-00268

Fusion of front-end and back-end learning based on layer-by-layer data re-representation

More info
  • Received: Dec 7, 2017
  • Accepted: Jan 28, 2018
  • Published: Jun 11, 2019

Abstract

Based on the two main research aspects of machine learning, a two-element layered model of machine learning is proposed. In addition, the concepts of front-end learning, back-end learning, the combination of front-end and back-end learning, and the fusion of front-end and back-end learning are presented. Specifically, a framework and an optimization model for the fusion of front-end and back-end learning are constructed. For front-end learning, which simulates the hierarchical working mechanism of the brain, we present a layer-by-layer data re-representation method driven by both data and a model. In addition, we propose a specific implementation of the data re-representation method for visual learning.


Funded by

National Natural Science Foundation of China (Grant Nos. 11331012, 11731013, 11571014)


References

[1] Mitchell T M. Machine Learning. New York: McGraw-Hill, 1997.

[2] Sabour S, Frosst N, Hinton G E. Dynamic routing between capsules. In: Proceedings of the 31st Conference on Neural Information Processing Systems, Long Beach, 2017.

[3] Xie J B, Xing J L, Zhang L N, et al. 20 Lectures on Visual Machine Learning. Beijing: Tsinghua University Press, 2015 [谢剑斌, 兴军亮, 张立宁, 等. 视觉机器学习20讲. 北京: 清华大学出版社, 2015].

[4] Wu W, Yang J. $L_{1/2}$ regularization methods for weights sparsification of neural networks. Sci Sin Math, 2015, 45: 1487-1504.

[5] Liu J W, Liu Y, Luo X L. Semi-supervised learning methods. Chinese J Comput, 2015, 38: 1592-1617.

[6] Ma L R, Song D D, Liao L J. PSVM: a preference-enhanced SVM model using preference data for classification. Sci China Inf Sci, 2017, 60: 122103.

[7] Deng C W, Huang G B, Xu J. Extreme learning machines: new trends and applications. Sci China Inf Sci, 2015, 58: 020301.

[8] Feng X D, He X M. Robust low-rank data matrix approximations. Sci China Math, 2017, 60: 189-200.

[9] Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks. Science, 2006, 313: 504-507.

[10] Qu W, Wang D L, Feng S. A novel cross-modal hashing algorithm based on multimodal deep learning. Sci China Inf Sci, 2017, 60: 092104.

[11] Gao W, Zhou Z H. Dropout Rademacher complexity of deep neural networks. Sci China Inf Sci, 2016, 59: 072104.

[12] Shao G Q, Wu Y P, Yong A. Fingerprint compression based on sparse representation. IEEE Trans Image Process, 2014, 23: 489-501.

[13] Shao G Q, Han C Y, Guo T D, et al. An NMF-based method for the fingerprint orientation field estimation. In: Proceedings of Computer and Information Science, Warsaw, 2012. 93-104.

[14] Li M Q, Han C Y, Guo T D. New gradient algorithms for optimization problems constrained by a Cartesian product of unit balls. Acta Math Appl Sin, 2018, 41: 43-54.

[15] Li M Q, Han C Y, Wang R X. Shrinking gradient descent algorithms for total variation regularized image denoising. Comput Optim Appl, 2017, 68: 643-660.

[16] Donoho D L. Compressed sensing. IEEE Trans Inf Theory, 2006, 52: 1289-1306.

[17] Candes E J, Romberg J, Tao T. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans Inf Theory, 2006, 52: 489-509.

[18] Olshausen B A, Field D J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 1996, 381: 607-609.

[19] Olshausen B A, Field D J. Sparse coding with an overcomplete basis set: a strategy employed by V1? Vision Res, 1997, 37: 3311-3325.

[20] Olshausen B A, Field D J. Natural image statistics and efficient coding. Netw-Comput Neural Syst, 1996, 7: 333-339.

[21] Li M Q. Optimization theory and algorithms for image denoising and representation layer by layer. Dissertation for Ph.D. Degree. Beijing: University of Chinese Academy of Sciences, 2017 [李明强. 图像去噪与逐层表达的优化理论与算法研究. 博士学位论文. 北京: 中国科学院大学, 2017].

[22] Conte D, Foggia P, Sansone C. Thirty years of graph matching in pattern recognition. Int J Pattern Recogn Artif Intell, 2004, 18: 265-298.

[23] Bourgeois F, Lassalle J C. An extension of the Munkres algorithm for the assignment problem to rectangular matrices. Commun ACM, 1971, 14: 802-804.

[24] Birgin E G, Martínez J M. Large-scale active-set box-constrained optimization method with spectral projected gradients. Comput Optim Appl, 2002, 23: 101-125.

[25] Beck A, Teboulle M. Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Trans Image Process, 2009, 18: 2419-2434.

  • Figure 1

    Combination of front-end and back-end learning

  • Figure 2

    Two-element layered model of machine learning

  • Figure 3

    Framework of front-end and back-end fusion learning

  • Figure 4

    Natural scenery image and its image block

  • Figure 5

    Framework of the layer-by-layer data re-representation method

  • Figure 6

    Two natural landscape test images

  • Figure 7

    Distribution of the number of elements of each kind

  • Figure 8

    Some elements of the 262nd kind

  • Figure 9

    Partial primary image blocks

  • Figure 10

    Base-image reconstruction results of the initial layer. In each of (a)$\sim$(d), the left image block is a randomly selected 16$\times$16 original image block, and the right image block is its sparse reconstruction

  • Figure 11

    (Color online) Sparse coefficient distributions over the base images. (a)$\sim$(d) show the sparse coefficient distributions, in terms of the base images, of the corresponding image blocks (a)$\sim$(d) in Figure 10, respectively

  • Figure 12

    Reconstructed image blocks with different sparsity. (a) Original image block; (b) Sparsity=0, PSNR=26.80; (c) Sparsity=0.1, PSNR=25.29; (d) Sparsity=0.2, PSNR=26.64; (e) Sparsity=0.3, PSNR=22.02; (f) Sparsity=0.4, PSNR=16.04

  • Figure 13

    Reconstructed images with different sparsity. (a) Original image; (b) Sparsity=0, PSNR=28.20; (c) Sparsity=0.1, PSNR=30.03; (d) Sparsity=0.2, PSNR=29.15; (e) Sparsity=0.3, PSNR=24.83; (f) Sparsity=0.4, PSNR=23.14
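
    The PSNR values quoted in Figures 12 and 13 follow the standard peak signal-to-noise ratio formula ${\rm PSNR}=10\log_{10}({\rm peak}^2/{\rm MSE})$. Below is a minimal NumPy sketch of how such values can be computed; the sparsity helper is our guess at the "Sparsity" quantity (the fraction of zero coefficients), which the captions do not define.

    import numpy as np

    def psnr(x, y, peak=255.0):
        # peak signal-to-noise ratio (dB) between an original image x
        # and its reconstruction y
        mse = np.mean((x.astype(float) - y.astype(float)) ** 2)
        return 10.0 * np.log10(peak ** 2 / mse)

    def sparsity(A, tol=1e-8):
        # assumed measure: fraction of (near-)zero sparse coefficients
        return float(np.mean(np.abs(A) <= tol))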

  • Algorithm 1 Initial image block selection algorithm

    Input $N$ training image blocks $x_i$, $i=1,2,\ldots,N$, and randomly choose $K$ of them as initial clustering centers;

    while not converged do

    Fix the clustering centers $C$ and compute $Z$ by (21) for every $i=1,2,\ldots,N$: \begin{equation} Z_{ki}= \begin{cases} 1, & \text{if } k=I_i, \\ 0, & \text{otherwise}; \end{cases} \tag{21} \end{equation} where $I_i$ is the index of the clustering center nearest to $x_i$;

    Fix $Z$ and solve (22) for every $k=1,2,\ldots,K$: \begin{equation} \sum\limits_{i=1}^{N} Z_{ki}\,k'(x_i,c_k)=0; \tag{22}\end{equation}

    Construct the bipartite graph matching problem as shown in (20), and obtain $Q$ using the Kuhn-Munkres algorithm [23];

    $C=XQ$;

    end while
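
    To make the alternating structure of Algorithm 1 concrete, here is a minimal NumPy sketch of one plausible instantiation. It assumes a linear kernel, so that the stationarity condition (22) reduces to the cluster mean, and it uses SciPy's linear_sum_assignment as the Kuhn-Munkres solver [23]; the function name select_initial_blocks and all defaults are our illustrative choices, not the paper's.

    import numpy as np
    from scipy.optimize import linear_sum_assignment  # Kuhn-Munkres (Hungarian)

    def select_initial_blocks(X, K, n_iter=50, seed=0):
        # X: (d, N) matrix whose columns are vectorized image blocks.
        # Returns K columns of X chosen as clustering centers (C = XQ).
        rng = np.random.default_rng(seed)
        d, N = X.shape
        C = X[:, rng.choice(N, K, replace=False)].copy()
        for _ in range(n_iter):
            # fix C: assign each block to its nearest center, i.e. Z in (21)
            dist = ((X[:, None, :] - C[:, :, None]) ** 2).sum(axis=0)  # (K, N)
            I = dist.argmin(axis=0)
            # fix Z: with a linear kernel, (22) gives the cluster mean
            for k in range(K):
                if np.any(I == k):
                    C[:, k] = X[:, I == k].mean(axis=1)
            # bipartite matching: snap each center back to a distinct block
            cost = ((C[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)  # (K, N)
            _, cols = linear_sum_assignment(cost)
            C = X[:, cols].copy()                                      # C = XQ
        return C

    Matching the updated centers back to actual image blocks ($C=XQ$) keeps every center a genuine image block, in the spirit of k-medoids.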

  • Algorithm 2 FISTA algorithm

    Input the selected image matrix $C$ and the Lipschitz constant $L$ of $\nabla~R(A)$;

    Choose initial point $A_0$, set $B_1=A_0$, $t_1=1$, $k=1$;

    while not converged do

    \begin{equation} A_k=T_{\sigma,\lambda}\left(B_k-\frac{1}{L}\nabla R(B_k)\right), \tag{30}\end{equation}

    \begin{equation} t_{k+1}=\frac{1}{2}\left(1+\sqrt{1+4t_k^2}\right), \tag{31}\end{equation}

    \begin{equation} B_{k+1}=A_{k}+\frac{t_k-1}{t_{k+1}}(A_k-A_{k-1}); \tag{32}\end{equation}

    $k=k+1$;

    end while
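
    A minimal sketch of Algorithm 2, assuming $T_{\sigma,\lambda}$ is the elementwise soft-thresholding operator (the proximal map of an $\ell_1$ penalty), as in the standard FISTA of Beck and Teboulle [25]; the paper's exact operator may differ.

    import numpy as np

    def soft_threshold(X, tau):
        # assumed form of T_{sigma,lambda}: elementwise soft-thresholding
        return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

    def fista(grad_R, L, A0, lam, n_iter=200):
        # minimize R(A) + lam * ||A||_1, where grad_R is the gradient of the
        # smooth part R and L is its Lipschitz constant
        A_prev = A0.copy()
        B = A0.copy()
        t = 1.0
        for _ in range(n_iter):
            A = soft_threshold(B - grad_R(B) / L, lam / L)        # (30)
            t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))     # (31)
            B = A + ((t - 1.0) / t_next) * (A - A_prev)           # (32)
            A_prev, t = A, t_next
        return A_prev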

  • Algorithm 3 Matrix binarization algorithm

    Input matrix $H$ with entries in $[0,1]$ and the threshold $\epsilon\in[0,1]$;

    Sort all the elements of $H$ in descending order, and denote the index pair of the $i$th largest element by $(p_i,q_i)$, $i=1,\ldots,K^2$;

    for $i=1,\ldots,K^2$ do

    if $H(p_i,q_i)\geq \epsilon$ then set $H(p_i,:)=0$, $H(:,q_i)=0$, $H(p_i,q_i)=1$;

    else set $H(p_i,q_i)=0$;

    end for
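
    Algorithm 3 is a greedy rounding: taking entries in descending order, each accepted entry retires its whole row and column, so the output has at most one 1 per row and per column. A direct NumPy sketch (the function name binarize is ours):

    import numpy as np

    def binarize(H, eps):
        # greedily round H (entries in [0, 1]) to a partial permutation matrix
        H = H.copy()
        order = np.argsort(H, axis=None)[::-1]   # flat indices, descending
        for flat in order:
            p, q = np.unravel_index(flat, H.shape)
            if H[p, q] >= eps:
                H[p, :] = 0.0   # retire row p ...
                H[:, q] = 0.0   # ... and column q
                H[p, q] = 1.0   # keep this entry
            else:
                H[p, q] = 0.0
        return H

    Entries whose row or column has already been retired are zero by the time they are visited, so the scan never accepts two entries in the same row or column.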

  • Algorithm 4 Base image selection algorithm

    Input the initial image matrix $C$; initialize $(A^0,H^0)$, $k=0$, and the maximum number of iterations Maxiter;

    while $k<{\rm~Maxiter}$ do

    Fix $H^k$, and solve (28) by Algorithm 2 to obtain the solution $A^{k+1}$;

    Fix $A^{k+1}$, and solve the quadratic programming problem on $H$ by the active-set algorithm [24] to obtain the solution $H^{k+1}$;

    Binarize $H^{k+1}$ by Algorithm 3;

    $k=k+1$;

    end while

    Output $H^{\rm~Maxiter}$; the row indices of its nonzero elements identify the selected base images.
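
    The outer loop of Algorithm 4 is plain alternating minimization. Below is a sketch of its control flow, with the two subproblem solvers left as callables, since problem (28) and the quadratic program in $H$ are defined in the paper body; solve_A, solve_H, and the identity initialization of $H$ are our assumptions.

    import numpy as np

    def select_base_images(C, solve_A, solve_H, eps=0.5, max_iter=20):
        # solve_A(C, H): FISTA (Algorithm 2) applied to (28) with H fixed
        # solve_H(C, A): the QP in H with A fixed (e.g. an active-set solver [24])
        # binarize: the sketch given after Algorithm 3
        K = C.shape[1]
        H = np.eye(K)                       # assumed feasible starting point
        for _ in range(max_iter):
            A = solve_A(C, H)               # fix H, solve (28) for A
            H = solve_H(C, A)               # fix A, solve the QP for H
            H = binarize(H, eps)            # Algorithm 3
        # rows of H containing a nonzero entry index the selected base images
        return np.flatnonzero(H.any(axis=1))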
