SCIENTIA SINICA Informationis, Volume 48, Issue 5: 511-520(2018) https://doi.org/10.1360/N112017-00261

## Conceptor-based deep neural networks

• Accepted: Feb 25, 2018
• Published: May 11, 2018

### Abstract

In recent years, deep neural networks, also known as deep learning, have achieved breakthroughs in several fields that were previously dominated by traditional machine learning methods. However, even with high-performance computing devices, training a deep neural network can take days or weeks. A conceptor, an extension of echo state networks, can be understood as a neural filter that characterizes dynamical neural activation patterns. In this study, building on improvements to the original conceptor model, we address this training-cost issue from the perspectives of non-iterative methods and transfer learning. Our contributions are as follows. (1) We propose a conceptor-based classifier for non-temporal data, the feedforward convolutional conceptor neural network (FCCNN), which is trained non-iteratively. It achieves classification accuracy comparable to that of state-of-the-art methods while requiring significantly less training time; we evaluate its classification quality on the MNIST variation datasets. (2) We propose a conceptor-based classifier called the fast conceptor classifier (FCC), which achieves state-of-the-art results while reducing training time by a factor of 60 on average. We evaluate it with pre-trained rather than fine-tuned neural networks on the Caltech-101 and Caltech-256 datasets.
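For readers unfamiliar with conceptors, the core construction is a closed-form matrix computation rather than iterative training: given samples collected as the columns of a matrix $X$, the conceptor is $C = R(R + \alpha^{-2}I)^{-1}$, where $R = XX^{\rm T}/n$ is the sample correlation matrix and $\alpha$ the aperture (Jaeger [24]). The sketch below is a minimal illustration of this general construction on hypothetical toy data, not the authors' exact FCCNN/FCC pipeline; all names and data are illustrative.

```python
import numpy as np

def conceptor(X, aperture=10.0):
    """C = R (R + aperture^-2 I)^-1 with R = X X^T / n (Jaeger's definition);
    X holds one d-dimensional sample per column."""
    d, n = X.shape
    R = X @ X.T / n
    return R @ np.linalg.inv(R + aperture ** -2 * np.eye(d))

def classify(z, conceptors):
    """Assign z to the class whose conceptor yields the largest
    quadratic-form evidence z^T C z."""
    return int(np.argmax([z @ C @ z for C in conceptors]))

# Toy demo: two 2-D classes with different dominant directions
rng = np.random.default_rng(0)
X0 = rng.normal(size=(2, 200)) * np.array([[2.0], [0.2]])   # spread along x
X1 = rng.normal(size=(2, 200)) * np.array([[0.2], [2.0]])   # spread along y
Cs = [conceptor(X0), conceptor(X1)]
print(classify(np.array([3.0, 0.1]), Cs))  # → 0
print(classify(np.array([0.1, 3.0]), Cs))  # → 1
```

Intuitively, each conceptor acts as a soft projection onto the subspace its class occupies, so the evidence $z^{\rm T}Cz$ is largest for the conceptor whose class directions align with $z$.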

### References

[1] Zhang L, Zhang Y. Big data analysis by infinite deep neural networks. J Comput Res Dev, 2016, 53: 68--79

[2] Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets. Neural Comput, 2006, 18: 1527--1554

[3] Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks. Science, 2006, 313: 504--507

[4] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, Lake Tahoe, 2012. 1097--1105

[5] Wan L, Zeiler M, Zhang S, et al. Regularization of neural networks using DropConnect. In: Proceedings of the 30th International Conference on Machine Learning, Atlanta, 2013. 1058--1066

[6] Zhang L, Yi Z, Amari S. Theoretical study of oscillator neurons in recurrent neural networks. IEEE Trans Neural Netw Learn Syst, 2018, 99: 1--7

[7] Sermanet P, Eigen D, Zhang X, et al. OverFeat: integrated recognition, localization and detection using convolutional networks. ArXiv preprint

[8] Zeiler M D, Fergus R. Visualizing and understanding convolutional networks. In: Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2014. 818--833

[9] Li Z, Tang J. Weakly supervised deep metric learning for community-contributed image retrieval. IEEE Trans Multimedia, 2015, 17: 1989--1999

[10] Li Z, Tang J. Weakly supervised deep matrix factorization for social image understanding. IEEE Trans Image Process, 2017, 26: 276--288

[11] Tang J, Shu X, Qi G J. Tri-clustered tensor completion for social-aware image tag refinement. IEEE Trans Pattern Anal Mach Intell, 2017, 39: 1662--1674

[12] Rumelhart D E, Hinton G E, Williams R J. Learning representations by back-propagating errors. Nature, 1986, 323: 533--536

[13] Bruna J, Mallat S. Invariant scattering convolution networks. IEEE Trans Pattern Anal Mach Intell, 2013, 35: 1872--1886

[14] Chan T H, Jia K, Gao S, et al. PCANet: a simple deep learning baseline for image classification? IEEE Trans Image Process, 2015, 24: 5017--5032

[15] Qian G W, Zhang L. A simple feedforward convolutional conceptor neural network for classification. Appl Soft Comput, 2017

[16] He K M, Zhang X Y, Ren S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition. In: Computer Vision — ECCV 2014. Berlin: Springer, 2014. 346--361

[17] Qian G W, Zhang L, Zhang Q J. Fast conceptor classifier in pre-trained neural networks for visual recognition. In: Advances in Neural Networks — ISNN 2017. Berlin: Springer, 2017. 290--298

[18] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. ArXiv preprint

[19] Russakovsky O, Deng J, Su H, et al. ImageNet large scale visual recognition challenge. Int J Comput Vis, 2015, 115: 211--252

[20] Chatfield K, Simonyan K, Vedaldi A, et al. Return of the devil in the details: delving deep into convolutional nets. ArXiv preprint

[21] Donahue J, Jia Y, Vinyals O, et al. DeCAF: a deep convolutional activation feature for generic visual recognition. In: Proceedings of the 31st International Conference on Machine Learning, Beijing, 2014. 647--655

[22] Razavian A S, Azizpour H, Sullivan J, et al. CNN features off-the-shelf: an astounding baseline for recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Columbus, 2014. 806--813

[23] Jaeger H. Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science, 2004, 304: 78--80

[24] Jaeger H. Using conceptors to manage neural long-term memories for temporal patterns. J Mach Learn Res, 2017, 18: 1--43

[25] Pearson K. On lines and planes of closest fit to systems of points in space. London Edinburgh Dublin Philos Mag J Sci, 1901, 2: 559--572

[26] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition. ArXiv preprint

[27] Boser B E, Guyon I M, Vapnik V N. A training algorithm for optimal margin classifiers. In: Proceedings of the 5th Annual Workshop on Computational Learning Theory, Pittsburgh, 1992. 144--152

[28] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, 2015. 3431--3440

[29] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, 2014. 580--587

[30] Larochelle H, Erhan D, Courville A, et al. An empirical evaluation of deep architectures on problems with many factors of variation. In: Proceedings of the 24th International Conference on Machine Learning, Corvallis, 2007. 473--480

[31] Rifai S, Vincent P, Muller X, et al. Contractive auto-encoders: explicit invariance during feature extraction. In: Proceedings of the 28th International Conference on Machine Learning, Bellevue, 2011. 833--840

[32] Sohn K, Lee H. Learning invariant representations with local transformations. In: Proceedings of the 29th International Conference on Machine Learning, Edinburgh, 2012

[33] Sohn K, Zhou G, Lee C, et al. Learning and selecting features jointly with point-wise gated Boltzmann machines. In: Proceedings of the 30th International Conference on Machine Learning, Atlanta, 2013. 217--225

• Figure 1   The flowchart of FCCNN

• Figure 2   The flowchart of FCC

• Table 1   Error rates (%) of different methods on MNIST variations and corresponding training time on bg-img-rot

| Method | Basic | Rot | Bg-rand | Bg-img | Bg-img-rot | Training time |
|---|---|---|---|---|---|---|
| CAE-2 [31] | 2.48 | 9.66 | 10.9 | 15.5 | 45.23 | >3 h |
| TIRBM [32] | – | 4.2 | – | – | 35.5 | >3 h |
| PGBM+DN-1 [33] | – | – | 6.08 | 12.25 | 36.76 | >3 h |
| ScatNet-2 [13] | 1.27 | 7.48 | 18.4 | 12.3 | 50.48 | – |
| PCANet-2 [14] | 1.06 | 7.37 | 6.19 | 10.95 | 35.48 | 15 min |
| FCCNN | 2.43 | 8.91 | 6.45 | 10.8 | 33.6 | 5~30 min |

• Table 2   Classification accuracies (%) on Caltech-101 and Caltech-256

| Method | Caltech-101 | Caltech-256 |
|---|---|---|
| Zeiler & Fergus [8] | 86.5 | 74.2 |
| Chatfield et al. [20] | 88.4 | 77.6 |
| He et al. [16] | 93.4 | – |
| VGG-16 Net [18] | 91.8 | 84.57 |
| ResNet-50 [26] | 92.65 | 82.43 |
| ResNet-152 [26] | 95.23 | 90.24 |
| FCC (VGG-16 Net) | 91.87 | 84.67 |
| FCC (ResNet-50) | 93.08 | 82.81 |
| FCC (ResNet-152) | 95.55 | 90.87 |
• Table 3   Running time (s) of VGG-16 Net, ResNet-50, and ResNet-152 with different classifiers

| Method | Caltech-101 training | Caltech-101 testing | Caltech-256 training | Caltech-256 testing |
|---|---|---|---|---|
| VGG-16 Net | 118.31 | 118.28 | 2345.07 | 3114.48 |
| FCC (VGG-16 Net) | 1.76 | 65.2 | 26.16 | 1103.25 |
| ResNet-50 | 16.03 | 21.59 | 82.43 | 554.12 |
| FCC (ResNet-50) | 0.33 | 15.19 | 2.64 | 220.99 |
| ResNet-152 | 13.07 | 20.24 | 229.93 | 497.93 |
| FCC (ResNet-152) | 0.32 | 15.33 | 2.73 | 223.21 |
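The large training-time gap in Table 3 is consistent with the closed-form nature of conceptor fitting: each class conceptor is obtained from one correlation matrix and one matrix inversion over that class's feature vectors, with no iterative gradient descent. The sketch below illustrates this per-class closed-form fitting on simulated feature vectors standing in for pre-trained CNN activations; it is an illustrative reconstruction of the general idea, not the authors' exact FCC implementation, and all names and data are hypothetical.

```python
import numpy as np

def per_class_conceptors(F, y, aperture=10.0):
    """Closed-form 'training': one conceptor per class, one inversion each.
    F: (n, d) feature matrix (e.g. activations of a pre-trained CNN layer,
    here simulated); y: (n,) integer class labels."""
    d = F.shape[1]
    Cs = {}
    for c in np.unique(y):
        Xc = F[y == c]                      # (n_c, d) features of class c
        R = Xc.T @ Xc / len(Xc)             # d x d correlation matrix
        Cs[c] = R @ np.linalg.inv(R + aperture ** -2 * np.eye(d))
    return Cs

def predict(Cs, F):
    """Evidence-based prediction: argmax_c f^T C_c f for each feature row f."""
    classes = sorted(Cs)
    # E[n, c] = quadratic-form evidence of sample n under class c's conceptor
    E = np.stack([np.einsum('nd,de,ne->n', F, Cs[c], F) for c in classes], axis=1)
    return np.asarray(classes)[E.argmax(axis=1)]

# Simulated "pre-trained" features: each class concentrated along its own axis
rng = np.random.default_rng(1)
d, n = 8, 300
y = rng.integers(0, 3, n)
basis = np.eye(3, d)                        # class c -> unit vector e_c
F = basis[y] * rng.normal(1.0, 0.1, (n, 1)) + 0.05 * rng.normal(size=(n, d))
Cs = per_class_conceptors(F, y)
acc = float(np.mean(predict(Cs, F) == y))
print(f"training accuracy: {acc:.2f}")
```

Because fitting reduces to per-class matrix arithmetic, its cost scales with feature dimension and sample count rather than with epochs of backpropagation, which matches the order-of-magnitude training-time reductions reported above.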


Copyright 2019 Science China Press Co., Ltd. All rights reserved.