
SCIENTIA SINICA Informationis, Volume 50, Issue 5: 662-674 (2020). https://doi.org/10.1360/N112018-00332

Few-shot learning via model composition

  • Received: Dec 21, 2018
  • Accepted: Jul 4, 2019
  • Published: Apr 26, 2020

Abstract

Although machine learning methods achieve inspiring performance in many real-world applications, they require a huge number of training examples to obtain an effective model. Considering the effort of collecting labeled training data, few-shot learning, i.e., learning with a budgeted training set, is necessary and useful. The model prior, e.g., the feature embedding, initialization, and configuration, is the key to few-shot learning. This study meta-learns such a prior from seen classes and applies the learned prior to few-shot tasks on unseen classes. Moreover, based on the first-order optimality condition of the objective, the model composition prior (MCP) approach is proposed to decompose the model prior and estimate each component. The composition strategy improves explainability while revealing the shared and specific parts among the few-shot tasks. We verify the ability of our approach to recover task relationships on a synthetic dataset, and our MCP method achieves better results on two benchmark datasets (MiniImageNet and CUB).
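
To make the episodic setup concrete, the sketch below composes each class's classifier from a component shared across classes plus a class-specific component estimated from a handful of support examples, then classifies queries against the composed weights. It is a minimal illustration under our own assumptions: the toy embedding `embed`, the blending weight `alpha`, and all helper names are hypothetical and do not reproduce the MCP algorithm of the paper, where the shared and specific parts are instead derived from the first-order optimality condition of the meta-learning objective.

```python
# Minimal sketch of an episodic few-shot classifier composed from a shared
# component plus class-specific components. Purely illustrative: `embed`,
# `alpha`, and the composition rule are assumptions, not the paper's MCP.
import numpy as np

rng = np.random.default_rng(0)

def embed(x):
    # Stand-in for a feature embedding that would be meta-learned on seen classes.
    return np.tanh(x)

def compose_classifiers(support_x, support_y, n_way, alpha=0.2):
    """Compose per-class weights w_c = alpha * shared + (1 - alpha) * prototype_c."""
    z = embed(support_x)                        # (n_support, d)
    shared = z.mean(axis=0)                     # component shared across classes
    weights = []
    for c in range(n_way):
        proto = z[support_y == c].mean(axis=0)  # class-specific few-shot estimate
        weights.append(alpha * shared + (1.0 - alpha) * proto)
    return np.stack(weights)                    # (n_way, d)

def predict(query_x, weights):
    z = embed(query_x)                          # (n_query, d)
    # Score each query by negative squared distance to the composed weights.
    dist = ((z[:, None, :] - weights[None, :, :]) ** 2).sum(axis=-1)
    return dist.argmin(axis=1)

# A toy 5-way 1-shot episode with 32-dimensional inputs.
n_way, d = 5, 32
support_x = rng.normal(size=(n_way, d))
support_y = np.arange(n_way)
query_x = support_x + 0.1 * rng.normal(size=(n_way, d))  # slightly perturbed copies

W = compose_classifiers(support_x, support_y, n_way)
print(predict(query_x, W))  # should recover [0 1 2 3 4] on this easy episode
```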


Funded by

National Key Research and Development Program of China, "Fundamental Theory and Technical Methods of Big Data Analysis" (2018YFB1004300)

National Natural Science Foundation of China (61773198, 61632004)

Collaborative Innovation Center of Novel Software Technology and Industrialization

Nanjing University Program for Improving the Innovation Capability of Outstanding PhD Candidates


Acknowledgment

The authors thank Prof. Fei Sha and the students in his group for the help they provided during the visit to Prof. Sha's research group at the University of Southern California.


References

[1] Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv, 2014.

[2] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems, 2012. 1097--1105.

[3] Wang Y X, Girshick R, Hebert M, et al. Low-shot learning from imaginary data. arXiv, 2018.

[4] Tan X, Chen S, Zhou Z H. Face recognition from a single image per person: a survey. Pattern Recognition, 2006, 39: 1725-1745.

[5] Finn C, Yu T H, Zhang T H, et al. One-shot visual imitation learning via meta-learning. In: Proceedings of the 1st Annual Conference on Robot Learning, Mountain View, 2017. 357--368.

[6] Karlinsky L, Shtok J, Tzur Y, et al. Fine-grained recognition of thousands of object categories with single-example training. In: Proceedings of Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 965--974.

[7] Zhou Z H. Learnware: on the future of machine learning. Front Comput Sci, 2016, 10: 589-590.

[8] Lake B M, Salakhutdinov R, Tenenbaum J B. Human-level concept learning through probabilistic program induction. Science, 2015, 350: 1332-1338.

[9] Patricia N, Caputo B. Learning to learn, from transfer learning to domain adaptation: a unifying perspective. In: Proceedings of Conference on Computer Vision and Pattern Recognition, Columbus, 2014. 1442--1449.

[10] Li Z G, Zhou F W, Chen F, et al. Meta-SGD: learning to learn quickly for few-shot learning. arXiv, 2017.

[11] Russakovsky O, Deng J, Su H, et al. ImageNet large scale visual recognition challenge. Int J Comput Vis, 2015, 115: 211-252.

[12] Koch G, Zemel R, Salakhutdinov R. Siamese neural networks for one-shot image recognition. In: Proceedings of the 32nd International Conference on Machine Learning, Lille, 2015.

[13] Vinyals O, Blundell C, Lillicrap T, et al. Matching networks for one shot learning. In: Proceedings of Advances in Neural Information Processing Systems, 2016. 3630--3638.

[14] Snell J, Swersky K, Zemel R S. Prototypical networks for few-shot learning. In: Proceedings of Advances in Neural Information Processing Systems, 2017. 4080--4090.

[15] Triantafillou E, Zemel R S, Urtasun R. Few-shot learning through an information retrieval lens. In: Proceedings of Advances in Neural Information Processing Systems, 2017. 2252--2262.

[16] Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks. In: Proceedings of the 34th International Conference on Machine Learning, Sydney, 2017. 1126--1135.

[17] Nichol A, Achiam J, Schulman J. On first-order meta-learning algorithms. arXiv, 2018.

[18] Ravi S, Larochelle H. Optimization as a model for few-shot learning. In: Proceedings of International Conference on Learning Representations, 2017.

[19] Schölkopf B, Smola A J. Learning with Kernels. Cambridge: The MIT Press, 2001.

[20] Friedman J, Hastie T, Tibshirani R. The Elements of Statistical Learning. Berlin: Springer, 2001.

[21] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation, 1997, 9: 1735-1780.

[22] Schuster M, Paliwal K K. Bidirectional recurrent neural networks. IEEE Trans Signal Process, 1997, 45: 2673-2681.

[23] Zaheer M, Kottur S, Ravanbakhsh S, et al. Deep sets. In: Proceedings of Advances in Neural Information Processing Systems, 2017. 3394--3404.

[24] Bertinetto L, Henriques J F, Torr P H S, et al. Meta-learning with differentiable closed-form solvers. arXiv, 2018.

[25] Li Fei-Fei, Fergus R, Perona P. One-shot learning of object categories. IEEE Trans Pattern Anal Machine Intell, 2006, 28: 594-611.

[26] Lake B M, Salakhutdinov R, Gross J, et al. One shot learning of simple visual concepts. In: Proceedings of the 33rd Annual Meeting of the Cognitive Science Society, Boston, 2011.

[27] Andrychowicz M, Denil M, Colmenarejo S G, et al. Learning to learn by gradient descent by gradient descent. In: Proceedings of Advances in Neural Information Processing Systems, 2016. 3981--3989.

[28] Maurer A. Transfer bounds for linear feature learning. Mach Learn, 2009, 75: 327-350.

[29] Maurer A, Pontil M, Romera-Paredes B. The benefit of multitask representation learning. J Mach Learn Res, 2016, 17: 1--32.

[30] Hariharan B, Girshick R B. Low-shot visual recognition by shrinking and hallucinating features. In: Proceedings of IEEE International Conference on Computer Vision, Venice, 2017. 3037--3046.

[31] Dai W Z, Muggleton S, Wen J, et al. Logical vision: one-shot meta-interpretive learning from real images. In: Proceedings of the 27th International Conference on Inductive Logic Programming, Orléans, 2017. 46--62.

[32] Wang P, Liu L Q, Shen C H, et al. Multi-attention network for one shot learning. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 6212--6220.

[33] Shyam P, Gupta S, Dukkipati A. Attentive recurrent comparators. In: Proceedings of the 34th International Conference on Machine Learning, Sydney, 2017. 3173--3181.

[34] Pan S J, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng, 2010, 22: 1345-1359.

[35] Weiss K, Khoshgoftaar T M, Wang D D. A survey of transfer learning. J Big Data, 2016, 3: 9.

[36] Day O, Khoshgoftaar T M. A survey on heterogeneous transfer learning. J Big Data, 2017, 4: 29.

[37] Wang Y X, Ramanan D, Hebert M. Learning to model the tail. In: Proceedings of Advances in Neural Information Processing Systems, 2017. 7032--7042.

[38] Motiian S, Jones Q, Iranmanesh S M, et al. Few-shot adversarial domain adaptation. In: Proceedings of Advances in Neural Information Processing Systems, 2017. 6673--6683.

[39] Yu T H, Finn C, Xie A N, et al. One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv, 2018.

[40] Reed S E, Chen Y T, Paine T, et al. Few-shot autoregressive density estimation: towards learning to learn distributions. arXiv, 2017.

[41] Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning, Lille, 2015. 448--456.

[42] Kingma D P, Ba J. Adam: a method for stochastic optimization. arXiv, 2014.

[43] Wah C, Branson S, Welinder P, et al. The Caltech-UCSD Birds-200-2011 dataset. Technical report, 2011.

[44] Mensink T, Verbeek J, Perronnin F. Distance-based image classification: generalizing to new classes at near-zero cost. IEEE Trans Pattern Anal Mach Intell, 2013, 35: 2624-2637.

[45] Sung F, Yang Y X, Zhang L, et al. Learning to compare: relation network for few-shot learning. arXiv, 2017.

  • Figure 1

    (Color online) Comparison between different approaches. (a) ProtoNet; (b) MAML; (c) MCP.

  • Figure 2

    (Color online) Illustration of the MCP approach

  • Figure 3

    (Color online) The ground-truth precision matrix (a) and those estimated by MCP with weak (b) and strong (c) noise

  • Table 1   Comparison results over two settings on synthetic dataset
    Method | Noise | 30-Way 1-Shot (%) | 30-Way 5-Shot (%)
    NN | N | 9.82 $\pm$ 0.10 | 13.91 $\pm$ 0.11
    Proto | N | 9.82 $\pm$ 0.10 | 25.35 $\pm$ 0.15
    FCE | N | 12.80 $\pm$ 0.12 | 63.08 $\pm$ 0.25
    MCP | N | 91.77 $\pm$ 0.19 | 95.46 $\pm$ 0.12
    NN | Y | 9.61 $\pm$ 0.10 | 13.46 $\pm$ 0.11
    Proto | Y | 9.61 $\pm$ 0.10 | 24.97 $\pm$ 0.14
    FCE | Y | 10.09 $\pm$ 0.10 | 48.53 $\pm$ 0.22
    MCP | Y | 89.53 $\pm$ 0.20 | 94.88 $\pm$ 0.12
  • Table 2   Few-shot classification mean accuracy on MiniImageNet dataset, together with 95% confidence intervals
    Method | 5-Way 1-Shot (%) | 5-Way 5-Shot (%)
    Baseline NN | 41.08 $\pm$ 0.70 | 51.04 $\pm$ 0.65
    MatchingNet [13] | 43.40 $\pm$ 0.78 | 51.09 $\pm$ 0.71
    MatchingNet (FCE) [13] | 43.56 $\pm$ 0.84 | 55.31 $\pm$ 0.73
    Meta-LSTM [18] | 43.44 $\pm$ 0.77 | 60.60 $\pm$ 0.71
    RelationNet [45] | 50.40 $\pm$ 0.80 | 65.30 $\pm$ 0.70
    ProtoNet [14] | 49.42 $\pm$ 0.78 | 68.20 $\pm$ 0.66
    MCP | 51.24 $\pm$ 0.82 | 67.37 $\pm$ 0.67
    MAML [16] | 48.70 $\pm$ 1.84 | 63.11 $\pm$ 0.92
    Siamese | 48.42 $\pm$ 0.79 | --
    mAP-SSVM [15] | 50.32 $\pm$ 0.80 | 63.94 $\pm$ 0.72
    mAP-DLM [15] | 50.28 $\pm$ 0.80 | 63.70 $\pm$ 0.70
    Meta-CF [24] | 48.70 $\pm$ 0.60 | 65.50 $\pm$ 0.60
    ProtoNet Pool | 49.21 $\pm$ 0.79 | 66.80 $\pm$ 0.68
    MCP$^+$ | 51.27 $\pm$ 0.81 | 66.93 $\pm$ 0.63
  • Table 3   Few-shot classification mean accuracy on CUB, together with 95% confidence intervals (a sketch of how such intervals are computed follows the tables)
    Method | 1-Shot 5-Way (%) | 1-Shot 20-Way (%)
    Siamese | 47.17 $\pm$ 0.62 | 21.35 $\pm$ 0.21
    MatchingNet | 46.58 $\pm$ 0.66 | 16.61 $\pm$ 0.18
    MAML | 31.04 $\pm$ 0.43 | 10.57 $\pm$ 0.13
    MCP | 50.95 $\pm$ 0.65 | 25.30 $\pm$ 0.25
    mAP-DLM | 50.07 $\pm$ 0.64 | 22.79 $\pm$ 0.22
    ProtoNet | 48.89 $\pm$ 6.30 | 22.92 $\pm$ 2.17
    ProtoNet Pool | 48.96 $\pm$ 0.65 | 21.18 $\pm$ 0.21
    MCP$^+$ | 55.85 $\pm$ 0.72 | 26.39 $\pm$ 0.26
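
The accuracies above are reported as mean accuracy with 95% confidence intervals over sampled test episodes. The short sketch below shows how such an interval is typically computed with the usual normal approximation; the simulated per-episode accuracies and the episode count are assumptions for illustration, not values from the paper.

```python
# Hypothetical sketch: mean accuracy and a 95% confidence interval over
# per-episode accuracies, using the normal approximation
# mean +/- 1.96 * std / sqrt(n). The episode accuracies are simulated here.
import numpy as np

rng = np.random.default_rng(1)
episode_acc = rng.normal(loc=0.51, scale=0.09, size=600)  # stand-in per-episode accuracies

mean = episode_acc.mean()
half_width = 1.96 * episode_acc.std(ddof=1) / np.sqrt(len(episode_acc))
print(f"{100 * mean:.2f} +/- {100 * half_width:.2f} (%)")
```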
