
SCIENTIA SINICA Informationis, Volume 48, Issue 1: 47-59 (2018) https://doi.org/10.1360/N112016-00279

Kernel method for matrix completion with side information and its application in multi-label learning

  • Received: Dec 13, 2016
  • Accepted: Mar 24, 2017
  • Published: Sep 14, 2017

Abstract

In practical machine learning, an instance is often associated with multiple labels simultaneously. However, full supervision is costly to acquire, so multi-label data frequently arrives with missing labels, and multi-label learning must cope with incomplete supervised information. By treating missing labels as unobserved entries of a matrix and features as side information, matrix completion algorithms can be exploited to address this problem. Whereas previous research mostly focused on the linearly separable case, in this paper we propose the KernelMaxide algorithm, which not only exploits the nonlinear structure of multi-label data with missing labels but also accounts for correlations between labels. In particular, we construct a novel optimization objective over the kernel matrix based on the representer theorem for matrix-norm regularizers, and we further apply the Nyström method to reduce the memory and computational burden of handling the kernel matrix. Experiments demonstrate the merit of our proposal.
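To make the ideas above concrete, here are two minimal Python sketches; they are illustrative reconstructions under stated assumptions, not the authors' KernelMaxide implementation. The first shows the Nyström method the abstract refers to: it assumes an RBF kernel and approximates the n × n kernel matrix K by a low-rank factor L with K ≈ L Lᵀ, built from m randomly sampled landmark columns. The function names (rbf_kernel, nystrom_factor) and the choice of kernel are assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)
    # between rows of X (n x d) and rows of Y (m x d).
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def nystrom_factor(X, m, gamma=1.0, seed=None):
    # Nystrom approximation K ~= C @ pinv(W) @ C.T, returned as a
    # factor L (n x m') with K ~= L @ L.T, so the full n x n kernel
    # matrix never has to be formed or stored.
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=m, replace=False)  # landmark points
    C = rbf_kernel(X, X[idx], gamma)   # n x m cross-kernel block
    W = C[idx]                         # m x m landmark kernel block
    vals, vecs = np.linalg.eigh(W)
    keep = vals > 1e-10                # drop near-null directions
    return C @ (vecs[:, keep] / np.sqrt(vals[keep]))  # L = C W^{-1/2}
```

The second sketch, continuing from the imports above, shows matrix completion with side information in the spirit the abstract describes, assuming a Maxide-style objective: squared loss on the observed label entries plus trace-norm regularization on a core matrix Z, with the target factored as A Z Bᵀ (A carries the instance side information, e.g., the Nyström factor above; B the label side information, here simply the identity). It is solved by plain proximal gradient with singular value thresholding; the step size, iteration count, and all names are illustrative.

```python
def svt(Z, tau):
    # Proximal operator of tau * (trace norm): soft-threshold
    # the singular values of Z.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def complete_with_side_info(M_obs, mask, A, B, lam=0.1, iters=300):
    # Recover M ~= A @ Z @ B.T from the observed entries (mask == True) by
    # proximal gradient on 0.5*||mask*(A Z B' - M)||_F^2 + lam*||Z||_tr.
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 * np.linalg.norm(B, 2) ** 2)
    Z = np.zeros((A.shape[1], B.shape[1]))
    for _ in range(iters):
        R = (A @ Z @ B.T - M_obs) * mask   # residual on observed entries
        Z = svt(Z - step * (A.T @ R @ B), step * lam)
    return A @ Z @ B.T

# Toy usage: n instances, q labels, 30% of label entries observed.
rng = np.random.default_rng(0)
n, d, q = 200, 10, 5
A = rng.standard_normal((n, d))
M_true = A @ rng.standard_normal((d, q))   # low-rank ground truth
mask = rng.random((n, q)) < 0.3
M_hat = complete_with_side_info(M_true * mask, mask, A, np.eye(q))
```

Replacing A with the Nyström factor L is what would let a kernelized objective scale: the completion then manipulates an n × m factor rather than the full n × n kernel matrix.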


Funded by

National Natural Science Foundation of China (Grant No. 61333014)


