SCIENTIA SINICA Informationis, Volume 48, Issue 5: 487-500(2018) https://doi.org/10.1360/N112017-00246

Review on hierarchical learning methods for large-scale classification task

  • Received: Dec 3, 2017
  • Accepted: Mar 12, 2018
  • Published: May 11, 2018

Abstract

Hierarchical classification is a task that exploits the hierarchy of categories in data and can handle large-scale data. Research in this field has grown significantly in recent years and is attracting increasing attention. In this paper, we first introduce the definition of hierarchical classification and then review important studies on several basic issues in large-scale hierarchical classification tasks, organized by problem-solving strategy. First, we formally define the hierarchy and introduce several hierarchical evaluation metrics. Second, we explain how to construct the hierarchy, how to learn classifiers and perform feature selection using the information in the hierarchy, and how to design stopping strategies, introducing representative studies on each issue. Finally, we summarize the characteristics of large-scale hierarchical classification tasks and discuss possible future work in this field.


Funded by

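National Natural Science Foundation of China (61432011)

National Natural Science Foundation of China (U1435212)

National Natural Science Foundation of China (61732011)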



Copyright 2019 Science China Press Co., Ltd. All rights reserved.