
SCIENTIA SINICA Informationis, Volume 49, Issue 9: 1083-1096(2019) https://doi.org/10.1360/N112018-00150

Disambiguation-free partial label learning

More info
  • Received: Jun 10, 2018
  • Accepted: Apr 29, 2019
  • Published: Aug 29, 2019

Abstract

Partial label learning is an important weakly supervised machine learning framework. In partial label learning, each object is described by a single instance in the input space; in the output space, however, it is associated with a set of candidate labels, among which only one is valid. An intuitive strategy is to disambiguate the candidate labels, but this strategy tends to be misled by false positive labels; therefore, new disambiguation-free approaches need to be considered. In this paper, several algorithms are reviewed from the perspectives of the disambiguation and disambiguation-free strategies. First, the definition of the partial label learning problem and its relationship with other related learning frameworks are given. Second, several representative partial label learning algorithms based on the disambiguation strategy are introduced. Third, two of our proposed disambiguation-free algorithms are presented. Finally, this paper is summarized and possible future directions for partial label learning research are briefly discussed.
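To make the setting above concrete, the following minimal sketch (not from the paper; all names and values are hypothetical) shows how a partial label training set is structured, and why naively treating every candidate label as equally credible can be misled by false positive labels:

```python
# Each training example pairs a single feature vector (input space) with a
# set of candidate labels (output space); exactly one candidate is the
# hidden ground-truth label.
from typing import List, Set, Tuple

PartialLabelSample = Tuple[List[float], Set[str]]

train: List[PartialLabelSample] = [
    ([0.2, 0.9], {"cat", "dog"}),     # true label hidden among two candidates
    ([0.8, 0.1], {"dog", "rabbit"}),  # "rabbit" may be a false positive
    ([0.5, 0.4], {"cat"}),            # ordinary supervised data is the special case |S| = 1
]

def candidate_frequency(data):
    """Count how often each label occurs as a candidate. A naive
    averaging-style treatment weights false positive labels exactly as
    much as the hidden ground-truth labels, which is what misleads it."""
    counts = {}
    for _, candidates in data:
        for y in candidates:
            counts[y] = counts.get(y, 0) + 1
    return counts

print(candidate_frequency(train))  # counts for cat/dog/rabbit; key order may vary
```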


Funded by

National Key R&D Program of China (2018YFB1004300)

National Natural Science Foundation of China (61573104)


References

[1] Zhou Z H. Machine Learning. Beijing: Tsinghua University Press, 2016.

[2] Zhou Z H. A brief introduction to weakly supervised learning. Natl Sci Rev, 2018, 5: 44-53.

[3] Cour T, Sapp B, Taskar B. Learning from partial labels. J Mach Learn Res, 2011, 12: 1501-1536.

[4] Chen C H, Patel V M, Chellappa R. Learning from ambiguously labeled face images. IEEE Trans Pattern Anal Mach Intell, 2018, 40: 1653-1667.

[5] Zhang M L. Research on partial label learning. J Data Acquis Process, 2015, 30: 77-87.

[6] Luo J, Orabona F. Learning from candidate labeling sets. In: Proceedings of Advances in Neural Information Processing Systems, Cambridge, 2010. 1504-1512.

[7] Zeng Z, Xiao S, Jia K, et al. Learning by associating ambiguously labeled images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, 2013. 708-715.

[8] Liu L, Dietterich T G. A conditional multinomial mixture model for superset label learning. In: Proceedings of Advances in Neural Information Processing Systems, Cambridge, 2012. 548-556.

[9] Wang J, Zhang M L. Towards mitigating the class-imbalance problem for partial label learning. In: Proceedings of the 24th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, London, 2018. 2427-2436.

[10] Zhou Y, Gu H. Geometric mean metric learning for partial label data. Neurocomputing, 2018, 275: 394-402.

[11] Nguyen V L, Destercke S, Masson M H. Querying partially labelled data to improve a k-NN classifier. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, 2017. 2401-2407.

[12] Chapelle O, Schölkopf B, Zien A, eds. Semi-Supervised Learning. Cambridge: MIT Press, 2006.

[13] Zhu X J, Goldberg A B. Introduction to Semi-Supervised Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, 2009. 3: 1-130.

[14] Dietterich T G, Lathrop R H, Lozano-Pérez T. Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence, 1997, 89: 31-71.

[15] Amores J. Multiple instance classification: review, taxonomy and comparative study. Artificial Intelligence, 2013, 201: 81-105.

[16] Tsoumakas G, Katakis I, Vlahavas I. Mining multi-label data. In: Data Mining and Knowledge Discovery Handbook. Boston: Springer, 2009. 667-685.

[17] Zhang M L, Zhou Z H. A review on multi-label learning algorithms. IEEE Trans Knowl Data Eng, 2014, 26: 1819-1837.

[18] Sun Y Y, Zhang Y, Zhou Z H. Multi-label learning with weak label. In: Proceedings of the 24th AAAI Conference on Artificial Intelligence, Atlanta, 2010. 593-598.

[19] Wang D Y, Hoi S C H, He Y. Mining weakly labeled web facial images for search-based face annotation. IEEE Trans Knowl Data Eng, 2014, 26: 166-179.

[20] Li Y F, Tsang I W, Kwok J T, et al. Convex and scalable weakly labeled SVMs. J Mach Learn Res, 2013, 14: 2151-2188.

[21] Zhou Z H, Zhang M L, Huang S J. Multi-instance multi-label learning. Artificial Intelligence, 2012, 176: 2291-2320.

[22] Xie M K, Huang S J. Partial multi-label learning. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, 2018. 4302-4309.

[23] Yu G X, Chen X, Domeniconi C, et al. Feature-induced partial multi-label learning. In: Proceedings of the 2018 IEEE International Conference on Data Mining, Singapore, 2018. 1398-1403.

[24] Fang J P, Zhang M L. Partial multi-label learning via credible label elicitation. In: Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Honolulu, 2019.

[25] Jin R, Ghahramani Z. Learning with multiple labels. In: Proceedings of Advances in Neural Information Processing Systems, Cambridge, 2003. 921-928.

[26] Satoh S, Nakamura Y, Kanade T. Name-It: naming and detecting faces in news videos. IEEE Multimedia, 1999, 6: 22-35.

[27] Barnard K, Duygulu P, Forsyth D, et al. Matching words and pictures. J Mach Learn Res, 2003, 3: 1107-1135.

[28] Berg T L, Berg A C, Edwards J, et al. Names and faces in the news. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, 2004. 848-854.

[29] Everingham M, Sivic J, Zisserman A. "Hello! My name is... Buffy" -- automatic naming of characters in TV video. In: Proceedings of the 17th British Machine Vision Conference, Edinburgh, 2006. 889-908.

[30] Yu F, Zhang M L. Maximum margin partial label learning. In: Proceedings of the Asian Conference on Machine Learning, Hamilton, 2016. 96-111.

[31] Tang C Z, Zhang M L. Confidence-rated discriminative partial label learning. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, 2017. 2611-2617.

[32] Dempster A P, Laird N M, Rubin D B. Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc Ser B, 1977, 39: 1-38.

[33] Grandvalet Y. Logistic regression for partial labels. In: Proceedings of the 9th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Annecy, 2002. 1935-1941.

[34] Della Pietra S, Della Pietra V, Lafferty J. Inducing features of random fields. IEEE Trans Pattern Anal Mach Intell, 1997, 19: 380-393.

[35] Koller D, Friedman N. Probabilistic Graphical Models: Principles and Techniques. Cambridge: MIT Press, 2009.

[36] Nguyen N, Caruana R. Classification with partial labels. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, 2008. 551-559.

[37] Hüllermeier E, Beringer J. Learning from ambiguously labeled examples. Intell Data Anal, 2006, 10: 419-439.

[38] Zhang M L, Yu F. Solving the partial label learning problem: an instance-based approach. In: Proceedings of the 24th International Joint Conference on Artificial Intelligence, Buenos Aires, 2015. 4048-4054.

[39] Gong C, Liu T L, Tang Y Y. A regularization approach for instance-based superset label learning. IEEE Trans Cybern, 2018, 48: 967-978.

[40] Feng L, An B. Leveraging latent label distributions for partial label learning. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, 2018. 2107-2113.

[41] Zhang M L, Zhou B B, Liu X Y. Partial label learning via feature-aware disambiguation. In: Proceedings of the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, San Francisco, 2016. 1335-1344.

[42] Xu N, Lv J Q, Geng X. Partial label learning via label enhancement. In: Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Honolulu, 2019.

[43] Zhang M L, Yu F, Tang C Z. Disambiguation-free partial label learning. IEEE Trans Knowl Data Eng, 2017, 29: 2155-2167.

[44] Wu X, Zhang M L. Towards enabling binary decomposition for partial label learning. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, 2018. 2868-2874.

[45] Pujol O, Escalera S, Radeva P. An incremental node embedding technique for error correcting output codes. Pattern Recognition, 2008, 41: 713-725.

[46] Zhou Z H. Ensemble Methods: Foundations and Algorithms. Boca Raton: Chapman & Hall/CRC, 2012.

[47] Allwein E L, Schapire R E, Singer Y. Reducing multiclass to binary: a unifying approach for margin classifiers. J Mach Learn Res, 2000, 1: 113-141.

  • Figure 1

    (Color online) Weakly-supervised machine learning framework [3,5]. (a) Semi-supervised learning; (b) multi-instance learning; (c) multi-label learning; (d) partial label learning

  •   

    Algorithm 1 The pseudo-code of PL-ECOC

    Require: $\mathcal{D}$: partial label training set $\{({\boldsymbol x}_i, S_i) \mid 1 \leq i \leq m\}$, where ${\boldsymbol x}_i \in \mathcal{X}$, $S_i \subseteq \mathcal{Y}$, $\mathcal{X} = \mathbb{R}^d$, $\mathcal{Y} = \{y_1, y_2, \dots, y_q\}$; $L$: the codeword length; $\mathcal{B}$: the binary training algorithm; $\tau$: the threshold on the binary training set size; ${\boldsymbol x}^*$: the unseen instance;

    Output: $y^*$: the predicted class label for ${\boldsymbol x}^*$;

    $l = 0$;

    while $l \neq L$ do

    Randomly generate a $q$-bit column coding ${\boldsymbol v} = [v_1, v_2, \ldots, v_q] \in \{+1,-1\}^q$;

    Dichotomize the label space according to (16);

    Construct the binary training set $\mathcal{D}_v$ according to (17);

    if $|\mathcal{D}_v| \geq \tau$ then

    $l = l + 1$;

    Set the $l$-th column of the coding matrix ${\boldsymbol M}$ to ${\boldsymbol v}$: ${\boldsymbol M}(:,l) = {\boldsymbol v}$;

    Build the binary classifier by invoking $\mathcal{B}$ on $\mathcal{D}_v$, i.e., $h_l \leftarrow \mathcal{B}(\mathcal{D}_v)$;

    end if

    end while

    Generate the codeword $h({\boldsymbol x}^*)$ by querying the binary classifiers' outputs: $h({\boldsymbol x}^*) = [h_1({\boldsymbol x}^*), h_2({\boldsymbol x}^*), \ldots, h_L({\boldsymbol x}^*)]^{\rm T}$;

    Return $y^* = f({\boldsymbol x}^*)$ according to (18).
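The column-coding loop of PL-ECOC can be sketched as follows. This is a hedged, illustrative reading of the pseudo-code above, not the paper's implementation: `fit` of a binary learner and the dichotomization rule in (16)-(17) are replaced by inline set logic, and all helper names (`build_plecoc_columns`, `tau`) are ours. The key disambiguation-free idea is that an instance contributes to a binary set only when its entire candidate label set falls on one side of the random dichotomy, so no individual candidate label ever has to be picked out:

```python
import random

def build_plecoc_columns(data, labels, L, tau, seed=0):
    """Generate L random label-space dichotomies; keep a column only when
    enough instances have their whole candidate set on one side."""
    rng = random.Random(seed)
    columns, binary_sets = [], []
    while len(columns) < L:
        # random q-bit column coding v over the label space
        v = {y: rng.choice([+1, -1]) for y in labels}
        pos = {y for y in labels if v[y] == +1}
        D_v = []
        for x, S in data:
            if S <= pos:                 # whole candidate set on the positive side
                D_v.append((x, +1))
            elif S.isdisjoint(pos):      # whole candidate set on the negative side
                D_v.append((x, -1))
            # instances straddling the dichotomy are simply discarded
        if len(D_v) >= tau:              # keep only well-populated columns
            columns.append(v)
            binary_sets.append(D_v)      # a binary learner B would be fit here
    return columns, binary_sets
```

At prediction time, each kept column's classifier contributes one bit of the codeword, and the label whose row of the coding matrix is closest to the codeword is returned, as in standard ECOC decoding.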

  •   

    Algorithm 2 The pseudo-code of PALOC

    Require: $\mathcal{D}$: partial label training set $\{({\boldsymbol x}_i, S_i) \mid 1 \leq i \leq m\}$, where ${\boldsymbol x}_i \in \mathcal{X}$, $S_i \subseteq \mathcal{Y}$, $\mathcal{X} = \mathbb{R}^d$, $\mathcal{Y} = \{y_1, y_2, \dots, y_q\}$; $\mathcal{B}$: the binary training algorithm; $\mu$: the balance parameter; ${\boldsymbol x}^*$: the unseen instance;

    Output: $y^*$: the predicted class label for ${\boldsymbol x}^*$;

    for $j = 1$ to $q-1$

    for $k = j+1$ to $q$

    Construct the one-vs-one binary training set $\mathcal{D}_{jk}$ according to (19);

    $g_{jk} \leftarrow \mathcal{B}(\mathcal{D}_{jk})$;

    end for

    end for

    for $i = 1$ to $m$

    Obtain the disambiguated prediction $\hat{y}_i$ for ${\boldsymbol x}_i$ according to (20);

    Identify the refined candidate label set $\hat{S}_i$ for ${\boldsymbol x}_i$ according to (21);

    end for

    for $r = 1$ to $q$

    Construct the stacking binary training set $\mathcal{D}_{r}$ according to (22);

    $g_{r} \leftarrow \mathcal{B}(\mathcal{D}_{r})$;

    end for

    Generate the augmented feature vector $\hat{\boldsymbol x}^*$ for ${\boldsymbol x}^*$ according to (23);

    Return $y^* = f(\hat{\boldsymbol x}^*)$ according to (24).
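The first stage of PALOC, the one-vs-one decomposition, can be sketched as follows. This is an illustrative reading of the pseudo-code above under one assumption: for a label pair $(y_j, y_k)$, an instance is usable without disambiguation exactly when one of the two labels is a candidate and the other is not, so its binary polarity for that pair is unambiguous. The function name and data layout are ours, not the paper's:

```python
from itertools import combinations

def one_vs_one_sets(data, labels):
    """Build a binary training set D_jk for every label pair (j, k)
    without ever disambiguating a candidate label set."""
    pair_sets = {}
    for yj, yk in combinations(labels, 2):
        D_jk = []
        for x, S in data:
            in_j, in_k = yj in S, yk in S
            if in_j != in_k:             # exactly one of the pair is a candidate
                D_jk.append((x, +1 if in_j else -1))
            # instances with both or neither label as candidates are skipped
        pair_sets[(yj, yk)] = D_jk       # a binary learner B would fit each set
    return pair_sets
```

The later stages would then train one classifier per pair, vote to obtain the disambiguated prediction and refined candidate set per instance, and train the stacking classifiers on the augmented features, per (20)-(24).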

Copyright 2020 Science China Press Co., Ltd.