SCIENCE CHINA Information Sciences, Volume 64, Issue 3: 130101 (2021) https://doi.org/10.1007/s11432-020-3132-4

Learning from group supervision: the impact of supervision deficiency on multi-label learning

• Accepted Jul 30, 2020
• Published Feb 7, 2021
• Figure 1

(Color online) Different types of supervision information for multi-labeled data.

• Figure 2

(Color online) Test results show how the number of positive groups impacts learning. The $x$-axis is the epoch, and the $y$-axis is the MSE. The shaded area shows the standard deviation (STD) of ten random trials under the same setting. ir is the number of irrelevant labels in each positive group, and rc is the relevant label coverage. The number of relevant labels per group is fixed to be $1$, and the irrelevant label coverage is fixed to be $0.5$. (a) ir: 2, rc: 0.25; (b) ir: 2, rc: 0.5; (c) ir: 2, rc: 0.75; (d) ir: 2, rc: 1.0; (e) ir: 4, rc: 0.25; (f) ir: 4, rc: 0.5; (g) ir: 4, rc: 0.75; (h) ir: 4, rc: 1.0.

• Figure 3

(Color online) Test results show how the relevant label coverage (rc) impacts learning. The $x$-axis is the epoch, and the $y$-axis is the MSE. Under the same setting, the shaded area shows the variance of ten random trials. ir is the number of irrelevant labels in each positive group, and ic is the irrelevant label coverage. The number of relevant labels per group is fixed to be $1$, and the number of positive groups is fixed to be $3$. (a) ir: 1, ic: 0.25; (b) ir: 1, ic: 0.5; (c) ir: 1, ic: 0.75; (d) ir: 1, ic: 1.0; (e) ir: 2, ic: 0.25; (f) ir: 2, ic: 0.5; (g) ir: 2, ic: 0.75; (h) ir: 2, ic: 1.0.

• Figure 4

(Color online) Test results show how the irrelevant label coverage (ic) impacts learning. The $x$-axis is the epoch, and the $y$-axis is the MSE. Under the same setting, the shaded area shows the STD of ten random trials. ng is the number of positive groups, and rc is the relevant label coverage. The number of relevant labels per group is fixed to be $1$, and the number of irrelevant labels per group is fixed to be $2$. (a) ng: 2, rc: 0.25; (b) ng: 2, rc: 0.5; (c) ng: 2, rc: 0.75; (d) ng: 2, rc: 1.0; (e) ng: 3, rc: 0.25; (f) ng: 3, rc: 0.5; (g) ng: 3, rc: 0.75; (h) ng: 3, rc: 1.0.

• Figure 5

(Color online) Test results show how the number of relevant labels per positive group impacts learning. The $x$-axis is the epoch, and the $y$-axis is the MSE. Under the same setting, the shaded area shows the STD of ten random trials. ir is the number of irrelevant labels per positive group, and ic is the irrelevant label coverage. The number of positive groups and the relevant label coverage are fixed to be $3$ and $0.5$, respectively. (a) ir: 1, ic: 0.25; (b) ir: 1, ic: 0.5; (c) ir: 1, ic: 0.75; (d) ir: 1, ic: 1.0; (e) ir: 2, ic: 0.25; (f) ir: 2, ic: 0.5; (g) ir: 2, ic: 0.75; (h) ir: 2, ic: 1.0.

• Figure 6

(Color online) Test results show how the number of irrelevant labels per positive group impacts learning. The $x$-axis is the epoch, and the $y$-axis is the MSE. Under the same setting, the shaded area shows the STD of ten random trials. ng is the number of positive groups, and ic is the irrelevant label coverage. The number of relevant labels per group and the relevant label coverage are fixed to be $1$ and $0.5$, respectively. (a) ng: 2, ic: 0.25; (b) ng: 2, ic: 0.5; (c) ng: 2, ic: 0.75; (d) ng: 2, ic: 1.0; (e) ng: 3, ic: 0.25; (f) ng: 3, ic: 0.5; (g) ng: 3, ic: 0.75; (h) ng: 3, ic: 1.0.

• Algorithm 1

GS-MLL: group-supervised multi-label learning

Require: the training set $\mathcal{D} = \{(\boldsymbol{x}_1, S_1), \ldots, (\boldsymbol{x}_n, S_n)\}$; epoch number $T$; number of mini-batches $B$;

Output: $\Theta$, the model parameter for $\boldsymbol{g}(\boldsymbol{x};\Theta)$;

Let $\mathcal{A}$ be any stochastic optimization algorithm;

$t = 1$;

while $t \le T$ do

$t = t + 1$;

Shuffle $\mathcal{D}$ into $B$ mini-batches;

$b = 1$;

while $b \le B$ do

Pick the $b$th mini-batch;

Compute the empirical risk $L$ on the mini-batch by (9);

Calculate the gradient $-\nabla_\Theta L$;

Update $\Theta$ by $\mathcal{A}$;

$b = b + 1$;

end while

end while
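
The loop structure of Algorithm 1 can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the empirical risk of Eq. (9) is not reproduced on this page, so the hypothetical callable `risk_and_grad` stands in for it, the placeholder `squared_error_risk` uses a dense target matrix in place of the group supervision $S_i$, and plain SGD with a fixed step size `lr` is used as one possible choice for the stochastic optimizer. All function and parameter names here are made up for the example.

```python
import numpy as np


def train_minibatch(X, S, risk_and_grad, theta, epochs, num_batches, lr=0.1, seed=0):
    """Epoch / mini-batch loop in the shape of Algorithm 1.

    risk_and_grad(theta, X_b, S_b) -> (risk, gradient) is a hypothetical
    stand-in for the empirical risk of Eq. (9), which is not shown here.
    The update is plain SGD, an assumed choice for the optimizer.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    history = []
    for _ in range(epochs):                            # outer loop over T epochs
        order = rng.permutation(n)                     # shuffle the training set
        batches = np.array_split(order, num_batches)   # split into B mini-batches
        epoch_risk = 0.0
        for idx in batches:                            # inner loop over mini-batches
            risk, grad = risk_and_grad(theta, X[idx], S[idx])
            theta = theta - lr * grad                  # move along the negative gradient
            epoch_risk += risk * len(idx)
        history.append(epoch_risk / n)                 # average risk seen in this epoch
    return theta, history


def squared_error_risk(theta, X_b, S_b):
    """Placeholder risk (NOT Eq. (9)): mean squared error of a linear model."""
    residual = X_b @ theta - S_b
    risk = float(np.mean(residual ** 2))
    grad = 2.0 * X_b.T @ residual / residual.size
    return risk, grad


# Toy usage with synthetic, noise-free targets (illustrative data only).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
S = X @ rng.normal(size=(5, 3))
theta, history = train_minibatch(
    X, S, squared_error_risk, np.zeros((5, 3)), epochs=50, num_batches=10
)
```

Passing the risk as a callable keeps the loop independent of the particular loss, so a group-supervision risk could be substituted without changing the epoch and mini-batch logic.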
