
SCIENCE CHINA Information Sciences, Volume 64, Issue 3: 130102 (2021). https://doi.org/10.1007/s11432-020-3117-3

Deep multiple instance selection

More info
  • Received: Jun 26, 2020
  • Accepted: Jul 30, 2020
  • Published: Feb 7, 2021

Abstract


Acknowledgment

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61773198, 61751306) and the NSFC-NRF Joint Research Project (Grant No. 61861146001).



  • Figure 1

    (Color online) Illustration of the MIL problem and of the difference between soft aggregation and hard selection. The ellipses are the true distributions of positive (red) and negative (blue) instances, and the black dotted line is the oracle classifier. Triangles are instances from a positive bag, whereas diamonds are from a negative bag. (a) The standard assumption in MIL. (b) Soft aggregation in MIL: the weighted positive and negative bags are plotted; the positive bag is misclassified because of the interference of negative instances. (c) Hard selection in MIL: the selected instances in each bag are plotted; the interference of negative instances in the positive bag is eliminated.
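
    The difference between (b) and (c) can be made concrete with a minimal sketch (illustrative only, not the paper's code): soft aggregation averages instance scores with attention-like weights, whereas hard selection keeps only the top-scoring instance. The bag scores and attention weights below are made up.

        # Minimal sketch: soft aggregation vs. hard selection of instance scores.
        import numpy as np

        def soft_aggregate(scores, weights):
            """Weighted average of instance scores (attention-style pooling)."""
            weights = weights / weights.sum()
            return float(np.dot(weights, scores))

        def hard_select(scores):
            """Keep only the highest-scoring instance; ignore the rest of the bag."""
            return float(scores[np.argmax(scores)])

        # A positive bag: one positive instance (score 0.9) among negatives.
        bag_scores = np.array([0.1, 0.2, 0.9, 0.15])
        attention = np.array([0.2, 0.2, 0.4, 0.2])    # hypothetical attention weights

        print(soft_aggregate(bag_scores, attention))  # 0.45, diluted by negative instances
        print(hard_select(bag_scores))                # 0.9, negative instances are ignored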

  • Figure 2

    (Color online) Comparison of our proposed DMIS with instance-level and bag-level MIL frameworks. The "a" with a circle denotes attention, and the dotted lines indicate that attention aggregation is optional. (a) Instance-level MIL methods first make predictions for instances and then aggregate all predictions. (b) Bag-level MIL methods first aggregate instances and then make predictions. (c) Our proposed DMIS first identifies ROIs and then makes predictions without the influence of disturbing instances.
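
    A minimal sketch of the two baseline pipelines in (a) and (b), using a toy linear scorer; the specific pooling operators (max over predictions, mean over features) are assumptions chosen for illustration, not necessarily those of the compared methods.

        # Minimal sketch: instance-level vs. bag-level MIL pipelines.
        import numpy as np

        rng = np.random.default_rng(0)
        W = rng.normal(size=5)                        # toy instance scorer parameters

        def instance_level_mil(bag):
            """(a) Score every instance first, then aggregate the predictions."""
            preds = 1.0 / (1.0 + np.exp(-bag @ W))    # per-instance probabilities
            return float(preds.max())                 # e.g., max pooling of predictions

        def bag_level_mil(bag):
            """(b) Aggregate instance features first, then score the bag embedding."""
            bag_embedding = bag.mean(axis=0)          # e.g., mean pooling of features
            return float(1.0 / (1.0 + np.exp(-bag_embedding @ W)))

        bag = rng.normal(size=(4, 5))                 # a bag of 4 instances
        print(instance_level_mil(bag), bag_level_mil(bag))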

  • Figure 3

    (Color online) Comparison of classification performance and ROI positioning with different aggregation weights on a synthetic MIL dataset. (a) Classification accuracy. MI-Mu1.0 uses the oracle hard-selection weights. DMIS-GS achieves better performance than MI-Soft. (b) Comparison of ROI positioning. DMIS-GS identifies ROIs more accurately than MI-Soft.
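
    The "GS" in DMIS-GS presumably refers to Gumbel-Softmax; assuming the standard relaxation, the sketch below shows how near-one-hot selection weights can be sampled from instance logits and how the temperature controls their sharpness. The logits and temperatures are made-up values, and details such as the straight-through estimator are omitted.

        # Minimal sketch: Gumbel-Softmax selection weights over a bag's instances.
        import numpy as np

        def gumbel_softmax_weights(logits, temperature, rng):
            """Sample (near-)one-hot selection weights from instance logits."""
            gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
            z = (logits + gumbel) / temperature
            z = z - z.max()                           # numerical stability
            w = np.exp(z)
            return w / w.sum()

        rng = np.random.default_rng(0)
        instance_logits = np.array([0.2, 1.5, -0.3, 0.1])         # hypothetical instance scores
        print(gumbel_softmax_weights(instance_logits, 5.0, rng))  # high T: soft weights
        print(gumbel_softmax_weights(instance_logits, 0.1, rng))  # low T: nearly one-hot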

  • Figure 4

    (Color online) The change of weight entropy on the test sets of MUSK1 (a) and ELEPHANT (b) during training. DMIS-GS obtains much lower entropy throughout the learning process.
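
    Assuming the plotted quantity is the Shannon entropy of each bag's instance weights, the measure can be sketched as below; uniform (soft) weights give the maximum entropy log(n), while near-one-hot selection weights drive it toward zero.

        # Minimal sketch: Shannon entropy of a bag's (normalized) instance weights.
        import numpy as np

        def weight_entropy(weights, eps=1e-12):
            w = weights / weights.sum()
            return float(-(w * np.log(w + eps)).sum())

        print(weight_entropy(np.array([0.25, 0.25, 0.25, 0.25])))  # high: soft, uniform weights
        print(weight_entropy(np.array([0.97, 0.01, 0.01, 0.01])))  # low: near-hard selection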

  • Figure 5

    (Color online) Comparison of instance score distributions before (a) and after (b) variance normalization. With variance normalization, the score distributions become more similar.
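
    One plausible form of the variance normalization illustrated here (an assumption for illustration, not necessarily the exact operation in the paper) is to rescale each bag's instance scores by their standard deviation, so that scores from bags with very different scales become comparable.

        # Minimal sketch: per-bag variance normalization of instance scores.
        import numpy as np

        def variance_normalize(scores, eps=1e-6):
            return scores / (scores.std() + eps)

        bag_a = np.array([0.1, 0.2, 5.0])    # widely spread scores
        bag_b = np.array([0.01, 0.02, 0.5])  # same pattern at a much smaller scale
        print(variance_normalize(bag_a))     # identical after normalization
        print(variance_normalize(bag_b))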

  • Figure 6

    (Color online) Comparison between the decision processes of DMIS-GS-2 and HAN (MI-Soft). The red circle marks the words or sentences selected by DMIS-GS-2, and the purple rectangle shows the weights learned by MI-Soft. DMIS-GS-2 provides a clearer explanation of the final decision.

  • Table 1  

    Table 1  Comparison results on five classical MIL datasets; the classification accuracy (%) is reported. "$K$" in "DMIS-GS-$K$" denotes the number of selected instances. DMIS-GS and DMIS-GS-$K$ obtain better results than most compared methods

    Method      MUSK1   MUSK2   FOX     TIGER   ELEPHANT
    mi-SVM      87.4    83.6    58.2    78.4    82.2
    MI-SVM      77.9    84.3    57.8    84.0    84.3
    mi-Graph    88.9    90.3    62.0    86.0    86.9
    miVLAD      87.1    87.2    62.0    81.1    85.0
    miFV        90.9    88.4    62.1    81.3    85.2
    MI-Net      88.7    85.9    62.2    83.0    86.2
    MI-Net-DS   85.9    87.4    63.0    84.5    87.2
    MI-Net-RC   89.8    87.3    61.9    83.6    85.7
    MI-Mean     89.1    89.6    60.2    83.5    86.9
    MI-Max      87.6    84.7    58.1    81.2    85.7
    MI-Soft     89.2    85.8    61.5    83.9    86.8
    DMIS-GS     90.4    90.2    62.7    86.6    87.9
    DMIS-GS-2   90.2    90.2    63.9    85.7    86.0
    DMIS-GS-3   90.3    90.7    62.8    85.9    87.6
  • Table 2  

    Table 2  Comparison results for different temperature hyper-parameters in DMIS-GS. With variance normalization, the best hyper-parameters tend to take the same values across different datasets

                                  With VarNorm              Without VarNorm
    $(T_0, \gamma, T_{\min})$     FOX    TIGER  ELEPHANT    FOX    TIGER  ELEPHANT
    (5.0, 0.9, 0.1)               62.7   86.6   87.9        60.3   84.5   85.9
    (1.0, 0.9, 0.1)               60.3   85.5   83.2        60.8   84.7   86.3
    (10.0, 0.9, 0.1)              62.1   86.2   87.7        61.2   85.5   83.2
    (5.0, 0.5, 0.1)               54.3   79.7   82.2        59.8   85.8   84.7
    (5.0, 0.95, 0.1)              61.4   85.3   85.8        61.5   83.6   85.4
    (5.0, 0.9, 0.5)               61.5   84.5   86.5        60.5   84.1   85.2
    (5.0, 0.9, 0.01)              58.7   83.2   83.8        61.9   84.8   84.5
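
    The triples in the first column suggest an exponentially decayed temperature; a minimal sketch under the assumed schedule T_t = max(T_min, T_0 * gamma^t) (the exact schedule used in DMIS-GS may differ) is:

        # Minimal sketch of an exponentially annealed Gumbel-Softmax temperature.
        def temperature(step, T0=5.0, gamma=0.9, T_min=0.1):
            return max(T_min, T0 * gamma ** step)

        # The temperature decays from T0 toward T_min, so selection weights
        # move from soft toward nearly one-hot as training proceeds.
        print([round(temperature(t), 3) for t in (0, 10, 30, 60)])  # [5.0, 1.743, 0.212, 0.1]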
  • Table 3  

    Table 3  The precision of detecting ROIs on some Reuters text datasets

    Dataset mi-SVM KI-SVM VF VFr DMIS-GS
    alt.atheism 0.53 0.37 0.73 0.58 0.78
    comp.graphics 0.61 0.38 0.66 0.95 0.93
    comp.os.ms-windows.misc 0.55 0.39 0.79 0.55 0.86
    comp.sys.ibm.pc 0.62 0.39 0.85 0.68 0.86
    comp.sys.mac.hardware 0.78 0.32 0.78 0.70 0.81
    comp.windows.x 0.55 0.40 0.69 0.27 0.72
    misc.forsale 0.59 0.03 0.66 0.85 0.73
    rec.autos 0.43 0.39 0.78 0.79 0.82
    rec.motorcycles 0.40 0.71 0.70 0.42 0.74
    rec.sport.baseball 0.46 0.63 0.73 0.84 0.79
    rec.sport.hockey 0.45 0.83 0.79 0.83 0.82
    sci.crypt 0.63 0.36 0.79 0.40 0.85
    sci.electronics 0.95 0.39 0.96 0.90 0.96
    sci.med 0.56 0.57 0.73 0.54 0.83
    sci.space 0.37 0.30 0.86 0.76 0.91
    soc.religion.christian 0.34 0.39 0.84 0.64 0.80
    talk.politics.guns 0.52 0.36 0.53 0.66 0.73
    talk.politics.mideast 0.73 0.66 0.72 0.58 0.71
    talk.politics.misc 0.65 0.54 0.72 0.65 0.66
    talk.religion.misc 0.30 0.38 0.67 0.49 0.70
  • Table 4  

    Table 4  Comparison results on sentiment classification. ACC is the classification accuracy (higher is better), and MSE is the mean squared error (lower is better). "$K$" in "DMIS-GS-$K$" denotes the number of selected instances

    Method        Yelp13           Yelp14           IMDB
                  ACC    MSE       ACC    MSE       ACC    MSE
    MILNET        63.8   0.477     63.7   0.465     45.8   2.12
    HN-Mean       63.7   0.475     63.5   0.473     46.4   2.16
    HN-Max        63.2   0.502     63.6   0.463     46.6   2.14
    HAN           64.0   0.470     64.0   0.455     46.5   1.95
    DMIS-GS       63.8   0.471     63.9   0.473     46.6   1.88
    DMIS-GS-2     64.8   0.462     63.8   0.460     46.9   1.86
    DMIS-GS-3     64.0   0.460     64.5   0.465     46.7   1.88