
SCIENTIA SINICA Informationis, Volume 48, Issue 1: 60-78 (2018). https://doi.org/10.1360/N112017-00124

Deep relative metric learning for visual tracking

  • Received: Jun 2, 2017
  • Accepted: Aug 1, 2017
  • Published: Jan 5, 2018

Abstract

While traditional tracking-by-detection methods offer some robustness in object tracking, their simple classification of the target versus the background cannot model the relative structural relationship between the two. This lack of relative structural discriminative information is a frequent cause of tracker drift. To alleviate the drifting problem, we propose a new visual tracking approach based on deep relative metric learning. In this study, we design a deep relative metric learning model built on a symmetric, shared-weight deep convolutional neural network. Through such a network, we can explore the relative structural relationship between the target and the background over large-scale image patches. The candidate with the highest relative metric score is then used to locate the tracked object within a Bayesian tracking framework. The overall tracking algorithm is simple and effective. Experimental results on the tracking benchmark show that the proposed algorithm achieves better precision and success rates than other state-of-the-art tracking methods.
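To make the idea concrete, below is a minimal PyTorch-style sketch of a symmetric, shared-weight network that scores the relative relationship between a target patch and a candidate (or background) patch, plus a candidate-selection step that picks the highest-scoring location. The layer sizes, the hinge-style relative loss, and the scoring loop are illustrative assumptions, not the authors' exact architecture or training objective.

```python
# Sketch only: architecture details and the relative loss are assumptions,
# not the paper's exact model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelativeMetricNet(nn.Module):
    def __init__(self):
        super().__init__()
        # One convolutional branch applied to both inputs (weight sharing = symmetry).
        self.branch = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(64 * 4 * 4, 128),
        )
        # Maps the difference of the two embeddings to a single relative score.
        self.score = nn.Linear(128, 1)

    def forward(self, patch_a, patch_b):
        fa, fb = self.branch(patch_a), self.branch(patch_b)
        # Higher score means patch_a is more target-like than patch_b.
        return self.score(fa - fb).squeeze(-1)

def relative_ranking_loss(net, target_patches, background_patches, margin=1.0):
    """Hinge-style loss encouraging target patches to outrank background patches."""
    s = net(target_patches, background_patches)
    return F.relu(margin - s).mean()

def locate(net, template, candidates):
    """Score candidate patches (e.g. particles in a Bayesian filter) against the
    current target template and return the index of the highest relative score."""
    with torch.no_grad():
        scores = net(candidates, template.expand_as(candidates))
    return int(scores.argmax())
```

In this sketch, training pairs a target patch with a background patch and pushes the relative score above a margin; at test time, each frame's candidate set is scored against the target template and the maximum-score candidate gives the estimated location.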


Funded by

National Natural Science Foundation of China (61603372)

National Natural Science Foundation of China (61307041)

Natural Science Foundation of Shandong Province (ZR2015FL020)

National Natural Science Foundation of China (614722227)

National Basic Research Program of China (973 Program) (2012CB316304)

Open Project of the National Laboratory of Pattern Recognition (201600024)

National Natural Science Foundation of China (61303086)

National Natural Science Foundation of China (61572498)

National Natural Science Foundation of China (61572296)

National Natural Science Foundation of China (61672327)


