SCIENTIA SINICA Informationis, Volume 48, Issue 12: 1697-1708 (2018) https://doi.org/10.1360/N112018-00136

Hierarchical feature fusion hashing for near-duplicate video retrieval

  • Received: Aug 16, 2018
  • Accepted: Oct 9, 2018
  • Published: Dec 12, 2018


In recent years, the rapid growth in the number of videos on the Internet has drawn considerable attention to near-duplicate video retrieval (NDVR) based on video hashing. Existing NDVR algorithms rely widely on the visual features of videos, either a single feature or a fusion of multiple visual features. However, low-level visual features are limited in their ability to express high-level semantics, which may lead to low NDVR performance. To address this issue, we propose a video hashing method for NDVR based on hierarchical feature fusion. In the proposed method, low-level handcrafted features are first extracted from the videos; then, intermediate-level deep features and high-level semantic features are extracted from a convolutional neural network. Finally, these semantic features are combined with the low-level visual features, and the global structural relationships and complementarity among the hierarchical features are exploited to learn the hash code for NDVR. Extensive experiments on the CC-WEB-VIDEO dataset show that the proposed framework achieves better retrieval performance than state-of-the-art approaches.
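The pipeline described in the abstract (extract features at several levels, fuse them, learn a hash code, retrieve by code similarity) can be illustrated with a minimal sketch. This is not the learning procedure of the paper: the abstract does not specify the objective, so the sketch assumes per-level L2 normalization, plain concatenation as the fusion step, and a PCA projection followed by sign binarization as a stand-in for the learned hash functions. The feature matrices are random placeholders, and all function names and dimensions are hypothetical.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit L2 norm so no feature level dominates by magnitude."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)

def fuse_features(low, mid, high):
    """Assumed fusion: normalize each level separately, then concatenate."""
    return np.hstack([l2_normalize(low), l2_normalize(mid), l2_normalize(high)])

def fit_hash(fused, n_bits):
    """Fit a PCA projection (a stand-in for the learned hash functions)."""
    mean = fused.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(fused - mean, full_matrices=False)
    return mean, vt[:n_bits].T

def hash_codes(fused, mean, proj):
    """Binarize the projected features by sign: one n_bits code per video."""
    return ((fused - mean) @ proj > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Return database indices sorted by Hamming distance to the query."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable"), dists

# Placeholder features: in practice these would come from handcrafted
# descriptors (low), intermediate CNN layers (mid), and CNN outputs (high).
rng = np.random.default_rng(0)
n_videos = 50
low = rng.normal(size=(n_videos, 32))
mid = rng.normal(size=(n_videos, 64))
high = rng.normal(size=(n_videos, 16))

fused = fuse_features(low, mid, high)
mean, proj = fit_hash(fused, n_bits=16)
codes = hash_codes(fused, mean, proj)

# Query with the first video; it should rank itself first at distance 0.
order, dists = hamming_rank(codes[0], codes)
```

Concatenation treats the three levels independently, whereas the proposed method is stated to exploit the structural relationships and complementarity among them; the sketch only shows where such a learned fusion would plug in.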


Copyright 2019 Science China Press Co., Ltd. All rights reserved.