SCIENCE CHINA Information Sciences, Volume 59, Issue 7: 072103 (2016) https://doi.org/10.1007/s11432-015-5494-4

High-level representation sketch for video event retrieval

  • Received Oct 22, 2015
  • Accepted Dec 29, 2015
  • Published Jun 15, 2016

Abstract

Representing video events is an essential step in a wide range of visual applications. In this paper, we propose the event sketch, a high-level event representation that depicts the dynamic properties of video events composed of the actions of semantic objects. We show that this representation enables a novel sketch-based video retrieval (SBVR) system, which, to the best of our knowledge, has not been considered before. In this system, users draw the evolution of an event (e.g., the spatiotemporal layouts and behaviors of its semantic objects) on a board and retrieve from a database the events whose semantic objects evolve similarly. To this end, event sketches are constructed for both the user queries and the database videos, and are compared under a novel graph-matching scheme based on data-driven Markov chain Monte Carlo (DDMCMC). To evaluate our approach, we collect a new dataset of goal events in real soccer videos, which consists of the actions of multiple players and shows large variability in how the events evolve. Experiments on this dataset and on the publicly available CAVIAR dataset demonstrate the effectiveness of the proposed approach.
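
The abstract says that event sketches are built for both the drawn query and each database video, and that the two are compared by DDMCMC-based graph matching. The following is a minimal Python sketch of that retrieval loop under loudly stated assumptions; it is not the authors' method. The sketch format (one resampled (x, y) trajectory per semantic object, plus labeled pairwise relations such as "near"), the cost function, the proposal weights exp(-node_cost), the temperature, and all function names (node_cost, total_cost, mcmc_match, retrieve) are hypothetical. The matcher uses data-driven proposals with a Metropolis acceptance rule, but it omits the proposal-ratio (Hastings) correction, so it acts as a stochastic optimizer that keeps the best assignment seen rather than an exact sampler.

```python
import math
import random

# Hypothetical event-sketch format (an assumption, not the paper's definition):
# a sketch is (nodes, edges). Each node is one semantic object's trajectory,
# a list of (x, y) points resampled to the same number of time steps; edges map
# an ordered object pair (i, j) to a relation label such as "near".

def node_cost(traj_a, traj_b):
    """Mean Euclidean distance between two equally sampled trajectories."""
    return sum(math.dist(p, q) for p, q in zip(traj_a, traj_b)) / len(traj_a)

def total_cost(query, db, mapping, w_edge=1.0):
    """Unary trajectory cost plus a penalty for each query relation not preserved."""
    (q_nodes, q_edges), (d_nodes, d_edges) = query, db
    unary = sum(node_cost(q_nodes[i], d_nodes[mapping[i]]) for i in range(len(q_nodes)))
    broken = sum(1 for (i, j), label in q_edges.items()
                 if d_edges.get((mapping[i], mapping[j])) != label)
    return unary + w_edge * broken

def mcmc_match(query, db, iters=2000, temperature=0.5, seed=0):
    """Stochastic search over one-to-one query-to-database node assignments.

    Data-driven proposals: a random query node is reassigned to a database node
    drawn with probability proportional to exp(-node_cost), so objects with
    similar trajectories are proposed more often. Moves are accepted with the
    Metropolis rule; the Hastings proposal-ratio correction is omitted, so this
    keeps the best assignment seen instead of sampling a posterior exactly.
    """
    rng = random.Random(seed)
    (q_nodes, _), (d_nodes, _) = query, db
    n, m = len(q_nodes), len(d_nodes)
    if n > m:
        raise ValueError("this sketch assumes the query has no more objects than the video")
    # Small epsilon keeps every weight positive for random.choices.
    weights = [[math.exp(-node_cost(q_nodes[i], d_nodes[k])) + 1e-12 for k in range(m)]
               for i in range(n)]
    mapping = list(range(n))
    cost = total_cost(query, db, mapping)
    best, best_cost = mapping[:], cost
    for _ in range(iters):
        i = rng.randrange(n)
        k = rng.choices(range(m), weights=weights[i])[0]
        proposal = mapping[:]
        if k in proposal:  # swap targets to keep the assignment one-to-one
            j = proposal.index(k)
            proposal[i], proposal[j] = proposal[j], proposal[i]
        else:
            proposal[i] = k
        new_cost = total_cost(query, db, proposal)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temperature):
            mapping, cost = proposal, new_cost
            if cost < best_cost:
                best, best_cost = mapping[:], cost
    return best, best_cost

def retrieve(query, database):
    """Rank database sketches by their best matching cost (lower is better)."""
    scored = [(idx, mcmc_match(query, db)[1]) for idx, db in enumerate(database)]
    return sorted(scored, key=lambda item: item[1])

# Toy data (hypothetical): a drawn query with a player and the ball moving right together.
query = ([[(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1), (2, 1)]], {(0, 1): "near"})
# video_a holds the same two trajectories in a different order plus a static distractor.
video_a = ([[(5, 5)] * 3, [(0, 1), (1, 1), (2, 1)], [(0, 0), (1, 0), (2, 0)]], {(2, 1): "near"})
# video_b holds unrelated motion and no "near" relation.
video_b = ([[(0, 0), (0, 1), (0, 2)], [(3, 3)] * 3], {})

if __name__ == "__main__":
    print(retrieve(query, [video_a, video_b]))
```

In this toy example, video_a should rank first because the matcher can map the query's player and ball onto identical trajectories while preserving the "near" relation. Weighting proposals by appearance or trajectory similarity is what makes the search "data-driven": plausible correspondences are tried far more often than a uniform proposal would try them.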


Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 61532003, 61325011, 61421003), the National High Technology Research and Development Program of China (Grant No. 2013AA013801), and the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20131102130002).

