SCIENCE CHINA Information Sciences, Volume 60, Issue 7: 072104(2017) https://doi.org/10.1007/s11432-015-1014-7

Personalized gesture interactions for cyber-physical smart-home environments

  • Received: Mar 16, 2016
  • Accepted: Apr 22, 2016
  • Published: Oct 13, 2016


A gesture-based interaction system for smart homes is part of a complex cyber-physical environment, and researchers and developers must address major challenges to provide personalized gesture interactions in it. However, current research efforts have not tackled the problem of personalized gesture recognition, which often involves user identification. To address this problem, we propose a new event-driven, service-oriented framework called gesture services for cyber-physical environments (GS-CPE), which extends the architecture of our previous work, gesture profile for web services (GPWS). To provide user identification, GS-CPE introduces a gesture password recognition algorithm that cascades two classification phases, a hidden Markov model and the Golden Section Search, and achieves an accuracy rate of 96.2% with a small training dataset. To support personalized gesture interaction, an enhanced version of the Dynamic Time Warping algorithm is implemented with support for multiple gestural input sources and dynamic template adaptation. Our experimental results demonstrate that the algorithm achieves an average accuracy rate of 98.5% in practical scenarios. Comparison results reveal that GS-CPE offers faster response times and a higher accuracy rate than other gesture interaction systems designed for smart-home environments.


This work was supported by National High Technology Research and Development Program of China (Grant No. 2013AA01A210), State Key Laboratory of Software Development Environment (Grant No. SKLSDE-2013ZX-03), and National Natural Science Foundation of China (Grant No. 61532004). Vatavu also acknowledges support from the project “Integrated Center for Research, Development and Innovation in Advanced Materials, Nanotechnologies, and Distributed Systems for Fabrication and Control” (Grant No. 671/09.04.2015), Sectorial Operational Program for Increase of the Economic Competitiveness, co-funded by the European Regional Development Fund.


[1] Bernhaupt R, Obrist M, Weiss A, et al. Trends in the living room and beyond: results from ethnographic studies using creative and playful probing. ACM CIE, 2008, 6: 5

[2] Panger G. Kinect in the kitchen: testing depth camera interactions in practical home environments. In: Proceedings of the CHI Extended Abstracts on Human Factors in Computing Systems. New York: ACM, 2012. 1985--1990

[3] Pan G, Wu J, Zhang D. GeeAir: a universal multimodal remote control device for home appliances. Pers Ubiquit Comput, 2010, 14: 723--735

[4] Vatavu R D. Point & click mediated interactions for large home entertainment displays. Multimed Tools Appl, 2012, 59: 113--128

[5] Kühnel C, Westermann T, Hemmert F. I'm home: defining and evaluating a gesture set for smart-home control. Int J Hum-Comput Stud, 2011, 69: 693--704

[6] Vatavu R D. A comparative study of user-defined handheld vs. freehand gestures for home entertainment environments. J Ambient Intell Smart Environ, 2013, 5: 187--211

[7] Vatavu R D, Zaiti I A. Leap gestures for TV: insights from an elicitation study. In: Proceedings of the ACM International Conference on Interactive Experiences for TV and Online Video. New York: ACM, 2014. 131--138

[8] Li W, Lee Y H, Tsai W T. Service-oriented smart home applications: composition, code generation, deployment, and execution. SOCA, 2012, 6: 65--79

[9] Vatavu R D, Chera C M, Tsai W T. Gesture profile for web services: an event-driven architecture to support gestural interfaces for smart environments. In: Ambient Intelligence. Berlin: Springer-Verlag, 2012. 161--176

[10] Lou Y H, Wu W J. A real-time personalized gesture interaction system using Wii remote and Kinect for tiled-display environment. In: Proceedings of the International Conference on Software Engineering and Knowledge Engineering. Skokie: KSI, 2013. 131--136

[11] Zhang H K, Wu W J, Lou Y H. A personalized gesture interaction system with user identification using Kinect. In: PRICAI 2014: Trends in Artificial Intelligence. Berlin: Springer, 2014. 614--626

[12] Vatavu R D. User-defined gestures for free-hand TV control. In: Proceedings of the 10th European Conference on Interactive TV and Video. New York: ACM, 2012. 45--48

[13] Wobbrock J O, Morris M R, Wilson A D. User-defined gestures for surface computing. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. New York: ACM, 2009. 1083--1092

[14] Vatavu R D, Wobbrock J O. Formalizing agreement analysis for elicitation studies: new measures, significance test, and toolkit. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. New York: ACM, 2015. 1325--1334

[15] Wobbrock J O, Aung H H, Brandon R, et al. Maximizing the guessability of symbolic input. In: Proceedings of the CHI Extended Abstracts on Human Factors in Computing Systems. New York: ACM, 2005. 1869--1872

[16] Lou Y H, Yao T, Chen Y Q, et al. A novel scheme of ROI detection and transcoding for mobile devices in high-definition videoconferencing. In: Proceedings of the 5th Workshop on Mobile Video. New York: ACM, 2013. 31--36

[17] Wang Y W, Yang C, Wu X, et al. Kinect based dynamic hand gesture recognition algorithm research. In: Proceedings of the 4th International Conference on Intelligent Human-Machine Systems and Cybernetics, Nanchang, 2012. 274--279

[18] Zhu H M, Pun C M. Real-time hand gesture recognition from depth image sequences. In: Proceedings of the 9th International Conference on Computer Graphics, Imaging and Visualization, Hsinchu, 2012. 49--52

[19] Moni M A, Shawkat Ali A B M. HMM based hand gesture recognition: a review on techniques and approaches. In: Proceedings of the 2nd IEEE International Conference on Computer Science and Information Technology, Beijing, 2009. 433--437

[20] Kiefer J. Sequential minimax search for a maximum. Proc Amer Math Soc, 1953, 4: 502--506

[21] Vatavu R D, Anthony L, Wobbrock J O. Gestures as point clouds: a $P recognizer for user interface prototypes. In: Proceedings of the 14th ACM International Conference on Multimodal Interaction. New York: ACM, 2012. 273--280

[22] Myers C S, Rabiner L R. A comparative study of several dynamic time-warping algorithms for connected-word recognition. Bell Syst Tech J, 1981, 60: 1389--1409

[23] Wobbrock J O, Wilson A D, Li Y. Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes. In: Proceedings of the 20th Annual ACM Symposium on User Interface Software and Technology. New York: ACM, 2007. 159--168

[24] Carmona J M, Climent J. A performance evaluation of HMM and DTW for gesture recognition. In: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. Berlin: Springer-Verlag, 2012. 236--243

[25] Liu J, Zhong L, Wickramasuriya J. uWave: accelerometer-based personalized gesture recognition and its applications. Pervasive Mob Comput, 2009, 5: 657--675

[26] Reyes M, Dominguez G, Escalera S. Feature weighting in dynamic time warping for gesture recognition in depth data. In: Proceedings of the IEEE International Conference on Computer Vision Workshops. Piscataway: IEEE, 2011. 1182--1188

[27] Liu X, Mu Y, Zhang D. Large-scale unsupervised hashing with shared structure learning. IEEE Trans Cybern, 2015, 45: 1811--1822

[28] Liu X, Deng C, Lang B, et al. Query-adaptive reciprocal hash tables for nearest neighbor search. IEEE Trans Image Process, 2015, 25: 907--919

[29] Chen M Y, AlRegib G, Juang B H. 6DMG: a new 6D motion gesture database. In: Proceedings of the 3rd Multimedia Systems Conference. New York: ACM, 2012. 83--88

[30] Mitra S, Acharya T. Gesture recognition: a survey. IEEE Trans Syst Man Cybern C, 2007, 37: 311--324

[31] Ruiz J, Li Y, Lank E. User-defined motion gestures for mobile interaction. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. New York: ACM, 2011. 197--206

[32] Schlömer T, Poppinga B, Henze N, et al. Gesture recognition with a Wii controller. In: Proceedings of the 2nd International Conference on Tangible and Embedded Interaction. New York: ACM, 2008. 11--14

[33] Vatavu R D. Nomadic gestures: a technique for reusing gesture commands for frequent ambient interactions. J Ambient Intell Smart Environ, 2012, 4: 79--93

[34] Pu Q F, Gupta S, Gollakota S, et al. Whole-home gesture recognition using wireless signals. In: Proceedings of the 19th Annual International Conference on Mobile Computing & Networking. New York: ACM, 2013. 27--38

[35] van Seghbroeck G, Verstichel S, De Turck F, et al. WS-Gesture: a gesture-based state-aware control framework. In: Proceedings of the IEEE International Conference on Service-Oriented Computing and Applications. Piscataway: IEEE, 2010. 1--8

[36] Zheng Y W, Sheng H, Zhang B C, et al. Weight-based sparse coding for multi-shot person re-identification. Sci China Inf Sci, 2015, 58: 100104

[37] Hayashi E, Maas M, Hong J I. Wave to me: user identification using body lengths and natural gestures. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. New York: ACM, 2014. 3453--3462

[38] Wang D, Xiong Z, Zhang M. An application oriented and shape feature based multi-touch gesture description and recognition method. Multimed Tools Appl, 2012, 58: 497--519

[39] Chera C M, Tsai W T, Vatavu R D. Gesture ontology for informing service-oriented architecture. In: Proceedings of IEEE International Symposium on Intelligent Control. Piscataway: IEEE, 2012. 1184--1189

  • Figure 1

    GS-CPE framework. (a) Architecture; (b) services and events.

  • Figure 2

    Typical workflow of GS-CPE.

  • Figure 3

    The direction quantization scheme, examples of trajectory sequences and likelihood comparison. (a) Quantization scheme; (b) trajectory of “6”; (c) trajectory of “4”; (d) comparison of second maximum likelihood ratio.

  • Figure 4

    The relationship between indicators and TP.

  • Figure 5

    Experimental results of the HMM-GSS algorithm. (a) Average win/lost/profit; (b) average accuracy comparison.

  • Figure 6

    Theoretical and practical performance of MS-DTW. (a) Theoretical accuracy comparison; (b) accuracy increment by multiple-source; (c) theoretical rejection comparison; (d) practical accuracy comparison w/ and w/o template adaptation; (e) practical rejection comparison w/ and w/o template adaptation.


    Algorithm 1 The HMM-GSS algorithm

    Require: Input trajectory $g$, threshold value $\epsilon$, GSS standard template set $\mathbb{ST}$;

    $H_g, \mathbf{lr}^g \leftarrow$ classification result and likelihood-ratio vector of input gesture $g$ from the HMM classifier;

    if $\mathrm{secmax}(\mathbf{lr}^g) < \epsilon$ then

    ${\rm Label} \Leftarrow H_g$;

    else

    $\boldsymbol{R}=\{h \mid h\in\boldsymbol{H}\land\frac{l_h^g}{\max({\boldsymbol l}^g)}\ge\epsilon\}$;

    $\mathbf{GT}=\{t_i \mid t_i\in\mathbb{ST}\land{i}\in\boldsymbol{R}\}$;

    ${\rm Label} \Leftarrow S_g$, which is the classification result of input gesture $g$ from the GSS classifier according to $\mathbf{GT}$;

    end if
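    The second phase of Algorithm 1 relies on the Golden Section Search [20], a derivative-free method for locating the extremum of a unimodal function by shrinking a bracketing interval with the golden ratio. As a self-contained illustration of that primitive (not the paper's template-matching code; the function name and tolerance below are our own), a minimal sketch in Python:

```python
import math

def golden_section_search(f, a, b, tol=1e-6):
    """Locate the minimizer of a unimodal function f on [a, b]
    via golden-section search (Kiefer, 1953)."""
    inv_phi = (math.sqrt(5) - 1) / 2  # 1/phi ~= 0.618
    # Two interior probes dividing [a, b] in golden-ratio proportions.
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            # Minimum lies in [a, d]; the old c becomes the new right probe.
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            # Minimum lies in [c, b]; the old d becomes the new left probe.
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2
```

    Because each iteration reuses one probe point, the bracketing interval shrinks by a constant factor of about 0.618 per function evaluation; a production version would also cache the probe values f(c) and f(d) instead of re-evaluating them.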


    Algorithm 2 The multiple-source DTW (MS-DTW) algorithm

    Require: $\mathbb{T}^u, \mathbb{G}$;

    Initialize $\mathbf{Da}=\{{\rm Da}_k^i={\rm DTW}(\mathbf{Ga}, \mathbf{ta}_k^i)\}$, $\mathbf{Dp}=\{{\rm Dp}_k^i={\rm DTW}(\mathbf{Gp}, \mathbf{tp}_k^i)\}$, $\mathbf{Tr}_i\leftarrow\emptyset$, where $i\in[1,n], k\in[1,l]$;

    // The template matching process

    if using the K-nearest-neighbor or nearest-neighbor criterion then

    $\mathbf{da}\leftarrow\{{\rm da}_i \mid i\in[1,K]\}$, where ${\rm da}_i$ belongs to the minimal $K$ values in $\mathbf{Da}$;

    $\mathbf{dp}\leftarrow\{{\rm dp}_i \mid i\in[1,K]\}$, where ${\rm dp}_i$ belongs to the minimal $K$ values in $\mathbf{Dp}$;

    ${\rm Label}_a\leftarrow$ the majority class in $\mathbf{da}$;

    ${\rm Label}_p\leftarrow$ the majority class in $\mathbf{dp}$;

    else

    $\mathbf{da}\leftarrow\{{\rm da}_i=\frac{1}{l}\sum_{k=1}^l{{\rm Da}_k^i} \mid i\in[1,n]\land{{\rm Da}_k^i}\in\mathbf{Da}\}$;

    $\mathbf{dp}\leftarrow\{{\rm dp}_i=\frac{1}{l}\sum_{k=1}^l{{\rm Dp}_k^i} \mid i\in[1,n]\land{{\rm Dp}_k^i}\in\mathbf{Dp}\}$;

    ${\rm Label}_a\leftarrow \arg\!\min_{i\in[1,n]}\{{\rm da}_1,{\rm da}_2,\ldots,{\rm da}_n\}$;

    ${\rm Label}_p\leftarrow \arg\!\min_{i\in[1,n]}\{{\rm dp}_1,{\rm dp}_2,\ldots,{\rm dp}_n\}$;

    end if

    // The rejection determination process

    if ${\rm Label}_a={\rm Label}_p$ then

    $L\leftarrow {\rm Label}_a$;

    else

    $L\leftarrow 0$; // reject the input gesture

    end if

    // The dynamic template adaptation process

    if $L \ne 0$ then

    $\mathbf{Tr}_L\leftarrow \mathbf{Tr}_L\cup\{\mathbb{G}\}, {\rm rc}_L\leftarrow0$;

    if received correct label $g$ and $|\mathbf{Tr}_g|>0$ then

    Update $\boldsymbol{T}_g^u$ using $\mathbf{Tr}_g\cup\boldsymbol{T}_g^u$ in the same way the initial standard templates are selected;

    $\mathbf{Tr}_g\leftarrow\emptyset$;

    end if

    end if
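    The core of Algorithm 2 combines the classic DTW distance [22] with an agreement rule between the two gestural input sources (acceleration and position): a label is accepted only when both sources vote for the same class, otherwise the input is rejected. A minimal Python sketch of those two ingredients, using hypothetical function names and plain 1-D sequences in place of the paper's gestural feature vectors:

```python
def dtw_distance(s, t):
    """Dynamic time warping distance between two 1-D sequences,
    with absolute difference as the local cost."""
    n, m = len(s), len(t)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def nearest_label(gesture, templates):
    """Nearest-neighbor label over per-class template lists {label: [seq, ...]}."""
    return min(
        ((label, dtw_distance(gesture, t))
         for label, seqs in templates.items() for t in seqs),
        key=lambda pair: pair[1],
    )[0]

def ms_dtw(ga, gp, templates_a, templates_p):
    """Accept a label only when the acceleration (ga) and position (gp)
    sources agree; otherwise reject the input (return None)."""
    label_a = nearest_label(ga, templates_a)
    label_p = nearest_label(gp, templates_p)
    return label_a if label_a == label_p else None
```

    This sketch covers only the nearest-neighbor branch and the rejection rule; the K-nearest-neighbor branch, the averaged-distance branch, and the dynamic template adaptation step of Algorithm 2 would be layered on top of the same `dtw_distance` primitive.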

Copyright 2020 Science China Press Co., Ltd. All rights reserved.