Abstract
This work targets real-time recognition of both static hand poses and dynamic hand gestures in a unified open-source framework. The developed solution enables natural and intuitive hand-pose recognition of American Sign Language (ASL), extending recognition to ambiguous letters that previous work did not address. Hand-pose recognition applies texture-based descriptors to depth information, while gesture recognition evaluates hand trajectories in the depth stream using angular features and hidden Markov models (HMMs). Although the classifiers come pre-trained on the ASL alphabet and 16 uni-stroke dynamic gestures, users can extend these default sets by adding their own poses and gestures. The accuracy and robustness of the recognition system have been evaluated on a publicly available database and across many users. The XKin open project is available online (Pedersoli, XKin libraries. https://github.com/fpeder/XKin, 2013) under the FreeBSD License for researchers in human–machine interaction.
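The abstract names the ingredients of the gesture pipeline (angular features computed on hand trajectories, scored by HMMs) without detailing them. The following is a minimal sketch of how such a pipeline can fit together, not the XKin implementation. It assumes 8 direction bins, left-to-right discrete HMMs with hand-picked toy parameters, and hypothetical gesture names (`right_then_up`, `up_then_right`); the helpers `angular_features`, `toy_model` and `classify` are illustrative only.

```python
import math

def angular_features(points, n_bins=8):
    """Quantize the direction of each step of a 2-D trajectory into n_bins symbols."""
    symbols = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # stationary sample: no direction to measure
        angle = math.atan2(dy, dx) % (2 * math.pi)  # angle in [0, 2*pi)
        # Bins are centred on multiples of 2*pi/n_bins; bin 0 is the +x direction.
        symbols.append(int(round(angle / (2 * math.pi / n_bins))) % n_bins)
    return symbols

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete-output HMM."""
    if not obs:
        return float("-inf")
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        alpha = [a / scale for a in alpha]  # normalize to avoid underflow
        log_p += math.log(scale)  # P(obs) is the product of the scale factors
    return log_p

def toy_model(directions, n_bins=8, stay=0.7, hit=0.9):
    """Hypothetical left-to-right HMM: state k mostly emits directions[k]."""
    n = len(directions)
    start = [1.0] + [0.0] * (n - 1)
    trans = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i + 1 < n:
            trans[i][i], trans[i][i + 1] = stay, 1.0 - stay
        else:
            trans[i][i] = 1.0  # last state is absorbing
    miss = (1.0 - hit) / (n_bins - 1)
    emit = [[hit if s == d else miss for s in range(n_bins)] for d in directions]
    return start, trans, emit

def classify(points, models, n_bins=8):
    """Return the name of the model with the highest log-likelihood."""
    obs = angular_features(points, n_bins)
    scores = {name: log_likelihood(obs, *m) for name, m in models.items()}
    return max(scores, key=scores.get)

models = {
    "right_then_up": toy_model([0, 2]),  # bin 0 = +x, bin 2 = +y
    "up_then_right": toy_model([2, 0]),
}
trajectory = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
print(classify(trajectory, models))  # prints "right_then_up"
```

In a real system the HMM parameters would be learned from example trajectories (for instance with Baum–Welch re-estimation, as described in Rabiner's tutorial) rather than set by hand; adding a user-defined gesture then amounts to training one more model and adding it to the dictionary.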
References
3Gear systems: Gestural user interfaces. http://www.threegear.com/ (2013)
American sign language. http://en.wikipedia.org/wiki/AmericanLanguage (2013)
Gibson Hasbrouck & Associates. http://www.gha-pd.com/ (2013)
Biswas, K., Basu, S.: Gesture recognition using Microsoft Kinect™. In: 2011 5th International Conference on Automation, Robotics and Applications (ICARA), pp. 100–103 (2011). doi:10.1109/ICARA.2011.6144864
Comaniciu, D., Meer, P.: Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24(5), 603–619 (2002). doi:10.1109/34.1000236
Daugman, J.G.: Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. J. Opt. Soc. Am. A Opt. Image Sci. Vis. 2(7), 1160–1169 (1985)
Doliotis, P., Athitsos, V., Kosmopoulos, D.I., Perantonis, S.J.: Hand shape and 3D pose estimation using depth data from a single cluttered frame. In: Bebis, G., Boyle, R., Parvin, B., Koracin, D., Fowlkes, C., Wang, S., Choi, M.H., Mantler, S., Schulze, J.P., Acevedo, D., Mueller, K., Papka, M.E. (eds.) International Symposium on Visual Computing (ISVC). Springer (2012)
Doliotis, P., Stefan, A., McMurrough, C., Eckhard, D., Athitsos, V.: Comparing gesture recognition accuracy using color and depth information. In: Proceedings of the 4th International Conference on Pervasive Technologies Related to Assistive Environments, PETRA ’11, pp. 20:1–20:7. ACM (2011). doi:10.1145/2141622.2141647
Escalera, S., González, J., Baró, X., Reyes, M., Lopes, O., Guyon, I., Athitsos, V., Escalante, H.: Multi-modal gesture recognition challenge 2013: dataset and results. In: Proceedings of the 15th ACM on International Conference on Multimodal Interaction, pp. 445–452 (2013). http://dl.acm.org/citation.cfm?id=2532595
Guyon, I., Athitsos, V., Jangyodsuk, P., Hamner, B., Escalante, H.: ChaLearn gesture challenge: design and first results. In: 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1–6 (2012). doi:10.1109/CVPRW.2012.6239178
Keskin, C., Kirac, F., Kara, Y., Akarun, L.: Hand pose estimation and hand shape classification using multi-layered randomized decision forests. In: European Conference on Computer Vision (ECCV 2012), Part VI, pp. 852–863 (2012)
Le, T.L., Nguyen, V.N., Tran, T.T.H., Nguyen, V.T., Nguyen, T.T.: Temporal gesture segmentation for recognition. In: 2013 International Conference on Computing, Management and Telecommunications (ComManTel), pp. 369–373 (2013). doi:10.1109/ComManTel.6482422
Li, Y.: Hand gesture recognition using Kinect. In: 2012 IEEE 3rd International Conference on Software Engineering and Service Science (ICSESS), pp. 196–199 (2012). doi:10.1109/ICSESS.2012.6269439
Liang, H., Yuan, J., Thalmann, D., Zhang, Z.: Model-based hand pose estimation via spatial-temporal hand parsing and 3D fingertip localization. Vis. Comput. 29(6–8), 837–848 (2013). doi:10.1007/s00371-013-0822-4
Liddell, S., Johnson, R.E.: American sign language—compound formation processes, lexicalization, and phonological remnants. Nat. Lang. Ling. Theory 4, 445–513 (1986)
Microsoft Kinect for Windows. http://www.microsoft.com/en-us/kinectforwindows (2013)
Mihail, R.P., Jacobs, N., Goldsmith, J.: Real time gesture recognition with 2 Kinect sensors. In: International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV) (2012)
Myers, C.S., Rabiner, L.R.: Comparative study of several dynamic time-warping algorithms for connected-word recognition. Bell Syst. Tech. J. 60(7) (1981)
Oikonomidis, I., Kyriazis, N., Argyros, A.: Efficient model-based 3D tracking of hand articulations using Kinect. In: British Machine Vision Conference, pp. 101.1–101.11. British Machine Vision Association (2011). doi:10.5244/C.25.101
Libfreenect. http://openkinect.org (2013)
Standard framework for 3D sensing. http://www.openni.org (2013)
Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)
Pedersoli, F.: XKin libraries. https://github.com/fpeder/XKin (2013)
Pedersoli, F., Adami, N., Benini, S., Leonardi, R.: XKin - eXtendable hand pose and gesture recognition library for Kinect. In: Proceedings of ACM Conference on Multimedia 2012—Open Source Competition, Nara (2012)
Peris, M., Fukui, K.: Both-hand gesture recognition based on KOMSM with volume subspaces for robot teleoperation. In: IEEE-Cyber (2012)
PrimeSense: NiTE. http://www.primesense.com/nite (2013)
PrimeSense: sensing and natural interaction. http://www.primesense.com (2013)
Pugeault, N., Bowden, R.: Spelling it out: real-time ASL fingerspelling recognition. IEEE International Conference on Computer Vision Workshops, ICCV, vol. 2011, pp. 1114–1119 (2011)
Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2), 257–286 (1989). doi:10.1109/5.18626
Ren, Z., Meng, J., Yuan, J., Zhang, Z.: Robust hand gesture recognition with Kinect sensor. In: Proceedings of the 19th ACM international conference on Multimedia, MM ’11, pp. 759–760. ACM (2011). doi:10.1145/2072298.2072443
Ren, Z., Yuan, J., Meng, J., Zhang, Z.: Robust part-based hand gesture recognition using Kinect sensor. IEEE Trans. Multimedia 15(5), 1110–1120 (2013). doi:10.1109/TMM.2013.2246148
Robot Operating System. http://www.ros.org/wiki/ (2013)
Rubine, D.: Specifying gestures by example. SIGGRAPH Comput. Graph. 25(4), 329–337 (1991). doi:10.1145/127719.122753
$1 Unistroke Recognizer. http://depts.washington.edu/aimgroup/proj/dollar/ (2013)
Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., Blake, A.: Real-time human pose recognition in parts from single depth images. In: Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR ’11, pp. 1297–1304. IEEE Computer Society, Washington, DC, USA (2011). doi:10.1109/CVPR.2011.5995316
Uebersax, D., Gall, J., Van den Bergh, M., Van Gool, L.J.: Real-time sign language letter and word recognition from depth data. IEEE International Conference on Computer Vision Workshops, ICCV, vol. 2011, pp. 383–390 (2011)
Wachs, J.P., Kölsch, M., Stern, H., Edan, Y.: Vision-based hand-gesture applications. Commun. ACM 54(2), 60–71 (2011). doi:10.1145/1897816.1897838
Wan, T., Wang, Y., Li, J.: Hand gesture recognition system using depth data. In: Consumer Electronics, Communications and Networks (CECNet), 2012 2nd International Conference on, pp. 1063–1066 (2012). doi:10.1109/CECNet.6201837
Wang, R., Paris, S., Popović, J.: 6D hands: markerless hand-tracking for computer aided design. In: Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, UIST ’11, pp. 549–558. ACM, New York (2011). doi:10.1145/2047196.2047269
Wobbrock, J.O., Wilson, A.D., Li, Y.: Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes. In: Proceedings of the 20th Annual ACM Symposium on User Interface Software and Technology, UIST ’07, pp. 159–168. ACM, New York (2007). doi:10.1145/1294211.1294238
Zhang, H.J., Kankanhalli, A., Smoliar, S.: Automatic partitioning of full-motion video. Multimedia Syst. 1(1), 10–28 (1993). doi:10.1007/BF01210504
Cite this article
Pedersoli, F., Benini, S., Adami, N. et al. XKin: an open source framework for hand pose and gesture recognition using kinect. Vis Comput 30, 1107–1122 (2014). https://doi.org/10.1007/s00371-014-0921-x
DOI: https://doi.org/10.1007/s00371-014-0921-x