Abstract
Sign language is the visual language of deaf people. It is also a natural language, different in form from spoken language. To remove the communication barrier between hearing and deaf people, several research efforts on automatic sign language recognition (ASLR) systems are now under way. However, existing ASLR research deals with only small vocabularies and is also limited in its environmental conditions and in the equipment it can use. In addition, compared with the field of speech recognition, there is no large-scale sign database, for various reasons. One of the major reasons is that there is no official writing system for Japanese Sign Language (JSL). In this situation, we focused on using knowledge of JSL phonology and a dictionary in order to develop a real-time JSL sign recognition system. The dictionary consists of over 2,000 JSL signs, each defined by three types of phonological elements in JSL: hand shape, motion, and position. Thanks to the dictionary, JSL sign models are represented by combinations of these elements, and the system can also accommodate newly added signs. Our system employs a Kinect v2 sensor to obtain sign features such as hand shape, position, and motion. The depth sensor enables real-time processing and robustness against environmental changes. In general, hand shape recognition is not easy in ASLR because of the complexity of hand shapes. In our research, we apply a contour-based method to hand shape recognition. To recognize hand motion and position, we adopted statistical models such as hidden Markov models (HMMs) and Gaussian mixture models (GMMs). To address the lack of a database, our method uses pseudo motion and hand shape data. We conducted experiments to recognize 223 JSL signs performed by professional sign language interpreters.
1 Introduction
In general, a sign is represented by combinations of hand posture or movement and facial expressions involving, for example, the eyes or mouth. These components occur both sequentially and simultaneously. Communication between hearing people and the deaf can be difficult because most hearing people do not understand sign language. To resolve this communication problem, projects on automatic sign language recognition (ASLR) systems are now under way.
One major problem of current ASLR systems is that they handle only a small vocabulary. Handling unknown vocabulary is also important from a practical point of view. The JSL vocabulary is said to exceed 3,000 signs. In addition, new signs are introduced as circumstances change. Clearly, it is inefficient to perform recognition on individual sign units.
From this point of view, we employ the JSL dictionary database and notation system proposed by Kimura [1]. Our system is based on three elements of sign language: hand motion, position, and pose. This study considers hand pose recognition using depth data obtained from a single depth sensor. We apply the contour-based method proposed by Keogh [2] to hand pose recognition. This method recognizes a contour by means of discriminators learned from contours. To recognize hand motion and position, we adopted statistical models such as hidden Markov models (HMMs) and Gaussian mixture models (GMMs). To address the lack of a database, our method uses pseudo motion and hand shape data. We conducted experiments to recognize 223 JSL signs performed by professional sign language interpreters.
2 Overview of the System
An overview of our proposed system is shown in Fig. 1. The feature parameters of sign motion are captured with a Microsoft Kinect v2 sensor [3]. First, the time series of feature parameters is cut into moving segments. Second, the three phonological elements are recognized individually. Finally, the recognition result is determined by the weighted sum of the scores of the three elements. Hand pose recognition uses depth data of the hand region, whereas recognition of the other two elements uses joint coordinates.
We used the JSL dictionary proposed by Kimura [1]. In this dictionary, hand poses are classified by several elements, as shown in Table 1. These elements are also illustrated in Fig. 2. Currently, the dictionary contains approximately 2,600 signs.
3 Hand Pose Recognition
Several studies on hand pose recognition that estimate the finger joints have been proposed [4, 5]. However, these methods still have difficulty when some fingers are not visible, a situation that occurs frequently in sign language. We adopt the contour-based technique proposed by Keogh [2] to recognize hand pose. This technique is considered robust even when fingers are partially occluded. The details of the method are described below.
3.1 Feature Extraction
Hand shapes can be converted to distance vectors, each forming a one-dimensional sequence. Figure 3 shows the procedure for extracting a distance vector from a hand image. First, the center point of the hand region is determined by a distance transform. The distance transform replaces each pixel value of the binary image with the distance to the nearest zero-valued pixel. Next, the distance from the center point to every pixel on the contour is calculated. The distance vector is the series of these distances.
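The extraction procedure can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in the system. It assumes a small binary mask given as nested lists with at least one background pixel, uses a city-block distance transform, takes the pixel with the largest transform value as the center, and orders contour pixels by angle around the center (which is only reasonable for roughly star-shaped regions). All function names are hypothetical.

```python
from collections import deque
import math

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def distance_transform(mask):
    """City-block distance from every pixel to the nearest zero-valued pixel."""
    h, w = len(mask), len(mask[0])
    dist = [[0 if mask[y][x] == 0 else None for x in range(w)] for y in range(h)]
    queue = deque((y, x) for y in range(h) for x in range(w) if mask[y][x] == 0)
    while queue:  # multi-source breadth-first search from all background pixels
        y, x = queue.popleft()
        for dy, dx in NEIGHBORS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist

def hand_center(mask):
    """Assumption: the center is the foreground pixel farthest from the background."""
    dist = distance_transform(mask)
    cells = [(y, x) for y in range(len(mask)) for x in range(len(mask[0])) if mask[y][x]]
    return max(cells, key=lambda c: dist[c[0]][c[1]])

def contour_pixels(mask):
    """Foreground pixels that touch the background or the image border."""
    h, w = len(mask), len(mask[0])

    def is_background(y, x):
        return not (0 <= y < h and 0 <= x < w) or mask[y][x] == 0

    return [(y, x) for y in range(h) for x in range(w)
            if mask[y][x] and any(is_background(y + dy, x + dx) for dy, dx in NEIGHBORS)]

def distance_vector(mask, length=16):
    """Distances from the center to the contour, sampled to a fixed length."""
    cy, cx = hand_center(mask)
    contour = sorted(contour_pixels(mask),
                     key=lambda p: math.atan2(p[0] - cy, p[1] - cx))
    dists = [math.hypot(y - cy, x - cx) for y, x in contour]
    return [dists[i * len(dists) // length] for i in range(length)]
```

For a real depth image, a library distance transform and a proper contour tracer would replace the brute-force helpers above.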
3.2 Calculation of Distance
A distance D between two distance vectors \(P=\{p_{0},p_{1},\dots ,p_{n}\}\) and \(Q=\{q_{0},q_{1},\dots ,q_{n}\}\) is calculated as follows.
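The equation itself does not appear in this text. Assuming the Euclidean distance used for contour matching by Keogh et al. [2], it would take the form:

\[
D(P,Q)=\sqrt{\sum_{i=0}^{n}\left(p_{i}-q_{i}\right)^{2}}
\]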
If the two distance vectors differ in length, dynamic time warping (DTW) should be used to adjust for size variations. For simplicity, and to save computation time, we adjust the vectors to the same length in advance.
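A minimal sketch of this simplification is given below. It assumes linear interpolation for the length adjustment and a Euclidean distance between the resulting vectors; neither choice is specified in the text, and the function names are hypothetical.

```python
import math

def resample(vector, length):
    """Linearly interpolate a distance vector to a fixed number of points."""
    if length < 2 or len(vector) < 2:
        raise ValueError("need at least two points")
    step = (len(vector) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * step
        lo = min(int(pos), len(vector) - 2)  # left neighbor index
        frac = pos - lo                      # position between the two neighbors
        out.append(vector[lo] * (1 - frac) + vector[lo + 1] * frac)
    return out

def euclidean_distance(p, q):
    """Euclidean distance between two equal-length distance vectors (assumed metric)."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```

Resampling once per contour keeps each comparison linear in the vector length, whereas DTW would be quadratic.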
We can compare contours by calculating their distances or by using classifiers generated from contours. These classifiers are called wedges. A wedge has a maximum and a minimum value at each point. If a contour lies entirely inside a wedge, the distance is zero. The distance D between a wedge W (\(U=\{u_{0},u_{1},...,u_{n}\}\) is its top, \(L=\{l_{0},l_{1},...,l_{n}\}\) is its bottom) and a contour \(P=\{p_{0},p_{1},...,p_{n}\}\) is calculated by the following equation.
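The equation itself does not appear in this text. Assuming the lower-bounding wedge distance of Keogh et al. [2], which is zero whenever the contour stays between the bottom and the top of the wedge, it would take the form:

\[
D(W,P)=\sqrt{\sum_{i=0}^{n}
\begin{cases}
\left(p_{i}-u_{i}\right)^{2} & \text{if } p_{i}>u_{i}\\
\left(p_{i}-l_{i}\right)^{2} & \text{if } p_{i}<l_{i}\\
0 & \text{otherwise}
\end{cases}}
\]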
3.3 Generating Wedges
Wedges are generated by the following procedure.
1. Extract the feature parameters from the depth data.
2. Calculate the distances between all pairs of contours.
3. Merge contours pairwise in ascending order of distance. The pointwise maximum and minimum values of the merged contours form a wedge.
4. Repeat step 3 until the predetermined number of wedges is reached.
The process of generating wedges is also illustrated in Fig. 4. We prepare various wedges to recognize each hand type.
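The merging procedure can be sketched as a simple agglomerative loop. The code below is an illustration under stated assumptions, not the authors' implementation: contours are equal-length lists of distances, the contour-to-wedge distance follows the lower-bounding form of Keogh et al. [2], and wedges that already contain several contours are compared through the midline between their top and bottom, which is a hypothetical choice. All names are hypothetical.

```python
import math
from itertools import combinations

def wedge_distance(upper, lower, contour):
    """Zero inside the wedge; otherwise the Euclidean excess outside it (assumed form)."""
    total = 0.0
    for u, l, p in zip(upper, lower, contour):
        if p > u:
            total += (p - u) ** 2
        elif p < l:
            total += (p - l) ** 2
    return math.sqrt(total)

def merge(a, b):
    """Pointwise maximum and minimum of two wedges form the merged wedge."""
    (ua, la), (ub, lb) = a, b
    return ([max(x, y) for x, y in zip(ua, ub)],
            [min(x, y) for x, y in zip(la, lb)])

def midline(wedge):
    upper, lower = wedge
    return [(u + l) / 2 for u, l in zip(upper, lower)]

def generate_wedges(contours, n_wedges):
    """Merge the closest pair repeatedly until n_wedges wedges remain."""
    # Each contour starts as a degenerate wedge whose top and bottom coincide.
    wedges = [(list(c), list(c)) for c in contours]
    while len(wedges) > n_wedges:
        i, j = min(
            combinations(range(len(wedges)), 2),
            key=lambda ij: math.dist(midline(wedges[ij[0]]), midline(wedges[ij[1]])),
        )
        merged = merge(wedges[i], wedges[j])
        wedges = [w for k, w in enumerate(wedges) if k not in (i, j)] + [merged]
    return wedges
```

A hand pose could then be classified by the wedge with the smallest `wedge_distance` to the observed contour.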
4 Sign Movement and Position Recognition
In this paper, HMMs are used to recognize hand movement from the hand position features provided by the Kinect sensor. The three-dimensional hand position and its speed are used as the feature parameters of the HMMs. HMMs corresponding to the typical movements of signs are constructed from pseudo-training data, which removes the cost of collecting sign data.
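As an illustration of the feature parameters, the sketch below builds a six-dimensional vector per frame from the hand position and its frame-to-frame velocity. How the pseudo-training data are generated is not described in this paper, so `pseudo_linear_trajectory` is a purely hypothetical example of a synthetic straight-line movement, and the default frame rate is likewise an assumption.

```python
def pseudo_linear_trajectory(start, end, n_frames):
    """Hypothetical pseudo-data: a straight movement from start to end (n_frames >= 2)."""
    return [
        tuple(s + (e - s) * t / (n_frames - 1) for s, e in zip(start, end))
        for t in range(n_frames)
    ]

def motion_features(positions, fps=30.0):
    """Per-frame feature: 3-D position followed by its velocity (finite difference)."""
    features = []
    for prev, cur in zip(positions, positions[1:]):
        velocity = tuple((c - p) * fps for p, c in zip(prev, cur))
        features.append(tuple(cur) + velocity)
    return features
```

Sequences of such vectors would serve as the observations of one HMM per movement type.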
The definition of hand position is ambiguous in JSL, and hand position recognition has to take this into account. In this paper, the particular position of the hand in a sign is modeled by GMMs. The three-dimensional hand position is used as the feature parameter of the GMMs. GMMs corresponding to the typical positions of signs are also trained from pseudo-training data.
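Scoring a hand position against such a model amounts to evaluating the GMM log-likelihood. The sketch below assumes diagonal covariances, which the paper does not specify, and uses hypothetical parameter names; the mixture parameters would come from training on the pseudo data.

```python
import math

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of a 3-D position x under a diagonal-covariance GMM."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, m, v in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        log_terms.append(log_p)
    # Log-sum-exp over the mixture components for numerical stability.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))
```

The position element of a sign is then the model with the highest log-likelihood for the observed hand position.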
5 Experiment and Results
We conducted experiments to recognize 223 JSL signs performed by two professional sign language interpreters. The experimental conditions are listed in Table 2. These signs were taken from the basic JSL signs that correspond to grade 5 of the Japanese Sign Language Proficiency Test.
The recognition result is determined by the weighted sum of the scores of the three elements. The optimal weights of the three elements are determined by grid search on the training data.
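The score combination and weight search can be sketched as follows. This is a minimal illustration that assumes the three weights are non-negative and sum to one, that higher scores are better, and that candidates are given as a dictionary from sign name to a (shape, motion, position) score triple. These conventions and all names are assumptions, not details from the paper.

```python
from itertools import product

def combined_score(scores, weights):
    """Weighted sum of the hand shape, motion, and position scores."""
    return sum(w * s for w, s in zip(weights, scores))

def recognize(candidates, weights):
    """Return the sign whose weighted score is highest."""
    return max(candidates, key=lambda sign: combined_score(candidates[sign], weights))

def grid_search(samples, steps=10):
    """Try all weight triples on a grid and keep the most accurate one.

    samples: list of (candidates, true_sign) pairs from the training data.
    """
    best_weights, best_accuracy = None, -1.0
    for i, j in product(range(steps + 1), repeat=2):
        if i + j > steps:
            continue
        weights = (i / steps, j / steps, (steps - i - j) / steps)
        correct = sum(recognize(c, weights) == truth for c, truth in samples)
        accuracy = correct / len(samples)
        if accuracy > best_accuracy:
            best_weights, best_accuracy = weights, accuracy
    return best_weights, best_accuracy
```

In practice the three scores have different scales (a contour distance and two log-likelihoods), so they would need to be normalized before being combined.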
The results of hand shape recognition on the three data sets are shown in Fig. 5. These results indicate that the recognition rate can be improved by increasing the number of wedges. Sign recognition results and the optimal weights are listed in Table 3. The recognition rate for unknown signs, which are not included in the training data, was 33.8 %. One reason for the low recognition rate is the fluctuation of the hand shape in real sign motion.
6 Conclusion
In this paper, we proposed a real-time Japanese sign language recognition system based on three elements of sign language: motion, position, and pose. This study examined hand pose recognition by means of the contour-based method proposed by Keogh, using depth images obtained from a single depth sensor. We conducted experiments on recognizing 24 hand poses from 223 typical Japanese signs. The recognition rate for unknown signs, which are not included in the training data, was 33.8 %. To improve recognition performance, we need to increase the training data for the wedges. Expansion of the vocabulary can also be considered as future work.
References
1. Kimura, T., Hara, D., Kanda, K., Morimoto, K.: Expansion of the system of JSL-Japanese electronic dictionary: an evaluation for the compound research system. In: Kurosu, M. (ed.) HCD 2011. LNCS, vol. 6776, pp. 407–416. Springer, Heidelberg (2011)
2. Keogh, E., Wei, L., Xi, X., Lee, S.H., Vlachos, M.: LB_Keogh supports exact indexing of shapes under rotation invariance with arbitrary representations and distance measures. In: 32nd International Conference on Very Large Data Bases (VLDB2006), pp. 882–893 (2006)
3. Microsoft Kinect for Windows. http://kinectforwindows.org
4. Liang, H., Yuan, J., Thalmann, D.: Parsing the hand in depth images. IEEE Trans. Multimedia 16(5), 1241–1253 (2014)
5. Tang, D., Yu, T.H., Kim, T.K.: Real-time articulated hand pose estimation using semi-supervised transductive regression forests. In: Proceedings of the 2013 IEEE International Conference on Computer Vision (ICCV), pp. 3224–3231 (2013)
Acknowledgement
This research was supported in part by the Japan Society for the Promotion of Science KAKENHI (No. 25350666) and the Toukai Foundation for Technology.
© 2016 Springer International Publishing Switzerland
Cite this paper
Sako, S., Hatano, M., Kitamura, T. (2016). Real-Time Japanese Sign Language Recognition Based on Three Phonological Elements of Sign. In: Stephanidis, C. (eds) HCI International 2016 – Posters' Extended Abstracts. HCI 2016. Communications in Computer and Information Science, vol 618. Springer, Cham. https://doi.org/10.1007/978-3-319-40542-1_21
DOI: https://doi.org/10.1007/978-3-319-40542-1_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40541-4
Online ISBN: 978-3-319-40542-1
eBook Packages: Computer Science (R0)