Abstract
Present-day computers outperform humans in many complicated tasks, with high precision and efficiency. However, in tasks such as pattern recognition and, more importantly, character recognition, a school-going child can outperform the sophisticated machines available today. Modern machines find handwritten, calligraphic text difficult to recognize because such text rarely contains regular straight lines, perfect loops or circles. Consequently, most optical character recognition (OCR) systems fail to recognize characters beyond certain levels of distortion and noise. The human brain, on the other hand, has a remarkable ability to recognize visual patterns and characters quickly under a wide range of distortions. The present work tries to understand how humans perceive, process and recognize Devanagari characters under various distortion levels. To this end, an eye tracking experiment was performed on 20 graduate participants, with stimuli presented in decreasing order of distortion (from highly distorted to nearly normal). The eye fixation patterns, together with the time course of recognition, revealed the moment-to-moment processing involved in letter identification. Understanding the level of distortion acceptable for correct letter recognition, and the processes involved in identifying the letters, can make OCR more robust and narrow the gap between human reading and machine reading.
1 Introduction
Optical character recognition (OCR) is a highly researched domain within image processing and computer vision. Commercial OCR systems can be divided into four generations, depending on their robustness and efficiency [1]. First-generation OCR is characterized by the constrained letter shapes it can read, whereas the second generation can recognize a set of regular machine-printed characters. Third-generation OCR focuses on poorly printed and handwritten characters. OCR that deals with complex documents mixing text, graphics, tables and mathematical symbols, with unconstrained handwritten characters, and with low-quality noisy documents belongs to the fourth generation [1]. However, there are still many instances where OCR systems fail if the characters are slightly distorted. In most of these situations, a school-going child can outperform highly sophisticated OCR systems. Therefore, there is a need to understand how a person recognizes a letter irrespective of its variations.
Work in psychology, cognitive science and neuroscience may help to address this question. Letter recognition has rightly been called the foundation of human reading [2]. However, attempts to understand letter perception are rare. Two broad theories of visual recognition have been proposed to explain the nature of letter representation: template matching and the feature-based approach. In template matching, a letter is recognized by matching the letter stimulus to an internal template [3]. In the feature-based approach, the visual features of the letter are extracted at an early stage of processing and then compared with a list of features stored in memory to recognize the letter [4]. It is hard to believe that we store all the features and templates of every character to be recognized. Therefore, we try to understand how a letter is processed under various distortion levels and which features a reader relies on while recognizing it.
There is a cognitive link between eye movements and the brain. Eye tracking is therefore a promising technology for shedding more light on visual letter processing. Rayner [5] stated that eye movement data reflect moment-to-moment cognitive processes in various tasks. In the literature, researchers have used eye tracking to estimate language expertise [6], the number of words read [7], the type of document read [8], etc. However, none of these studies addresses letter recognition or letter processing. In addition, there has hardly been any attempt to study the Devanagari script using eye tracking. Therefore, this paper aims to identify the visual features and understand the visual processing of Devanagari letters by humans using eye tracking metrics.
This paper is divided into four sections. Section 1 is the introduction, and the experimental setup is discussed in Sect. 2. Section 3 presents the results and discusses them. Concluding remarks with possible future extensions are given in Sect. 4.
2 Experimental Setup and Details
We performed the eye tracking experiment on 20 healthy graduate participants (7 females and 13 males), aged 20–30 years, with normal or corrected-to-normal vision. All participants were frequent readers of the Devanagari script. Eye movements were captured with a Tobii T120, a camera-based remote eye tracker with a sampling frequency of 120 Hz.
2.1 Stimulus Design
Initially, we grouped the characters based on their common elements, as shown in Fig. 1. From these groups we selected, for our experiment, characters that show high variation in their structure and characters that are frequently used. The stimuli were designed in the Mangal font at a point size of 72 and reduced to a single-pixel contour through morphological operations. Our aim was to incorporate handwritten variations into the characters and to understand how readers process the resulting letters. Based on the variations commonly observed in different parts of handwritten letters, we divided each letter into parts, as shown in Fig. 2. Each letter part was then scaled by a factor in the range from 0.2 to 5. The stimuli were formed by combining the scaled letter part with the unchanged (unscaled) part of the same letter. The letter formed in this way was termed a distorted letter and was presented in black against a white background. The prototype of the stimuli is shown in Table 1.
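The scaling step can be illustrated with a minimal sketch. It assumes that the contour of each letter part is available as a list of (x, y) points and that the part is scaled about the point where it joins the rest of the letter. Both the point representation and the choice of anchor are assumptions made for illustration, as are the function names; the text above does not specify how the scaling was implemented.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def scale_part(part: List[Point], anchor: Point, factor: float) -> List[Point]:
    """Scale the contour points of one letter part about a fixed anchor point."""
    if not 0.2 <= factor <= 5.0:
        raise ValueError("factor lies outside the 0.2-5 range used for the stimuli")
    ax, ay = anchor
    return [(ax + factor * (x - ax), ay + factor * (y - ay)) for x, y in part]

def distort_letter(fixed_part: List[Point], movable_part: List[Point],
                   anchor: Point, factor: float) -> List[Point]:
    """Combine the unscaled part of a letter with a scaled copy of another part."""
    return list(fixed_part) + scale_part(movable_part, anchor, factor)

# Hypothetical contour: a vertical bar (kept fixed) and a hook attached at (0, 5).
bar = [(0, 0), (0, 10)]
hook = [(0, 5), (4, 5), (4, 9)]
distorted = distort_letter(bar, hook, anchor=(0, 5), factor=0.5)
```

With a factor of 1 the letter is unchanged, so factors further from 1 in either direction would correspond to stronger distortion under this reading.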
Fig. 1. Bhagwat’s grouping of characters based on graphical similarity [9]
2.2 Experimental Procedure
All the equipment required for the experiment was set up in the experimental room beforehand. Before the experiment, participants were given full instructions and their exact role in the study was explained. Participants who agreed to take part signed a consent form, and details such as name, age, eyesight and first language were recorded. Each participant was seated in front of the eye tracker at a distance of 60–70 cm. The session started with a welcome message on the screen, followed by instructions and a trial. The experiment was conducted in three phases. In each phase, participants were presented with 6–7 letters in 25 variations each. The eye tracker was calibrated to the participant's eyes before the stimuli were shown. To maintain good data quality, calibration was performed separately for each phase, and participants could take a 2 min break after each phase if they wished.
After successful calibration, the actual experiment started. The stimuli were different variations of different letters, with the letters chosen randomly but the decreasing order of distortion maintained, i.e. high distortion of letter 1, high distortion of letter 2, lower distortion of letter 1, and so on. Participants were asked to press the arrow key on the keyboard and to say the letter aloud if they recognized it. If they could not recognize the letter, they could skip it by pressing the space bar. This procedure continued until the end of the phase. The total time spent on a letter shows where the participants had difficulty with recognition. The key presses show the level of distortion at which participants could still recognize the letter.
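The presentation order described above can be sketched as follows. This is one possible reading, in which the letters are shuffled independently at every distortion level and level 0 denotes the most distorted variation. The function name, the level indexing and the letter labels are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

def build_phase_sequence(letters: Sequence[str], n_variations: int = 25,
                         seed: Optional[int] = None) -> List[Tuple[str, int]]:
    """Return (letter, level) trials; level 0 is the most distorted variation."""
    rng = random.Random(seed)
    sequence = []
    for level in range(n_variations):      # distortion decreases as level grows
        order = list(letters)
        rng.shuffle(order)                 # random letter order within each level
        sequence.extend((letter, level) for letter in order)
    return sequence

# Hypothetical phase with six letters and 25 variations each: 150 trials.
trials = build_phase_sequence(["L1", "L2", "L3", "L4", "L5", "L6"], seed=7)
```

Shuffling within each level keeps the same letter from appearing in a predictable position while guaranteeing that no letter is seen at a lower distortion before all letters have been seen at the current one.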
Eye tracking data were collected throughout all the phases. To obtain fixations and saccades from the raw eye tracking data, as shown in Fig. 3, the velocity-based Velocity-Threshold Identification (I-VT) fixation classification algorithm was used. The algorithm classifies eye movements into fixations and saccades based on the velocity of the directional shifts of the eye [10].
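A minimal sketch of the I-VT idea described in [10] is given below: compute point-to-point velocities, label each sample as fixation or saccade against a velocity threshold, and collapse consecutive fixation samples into one fixation at their centroid. The sketch assumes gaze samples in pixel coordinates at a fixed 120 Hz rate and a threshold expressed in pixels per second. The threshold value and all names are illustrative assumptions and do not reproduce the settings of the software used in the study.

```python
import math
from typing import List, NamedTuple, Tuple

class Fixation(NamedTuple):
    x: float            # centroid x in pixels
    y: float            # centroid y in pixels
    start_ms: float     # onset time of the first sample in the group
    duration_ms: float  # number of samples times the sample interval

def classify_ivt(samples: List[Tuple[float, float]], rate_hz: float = 120.0,
                 threshold_px_per_s: float = 1000.0) -> List[Fixation]:
    """Group consecutive below-threshold samples into fixations (I-VT)."""
    dt = 1.0 / rate_hz
    fixations: List[Fixation] = []
    group: List[int] = []   # indices of the current run of fixation samples

    def flush() -> None:
        if group:
            cx = sum(samples[i][0] for i in group) / len(group)
            cy = sum(samples[i][1] for i in group) / len(group)
            fixations.append(Fixation(cx, cy, group[0] * dt * 1000.0,
                                      len(group) * dt * 1000.0))
            group.clear()

    for i, (x, y) in enumerate(samples):
        if i == 0:
            velocity = 0.0  # assumption: the first sample has no predecessor
        else:
            px, py = samples[i - 1]
            velocity = math.hypot(x - px, y - py) / dt
        if velocity < threshold_px_per_s:
            group.append(i)
        else:
            flush()         # a saccade sample ends the current fixation
    flush()
    return fixations

# Hypothetical recording: five samples at one location, then a jump to another.
gaze = [(100.0, 100.0)] * 5 + [(400.0, 100.0)] * 5
fixations = classify_ivt(gaze)
```

In practice the threshold is usually stated in degrees of visual angle per second, which requires the screen geometry and viewing distance to convert from pixels; that conversion is omitted here.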
3 Results and Discussions
The eye tracking data were analyzed in terms of fixation duration and fixation count. Participants were allowed a certain amount of time to recognize each letter and could proceed by skipping the letter if they were unable to recognize it. The key press and the letter spoken aloud for each correct recognition showed us the level of distortion that each participant was able to tolerate. The correct recognition responses are plotted for each letter in Figs. 4, 5 and 6.
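The two measures used in the analysis, together with the tolerated distortion level, can be sketched as below. The sketch assumes fixations given as (x, y, duration) tuples, a rectangular character region, and responses listed in presentation order from most to least distorted. These data layouts and the function names are assumptions for illustration only.

```python
from typing import List, Optional, Tuple

Fix = Tuple[float, float, float]            # (x_px, y_px, duration_ms)
Region = Tuple[float, float, float, float]  # (left, top, right, bottom) in px

def region_metrics(fixations: List[Fix], region: Region) -> Tuple[int, float]:
    """Return (fixation count, total fixation duration) inside the region."""
    left, top, right, bottom = region
    inside = [d for x, y, d in fixations
              if left <= x <= right and top <= y <= bottom]
    return len(inside), sum(inside)

def tolerated_level(responses: List[Tuple[int, bool]]) -> Optional[int]:
    """Return the level of the first correct response, or None if there is none.

    responses holds (level, correct) pairs in presentation order, where
    level 0 is the most distorted variation of the letter.
    """
    for level, correct in responses:
        if correct:
            return level
    return None

# Hypothetical trial: two fixations fall on the character, one falls outside.
count, total_ms = region_metrics(
    [(10, 10, 200.0), (50, 50, 150.0), (300, 300, 100.0)], (0, 0, 100, 100))
level = tolerated_level([(0, False), (1, False), (2, True), (3, True)])
```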
From these responses, it can be observed that almost all the participants successfully recognized the letters in less time, i.e. at higher levels of distortion. On the other hand, the letters […] were recognized somewhat later by most of the participants. Letters sharing the same structural features, such as […] and […], […] and […], […] and […], were recognized at almost the same time. The eye tracking patterns of the participants were similar, and a particular eye fixation pattern was observed across participants as well. A long fixation duration on the character region implies difficulty in understanding or encoding the information: participants have to spend much more time to encode what exactly the character represents. As the distortion level changed, there was an interesting change in the responses as well.
3.1 Delayed Letter Recognition
Many incorrect responses were reported for letters sharing similar visual features. As a result, the correct recognition of these letters was delayed and occurred only when the letter had a lower distortion level, as can be observed from Figs. 4, 5 and 6. This was due to a different way of processing the letters. Most of the participants reported […] as […], […] as […], […] as […], […] as […], etc. at higher levels of distortion. As the letter approached its normal form, the participants fixated on different locations and then recognized the letter correctly. We created heat maps from the number of fixations and the fixation duration, as shown in Fig. 7. The variation in fixation duration and number of fixations enabled us to understand how the letter was processed and how the fixation pattern changed to allow the participant to recognize the letter correctly. The particular observations are tabulated in Table 2. Having participants say the letter aloud told us exactly what they had recognized and provided a cross-check on whether the recognition was correct.
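One simple way to build such a heat map is to accumulate fixations on a coarse grid laid over the stimulus, keeping one grid for fixation counts and one for summed fixation durations. The sketch below assumes fixations as (x, y, duration) tuples in stimulus pixel coordinates; the cell size and names are illustrative. Heat maps produced by eye tracking software are typically smoothed, which this sketch omits.

```python
from typing import List, Tuple

Fix = Tuple[float, float, float]  # (x_px, y_px, duration_ms)

def fixation_heatmap(fixations: List[Fix], width: int, height: int,
                     cell: int = 20) -> Tuple[List[List[int]], List[List[float]]]:
    """Accumulate fixation counts and durations on a grid of cell x cell pixels."""
    cols = -(-width // cell)    # ceiling division
    rows = -(-height // cell)
    counts = [[0] * cols for _ in range(rows)]
    durations = [[0.0] * cols for _ in range(rows)]
    for x, y, duration in fixations:
        if 0 <= x < width and 0 <= y < height:   # ignore off-stimulus fixations
            r, c = int(y // cell), int(x // cell)
            counts[r][c] += 1
            durations[r][c] += duration
    return counts, durations

# Hypothetical 100 x 60 px stimulus with three on-stimulus fixations.
counts, durations = fixation_heatmap(
    [(10, 10, 200.0), (15, 12, 100.0), (50, 30, 300.0), (500, 500, 50.0)],
    width=100, height=60)
```

Comparing the two grids between a highly distorted and a nearly normal variation of the same letter would show where fixations moved as recognition became possible.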
4 Conclusions
This research offers insights into how Devanagari letters are processed by readers. Eye tracking appears to be a promising technique for understanding the moment-to-moment processing of a letter. In this work, we incorporated the variations that are generally seen in handwritten characters and recorded readers' behavior on these characters using eye tracking. The results demonstrate that the reader's attention is concentrated along the curves, knots, loops and contacts with the headline. The fixation patterns also change with the distortion level. This characteristic eye movement behavior might provide the participant with crucial visual cues for efficient recognition. In future work, these visual cues will be used to build a smart OCR model for robust character recognition.
References
1. Pal, U., Chaudhuri, B.B.: Indian script character recognition: a survey. Patt. Recogn. 37(9), 1887–1899 (2004)
2. Chang, Y.N., Furber, S., Welbourne, S.: Modelling normal and impaired letter recognition: implications for understanding pure alexic reading. Neuropsychologia 50(12), 2773–2788 (2012)
3. Neisser, U.: Cognitive Psychology. Appleton-Century-Crofts, New York (1967)
4. Gibson, E.J.: Principles of Perceptual Learning and Development. Appleton-Century-Crofts, New York (1969)
5. Rayner, K.: Eye movements in reading and information processing: 20 years of research. Psychol. Bull. 124(3), 372 (1998)
6. Yoshimura, K., Kise, K., Kunze, K.: The eye as the window of the language ability: estimation of English skills by analyzing eye movement while reading documents. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 251–255. IEEE (2015)
7. Kunze, K., Kawaichi, H., Yoshimura, K., Kise, K.: The Wordometer–Estimating the number of words read using document image retrieval and mobile eye tracking. In: 2013 12th International Conference on Document Analysis and Recognition (ICDAR), pp. 25–29. IEEE (2013)
8. Kunze, K., Utsumi, Y., Shiga, Y., Kise, K., Bulling, A.: I know what you are reading: recognition of document types using mobile eye tracking. In: Proceedings of the 2013 International Symposium on Wearable Computers, pp. 113–116. ACM (2013)
9. Bhagwat, S.V.: Phonemic Frequencies in Marathi and Their Relation to Devising a Speed-script, vol. 1. Deccan College Post-graduate and Research Institute (1961)
10. Salvucci, D.D., Goldberg, J.H.: Identifying fixations and saccades in eye-tracking protocols. In: Proceedings of the 2000 Symposium on Eye Tracking Research and Applications, pp. 71–78. ACM (2000)
Acknowledgements
We would like to thank all the participants who took part in our study. We would also like to thank the Ministry of Electronics and Information Technology, Government of India, and Media Lab Asia for financial assistance to carry out this research.
© 2017 Springer International Publishing AG
Ralekar, C., Gandhi, T.K., Chaudhury, S. (2017). Unlocking the Mechanism of Devanagari Letter Identification Using Eye Tracking. In: Shankar, B., Ghosh, K., Mandal, D., Ray, S., Zhang, D., Pal, S. (eds) Pattern Recognition and Machine Intelligence. PReMI 2017. Lecture Notes in Computer Science(), vol 10597. Springer, Cham. https://doi.org/10.1007/978-3-319-69900-4_28
DOI: https://doi.org/10.1007/978-3-319-69900-4_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69899-1
Online ISBN: 978-3-319-69900-4
eBook Packages: Computer Science, Computer Science (R0)