Abstract
In recent years, the detection and recognition of text in natural images has become an active and important research topic. Many applications have been developed for text detection and recognition, and the majority of them are based on deep learning (DL) and augmented reality (AR). In this article, we propose a solution based on both deep learning and augmented reality that aims to make the text reading process more efficient, clearer, and safer. The purpose of the system is to help visually impaired people read text in natural images. First, the user points the smartphone's camera at text present in the surrounding environment. Then, the system runs the detection and recognition module using the DL model. Finally, the system uses AR to display the associated graphical data on the smartphone screen, overlaid on the identified text. AR is used to improve the visualization of each detected and recognized word so that the user can read the text more easily. The mobile application provides visual features designed to improve the reading of the detected and recognized text. To validate the system's performance, the application is tested on a group of people who answer a questionnaire reflecting their experience with the proposed approach. In addition, a user study is performed to assess user-friendliness and satisfaction.
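The three steps described above (capture a frame, detect and recognize text, overlay an enlarged rendering) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Overlay` and `build_overlays`, the `scale` and `min_font_px` parameters, and the rule that enlarges text in proportion to the detected box height are all assumptions made for illustration, and the detector and recognizer are passed in as plain callables standing in for whatever DL model is used.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# Bounding box as (x, y, width, height) in pixels; an assumed convention.
Box = Tuple[int, int, int, int]


@dataclass
class Overlay:
    """One AR label to draw over a detected text region (hypothetical type)."""
    text: str
    box: Box
    font_px: int


def build_overlays(
    frame: Any,
    detect: Callable[[Any], List[Box]],
    recognize: Callable[[Any, Box], str],
    scale: float = 1.5,
    min_font_px: int = 24,
) -> List[Overlay]:
    """Run detection, then recognition, then compute an enlarged label per region.

    `detect` and `recognize` are placeholders for the DL model; `scale` and
    `min_font_px` are illustrative values, not taken from the paper.
    """
    overlays: List[Overlay] = []
    for box in detect(frame):
        text = recognize(frame, box)
        if not text:
            # Skip regions where the recognizer returned nothing.
            continue
        height = box[3]
        # Enlarge relative to the detected text height, with a readable floor.
        font_px = max(min_font_px, round(height * scale))
        overlays.append(Overlay(text=text, box=box, font_px=font_px))
    return overlays
```

In a real application the returned overlays would be handed to the AR rendering layer, which anchors each label to its region on the camera preview; that step depends on the AR framework and is omitted here.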
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ouali, I., Halima, M.B., Wali, A. (2022). Text Detection and Recognition Using Augmented Reality and Deep Learning. In: Barolli, L., Hussain, F., Enokido, T. (eds) Advanced Information Networking and Applications. AINA 2022. Lecture Notes in Networks and Systems, vol 449. Springer, Cham. https://doi.org/10.1007/978-3-030-99584-3_2
DOI: https://doi.org/10.1007/978-3-030-99584-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99583-6
Online ISBN: 978-3-030-99584-3
eBook Packages: Intelligent Technologies and Robotics (R0)