Text Detection and Recognition Using Augmented Reality and Deep Learning

  • Conference paper
Advanced Information Networking and Applications (AINA 2022)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 449)

Abstract

In recent years, the detection and recognition of text in natural images has become an important and active research subject. Many applications have been developed for text detection and recognition, and the majority of them are based on deep learning (DL) and augmented reality (AR). In this article, we propose a solution based on both deep learning and augmented reality that aims to make the text reading process more efficient, clearer, and safer. The purpose of the system is to help visually impaired people read text in natural images. First, the user points the smartphone camera at text present in the environment. Then, the system runs the detection and recognition module using the DL model. Finally, the system uses AR to display the associated graphical data on the smartphone screen, augmented over the identified text. AR is used to improve the visualization of the detected and recognized words so that the user can read the text more efficiently. The mobile application provides visual features intended to improve the reading of the detected and recognized text. To validate the system's performance, the application is tested on a group of people who answer a questionnaire that reflects their experience with the proposed approach. In addition, a user study is performed to assess user friendliness and satisfaction.
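The three steps described in the abstract (capture, detection and recognition, AR display) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the DL model, the AR framework, or any data structures, so `TextRegion`, `Overlay`, `build_overlays`, `process_frame`, the confidence threshold, the scale factor, and the stand-in `fake_model` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class TextRegion:
    """Hypothetical output of the detection and recognition module."""
    box: Tuple[int, int, int, int]  # (x, y, width, height) in pixels
    text: str
    confidence: float


@dataclass(frozen=True)
class Overlay:
    """Hypothetical description of an enlarged label drawn over the frame."""
    text: str
    anchor: Tuple[int, int]  # top-left corner of the label
    font_px: int


# Placeholder frame type: rows of grey values stand in for a camera image.
Frame = List[List[int]]
Recognizer = Callable[[Frame], List[TextRegion]]


def build_overlays(regions: List[TextRegion],
                   min_confidence: float = 0.5,
                   scale: float = 2.0) -> List[Overlay]:
    """Turn recognized regions into enlarged labels placed above each box."""
    overlays = []
    for region in regions:
        if region.confidence < min_confidence:
            continue  # skip uncertain recognitions (assumed policy)
        x, y, _width, height = region.box
        font_px = int(height * scale)
        # Place the label above the box, clamped to the top edge of the frame.
        anchor = (x, max(0, y - font_px))
        overlays.append(Overlay(region.text, anchor, font_px))
    return overlays


def process_frame(frame: Frame, recognize: Recognizer) -> List[Overlay]:
    """Step 2 (detect and recognize) followed by step 3 (AR overlay layout)."""
    return build_overlays(recognize(frame))


def fake_model(frame: Frame) -> List[TextRegion]:
    """Stand-in for the DL model; returns fixed regions for illustration."""
    return [
        TextRegion((40, 100, 120, 20), "EXIT", 0.93),
        TextRegion((10, 10, 30, 8), "??", 0.20),
    ]


overlays = process_frame([[0]], fake_model)
```

Keeping the recognizer behind a plain callable means a real detection and recognition model could be swapped in without changing the overlay logic. Rendering the overlays on screen would be left to whatever AR framework the application uses.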



Author information

Corresponding author

Correspondence to Imene Ouali.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ouali, I., Halima, M.B., Wali, A. (2022). Text Detection and Recognition Using Augmented Reality and Deep Learning. In: Barolli, L., Hussain, F., Enokido, T. (eds) Advanced Information Networking and Applications. AINA 2022. Lecture Notes in Networks and Systems, vol 449. Springer, Cham. https://doi.org/10.1007/978-3-030-99584-3_2
