Thai Scene Text Recognition with Character Combination

Li, Chun; Zhan, Hongjian; Zhao, Kun; Lu, Yue

doi:10.1007/978-3-031-18913-5_25

Chun Li,
Hongjian Zhan (ORCID: orcid.org/0000-0002-3906-658X),
Kun Zhao &
Yue Lu (ORCID: orcid.org/0000-0003-4062-6553)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13536)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

In recent years, scene text recognition (STR), which recognizes character sequences in natural images, has been in great demand across various fields. However, most STR studies focus only on popular scripts such as Chinese or English, and too little attention has been paid to minority languages. In this paper, we address problems in Thai STR and introduce a novel strategy called Thai Character Combination (TCC), which explores the original characteristics of Thai text. Unlike most other scripts, characters in Thai text can be written both horizontally and vertically, which poses significant challenges for current sequence-based text recognition methods. To reduce structural complexity and alleviate the misalignment problem in attention-based methods, TCC combines Thai characters that stack vertically into independent combined characters. Furthermore, we establish a Thai Scene Text (TST) dataset, collected from multiple scenarios, to evaluate the performance of our proposed character modeling strategy. We conduct extensive experiments and analyses to compare the recognition performance of models with and without TCC. The results demonstrate the effectiveness of the proposed method from multiple perspectives. In particular, TCC greatly benefits long text recognition and substantially improves string-level recognition accuracy.
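
The idea of merging vertically stacked characters into single units can be illustrated with a minimal sketch. The abstract does not give the actual TCC rules, so the code below rests on an assumption: it treats every character with Unicode general category `Mn` (nonspacing mark, which covers the Thai vowels and tone marks written above or below a base character) as stacked on the preceding character. The function name `combine_stacked` is hypothetical and is not taken from the paper.

```python
import unicodedata
from typing import List


def combine_stacked(text: str) -> List[str]:
    """Group each base character with the nonspacing marks that follow it.

    Assumption: Unicode category "Mn" is used as a stand-in for
    "stacks vertically"; the paper's own combination rules may differ.
    """
    units: List[str] = []
    for ch in text:
        if units and unicodedata.category(ch) == "Mn":
            # A mark drawn above or below: attach it to the previous unit.
            units[-1] += ch
        else:
            # A spacing character starts a new horizontal unit.
            units.append(ch)
    return units


# THO THAHAN + SARA II + MAI EK form one vertical stack.
stack = "\u0e17\u0e35\u0e48"
# KO KAI + SARA I, followed by NO NU: two horizontal positions.
word = "\u0e01\u0e34\u0e19"

print(len(stack), len(combine_stacked(stack)))
print(len(word), len(combine_stacked(word)))
```

Under this assumption, a recognizer would predict one label per combined unit, so the label sequence runs strictly left to right even when the underlying code points stack vertically.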

C. Li and H. Zhan—These authors contributed equally to this work.

Author information

Authors and Affiliations

Shanghai Key Laboratory of Multidimensional Information Processing, Shanghai, 200241, China
Chun Li, Hongjian Zhan & Yue Lu
School of Communication and Electronic Engineering, East China Normal University, Shanghai, 200062, China
Chun Li, Hongjian Zhan & Yue Lu
Chongqing Key Laboratory of Precision Optics, Chongqing Institute of East China Normal University, Chongqing, 401120, China
Hongjian Zhan
iFlytek Research, iFlytek Co., Ltd., Hefei, China
Kun Zhao

Corresponding author

Correspondence to Yue Lu.

Editor information

Editors and Affiliations

Southern University of Science and Technology, Shenzhen, China
Shiqi Yu
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Zhaoxiang Zhang
Hong Kong Baptist University, Hong Kong, China
Pong C. Yuen
Northwestern Polytechnical University, Xi'an, China
Junwei Han
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Tieniu Tan
Hong Kong Baptist University, Hong Kong, China
Yike Guo
Sun Yat-sen University, Guangzhou, China
Jianhuang Lai
Southern University of Science and Technology, Shenzhen, China
Jianguo Zhang

Cite this paper

Li, C., Zhan, H., Zhao, K., Lu, Y. (2022). Thai Scene Text Recognition with Character Combination. In: Yu, S., et al. Pattern Recognition and Computer Vision. PRCV 2022. Lecture Notes in Computer Science, vol 13536. Springer, Cham. https://doi.org/10.1007/978-3-031-18913-5_25

DOI: https://doi.org/10.1007/978-3-031-18913-5_25
Published: 27 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18912-8
Online ISBN: 978-3-031-18913-5
eBook Packages: Computer Science, Computer Science (R0)
