Abstract
In recent years, scene text recognition(STR) that recognizing character sequences in natural images is in great demand beyond various fields. However, most STR studies only focus on popular scripts like Chinese or English, too little attention has been paid to minority languages. In this paper, we address problems on Thai STR, and introduce a novel strategy called Thai Character Combination(TCC), which explore original characteristics of Thai text. Unlike most other scripts, characters in Thai text can be written both horizontally and vertically, which brings big challenges to current sequence-based text recognition methods. In order to reduce complexity of structure and alleviate the misalignment problem in attention-based methods, TCC intends to combine Thai characters that stack vertically to independent combined characters. Furthermore, we establish a Thai Scene Text(TST) dataset that collected from multiple scenarios to evaluate the performance of our proposed character modeling strategy. We conduct abundant experiments and analyses to compare the recognition performance of models with and without TCC. The results indicate the effectiveness of the proposed method from multiple perspectives, especially, TCC benefits a lot for long text recognition, and there is a substantial improvement in the recognition accuracy of entire string-level.
C. Li and H. Zhan—These authors contributed equally to this work.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Baek, J., et al.: What is wrong with scene text recognition model comparisons? dataset and model analysis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4715–4723 (2019)
Chaiwatanaphan, S., Pluempitiwiriyawej, C., Wangsiripitak, S.: Printed Thai character recognition using shape classification in video sequence along a line. Eng. J. 21(6), 37–45 (2017)
Chamchong, R., Gao, W., McDonnell, M.D.: Thai handwritten recognition on text block-based from Thai archive manuscripts. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1346–1351. IEEE (2019)
Chamchong, R., Gao, W., McDonnell, M.D.: Thai handwritten recognition on text block-based from Thai archive manuscripts. In: 2019 International Conference on Document Analysis and Recognition, ICDAR 2019, Sydney, Australia, 20–25 September 2019, pp. 1346–1351. IEEE (2019)
Cheng, Z., Bai, F., Xu, Y., Zheng, G., Pu, S., Zhou, S.: Focusing attention: towards accurate text recognition in natural images. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5076–5084 (2017)
Emsawas, T., Kijsirikul, B.: Thai printed character recognition using long short-term memory and vertical component shifting. In: Booth, R., Zhang, M.-L. (eds.) PRICAI 2016. LNCS (LNAI), vol. 9810, pp. 106–115. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-42911-3_9
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 369–376 (2006)
He, P., Huang, W., Qiao, Y., Loy, C.C., Tang, X.: Reading scene text in deep convolutional sequences. In: Thirtieth AAAI Conference on Artificial Intelligence (2016)
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)
Hu, W., Cai, X., Hou, J., Yi, S., Lin, Z.: Gtc: guided training of ctc towards efficient and accurate scene text recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 11005–11012 (2020)
Jaderberg, M., Vedaldi, A., Zisserman, A.: Deep features for text spotting. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8692, pp. 512–528. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10593-2_34
Lee, J., Park, S., Baek, J., Oh, S.J., Kim, S., Lee, H.: On recognizing texts of arbitrary shapes with 2D self-attention. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 546–547 (2020)
Li, H., Wang, P., Shen, C., Zhang, G.: Show, attend and read: a simple and strong baseline for irregular text recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 8610–8617 (2019)
Liao, M., et al.: Scene text recognition from two-dimensional perspective. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 8714–8721 (2019)
Litman, R., Anschel, O., Tsiper, S., Litman, R., Mazor, S., Manmatha, R.: Scatter: selective context attentional scene text recognizer. In: proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11962–11972 (2020)
Liu, W., Chen, C., Wong, K.-Y.K., Su, Z., Han, J.: Star-net: a spatial attention residue network for scene text recognition. In: BMVC, vol. 2, p. 7 (2016)
Liu, Z., Li, Y., Ren, F., Goh, W.L., Yu, H.: Squeezedtext: a real-time scene text recognition by binary convolutional encoder-decoder network. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018)
Lyu, P., Yang, Z., Leng, X., Wu, X., Li, R., Shen, X.: 2D attentional irregular scene text recognizer. arXiv preprint arXiv:1906.05708 (2019)
Phokharatkul, P., Kimpan, C.: Recognition of handprinted Thai characters using the cavity features of character based on neural network. In: IEEE. APCCAS 1998. 1998 IEEE Asia-Pacific Conference on Circuits and Systems. Microelectronics and Integrating Systems. Proceedings (Cat. No. 98EX242), pp. 149–152. IEEE (1998)
Qiao, Z., Zhou, Y., Yang, D., Zhou, Y., Wang, W.: Seed: semantics enhanced encoder-decoder framework for scene text recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13528–13537 (2020)
Sanguansat, P., Asdornwised, W., Jitapunkul, S.: Online Thai handwritten character recognition using hidden Markov models and support vector machines. In: IEEE International Symposium on Communications and Information Technology, 2004, ISCIT 2004, vol. 1, pp. 492–497. IEEE (2004)
Shi, B., Bai, X., Yao, C.: An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. Pattern Anal. Mach. Intell. 39(11), 2298–2304 (2016)
Wang, K., Babenko, B., Belongie, S.: End-to-end scene text recognition. In: 2011 International Conference on Computer Vision, pp. 1457–1464. IEEE (2011)
Wang, T., Wu, D.J., Coates, A., Ng, A.Y.: End-to-end text recognition with convolutional neural networks. In Proceedings of the 21st International Conference on Pattern Recognition (ICPR 2012), pp. 3304–3308. IEEE (2012)
Wang, T., et al.: Decoupled attention network for text recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 12216–12224 (2020)
Yao, C., Bai, X., Shi, B., Liu, W.: Strokelets: a learned multi-scale representation for scene text recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4042–4049 (2014)
Yue, X., Kuang, Z., Lin, C., Sun, H., Zhang, W.: RobustScanner: dynamically enhancing positional clues for robust text recognition. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12364, pp. 135–151. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58529-7_9
Zhang, Z., Zhang, C., Shen, W., Yao, C., Liu, W., Bai, X.: Multi-oriented text detection with fully convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4159–4167 (2016)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, C., Zhan, H., Zhao, K., Lu, Y. (2022). Thai Scene Text Recognition with Character Combination. In: Yu, S., et al. Pattern Recognition and Computer Vision. PRCV 2022. Lecture Notes in Computer Science, vol 13536. Springer, Cham. https://doi.org/10.1007/978-3-031-18913-5_25
Download citation
DOI: https://doi.org/10.1007/978-3-031-18913-5_25
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18912-8
Online ISBN: 978-3-031-18913-5
eBook Packages: Computer ScienceComputer Science (R0)