Abstract
Scene text image super-resolution (STISR) is a popular research topic due to its great potential for improving downstream recognition performance. Many recent STISR approaches use recognition feedback to guide the reconstruction process, but their effectiveness is often limited by inaccurate recognition feedback and insufficient use of visual priors. To address these challenges, we propose GARDEN, a novel GenerAtive pRior guiDEd Network that surpasses existing practice by exploiting enriched generative priors for precise and reliable guidance in STISR. GARDEN leverages a pre-trained Vision Transformer (ViT) as a generative style bank, which provides diverse image priors and, in turn, helps generate reliable text priors. The network can therefore draw on prior information from both the visual and semantic domains for the final reconstruction, yielding more effective learning of both texture generation and text recovery. In addition, GARDEN introduces the multi-scale sequential residual block (MS-SRB), a simple, efficient, and flexible structure that maximizes the utilization of generative priors. By combining enriched generative priors with this novel architecture, GARDEN encodes, transfers, and reconstructs super-resolution text images better than the best previous methods in terms of both fidelity and recognition accuracy, as shown in Fig. 1. Code will be publicly available.
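The abstract does not detail the MS-SRB internals, but the idea of processing features at multiple scales with a residual connection can be illustrated with a minimal, purely hypothetical sketch. Here, average pooling and nearest-neighbour upsampling stand in for the paper's actual (unspecified) downsampling and upsampling operators, and the equal-weight fusion is an assumption for illustration only:

```python
import numpy as np

def avg_pool(x, k):
    # Downsample an HxW feature map by factor k via average pooling.
    h, w = x.shape
    return x[:h // k * k, :w // k * k].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def upsample(x, k):
    # Nearest-neighbour upsampling by factor k (restores the original resolution).
    return np.repeat(np.repeat(x, k, axis=0), k, axis=1)

def ms_srb_sketch(x, scales=(1, 2, 4)):
    # Hypothetical multi-scale block: run each scale branch, average the
    # branch outputs, then add the input back as a residual connection.
    fused = np.zeros_like(x)
    for s in scales:
        branch = x if s == 1 else upsample(avg_pool(x, s), s)
        fused += branch / len(scales)
    return x + fused  # residual connection

x = np.arange(64, dtype=float).reshape(8, 8)
y = ms_srb_sketch(x)  # same spatial size as the input
```

This is only a structural sketch: in the actual network the branches would be learned convolutions over multi-channel features, not fixed pooling operators.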
Y. Kong and W. Ma contributed equally to this work.
Acknowledgment
This research is supported in part by the National Natural Science Foundation of China (Grant Nos. 62441604 and 61936003).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Kong, Y., Ma, W., Jin, L., Xue, Y. (2024). GARDEN: Generative Prior Guided Network for Scene Text Image Super-Resolution. In: Barney Smith, E.H., Liwicki, M., Peng, L. (eds) Document Analysis and Recognition - ICDAR 2024. ICDAR 2024. Lecture Notes in Computer Science, vol 14808. Springer, Cham. https://doi.org/10.1007/978-3-031-70549-6_12
Print ISBN: 978-3-031-70548-9
Online ISBN: 978-3-031-70549-6
eBook Packages: Computer Science, Computer Science (R0)