Generating Synthetic Handwritten Historical Documents with OCR Constrained GANs

Vögtlin, Lars; Drazyk, Manuel; Pondenkandath, Vinaychandran; Alberti, Michele; Ingold, Rolf

doi:10.1007/978-3-030-86334-0_40

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12823)

Included in the following conference series:

International Conference on Document Analysis and Recognition

Abstract

We present a framework to generate synthetic historical documents with precise ground truth using nothing more than a collection of unlabeled historical images. Obtaining large labeled datasets is often the limiting factor in effectively using supervised deep learning methods for Document Image Analysis (DIA). Prior approaches to synthetic data generation either require human expertise or result in poor accuracy in the synthetic documents. To achieve high-precision transformations without requiring expertise, we tackle the problem in two steps. First, we create template documents with user-specified content and structure. Second, we transfer the style of a collection of unlabeled historical images to these template documents while preserving their text and layout. We evaluate the use of our synthetic historical documents in a pre-training setting and find that we outperform the baselines (randomly initialized and pre-trained). Additionally, with visual examples, we demonstrate a high-quality synthesis that makes it possible to generate large labeled historical document datasets with precise ground truth.
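The first step described above, building a template whose content and layout are chosen by the user, is what makes exact ground truth possible: every line's text and position is known before any style transfer happens. The following is only a minimal sketch of that idea, not the authors' implementation. The names `LineGroundTruth` and `layout_template`, the fixed character width, and the sample text are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class LineGroundTruth:
    """Text and bounding box of one template line (hypothetical structure)."""
    text: str
    x: int
    y: int
    width: int
    height: int


def layout_template(lines, page_width=600, margin=40,
                    line_height=30, char_width=12):
    """Place each text line on a blank page and record its bounding box.

    Assumes a monospaced font of `char_width` pixels per character, so the
    box width follows directly from the text length.
    """
    max_chars = (page_width - 2 * margin) // char_width
    ground_truth = []
    y = margin
    for text in lines:
        text = text[:max_chars]  # truncate instead of wrapping, for brevity
        ground_truth.append(
            LineGroundTruth(text, margin, y, len(text) * char_width, line_height)
        )
        y += line_height
    return ground_truth


# Arbitrary sample content; any user-specified text would do.
template = layout_template(["Gallia est omnis divisa", "in partes tres"])
for line in template:
    print(line)
```

Because the boxes and transcriptions are produced by construction, they stay valid for the stylized image as long as the second step preserves text and layout, which is the property the abstract emphasizes.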

L. Vögtlin and M. Drazyk—Both authors contributed equally to this work.

Notes

1. https://github.com/DIVA-DIA/Generating-Synthetic-Handwritten-Historical-Documents.
2. This can be done with any other word processing tool such as MS Word.

Acknowledgment

The work presented in this paper has been partially supported by the HisDoc III project funded by the Swiss National Science Foundation with the grant number 205120_169618. A big thanks to our co-workers Paul Maergner and Linda Studer for their support and advice.

Author information

Authors and Affiliations

Document Image and Voice Analysis Group (DIVA), University of Fribourg, Fribourg, Switzerland
Lars Vögtlin, Manuel Drazyk, Vinaychandran Pondenkandath, Michele Alberti & Rolf Ingold

Corresponding author

Correspondence to Lars Vögtlin.

Editor information

Editors and Affiliations

Universitat Autònoma de Barcelona, Barcelona, Spain
Josep Lladós
Lehigh University, Bethlehem, PA, USA
Daniel Lopresti
Kyushu University, Fukuoka-shi, Japan
Seiichi Uchida

About this paper

Cite this paper

Vögtlin, L., Drazyk, M., Pondenkandath, V., Alberti, M., Ingold, R. (2021). Generating Synthetic Handwritten Historical Documents with OCR Constrained GANs. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12823. Springer, Cham. https://doi.org/10.1007/978-3-030-86334-0_40

DOI: https://doi.org/10.1007/978-3-030-86334-0_40
Published: 02 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86333-3
Online ISBN: 978-3-030-86334-0
eBook Packages: Computer Science (R0)
