Abstract
Paper documents are rich sources of useful information and affect many aspects of human life. These documents may be printed or handwritten and present information as combinations of text, figures, tables, charts, etc. This paper proposes a method to segment text lines from heavily warped printed and handwritten documents, whether flatbed-scanned or camera-captured. The method applies semantic segmentation using a multi-scale convolutional neural network. Its line segmentation results outperform those of a number of similar methods reported in the literature. The performance and efficacy of the proposed method are corroborated by test results on a variety of publicly available datasets, including ICDAR, Alireza, IUPR, cBAD, Tobacco-800, IAM, and our own dataset.














Cite this article
Dutta, A., Garai, A., Biswas, S. et al. Segmentation of text lines using multi-scale CNN from warped printed and handwritten document images. IJDAR 24, 299–313 (2021). https://doi.org/10.1007/s10032-021-00370-8
DOI: https://doi.org/10.1007/s10032-021-00370-8