Abstract
Text line extraction is vital pre-requisite for various document processing tasks. This paper presents a novel approach for text line extraction which is based on Gaussian scale space and dedicated binarization that utilize the inherent structure of smoothed text document images. It enhances the text lines in the image using multi-scale anisotropic second derivative of Gaussian filter bank at the average height of the text line. It then applies a binarization, which is based on component-tree and is tailored towards line extraction. The final stage of the algorithm is based on an energy minimization framework for removing spurious text line and assigning connected components to lines. We have tested our approach on various datasets written in different languages at range of image quality and received high detection rates, which outperform state-of-the-art algorithms. Our MATLAB code is publicly available. (http://www.cs.bgu.ac.il/~rafico/LineExtraction.zip)
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Baechler, M., Liwicki, M., Ingold, R.: Text line extraction using dmlp classifiers for historical manuscripts. In: ICDAR, pp. 1029–1033 (2013)
Bar-Yosef, I., Hagbi, N., Kedem, K., Dinstein, I.: Text Line Segmentation for Degraded Handwritten Historical Documents. In: ICDAR, pp. 1161–1165 (2009)
Biller, O., Kedem, K., Dinstein, I., El-Sana, J.: Evolution maps for connected components in text documents. In: ICFHR, pp. 403–408 (2012)
Bukhari, S.S., Shafait, F., Breuel, T.M.: Script-independent handwritten textlines segmentation using active contours. In: ICDAR, pp. 446–450 (2009)
Cohen, R., Asi, A., Kedem, K., El-Sana, J., Dinstein, I.: Robust text and drawing segmentation algorithm for historical documents. In: HIP, pp. 110–117 (2013)
Delong, A., Osokin, A., Isack, H.N., Boykov, Y.: Fast approximate energy minimization with label costs. IJCV 96(1), 1–27 (2012)
Diem, M., Kleber, F., Sablatnig, R.: Text line detection for heterogeneous documents. In: ICDAR, pp. 743–747 (2013)
Gatos, B., Stamatopoulos, N., Louloudis, G.: ICDAR2009 handwriting segmentation contest. IJDAR 14(1), 25–33 (2011)
Gatos, B., Stamatopoulos, N., Louloudis, G.: ICFHR 2010 handwriting segmentation contest. In: ICFHR, pp. 737–742 (2010)
Li, Y., Zheng, Y., Doermann, D., Jaeger, S.: Script-independent text line segmentation in freestyle handwritten documents. IEEE TPAMI 30(8), 1313–1329 (2008)
Lindeberg, T.: Feature detection with automatic scale selection. IJCV 30(2), 79–116 (1998)
Naegel, B., Wendling, L.: A document binarization method based on connected operators. Pattern Recognition Letters 31(11), 1251–1259 (2010)
Rabaev, I., Biller, O., El-Sana, J., Kedem, K., Dinstein, I.: Text line detection in corrupted and damaged historical manuscripts. In: ICDAR, pp. 812–816 (2013)
Saabni, R., Asi, A., El-Sana, J.: Text line extraction for historical document images. Pattern Recognition Letters 35, 23–33 (2014)
Shi, Z., Setlur, S., Govindaraju, V.: A steerable directional local profile technique for extraction of handwritten arabic text lines. In: ICDAR, pp. 176–180 (2009)
Stamatopoulos, N., Gatos, B., Louloudis, G., Pal, U., Alaei, A.: ICDAR 2013 handwriting segmentation contest. In: ICDAR, pp. 1402–1406 (2013)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Cohen, R., Dinstein, I., El-Sana, J., Kedem, K. (2014). Using Scale-Space Anisotropic Smoothing for Text Line Extraction in Historical Documents. In: Campilho, A., Kamel, M. (eds) Image Analysis and Recognition. ICIAR 2014. Lecture Notes in Computer Science(), vol 8814. Springer, Cham. https://doi.org/10.1007/978-3-319-11758-4_38
Download citation
DOI: https://doi.org/10.1007/978-3-319-11758-4_38
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11757-7
Online ISBN: 978-3-319-11758-4
eBook Packages: Computer ScienceComputer Science (R0)