Abstract
Based on a study of the specific features of historical printed books and of the main error sources in classical page layout analysis methods, this paper presents a new way to index ancient printed documents. We have developed an approach based on extracting and quantifying the various orientations present in printed document images. The documents are first split into homogeneous areas, in which we analyze the significant orientations with a directional rose. Each kind of information (textual or graphical) is identified and labelled according to its orientation distribution. This characterization lets us separate textual regions from graphical ones while minimizing a priori knowledge. We evaluate our proposal through document image retrieval based on layout extraction criteria; the approach can also be used to precisely localize graphical parts in various types of documents. The system has been tested successfully on several ancient printed books of the Renaissance.
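As a rough illustration of the orientation analysis described above, the sketch below builds a simple directional rose for one image block. The abstract does not say how the rose is computed, so the gradient-histogram technique used here, the function names `directional_rose` and `dominant_orientation`, and the `n_bins` parameter are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def directional_rose(block, n_bins=36):
    """Magnitude-weighted histogram of edge orientations over [0, 180) degrees.

    Assumption: orientations are estimated from image gradients; the paper
    may compute its directional rose differently.
    """
    block = np.asarray(block, dtype=float)
    gy, gx = np.gradient(block)            # derivatives along rows, then columns
    magnitude = np.hypot(gx, gy)
    # An edge runs perpendicular to its gradient; fold the result into [0, 180).
    angle = (np.degrees(np.arctan2(gy, gx)) + 90.0) % 180.0
    width = 180.0 / n_bins
    # Bin k is centred on k * width, so 0 and ~180 degrees share the same bin.
    index = np.rint(angle / width).astype(int) % n_bins
    rose = np.bincount(index.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    total = rose.sum()
    return rose / total if total > 0 else rose

def dominant_orientation(rose):
    """Centre, in degrees, of the strongest bin of the rose."""
    return float(np.argmax(rose)) * 180.0 / len(rose)

# Synthetic block of horizontal stripes, a crude stand-in for text lines.
stripes = np.zeros((32, 32))
stripes[(np.arange(32) // 4) % 2 == 1, :] = 1.0
rose = directional_rose(stripes)
print(dominant_orientation(rose))
```

Under these assumptions, a block whose rose is concentrated in one or two bins could be treated as text-like, while a rose spread over many directions would suggest a graphical region. Any threshold separating the two cases would have to be tuned on real pages.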
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Journet, N., Mullot, R., Ramel, JY., Eglin, V. (2005). Ancient Printed Documents Indexation: A New Approach. In: Singh, S., Singh, M., Apte, C., Perner, P. (eds) Pattern Recognition and Data Mining. ICAPR 2005. Lecture Notes in Computer Science, vol 3686. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551188_64
DOI: https://doi.org/10.1007/11551188_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28757-5
Online ISBN: 978-3-540-28758-2
eBook Packages: Computer Science, Computer Science (R0)