Abstract
The design and performance of a content-based information retrieval system for handwritten documents are described. Indexing and retrieval are based on writer characteristics, textual content, and document metadata such as the writer profile. Documents are indexed using global image features (e.g., stroke width, slant, and word gaps) as well as local features that describe the shapes of characters and words. Image indexing is performed automatically through page analysis, page segmentation, line separation, word segmentation, and recognition of characters and words. Four types of query are permitted: (i) an entire document image; (ii) a region of interest (ROI) of a document; (iii) a word image; and (iv) text. Retrieval is based on a probabilistic model of information retrieval. The system has been implemented using Microsoft Visual C++ and a relational database system. This paper reports on the performance of the system in retrieving documents with the same content and with different content.
This work was supported in part by the U.S. Department of Justice, National Institute of Justice grant 2002-LT-BX-K007.
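The abstract states only that retrieval follows a probabilistic model of information retrieval and does not give the weighting formula. As a rough illustration of how a textual query might be ranked under such a model, the sketch below assumes the Robertson/Sparck Jones term weight in its form without relevance information, log((N - n + 0.5) / (n + 0.5)), applied to the words recognized in each document. The function names, the document identifiers, and the dictionary-based index are hypothetical and are not taken from the paper's Visual C++ implementation.

```python
import math
from collections import Counter


def rsj_weight(n_docs, doc_freq):
    """Robertson/Sparck Jones weight with no relevance information.

    n_docs is the collection size N; doc_freq is the number of
    documents n that contain the term. Assumed form, not the paper's.
    """
    return math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def rank(query_terms, index):
    """Rank documents for a textual query.

    index is a hypothetical mapping from document id to the set of
    words recognized in that document. A document's score is the sum
    of the weights of the query terms it contains.
    """
    n_docs = len(index)
    # Count, for each term, how many documents contain it.
    doc_freq = Counter(term for words in index.values() for term in set(words))
    scores = {}
    for doc_id, words in index.items():
        scores[doc_id] = sum(
            rsj_weight(n_docs, doc_freq[term])
            for term in set(query_terms)
            if term in words
        )
    # Highest score first; ties are broken by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    # Made-up toy index for illustration only.
    toy_index = {
        "doc_a": {"ransom", "note", "money"},
        "doc_b": {"letter", "note"},
        "doc_c": {"letter", "bank"},
        "doc_d": {"invoice"},
    }
    print(rank(["ransom", "note"], toy_index))
```

With this form of the weight, a term that occurs in more than half of the collection receives a negative weight, so a practical system would likely clamp or smooth it. The image-based query types (document, ROI, and word image) would need feature-vector comparison rather than term matching and are not covered by this sketch.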
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Srihari, S., Ganesh, A., Tomai, C., Shin, YC., Huang, C. (2004). Information Retrieval System for Handwritten Documents. In: Marinai, S., Dengel, A.R. (eds) Document Analysis Systems VI. DAS 2004. Lecture Notes in Computer Science, vol 3163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28640-0_28
DOI: https://doi.org/10.1007/978-3-540-28640-0_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23060-1
Online ISBN: 978-3-540-28640-0
eBook Packages: Springer Book Archive