Abstract
A database is only useful if it comes with a set of procedures for retrieving the elements relevant to users' needs. Many information retrieval (IR) techniques have been developed for automatic indexing and retrieval in document databases. Most of them build their indexes from the textual content of documents, and very few can handle graphical or image content without human annotation.
This paper describes an approach, similar to the bag-of-words technique, for automatically indexing databases of graphical document images, along with different ways of querying these databases once indexed. The approach is unsupervised: it proposes a set of automatically discovered symbols that can be combined with logical operators to build queries.
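As a rough illustration of how symbols might be combined with logical operators, the sketch below builds an inverted index from "bags of symbols" and evaluates Boolean queries with set operations. It is not the authors' implementation: the symbol identifiers (`s1`, `s2`, `s3`), the document names, and the helper functions `build_index` and `docs_with` are all hypothetical, and the symbol discovery step itself is assumed to have already been done.

```python
from typing import Dict, Set

# Hypothetical toy data: each document is represented by the set of
# symbol identifiers assumed to have been discovered in it.
bags: Dict[str, Set[str]] = {
    "doc1": {"s1", "s2"},
    "doc2": {"s2", "s3"},
    "doc3": {"s1", "s3"},
}


def build_index(bags: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Build an inverted index mapping each symbol to the documents containing it."""
    index: Dict[str, Set[str]] = {}
    for doc, symbols in bags.items():
        for symbol in symbols:
            index.setdefault(symbol, set()).add(doc)
    return index


def docs_with(index: Dict[str, Set[str]], symbol: str) -> Set[str]:
    """Return the documents containing a symbol (empty set if the symbol is unknown)."""
    return index.get(symbol, set())


index = build_index(bags)
all_docs = set(bags)

# Query "s1 AND NOT s3": intersection with the complement of the s3 postings.
and_not = docs_with(index, "s1") & (all_docs - docs_with(index, "s3"))

# Query "s2 OR s3": union of the two posting sets.
either = docs_with(index, "s2") | docs_with(index, "s3")

print(sorted(and_not))  # ['doc1']
print(sorted(either))   # ['doc1', 'doc2', 'doc3']
```

Using sets keeps AND, OR, and NOT as direct intersection, union, and difference operations; a weighted bag (symbol counts rather than presence) would need a different representation.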
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Barbu, E., Héroux, P., Adam, S., Trupin, É. (2006). Using Bags of Symbols for Automatic Indexing of Graphical Document Image Databases. In: Liu, W., Lladós, J. (eds) Graphics Recognition. Ten Years Review and Future Perspectives. GREC 2005. Lecture Notes in Computer Science, vol 3926. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11767978_18
DOI: https://doi.org/10.1007/11767978_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34711-8
Online ISBN: 978-3-540-34712-5
eBook Packages: Computer Science (R0)