Abstract
Digital preservation workflows that automatically acquire image collections are error-prone and require quality assurance. This paper presents an expert system that supports decision making for detecting duplicate pages in document image collections. Our goal is to build a reliable inference engine and a solid knowledge base from the output of an image processing tool that detects duplicates using computer vision methods. We employ artificial intelligence technologies (i.e., a knowledge base and expert rules) to emulate the way a human expert reasons over that knowledge. We present a statistical analysis of the information automatically extracted by the image comparison tool and a qualitative analysis of the aggregated knowledge.
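To illustrate the general idea of expert rules applied to the output of an image comparison tool, the following is a minimal sketch and not the system described in the paper. It assumes the tool emits one similarity score in [0, 1] per page pair; the record type `PairFact`, the thresholds `HIGH` and `LOW`, and the three rule functions are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class PairFact:
    """One knowledge-base entry: a compared page pair and its score (assumed format)."""
    page_a: str
    page_b: str
    similarity: float  # hypothetical similarity score in [0, 1]


# A rule inspects a fact and returns a conclusion, or None if it does not apply.
Rule = Callable[[PairFact], Optional[str]]

# Hypothetical thresholds; real values would have to be tuned on the collection.
HIGH = 0.9
LOW = 0.6


def rule_duplicate(fact: PairFact) -> Optional[str]:
    return "duplicate" if fact.similarity >= HIGH else None


def rule_review(fact: PairFact) -> Optional[str]:
    # Borderline pairs are handed to a human expert instead of being decided.
    return "needs_review" if LOW <= fact.similarity < HIGH else None


def rule_distinct(fact: PairFact) -> Optional[str]:
    return "distinct" if fact.similarity < LOW else None


RULES: List[Rule] = [rule_duplicate, rule_review, rule_distinct]


def infer(fact: PairFact, rules: List[Rule] = RULES) -> str:
    """Return the conclusion of the first rule that fires for this fact."""
    for rule in rules:
        conclusion = rule(fact)
        if conclusion is not None:
            return conclusion
    return "unknown"


def evaluate(knowledge_base: List[PairFact]) -> Dict[Tuple[str, str], str]:
    """Apply the rules to every page pair in the knowledge base."""
    return {(f.page_a, f.page_b): infer(f) for f in knowledge_base}
```

Keeping the rules as separate functions in an ordered list means a domain expert can add, remove, or reorder rules without touching the inference loop.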
This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Graf, R., Huber-Mörk, R., Schindler, A., Schlarb, S. (2012). An Expert System for Quality Assurance of Document Image Collections. In: Ioannides, M., Fritsch, D., Leissner, J., Davies, R., Remondino, F., Caffo, R. (eds) Progress in Cultural Heritage Preservation. EuroMed 2012. Lecture Notes in Computer Science, vol 7616. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34234-9_25
DOI: https://doi.org/10.1007/978-3-642-34234-9_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34233-2
Online ISBN: 978-3-642-34234-9
eBook Packages: Computer Science (R0)