Abstract.
This paper explores those aspects of document capture that are specific to cameras. Each of them must be addressed to close the gap between taking a photograph of a document and capturing the document itself. We present results in five areas: (1) framing documents using structured light, (2) robustly dealing with ambient illumination when capturing glossy documents, (3) improving text quality when using mosaiced color sensors, (4) robustly and passively recovering perspective and image plane skew using text flow, and (5) measuring and undoing page curl using structured light and an applicable surface model. The success of any subsequent document recognition depends heavily on completing these tasks successfully.
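Area (4) relies on text flow. Text lines that are parallel on the page converge towards a common vanishing point in a perspective image, and that point constrains the page orientation. The following is a minimal sketch of one standard way to estimate such a point. It assumes text-line segments have already been extracted as pairs of image points. The function names `line_through` and `vanishing_point` and the least-squares formulation are illustrative choices, not the authors' published method. The estimate is the point that minimises the summed squared perpendicular distance to all the lines.

```python
from math import hypot
from typing import Iterable, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def line_through(p: Point, q: Point) -> Tuple[float, float, float]:
    """Return (a, b, c) with a*x + b*y + c = 0 through p and q, scaled so a^2 + b^2 = 1."""
    (x1, y1), (x2, y2) = p, q
    # Homogeneous cross product (x1, y1, 1) x (x2, y2, 1).
    a, b, c = y1 - y2, x2 - x1, x1 * y2 - x2 * y1
    n = hypot(a, b)
    if n == 0:
        raise ValueError("degenerate segment: endpoints coincide")
    # Normalising makes a*x + b*y + c the signed perpendicular distance.
    return a / n, b / n, c / n


def vanishing_point(segments: Iterable[Segment], eps: float = 1e-9) -> Optional[Point]:
    """Hypothetical helper: least-squares intersection of the lines through the segments.

    Minimises sum((a*x + b*y + c)^2) over (x, y). Returns None when the
    lines are (near) parallel, i.e. the vanishing point lies at infinity.
    """
    saa = sab = sbb = sac = sbc = 0.0
    for p, q in segments:
        a, b, c = line_through(p, q)
        saa += a * a
        sab += a * b
        sbb += b * b
        sac += a * c
        sbc += b * c
    # Normal equations: [saa sab; sab sbb] [x, y]^T = -[sac, sbc]^T.
    det = saa * sbb - sab * sab
    if abs(det) < eps:
        return None
    # Solve the 2x2 system with Cramer's rule.
    x = (-sac * sbb + sab * sbc) / det
    y = (-saa * sbc + sab * sac) / det
    return x, y
```

For example, three segments lying on the lines y = 0.5x, y = 100 - 0.5x and y = 50 give a vanishing point of approximately (100, 50). Two horizontal segments give `None`. The inhomogeneous (x, y) formulation keeps the sketch short but cannot represent a vanishing point at infinity. A homogeneous version would use the eigenvector with the smallest eigenvalue of the summed outer products of the line vectors, and would handle that case directly.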
Received: 8 December 2003, Revised: 6 April 2004, Published online: 11 March 2005
Cite this article
Pollard, S., Pilu, M. Building cameras for capturing documents. IJDAR 7, 123–137 (2005). https://doi.org/10.1007/s10032-004-0129-0