Abstract
We show how to exploit temporal and spatial coherence to achieve efficient and effective text detection and decoding for a sensor suite moving through an environment in which text occurs at a variety of locations, scales, and orientations with respect to the observer. Our method uses simultaneous localization and mapping (SLAM) to extract planar “tiles” representing scene surfaces. Multiple observations of each tile, captured from different observer poses, are aligned using homography transformations. Text is detected using the Discrete Cosine Transform (DCT) and Maximally Stable Extremal Regions (MSER), and decoded by an Optical Character Recognition (OCR) engine. The decoded characters are then clustered into character blocks, from which a maximum-likelihood estimate (MLE) of the word configuration is obtained. This paper’s contributions include: (1) spatiotemporal fusion of tile observations via SLAM, prior to inspection, thereby improving the quality of the input data; and (2) combination of multiple noisy text observations into a single higher-confidence estimate of environmental text.
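To make the pipeline concrete, the sketch below illustrates the fusion-then-decode idea on a single tile, assuming OpenCV and Tesseract (via pytesseract) as stand-in components. In the paper the alignment homographies come from SLAM-estimated observer poses and the tile plane geometry; here feature matching is substituted, the DCT detection stage is omitted, and the MLE word-configuration step is not shown. Function names and parameters (align_to_reference, fuse_and_decode) are illustrative, not the authors' implementation.

import cv2
import numpy as np
import pytesseract


def align_to_reference(reference, observation):
    # Estimate a homography mapping the observation onto the reference tile image.
    # (In the paper this mapping is derived from SLAM pose estimates and the tile
    # plane, not from feature matching; ORB matching is a stand-in here.)
    orb = cv2.ORB_create(1000)
    kp_ref, des_ref = orb.detectAndCompute(reference, None)
    kp_obs, des_obs = orb.detectAndCompute(observation, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_obs, des_ref), key=lambda m: m.distance)[:200]
    src = np.float32([kp_obs[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_ref[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    if H is None:
        return observation  # fall back to the unaligned view
    h, w = reference.shape[:2]
    return cv2.warpPerspective(observation, H, (w, h))


def fuse_and_decode(tile_observations):
    # tile_observations: list of grayscale images of the same planar tile,
    # captured from different observer poses.
    reference = tile_observations[0]
    warped = [reference] + [align_to_reference(reference, obs)
                            for obs in tile_observations[1:]]
    fused = np.mean(np.stack(warped), axis=0).astype(np.uint8)

    # Use MSER as a cheap text-likeness gate before invoking the OCR engine
    # (the paper combines DCT and MSER for detection; DCT is omitted here).
    mser = cv2.MSER_create()
    regions, _ = mser.detectRegions(fused)
    if len(regions) == 0:
        return ""
    return pytesseract.image_to_string(fused)

The point this sketch illustrates is that fusion precedes inspection: the OCR engine runs once on the higher-quality fused tile rather than separately on each noisy observation.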
Acknowledgment
We thank the Andrea Bocelli Foundation for their support, and Javier Velez and Ben Mattinson for their contributions.
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this paper
Wang, H.C., Landa, Y., Fallon, M., Teller, S. (2014). Spatially Prioritized and Persistent Text Detection and Decoding. In: Iwamura, M., Shafait, F. (eds.) Camera-Based Document Analysis and Recognition. CBDAR 2013. Lecture Notes in Computer Science, vol. 8357. Springer, Cham. https://doi.org/10.1007/978-3-319-05167-3_1
DOI: https://doi.org/10.1007/978-3-319-05167-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05166-6
Online ISBN: 978-3-319-05167-3