Spatially Prioritized and Persistent Text Detection and Decoding

  • Conference paper
Camera-Based Document Analysis and Recognition (CBDAR 2013)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8357)

Abstract

We show how to exploit temporal and spatial coherence to achieve efficient and effective text detection and decoding for a sensor suite moving through an environment in which text occurs at a variety of locations, scales and orientations with respect to the observer. Our method uses simultaneous localization and mapping (SLAM) to extract planar “tiles” representing scene surfaces. Multiple observations of each tile, captured from different observer poses, are aligned using homography transformations. Text is detected using the Discrete Cosine Transform (DCT) and Maximally Stable Extremal Regions (MSER), and decoded by an Optical Character Recognition (OCR) engine. The decoded characters are then clustered into character blocks to obtain a maximum-likelihood estimate (MLE) of the word configuration. This paper’s contributions include: (1) spatiotemporal fusion of tile observations via SLAM, prior to inspection, thereby improving the quality of the input data; and (2) combination of multiple noisy text observations into a single higher-confidence estimate of environmental text.
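
The second contribution, combining several noisy readings of the same text into one higher-confidence estimate, can be illustrated with a small sketch. It is not the authors' algorithm: it assumes the readings are already aligned character by character, that each reading carries a single confidence value in (0, 1), and that a wrong character is equally likely to be any other symbol of an assumed 62-symbol alphabet. Under those assumptions, and as long as the confidences are above chance, picking the character with the largest sum of log-odds weights at each position is the maximum-likelihood choice among the observed characters. The names `vote_weight`, `fuse_readings` and `ALPHABET_SIZE` are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET_SIZE = 62  # assumption: A-Z, a-z and 0-9


def vote_weight(confidence, alphabet_size=ALPHABET_SIZE):
    """Log-odds weight of one reading under the symmetric-error model."""
    # Clamp so the logarithm stays finite for confidences of exactly 0 or 1.
    p = min(max(confidence, 1e-6), 1.0 - 1e-6)
    return math.log(p * (alphabet_size - 1) / (1.0 - p))


def fuse_readings(readings):
    """Fuse (text, confidence) readings of one word into a single string."""
    if not readings:
        return ""
    length = len(readings[0][0])
    if any(len(text) != length for text, _ in readings):
        raise ValueError("this sketch assumes readings are already aligned")
    fused = []
    for i in range(length):
        # Accumulate the weighted votes for each character seen at position i.
        scores = defaultdict(float)
        for text, confidence in readings:
            scores[text[i]] += vote_weight(confidence)
        fused.append(max(scores, key=scores.get))
    return "".join(fused)


# Four hypothetical readings of the same sign from different poses.
readings = [("EXIT", 0.9), ("EX1T", 0.6), ("EXIT", 0.7), ("FXIT", 0.5)]
print(fuse_readings(readings))  # prints "EXIT"
```

A real system would also have to align readings of different lengths and handle per-character confidences, which this sketch leaves out.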

Acknowledgment

We thank the Andrea Bocelli Foundation for their support, and Javier Velez and Ben Mattinson for their contributions.

Corresponding author

Correspondence to Hsueh-Cheng Wang.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Wang, HC., Landa, Y., Fallon, M., Teller, S. (2014). Spatially Prioritized and Persistent Text Detection and Decoding. In: Iwamura, M., Shafait, F. (eds) Camera-Based Document Analysis and Recognition. CBDAR 2013. Lecture Notes in Computer Science, vol 8357. Springer, Cham. https://doi.org/10.1007/978-3-319-05167-3_1

  • DOI: https://doi.org/10.1007/978-3-319-05167-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-05166-6

  • Online ISBN: 978-3-319-05167-3

  • eBook Packages: Computer Science, Computer Science (R0)
