Abstract
For news video images, caption recognition is a useful and important step in content understanding. Caption locating is usually the first step of caption recognition, and this paper proposes a simple but effective caption locating algorithm, the maximum feature score region (MFSR) based method, which consists of two main stages. In the first stage, the upper and lower caption boundaries are obtained from the projection of the edge map. In the second stage, the maximum feature score region is defined and used to obtain the left and right boundaries. Experiments show that the proposed MFSR based method achieves superior and robust performance on news video images of different types.
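The two-stage idea can be illustrated with a rough sketch. The abstract does not give the edge detector, the feature score, or the exact MFSR definition, so everything below is an assumption made for illustration: the edge map is a plain horizontal intensity difference, each row or column is scored as its edge projection minus a fraction (`ratio`, a hypothetical parameter) of the largest projection, and the "maximum score region" is taken to be the contiguous interval with the largest total score. The function names are hypothetical and this is not the authors' implementation.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def edge_map(img: Image) -> Image:
    """Absolute difference to the left neighbour; an assumed stand-in edge detector."""
    return [[abs(row[x] - row[x - 1]) if x > 0 else 0.0 for x in range(len(row))]
            for row in img]


def max_score_interval(scores: List[float]) -> Tuple[int, int]:
    """Inclusive (start, end) of the contiguous run with the largest total score."""
    best_sum, best = float("-inf"), (0, 0)
    run_sum, run_start = 0.0, 0
    for i, s in enumerate(scores):
        if run_sum <= 0:
            # A non-positive running total cannot help; start a new run here.
            run_sum, run_start = s, i
        else:
            run_sum += s
        if run_sum > best_sum:
            best_sum, best = run_sum, (run_start, i)
    return best


def locate_caption(img: Image, ratio: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (top, bottom, left, right), all inclusive, of the assumed caption box."""
    edges = edge_map(img)

    # Stage 1: project the edge map onto the vertical axis (one sum per row)
    # and keep the band of rows whose projection stands out.
    row_proj = [sum(r) for r in edges]
    row_thr = ratio * max(row_proj)
    top, bottom = max_score_interval([p - row_thr for p in row_proj])

    # Stage 2: inside that band, score each column and take the interval
    # with the maximum total score as the left/right extent.
    band = edges[top:bottom + 1]
    col_proj = [sum(r[x] for r in band) for x in range(len(img[0]))]
    col_thr = ratio * max(col_proj)
    left, right = max_score_interval([p - col_thr for p in col_proj])
    return top, bottom, left, right
```

Subtracting a threshold before searching for the best interval makes weak rows and columns count against a region, so the selected interval stops where edge density drops off instead of spanning the whole frame. A real detector would likely need a proper edge operator and a score tuned to caption strokes.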
Acknowledgement
This work was supported by the National Natural Science Foundation of China (Nos. 61272394, 61201395 and 61472119), the Program for Science & Technology Innovation Talents in Universities of Henan Province (No. 13HASTIT039), the Henan Polytechnic University Innovative Research Team (No. T2014-3), and the Henan Polytechnic University Fund for Distinguished Young Scholars (No. J2013-2).
Author information
Additional information
Recommended by Associate Editor Victor Becerra
Zhi-Heng Wang received the B. Sc. degree in mechatronic engineering from Beijing Institute of Technology, China in 2004, and the Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences, China in 2009. Currently, he is an associate professor at School of Computer Science and Technique, Henan Polytechnic University, China.
His research interests include computer vision, pattern recognition, and image processing.
Chao Guo received the B. Sc. degree from Henan Polytechnic University, China in 2013. Currently, he is a master's student at School of Computer Science and Technology, Henan Polytechnic University, China.
His research interests include image processing.
Hong-Min Liu received the B. Sc. degree in electrical & information engineering from Xi’dian University, China in 2004, and her Ph.D. degree from the Institute of Electronics, Chinese Academy of Sciences, China in 2009. Currently, she works as an associate professor at School of Computer Science and Technique, Henan Polytechnic University, China.
Her research interests include image processing, especially on feature detection and matching.
Zhan-Qiang Huo received the B. Sc. degree in mathematics and applied mathematics from Hebei Normal University of Science & Technology, China in 2003. He received his M. Sc. degree in computer software and theory in 2006 and his Ph.D. degree in circuits and systems in 2009, both from Yanshan University, China. Currently, he is an associate professor in the College of Computer Science and Technology at Henan Polytechnic University, China. He has published about 20 refereed journal and conference papers.
His research interests include computer software and theory, queuing systems and digital image processing.
About this article
Cite this article
Wang, ZH., Guo, C., Liu, HM. et al. MFSR: Maximum feature score region-based captions locating in news video images. Int. J. Autom. Comput. 15, 454–461 (2018). https://doi.org/10.1007/s11633-015-0943-5
DOI: https://doi.org/10.1007/s11633-015-0943-5