Abstract
This paper presents a robust approach to extracting content from instructional videos for handwriting recognition, indexing and retrieval, and other e-learning applications. For instructional videos of chalkboard presentations, retrieving the handwritten content on the board (e.g., characters, drawings, figures) is the first, prerequisite step towards further exploration of the video content. Content extraction from instructional videos remains challenging, however, because of video noise, non-uniform color in the board regions, changes in lighting conditions during a video session, camera movements, and unavoidable occlusions by instructors. To solve this problem, we first segment video frames into multiple regions and estimate the parameters of the board regions by statistical analysis of the pixels in the dominant regions. We then separate the board regions from irrelevant regions using a probabilistic classifier. Finally, we combine top-hat morphological processing with a gradient-based adaptive thresholding technique to retrieve content pixels from the board regions. Evaluation of the content extraction results on four full-length instructional videos shows that the proposed method performs well. Extracting content text facilitates research on fuller exploitation of instructional videos, such as content enhancement, indexing, and retrieval.
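The final step named above, top-hat processing followed by thresholding, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a grayscale frame stored as a list of rows, a flat square structuring element, and bright chalk on a dark board. The paper's gradient-based adaptive threshold is not specified in the abstract, so a plain global mean-plus-k-sigma threshold is used here as a stand-in. All function names and parameters (`white_top_hat`, `content_mask`, `radius`, `k`) are hypothetical.

```python
import statistics

def _window_filter(img, radius, reduce_fn):
    """Apply reduce_fn (min or max) over a square window clipped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = reduce_fn(window)
    return out

def white_top_hat(img, radius=1):
    """Image minus its morphological opening (erosion, then dilation).

    Thin bright structures narrower than the window survive; the slowly
    varying board background is largely cancelled.
    """
    opened = _window_filter(_window_filter(img, radius, min), radius, max)
    return [[p - o for p, o in zip(row, orow)] for row, orow in zip(img, opened)]

def content_mask(top_hat, k=2.0):
    """Stand-in threshold (assumption): keep pixels above mean + k * std."""
    values = [v for row in top_hat for v in row]
    threshold = statistics.mean(values) + k * statistics.pstdev(values)
    return [[v > threshold for v in row] for row in top_hat]

# Toy frame: a dark board with a left-to-right illumination ramp (40 + x)
# and a one-pixel-wide vertical chalk stroke in column 3, rows 1 to 3.
frame = [[40 + x for x in range(7)] for _ in range(5)]
for y in (1, 2, 3):
    frame[y][3] = 200

mask = content_mask(white_top_hat(frame))
stroke = {(y, x) for y, row in enumerate(mask) for x, hit in enumerate(row) if hit}
print(sorted(stroke))
```

The window radius must exceed half the stroke width for the opening to remove the stroke. On real frames it would be chosen from the expected chalk thickness, and the threshold would be computed locally rather than from global statistics.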
Cite this article
Choudary, C., Liu, T. Extracting content from instructional videos by statistical modelling and classification. Pattern Anal Applic 10, 69–81 (2007). https://doi.org/10.1007/s10044-006-0051-9