Extracting content from instructional videos by statistical modelling and classification

Choudary, Chekuri; Liu, Tiecheng

doi:10.1007/s10044-006-0051-9

Theoretical Advances
Published: 15 November 2006

Volume 10, pages 69–81, (2007)
Pattern Analysis and Applications

Chekuri Choudary¹ & Tiecheng Liu¹

Abstract

This paper presents a robust approach to extracting content from instructional videos for handwriting recognition, indexing and retrieval, and other e-learning applications. For instructional videos of chalkboard presentations, retrieving the handwritten content on the board (e.g., characters, drawings, figures) is a prerequisite first step towards further exploration of instructional video content. However, content extraction from instructional videos remains challenging because of video noise, non-uniform color in board regions, changes in lighting conditions during a video session, camera movements, and unavoidable occlusions by instructors. To solve this problem, we first segment video frames into multiple regions and estimate the parameters of the board regions based on statistical analysis of the pixels in dominant regions. We then accurately separate the board regions from irrelevant regions using a probabilistic classifier. Finally, we combine top-hat morphological processing with a gradient-based adaptive thresholding technique to retrieve content pixels from the board regions. Evaluation of the content extraction results on four full-length instructional videos shows that the proposed method performs well. The extraction of content text facilitates research on fuller exploitation of instructional videos, such as content enhancement, indexing, and retrieval.
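
The last two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the probabilistic classifier or the gradient-based adaptive threshold, so the sketch substitutes a single Gaussian with a k-sigma rule for the board model and a fixed threshold after a white top-hat transform. It also assumes grayscale frames with bright chalk on a darker board. All function names and parameters (`estimate_board_model`, `is_board`, `white_top_hat`, `extract_content`, `k`, `r`, `threshold`) are hypothetical.

```python
from statistics import mean, pstdev


def estimate_board_model(pixels):
    """Fit a 1-D Gaussian (mean, std) to intensities sampled from the dominant region."""
    return mean(pixels), max(pstdev(pixels), 1e-6)


def is_board(value, mu, sigma, k=3.0):
    """Stand-in classifier: a pixel is 'board' if it lies within k std devs of the mean."""
    return abs(value - mu) <= k * sigma


def _window_filter(img, op, r):
    """Apply op (min or max) over a (2r+1) x (2r+1) window, truncated at the borders."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - r), min(h, y + r + 1))
                      for i in range(max(0, x - r), min(w, x + r + 1))]
            row.append(op(window))
        out.append(row)
    return out


def white_top_hat(img, r=1):
    """White top-hat: the image minus its morphological opening (erosion, then dilation)."""
    opened = _window_filter(_window_filter(img, min, r), max, r)
    return [[p - o for p, o in zip(prow, orow)] for prow, orow in zip(img, opened)]


def extract_content(img, r=1, threshold=50):
    """Mark pixels whose top-hat response exceeds a fixed threshold.

    Assumption: the fixed threshold replaces the paper's gradient-based
    adaptive threshold, which the abstract does not describe in detail.
    """
    response = white_top_hat(img, r)
    return [[1 if v > threshold else 0 for v in row] for row in response]


# Synthetic 7x7 frame: board intensity 40 with a one-pixel-wide chalk stroke of 200.
frame = [[200 if x == 3 else 40 for x in range(7)] for _ in range(7)]
mask = extract_content(frame, r=1, threshold=50)
mu, sigma = estimate_board_model([40, 42, 38, 41, 39])
```

The top-hat step suppresses structures wider than the window, so a slowly varying board background is removed while thin strokes survive. The window radius `r` should therefore exceed half the expected stroke width.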

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, 29208, USA
Chekuri Choudary & Tiecheng Liu

Corresponding author

Correspondence to Chekuri Choudary.

About this article

Cite this article

Choudary, C., Liu, T. Extracting content from instructional videos by statistical modelling and classification. Pattern Anal Applic 10, 69–81 (2007). https://doi.org/10.1007/s10044-006-0051-9

Received: 08 January 2006
Accepted: 06 September 2006
Published: 15 November 2006
Issue Date: May 2007
DOI: https://doi.org/10.1007/s10044-006-0051-9
