Abstract
Visual concept detection is important for accessing visual information at the level of objects and scene types. The current state-of-the-art in visual concept detection and annotation tasks is based on the bag-of-words model. Within the bag-of-words model, points are first sampled according to some strategy, then the area around each of these points is described using color descriptors. These descriptors are then vector-quantized against a codebook of prototypical descriptors, which results in a fixed-length representation of the image. Based on these representations, visual concept models are trained. In this chapter, we discuss the design choices within the bag-of-words model and their implications for concept detection accuracy.
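The sampling, description, and vector-quantization steps summarized above can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the regular sampling grid, the mean-RGB patch descriptor, the hard nearest-neighbor assignment, and all function names (dense_sample, describe, quantize, bag_of_words) are assumptions chosen only to keep the example small. It stops at the fixed-length representation and does not train a concept model.

```python
import math
from collections import Counter


def dense_sample(width, height, step):
    """Return (x, y) points on a regular grid (one possible sampling strategy)."""
    return [(x, y)
            for y in range(step // 2, height, step)
            for x in range(step // 2, width, step)]


def describe(image, point, radius):
    """Toy color descriptor (an assumption): mean R, G, B of the patch around a point."""
    x0, y0 = point
    height, width = len(image), len(image[0])
    sums, count = [0.0, 0.0, 0.0], 0
    # Clip the square patch to the image borders.
    for y in range(max(0, y0 - radius), min(height, y0 + radius + 1)):
        for x in range(max(0, x0 - radius), min(width, x0 + radius + 1)):
            for c in range(3):
                sums[c] += image[y][x][c]
            count += 1
    return [s / count for s in sums]


def quantize(descriptor, codebook):
    """Hard assignment: index of the codebook entry nearest in Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: math.dist(descriptor, codebook[k]))


def bag_of_words(image, codebook, step=4, radius=2):
    """Fixed-length, L1-normalized histogram of codeword counts for one image."""
    points = dense_sample(len(image[0]), len(image), step)
    words = Counter(quantize(describe(image, p, radius), codebook)
                    for p in points)
    total = sum(words.values())
    # One bin per codeword, so the length is independent of the image size.
    return [words[k] / total for k in range(len(codebook))]
```

For an 8x8 image whose left half is pure red and right half pure blue, with a hypothetical three-entry codebook of red, blue, and green and with step=4 and radius=1, the four sampled patches split evenly between the first two codewords, giving the histogram [0.5, 0.5, 0.0]. The vector length equals the codebook size regardless of image dimensions, which is what makes the representation usable as input to a classifier.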
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
van de Sande, K.E.A., Gevers, T. (2010). University of Amsterdam at the Visual Concept Detection and Annotation Tasks. In: Müller, H., Clough, P., Deselaers, T., Caputo, B. (eds) ImageCLEF. The Information Retrieval Series, vol 32. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15181-1_18
DOI: https://doi.org/10.1007/978-3-642-15181-1_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15180-4
Online ISBN: 978-3-642-15181-1
eBook Packages: Computer Science, Computer Science (R0)