Abstract
This paper evaluates and compares models and methods for learning to recognize basic entities in images in an unsupervised setting. In other words, we want to discover the objects present in the images by analyzing unlabeled data and searching for recurring patterns. We experiment with various baseline methods, methods based on latent variable models, and spectral clustering methods. Results are presented and compared on subsets of both Caltech256 and MSRC2, data sets that are larger, more challenging, and contain more object classes than those previously reported in the literature. We also propose a rigorous framework for evaluating unsupervised object discovery methods.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Tuytelaars, T., Lampert, C.H., Blaschko, M.B. et al. Unsupervised Object Discovery: A Comparison. Int J Comput Vis 88, 284–302 (2010). https://doi.org/10.1007/s11263-009-0271-8
DOI: https://doi.org/10.1007/s11263-009-0271-8