Abstract
Learning models for detecting and classifying object categories is a challenging problem in machine vision. While discriminative approaches to learning and classification have, in principle, superior performance, generative approaches provide many useful features, one of which is the ability to naturally establish explicit correspondence between model components and scene features; this, in turn, allows for the handling of missing data and unsupervised learning in clutter. We explore a hybrid generative/discriminative approach, using ‘Fisher Kernels’ (Jaakkola, T., et al. in Advances in neural information processing systems, Vol. 11, pp. 487–493, 1999), which retains most of the desirable properties of generative methods while increasing classification performance through a discriminative setting. Our experiments, conducted on a number of popular benchmarks, show strong performance improvements over the corresponding generative approach. In addition, we demonstrate how this hybrid learning paradigm can be extended to address several outstanding challenges within computer vision, including how to combine multiple object models and how to learn with unlabeled data.
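As a rough illustration of the Fisher kernel idea cited in the abstract, the sketch below maps each sample to its Fisher score (the gradient of the log-likelihood with respect to the generative model's parameters) and takes inner products of those scores, weighted by the inverse Fisher information, as the kernel. It makes several assumptions that are not from the article: a toy diagonal Gaussian stands in for the generative object model, the Fisher information is estimated empirically from the scores, and the names `fisher_scores` and `fisher_kernel` are hypothetical.

```python
import numpy as np


def fisher_scores(X, mu, sigma):
    """Fisher score of each row of X under N(mu, diag(sigma^2)).

    Returns the gradient of log p(x | mu, sigma) with respect to
    (mu, sigma), stacked into one vector per sample.
    ASSUMPTION: a diagonal Gaussian is a toy stand-in for the
    generative object model; it is not the article's model.
    """
    d_mu = (X - mu) / sigma**2
    d_sigma = -1.0 / sigma + (X - mu) ** 2 / sigma**3
    return np.hstack([d_mu, d_sigma])


def fisher_kernel(U_a, U_b, info=None):
    """K(a, b) = U_a^T I^{-1} U_b; falls back to a plain dot product
    (identity information matrix) when `info` is None."""
    if info is None:
        return U_a @ U_b.T
    return U_a @ np.linalg.solve(info, U_b.T)


# Synthetic data: 200 samples with 3 features each (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(loc=1.0, scale=2.0, size=(200, 3))

# "Generative" step: maximum-likelihood fit of the Gaussian.
mu = X.mean(axis=0)
sigma = X.std(axis=0)  # ddof=0 gives the ML estimate

# Fisher scores and an empirical estimate of the Fisher information.
U = fisher_scores(X, mu, sigma)
info = U.T @ U / len(U)

# Kernel matrix that a discriminative classifier could consume.
K = fisher_kernel(U, U, info)
```

A kernel matrix like `K` could then be handed to any kernel classifier that accepts a precomputed Gram matrix, such as a support vector machine, which is where the discriminative half of the hybrid comes in.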
References
Burl, M., & Perona, P. (1996). Recognition of planar object classes. In Computer vision and pattern recognition (CVPR) (p. 223).
Chapelle, O., Vapnik, V., Bousquet, O., & Mukherjee, S. (2002). Choosing multiple parameters for support vector machines. Machine Learning, 46, 131–159.
Crowley, J. L. (1984). A representation for shape based on peaks and ridges in the difference of low pass transform. In Pattern analysis and machine intelligence (PAMI).
Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B, 39, 1–38.
Dorko, G., & Schmid, C. (2005). Object class recognition using discriminative local features (Technical Report RR-5497). INRIA.
Fei-Fei, L., Fergus, R., & Perona, P. (2004). Learning generative visual models from few training examples. In Computer vision and pattern recognition (CVPR) workshop on GMBV.
Fergus, R. (2005). Visual object recognition. Thesis, Department of Engineering Science, University of Oxford.
Fergus, R., Perona, P., & Zisserman, A. (2003). Object class recognition by unsupervised scale-invariant learning. In Computer vision and pattern recognition (CVPR) (Vol. 2, p. 264).
Gold, C., Holub, A., & Sollich, P. (2005). Bayesian approach to feature selection and parameter tuning for support vector machine classifiers. In Neural Networks.
Holub, A., Welling, M., & Perona, P. (2005). Combining generative models and Fisher kernels for object class recognition. In International conference on computer vision (ICCV).
Holub, A., & Perona, P. (2005). A discriminative framework for modeling object class. In Computer vision and pattern recognition (CVPR).
Jaakkola, T., Diekhans, M., & Haussler, D. (1999). Exploiting generative models in discriminative classifiers. In Advances in neural information processing systems (NIPS) (Vol. 11, pp. 487–493).
Jaakkola, T., & Haussler, D. (1999). Probabilistic kernel regression models. In Proceedings of the seventh international workshop on artificial intelligence and statistics.
Kadir, T., & Brady, M. (2001). Saliency, scale and image description. International Journal of Computer Vision, 45(2), 83–105.
Leibe, B., & Schiele, B. (2004). Scale-invariant object categorization using a scale-adaptive mean-shift search. In DAGM-symposium (pp. 145–153).
Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 91–110.
Ng, A., & Jordan, M. (2002). On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. In Advances in neural information processing systems (NIPS) (Vol. 12).
Opelt, A., Fussenegger, M., Pinz, A., & Auer, P. (2004). Weak hypotheses and boosting for generic object detection and recognition. In European conference on computer vision (ECCV) (pp. 71–84).
Opper, M., & Winther, O. (2000). Gaussian processes and SVM: Mean field and leave-one-out. In Advances in large margin classifiers (pp. 311–326). Cambridge: MIT Press.
Schneiderman, H. (2004). Learning a restricted Bayesian network for object detection. In Computer vision and pattern recognition (CVPR) (pp. 639–646).
Schoelkopf, B., & Smola, A. (2002). Learning with kernels. Cambridge: MIT Press.
Seeger, M. (2002). Covariance kernels from Bayesian generative models. In Advances in neural information processing systems (NIPS) (Vol. 14, pp. 905–912).
Shawe-Taylor, J., & Cristianini, N. (2004). Kernel methods for pattern analysis. Cambridge: Cambridge University Press.
Torralba, A., Murphy, K. P., & Freeman, W. T. (2004). Sharing visual features for multiclass and multiview object detection. In Computer vision and pattern recognition (CVPR).
Tsuda, K., Akaho, S., Kawanabe, M., & Müller, K.-R. (2003). Asymptotic properties of the Fisher kernel. citeseer.ist.psu.edu/tsuda03asymptotic.html.
Ullman, S., Vidal-Naquet, M., & Sali, E. (2002). Visual features of intermediate complexity and their use in classification. In Nature neuroscience (pp. 682–687).
Vapnik, V. (1998). Statistical learning theory. New York: Wiley–Interscience.
Vasconcelos, N., Ho, P., & Moreno, P. (2004). The Kullback–Leibler kernel as a framework for discriminant and localized representations for visual recognition. In European conference on computer vision (ECCV) (pp. 430–441).
Wallraven, C., Caputo, B., & Graf, A. B. A. (2003). Recognition with local features: the kernel recipe. In International conference on computer vision (ICCV) (pp. 257–264).
Weber, M., Welling, M., & Perona, P. (2000). Towards automatic discovery of object categories. In Computer vision and pattern recognition (CVPR) (p. 2101).
Cite this article
Holub, A.D., Welling, M. & Perona, P. Hybrid Generative-Discriminative Visual Categorization. Int J Comput Vis 77, 239–258 (2008). https://doi.org/10.1007/s11263-007-0084-6
DOI: https://doi.org/10.1007/s11263-007-0084-6