Bayesian non-parametrics and the probabilistic approach to modelling

Zoubin Ghahramani

Philos Trans A Math Phys Eng Sci. 2012 Dec 31;371(1984):20110553. doi: 10.1098/rsta.2011.0553. Print 2013 Feb 13.
Abstract

Modelling is fundamental to many fields of science and engineering. A model can be thought of as a representation of possible data one could predict from a system. The probabilistic approach to modelling uses probability theory to express all aspects of uncertainty in the model. The probabilistic approach is synonymous with Bayesian modelling, which simply uses the rules of probability theory in order to make predictions, compare alternative models, and learn model parameters and structure from data. This simple and elegant framework is most powerful when coupled with flexible probabilistic models. Flexibility is achieved through the use of Bayesian non-parametrics. This article provides an overview of probabilistic modelling and an accessible survey of some of the main tools in Bayesian non-parametrics. The survey covers the use of Bayesian non-parametrics for modelling unknown functions, density estimation, clustering, time-series modelling, and representing sparsity, hierarchies, and covariance structure. More specifically, it gives brief non-technical overviews of Gaussian processes, Dirichlet processes, infinite hidden Markov models, Indian buffet processes, Kingman's coalescent, Dirichlet diffusion trees and Wishart processes.
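
For concreteness, the rules of probability theory referred to above amount to a small set of equations. The following is a minimal summary in standard notation (a gloss of ours, not reproduced from the article): for a model m with parameters θ and observed data D,

    % Parameter learning: Bayes' rule gives the posterior over the parameters.
    P(\theta \mid \mathcal{D}, m) = \frac{P(\mathcal{D} \mid \theta, m)\, P(\theta \mid m)}{P(\mathcal{D} \mid m)}

    % Prediction: average the likelihood of new data x over the parameter posterior.
    P(x \mid \mathcal{D}, m) = \int P(x \mid \theta, \mathcal{D}, m)\, P(\theta \mid \mathcal{D}, m)\, \mathrm{d}\theta

    % Model comparison: models are scored by the marginal likelihood (evidence),
    % obtained by integrating the likelihood over the prior.
    P(\mathcal{D} \mid m) = \int P(\mathcal{D} \mid \theta, m)\, P(\theta \mid m)\, \mathrm{d}\theta,
    \qquad
    P(m \mid \mathcal{D}) = \frac{P(\mathcal{D} \mid m)\, P(m)}{P(\mathcal{D})}

The marginal likelihood P(D|m) in the last line is the quantity plotted in figure 1b and sketched schematically in figure 2.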


Figures

Figure 1.
Marginal likelihoods, Occam’s razor and overfitting: consider modelling a function y=f(x)+ϵ describing the relationship between some input variable x and some output or response variable y. (a) The red dots in the plots on the left-hand side are a dataset of eight (x,y) pairs of points. There are many possible functions f that could model the given data. Let us consider polynomials of different order, ranging from constant (M=0), linear (M=1), quadratic (M=2), etc., to seventh order (M=7). The blue curves depict maximum-likelihood polynomials fit to the data under Gaussian noise assumptions (i.e. least-squares fits). Clearly, the M=7 polynomial can fit the data perfectly, but it seems to be overfitting wildly, predicting that the function will shoot off up or down between neighbouring observed data points. By contrast, the constant polynomial may be underfitting, in the sense that it might not pick up some of the structure in the data. The green curves indicate 20 random samples from the Bayesian posterior of polynomials of different order given this data. A Gaussian prior was used for the coefficients, and an inverse gamma prior on the noise variance (these conjugate choices mean that the posterior can be analytically integrated). The samples show that there is considerable posterior uncertainty given the data, and also that the maximum-likelihood estimate can be very different from a typical sample from the posterior. (b) The normalized model evidence or marginal likelihood for this model, P(Y|M), is plotted as a function of the model order, where the dataset Y consists of the eight observed output y values. Note that model orders ranging from M=0 to M=3 have considerably higher marginal likelihood than other model orders, which seems plausible given the data. Higher-order models, M>3, have relatively much smaller marginal likelihood, which is not visible on this scale. The decrease in marginal likelihood as a function of model order is a reflection of the automatic Occam’s razor that results from Bayesian marginalization.
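
The construction behind this figure can be reproduced approximately in a few lines. The sketch below is not the paper's code or data: it uses made-up (x, y) points and, for simplicity, a fixed noise variance sigma2 in place of the inverse-gamma prior mentioned in the caption, together with a zero-mean Gaussian prior of precision alpha on the polynomial coefficients.

    # Approximate reconstruction of the figure 1 experiment (assumptions noted above):
    # least-squares polynomial fits of order M = 0..7, plus the Bayesian log evidence
    # log P(y | M) obtained by integrating the coefficients out analytically.
    import numpy as np
    from scipy.stats import multivariate_normal

    rng = np.random.default_rng(0)
    x = np.linspace(-1, 1, 8)                          # placeholder inputs (8 points)
    y = np.sin(3 * x) + 0.1 * rng.standard_normal(8)   # placeholder responses

    sigma2 = 0.1 ** 2   # fixed noise variance (an assumption, not the caption's prior)
    alpha = 1.0         # precision of the Gaussian prior on the coefficients

    for M in range(8):
        Phi = np.vander(x, M + 1, increasing=True)     # design matrix [1, x, ..., x^M]

        # Maximum-likelihood (least-squares) fit, as for the blue curves in panel (a).
        w_ml, *_ = np.linalg.lstsq(Phi, y, rcond=None)

        # With w ~ N(0, alpha^-1 I) and Gaussian noise, the marginal distribution of y
        # is N(0, sigma2 I + alpha^-1 Phi Phi^T); its density at the observed y is the
        # evidence P(y | M) plotted (normalized) in panel (b).
        cov = sigma2 * np.eye(len(x)) + Phi @ Phi.T / alpha
        log_evidence = multivariate_normal(mean=np.zeros(len(x)), cov=cov).logpdf(y)

        print(f"M={M}  log evidence = {log_evidence:7.2f}")

With data of this kind, the log evidence typically peaks at a low-to-moderate order and falls off towards M=7, mirroring the automatic Occam’s razor of panel (b), even though the least-squares fit error keeps decreasing with M.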
Figure 2.
An illustration of Occam’s razor. Consider all possible datasets of some fixed size n. Competing probabilistic models correspond to alternative distributions over the datasets. Here, we have illustrated three possible models that spread their probability mass in different ways over these possible datasets. A complex model (shown in blue) spreads its mass over many more possible datasets, whereas a simple model (shown in green) concentrates its mass on a smaller fraction of possible data. Because probabilities have to sum to one, the complex model spreads its mass at the cost of not being able to model simple datasets as well as a simple model; this normalization is what results in an automatic Occam’s razor. Given any particular dataset, here indicated by the dotted line, we can use the marginal likelihood to reject both overly simple and overly complex models. This figure is inspired by a figure from MacKay [10], and an actual realization of this figure on a toy classification problem is discussed in Murray & Ghahramani [11].
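
A toy numerical analogue of this picture (a construction of ours, not the classification example from Murray & Ghahramani [11]) is to enumerate every binary dataset of length n and compare how two models spread probability over them: a simple model S, a fair coin with no free parameters, and a more flexible model C, a coin whose unknown bias is integrated out under a uniform Beta(1,1) prior.

    # Each model assigns a probability to every possible dataset; the masses must sum to 1.
    from itertools import product
    from math import comb

    n = 5
    p_simple, p_complex = {}, {}
    for bits in product([0, 1], repeat=n):
        k = sum(bits)                              # number of ones in this dataset
        p_simple[bits] = 0.5 ** n                  # model S: same mass on every dataset
        # model C: integral of theta^k (1-theta)^(n-k) over theta in [0,1]
        #          = k! (n-k)! / (n+1)!  =  1 / ((n+1) * C(n, k))
        p_complex[bits] = 1.0 / ((n + 1) * comb(n, k))

    print(sum(p_simple.values()), sum(p_complex.values()))        # both sum to 1.0
    print(p_simple[(1,) * n], p_complex[(1,) * n])                # extreme dataset: C wins
    print(p_simple[(1, 0, 1, 0, 1)], p_complex[(1, 0, 1, 0, 1)])  # balanced dataset: S wins

Because C must spread its unit of probability over a wider range of datasets, it assigns less mass than S to the unremarkable balanced datasets and more to the extreme ones, which is exactly the trade-off the figure illustrates.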
Figure 3.
A sample from an IBP (Indian buffet process) matrix, with columns reordered. Each row has, on average, 10 ones. Note the logarithmic growth of the number of non-zero columns with the number of rows. In the ‘restaurant’ analogy, rows correspond to customers entering a buffet with infinitely many dishes and columns to the dishes they sample; see the original IBP papers for details.
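
A minimal sketch of the standard IBP generative scheme is given below (this is not the code used to produce the figure, and it omits the reordering of columns into ‘left-ordered’ form): customer i takes each previously sampled dish k with probability m_k/i, where m_k is the number of earlier customers who took dish k, and then tries a Poisson(alpha/i) number of new dishes; alpha = 10 matches the caption's ten ones per row on average.

    # Sample a binary matrix Z from the Indian buffet process: rows are customers,
    # columns are dishes.  The expected number of ones per row is alpha, and the
    # number of non-zero columns grows roughly as alpha * log(number of rows).
    import numpy as np

    def sample_ibp(num_customers, alpha=10.0, seed=0):
        rng = np.random.default_rng(seed)
        dish_counts = []                   # m_k: how many customers have taken dish k
        rows = []
        for i in range(1, num_customers + 1):
            row = [rng.random() < m / i for m in dish_counts]   # revisit popular dishes
            for k, taken in enumerate(row):
                dish_counts[k] += taken
            new_dishes = int(rng.poisson(alpha / i))            # try some new dishes
            row.extend([True] * new_dishes)
            dish_counts.extend([1] * new_dishes)
            rows.append(row)
        Z = np.zeros((num_customers, len(dish_counts)), dtype=int)
        for i, row in enumerate(rows):
            Z[i, :len(row)] = row
        return Z

    Z = sample_ibp(100)
    print(Z.shape, Z.sum(axis=1).mean())   # average number of ones per row is close to 10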
Figure 4.
A diagram representing how some models relate to each other. We start from finite mixture models and consider three different ways of extending them. Orange arrows correspond to time-series versions of static (iid) models. Blue arrows correspond to Bayesian non-parametric versions of finite parametric models. Green arrows correspond to factorial (overlapping subset) versions of clustering (non-overlapping) models. ifHMM, infinite factorial hidden Markov model.

References

    1. Wolpert DM, Ghahramani Z, Jordan MI. 1995. An internal model for sensorimotor integration. Science 269, 1880–1882. doi:10.1126/science.7569931
    2. Knill D, Richards W. 1996. Perception as Bayesian inference. Cambridge, UK: Cambridge University Press.
    3. Griffiths TL, Tenenbaum JB. 2006. Optimal predictions in everyday cognition. Psychol. Sci. 17, 767–773. doi:10.1111/j.1467-9280.2006.01780.x
    4. Doob JL. 1949. Application of the theory of martingales. Coll. Int. Centre Nat. Res. Sci. 13, 23–27.
    5. Le Cam L. 1986. Asymptotic methods in statistical decision theory. Berlin, Germany: Springer.