Neuroimage. 2011 Jun 15;56(4):2089-99. doi: 10.1016/j.neuroimage.2011.03.062. Epub 2011 Mar 31.

Post hoc Bayesian model selection


Karl Friston et al. Neuroimage. 2011.

Abstract

This note describes a Bayesian model selection or optimization procedure for post hoc inferences about reduced versions of a full model. The scheme provides the evidence (marginal likelihood) for any reduced model as a function of the posterior density over the parameters of the full model. It rests upon specifying models through priors on their parameters, under the assumption that the likelihood remains the same for all models considered. This provides a quick and efficient scheme for scoring arbitrarily large numbers of models, after inverting a single (full) model. In turn, this enables the selection among discrete models that are distinguished by the presence or absence of free parameters, where free parameters are effectively removed from the model using very precise shrinkage priors. An alternative application of this post hoc model selection considers continuous model spaces, defined in terms of hyperparameters (sufficient statistics) of the prior density over model parameters. In this instance, the prior (model) can be optimized with respect to its evidence. The expressions for model evidence become remarkably simple under the Laplace (Gaussian) approximation to the posterior density. Special cases of this scheme include Savage-Dickey density ratio tests for reduced models and automatic relevance determination in model optimization. We illustrate the approach using general linear models and a more complicated nonlinear state-space model.
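Under the Gaussian (Laplace) assumptions described above, the evidence for any reduced model is a closed-form function of the posterior and prior moments of the full model. The following is a minimal NumPy sketch of those Gaussian expressions, not the paper's or SPM's implementation; the function and variable names are illustrative:

```python
import numpy as np

def bmr_log_evidence(mu, Sigma, eta, C, eta_r, C_r):
    """Log evidence of a reduced model relative to the full model.

    mu, Sigma  : posterior mean/covariance of the full model's parameters
    eta, C     : prior mean/covariance of the full model
    eta_r, C_r : prior mean/covariance of the reduced model
    Returns (dF, mu_r, Sigma_r): the log Bayes factor
    ln p(y|reduced) - ln p(y|full), and the reduced posterior moments.
    """
    P, P0, P0r = (np.linalg.inv(A) for A in (Sigma, C, C_r))
    Pr = P + P0r - P0                     # reduced posterior precision
    mu_r = np.linalg.solve(Pr, P @ mu + P0r @ eta_r - P0 @ eta)
    logdet = lambda A: np.linalg.slogdet(A)[1]
    dF = 0.5 * (logdet(P) + logdet(P0r) - logdet(P0) - logdet(Pr)) \
       - 0.5 * (mu @ P @ mu + eta_r @ P0r @ eta_r
                - eta @ P0 @ eta - mu_r @ Pr @ mu_r)
    return dF, mu_r, np.linalg.inv(Pr)
```

In a fully Gaussian (conjugate linear) model these expressions are exact; for nonlinear models they inherit the Laplace approximation used when inverting the full model. A shrinkage prior with near-zero variance on a parameter effectively removes it, which is how discrete model comparison reduces to evaluating this function.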


Figures

Fig. 1
Model evidence and posterior densities on the precision parameters of a general linear model. Upper left panel: the (exponential of the) free-energy bound on log-evidence as a function of the prior mean and variance of the log-precision parameters of a general linear model; lighter areas denote higher evidence, and the dashed line marks the optimum prior mean that maximizes evidence. Upper right panel: the model evidence as a function of prior variance at the optimum prior mean. Lower panel: the posterior density on the first of two precision parameters. The solid line shows the (optimized) posterior based upon the optimum priors, using Eq. (10) in the main text; the broken line shows the same quantity under the full priors. The vertical dotted line corresponds to the value (precision) of observation noise used to generate the data.
Fig. 2
Model evidence and posteriors on the parameters of a linear model. Upper left panel: the log-evidence over several thousand models distinguished by the permutation of priors on their regression parameters; these priors could take the value of zero or eight. Given twelve free regression parameters, this gives 2¹² = 4096 models. Upper right panel: the same data as on the left, but expressed in terms of evidence (the exponential of free-energy, normalized to sum to one). Lower panel: the conditional means of the twelve parameters of this linear model. The black bars show the posterior means under the full model, the grey bars those under a reduced model, and the white bars the true values. The key thing to note here is that (redundant) parameters have shrunk to zero under the priors selected by automatic model selection.
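The exhaustive search this figure describes (scoring every permutation of shrinkage priors after inverting the full model once) can be sketched on a toy conjugate linear model. This is an illustrative reconstruction under Gaussian assumptions, not the paper's code; dimensions, seeds, and names are made up:

```python
from itertools import product
import numpy as np

rng = np.random.default_rng(1)
n, d = 40, 4
X = rng.normal(size=(n, d))
theta = np.array([1.0, 0.0, -2.0, 0.0])   # two redundant parameters
s2 = 0.01
y = X @ theta + rng.normal(scale=s2**0.5, size=n)

# Invert the full model once (conjugate Gaussian posterior, zero prior means).
v_on, v_off = 8.0, 1e-8                   # prior variance: present / absent
C = v_on * np.eye(d)
P = X.T @ X / s2 + np.linalg.inv(C)
Sigma = np.linalg.inv(P)
mu = Sigma @ (X.T @ y / s2)
logdet = lambda A: np.linalg.slogdet(A)[1]

def delta_F(C_r):
    """ln p(y|reduced) - ln p(y|full), from the full posterior alone."""
    P0, P0r = np.linalg.inv(C), np.linalg.inv(C_r)
    Pr = P + P0r - P0
    mu_r = np.linalg.solve(Pr, P @ mu)    # all prior means are zero here
    return 0.5 * (logdet(P) + logdet(P0r) - logdet(P0) - logdet(Pr)) \
         - 0.5 * (mu @ P @ mu - mu_r @ Pr @ mu_r)

# Score all 2**d reduced models (2**12 = 4096 in the figure's example).
scores = {m: delta_F(np.diag([v_on if b else v_off for b in m]))
          for m in product([0, 1], repeat=d)}
best = max(scores, key=scores.get)
```

Because every score is a cheap function of the single full inversion, scoring thousands of models costs essentially nothing beyond the loop itself, which is the practical point of the scheme.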
Fig. 3
Model evidence as a function of prior variance for relevant and irrelevant parameters. This example comes from the inversion described in the previous figure and highlights the qualitative difference in how model evidence depends on prior variance: only relevant parameters (those used in generating the data) have a maximum at a non-zero variance (marked with an arrow).
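The continuous model space in this figure, where the prior variance of a single parameter is treated as a hyperparameter and the evidence is evaluated along a grid, can be sketched with the same closed-form Gaussian reduced evidence. This is a hypothetical two-parameter illustration, not the figure's actual data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
X = rng.normal(size=(n, 2))
s2 = 0.01
y = X @ np.array([1.0, 0.0]) + rng.normal(scale=s2**0.5, size=n)

C = 8.0 * np.eye(2)                       # full prior
P = X.T @ X / s2 + np.linalg.inv(C)
Sigma = np.linalg.inv(P)
mu = Sigma @ (X.T @ y / s2)
logdet = lambda A: np.linalg.slogdet(A)[1]

def delta_F(C_r):
    """ln p(y|reduced) - ln p(y|full) under Gaussian assumptions."""
    P0, P0r = np.linalg.inv(C), np.linalg.inv(C_r)
    Pr = P + P0r - P0
    mu_r = np.linalg.solve(Pr, P @ mu)
    return 0.5 * (logdet(P) + logdet(P0r) - logdet(P0) - logdet(Pr)) \
         - 0.5 * (mu @ P @ mu - mu_r @ Pr @ mu_r)

def scan(j, grid):
    """Relative log-evidence as a function of the prior variance of parameter j."""
    out = []
    for v in grid:
        C_r = C.copy()
        C_r[j, j] = v
        out.append(delta_F(C_r))
    return np.array(out)

grid = np.logspace(-8, 2, 11)
F_rel = scan(0, grid)                     # parameter used to generate the data
F_irr = scan(1, grid)                     # irrelevant parameter
```

As in the figure, the curve for the relevant parameter should peak at a non-zero prior variance, while the irrelevant parameter's evidence is highest when its prior variance is shrunk toward zero, i.e. when the parameter is removed.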
Fig. 4
Example of synthetic data used for network discovery. Upper left panel: the simulated data over 256 (3.22 s) time bins, comprising signal (solid lines) and correlated observation noise (broken lines). These simulated data were selected from two regions and were generated as a nonlinear function of the region-specific hidden states shown on the upper right. These hidden states evolve dynamically according to equations of motion that model the physiological transduction of neuronal activity into measurable blood-flow (hemodynamic) changes in the brain. The original perturbation to these dynamics arises from the hidden causes shown on the lower left: smooth random fluctuations sampled from a Gaussian distribution with a log-precision of eight. The two hidden causes shown here correspond to the two colored regions in the graph (inset on the lower right). This graph depicts four nodes (brain regions) and all possible edges (putative connections). Hidden causes drive each of the four nodes to produce data. Crucially, the neuronal dynamics simulated in each node are communicated to other nodes through bidirectional connections (double-headed arrows). When generating synthetic data we chose three out of a maximum of six connections; these are shown as solid arrows.
Fig. 5
Results of model inversion and automated selection. Upper left panel: the conditional means following inversion of the full model. The posterior means (grey bars) and 90% confidence intervals (red bars) are superimposed on the true values (black bars); in most instances the true values fall within the 90% confidence intervals. Only the connections between brain regions are shown here, six of which were zero. Upper right panel: the profile of log-evidences (or log-posteriors of each model under flat model priors) over 64 models corresponding to different combinations of connections among the four nodes. Lower left panel: the same data plotted as a function of graph size (number of bidirectional connections); the red dot corresponds to the model with the highest evidence, which was also the true model used to generate the data. Lower right panel: the same data as in the corresponding upper panel, shown as a model posterior.
Fig. 6
Adjacency matrices defining the connections between the four nodes in the simulated data of the previous figure. Left panel: this adjacency matrix defines a serially coupled chain with bidirectional connections and describes the connectivity used to generate the data. Right panel: the optimized prior variance on each of the coupling parameters. The optimization of prior variance has identified the correct sparsity structure and assigned roughly equal prior variance to the connections that were actually present. The gray scale is arbitrary.
