Model-based influences on humans' choices and striatal prediction errors
Neuron. 2011 Mar 24;69(6):1204-15.
doi: 10.1016/j.neuron.2011.02.027.

Nathaniel D Daw et al. Neuron.

Abstract

The mesostriatal dopamine system is prominently implicated in model-free reinforcement learning, with fMRI BOLD signals in ventral striatum notably covarying with model-free prediction errors. However, latent learning and devaluation studies show that behavior also shows hallmarks of model-based planning, and the interaction between model-based and model-free values, prediction errors, and preferences is underexplored. We designed a multistep decision task in which model-based and model-free influences on human choice behavior could be distinguished. By showing that choices reflected both influences we could then test the purity of the ventral striatal BOLD signal as a model-free report. Contrary to expectations, the signal reflected both model-free and model-based predictions in proportions matching those that best explained choice behavior. These results challenge the notion of a separate model-free learner and suggest a more integrated computational architecture for high-level human decision-making.

Figures

Figure 1
(a) Timeline of events in trial. A first-stage choice between two options (green boxes) leads to a second-stage choice (here, between two pink options), which is reinforced with money. (b) State transition structure. Each first-stage choice is predominantly associated with one or the other of the second-stage states, and leads there 70% of the time.
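The transition structure in (b) can be sketched in a few lines of Python. This is only an illustrative simulation, not the authors' task code: the 70% figure comes from the caption, while the 0/1 labels for choices and states, the function names, and the random first-stage choices are assumptions made for the example.

```python
import random

# From the caption: each first-stage choice leads to its associated
# second-stage state 70% of the time (a "common" transition).
COMMON_PROB = 0.7


def second_stage_state(first_choice, rng):
    """Return (state, transition_type) for a first-stage choice of 0 or 1.

    Assumption for illustration: choice 0 is associated with state 0
    and choice 1 with state 1.
    """
    common = rng.random() < COMMON_PROB
    state = first_choice if common else 1 - first_choice
    return state, ("common" if common else "rare")


def simulate(n_trials, seed=0):
    """Count common and rare transitions over n_trials random first-stage choices."""
    rng = random.Random(seed)
    counts = {"common": 0, "rare": 0}
    for _ in range(n_trials):
        choice = rng.randrange(2)  # hypothetical agent that chooses at random
        _, kind = second_stage_state(choice, rng)
        counts[kind] += 1
    return counts
```

Over many trials, the share of common transitions returned by `simulate` should sit close to 0.7, which is what makes a rare transition informative about how a subject evaluates the first-stage options.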
Figure 2
Factorial analysis of choice behavior. (a) Simple reinforcement predicts that a first-stage choice resulting in reward is more likely to be repeated on the subsequent trial, regardless of whether that reward occurred after a common or rare transition. (b) Model-based prospective evaluation instead predicts that a rare transition should affect the value of the other first-stage option, leading to a predicted interaction between the factors of reward and transition probability. (c) Actual stay proportions, averaged across subjects, display hallmarks of both strategies. Error bars: 1 SEM.
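The factorial logic of this figure can be illustrated with a short Python sketch. It is a minimal version under stated assumptions, not the authors' analysis: the trial record format (`choice`, `rewarded`, `transition`) and the function names are hypothetical, and the sketch assumes every reward-by-transition cell occurs at least once.

```python
from collections import defaultdict


def stay_proportions(trials):
    """Proportion of repeated first-stage choices, split by the previous trial's outcome.

    trials: list of dicts with hypothetical keys 'choice' (0 or 1),
    'rewarded' (bool) and 'transition' ('common' or 'rare').
    """
    stays = defaultdict(int)
    totals = defaultdict(int)
    for prev, cur in zip(trials, trials[1:]):
        key = (prev["rewarded"], prev["transition"])
        totals[key] += 1
        stays[key] += int(cur["choice"] == prev["choice"])
    return {key: stays[key] / totals[key] for key in totals}


def reward_main_effect(p):
    """Large under simple reinforcement: reward raises staying regardless of transition."""
    return (p[(True, "common")] + p[(True, "rare")]) - (
        p[(False, "common")] + p[(False, "rare")]
    )


def reward_by_transition_interaction(p):
    """Large under model-based evaluation: the effect of reward reverses after rare transitions."""
    return (p[(True, "common")] - p[(True, "rare")]) - (
        p[(False, "common")] - p[(False, "rare")]
    )
```

A purely model-free pattern, as in panel (a), would give a positive main effect and an interaction near zero. A purely model-based pattern, as in panel (b), would give a positive interaction. Panel (c) reports hallmarks of both.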
Figure 3
Neural correlates of model-free and model-based valuations in RPE in striatum. All maps thresholded at p<.001 uncorrected for display. (a) Correlates of model-free RPE in bilateral striatum (left peak: −12 10 4, right: 10 12 −4). (b) RPE signaling in ventral striatum is better explained by including some model-based predictions: correlations with the difference between model-based and model-free RPE signals (left: −10 6 12, right: 12 16 −8). (c) Conjunction of contrasts from a and b (left: −12 10 −10, right: 12 16 −6). (d) Region of right ventral striatum where the weight given to model-based valuations in explaining the BOLD response correlated, across subjects, with that derived from explaining their choice behavior (14 20 −6). (e) Conjunction of contrasts from a and d (14 20 −6). (f) Scatterplot of the correlation from d, from average activity over an anatomically defined mask of right ventral striatum (r² = .28, p = .027).
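One way to see why panel (b) uses the difference between the two RPE signals is the identity below. It is a sketch under assumptions rather than the paper's stated model: the symbols $\delta_{\mathrm{MF}}$, $\delta_{\mathrm{MB}}$ and the weight $w$ are notation introduced here, and the linear mixture is assumed for illustration.

```latex
% Assumed notation: \delta_{MF} and \delta_{MB} are the model-free and
% model-based prediction errors; w in [0, 1] is the weight on the
% model-based component.
\[
  \delta_{\mathrm{mix}}
    = (1 - w)\,\delta_{\mathrm{MF}} + w\,\delta_{\mathrm{MB}}
    = \delta_{\mathrm{MF}} + w\,\bigl(\delta_{\mathrm{MB}} - \delta_{\mathrm{MF}}\bigr)
\]
```

Under this assumption, a signal that is purely model-free ($w = 0$) loads only on $\delta_{\mathrm{MF}}$, while any additional loading on the difference term $\delta_{\mathrm{MB}} - \delta_{\mathrm{MF}}$ indicates a model-based contribution of size $w$.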
Figure 4
Neural correlates of model-free and model-based valuations in RPE in medial PFC. Thresholded at p<.001 uncorrected (a and b) or p<.005 uncorrected (c) for display. (a) Correlates of model-free RPE in medial PFC (−4 66 14). (b) RPE signaling in medial PFC is better explained by including some model-based predictions: correlations with the difference between the two RPE signals (−4 56 14). (c) Conjunction of contrasts from a and b (−4 62 12).
Figure 5
Factorial analysis of BOLD signal at start of trial, from average activity over an anatomical mask of right nucleus accumbens. (a) Signal change (relative to mean) as a function of whether the choice on the previous trial was rewarded or unrewarded, and whether that occurred after a common or rare transition (compare Figure 2c). Error bars: 1 SEM. (b) Scatterplot of the correlation, across subjects, between the contrast measuring the size of the interaction between reward and transition probability (an index of model-based valuation), and the weight given to model-based vs model-free valuations in explaining choice behavior (r² = .32, p = .017).
