scholarly journals Reward prediction error in the ERP following unconditioned aversive stimuli

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Harry J. Stewardson ◽  
Thomas D. Sambrook

AbstractReinforcement learning in humans and other animals is driven by reward prediction errors: deviations between the amount of reward or punishment initially expected and that which is obtained. Temporal difference methods of reinforcement learning generate this reward prediction error at the earliest time at which a revision in reward or punishment likelihood is signalled, for example by a conditioned stimulus. Midbrain dopamine neurons, believed to compute reward prediction errors, generate this signal in response to both conditioned and unconditioned stimuli, as predicted by temporal difference learning. Electroencephalographic recordings of human participants have suggested that a component named the feedback-related negativity (FRN) is generated when this signal is carried to the cortex. If this is so, the FRN should be expected to respond equivalently to conditioned and unconditioned stimuli. However, very few studies have attempted to measure the FRN’s response to unconditioned stimuli. The present study attempted to elicit the FRN in response to a primary aversive stimulus (electric shock) using a design that varied reward prediction error while holding physical intensity constant. The FRN was strongly elicited, but earlier and more transiently than typically seen, suggesting that it may incorporate other processes than the midbrain dopamine system.

2017 ◽  
Author(s):  
Matthew P.H. Gardner ◽  
Geoffrey Schoenbaum ◽  
Samuel J. Gershman

AbstractMidbrain dopamine neurons are commonly thought to report a reward prediction error, as hypothesized by reinforcement learning theory. While this theory has been highly successful, several lines of evidence suggest that dopamine activity also encodes sensory prediction errors unrelated to reward. Here we develop a new theory of dopamine function that embraces a broader conceptualization of prediction errors. By signaling errors in both sensory and reward predictions, dopamine supports a form of reinforcement learning that lies between model-based and model-free algorithms. This account remains consistent with current canon regarding the correspondence between dopamine transients and reward prediction errors, while also accounting for new data suggesting a role for these signals in phenomena such as sensory preconditioning and identity unblocking, which ostensibly draw upon knowledge beyond reward predictions.


2018 ◽  
Vol 285 (1891) ◽  
pp. 20181645 ◽  
Author(s):  
Matthew P. H. Gardner ◽  
Geoffrey Schoenbaum ◽  
Samuel J. Gershman

Midbrain dopamine neurons are commonly thought to report a reward prediction error (RPE), as hypothesized by reinforcement learning (RL) theory. While this theory has been highly successful, several lines of evidence suggest that dopamine activity also encodes sensory prediction errors unrelated to reward. Here, we develop a new theory of dopamine function that embraces a broader conceptualization of prediction errors. By signalling errors in both sensory and reward predictions, dopamine supports a form of RL that lies between model-based and model-free algorithms. This account remains consistent with current canon regarding the correspondence between dopamine transients and RPEs, while also accounting for new data suggesting a role for these signals in phenomena such as sensory preconditioning and identity unblocking, which ostensibly draw upon knowledge beyond reward predictions.


2018 ◽  
Author(s):  
Anthony I. Jang ◽  
Matthew R. Nassar ◽  
Daniel G. Dillon ◽  
Michael J. Frank

AbstractThe dopamine system is thought to provide a reward prediction error signal that facilitates reinforcement learning and reward-based choice in corticostriatal circuits. While it is believed that similar prediction error signals are also provided to temporal lobe memory systems, the impact of such signals on episodic memory encoding has not been fully characterized. Here we develop an incidental memory paradigm that allows us to 1) estimate the influence of reward prediction errors on the formation of episodic memories, 2) dissociate this influence from other factors such as surprise and uncertainty, 3) test the degree to which this influence depends on temporal correspondence between prediction error and memoranda presentation, and 4) determine the extent to which this influence is consolidation-dependent. We find that when choosing to gamble for potential rewards during a primary decision making task, people encode incidental memoranda more strongly even though they are not aware that their memory will be subsequently probed. Moreover, this strengthened encoding scales with the reward prediction error, and not overall reward, experienced selectively at the time of memoranda presentation (and not before or after). Finally, this strengthened encoding is identifiable within a few minutes and is not substantially enhanced after twenty-four hours, indicating that it is not consolidation-dependent. These results suggest a computationally and temporally specific role for putative dopaminergic reward prediction error signaling in memory formation.


2018 ◽  
Author(s):  
Rachel S. Lee ◽  
Marcelo G. Mattar ◽  
Nathan F. Parker ◽  
Ilana B. Witten ◽  
Nathaniel D. Daw

AbstractAlthough midbrain dopamine (DA) neurons have been thought to primarily encode reward prediction error (RPE), recent studies have also found movement-related DAergic signals. For example, we recently reported that DA neurons in mice projecting to dorsomedial striatum are modulated by choices contralateral to the recording side. Here, we introduce, and ultimately reject, a candidate resolution for the puzzling RPE vs movement dichotomy, by showing how seemingly movement-related activity might be explained by an action-specific RPE. By considering both choice and RPE on a trial-by-trial basis, we find that DA signals are modulated by contralateral choice in a manner that is distinct from RPE, implying that choice encoding is better explained by movement direction. This fundamental separation between RPE and movement encoding may help shed light on the diversity of functions and dysfunctions of the DA system.


2019 ◽  
Author(s):  
Melissa J. Sharpe ◽  
Hannah M. Batchelor ◽  
Lauren E. Mueller ◽  
Chun Yun Chang ◽  
Etienne J.P. Maes ◽  
...  

AbstractDopamine neurons fire transiently in response to unexpected rewards. These neural correlates are proposed to signal the reward prediction error described in model-free reinforcement learning algorithms. This error term represents the unpredicted or ‘excess’ value of the rewarding event. In model-free reinforcement learning, this value is then stored as part of the learned value of any antecedent cues, contexts or events, making them intrinsically valuable, independent of the specific rewarding event that caused the prediction error. In support of equivalence between dopamine transients and this model-free error term, proponents cite causal optogenetic studies showing that artificially induced dopamine transients cause lasting changes in behavior. Yet none of these studies directly demonstrate the presence of cached value under conditions appropriate for associative learning. To address this gap in our knowledge, we conducted three studies where we optogenetically activated dopamine neurons while rats were learning associative relationships, both with and without reward. In each experiment, the antecedent cues failed to acquired value and instead entered into value-independent associative relationships with the other cues or rewards. These results show that dopamine transients, constrained within appropriate learning situations, support valueless associative learning.


2016 ◽  
Vol 18 (1) ◽  
pp. 23-32 ◽  

Reward prediction errors consist of the differences between received and predicted rewards. They are crucial for basic forms of learning about rewards and make us strive for more rewards—an evolutionary beneficial trait. Most dopamine neurons in the midbrain of humans, monkeys, and rodents signal a reward prediction error; they are activated by more reward than predicted (positive prediction error), remain at baseline activity for fully predicted rewards, and show depressed activity with less reward than predicted (negative prediction error). The dopamine signal increases nonlinearly with reward value and codes formal economic utility. Drugs of addiction generate, hijack, and amplify the dopamine reward signal and induce exaggerated, uncontrolled dopamine effects on neuronal plasticity. The striatum, amygdala, and frontal cortex also show reward prediction error coding, but only in subpopulations of neurons. Thus, the important concept of reward prediction errors is implemented in neuronal hardware.


2014 ◽  
Vol 26 (3) ◽  
pp. 635-644 ◽  
Author(s):  
Olav E. Krigolson ◽  
Cameron D. Hassall ◽  
Todd C. Handy

Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors—discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833–1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129–141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769–776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679–709, 2002]. Here, we used the brain ERP technique to demonstrate that not only do rewards elicit a neural response akin to a prediction error but also that this signal rapidly diminished and propagated to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component that has a similar timing and topography as the feedback error-related negativity that increased in amplitude with learning. The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward prediction errors and the changes in amplitude of these prediction errors at the time of choice presentation and reward delivery. Our results provide further support that the computations that underlie human learning and decision-making follow reinforcement learning principles.


2019 ◽  
Author(s):  
David J. Ottenheimer ◽  
Bilal A. Bari ◽  
Elissa Sutlief ◽  
Kurt M. Fraser ◽  
Tabitha H. Kim ◽  
...  

ABSTRACTLearning from past interactions with the environment is critical for adaptive behavior. Within the framework of reinforcement learning, the nervous system builds expectations about future reward by computing reward prediction errors (RPEs), the difference between actual and predicted rewards. Correlates of RPEs have been observed in the midbrain dopamine system, which is thought to locally compute this important variable in service of learning. However, the extent to which RPE signals may be computed upstream of the dopamine system is largely unknown. Here, we quantify history-based RPE signals in the ventral pallidum (VP), an input region to the midbrain dopamine system implicated in reward-seeking behavior. We trained rats to associate cues with future delivery of reward and fit computational models to predict individual neuron firing rates at the time of reward delivery. We found that a subset of VP neurons encoded RPEs and did so more robustly than nucleus accumbens, an input to VP. VP RPEs predicted trial-by-trial task engagement, and optogenetic inhibition of VP reduced subsequent task-related reward seeking. Consistent with reinforcement learning, activity of VP RPE cells adapted when rewards were delivered in blocks. We further found that history- and cue-based RPEs were largely separate across the VP neural population. The presence of behaviorally-instructive RPE signals in the VP suggests a pivotal role for this region in value-based computations.


eLife ◽  
2016 ◽  
Vol 5 ◽  
Author(s):  
Brian F Sadacca ◽  
Joshua L Jones ◽  
Geoffrey Schoenbaum

Midbrain dopamine neurons have been proposed to signal reward prediction errors as defined in temporal difference (TD) learning algorithms. While these models have been extremely powerful in interpreting dopamine activity, they typically do not use value derived through inference in computing errors. This is important because much real world behavior – and thus many opportunities for error-driven learning – is based on such predictions. Here, we show that error-signaling rat dopamine neurons respond to the inferred, model-based value of cues that have not been paired with reward and do so in the same framework as they track the putative cached value of cues previously paired with reward. This suggests that dopamine neurons access a wider variety of information than contemplated by standard TD models and that, while their firing conforms to predictions of TD models in some cases, they may not be restricted to signaling errors from TD predictions.


Sign in / Sign up

Export Citation Format

Share Document