Positive reward prediction errors strengthen incidental memory encoding

2018 ◽  
Author(s):  
Anthony I. Jang ◽  
Matthew R. Nassar ◽  
Daniel G. Dillon ◽  
Michael J. Frank

Abstract The dopamine system is thought to provide a reward prediction error signal that facilitates reinforcement learning and reward-based choice in corticostriatal circuits. While it is believed that similar prediction error signals are also provided to temporal lobe memory systems, the impact of such signals on episodic memory encoding has not been fully characterized. Here we develop an incidental memory paradigm that allows us to 1) estimate the influence of reward prediction errors on the formation of episodic memories, 2) dissociate this influence from other factors such as surprise and uncertainty, 3) test the degree to which this influence depends on temporal correspondence between prediction error and memoranda presentation, and 4) determine the extent to which this influence is consolidation-dependent. We find that when choosing to gamble for potential rewards during a primary decision-making task, people encode incidental memoranda more strongly even though they are not aware that their memory will be subsequently probed. Moreover, this strengthened encoding scales with the reward prediction error, and not overall reward, experienced selectively at the time of memoranda presentation (and not before or after). Finally, this strengthened encoding is identifiable within a few minutes and is not substantially enhanced after twenty-four hours, indicating that it is not consolidation-dependent. These results suggest a computationally and temporally specific role for putative dopaminergic reward prediction error signaling in memory formation.

2014 ◽  
Vol 26 (3) ◽  
pp. 447-458 ◽  
Author(s):  
Ernest Mas-Herrero ◽  
Josep Marco-Pallarés

In decision-making processes, the relevance of the information yielded by outcomes varies across time and situations. It increases when previous predictions are inaccurate and in contexts with high environmental uncertainty. Previous fMRI studies have shown an important role of medial pFC in coding both reward prediction errors and the use of this information to guide future decisions. However, it is unclear whether these two processes are dissociated in time or occur simultaneously, which would suggest that a common mechanism is engaged. In the present work, we studied the modulation of two electrophysiological responses associated with outcome processing—the feedback-related negativity ERP and frontocentral theta oscillatory activity—by the reward prediction error and the learning rate. Twenty-six participants performed two learning tasks differing in the degree of predictability of the outcomes: a reversal learning task and a probabilistic learning task with multiple blocks of novel cue–outcome associations. We implemented a reinforcement learning model to obtain the single-trial reward prediction error and the learning rate for each participant and task. Our results indicated that midfrontal theta activity and the feedback-related negativity increased linearly with the unsigned prediction error. In addition, variations in frontal theta oscillatory activity predicted the learning rate across tasks and participants. These results support the existence of a common brain mechanism for the computation of the unsigned prediction error and the learning rate.
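The trial-by-trial quantities this abstract refers to can be illustrated with a hybrid delta-rule model in which the learning rate itself tracks the unsigned prediction error (a Pearce-Hall-style sketch; the study's exact model and all parameter names here are illustrative, not taken from the paper):

```python
def hybrid_update(value, reward, alpha, eta=0.3, kappa=0.5):
    """One trial of a hybrid reinforcement learning model in which the
    learning rate (alpha) is itself driven by the unsigned prediction error."""
    pe = reward - value                             # signed reward prediction error
    new_value = value + kappa * alpha * pe          # value update, scaled by alpha
    new_alpha = eta * abs(pe) + (1 - eta) * alpha   # associability: large |pe| raises alpha
    return new_value, new_alpha, pe

# A surprising outcome (large |pe|) increases the learning rate for future trials.
v, a, pe = hybrid_update(value=0.5, reward=1.0, alpha=0.2)
```

Fitting such a model per participant yields exactly the two regressors the study uses: a single-trial unsigned prediction error (|pe|) and a trial-varying learning rate (alpha).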


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Harry J. Stewardson ◽  
Thomas D. Sambrook

Abstract Reinforcement learning in humans and other animals is driven by reward prediction errors: deviations between the amount of reward or punishment initially expected and that which is obtained. Temporal difference methods of reinforcement learning generate this reward prediction error at the earliest time at which a revision in reward or punishment likelihood is signalled, for example by a conditioned stimulus. Midbrain dopamine neurons, believed to compute reward prediction errors, generate this signal in response to both conditioned and unconditioned stimuli, as predicted by temporal difference learning. Electroencephalographic recordings of human participants have suggested that a component named the feedback-related negativity (FRN) is generated when this signal is carried to the cortex. If this is so, the FRN should be expected to respond equivalently to conditioned and unconditioned stimuli. However, very few studies have attempted to measure the FRN's response to unconditioned stimuli. The present study attempted to elicit the FRN in response to a primary aversive stimulus (electric shock), using a design that varied the reward prediction error while holding physical intensity constant. The FRN was strongly elicited, but earlier and more transiently than typically seen, suggesting that it may incorporate processes other than those of the midbrain dopamine system.
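The temporal-difference property described above, the prediction error migrating from the unconditioned reward to the earliest predictive (conditioned) stimulus, can be sketched numerically (illustrative state values, not the study's design):

```python
import numpy as np

def td_errors(values, rewards, gamma=1.0):
    """delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), with V = 0 past the end."""
    deltas = []
    for t, r in enumerate(rewards):
        v_next = values[t + 1] if t + 1 < len(values) else 0.0
        deltas.append(r + gamma * v_next - values[t])
    return np.array(deltas)

# Before learning: the reward at the final step is unpredicted,
# so the error occurs at the reward itself.
before = td_errors(values=[0.0, 0.0, 0.0, 0.0], rewards=[0.0, 0.0, 0.0, 1.0])
# After learning: the conditioned stimulus (state 1) fully predicts the reward,
# so the error appears at the cue and vanishes at the reward.
after = td_errors(values=[0.0, 1.0, 1.0, 1.0], rewards=[0.0, 0.0, 0.0, 1.0])
```

This is the sense in which, if the FRN reflects the same signal, it should respond equivalently to conditioned and unconditioned stimuli.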


2015 ◽  
Vol 112 (8) ◽  
pp. 2539-2544 ◽  
Author(s):  
Xiaosi Gu ◽  
Terry Lohrenz ◽  
Ramiro Salas ◽  
Philip R. Baldwin ◽  
Alireza Soltani ◽  
...  

Little is known about how prior beliefs impact biophysically described processes in the presence of neuroactive drugs, which presents a profound challenge to the understanding of the mechanisms and treatments of addiction. We engineered smokers’ prior beliefs about the presence of nicotine in a cigarette smoked before a functional magnetic resonance imaging session where subjects carried out a sequential choice task. Using a model-based approach, we show that smokers’ beliefs about nicotine specifically modulated learning signals (value and reward prediction error) defined by a computational model of mesolimbic dopamine systems. Belief of “no nicotine in cigarette” (compared with “nicotine in cigarette”) strongly diminished neural responses in the striatum to value and reward prediction errors and reduced the impact of both on smokers’ choices. These effects of belief could not be explained by global changes in visual attention and were specific to value and reward prediction errors. Thus, by modulating the expression of computationally explicit signals important for valuation and choice, beliefs can override the physical presence of a potent neuroactive compound like nicotine. These selective effects of belief demonstrate that belief can modulate model-based parameters important for learning. The implications of these findings may be far ranging because belief-dependent effects on learning signals could impact a host of other behaviors in addiction as well as in other mental health problems.


2016 ◽  
Vol 18 (1) ◽  
pp. 23-32 ◽  

Reward prediction errors consist of the differences between received and predicted rewards. They are crucial for basic forms of learning about rewards and make us strive for more rewards—an evolutionarily beneficial trait. Most dopamine neurons in the midbrain of humans, monkeys, and rodents signal a reward prediction error; they are activated by more reward than predicted (positive prediction error), remain at baseline activity for fully predicted rewards, and show depressed activity with less reward than predicted (negative prediction error). The dopamine signal increases nonlinearly with reward value and codes formal economic utility. Drugs of addiction generate, hijack, and amplify the dopamine reward signal and induce exaggerated, uncontrolled dopamine effects on neuronal plasticity. The striatum, amygdala, and frontal cortex also show reward prediction error coding, but only in subpopulations of neurons. Thus, the important concept of reward prediction errors is implemented in neuronal hardware.
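The three response regimes described here follow directly from the definition (error = received − predicted); a minimal Rescorla-Wagner-style sketch, with illustrative values:

```python
def rpe_update(predicted, received, alpha=0.1):
    """Reward prediction error and the resulting value update."""
    rpe = received - predicted          # > 0: more reward than predicted
    return rpe, predicted + alpha * rpe

assert rpe_update(0.5, 1.0)[0] > 0    # positive error: phasic activation
assert rpe_update(0.5, 0.5)[0] == 0   # fully predicted: baseline activity
assert rpe_update(0.5, 0.0)[0] < 0    # negative error: depressed activity
```

Note that this linear sketch omits the nonlinearity the review describes, in which the dopamine signal scales with economic utility rather than raw reward value.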


2020 ◽  
Vol 46 (6) ◽  
pp. 1535-1546
Author(s):  
Teresa Katthagen ◽  
Jakob Kaminski ◽  
Andreas Heinz ◽  
Ralph Buchert ◽  
Florian Schlagenhauf

Abstract Increased striatal dopamine synthesis capacity has consistently been reported in patients with schizophrenia. However, the mechanism translating this into behavior and symptoms remains unclear. It has been proposed that heightened striatal dopamine may blunt dopaminergic reward prediction error signaling during reinforcement learning. In this study, we investigated striatal dopamine synthesis capacity, reward prediction errors, and their association in unmedicated schizophrenia patients (n = 19) and healthy controls (n = 23). Participants underwent FDOPA-PET and functional magnetic resonance imaging (fMRI) scanning, during which they performed a reversal-learning paradigm. The groups were compared regarding dopamine synthesis capacity (Kicer), fMRI neural prediction error signals, and the correlation of both. Patients did not differ from controls with respect to striatal Kicer. Taking comorbid alcohol abuse into account revealed that patients without such abuse showed elevated Kicer in the associative striatum, while those with abuse did not differ from controls. Comparing all patients to controls, patients performed worse during reversal learning and displayed reduced prediction error signaling in the ventral striatum. In controls, Kicer in the limbic striatum correlated with higher reward prediction error signaling, while there was no significant association in patients. Kicer in the associative striatum correlated with higher positive symptoms, and blunted reward prediction error signaling was associated with negative symptoms. Our results suggest a dissociation between striatal subregions and symptom domains, with elevated dopamine synthesis capacity in the associative striatum contributing to positive symptoms and blunted prediction error signaling in the ventral striatum relating to negative symptoms.


2017 ◽  
Author(s):  
Anna O Ermakova ◽  
Franziska Knolle ◽  
Azucena Justicia ◽  
Edward T Bullmore ◽  
Peter B Jones ◽  
...  

Abstract Ongoing research suggests preliminary, though not entirely consistent, evidence of neural abnormalities in the signalling of prediction errors in schizophrenia. Supporting theories suggest mechanistic links between the disruption of these processes and the generation of psychotic symptoms. However, it is not known at what stage in psychosis these impairments in prediction error signalling develop. One major confound in prior studies is the use of medicated patients with strongly varying disease durations. Our study aims to investigate the involvement of the meso-cortico-striatal circuitry during reward prediction error signalling in the earliest stages of psychosis. We studied patients with first episode psychosis (FEP) and help-seeking individuals at risk for psychosis due to subthreshold prodromal psychotic symptoms. Patients with FEP (n = 14), individuals at risk for developing psychosis (n = 30), and healthy volunteers (n = 39) performed a reinforcement learning task during fMRI scanning. ANOVA revealed significant (p < 0.05, family-wise error corrected) prediction error signalling differences between groups in the dopaminergic midbrain and right middle frontal gyrus (dorsolateral prefrontal cortex, DLPFC). Patients with FEP showed disrupted reward prediction error signalling compared to controls in both regions. At-risk patients showed intermediate activation in the midbrain that differed significantly from both controls and FEP patients, but DLPFC activation that did not differ from controls. Our study confirms that patients with FEP have abnormal meso-cortical signalling of reward prediction errors, whereas reward prediction error dysfunction in at-risk patients shows a more nuanced pattern of activation, with a degree of midbrain impairment but preserved cortical function.


2019 ◽  
Author(s):  
Emma L. Roscow ◽  
Matthew W. Jones ◽  
Nathan F. Lepora

Abstract Neural activity encoding recent experiences is replayed during sleep and rest to promote consolidation of the corresponding memories. However, precisely which features of experience influence replay prioritisation to optimise adaptive behaviour remains unclear. Here, we trained adult male rats on a novel maze-based reinforcement learning task designed to dissociate reward outcomes from reward-prediction errors. Four variations of a reinforcement learning model were fitted to the rats' behaviour over multiple days. Behaviour was best predicted by a model incorporating replay biased by reward-prediction error, compared to the same model with no replay; random replay or reward-biased replay produced poorer predictions of behaviour. This insight disentangles the influences of salience on replay, suggesting that reinforcement learning is tuned by post-learning replay biased by reward-prediction error, not by reward per se. This work therefore provides a behavioural and theoretical toolkit with which to measure and interpret replay in striatal, hippocampal and neocortical circuits.
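The winning model's replay bias can be sketched as prioritised sampling from a memory buffer, with replay probability increasing in the unsigned reward-prediction error (the weighting scheme and all names below are illustrative, not the paper's implementation):

```python
import random

def sample_replay(buffer, n, beta=1.0):
    """Draw n experiences for offline replay, weighted by |RPE|.
    buffer: list of (experience, rpe) pairs."""
    weights = [abs(rpe) ** beta for _, rpe in buffer]
    experiences = [exp for exp, _ in buffer]
    return random.choices(experiences, weights=weights, k=n)

buffer = [("left arm, no surprise", 0.0), ("right arm, surprise reward", 0.8)]
# Under this scheme, fully predicted experiences (rpe = 0) are never replayed,
# which is what distinguishes RPE-biased from reward-biased replay.
replayed = sample_replay(buffer, n=5)
```

A reward-biased variant would weight by the reward itself rather than by |RPE|; the abstract reports that this alternative predicted behaviour more poorly.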


2014 ◽  
Vol 26 (3) ◽  
pp. 467-471 ◽  
Author(s):  
Samuel J. Gershman

Temporal difference learning models of dopamine assert that phasic levels of dopamine encode a reward prediction error. However, this hypothesis has been challenged by recent observations of gradually ramping striatal dopamine levels as a goal is approached. This note describes conditions under which temporal difference learning models predict dopamine ramping. The key idea is representational: a quadratic transformation of proximity to the goal implies approximately linear ramping, as observed experimentally.
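The note's representational point can be checked numerically: if value is a quadratic function of proximity x to the goal, successive TD errors V(x+h) − V(x) = 2xh + h² grow linearly in x, producing a ramp (a sketch assuming a uniform state grid, no intermediate rewards, and γ = 1):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 11)   # proximity to the goal on a uniform grid (h = 0.1)
V = x ** 2                      # quadratic transformation of proximity

delta = np.diff(V)              # TD errors between successive states
# delta grows linearly with x; a constant second difference (2 * h**2)
# confirms the approximately linear ramp described in the note.
```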

