Theory of reinforcement learning and motivation in the basal ganglia

Learning the payoffs and costs of actions

10.1101/346114 ◽

2018 ◽

Author(s):

Moritz Möller ◽

Rafal Bogacz

Keyword(s):

Basal Ganglia ◽

Weak Decay ◽

Specific Aspect ◽

Prediction Errors ◽

Negative Consequences ◽

Motivational State ◽

Vertebrate Brain ◽

Reward Prediction ◽

Dopaminergic Modulation ◽

Overall Evaluation

Abstract: A set of sub-cortical nuclei called the basal ganglia is critical for learning the values of actions. The basal ganglia include two pathways, which have been associated with approach and avoidance behavior, respectively, and are differentially modulated by dopamine projections from the midbrain. According to the influential opponent actor learning model, these pathways represent learned estimates of the positive and negative consequences (payoffs and costs) of actions. The level of dopamine release controls to what extent payoffs and costs enter the overall evaluation of actions. How the knowledge about payoff and cost is acquired is still an open question, even though many theories describe learning from feedback in the basal ganglia. We examine whether a set of plasticity rules proposed to model reinforcement learning in the pathways of the basal ganglia is suitable to extract payoffs and costs from a reward prediction error signal. First, we determine the result of such learning, both analytically and via simulations, for different reward schedules that feature payoffs and costs. Then, we combine the plasticity rules with a decision rule to examine the emerging effect of dopaminergic modulation on the willingness to work for reward. We find that the plasticity rules are suitable to infer the mean payoffs and costs of actions, if those occur at different moments in time. Successful learning requires differential effects of positive and negative reward prediction errors on the two pathways, and a weak decay of synaptic weights over trials. We also confirm that dopaminergic modulation produces effects on the willingness to work for reward similar to those observed in classical experiments.

Author summary: The basal ganglia are structures underneath the surface of the vertebrate brain, associated with error-driven learning. Much is known about the anatomical and biological features of the basal ganglia; scientists now try to understand the algorithms implemented by these structures. Numerous models aspire to capture the learning functionality, but many of them only cover some specific aspect of the algorithm. Instead of further adding to that pool of partial models, we unify two existing ones: one that captures what the basal ganglia learn, and one that describes the learning mechanism itself. The first model suggests that the basal ganglia keep track of both positive and negative consequences of frequent opportunities, and weigh these by the motivational state in decisions. It explains how payoff and cost are represented, but not how those representations arise. The other model consists of biologically plausible plasticity rules, which describe how learning takes place, but not how the brain makes use of what is learned. We show that the two theories are compatible. Together, they form a model of learning and decision making that integrates the motivational state as well as the learned payoffs and costs of opportunities.
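The ingredients named in this abstract (two pathway weights, a reward prediction error with different effects for positive and negative errors, and a weak decay) can be illustrated with a minimal sketch. The abstract does not give the equations, so everything below is an assumption made for illustration: the names `G` and `N` for the two pathway weights, the prediction error `reward - (G - N) / 2`, the scaling factor `eps` for the non-preferred error sign, and all parameter values, which were picked by hand. Weights are left unconstrained and may dip below zero early on.

```python
def f_eps(delta, eps):
    """Pass positive errors unchanged; scale negative errors by eps (assumed form)."""
    return delta if delta > 0 else eps * delta


def simulate(payoff, cost, n_trials=2000, alpha=0.05, eps=0.8, decay=0.005625):
    """Train two opponent weights on trials where a cost precedes a payoff.

    Hypothetical rule, not taken from the abstract:
      delta = reward - (G - N) / 2
      G += alpha * f_eps(delta)  - decay * G
      N += alpha * f_eps(-delta) - decay * N
    """
    g, n = 0.0, 0.0
    for _ in range(n_trials):
        # The cost and the payoff arrive at different moments within a trial.
        for reward in (-cost, payoff):
            delta = reward - 0.5 * (g - n)
            # Both weights are updated from the same pre-update values.
            g, n = (g + alpha * f_eps(delta, eps) - decay * g,
                    n + alpha * f_eps(-delta, eps) - decay * n)
    return g, n


if __name__ == "__main__":
    g, n = simulate(payoff=2.0, cost=1.0)
    print(f"G = {g:.2f}, N = {n:.2f}")
```

With these hand-picked values, `G` should settle in the vicinity of the payoff and `N` in the vicinity of the cost; other choices of `eps` and `decay` change how closely the weights track them.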

Striatal dynamics explain duration judgments

10.1101/020883 ◽

2015 ◽

Cited By ~ 2

Author(s):

Thiago S. Gouvêa ◽

Tiago Monteiro ◽

Asma Motiwala ◽

Sofia Soares ◽

Christian K. Machens ◽

...

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Basal Ganglia ◽

Interval Timing ◽

Time Dependent ◽

Striatal Neurons ◽

Input Structure ◽

State Changes ◽

Psychophysical Task

The striatum is an input structure of the basal ganglia implicated in several time-dependent functions including reinforcement learning, decision making, and interval timing. To determine whether striatal ensembles drive subjects' judgments of duration, we manipulated and recorded from striatal neurons in rats performing a duration categorization psychophysical task. We found that the dynamics of striatal neurons predicted duration judgments, and that simultaneously recorded ensembles could judge duration as well as the animal. Furthermore, striatal neurons were necessary for duration judgments, as muscimol infusions produced a specific impairment in animals' duration sensitivity. Lastly, we show that time as encoded by striatal populations ran faster or slower when rats judged a duration as longer or shorter, respectively. These results demonstrate that the speed with which striatal population state changes supports the fundamental ability of animals to judge the passage of time.

Striatal dynamics explain duration judgments

eLife ◽

10.7554/elife.11386 ◽

2015 ◽

Vol 4 ◽

Cited By ~ 66

Author(s):

Thiago S Gouvêa ◽

Tiago Monteiro ◽

Asma Motiwala ◽

Sofia Soares ◽

Christian Machens ◽

...

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Basal Ganglia ◽

Interval Timing ◽

Time Dependent ◽

Striatal Neurons ◽

Input Structure ◽

State Changes ◽

Psychophysical Task

The striatum is an input structure of the basal ganglia implicated in several time-dependent functions including reinforcement learning, decision making, and interval timing. To determine whether striatal ensembles drive subjects' judgments of duration, we manipulated and recorded from striatal neurons in rats performing a duration categorization psychophysical task. We found that the dynamics of striatal neurons predicted duration judgments, and that simultaneously recorded ensembles could judge duration as well as the animal. Furthermore, striatal neurons were necessary for duration judgments, as muscimol infusions produced a specific impairment in animals' duration sensitivity. Lastly, we show that time as encoded by striatal populations ran faster or slower when rats judged a duration as longer or shorter, respectively. These results demonstrate that the speed with which striatal population state changes supports the fundamental ability of animals to judge the passage of time.

A Neurocomputational Model of Dopamine and Prefrontal–Striatal Interactions during Multicue Category Learning by Parkinson Patients

Journal of Cognitive Neuroscience ◽

10.1162/jocn.2010.21420 ◽

2011 ◽

Vol 23 (1) ◽

pp. 151-167 ◽

Cited By ~ 29

Author(s):

Ahmed A. Moustafa ◽

Mark A. Gluck

Keyword(s):

Reinforcement Learning ◽

Basal Ganglia ◽

Category Learning ◽

Weather Prediction ◽

Critical Role ◽

Neural Mechanism ◽

Competitive Dynamics ◽

Stimulus Selection ◽

Striatal Neurons ◽

Attentional Learning

Most existing models of dopamine and learning in Parkinson disease (PD) focus on simulating the role of basal ganglia dopamine in reinforcement learning. Much data argue, however, for a critical role for prefrontal cortex (PFC) dopamine in stimulus selection in attentional learning. Here, we present a new computational model that simulates performance in multicue category learning, such as the “weather prediction” task. The model addresses how PD and dopamine medications affect stimulus selection processes, which mediate reinforcement learning. In this model, PFC dopamine is key for attentional learning, whereas basal ganglia dopamine, consistent with other models, is key for reinforcement and motor learning. The model assumes that competitive dynamics among PFC neurons is the neural mechanism underlying stimulus selection with limited attentional resources, whereas competitive dynamics among striatal neurons is the neural mechanism underlying action selection. According to our model, PD is associated with decreased phasic and tonic dopamine levels in both PFC and basal ganglia. We assume that dopamine medications increase dopamine levels in both the basal ganglia and PFC, which, in turn, increase tonic dopamine levels but decrease the magnitude of phasic dopamine signaling in these brain structures. Increase of tonic dopamine levels in the simulated PFC enhances attentional shifting performance. The model provides a mechanistic account for several phenomena, including (a) medicated PD patients are more impaired at multicue probabilistic category learning than unmedicated patients and (b) medicated PD patients opt out of reversal when there are alternative and redundant cue dimensions.

Predictive olfactory learning in Drosophila

Scientific Reports ◽

10.1038/s41598-021-85841-y ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Chang Zhao ◽

Yves F. Widmer ◽

Sören Diegelmann ◽

Mihai A. Petrovici ◽

Simon G. Sprecher ◽

...

Keyword(s):

Synaptic Plasticity ◽

Dopaminergic Neurons ◽

Mushroom Body ◽

Trace Conditioning ◽

Fruit Fly ◽

Olfactory Learning ◽

Shock Strength ◽

Behavioral Experiments ◽

Kenyon Cells ◽

Output Neurons

Abstract: Olfactory learning and conditioning in the fruit fly is typically modelled by correlation-based associative synaptic plasticity. It was shown that the conditioning of an odor-evoked response by a shock depends on the connections from Kenyon cells (KC) to mushroom body output neurons (MBONs). Although on the behavioral level conditioning is recognized to be predictive, it remains unclear how MBONs form predictions of aversive or appetitive values (valences) of odors on the circuit level. We present behavioral experiments that are not well explained by associative plasticity between conditioned and unconditioned stimuli, and we suggest two alternative models for how predictions can be formed. In error-driven predictive plasticity, dopaminergic neurons (DANs) represent the error between the predictive odor value and the shock strength. In target-driven predictive plasticity, the DANs represent the target for the predictive MBON activity. Predictive plasticity in KC-to-MBON synapses can also explain trace-conditioning, the valence-dependent sign switch in plasticity, and the observed novelty-familiarity representation. The model offers a framework to dissect MBON circuits and interpret DAN activity during olfactory learning.
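The distinction between the two predictive schemes can be made concrete with a small sketch. It is a simplification under stated assumptions, not the paper's model: the MBON output is treated as a linear readout of a binary KC pattern, sign conventions for aversive and appetitive valence are ignored, and all function names, the learning rate, and the KC patterns are hypothetical. In this reduced form both variants produce the same weight change and differ only in which quantity the DAN signal carries.

```python
def mbon_output(weights, kc):
    """Linear MBON readout of a Kenyon cell activity pattern (assumed form)."""
    return sum(w * x for w, x in zip(weights, kc))


def error_driven_update(weights, kc, shock, lr=0.1):
    """DAN signal carries the error between shock strength and MBON prediction."""
    dan = shock - mbon_output(weights, kc)
    return [w + lr * dan * x for w, x in zip(weights, kc)]


def target_driven_update(weights, kc, shock, lr=0.1):
    """DAN signal carries the target; the comparison with MBON activity is local."""
    dan = shock
    mbon = mbon_output(weights, kc)
    return [w + lr * (dan - mbon) * x for w, x in zip(weights, kc)]


def condition(update, kc, shock, n_trials=50):
    """Repeatedly pair one odor (KC pattern) with a shock of fixed strength."""
    weights = [0.0] * len(kc)
    for _ in range(n_trials):
        weights = update(weights, kc, shock)
    return weights


if __name__ == "__main__":
    paired_odor = [1, 0, 1, 0, 0, 1]    # hypothetical sparse KC pattern
    unpaired_odor = [0, 1, 0, 1, 0, 0]  # non-overlapping control pattern
    w = condition(error_driven_update, paired_odor, shock=1.0)
    print(mbon_output(w, paired_odor), mbon_output(w, unpaired_odor))
```

Because the weight change shrinks as the prediction approaches the shock strength, learning saturates at the shock strength instead of growing with every pairing, which is the property that separates predictive plasticity from purely correlation-based plasticity.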

Corticostriatal Plastic Changes in Experimental L-DOPA-Induced Dyskinesia

Parkinson's Disease ◽

10.1155/2012/358176 ◽

2012 ◽

Vol 2012 ◽

pp. 1-10 ◽

Cited By ~ 3

Author(s):

Veronica Ghiglieri ◽

Vincenza Bagetta ◽

Valentina Pendolino ◽

Barbara Picconi ◽

Paolo Calabresi

Keyword(s):

Parkinson's Disease ◽

Basal Ganglia ◽

Experimental Models ◽

Striatal Neurons ◽

Da Receptors ◽

Electrophysiological Studies ◽

Plastic Changes ◽

Stimulation Of

In Parkinson’s disease (PD), alteration of dopamine- (DA-) dependent striatal functions and pulsatile stimulation of DA receptors caused by the discontinuous administration of levodopa (L-DOPA) lead to a complex cascade of events affecting the postsynaptic striatal neurons that might account for the appearance of L-DOPA-induced dyskinesia (LID). Experimental models of LID have been widely used and extensively characterized in rodents, and electrophysiological studies have provided remarkable insights into the mechanisms underlying L-DOPA-induced corticostriatal plastic changes. Here we provide an overview of recent findings that represent a further step toward understanding the mechanisms underlying maladaptive changes of basal ganglia functions in response to L-DOPA and associated with the development of LID.
