Abstract

A set of sub-cortical nuclei called the basal ganglia is critical for learning the values of actions. The basal ganglia include two pathways, which have been associated with approach and avoid behavior respectively, and are differentially modulated by dopamine projections from the midbrain. According to the influential opponent actor learning model, these pathways represent learned estimates of the positive and negative consequences (payoffs and costs) of actions. The level of dopamine release controls to what extent payoffs and costs enter the overall evaluation of actions. How the knowledge about payoff and cost is acquired is still an open question, even though many theories describe learning from feedback in the basal ganglia. We examine whether a set of plasticity rules proposed to model reinforcement learning in the pathways of the basal ganglia is suitable to extract payoffs and costs from a reward prediction error signal. First, we determine the result of such learning, both analytically and via simulations, for different reward schedules that feature payoffs and costs. Then, we combine the plasticity rules with a decision rule to examine the emerging effect of dopaminergic modulation on the willingness to work for reward. We find that the plasticity rules are suitable to infer the mean payoffs and costs of actions, provided that payoffs and costs occur at different moments in time. Successful learning requires differential effects of positive and negative reward prediction errors on the two pathways, and a weak decay of synaptic weights over trials. We also confirm that dopaminergic modulation produces effects on the willingness to work for reward similar to those observed in classical experiments.

Author summary

The basal ganglia are structures underneath the surface of the vertebrate brain, associated with error-driven learning.
Much is known about the anatomical and biological features of the basal ganglia; scientists are now trying to understand the algorithms implemented by these structures. Numerous models aspire to capture the learning functionality, but many of them cover only some specific aspect of the algorithm. Instead of adding to that pool of partial models, we unify two existing ones: one that captures what the basal ganglia learn, and one that describes the learning mechanism itself. The first model suggests that the basal ganglia keep track of both positive and negative consequences of frequent opportunities, and weigh these according to the motivational state when making decisions. It explains how payoff and cost are represented, but not how those representations arise. The other model consists of biologically plausible plasticity rules, which describe how learning takes place, but not how the brain makes use of what is learned. We show that the two theories are compatible. Together, they form a model of learning and decision making that integrates the motivational state as well as the learned payoffs and costs of opportunities.