SQL Injections and Reinforcement Learning: An Empirical Evaluation of the Role of Action Structure

2021
pp. 95-113
Author(s):
Manuel Del Verme
Åvald Åslaugson Sommervoll
László Erdődi
Simone Totaro
Fabio Massimo Zennaro

Author(s):
Guiliang Liu
Oliver Schulte

A variety of machine learning models have been proposed to assess the performance of players in professional sports. However, they have only a limited ability to model how player performance depends on the game context. This paper proposes a new approach to capturing game context: we apply Deep Reinforcement Learning (DRL) to learn an action-value Q function from 3M play-by-play events in the National Hockey League (NHL). The neural network representation integrates both continuous context signals and game history, using a possession-based LSTM. The learned Q-function is used to value players' actions under different game contexts. To assess a player's overall performance, we introduce a novel Game Impact Metric (GIM) that aggregates the values of the player's actions. Empirical evaluation shows that GIM is consistent throughout a play season and correlates highly with standard success measures and future salary.
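The abstract describes GIM as an aggregation of learned action values over each player's actions. A minimal sketch of that aggregation step is shown below; it assumes a learned q_value(state, action) function and, purely for illustration, defines an event's impact as the change in Q between consecutive events. Both the function name and this impact definition are assumptions, not the paper's exact formulation.

```python
# Sketch: aggregate learned action values into a per-player impact metric.
# q_value(state, action) is assumed to be a trained Q-function;
# events is a chronological list of (player_id, state, action) tuples.
from collections import defaultdict

def game_impact_metric(events, q_value):
    """Sum each player's per-event impact over a game (or season)."""
    gim = defaultdict(float)
    prev_q = 0.0
    for player_id, state, action in events:
        q = q_value(state, action)      # learned value of the current event
        impact = q - prev_q             # credit the change in value to the acting player
        gim[player_id] += impact
        prev_q = q
    return dict(gim)
```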


2021
Author(s):  
James McGregor
Abigail Grassler
Paul I. Jaffe
Amanda Louise Jacob
Michael Brainard
...  

Songbirds and humans share the ability to adaptively modify their vocalizations based on sensory feedback. Prior studies have focused primarily on the role that auditory feedback plays in shaping vocal output throughout life. In contrast, it is unclear whether and how non-auditory information drives vocal plasticity. Here, we first used a reinforcement learning paradigm to establish that non-auditory feedback can drive vocal learning in adult songbirds. We then assessed the role of a songbird basal ganglia-thalamocortical pathway critical to auditory vocal learning in this novel form of vocal plasticity. We found that both this circuit and its dopaminergic inputs are necessary for non-auditory vocal learning, demonstrating that this pathway is not specialized exclusively for auditory-driven vocal learning. The ability of this circuit to use both auditory and non-auditory information to guide vocal learning may reflect a general principle for the neural systems that support vocal plasticity across species.


Author(s):  
Carles Gelada
Marc G. Bellemare

In this paper we revisit the method of off-policy corrections for reinforcement learning (COP-TD) pioneered by Hallak et al. (2017). Under this method, online updates to the value function are reweighted to avoid divergence issues typical of off-policy learning. While Hallak et al.’s solution is appealing, it cannot easily be transferred to nonlinear function approximation. First, it requires a projection step onto the probability simplex; second, even though the operator describing the expected behavior of the off-policy learning algorithm is convergent, it is not known to be a contraction mapping, and hence, may be more unstable in practice. We address these two issues by introducing a discount factor into COP-TD. We analyze the behavior of discounted COP-TD and find it better behaved from a theoretical perspective. We also propose an alternative soft normalization penalty that can be minimized online and obviates the need for an explicit projection step. We complement our analysis with an empirical evaluation of the two techniques in an off-policy setting on the game Pong from the Atari domain where we find discounted COP-TD to be better behaved in practice than the soft normalization penalty. Finally, we perform a more extensive evaluation of discounted COP-TD in 5 games of the Atari domain, where we find performance gains for our approach.
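The abstract does not spell out the discounted COP-TD update, but a tabular, ratio-estimating TD update in that spirit can be sketched as follows. The variable names (c, gamma_hat, rho), the tabular setting, and the exact target form are reconstructions for illustration only, not the paper's algorithm; the learned ratios would then reweight value-function updates on transitions drawn from the behavior policy.

```python
import numpy as np

def discounted_cop_td_update(c, s, a, s_next, pi, mu, alpha=0.1, gamma_hat=0.99):
    """One illustrative tabular update of a discounted covariate-shift ratio estimate.

    c      : per-state ratio estimates (roughly d_pi(s) / d_mu(s))
    pi, mu : target / behavior policies, indexed as [state, action]
    """
    rho = pi[s, a] / mu[s, a]                      # importance ratio for the sampled action
    target = (1.0 - gamma_hat) + gamma_hat * rho * c[s]
    c[s_next] += alpha * (target - c[s_next])      # move the next state's estimate toward the target
    return c

# Hypothetical usage: 5 states, 2 actions, uniform behavior policy.
c = np.ones(5)
mu = np.full((5, 2), 0.5)
pi = np.tile([0.8, 0.2], (5, 1))
c = discounted_cop_td_update(c, s=0, a=0, s_next=1, pi=pi, mu=mu)
```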

