Temporal Induced Self-Play for Stochastic Bayesian Games

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/14 ◽

2021 ◽

Author(s):

Weizhe Chen ◽

Zihan Zhou ◽

Yi Wu ◽

Fei Fang

Keyword(s):

Dynamic Games ◽

Space Representation ◽

Decision Point ◽

Bayesian Games ◽

Bayesian Equilibrium ◽

Policy Gradient ◽

Gradient Based ◽

Security Games ◽

Zero Sum ◽

Parametric Approximation

One practical requirement in solving dynamic games is to ensure that the players play well from any decision point onward. To satisfy this requirement, existing efforts focus on equilibrium refinement, but the scalability and applicability of existing techniques are limited. In this paper, we propose Temporal-Induced Self-Play (TISP), a novel reinforcement learning-based framework to find strategies with decent performances from any decision point onward. TISP uses belief-space representation, backward induction, policy learning, and non-parametric approximation. Building upon TISP, we design a policy-gradient-based algorithm TISP-PG. We prove that TISP-based algorithms can find approximate Perfect Bayesian Equilibrium in zero-sum one-sided stochastic Bayesian games with finite horizon. We test TISP-based algorithms in various games, including finitely repeated security games and a grid-world game. The results show that TISP-PG is more scalable than existing mathematical programming-based methods and significantly outperforms other learning-based methods.
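The belief-space representation mentioned in the abstract can be illustrated with a plain Bayes-rule update over a hidden player type. This is a minimal sketch and not the authors' implementation: the function name `update_belief`, the type labels, and the dictionary layout of the policy are all hypothetical, and the zero-probability fallback is an assumption made only to keep the example total.

```python
def update_belief(belief, policy, action):
    """Return the posterior over hidden types after observing `action`.

    belief: dict mapping type -> prior probability (hypothetical layout)
    policy: dict mapping type -> dict mapping action -> probability
    """
    # Bayes rule: posterior(type) is proportional to prior(type) * P(action | type).
    unnormalized = {t: belief[t] * policy[t][action] for t in belief}
    total = sum(unnormalized.values())
    if total == 0.0:
        # Assumption for this sketch: if no type could have produced the
        # action, keep the prior instead of dividing by zero.
        return dict(belief)
    return {t: p / total for t, p in unnormalized.items()}


# Hypothetical two-type example: each type favors a different action.
prior = {"type_a": 0.5, "type_b": 0.5}
type_policies = {
    "type_a": {"x": 0.8, "y": 0.2},
    "type_b": {"x": 0.2, "y": 0.8},
}
posterior = update_belief(prior, type_policies, "x")
```

Observing action `x` shifts the belief toward `type_a`, since that type plays `x` more often. A belief of this kind is what a belief-conditioned policy would take as input at each decision point.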

Noncooperative Game Theory

10.23943/princeton/9780691175218.001.0001 ◽

2017 ◽

Author(s):

João P. Hespanha

Keyword(s):

Game Theory ◽

Computer Science ◽

Dynamic Games ◽

Design Methodology ◽

Noncooperative Game ◽

Noncooperative Game Theory ◽

Original Design ◽

Theoretical Perspectives ◽

Engineering Designs ◽

Zero Sum

This book is aimed at students interested in using game theory as a design methodology for solving problems in engineering and computer science. The book shows that such design challenges can be analyzed through game theoretical perspectives that help to pinpoint each problem's essence: Who are the players? What are their goals? Will the solution to “the game” solve the original design problem? Using the fundamentals of game theory, the book explores these issues and more. The use of game theory in technology design is a recent development arising from the intrinsic limitations of classical optimization-based designs. In optimization, one attempts to find values for parameters that minimize suitably defined criteria—such as monetary cost, energy consumption, or heat generated. However, in most engineering applications, there is always some uncertainty as to how the selected parameters will affect the final objective. Through a sequential and easy-to-understand discussion, the book examines how to make sure that the selection leads to acceptable performance, even in the presence of uncertainty—the unforgiving variable that can wreck engineering designs. The book looks at such standard topics as zero-sum, non-zero-sum, and dynamic games and includes a MATLAB guide to coding. This book offers students a fresh way of approaching engineering and computer science applications.

Deep Deterministic Policy Gradient Based on Double Network Prioritized Experience Replay

IEEE Access ◽

10.1109/access.2021.3074535 ◽

2021 ◽

pp. 1-1

Author(s):

Chaohai Kang ◽

Chuiting Rong ◽

Weijian Ren ◽

Fengcai Huo ◽

Pengyun Liu

Keyword(s):

Double Network ◽

Policy Gradient ◽

Experience Replay ◽

Gradient Based

Existence of Value and Randomized Strategies in Zero-Sum Discrete-Time Stochastic Dynamic Games

SIAM Journal on Control and Optimization ◽

10.1137/0319039 ◽

1981 ◽

Vol 19 (5) ◽

pp. 617-634 ◽

Cited By ~ 30

Author(s):

P. R. Kumar ◽

T. H. Shiau

Keyword(s):

Discrete Time ◽

Dynamic Games ◽

Stochastic Dynamic ◽

Randomized Strategies ◽

Existence Of Value ◽

Zero Sum ◽

Stochastic Dynamic Games

Zero-sum dynamic games and a stochastic variation of Ramsey's theorem

Stochastic Processes and their Applications ◽

10.1016/j.spa.2004.03.001 ◽

2004 ◽

Vol 112 (2) ◽

pp. 319-329 ◽

Cited By ~ 1

Author(s):

Eran Shmaya ◽

Eilon Solan

Keyword(s):

Dynamic Games ◽

Stochastic Variation ◽

Ramsey's Theorem ◽

Zero Sum

Event-Triggered Control of Discrete-Time Zero-Sum Games via Deterministic Policy Gradient Adaptive Dynamic Programming

IEEE Transactions on Systems Man and Cybernetics Systems ◽

10.1109/tsmc.2021.3105663 ◽

2021 ◽

pp. 1-13

Author(s):

Yongwei Zhang ◽

Bo Zhao ◽

Derong Liu ◽

Shunchao Zhang

Keyword(s):

Dynamic Programming ◽

Discrete Time ◽

Adaptive Dynamic Programming ◽

Adaptive Dynamic ◽

Zero Sum Games ◽

Time Zero ◽

Policy Gradient ◽

Zero Sum ◽

Event Triggered

Deep Deterministic Policy Gradient-based intelligent control scheme design for DC-DC circuit

10.1109/icamechs54019.2021.9661500 ◽

2021 ◽

Author(s):

Ligong Zhang ◽

Xinhui Zhu ◽

Chenyang Bai ◽

Junshan Li

Keyword(s):

Intelligent Control ◽

Scheme Design ◽

Control Scheme ◽

Policy Gradient ◽

Gradient Based

Using One-Sided Partially Observable Stochastic Games for Solving Zero-Sum Security Games with Sequential Attacks

Lecture Notes in Computer Science - Decision and Game Theory for Security ◽

10.1007/978-3-030-64793-3_21 ◽

2020 ◽

pp. 385-404

Author(s):

Petr Tomášek ◽

Branislav Bošanský ◽

Thanh H. Nguyen

Keyword(s):

Stochastic Games ◽

Security Games ◽

Zero Sum ◽

Partially Observable

Inverse design of grating couplers using the policy gradient method from reinforcement learning

Nanophotonics ◽

10.1515/nanoph-2021-0332 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Sean Hooten ◽

Raymond G. Beausoleil ◽

Thomas Van Vaerenbergh

Keyword(s):

Neural Network ◽

Reinforcement Learning ◽

Gradient Method ◽

Photonic Devices ◽

Inverse Design ◽

Grating Couplers ◽

Electromagnetic Devices ◽

Policy Gradient ◽

Gradient Based ◽

Local Gradient

We present a proof-of-concept technique for the inverse design of electromagnetic devices motivated by the policy gradient method in reinforcement learning, named PHORCED (PHotonic Optimization using REINFORCE Criteria for Enhanced Design). This technique uses a probabilistic generative neural network interfaced with an electromagnetic solver to assist in the design of photonic devices, such as grating couplers. We show that PHORCED obtains better performing grating coupler designs than local gradient-based inverse design via the adjoint method, while potentially providing faster convergence over competing state-of-the-art generative methods. As a further example of the benefits of this method, we implement transfer learning with PHORCED, demonstrating that a neural network trained to optimize 8° grating couplers can then be re-trained on grating couplers with alternate scattering angles while requiring >10× fewer simulations than control cases.
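The REINFORCE criterion that the method's name refers to can be sketched on a toy problem. The following is an illustration of the score-function (policy gradient) update only, not PHORCED itself: the generative neural network is replaced here by a softmax distribution over three hypothetical design candidates, the electromagnetic solver by a fixed toy reward table (`TOY_REWARDS`), and the batch-mean baseline is an added variance-reduction assumption.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(reward_fn, n_choices, steps=200, batch=64, lr=1.0, seed=0):
    """Maximize expected reward of a categorical policy with REINFORCE."""
    rng = random.Random(seed)
    logits = [0.0] * n_choices  # start from a uniform policy
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a batch of candidate designs from the current policy.
        actions = rng.choices(range(n_choices), weights=probs, k=batch)
        rewards = [reward_fn(a) for a in actions]
        baseline = sum(rewards) / batch  # batch-mean baseline (assumption)
        grad = [0.0] * n_choices
        for a, r in zip(actions, rewards):
            advantage = r - baseline
            for k in range(n_choices):
                # d/d logit_k of log p(a) equals 1[k == a] - p_k.
                indicator = 1.0 if k == a else 0.0
                grad[k] += advantage * (indicator - probs[k]) / batch
        # Gradient ascent step on the logits.
        logits = [l + lr * g for l, g in zip(logits, grad)]
    return softmax(logits)


# Toy stand-in for a black-box figure of merit; the values are hypothetical.
TOY_REWARDS = [0.1, 0.5, 1.0]
final_probs = reinforce(lambda a: TOY_REWARDS[a], n_choices=3)
```

The reward function is only ever evaluated, never differentiated, which is why this family of estimators can be wrapped around a black-box simulator.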

Policy Gradient-Based Core Placement Optimization for Multichip Many-Core Systems

IEEE Transactions on Neural Networks and Learning Systems ◽

10.1109/tnnls.2021.3117878 ◽

2021 ◽

pp. 1-15

Author(s):

Wooshik Myung ◽

Donghyun Lee ◽

Chenhang Song ◽

Guanrui Wang ◽

Cheng Ma

Keyword(s):

Placement Optimization ◽

Policy Gradient ◽

Gradient Based ◽

Many Core

State-Feedback Zero-Sum Dynamic Games

Noncooperative Game Theory ◽

10.23943/princeton/9780691175218.003.0017 ◽

2017 ◽

Author(s):

João P. Hespanha

Keyword(s):

Discrete Time ◽

Information Structure ◽

Dynamic Games ◽

State Feedback ◽

Linear Quadratic ◽

Feedback Information ◽

Time Dynamic ◽

Finite State ◽

Solution Methods ◽

Zero Sum

This chapter focuses on the computation of the saddle-point equilibrium of a zero-sum discrete time dynamic game in a state-feedback policy. It begins by considering solution methods for two-player zero sum dynamic games in discrete time, assuming a finite horizon stage-additive cost that Player 1 wants to minimize and Player 2 wants to maximize, and taking into account a state feedback information structure. The discussion then turns to discrete time dynamic programming, the use of MATLAB to solve zero-sum games with finite state spaces and finite action spaces, and discrete time linear quadratic dynamic games. The chapter concludes with a practice exercise that requires computing the cost-to-go for each state of the tic-tac-toe game, and the corresponding solution.
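The tic-tac-toe exercise described above can be sketched as a backward recursion over board states. The sketch is in Python rather than the MATLAB the chapter uses and is not the book's solution. It assumes a terminal-only cost of -1 when Player 1 (`X`, the minimizer) wins, +1 when Player 2 (`O`, the maximizer) wins, and 0 for a draw, with players moving alternately. The board encoding and the name `cost_to_go` are hypothetical.

```python
from functools import lru_cache

# All eight winning lines on a 3x3 board stored as a 9-character string.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has completed a line, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def cost_to_go(board, player):
    """Cost-to-go of `board` when `player` moves next and both play optimally."""
    w = winner(board)
    if w == "X":
        return -1  # Player 1 (minimizer) has won
    if w == "O":
        return 1   # Player 2 (maximizer) has won
    if "." not in board:
        return 0   # full board with no winner: draw
    next_player = "O" if player == "X" else "X"
    values = [
        cost_to_go(board[:i] + player + board[i + 1:], next_player)
        for i, cell in enumerate(board)
        if cell == "."
    ]
    # Dynamic programming step: the mover optimizes over successor costs.
    return min(values) if player == "X" else max(values)


# Cost-to-go from the empty board with Player 1 moving first.
empty_board_value = cost_to_go("." * 9, "X")
```

Memoizing on the pair (board, player) means each state is evaluated once, which mirrors computing a cost-to-go table for every state. Under this cost convention the empty board evaluates to 0, the familiar result that optimal play ends in a draw.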
