Temporal Induced Self-Play for Stochastic Bayesian Games

Author(s):  
Weizhe Chen ◽  
Zihan Zhou ◽  
Yi Wu ◽  
Fei Fang

One practical requirement in solving dynamic games is to ensure that the players play well from any decision point onward. To satisfy this requirement, existing efforts focus on equilibrium refinement, but the scalability and applicability of existing techniques are limited. In this paper, we propose Temporal-Induced Self-Play (TISP), a novel reinforcement learning-based framework to find strategies with decent performance from any decision point onward. TISP uses belief-space representation, backward induction, policy learning, and non-parametric approximation. Building upon TISP, we design a policy-gradient-based algorithm, TISP-PG. We prove that TISP-based algorithms can find approximate Perfect Bayesian Equilibrium in zero-sum one-sided stochastic Bayesian games with finite horizon. We test TISP-based algorithms in various games, including finitely repeated security games and a grid-world game. The results show that TISP-PG is more scalable than existing mathematical programming-based methods and significantly outperforms other learning-based methods.
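In a one-sided game, the uninformed player can update its belief about the informed player's private type with Bayes' rule after each observed action. A belief-space representation keeps track of this kind of quantity. The Python sketch below is minimal and assumes a finite type set and a known type-conditioned policy. `update_belief`, the type labels, and the probabilities are hypothetical illustrations, not TISP's implementation.

```python
from typing import Dict, Hashable

def update_belief(belief: Dict[Hashable, float],
                  policy: Dict[Hashable, Dict[Hashable, float]],
                  action: Hashable) -> Dict[Hashable, float]:
    """Posterior over private types after observing `action` (hypothetical helper).

    belief[t]    -- prior probability of type t
    policy[t][a] -- probability that type t plays action a (assumed known)
    """
    # Unnormalized posterior: prior(t) * P(action | t)
    joint = {t: p * policy[t].get(action, 0.0) for t, p in belief.items()}
    total = sum(joint.values())
    if total == 0.0:
        # Zero-probability (off-path) action: Bayes' rule is undefined here.
        # Keeping the prior is only one possible convention.
        return dict(belief)
    return {t: q / total for t, q in joint.items()}


if __name__ == "__main__":
    prior = {"high": 0.5, "low": 0.5}
    policy = {"high": {"attack": 0.8, "wait": 0.2},
              "low": {"attack": 0.2, "wait": 0.8}}
    print(update_belief(prior, policy, "attack"))  # high ≈ 0.8, low ≈ 0.2
```

Keeping the prior after a zero-probability action is just a convenient default. A Perfect Bayesian Equilibrium places no Bayes-rule restriction on off-path beliefs.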

Author(s):  
João P. Hespanha

This book is aimed at students interested in using game theory as a design methodology for solving problems in engineering and computer science. The book shows that such design challenges can be analyzed through game-theoretical perspectives that help to pinpoint each problem's essence: Who are the players? What are their goals? Will the solution to “the game” solve the original design problem? Using the fundamentals of game theory, the book explores these issues and more. The use of game theory in technology design is a recent development arising from the intrinsic limitations of classical optimization-based designs. In optimization, one attempts to find values for parameters that minimize suitably defined criteria, such as monetary cost, energy consumption, or heat generated. However, in most engineering applications, there is always some uncertainty as to how the selected parameters will affect the final objective. Through a sequential and easy-to-understand discussion, the book examines how to make sure that the selection leads to acceptable performance even in the presence of uncertainty, the unforgiving variable that can wreck engineering designs. The book covers standard topics such as zero-sum, non-zero-sum, and dynamic games, and includes a MATLAB guide to coding. This book offers students a fresh way of approaching engineering and computer science applications.
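One way to make the shift from optimization to games concrete is a worst-case (security-level) choice. The designer acts as the minimizing player and the uncertainty acts as an adversary. The Python sketch below is minimal and assumes a finite set of design choices, a finite set of uncertainty scenarios, and a known cost table. The function names and the numbers are hypothetical illustrations, not examples taken from the book.

```python
from typing import Sequence, Tuple

def security_policy(cost: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Worst-case (minimax) design: cost[i][j] is the cost of design choice i
    under uncertainty scenario j. Returns (choice, worst-case cost)."""
    worst = [max(row) for row in cost]                    # adversarial scenario per design
    best = min(range(len(cost)), key=lambda i: worst[i])  # minimize the worst case
    return best, worst[best]


def nominal_choice(cost: Sequence[Sequence[float]], scenario: int) -> int:
    """Classical optimization: best design assuming one known scenario."""
    return min(range(len(cost)), key=lambda i: cost[i][scenario])


if __name__ == "__main__":
    cost = [[1, 9],   # design 0: excellent nominally, poor in scenario 1
            [4, 5]]   # design 1: never great, never terrible
    print(nominal_choice(cost, 0))   # 0
    print(security_policy(cost))     # (1, 5)
```

In this toy table, the nominal design for scenario 0 is choice 0, which costs 1 but can cost 9 in the worst case. The security policy picks choice 1, whose cost never exceeds 5.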


IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Chaohai Kang ◽  
Chuiting Rong ◽  
Weijian Ren ◽  
Fengcai Huo ◽  
Pengyun Liu

Nanophotonics ◽  
2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Sean Hooten ◽  
Raymond G. Beausoleil ◽  
Thomas Van Vaerenbergh

We present a proof-of-concept technique for the inverse design of electromagnetic devices motivated by the policy gradient method in reinforcement learning, named PHORCED (PHotonic Optimization using REINFORCE Criteria for Enhanced Design). This technique uses a probabilistic generative neural network interfaced with an electromagnetic solver to assist in the design of photonic devices, such as grating couplers. We show that PHORCED obtains better-performing grating coupler designs than local gradient-based inverse design via the adjoint method, while potentially converging faster than competing state-of-the-art generative methods. As a further example of the benefits of this method, we implement transfer learning with PHORCED, demonstrating that a neural network trained to optimize 8° grating couplers can then be retrained on grating couplers with alternate scattering angles while requiring more than 10× fewer simulations than control cases.
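The policy-gradient idea behind PHORCED can be illustrated with a toy REINFORCE loop. The loop samples design parameters from a Gaussian distribution, scores each sample, and then moves the mean along the score-function gradient estimate. The Python sketch below is minimal and rests on several assumptions: a fixed standard deviation, a mean-reward baseline, and a hypothetical quadratic `figure_of_merit` standing in for the electromagnetic solver. PHORCED itself uses a generative neural network, which this sketch does not reproduce.

```python
import random
from typing import Callable, List

def figure_of_merit(x: List[float]) -> float:
    """Hypothetical stand-in for a solver-evaluated objective (higher is better).
    Peaks at x = (0.3, -0.7)."""
    target = (0.3, -0.7)
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def reinforce(objective: Callable[[List[float]], float], dim: int,
              sigma: float = 0.1, lr: float = 0.05, batch: int = 32,
              steps: int = 300, seed: int = 0) -> List[float]:
    """Optimize the mean of N(mu, sigma^2 I) with the REINFORCE estimator."""
    rng = random.Random(seed)
    mu = [0.0] * dim  # mean of the Gaussian "design policy"
    for _ in range(steps):
        eps = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(batch)]
        samples = [[m + sigma * e for m, e in zip(mu, ep)] for ep in eps]
        rewards = [objective(s) for s in samples]  # one "simulation" per sample
        baseline = sum(rewards) / batch            # variance-reducing baseline
        # Score function w.r.t. mu: (x - mu) / sigma^2 = eps / sigma
        grad = [sum((r - baseline) * ep[d] / sigma for r, ep in zip(rewards, eps)) / batch
                for d in range(dim)]
        mu = [m + lr * g for m, g in zip(mu, grad)]  # gradient ascent on expected reward
    return mu


if __name__ == "__main__":
    mu = reinforce(figure_of_merit, dim=2)
    print([round(m, 2) for m in mu])  # close to [0.3, -0.7]
```

The baseline only reduces the variance of the gradient estimate. A fixed `sigma` keeps the sketch short, whereas a learned generative model could also adapt the spread of its samples.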


Author(s):  
Wooshik Myung ◽  
Donghyun Lee ◽  
Chenhang Song ◽  
Guanrui Wang ◽  
Cheng Ma

Author(s):  
João P. Hespanha

This chapter focuses on computing the saddle-point equilibrium of a zero-sum discrete-time dynamic game in state-feedback policies. It begins by considering solution methods for two-player zero-sum dynamic games in discrete time, assuming a finite-horizon stage-additive cost that Player 1 wants to minimize and Player 2 wants to maximize, and a state-feedback information structure. The discussion then turns to discrete-time dynamic programming, the use of MATLAB to solve zero-sum games with finite state and action spaces, and discrete-time linear quadratic dynamic games. The chapter concludes with a practice exercise, with solution, that requires computing the cost-to-go for each state of the tic-tac-toe game.
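The backward-induction recursion for finite state and action spaces can be sketched in code. At each stage k and state x, it solves the matrix game whose entries are g(k, x, u, d) + V(k+1, f(k, x, u, d)), starting from the terminal cost at stage K. The chapter itself uses MATLAB. The Python sketch below is minimal and assumes that every stage matrix game has a pure saddle point. When one does not, that stage must be solved in mixed policies, for example by linear programming, which is not shown. The function names and the toy dynamics are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

def pure_saddle_point(M: List[List[float]]) -> Tuple[int, int, float]:
    """Pure saddle point of matrix game M: rows belong to Player 1 (minimizer),
    columns to Player 2 (maximizer). Returns (row, col, value)."""
    row_max = [max(row) for row in M]
    i_star = min(range(len(M)), key=lambda i: row_max[i])     # Player 1 security policy
    col_min = [min(row[j] for row in M) for j in range(len(M[0]))]
    j_star = max(range(len(M[0])), key=lambda j: col_min[j])  # Player 2 security policy
    upper, lower = row_max[i_star], col_min[j_star]
    if abs(upper - lower) > 1e-12:
        raise ValueError("no pure saddle point: solve this stage in mixed policies (e.g. by LP)")
    return i_star, j_star, upper


def backward_induction(states: Sequence[Hashable], U: Sequence, D: Sequence,
                       f: Callable, g: Callable, terminal: Callable, K: int):
    """Cost-to-go V[k][x] and state-feedback policies for the cost
    sum_{k<K} g(k, x_k, u_k, d_k) + terminal(x_K), with x_{k+1} = f(k, x_k, u_k, d_k)."""
    V: List[Dict] = [dict() for _ in range(K + 1)]
    mu1: List[Dict] = [dict() for _ in range(K)]  # Player 1 (minimizer) policy per stage
    mu2: List[Dict] = [dict() for _ in range(K)]  # Player 2 (maximizer) policy per stage
    for x in states:
        V[K][x] = terminal(x)
    for k in range(K - 1, -1, -1):  # backward in time
        for x in states:
            # Stage matrix game: immediate cost plus cost-to-go of the next state
            M = [[g(k, x, u, d) + V[k + 1][f(k, x, u, d)] for d in D] for u in U]
            i, j, value = pure_saddle_point(M)
            V[k][x] = value
            mu1[k][x], mu2[k][x] = U[i], D[j]
    return V, mu1, mu2


if __name__ == "__main__":
    # Hypothetical toy game: states 0..3, Player 1 pushes down, Player 2 pushes up.
    clip = lambda x: min(3, max(0, x))
    V, mu1, mu2 = backward_induction(
        range(4), U=[-1, 0], D=[0, 1],
        f=lambda k, x, u, d: clip(x + u + d),
        g=lambda k, x, u, d: x,  # stage cost: current state
        terminal=lambda x: x, K=3)
    print(V[0][2], mu1[0][2], mu2[0][2])  # 8 -1 1
```

In this toy game, Player 1 always pushes the state down and Player 2 always pushes it up, so the state stays put. The cost-to-go from state x at stage 0 is therefore (K + 1)·x, which gives 8 for x = 2 and K = 3.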

