Optimal Myopic Policy for Restless Bandit: A Perspective of Eigendecomposition

Author(s):  
Kehao Wang ◽  
Jihong Yu ◽  
Lin Chen ◽  
Pan Zhou ◽  
Moe Win
1999 ◽  
Vol 12 (2) ◽  
pp. 151-160 ◽  
Author(s):  
Doncho S. Donchev

We consider the symmetric Poissonian two-armed bandit problem. For the case of switching arms, only one of which creates reward, we solve explicitly the Bellman equation for a β-discounted reward and prove that a myopic policy is optimal.


2021 ◽  
Author(s):  
Xiao-Yue Gong ◽  
Vineet Goyal ◽  
Garud N. Iyengar ◽  
David Simchi-Levi ◽  
Rajan Udwani ◽  
...  

We consider an online assortment optimization problem where we have n substitutable products with fixed reusable capacities [Formula: see text]. In each period t, a user with some preferences (potentially adversarially chosen) arrives at the seller's platform, and the seller offers a subset of products, S_t, from the set of available products. The user selects product [Formula: see text] with probability given by the preference model and uses it for a random number of periods, [Formula: see text], that is distributed i.i.d. according to some distribution that depends only on j, generating a revenue [Formula: see text] for the seller. The goal of the seller is to find a policy that maximizes the expected cumulative revenue over a finite horizon T. Our main contribution is to show that a simple myopic policy (where we offer the myopically optimal assortment from the available products to each user) provides a good approximation for the problem. In particular, we show that the myopic policy is 1/2-competitive, that is, the expected cumulative revenue of the myopic policy is at least half the expected revenue of the optimal policy with full information about the sequence of user preference models and the distribution of random usage times of all the products. In contrast, the myopic policy does not require any information about future arrivals or the distribution of random usage times. The analysis is based on a coupling argument that allows us to bound the expected revenue of the optimal algorithm in terms of the expected revenue of the myopic policy. We also consider the setting where usage time distributions can depend on the type of each user and show that in this more general case there is no online algorithm with a nontrivial competitive ratio guarantee. Finally, we perform numerical experiments to compare the robustness and performance of the myopic policy with other natural policies. This paper was accepted by Gabriel Weintraub, revenue management and analytics.


2021 ◽  
Vol 290 (2) ◽  
pp. 622-639
Author(s):  
Jianyu Xu ◽  
Lujie Chen ◽  
Ou Tang
