A supporting hyperplane derivation of the Hamilton-Jacobi-Bellman equation of dynamic programming

1980 ◽  
Vol 75 (1) ◽  
pp. 41-57 ◽  
Author(s):  
R. M. Lewis

2018 ◽  
Vol 24 (1) ◽  
pp. 355-376 ◽  
Author(s):  
Jiangyan Pu ◽  
Qi Zhang

In this work we study the stochastic recursive control problem in which the aggregator (or generator) of the backward stochastic differential equation describing the running cost is continuous, monotonic with respect to the first unknown variable, and not necessarily Lipschitz with respect to either the first unknown variable or the control. In this setting, the dynamic programming principle and the connection between the value function and the viscosity solution of the associated Hamilton-Jacobi-Bellman equation are established via a generalized comparison theorem for backward stochastic differential equations and the stability of viscosity solutions. Finally, we take the control problem of continuous-time Epstein–Zin utility with a non-Lipschitz aggregator as an example to demonstrate the application of our study.
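In schematic form (standard notation for stochastic recursive control, not taken verbatim from the paper), the value function v is expected to solve, in the viscosity sense, an HJB equation of the type

```latex
\begin{aligned}
&\partial_t v(t,x) + \sup_{a \in U}\Big\{ \tfrac{1}{2}\operatorname{tr}\!\big(\sigma\sigma^{\top}(t,x,a)\, D^2_x v\big)
 + b(t,x,a)\cdot D_x v \\
&\qquad\qquad + f\big(t, x, v, \sigma^{\top}(t,x,a)\, D_x v, a\big)\Big\} = 0,
\qquad v(T,x) = \Phi(x),
\end{aligned}
```

where b and \sigma are the drift and diffusion of the controlled state and f is the (continuous, monotone, possibly non-Lipschitz) aggregator of the BSDE describing the running cost.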


Robotics ◽  
2018 ◽  
Vol 7 (4) ◽  
pp. 66 ◽  
Author(s):  
Mohammed Abouheaf ◽  
Wail Gueaieb ◽  
Frank Lewis

Classical gradient-based approximate dynamic programming approaches provide reliable and fast solution platforms for various optimal control problems. However, their dependence on accurate modeling poses a major concern: the efficiency of the resulting solutions is severely degraded in uncertain dynamical environments. Herein, a novel online adaptive learning framework is introduced to solve action-dependent dual heuristic dynamic programming problems. The approach does not depend on the dynamical models of the considered systems; instead, it employs optimization principles to produce model-free control strategies. A policy iteration process solves the underlying Hamilton–Jacobi–Bellman equation by means of adaptive critics, where a layer of separate actor-critic neural networks is employed along with gradient-descent adaptation rules. A Riccati development is introduced and shown to be equivalent to solving the underlying Hamilton–Jacobi–Bellman equation. The proposed approach is applied to the challenging weight-shift control problem of a flexible-wing aircraft. The continuous nonlinear deformation of the aircraft’s flexible wing leads to aerodynamic variations at different trim speeds, which makes its autopilot control a complicated task. A series of numerical simulations demonstrates the effectiveness of the suggested strategy.
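The policy-iteration/Riccati connection described in the abstract can be illustrated on a linear-quadratic problem, where policy iteration (evaluate the current gain, then improve it greedily) converges to the solution of the discrete-time algebraic Riccati equation. The matrices below are a minimal double-integrator sketch, not the flexible-wing aircraft model from the paper:

```python
import numpy as np

# Illustrative discrete-time LQR problem (a double integrator),
# NOT the aircraft model from the paper.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)          # state cost
R = np.array([[1.0]])  # control cost

K = np.array([[10.0, 10.0]])  # an initial stabilizing gain, u = -K x
for _ in range(30):
    # Policy evaluation: solve P = Q + K'RK + (A-BK)' P (A-BK)
    # by fixed-point iteration (valid because A - BK is stable).
    Acl = A - B @ K
    P = np.zeros((2, 2))
    for _ in range(2000):
        P = Q + K.T @ R @ K + Acl.T @ P @ Acl
    # Policy improvement: greedy gain with respect to the evaluated P.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# At convergence, P solves the discrete-time algebraic Riccati equation,
# mirroring the HJB/Riccati equivalence used in the paper.
```

The adaptive-critic scheme in the paper replaces the exact evaluation step with neural-network approximation and gradient-descent updates, removing the need for the model matrices A and B.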


2021 ◽  
pp. 1-14
Author(s):  
Daniel Saranovic ◽  
Martin Pavlovski ◽  
William Power ◽  
Ivan Stojkovic ◽  
Zoran Obradovic

As the prevalence of drones increases, understanding and preparing for possible adversarial uses of drones and drone swarms is of paramount importance. Correspondingly, developing defensive mechanisms in which swarms are used to protect against adversarial Unmanned Aerial Vehicles (UAVs) is a problem that requires further attention. Prior work on intercepting UAVs relies mostly on additional sensors or on the Hamilton-Jacobi-Bellman equation, for which strong conditions must be met to guarantee the existence of a saddle-point solution. To that end, this work proposes a novel interception method that utilizes the swarm’s onboard PID controllers to set the drones’ states during interception. The drones’ states are constrained only by their physical limitations, and only partial feedback of the adversarial drone’s positions is assumed. The new framework is evaluated in a virtual environment under different environmental and model settings, using random simulations of more than 165,000 swarm flights. For certain environmental settings, our results indicate that the interception performance of larger swarms under partial observation is comparable to that of a one-drone swarm under full observation of the adversarial drone.
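A minimal sketch of the kind of onboard PID position control the interception scheme relies on, assuming a one-dimensional double-integrator drone and illustrative gains (neither the gains nor the dynamics are taken from the paper):

```python
class PID:
    """Textbook PID controller; the gains passed in are illustrative."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# One-dimensional double-integrator "drone" driven toward a target position,
# e.g. an intercept point estimated from partial observations of the adversary.
dt = 0.01
pos, vel, target = 0.0, 0.0, 5.0
ctrl = PID(kp=4.0, ki=0.02, kd=2.5, dt=dt)
for _ in range(3000):                 # 30 s of simulated flight
    accel = ctrl.step(target - pos)   # PID commands the acceleration
    vel += accel * dt                 # physical limits would clamp this
    pos += vel * dt
```

In the paper's setting, the PID setpoints are updated online from the partially observed adversarial drone's positions, and the commanded states are clamped to each drone's physical limits.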

