A Homotopic Approach to Policy Gradients for Linear Quadratic Regulators with Nonlinear Controls
Agazzi A.
2022-01-01
Abstract
We study the convergence of deterministic policy gradient algorithms in continuous state and action space for the prototypical Linear Quadratic Regulator (LQR) problem when the search space is not limited to the family of linear policies. We first provide a counterexample showing that extending the policy class to piecewise linear functions results in local minima of the policy gradient algorithm. To solve this problem, we develop a new approach that involves sequentially increasing a discount factor between iterations of the original policy gradient algorithm. We finally prove that this homotopic variant of policy gradient methods converges to the global optimum of the undiscounted Linear Quadratic Regulator problem for a large class of Lipschitz, non-linear policies.
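The idea of raising the discount factor between policy gradient iterations can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a scalar system x_{t+1} = a x_t + b u_t with a linear policy u = -k x, for which the discounted cost from x_0 = 1 has the closed form (q + r k^2) / (1 - gamma (a - b k)^2), whereas the paper treats Lipschitz non-linear policies. All function names, parameter values, the linear discount schedule and the backtracking step rule are hypothetical choices made for illustration.

```python
def discounted_cost(k, gamma, a, b, q, r):
    """Discounted cost of u = -k*x from x0 = 1; inf if the series diverges."""
    c = a - b * k                      # closed-loop gain
    d = 1.0 - gamma * c * c            # geometric-series denominator
    if d <= 0.0:
        return float("inf")
    return (q + r * k * k) / d


def cost_gradient(k, gamma, a, b, q, r):
    """Derivative of discounted_cost with respect to k (finite-cost region only)."""
    c = a - b * k
    d = 1.0 - gamma * c * c
    return (2.0 * r * k * d - 2.0 * gamma * b * c * (q + r * k * k)) / (d * d)


def homotopic_policy_gradient(a, b, q, r, k0=0.0, gamma0=0.5,
                              stages=11, inner_steps=200, step=0.1):
    """Gradient descent on k, with gamma raised linearly from gamma0 to 1 (assumed schedule)."""
    k = k0
    for i in range(stages):
        gamma = gamma0 + (1.0 - gamma0) * i / (stages - 1)
        if discounted_cost(k, gamma, a, b, q, r) == float("inf"):
            raise ValueError("current policy has infinite cost at gamma=%g" % gamma)
        for _ in range(inner_steps):
            g = cost_gradient(k, gamma, a, b, q, r)
            current = discounted_cost(k, gamma, a, b, q, r)
            t = step
            # Backtracking: halve the step until the cost does not increase.
            while discounted_cost(k - t * g, gamma, a, b, q, r) > current and t > 1e-12:
                t *= 0.5
            if t <= 1e-12:
                break                  # no improving step found at this gamma
            k -= t * g
    return k


# Hypothetical unstable system (a > 1): the zero policy has infinite undiscounted cost.
a, b, q, r = 1.2, 1.0, 1.0, 1.0
print("undiscounted cost at k = 0:", discounted_cost(0.0, 1.0, a, b, q, r))
k_final = homotopic_policy_gradient(a, b, q, r)
print("gain after homotopy:", k_final)
```

In this example the initial policy k = 0 does not stabilize the system, so plain policy gradient on the undiscounted cost has no finite starting point. Starting from a discount factor small enough to make the cost finite and raising it in stages lets each stage begin from a policy with finite cost.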