Online reinforcement learning control by Bayesian inference

Xia Zhongpu, Zhao Dongbin · 2016

Journal:

IET Control Theory and Applications, 2016, vol. 10, no. 12

Publication date:

2016.08.08

Abstract:

Reinforcement learning offers a promising route to self-learning control of an unknown system, but it raises the problems of policy evaluation and exploration, especially in continuous state spaces. This study addresses both problems from a probabilistic perspective. The action value function is modelled as the latent variable of a Gaussian process, and the reward as the observed variable. An online approach is then proposed to update the action value function by Bayesian inference. Within this framework, prior knowledge can be incorporated into the action value function, which leads to an efficient exploration strategy. Finally, the Bayesian state-action-reward-state-action (Bayesian SARSA) algorithm is tested on several benchmark problems, and the empirical results show its effectiveness.
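
The abstract does not give the model equations, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a zero-mean Gaussian process prior over Q with a squared-exponential kernel, and assumes each reward is observed as Q(s, a) - gamma * Q(s', a') plus Gaussian noise, which is one common way to tie rewards to a latent action value function. It computes the posterior in batch from a single trajectory rather than online. The names `rbf`, `solve`, and `q_posterior`, and the parameters `gamma`, `noise`, and `length`, are hypothetical.

```python
import math

def rbf(z1, z2, length=1.0):
    """Squared-exponential kernel on state-action vectors (an assumed choice)."""
    d2 = sum((a - b) ** 2 for a, b in zip(z1, z2))
    return math.exp(-0.5 * d2 / length ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def q_posterior(traj, rewards, query, gamma=0.9, noise=0.1):
    """Posterior mean and variance of Q(query) given one trajectory.

    traj holds len(rewards) + 1 state-action vectors. Reward t is modelled
    (by assumption) as Q(traj[t]) - gamma * Q(traj[t + 1]) plus Gaussian
    noise with variance `noise`.
    """
    T = len(rewards)

    def dk(t, z):
        # Prior covariance between the noise-free reward t and Q(z).
        return rbf(traj[t], z) - gamma * rbf(traj[t + 1], z)

    # Covariance matrix of the observed rewards, including observation noise.
    G = [[dk(t, traj[u]) - gamma * dk(t, traj[u + 1]) + (noise if t == u else 0.0)
          for u in range(T)] for t in range(T)]
    c = [dk(t, query) for t in range(T)]

    alpha = solve(G, list(rewards))
    beta = solve(G, c)
    mean = sum(ci * ai for ci, ai in zip(c, alpha))
    var = rbf(query, query) - sum(ci * bi for ci, bi in zip(c, beta))
    return mean, var

# Toy trajectory of (state, action) pairs with a reward on the last step.
traj = [(0.0, 1.0), (1.0, 1.0), (2.0, 0.0), (3.0, 1.0)]
rewards = [0.0, 0.0, 1.0]
mean_seen, var_seen = q_posterior(traj, rewards, traj[2])
mean_far, var_far = q_posterior(traj, rewards, (50.0, 0.0))
print(mean_seen, var_seen, mean_far, var_far)
```

In this sketch the posterior variance shrinks near visited state-action pairs and stays at its prior value far from them. That uncertainty is the kind of quantity an exploration strategy could use, although the abstract does not say how the paper's strategy is built.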