Error controlled actor-critic

Xingen Gao, Fei Chao*, Changle Zhou, Zhen Ge, Longzhi Yang, Xiang Chang, Changjing Shang, Qiang Shen

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (SciVal)
21 Downloads (Pure)

Abstract

The approximation inaccuracy of the value function in reinforcement learning (RL) algorithms unavoidably leads to an overestimation phenomenon, which has negative effects on the convergence of the algorithms. To limit the negative effects of the approximation error, we propose error controlled actor-critic (ECAC) which ensures the approximation error is limited within the value function. We present an investigation of how approximation inaccuracy can impair the optimization process of actor-critic approaches. In addition, we derive an upper bound for the approximation error of the Q function approximator and discover that the error can be reduced by limiting the KL divergence between every two consecutive policies during policy training. Experiments on a variety of continuous control tasks demonstrate that the proposed actor-critic approach decreases approximation error and outperforms previous model-free RL algorithms by a significant margin.
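The abstract's central idea, limiting the KL divergence between two consecutive policies, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes diagonal Gaussian policies, for which the KL divergence has a closed form, and it assumes the limit is applied as a soft penalty on the actor loss. The abstract specifies neither choice. The names `gaussian_kl`, `penalized_actor_loss`, and `kl_coef` are hypothetical.

```python
import math


def gaussian_kl(mu_old, sigma_old, mu_new, sigma_new):
    """Closed-form KL(old || new) for diagonal Gaussians, summed over action dims."""
    kl = 0.0
    for m0, s0, m1, s1 in zip(mu_old, sigma_old, mu_new, sigma_new):
        kl += math.log(s1 / s0) + (s0 ** 2 + (m0 - m1) ** 2) / (2.0 * s1 ** 2) - 0.5
    return kl


def penalized_actor_loss(q_value, kl, kl_coef=0.1):
    """Hypothetical actor loss: maximize Q while penalizing drift from the previous policy.

    The penalty form and the kl_coef value are assumptions for illustration only.
    """
    return -q_value + kl_coef * kl


# Previous policy: N(0, 1); updated policy: N(1, 1) on a single action dimension.
kl = gaussian_kl([0.0], [1.0], [1.0], [1.0])
loss = penalized_actor_loss(q_value=2.0, kl=kl)
```

A larger `kl_coef` keeps each updated policy closer to its predecessor, which, according to the abstract, is what reduces the approximation error of the Q function.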

Original language: English
Pages (from-to): 62-74
Number of pages: 13
Journal: Information Sciences
Volume: 612
Early online date: 02 Sep 2022
Digital Object Identifiers (DOIs)
Status: Published - 01 Oct 2022
