IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks
Preprint, 2019
Authors: Michael Luo, Jiahao Yao, Richard Liaw
Abstract
The practical use of reinforcement learning agents is often bottlenecked by training time. To shorten it, practitioners often turn to distributed reinforcement learning architectures that parallelize the training process. However, modern methods for scalable reinforcement learning (RL) often trade off the throughput of samples that an RL agent can learn from (sample throughput) against the quality of learning from each sample (sample efficiency). In these scalable RL architectures, increasing sample throughput (e.g., by increasing parallelization in IMPALA) causes sample efficiency to drop significantly. To address this, we propose a new distributed reinforcement learning algorithm, IMPACT. IMPACT extends IMPALA with three changes: a target network for stabilizing the surrogate objective, a circular buffer, and truncated importance sampling. In discrete action-space environments, we show that IMPACT attains higher reward and, simultaneously, reduces training wall-clock time by up to 30% relative to IMPALA. For continuous control environments, IMPACT trains faster than existing scalable agents while preserving the sample efficiency of synchronous PPO.
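The abstract names IMPACT's three changes without defining them. As a rough illustration only, the sketch below shows one way they could fit together. It uses a PPO-style clipped surrogate whose probability ratio is taken against a target network rather than the current learner. That term is scaled by a truncated importance weight, capped at `rho_bar`, which corrects for the worker policy that generated the sample. Batches come from a fixed-capacity circular buffer in which each batch is reused at most `max_reuse` times. The names `clipped_target_surrogate`, `CircularBuffer`, `clip_eps`, `rho_bar`, and `max_reuse`, the round-robin reuse order, and the exact form of the objective are all assumptions made for this example. They are not the paper's definitions.

```python
from collections import deque


def clipped_target_surrogate(pi_theta, pi_target, pi_worker, advantage,
                             clip_eps=0.2, rho_bar=1.0):
    """Hypothetical per-sample surrogate (illustrative, not IMPACT's exact loss).

    pi_theta:  probability of the action under the learner policy
    pi_target: probability under the (slowly updated) target network
    pi_worker: probability under the worker policy that collected the sample
    """
    # PPO-style ratio measured against the target network, then clipped.
    ratio = pi_theta / pi_target
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    # Truncated importance weight correcting worker -> target mismatch.
    rho = min(rho_bar, pi_target / pi_worker)
    # Pessimistic (min) PPO objective, scaled by the non-negative weight.
    return rho * min(ratio * advantage, clipped * advantage)


class CircularBuffer:
    """Hypothetical fixed-capacity batch buffer with bounded reuse."""

    def __init__(self, capacity, max_reuse):
        self.max_reuse = max_reuse
        # deque(maxlen=...) drops the entry at the front when full.
        self._slots = deque(maxlen=capacity)

    def add(self, batch):
        self._slots.append((batch, 0))

    def sample(self):
        # Round-robin: take the front batch, count the use, and rotate it
        # to the back unless it has reached its reuse limit.
        if not self._slots:
            return None
        batch, used = self._slots.popleft()
        used += 1
        if used < self.max_reuse:
            self._slots.append((batch, used))
        return batch

    def __len__(self):
        return len(self._slots)


if __name__ == "__main__":
    buf = CircularBuffer(capacity=2, max_reuse=2)
    buf.add("a")
    buf.add("b")
    print([buf.sample() for _ in range(5)])
    print(clipped_target_surrogate(0.9, 0.5, 0.8, advantage=1.0))
```

Clipping against the target network, not the learner's own previous iterate, is meant to keep the trust region anchored to a policy that changes slowly. This matters when many asynchronous workers feed stale samples. The truncation at `rho_bar` bounds the variance that large worker-to-target ratios would otherwise introduce.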