Penalized Bootstrapping for Reinforcement Learning in Robot Control

Publication Authors C. Gebauer; M. Bennewitz
Published in International Conference on Machine Learning and Applications (ICMLA)
Year of Publication 2020
Abstract

The recent progress in reinforcement learning algorithms has enabled more complex tasks and, at the same time, reinforced the need for a careful balance between exploration and exploitation. Enhanced exploration reduces the need to tightly constrain the agent, e.g., with complex reward functions. This is highly promising, as it reduces the effort required to learn new tasks while improving the agent's performance. In this paper, we address deep exploration in reinforcement learning. Our approach is based on Thompson sampling and keeps multiple hypotheses of the posterior knowledge. We maintain the distribution over the hypotheses with a potential-field-based penalty function. The resulting policy collects more reward, and our method is faster in both application and training than the current state of the art. We evaluate our approach on low-level robot control tasks to back up our claims of a more performant policy and a faster training procedure.
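The idea described above can be illustrated with a minimal sketch: a small ensemble of value "hypotheses", one of which is sampled per step (Thompson sampling) and acted on greedily, while a repulsive potential-field term keeps the hypotheses from collapsing onto each other. Everything here — the bandit setup, the Gaussian-kernel repulsion, and all hyperparameters — is an assumption for illustration, not the authors' exact algorithm.

```python
import numpy as np

# Illustrative sketch only (NOT the paper's exact method): K bootstrapped
# value hypotheses on a 3-armed bandit. Per step we sample one hypothesis
# (Thompson sampling), act greedily under it, and apply a TD update plus a
# repulsive potential-field penalty that keeps the hypotheses diverse.
rng = np.random.default_rng(0)

K, n_actions = 5, 3                         # hypotheses, actions (assumed)
true_means = np.array([0.0, 0.3, 0.9])      # toy reward means (assumed)
Q = 1.0 + rng.normal(0.0, 0.01, size=(K, n_actions))  # optimistic init

alpha, lam = 0.1, 0.005                     # learning rate, penalty weight

for step in range(5000):
    k = rng.integers(K)                     # Thompson sampling: one hypothesis
    a = int(np.argmax(Q[k]))                # act greedily under hypothesis k
    r = rng.normal(true_means[a], 0.05)     # noisy reward

    # TD-style update of the sampled hypothesis toward the observed reward
    Q[k, a] += alpha * (r - Q[k, a])

    # Potential-field penalty: push hypothesis k away from the others, so
    # the ensemble keeps representing distinct posterior samples. A bounded
    # Gaussian-kernel repulsion is used here for numerical stability.
    for j in range(K):
        if j != k:
            diff = Q[k] - Q[j]
            Q[k] += lam * diff * np.exp(-diff @ diff)

best = int(np.argmax(Q.mean(axis=0)))       # ensemble-mean greedy action
print(best)
```

With optimistic initialization every hypothesis tries all arms before settling, and the ensemble-mean greedy action recovers the best arm while the penalty keeps the individual hypotheses spread out around it.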

Type of Publication Conference Proceeding