Ensemble Policies for Diverse Query-Generation in Preference Alignment of Robot Navigation

Conference Proceeding

To align mobile robot navigation policies with user preferences through reinforcement learning from human feedback (RLHF), reliable and behavior-diverse user queries are required. However, deterministic policies fail to generate a variety of navigation trajectory suggestions for a given navigation task configuration. We introduce EnQuery, a query generation approach using an ensemble of policies that achieve behavioral diversity through a regularization term. For a given navigation task, EnQuery produces multiple navigation trajectory suggestions, thereby improving the efficiency of preference data collection with fewer queries. Our methodology demonstrates superior performance in aligning navigation policies with user preferences in low-query regimes, offering enhanced policy convergence from sparse preference queries. The evaluation is complemented by a novel explainability representation that captures the full-scene navigation behavior of the mobile robot in a single plot.
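The abstract does not specify the exact form of EnQuery's regularization term, its policy architecture, or how trajectories are paired into queries. The sketch below is a minimal illustration under stated assumptions. Each ensemble member is a hypothetical goal-seeking point-mass controller with a member-specific lateral bias. Diversity is measured as the mean pairwise distance between the members' actions at shared states, and every pair of rolled-out trajectories becomes one preference query. All names (`make_policy`, `rollout`, `diversity_term`, `generate_queries`) are hypothetical and not taken from the paper.

```python
import itertools
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
# Hypothetical policy interface: maps (position, goal) to a velocity command.
Policy = Callable[[Point, Point], Point]


def make_policy(lateral_bias: float, speed: float = 0.5) -> Policy:
    """Hypothetical ensemble member: steer toward the goal plus a lateral offset."""
    def policy(pos: Point, goal: Point) -> Point:
        dx, dy = goal[0] - pos[0], goal[1] - pos[1]
        dist = math.hypot(dx, dy)
        if dist < 1e-9:
            return (0.0, 0.0)
        ux, uy = dx / dist, dy / dist
        # Add a perpendicular component scaled by the member-specific bias.
        vx = ux - lateral_bias * uy
        vy = uy + lateral_bias * ux
        norm = math.hypot(vx, vy)  # equals sqrt(1 + bias^2), never zero
        step = min(speed, dist)
        return (step * vx / norm, step * vy / norm)
    return policy


def rollout(policy: Policy, start: Point, goal: Point, steps: int = 20) -> List[Point]:
    """Integrate the velocity commands for a fixed number of steps."""
    traj = [start]
    pos = start
    for _ in range(steps):
        vx, vy = policy(pos, goal)
        pos = (pos[0] + vx, pos[1] + vy)
        traj.append(pos)
    return traj


def diversity_term(policies: List[Policy], states: List[Point], goal: Point) -> float:
    """Mean pairwise action distance across ensemble members (assumed regularizer).

    During training, subtracting lambda * diversity_term(...) from the loss would
    push members apart; this exact form is an assumption, not the paper's.
    """
    total, count = 0.0, 0
    for s in states:
        actions = [p(s, goal) for p in policies]
        for a, b in itertools.combinations(actions, 2):
            total += math.hypot(a[0] - b[0], a[1] - b[1])
            count += 1
    return total / count if count else 0.0


def generate_queries(policies: List[Policy], start: Point, goal: Point,
                     steps: int = 20) -> List[Tuple[List[Point], List[Point]]]:
    """Roll out every member on the same task and pair the trajectories."""
    trajs = [rollout(p, start, goal, steps) for p in policies]
    return [(trajs[i], trajs[j])
            for i, j in itertools.combinations(range(len(trajs)), 2)]


if __name__ == "__main__":
    ensemble = [make_policy(b) for b in (-0.6, -0.2, 0.2, 0.6)]
    queries = generate_queries(ensemble, start=(0.0, 0.0), goal=(5.0, 0.0))
    print(len(queries))  # 6 pairwise queries from 4 trajectories
```

With four ensemble members, one navigation task yields six candidate comparisons. A single deterministic policy would yield only one trajectory and therefore no comparison for that task, which illustrates why ensemble diversity matters in low-query regimes.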

IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)

2024