SCCC 2001. 21st International Conference of the Chilean Computer Science Society

Abstract

In control-related applications such as robotics, the determination of optimal solutions is difficult for many reasons. Among these is the difficulty of finding an appropriate model of the domain, as defined by the control agent (robot), the environment in which it acts, and their interaction. Reinforcement Learning is a theory that defines a collection of algorithms for determining control actions under model-free assumptions, allowing control agents to learn optimal actions autonomously. In Reinforcement Learning, a cost functional to be optimised is defined in advance. The agent then learns to perform this optimisation via trial and error in its environment. A trial corresponds to the execution of an action chosen by the agent, and the error is the immediate result (a real-valued reinforcement) of that action. In the work reported herein, we consider trials by a learning robotic agent that are based not on low-level actions but on sequences of actions (options, or macro-operators). We analysed the performance, both in terms of learning speed and quality of the learned control, of options that correspond to mappings from states to action policies. Experimental results show that careful (domain-dependent) selection of options produces much faster learning for option-based robots than for their action-based counterparts. Of critical importance, however, is the option mapping in regions of the state space where the options are not assumed to be necessary: since the performance of Reinforcement Learning algorithms depends strongly on sufficient exploration of the state space, a careful selection of actions is of foremost importance even in such regions.
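As an illustration of the kind of option-based learning the abstract describes, the sketch below shows SMDP-style Q-learning over options (policies with a termination condition) on a toy corridor task. This is a minimal, hedged example: the environment, the two hand-crafted options, and all names and parameters (Corridor, Option, smdp_q_learning, alpha, gamma, eps) are illustrative assumptions, not the paper's actual domain or implementation.

```python
import random

class Corridor:
    """1-D corridor of n cells; the goal is the rightmost cell (toy stand-in for a robot domain)."""
    def __init__(self, n=10):
        self.n = n
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action: 0 = move left, 1 = move right
        self.state = max(0, min(self.n - 1, self.state + (1 if action == 1 else -1)))
        done = self.state == self.n - 1
        reward = 0.0 if done else -1.0   # -1 per step until the goal is reached
        return self.state, reward, done

class Option:
    """An option: a policy over primitive actions plus a termination test."""
    def __init__(self, policy, terminate):
        self.policy = policy          # state -> primitive action
        self.terminate = terminate    # state -> bool

# Two hand-crafted options: "keep going right" (runs until the episode ends)
# and "one step left" (a one-step option, equivalent to a primitive action).
go_right = Option(policy=lambda s: 1, terminate=lambda s: False)
step_left = Option(policy=lambda s: 0, terminate=lambda s: True)

def smdp_q_learning(env, options, episodes=200, alpha=0.1, gamma=0.95, eps=0.1):
    """Learn Q-values over options; backups span the whole duration of each option."""
    Q = [[0.0] * len(options) for _ in range(env.n)]
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy choice over options, not primitive actions
            if random.random() < eps:
                o = random.randrange(len(options))
            else:
                o = max(range(len(options)), key=lambda i: Q[s][i])
            opt = options[o]
            s0, ret, disc = s, 0.0, 1.0
            # execute the option's policy until it terminates or the episode ends
            while True:
                s, r, done = env.step(opt.policy(s))
                ret += disc * r
                disc *= gamma
                if done or opt.terminate(s):
                    break
            # SMDP Q-learning backup: discounted return plus gamma^k-discounted bootstrap
            target = ret + disc * max(Q[s])
            Q[s0][o] += alpha * (target - Q[s0][o])
    return Q

if __name__ == "__main__":
    Q = smdp_q_learning(Corridor(), [go_right, step_left])
    print("Q(s=0):", Q[0])   # the 'go right' option should dominate at the start state
```

The point of the sketch is the backup structure: because an option may run for several primitive steps, the update uses the accumulated discounted reward and a bootstrap term discounted by gamma raised to the option's duration, which is what lets well-chosen options propagate value (and thus speed up learning) much faster than single-step actions.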