Abstract
In Reinforcement Learning applications such as autonomous robot navigation, the use of options (macro-operators) instead of low-level actions has been reported to speed up learning due to a more aggressive exploration of the state space. In this paper we present an evaluation of the use of OS option policies. Each option policy in this framework is a fixed sequence of actions that depends exclusively on the state in which the option is initiated. This contrasts with OP option policies, which are more common in the literature and correspond to action sequences that depend on the states visited during the execution of the option. One of our goals was to analyse the effect of varying the action sequence length for OS policies. The main contribution of the paper, however, is a study of a Termination Improvement technique that allows option execution to be interrupted when a more promising option is found. Experimental results show that Termination Improvement for OS options, whose benefits had already been reported for OP options, can be much more effective than indiscriminately increasing the option size to boost exploration of the state space, because it adapts the length of the action sequence to the state in which the option is initiated.