We introduce a reinforcement learning architecture designed for problems with an infinite number of states, where each state can be represented as a vector of real numbers, and a finite number of actions, where each action requires a vector of real numbers as parameters. The main objective of this architecture is to distribute between two actors the work required to learn the final policy. One actor decides which action must be performed, while a second actor determines the right parameters for the selected action. We tested our architecture, and one algorithm based on it, on the robot dribbling problem, a challenging robot control problem taken from the RoboCup competitions. Our experimental work with three different function approximators provides evidence that the proposed architecture can be used to implement fast, robust, and reliable reinforcement learning algorithms.

1. Introduction

Applying reinforcement learning (RL) to real-world robotic problems is still uncommon, mainly because most RL methods require many training episodes to learn an optimal policy. In practice this means having a robot perform a task several thousand times as it learns. In addition to the time required for training, we must also consider the time spent calibrating sensors and actuators, and the possible damage the robots may suffer. Therefore, one common approach is to first try to solve difficult problems with continuous states and actions in simulated environments, where even the noise of real sensors and actuators can be simulated.

In this paper we propose a novel RL architecture for continuous state and action spaces. The architecture was tested on a difficult control problem in the official simulator of the RoboCup. The Robot World Cup, or RoboCup for short, is an international tournament that has taken place every year since 1997, each year in a different country.
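To make the division of work between the two actors concrete, the following is a minimal sketch of action selection only, not the algorithm evaluated in this paper. It assumes linear function approximators with a softmax over action preferences; the names `ActionActor`, `ParameterActor`, `act`, and `param_dims` are hypothetical, and no learning rule is shown.

```python
import math
import random


class ActionActor:
    """First actor: chooses one of a finite set of actions for a real-valued state."""

    def __init__(self, n_actions, state_dim):
        # One weight vector per action. Linear preferences are an assumption
        # of this sketch, not something the paper prescribes.
        self.weights = [[0.0] * state_dim for _ in range(n_actions)]

    def probabilities(self, state):
        prefs = [sum(w * s for w, s in zip(row, state)) for row in self.weights]
        top = max(prefs)  # subtract the maximum for numerical stability
        exps = [math.exp(p - top) for p in prefs]
        total = sum(exps)
        return [e / total for e in exps]

    def select(self, state, rng):
        probs = self.probabilities(state)
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]


class ParameterActor:
    """Second actor: produces the real-valued parameter vector of the chosen action."""

    def __init__(self, state_dim, param_dims):
        # param_dims[a] is the number of real parameters action a requires.
        self.weights = [
            [[0.0] * state_dim for _ in range(n_params)]
            for n_params in param_dims
        ]

    def parameters(self, state, action):
        return [
            sum(w * s for w, s in zip(row, state))
            for row in self.weights[action]
        ]


def act(state, action_actor, parameter_actor, rng):
    """Pick the action first, then the parameters for that action."""
    action = action_actor.select(state, rng)
    params = parameter_actor.parameters(state, action)
    return action, params
```

A call such as `act([0.5, -1.0, 2.0], ActionActor(2, 3), ParameterActor(3, [2, 1]), random.Random(0))` returns an action index together with a parameter vector whose length matches that action. Keeping the two actors as separate objects mirrors the stated goal of splitting the learning work, so each could be trained with its own function approximator.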
RoboCup is regarded as a standard and challenging problem for artificial intelligence and robotics. Its most important goal is to advance the overall technological level of society, and its more pragmatic goal is the following: by the mid-twenty-first century, a team of fully autonomous humanoid robot soccer players shall win a soccer game, complying with the official rules of FIFA, against the winner of the most recent World Cup. One of the competitions in this tournament is the simulation league. In this category two teams of eleven virtual soccer players each play for ten
M. Riedmiller and T. Gabel, “On experiences in a complex and competitive gaming domain: reinforcement learning meets RoboCup,” in Proceedings of the 3rd IEEE Symposium on Computational Intelligence and Games (CIG '07), pp. 17–23, April 2007.
P. Stone, G. Kuhlmann, M. E. Taylor, and Y. Liu, “Keepaway soccer: from machine learning testbed to benchmark,” in RoboCup-2005: Robot Soccer World Cup IX, I. Noda, A. Jacoff, A. Bredenfeld, and Y. Takahashi, Eds., pp. 93–105, Springer, Berlin, Germany, 2006.