NeurIPS 2019 - Robot open-Ended Autonomous Learning
Challenge Rules
The rules of the competition will be as follows:
- Overview The competition focuses on autonomous open-ended learning with a simulated robot. The setup features a simulated robot that in a first intrinsic phase interacts autonomously with a partially unknown environment and learns how to interact with it, and then in a second extrinsic phase has to solve a number of tasks on the basis of the knowledge acquired in the first phase. Importantly, in the intrinsic phase the system does not know the tasks it will have to solve in the extrinsic phase.
- Simulator For this purpose, the competitors will be given a software kit with which they will be able to install the simulator of the robot and environment on their machines (see below).
- Robot The robot will be formed by: an arm; a gripper; one camera.
- Environment The environment used will be a simplified kitchen-like scenario formed by: a table; a shelf; some kitchen objects.
- Training and testing phases During the development of the system, and during its evaluation (performance scoring), the competitor systems will have to undergo two phases: an intrinsic phase of training; an extrinsic phase of testing.
- Intrinsic phase During the intrinsic phase, the robot will have to autonomously interact with an environment for a certain period of time during which it should acquire as much knowledge and skills as possible, to best solve the tasks in the extrinsic phase. Importantly, during the intrinsic phase the robot will not be aware of the tasks it will have to solve in the extrinsic phase.
- Extrinsic phase During the extrinsic phase the system will be tested for the quality of the knowledge acquired during the intrinsic phase.
During the extrinsic phase, the robot will undergo 3 challenges (see below) to be solved on the basis of the knowledge acquired during the intrinsic phase.
For every task the robot is given during these challenges, the environment will be put in a different starting state and the robot will be given a camera image of how the environment should look once it has achieved the goal of the task.
- Learning time budget The time available for learning in the intrinsic phase is limited to Dint minutes of simulated time. Learning in the extrinsic phase will be possible, but its utility will be strongly limited by the short time available: Dext seconds of simulated time for solving each task.
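The two-phase protocol and its time budgets can be sketched as below. This is only an illustration under stated assumptions: `Budgets`, `max_steps`, `evaluate`, and the fixed simulator time step `step_seconds` are hypothetical names, not part of the competition kit, and the rules do not fix numeric values for Dint or Dext.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Budgets:
    d_int_minutes: float  # Dint: intrinsic-phase budget, in simulated minutes
    d_ext_seconds: float  # Dext: per-task extrinsic budget, in simulated seconds
    step_seconds: float   # hypothetical duration of one simulator step


def max_steps(budget_seconds: float, step_seconds: float) -> int:
    """Number of whole simulator steps that fit in a simulated-time budget."""
    return int(budget_seconds // step_seconds)


def evaluate(
    budgets: Budgets,
    intrinsic_step: Callable[[], None],
    solve_step: Callable[[object], None],
    goals: Sequence[object],
) -> Tuple[int, int]:
    """Run the intrinsic phase, then one budgeted attempt per extrinsic task."""
    # Intrinsic phase: the controller is stepped without any goal argument,
    # mirroring the rule that it does not know the future tasks.
    n_int = max_steps(budgets.d_int_minutes * 60.0, budgets.step_seconds)
    for _ in range(n_int):
        intrinsic_step()

    # Extrinsic phase: each task comes with a goal (a camera image in the
    # rules; any object here) and gets at most Dext seconds of simulated time.
    n_ext = max_steps(budgets.d_ext_seconds, budgets.step_seconds)
    for goal in goals:
        for _ in range(n_ext):
            solve_step(goal)
    return n_int, n_ext * len(goals)
```

Counting whole steps rather than accumulating floating-point time keeps the budget check exact; a real controller would replace the two callbacks with its own learning and acting routines.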
- Three challenges During the extrinsic phase, there will be three kinds of challenges.
The three challenges involve tasks drawn from the following classes of possible problems defined on the basis of the nature of the goal to accomplish.
For each task, the agent is given an image of the configuration of the objects it has to reach. Each time the agent is given a task, the objects are placed in a different starting position.
The challenges are as follows:
- 2D challenge: goal defined in terms of the configuration of 3 objects on the table plane; objects will not be placed on the shelf.
- 2.5D challenge: goal defined in terms of the configuration of 3 objects on the table plane and on the shelf; one or more objects will have to be moved from the table to the shelf and vice versa.
For both 2D and 2.5D challenges, the objects will start in different positions for each task but they will have a fixed orientation, both in the initial positions and in the final configuration they are required to reach.
- 3D challenge: goal defined in terms of a configuration of 3 objects with no restrictions (objects can assume any orientation and be in any part of the table and on the shelf).
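The restrictions that distinguish the three challenges can be summarised as a small lookup table. The sketch below is an assumption-laden illustration: `Challenge`, `Constraints`, `ObjectPose`, and `goal_is_valid` are hypothetical names, and the boolean pose encoding is a simplification. It also does not model the 2.5D requirement that at least one object actually moves between table and shelf.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Challenge(Enum):
    TWO_D = "2D"
    TWO_HALF_D = "2.5D"
    THREE_D = "3D"


@dataclass(frozen=True)
class Constraints:
    n_objects: int           # every challenge uses 3 objects
    shelf_allowed: bool      # may goal positions lie on the shelf?
    fixed_orientation: bool  # must objects keep their fixed orientation?


CONSTRAINTS = {
    Challenge.TWO_D: Constraints(3, shelf_allowed=False, fixed_orientation=True),
    Challenge.TWO_HALF_D: Constraints(3, shelf_allowed=True, fixed_orientation=True),
    Challenge.THREE_D: Constraints(3, shelf_allowed=True, fixed_orientation=False),
}


@dataclass(frozen=True)
class ObjectPose:
    on_shelf: bool  # hypothetical flag: object rests on the shelf
    rotated: bool   # hypothetical flag: orientation differs from the fixed one


def goal_is_valid(challenge: Challenge, poses: Sequence[ObjectPose]) -> bool:
    """Check a goal configuration against the restrictions of a challenge."""
    c = CONSTRAINTS[challenge]
    if len(poses) != c.n_objects:
        return False
    for pose in poses:
        if pose.on_shelf and not c.shelf_allowed:
            return False
        if pose.rotated and c.fixed_orientation:
            return False
    return True
```

For example, a goal with one object on the shelf would be rejected for `Challenge.TWO_D` but accepted for `Challenge.TWO_HALF_D`.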
- Repetitions of the challenges Each challenge will be repeated multiple times with different goals.
- Knowledge transfer The only regularities ('structure') that are shared between the intrinsic and the extrinsic phase are related to the environment and objects; in particular, in the intrinsic phase the robot has no knowledge about which tasks it will be called to solve in the extrinsic phase. Therefore, in the intrinsic phase the robot should undergo an autonomous open-ended learning process that should lead it to acquire, in the available time, as much knowledge and as many skills as possible to be ready to best face the unknown tasks of the following extrinsic phase.
- Competition structure The competition will be divided into two rounds.
- Round 1: During the first round, submissions will be evaluated by running only the extrinsic phase. Participants will have to pre-train their robot controllers on their machines before submission. The top 20 ranked participants whose submissions follow the spirit of the rules will be able to participate in Round 2 (see also Spirit of the Rules and Code inspection below).
- Round 2: During the second round, submissions will be evaluated by running both the intrinsic and extrinsic phases. All final submissions will be checked for coherence with the spirit of the rules.
- Spirit of the rules As also explained above, the spirit of the rules is that during the intrinsic phase the robot is not explicitly given any task to learn and it does not know of the future extrinsic tasks, but it rather learns in a fully autonomous way.
As such, the Golden Rule is that it is explicitly forbidden to use the scoring function of the extrinsic phase or variants of it as a reward function to train the agent. Participants should give as little information as possible to the robot; rather, the system should learn from scratch to interact with the objects using curiosity, intrinsic motivations, self-generated goals, etc.
However, given the difficulty of the competition and the many challenges that it contains, and to encourage wide participation, in Round 1 it will be possible to depart in part from the spirit of the competition, except for the Golden Rule above. For example, it will be possible to use hardwired or pre-trained models for recognising the identity of objects and their position in space.
All submissions, except those violating the Golden Rule, will be considered valid and ranked for Round 1. However, only submissions fully complying with the spirit of the rules will be admitted to Round 2 and take part in the final ranking.
- Code inspection To be eligible for ranking, participants are required to open the source code of their submissions to the competition monitoring check. Submitted systems will be sampled for checking their compliance with the competition rules and spirit during the competition. Top-ranked submissions of Round 1 will be checked for admission to Round 2. All final submissions of Round 2 will be checked before announcing the final ranking and winners.
- Eligibility Participants belonging to the GOAL-Robots project, AIcrowd, or other parts of the Organization Team may participate in the competition to provide baselines for other participants but are ineligible for the final Round 1 and Round 2 competition ranking.
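To illustrate the Golden Rule stated under Spirit of the rules, the sketch below shows one possible intrinsic reward that never touches the extrinsic scoring function: the prediction error of a very simple observation predictor. It is an assumption made for illustration, not the organisers' baseline or a required method; `RunningMeanPredictor` and `intrinsic_reward` are hypothetical names, and observations are assumed to be flat lists of floats.

```python
from typing import List, Sequence


class RunningMeanPredictor:
    """Predicts the next observation as the running mean of past ones."""

    def __init__(self, size: int) -> None:
        self.mean: List[float] = [0.0] * size
        self.count = 0

    def predict(self) -> List[float]:
        return list(self.mean)

    def update(self, observation: Sequence[float]) -> None:
        # Incremental mean update: mean += (x - mean) / n
        self.count += 1
        self.mean = [
            m + (x - m) / self.count for m, x in zip(self.mean, observation)
        ]


def intrinsic_reward(
    predictor: RunningMeanPredictor, observation: Sequence[float]
) -> float:
    """Mean squared prediction error, used as a curiosity-style reward.

    The reward depends only on the agent's own observations and predictions;
    no goal image and no extrinsic score are involved.
    """
    predicted = predictor.predict()
    if len(predicted) != len(observation):
        raise ValueError("observation size does not match the predictor")
    error = sum((p - x) ** 2 for p, x in zip(predicted, observation))
    predictor.update(observation)
    return error / len(predicted)
```

With this signal, observations that the predictor already anticipates yield little reward, while novel ones yield more, which pushes exploration without referring to any task.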