The MAGICAL benchmark suite for robust imitation learning (NeurIPS 2020)

MAGICAL is a benchmark suite for evaluating the generalisation capabilities of imitation learning algorithms. Rather than using the same setting for training and testing, MAGICAL provides one set of "training" environments where demonstrations are observed, and another, distinct set of "testing" environments, each of which varies the training setting in a different way. MAGICAL is a multitask suite, and we refer to the training environment for a given task as the "demo variant", and the testing environments for that task as "test variants". This structure makes it possible to evaluate how well an imitation learning (or reward learning) algorithm is able to generalise the intent behind a set of demonstrations to a substantially different setting.

The different tasks that comprise the MAGICAL suite each require similar skills, such as manipulation of 2D blocks, perception of shape and colour, relational reasoning, and so on. This makes it possible, in principle, to use multi-task and meta-IL algorithms that allow for transfer of skills between tasks, and (hopefully) extrapolation of demonstrator intent across the different variants for each task.

Installing and using MAGICAL

You can install MAGICAL using pip:

pip install magical-il

If you have an X server and input device, you can try controlling the robot in
one of the environments:

python -m magical --env-name FindDupe-Demo-v0

Use the arrow keys to move, the space bar to close the grippers, and the R key
to reset the environment.

At an API level, MAGICAL tasks and variants are just Gym environments. Once
you've installed MAGICAL, you can use the Gym environments as follows:

import gym
import magical

# magical.register_envs() must be called before making any Gym envs
magical.register_envs()

# creating a demo variant for one task
env = gym.make('FindDupe-Demo-v0')

# We can also make the test variant of the same environment, or add a
# preprocessor to the environment. In this case, we are creating a
# TestShape variant of the original environment, and applying the
# LoRes4E preprocessor to observations. LoRes4E stacks four
# egocentric frames together and downsamples them to 96x96.
env = gym.make('FindDupe-TestShape-LoRes4E-v0')
obs = env.reset()
print('Observation type:', type(obs))  # np.ndarray
print('Observation shape:', obs.shape)  # (96, 96, 3)
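
Because MAGICAL variants are ordinary Gym environments, the usual
reset/step loop should apply to them. The sketch below is not part of
MAGICAL: random_rollout and run_magical_demo are hypothetical helpers, and
the code assumes the classic Gym API in which reset() returns an observation
and step() returns an (obs, reward, done, info) tuple.

```python
def random_rollout(env, max_steps=100):
    """Run one episode with uniformly random actions.

    Assumes the classic Gym API: reset() returns an observation and
    step() returns (obs, reward, done, info). Returns the number of
    steps taken and the summed reward.
    """
    env.reset()
    total_reward = 0.0
    steps = 0
    for _ in range(max_steps):
        # Sample an action uniformly from the environment's action space.
        action = env.action_space.sample()
        _obs, reward, done, _info = env.step(action)
        total_reward += reward
        steps += 1
        if done:
            break
    return steps, total_reward


def run_magical_demo():
    """Hypothetical usage with MAGICAL; requires `pip install magical-il`."""
    import gym
    import magical

    magical.register_envs()
    env = gym.make('FindDupe-Demo-v0')
    try:
        steps, total_reward = random_rollout(env, max_steps=50)
        print('Steps taken:', steps)
        print('Summed reward:', total_reward)
    finally:
        env.close()
```

Calling run_magical_demo() after installation should run a short random
episode. A random policy is only useful as a smoke test here, since it says
nothing about how well a learnt policy generalises.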

In general, Gym environment names for MAGICAL take the form
<task-name>-<variant>[-<preprocessor>]-v0, where the final preprocessor name is
optional. For instance, FindDupe-Demo-v0, MoveToCorner-Demo-LoResStack-v0 and
ClusterColour-TestAll-v0 are all available Gym environments. Keep reading to
see a list of all available tasks and variants, as well as all the built-in
observation preprocessors that ship with MAGICAL.
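
The naming scheme above can be sketched as a pair of small helpers for
building and splitting environment names. These functions are hypothetical
and are not part of the MAGICAL API. They also assume that task, variant and
preprocessor names never contain a hyphen, which holds for the examples
given here but is not guaranteed in general.

```python
def magical_env_name(task, variant, preprocessor=None, version='v0'):
    """Build a name of the form <task-name>-<variant>[-<preprocessor>]-v0."""
    parts = [task, variant]
    if preprocessor:
        # The preprocessor component is optional.
        parts.append(preprocessor)
    parts.append(version)
    return '-'.join(parts)


def parse_magical_env_name(name):
    """Split a name into (task, variant, preprocessor, version).

    The preprocessor is None when the name has no preprocessor component.
    Assumes that no individual component contains a hyphen.
    """
    parts = name.split('-')
    if len(parts) == 3:
        task, variant, version = parts
        preprocessor = None
    elif len(parts) == 4:
        task, variant, preprocessor, version = parts
    else:
        raise ValueError('unexpected environment name: %r' % (name,))
    return task, variant, preprocessor, version


print(magical_env_name('FindDupe', 'TestShape', 'LoRes4E'))
print(parse_magical_env_name('ClusterColour-TestAll-v0'))
```

Helpers like these only format strings. Whether a given name is actually
registered still depends on the tasks, variants and preprocessors that
MAGICAL ships.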

Note that the reference demonstration data for MAGICAL is not included in the
PyPI package. Rather, it is packaged separately on GitHub. See the "Using
pre-recorded demonstrations" section below for instructions on using this data.