The Unsupervised Reinforcement Learning Benchmark (URLB)

URLB provides a set of leading algorithms for unsupervised reinforcement learning where agents first pre-train without access to extrinsic rewards and then are finetuned to downstream tasks.


We assume you have access to a GPU that can run CUDA 10.2 and CUDNN 8. Then, the simplest way to install all required dependencies is to create an anaconda environment by running

conda env create -f conda_env.yml

After the instalation ends you can activate your environment with

conda activate urlb

Implemented Agents

Agent Command Implementation Author(s) Paper
ICM agent=icm Denis paper
ProtoRL agent=proto Denis paper
DIAYN agent=diayn Misha paper
APT(ICM) agent=icm_apt Hao, Kimin paper
APT(Ind) agent=ind_apt Hao, Kimin paper
APS agent=aps Hao, Kimin paper
SMM agent=smm Albert paper
RND agent=rnd Kevin paper
Disagreement agent=disagreement Catherine paper

Available Domains

We support the following domains.

Domain Tasks
walker stand, walk, run, flip
quadruped walk, run, stand, jump
jaco reach_top_left, reach_top_right, reach_bottom_left, reach_bottom_right

Domain observation mode

Each domain supports two observation modes: states and pixels.

Model Command
states obs_type=states
pixels obs_type=pixels



To run pre-training use the script

python agent=icm domain=walker

or, if you want to train a skill-based agent, like DIAYN, run:

python agent=diayn domain=walker

This script will produce several agent snapshots after training for 100k, 500k, 1M, and 2M frames. The snapshots will be stored under the following directory:


For example:



Once you have pre-trained your method, you can use the saved snapshots to initialize the DDPG agent and fine-tune it on a downstream task. For example, let’s say you have pre-trained ICM, you can fine-tune it on walker_run by running the following command:

python pretrained_agent=icm task=walker_run snapshot_ts=1000000 obs_type=states

This will load a snapshot stored in ./pretrained_models/states/walker/icm/, initialize DDPG with it (both the actor and critic), and start training on walker_run using the extrinsic reward of the task.

For methods that use skills, include the agent, and the reward_free tag to false.

python pretrained_agent=smm task=walker_run snapshot_ts=1000000 obs_type=states agent=smm reward_free=false


Logs are stored in the exp_local folder. To launch tensorboard run:

tensorboard --logdir exp_local

The console output is also available in a form:

| train | F: 6000 | S: 3000 | E: 6 | L: 1000 | R: 5.5177 | FPS: 96.7586 | T: 0:00:42

a training entry decodes as

F  : total number of environment frames
S  : total number of agent steps
E  : total number of episodes
R  : episode return
FPS: training throughput (frames per second)
T  : total training time


View Github