DMControl Generalization Benchmark

Benchmark for generalization in continuous control from pixels, based on DMControl.

Also contains an official implementation of

Generalization in Reinforcement Learning by Soft Data Augmentation (SODA)
by Nicklas Hansen and Xiaolong Wang.

Test environments

This repository provides two distinct benchmarks for visual generalization: random colors and video backgrounds.


Both benchmarks are offered in easy and hard variants. Samples are shown below.









By default, algorithms are trained for 500k frames and are continuously evaluated in both training and test environments. Environment randomization is seeded to promote reproducibility.
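
To illustrate why seeding the randomization helps reproducibility, here is a minimal sketch. It is not the repository's randomization code: the function name `sample_colors` and the choice of uniform RGB sampling are assumptions made purely for illustration. The point is only that a generator constructed from a fixed seed yields the same sequence of "random" environment parameters on every run.

```python
import random


def sample_colors(seed, n=3):
    """Return n RGB tuples drawn from a generator seeded with `seed`.

    Hypothetical helper for illustration; not part of this repository.
    """
    # A private generator keeps the sequence independent of global random state.
    rng = random.Random(seed)
    return [tuple(rng.randint(0, 255) for _ in range(3)) for _ in range(n)]


# Two calls with the same seed produce identical color sequences.
first = sample_colors(0)
second = sample_colors(0)
print(first == second)
```

Using a dedicated generator object rather than the module-level functions means other code that draws random numbers cannot shift the sequence used for the environments.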


This repository contains implementations of the following papers in a unified framework:

using standardized architecture and hyper-parameters, wherever applicable. If you want to add an algorithm, feel free to send a pull request.


If you find our work useful in your research, please consider citing the paper as follows:

  title={Generalization in Reinforcement Learning by Soft Data Augmentation},
  author={Nicklas Hansen and Xiaolong Wang},


We assume that you have access to a GPU with CUDA >=9.2 support. All dependencies can then be installed with the following commands:

conda env create -f setup/conda.yml
conda activate dmcgen
sh setup/


Part of this repository relies on external datasets. SODA uses the Places dataset for data augmentation, which can be downloaded by running


You should familiarize yourself with their terms before downloading. After downloading and extracting the data, add your dataset directory to the data_dirs list in src/

The video_easy environment was proposed in PAD, and the video_hard environment uses a subset of the RealEstate10K dataset for background rendering. All test environments (including video files) are included in this repository, namely in the src/env/ directory.

Training & Evaluation

The scripts directory contains training and evaluation bash scripts for all the included algorithms. Alternatively, you can call the Python scripts directly. For training, for example, call

python3 src/ \
    --algorithm soda \
    --aux_lr 3e-4 \
    --seed 0

to run SODA on the default task, walker_walk. This should give you an output of the form:

Working directory: logs/walker_walk/soda/0
Evaluating: logs/walker_walk/soda/0
| eval | S: 0 | ER: 26.2285 | ERTEST: 25.3730
| train | E: 1 | S: 250 | D: 70.1 s | R: 0.0000 | ALOSS: 0.0000 | CLOSS: 0.0000 | AUXLOSS: 0.0000

where ER and ERTEST correspond to the average return in the training and test environments, respectively. You can select the test environment used in evaluation with the --eval_mode argument, which accepts one of (train, color_easy, color_hard, video_easy, video_hard).
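
If you want to post-process this output, for instance to track the gap between training and test return, the lines can be parsed with a few lines of Python. The sketch below assumes only the `| kind | KEY: value | ...` layout shown in the sample output above; the helper name `parse_log_line` is hypothetical and not part of this repository.

```python
import re

# Matches a "KEY: number" field such as "ER: 26.2285" or "D: 70.1 s".
# Any trailing unit (like " s") is ignored.
_FIELD = re.compile(r"^(\w+):\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def parse_log_line(line):
    """Split a '| kind | KEY: value | ...' line into (kind, {KEY: float})."""
    parts = [p.strip() for p in line.strip().strip("|").split("|")]
    kind, fields = parts[0], {}
    for part in parts[1:]:
        match = _FIELD.match(part)
        if match:
            fields[match.group(1)] = float(match.group(2))
    return kind, fields


kind, fields = parse_log_line("| eval | S: 0 | ER: 26.2285 | ERTEST: 25.3730")
# Generalization gap: training return minus test return at this step.
gap = fields["ER"] - fields["ERTEST"]
print(kind, fields["S"], round(gap, 4))
```

The same function handles the train lines, since it reads every `KEY: value` field it finds rather than expecting a fixed set of keys.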


SODA demonstrates significantly improved generalization over previous methods, exhibits stable training, and has a sample efficiency that is comparable to the baseline SAC. Average return of SODA and baselines in the train and color_hard environments is shown below.


We also provide a full comparison of the SODA, PAD, RAD, and CURL methods on all four test environments. Results for video_easy and color_hard are shown below: