This repository contains the code for the CVPR 2021 paper "Multi-Modal Fusion Transformer for End-to-End Autonomous Driving". If you find our code or paper useful, please cite:

  author = {Prakash, Aditya and Chitta, Kashyap and Geiger, Andreas},
  title = {Multi-Modal Fusion Transformer for End-to-End Autonomous Driving},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2021}


Install Anaconda

source ~/.profile

Clone the repo and build the environment

git clone
cd transfuser
conda create -n transfuser python=3.7
conda activate transfuser
pip3 install -r requirements.txt

Download and set up CARLA

chmod +x

Data Generation

The training data is generated using leaderboard/team_code/ in 8 CARLA towns and 14 weather conditions. The routes and scenarios files to be used for data generation are provided at leaderboard/data.

Running CARLA Server

With Display

./ --world-port=2000 -opengl

Without Display

Without Docker:

SDL_VIDEODRIVER=offscreen SDL_HINT_CUDA_DEVICE=0 ./ --world-port=2000 -opengl

With Docker:

Instructions for setting up Docker are available here. Pull the CARLA Docker image:

docker pull carlasim/carla:

Docker 18:

docker run -it --rm -p 2000-2002:2000-2002 --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 carlasim/carla: ./ --world-port=2000 -opengl

Docker 19:

docker run -it --rm --net=host --gpus '"device=0"' carlasim/carla: ./ --world-port=2000 -opengl

If the Docker container does not start properly, add the environment variable -e SDL_AUDIODRIVER=dsp to the docker run command.

Run the Autopilot

Once the CARLA server is running, roll out the autopilot to start data generation.


The expert agent used for data generation is defined in leaderboard/team_code/. The variables that need to be set are specified in leaderboard/scripts/. The expert agent is based on the autopilot from this codebase.

Routes and Scenarios

Each route is defined by a sequence of waypoints (and optionally a weather condition) that the agent needs to follow. Each scenario is defined by a trigger transform (location and orientation) and other actors present in that scenario (optional). The leaderboard repository provides a set of routes and scenarios files. To generate additional routes, spin up a CARLA server and follow the procedure below.
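As a rough illustration of the two definitions above, the following sketch models a route and a scenario as plain Python data classes. The class and field names (`Transform`, `Route`, `Scenario`, `other_actors`) are hypothetical and chosen only for illustration; they do not reflect the actual file format used by the leaderboard repository.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Transform:
    """Location and orientation (hypothetical minimal form: position plus yaw)."""
    x: float
    y: float
    z: float
    yaw: float


@dataclass
class Route:
    """A sequence of waypoints with an optional weather condition."""
    waypoints: List[Tuple[float, float, float]]
    weather: Optional[str] = None  # None means no weather is specified


@dataclass
class Scenario:
    """A trigger transform plus optional other actors."""
    trigger: Transform
    other_actors: List[Transform] = field(default_factory=list)


# Example: a two-waypoint route without weather and a scenario without actors.
route = Route(waypoints=[(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)])
scenario = Scenario(trigger=Transform(x=5.0, y=0.0, z=0.0, yaw=90.0))
```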

Generating routes with intersections

The position of traffic lights is used to localize intersections and (start_wp, end_wp) pairs are sampled in a grid centered at these points.

python3 tools/ --save_file <path_of_generated_routes_file> --town <town_to_be_used>
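The sampling idea described above can be sketched in a few lines of Python. This is a simplified illustration, not the script's actual implementation: it works on 2D points only, the function names and the `half_extent`, `step`, and `min_dist` parameters are assumptions, and it omits the step of snapping sampled points to drivable lanes on the CARLA map.

```python
import math
from itertools import product
from typing import List, Tuple

Point = Tuple[float, float]


def distance(a: Point, b: Point) -> float:
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def grid_around(center: Point, half_extent: float, step: float) -> List[Point]:
    """Grid points in a square of side 2 * half_extent centered at `center`."""
    n = int(half_extent // step)
    offsets = [i * step for i in range(-n, n + 1)]
    cx, cy = center
    return [(cx + dx, cy + dy) for dx, dy in product(offsets, offsets)]


def sample_route_pairs(
    light_positions: List[Point],
    half_extent: float = 20.0,  # assumed grid size in meters
    step: float = 10.0,         # assumed grid spacing in meters
    min_dist: float = 15.0,     # assumed minimum route length in meters
) -> List[Tuple[Point, Point]]:
    """Sample (start, end) pairs from a grid around each traffic light."""
    pairs = []
    for center in light_positions:
        grid = grid_around(center, half_extent, step)
        for start, end in product(grid, grid):
            # Skip pairs that are too close to form a meaningful route.
            if distance(start, end) >= min_dist:
                pairs.append((start, end))
    return pairs
```

In a real pipeline each sampled point would additionally be projected onto the nearest lane waypoint before being written to the routes file.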

Sampling individual junctions from a route

Each route in the provided routes file is interpolated into a dense sequence of waypoints, and individual junctions are sampled from this sequence based on changes in the navigational command.

python3 tools/ --routes_file <xml_file_containing_routes> --save_file <path_of_generated_file>
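A minimal sketch of the junction-splitting idea, assuming each dense waypoint carries a navigational command and that a junction corresponds to a run of commands other than lane following. The `LANE_FOLLOW` label, the `margin` parameter, and the function name are hypothetical; the script in this repository may segment routes differently.

```python
from typing import List, Sequence, Tuple

LANE_FOLLOW = "LANEFOLLOW"  # hypothetical label for the default command


def split_junctions(
    waypoints: Sequence, commands: Sequence[str], margin: int = 2
) -> List[list]:
    """Cut out the waypoint runs where the command differs from lane following.

    Each run is padded by `margin` waypoints on both sides so that the
    resulting short route starts before and ends after the junction.
    """
    last = len(commands) - 1
    segments: List[Tuple[int, int]] = []
    start = None
    for i, cmd in enumerate(commands):
        in_junction = cmd != LANE_FOLLOW
        if in_junction and start is None:
            start = i  # command changed: a junction begins here
        elif not in_junction and start is not None:
            segments.append((max(start - margin, 0), min(i - 1 + margin, last)))
            start = None  # command changed back: the junction ended
    if start is not None:  # route ends inside a junction
        segments.append((max(start - margin, 0), last))
    return [list(waypoints[a:b + 1]) for a, b in segments]
```

For example, a route whose commands switch from lane following to a left turn and back yields one short route covering the turn plus the padding waypoints.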

Generating Scenarios

Additional scenarios are densely sampled in a grid centered at the locations from the reference scenarios file. More scenario files can be found here.

python3 tools/ --scenarios_file <scenarios_file_to_be_used_as_reference> --save_file <path_of_generated_json_file> --towns <town_to_be_used>
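The dense sampling described above can be illustrated as follows. This is a sketch under stated assumptions: trigger transforms are represented as dictionaries with `x`, `y`, `z`, and `yaw` keys, and the `half_extent` and `step` parameters are hypothetical. The actual JSON layout of the scenarios files and the sampling parameters used by the script may differ.

```python
from typing import Dict, List


def densify_triggers(
    reference: List[Dict[str, float]],
    half_extent: float = 4.0,  # assumed grid size in meters
    step: float = 2.0,         # assumed grid spacing in meters
) -> List[Dict[str, float]]:
    """Create new trigger transforms on a grid around each reference trigger.

    Only the x and y coordinates are shifted; z and yaw are copied unchanged.
    """
    n = int(half_extent // step)
    offsets = [i * step for i in range(-n, n + 1)]
    sampled = []
    for trigger in reference:
        for dx in offsets:
            for dy in offsets:
                new_trigger = dict(trigger)  # copy so the reference is untouched
                new_trigger["x"] = trigger["x"] + dx
                new_trigger["y"] = trigger["y"] + dy
                sampled.append(new_trigger)
    return sampled
```

As with routes, sampled trigger locations would normally be validated against the map before use, since a grid point may fall off the road.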


The training code and pretrained models are provided below.

mkdir model_ckpt
wget -P model_ckpt
unzip model_ckpt/ -d model_ckpt/
rm model_ckpt/


Spin up a CARLA server (described above) and run the required agent. The appropriate routes and scenarios files are provided in leaderboard/data, and the required variables need to be set in leaderboard/scripts/.

CUDA_VISIBLE_DEVICES=0 ./leaderboard/scripts/


This implementation is based on code from several repositories.