RIIT

Open-source code for the paper Rethinking the Implementation Tricks and Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning. Our goal is to call for fair comparisons of the performance of MARL algorithms.

Code-level Optimizations

There are many code-level tricks in Multi-Agent Reinforcement Learning (MARL), such as the following (a small illustrative sketch of a few of them appears after the list):

  • Value Function Clipping (clip max Q values for QMIX)
  • Value Normalization
  • Reward Scaling
  • Orthogonal Initialization and Layer Scaling
  • Adam Optimizer
  • Learning Rate Annealing
  • Reward Clipping
  • Observation Normalization
  • Gradient Clipping
  • Large Batch Size
  • N-step Returns (including GAE($\lambda$) and Q($\lambda$))
  • Number of Rollout Processes
  • $\epsilon$-greedy Annealing Steps
  • Dead Agent Masking
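
For concreteness, here is a minimal PyTorch sketch of three of these tricks: Q($\lambda$) return targets, the Adam optimizer, and gradient clipping. It is an illustration only, not this repository's code; all tensor names, shapes, and default values are assumptions.

import torch
import torch.nn as nn

def q_lambda_targets(rewards, terminated, max_q_next, gamma=0.99, td_lambda=0.6):
    """Backward-recursive Q(lambda) targets over a [T, B] batch of transitions.

    max_q_next[t] holds max_a Q(s_{t+1}, a); terminated[t] is 1.0 at episode end.
    """
    targets = torch.zeros_like(rewards)
    ret = max_q_next[-1]  # one-step bootstrap at the final transition
    for t in reversed(range(rewards.shape[0])):
        # Mix the one-step bootstrap with the longer lambda-return.
        ret = rewards[t] + gamma * (1.0 - terminated[t]) * (
            (1.0 - td_lambda) * max_q_next[t] + td_lambda * ret
        )
        targets[t] = ret
    return targets

# Toy training step showing Adam and gradient clipping.
T, B = 8, 4
net = nn.Linear(16, 1)
q_taken = net(torch.randn(T, B, 16)).squeeze(-1)  # Q(s_t, a_t), shape [T, B]
targets = q_lambda_targets(
    rewards=torch.randn(T, B),
    terminated=torch.zeros(T, B),
    max_q_next=torch.randn(T, B),
)
loss = ((q_taken - targets.detach()) ** 2).mean()

optimizer = torch.optim.Adam(net.parameters(), lr=5e-4)  # Adam instead of RMSProp
optimizer.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(net.parameters(), 10.0)   # gradient clipping
optimizer.step()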

Related Works

  • Implementation Matters in Deep RL: A Case Study on PPO and TRPO
  • What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study
  • The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games

Finetuned-QMIX

Using a few of the tricks above, we enabled QMIX to solve almost all of the SMAC scenarios (QMIX is fine-tuned for each scenario). An example of passing such per-scenario settings appears after the table.

| Scenarios | Difficulty | QMIX (batch_size=128) | Finetuned-QMIX |
| --- | --- | --- | --- |
| 8m | Easy | - | 100% |
| 2c_vs_1sc | Easy | - | 100% |
| 2s3z | Easy | - | 100% |
| 1c3s5z | Easy | - | 100% |
| 3s5z | Easy | - | 100% |
| 8m_vs_9m | Hard | 84% | 100% |
| 5m_vs_6m | Hard | 84% | 90% |
| 3s_vs_5z | Hard | 96% | 100% |
| bane_vs_bane | Hard | 100% | 100% |
| 2c_vs_64zg | Hard | 100% | 100% |
| corridor | Super Hard | 0% | 100% |
| MMM2 | Super Hard | 98% | 100% |
| 3s5z_vs_3s6z | Super Hard | 3% | 85% (number of envs = 4) |
| 27m_vs_30m | Super Hard | 56% | 100% |
| 6h_vs_8z | Super Hard | 0% | 93% ($\lambda$ = 0.3) |
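
For example, the per-scenario settings noted in the table (such as $\lambda$ = 0.3 for 6h_vs_8z, and 4 rollout environments for 3s5z_vs_3s6z) can be passed as command-line overrides using the syntax from the Command Line Tool section below. The config keys td_lambda and batch_size_run are assumed names for illustration; check the files in src/config for the actual keys.

# Hypothetical per-scenario overrides; the config key names are assumed.
python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=6h_vs_8z td_lambda=0.3
python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=3s5z_vs_3s6z batch_size_run=4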

Re-Evaluation

Afterwards, we re-evaluated numerous QMIX variants with the tricks normalized (a general set of hyperparameters) and found that QMIX achieves state-of-the-art (SOTA) performance.

QMIX, VDNs, Qatten, QPLEX, and WQMIX are value-based methods; LICA, DOP, and RIIT are policy-based.

| Scenarios | Difficulty | QMIX | VDNs | Qatten | QPLEX | WQMIX | LICA | DOP | RIIT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2c_vs_64zg | Hard | 100% | 100% | 100% | 100% | 93% | 100% | 56% | 100% |
| 8m_vs_9m | Hard | 100% | 100% | 100% | 95% | 90% | 48% | 18% | 95% |
| 3s_vs_5z | Hard | 100% | 100% | 100% | 100% | 100% | 3% | 0% | 96% |
| 5m_vs_6m | Hard | 90% | 90% | 90% | 90% | 90% | 53% | 9% | 67% |
| 3s5z_vs_3s6z | Super Hard | 75% | 43% | 62% | 68% | 6% | 0% | 0% | 75% |
| corridor | Super Hard | 100% | 98% | 100% | 96% | 96% | 0% | 0% | 100% |
| 6h_vs_8z | Super Hard | 84% | 87% | 82% | 78% | 78% | 4% | 1% | 19% |
| MMM2 | Super Hard | 100% | 96% | 100% | 100% | 23% | 0% | 0% | 100% |
| 27m_vs_30m | Super Hard | 100% | 100% | 100% | 100% | 0% | 9% | 0% | 93% |
| Discrete Predator-Prey | - | 40 | 39 | - | 39 | 39 | 30 | 32 | 38 |
| Avg. Score | Hard+ | 94.9% | 91.2% | 92.7% | 92.5% | 67.4% | 29.2% | 14.0% | 84.0% |

PyMARL

PyMARL is WhiRL's framework for deep multi-agent reinforcement learning and includes implementations of the following algorithms:

Value-based Methods:

  • QMIX
  • VDN
  • Qatten
  • QPLEX
  • WQMIX

Actor-Critic Methods:

  • LICA
  • DOP
  • RIIT

(These are the algorithms compared in the re-evaluation above.)

Installation instructions

Install Python packages

# require Anaconda 3 or Miniconda 3
bash install_dependecies.sh

Set up StarCraft II (2.4.10) and SMAC:

bash install_sc2.sh

This will download SC2.4.10 into the 3rdparty folder and copy the maps necessary to run the experiments.

Command Line Tool

Run an experiment

# For SMAC
python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor

# For Cooperative Predator-Prey
python3 src/main.py --config=qmix_prey --env-config=stag_hunt with env_args.map_name=stag_hunt

The config files act as defaults for an algorithm or environment. They are all located in src/config: --config refers to the config files in src/config/algs, and --env-config refers to the config files in src/config/envs.
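
For orientation, the resulting layout is sketched below; the YAML file names are illustrative assumptions, not verified against the repository.

src/config/
├── algs/   # --config=qmix    -> src/config/algs/qmix.yaml
└── envs/   # --env-config=sc2 -> src/config/envs/sc2.yaml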

Run n parallel experiments

# bash run.sh config_name map_name_list (threads_num arg_list gpu_list experiments_num)
bash run.sh qmix corridor 2 epsilon_anneal_time=500000 0,1 5

Each xxx_list is comma-separated (e.g., a gpu_list of 0,1).
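
For example, a hypothetical invocation following the pattern above, running the same experiment on two maps across GPUs 0 and 1:

bash run.sh qmix 3s_vs_5z,6h_vs_8z 2 epsilon_anneal_time=500000 0,1 5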

All results will be stored in the Results folder and named after the map_name.

Kill all training processes

# All Python and game processes of the current user will be killed.
bash clean.sh
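
For reference, a minimal sketch of what such a cleanup script might do; this is an assumption, not the actual contents of clean.sh.

# Hypothetical sketch; the real clean.sh may differ.
pkill -u $USER -f src/main.py   # training processes
pkill -u $USER -f StarCraftII   # game processes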

Cite

@article{hu2021rethinking,
      title={Rethinking the Implementation Tricks and Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning}, 
      author={Jian Hu and Siyang Jiang and Seth Austin Harding and Haibin Wu and Shih-wei Liao},
      year={2021},
      eprint={2102.03479},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

GitHub

https://github.com/hijkzzz/pymarl2