Modular Deep Reinforcement Learning framework in PyTorch.


A multitask agent solving both OpenAI CartPole-v0 and Unity Ball2D.


SLM Lab is created for deep reinforcement learning research.


  • numerous canonical algorithms (listed below)
  • reusable modular components: algorithm, policy, network, memory
  • ease and speed of building new algorithms
  • clear and unified design; production-grade code



  • scalable hyperparameter search with Ray
  • graphs and analytics
  • fitness function for comparing experiments
  • open science - Log Book


SLM Lab implements most of the recent canonical algorithms and various extensions, which serve as the base for research.


Algorithm

code: slm_lab/agent/algorithm

Various algorithms are in fact extensions of simpler ones, and they are implemented as such; this keeps the code concise (see the subclassing sketch after the lists below).

Policy Gradient:

  • AC (Vanilla Actor-Critic)
    • shared or separate actor critic networks
    • plain TD
    • entropy term control
  • A2C (Advantage Actor-Critic)
    • extension of AC with advantage function
    • N-step returns as advantage
    • GAE (Generalized Advantage Estimation) as advantage (sketched after this list)
  • PPO (Proximal Policy Optimization)
    • extension of A2C with PPO loss function
  • SIL (Self-Imitation Learning)
    • extension of A2C with off-policy training on custom loss
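For intuition, here is a minimal sketch of computing n-step returns and GAE from rewards and state-value estimates. It assumes a single-episode trajectory with done-flag masking omitted for brevity, and is not the lab's actual implementation:

    import torch

    def n_step_returns(rewards, next_v, gamma=0.99):
        # rewards: tensor of shape (T,); next_v: bootstrap value V(s_T)
        # assumes one episode; done-flag masking omitted for brevity
        rets = torch.zeros_like(rewards)
        future = next_v
        for t in reversed(range(len(rewards))):
            future = rewards[t] + gamma * future
            rets[t] = future
        return rets

    def gae(rewards, v_preds, next_v, gamma=0.99, lam=0.95):
        # delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        # A_t = delta_t + gamma * lam * A_{t+1}
        v_next = torch.cat([v_preds[1:], next_v.unsqueeze(0)])
        deltas = rewards + gamma * v_next - v_preds
        advs = torch.zeros_like(rewards)
        adv = torch.tensor(0.0)
        for t in reversed(range(len(rewards))):
            adv = deltas[t] + gamma * lam * adv
            advs[t] = adv
        return advs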


Value-based:

  • DQN (Deep Q-Learning)
    • Boltzmann or epsilon-greedy policy
  • DRQN (Recurrent DQN)
  • Dueling DQN
  • DDQN (Double DQN)
  • Dueling DDQN
  • Multitask DQN (multi-environment DQN)
  • Hydra DQN (multi-environment DQN)
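To illustrate the extension pattern mentioned above, here is a hypothetical sketch (class and method names are illustrative, not the lab's actual API) of Double DQN reusing DQN and overriding only the target Q-value computation:

    import torch

    class DQN:
        '''Base: bootstrap from the max Q-value of the target network.'''
        def __init__(self, net, target_net, gamma=0.99):
            self.net, self.target_net, self.gamma = net, target_net, gamma

        def calc_q_targets(self, batch):
            with torch.no_grad():
                max_next_q = self.target_net(batch['next_states']).max(dim=-1)[0]
            return batch['rewards'] + self.gamma * (1 - batch['dones']) * max_next_q

    class DoubleDQN(DQN):
        '''Extension: pick next actions with the online net, evaluate with the target net.'''
        def calc_q_targets(self, batch):
            with torch.no_grad():
                acts = self.net(batch['next_states']).argmax(dim=-1, keepdim=True)
                next_q = self.target_net(batch['next_states']).gather(-1, acts).squeeze(-1)
            return batch['rewards'] + self.gamma * (1 - batch['dones']) * next_q

Only the overridden method differs; the networks, memory, and training loop are inherited, which is why such extensions stay small.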

Below are the modular building blocks for the algorithms. They are designed to be general, and are reused extensively.


Memory

code: slm_lab/agent/memory

For on-policy algorithms (policy gradient):

  • OnPolicyReplay
  • OnPolicySeqReplay
  • OnPolicyBatchReplay
  • OnPolicyBatchSeqReplay

For off-policy algorithms (value-based):

  • Replay
  • SeqReplay
  • StackReplay
  • AtariReplay
  • PrioritizedReplay
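As a simplified illustration of what the off-policy memories do (the lab's classes add batching, sequence, stacking, and prioritization logic on top), a minimal uniform replay buffer might look like this; the class name is hypothetical:

    import random
    from collections import deque

    class SimpleReplay:
        '''Minimal uniform replay: store transitions, sample random minibatches.'''
        def __init__(self, max_size=10000):
            self.buffer = deque(maxlen=max_size)  # oldest transitions are evicted first

        def add(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size=32):
            batch = random.sample(self.buffer, batch_size)
            # transpose the list of transitions into tuples of fields
            return tuple(zip(*batch))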

Neural Network

code: slm_lab/agent/net

These networks are usable for all algorithms.

  • MLPNet (Multi Layer Perceptron)
  • MLPHeterogenousTails (multi-tails)
  • HydraMLPNet (multi-heads, multi-tails)
  • RecurrentNet
  • ConvNet

These networks are usable for Q-learning algorithms. For more details, see the Dueling DQN paper (Wang et al., 2016).

  • DuelingMLPNet
  • DuelingConvNet
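The dueling networks split the body into separate state-value and advantage streams and recombine them as Q(s, a) = V(s) + A(s, a) - mean_a A(s, a). A minimal PyTorch sketch (layer sizes and names are illustrative, not the lab's actual classes):

    import torch.nn as nn

    class DuelingMLP(nn.Module):
        '''Dueling architecture: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).'''
        def __init__(self, state_dim, action_dim, hid=64):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(state_dim, hid), nn.ReLU())
            self.v_head = nn.Linear(hid, 1)              # state-value stream V(s)
            self.adv_head = nn.Linear(hid, action_dim)   # advantage stream A(s, a)

        def forward(self, x):
            feat = self.body(x)
            v, adv = self.v_head(feat), self.adv_head(feat)
            # subtracting the mean advantage keeps V and A identifiable
            return v + adv - adv.mean(dim=-1, keepdim=True)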


Policy

code: slm_lab/agent/algorithm/

  • different probability distributions for sampling actions
  • default policy
  • Boltzmann policy
  • Epsilon-greedy policy
  • numerous rate decay methods
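As a rough sketch of these policies (function names and signatures are illustrative, not the lab's actual API):

    import torch

    def epsilon_greedy(q_values, epsilon):
        # with probability epsilon act randomly, else act greedily
        if torch.rand(1).item() < epsilon:
            return torch.randint(len(q_values), (1,)).item()
        return q_values.argmax().item()

    def boltzmann(q_values, tau=1.0):
        # sample actions in proportion to softmax(Q / tau); lower tau -> greedier
        probs = torch.softmax(q_values / tau, dim=-1)
        return torch.multinomial(probs, 1).item()

    def linear_decay(start, end, step, total_steps):
        # one of many possible rate decay schedules for epsilon or tau
        frac = min(step / total_steps, 1.0)
        return start + frac * (end - start)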

Experimentation framework

Deep Reinforcement Learning is very empirical. The building blocks above need an experimentation framework to study variables systematically; the experimentation framework completes the lab.

Experiment graph summarizing the trials in hyperparameter search.

Trial graph showing average envelope of repeated sessions.

Session graph showing total rewards, exploration variable and loss for the episodes.

Read on for tutorials, research and results.


  1. Clone the SLM-Lab repo:

    git clone https://github.com/kengz/SLM-Lab.git
  2. Install dependencies (or inspect bin/setup_* first):

    cd SLM-Lab/
    yarn install
    source activate lab

Alternatively, run the contents of bin/setup_macOS or bin/setup_ubuntu in your terminal manually.
A Docker image and Dockerfile with instructions are also available.


A config file config/default.json will be created:

    "data_sync_dir": "~/Dropbox/SLM-Lab/data"

  • update "data_sync_dir" if you run the lab on a remote machine and want to sync data for easy access; the lab will copy data/ there.


To update SLM Lab, pull the latest git commits and run update:

git pull
yarn update


Run the demo to quickly see the lab in action (and to test your installation).

It runs DQN on CartPole-v0:

  1. See slm_lab/spec/demo.json for an example spec:

    "dqn_cartpole": {
      "agent": [{
        "name": "DQN",
        "algorithm": {
          "name": "DQN",
          "action_pdtype": "Argmax",
          "action_policy": "epsilon_greedy",
  2. See config/experiments.json to schedule experiments:

    "demo.json": {
      "dqn_cartpole": "dev"

    To run faster, change the lab mode from "dev" to "train" above; rendering will be disabled.

  3. Launch a terminal in the repo directory and run the lab:

    source activate lab
    yarn start
  4. This demo will run a single trial using the default parameters and render the environment. After completion, check the output data directory data/dqn_cartpole_2018_06_16_214527/ (timestamp will differ). You should see some healthy graphs.

    Trial graph showing average envelope of repeated sessions.

    Session graph showing total rewards, exploration variable and loss for the episodes.

  5. Enjoy mode - when a session ends, a model file is saved automatically. Find the session prepath, which ends with its trial and session numbers. The example above is trial 1, session 0, with a PyTorch model saved at data/dqn_cartpole_2018_06_16_214527/dqn_cartpole_t1_s0_model_net.pth. Use the prepath in config/experiments.json to run enjoy mode:

    "demo.json": {
      "dqn_cartpole": "[email protected]/dqn_cartpole_2018_06_16_214527/dqn_cartpole_t1_s0"

    Enjoy mode automatically disables learning and exploration. Graphs will still be saved.

  6. Next, change the lab mode from "train" to "search" in config/experiments.json and rerun. This runs an experiment of multiple trials with hyperparameter search; environments will not be rendered:

    "demo.json": {
      "dqn_cartpole": "search"

    When it ends, refer to {prepath}_experiment_graph.png and {prepath}_experiment_df.csv to find the best trials.
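    For example, the experiment dataframe can be inspected with pandas to rank trials (the path and the 'fitness' column here are assumptions; check the CSV for the actual names):

      import pandas as pd

      # hypothetical prepath; substitute your own experiment's
      df = pd.read_csv('data/dqn_cartpole_2018_06_16_214527/dqn_cartpole_experiment_df.csv')
      # assuming a 'fitness' column summarizes each trial's performance
      print(df.sort_values('fitness', ascending=False).head())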

If the demo fails, consult Debugging.

Now the lab is ready for use.

Read on: Github | Documentation | Experiment Log


If you use SLM-Lab in your research, please cite it as below:

    @misc{kengz2017slmlab,
      author = {Wah Loon Keng, Laura Graesser},
      title = {SLM-Lab},
      year = {2017},
      publisher = {GitHub},
      journal = {GitHub repository},
      howpublished = {\url{https://github.com/kengz/SLM-Lab}},
    }