Reinforcement Learning Agents

Reinforcement learning algorithms implemented for TensorFlow 2.0+ [DQN, DDPG, AE-DDPG, SAC, PPO, Primal-Dual DDPG]


  • Install the imported dependencies (use my tf2 conda env as a reference)
  • Each file contains example code that runs training on the CartPole env
  • Training: python3
  • Tensorboard: tensorboard --logdir=DDPG/logs

Hyperparameter tuning


Agents were tested using the CartPole env.

| Name | On/off policy | Model | Action space support |
|---|---|---|---|
| DQN | off-policy | Dense, LSTM | discrete |
| DDPG | off-policy | Dense, LSTM | discrete, continuous |
| AE-DDPG | off-policy | Dense | discrete, continuous |
| SAC :bug: | off-policy | Dense | continuous |
| PPO | on-policy | Dense | discrete, continuous |
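
Off-policy agents such as those listed above typically learn from previously collected transitions stored in a replay buffer, whereas on-policy agents like PPO train on freshly collected rollouts. The following is a minimal, framework-free sketch of such a buffer. The class name `ReplayBuffer` and its methods are hypothetical and are not taken from this repo's code.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-size FIFO store of (s, a, r, s', done) transitions."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen evicts the oldest transition once full.
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def __len__(self):
        return len(self._buffer)

    def add(self, state, action, reward, next_state, done):
        self._buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        batch = self._rng.sample(list(self._buffer), batch_size)
        # Transpose the list of transitions into per-field tuples.
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones


# Usage: store five transitions in a buffer that only holds three.
buffer = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buffer.add([t], 0, 1.0, [t + 1], False)

states, actions, rewards, next_states, dones = buffer.sample(2)
```

Because the buffer is decoupled from the policy that generated the data, the same transition can be reused for many gradient updates, which is the main practical difference from the on-policy row in the table.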

Constrained MDP

| Name | On/off policy | Model | Action space support |
|---|---|---|---|
| Primal-Dual DDPG | off-policy | Dense | discrete, continuous |
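
As a rough illustration of the primal-dual idea for a constrained MDP: the policy (primal step) maximizes reward minus a multiplier-weighted cost, while the Lagrange multiplier (dual step) grows when the average cost exceeds its limit and shrinks otherwise. The sketch below shows only this generic scheme. The function names, learning rate, and cost values are assumptions for illustration and are not the repo's implementation.

```python
def dual_update(lmbda, avg_cost, cost_limit, lr):
    """Projected gradient ascent step on the Lagrange multiplier.

    The multiplier is clipped at zero so the penalty never rewards cost.
    """
    return max(0.0, lmbda + lr * (avg_cost - cost_limit))


def penalized_reward(reward, cost, lmbda):
    """Signal the primal (policy) step would maximize for a fixed multiplier."""
    return reward - lmbda * cost


# Usage with made-up per-episode average costs and a cost limit of 1.0.
lmbda = 0.0
history = []
for avg_cost in [1.5, 1.2, 0.8]:
    lmbda = dual_update(lmbda, avg_cost, cost_limit=1.0, lr=0.5)
    history.append(lmbda)

shaped = penalized_reward(reward=1.0, cost=2.0, lmbda=lmbda)
```

In this toy run the multiplier rises while the constraint is violated and falls once the average cost drops below the limit.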


Models used to generate the demos are included in the repo. You can also find Q-value, reward, and/or loss graphs.


  • DQN Basic, time step = 4, 500 reward
  • DQN LSTM, time step = 4, 500 reward
  • DDPG Basic, 500 reward
  • DDPG LSTM, time step = 5, 500 reward
  • AE-DDPG Basic, 500 reward
  • PPO Basic, 500 reward