## Reinforcement Learning Agents

Reinforcement learning algorithms implemented for TensorFlow 2.0+: DQN, DDPG, AE-DDPG, SAC, PPO, and Primal-Dual DDPG.

## Usage

- Install the dependencies imported by the scripts (my tf2 conda env serves as a reference)
- Each file contains example code that runs training on the CartPole env
- Training:
`python3 TF2_DDPG_LSTM.py`

- Tensorboard:
`tensorboard --logdir=DDPG/logs`

## Hyperparameter tuning

- Install hyperopt: https://github.com/hyperopt/hyperopt
- Optional: switch the agent used and configure the parameter space in
`hyperparam_tune.py`

- Run:
`python3 hyperparam_tune.py`
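
As a rough illustration of what configuring a parameter space looks like with hyperopt, here is a minimal sketch. It is not the contents of `hyperparam_tune.py`: the parameter names `lr` and `gamma`, their ranges, and the helper `tune` are assumptions made for this example, and `objective` is a toy placeholder standing in for "train an agent and return the negative mean episode reward".

```python
import math


def objective(params):
    """Return a loss for hyperopt to minimize.

    Placeholder (assumption): a real objective would train an agent with
    `params` and return the negative mean episode reward. This toy surface
    is smallest near lr=1e-3, gamma=0.99 and only makes the sketch runnable.
    """
    lr_term = (math.log10(params["lr"]) + 3.0) ** 2
    gamma_term = (params["gamma"] - 0.99) ** 2
    return lr_term + 100.0 * gamma_term


def tune(max_evals=20):
    """Run a TPE search over a hypothetical two-parameter space."""
    # Imported here so objective() can be checked without hyperopt installed.
    from hyperopt import Trials, fmin, hp, tpe

    space = {
        # Learning rate sampled log-uniformly between 1e-5 and 1e-2.
        "lr": hp.loguniform("lr", math.log(1e-5), math.log(1e-2)),
        # Discount factor sampled uniformly between 0.9 and 0.999.
        "gamma": hp.uniform("gamma", 0.9, 0.999),
    }
    trials = Trials()  # records every evaluated configuration
    best = fmin(
        fn=objective,
        space=space,
        algo=tpe.suggest,
        max_evals=max_evals,
        trials=trials,
    )
    return best, trials
```

Calling `best, trials = tune(max_evals=20)` returns the best sampled values as a dict keyed by parameter label, plus the full trial history. Swapping the placeholder for a function that actually trains one of the agents is the part that depends on this repo's code.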

## Agents

All agents were tested on the CartPole env.

Name | On/off policy | Model | Action space support |
---|---|---|---|
DQN | off-policy | Dense, LSTM | discrete |
DDPG | off-policy | Dense, LSTM | discrete, continuous |
AE-DDPG | off-policy | Dense | discrete, continuous |
SAC:bug: | off-policy | Dense | continuous |
PPO | on-policy | Dense | discrete, continuous |

#### Constrained MDP

Name | On/off policy | Model | Action space support |
---|---|---|---|
Primal-Dual DDPG | off-policy | Dense | discrete, continuous |

## Models

Models used to generate the demos are included in the repo. You can also find Q-value, reward, and/or loss graphs there.

## Demos

- DQN Basic, time step = 4, 500 reward
- DQN LSTM, time step = 4, 500 reward
- DDPG Basic, 500 reward
- DDPG LSTM, time step = 5, 500 reward
- AE-DDPG Basic, 500 reward
- PPO Basic, 500 reward