Policy Gradient Algorithms (One-Step Actor-Critic & PPO) from scratch using NumPy

Jan 18, 2022

Policy Gradient Algorithms From Scratch (NumPy)

This repository showcases two policy gradient algorithms, One-Step Actor-Critic and Proximal Policy Optimization (PPO), applied to two MDPs: Gridworld and Mountain Car. The algorithms are implemented from scratch in NumPy, using linear regression for the value function and a single-layer softmax for the policy.
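
To make the setup concrete, here is a minimal sketch of a one-step actor-critic update with a linear value function and a single-layer softmax policy. It is not the repository's code: the names LinearValue, SoftmaxPolicy, and actor_critic_step, the feature vectors, and the step sizes are all assumptions chosen for illustration, and the discount-power factor some textbook versions carry is left out for brevity.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


class LinearValue:
    """Hypothetical linear value function: v(x) = w . x."""

    def __init__(self, n_features):
        self.w = np.zeros(n_features)

    def __call__(self, x):
        return float(self.w @ x)


class SoftmaxPolicy:
    """Hypothetical single-layer softmax policy: pi(a | x) = softmax(theta x)[a]."""

    def __init__(self, n_features, n_actions):
        self.theta = np.zeros((n_actions, n_features))

    def probs(self, x):
        return softmax(self.theta @ x)

    def grad_log_prob(self, x, a):
        # Gradient of log pi(a | x) w.r.t. theta: (one_hot(a) - pi) outer x.
        p = self.probs(x)
        one_hot = np.zeros_like(p)
        one_hot[a] = 1.0
        return np.outer(one_hot - p, x)


def actor_critic_step(policy, value, x, a, r, x_next, done,
                      gamma=0.99, alpha_theta=0.1, alpha_w=0.1):
    """Apply one TD(0) actor-critic update in place and return the TD error."""
    target = r + (0.0 if done else gamma * value(x_next))
    delta = target - value(x)                    # one-step TD error
    grad = policy.grad_log_prob(x, a)            # uses the current policy
    value.w += alpha_w * delta * x               # critic: semi-gradient TD(0)
    policy.theta += alpha_theta * delta * grad   # actor: policy gradient step
    return delta


# Example transition with two features and three actions (made-up values).
policy = SoftmaxPolicy(n_features=2, n_actions=3)
value = LinearValue(n_features=2)
x, x_next = np.array([1.0, 0.0]), np.array([0.0, 1.0])
delta = actor_critic_step(policy, value, x, a=1, r=1.0, x_next=x_next, done=False)
```

A positive TD error raises the probability of the action that was taken and moves the value estimate of the visited state toward the bootstrapped target.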

Run Instructions

Packages:

numpy and matplotlib

Create a virtual environment, install the requirements, and run the experiments (Windows instructions):

Run python -m venv venv
Run .\venv\Scripts\activate
Run pip install -r requirements.txt
Run python .\experiments.py

Be wary of long compute times. Plots will pop up during the run and must be closed in order to continue.

Files

experiments.py – Runs preprogrammed experiments that produce various plots, both shown on screen and saved to .png files.
mdp.py – Contains the two MDP domains, Gridworld and Mountain Car, that the experiments run on.
models.py – Contains ValueFunction and Policy, the two models (linear layers) the algorithms use for function approximation.
policy_gradient_algorithms.py – Contains the policy gradient algorithms: One-Step Actor-Critic and Proximal Policy Optimization (PPO).
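
For readers unfamiliar with PPO, the sketch below shows the clipped surrogate objective that the algorithm maximizes, written in NumPy. It is an illustration only, not the contents of policy_gradient_algorithms.py: the function name ppo_clip_objective, its arguments, and the default clip range of 0.2 are assumptions.

```python
import numpy as np


def ppo_clip_objective(new_probs, old_probs, advantages, epsilon=0.2):
    """Mean clipped surrogate objective over a batch of sampled actions.

    new_probs / old_probs: probability of each sampled action under the
    current policy and under the policy that collected the data.
    advantages: advantage estimate for each sampled action.
    """
    new_probs = np.asarray(new_probs, dtype=float)
    old_probs = np.asarray(old_probs, dtype=float)
    advantages = np.asarray(advantages, dtype=float)

    ratio = new_probs / old_probs                       # importance ratio
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon)
    # Taking the elementwise minimum keeps the objective pessimistic, so
    # moving the ratio far outside the clip range earns no extra credit.
    return float(np.mean(np.minimum(ratio * advantages, clipped * advantages)))


# Made-up batch: one action with positive advantage, one with negative.
objective = ppo_clip_objective([0.5, 0.2], [0.25, 0.4], [1.0, -1.0])
```

When the new and old policies agree, every ratio is 1 and the objective reduces to the mean advantage; the clipping only matters once the policy has moved away from the one that collected the data.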

MIT License
