Policy Gradient Algorithms From Scratch (NumPy)
This repository showcases two policy gradient algorithms, One Step Actor Critic and Proximal Policy Optimization (PPO), applied to two MDPs: Gridworld and Mountain Car. The algorithms are implemented from scratch with NumPy. They use linear regression for the value function and a single-layer softmax for the policy.
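As a rough illustration of the two function approximators described above, here is a minimal sketch of a linear value function and a single-layer softmax policy in NumPy. The class names `LinearValueSketch` and `SoftmaxPolicySketch`, their methods, and the zero initialization are assumptions made for this example and are not taken from models.py.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()


class LinearValueSketch:
    """Hypothetical linear value function: v(s) = w . x(s)."""

    def __init__(self, n_features):
        self.w = np.zeros(n_features)

    def predict(self, x):
        return float(self.w @ x)


class SoftmaxPolicySketch:
    """Hypothetical single-layer softmax policy: pi(a|s) = softmax(theta x(s))[a]."""

    def __init__(self, n_features, n_actions):
        self.theta = np.zeros((n_actions, n_features))

    def probs(self, x):
        return softmax(self.theta @ x)

    def grad_log_prob(self, x, action):
        # Gradient of log pi(action|s) w.r.t. theta: (one_hot(action) - pi) outer x
        p = self.probs(x)
        one_hot = np.zeros_like(p)
        one_hot[action] = 1.0
        return np.outer(one_hot - p, x)


# Example with a made-up 4-dimensional feature vector and 3 actions.
x = np.array([1.0, 0.0, 0.5, -0.5])
policy = SoftmaxPolicySketch(n_features=4, n_actions=3)
value = LinearValueSketch(n_features=4)
action_probs = policy.probs(x)
grad = policy.grad_log_prob(x, action=1)
```

With all-zero weights the policy is uniform over actions and the value estimate is zero, which is a common neutral starting point for linear models.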
Requires numpy and matplotlib.
Create a virtual environment, install the requirements, and run:
python -m venv venv
pip install -r requirements.txt
Be wary of long compute times. Plots will pop up during a run and must be closed in order to continue.
Some Sample Plots
experiments.py: Runs preprogrammed experiments that output various plots, both displayed during the run and saved to .png files.
mdp.py: Contains the two MDP domains, Gridworld and Mountain Car, that the experiments are run on.
models.py: Contains ValueFunction and Policy, the two models (linear layers) that the algorithms use for function approximation.
policy_gradient_algorithms.py: Contains the policy gradient algorithms One Step Actor Critic and Proximal Policy Optimization (PPO).
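To give a feel for what the two algorithms compute, the sketch below shows a textbook one-step actor-critic update for a linear value function and a linear softmax policy, plus the standard PPO clipped surrogate objective. The function names, arguments, and step sizes are hypothetical and chosen for this example. The code in policy_gradient_algorithms.py may be organized differently, and the optional discount accumulation factor used in some textbook presentations is omitted here.

```python
import numpy as np


def actor_critic_step(w, theta, x, action, reward, x_next, done,
                      gamma=0.99, alpha_w=0.1, alpha_theta=0.01):
    """One hypothetical one-step actor-critic update.

    w:     value weights, shape (n_features,)
    theta: policy weights, shape (n_actions, n_features)
    x, x_next: feature vectors of the current and next state
    """
    # TD error: delta = r + gamma * v(s') - v(s), with v(s') = 0 at terminal states.
    v = w @ x
    v_next = 0.0 if done else w @ x_next
    delta = reward + gamma * v_next - v

    # Softmax action probabilities for the current state.
    logits = theta @ x
    p = np.exp(logits - logits.max())
    p /= p.sum()

    # Gradient of log pi(action|s) w.r.t. theta.
    grad_log = -np.outer(p, x)
    grad_log[action] += x

    # Semi-gradient critic update and policy gradient actor update.
    w_new = w + alpha_w * delta * x
    theta_new = theta + alpha_theta * delta * grad_log
    return w_new, theta_new, delta


def ppo_clipped_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate, averaged over a batch.

    ratio: pi_new(a|s) / pi_old(a|s) for each sample
    """
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.minimum(ratio * advantage, clipped * advantage).mean()


# Toy transition with made-up features: 2 features, 3 actions.
w0 = np.zeros(2)
theta0 = np.zeros((3, 2))
w1, theta1, td_error = actor_critic_step(
    w0, theta0,
    x=np.array([1.0, 0.0]), action=1, reward=1.0,
    x_next=np.array([0.0, 1.0]), done=False,
)

objective = ppo_clipped_objective(np.array([1.5, 0.5]), np.array([1.0, -1.0]))
```

The clipping keeps the objective from rewarding probability ratios that move far from 1, which is what limits the size of each PPO policy update.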