Policy Gradient Algorithms From Scratch (NumPy)

This repository showcases two policy gradient algorithms, One-Step Actor-Critic and Proximal Policy Optimization (PPO), applied to two MDPs: Gridworld and Mountain Car. The algorithms are implemented from scratch in NumPy, using linear function approximation for the value function and a single-layer softmax for the policy.
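As a rough illustration of the "single-layer softmax" policy described above, a linear layer maps state features to action logits, and a softmax turns those logits into action probabilities. This is a minimal sketch, not the repository's actual `Policy` class; the function name, feature vector, and toy sizes are assumptions.

```python
import numpy as np

def softmax_policy_probs(theta, features):
    """Action probabilities from a single linear layer followed by a softmax."""
    logits = theta @ features          # (num_actions,) linear scores
    logits = logits - logits.max()     # shift logits for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

rng = np.random.default_rng(0)
theta = rng.normal(size=(3, 4))        # toy weights: 3 actions, 4 state features
phi = np.array([1.0, 0.5, -0.2, 0.0])  # hypothetical state feature vector
probs = softmax_policy_probs(theta, phi)
action = rng.choice(3, p=probs)        # sample an action from the policy
```

Sampling from the resulting distribution (rather than taking the argmax) is what makes the policy stochastic, which the policy gradient theorem requires.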

Run Instructions

Requirements: numpy and matplotlib

Create a virtual environment, install the requirements, and run the experiments (Windows instructions):

  1. Run python -m venv venv
  2. Run .\venv\Scripts\activate
  3. Run pip install -r requirements.txt
  4. Run python .\experiments.py

Be aware of long compute times, and note that plots will pop up and must be closed for the run to continue.

Some Sample Plots

(sample plot images omitted)

Files
  • experiments.py – Runs pre-programmed experiments that display various plots and save them as .png files.
  • mdp.py – Contains the two MDP domains, Gridworld and Mountain Car, on which the experiments are run.
  • models.py – Contains ValueFunction and Policy, the two linear models the algorithms use for function approximation.
  • policy_gradient_algorithms.py – Contains the two policy gradient algorithms: One-Step Actor-Critic and Proximal Policy Optimization (PPO).
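To give a flavor of what the algorithm file implements, here is a hedged sketch of a one-step actor-critic update with a linear critic and softmax actor, plus PPO's clipped surrogate objective for a single sample. These are textbook forms, not the repository's actual code; all names, hyperparameters, and feature sizes are assumptions.

```python
import numpy as np

def softmax_probs(theta, phi):
    """Softmax action probabilities from linear logits."""
    logits = theta @ phi
    logits = logits - logits.max()     # numerical stability
    e = np.exp(logits)
    return e / e.sum()

def actor_critic_step(w, theta, phi_s, phi_s_next, action, reward,
                      done, gamma=0.99, alpha_w=0.1, alpha_theta=0.01):
    """One-step actor-critic update for linear value weights w and policy weights theta."""
    v = w @ phi_s                                  # V(s)
    v_next = 0.0 if done else w @ phi_s_next       # V(s') (0 at terminal states)
    delta = reward + gamma * v_next - v            # one-step TD error
    w = w + alpha_w * delta * phi_s                # critic: semi-gradient TD(0)
    pi = softmax_probs(theta, phi_s)
    grad_log = -np.outer(pi, phi_s)                # grad of log pi(a|s) wrt theta
    grad_log[action] += phi_s                      # add the taken action's features
    theta = theta + alpha_theta * delta * grad_log # actor: policy gradient step
    return w, theta, delta

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate objective for one sample (ratio = pi_new/pi_old)."""
    return min(ratio * advantage,
               np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)
```

The clipping in `ppo_clip_objective` caps how much a single update can move the policy away from the one that collected the data, which is PPO's main departure from vanilla policy gradients.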

MIT License
