# Safe Policy Optimization with Local Feature (SPOLF)
This is the source code implementing the algorithms in the paper "Safe Policy Optimization with Local Generalized Linear Function Approximations", presented at NeurIPS 2021.
## Installation

A `requirements.txt` is included in this repository. Besides common modules (e.g., `numpy`, `scipy`), our source code depends on the following modules.

### Mandatory

- gym-minigrid (https://github.com/maximecb/gym-minigrid)
- Hydra (https://github.com/facebookresearch/hydra)
- pymdptoolbox (https://github.com/sawcordwell/pymdptoolbox)

### Optional

We also provide a `Dockerfile` in this repository, which can be used to reproduce our grid world experiment.
## Simulation configuration

We manage the simulation configuration using Hydra. Configurations are listed in `config.yaml`. For example, the algorithm to run should be chosen from the ones we implemented:

```yaml
sim_type: {safe_glm, unsafe_glm, random, oracle, safe_gp_state, safe_gp_feature, safe_glm_stepwise}
```
## Grid World Experiment

The source code necessary for our grid world experiment is contained in the `/grid_world` folder. To run the simulation, for example, use the following commands:

```bash
cd grid_world
python main.py sim_type=safe_glm env.reuse_env=False
```
To run the Monte Carlo simulation comparing our proposed method with the baselines, use the shell script `run.sh`.
We also provide a script for visualization. If you want to render how the agent behaves, use the following command:

```bash
python main.py sim_type=safe_glm env.reuse_env=True
```
## SafetyGym Experiment

The source code necessary for our Safety Gym experiment is contained in the `/safety_gym_discrete` folder. Our experiment is based on Safety Gym. Our proposed method uses dynamic programming algorithms to solve the Bellman equation, so we modified `engine.py` to discretize the environment. The modified Safety Gym source code is included as `/safety_gym_discrete/engine.py`. To use the modified library, clone Safety Gym and replace `safety_gym/safety_gym/envs/engine.py` with `/safety_gym_discrete/engine.py` from our repo, then install the modified library with the following commands:

```bash
cd safety_gym
pip install -e .
```
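To give an intuition for the discretization step: the modified `engine.py` maps Safety Gym's continuous positions onto grid cells. The sketch below is only illustrative; the function name, world bounds, and resolution here are assumptions, not the values used in the actual modified `engine.py`.

```python
import numpy as np

def discretize(xy, lo=-2.0, hi=2.0, n=50):
    """Map a continuous 2-D position to a cell index on an n x n grid.

    Illustrative only: the bounds (lo, hi) and resolution n are assumed
    values, not those hard-coded in the modified engine.py.
    """
    xy = np.asarray(xy, dtype=float)
    # Normalize into [0, 1], scale to cell indices, and clip to the grid.
    frac = (xy - lo) / (hi - lo)
    idx = np.clip((frac * n).astype(int), 0, n - 1)
    return tuple(idx)
```

For example, `discretize((0.0, 0.0))` falls in the central cell of a 50×50 grid, and out-of-range positions are clipped to the boundary cells.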
Note that a MuJoCo license is needed to install Safety Gym. To run the simulation, use the following commands:

```bash
cd safety_gym_discrete
python main.py sim_idx=0
```
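Since the method solves the Bellman equation by dynamic programming on the discretized environment, the core computation is tabular value iteration. The following is a minimal self-contained sketch of that idea (the repository itself relies on pymdptoolbox rather than this exact code):

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Solve the Bellman optimality equation by dynamic programming.

    P: transition tensor of shape (A, S, S), P[a, s, s'] = Pr(s' | s, a)
    R: reward matrix of shape (S, A)
    Returns the optimal value function V and a greedy policy.
    """
    S = R.shape[0]
    V = np.zeros(S)
    for _ in range(max_iter):
        # Q[s, a] = R[s, a] + gamma * sum_{s'} P[a, s, s'] * V[s']
        Q = R + gamma * np.einsum("ask,k->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V, Q.argmax(axis=1)
```

On a toy two-state MDP where one action stays in place and the other swaps states, this recovers the policy that moves to, and then stays in, the rewarding state.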
We compare our proposed method with three notable baselines: CPO, PPO-Lagrangian, and TRPO-Lagrangian. The baseline implementation builds on an external repository, in which we modified `run_agent.py`. To run a baseline, use the following commands:

```bash
cd safety_gym_discrete/baseline
python baseline_run.py sim_type=cpo
```
The environment that the agent runs on is generated by `generate_env.py`. We provide ten 50×50 environments. If you want to generate other environments, change the world shape in `safety_gym_discrete.py` and run the following commands:

```bash
cd safety_gym_discrete
python generate_env.py
```
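For a rough picture of what such an environment looks like as data, a discretized world can be represented as a seeded occupancy grid. The sketch below is hypothetical; `generate_env.py`'s actual layout rules, obstacle density, and file format are not shown here.

```python
import numpy as np

def sample_grid(shape=(50, 50), obstacle_prob=0.1, seed=0):
    """Sample a random occupancy grid: 1 = obstacle cell, 0 = free cell.

    Hypothetical sketch; generate_env.py's actual procedure differs.
    """
    rng = np.random.default_rng(seed)
    grid = (rng.random(shape) < obstacle_prob).astype(int)
    grid[0, 0] = 0  # keep the start cell free
    return grid
```

Fixing the seed makes the layout reproducible, which mirrors how a fixed set of ten environments can be shipped with the repository.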
## Citation

If you find this code useful in your research, please consider citing:

```bibtex
@inproceedings{wachi_yue_sui_neurips2021,
  Author    = {Wachi, Akifumi and Wei, Yunyue and Sui, Yanan},
  Title     = {Safe Policy Optimization with Local Generalized Linear Function Approximations},
  Booktitle = {Neural Information Processing Systems (NeurIPS)},
  Year      = {2021}
}
```