PyTorch RL Minimal Implementations
This repository provides minimal implementations of several reinforcement learning algorithms, with the following characteristics:
- Few dependencies: Only PyTorch and Gym need to be installed, for building neural networks and for testing the algorithms’ performance, respectively.
- Independent implementations: Each RL algorithm is implemented in its own file, which makes it easier to follow its procedure and to adapt it to other tasks.
- Configurable extensions: Parameters and tools such as reward normalization, advantage normalization, TensorBoard, and tqdm are easy to configure.
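Advantage normalization, mentioned in the list above, is commonly implemented by standardizing a batch of advantages to zero mean and unit standard deviation. The sketch below shows that idea in plain Python. The function name `normalize` and the `eps` parameter are assumptions made for illustration, not names taken from this repository, which would presumably operate on PyTorch tensors instead of lists.

```python
import math


def normalize(values, eps=1e-8):
    """Standardize a batch to zero mean and (approximately) unit std.

    Hypothetical helper: `eps` guards against division by zero when all
    values in the batch are identical.
    """
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation of the batch (divides by n, not n - 1).
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / (std + eps) for v in values]


advantages = [1.0, 2.0, 3.0, 4.0]
normalized = normalize(advantages)
```

The same helper can be applied to rewards. Whether to divide by `n` or `n - 1` is a design choice that matters little for large batches.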
RL Algorithms List
Quick Start
Requirements
pytorch
gym
tensorboard # for summary writer
tqdm # for progress bar
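Before running any algorithm, it can help to check which of these packages are importable. The snippet below is a small sketch using only the standard library. The import names are assumptions (PyTorch is imported as `torch`), and treating `tensorboard` and `tqdm` as optional is inferred from the feature list rather than stated by the repository.

```python
import importlib.util

# Assumed import names; PyTorch is imported as `torch`.
REQUIRED = ["torch", "gym"]
# Assumed to be optional because they only back logging and progress bars.
OPTIONAL = ["tensorboard", "tqdm"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("missing required:", missing_modules(REQUIRED))
    print("missing optional:", missing_modules(OPTIONAL))
```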
Abstract Agent
Components / Parameters
| Component | Description |
| --- | --- |
| policy | neural network model |
| gamma | discount factor for the cumulative reward |
| lr | learning rate, e.g. lr_actor, lr_critic |
| lr_decay | decay factor used to schedule the learning rate |
| lr_scheduler | scheduler for the learning rate |
| coef_critic_loss | coefficient of the critic loss |
| coef_entropy_loss | coefficient of the entropy loss |
| writer | summary writer for recording information |
| buffer | replay buffer for storing historical trajectories |
| use_cuda | use the GPU |
| clip_grad | gradient clipping |
| max_grad_norm | maximum norm for clipped gradients |
| norm_advantage | advantage normalization |
| open_tb | enable the summary writer |
| open_tqdm | enable the progress bar |
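To make two of the parameters above concrete, the sketch below shows how `gamma` is typically used to turn a reward sequence into discounted returns, and how a `max_grad_norm` threshold rescales gradients whose global L2 norm is too large. It is written in plain Python for readability. The function names are hypothetical, and in a PyTorch code base the clipping step would normally be delegated to `torch.nn.utils.clip_grad_norm_`.

```python
import math


def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk backwards so each return reuses the one that follows it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def clip_by_global_norm(grads, max_grad_norm):
    """Rescale gradients so their global L2 norm is at most max_grad_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_grad_norm:
        return list(grads)  # already within the limit, leave unchanged
    scale = max_grad_norm / total_norm
    return [g * scale for g in grads]


returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)  # [1.75, 1.5, 1.0]
clipped = clip_by_global_norm([3.0, 4.0], max_grad_norm=1.0)  # norm 5 -> 1
```

Clipping by the global norm preserves the direction of the gradient and only shrinks its length, which is why it is usually preferred over clipping each component separately.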
Methods
| Method | Description |
| --- | --- |
| preprocess_obs() | preprocess the observation before it is fed into the neural network |
| select_action() | use the actor network to select an action based on the policy distribution |
| estimate_obs() | use the critic network to estimate the value of an observation |
| update() | update the parameters by computing losses and gradients |
| train() | set the neural network to training mode |
| eval() | set the neural network to evaluation mode |
| save() | save the model parameters |
| load() | load the model parameters |
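The method list above suggests a common base-class interface that each algorithm fills in. The following is a minimal, dependency-free sketch of such an interface, not the repository's actual code. The class names, the constructor signature, and the `RandomAgent` toy subclass are all hypothetical, and the neural network, `estimate_obs()`, `save()`, and `load()` parts are omitted.

```python
import random
from abc import ABC, abstractmethod


class AbstractAgent(ABC):
    """Hypothetical base class mirroring the method names listed above."""

    def __init__(self, gamma=0.99):
        self.gamma = gamma      # discount factor
        self.training = True    # toggled by train() / eval()

    def preprocess_obs(self, obs):
        # Default preprocessing: convert the observation to a list of floats.
        return [float(x) for x in obs]

    @abstractmethod
    def select_action(self, obs):
        """Return an action for the given observation."""

    @abstractmethod
    def update(self, batch):
        """Update parameters from a batch of experience and return a loss."""

    def train(self):
        self.training = True

    def eval(self):
        self.training = False


class RandomAgent(AbstractAgent):
    """Toy agent that picks uniformly random actions and learns nothing."""

    def __init__(self, n_actions, seed=0):
        super().__init__()
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def select_action(self, obs):
        _ = self.preprocess_obs(obs)  # preprocessed but unused by this toy agent
        return self.rng.randrange(self.n_actions)

    def update(self, batch):
        return 0.0  # nothing to learn


agent = RandomAgent(n_actions=2)
agent.eval()
action = agent.select_action([0.1, -0.2, 0.3, 0.0])
```

Declaring `select_action()` and `update()` as abstract means a subclass that forgets to implement them cannot be instantiated, so the omission surfaces at construction time instead of in the middle of training.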
Update & To-do & Limitations
Update History
- 2021-12-09 ADD TRICK: norm_critic_loss in PPO
- 2021-12-09 ADD PARAM: coef_critic_loss, coef_entropy_loss, log_step
- 2021-12-07 ADD ALGO: A3C
- 2021-12-05 ADD ALGO: PPO
- 2021-11-28 ADD ALGO: A2C
- 2021-11-20 ADD ALGO: Q learning, Reinforce
To-do List
- ADD ALGO: DQN, Double DQN, Dueling DQN, DDPG
- ADD NN: RNN Mode
Current Limitations
- Unsupported: Vectorized environments
- Unsupported: Continuous action space
- Unsupported: RNN-based model
- Unsupported: Imitation learning
Reference & Acknowledgements