Average-Reward PPO
TBD
References
@inproceedings{ma2021average-reward,
title={Average-Reward Reinforcement Learning with Trust Region Methods},
author={Ma, Xiaoteng and Tang, Xiaohang and Xia, Li and Yang, Jun and Zhao, Qianchuan},
booktitle={International Joint Conferences on Artificial Intelligence},
pages={2797--2803},
year={2021}
}
See also the original implementation from the authors.