Average-Reward PPO

TBD

References

@inproceedings{ma2021average-reward,
    title={Average-Reward Reinforcement Learning with Trust Region Methods},
    author={Ma, Xiaoteng and Tang, Xiaohang and Xia, Li and Yang, Jun and Zhao, Qianchuan},
    booktitle={International Joint Conferences on Artificial Intelligence},
    pages={2797--2803},
    year={2021}
}

See also the original implementation from the authors.

GitHub
