DvD-TD3: Diversity via Determinants (TD3 version)
An implementation of the paper
Effective Diversity in Population Based Reinforcement Learning.
Install pbrl and clone this repo:
git clone https://github.com/jjccero/DvD_TD3
cd DvD_TD3
python train.py
Agents are trained with multiprocessing, and demo_grad.py shows how gradients are transferred between different processes.
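As a rough illustration of passing gradients between processes, here is a minimal sketch using only the standard library. It is not the code in demo_grad.py: the toy quadratic loss, the use of `multiprocessing.Pipe`, and the names `local_gradient`, `apply_gradients`, `worker`, and `train_step` are all assumptions made for this example.

```python
import multiprocessing as mp


def local_gradient(weights, target):
    """Gradient of the toy loss 0.5 * sum((w - t)^2) with respect to w."""
    return [w - t for w, t in zip(weights, target)]


def apply_gradients(weights, gradients, lr=0.1):
    """Average the workers' gradients and take one gradient-descent step."""
    n = len(gradients)
    mean = [sum(g[i] for g in gradients) / n for i in range(len(weights))]
    return [w - lr * m for w, m in zip(weights, mean)]


def worker(conn, weights, target):
    """Runs in a child process: compute a gradient and send it to the parent."""
    conn.send(local_gradient(weights, target))
    conn.close()


def train_step(weights, targets):
    """Start one worker per target, collect their gradients, update the weights."""
    pipes, procs = [], []
    for target in targets:
        parent_conn, child_conn = mp.Pipe()
        proc = mp.Process(target=worker, args=(child_conn, weights, target))
        proc.start()
        pipes.append(parent_conn)
        procs.append(proc)
    # Receive before joining so a worker never blocks on a full pipe.
    gradients = [conn.recv() for conn in pipes]
    for proc in procs:
        proc.join()
    return apply_gradients(weights, gradients)


if __name__ == "__main__":
    # Hypothetical data: two workers, each pulling the weights toward its own target.
    print(train_step([0.0, 0.0], [[1.0, 1.0], [3.0, -1.0]]))
```

In this sketch only plain Python lists cross the process boundary, so the parent stays the single owner of the weights. A real setup with neural networks would more likely send tensors or use shared memory.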
When the entries of the DPP kernel matrix are computed with a dot-product kernel (or cosine similarity, see loss.py), a
linear mapping can be applied so that every entry lies between 0 and 1.
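The sketch below shows one way such a mapping could look for cosine similarity, which lies in [-1, 1]: the map s -> (s + 1) / 2 sends it to [0, 1]. This is an assumption for illustration, not necessarily the mapping used in loss.py, and the function names and toy embeddings are hypothetical. An unnormalized dot product is not bounded, so it would need normalizing before the same map applies.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def kernel_matrix(embeddings):
    """Cosine-similarity kernel with entries mapped linearly from [-1, 1] to [0, 1]."""
    return [
        [(cosine_similarity(u, v) + 1.0) / 2.0 for v in embeddings]
        for u in embeddings
    ]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


if __name__ == "__main__":
    # Hypothetical behavior embeddings for two agents.
    identical = [[1.0, 0.0], [1.0, 0.0]]
    orthogonal = [[1.0, 0.0], [0.0, 1.0]]
    print(determinant(kernel_matrix(identical)))
    print(determinant(kernel_matrix(orthogonal)))
```

With this mapping, two identical embeddings give a matrix of ones with determinant 0, while orthogonal embeddings give off-diagonal entries of 0.5 and a larger determinant, so the determinant still grows with diversity. The mapped matrix equals (K + 11^T) / 2, a sum of positive semi-definite matrices, so it remains positive semi-definite.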
Training can be expensive because evaluation (the bandit update) runs after every iteration, so I reduced the frequency of
evaluation to 0.01.
Thanks to Jack Parker-Holder (an author of the paper) for his help.
Feel free to get in touch with me if you have any questions about this implementation.