pytorch-a3c-mujoco

This code aims to solve continuous control problems, especially in Mujoco, and is largely based on pytorch-a3c. The differences between this repo and pytorch-a3c:

  • compatible with Mujoco environments
  • the policy network outputs mu and sigma
  • a Gaussian distribution is constructed from mu and sigma
  • actions are sampled from that Gaussian distribution
  • the entropy term is modified accordingly (see the sketch after this list)
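
The mu/sigma-to-action pipeline can be summarized in a few lines of PyTorch. The snippet below is an illustrative sketch, not this repo's exact code: the network sizes, the softplus used to keep sigma positive, and the GaussianPolicy name are all assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GaussianPolicy(nn.Module):
        # Minimal continuous-action policy head (illustrative only).
        def __init__(self, obs_dim, act_dim, hidden=64):
            super().__init__()
            self.fc = nn.Linear(obs_dim, hidden)
            self.mu_head = nn.Linear(hidden, act_dim)     # mean of the Gaussian
            self.sigma_head = nn.Linear(hidden, act_dim)  # std of the Gaussian

        def forward(self, obs):
            h = torch.tanh(self.fc(obs))
            mu = self.mu_head(h)
            sigma = F.softplus(self.sigma_head(h)) + 1e-5  # keep sigma strictly positive
            return mu, sigma

    policy = GaussianPolicy(obs_dim=4, act_dim=1)  # e.g. InvertedPendulum-v1 has a 4-dim observation, 1-dim action
    obs = torch.randn(1, 4)
    mu, sigma = policy(obs)

    dist = torch.distributions.Normal(mu, sigma)   # Gaussian built from mu and sigma
    action = dist.sample()                         # sample an action from the Gaussian
    log_prob = dist.log_prob(action).sum(-1)       # used in the policy-gradient loss
    entropy = dist.entropy().sum(-1)               # entropy bonus encourages exploration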

Note that this repo is only compatible with Mujoco environments in OpenAI gym. If you want to train an agent on Atari, please refer to pytorch-a3c.

Usage

There are three tasks/modes: train, eval, and develop.

  • train:
python main.py --env-name InvertedPendulum-v1 --num-processes 16 --task train
  • eval:
python main.py --env-name InvertedPendulum-v1 --task eval --display True --load_ckpt ckpt/a3c/InvertedPendulum-v1.a3c.100 

You can enable or disable rendering with the --display flag.

  • develop:
python main.py --env-name InvertedPendulum-v1 --num-processes 16 --task develop

In cases where you want to check whether your code runs as intended, you might resort to pdb. For this, I provide a develop mode, which runs in a single process (easy to debug).
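
For example, you can drop a breakpoint anywhere in the code while running in develop mode (the placement is up to you; this line is just an illustration):

    import pdb; pdb.set_trace()  # execution pauses here; inspect variables, then type `c` to continue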

Experiment results

learning curve

The plots below show total reward/episode length over 1000 steps:

  • InvertedPendulum-v1

[learning curve for InvertedPendulum-v1, plotted from InvertedPendulum-v1.a3c.log]

In InvertedPendulum-v1, the total reward is exactly equal to the episode length.

  • InvertedDoublePendulum-v1

[learning curve for InvertedDoublePendulum-v1, plotted from InvertedDoublePendulum-v1.a3c.log]

Note that the x axis denotes time in minutes.

The curves above are plotted with python plot.py --log_path ./logs/a3c/InvertedPendulum-v1.a3c.log
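
If you want a quick look at a log without plot.py, a minimal sketch is below. It assumes a hypothetical layout of one whitespace-separated "time reward" pair per line; the actual log format written by this repo may differ, so adjust the parsing accordingly.

    import matplotlib.pyplot as plt

    times, rewards = [], []
    with open("./logs/a3c/InvertedPendulum-v1.a3c.log") as f:
        for line in f:
            t, r = line.split()[:2]   # assumed layout: time, then reward
            times.append(float(t))
            rewards.append(float(r))

    plt.plot(times, rewards)
    plt.xlabel("time (minutes)")      # matches the x-axis note above
    plt.ylabel("total reward")
    plt.show()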

video

  • InvertedPendulum-v1
  • InvertedDoublePendulum-v1

Requirements

  • gym
  • mujoco-py
  • pytorch
  • matplotlib (optional)
  • seaborn (optional)

GitHub

https://github.com/andrewliao11/pytorch-a3c-mujoco