# Masked Visual Pre-training for Motor Control
This is a PyTorch implementation of the paper [Masked Visual Pre-training for Motor Control](https://arxiv.org/abs/2203.06173). It contains the benchmark suite, pre-trained models, and the training code to reproduce the results from the paper.
## Installation

Please see [INSTALL.md](INSTALL.md) for installation instructions.
## Pre-trained visual encoders

We provide the pre-trained visual encoders used in the paper. The models are in the same format as [MAE](https://github.com/facebookresearch/mae) and [timm](https://github.com/rwightman/pytorch-image-models):
| backbone | objective | data | md5 | download |
|---|---|---|---|---|
| ViT-S | MAE | in-the-wild | | model |
| ViT-S | MAE | ImageNet | | model |
| ViT-S | Supervised | ImageNet | | model |
By default, the code assumes that the pre-trained encoders are placed in the `/tmp/pretrained` directory.
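
As an illustration of that format, here is a minimal sketch of loading one of the encoders for feature extraction with `timm`. The checkpoint filename is hypothetical, and the `"model"` key and the `vit_small_patch16_224` model name are assumptions based on common MAE/timm checkpoint conventions, not details confirmed by this repo:

```python
# Minimal sketch: load a pre-trained ViT-S encoder and extract features.
# The checkpoint filename is hypothetical; the "model" key and the timm
# model name are assumptions based on the MAE/timm checkpoint format.
import torch
import timm

ckpt = torch.load("/tmp/pretrained/vit_s_mae.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # MAE-style checkpoints nest weights under "model"

# num_classes=0 builds a headless encoder that returns pooled features.
encoder = timm.create_model("vit_small_patch16_224", num_classes=0)
missing, unexpected = encoder.load_state_dict(state_dict, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")

encoder.eval()
images = torch.zeros(1, 3, 224, 224)  # a dummy batch of RGB observations
with torch.no_grad():
    features = encoder(images)
print(features.shape)  # torch.Size([1, 384]) for ViT-S
```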
## Example training commands
Train `FrankaPick` from states:

```
python tools/train.py task=FrankaPick
```
Train `FrankaPick` from pixels:

```
python tools/train.py task=FrankaPickPixels
```
Train on 8 GPUs:

```
python tools/train_dist.py num_gpus=8
```
Test a policy after N iterations:

```
python tools/train.py test=True headless=False logdir=/path/to/job resume=N
```
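
For example, with a hypothetical log directory and a checkpoint saved at iteration 100, this might look like:

```
python tools/train.py test=True headless=False logdir=/tmp/logs/FrankaPick resume=100
```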
## Citation
If you find the code or pre-trained models useful in your research, please use the following BibTeX entry:
```bibtex
@article{Xiao2022,
  title = {Masked Visual Pre-training for Motor Control},
  author = {Tete Xiao and Ilija Radosavovic and Trevor Darrell and Jitendra Malik},
  journal = {arXiv:2203.06173},
  year = {2022}
}
```
## Acknowledgments
We thank the NVIDIA Isaac Gym and PhysX teams for making the simulator and preview code examples available.