All-in-one

Code for the paper "All in One: Exploring Unified Video-Language Pre-training" (arXiv:2203.07303).


Install

1. PyTorch Lightning

In this work, we use PyTorch Lightning for distributed training with mixed precision.
Install PyTorch and PyTorch Lightning first.

conda create -n allinone python=3.7
source activate allinone
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
cd [Path_To_This_Code]
pip install -r requirements.txt
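
A quick way to confirm the environment before moving on is to import the core packages and check that CUDA is visible. This is just a sanity-check sketch, not part of the repo:

# Sanity-check the training environment (run inside the allinone env).
import torch
import pytorch_lightning as pl

print("torch:", torch.__version__)
print("pytorch_lightning:", pl.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))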

2. On-the-fly decode

To speed up pre-training, we decode videos on the fly for fast IO.
Install ffmpeg and pytorchvideo (used for data augmentation) as below.

conda install -y ffmpeg
pip install ffmpeg-python
pip install pytorchvideo

Please install any additional packages you need that are not covered by the requirements file.
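
For reference, here is a minimal sketch of what an on-the-fly decode step can look like with these two packages. The helper below is illustrative only; the function name, frame count, and output size are assumptions, not the repo's actual data pipeline:

# On-the-fly decode sketch: ffmpeg-python decodes, pytorchvideo augments.
import ffmpeg
import numpy as np
import torch
from pytorchvideo.transforms import ShortSideScale, UniformTemporalSubsample

def decode_video(path, num_frames=3, side=224):
    # Probe the stream to learn the frame size.
    info = next(s for s in ffmpeg.probe(path)["streams"] if s["codec_type"] == "video")
    w, h = int(info["width"]), int(info["height"])

    # Decode raw RGB frames straight into memory (no intermediate files on disk).
    out, _ = (
        ffmpeg.input(path)
        .output("pipe:", format="rawvideo", pix_fmt="rgb24")
        .run(capture_stdout=True, quiet=True)
    )
    frames = np.frombuffer(out, np.uint8).reshape(-1, h, w, 3)

    # (T, H, W, C) -> (C, T, H, W), then sparsely sample frames and rescale.
    video = torch.from_numpy(frames.copy()).permute(3, 0, 1, 2).float()
    video = UniformTemporalSubsample(num_frames)(video)
    video = ShortSideScale(side)(video)
    return video  # (3, num_frames, H', W')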

Download Pretrained Weights

We provide three pretrained weights on Google Drive.

Model          Parameters  Pretrained Weight  Training Log  Hparams
All-in-one-Ti  12M         Google Drive       Google Drive  Google Drive
All-in-one-S   33M         Google Drive       Google Drive  Google Drive
All-in-one-B   110M        Google Drive       Google Drive  Google Drive

After downloading these pretrained weights, move them into the pretrained directory.

mkdir pretrained
cp *.ckpt pretrained/
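
Once the files are in place, each checkpoint can be inspected like any PyTorch Lightning checkpoint. The filename below is a placeholder for whichever weight you downloaded:

# Inspect a downloaded checkpoint (the filename is a placeholder).
import torch

ckpt = torch.load("pretrained/all-in-one-base.ckpt", map_location="cpu")
# Lightning checkpoints store the model weights under "state_dict".
state = ckpt.get("state_dict", ckpt)
print(len(state), "tensors")
for name, tensor in list(state.items())[:5]:
    print(name, tuple(tensor.shape))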

Dataset Preparation

See DATA.md

Pre-training

See TRAIN.md

Evaluation on Downstream Tasks

See EVAL.md

Thanks to the unified design and sparse sampling, All-in-one runs with far fewer FLOPs than comparable video-language models.
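
Sparse sampling is where most of the saving comes from: only a handful of frames per clip pass through the backbone, so compute scales with the number of sampled frames rather than the clip length. A minimal sketch of uniform sparse frame sampling (the default of 3 frames is an assumption for illustration):

# Uniform sparse sampling of frame indices (illustrative sketch).
import numpy as np

def sparse_sample(total_frames, num_samples=3):
    # Take one frame from the middle of each of num_samples equal segments.
    edges = np.linspace(0, total_frames, num_samples + 1)
    return ((edges[:-1] + edges[1:]) / 2).astype(int)

print(sparse_sample(300))  # -> [ 50 150 250]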

News

2022.3.14 The first version of All-in-one is released. DATA.md is in progress.

Citation

If you find our work helpful, please cite our paper.

@article{wang2022allinone,
  title={All in One: Exploring Unified Video-Language Pre-training},
  author={Wang, Alex Jinpeng and Ge, Yixiao and Yan, Rui and Ge, Yuying and Lin, Xudong and Cai, Guanyu and Wu, Jianping and Shan, Ying and Qie, Xiaohu and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2203.07303},
  year={2022}
}

Acknowledgement

This work is mainly based on ViLT, Frozen and Merlot.
