Downloading our datasets

Dataset structure

  • Each dataset may have several subdatasets (most of them only have one)
  • The pickle file is a dict with the following format:
            'image_tensors': np.array((N, 3, 224, 224)),  # N: number of images
            'input_ids': np.array(S),                     # S: token length of the filled template text
            'template_input_ids': np.array(S_),           # S_: token length of the un-filled template text
  • The ABO dataset contains an additional label_to_text.json file, which provides the text template for each subdataset and label.
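The pickle format above can be sketched as follows. This is a minimal illustration with dummy data (N=2 images, token lengths S=8 and S_=6 are made-up values), not code from the repo:

```python
import pickle
import numpy as np

# Build a tiny dummy pickle in the documented format:
# N=2 images, token length S=8 (filled template), S_=6 (un-filled template).
dummy = {
    "image_tensors": np.zeros((2, 3, 224, 224), dtype=np.float32),
    "input_ids": np.zeros(8, dtype=np.int64),
    "template_input_ids": np.zeros(6, dtype=np.int64),
}
with open("/tmp/example_subdataset.pkl", "wb") as f:
    pickle.dump(dummy, f)

# Loading back mirrors how a provided subdataset pickle would be read.
with open("/tmp/example_subdataset.pkl", "rb") as f:
    data = pickle.load(f)
print(data["image_tensors"].shape)  # (2, 3, 224, 224)
```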

A list of available datasets and subdatasets

| Dataset | dataset name (-i) | subdataset name (-d) |
| --- | --- | --- |
| Clevr Counting | ClevrCounting | counting |
| Amazon Berkeley Objects (ABO) | ABO | material, color |
| Caltech-UCSD Birds 200 (CUB) | CUB | classification |

Training with provided datasets

Example code is provided for performing training and meta-testing on our datasets.

Output format

Each model checkpoint dir contains two files:

  • step1.ckpt: model checkpoint after the training phase
  • dev_test_results.json: scores for each task configuration on the dev and test sets during meta-testing
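As a sketch of how the results file might be consumed: the exact layout of dev_test_results.json is not specified here, so the structure below (a mapping from task configuration to dev/test scores) and the score values are assumptions for illustration only:

```python
import json

# Hypothetical contents; the real dev_test_results.json layout may differ.
results = {
    "5way_5shot": {"dev": 0.81, "test": 0.78},
    "5way_1shot": {"dev": 0.66, "test": 0.63},
}
with open("/tmp/dev_test_results.json", "w") as f:
    json.dump(results, f)

# Reading the file back and iterating over task configurations.
with open("/tmp/dev_test_results.json") as f:
    loaded = json.load(f)
for config, scores in loaded.items():
    print(config, scores["dev"], scores["test"])
```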

Loading checkpoint

  • Here is an example snippet for loading step1.ckpt from multitask-finetuning/classical-finetuning/zeroshot models:
    model = MultitaskFinetuneCLIP()
    model = model.load_from_checkpoint(checkpoint_path="<path to log dir>/step1.ckpt")
  • Here is an example snippet for loading step1.ckpt from fomaml models:
    import torch
    import learn2learn as l2l

    model = LightningCLIP()
    model = l2l.algorithms.MAML(model, lr=1e-5, first_order=True)
    model.load_state_dict(torch.load("<path to log dir>/step1.ckpt"))

Training with custom datasets

Preprocess dataset

  • Put your new dataset, in the same format as the provided datasets, into data/
  • Specify template_function or the path to a label_to_text json file (an example file can be found in /data/ABO/label_to_text.json) at lines 350 and 355 in
  • provides an example of running to create a pickle file for your new dataset
  • Add your dataset to construct_dataset(): line 77 in and line 80 in
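A template_function maps a label to the filled template text from which input_ids are tokenized. The function below is a hypothetical sketch (the name and counting-style template are assumptions; the ABO dataset instead reads the same mapping from data/ABO/label_to_text.json):

```python
def template_function(label: str) -> str:
    """Hypothetical template for a counting-style subdataset:
    turns a label into the filled template text."""
    # e.g. label "3" -> "a photo of 3 objects"
    return f"a photo of {label} objects"

print(template_function("3"))
```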


  • Modify to train and meta-test on your own dataset
  • Refer to and for default and tuned hyperparameters for each algorithm