DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers


PaintSkills – Visual Reasoning

Dataset

  • Download the data for the four skills (object.zip, count.zip, color.zip, and spatial.zip) from the Google Drive link and unzip them:

unzip object.zip
unzip count.zip
unzip color.zip
unzip spatial.zip
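If `unzip` is not available, the same extraction can be done with Python's standard library. This is a minimal sketch that assumes the four archives sit together in one directory. `extract_skills` and its `root` parameter are hypothetical helpers for illustration and are not part of the repository.

```python
import zipfile
from pathlib import Path

# The four PaintSkills skills, one archive per skill.
SKILLS = ["object", "count", "color", "spatial"]


def extract_skills(root="."):
    """Extract {skill}.zip for every skill into `root` (hypothetical helper).

    Mirrors running `unzip {skill}.zip` inside `root` for each skill.
    Returns the list of skills that were extracted, in order.
    """
    root = Path(root)
    extracted = []
    for skill in SKILLS:
        archive = root / f"{skill}.zip"
        # Extract next to the archive, as `unzip` does by default.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(root)
        extracted.append(skill)
    return extracted
```

For example, `extract_skills("downloads")` would unpack all four archives inside `downloads/`. A missing archive raises `FileNotFoundError`, so an incomplete download is noticed immediately.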
  • Each skill directory has the following hierarchy:

{skill}/        # skill name (i.e., object, count, color, or spatial)
    # Images
    images/

    # Scene configuration
    scenes/
        {skill}_train.json
        {skill}_val.json

    # Bounding box annotations (only needed for DETR)
    {skill}_train_bounding_boxes.json
    {skill}_val_bounding_boxes.json
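The layout above can be captured in a small path helper. This is a sketch that assumes only the directory hierarchy shown. `skill_paths` and `load_scenes` are hypothetical names, and the scene JSON is returned as-is because its internal schema is not described here.

```python
import json
from pathlib import Path

SPLITS = ("train", "val")


def skill_paths(root, skill, split="train"):
    """Return the expected locations for one skill and split (hypothetical helper)."""
    if split not in SPLITS:
        raise ValueError(f"split must be one of {SPLITS}, got {split!r}")
    skill_dir = Path(root) / skill
    return {
        # Generated images for the skill.
        "images": skill_dir / "images",
        # Scene configuration: {skill}/scenes/{skill}_{split}.json
        "scenes": skill_dir / "scenes" / f"{skill}_{split}.json",
        # Bounding boxes live directly under {skill}/ and are only needed for DETR.
        "bounding_boxes": skill_dir / f"{skill}_{split}_bounding_boxes.json",
    }


def load_scenes(root, skill, split="train"):
    """Load the scene configuration JSON without assuming its schema."""
    path = skill_paths(root, skill, split)["scenes"]
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Keeping path construction in one place means a typo in a skill or split name fails in a single spot instead of surfacing later as a confusing missing-file error.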

Evaluation

Please see ./paintskills/detr/README.md for our DETR-based visual reasoning skill evaluation.

Acknowledgements

  • We thank the developers of DETR for publicly releasing their code.

Reference

Please cite our paper if you use our models in your work:

@article{Cho2022DallEval,
  title         = {DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers},
  author        = {Jaemin Cho and Abhay Zala and Mohit Bansal},
  year          = {2022},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  eprint        = {2202.04053}
}