DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers
- Authors: Jaemin Cho, Abhay Zala, and Mohit Bansal
- Paper
PaintSkills – Visual Reasoning
Dataset
- Download the four skill archives (object.zip, count.zip, color.zip, and spatial.zip) from the Google Drive link and unzip them.
unzip object.zip
unzip count.zip
unzip color.zip
unzip spatial.zip
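If the `unzip` command is not available, the same step can be sketched in Python with the standard-library `zipfile` module. This is a minimal sketch, not part of the released code: the function name `extract_skills` and its `download_dir` / `out_dir` parameters are hypothetical, and it assumes the four archives were saved under their original names.

```python
import zipfile
from pathlib import Path

# The four skill names listed in this README.
SKILLS = ("object", "count", "color", "spatial")


def extract_skills(download_dir, out_dir, skills=SKILLS):
    """Extract {skill}.zip for each skill (hypothetical helper).

    Returns the list of skills whose archive was found and extracted.
    """
    download_dir, out_dir = Path(download_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for skill in skills:
        archive = download_dir / f"{skill}.zip"
        if not archive.is_file():
            continue  # archive not downloaded yet; skip instead of failing
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(out_dir)
        extracted.append(skill)
    return extracted
```

Calling `extract_skills(".", ".")` from the download folder mirrors the four `unzip` commands above.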
- Each skill directory has the following hierarchy:
{skill}/ # skill name (i.e., object, count, color, or spatial)
    # Images
    images/
    # Scene configuration
    scenes/
        {skill}_train.json
        {skill}_val.json
    # Bounding box annotations - only needed for DETR
    {skill}_train_bounding_boxes.json
    {skill}_val_bounding_boxes.json
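After unzipping, it can help to confirm that the expected files are in place before running the evaluation. The sketch below is illustrative only. It assumes the scene JSON files sit inside `scenes/` and the bounding-box files sit directly under `{skill}/` (adjust the paths if your extracted archive differs), and the helper names `expected_paths`, `missing_paths`, and `load_scenes` are hypothetical. It makes no assumption about the schema inside the JSON files.

```python
import json
from pathlib import Path


def expected_paths(root, skill):
    """List the paths this README describes for one skill (assumed layout)."""
    skill_dir = Path(root) / skill
    paths = [skill_dir / "images"]
    for split in ("train", "val"):
        # Assumption: scene configurations live inside scenes/.
        paths.append(skill_dir / "scenes" / f"{skill}_{split}.json")
        # Assumption: bounding-box files live directly under {skill}/.
        paths.append(skill_dir / f"{skill}_{split}_bounding_boxes.json")
    return paths


def missing_paths(root, skill):
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_paths(root, skill) if not p.exists()]


def load_scenes(root, skill, split):
    """Parse {skill}/scenes/{skill}_{split}.json and return it unchanged."""
    path = Path(root) / skill / "scenes" / f"{skill}_{split}.json"
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `missing_paths(".", "count")` returns an empty list when the `count` directory matches the assumed layout, and `load_scenes(".", "count", "val")` returns the parsed validation scene configuration.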
Evaluation
Please see ./paintskills/detr/README.md for our DETR-based visual reasoning skill evaluation.
Acknowledgements
- We thank the developers of DETR for the public release of their code.
Reference
Please cite our paper if you use our models in your work:
@article{Cho2022DallEval,
title = {DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers},
author = {Jaemin Cho and Abhay Zala and Mohit Bansal},
year = {2022},
archivePrefix = {arXiv},
primaryClass = {cs.CV},
eprint = {2202.04053}
}