DiffusionDB

DiffusionDB is the first large-scale text-to-image prompt dataset. It contains 2 million images generated by Stable Diffusion using prompts and hyperparameters specified by real users. The unprecedented scale and diversity of this human-actuated dataset provide exciting research opportunities in understanding the interplay between prompts and generative models, detecting deepfakes, and designing human-AI interaction tools to help users more easily use these models.

Get Started

DiffusionDB is available at 🤗 Hugging Face Datasets.

Dataset Structure

We use a modularized file structure to distribute DiffusionDB. The 2 million images in DiffusionDB are split into 2,000 folders, where each folder contains 1,000 images and a JSON file that links these 1,000 images to their prompts and hyperparameters.

./
├── images
│   ├── part-000001
│   │   ├── 3bfcd9cf-26ea-4303-bbe1-b095853f5360.png
│   │   ├── 5f47c66c-51d4-4f2c-a872-a68518f44adb.png
│   │   ├── 66b428b9-55dc-4907-b116-55aaa887de30.png
│   │   ├── 99c36256-2c20-40ac-8e83-8369e9a28f32.png
│   │   ├── f3501e05-aef7-4225-a9e9-f516527408ac.png
│   │   ├── [...]
│   │   └── part-000001.json
│   ├── part-000002
│   ├── part-000003
│   ├── part-000004
│   ├── [...]
│   └── part-002000
└── metadata.parquet

These sub-folders are named part-00xxxx, and each image has a unique name generated by UUID Version 4. Each image is a PNG file. The JSON file in a sub-folder has the same name as the sub-folder and contains key-value pairs mapping image filenames to their prompts and hyperparameters. For example, below is the key-value pair for f3501e05-aef7-4225-a9e9-f516527408ac.png in part-000001.json.

{
  "f3501e05-aef7-4225-a9e9-f516527408ac.png": {
    "p": "geodesic landscape, john chamberlain, christopher balaskas, tadao ando, 4 k, ",
    "se": 38753269,
    "c": 12.0,
    "st": 50,
    "sa": "k_lms"
  }
}

The data fields are:

key: Unique image name
p: Prompt
se: Random seed
c: CFG Scale (guidance scale)
st: Steps
sa: Sampler
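
The folder layout and JSON fields described above suggest a simple way to pair each image with its metadata. The following is a minimal sketch, assuming a local copy of the images folder; the helper names part_dir and load_part and the root argument are hypothetical and are not part of the DiffusionDB scripts.

```python
import json
from pathlib import Path


def part_dir(root, part_id):
    """Return the folder for a part, e.g. part_id=1 -> <root>/images/part-000001."""
    return Path(root) / "images" / f"part-{part_id:06d}"


def load_part(root, part_id):
    """Read a part's JSON file and return a list of (image_path, metadata) pairs."""
    folder = part_dir(root, part_id)
    # The JSON file shares its name with the sub-folder.
    with open(folder / f"{folder.name}.json", encoding="utf-8") as f:
        entries = json.load(f)
    # Each key is an image filename; each value holds p, se, c, st, and sa.
    return [(folder / name, meta) for name, meta in entries.items()]
```

With the dataset downloaded to the current directory, load_part("./", 1) would return one pair per image in part-000001, and meta["p"] would give the prompt for each.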

At the top-level folder of DiffusionDB, we include a metadata table in Parquet format, metadata.parquet. This table has seven columns: image_name, prompt, part_id, seed, step, cfg, and sampler, and it has 2 million rows, where each row represents an image. We choose Parquet because it is column-based: researchers can efficiently query individual columns (e.g., prompts) without reading the entire table. Below are five random rows from the table.

image_name	prompt	part_id	seed	step	cfg	sampler
49f1e478-ade6-49a8-a672-6e06c78d45fc.png	ryan gosling in fallout 4 kneels near a nuclear bomb	1643	2220670173	50	7.0	8
b7d928b6-d065-4e81-bc0c-9d244fd65d0b.png	A beautiful robotic woman dreaming, cinematic lighting, soft bokeh, sci-fi, modern, colourful, highly detailed, digital painting, artstation, concept art, sharp focus, illustration, by greg rutkowski	87	51324658	130	6.0	8
19b1b2f1-440e-4588-ba96-1ac19888c4ba.png	bestiary of creatures from the depths of the unconscious psyche, in the style of a macro photograph with shallow dof	754	3953796708	50	7.0	8
d34afa9d-cf06-470f-9fce-2efa0e564a13.png	close up portrait of one calico cat by vermeer. black background, three – point lighting, enchanting, realistic features, realistic proportions.	1685	2007372353	50	7.0	8
c3a21f1f-8651-4a58-a4d4-7500d97651dc.png	a bottle of jack daniels with the word medicare replacing the word jack daniels	243	1617291079	50	7.0	8

To save space, we use an integer to encode the sampler in the table above.

Sampler	Integer Value
ddim	1
plms	2
k_euler	3
k_euler_ancestral	4
k_heun	5
k_dpm_2	6
k_dpm_2_ancestral	7
k_lms	8
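
To turn the integer codes back into sampler names, a lookup table is enough. This is a minimal sketch; the names SAMPLER_NAMES and decode_sampler are hypothetical, and code 5 is taken to be k_heun, which is an assumption about the sampler naming.

```python
# Integer code -> sampler name, following the table above.
# Code 5 is taken to be k_heun (an assumption about the sampler naming).
SAMPLER_NAMES = {
    1: "ddim",
    2: "plms",
    3: "k_euler",
    4: "k_euler_ancestral",
    5: "k_heun",
    6: "k_dpm_2",
    7: "k_dpm_2_ancestral",
    8: "k_lms",
}


def decode_sampler(code):
    """Return the sampler name for an integer code, or "unknown" if unmapped."""
    return SAMPLER_NAMES.get(int(code), "unknown")
```

If the table is loaded into a pandas DataFrame, the same dictionary can be passed to Series.map to decode the whole sampler column at once.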

Dataset Creation

We collected all images from the official Stable Diffusion Discord server. Please read our research paper for details. The code is included in ./scripts/.

Data Removal

If you find any harmful images or prompts in DiffusionDB, you can use this Google Form to report them. Similarly, if you are a creator of an image included in this dataset, you can use the same form to let us know if you would like to remove your image from DiffusionDB. We will closely monitor this form and update DiffusionDB periodically.

Credits

DiffusionDB was created by Jay Wang, Evan Montoya, David Munechika, Alex Yang, Ben Hoover, and Polo Chau.

Citation

@article{wangDiffusionDBLargescalePrompt2022,
  title = {{{DiffusionDB}}: {{A}} Large-Scale Prompt Gallery Dataset for Text-to-Image Generative Models},
  author = {Wang, Zijie J. and Montoya, Evan and Munechika, David and Yang, Haoyang and Hoover, Benjamin and Chau, Duen Horng},
  year = {2022},
  journal = {arXiv:2210.14896 [cs]},
  url = {https://arxiv.org/abs/2210.14896}
}

Licensing

The DiffusionDB dataset is available under the CC0 1.0 License. The Python code in this repository is available under the MIT License.

Contact

If you have any questions, feel free to open an issue or contact Jay Wang.
