Synthesize photos from PhotoDNA hashes.
See the blog post for more information.
You can install the Python dependencies with `pip install -r requirements.txt`; if you want to install the packages manually, `requirements.txt` lists them.
Ribosome is released with 4 pre-trained models:

- `coco-model.pt`: trained on COCO 2017 train images
- `celeba-model.pt`: trained on CelebA aligned+cropped images
- `nsfw-model.pt`: trained on 100K SFW+NSFW images scraped from Reddit
- `coco+celeba+nsfw-model.pt`: trained on the combination of the above
Use the models trained on NSFW data at your own risk.
Use the `infer.py` script to produce images from hashes:

```bash
python infer.py [--model MODEL] [--output OUTPUT] hash
```
The hash is given as a base64-encoded string.
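PhotoDNA hashes are 144-byte vectors, so a malformed base64 string can be caught before inference. A minimal standalone sketch (not part of Ribosome; `check_hash` is a hypothetical helper name):

```python
import base64


def check_hash(hash_b64: str) -> bytes:
    """Decode a base64 PhotoDNA hash and verify it is 144 bytes long."""
    raw = base64.b64decode(hash_b64)
    if len(raw) != 144:
        raise ValueError(f"expected a 144-byte hash, got {len(raw)} bytes")
    return raw


# Example with a placeholder all-zero hash:
dummy = base64.b64encode(bytes(144)).decode()
print(len(check_hash(dummy)))  # 144
```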
Datasets consist of images paired with hashes: a CSV file with two columns, path and hash (no header row), plus the image files in a directory. The hash is base64-encoded, and the images are 100×100 in size. After producing such a CSV, it may be convenient to shuffle it and split it into a training set and a validation set.
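The shuffle-and-split step can be done with standard tools; here is one minimal sketch (the function name and the 10% hold-out fraction are illustrative choices, not part of Ribosome):

```python
import csv
import random


def split_dataset(csv_path, train_path, val_path, val_fraction=0.1, seed=0):
    """Shuffle a path,hash CSV (no header row) and split it into train/val CSVs."""
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    random.Random(seed).shuffle(rows)  # fixed seed for a reproducible split
    n_val = int(len(rows) * val_fraction)
    for out_path, subset in [(val_path, rows[:n_val]), (train_path, rows[n_val:])]:
        with open(out_path, "w", newline="") as f:
            csv.writer(f).writerows(subset)
```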
Ribosome includes an example dataset in this format, produced from COCO:

- `coco100x100.tar.gz`: image files
- `coco-train.csv`: training set hashes
- `coco-val.csv`: validation set hashes
## Preparing a dataset
To produce 100×100 images from an existing dataset, it may be convenient to use ImageMagick.
To resize `image.jpg` to 100×100, ignoring the original aspect ratio:

```bash
mogrify -resize '100x100!' image.jpg
```
To resize `image.jpg` to 100×100 by taking a center crop:

```bash
mogrify -resize '100x100^' -gravity Center -extent '100x100' image.jpg
```
You can process files in parallel using `xargs`, e.g. to convert all `.jpg` images using 24 threads:

```bash
find . -name '*.jpg' | xargs -n 1 -P 24 mogrify -resize '100x100!'
```
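If ImageMagick isn't available, the same two transforms can be sketched with Pillow (an assumption on my part; Ribosome itself doesn't require this, and `resize_stretch` / `resize_center_crop` are hypothetical helper names):

```python
from PIL import Image


def resize_stretch(img: Image.Image, size: int = 100) -> Image.Image:
    """Resize to size x size, ignoring the original aspect ratio."""
    return img.resize((size, size))


def resize_center_crop(img: Image.Image, size: int = 100) -> Image.Image:
    """Scale so the shorter side equals `size`, then take a center crop."""
    w, h = img.size
    scale = size / min(w, h)
    img = img.resize((round(w * scale), round(h * scale)))
    w, h = img.size
    left, top = (w - size) // 2, (h - size) // 2
    return img.crop((left, top, left + size, top + size))
```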
Ribosome does not provide code to compute PhotoDNA hashes, but such code is available in pyPhotoDNA.
## Train a model
Use the `train.py` script to train a model on a dataset:

```bash
python train.py --train-data TRAIN_DATA ...
```
- `--train-data` is the path to the train data CSV
    - Paths in the CSV are interpreted relative to the data directory (`.` if not supplied)
- `--val-data` is the path to the validation data CSV; if provided, the script will report the validation loss after every epoch
Run `python train.py --help` to see all the options.
Copyright (c) Anish Athalye. Released under the MIT License. See LICENSE.md for details.