Ribosome Build Status

Synthesize photos from PhotoDNA.

Ribosome demo

See the blog post for more information.

Installation

Dependencies

You can install Python dependencies using pip install -r requirements.txt. If you want to install the packages manually, here is a list:

Pre-trained models

Ribosome is released with 4 pre-trained models:

Use the models trained on NSFW data at your own risk.

Usage

Inference

Use the infer.py script to produce images from hashes:

python infer.py [--model MODEL] [--output OUTPUT] hash

The hash is a base64-encoded string, e.g. cVwhQ58OSCEOIwF+AigAkT0GAWdwAQs8o04KGYMfHBUANRUOAycUEFABCh6PABIghDBzCa4RTysQYVcvMDdkMypBPSyNAgRCcTf2AC9PfiYSWDw3KTcxPxM2HSqTDSIsgxJFFA+iihERcU4fHEY4Lj0xhw3QJN4OXQwbIzJjVTsUodIVVy3/FY8I/wcui11O.

Training

Datasets

Datasets consist of images paired with hashes, in the format of a CSV file with paths/hashes, and image files in a directory. The CSV file has two colums, path and hash (no header row). The hash is base64-encoded. Images are 100×100 in size. After producing such a CSV, it may be convenient to shuffle it and split it into a training set and validation set.

Example dataset

Ribosome includes an example dataset in this format, produced from COCO:

Preparing a dataset

To produce 100×100 images from an existing dataset, it may be convenient to use ImageMagick.

To resize image.jpg to 100×100 ignoring the original aspect ratio:

mogrify -resize '100x100!' image.jpg

To resize image.jpg to 100×100 by taking a center crop:

mogrify -resize '100x100^' -gravity Center -extent '100x100' image.jpg

You can process files in parallel using find / xargs, e.g. to convert all .jpg images using 24 threads:

find . -name '*.jpg' | xargs -n 1 -P 24 mogrify -resize '100x100!'

Ribosome does not provide code to compute PhotoDNA hashes, but such code is available in pyPhotoDNA.

Train a model

Use the train.py script to train a model on a dataset:

python train.py --train-data TRAIN_DATA ...
  • --train-data is the path to the train data CSV
  • Paths in the CSV are interpreted relative to --data-dir (or . if not supplied)
  • --val-data is the path to the validation data CSV; if provided, the script will report the validation loss after every epoch

See python train.py --help for all the options.

License

Copyright (c) Anish Athalye. Released under the MIT License. See LICENSE.md for details.

GitHub

View Github