text2image

This repository contains the implementation of Text to Image Generation with Semantic-Spatial Aware GAN.

This repo is not yet complete.

Network Structure

[Figure: framework]

The structure of the spatial-semantic aware convolutional network (SSACN) is shown below:

[Figure: SSACN structure]

Requirements

  • python 3.6+
  • pytorch 1.0+
  • numpy
  • matplotlib
  • opencv

Or install full requirements by running:

pip install -r requirements.txt
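
To confirm the environment before training, a quick check along the following lines may help. This is only a sketch and not part of the repo: the function name `check_environment` is hypothetical, and the import names `torch` and `cv2` are assumed to correspond to the pytorch and opencv entries above. It checks the Python version and whether each module can be found. It does not verify the pytorch 1.0+ version requirement.

```python
import sys
from importlib.util import find_spec

# Import names assumed for the packages listed above
# (pytorch -> torch, opencv -> cv2); adjust if your install differs.
REQUIRED_MODULES = ["torch", "numpy", "matplotlib", "cv2"]


def check_environment(min_python=(3, 6), modules=REQUIRED_MODULES):
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d+ is required" % min_python)
    for name in modules:
        # find_spec locates a top-level module without importing it.
        if find_spec(name) is None:
            problems.append("missing module: " + name)
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment looks OK"]:
        print(line)
```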

TODO

  • [x] instruction to prepare dataset
  • [ ] remove all unnecessary files
  • [x] add link to download our pre-trained model
  • [ ] clean code including comments
  • [ ] instruction for training
  • [ ] instruction for evaluation

Prepare data

  1. Download the preprocessed metadata for birds and coco and save them to data/
  2. Download the birds image data. Extract them to data/birds/
  3. Download coco dataset and extract the images to data/coco/

Pre-trained text encoder

  1. Download the pre-trained text encoder for CUB and save it to DAMSMencoders/bird/inception/
  2. Download the pre-trained text encoder for coco and save it to DAMSMencoders/coco/inception/
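
Once the downloads from "Prepare data" and "Pre-trained text encoder" are in place, the directory layout can be sanity-checked with a small script such as the one below. It is a sketch only: the helper name `missing_or_empty` is hypothetical, and it assumes the script is run from the repository root. It checks only that each directory named in the steps exists and is non-empty, not that the files inside are the right ones.

```python
from pathlib import Path

# Directories named in the steps above, relative to the repository root.
EXPECTED_DIRS = [
    "data",
    "data/birds",
    "data/coco",
    "DAMSMencoders/bird/inception",
    "DAMSMencoders/coco/inception",
]


def missing_or_empty(root="."):
    """Return the expected directories that do not exist or contain no entries."""
    root = Path(root)
    problems = []
    for rel in EXPECTED_DIRS:
        path = root / rel
        if not path.is_dir() or not any(path.iterdir()):
            problems.append(rel)
    return problems


if __name__ == "__main__":
    for rel in missing_or_empty():
        print("not ready:", rel)
```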

Trained model

You can download our trained models from our OneDrive repo.

Start training

See opts.py for the options.

Evaluation

Run IS.py and test_lpips.py (remember to change the image path) to evaluate the IS and diversity scores, respectively.
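
For readers unfamiliar with the metric, the Inception Score is exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) is a classifier's class distribution for a generated image and p(y) is the average of those distributions. The sketch below illustrates only this formula in plain Python. It is not the code in IS.py: the function name `inception_score` is hypothetical, and `probs` is assumed to hold per-image softmax outputs from an Inception classifier. Common implementations also average the score over several splits, which is omitted here.

```python
import math


def inception_score(probs):
    """Inception Score exp(mean_x KL(p(y|x) || p(y))) for a list of class-probability rows."""
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(row[k] for row in probs) / n for k in range(num_classes)]
    total_kl = 0.0
    for row in probs:
        # KL(p(y|x) || p(y)); zero-probability terms contribute nothing.
        total_kl += sum(p * math.log(p / m) for p, m in zip(row, marginal) if p > 0)
    return math.exp(total_kl / n)
```

The score is lowest (1) when every image gets the same prediction, and it equals the number of classes when predictions are confident and spread evenly across all classes.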

For evaluating the FID score, please use this repo https://github.com/bioinf-jku/TTUR.
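
As a reminder of what is being measured, the standard definition of the Fréchet Inception Distance is given below. Here \mu_r, \Sigma_r and \mu_g, \Sigma_g denote the mean and covariance of Inception features for real and generated images. These symbols are chosen for illustration and do not come from this repo.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate that the generated feature distribution is closer to the real one.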

Performance

You should get scores close to those below after training under xe loss for xxxxx epochs:

[Figure: results]

Qualitative Results

Some qualitative results on the coco and birds datasets from different methods are shown below:
[Figure: qualitative results]

The predicted mask maps at different stages are shown below:
[Figure: predicted mask maps]

Reference

If you find this repo helpful in your research, please consider citing our paper:

@article{liao2021text,
  title={Text to Image Generation with Semantic-Spatial Aware GAN},
  author={Liao, Wentong and Hu, Kai and Yang, Michael Ying and Rosenhahn, Bodo},
  journal={arXiv preprint arXiv:2104.00567},
  year={2021}
}

GitHub

https://github.com/wtliao/text2image