This repository includes the implementation of the paper Text to Image Generation with Semantic-Spatial Aware GAN.

This repo is not complete yet.

Network Structure


The structure of the semantic-spatial aware convolutional network (SSACN) block is shown below:
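As described in the paper, an SSACN block predicts a spatial mask from the image features and uses it to gate a text-conditioned affine modulation of the normalized features. The following numpy sketch illustrates only the modulation step; the function name, shapes, and the single-sample normalization are illustrative assumptions, not the repo's actual code:

```python
import numpy as np

def ssacn_modulate(feat, mask, gamma, beta):
    """Schematic semantic-spatial modulation (illustrative, not the repo's code).

    feat:  (C, H, W) image feature map
    mask:  (1, H, W) predicted spatial mask in [0, 1]
    gamma, beta: (C,) scale/shift predicted from the text embedding
    """
    # Per-channel normalization (single-sample stand-in for batch norm).
    mean = feat.mean(axis=(1, 2), keepdims=True)
    std = feat.std(axis=(1, 2), keepdims=True) + 1e-5
    norm = (feat - mean) / std
    # Text-conditioned affine transform, applied only where the mask is active:
    # norm * (1 + mask * gamma) + mask * beta
    g = gamma[:, None, None]
    b = beta[:, None, None]
    return norm + mask * (g * norm + b)
```

Where the mask is zero the block leaves the normalized features untouched, so text conditioning only affects the spatial regions the mask selects.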



Requirements

  • python 3.6+
  • pytorch 1.0+
  • numpy
  • matplotlib
  • opencv

Or install full requirements by running:

pip install -r requirements.txt
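A quick way to confirm that the core dependencies listed above are importable before training (this helper script is a convenience sketch, not part of the repo):

```python
import importlib.util

# Core packages this repo needs (opencv imports as "cv2").
REQUIRED = ["torch", "numpy", "matplotlib", "cv2"]

def missing_packages(names):
    """Return the subset of package names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All core dependencies found.")
```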


TODO

  • [x] instruction to prepare dataset
  • [ ] remove all unnecessary files
  • [x] add link to download our pre-trained model
  • [ ] clean code including comments
  • [ ] instruction for training
  • [ ] instruction for evaluation

Prepare data

  1. Download the preprocessed metadata for birds and coco and save them to data/
  2. Download the birds image data. Extract them to data/birds/
  3. Download coco dataset and extract the images to data/coco/

Pre-trained text encoder

  1. Download the pre-trained text encoder for CUB and save it to DAMSMencoders/bird/inception/
  2. Download the pre-trained text encoder for coco and save it to DAMSMencoders/coco/inception/
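After the data and encoder steps above, your working directory should contain the paths below. This small checker is a convenience sketch (the paths are taken from the instructions; adjust if your layout differs):

```python
from pathlib import Path

# Expected directory layout, as given in the data/encoder instructions above.
EXPECTED_DIRS = [
    "data/birds",
    "data/coco",
    "DAMSMencoders/bird/inception",
    "DAMSMencoders/coco/inception",
]

def check_layout(root="."):
    """Map each expected path to True/False depending on whether it exists."""
    root = Path(root)
    return {p: (root / p).is_dir() for p in EXPECTED_DIRS}

if __name__ == "__main__":
    for path, ok in check_layout().items():
        print(("ok      " if ok else "MISSING ") + path)
```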

Trained model

You can download our trained models from our OneDrive repo.

Start training

See opts.py for the options.
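To give a feel for what opts.py defines, here is a hypothetical minimal version; the flag names and defaults below are illustrative assumptions, not the repo's actual options:

```python
import argparse

def parse_opts(argv=None):
    # Hypothetical option set; see the repo's opts.py for the real flags.
    parser = argparse.ArgumentParser(description="SSA-GAN training options")
    parser.add_argument("--data_dir", default="data/birds",
                        help="dataset root (illustrative default)")
    parser.add_argument("--batch_size", type=int, default=24)
    parser.add_argument("--epochs", type=int, default=600)
    parser.add_argument("--manual_seed", type=int, default=100)
    return parser.parse_args(argv)

if __name__ == "__main__":
    print(parse_opts())
```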


Evaluation

Please run IS.py and test_lpips.py (remember to change the image paths) to evaluate the IS and diversity (LPIPS) scores, respectively.
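For reference, the Inception Score is the exponentiated expected KL divergence between the conditional class distribution p(y|x) and the marginal p(y). A minimal numpy sketch of the computation (assuming you already have softmax probabilities from an Inception network; this is not the repo's IS.py):

```python
import numpy as np

def inception_score(probs, splits=10):
    """probs: (N, num_classes) softmax outputs of an Inception network.

    Returns (mean, std) of exp(E_x[KL(p(y|x) || p(y))]) over the splits.
    """
    scores = []
    for chunk in np.array_split(probs, splits):
        p_y = chunk.mean(axis=0, keepdims=True)  # marginal p(y) for this split
        kl = chunk * (np.log(chunk + 1e-12) - np.log(p_y + 1e-12))
        scores.append(np.exp(kl.sum(axis=1).mean()))
    return float(np.mean(scores)), float(np.std(scores))
```

A sanity check: identical uniform predictions give a score of 1 (no diversity in p(y|x) beyond p(y)), while confident predictions spread evenly over all classes push the score toward the number of classes.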

For evaluating the FID score, please use this repo https://github.com/bioinf-jku/TTUR.


You should get scores close to the ones below after training under xe loss for xxxxx epochs:


Qualitative Results

Some qualitative results on the coco and birds datasets from different methods are shown below:

The predicted mask maps at different stages are shown below:


If you find this repo helpful in your research, please consider citing our paper:

@article{liao2021text,
  title={Text to Image Generation with Semantic-Spatial Aware GAN},
  author={Liao, Wentong and Hu, Kai and Yang, Michael Ying and Rosenhahn, Bodo},
  journal={arXiv preprint arXiv:2104.00567},
  year={2021}
}