Plan-then-Generate: Controlled Data-to-Text Generation via Planning

Authors: Yixuan Su, David Vandyke, Sihui Wang, Yimai Fang, and Nigel Collier

Code for the EMNLP 2021 paper "Plan-then-Generate: Controlled Data-to-Text Generation via Planning".

1. Environment Setup:

(1) Hardware Requirement:

The code in this repo has been thoroughly tested on our machine with a single NVIDIA V100 GPU (16GB).
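To compare your own machine against the tested setup, a quick check like the one below may help. It is only a sketch: `check_gpu` is a hypothetical helper that is not part of this repo, and it assumes the NVIDIA driver's `nvidia-smi` tool is on your PATH.

```shell
# Hypothetical helper (not part of this repo): list the visible NVIDIA GPUs.
check_gpu() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # Print one line per GPU: model name and total memory.
        nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
    else
        echo "nvidia-smi not found: no NVIDIA driver detected on this machine"
        return 1
    fi
}

check_gpu || echo "The code was tested on a single V100 (16GB); other setups are untested."
```

A GPU with less than 16GB of memory may still work, but that configuration is outside what the authors report testing.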

(2) Installation:

chmod +x ./config_setup.sh
./config_setup.sh

2. ToTTo Data Preprocessing:

Option (1): Preprocess the ToTTo data from scratch by yourself:

cd ./data
chmod +x ./prepare_data.sh
./prepare_data.sh

This process can take up to 1 hour.

Option (2): Download our processed data here

Unzip data.zip and replace the empty ./data folder with the extracted folder.
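The unzip-and-replace step for Option (2) can be sketched as a small shell function. This is an illustration under stated assumptions, not a script shipped with the repo: `install_processed_data` is a hypothetical name, and the sketch assumes that data.zip has been downloaded into the repo root and that it unpacks to a top-level data/ folder.

```shell
# Hypothetical helper (not part of this repo): swap ./data for the unzipped folder.
# Assumption: the archive unpacks to a top-level data/ folder.
install_processed_data() {
    archive="${1:-data.zip}"
    if [ ! -f "$archive" ]; then
        echo "missing archive: $archive" >&2
        return 1
    fi
    # Keep the existing folder as a backup instead of deleting it.
    if [ -d ./data ]; then
        mv ./data ./data.empty.bak
    fi
    unzip -q "$archive"
    if [ ! -d ./data ]; then
        echo "archive did not contain a data/ folder" >&2
        return 1
    fi
}
```

Running `install_processed_data data.zip` from the repo root performs the replacement. The previous folder is kept as ./data.empty.bak so that it can be restored if the archive layout differs from the assumption above.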

For more details about the ToTTo dataset, please refer to the original Google Research repo.

3. Content Planner:

Please refer to the README.md in the ./content_planner folder.

4. Sequence Generator:

Please refer to the README.md in the ./generator folder.

5. Citation

If you find our paper and resources useful, please kindly cite our paper:

@inproceedings{su2021plangen,
    title = {Plan-then-Generate: Controlled Data-to-Text Generation via Planning},
    author = {Yixuan Su and David Vandyke and Sihui Wang and Yimai Fang and Nigel Collier},
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    month = nov,
    year = "2021",
    publisher = "Association for Computational Linguistics",
}

GitHub

https://github.com/yxuansu/PlanGen