The official implementation of the Interspeech 2021 paper WSRGlow: A Glow-based Waveform Generative Model for Audio Super-Resolution. Audio samples can be found here.

Feel free to create issues or send an email to [email protected] if you have problems running the code.

Before running the code, install the dependencies with pip install -r requirements.txt.

The configuration for the model architecture and training scheme is saved in config.yaml. You can override individual attributes by adding the --hparams flag when running a command. The general way to run a Python script is

python $SRC$ --config $CONFIG$ --hparams $KEY1$=$VALUE1$,$KEY2$=$VALUE2$,...

See for more details.
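To make the KEY=VALUE,KEY=VALUE override format concrete, here is a minimal sketch of how such a string could be parsed and merged into a config dictionary. It is not the repository's actual parser: the function names (parse_hparams_overrides, apply_overrides) and the example keys (lr, batch_size) are hypothetical, and the type coercion rules are an assumption.

```python
def _coerce(text):
    """Best-effort conversion of a string to int, float, or bool (assumed rules)."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    return text


def parse_hparams_overrides(overrides):
    """Parse 'key1=value1,key2=value2' into a dict (hypothetical helper)."""
    result = {}
    if not overrides:
        return result
    for item in overrides.split(","):
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected KEY=VALUE, got {item!r}")
        result[key.strip()] = _coerce(value.strip())
    return result


def apply_overrides(config, overrides):
    """Return a copy of config with the command-line overrides applied on top."""
    merged = dict(config)
    merged.update(parse_hparams_overrides(overrides))
    return merged


# Hypothetical keys, for illustration only.
base = {"lr": 0.001, "batch_size": 8}
print(apply_overrides(base, "batch_size=16,lr=2e-4"))
```

Values given on the command line take precedence over those loaded from the YAML file, which is the behavior the --hparams flag describes.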

To prepare data

Before training, you need to binarize the data. Put the raw wav files in hparams['raw_data_path']; the binarized data will be written to hparams['binary_data_path'].
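Before binarizing, it can help to confirm that the raw data directory actually contains wav files. The sketch below is a hypothetical pre-flight check, not part of the repository; only the two hparams keys come from the text above, and the helper names are made up for illustration.

```python
from pathlib import Path


def find_wav_files(raw_data_path):
    """Return all .wav files under raw_data_path, sorted for reproducibility."""
    root = Path(raw_data_path)
    if not root.is_dir():
        raise FileNotFoundError(f"raw data directory not found: {root}")
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".wav"
    )


def check_data_paths(hparams):
    """Hypothetical check: fail early if there is nothing to binarize."""
    wavs = find_wav_files(hparams["raw_data_path"])
    if not wavs:
        raise ValueError(f"no wav files under {hparams['raw_data_path']}")
    print(f"{len(wavs)} wav files found; output goes to {hparams['binary_data_path']}")
    return wavs
```

Calling check_data_paths(hparams) with the loaded config raises an error before any processing starts if the raw data path is missing or empty.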

Specifically, for the VCTK corpus, the file structure should be like


where the model checkpoints are in checkpoints/wsrglow.

The command to binarize is

python --config config.yaml

To modify the architecture of the model

The current WSRGlow model in is designed for x4 super-resolution and takes waveform, spectrogram and phase information as input.
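As a rough illustration of what "waveform, spectrogram and phase information" means in a x4 setting, the sketch below takes one frame of a signal, keeps every fourth sample, and computes the magnitude and phase of the result. This is not the model's actual preprocessing: the helper names, the Hann window, the frame size, and the naive DFT are all assumptions chosen to keep the example dependency-free. Real code would use an FFT library and a proper anti-aliasing filter.

```python
import cmath
import math


def downsample(signal, factor=4):
    """Keep every factor-th sample (no anti-aliasing filter; illustration only)."""
    return signal[::factor]


def frame_spectrum(frame):
    """Return (magnitudes, phases) of one Hann-windowed frame via a naive DFT."""
    n = len(frame)
    # Periodic Hann window applied sample by sample.
    windowed = [
        x * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / n))
        for i, x in enumerate(frame)
    ]
    mags, phases = [], []
    # Only the non-negative frequency bins are needed for a real signal.
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(windowed)
        )
        mags.append(abs(coeff))
        phases.append(cmath.phase(coeff))
    return mags, phases


# A 64-sample "high-resolution" frame containing a single low-frequency cosine.
high_res = [math.cos(2.0 * math.pi * 2 * i / 64) for i in range(64)]
low_res = downsample(high_res, factor=4)      # 16 samples: the waveform input
magnitude, phase = frame_spectrum(low_res)    # spectrogram and phase of one frame
print(len(low_res), len(magnitude), len(phase))
```

The low-resolution frame has a quarter of the samples, so its spectrum covers only a quarter of the original bandwidth; the missing upper band is what a x4 super-resolution model has to generate.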

To train

Run python --config config.yaml on a GPU.

To infer

Change the code in to specify the checkpoint you want to load and the sample inputs you want to use for inference. Make sure the paths to the checkpoint and wav files are correct, then run python --config config.yaml on a GPU.