Parallel WaveGAN implementation with PyTorch

This repository provides UNOFFICIAL PyTorch implementations of the following models:

You can combine these state-of-the-art non-autoregressive models to build your own great vocoder!

Please check the samples on our demo page.

Source of the figure: https://arxiv.org/pdf/1910.11480.pdf

The goal of this repository is to provide a real-time neural vocoder that is compatible with ESPnet-TTS.
This repository can also be combined with NVIDIA/tacotron2-based implementations (see this comment).

You can try the real-time end-to-end text-to-speech demonstration in Google Colab!

  • Real-time demonstration with ESPnet2
  • Real-time demonstration with ESPnet1

What’s new

Requirements

This repository is tested on Ubuntu 20.04 with a Titan V GPU.

  • Python 3.6+
  • CUDA 10.0+
  • cuDNN 7+
  • NCCL 2+ (for distributed multi-GPU training)
  • libsndfile (on Ubuntu, install with sudo apt install libsndfile-dev)
  • jq (on Ubuntu, install with sudo apt install jq)
  • sox (on Ubuntu, install with sudo apt install sox)

Other CUDA versions should work but have not been explicitly tested.
All of the code is tested on PyTorch 1.4, 1.5.1, 1.7.1, 1.8.1, and 1.9.

PyTorch 1.6 works, but there are some issues in CPU mode (see #198).
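
Before running a recipe, it can help to confirm that the command-line tools listed above are on your PATH. The following is a minimal sketch, not part of this repository; `check_tools` is a hypothetical helper name, and it only checks for executables such as jq and sox (it does not verify libsndfile, CUDA, or cuDNN).

```shell
# Hypothetical helper (not provided by this repository):
# report which of the given command-line tools are available on PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Non-zero exit status when at least one tool is missing.
    return "$missing"
}

check_tools jq sox || echo "Install the missing tools before running the recipes."
```

The function returns a non-zero status when any tool is missing, so it can also be used as a guard at the top of a setup script.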

Setup

You can choose between two installation methods.

A. Use pip

$ git clone https://github.com/kan-bayashi/ParallelWaveGAN.git
$ cd ParallelWaveGAN
$ pip install -e .
# If you want to use distributed training, please install
# apex manually by following https://github.com/NVIDIA/apex
$ ...

Note that to install apex, your CUDA version must exactly match the version used to build the PyTorch binary.
To install PyTorch compiled with a different CUDA version, see tools/Makefile.
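
One way to check for a mismatch before building apex is to compare the CUDA version reported by PyTorch with the one reported by the local toolkit. The sketch below is an illustration only: `same_cuda_version` is a hypothetical helper, it assumes `python3` and `nvcc` are the interpreter and compiler you intend to use, and it compares only the major.minor part of the version, which may be looser or stricter than what apex itself enforces.

```shell
# Hypothetical helper: succeeds when two CUDA version strings agree on
# major.minor, e.g. "10.1" and "10.1.243". Comparing only major.minor
# is an assumption of this sketch.
same_cuda_version() {
    a=$(printf '%s\n' "$1" | cut -d. -f1,2)
    b=$(printf '%s\n' "$2" | cut -d. -f1,2)
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Only run the comparison when both nvcc and PyTorch are available.
if command -v nvcc >/dev/null 2>&1 && python3 -c "import torch" >/dev/null 2>&1; then
    # torch.version.cuda is the CUDA version the PyTorch binary was built with.
    torch_cuda=$(python3 -c "import torch; print(torch.version.cuda)")
    # Extract the release number from the nvcc version banner.
    nvcc_cuda=$(nvcc --version | sed -n 's/.*release \([0-9.]*\),.*/\1/p')
    if same_cuda_version "$torch_cuda" "$nvcc_cuda"; then
        echo "CUDA versions match: $torch_cuda"
    else
        echo "CUDA mismatch: PyTorch built with $torch_cuda, nvcc reports $nvcc_cuda"
    fi
fi
```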

B. Make virtualenv

$ git clone https://github.com/kan-bayashi/ParallelWaveGAN.git
$ cd ParallelWaveGAN/tools
$ make
# If you want to use distributed training, run the following
# command to install apex.
$ make apex

Note that we specify the CUDA version used to compile the PyTorch wheel.
If you want to use a different CUDA version, edit tools/Makefile to change the PyTorch wheel to be installed.

Recipe

This repository provides Kaldi-style recipes, in the same manner as ESPnet.
Currently, the following recipes are supported.

  • LJSpeech: English female speaker
  • JSUT: Japanese female speaker
  • JSSS: Japanese female speaker
  • CSMSC: Mandarin female speaker
  • CMU Arctic: English speakers
  • JNAS: Japanese multi-speaker
  • VCTK: English multi-speaker
  • LibriTTS: English multi-speaker
  • YesNo: English speaker (For debugging)

To run a recipe, follow the instructions below.

# Move to the recipe directory
$ cd egs/ljspeech/voc1

# Run the recipe from scratch
$ ./run.sh

# You can change the config via the command line
$ ./run.sh --conf <your_customized_yaml_config>

# You can select the stage to start and stop
$ ./run.sh --stage 2 --stop_stage 2

# If you want to specify the GPU
$ CUDA_VISIBLE_DEVICES=1 ./run.sh --stage 2

# If you want to resume training from the 10000-step checkpoint
$ ./run.sh --stage 2 --resume <path>/<to>/checkpoint-10000steps.pkl
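
When resuming, you may not want to look up the step count by hand. The following sketch picks the checkpoint with the largest step count in a directory. It is not part of the recipes: `latest_checkpoint` is a hypothetical helper, and it assumes every checkpoint in the directory follows the checkpoint-<N>steps.pkl naming shown above.

```shell
# Hypothetical helper (not provided by the recipes):
# print the checkpoint with the largest step count in the given directory.
# Assumes file names follow the checkpoint-<N>steps.pkl pattern.
latest_checkpoint() {
    ls "$1"/checkpoint-*steps.pkl 2>/dev/null \
        | sed 's/.*checkpoint-\([0-9]*\)steps\.pkl$/\1 &/' \
        | sort -n \
        | tail -n 1 \
        | cut -d' ' -f2-
}

# Example usage, with <path>/<to> replaced by your own checkpoint directory:
#   ckpt=$(latest_checkpoint <path>/<to>)
#   ./run.sh --stage 2 --resume "$ckpt"
```

The step count is sorted numerically rather than alphabetically, so checkpoint-10000steps.pkl is ranked above checkpoint-5000steps.pkl. If the directory contains no matching files, the function prints nothing.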