The goal of this repository is to provide a real-time neural vocoder that is compatible with ESPnet-TTS. This repository can also be combined with the NVIDIA/tacotron2-based implementation (see this comment).
You can try the real-time end-to-end text-to-speech demonstration in Google Colab!
Real-time demonstration with ESPnet2
Real-time demonstration with ESPnet1
What’s new
2021/10/21 Single-speaker Korean recipe [egs/kss/voc1] is available.
2021/08/24 Add more pretrained models of StyleMelGAN and HiFi-GAN.
2021/08/07 Add initial pretrained models of StyleMelGAN and HiFi-GAN.
2021/08/03 Support StyleMelGAN generator and discriminator!
2021/08/02 Support HiFi-GAN generator and discriminator!
This repository is tested on Ubuntu 20.04 with a Titan V GPU.
Python 3.6+
CUDA 10.0+
cuDNN 7+
NCCL 2+ (for distributed multi-gpu training)
libsndfile (you can install via sudo apt install libsndfile-dev on Ubuntu)
jq (you can install via sudo apt install jq on Ubuntu)
sox (you can install via sudo apt install sox on Ubuntu)
Different CUDA versions should work but are not explicitly tested. All of the code is tested with PyTorch 1.4, 1.5.1, 1.7.1, 1.8.1, and 1.9.
PyTorch 1.6 works, but there are some issues in CPU mode (see #198).
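If you want to quickly verify the Python side of your environment, the following one-liner prints the installed PyTorch version, the cuDNN version it was built with, and whether a GPU is visible (this only checks the PyTorch installation, not the system CUDA toolkit):
$ python -c "import torch; print(torch.__version__, torch.backends.cudnn.version(), torch.cuda.is_available())"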
Setup
You can choose between two installation methods.
A. Use pip
$ git clone https://github.com/kan-bayashi/ParallelWaveGAN.git
$ cd ParallelWaveGAN
$ pip install -e .
# If you want to use distributed training, please install
# apex manually by following https://github.com/NVIDIA/apex
$ ...
Note that your CUDA version must exactly match the version used to build the PyTorch binary in order to install apex. To install PyTorch compiled with a different CUDA version, see tools/Makefile.
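As a quick sanity check before building apex, you can compare the system toolkit version with the CUDA version the installed PyTorch binary was built against (this assumes nvcc is on your PATH):
# CUDA version of the system toolkit
$ nvcc --version
# CUDA version the PyTorch binary was compiled with
$ python -c "import torch; print(torch.version.cuda)"
If the two versions differ, the apex build will fail.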
B. Make virtualenv
$ git clone https://github.com/kan-bayashi/ParallelWaveGAN.git
$ cd ParallelWaveGAN/tools
$ make
# If you want to use distributed training, please run the following
# command to install apex.
$ make apex
Note that we specify the CUDA version used to compile the PyTorch wheel. If you want to use a different CUDA version, please check tools/Makefile to change the PyTorch wheel to be installed.
Recipe
This repository provides Kaldi-style recipes, in the same way as ESPnet. Currently, the following recipes are supported.
To run a recipe, please follow the instructions below.
# Let us move on the recipe directory
$ cd egs/ljspeech/voc1
# Run the recipe from scratch
$ ./run.sh
# You can change config via command line
$ ./run.sh --conf <your_customized_yaml_config>
# You can select the stage to start and stop
$ ./run.sh --stage 2 --stop_stage 2
# If you want to specify the gpu
$ CUDA_VISIBLE_DEVICES=1 ./run.sh --stage 2
# If you want to resume training from 10000 steps checkpoint
$ ./run.sh --stage 2 --resume <path>/<to>/checkpoint-10000steps.pkl
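While training is running, you can monitor the progress with TensorBoard. The log directory below is an assumption based on the usual Kaldi-style layout; point it at your actual experiment directory:
# Training logs are written under the experiment directory
$ tensorboard --logdir exp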
If you use the MelGAN generator, decoding will be even faster.
# On CPU (Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz 16 threads)
[decode]: 100%|██████████| 250/250 [04:00<00:00, 1.04it/s, RTF=0.0882]
2020-02-08 10:45:14,111 (decode:142) INFO: Finished generation of 250 utterances (RTF = 0.137).
# On GPU (TITAN V)
[decode]: 100%|██████████| 250/250 [00:06<00:00, 36.38it/s, RTF=0.00189]
2020-02-08 05:44:42,231 (decode:142) INFO: Finished generation of 250 utterances (RTF = 0.002).
If you use the Multi-band MelGAN generator, decoding will be much faster still.
# On CPU (Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz 16 threads)
[decode]: 100%|██████████| 250/250 [01:47<00:00, 2.95it/s, RTF=0.048]
2020-05-22 15:37:19,771 (decode:151) INFO: Finished generation of 250 utterances (RTF = 0.059).
# On GPU (TITAN V)
[decode]: 100%|██████████| 250/250 [00:05<00:00, 43.67it/s, RTF=0.000928]
2020-05-22 15:35:13,302 (decode:151) INFO: Finished generation of 250 utterances (RTF = 0.001).
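For reference, the RTF (real-time factor) reported above is the synthesis time divided by the duration of the generated audio, so RTF = 0.001 means roughly 1 ms of computation per 1 s of audio. A quick back-of-the-envelope check:
# Seconds of computation needed to generate 1 minute (60 s) of audio at RTF = 0.001
$ python -c "print(0.001 * 60)"
0.06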
If you want to accelerate inference further, it is worth trying the conversion from PyTorch to TensorFlow. An example of the conversion is available in the notebook (provided by @dathudeptrai).
Results
The results are summarized in the table. You can listen to the samples and download pretrained models from the link to our Google Drive.
Here is the minimal code to perform analysis-synthesis using a pretrained model.
# Please make sure you installed `parallel_wavegan`
# If not, please install via pip
$ pip install parallel_wavegan
# You can download the pretrained model from terminal
$ python <<EOF
from parallel_wavegan.utils import download_pretrained_model
download_pretrained_model("<pretrain_model_tag>", "pretrain_model")
EOF
# You can get all of the available pretrained models as follows:
$ python <<EOF
from parallel_wavegan.utils import PRETRAINED_MODEL_LIST
print(PRETRAINED_MODEL_LIST.keys())
EOF
# Now you can find the downloaded pretrained model in `pretrain_model/<pretrain_model_tag>/`
$ ls pretrain_model/<pretrain_model_tag>
checkpoint-400000steps.pkl config.yml stats.h5
# These files can also be downloaded manually from the above results
# Please put an audio file in the `sample` directory to perform analysis-synthesis
$ ls sample/
sample.wav
# Then perform feature extraction -> feature normalization -> synthesis
$ parallel-wavegan-preprocess \
--config pretrain_model/<pretrain_model_tag>/config.yml \
--rootdir sample \
--dumpdir dump/sample/raw
100%|████████████████████████████████████████| 1/1 [00:00<00:00, 914.19it/s]
$ parallel-wavegan-normalize \
--config pretrain_model/<pretrain_model_tag>/config.yml \
--rootdir dump/sample/raw \
--dumpdir dump/sample/norm \
--stats pretrain_model/<pretrain_model_tag>/stats.h5
2019-11-13 13:44:29,574 (normalize:87) INFO: the number of files = 1.
100%|████████████████████████████████████████| 1/1 [00:00<00:00, 513.13it/s]
$ parallel-wavegan-decode \
--checkpoint pretrain_model/<pretrain_model_tag>/checkpoint-400000steps.pkl \
--dumpdir dump/sample/norm \
--outdir sample
2019-11-13 13:44:31,229 (decode:91) INFO: the number of features to be decoded = 1.
[decode]: 100%|███████████████████| 1/1 [00:00<00:00, 18.33it/s, RTF=0.0146]
2019-11-13 13:44:37,132 (decode:129) INFO: finished generation of 1 utterances (RTF = 0.015).
# You can skip normalization step (on-the-fly normalization, feature extraction -> synthesis)
$ parallel-wavegan-preprocess \
--config pretrain_model/<pretrain_model_tag>/config.yml \
--rootdir sample \
--dumpdir dump/sample/raw
100%|████████████████████████████████████████| 1/1 [00:00<00:00, 914.19it/s]
$ parallel-wavegan-decode \
--checkpoint pretrain_model/<pretrain_model_tag>/checkpoint-400000steps.pkl \
--dumpdir dump/sample/raw \
--normalize-before \
--outdir sample
2019-11-13 13:44:31,229 (decode:91) INFO: the number of features to be decoded = 1.
[decode]: 100%|███████████████████| 1/1 [00:00<00:00, 18.33it/s, RTF=0.0146]
2019-11-13 13:44:37,132 (decode:129) INFO: finished generation of 1 utterances (RTF = 0.015).
# You can find the generated speech in the `sample` directory
$ ls sample
sample.wav sample_gen.wav
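You can also run the synthesis step from Python instead of the command-line tools. Below is a minimal sketch using the load_model utility in parallel_wavegan.utils; it assumes config.yml sits next to the checkpoint (which load_model uses by default), and the random mel-spectrogram is only there to make the snippet self-contained, so substitute a real normalized feature array in practice:
$ python <<EOF
import torch
from parallel_wavegan.utils import load_model

# Load the downloaded checkpoint (config.yml is resolved from the same directory)
model = load_model("pretrain_model/<pretrain_model_tag>/checkpoint-400000steps.pkl")
model.remove_weight_norm()
model = model.eval()

# Dummy normalized mel-spectrogram with shape (#frames, #mels)
c = torch.randn(512, 80)
with torch.no_grad():
    y = model.inference(c).view(-1)  # generated waveform
print(y.shape)
EOF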
Decoding with ESPnet-TTS model’s features
Here, I show the procedure to generate waveforms with features generated by ESPnet-TTS models.
# Make sure you have already finished running the ESPnet-TTS recipe.
# You must use the same feature settings for both Text2Mel and Mel2Wav models.
# Let us move to the "ESPnet" recipe directory
$ cd /path/to/espnet/egs/<recipe_name>/tts1
$ pwd
/path/to/espnet/egs/<recipe_name>/tts1
# If you use ESPnet2, move to `egs2/`
$ cd /path/to/espnet/egs2/<recipe_name>/tts1
$ pwd
/path/to/espnet/egs2/<recipe_name>/tts1
# Please install this repository in ESPnet conda (or virtualenv) environment
$ . ./path.sh && pip install -U parallel_wavegan
# You can download the pretrained model from terminal
$ python <<EOF
from parallel_wavegan.utils import download_pretrained_model
download_pretrained_model("<pretrain_model_tag>", "pretrain_model")
EOF
# You can get all of the available pretrained models as follows:
$ python <<EOF
from parallel_wavegan.utils import PRETRAINED_MODEL_LIST
print(PRETRAINED_MODEL_LIST.keys())
EOF
# You can find the downloaded pretrained model in `pretrain_model/<pretrain_model_tag>/`
$ ls pretrain_model/<pretrain_model_tag>
checkpoint-400000steps.pkl config.yml stats.h5
# These files can also be downloaded manually from the above results
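Because the feature settings must match between the Text2Mel model and this vocoder, it can be worth eyeballing the relevant fields of the downloaded config before decoding. A sketch (the exact key names, e.g. fs vs sampling_rate, may differ between configs):
$ grep -E "(sampling_rate|fs|fft_size|hop_size|win_length|fmin|fmax)" pretrain_model/<pretrain_model_tag>/config.yml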
Case 1: If you use the same dataset for both Text2Mel and Mel2Wav
# In this case, you can directly use the generated features for decoding.
# Please specify the `feats.scp` path for `--feats-scp`, which is located in
# exp/<your_model_dir>/outputs_*_decode/<set_name>/feats.scp.
# Note: do not use outputs_*decode_denorm/<set_name>/feats.scp, since
# those are de-normalized features (the input for PWG is normalized features).
$ parallel-wavegan-decode \
--checkpoint pretrain_model/<pretrain_model_tag>/checkpoint-400000steps.pkl \
--feats-scp exp/<your_model_dir>/outputs_*_decode/<set_name>/feats.scp \
--outdir <path_to_outdir>
# In the case of ESPnet2, the generated features can be found in
# exp/<your_model_dir>/decode_*/<set_name>/norm/feats.scp.
$ parallel-wavegan-decode \
--checkpoint pretrain_model/<pretrain_model_tag>/checkpoint-400000steps.pkl \
--feats-scp exp/<your_model_dir>/decode_*/<set_name>/norm/feats.scp \
--outdir <path_to_outdir>
# You can find the generated waveforms in <path_to_outdir>/.
$ ls <path_to_outdir>
utt_id_1_gen.wav utt_id_2_gen.wav ... utt_id_N_gen.wav
Case 2: If you use different datasets for Text2Mel and Mel2Wav models
# In this case, you must additionally provide the `--normalize-before` option
# and use the `feats.scp` of the de-normalized generated features.
# ESPnet1 case
$ parallel-wavegan-decode \
--checkpoint pretrain_model/<pretrain_model_tag>/checkpoint-400000steps.pkl \
--feats-scp exp/<your_model_dir>/outputs_*_decode_denorm/<set_name>/feats.scp \
--outdir <path_to_outdir> \
--normalize-before
# ESPnet2 case
$ parallel-wavegan-decode \
--checkpoint pretrain_model/<pretrain_model_tag>/checkpoint-400000steps.pkl \
--feats-scp exp/<your_model_dir>/decode_*/<set_name>/denorm/feats.scp \
--outdir <path_to_outdir> \
--normalize-before
# You can find the generated waveforms in <path_to_outdir>/.
$ ls <path_to_outdir>
utt_id_1_gen.wav utt_id_2_gen.wav ... utt_id_N_gen.wav
If you want to combine these models in Python, you can try the real-time demonstration in Google Colab!
Real-time demonstration with ESPnet2
Real-time demonstration with ESPnet1
Decoding with dumped npy files
Sometimes we want to decode from dumped npy files, i.e., mel-spectrograms generated by TTS models. Please make sure you use the same feature extraction settings as the pretrained vocoder (fs, fft_size, hop_size, win_length, fmin, and fmax). Only a difference in log_base can be corrected with some post-processing (we use log base 10 instead of the natural log by default). See details in the comment.
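For instance, if your TTS model outputs natural-log mel-spectrograms, you can convert them to the log10 convention with the change-of-base formula log10(x) = ln(x) / ln(10). A minimal sketch with hypothetical file names:
$ python <<EOF
import numpy as np
mel_ln = np.load("mel_natural_log.npy")  # (#frames, #mels), natural-log scale
mel_log10 = mel_ln / np.log(10.0)        # change of base: log10(x) = ln(x) / ln(10)
np.save("mel_log10.npy", mel_log10)
EOF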
# Generate dummy npy file of mel-spectrogram
$ ipython
[ins] In [1]: import numpy as np
[ins] In [2]: x = np.random.randn(512, 80) # (#frames, #mels)
[ins] In [3]: np.save("dummy_1.npy", x)
[ins] In [4]: y = np.random.randn(256, 80) # (#frames, #mels)
[ins] In [5]: np.save("dummy_2.npy", y)
[ins] In [6]: exit
# Make scp file (key-path format)
$ find . -name "*.npy" | sort | awk '{print "dummy_" NR " " $1}' > feats.scp
# Check (<utt_id> <path>)
$ cat feats.scp
dummy_1 ./dummy_1.npy
dummy_2 ./dummy_2.npy
# Decode without feature normalization
# This case assumes that the input mel-spectrogram is normalized with the same statistics as the pretrained model.
$ parallel-wavegan-decode \
--checkpoint /path/to/checkpoint-400000steps.pkl \
--feats-scp ./feats.scp \
--outdir wav
2021-08-10 09:13:07,624 (decode:140) INFO: The number of features to be decoded = 2.
[decode]: 100%|████████████████████████████████████████| 2/2 [00:00<00:00, 13.84it/s, RTF=0.00264]
2021-08-10 09:13:29,660 (decode:174) INFO: Finished generation of 2 utterances (RTF = 0.005).
# Decode with feature normalization
# This case assumes that the input mel-spectrogram is not normalized.
$ parallel-wavegan-decode \
--checkpoint /path/to/checkpoint-400000steps.pkl \
--feats-scp ./feats.scp \
--normalize-before \
--outdir wav
2021-08-10 09:13:07,624 (decode:140) INFO: The number of features to be decoded = 2.
[decode]: 100%|████████████████████████████████████████| 2/2 [00:00<00:00, 13.84it/s, RTF=0.00264]
2021-08-10 09:13:29,660 (decode:174) INFO: Finished generation of 2 utterances (RTF = 0.005).
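To sanity-check the generated audio, you can inspect the waveform shape and sampling rate with soundfile; the file name below assumes the `*_gen.wav` naming convention shown above:
$ python -c "import soundfile as sf; x, fs = sf.read('wav/dummy_1_gen.wav'); print(x.shape, fs)"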