Multiband RNN_MS

A fast and simple vocoder: Multiband RNN_MS.


Quick Training

How to Use

1. Install

# pip install "torch==1.10.0" -q      # Adjust to your environment (validated with v1.10)
# pip install "torchaudio==0.10.0" -q # Adjust to your environment
pip install git+

2. Data & Preprocessing

“Batteries Included”.
RNNMS transparently downloads the corpus and preprocesses it for you.

3. Train

python -m mbrnnms.main_train

For arguments, check ./mbrnnms/

Advanced: Other datasets

You can switch datasets with arguments.
All of speechcorpusy's preset corpora are supported.

# LJSpeech corpus
python -m mbrnnms.main_train data.data_name=LJ

Advanced: Custom dataset

Copy mbrnnms.main_train and replace the DataModule with your own.

    # datamodule = LJSpeechDataModule(batch_size, ...)
    datamodule = YourSuperCoolDataModule(batch_size, ...)
    # That's all!

System Details


  • PreNet: GRU
  • Upsampler: time-directional nearest interpolation
  • Decoder: Embedding-auto-regressive generative RNN with 10-bit μ-law encoding
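The upsampler and the 10-bit μ-law encoding can be sketched in plain Python as follows. This is a minimal sketch, not the repository's code: it assumes textbook μ-law companding with μ = 2^10 − 1 = 1023 (so each sample becomes one of 1024 classes the decoder predicts), and it assumes "time-directional nearest interpolation" amounts to repeating each conditioning frame `hop` times. The names `mulaw_encode`, `mulaw_decode`, `upsample_nearest`, and `hop` are hypothetical; the actual clipping and rounding details in mbrnnms may differ.

```python
import math
from typing import List, Sequence

BITS = 10
MU = 2 ** BITS - 1  # 1023, so classes are 0..1023 (assumed textbook 10-bit mu-law)


def mulaw_encode(x: float, mu: int = MU) -> int:
    """Compress a sample in [-1, 1] and quantize it to an integer class in [0, mu]."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range samples
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Shift [-1, 1] to [0, mu] and round to the nearest class.
    return int(math.floor((compressed + 1.0) / 2.0 * mu + 0.5))


def mulaw_decode(q: int, mu: int = MU) -> float:
    """Map a class index back to an (approximate) sample in [-1, 1]."""
    y = 2.0 * q / mu - 1.0  # undo the shift to [0, mu]
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)  # expand


def upsample_nearest(frames: Sequence[Sequence[float]], hop: int) -> List[List[float]]:
    """Nearest-neighbour upsampling along time: repeat each frame `hop` times."""
    return [list(frame) for frame in frames for _ in range(hop)]


if __name__ == "__main__":
    for sample in (-1.0, -0.01, 0.0, 0.3, 1.0):
        q = mulaw_encode(sample)
        print(sample, q, round(mulaw_decode(q), 4))
    print(upsample_nearest([[1.0, 2.0], [3.0, 4.0]], hop=3))
```

Because μ-law spends more classes near zero, quiet samples are reconstructed more precisely than loud ones, which is why 10 bits (1024 classes) is usually enough for an autoregressive decoder of this kind.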


Output Sample



X [iter/sec] @ NVIDIA T4 on Google Colaboratory (AMP+, num_workers=8)

Full training takes about Y days.
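The training-time estimate follows from the throughput figure: wall-clock time is the total number of training steps divided by the iterations per second. The sketch below only illustrates that arithmetic. `total_steps` is a hypothetical parameter (the step count is not stated here), the numbers in the example are illustrative rather than measured results for this model, and the estimate ignores validation and checkpointing overhead.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def training_days(total_steps: int, iters_per_sec: float) -> float:
    """Wall-clock days for `total_steps` at a constant `iters_per_sec` (overhead ignored)."""
    if iters_per_sec <= 0:
        raise ValueError("iters_per_sec must be positive")
    return total_steps / iters_per_sec / SECONDS_PER_DAY


if __name__ == "__main__":
    # Illustrative numbers only: 864,000 steps at 10 iter/sec.
    print(f"{training_days(864_000, 10.0):.2f} days")  # prints "1.00 days"
```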



  • : The basic vocoder concept comes from this paper.
  • bshall/UniversalVocoding: The model and hyperparameters are derived from this repository. All code has been rewritten.


