- Python >= 3.7
- Clone this repository:
git clone https://github.com/yl4579/AuxiliaryASR.git
cd AuxiliaryASR
- Install python requirements:
pip install SoundFile torchaudio torch jiwer pyyaml click matplotlib g2p_en
- Prepare your own dataset and put the data lists in the Data folder (see the Training section for more details).
python train.py --config_path ./Configs/config.yml
Please specify the training and validation data in the config.yml file. The data list format needs to be filename.wav|label|speaker_number; see train_list.txt for an example (a subset of LJSpeech). Note that speaker_number can simply be 0 for ASR, but it is useful to set a meaningful number for TTS training (if you need to use this repo for StyleTTS).
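As a quick sanity check before training, the data list format described above can be validated with a few lines of Python. This is only a sketch: the function name parse_data_list and the sample file names are hypothetical and not part of this repo, and it assumes labels do not themselves contain the | character.

```python
from typing import List, Tuple


def parse_data_list(text: str) -> List[Tuple[str, str, int]]:
    """Parse lines of the form filename.wav|label|speaker_number."""
    entries = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        parts = line.split("|")
        if len(parts) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 fields separated by '|', got {len(parts)}"
            )
        filename, label, speaker = parts
        # speaker_number must be an integer (0 is fine for ASR-only training)
        entries.append((filename, label, int(speaker)))
    return entries


# Hypothetical sample entries, for illustration only
sample = "sample_0001.wav|hello world|0\nsample_0002.wav|good morning|0\n"
entries = parse_data_list(sample)
print(len(entries), entries[0])
```

To check a real list, read the file contents (for example with pathlib.Path(...).read_text(encoding="utf-8")) and pass the resulting string to the function.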
Checkpoints and TensorBoard logs will be saved at log_dir. To speed up training, you may want to make batch_size as large as your GPU RAM can take. However, please note that batch_size = 64 will take around 10 GB of GPU RAM.
This repo is set up for English with the g2p_en package, but you can train it on other languages. To train on a dataset in a different language, you will need to modify the meldataset.py file (L86-93) to use your own phonemizer. You also need to change the vocabulary file (word_index_dict.txt) and update config.yml to reflect the number of tokens. A recommended phonemizer for other languages is phonemizer.
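When switching languages, the vocabulary and the token count have to stay in sync. The sketch below shows one possible way to derive both from already-phonemized labels. The helper names, the special tokens, and the one-pair-per-line token,index output are all assumptions made for illustration; check the word_index_dict.txt shipped with this repo for the actual format and special symbols before replacing it.

```python
import csv
import io
from typing import Dict, Iterable, List, Sequence


def build_token_index(
    sequences: Iterable[List[str]],
    special_tokens: Sequence[str] = ("<pad>", "<unk>"),  # hypothetical specials
) -> Dict[str, int]:
    """Assign a unique index to every token, in order of first appearance."""
    token_index = {tok: i for i, tok in enumerate(special_tokens)}
    for seq in sequences:
        for tok in seq:
            if tok not in token_index:
                token_index[tok] = len(token_index)
    return token_index


def to_vocab_csv(token_index: Dict[str, int]) -> str:
    """Serialize the mapping as token,index rows (assumed file layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for tok, idx in token_index.items():
        writer.writerow([tok, idx])
    return buf.getvalue()


# Hypothetical phonemized labels, e.g. the output of your own phonemizer
phonemized = [["b", "o", "n"], ["b", "a"]]
token_index = build_token_index(phonemized)
vocab_text = to_vocab_csv(token_index)
n_tokens = len(token_index)  # the token count that config.yml should reflect
print(n_tokens)
```

The value of n_tokens is what the token count in config.yml would need to match, assuming the model's output size is driven by the vocabulary size.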
The author would like to thank @tosaka-m for his great repository and valuable discussions.