Openspeech
Openspeech provides reference implementations of various ASR modeling papers and recipes for three languages to perform automatic speech recognition tasks. We aim to make ASR technology easier to use for everyone.
Openspeech is backed by two powerful libraries: PyTorch-Lightning and Hydra.
These libraries provide features such as multi-GPU and TPU training, mixed-precision training, and hierarchical configuration management.
Get Started
We use Hydra to control all the training configurations.
If you are not familiar with Hydra, we recommend visiting the Hydra website.
In short, Hydra is an open-source framework that simplifies the development of research applications by letting you compose a hierarchical configuration dynamically.
If you want to know how we use Hydra, we recommend reading here.
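To make the idea of a hierarchical configuration concrete, here is a minimal, standard-library-only sketch of how dotted command-line overrides such as `dataset.dataset_path=...` map onto a nested configuration. It is an illustration, not Hydra's implementation: the function `apply_overrides` and the default values are hypothetical, and Hydra additionally supports selecting whole config groups (for example `model=conformer_lstm`), which this sketch does not model.

```python
from copy import deepcopy


def apply_overrides(config, overrides):
    """Return a copy of `config` with `a.b.c=value` style overrides applied.

    Hypothetical helper for illustration only; values are kept as strings.
    """
    result = deepcopy(config)
    for item in overrides:
        dotted_key, _, value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Walk down the hierarchy, creating intermediate levels on demand.
            node = node.setdefault(name, {})
        node[leaf] = value
    return result


# Placeholder defaults, not Openspeech's real default values.
defaults = {"dataset": {"dataset_path": "???", "dataset_download": "False"}}

config = apply_overrides(
    defaults,
    ["dataset.dataset_path=/data/librispeech", "dataset.dataset_download=True"],
)
print(config["dataset"])
```

The defaults stay untouched because the overrides are applied to a deep copy, which mirrors the way a base configuration can be reused across runs with different command-line arguments.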
Supported Datasets
We support LibriSpeech, KsponSpeech, and AISHELL-1.
LibriSpeech is a corpus of approximately 1,000 hours of 16kHz read English speech, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data was derived from reading audiobooks from the LibriVox project, and has been carefully segmented and aligned.
AISHELL-1 is an open-source Mandarin Chinese speech corpus published by Beijing Shell Shell Technology Co., Ltd. 400 people from different accent areas in China were invited to participate in the recording, which was conducted in a quiet indoor environment using high-fidelity microphones and downsampled to 16kHz.
KsponSpeech is a large-scale spontaneous speech corpus of Korean. This corpus contains 969 hours of general open-domain dialog utterances, spoken by about 2,000 native Korean speakers in a clean environment. All data were constructed by recording the dialogue of two people freely conversing on a variety of topics and manually transcribing the utterances. To start training, the KsponSpeech dataset must be prepared in advance. To download KsponSpeech, you need permission from AI Hub.
Pre-processed Manifest Files
Dataset | Unit | Manifest | Vocab | SP-Model |
---|---|---|---|---|
LibriSpeech | character | [Link] | [Link] | - |
LibriSpeech | subword | [Link] | [Link] | [Link] |
AISHELL-1 | character | [Link] | [Link] | - |
KsponSpeech | character | [Link] | [Link] | - |
KsponSpeech | subword | [Link] | [Link] | [Link] |
KsponSpeech | grapheme | [Link] | [Link] | - |
The KsponSpeech files require permission from AI Hub.
Please send an e-mail that includes a screenshot of the approval to [email protected].
Manifest File
- Manifest file format:
LibriSpeech/test-other/8188/269288/8188-269288-0052.flac ▁ANNIE ' S ▁MANNER ▁WAS ▁VERY ▁MYSTERIOUS 4039 20 5 531 17 84 2352
LibriSpeech/test-other/8188/269288/8188-269288-0053.flac ▁ANNIE ▁DID ▁NOT ▁MEAN ▁TO ▁CONFIDE ▁IN ▁ANYONE ▁THAT ▁NIGHT ▁AND ▁THE ▁KIND EST ▁THING ▁WAS ▁TO ▁LEAVE ▁HER ▁A LONE 4039 99 35 251 9 4758 11 2454 16 199 6 4 323 200 255 17 9 370 30 10 492
LibriSpeech/test-other/8188/269288/8188-269288-0054.flac ▁TIRED ▁OUT ▁LESLIE ▁HER SELF ▁DROPP ED ▁A SLEEP 1493 70 4708 30 115 1231 7 10 1706
LibriSpeech/test-other/8188/269288/8188-269288-0055.flac ▁ANNIE ▁IS ▁THAT ▁YOU ▁SHE ▁CALL ED ▁OUT 4039 34 16 25 37 208 7 70
LibriSpeech/test-other/8188/269288/8188-269288-0056.flac ▁THERE ▁WAS ▁NO ▁REPLY ▁BUT ▁THE ▁SOUND ▁OF ▁HURRY ING ▁STEPS ▁CAME ▁QUICK ER ▁AND ▁QUICK ER ▁NOW ▁AND ▁THEN ▁THEY ▁WERE ▁INTERRUPTED ▁BY ▁A ▁GROAN 57 17 56 1368 33 4 489 8 1783 14 1381 133 571 49 6 571 49 82 6 76 45 54 2351 44 10 3154
LibriSpeech/test-other/8188/269288/8188-269288-0057.flac ▁OH ▁THIS ▁WILL ▁KILL ▁ME ▁MY ▁HEART ▁WILL ▁BREAK ▁THIS ▁WILL ▁KILL ▁ME 299 46 71 669 50 41 235 71 977 46 71 669 50
...
...
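The entries above can be read with a few lines of Python. This is a minimal sketch that assumes each line holds three tab-separated fields (audio path, tokenized transcript, token ids); the listing above does not show the separator explicitly, so treat the tab as an assumption. `ManifestEntry` and `parse_manifest_line` are hypothetical names, not part of Openspeech.

```python
from typing import List, NamedTuple


class ManifestEntry(NamedTuple):
    audio_path: str
    tokens: List[str]
    token_ids: List[int]


def parse_manifest_line(line: str) -> ManifestEntry:
    """Split one manifest line into path, tokens, and integer token ids.

    Assumes the three fields are separated by tabs (an assumption).
    """
    audio_path, transcript, ids = line.rstrip("\n").split("\t")
    tokens = transcript.split()
    token_ids = [int(i) for i in ids.split()]
    if len(tokens) != len(token_ids):
        raise ValueError(f"token/id count mismatch for {audio_path}")
    return ManifestEntry(audio_path, tokens, token_ids)


sample = (
    "LibriSpeech/test-other/8188/269288/8188-269288-0055.flac\t"
    "▁ANNIE ▁IS ▁THAT ▁YOU ▁SHE ▁CALL ED ▁OUT\t"
    "4039 34 16 25 37 208 7 70"
)
entry = parse_manifest_line(sample)
print(entry.audio_path, len(entry.tokens), len(entry.token_ids))
```

In the sample entries each token has exactly one id, so the length check is a cheap way to catch a manifest that was split on the wrong separator.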
Training examples
You can train on the LibriSpeech dataset as follows:
- Example 1: Train the conformer-lstm model with filter-bank features on GPU:
$ python ./openspeech_cli/hydra_train.py \
dataset=librispeech \
dataset.dataset_download=True \
dataset.dataset_path=$DATASET_PATH \
dataset.manifest_file_path=$MANIFEST_FILE_PATH \
vocab=libri_subword \
model=conformer_lstm \
audio=fbank \
lr_scheduler=warmup_reduce_lr_on_plateau \
trainer=gpu \
criterion=joint_ctc_cross_entropy
You can train on the KsponSpeech dataset as follows:
- Example 2: Train the listen-attend-spell model with mel-spectrogram features on TPU:
$ python ./openspeech_cli/hydra_train.py \
dataset=ksponspeech \
dataset.dataset_path=$DATASET_PATH \
dataset.manifest_file_path=$MANIFEST_FILE_PATH \
dataset.test_dataset_path=$TEST_DATASET_PATH \
dataset.test_manifest_dir=$TEST_MANIFEST_DIR \
vocab=kspon_character \
model=listen_attend_spell \
audio=melspectrogram \
lr_scheduler=warmup_reduce_lr_on_plateau \
trainer=tpu \
criterion=joint_ctc_cross_entropy
You can train on the AISHELL-1 dataset as follows:
- Example 3: Train the quartznet model with mfcc features on GPU with FP16:
$ python ./openspeech_cli/hydra_train.py \
dataset=aishell \
dataset.dataset_path=$DATASET_PATH \
dataset.dataset_download=True \
dataset.manifest_file_path=$MANIFEST_FILE_PATH \
vocab=aishell_character \
model=quartznet15x5 \
audio=mfcc \
lr_scheduler=warmup_reduce_lr_on_plateau \
trainer=gpu-fp16 \
criterion=ctc
Evaluation examples
- Example 1: Evaluate the listen_attend_spell model:
$ python ./openspeech_cli/hydra_eval.py \
audio=melspectrogram \
eval.model_name=listen_attend_spell \
eval.dataset_path=$DATASET_PATH \
eval.checkpoint_path=$CHECKPOINT_PATH \
eval.manifest_file_path=$MANIFEST_FILE_PATH
- Example 2: Evaluate the listen_attend_spell and conformer_lstm models as an ensemble:
$ python ./openspeech_cli/hydra_eval.py \
audio=melspectrogram \
eval.model_names="(listen_attend_spell, conformer_lstm)" \
eval.dataset_path=$DATASET_PATH \
eval.checkpoint_paths="($CHECKPOINT_PATH1, $CHECKPOINT_PATH2)" \
eval.ensemble_weights="(0.3, 0.7)" \
eval.ensemble_method=weighted \
eval.manifest_file_path=$MANIFEST_FILE_PATH
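One common reading of a weighted ensemble is that each model's output distribution is scaled by its weight and the results are summed. The sketch below shows that idea with the weights 0.3 and 0.7 from the command above. Whether Openspeech combines models exactly this way is not stated here, so treat it as an assumption; `weighted_ensemble` is a hypothetical function and the probabilities are made-up illustration values.

```python
def weighted_ensemble(distributions, weights):
    """Combine per-model probability distributions over the same vocabulary.

    Weights are normalized so that the result is again a distribution.
    """
    if len(distributions) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    combined = [0.0] * len(distributions[0])
    for probs, weight in zip(distributions, weights):
        for index, prob in enumerate(probs):
            combined[index] += (weight / total) * prob
    return combined


# Made-up next-token probabilities over a three-token vocabulary.
las_probs = [0.6, 0.3, 0.1]
conformer_probs = [0.2, 0.7, 0.1]

combined = weighted_ensemble([las_probs, conformer_probs], [0.3, 0.7])
best_token = max(range(len(combined)), key=combined.__getitem__)
print(combined, best_token)
```

With these numbers the first model alone would pick token 0, but the more heavily weighted second model shifts the combined choice to token 1.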
Installation
This project recommends Python 3.7 or higher.
We recommend creating a new virtual environment for this project (using virtualenv or conda).
Prerequisites
- numpy: pip install numpy (Refer here for problems installing NumPy).
- pytorch: Refer to the PyTorch website to install the version that matches your environment.
- librosa: conda install -c conda-forge librosa (Refer here for problems installing librosa).
- torchaudio: pip install torchaudio==0.6.0 (Refer here for problems installing torchaudio).
- sentencepiece: pip install sentencepiece (Refer here for problems installing sentencepiece).
- pytorch-lightning: pip install pytorch-lightning (Refer here for problems installing pytorch-lightning).
- hydra: pip install hydra-core --upgrade (Refer here for problems installing hydra).
- warp-rnnt: Refer to the warp-rnnt page to install the library.
- ctcdecode: Refer to the ctcdecode page to install the library.
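Before training, it can help to confirm that the interpreter and the prerequisites are in place. The following is a small sketch using only the standard library. The import names in `PREREQUISITES` are assumptions derived from the package names above (for example `pytorch_lightning` for pytorch-lightning and `hydra` for hydra-core), and `find_missing` and `python_is_supported` are hypothetical helpers.

```python
import importlib.util
import sys

# Assumed import names for the prerequisites listed above.
PREREQUISITES = [
    "numpy", "torch", "librosa", "torchaudio", "sentencepiece",
    "pytorch_lightning", "hydra", "warp_rnnt", "ctcdecode",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def python_is_supported(version_info=sys.version_info, minimum=(3, 7)):
    """Check the interpreter against the recommended minimum version."""
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    print("Python 3.7 or higher is recommended.")
missing = find_missing(PREREQUISITES)
print("Missing prerequisites:", ", ".join(missing) if missing else "none")
```

`importlib.util.find_spec` only locates a module without importing it, so the check stays fast even for heavy packages such as torch.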
Install from pypi
You can install Openspeech from PyPI:
pip install openspeech-core
Install from source
Currently we only support installation from source code using setuptools. Check out the source code and run the following command:
$ ./install.sh
Install Apex (for 16-bit training)
For faster training, install NVIDIA's apex library:
$ git clone https://github.com/NVIDIA/apex
$ cd apex
# ------------------------
# OPTIONAL: on your cluster you might need to load CUDA 10 or 9
# depending on how you installed PyTorch
# see available modules
module avail
# load correct CUDA before install
module load cuda-10.0
# ------------------------
# make sure you've loaded a gcc version > 4.0 and < 7.0
module load gcc-6.1.0
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./