An end-to-end ASR Transformer model training repo

Dec 09, 2021

END TO END ASR TRANSFORMER

This project is an end-to-end speech recognition system built on the basic Transformer structure of 6 encoder layers + 6 decoder layers.

Model

Instructions

1. Data preparation:
- Download the data yourself and arrange it in the following file structure:

├── data
│   ├── train
│   ├── dev
│   ├── test

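2. Data preprocessing:
- Run prepare_data.py to preprocess the data. This produces the full vocabulary, a mel-scale spectrogram for each audio sample, and the token IDs for each transcript.
3. Model training:
- Run train_transformer.py --ngpus 8 to train the Transformer network. The network takes mel-scale spectrograms as input and outputs token IDs.
4. Model inference:
- Run evlauate.py to measure accuracy on the dev/test sets.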
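
The text side of steps 2 and 4 can be sketched roughly as follows. This is a minimal illustration under assumptions, not the code in prepare_data.py or evlauate.py: it assumes character-level tokens and uses character error rate (CER) as the accuracy measure, neither of which the instructions confirm. The function names (build_vocab, encode, edit_distance, cer) and the special tokens are hypothetical.

```python
from collections import Counter

# Hypothetical special tokens; the repository's actual vocabulary may differ.
SPECIALS = ["<pad>", "<sos>", "<eos>", "<unk>"]


def build_vocab(transcripts, min_count=1):
    """Map each character seen at least `min_count` times to an integer id."""
    counts = Counter(ch for text in transcripts for ch in text if not ch.isspace())
    chars = sorted(ch for ch, n in counts.items() if n >= min_count)
    return {tok: i for i, tok in enumerate(SPECIALS + chars)}


def encode(text, vocab):
    """Convert a transcript to token ids, wrapped in <sos> ... <eos>."""
    unk = vocab["<unk>"]
    ids = [vocab.get(ch, unk) for ch in text if not ch.isspace()]
    return [vocab["<sos>"]] + ids + [vocab["<eos>"]]


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, keeping one DP row."""
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev_diag, row[0] = row[0], i
        for j, h in enumerate(hyp, start=1):
            cur = min(
                row[j] + 1,             # drop a reference symbol
                row[j - 1] + 1,         # insert a hypothesis symbol
                prev_diag + (r != h),   # substitute, or match at no cost
            )
            prev_diag, row[j] = row[j], cur
    return row[-1]


def cer(refs, hyps):
    """Character error rate: total edit distance / total reference length."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total if total else 0.0
```

In this sketch, the vocabulary would be built from the training transcripts only, so characters that appear solely in dev/test fall back to the <unk> id. A lower CER means better recognition; accuracy in the sense of step 4 could be reported as 1 - CER under the same assumption.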

Acknowledgements

Reference

Ashish Vaswani et al. “Attention Is All You Need” (2017).
Abdel-rahman Mohamed et al. “Transformers with convolutional context for ASR” arXiv: Computation and Language (2019).
Albert Zeyer et al. “Improved Training of End-to-end Attention Models for Speech Recognition” Conference of the International Speech Communication Association (2018).

