Speech

A collection of 113 posts

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

09 July 2022

LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT

01 April 2022

BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis

31 March 2022

HiFi++: a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement

31 March 2022

STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation

18 March 2022

Python scripts for a speech processing pipeline with Voice Activity Detection (VAD)

19 February 2022

Yet another Telegram voice recognition bot, this one using Vosk and supporting 20+ languages

15 February 2022

SHAS: Approaching optimal Segmentation for End-to-End Speech Translation

14 February 2022

Convert PDF to AudioBook and Audio Speech to PDF

14 February 2022

Speech recognition

How to make use of Python's SpeechRecognition and pyttsx3 libraries

14 February 2022

CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition

13 February 2022

Speech Emotion Recognition with Fusion of Acoustic- and Linguistic-Feature-Based Decisions

11 February 2022

SpeechHacks - QHacks 2022 Project

30 January 2022

Creating an Audiobook (MP3 file) from an Ebook (EPUB) using BeautifulSoup and Google Text-to-Speech

30 January 2022

Some utilities for automatic speech recognition

25 January 2022

Simple Text-To-Speech Bot For Discord made with Python

24 January 2022

A real-time speech emotion recognition application using scikit-learn and Gradio

21 January 2022

A Python wrapper for simple offline real-time dictation (speech-to-text) and speaker-recognition using Vosk

21 January 2022

TCNN: Temporal convolutional neural network for real-time speech enhancement in the time domain

20 January 2022

A bot that interacts with you over voice and sends mail. Uses speech_recognition, pyttsx3 and smtplib

18 January 2022

PyTorch implementation of Lip to Speech Synthesis with Visual Context Attentional GAN (NeurIPS2021)

12 January 2022

Self-Supervised

A self-supervised learning framework for audio-visual speech

09 January 2022

Speech Recognition Database Management with Python

09 January 2022

API for SpeechAnalytics integration with FreePBX/Asterisk

08 January 2022

Speech-to-text Streamlit app

08 January 2022

ConferencingSpeech2022; Non-intrusive Objective Speech Quality Assessment (NISQA) Challenge

08 January 2022

API for Speech Analytics integration with FreePBX/Asterisk

08 January 2022

PaddleSpeech Streaming ASR GUI

05 January 2022

Automatic speech recognition (ASR): Chinese speech recognition

04 January 2022

Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning

02 January 2022

A Pytorch implementation of GoogleBrain's SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recogni

01 January 2022

Baidu-based speech recognition, implemented in Python with PyAudio + PyQt

30 December 2021

Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition

21 December 2021

A really simple text-to-speech app made with Python and Tkinter

21 December 2021

Create light scenes, voice control, IFTTT, fuzzywuzzy speech correction and much more with Tuya light bulbs

20 December 2021

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022

19 December 2021

PhoNLP: A BERT-based multi-task learning toolkit for part-of-speech tagging, named entity recognition and dependency parsing

29 November 2021

Combine Tacotron2 and HiFi-GAN to generate speech from text

28 November 2021

Efficient Speech Processing Toolkit for Automatic Speaker Recognition

24 November 2021

A CSRankings-like index for speech researchers

21 November 2021

Self-Adaptive noise distribution network for Speech Enhancement with heterogeneous data of Cross-Silo Federated learning

17 November 2021

A simple Speech Emotion Recognition (SER) API created using Flask and running in a Docker container

10 November 2021

A Python library that provides common speech features for ASR, including MFCCs and filterbank energies

04 November 2021

Using speech recognition to convert a .wav file to a .txt file

30 October 2021

LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech

24 October 2021

Minimal GUI for accessing the Watson Text to Speech service

23 October 2021

Recognition of 38 speech commands in Russian. Based on Yandex Cup 2021 ML Challenge: ASR

Speech_38_ru_commands: recognition of 38 speech commands in Russian. Based on Yandex Cup 2021 ML Challenge: ASR. The program can recognize 38 Russian keywords spoken into a microphone, from the list:

23 October 2021

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge

20 October 2021

A cappella: Audio-visual Singing Voice Separation, from BMVC21

20 October 2021

CPC-big and k-means clustering for zero-resource speech processing

18 October 2021