EdiTTS: Score-based Editing for Controllable Text-to-Speech
This repository contains code for EdiTTS, a user-editable text-to-speech system.
Setup
- Create a Python virtual environment (venv or conda) and install package requirements as specified in requirements.txt.
- For more information, you may refer to the official repository of Grad-TTS.
Pitch shifting
Mark the part to be edited by enclosing it in vertical bars (|).
In | the face of impediments confessedly discouraging |
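The marking convention above can be checked before running synthesis. The following is a minimal sketch, assuming each line contains exactly one marked span; `split_marked` is a hypothetical helper written for illustration, not a function from this repository, and the repository's own parsing may differ.

```python
def split_marked(line: str, marker: str = "|"):
    """Split 'before | target | after' into (before, target, after).

    Hypothetical helper: assumes exactly one span enclosed by two markers.
    """
    parts = line.split(marker)
    if len(parts) != 3:
        raise ValueError(
            f"expected exactly two '{marker}' markers, found {len(parts) - 1}"
        )
    before, target, after = (part.strip() for part in parts)
    return before, target, after


example = "In | the face of impediments confessedly discouraging |"
before, target, after = split_marked(example)
print(repr(before))  # text preceding the edited span
print(repr(target))  # the span to be pitch-shifted
print(repr(after))   # text following the edited span (empty here)
```

Raising on a malformed line, rather than guessing where the span ends, makes a missing or extra marker visible before any audio is generated.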
We provide some example sentences in resources/filelists/edit_pitch_example.txt
To synthesize pitch-edited speech, run
CUDA_VISIBLE_DEVICES=0 python edit_pitch.py -f resources/filelists/edit_pitch_example.txt -c checkpts/grad-tts-old.pt -t 1000 -s out/pitch/wavs
Content replacement
Prepare two sentences: the original and the edited version. Concatenate them with # and enclose the part to be replaced in each sentence in vertical bars (|).
Three others subsequently | identified | Oswald from a photograph. #Three others subsequently | recognized | Oswald from a photograph.
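As an illustration of this format, the sketch below splits a line into its two sentences and extracts the marked span from each. It assumes one # separator per line and one marked span per sentence; `split_marked` and `parse_content_pair` are hypothetical names introduced here, not part of this repository.

```python
def split_marked(sentence: str, marker: str = "|"):
    """Split 'before | target | after' into (before, target, after)."""
    parts = sentence.split(marker)
    if len(parts) != 3:
        raise ValueError(
            f"expected exactly two '{marker}' markers, found {len(parts) - 1}"
        )
    return tuple(part.strip() for part in parts)


def parse_content_pair(line: str, sep: str = "#", marker: str = "|"):
    """Parse 'sentence A #sentence B' into two (before, target, after) tuples.

    Hypothetical helper: assumes a single separator between the sentences.
    """
    first, found, second = line.partition(sep)
    if not found:
        raise ValueError(f"missing '{sep}' separator")
    return split_marked(first.strip(), marker), split_marked(second.strip(), marker)


line = (
    "Three others subsequently | identified | Oswald from a photograph. "
    "#Three others subsequently | recognized | Oswald from a photograph."
)
original, edited = parse_content_pair(line)
print(original[1], "->", edited[1])  # the replaced word and its replacement
```

In this example the text outside the markers is identical in both sentences, so a quick sanity check on a file list is to compare `original[0]` with `edited[0]` and `original[2]` with `edited[2]`.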
We provide some example sentences in resources/filelists/edit_content_example.txt
To synthesize content-replaced speech, run
CUDA_VISIBLE_DEVICES=0 python edit_content.py -f resources/filelists/edit_content_example.txt -c checkpts/grad-tts-old.pt -t 1000 -s out/content/wavs
Demo page
Audio samples generated by EdiTTS can be found here
Checkpoints
This repository uses the following checkpoints.
- Grad-TTS (old ver.): click here
- HiFi-GAN (LJ_FT_T2_V1 ver.): click here