EdiTTS: Score-based Editing for Controllable Text-to-Speech

This repository contains the code for EdiTTS, a user-editable text-to-speech system.

Setup

Create a Python virtual environment (venv or conda) and install the packages listed in requirements.txt.
For more information, refer to the official Grad-TTS repository.

Enclose the part of the sentence to be edited in a pair of | characters.

In | the face of impediments confessedly discouraging |
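To make the marking convention concrete, the following is a minimal sketch of how such a line could be split into the text before, inside, and after the markers. The helper name `split_marked` is hypothetical and is not taken from this repository; edit_pitch.py may parse its input differently.

```python
from typing import Tuple


def split_marked(line: str, marker: str = "|") -> Tuple[str, str, str]:
    """Split a line into (before, marked, after) around one pair of markers.

    Hypothetical helper for illustration only; not part of the EdiTTS code.
    """
    parts = line.split(marker)
    # Exactly two markers produce exactly three pieces.
    if len(parts) != 3:
        raise ValueError(f"expected exactly two '{marker}' markers, got {len(parts) - 1}")
    before, marked, after = (p.strip() for p in parts)
    return before, marked, after


example = "In | the face of impediments confessedly discouraging |"
before, marked, after = split_marked(example)
print(repr(before), repr(marked), repr(after))
```

In this example the marked region runs to the end of the sentence, so the third element is an empty string.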

We provide some example sentences in resources/filelists/edit_pitch_example.txt

To synthesize pitch-edited speech, run

CUDA_VISIBLE_DEVICES=0 python edit_pitch.py -f resources/filelists/edit_pitch_example.txt -c checkpts/grad-tts-old.pt -t 1000 -s out/pitch/wavs

Prepare two sentences and join them with #. In each sentence, enclose the part to be replaced in a pair of | characters.

Three others subsequently | identified | Oswald from a photograph. #Three others subsequently | recognized | Oswald from a photograph.
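As an illustration of this input format, the sketch below splits such a line on # and then splits each sentence around its markers. The function name `split_content_pair` is hypothetical, and the sketch makes no assumption about which of the two sentences edit_content.py treats as the source; it simply returns them in the order they appear.

```python
from typing import Tuple

Segments = Tuple[str, str, str]


def split_content_pair(line: str, sep: str = "#", marker: str = "|") -> Tuple[Segments, Segments]:
    """Split 'sentence A #sentence B' into two (before, marked, after) triples.

    Hypothetical helper for illustration only; not part of the EdiTTS code.
    """
    halves = line.split(sep)
    if len(halves) != 2:
        raise ValueError(f"expected exactly one '{sep}' separator")
    triples = []
    for half in halves:
        parts = half.split(marker)
        # Each sentence must contain exactly one marked region.
        if len(parts) != 3:
            raise ValueError(f"expected exactly two '{marker}' markers per sentence")
        triples.append(tuple(p.strip() for p in parts))
    return triples[0], triples[1]


example = (
    "Three others subsequently | identified | Oswald from a photograph. "
    "#Three others subsequently | recognized | Oswald from a photograph."
)
first, second = split_content_pair(example)
print(first[1], "/", second[1])
```

Only the marked segments differ between the two sentences in this example; the surrounding text is identical.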

We provide some example sentences in resources/filelists/edit_content_example.txt

To synthesize content-replaced speech, run

CUDA_VISIBLE_DEVICES=0 python edit_content.py -f resources/filelists/edit_content_example.txt -c checkpts/grad-tts-old.pt -t 1000 -s out/content/wavs

Audio samples generated by EdiTTS can be found here

This repository uses the following checkpoints.