simple_diarizer
Simplified diarization pipeline using some pretrained models.
Designed to make going from an input audio file to diarized segments as simple as possible.
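```python
import soundfile as sf
import matplotlib.pyplot as plt

from simple_diarizer.diarizer import Diarizer
from simple_diarizer.utils import combined_waveplot

WAV_FILE = "path/to/audio.wav"  # placeholder: path to your input audio file
NUM_SPEAKERS = 2                # placeholder: number of speakers in the recording

diar = Diarizer(
    embed_model='xvec',  # 'xvec' and 'ecapa' supported
    cluster_method='sc'  # 'ahc' and 'sc' supported
)

segments = diar.diarize(WAV_FILE, num_speakers=NUM_SPEAKERS)

# Plot the waveform together with the diarized segments
signal, fs = sf.read(WAV_FILE)
combined_waveplot(signal, fs, segments)
plt.show()
```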
Pre-trained Models
The following pretrained models are used:
- Voice Activity Detection (VAD)
- Deep speaker embedding extraction
- (Optional/Experimental) Speech-to-text
    - ESPnet Model Zoo
        - English ASR model
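In a pipeline like this, the VAD stage typically decides which parts of the audio contain speech before speaker embeddings are extracted. As a rough illustration only, the sketch below swaps in a simple frame-energy threshold, which is far cruder than the pretrained VAD model the package uses. The function name `energy_vad` and its `frame_ms` / `threshold_db` parameters are hypothetical, and the code assumes float samples in [-1, 1], as `soundfile.read` returns by default.

```python
import numpy as np


def energy_vad(signal, fs, frame_ms=30, threshold_db=-40.0):
    """Return (start_sec, end_sec) speech regions using a frame-energy threshold.

    Hypothetical stand-in for a pretrained VAD model: a frame counts as speech
    when its RMS level in dBFS exceeds threshold_db, and consecutive speech
    frames are merged into one region.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.ndim > 1:
        signal = signal.mean(axis=1)  # mix multichannel audio down to mono

    frame_len = int(fs * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)

    # RMS per frame in dBFS (samples assumed to lie in [-1, 1])
    rms = np.sqrt(np.mean(frames ** 2, axis=1)) + 1e-12
    is_speech = 20 * np.log10(rms) > threshold_db

    # Merge runs of speech frames into (start, end) regions in seconds
    regions = []
    start = None
    for i, speech in enumerate(is_speech):
        if speech and start is None:
            start = i
        elif not speech and start is not None:
            regions.append((start * frame_len / fs, i * frame_len / fs))
            start = None
    if start is not None:
        regions.append((start * frame_len / fs, n_frames * frame_len / fs))
    return regions
```

For example, one second of silence, one second of a 440 Hz tone and another second of silence should yield a single region of roughly (1.0, 2.0). A learned VAD is preferred in practice because energy alone cannot tell speech from other loud sounds.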
Demo
The demo, linked above, will attempt to diarize the audio of any input YouTube URL. It also uses YouTube's auto-generated captions to produce a speaker-labelled transcript.
Hopefully this can serve as a free, basic tool for producing a diarized transcript of any video or audio of interest.
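A speaker-labelled transcript needs caption timings to be matched against the diarized segments. The following is a minimal sketch of one way to do this, not the demo's actual code. It assumes each segment is a dict with 'start', 'end' (in seconds) and 'label' keys and each caption is a (start, end, text) tuple; both formats, and the helper names `label_captions` and `merge_turns`, are hypothetical. Each caption gets the label of the segment it overlaps most, and consecutive captions from the same speaker are then merged into one turn.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_captions(segments, captions):
    """Attach a speaker label to each caption by maximum temporal overlap.

    segments: list of dicts with 'start', 'end' and 'label' keys (assumed format)
    captions: list of (start, end, text) tuples (hypothetical caption format)
    Returns a list of (label, text) pairs; label is None if nothing overlaps.
    """
    labelled = []
    for c_start, c_end, text in captions:
        best_label, best_overlap = None, 0.0
        for seg in segments:
            ov = overlap(c_start, c_end, seg['start'], seg['end'])
            if ov > best_overlap:
                best_label, best_overlap = seg['label'], ov
        labelled.append((best_label, text))
    return labelled


def merge_turns(labelled):
    """Join consecutive captions from the same speaker into a single turn."""
    turns = []
    for label, text in labelled:
        if turns and turns[-1][0] == label:
            turns[-1] = (label, turns[-1][1] + ' ' + text)
        else:
            turns.append((label, text))
    return turns


segments = [
    {'start': 0.0, 'end': 4.0, 'label': 0},
    {'start': 4.0, 'end': 9.0, 'label': 1},
]
captions = [
    (0.0, 2.0, 'Hello there.'),
    (2.0, 4.5, 'How are you?'),
    (4.5, 8.0, 'Fine, thanks.'),
]
for label, text in merge_turns(label_captions(segments, captions)):
    print(f'Speaker {label}: {text}')
```

The second caption straddles the speaker change at 4.0 s; it is assigned to speaker 0 because 2.0 s of it overlaps that segment versus 0.5 s for speaker 1. Maximum overlap is a simple choice; splitting captions at word-level timestamps would be more precise where those are available.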
Other References
- Spectral clustering methods lifted from https://github.com/wq2012/SpectralCluster
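For orientation, here is a bare-bones NumPy sketch of spectral clustering on speaker embeddings (one row per speech window). It assumes a cosine-similarity affinity with negative values clipped to zero, the symmetric normalized Laplacian and a known number of speakers. It omits the affinity refinement steps that the SpectralCluster library applies and is not simple_diarizer's own code; `spectral_cluster` and `kmeans` are hypothetical names.

```python
import numpy as np


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    # Start from the first point, then repeatedly add the point farthest
    # from all centres chosen so far
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)

    def assign(c):
        # Index of the nearest centre for every point
        return ((points[:, None, :] - c[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(centers)


def spectral_cluster(embeddings, num_speakers):
    """Cluster embeddings (one row per speech window) into num_speakers groups."""
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Affinity: cosine similarity with negatives clipped to zero; the diagonal
    # stays at 1, so every node has a positive degree
    A = np.maximum(X @ X.T, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    # D^{-1/2} A D^{-1/2}: its largest eigenvectors are the smallest
    # eigenvectors of the normalized Laplacian I - D^{-1/2} A D^{-1/2}
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    U = eigvecs[:, -num_speakers:]
    # Row-normalize the spectral embedding before k-means (Ng-Jordan-Weiss style)
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
    return kmeans(U, num_speakers)
```

Calling `spectral_cluster(embeddings, 2)` on, say, ten x-vectors returns one integer label per row. Taking the largest eigenvalues from `eigh` on the normalized affinity is equivalent to taking the smallest eigenvalues of the normalized Laplacian, which avoids building the Laplacian explicitly.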