About the dataset
LibriVAD is an open-source dataset for voice activity detection in noisy environments. It is derived from LibriSpeech signals (clean subset) and DNS Challenge noises.
You need to download LibriSpeech, the noise (from datasets/noise), and the forced alignments.
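Since all three downloads must be in place before generation, a quick pre-flight check can save a failed run. The following is only a minimal sketch: the function name check_inputs and the directory names in the usage comment are hypothetical placeholders, not paths defined by the repository, so adapt them to wherever you stored the data.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check. The function name and the example
# directory names are assumptions, not part of the repository.

check_inputs() {
    # Report every directory that is missing; return non-zero if any is absent.
    local missing=0
    local dir
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage (placeholder paths, adjust to your layout):
# check_inputs "$storage_dir/LibriSpeech" "$storage_dir/noise" "$storage_dir/alignments"
```

The function checks only that each directory exists; it does not verify that the contents are complete.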
To generate LibriVAD, clone the repo and run the main script, run.sh (with the correct paths):
git clone https://github.com/JorisCos/LibriMix
cd LibriMix
./run.sh storage_dir