wav2vec-toolkit
A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models
This repository accompanies the 🤗 HuggingFace Community Paper on finetuning Wav2Vec2 XLSR for low-resource languages [link].
How to contribute
(Mostly identical to the huggingface/datasets contributing guide)
-
Fork the repository by clicking on the 'Fork' button on the repository's page. This creates a copy of the code under your GitHub user account.
-
Clone your fork to your local disk, and add the base repository as a remote:
git clone git@github.com:<your GitHub handle>/wav2vec-toolkit.git
cd wav2vec-toolkit
git remote add upstream https://github.com/anton-l/wav2vec-toolkit.git
-
Set up a development environment by running the following commands in a virtual environment:
conda create -n env python=3.7 -y
conda activate env
pip install -e ".[dev]"
pip install -r languages/{YOUR_SPECIFIC_LANGUAGE}/requirements.txt
(If wav2vec-toolkit was already installed in the virtual environment, remove it with pip uninstall wav2vec_toolkit before reinstalling it in editable mode with the -e flag.)
-
Create a new branch to hold your development changes:
git checkout -b a-descriptive-name-for-my-changes
Do not work on the master branch.
-
Develop the features on your branch.
- Adding a new language here
-
Format your code. Run black and isort so that your newly added files follow the project's code style:
black --line-length 119 --target-version py36 src scripts languages
isort src scripts languages
-
Once you're happy with your implementation, add your changes and make a commit to record your changes locally:
git add .
git commit
It is a good idea to sync your copy of the code with the original repository regularly. This way you can quickly account for changes:
git fetch upstream
git rebase upstream/main
Push the changes to your account using:
git push -u origin a-descriptive-name-for-my-changes
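The guide asks you not to work on the master branch, while the sync step rebases onto upstream/main, so it can help to confirm which branch you are on before pushing. The following is a minimal sketch of a hypothetical shell helper, ensure_feature_branch, which is not part of wav2vec-toolkit. It assumes a POSIX shell and treats both master and main as protected names; it relies only on the standard git symbolic-ref command.

```shell
# Hypothetical helper (not part of wav2vec-toolkit): refuse to continue
# when the current branch is master or main, or when HEAD is detached.
ensure_feature_branch() {
    # Prints the short branch name; exits non-zero (quietly) on a detached HEAD.
    branch=$(git symbolic-ref --quiet --short HEAD) || {
        echo "HEAD is detached; check out a feature branch first." >&2
        return 1
    }
    case "$branch" in
        master|main)
            echo "You are on '$branch'; create a feature branch with git checkout -b first." >&2
            return 1
            ;;
    esac
    echo "On feature branch '$branch'."
}
```

For example, ensure_feature_branch && git push -u origin a-descriptive-name-for-my-changes only pushes when you are on a feature branch. Using return rather than exit keeps the calling shell alive when the function is sourced into an interactive session.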
-
Once you are satisfied, go to the webpage of your fork on GitHub. Click on "Pull request" to send your changes to the project maintainers for review.