wav2vec-toolkit

A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models

This repository accompanies the 🤗 HuggingFace Community Paper on finetuning Wav2Vec2 XLSR for
low-resource languages [link]

How to contribute

(Mostly identical to the huggingface/datasets contributing guide)

  1. Fork the repository by clicking on the 'Fork' button on the repository's page. This creates a copy of the code under your GitHub user account.

  2. Clone your fork to your local disk, and add the base repository as a remote:

    git clone git@github.com:<your Github handle>/wav2vec-toolkit.git
    cd wav2vec-toolkit
    git remote add upstream https://github.com/anton-l/wav2vec-toolkit.git
    
  3. Set up a development environment by creating and activating a virtual environment, then installing the package in editable mode along with the requirements for your language:

    conda create -n env python=3.7 -y
    conda activate env
    pip install -e ".[dev]"
    pip install -r languages/{YOUR_SPECIFIC_LANGUAGE}/requirements.txt
    

    (If wav2vec-toolkit was already installed in the virtual environment, remove
    it with pip uninstall wav2vec_toolkit before reinstalling it in editable
    mode with the -e flag.)

  4. Create a new branch to hold your development changes:

    git checkout -b a-descriptive-name-for-my-changes
    

    Do not work on the main branch.

  5. Develop the features on your branch.

    1. Adding a new language here
  6. Format your code. Run black and isort with the following commands so that your newly added files are formatted consistently:

    black --line-length 119 --target-version py36 src scripts languages
    isort src scripts languages
    
  7. Once you're happy with your implementation, add your changes and make a commit to record your changes locally:

    git add .
    git commit
    

    It is a good idea to sync your copy of the code with the original
    repository regularly. This way you can quickly account for changes:

    git fetch upstream
    git rebase upstream/main
    

    Push the changes to your account using:

    git push -u origin a-descriptive-name-for-my-changes
    
  8. Once you are satisfied, go to the webpage of your fork on GitHub. Click on "Pull request" to send your changes to the project maintainers for review.
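
Before committing (steps 4 to 7), it can help to confirm that you are not on a default branch and that the formatters report no pending changes. The following is a minimal sketch and is not part of wav2vec-toolkit: the function names is_default_branch, check_branch, and check_format are hypothetical, and it assumes git, black, and isort are available on your PATH. It relies on black's --check flag and isort's --check-only flag, which report problems without rewriting any files.

```shell
#!/bin/sh
# Hypothetical pre-commit helpers (an illustration, not shipped with the repository).

# Succeeds (exit status 0) if the given branch name is a default branch
# that contributors should not commit to directly.
is_default_branch() {
    case "$1" in
        main|master) return 0 ;;
        *) return 1 ;;
    esac
}

# Reads the currently checked-out branch and fails if it is a default branch.
# Returns 2 if git cannot determine the branch (e.g. outside a repository).
check_branch() {
    branch=$(git rev-parse --abbrev-ref HEAD) || return 2
    if is_default_branch "$branch"; then
        echo "On '$branch': create a feature branch first (git checkout -b <name>)." >&2
        return 1
    fi
    echo "OK: working on '$branch'."
}

# Runs the same formatters as step 6 in check-only mode, so nothing is modified.
# Fails if either tool reports files that would be reformatted.
check_format() {
    black --check --line-length 119 --target-version py36 src scripts languages &&
        isort --check-only src scripts languages
}
```

After sourcing this file, running check_branch && check_format from the repository root before git commit gives a quick sanity check; if check_format fails, rerun the commands from step 6 to apply the formatting.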

GitHub

https://github.com/anton-l/wav2vec-toolkit