The NL-Augmenter is a collaborative effort intended to add transformations of datasets dealing with natural language. Transformations augment text datasets in diverse ways, including: introducing spelling errors, translating to a different language, randomizing names and numbers, paraphrasing ... and whatever creative augmentation you contribute to the benchmark. We invite submissions of transformations to this framework by way of GitHub pull request, through September 1, 2021. All submitters of accepted transformations (and filters) will be included as co-authors on a paper announcing this framework.
| Date | Milestone |
| --- | --- |
| September 1, 2021 | Pull request must be opened to be eligible for inclusion in the framework and associated paper |
| September 22, 2021 | Review process for pull request above must be complete |
A transformation can be revised between the pull request submission and pull request merge deadlines. We will provide reviewer feedback to help with the revisions.
To quickly see transformations and filters in action, run through our Colab notebook.
- Python 3.7

    # When creating a new transformation, replace this with your forked repository (see below)
    git clone https://github.com/GEM-benchmark/NL-Augmenter.git
    cd NL-Augmenter
    python setup.py sdist
    pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.0/en_core_web_sm-2.2.0.tar.gz

How do I create a transformation?
First, fork the repository in GitHub! :fork_and_knife:
Your fork will have its own location, which we will call `PATH_TO_YOUR_FORK`.
Next, clone the forked repository and create a branch for your transformation, which here we will call my_awesome_transformation:

    git clone $PATH_TO_YOUR_FORK
    cd NL-Augmenter
    git checkout -b my_awesome_transformation

We will base our transformation on an existing example.
Create a new transformation directory by copying over an existing transformation:

    cd transformations/
    cp -r butter_fingers_perturbation my_awesome_transformation
    cd my_awesome_transformation

Creating a transformation
- In the file `transformation.py`, rename the class to `MyAwesomeTransformation` and choose one of the interfaces from the `interfaces/` folder. See the full list of options here.
- Now put all your creativity into implementing the `generate` method. If you intend to use external libraries, add them with their version numbers to the project's requirements.
- Update `my_awesome_transformation/README.md` to describe your transformation.
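
As a rough illustration of the steps above, here is a minimal, self-contained sketch of what a sentence-level transformation might look like. The `SentenceOperation` class below is a simplified stand-in defined only for this example (in the repository you would import the real interface from the `interfaces/` folder instead). The `seed`, `max_outputs`, and `prob` parameters and the character-swapping logic are assumptions chosen for illustration, not a prescribed design.

```python
import random
from typing import List


class SentenceOperation:
    """Simplified stand-in for an interface from interfaces/ (assumption)."""

    def __init__(self, seed: int = 0, max_outputs: int = 1):
        self.seed = seed
        self.max_outputs = max_outputs

    def generate(self, sentence: str) -> List[str]:
        raise NotImplementedError


class MyAwesomeTransformation(SentenceOperation):
    """Hypothetical example: swaps two adjacent inner characters in some words."""

    def __init__(self, seed: int = 0, max_outputs: int = 1, prob: float = 0.5):
        super().__init__(seed=seed, max_outputs=max_outputs)
        self.prob = prob  # chance that an eligible word is perturbed

    def generate(self, sentence: str) -> List[str]:
        # A local, seeded generator keeps the output reproducible.
        rng = random.Random(self.seed)
        outputs = []
        for _ in range(self.max_outputs):
            words = []
            for word in sentence.split():
                # Only touch words with at least two inner characters.
                if len(word) > 3 and rng.random() < self.prob:
                    i = rng.randrange(1, len(word) - 2)
                    word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
                words.append(word)
            outputs.append(" ".join(words))
        return outputs


transformation = MyAwesomeTransformation(seed=42, max_outputs=2)
for variant in transformation.generate("Natural language augmentation is fun"):
    print(variant)
```

Seeding a local `random.Random` instance, rather than the global generator, is one way to make the outputs stable enough to record as test cases.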
Testing and evaluating (Optional)
Once you are done, add at least 5 example pairs as test cases in the file `test.json` so that no one breaks your code inadvertently.
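
One way to assemble such example pairs is to build the structure in Python and dump it as JSON. This is only a sketch: the field names (`type`, `test_cases`, `class`, `inputs`, `outputs`) are an assumption about the schema, so compare them against the `test.json` you copied from the existing transformation before relying on them. The helper `build_test_file` is hypothetical, and the expected outputs are placeholders to be replaced with what your transformation actually produces.

```python
import json
from typing import Dict, List, Tuple


def build_test_file(
    class_name: str, transformation_type: str, pairs: List[Tuple[str, str]]
) -> Dict:
    """Wrap (input, expected output) sentence pairs in an assumed test.json layout."""
    return {
        "type": transformation_type,
        "test_cases": [
            {
                "class": class_name,
                "inputs": {"sentence": source},
                "outputs": [{"sentence": expected}],
            }
            for source, expected in pairs
        ],
    }


# Placeholder pairs: replace the second element with your transformation's real output.
pairs = [
    ("Andrew finally returned the French book.", "PLACEHOLDER OUTPUT 1"),
    ("Sentences with gapping are hard to parse.", "PLACEHOLDER OUTPUT 2"),
    ("The weather in Paris is lovely today.", "PLACEHOLDER OUTPUT 3"),
    ("She bought three apples and two pears.", "PLACEHOLDER OUTPUT 4"),
    ("Neural networks can be surprisingly brittle.", "PLACEHOLDER OUTPUT 5"),
]

test_file = build_test_file("MyAwesomeTransformation", "my_awesome_transformation", pairs)
print(json.dumps(test_file, indent=4))
```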
Once the transformation is ready, test it:
pytest -s --t=my_awesome_transformation
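
Conceptually, a test of this kind replays each stored input through the transformation and compares the result with the stored output. The sketch below illustrates that idea only; it is not the repository's test runner. `ReverseWords` and `run_cases` are hypothetical names, and the case layout is an assumed one.

```python
from typing import Dict, List


class ReverseWords:
    """Hypothetical deterministic transformation used only for this sketch."""

    def generate(self, sentence: str) -> List[str]:
        return [" ".join(reversed(sentence.split()))]


def run_cases(transformation, cases: List[Dict]) -> List[str]:
    """Return failure messages; an empty list means every case passed."""
    failures = []
    for case in cases:
        source = case["inputs"]["sentence"]
        expected = [out["sentence"] for out in case["outputs"]]
        actual = transformation.generate(source)
        if actual != expected:
            failures.append(f"{source!r}: expected {expected}, got {actual}")
    return failures


cases = [
    {"inputs": {"sentence": "hello world"}, "outputs": [{"sentence": "world hello"}]},
    {"inputs": {"sentence": "a b c"}, "outputs": [{"sentence": "c b a"}]},
]
failures = run_cases(ReverseWords(), cases)
print("all cases passed" if not failures else "\n".join(failures))
```

Exact comparison like this only works when the transformation is deterministic, which is one reason to fix a random seed in transformations that sample.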
If you would like to evaluate your transformation against a common 🤗 HuggingFace model, we encourage you to check `evaluation`.
Code Styling
To standardize the code, we use the black code formatter, which runs at pre-commit time.
To use the pre-commit hook, run `pip install pre-commit` (it should already be installed if you followed the above instructions).
Then run `pre-commit install` to install the hook. On future commits, you should see the black code formatter run on all Python files you've staged for commit.
Once the tests pass and you are happy with the transformation, submit it for review.
First, commit and push your changes:

    git add transformations/my_awesome_transformation/*
    git commit -m "Added my_awesome_transformation"
    git push --set-upstream origin my_awesome_transformation

Finally, submit a pull request.
The `git push` command above prints a URL that can be copied into a browser to open the pull request.
Alternatively, you can do so from the GitHub website.
:sparkles: Congratulations, you've submitted a transformation to NL-Augmenter! :sparkles: