A PyTorch Implementation of the Transformer: Attention Is All You Need
Our implementation is largely based on the TensorFlow implementation.
- NumPy >= 1.11.1
- PyTorch >= 0.3.0
- tensorboard-pytorch (build from source)
Why This Project?
I’m new to PyTorch, so I have been implementing some projects with it. Recently, I read the paper Attention Is All You Need and was impressed by the idea, so here it is. I got results similar to the original TensorFlow implementation.
Differences with the original paper
I don’t intend to replicate the paper exactly. Rather, I aim to implement the main ideas in the paper and verify them in a SIMPLE and QUICK way. In this respect, some parts of my code differ from those in the paper. Among them are:
- I used the IWSLT 2016 de-en dataset, not the WMT dataset, because the former is much smaller and requires no special preprocessing.
- I constructed the vocabulary from words, not subwords, for simplicity. Of course, you can try BPE or WordPiece if you want.
- I parameterized the positional encoding. The paper used a sinusoidal formula, but Noam, one of the authors, says both work. See the discussion on Reddit.
- The paper adjusted the learning rate according to the global step. I fixed the learning rate at a small value, 0.0001, simply because training was reasonably fast with the small dataset (only a couple of hours on a single GTX 1060!).
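The parameterized positional encoding mentioned above amounts to a trainable embedding table indexed by position, added to the token embeddings. A minimal sketch of the idea (the class and attribute names are illustrative, not the repo's actual code):

```python
import torch
import torch.nn as nn

class LearnedPositionalEncoding(nn.Module):
    """Positional encoding as a trainable embedding table,
    instead of the paper's fixed sinusoids."""

    def __init__(self, max_len, d_model):
        super().__init__()
        self.pos_emb = nn.Embedding(max_len, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model) token embeddings
        positions = torch.arange(x.size(1), device=x.device)  # (seq_len,)
        # Look up one learned vector per position; broadcast over the batch.
        return x + self.pos_emb(positions)

x = torch.zeros(2, 10, 16)                                # dummy token embeddings
pe = LearnedPositionalEncoding(max_len=50, d_model=16)
out = pe(x)                                               # shape (2, 10, 16)
```

Because the table is an ordinary `nn.Embedding`, it is trained jointly with the rest of the model by backpropagation, which is why it can substitute for the sinusoidal formula.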
- `hyperparams.py` includes all hyperparameters that are needed.
- `prepro.py` creates vocabulary files for the source and the target.
- `data_load.py` contains functions for loading and batching data.
- `modules.py` has all building blocks for the encoder/decoder networks.
- `train.py` has the model.
- `eval.py` is for evaluation.
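The central building block in `modules.py` is multi-head attention, whose core operation is scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. A self-contained sketch of that operation (the function name is illustrative, not necessarily the one used in the repo):

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Compute softmax(QK^T / sqrt(d_k)) V.

    q: (..., len_q, d_k), k: (..., len_k, d_k), v: (..., len_k, d_v)
    """
    d_k = q.size(-1)
    # Similarity of every query with every key, scaled to keep softmax stable.
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)     # (..., len_q, len_k)
    if mask is not None:
        # Positions where mask == 0 get -inf, i.e. zero attention weight.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)                   # rows sum to 1
    return weights @ v, weights

q = torch.randn(1, 5, 8)
k = torch.randn(1, 6, 8)
v = torch.randn(1, 6, 8)
out, w = scaled_dot_product_attention(q, k, v)            # out: (1, 5, 8)
```

Multi-head attention runs this operation in parallel over several linearly projected subspaces and concatenates the results.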
- STEP 1. Download the IWSLT 2016 German–English parallel corpus and extract it to the `corpora` folder:
wget -qO- https://wit3.fbk.eu/archive/2016-01//texts/de/en/de-en.tgz | tar xz; mv de-en corpora
- STEP 2. Adjust hyperparameters in `hyperparams.py`.
- STEP 3. Run `prepro.py` to generate the vocabulary files.
- STEP 4. Run `train.py`, or download the pretrained weights and put them into the `./models/` folder.
- STEP 5. Show loss and accuracy in TensorBoard:
tensorboard --logdir runs
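STEP 3's word-level vocabulary construction can be sketched roughly as follows (the helper name, special tokens, and frequency cutoff are illustrative assumptions, not necessarily what `prepro.py` does):

```python
from collections import Counter

def build_vocab(sentences, min_count=2):
    """Count whitespace-separated words and keep the frequent ones,
    reserving the first ids for padding/unknown/start/end tokens."""
    counter = Counter(w for s in sentences for w in s.split())
    vocab = ["<pad>", "<unk>", "<s>", "</s>"]
    vocab += [w for w, c in counter.most_common() if c >= min_count]
    return {w: i for i, w in enumerate(vocab)}

word2idx = build_vocab(["ich bin ein baum", "das ist ein baum"], min_count=2)
# only "ein" and "baum" appear at least twice, so they survive the cutoff
```

Rare words fall below the cutoff and are mapped to `<unk>` at encoding time, which keeps the vocabulary (and the softmax layer) small.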
I got a BLEU score of 16.7 (the TensorFlow implementation scores 17.14). (Recall that I trained on a small dataset with a limited vocabulary.) Some of the evaluation results are as follows.
source: Ich bin nicht sicher was ich antworten soll
expected: I’m not really sure about the answer
got: I’m not sure what I’m going to answer
source: Was macht den Unterschied aus
expected: What makes his story different
got: What makes a difference
source: Vielen Dank
expected: Thank you
got: Thank you
source: Das ist ein Baum
expected: This is a tree
got: So this is a tree