[TensorFlow 2] Attention is all you need (Transformer)

TensorFlow 2 implementation of “Attention Is All You Need” (the Transformer) [1].
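The building block behind the model is scaled dot-product attention from the paper [1]: Attention(Q, K, V) = softmax(QKᵀ / √d_k) V. The sketch below shows this single operation in TensorFlow 2; the function name and shapes are illustrative, not taken from this repository.

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_k); illustrative helper, not this repo's API.
    d_k = tf.cast(tf.shape(k)[-1], tf.float32)
    scores = tf.matmul(q, k, transpose_b=True) / tf.sqrt(d_k)  # (batch, S_q, S_k)
    weights = tf.nn.softmax(scores, axis=-1)  # rows sum to 1 over the key axis
    return tf.matmul(weights, v)              # weighted sum of the values
```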

Dataset

The MNIST dataset is used to verify that the Transformer works.
Each image is converted into a sequential form as follows (a code sketch follows the list).

  • Trim the sides off the square image.
    • (H X W) -> (H X W_trim)
      • H (Height) = W (Width) = 28
      • W_trim = 18
    • The height axis is treated as the sequence axis, and the width axis as the feature of each step.
      • (H X W) = (S X F)
      • S (Sequence) = 28
      • F (Feature) = 18
  • Define the target Y as the reversed sequence of X, so that the target differs from the input sequence.
    • In the figures, the target therefore appears upside down.
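A minimal sketch of this preprocessing, assuming the trim removes 5 columns from each side of the image (the exact crop indices are an assumption, not taken from this repository):

```python
import tensorflow as tf

# Load MNIST and scale pixel values to [0, 1].
(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x = x_train.astype("float32") / 255.0  # (N, 28, 28)

# Trim the sides: (H x W) -> (H x W_trim) = (28 x 18).
# Assumption: 5 columns removed from each side (columns 5:23).
x = x[:, :, 5:23]

# Height axis -> sequence (S = 28), width axis -> features (F = 18).
# Target Y is the input reversed along the sequence axis (upside down).
y = x[:, ::-1, :]
```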

Results

Training

Generation
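At inference time the target sequence is not available, so generation proceeds autoregressively: the decoder is fed its own previous outputs one step at a time. A hypothetical sketch, assuming a trained seq2seq `model` that maps (source, partial target) to per-step predictions; the model signature and the zero start step are assumptions, not this repository's API.

```python
import tensorflow as tf

def generate(model, src, seq_len=28, feat=18):
    # src: (batch, 28, 18). Start from a zero "start-of-sequence" step
    # (assumption), then append the newest prediction each iteration.
    out = tf.zeros([tf.shape(src)[0], 1, feat])
    for _ in range(seq_len):
        pred = model([src, out], training=False)         # (batch, t, feat)
        out = tf.concat([out, pred[:, -1:, :]], axis=1)  # grow by one step
    return out[:, 1:, :]                                 # drop the start step
```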

Attention maps and reconstructions for each class (digits 0–9) are shown as figures in the repository.

Requirements

  • Python 3.x
  • TensorFlow 2.x

Reference

[1] Vaswani, Ashish, et al. “Attention Is All You Need.” Advances in Neural Information Processing Systems 30 (2017).
