# DeepMind Perceiver in PyTorch

*Disclaimer: This is not official and I'm not affiliated with DeepMind.*

My implementation of the paper *Perceiver: General Perception with Iterative Attention*. You can read more about the model on DeepMind's website.

I trained an MNIST model, which you can find in `models/mnist.pkl` or load with `perceiver.load_mnist_model()`. It gets 96.02% accuracy on the test data.

## Getting started

To run this you need PyTorch installed:

`pip3 install torch`

From `perceiver` you can import `Perceiver` or `PerceiverLogits`.

Then you can use it as such (or look in `examples.ipynb`):

```python
from perceiver import Perceiver

model = Perceiver(
    input_channels,   # <- How many channels are in the input? E.g. 3 for RGB.
    input_shape,      # <- How big is the input along each dimension? E.g. (28, 28) for MNIST.
    fourier_bands=4,  # <- How many bands should the positional encoding have?
    latents=64,       # <- How many latent vectors?
    d_model=32,       # <- Model dimensionality. Every pixel/token/latent vector will have this size.
    heads=8,          # <- How many heads in self-attention? Cross-attention always has 1 head.
    latent_blocks=6,  # <- How many latent self-attention blocks per cross-attention with the input?
    dropout=0.1,      # <- Dropout probability.
    layers=8,         # <- This becomes two unique layer blocks: layer 1, and layers 2-8 (which share weights).
)
```
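A forward pass might then look like this (a minimal sketch: I'm assuming a channels-last input layout, i.e. `(batch, *input_shape, input_channels)`, and that the output is shaped `(batch, latents, d_model)`; see `examples.ipynb` for the exact usage):

```python
import torch
from perceiver import Perceiver

model = Perceiver(input_channels=1, input_shape=(28, 28))
x = torch.rand(16, 28, 28, 1)  # batch of 16 MNIST-sized images (channels-last layout assumed)
latents = model(x)             # latents after the final layer, presumably (16, 64, 32)
```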

The above model outputs the latents after the final layer. If you want logits instead, use the following model:

```python
from perceiver import PerceiverLogits

model = PerceiverLogits(
    input_channels,    # <- How many channels are in the input? E.g. 3 for RGB.
    input_shape,       # <- How big is the input along each dimension? E.g. (28, 28) for MNIST.
    output_features,   # <- How many different classes? E.g. 10 for MNIST.
    fourier_bands=4,   # <- How many bands should the positional encoding have?
    latents=64,        # <- How many latent vectors?
    d_model=32,        # <- Model dimensionality. Every pixel/token/latent vector will have this size.
    heads=8,           # <- How many heads in self-attention? Cross-attention always has 1 head.
    latent_blocks=6,   # <- How many latent self-attention blocks per cross-attention with the input?
    dropout=0.1,       # <- Dropout probability.
    layers=8,          # <- This becomes two unique layer blocks: layer 1, and layers 2-8 (which share weights).
)
```
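Used for classification, that might look like the following (again a sketch, under the same input-layout assumption):

```python
import torch
from perceiver import PerceiverLogits

model = PerceiverLogits(input_channels=1, input_shape=(28, 28), output_features=10)
x = torch.rand(16, 28, 28, 1)  # channels-last input layout assumed
logits = model(x)              # class scores, presumably shaped (16, 10)
preds = logits.argmax(dim=-1)  # predicted digit per image
```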

To use my pre-trained MNIST model:

```python
from perceiver import load_mnist_model

model = load_mnist_model()
```
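To reproduce the test accuracy, an evaluation loop along these lines should do (a sketch that additionally assumes you have `torchvision` installed and that the model takes channels-last tensors):

```python
import torch
from torchvision import datasets, transforms
from perceiver import load_mnist_model

model = load_mnist_model()
model.eval()

test_set = datasets.MNIST("data", train=False, download=True, transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(test_set, batch_size=256)

correct = 0
with torch.no_grad():
    for images, labels in loader:
        x = images.permute(0, 2, 3, 1)  # NCHW -> channels-last (assumed input layout)
        correct += (model(x).argmax(dim=-1) == labels).sum().item()

print(f"Test accuracy: {correct / len(test_set):.2%}")
```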

## TODO

- Positional embedding generalized to *n* dimensions (with Fourier features); see the sketch below
- Train other models (e.g. CIFAR-100, or something outside the image domain)
- Type hints
- Unit tests for components of the model
- Package
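For reference, the paper's Fourier-feature positional encoding generalizes naturally to *n* dimensions: for each normalized coordinate it concatenates the raw value with sine/cosine features at log-spaced frequencies up to each axis's Nyquist frequency. The sketch below is my reading of that scheme, not necessarily what this repo implements today; `fourier_encode` is a hypothetical helper:

```python
import math
import torch

def fourier_encode(input_shape, bands):
    """Fourier-feature positional encoding for an n-dimensional grid (hypothetical helper)."""
    # Normalized coordinates in [-1, 1] along each of the n axes.
    axes = [torch.linspace(-1.0, 1.0, steps=size) for size in input_shape]
    pos = torch.stack(torch.meshgrid(*axes, indexing="ij"), dim=-1)  # (*input_shape, n)
    encodings = [pos]  # the paper also concatenates the raw coordinates
    for i, size in enumerate(input_shape):
        # Log-spaced frequencies from 1 up to the Nyquist frequency of this axis.
        freqs = torch.logspace(0.0, math.log10(size / 2), steps=bands)
        angles = pos[..., i : i + 1] * freqs * math.pi  # (*input_shape, bands)
        encodings += [torch.sin(angles), torch.cos(angles)]
    return torch.cat(encodings, dim=-1)  # (*input_shape, n * (2 * bands + 1))

enc = fourier_encode((28, 28), bands=4)
print(enc.shape)  # torch.Size([28, 28, 18])
```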