This repository implements MLP-Mixer as proposed in MLP-Mixer: An all-MLP Architecture for Vision. The paper introduces an all MLP (Multi-layer Perceptron) architecture for computer vision tasks. Yannic Kilcher walks through the architecture in this video.

Experiments reported in this repository are on CIFAR-10.

What's included?

  • Distributed training with mixed-precision.
  • Visualization of the token-mixing MLP weights.
  • A TensorBoard callback to keep track of the learned linear projections of the image patches.


Note: These notebooks are runnable on Colab. If you don't have access to a tensor-core GPU, please disable the mixed-precision block while running the code.


MLP-Mixer achieves competitive results. The figure below summarizes top-1 accuracies on CIFAR-10 test set with respect to varying MLP blocks.

Notable hyperparameters are:

  • Image size: 72x72
  • Patch size: 9x9
  • Hidden dimension for patches: 64
  • Hidden dimension for patches: 128

The table below reports the parameter counts for the different MLP-Mixer variants:

ResNet20 (0.571969 Million) achieves 78.14% under the exact same training configuration. Refer to this notebook for more details.


You can reproduce the results reported above. The model files are available here.


ML-GDE Program for providing GCP credits.