This repository implements MLP-Mixer as proposed in MLP-Mixer: An all-MLP Architecture for Vision. The paper introduces an all-MLP (multi-layer perceptron) architecture for computer vision tasks. Yannic Kilcher walks through the architecture in this video.
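The paper's building block can be sketched in NumPy as below. This is an illustrative single-image forward pass, not the repository's Keras implementation. The helper names, the initialization, the channel-mixing width of 128, and the embedding size `C = 96` are hypothetical; 64 patches and a token-mixing width of 64 follow from the hyperparameters listed in this README.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, the nonlinearity used in the paper's MLPs
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-6):
    # Normalize over the last axis; learnable scale and shift omitted for brevity
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def mlp(x, w1, b1, w2, b2):
    # Dense -> GELU -> Dense, applied along the last axis
    return gelu(x @ w1 + b1) @ w2 + b2

def mixer_block(x, token_params, channel_params):
    """One Mixer block for a single image; x has shape (num_patches, channels)."""
    # Token mixing: transpose so the MLP mixes across patches, shared by all channels
    x = x + mlp(layer_norm(x).T, *token_params).T
    # Channel mixing: the MLP mixes across channels, shared by all patches
    x = x + mlp(layer_norm(x), *channel_params)
    return x

def init_mlp(rng, dim, hidden):
    # Small random weights and zero biases (illustrative initialization only)
    return (rng.normal(0.0, 0.02, (dim, hidden)), np.zeros(hidden),
            rng.normal(0.0, 0.02, (hidden, dim)), np.zeros(dim))

rng = np.random.default_rng(0)
S, C, D_S, D_C = 64, 96, 64, 128   # C = 96 and D_C = 128 are hypothetical here
tok = init_mlp(rng, S, D_S)
ch = init_mlp(rng, C, D_C)
x = rng.normal(size=(S, C))
out = mixer_block(x, tok, ch)
print(out.shape)  # (64, 96)
```

Both MLPs keep the input shape, so blocks can be stacked; a full model would add the patch projection before the blocks and a pooling and classification head after them.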
Experiments reported in this repository are on CIFAR-10.
- Distributed training with mixed precision.
- Visualization of the token-mixing MLP weights.
- A TensorBoard callback to keep track of the learned linear projections of the image patches.
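The weight visualizations above come down to reshaping dense-layer kernels. A minimal NumPy sketch, assuming Keras-style `(inputs, units)` kernels, patches flattened in row-major order, and pixels within a patch flattened as (height, width, channel), the layout `tf.image.extract_patches` produces. The function names and the embedding size 96 are hypothetical, and random arrays stand in for trained weights.

```python
import numpy as np

def token_mixing_maps(kernel, grid_size):
    """(num_patches, hidden) token-mixing kernel -> (hidden, grid, grid) maps."""
    num_patches, hidden = kernel.shape
    assert num_patches == grid_size * grid_size, "kernel rows must match the patch grid"
    # Column j holds hidden unit j's weight for every patch position (row-major order)
    return kernel.T.reshape(hidden, grid_size, grid_size)

def projection_filters(kernel, patch_size, channels=3):
    """(patch_size*patch_size*channels, embed_dim) kernel -> (embed_dim, p, p, channels)."""
    # Each output unit's weights over one flattened patch become a small RGB image
    return kernel.T.reshape(-1, patch_size, patch_size, channels)

rng = np.random.default_rng(0)
token_kernel = rng.normal(size=(64, 64))   # 8x8 grid of patches, 64 hidden units
proj_kernel = rng.normal(size=(243, 96))   # 9*9*3 pixels -> hypothetical 96-dim embedding
maps = token_mixing_maps(token_kernel, grid_size=8)
filters = projection_filters(proj_kernel, patch_size=9)
print(maps.shape, filters.shape)  # (64, 8, 8) (96, 9, 9, 3)
```

Each map or filter can then be displayed as an image, for example with Matplotlib's `imshow`, usually after rescaling it to [0, 1].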
- MLP_Mixer_Training.ipynb: MLP-Mixer utilities along with model training.
- ResNet20.ipynb: Trains a ResNet20 for comparison purposes.
- Visualization.ipynb: Visualizes the learned projections and token-mixing MLPs.
Note: These notebooks run on Colab. If you don't have access to a GPU with Tensor Cores, disable the mixed-precision block before running the code.
MLP-Mixer achieves competitive results. The figure below summarizes top-1 accuracies on the CIFAR-10 test set for different numbers of MLP blocks.
Notable hyperparameters are:
- Image size: 72x72
- Patch size: 9x9
- Hidden dimension for patches: 64
- Hidden dimension for channels: 128
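The listed sizes imply a few derived quantities. A small Python sketch, assuming the 64 is the token-mixing MLP width D_S and the 128 the channel-mixing width D_C in the paper's notation, and that images have 3 color channels. The embedding size C is not stated in this README, so the hypothetical helper `channel_mlp_params` takes it as an argument, and the value 96 is only an example.

```python
def mixer_sizes(image_size=72, patch_size=9, channels=3, tokens_hidden=64):
    """Derived sizes for the listed hyperparameters (biases included, LayerNorm excluded)."""
    assert image_size % patch_size == 0, "patches must tile the image exactly"
    grid = image_size // patch_size                   # patches per side: 72 / 9 = 8
    num_patches = grid * grid                         # sequence length S = 64
    patch_dim = patch_size * patch_size * channels    # pixels per flattened patch: 243
    # Token-mixing MLP per block: Dense(S -> D_S) then Dense(D_S -> S), with biases
    token_mlp = 2 * num_patches * tokens_hidden + tokens_hidden + num_patches
    return grid, num_patches, patch_dim, token_mlp

def channel_mlp_params(embed_dim, channels_hidden=128):
    # Channel-mixing MLP per block: Dense(C -> D_C) then Dense(D_C -> C), with biases
    return 2 * embed_dim * channels_hidden + channels_hidden + embed_dim

print(mixer_sizes())            # (8, 64, 243, 8320)
print(channel_mlp_params(96))   # 24800 for a hypothetical C = 96
```

Because the token-mixing MLP depends only on the number of patches, its size grows quadratically with 72 / 9 = 8 patches per side, while the channel-mixing MLP scales with the embedding size.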
The table below reports the parameter counts for the different MLP-Mixer variants:
ResNet20 (0.571969 million parameters) achieves 78.14% top-1 accuracy on the CIFAR-10 test set under the exact same training configuration. Refer to this notebook for more details.
You can reproduce the results reported above. The model files are available here.
Thanks to the ML-GDE Program for providing GCP credits.