Image Classification Using Deep Learning

Learning and building Convolutional Neural Networks using PyTorch. Models were selected based on the citation count of their papers (found via Papers With Code), along with architectures built on ideas that deviate from the typical CNN design, such as using transformers for image classification.

Image classification is a fundamental computer vision task with broad applications, such as self-driving cars, medical imaging, and video frame prediction. Each model here either introduces a new idea or builds on an existing one. We capture each of these ideas, experiment with them, and benchmark them against predefined criteria.

Try Out In Google Colab

Papers With Implementation

Base Config: { epochs: 10, lr: 0.001, batch_size: 128, img_resolution: 224, optim: adam }.
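In YAML form (the exact file name and key names are assumptions; check the repo's config file), the base config reads:

```yaml
# Hypothetical config.yaml mirroring the base settings above
epochs: 10
lr: 0.001
batch_size: 128
img_resolution: 224
optim: adam
```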

Some architectures, such as SqueezeNet, ShuffleNet, InceptionV3, EfficientNet, and Darknet-53, did not run at the base config because of their higher complexity; reducing the batch size allowed them to run on Google Colab and Kaggle.

Optimizer script: includes an approximate version of the optimizer setup used in each original paper. A few setups led to NaN loss and were tweaked to make them work.
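As a rough illustration, a paper-style optimizer setup often looks like the following (the SGD hyperparameters here are commonly published values, not necessarily the exact ones in this repo's script; gradient clipping is one of the tweaks that helps when the loss turns NaN):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for any of the CNNs above

# Paper-style setup: SGD with momentum and weight decay
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)

# If the loss turns NaN, a common tweak is to lower the LR and
# clip gradients before each optimizer step:
loss = model(torch.randn(4, 10)).sum()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```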

I've noticed that Google Colab provides a 12 GB GPU while Kaggle provides 16 GB. So in the worst case, I reduced the batch size to fit the Kaggle GPU. For reference, I use an RTX 2070 with 8 GB locally.
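The "reduce the batch size until it fits" routine can be sketched as a small helper (this is an illustration, not code from the repo; `run_step` stands for any callable that runs one training step at a given batch size):

```python
def fit_batch_size(run_step, batch_size=128, min_bs=8):
    """Halve the batch size until one training step fits in GPU memory."""
    while batch_size >= min_bs:
        try:
            run_step(batch_size)
            return batch_size
        except RuntimeError as e:
            # PyTorch raises RuntimeError with "out of memory" in its message
            if "out of memory" not in str(e):
                raise
            # On a real GPU you would call torch.cuda.empty_cache() here
            batch_size //= 2
    raise RuntimeError("Even the minimum batch size does not fit")
```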

| Model | Accuracy (%) | Parameters | FLOPs | Configuration | LR-Scheduler (Accuracy) |
|---|---|---|---|---|---|
| **CNN Based** | | | | | |
| AlexNet | 71.27 | 58.32M | 1.13 GFLOPs | - | CyclicLR (79.56) |
| VGGNet | 75.93 | 128.81M | 7.63 GFLOPs | - | - |
| Network In Network | 71.03 | 2.02M | 0.833 GFLOPs | - | - |
| ResNet | 83.39 | 11.18M | 1.82 GFLOPs | - | CyclicLR (74.9) |
| DenseNet-Depth40 | 68.25 | 0.18M | - | B_S = 8 | - |
| MobileNetV1 | 81.72 | 3.22M | 0.582 GFLOPs | - | - |
| MobileNetV2 | 83.99 | 2.24M | 0.318 GFLOPs | - | - |
| GoogLeNet | 80.28 | 5.98M | 1.59 GFLOPs | - | - |
| InceptionV3 | - | - | 209.45 GFLOPs | H_C_R | - |
| Darknet-53 | - | - | 7.14 GFLOPs | H_C_R | - |
| Xception | 85.9 | 20.83M | 4.63 GFLOPs | B_S = 96 | - |
| ResNeXt | - | - | 69.41 GFLOPs | H_C_R | - |
| SENet | 83.39 | 11.23M | 1.82 GFLOPs | - | CyclicLR (78.10) |
| SqueezeNet | 62.2 | 0.73M | 2.64 GFLOPs | B_S = 64 | - |
| ShuffleNet | - | - | 2.03 GFLOPs | B_S = 32 | - |
| EfficientNet-B0 | - | 4.02M | 0.4 GFLOPs | - | - |
| **Transformer Based** | | | | | |
| ViT | - | 53.59M | - | - | WarmupCosineSchedule (55.34) |
| **MLP Based** | | | | | |
| MLP-Mixer | 68.52 | 13.63M | - | - | WarmupLinearSchedule (69.5) |
| ResMLP | 65.5 | 14.97M | - | - | - |

B_S - Batch Size \
H_C_R - High Compute Required

Note: A few cells are marked high compute required because even with batch_size = 8, Kaggle's compute was not enough. Accuracy is relatively low because each model runs for only 10 epochs; with more epochs the models converge further. Learning-rate schedulers are underrated, so try out various schedulers to get the maximum out of the network.
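The schedulers in the table are available in `torch.optim.lr_scheduler`. A minimal CyclicLR sketch (the cycle bounds and step size here are illustrative assumptions, not the exact values used for the table):

```python
import torch

model = torch.nn.Linear(10, 2)  # stand-in for AlexNet/ResNet/SENet
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Cycle the LR between a low and a high bound; step once per batch
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-4, max_lr=1e-2, step_size_up=2000)

for _ in range(5):      # one scheduler step per training batch
    optimizer.step()
    scheduler.step()
```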

Google Colab Notebook to tune hyperparameters with Weights and Biases Visualizations

Create Environment

```shell
python -m venv CNNs
source CNNs/bin/activate
git clone
```


```shell
pip install -r requirements.txt
```


```shell
python --model=resnet
```

To Save Model

```shell
python --model=resnet --model_save=True
```

To Create Checkpoint

```shell
python --model=resnet --checkpoint=True
```

Note: Parameters can be changed in the YAML file. The module supports only two datasets, MNIST and CIFAR-10, but you can modify the dataset file to include other datasets.


By default, the plots of train vs. test accuracy and train vs. test loss are stored in the plot folder for each model.
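The plotting step can be sketched roughly like this (function name, history keys, and file naming are assumptions for illustration, not the repo's actual code):

```python
import os

import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

def save_curves(history, model_name, out_dir="plot"):
    """Save train-vs-test accuracy and loss curves, one PNG each."""
    os.makedirs(out_dir, exist_ok=True)
    for metric in ("accuracy", "loss"):
        plt.figure()
        plt.plot(history[f"train_{metric}"], label=f"train {metric}")
        plt.plot(history[f"test_{metric}"], label=f"test {metric}")
        plt.xlabel("epoch")
        plt.ylabel(metric)
        plt.legend()
        plt.savefig(os.path.join(out_dir, f"{model_name}_{metric}.png"))
        plt.close()
```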

| Model | Train vs Test Accuracy | Train vs Test Loss |
|---|---|---|
| AlexNet | AlexNet Accuracy Curve | AlexNet Loss Curve |
| ResNet | ResNet Accuracy Curve | ResNet Loss Curve |
| ViT | ViT Accuracy Curve | ViT Loss Curve |
| MLP-Mixer | MLP-Mixer Accuracy Curve | MLP-Mixer Loss Curve |
| ResMLP | ResMLP Accuracy Curve | ResMLP Loss Curve |
| SENet | SENet Accuracy Curve | SENet Loss Curve |