modelvshuman is a Python library to benchmark the gap between human and machine vision. Using this library, both PyTorch and TensorFlow models can be evaluated on 17 out-of-distribution datasets with high-quality human comparison data.
The top-10 models are listed here; training dataset size is indicated in brackets. Additionally, standard ResNet-50 is included as the last entry of the table for comparison. Model ranks are calculated across the full range of 52 models that we tested. If your model scores better than some (or even all) of the models here, please open a pull request and we'll be happy to include it here!
Most human-like behaviour
| winner | model | accuracy difference ↓ | observed consistency ↑ | error consistency ↑ | mean rank ↓ |
|---|---|---|---|---|---|
| :1st_place_medal: | CLIP: ViT-B (400M) | .023 | .758 | .281 | 1 |
| :2nd_place_medal: | SWSL: ResNeXt-101 (940M) | .028 | .752 | .237 | 3.67 |
| :3rd_place_medal: | BiT-M: ResNet-101x1 (14M) | .034 | .733 | .252 | 4 |
| :clap: | BiT-M: ResNet-152x2 (14M) | .035 | .737 | .243 | 4.67 |
| :clap: | BiT-M: ResNet-152x4 (14M) | .035 | .732 | .233 | 7.33 |
| :clap: | BiT-M: ResNet-50x1 (14M) | .042 | .718 | .240 | 9 |
| :clap: | BiT-M: ResNet-50x3 (14M) | .040 | .726 | .228 | 9 |
| :clap: | SWSL: ResNet-50 (940M) | .041 | .727 | .211 | 11.33 |
| ... | standard ResNet-50 (1M) | .087 | .665 | .208 | 29 |
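The three behavioural metrics in the table can be illustrated on a single pair of response sequences. The sketch below assumes the usual trial-level definitions: accuracy difference as the absolute gap between two accuracies, observed consistency as the fraction of trials on which both observers are either right or wrong, and error consistency as Cohen's kappa computed from that agreement. The function names are illustrative and not part of modelvshuman, and the aggregation over datasets, conditions and human observers behind the table values is not shown.

```python
from typing import Sequence


def accuracy(correct: Sequence[bool]) -> float:
    """Fraction of trials answered correctly."""
    return sum(correct) / len(correct)


def accuracy_difference(human: Sequence[bool], model: Sequence[bool]) -> float:
    """Absolute gap between human and model accuracy (lower is better)."""
    return abs(accuracy(human) - accuracy(model))


def observed_consistency(human: Sequence[bool], model: Sequence[bool]) -> float:
    """Fraction of trials on which both are right or both are wrong."""
    agree = sum(h == m for h, m in zip(human, model))
    return agree / len(human)


def error_consistency(human: Sequence[bool], model: Sequence[bool]) -> float:
    """Cohen's kappa: agreement beyond what two independent observers
    with the same accuracies would reach by chance."""
    p_h, p_m = accuracy(human), accuracy(model)
    expected = p_h * p_m + (1 - p_h) * (1 - p_m)
    if expected == 1.0:
        # Both always right or both always wrong: kappa is undefined,
        # so this sketch falls back to 0.0 (an arbitrary choice).
        return 0.0
    return (observed_consistency(human, model) - expected) / (1 - expected)


# Toy per-trial correctness for one human observer and one model.
human = [True, True, True, False]
model = [True, True, False, False]
print(accuracy_difference(human, model))
print(observed_consistency(human, model))
print(error_consistency(human, model))
```

Error consistency is the stricter of the two agreement measures: two observers with high accuracy agree on many trials by chance alone, and kappa subtracts that expected overlap.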
Highest out-of-distribution robustness
| winner | model | OOD accuracy ↑ | rank ↓ |
|---|---|---|---|
| :2nd_place_medal: | CLIP: ViT-B (400M) | .708 | 2 |
| :clap: | SWSL: ResNeXt-101 (940M) | .698 | 4 |
| :clap: | BiT-M: ResNet-152x2 (14M) | .694 | 5 |
| :clap: | BiT-M: ResNet-152x4 (14M) | .688 | 6 |
| :clap: | BiT-M: ResNet-101x3 (14M) | .682 | 7 |
| :clap: | BiT-M: ResNet-50x3 (14M) | .679 | 8 |
| :clap: | SimCLR: ResNet-50x4 (1M) | .677 | 9 |
| :clap: | SWSL: ResNet-50 (940M) | .677 | 10 |
| ... | standard ResNet-50 (1M) | .559 | 31 |
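The rank columns can be approximated with a small ranking helper. The sketch below is an assumption about the procedure, not the project's code: it ranks models per metric (respecting the ↑/↓ direction) and takes "mean rank" to be the average of the per-metric ranks. The values are copied from three rows of the "most human-like behaviour" table, so the resulting ranks are relative to those three models only and differ from the table, which ranks across all 52 tested models.

```python
from __future__ import annotations


def rank_models(scores: dict[str, float], higher_is_better: bool) -> dict[str, int]:
    """Map each model to its 1-based rank; ties are broken by input order."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


def mean_rank(per_metric_ranks: list[dict[str, int]]) -> dict[str, float]:
    """Average each model's rank over all metrics (assumed aggregation)."""
    models = per_metric_ranks[0].keys()
    return {
        name: sum(ranks[name] for ranks in per_metric_ranks) / len(per_metric_ranks)
        for name in models
    }


# Values taken from three rows of the table above.
acc_diff = {"CLIP: ViT-B": 0.023, "SWSL: ResNeXt-101": 0.028, "BiT-M: ResNet-101x1": 0.034}
obs_cons = {"CLIP: ViT-B": 0.758, "SWSL: ResNeXt-101": 0.752, "BiT-M: ResNet-101x1": 0.733}
err_cons = {"CLIP: ViT-B": 0.281, "SWSL: ResNeXt-101": 0.237, "BiT-M: ResNet-101x1": 0.252}

ranks = [
    rank_models(acc_diff, higher_is_better=False),  # lower difference is better
    rank_models(obs_cons, higher_is_better=True),
    rank_models(err_cons, higher_is_better=True),
]
result = mean_rank(ranks)
for name, value in sorted(result.items(), key=lambda item: item[1]):
    print(f"{name}: {value:.2f}")
```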
Simply clone the repository to a location of your choice and follow these steps:
Set the repository home path by running the following from the command line:
Install the package (remove the `-e` option if you don't intend to add your own model or make any other changes):

```bash
pip install -e .
```
:microscope: User experience
Edit `examples/evaluate.py` as desired. Running it will test a list of models on out-of-distribution datasets and generate plots. If you then compile `latex-report/report.tex`, all the plots will be included in one convenient PDF report.
:camel: Model zoo
The following models are currently implemented:
- [x] 20+ standard supervised models from the torchvision model zoo
- [x] 5 self-supervised contrastive models (InsDis, MoCo, MoCoV2, InfoMin, PIRL) from the pycontrast repo
- [x] 3 self-supervised contrastive SimCLR model variants (simclr_resnet50x1, simclr_resnet50x2, simclr_resnet50x4) from the ptrnet repo
- [x] 3 vision transformer variants (vit_small_patch16_224, vit_base_patch16_224 and vit_large_patch16_224) from the pytorch-image-models repo
- [x] 10 adversarially "robust" models from the robust-models-transfer repo implemented via the ptrnet repo
- [x] 3 "ShapeNet" ResNet-50 models with different degrees of stylized training from the texture-vs-shape repo
- [x] 3 BagNet models from the BagNet repo
- [x] 1 semi-supervised ResNet-50 model pre-trained on 940M images from the semi-supervised-ImageNet1K-models repo
- [x] 6 Big Transfer models from the pytorch-image-models repo
If you add or implement your own model, please compute its ImageNet accuracy as a sanity check.
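A sanity check of this kind boils down to top-1 accuracy on a labelled validation set. The following is a minimal, framework-independent sketch: `top1_accuracy` is a hypothetical helper, and the scores and labels are toy stand-ins for model outputs and ImageNet class indices. In practice the scores would come from running the newly added model on validation images.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of samples whose highest-scoring class equals the label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for sample_scores, label in zip(scores, labels):
        # Index of the largest score is the predicted class.
        predicted = max(range(len(sample_scores)), key=sample_scores.__getitem__)
        hits += predicted == label
    return hits / len(labels)


# Toy example with two classes and three samples.
scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
labels = [1, 0, 0]
print(top1_accuracy(scores, labels))
```

If the result is far from the accuracy reported for the checkpoint, preprocessing (resizing, normalisation) and the class-index mapping are typical places to look first.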
How to load a model
If you just want to load a model from the model zoo, this is what you can do:
```python
# loading a PyTorch model from the zoo
from modelvshuman.models.pytorch.model_zoo import InfoMin
model = InfoMin("InfoMin")

# loading a TensorFlow model from the zoo
from modelvshuman.models.tensorflow.model_zoo import efficientnet_b0
model = efficientnet_b0("efficientnet_b0")
```
How to list all available models
All implemented models are registered by the model registry, which can then be used to list all available models of a certain framework with the following method:
```python
from modelvshuman import models

print(models.list_models("pytorch"))
print(models.list_models("tensorflow"))
```
How to add a new model
Adding a new model is possible for standard PyTorch and TensorFlow models. Depending on the framework (`pytorch` / `tensorflow`), open `modelvshuman/models/<framework>/model_zoo.py`. Here, you can add your own model with a few lines of code, similar to how you would usually load it. If your model has a custom model definition, place it in a new subdirectory, e.g. `modelvshuman/models/<framework>/my_fancy_model/fancy_model.py`, which you can then import in `model_zoo.py` via `from .my_fancy_model import fancy_model`.
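The relative import only resolves if the custom definition sits in a subpackage next to `model_zoo.py`. The sketch below rebuilds that layout in a temporary directory to show the import working. The package name `demo_zoo` stands in for `modelvshuman/models/<framework>/`, and `build` and `load_fancy_model` are hypothetical names with placeholder bodies.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package that mirrors the described layout.
root = Path(tempfile.mkdtemp())
package = root / "demo_zoo"          # stand-in for models/<framework>/
custom = package / "my_fancy_model"  # subdirectory for the custom definition
custom.mkdir(parents=True)

(package / "__init__.py").write_text("")
(custom / "__init__.py").write_text("")
(custom / "fancy_model.py").write_text(
    "def build():\n"
    "    # Placeholder for a custom architecture definition.\n"
    "    return 'fancy-model-instance'\n"
)
(package / "model_zoo.py").write_text(
    "from .my_fancy_model import fancy_model\n"
    "\n"
    "def load_fancy_model():\n"
    "    return fancy_model.build()\n"
)

# Make the temporary package importable and load its model zoo.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
model_zoo = importlib.import_module("demo_zoo.model_zoo")
print(model_zoo.load_fancy_model())
```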
In total, 17 datasets with human comparison data collected under highly controlled laboratory conditions are available.
Twelve datasets correspond to parametric or binary image distortions. Top row: colour/grayscale, contrast, high-pass, low-pass (blurring), phase noise, power equalisation. Bottom row: opponent colour, rotation, Eidolon I, II and III, uniform noise.
The remaining five datasets correspond to the following nonparametric image manipulations: sketch, stylized, edge, silhouette, texture-shape cue conflict.
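To give a feel for what "parametric" means here, the sketch below applies two such distortions, contrast reduction and additive uniform noise, to a tiny grayscale image, each controlled by a single parameter. The formulas are a common way to parameterise these manipulations and are an assumption; the exact procedures used to generate the datasets are not described in this section.

```python
import random
from typing import List

Image = List[List[float]]  # grayscale image, pixel values in [0, 1]


def reduce_contrast(image: Image, level: float) -> Image:
    """Blend each pixel towards mid-grey.

    level=1.0 keeps the image unchanged, level=0.0 yields flat grey.
    """
    return [[(1 - level) / 2 + level * px for px in row] for row in image]


def add_uniform_noise(image: Image, width: float, rng: random.Random) -> Image:
    """Add noise drawn from U(-width, width) to each pixel, clipped to [0, 1]."""
    return [
        [min(1.0, max(0.0, px + rng.uniform(-width, width))) for px in row]
        for row in image
    ]


image = [[0.0, 0.5], [1.0, 0.25]]
low_contrast = reduce_contrast(image, level=0.5)
noisy = add_uniform_noise(image, width=0.2, rng=random.Random(0))
print(low_contrast)
print(noisy)
```

Sweeping the parameter (here `level` or `width`) from mild to severe is what allows accuracy to be plotted as a function of distortion strength for both humans and models.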
How to load a dataset
Similarly, if you're interested in just loading a dataset, you can do this via:
```python
from modelvshuman.datasets import sketch

dataset = sketch(batch_size=16, num_workers=4)
```
How to list all available datasets
```python
from modelvshuman import datasets

print(list(datasets.list_datasets().keys()))
```