Estimating the instrinsic dimensionality of image datasets

Code for: The Intrinsic Dimensionaity of Images and Its Impact On Learning – Phillip Pope and Chen Zhu, Ahmed Abdelkader, Micah Goldblum, Tom Goldstein (ICLR 2021, spotlight)

Basenjis of Varying dimensionality


This code was developed in the following environment

conda create dimensions python=3.6 jupyter matplotlib scikit-learn pytorch==1.5.0 torchvision cudatoolkit=10.2 -c pytorch

To generate new data of controlled dimensionality with GANs, you must install:

pip install pytorch-pretrained-biggan

To use the shortest-path method (Granata and Carnevale 2016) you must also compile the fast graph shortest path code gsp (written by Jake VdP + Sci-Kit Learn)

cd estimators/gsp
python setup.py install

Generate data of controlled dimensionality

python generate_data/gen_images.py \
  --num_samples 1000 \
  --class_name basenji \
  --latent_dim 16 \
  --batch_size 100 \
  --save_dir samples/basenji_16

Estimate dimension of generated samples

To run the MLE (Levina and Bickel) estimator on the synthetic GAN data generated above:

python main.py \
    --estimator mle \
    --k1 25 \
    --single-k \
    --eval-every-k \
    --average-inverse \
    --dset  samples/basenji_16 \
    --max_num_samples 1000 \
    --save-path results/basenji_16.json

Use --estimators to try different estimators


If you find our paper or code useful, please cite our paper:

  author    = {Phillip Pope and
               Chen Zhu and
               Ahmed Abdelkader and
               Micah Goldblum and
               Tom Goldstein},
  title     = {The Intrinsic Dimension of Images and Its Impact on Learning},
  booktitle = {9th International Conference on Learning Representations, {ICLR} 2021,
               Virtual Event, Austria, May 3-7, 2021},
  publisher = {OpenReview.net},
  year      = {2021},
  url       = {https://openreview.net/forum?id=XJk19XzGq2J},
  timestamp = {Wed, 23 Jun 2021 17:36:39 +0200},
  biburl    = {https://dblp.org/rec/conf/iclr/PopeZAGG21.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}


We gratefully acknowledge use of the following codebases when developing our dimensionality estimators:

We also thank Prof. Vishnu Boddeti for clarifying comments on the graph-distance estimator.


This code released as is. We will do our best to address questions/bugs, but cannot guarantee support.