Improving Generalization via Scalable Neighborhood Component Analysis.

This repo contains the PyTorch implementation for the ECCV 2018 paper (paper). We use deep networks to learn feature representations optimized for nearest neighbor classifiers, which could generalize better to new object categories. This project is a re-investigation of Neighborhood Component Analysis (NCA) with recent techniques that make it scalable to deep networks and large-scale datasets.
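As a rough illustration of the kind of objective NCA optimizes, the sketch below computes a leave-one-out soft nearest-neighbor loss over a small batch. It is written in plain Python for readability and is not the code in this repo: the function names, the use of cosine similarity, and the exact form of the loss are assumptions, and the default temperature simply mirrors the `--temperature 0.05` flag used in the training command.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def nca_loss(features, labels, temperature=0.05):
    """Mean negative log-probability that each sample picks a same-class neighbor.

    Assumed form: p_ij is a softmax over similarities s_ij / temperature,
    taken over all j != i (leave-one-out).
    """
    feats = [l2_normalize(f) for f in features]
    n = len(feats)
    total, counted = 0.0, 0
    for i in range(n):
        # Similarity logits to every other sample (the sample itself is excluded).
        logits = {
            j: sum(a * b for a, b in zip(feats[i], feats[j])) / temperature
            for j in range(n)
            if j != i
        }
        positives = [j for j in logits if labels[j] == labels[i]]
        if not positives:
            continue  # no same-class neighbor, so this sample contributes nothing
        # Subtract the max logit before exponentiating for numerical stability.
        m = max(logits.values())
        exps = {j: math.exp(v - m) for j, v in logits.items()}
        p_correct = sum(exps[j] for j in positives) / sum(exps.values())
        total += -math.log(p_correct)
        counted += 1
    return total / counted if counted else 0.0
```

The loss is small when each sample's closest neighbors share its label and grows when neighbors of other classes are closer. A practical implementation would use batched tensor operations rather than Python loops.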

Much of the code is extended from the previous unsupervised learning project. Please refer to this repo for more details.

Pretrained Models

Currently, we provide three pretrained ResNet models.
Each release contains the feature representation of all ImageNet training images (600 MB) and model weights (100-200 MB).
Models and their performance with nearest neighbor classifiers are as follows.

Code to reproduce the rest of the experiments is coming soon.

Nearest Neighbors

Please follow this link for a list of nearest neighbors on ImageNet.
Results are visualized using our ResNet50 features, compared with baseline ResNet50 features, raw image features, and previous unsupervised features.
The first column is the query image, followed by 20 retrievals ranked by similarity.
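Ranking retrievals by similarity can be sketched as follows. This is a minimal plain-Python illustration, not the script that produced the visualizations: `retrieve` and `cosine` are hypothetical names, cosine similarity is an assumed choice of metric, and `bank` stands in for the stored feature vectors of the training images.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query, bank, k=20):
    """Return the indices of the k bank features most similar to the query."""
    scored = [(cosine(query, feat), idx) for idx, feat in enumerate(bank)]
    # Highest similarity first; Python's sort is stable, so ties keep bank order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]
```

With `k=20` this yields the 20 retrievals shown next to each query image. For a feature bank the size of ImageNet, a single matrix multiplication over normalized features would replace the Python loop.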


Our code extends the PyTorch implementation of ImageNet classification in the official PyTorch release.
Please refer to the official repo for details of data preparation and hardware configurations.

  • install python2 and pytorch>=0.4

  • clone this repo: git clone

  • Training on ImageNet:

    python DATAPATH --arch resnet18 -j 32 --temperature 0.05 --low-dim 128 -b 256

    • During training, we monitor the supervised validation accuracy with a k-nearest-neighbor classifier using k=1, as it is faster and gives a good estimate of the feature quality.
  • Testing on ImageNet:

    python DATAPATH --arch resnet18 --resume input_model.pth.tar -e

    This runs testing with the default K=30 neighbors.

  • Memory Consumption and Computation Issues

    Memory consumption is more of an issue than computation time.
    Currently, the implementation of the NCA module is not parallelized across multiple GPUs.
    Hence, the first GPU will consume much more memory than the others.
    For example, when training a ResNet18 network, GPU 0 will consume 11GB of memory, while the others each take 2.5GB.
    For training deeper networks, you will need to use the Caffe-style setting "-b 128 --iter-size 2".
    Our released models are trained on V100 machines.

  • Training on CIFAR10:

    python --temperature 0.05 --lr 0.1
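The Caffe-style "-b 128 --iter-size 2" setting mentioned under memory consumption refers to gradient accumulation: gradients from several smaller batches are summed before a single weight update, so peak memory follows the smaller batch while the update matches a batch of 256. The toy sketch below shows the idea on a one-parameter least-squares problem. It does not reflect how this repo implements the flag; `grad_mse` and `accumulated_step` are hypothetical helpers, and the data are made up for illustration.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_step(w, xs, ys, batch_size, iter_size, lr):
    """One SGD update built from iter_size micro-batches of batch_size samples."""
    accum = 0.0
    for it in range(iter_size):
        lo = it * batch_size
        hi = lo + batch_size
        # Divide by iter_size so the sum equals the gradient of the full-batch mean loss.
        accum += grad_mse(w, xs[lo:hi], ys[lo:hi]) / iter_size
    return w - lr * accum


# Toy data: 256 samples of y = 3x.
xs = [i / 256.0 for i in range(256)]
ys = [3.0 * x for x in xs]

# Two micro-batches of 128 versus a single batch of 256.
w_accum = accumulated_step(0.0, xs, ys, batch_size=128, iter_size=2, lr=0.1)
w_full = 0.0 - 0.1 * grad_mse(0.0, xs, ys)
```

Both updates agree up to floating-point rounding. In a real network, layers that depend on batch statistics, such as batch normalization, still see micro-batches of 128, so results can differ slightly from true batch-256 training.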