AtSNE

AtSNE is a solution to the high-dimensional data visualization problem. It projects large-scale high-dimensional vectors into a low-dimensional space while preserving the pairwise similarity among points. AtSNE is efficient and scalable: it can visualize 20M points in less than 5 hours on a GPU. The spatial structure of its result is also robust to random initialization. This project implements the algorithm from our KDD'19 paper, AtSNE: Efficient and Robust Visualization on GPU through Hierarchical Optimization.

Visualization Examples


Performance

Compared Algorithms:

How to use

Requirements

  • CUDA 8 or later, including nvcc and cuBLAS
  • gcc
  • faiss

Compile

  1. Clone this project
  2. Initialize the submodules (cmdline and faiss)
    • enter the project root directory
    • run git submodule init; git submodule update
  3. Compile faiss: enter the faiss directory (vendor/faiss), follow Step1 and Step3, and confirm that vendor/faiss/libfaiss.a and vendor/faiss/gpu/libgpufaiss.a have been generated. Simplified instructions are shown below:
    • install a required BLAS library (MKL or OpenBLAS): sudo apt install libopenblas-dev
    • cd vendor/faiss
    • build the faiss CPU library: ./configure && make -j8
    • build the faiss GPU library: cd gpu; make -j
  4. Enter the project root directory and run make -j
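
Before running the final make, it can help to confirm that the two faiss libraries from step 3 exist. The following is a minimal sketch, not part of this project: the helper name missing_faiss_libs is hypothetical, and only the two library paths are taken from the instructions above.

```python
from pathlib import Path

# Static libraries that step 3 is expected to produce,
# given relative to the project root.
EXPECTED_LIBS = (
    "vendor/faiss/libfaiss.a",
    "vendor/faiss/gpu/libgpufaiss.a",
)


def missing_faiss_libs(project_root="."):
    """Return the expected faiss libraries that are not present as files.

    Hypothetical helper for illustration; it only checks for existence
    and does not validate the contents of the archives.
    """
    root = Path(project_root)
    return [rel for rel in EXPECTED_LIBS if not (root / rel).is_file()]
```

Calling missing_faiss_libs() from the project root returns an empty list when both libraries are in place; any path it returns points at the faiss build step that still needs to be run.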

Run

./qvis_gpu -b mnist_vec784D_data.txt.fvecs -o mnist_result.txt

We chose good default parameters for you, and there are many other parameters you can change. To reproduce the tests in our KDD paper, add --n_negative 400.

./qvis_gpu -b mnist_vec784D_data.txt.fvecs --n_negative 400 -o mnist_result.txt

ivecs/fvecs vector file formats are defined here
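
As a rough illustration, the sketch below writes and reads vectors in the fvecs layout as it is commonly defined: each vector is stored as a little-endian int32 dimension followed by that many float32 components. The function names write_fvecs and read_fvecs are hypothetical and are not part of this project; tools/txt_to_fvecs.py remains the supported converter.

```python
import struct


def write_fvecs(path, vectors):
    """Write an iterable of float sequences to `path` in fvecs layout.

    Assumes the common layout: int32 dimension, then float32 values,
    both little-endian, repeated for every vector.
    """
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(struct.pack("<i", len(vec)))
            f.write(struct.pack(f"<{len(vec)}f", *vec))


def read_fvecs(path):
    """Read an fvecs file and return a list of lists of floats."""
    vectors = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                break  # clean end of file
            if len(header) != 4:
                raise ValueError("truncated dimension header")
            (dim,) = struct.unpack("<i", header)
            payload = f.read(4 * dim)
            if len(payload) != 4 * dim:
                raise ValueError("truncated vector record")
            vectors.append(list(struct.unpack(f"<{dim}f", payload)))
    return vectors
```

Under this layout, a file of n vectors of dimension d occupies n * (4 + 4 * d) bytes, which gives a quick sanity check on a converted dataset. The ivecs layout is the same except that the components are int32 values.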

Supplementary tools

These are some supplementary tools we used during development, debugging, and experimentation:

  • tools/view.py Draw the result in 2D space and save the images.
    • The label file is optional.
    • Uses multiple processes to draw images for results that share the same filename prefix.
  • tools/txt_to_fvecs.py Convert a txt file, such as a LargeVis result or a label file, to ivecs/fvecs
  • tools/largevis_convert.py Convert an fvecs/ivecs dataset to the LargeVis input format
  • tools/imagenet_infer.py Generate 128D ImageNet feature vectors from the ImageNet dataset
  • tools/box_filter.py Given a bounding box, print the points inside it and their corresponding labels. Used for the case study in our paper
  • test_knn_accuracy (Build required) Test the k-NN classifier accuracy (labels needed) of a visualization result
  • test_top1_error (Build required) Test the top-1 error of a visualization result. The top-1 error is the fraction of points whose nearest neighbor in the low-dimensional space is not their nearest neighbor in the high-dimensional space
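
The top-1 error described above can be sketched in a few lines of Python. This is a brute-force illustration of the definition only, not the implementation used by test_top1_error: it assumes Euclidean distance in both spaces, runs in quadratic time, and breaks ties by taking the first nearest point. The function names are hypothetical.

```python
import math


def nearest_neighbor(points, i):
    """Index of the point closest to points[i], excluding i itself.

    Uses Euclidean distance; on ties the lowest index wins.
    """
    best, best_dist = -1, math.inf
    for j, q in enumerate(points):
        if j == i:
            continue
        d = math.dist(points[i], q)
        if d < best_dist:
            best, best_dist = j, d
    return best


def top1_error(high, low):
    """Fraction of points whose nearest neighbor differs between spaces.

    `high` and `low` hold the same points, in the same order, in the
    high-dimensional and low-dimensional space respectively.
    """
    if len(high) != len(low):
        raise ValueError("both embeddings must contain the same points")
    if len(high) < 2:
        raise ValueError("need at least two points")
    mismatches = sum(
        1
        for i in range(len(high))
        if nearest_neighbor(high, i) != nearest_neighbor(low, i)
    )
    return mismatches / len(high)
```

A value of 0 means every point kept its nearest neighbor after projection, and a value of 1 means none did. For datasets of realistic size, an approximate or GPU nearest-neighbor search would be needed instead of this quadratic loop.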
