Anchor-t-SNE for large-scale and high-dimension vector visualization

AtSNE

AtSNE is a solution to the high-dimensional data visualization problem. It projects large-scale, high-dimensional vectors into a low-dimensional space while preserving the pairwise similarity among points. AtSNE is efficient and scalable: it can visualize 20M points in less than 5 hours on a GPU. The spatial structure of its result is also robust to random initialization. It implements the algorithm of our KDD'19 paper, AtSNE: Efficient and Robust Visualization on GPU through Hierarchical Optimization.

Visualization Examples

AtSNE

Performance

Compared Algorithms:

How to use

Requirement

CUDA (8 or later), nvcc and cublas included
gcc
faiss

Compile

Clone this project
Initialize the submodules (cmdline and faiss)
- enter the project root directory
- run git submodule init; git submodule update
Compile faiss: enter the faiss directory (vendor/faiss) and follow Step1 and Step3 of its build instructions, then confirm that vendor/faiss/libfaiss.a and vendor/faiss/gpu/libgpufaiss.a have been generated. Simplified instructions are shown below:
- install required BLAS library (MKL, openblas): sudo apt install libopenblas-dev
- cd vendor/faiss
- build faiss cpu library: ./configure && make -j8
- build faiss gpu library: cd gpu; make -j
Return to the project root directory and run make -j

Run

./qvis_gpu -b mnist_vec784D_data.txt.fvecs -o mnist_result.txt

The default parameters are chosen to work well out of the box, and many other parameters can be changed. To reproduce the tests in our KDD paper, add --n_negative 400:

./qvis_gpu -b mnist_vec784D_data.txt.fvecs --n_negative 400 -o mnist_result.txt

ivecs/fvecs vector file formats are defined here
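
For readers who want to prepare their own input, here is a minimal Python sketch of reading and writing an fvecs file. It assumes the commonly used layout, in which each vector is stored as a little-endian int32 dimension followed by that many float32 components; check the format definition referenced above before relying on it. The function names write_fvecs and read_fvecs are illustrative and are not part of this project.

```python
import struct


def write_fvecs(path, vectors):
    """Write float vectors to `path`, one record per vector.

    Assumed record layout: int32 dimension, then `dimension` float32
    values, all little-endian.
    """
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(struct.pack("<i", len(vec)))
            f.write(struct.pack(f"<{len(vec)}f", *vec))


def read_fvecs(path):
    """Read every vector from `path` and return them as lists of floats."""
    vectors = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                break  # clean end of file
            if len(header) != 4:
                raise ValueError("truncated dimension header")
            (dim,) = struct.unpack("<i", header)
            payload = f.read(4 * dim)
            if len(payload) != 4 * dim:
                raise ValueError("truncated vector payload")
            vectors.append(list(struct.unpack(f"<{dim}f", payload)))
    return vectors
```

Under the same assumption, an ivecs file would differ only in storing int32 components instead of float32, so the "f" format code would become "i".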

Supplementary tools

These are supplementary tools we used during development, debugging, and experimentation:

tools/view.py Draws the result in 2D space and saves the images.
- The label file is optional.
- Uses multiple processes to draw images for results that share a filename prefix.
tools/txt_to_fvecs.py Converts a txt file, such as a LargeVis result or a label file, to ivecs/fvecs.
tools/largevis_convert.py Converts an fvecs/ivecs dataset to the LargeVis input format.
tools/imagenet_infer.py Generates 128D feature vectors from the ImageNet dataset.
tools/box_filter.py Given a bounding box, prints the points inside it and their labels. Used for the case study in our paper.
test_knn_accuracy (build required) Tests the kNN classifier accuracy of a visualization result (labels needed).
test_top1_error (build required) Tests the top-1 error of a visualization result. The top-1 error is the fraction of points whose nearest neighbor in the low-dimensional space is not their nearest neighbor in the high-dimensional space.
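
To make the top-1 error definition concrete, here is a small brute-force Python sketch. It is not the implementation of test_top1_error; it assumes Euclidean distance in both spaces and compares all pairs, so it is only practical for small samples. The function names are illustrative.

```python
def nearest_neighbor(points, i):
    """Return the index of the point closest to points[i], excluding i.

    Uses squared Euclidean distance (an assumption of this sketch);
    ties are broken in favor of the lowest index.
    """
    best_j, best_d = -1, float("inf")
    for j, q in enumerate(points):
        if j == i:
            continue
        d = sum((a - b) ** 2 for a, b in zip(points[i], q))
        if d < best_d:
            best_j, best_d = j, d
    return best_j


def top1_error(high, low):
    """Fraction of points whose low-dimensional nearest neighbor differs
    from their high-dimensional nearest neighbor."""
    if len(high) != len(low):
        raise ValueError("high and low must contain the same number of points")
    n = len(high)
    if n < 2:
        raise ValueError("need at least two points")
    mismatches = sum(
        1 for i in range(n)
        if nearest_neighbor(high, i) != nearest_neighbor(low, i)
    )
    return mismatches / n
```

For example, with four 3D points forming two tight pairs, a 2D layout that keeps both pairs together gives an error of 0, while a layout that pulls one point away from its partner raises the error by 1/4 for each point whose nearest neighbor changes.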