AtSNE
AtSNE is a solution to the high-dimensional data visualization problem. It projects large-scale high-dimensional vectors into a low-dimensional space while preserving the pairwise similarities among points. AtSNE is efficient and scalable: it can visualize 20M points in less than 5 hours on a GPU. The spatial structure of its results is also robust to random initialization. This repository implements the algorithm from our KDD'19 paper, AtSNE: Efficient and Robust Visualization on GPU through Hierarchical Optimization.
Visualization Examples
Performance
Compared Algorithms:
How to use
Requirements
- CUDA 8 or later, including nvcc and cuBLAS
- gcc
- faiss
Compile
- Clone this project.
- Initialize the submodules (cmdline and faiss): enter the project root directory and run
git submodule init; git submodule update
- Compile faiss: enter the faiss directory (vendor/faiss) and follow Step1 and Step3 of the faiss install instructions, then confirm that vendor/faiss/libfaiss.a and vendor/faiss/gpu/libgpufaiss.a have been generated. Simplified instructions are shown below:
  - install a required BLAS library (MKL or OpenBLAS):
  sudo apt install libopenblas-dev
  cd vendor/faiss
  - build the faiss CPU library:
  ./configure && make -j8
  - build the faiss GPU library:
  cd gpu; make -j
- Enter the project root directory and run
make -j
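Before the final make, it can help to confirm that both faiss static libraries named in the steps above exist. The following is only a minimal sketch: the helper missing_faiss_libs is hypothetical and not part of this repository, and it assumes it is run from the project root.

```python
from pathlib import Path

# Library paths named in the compile steps, relative to the project root.
REQUIRED_LIBS = ("vendor/faiss/libfaiss.a", "vendor/faiss/gpu/libgpufaiss.a")


def missing_faiss_libs(project_root="."):
    """Return the required faiss libraries that are not present (hypothetical helper)."""
    root = Path(project_root)
    return [rel for rel in REQUIRED_LIBS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_faiss_libs()
    if missing:
        print("Missing faiss libraries:", ", ".join(missing))
    else:
        print("Both faiss libraries found; ready to run make.")
```

An empty list means both archives were found and the project-level make should be able to link against them.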
Run
./qvis_gpu -b mnist_vec784D_data.txt.fvecs -o mnist_result.txt
We chose good default parameters for you, and there are many other parameters you can change. To reproduce the experiments in our KDD paper, add --n_negative 400:
./qvis_gpu -b mnist_vec784D_data.txt.fvecs --n_negative 400 -o mnist_result.txt
ivecs/fvecs vector file formats are defined here
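As a rough illustration of the fvecs format, here is a minimal Python sketch. It assumes the commonly used layout in which each vector is stored as a little-endian 32-bit integer dimension followed by that many 32-bit floats. The function names write_fvecs and read_fvecs are hypothetical and are not part of this repository; tools/txt_to_fvecs.py remains the supported converter.

```python
import struct


def write_fvecs(path, vectors):
    """Write float vectors to `path` in the assumed fvecs layout."""
    with open(path, "wb") as f:
        for vec in vectors:
            dim = len(vec)
            # One record: int32 dimension, then `dim` float32 values.
            f.write(struct.pack("<i", dim))
            f.write(struct.pack(f"<{dim}f", *vec))


def read_fvecs(path):
    """Read every vector from an fvecs file into a list of lists."""
    vectors = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                break  # clean end of file
            if len(header) != 4:
                raise ValueError("truncated dimension header")
            (dim,) = struct.unpack("<i", header)
            payload = f.read(4 * dim)
            if len(payload) != 4 * dim:
                raise ValueError("truncated vector payload")
            vectors.append(list(struct.unpack(f"<{dim}f", payload)))
    return vectors
```

Under this layout, a file of n vectors of dimension d occupies n * (4 + 4 * d) bytes, which gives a quick sanity check on a converted dataset.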
Supplementary tools
These are supplementary tools we used during development, debugging, and experimentation:
- tools/view.py: draws the result in 2D space and saves the images.
  - The label file is optional.
  - Uses multiple processes to draw images for results that share a filename prefix.
- tools/txt_to_fvecs.py: converts a txt file, such as a LargeVis result or a label file, to ivecs/fvecs.
- tools/largevis_convert.py: converts an fvecs/ivecs dataset to the LargeVis input format.
- tools/imagenet_infer.py: generates 128D ImageNet feature vectors from the ImageNet dataset.
- tools/box_filter.py: given a bounding box, prints the points and corresponding labels. Used for the case study in our paper.
- test_knn_accuracy: (build required) tests the kNN classifier accuracy of the visualization result (labels needed).
- test_top1_error: (build required) tests the top-1 error of the visualization result. The top-1 error is the fraction of points whose nearest neighbor in the low-dimensional space is not their nearest neighbor in the high-dimensional space.
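The top-1 error defined above can be sketched in a few lines of Python. This is a brute-force illustration for small inputs, not the implementation behind test_top1_error: it assumes Euclidean distance in both spaces, breaks ties by taking the lowest index, and the names nearest_neighbor and top1_error are hypothetical.

```python
import math


def nearest_neighbor(points, i):
    """Index of the point closest to points[i], excluding i itself."""
    best_j, best_d = -1, math.inf
    for j, p in enumerate(points):
        if j == i:
            continue
        d = math.dist(points[i], p)  # Euclidean distance (Python 3.8+)
        if d < best_d:
            best_j, best_d = j, d
    return best_j


def top1_error(high, low):
    """Fraction of points whose low-dimensional nearest neighbor differs
    from their high-dimensional nearest neighbor."""
    if len(high) != len(low):
        raise ValueError("high and low must contain the same number of points")
    n = len(high)
    if n < 2:
        raise ValueError("need at least two points")
    mismatches = sum(
        1 for i in range(n)
        if nearest_neighbor(low, i) != nearest_neighbor(high, i)
    )
    return mismatches / n
```

The double loop costs O(n^2) distance computations, so this sketch is only practical for spot checks on small samples.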