ADBench: Anomaly Detection Benchmark

Official implementation of the paper ADBench: Anomaly Detection Benchmark. Please star, watch, and fork ADBench for the latest updates!

Who Are We? ✨

ADBench is a collaborative product between researchers at Shanghai University of Finance and Economics (SUFE) and Carnegie Mellon University (CMU). The project is designed and conducted by Minqi Jiang (SUFE), Yue Zhao (CMU), and Xiyang Hu (CMU), the author(s) of important anomaly detection libraries, including libraries for tabular (PyOD), time-series (TODS), and graph data (PyGOD).

Why Do You Need ADBench?

ADBench is (to the best of our knowledge) the most comprehensive tabular anomaly detection benchmark, in which we analyze the performance of 30 anomaly detection algorithms on 55 benchmark datasets. By analyzing both research needs and deployment requirements in industry, ADBench conducts 93,654 experiments from three major angles:

the effect of supervision (e.g., ground truth labels) by including 14 unsupervised, 7 semi-supervised, and 9 supervised methods;
algorithm performance under different types of anomalies by simulating the environments with 4 types of anomalies; and
algorithm robustness and stability under 3 settings of data corruptions.
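As a rough illustration of the third angle, the sketch below corrupts a label vector by flipping a fixed fraction of labels. The choice of label noise, the function name `flip_labels`, and the `error_rate` parameter are assumptions made for this example only; the exact corruption settings used in ADBench are described in the paper.

```python
import random


def flip_labels(y, error_rate=0.1, seed=0):
    """Return a copy of y with a fraction `error_rate` of binary labels flipped.

    Hypothetical helper for illustration; not part of the ADBench code base.
    """
    rng = random.Random(seed)
    n_flip = round(error_rate * len(y))
    # Pick distinct positions so exactly n_flip labels change.
    flip_idx = set(rng.sample(range(len(y)), n_flip))
    return [1 - label if i in flip_idx else label for i, label in enumerate(y)]


y_clean = [0] * 90 + [1] * 10          # 10% anomalies
y_noisy = flip_labels(y_clean, error_rate=0.1)
n_changed = sum(a != b for a, b in zip(y_clean, y_noisy))
print(f"{n_changed} of {len(y_clean)} labels were flipped")
```

Training a detector on `y_noisy` and evaluating it against `y_clean` is one simple way to observe how sensitive a label-dependent method is to this kind of corruption.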

Key Takeaways: ADBench answers many questions for researchers with interesting findings:

‼️ surprisingly none of the benchmarked unsupervised algorithms is statistically better than others, emphasizing the importance of algorithm selection;
‼️ with merely 1% labeled anomalies, most semi-supervised methods can outperform the best unsupervised method, justifying the importance of supervision;
in controlled environments, we observe that best unsupervised methods for specific types of anomalies are even better than semi- and fully-supervised methods, revealing the necessity of understanding data characteristics;
semi-supervised methods show potential in achieving robustness in noisy and corrupted data, possibly due to their efficiency in using labels and feature selection;
⁉️ and many more can be found in our paper (Section 4).
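To make the "1% labeled anomalies" setting more concrete, here is a minimal sketch of how such a semi-supervised label vector could be built. It assumes that the ratio is taken over the anomalies and that every other sample is treated as unlabeled (encoded as 0); the helper `mask_labels` and its parameters are hypothetical and are not taken from the ADBench code.

```python
import random


def mask_labels(y, labeled_ratio=0.01, seed=42):
    """Keep the label for a fraction of anomalies; mark everything else as 0.

    Hypothetical helper for illustration; not part of the ADBench code base.
    """
    rng = random.Random(seed)
    anomaly_idx = [i for i, label in enumerate(y) if label == 1]
    # Keep at least one labeled anomaly whenever any anomaly exists.
    n_keep = max(1, round(labeled_ratio * len(anomaly_idx))) if anomaly_idx else 0
    keep = set(rng.sample(anomaly_idx, n_keep))
    return [1 if i in keep else 0 for i in range(len(y))]


y_true = [0] * 900 + [1] * 100          # full ground truth
y_semi = mask_labels(y_true, labeled_ratio=0.05)
print(f"{sum(y_semi)} of {sum(y_true)} anomalies keep their label")
```

A semi-supervised detector would then be trained on `y_semi`, while `y_true` stays reserved for evaluation.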

The Figure below provides an overview of our proposed ADBench (see our paper for details).

How to use ADBench?

We envision four primary uses of ADBench:

Gain a better understanding of anomaly detection algorithms: please read our paper for details.
Conduct future research on anomaly detection: we list 4 important future research questions in the paper; see Section 4 for some thoughts.
Access rich algorithm implementations and datasets: see the details below for how to use them.
Benchmark your anomaly detection algorithms: see the notebook for instructions.

Dependency

The experiment code is written in Python 3 and built on a number of Python packages:

scikit-learn==0.20.3
pyod==0.9.8
Keras==2.3.0 (required only for certain deep learning methods)
tensorflow==1.15.0 (required only for certain deep learning methods)
torch==1.9.0 (required only for certain deep learning methods)
rtdl==0.0.13
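A quick way to compare an environment against the versions listed above is sketched below. It is only a convenience check, not part of ADBench: the `PINNED` dictionary and the `check_versions` helper are introduced here for illustration, and the sketch assumes Python 3.8+ for `importlib.metadata` (on older interpreters the `importlib_metadata` backport provides the same calls).

```python
from importlib import metadata

# Versions copied from the dependency list above.
PINNED = {
    "scikit-learn": "0.20.3",
    "pyod": "0.9.8",
    "Keras": "2.3.0",
    "tensorflow": "1.15.0",
    "torch": "1.9.0",
    "rtdl": "0.0.13",
}


def check_versions(pinned):
    """Map each package to (wanted, installed or None, versions match)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        report[name] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, ok) in check_versions(PINNED).items():
    status = "ok" if ok else f"expected {wanted}, found {installed or 'not installed'}"
    print(f"{name}: {status}")
```

Keras, tensorflow, and torch are needed only for certain deep learning methods, so a mismatch there matters only if those methods are run.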

Datasets

ADBench includes 55 existing and newly proposed datasets, as shown in the following table.

Among them, 48 widely-used real-world datasets are gathered for model evaluation, which cover many application domains, including healthcare (e.g., disease diagnosis), audio and language processing (e.g., speech recognition), image processing (e.g., object identification), finance (e.g., financial fraud detection), etc.

Moreover, as most of these datasets are relatively small, we introduce 7 more complex datasets from the CV and NLP domains, with more samples and richer features. Pretrained models are applied to extract embeddings from the NLP and CV datasets to access more complex representations. For NLP datasets, we use BERT pretrained on BookCorpus and English Wikipedia to extract the embedding of the [CLS] token. For CV datasets, we use ResNet18 pretrained on ImageNet to extract the embedding after the last average pooling layer.

Algorithms

Compared to the previous benchmark studies, we have a larger algorithm collection with

latest unsupervised AD algorithms like DeepSVDD and ECOD;
SOTA semi-supervised algorithms, including DeepSAD and DevNet;
latest network architectures like ResNet in computer vision (CV) and Transformer in natural language processing (NLP) (we adapt ResNet and FTTransformer models for tabular AD in the proposed ADBench); and
ensemble learning methods like LightGBM, XGBoost, and CatBoost.

The figure below shows the algorithms (14 unsupervised, 7 semi-supervised, and 9 supervised algorithms) in ADBench.

For each algorithm, we also introduce its specific implementation in the following Table.

Model	Year	Type	DL	Import	Source
PCA	Before 2017	Unsup	✗	from baseline.PyOD import PYOD	Link
OCSVM	Before 2017	Unsup	✗	from baseline.PyOD import PYOD	Link
LOF	Before 2017	Unsup	✗	from baseline.PyOD import PYOD	Link
CBLOF	Before 2017	Unsup	✗	from baseline.PyOD import PYOD	Link
COF	Before 2017	Unsup	✗	from baseline.PyOD import PYOD	Link
HBOS	Before 2017	Unsup	✗	from baseline.PyOD import PYOD	Link
KNN	Before 2017	Unsup	✗	from baseline.PyOD import PYOD	Link
SOD	Before 2017	Unsup	✗	from baseline.PyOD import PYOD	Link
COPOD	2020	Unsup	✗	from baseline.PyOD import PYOD	Link
ECOD	2022	Unsup	✗	from baseline.PyOD import PYOD	Link
IForest†	Before 2017	Unsup	✗	from baseline.PyOD import PYOD	Link
LODA†	Before 2017	Unsup	✗	from baseline.PyOD import PYOD	Link
DeepSVDD	2018	Unsup	✓	from pyod.models.deep_svdd import DeepSVDD	Link
DAGMM	2018	Unsup	✓	from baseline.DAGMM.run import DAGMM	Link
GANomaly	2018	Semi	✓	from baseline.GANomaly.run import GANomaly	Link
XGBOD†	2018	Semi	✗	from baseline.PyOD import PYOD	Link
DeepSAD	2019	Semi	✓	from baseline.DeepSAD.src.run import DeepSAD	Link
REPEN	2018	Semi	✓	from baseline.REPEN.run import REPEN	Link
DevNet	2019	Semi	✓	from baseline.DevNet.run import DevNet	Link
PReNet	2020	Semi	✓	from baseline.PReNet.run import PReNet	/
FEAWAD	2021	Semi	✓	from baseline.FEAWAD.run import FEAWAD	Link
NB	Before 2017	Sup	✗	from baseline.Supervised import supervised	Link
SVM	Before 2017	Sup	✗	from baseline.Supervised import supervised	Link
MLP	Before 2017	Sup	✓	from baseline.Supervised import supervised	Link
RF†	Before 2017	Sup	✗	from baseline.Supervised import supervised	Link
LGB†	2017	Sup	✗	from baseline.Supervised import supervised	Link
XGB†	Before 2017	Sup	✗	from baseline.Supervised import supervised	Link
CatB†	2019	Sup	✗	from baseline.Supervised import supervised	Link
ResNet	2019	Sup	✓	from baseline.FTTransformer.run import FTTransformer	Link
FTTransformer	2019	Sup	✓	from baseline.FTTransformer.run import FTTransformer	Link

‘†’ marks ensembling.
Un-, semi-, and fully-supervised methods are denoted as Unsup, Semi, and Sup, respectively.

Results in Our Papers

For complete results of ADBench, please refer to the original paper.
To reproduce the experiment results of ADBench, please run the code in run.py.

Quickly implement ADBench for benchmarking AD algorithms.

We provide an example of quickly implementing ADBench for any customized AD algorithm, as shown in run_customized.ipynb.
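For orientation, the sketch below shows the general shape a customized detector might take: a class with a `fit` method and a `predict_score` method that returns higher scores for more anomalous samples, evaluated with AUC-ROC. The class name `Customized`, its constructor arguments, the z-score detector, and the `auc_roc` helper are all assumptions made for this example; the interface actually expected by ADBench is the one shown in run_customized.ipynb.

```python
import math
import random


class Customized:
    """Hypothetical detector; the fit / predict_score interface is assumed."""

    def __init__(self, seed=42, model_name="ZScore"):
        self.seed = seed
        self.model_name = model_name

    def fit(self, X_train, y_train=None):
        # Unsupervised: labels are accepted for interface parity but ignored.
        n, d = len(X_train), len(X_train[0])
        self.mean_ = [sum(row[j] for row in X_train) / n for j in range(d)]
        # Fall back to 1.0 for constant features to avoid division by zero.
        self.std_ = [
            math.sqrt(sum((row[j] - self.mean_[j]) ** 2 for row in X_train) / n) or 1.0
            for j in range(d)
        ]
        return self

    def predict_score(self, X):
        # Higher score = more anomalous: largest absolute z-score over features.
        return [
            max(abs(x - m) / s for x, m, s in zip(row, self.mean_, self.std_))
            for row in X
        ]


def auc_roc(y_true, scores):
    """AUC-ROC: chance that a random anomaly outscores a random normal sample."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Synthetic data: 200 normal points near the origin, 10 anomalies far away.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [0] * 200
X += [[rng.gauss(6, 1), rng.gauss(6, 1)] for _ in range(10)]
y += [1] * 10

model = Customized().fit(X, y)
auc = auc_roc(y, model.predict_score(X))
print(f"{model.model_name} AUC-ROC: {auc:.3f}")
```

Keeping scoring separate from thresholding means the same class can be evaluated with ranking metrics such as AUC-ROC without choosing a decision threshold.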
