Francesco Croce* (University of Tübingen), Maksym Andriushchenko* (EPFL), Vikash Sehwag* (Princeton University), Nicolas Flammarion (EPFL), Mung Chiang (Purdue University), Prateek Mittal (Princeton University), Matthias Hein (University of Tübingen)

Main idea

The goal of RobustBench is to systematically track the real progress in adversarial robustness.
There are already more than 2,000 papers on this topic, but it is still unclear which approaches really work and which only lead to overestimated robustness.
We start by benchmarking Linf-robustness, since it is the most studied setting in the literature.
We plan to extend the benchmark to other threat models in the future: first to other Lp-norms and then to more general perturbation sets
(Wasserstein perturbations, common corruptions, etc.).
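
To make the Linf threat model concrete, here is a minimal sketch in plain Python. It assumes inputs are flat lists of pixel intensities in [0, 1], and the radius `eps = 8 / 255` is only an illustrative assumption, not a value stated in this section. The helper names `project_linf` and `linf_distance` are hypothetical and not part of RobustBench.

```python
def project_linf(x, x_adv, eps):
    """Project x_adv onto the Linf ball of radius eps around x, then clip to [0, 1]."""
    projected = []
    for xi, ai in zip(x, x_adv):
        ai = min(max(ai, xi - eps), xi + eps)  # enforce |ai - xi| <= eps
        ai = min(max(ai, 0.0), 1.0)            # keep a valid pixel intensity
        projected.append(ai)
    return projected


def linf_distance(x, x_adv):
    """Largest absolute per-coordinate change between x and x_adv."""
    return max(abs(ai - xi) for xi, ai in zip(x, x_adv))


eps = 8 / 255                      # assumed radius, for illustration only
x = [0.0, 0.5, 1.0]                # clean input
x_adv = project_linf(x, [0.2, 0.49, 0.9], eps)
print(x_adv, linf_distance(x, x_adv))
```

A perturbed input is admissible under this threat model only if every coordinate moves by at most `eps`; the projection above is the usual way an attack keeps its candidate inside that set.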

Robustness evaluation is in general not straightforward and requires adaptive attacks (Tramer et al., 2020).
Thus, in order to establish a reliable standardized benchmark, we need to impose some restrictions on the defenses we consider.
In particular, we accept only defenses that (1) have in general non-zero gradients with respect to the inputs, (2) have a fully deterministic forward pass (i.e. no randomness), and
(3) have no optimization loop in that forward pass.
Often, defenses that violate these 3 principles only make gradient-based attacks
harder but do not substantially improve robustness (Carlini et al., 2019), except those
that can present concrete provable guarantees (e.g. Cohen et al., 2019).
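
The first two restrictions can be sanity-checked mechanically. The sketch below is a rough illustration in plain Python, not the procedure RobustBench itself uses: the three models are hypothetical stand-ins, gradients are approximated with central finite differences (with a real framework one would more likely use automatic differentiation), and restriction (3) is not covered because it needs inspection of the defense's code.

```python
import math
import random


def toy_model(x):
    """Hypothetical classifier: a fixed linear map from 3 inputs to 2 logits."""
    weights = [[0.5, -1.0, 2.0], [-0.3, 0.8, 0.1]]
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def noisy_model(x):
    """Violates (2): adds random noise to the input on every forward pass."""
    return toy_model([xi + random.gauss(0.0, 0.1) for xi in x])


def quantized_model(x):
    """Violates (1): rounding the input makes the output piecewise constant."""
    return toy_model([float(round(xi)) for xi in x])


def is_deterministic(model, x, n_calls=5):
    """Check (2): repeated forward passes return identical outputs."""
    first = model(x)
    return all(model(x) == first for _ in range(n_calls - 1))


def cross_entropy(logits, label):
    """Numerically stable cross-entropy loss for a single example."""
    m = max(logits)
    return m + math.log(sum(math.exp(l - m) for l in logits)) - logits[label]


def has_nonzero_gradient(model, x, label, h=1e-4, tol=1e-8):
    """Check (1): some finite-difference partial derivative of the loss is non-zero."""
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad_i = (cross_entropy(model(up), label)
                  - cross_entropy(model(down), label)) / (2 * h)
        if abs(grad_i) > tol:
            return True
    return False


x, label = [0.2, 0.7, 0.4], 0
print("toy deterministic:", is_deterministic(toy_model, x))
print("toy non-zero gradient:", has_nonzero_gradient(toy_model, x, label))
print("noisy deterministic:", is_deterministic(noisy_model, x))
print("quantized non-zero gradient:", has_nonzero_gradient(quantized_model, x, label))
```

Under these assumptions, `toy_model` passes both checks, `noisy_model` fails the determinism check, and `quantized_model` fails the gradient check. A single test point is only a smoke test: a defense can have non-zero gradients at one input and vanishing gradients elsewhere.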

RobustBench consists of two parts:

  • a website with the leaderboard based on many recent papers (plots below 👇)
  • a collection of the most robust models, Model Zoo, which are easy to use for any downstream application (see the tutorial below after FAQ 👇)