ImageNet-CoG is a benchmark for concept generalization. It provides a full evaluation framework for pre-trained visual representations that measures how well they generalize to unseen concepts.
Gait Recognition in the Wild: A Benchmark
Reproducible nvim completion framework benchmarks
A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models
TruthfulQA: Measuring How Models Imitate Human Falsehoods
wireguard-config-benchmark is a Python script that benchmarks the download speeds of the connections defined in one or more WireGuard config files
Standard implementations of FedLab and its provided benchmarks
Model Quantization Benchmark in Python
The program calculates pi to 10,000 decimal places. The time spent on the calculation is recorded as the test result, averaged over 10 runs. Lower is better.
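The benchmark above (compute pi to a fixed number of decimal places, time it, average over 10 runs) can be sketched as follows. This is a minimal illustration, not the benchmark's actual implementation: the original program's algorithm is unspecified, so this sketch assumes Machin's formula with Python's `decimal` module, and the function names `compute_pi` and `benchmark` are hypothetical.

```python
import time
from decimal import Decimal, getcontext

def compute_pi(digits: int) -> Decimal:
    """Compute pi to `digits` decimal places via Machin's formula:
    pi = 16*arctan(1/5) - 4*arctan(1/239)."""
    getcontext().prec = digits + 10  # extra guard digits

    def arctan_inv(x: int) -> Decimal:
        # Taylor series for arctan(1/x), truncated once terms
        # drop below the target precision.
        power = Decimal(1) / x       # (1/x)^(2k+1)
        total = Decimal(0)
        x2 = x * x
        eps = Decimal(1).scaleb(-(digits + 10))
        k, sign = 0, 1
        while power > eps:
            total += sign * power / (2 * k + 1)
            power /= x2
            sign = -sign
            k += 1
        return total

    pi = 16 * arctan_inv(5) - 4 * arctan_inv(239)
    return +pi  # unary plus rounds to the current precision

def benchmark(digits: int = 10_000, runs: int = 10) -> float:
    """Average wall-clock seconds over `runs` computations (lower is better)."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        compute_pi(digits)
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)
```

Using `time.perf_counter()` rather than `time.time()` matters here: it is a monotonic, high-resolution clock suited to measuring short elapsed intervals.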
A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms