ZEBRA: Zero Evidence Biometric Recognition Assessment
license: LGPLv3 – please reference our paper
author: Andreas Nautsch (EURECOM)
Disclaimer - this toolkit is a standalone implementation of our paper Nautsch, Patino, Tomashenko, Yamagishi, Noe, Bonastre, Todisco and Evans: "The Privacy ZEBRA: Zero Evidence Biometric Recognition Assessment" in Proc. Interspeech 2020
This work is academic (not-for-profit); a reference implementation without warranty.
What is the ZEBRA framework?
How can we assess for privacy preservation in the processing of human signals, such as speech data?
Mounting privacy legislation calls for the preservation of privacy in speech technology, though solutions are gravely lacking. While evaluation campaigns are long-proven tools to drive progress, the need to consider a privacy adversary implies that traditional approaches to evaluation must be adapted to the assessment of privacy and privacy preservation solutions. This paper presents the first step in this direction: metrics.
We propose the ZEBRA framework, which is inspired by forensic science.
In contrast to method validation in modern cryptography, which is underpinned by zero-knowledge proofs (see Shannon), we need to tackle zero-evidence. The former defines input data (e.g., an A is represented by the number 65); the latter models input data (we can only describe, e.g., acoustic data, biometric identities, and semantic meaning).
Communication is more than the written word; we need to leave the stiff perspective of the written word behind, when the medium changes to speech and to other human signals (e.g., video surveillance).
Privacy preservation for human data is not binary
Only levels of privacy preservation can be quantified (theoretical proofs for a yes/no decision are unavailable).
The ZEBRA framework compares candidate privacy safeguards in an after-the-fact evaluation:
- candidate algorithms protect human signals (e.g., speech) regarding the disclosure of specific sensitive information (e.g., the biometric identity);
- knowing the facts of how much sensitive information could be exposed, how much is exposed after using each candidate safeguard?
Privacy & the realm of the adversary
The conventional signal processing or machine learning perspectives as system evaluators do not suffice anymore!
Adversaries are evaluators of safeguard evaluators.
We need to shift our perspective.
To optimize algorithms and their parameters, we are used to minimizing some average/expected performance loss.
- Expected values reflect the population level, yet privacy is a fundamental human right of each individual: how badly is information disclosed for those who belong to a minority in the eyes of a candidate privacy safeguard?
- An adversary can only infer information based on observations – figuratively speaking like a judge/jury assesses evidence.
- By formalizing decision inference based on evidence, the strength of evidence is estimated; it reflects to which extent one of two decisions should be favored over the other, given the circumstances of a case – an individual performance.
- "Given the circumstances" is formalized by the prior belief; just as forensic practitioners cannot know the prior belief of a judge/jury, we cannot know the prior belief of an adversary.
- Empirical Cross-Entropy (ECE) plots were introduced in forensic voice biometrics to simulate ECE for all possible prior beliefs, such that one can report an expected gain in relative information – an average/expected performance.
- Categorical tags were introduced in forensic science and have been constantly refined since the 1960s to summarize different levels of strength of evidence into a scale that is easier for the human mind to digest.
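The ECE mentioned above can be sketched for a single prior belief as follows. This is a minimal illustration, not the toolkit's implementation: it assumes the commonly used definition of ECE as the prior-weighted average logarithmic cost of the posterior, with LLRs given in natural-log units. The function name `empirical_cross_entropy` and its arguments are hypothetical.

```python
import numpy as np


def empirical_cross_entropy(class_a_llrs, class_b_llrs, prior_log_odds):
    """ECE in bits at one prior log-odds value (hypothetical helper).

    Assumes natural-log LLRs where positive values favor class A.
    """
    class_a_llrs = np.asarray(class_a_llrs, dtype=float)
    class_b_llrs = np.asarray(class_b_llrs, dtype=float)

    # prior probability of class A that corresponds to the prior log-odds
    prior = 1.0 / (1.0 + np.exp(-prior_log_odds))

    # posterior log-odds = LLR + prior log-odds;
    # logaddexp(0, x) = ln(1 + e^x), computed in a numerically stable way
    cost_a = np.logaddexp(0.0, -(class_a_llrs + prior_log_odds)).mean()
    cost_b = np.logaddexp(0.0, class_b_llrs + prior_log_odds).mean()

    # weight both class costs by the prior and convert nats to bits
    return (prior * cost_a + (1.0 - prior) * cost_b) / np.log(2)


# LLRs of zero carry no evidence: at a 50:50 prior the ECE is 1 bit
uninformative = np.zeros(5)
ece_flat = empirical_cross_entropy(uninformative, uninformative, 0.0)

# strongly decisive, correct LLRs leave almost no uncertainty
ece_strong = empirical_cross_entropy([10.0, 12.0], [-11.0, -9.0], 0.0)
```

Evaluating such a function over a grid of prior log-odds gives the curve shown in an ECE plot.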
ZEBRA, a zero-evidence framework to assess for preserving privacy on empirical data
The proposed ZEBRA framework has two metrics:
- on the population level, the expected ECE is quantified by integrating out all possible prior beliefs; the result is the expected empirical cross-entropy [in bits], which is 0 bit for full privacy and 1/log(4) ~ 0.721 bit for no privacy.
- on the individual level, the worst-case strength of evidence is quantified. In forensic science, the strength of evidence is expressed as a so-called log-likelihood ratio (LLR), which symmetrically encodes the relative strength of evidence of one possible decision outcome over the other; an LLR of 0 means zero strength of evidence for either possible decision outcome. In contrast, values towards infinity would resemble 'infinitely decisive' evidence (no privacy). The worst-case strength of evidence is the maximum(absolute(LLR)) value.
Categorical tags summarize the maximum(absolute(LLR)) value; an example adopted from the literature:
| Tag | Description (for a 50:50 prior belief) |
|---|---|
| 0 | 50:50 decision making of the adversary |
| A | adversary makes better decisions than 50:50 |
| B | adversary makes 1 wrong decision out of 10 to 100 |
| C | adversary makes 1 wrong decision out of 100 to 1000 |
| D | adversary makes 1 wrong decision out of 1000 to 100,000 |
| E | adversary makes 1 wrong decision out of 100,000 to 1,000,000 |
| F | adversary makes 1 wrong decision in at least 1,000,000 |
The better an adversary can make decisions despite the privacy preservation of the applied candidate safeguard, the worse the categorical tag.
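The value 1/log(4) ~ 0.721 bit for the population metric can be checked numerically. The sketch below rests on two assumptions that this ReadMe does not state: that "integrating out all possible prior beliefs" means averaging uniformly over prior probabilities between 0 and 1, and that "no privacy" means the adversary gains the full binary entropy of the prior. The helper `binary_entropy_bits` is hypothetical.

```python
import numpy as np


def binary_entropy_bits(p):
    """Binary entropy in bits for probabilities strictly inside (0, 1)."""
    p = np.asarray(p, dtype=float)
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))


# midpoint rule over (0, 1); the midpoints never touch 0 or 1,
# so the logarithms stay finite
n = 200_000
priors = (np.arange(n) + 0.5) / n
mean_entropy = binary_entropy_bits(priors).mean()

# closed form quoted in the text (natural logarithm)
upper_bound = 1.0 / np.log(4.0)

print('numerical: %.6f bit, closed form: %.6f bit' % (mean_entropy, upper_bound))
```

Under these assumptions, the average prior entropy agrees with the quoted upper bound.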
Scope of this ZEBRA reference implementation
Computation and visualization of the ZEBRA framework.
- Metrics in ZEBRA profile: (population, individual, tag)
- ZEBRA profile in ECE plots
y = 0 (profiles equal to the x-axis)
For display only, LLRs are in base 10.
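As a small worked example of the base-10 display and of reading LLRs under a 50:50 prior belief, the following sketch converts a natural-log LLR to base 10 and to a posterior probability with Bayes' rule. The function names are hypothetical and not part of the toolkit.

```python
import math


def llr_to_base10(llr_natural):
    """Convert a natural-log LLR to a base-10 LLR."""
    return llr_natural / math.log(10)


def posterior_probability(llr_natural, prior=0.5):
    """Posterior probability of the favored class via Bayes' rule."""
    prior_log_odds = math.log(prior / (1.0 - prior))
    # posterior log-odds = LLR + prior log-odds
    return 1.0 / (1.0 + math.exp(-(llr_natural + prior_log_odds)))


# a likelihood ratio of 100, expressed as a natural-log LLR
llr = math.log(100.0)
llr10 = llr_to_base10(llr)          # base-10 LLR of the same evidence
post = posterior_probability(llr)   # posterior for a 50:50 prior

print('base-10 LLR: %.3f, posterior: %.4f' % (llr10, post))
```

With a 50:50 prior, the posterior odds equal the likelihood ratio, so an LLR of 0 leaves the adversary at a posterior of 0.5.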
Automatic assessment of the 2020 VoicePrivacy Challenge
ReadMe: use ZEBRA for kaldi experiments
Computation and visualizations of conventional metrics:
ReadMe: conventional plots & metrics
- ECE plots (Ramos et al.)
metrics: ECE & min ECE
- APE plots (Brümmer et al.)
metrics: DCF & min DCF
- Computation only
metrics: Cllr, min Cllr & ROCCH-EER
The installation uses Miniconda, which creates Python environments in a folder structure on your hard drive.
Uninstalling is easy: delete the miniconda folder.
- install miniconda, see:
- create a Python environment
conda create python=3.7 --name zebra -y
- activate the environment
conda activate zebra
- install the required packages
conda install -y numpy pandas matplotlib seaborn tabulate
A quick reference guide to using Python, the command line, and customization.
Command line: metric computation
Computing the metrics (command structure):
python zero_evidence.py -s [SCORE_FILE] -k [KEY_FILE]
An example is provided with
key.txt as score and key files:
scr=exp/Baseline/primary/results-2020-05-10-14-29-38/ASV-libri_test_enrolls-libri_test_trials_f/scores
key=keys-voiceprivacy-2020/libri_test_trials_f
python zero_evidence.py -s $scr -k $key
Population: 0.584 bit
Individual: 3.979 (C)
Command line: visualization
Display each plot:
python zero_evidence.py -s $scr -k $key -p
Command line: customization
Custom label for an experiment:
python zero_evidence.py -s $scr -k $key -l "libri speech, primary baseline"
libri speech, primary baseline
Population: 0.584 bit
Individual: 3.979 (C)
Save the profile visualization (without displaying it):
python zero_evidence.py -s $scr -k $key -l "profile" -e png
-l profile for a file name: ZEBRA-profile
note: "ZEBRA-" is an automatic prefix to the exported plot file names
Supported file types:
-e tex: LaTeX
-e pdf: PDF
-e png: PNG
To save a plot with its display, use both options:
-p -e png
Python: high-level implementation
Calling the API provided by
from zebra import PriorLogOddsPlots, zebra_framework, export_zebra_framework_plots

# initialize the ZEBRA framework
zebra_plot = PriorLogOddsPlots()

# declare score & key paths
scr = 'exp/Baseline/primary/results-2020-05-10-14-29-38/ASV-libri_test_enrolls-libri_test_trials_f/scores'
key = 'keys-voiceprivacy-2020/libri_test_trials_f'

# run the framework
zebra_framework(plo_plot=zebra_plot, scr_path=scr, key_path=key)

# saving the ZEBRA plot
export_zebra_framework_plots(plo_plot=zebra_plot, filename='my-experiment', save_plot_ext='png')
Python: low-level implementation
Code snippets from
classA_scores and classB_scores are numpy arrays of scores.
from numpy import log, abs, hstack, argwhere
from zebra import PriorLogOddsPlots

zebra_plot = PriorLogOddsPlots(classA_scores, classB_scores)

# population metric
dece = zebra_plot.get_delta_ECE()

# individual metric: worst-case (maximum absolute) LLR over both classes
max_abs_LLR = abs(hstack((zebra_plot.classA_llr_laplace, zebra_plot.classB_llr_laplace))).max()

# categorical tag, looked up on the base-10 LLR
# (cat_ranges and categorical_tags are not defined in this snippet;
# they are assumed to be defined elsewhere in the toolkit)
max_abs_LLR_base10 = max_abs_LLR / log(10)
cat_idx = argwhere((cat_ranges < max_abs_LLR_base10).sum(1) == 1).squeeze()
cat_tag = list(categorical_tags.keys())[cat_idx]

# nicely formatted string representations
str_dece = ('%.3f' if dece >= 5e-4 else '%.e') % dece
str_max_abs_llr = ('%.3f' if max_abs_LLR >= 5e-4 else '%.e') % max_abs_LLR
if dece == 0:
    str_dece = '0'
if max_abs_LLR == 0:
    str_max_abs_llr = '0'
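The range lookup used for the categorical tag can be illustrated in isolation. The sketch below uses made-up ranges and tag names (`demo_ranges`, `demo_tags`); they are purely hypothetical and are not the ZEBRA thresholds. It only shows how counting the bounds that lie below a value selects the row whose lower bound is below the value and whose upper bound is not.

```python
import numpy as np

# hypothetical (lower, upper] ranges and tags, for illustration only;
# these are NOT the thresholds used by the ZEBRA toolkit
demo_ranges = np.array([[0.0, 1.0],
                        [1.0, 2.0],
                        [2.0, np.inf]])
demo_tags = ['low', 'medium', 'high']


def lookup_tag(value, ranges, tags):
    """Return the tag whose range contains value, or None if no range does."""
    # per row, count how many of the two bounds lie strictly below the value;
    # exactly one means: lower bound < value <= upper bound
    bounds_below = (ranges < value).sum(axis=1)
    matches = np.flatnonzero(bounds_below == 1)
    if matches.size == 0:
        return None
    return tags[int(matches[0])]


print(lookup_tag(1.5, demo_ranges, demo_tags))
print(lookup_tag(5.0, demo_ranges, demo_tags))
```

With this idiom a value exactly on a boundary falls into the lower range, and a value at or below the smallest lower bound matches no row, so a zero value needs separate handling.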