Secure Sketch for Biometric Templates
Qiming Li¹, Yagiz Sutcu², and Nasir Memon³
¹ Department of Computer and Information Science
² Department of Electrical and Computer Engineering
³ Department of Computer and Information Science
Polytechnic University
6 Metrotech Center, Brooklyn, NY 11201
qiming.li@ieee.org, ygzstc@yahoo.com, memon@poly.edu
Abstract. There have been active discussions on how to derive a con-
sistent cryptographic key from noisy data such as biometric templates,
with the help of some extra information called a sketch. It is desirable
that the sketch reveals little information about the biometric templates
even in the worst case (i.e., the entropy loss should be low). The main
difficulty is that many biometric templates are represented as points in
continuous domains with unknown distributions, whereas known results
either work only in discrete domains, or lack rigorous analysis on the
entropy loss. A general approach to handle points in continuous domains
is to quantize (discretize) the points and apply a known sketch scheme in
the discrete domain. However, it can be difficult to analyze the entropy
loss due to quantization and to find the “optimal” quantizer. In this
paper, instead of trying to solve these problems directly, we propose to
examine the relative entropy loss of any given scheme, which bounds the
number of additional bits we could have extracted if we used the optimal
parameters. We give a general scheme and show that the relative entropy
loss due to suboptimal discretization is at most (n log 3), where n is the
number of points, and the bound is tight. We further illustrate how our
scheme can be applied to real biometric data by giving a concrete scheme
for face biometrics.
Keywords: Secure sketch, biometric template, continuous domain.
1 Introduction
The main challenge in using biometric data in cryptography is that they cannot
be reproduced exactly. Some noise will be inevitably introduced into biometric
samples during acquisition and processing. There have been active discussions
on how to extract a reliable cryptographic key from such noisy data. Some
recent techniques attempt to correct the noise in the data by using some public
information P derived from the original biometric template X. These techniques
include fuzzy commitment [12], fuzzy vault [11], helper data [19], and secure
sketch [7]. In this paper, we follow Dodis et al. [7] and call such public information
P a sketch.
X. Lai and K. Chen (Eds.): ASIACRYPT 2006, LNCS 4284, pp. 99–113, 2006.
© International Association for Cryptologic Research 2006
Typically, there are two main components in a secure sketch scheme. The first is
the sketch generation algorithm, which we will refer to as the encoder. It takes the
original biometric template X as the input, and outputs a sketch P. The second
algorithm is the biometric template reconstruction algorithm, or the decoder, which
takes another biometric template Y and the sketch P as the input and outputs X′.
If Y and X are sufficiently similar according to some similarity measure, we will
have X′ = X. An important requirement for such a scheme is that the sketch P
should not reveal too much information about the biometric template X. Dodis
et al. [7] give a notion of entropy loss, which (informally speaking) measures the
advantage that P gives to any adversary in guessing X, when X is discrete in
nature (Section 3 provides the details). It is worth noting that the entropy loss is a
worst-case bound over all distributions of X.
There are several difficulties in applying many known secure sketch
techniques to known types of biometric templates directly. Firstly, many biometric
templates are represented by sequences of n points in a continuous domain (say,
$\mathbb{R}$), or equivalently, points in an n-dimensional space (say, $\mathbb{R}^n$). In this case,
since the entropy of the original data can be very large, and the length of the
extracted key is typically quite limited, the “entropy loss” as defined in [7] can
be very high for any possible scheme. For example, X is often a discrete
approximation of some points in a continuous domain (e.g., decimal fractions obtained
by rounding real numbers). As the precision of X gets higher, both the entropy
of X and the entropy loss from P become larger, but the extracted key can
become stronger. Hence, this notion of entropy loss alone is insufficient, and the
seemingly high entropy loss for this type of biometric data would be misleading.
We will discuss this issue in detail in Section 4, and give a complementary
definition of relative entropy loss for noisy data in the continuous domain. Informally
speaking, the relative entropy loss of a sketch measures the imperfection of the
rounding, which is the maximum amount of additional entropy we can obtain
by the “optimal” rounding. At the same time, the entropy loss from P serves as
a measure of the security of the sketch in the discrete domain.
Secondly, even if the biometric templates are represented in discrete form,
there are practical problems when the entropy of the original template is high.
For example, the iris pattern of an eye can be represented by a 2048 bit binary
string called iris code, and up to 20% of the bits could be changed under noise
[9]. The fuzzy commitment scheme based on binary error-correcting codes [12]
seems to be applicable at first glance. However, it would be impractical to
apply a binary error-correcting code on such a long string with such a large
error-correcting capability. A two-level error-correcting technique is proposed in
[9], which essentially changes the similarity measure. As a result, the space is no
longer a metric space.
Thirdly, the similarity measures for many known biometric templates can
be quite different from those considered in many theoretical works (such as
Hamming distance, set difference and edit distance in [7]). This can happen as
a result of technical considerations (e.g., in the case of iris codes). However,
in many cases this is due to the nature of biometric templates. For instance,
a fingerprint template usually consists of a set of minutiae (feature points in
2-D space), and two templates are considered as similar if more than a certain
number of minutiae in one template are near distinct minutiae in the other. In
this case, the similarity measure has to consider both Euclidean distance and set
difference at the same time.
The secure sketch for point sets [5] is perhaps the first rigorous approach to
similarity measures that do not define a metric space. A generic scheme is pro-
posed in [5] for point sets in bounded discrete d-dimensional space for any d,
where the underlying similarity measure is motivated by the similarity measure
of fingerprint templates. While such a scheme is potentially applicable to fin-
gerprints represented as minutiae, other types of biometrics are different both
in representations and similarity measures, and thus require different considerations
and different schemes.
In this paper, we study how to design secure sketches for biometric templates
for which a worst-case bound can be proved. We observe that many biometric
templates can be represented in a general form: the original X can be considered
as a list of n points, where each point x of X is in a bounded continuous domain.
Under noise, each point can be perturbed by a distance less than δ, and on top
of that, at most t points can be replaced. Similar to [5], we will refer to the
first noise as the white noise, and the second as the replacement noise. We note that
this similarity measure can be applied to handwritten online signatures [8], iris
patterns [9], voice features [15], and face biometrics [17]. This formulation is
different from that in [5] in two ways: (1) the points are in a continuous domain,
and (2) the points are always ordered.
To handle points in a continuous domain, a general two-step approach is to
(1) quantize (i.e., discretize) the points in X to a discrete domain with a scalar
quantizer $Q_\lambda$, where λ is the step size, and (2) apply secure sketch techniques on
the quantized points $\hat{X} = Q_\lambda(X)$ in the quantized domain, which is discrete. For
example, if points in X are real numbers between 0 and 1, assume that we have
a scalar quantizer $Q_\lambda$ with step size λ = 0.01, such that $Q_\lambda(x) = \hat{x}$ if and only
if $\hat{x}\lambda \le x < (\hat{x}+1)\lambda$. Then every point in X would be mapped to an integer in
[0, 99]. After that, we can apply a secure sketch for discrete points in the domain
$[0, 99]^n$ to achieve error tolerance.
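The quantization step in this example can be sketched in a few lines of Python. This is only an illustration: the function name `quantize` and the use of `fractions.Fraction` (chosen so that interval boundaries are exact rather than subject to binary floating-point rounding) are assumptions of this sketch, not part of the scheme's definition.

```python
from fractions import Fraction
from math import floor

def quantize(x, step):
    """Scalar quantizer: return the integer q with q*step <= x < (q+1)*step."""
    return floor(Fraction(x) / Fraction(step))

step = Fraction(1, 100)  # step size lambda = 0.01
points = [Fraction(0), Fraction(37, 100), Fraction(379, 1000), Fraction(999, 1000)]
quantized = [quantize(x, step) for x in points]
print(quantized)  # every value is an integer in [0, 99]
```

Note that 0.37 and 0.379 fall into the same cell, which is exactly the information that quantization discards.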
However, there are two difficulties when this approach is applied. Firstly, if we
follow the notion of secure sketch and entropy loss as in [7], the quantization error
$X - \hat{X}$ in the first step has to be kept in the sketch, since exact reconstruction of
X is required by definition. However, it can be difficult to give an upper bound on
the entropy loss from the quantization errors, and even when a bound can be given, it can be very large.
Furthermore, as the quantization step λ becomes very small, the bound on the
entropy loss in the quantized domain during the second step can be very high. For
instance, for x ∈ [0, 1) and δ = 0.01, when λ = 0.01, the entropy loss in Step
(2) will be log 3, and the bound is tight. When λ = 0.001, the entropy loss will
be log 21. However, the big difference in entropy loss in the quantized domain can
be misleading. We will revisit this example in Section 5, and will show that the
second case actually results in a stronger key if X is uniformly distributed.
To address the above problems, we consider the following strategy. Instead of
trying to answer the question of how much entropy is lost during quantization,
we study how different quantizers affect the strength of the key that we can
finally extract from the noisy data. In particular, given a secure sketch scheme
in the discrete domain and a quantizer $Q_1$ with step size $\lambda_1$, we consider any
quantizer $Q_2$ with step size $\lambda_2$. Assuming that $m_1$ and $m_2$ are the strengths of
the keys under these two quantizers respectively, we find that it is possible to
give an upper bound on the difference between $m_1$ and $m_2$, for any distribution
of X, and any choice of $\lambda_2$ (hence $Q_2$) within a certain range. This bound can
be expressed as a function of $\lambda_1$. In other words, although we do not know the
exact entropy loss due to the quantizer $Q_1$, we do know at most how
far $Q_1$ can be from the “optimal” one. Based on this, we give a notion
of relative entropy loss for data in a continuous domain. Furthermore, we show
that if X is uniformly distributed, the relative entropy loss can be bounded by
a constant for any choice of $\lambda_1$.
To illustrate how our general approach can be applied to practical biometric
templates, we give a scheme based on the authentication scheme for face biomet-
rics in [17]. We will also discuss some practical issues in designing secure sketch
schemes for biometric templates.
We note that our proposed schemes and analysis can be applied for two parties
to extract secret keys given correlated random variables (e.g., [14]), where the
random variables take values in a continuous domain (e.g. R). The entropy loss
in the quantized domain measures how much information can be leaked to an
eavesdropper, while the relative entropy loss measures how many additional bits
we might be able to extract.
We will give a review of related works in Section 2, followed by some pre-
liminary formal definitions in Section 3. Our definition of secure sketch and its
security will be presented in Section 4. We give a general similarity measure and
our proposed schemes in Section 5, together with a security analysis and some
discussions on choosing the parameters. A concrete secure sketch scheme for face
biometrics will be given in Section 6.
2 Related Works
It is not surprising that the construction of the sketch largely depends on the
representation of the biometric templates and the underlying distance function
that measures the similarity. Most of the known techniques assume that the
noisy data under consideration are represented as points in some metric space.
The fuzzy commitment scheme [12], which is based on binary error-correcting
codes, considers binary strings where the similarity is measured by Hamming
distance. The fuzzy vault scheme [11] considers sets of elements in a finite field
with set difference as the distance function, and corrects errors by polynomial
interpolation. Dodis et al. [7] further give the notion of fuzzy extractors, where a
“strong extractor” (such as pair-wise independent hash functions) is applied after
the original Xis reconstructed to obtain an almost uniform key. Constructions
and rigorous analysis of secure sketch are given in [7] for three metrics: Hamming
distance, set difference and edit distance. Secure sketch schemes for point sets in
[5] are motivated by the typical similarity measure used for fingerprints, where
each template consists of a set of points in 2-D space, and the similarity measure
does not define a metric space.
On the other hand, there have been a number of works on how to extract
consistent keys from real biometric templates, which have quite different rep-
resentations and similarity measures from the above theoretical works. Such
biometric templates include handwritten online signatures [8], fingerprints [20],
iris patterns [9], voice features [15], and face biometrics [17]. These works, how-
ever, do not have sufficiently rigorous treatment of the security, compared to
well-established cryptographic techniques. Some of the works give analysis on
the entropy of the biometrics, and the approximate amount of effort required by
a brute-force attacker.
Boyen [2] shows that a sketch scheme that is provably secure may be insecure
when multiple sketches of the same biometric data are obtained. Boyen et al.
further study the security of secure sketch schemes under more general attacker
models in [1], and techniques to achieve mutual authentication are proposed.
Linnartz and Tuyls [13] consider a similar problem for biometric authentica-
tion applications. They consider zero mean i.i.d. jointly Gaussian random vectors
as biometric templates, and use mutual information as the measure of security
against dishonest verifiers. Tuyls and Goseling [19] consider a similar notion of
security, and develop some general results when the distribution of the original
is known and the verifier can be trusted. Some practical results along this line
also appear in [18].
3 Preliminaries
3.1 Entropy and Entropy Loss in Discrete Domain
In the case where X is discrete, we follow the definitions by Dodis et al. [7]. They
consider a variant of the average min-entropy of X given P, which is essentially
the minimum strength of the key that can be consistently extracted from X
when P is made public.
In particular, the min-entropy $H_\infty(A)$ of a discrete random variable A is
defined as $H_\infty(A) = -\log(\max_a \Pr[A = a])$. For two discrete random variables
A and B, the average min-entropy of A given B is defined as
$\tilde{H}_\infty(A \mid B) = -\log(\mathbb{E}_{b \leftarrow B}[2^{-H_\infty(A \mid B = b)}])$.
For discrete X, the entropy loss of the sketch P is defined as
$L = H_\infty(X) - \tilde{H}_\infty(X \mid P)$. This definition is useful in the analysis, since for any ℓ-bit string B,
we have $\tilde{H}_\infty(A \mid B) \ge H_\infty(A) - \ell$. For any secure sketch scheme for discrete X,
let R be the randomness invested in constructing the sketch. It is not difficult to
show that when R can be computed from X and P, we have
$L = H_\infty(X) - \tilde{H}_\infty(X \mid P) \le |P| - H_\infty(R)$. (1)
In other words, the entropy loss can be bounded from above by the difference
between the size of P and the amount of randomness we invested in computing
P. This allows us to conveniently find an upper bound on L for any distribution
of X, since the bound is independent of X.
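To make the definitions of min-entropy and average min-entropy concrete, the following Python sketch evaluates them on a toy distribution. The helper names `min_entropy` and `avg_min_entropy` and the example distribution (A uniform on eight values, with a one-bit B playing the role of a sketch) are assumptions made for illustration only. The code uses the identity that the expectation over b of $2^{-H_\infty(A \mid B = b)}$ equals the sum over b of the largest joint probability $\Pr[A = a, B = b]$.

```python
from collections import defaultdict
from fractions import Fraction
from math import log2

def min_entropy(dist):
    """H_inf(A) = -log2(max_a Pr[A = a]) for a dict {a: probability}."""
    return -log2(max(dist.values()))

def avg_min_entropy(joint):
    """Average min-entropy of A given B for a dict {(a, b): probability}.

    E_b[2^(-H_inf(A | B = b))] equals sum_b max_a Pr[A = a, B = b].
    """
    best = defaultdict(Fraction)  # best[b] = max_a Pr[A = a, B = b]
    for (a, b), p in joint.items():
        best[b] = max(best[b], p)
    return -log2(sum(best.values()))

# Toy example (assumed): A uniform on {0,...,7}, B = A mod 2 is a 1-bit string.
joint = {(a, a % 2): Fraction(1, 8) for a in range(8)}
marginal_a = {a: Fraction(1, 8) for a in range(8)}

h_a = min_entropy(marginal_a)          # 3 bits
h_a_given_b = avg_min_entropy(joint)   # 2 bits remain once B is public
print(h_a, h_a_given_b, h_a - h_a_given_b)
```

Here the entropy loss is exactly one bit, which meets the bound for an ℓ-bit string with ℓ = 1.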
3.2 Secure Sketch in Discrete Domain
Our definitions of secure sketch and entropy loss in the discrete domain follow
those in [7]. Let M be a finite set of points with a similarity relation S ⊆ M × M.
When (X, Y) ∈ S, we say that Y is similar to X, or that the pair (X, Y) is similar.
Definition 1. A sketch scheme in the discrete domain is a tuple (M, S, Enc, Dec),
where Enc : M → {0,1}* is an encoder and Dec : M × {0,1}* → M is a decoder
such that for all X, Y ∈ M, Dec(Y, Enc(X)) = X if (X, Y) ∈ S. The string
P = Enc(X) is the sketch, and is to be made public. We say that the scheme is
L-secure if for all random variables X over M, the entropy loss of the sketch P
is at most L. That is, $H_\infty(X) - \tilde{H}_\infty(X \mid \mathrm{Enc}(X)) \le L$.
We call $\tilde{H}_\infty(X \mid P)$ the left-over entropy, which in essence measures the “strength”
of the key that can be extracted from X given that P is made public. Note that
in most cases, the ultimate goal is to maximize the left-over entropy for some
particular distribution of X. However, in the discrete case, the min-entropy of X is
fixed but can be difficult to analyze. Hence, entropy loss becomes an equivalent
measure which is easier to quantify.
4 Secure Sketch in Continuous Domain
In this section we propose a general approach to handle noisy data in a
continuous domain. We consider points in a universe U, which is a set that may be
uncountable. Let S be a similarity relation on U, i.e., S ⊆ U × U. Let M be a
finite set of points, and let Q : U → M be a function that maps points in U to
points in M. We will refer to such a function Q as a quantizer.
Definition 2. A quantization-based sketch scheme is a tuple (U, S, Q, M, Enc, Dec),
where Enc : M → {0,1}* is an encoder and Dec : M × {0,1}* → M is a decoder
such that for all X, Y ∈ U, Dec(Q(Y), Enc(Q(X))) = Q(X) if (X, Y) ∈ S. The
string P = Enc(Q(X)) is the sketch. We say that the scheme is L-secure in the
quantized domain if for all random variables X over U, the entropy loss of P is at
most L, i.e., $H_\infty(Q(X)) - \tilde{H}_\infty(Q(X) \mid \mathrm{Enc}(Q(X))) \le L$.
In other words, a quantization is applied to transform the points in the
continuous domain to a discrete domain, and a sketch scheme for the discrete domain
is applied to obtain the sketch P. During reconstruction, we require the exact
reconstruction of the quantization Q(X) instead of the original X in the
continuous domain. When required, a strong extractor can be further applied to Q(X)
to extract a key (as in the fuzzy extractor in [7]). That is, we treat Q(X) as the
“discrete original”. Similarly, we call $\tilde{H}_\infty(Q(X) \mid P)$ the left-over entropy.
When Q is fixed, we can use the entropy loss on Q(X) to analyze the security
of the scheme, and bound the entropy loss of P. However, using this entropy loss
alone may be misleading, since there are many ways to quantize X, and different
quantizers would make a difference in both the min-entropy of Q(X) and the
entropy loss. Since our ultimate goal is to maximize the left-over entropy (i.e.,
the average min-entropy $\tilde{H}_\infty(Q(X) \mid P)$), the entropy loss alone is not sufficient
to compare different quantization strategies.
To illustrate the subtleties, we consider the following example. Let x be a point
uniformly distributed in the interval [0, 1), and under noise, it can be shifted but
remains within the range [x − 0.01, x + 0.01). We can use a scalar quantizer $Q_1$ with
step size 0.01, such that all points in the interval [0, 1) are mapped to integers in
[0, 99]. In this case, the min-entropy is $H_\infty(Q_1(x)) = \log 100$. As we will see later,
there is an easy way to construct a secure sketch for such $Q_1(x)$ with an entropy
loss of log 3. Hence, the left-over entropy is log(100/3) ≈ 5.06. Now we consider
another scalar quantizer $Q_2$ with step size 0.001, such that the range of $Q_2(x)$ is
[0, 999]. A similar scheme on $Q_2(x)$ would give an entropy loss of log 21, which seems
much larger than the previous log 3. However, the min-entropy of $Q_2(x)$ is also
increased to log 1000, and the left-over entropy would be log(1000/21) ≈ 5.57,
which is slightly higher than in the case where $Q_1$ is used.
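The arithmetic of this example can be reproduced with a short Python sketch. The function name `leftover_entropy_uniform` is hypothetical, and the code assumes, as in the example, a single uniformly distributed point and step sizes for which both 1/λ and δ/λ are integers; logarithms are base 2.

```python
from fractions import Fraction
from math import log2

def leftover_entropy_uniform(step, delta):
    """Left-over entropy (bits) for one point uniform on [0, 1).

    Assumes 1/step and delta/step are integers, as in the example.
    """
    levels = 1 / step                  # number of quantization levels
    offsets = 2 * (delta / step) + 1   # possible sketch values per point
    return log2(levels) - log2(offsets)

delta = Fraction(1, 100)
for step in (Fraction(1, 100), Fraction(1, 1000)):
    print(float(step), round(leftover_entropy_uniform(step, delta), 2))
```

The two printed values correspond to log(100/3) and log(1000/21), so the finer quantizer leaves slightly more entropy despite its larger entropy loss.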
Intuitively, for a given class of methods of handling noisy data in the quantized
domain, it is important to examine how different precisions of the quantization
process affect the strength of the extracted key. For this purpose, we propose
to consider not just one, but a family of quantizers $\mathcal{Q}$, where each quantizer Q
drawn from $\mathcal{Q}$ defines a mapping from U to a finite set $M_Q$. Let $\mathcal{M}$ be the set
of such $M_Q$ for all $Q \in \mathcal{Q}$. We also define a family of encoders $\mathcal{E}$ and decoders
$\mathcal{D}$, such that for each Q and $M_Q$, there exist uniquely defined $\mathrm{Enc}_Q \in \mathcal{E}$ and
$\mathrm{Dec}_Q \in \mathcal{D}$ that can handle Q(X) in $M_Q$.
Definition 3. A quantization-based sketch family is a tuple $(U, S, \mathcal{Q}, \mathcal{M}, \mathcal{E}, \mathcal{D})$,
such that for each quantizer $Q \in \mathcal{Q}$, there exist $M \in \mathcal{M}$, $\mathrm{Enc} \in \mathcal{E}$ and $\mathrm{Dec} \in \mathcal{D}$
for which (U, S, Q, M, Enc, Dec) is a quantization-based sketch scheme. We say that
such a scheme is a member of the family, and is identified by Q.
Definition 4. A quantization-based sketch family $(U, S, \mathcal{Q}, \mathcal{M}, \mathcal{E}, \mathcal{D})$ is (L, R)-secure
for functions $L, R : \mathcal{Q} \to \mathbb{R}$ if for any member identified by $Q_1$ (with
encoder $\mathrm{Enc}_1$) it holds that
1. This member is $L(Q_1)$-secure in the quantized domain; and
2. For any random variable X, and any member identified by $Q_2$ (with encoder
$\mathrm{Enc}_2$), we have
$\tilde{H}_\infty(Q_2(X) \mid \mathrm{Enc}_2(Q_2(X))) - \tilde{H}_\infty(Q_1(X) \mid \mathrm{Enc}_1(Q_1(X))) \le R(Q_1)$.
In other words, to measure the security of the family of schemes, we examine two
aspects of the family. Firstly, we consider the entropy loss in the quantized
domain for each member of the family. This is represented by the function L, which
serves as a measure of security when the quantizer is fixed. Secondly, given any
quantizer in the family, we consider the question: if we use another quantizer, how
many more bits can be extracted? We call this the relative entropy loss, which is
represented by the function R.
We observe that for some sketch families, the relative entropy loss for any given
member can be conveniently bounded by the size of the sketch generated by
that member. We say that such sketch families are well-formed. More precisely,
we have the following definition.
Definition 5. A quantization-based sketch family $(U, S, \mathcal{Q}, \mathcal{M}, \mathcal{E}, \mathcal{D})$ is well-formed
if for any two members $(U, S, Q_1, M_1, \mathrm{Enc}_1, \mathrm{Dec}_1)$ and $(U, S, Q_2, M_2, \mathrm{Enc}_2, \mathrm{Dec}_2)$, it
holds for any random variable X that
$\tilde{H}_\infty(Q_1(X) \mid P_1, P_2) = \tilde{H}_\infty(Q_2(X) \mid P_1, P_2)$ (2)
where $P_1 = \mathrm{Enc}_1(Q_1(X))$ and $P_2 = \mathrm{Enc}_2(Q_2(X))$.
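Theorem 1. For any well-formed quantization-based sketch family, given any
two members $(U, S, Q_1, M_1, \mathrm{Enc}_1, \mathrm{Dec}_1)$ and $(U, S, Q_2, M_2, \mathrm{Enc}_2, \mathrm{Dec}_2)$, it holds
for any random variable X that
$\tilde{H}_\infty(Q_2(X) \mid P_2) - \tilde{H}_\infty(Q_1(X) \mid P_1) \le |P_1|$
where $P_1 = \mathrm{Enc}_1(Q_1(X))$ and $P_2 = \mathrm{Enc}_2(Q_2(X))$.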
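Proof: First, it is not difficult to show that for any random variables A, B and
C, we have
$\tilde{H}_\infty(A \mid B) - |C| \le \tilde{H}_\infty(A \mid B, C) \le \tilde{H}_\infty(A \mid B)$. (3)
Let $\hat{X}_1 = Q_1(X)$ and $\hat{X}_2 = Q_2(X)$. Since the sketch family is well-formed,
$\tilde{H}_\infty(\hat{X}_1 \mid P_1, P_2) = \tilde{H}_\infty(\hat{X}_2 \mid P_1, P_2)$. (4)
Applying (3) once with $A = \hat{X}_2$, $B = P_2$, $C = P_1$ and once with $A = \hat{X}_1$, $B = P_1$, $C = P_2$, we have
$\tilde{H}_\infty(\hat{X}_2 \mid P_2) - |P_1| \le \tilde{H}_\infty(\hat{X}_2 \mid P_1, P_2) = \tilde{H}_\infty(\hat{X}_1 \mid P_1, P_2) \le \tilde{H}_\infty(\hat{X}_1 \mid P_1)$, (5)
and rearranging gives the claimed bound.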
5 A General Scheme for Biometric Templates
We observe that many biometric templates can be represented as a sequence of
points in some bounded continuous domain. There are two types of noise that
can occur. The first noise, white noise, perturbs each point by a small distance,
and the second noise, replacement noise, replaces some points by different points.
Without loss of generality, we assume that each biometric template X can be
written as a sequence $X = x_1, x_2, \cdots, x_n$, where each $x_i \in \mathbb{R}$ and $0 \le x_i < 1$.
In other words, $X \in U = [0, 1)^n$. For each pair of biometric templates X and
Y, we say that (X, Y) ∈ S if there exists a subset C of $\{1, \cdots, n\}$, such that
|C| ≥ n − t for some threshold t, and for every i ∈ C, it holds that $|x_i - y_i| < \delta$
for some threshold δ.
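The similarity relation S can be checked directly by counting the coordinates that lie within δ of each other. The following Python sketch illustrates this; the function name `is_similar` and the sample templates are assumptions of the example.

```python
def is_similar(x, y, delta, t):
    """Return True if at least n - t coordinates of y are within delta of x."""
    if len(x) != len(y):
        raise ValueError("templates must have the same number of points")
    close = sum(1 for xi, yi in zip(x, y) if abs(xi - yi) < delta)
    return close >= len(x) - t

x = [0.10, 0.42, 0.77, 0.93]
y_white = [0.105, 0.415, 0.772, 0.93]     # only white noise
y_replaced = [0.105, 0.90, 0.772, 0.93]   # second point replaced
y_too_far = [0.50, 0.90, 0.772, 0.93]     # two points replaced

print(is_similar(x, y_white, delta=0.01, t=1))
print(is_similar(x, y_replaced, delta=0.01, t=1))
print(is_similar(x, y_too_far, delta=0.01, t=1))
```

A subset C of size at least n − t exists exactly when the number of close coordinates is at least n − t, so a single count suffices.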
Similar to the two-part approach in [5], we construct the sketch in two parts.
The first part, the white noise sketch, handles the white noise in the noisy data,
and the second part, the replacement noise sketch, corrects the replacement noise.
We will concentrate on the white noise sketch in this paper, and the replacement
noise sketch can be implemented using a known secure sketch scheme for set
difference (e.g., that in [7,3]).
5.1 Proposed Quantization-Based Sketch Family
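Each member of the family is parameterized by a λ such that $\lambda \in \mathbb{R}$ and
0 < λ ≤ δ.
Quantizer $Q_\lambda$. Each quantizer $Q_\lambda$ in $\mathcal{Q}$ is a scalar quantizer with step size
$\lambda \in \mathbb{R}$. For each x ∈ U, $Q_\lambda(x) = \hat{x}$ if and only if $\lambda\hat{x} \le x < \lambda(\hat{x}+1)$, and
the quantization of X is defined as $\hat{X} = Q_\lambda(X) \triangleq Q_\lambda(x_1), \cdots, Q_\lambda(x_n)$. The
corresponding quantized domain is thus $M_\lambda = [0, \frac{1}{\lambda}]^n$. The encoders and the
decoders work only on the quantized domain. The white noise that appears in the
quantized domain is of level $\hat{\delta}_\lambda = \delta/\lambda$. In other words, under white noise, a
point $\hat{x}$ in the quantized domain can be shifted by a distance of at most $\hat{\delta}_\lambda$. Let
us denote $\Delta_\lambda \triangleq 2\hat{\delta}_\lambda + 1$.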
Codebook $C_\lambda$. Furthermore, for each quantized domain $M_\lambda$ we consider a
codebook $C_\lambda$, where every codeword $c \in C_\lambda$ has the form $c = k\Delta_\lambda$ for some
non-negative integer k. We use $C_\lambda(\cdot)$ to denote the function such that given
a quantized point $\hat{x}$, it returns a value $c = C_\lambda(\hat{x})$ such that $|\hat{x} - c| \le \hat{\delta}_\lambda$. That is,
the function finds the unique codeword c that is nearest to $\hat{x}$ in the codebook.
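Encoder $\mathrm{Enc}_\lambda$. Given a quantized $\hat{X} \in M_\lambda$, the encoder $\mathrm{Enc}_\lambda$ does the following.
1. For each $\hat{x}_i \in \hat{X}$, compute $c_i = C_\lambda(\hat{x}_i)$;
2. Output $P = \mathrm{Enc}_\lambda(\hat{X}) = d_1, \cdots, d_n$, where $d_i = \hat{x}_i - c_i$ for 1 ≤ i ≤ n.
In other words, for every $\hat{x}_i$, the encoder outputs the distance of $\hat{x}_i$ from its
nearest codeword in the codebook $C_\lambda$.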
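Decoder $\mathrm{Dec}_\lambda$. A corrupted template Y is first quantized as $\hat{Y} = Q_\lambda(Y)$.
Given $P = d_1, \cdots, d_n$ and $\hat{Y} = \hat{y}_1, \cdots, \hat{y}_n$, the decoder $\mathrm{Dec}_\lambda$ does the
following.
1. For each $\hat{y}_i \in \hat{Y}$, compute $c_i = C_\lambda(\hat{y}_i - d_i)$;
2. Output $\tilde{X} = \mathrm{Dec}_\lambda(\hat{Y}) = c_1 + d_1, \cdots, c_n + d_n$.
In other words, the decoder shifts every $\hat{y}_i$ by $d_i$, maps it to the nearest codeword
in $C_\lambda$, and shifts it back by the same distance.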
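The quantizer, codebook, encoder and decoder described above can be put together in a short Python sketch of the white noise sketch. This is a minimal illustration under stated assumptions, not a reference implementation: the function names are hypothetical, exact rational arithmetic (`fractions.Fraction`) is used to avoid floating-point boundary effects, and the quantized noise level is rounded up when δ/λ is not an integer, which is an assumption of this example.

```python
from fractions import Fraction
from math import ceil, floor

def quantize(x, lam):
    """Q_lambda: the integer q with q*lam <= x < (q+1)*lam."""
    return floor(Fraction(x) / lam)

def nearest_codeword(q, noise, spacing):
    """C_lambda: the multiple of `spacing` within distance `noise` of q."""
    return ((q + noise) // spacing) * spacing

def encode(template, lam, delta):
    """Enc_lambda: offsets of each quantized point from its codeword."""
    noise = ceil(Fraction(delta) / lam)  # assumption: round up if delta/lam is not an integer
    spacing = 2 * noise + 1              # Delta_lambda
    quantized = [quantize(x, lam) for x in template]
    return [q - nearest_codeword(q, noise, spacing) for q in quantized]

def decode(noisy, sketch, lam, delta):
    """Dec_lambda: shift by d_i, snap to the codebook, shift back."""
    noise = ceil(Fraction(delta) / lam)
    spacing = 2 * noise + 1
    quantized = [quantize(y, lam) for y in noisy]
    return [nearest_codeword(q - d, noise, spacing) + d
            for q, d in zip(quantized, sketch)]

lam, delta = Fraction(1, 200), Fraction(1, 100)
x = [Fraction(123, 1000), Fraction(1, 2), Fraction(987, 1000)]
y = [Fraction(132, 1000), Fraction(4901, 10000), Fraction(992, 1000)]  # |x_i - y_i| < delta

sketch = encode(x, lam, delta)
recovered = decode(y, sketch, lam, delta)
print(sketch, recovered)
```

With these sample values the decoder recovers the quantized original exactly, even though every coordinate of the noisy template differs from the original.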
5.2 Security Analysis
For each member of the sketch family with parameter λ, the difference $d_i$
between $\hat{x}_i$ and $c_i$ ranges from $-\hat{\delta}_\lambda$ to $\hat{\delta}_\lambda$. Intuitively, $\log \Delta_\lambda$ bits are sufficient
and necessary to describe the white noise in the quantized domain (recall that
$\Delta_\lambda = 2\hat{\delta}_\lambda + 1 = \frac{2\delta}{\lambda} + 1$). Hence, we have
Lemma 2. The quantization-based sketch scheme $(U, S, Q_\lambda, M_\lambda, \mathrm{Enc}_\lambda, \mathrm{Dec}_\lambda)$ is
$(n \log \Delta_\lambda)$-secure in the quantized domain.
Proof: Note that the size of each $d_i$ generated in the second step of the encoder
is $\log \Delta_\lambda$. Hence the total size of the sketch is $n \log \Delta_\lambda$. Therefore, the entropy
loss of the sketch P is at most $n \log \Delta_\lambda$ by Equation (1).
It is not difficult to see that the above bound is tight. For example, when each
quantized point $\hat{x}$ is uniformly distributed in the quantized domain, the min-entropy of each $\hat{x}$
would be $\log \frac{1}{\lambda}$, and the average min-entropy of $\hat{x}$ given P
would be at most $\log |C_\lambda| = \log \frac{1}{\lambda} - \log \Delta_\lambda$.
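The tightness claim can be checked by brute force on a small instance. The Python sketch below assumes a single quantized point that is uniform over 15 values with quantized noise level 1 (so that the codeword spacing is 3); these parameters are chosen purely for illustration. It computes the min-entropy before the sketch is published and the average min-entropy afterwards.

```python
from collections import defaultdict
from fractions import Fraction
from math import log2

noise = 1                 # assumed quantized noise level (delta / lambda)
spacing = 2 * noise + 1   # Delta_lambda = 3
levels = 15               # assumed number of quantized values, a multiple of spacing

# best[d] = max over quantized points q of Pr[point = q, sketch = d]
best = defaultdict(Fraction)
for q in range(levels):
    codeword = ((q + noise) // spacing) * spacing
    d = q - codeword
    best[d] = max(best[d], Fraction(1, levels))

h_before = log2(levels)               # min-entropy of the quantized point
h_after = -log2(sum(best.values()))   # average min-entropy given the sketch
print(h_before - h_after, log2(spacing))
```

The two printed numbers agree up to floating-point rounding, so in this instance the entropy loss equals the logarithm of the codeword spacing.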
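Now we consider the relative entropy loss. First of all, we observe that the
proposed sketch family is well-formed according to Definition 5.
Lemma 3. The quantization-based sketch family defined in Section 5.1 is
well-formed.
Proof: We consider any two members in the sketch family. The first is identified
by $Q_{\lambda_1}$ with step size $\lambda_1$, and the second is identified by $Q_{\lambda_2}$ with step size $\lambda_2$.
For any point x ∈ X, let $\hat{x}_1 = Q_{\lambda_1}(x)$. Recall that during encoding, a
codeword is computed as $c_1 = C_{\lambda_1}(\hat{x}_1)$, and the difference $d_1 = \hat{x}_1 - c_1$ is put into
the sketch. Similarly, let $\hat{x}_2 = Q_{\lambda_2}(x)$, $c_2 = C_{\lambda_2}(\hat{x}_2)$ and $d_2 = \hat{x}_2 - c_2$.
Since $\lambda_1 \le \delta$ and $\lambda_2 \le \delta$, it is easy to see that if $d_1$, $d_2$ and $\hat{x}_1$ are known, we
can compute $\hat{x}_2$ deterministically. Similarly, given $d_1$, $d_2$ and $\hat{x}_2$, $\hat{x}_1$ can also be
determined. Thus, we have
$\tilde{H}_\infty(\hat{x}_1 \mid d_1, d_2) = \tilde{H}_\infty(\hat{x}_1, \hat{x}_2 \mid d_1, d_2) = \tilde{H}_\infty(\hat{x}_2 \mid d_1, d_2)$. (6)
The same arguments can be applied to all the points in X. Hence, letting $\hat{X}_1 = Q_{\lambda_1}(X)$, $\hat{X}_2 = Q_{\lambda_2}(X)$, $P_1 = \mathrm{Enc}_{\lambda_1}(\hat{X}_1)$ and $P_2 = \mathrm{Enc}_{\lambda_2}(\hat{X}_2)$, we have
$\tilde{H}_\infty(\hat{X}_1 \mid P_1, P_2) = \tilde{H}_\infty(\hat{X}_1, \hat{X}_2 \mid P_1, P_2) = \tilde{H}_\infty(\hat{X}_2 \mid P_1, P_2)$. (7)
That is, the proposed sketch family is well-formed.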
By combining Theorem 1 and Lemma 3, and considering that for the member
of the sketch family identified by $Q_{\lambda_1}$ with step size $\lambda_1$, the size of the sketch is
$|P_1| = n(\log \Delta_{\lambda_1})$, we have the following lemma.
Lemma 4. For the quantization-based sketch family defined in Section 5.1, given
any member identified by $Q_{\lambda_1}$ with step size $\lambda_1$ and encoder $\mathrm{Enc}_{\lambda_1}$, it holds that for
every random variable X ∈ U and any member identified by $Q_{\lambda_2}$ with step size $\lambda_2$
and encoder $\mathrm{Enc}_{\lambda_2}$, we have
$\tilde{H}_\infty(Q_{\lambda_2}(X) \mid \mathrm{Enc}_{\lambda_2}(Q_{\lambda_2}(X))) - \tilde{H}_\infty(Q_{\lambda_1}(X) \mid \mathrm{Enc}_{\lambda_1}(Q_{\lambda_1}(X))) \le n(\log \Delta_{\lambda_1})$.
In other words, the relative entropy loss is at most $n(\log \Delta_{\lambda_1})$ for $Q_{\lambda_1}$.
The above is not only a worst-case bound; we can show that the worst case can
indeed happen.
Lemma 5. The relative entropy loss in Lemma 4 is tight for sufficiently small δ.
Proof: For any given $\lambda_1$, we find a $\lambda_2$ such that it is possible to find $\Delta_{\lambda_1} = (2\delta/\lambda_1 + 1)$
points $W = \{w_0, \cdots, w_{\Delta_{\lambda_1}-1}\}$ such that $Q_{\lambda_1}(w_i) - C_{\lambda_1}(Q_{\lambda_1}(w_i)) = i - \delta/\lambda_1$,
and $C_{\lambda_2}(w_i) = c_i$ for some codeword $c_i \in C_{\lambda_2}$. In other words, we
want to find points such that each of them would generate a different $d_i$ in the
final sketch with $Q_{\lambda_1}$, but would generate exactly the same number (i.e., 0) in
the sketch when $Q_{\lambda_2}$ is used. Note that when δ is sufficiently small, there would
be sufficiently many codewords in $C_{\lambda_1}$, and it is always possible to find such a $\lambda_2$
(e.g., $\lambda_2 = \lambda_1/2$).
When each x ∈ X is uniformly distributed over W, we can see that the sketch
from the scheme identified by $Q_{\lambda_1}$ would reveal all information about X, but in
the case of $Q_{\lambda_2}$, the left-over entropy would be exactly $\log \Delta_{\lambda_1}$.
Therefore, combining Lemmas 2, 4 and 5, we have
Theorem 6. The quantization-based sketch family defined in Section 5.1 is (L, R)-secure,
where for each member in the family identified by $Q_\lambda$ with step size λ,
$L(Q_\lambda) = R(Q_\lambda) = n \log \Delta_\lambda$. Furthermore, the bounds are tight.
For example, if λ = δ, we would have $L(Q_\lambda) = R(Q_\lambda) = n(\log 3)$. Note that
although decreasing λ might give a larger left-over entropy, this is not guaranteed.
In fact, if we use a λ′ < λ, by applying the above theorem on $Q_{\lambda'}$, we can see
that it may result in a smaller left-over entropy than using $Q_\lambda$ (e.g., consider
the example in the proof of Lemma 5).
5.3 A Special Case
We further study a special case when each point x ∈ X is independently and
uniformly distributed over [0, 1). We further assume that 1/δ is an integer, and
the family of schemes only consists of members with step size λ such that 1/λ is
an integer that is a multiple of $\Delta_\lambda$. This additional assumption is only for the
convenience of the analysis, and would not make much difference in practice.
In this case, the entropy loss in the quantized domain for the member identified
by $Q_\lambda$ with step size λ would be exactly $n(\log \Delta_\lambda)$, which shows that Lemma 2
is tight. Moreover, it is interesting that the relative entropy loss in this case can
be bounded by a constant.
Corollary 7. When each x ∈ X is independently and uniformly distributed, the
quantization-based sketch family defined in Section 5.1 is (L, R)-secure, where
for each member in the family identified by $Q_\lambda$ with step size λ, $L(Q_\lambda) = n(\log \Delta_\lambda)$
and $R(Q_\lambda) = n \log(1 + \frac{\lambda}{2\delta}) \le n \log(3/2)$.
Proof: The claim $L(Q_\lambda) = n(\log \Delta_\lambda)$ follows directly from Lemma 2, so we
only focus on R. Consider two members of the family identified by $Q_{\lambda_1}$ and
$Q_{\lambda_2}$ respectively. Without loss of generality, we assume $\lambda_1 > \lambda_2$. Consider any
x ∈ X, and let $\hat{x}_1 = Q_{\lambda_1}(x)$, $c_1 = C_{\lambda_1}(\hat{x}_1)$. Similarly we define $\hat{x}_2 = Q_{\lambda_2}(x)$ and $c_2 = C_{\lambda_2}(\hat{x}_2)$.
Hence, the min-entropy in the quantized domain would be $\log(1/\lambda_1)$
and $\log(1/\lambda_2)$ respectively.
Clearly, $c_1$ and $c_2$ are also uniformly distributed over $C_{\lambda_1}$ and $C_{\lambda_2}$ respectively,
and do not depend on $d_1$ and $d_2$. Hence, the left-over entropy for these two
members would be $\log(|C_{\lambda_1}|) = \log \frac{1}{\lambda_1 + 2\delta}$ and $\log(|C_{\lambda_2}|) = \log \frac{1}{\lambda_2 + 2\delta}$ respectively.
Furthermore, recall that $0 < \lambda_2 < \lambda_1 \le \delta$, and the difference between these two
quantities can be bounded as
$\log(|C_{\lambda_2}|) - \log(|C_{\lambda_1}|) = \log \frac{\lambda_1 + 2\delta}{\lambda_2 + 2\delta} < \log(1 + \frac{\lambda_1}{2\delta}) \le \log \frac{3}{2}$.
Therefore, the relative entropy loss is bounded by $n \log(3/2)$ as claimed.
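The chain of inequalities at the end of the proof can be checked numerically for a single point. The Python sketch below uses an assumed grid of step sizes λ = δ, δ/2, ..., δ/10 with δ = 0.01; the grid is chosen only to exercise the arithmetic and does not enforce the divisibility assumption of this section. The helper name `codebook_bits` is hypothetical.

```python
from fractions import Fraction
from math import log2

def codebook_bits(lam, delta):
    """log2 |C_lambda|, using |C_lambda| = 1 / (lam * Delta_lambda) = 1 / (lam + 2*delta)."""
    return log2(1 / (lam + 2 * delta))

delta = Fraction(1, 100)
steps = [delta / k for k in range(1, 11)]  # assumed grid of step sizes

worst_gap = 0.0
for lam1 in steps:
    for lam2 in steps:
        if lam2 < lam1:
            gap = codebook_bits(lam2, delta) - codebook_bits(lam1, delta)
            # gap < log(1 + lam1 / (2*delta)) <= log(3/2), as in the proof
            assert gap < log2(1 + lam1 / (2 * delta)) <= log2(1.5)
            worst_gap = max(worst_gap, gap)

print(round(worst_gap, 3), round(log2(1.5), 3))
```

On this grid the largest gain from a finer step size stays strictly below log(3/2) bits per point, in line with the corollary.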
5.4 Remarks
Choosing the step size $\lambda$. We can view the step size $\lambda$ as a measure of the precision
of $X$. Since the white noise in the continuous domain is fixed at $\delta$, when $\lambda$ becomes
smaller, the corresponding white noise in the quantized domain would increase,
and vice versa. That is intuitively why it is not possible to obtain much more
left-over entropy by simply having $X$ represented in a higher precision. In fact, it is
not difficult to show that there are certain distributions of $X$ such that a smaller
step size would reveal more information. Furthermore, the scheme can be more
efficient if we use a relatively larger step size, since we would need fewer bits to
represent both $X$ and the white noise in the quantized domain. If we use the same
quantizer for both encoding and decoding, the simplest form of white noise in the
quantized domain can be achieved when $\lambda = \delta$, where a quantized $x$ can be either
left unchanged, or shifted by 1. In this case, from Theorem 6, we can get at most
$n \log 3$ additional bits if we choose other $\lambda < \delta$. If $X$ is uniformly distributed, the
increment is at most $n \log(3/2)$ by Corollary 7.
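The effect of the step size on the quantized-domain noise can be observed with a small experiment. The sketch below assumes a floor-based scalar quantizer (the quantizer of Section 5.1 may differ in detail) and enumerates the index shifts that noise of magnitude at most $\delta$ can cause. All names are hypothetical, and exact rationals are used to avoid floating-point boundary effects.

```python
import math
from fractions import Fraction

def quantize(x, lam):
    """Assumed scalar quantizer: index of the cell of width lam that contains x."""
    return math.floor(x / lam)

def max_shift(delta, lam):
    """Largest possible index shift caused by noise of magnitude at most delta."""
    return math.ceil(delta / lam)

def observed_shifts(delta, lam, grid=20):
    """Enumerate index shifts over a grid of clean points and noise values."""
    shifts = set()
    for i in range(10 * grid):
        x = Fraction(i, 10 * grid)             # clean point in [0, 1)
        for k in range(-grid, grid + 1):
            noise = delta * Fraction(k, grid)  # |noise| <= delta
            shifts.add(quantize(x + noise, lam) - quantize(x, lam))
    return shifts
```

Under this assumed quantizer, halving the step size from $\delta$ to $\delta/2$ enlarges the set of possible shifts from three values to five, which is the sense in which finer precision is paid for with more quantized-domain noise.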
When $\lambda > \delta$, the form of white noise in the quantized domain would remain
unchanged, but we may lose too much information about $X$ due to the large
quantization step, which may result in a much lower left-over entropy. Therefore,
it is not desirable to have a step size larger than $\delta$ in general. If different
quantizers are used during encoding and decoding, with large step size (e.g., $2\delta$),
it is possible to reduce the white noise in the quantized domain to a special 0-1
noise, under which an $x$ is either left unchanged or shifted to $x + 1$, as observed
in [4]. Nevertheless, this strategy may give lower left-over entropy.
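To make white-noise correction in the quantized domain concrete, the following sketch encodes and recovers a single point when the same quantizer is used on both sides. It is not the construction of Section 5.1 itself: the codebook (integer multiples of $2t + 1$ with $t = \lceil \delta / \lambda \rceil$), the floor-based quantizer, and all function names are assumptions made for illustration.

```python
import math
from fractions import Fraction

def quantize(x, lam):
    """Assumed scalar quantizer: index of the cell of width lam that contains x."""
    return math.floor(x / lam)

def nearest_codeword(v, t):
    """Nearest multiple of (2t + 1) to the integer v (assumed codebook)."""
    spacing = 2 * t + 1
    r = v % spacing                      # always in 0..spacing-1 in Python
    return v - r if r <= t else v - r + spacing

def make_sketch(x, lam, delta):
    """Public offset between the quantized point and its nearest codeword."""
    t = math.ceil(delta / lam)
    xq = quantize(x, lam)
    return xq - nearest_codeword(xq, t)

def recover(y, offset, lam, delta):
    """Recover the quantized original from a noisy reading y and the offset."""
    t = math.ceil(delta / lam)
    yq = quantize(y, lam)
    # yq - offset lies within t of the original codeword, so rounding finds it.
    return nearest_codeword(yq - offset, t) + offset
```

With $\lambda = \delta$ the assumed codeword spacing is 3, so the offset takes one of three values, in line with the $\log 3$ per-point loss discussed above.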
Handling replacement noise. After the white noise has been corrected, an existing
scheme for set difference can be applied in the quantized domain to correct
the replacement noise. There are known schemes that can achieve entropy loss
of $O(t \log \frac{1}{\lambda})$ with small leading constant, such as those in [7,3]. Although the
replacement noise is not considered for the face biometrics that we study in
Section 6, it may need to be addressed for other biometric templates (e.g., iris
patterns [9]).
Extension to higher dimensions. It is straightforward to extend our scheme to
higher dimensions, where each $x \in X$ is a point in some $d$-dimensional space. For
example, we can apply a scalar quantizer on each coordinate of every point, and
let the distance of two points in $d$-dimensional space be measured by max-norm
(i.e., the maximum distance in all dimensions). The entropy loss of the resulting
scheme would be $d$ times that in the current construction for 1-D points. If there
is no replacement noise, we could also expand the $n$ points in $d$-dimensional
space into $nd$ points in 1-D and apply the proposed scheme.
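The two ingredients of this extension are simple to state in code. The helpers below are a minimal sketch with hypothetical names: one measures the max-norm distance between two points, and the other expands $n$ points in $d$ dimensions into $nd$ scalar points, which is only appropriate when there is no replacement noise.

```python
def max_norm_distance(p, q):
    """Max-norm distance: the largest coordinate-wise difference."""
    return max(abs(a - b) for a, b in zip(p, q))

def within_noise(p, q, delta):
    """True when every coordinate of q is within delta of p."""
    return max_norm_distance(p, q) <= delta

def flatten_points(points):
    """Expand n points in d dimensions into n*d one-dimensional points."""
    return [coord for point in points for coord in point]
```

Two points are within $\delta$ under the max-norm exactly when each coordinate is within $\delta$, which is why a scalar quantizer applied per coordinate suffices.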
The choice of the sketch family. It is important to note that even if a
quantization-based sketch family is well-formed, it does not guarantee the existence of a “good”
quantizer in that family. Nevertheless, it does allow us to evaluate any given member
in the family with respect to the “optimal” member in the family. We consider
it a challenging open problem to devise a general algorithm that finds the optimal
quantizer among all possible quantizers, given certain practical constraints (e.g., the
smallest possible quantization step and the distribution of $X$).
6 A Concrete Construction for Face Biometrics
Face images, especially those taken from a controlled environment, can be used
as the basis of identity verification. Here we follow the techniques employed in
[17] and make use of the singular value decomposition (SVD) of the face images
for verification, which is a well-known strategy in the face recognition literature
(such as [10,6]). Given a face image $A$ of size $M \times N$, we can always find matrices
$U$, $\Sigma$ and $V$ such that $A = U \Sigma V^T$, where $\Sigma$ is an $M \times N$ matrix with at most
$\min(M, N)$ non-zero elements ordered according to their significance. As noted in [17], some
(say, $n$) most significant coefficients of $\Sigma$ contain significant identity information
of the individual. Typically $n$ is chosen such that the sum of these $n$ coefficients
is more than, say, 98% of the sum of all the coefficients.
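The rule for choosing $n$ can be sketched as follows, assuming the singular values have already been computed by some SVD routine. The function name and the default threshold parameter are hypothetical.

```python
def num_significant(singular_values, fraction=0.98):
    """Smallest n whose top-n singular values sum to more than `fraction` of the total."""
    ordered = sorted(singular_values, reverse=True)  # most significant first
    total = sum(ordered)
    running = 0.0
    for n, value in enumerate(ordered, start=1):
        running += value
        if running > fraction * total:
            return n
    return len(ordered)
```

For example, with singular values 50, 30, 15, 4 and 1, the first three account for 95% of the total and the first four for 99%, so this rule keeps four coefficients.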
In [17], the biometric template of an individual is obtained as follows. First,
we take a few face images, compute the SVD, and obtain the minimum $\mathit{min}_i$
and maximum $\mathit{max}_i$ of the $i$-th significant coefficient, for $1 \le i \le n$, where $n$
is chosen to be 20. The mean value $a_i = (\mathit{max}_i + \mathit{min}_i)/2$ is then taken as
a point in the template. When a new face image is presented for verification,
its SVD is computed, and if for $1 \le i \le n$, the $i$-th significant coefficient is
sufficiently close to $a_i$, it is considered as authenticated. The scheme in [17] is
applied to face images from the Essex Faces94 Database [16], which contains
152 faces with 20 images for each face (24-bit color JPEG). Twelve images per
face are randomly chosen to compute the templates, and the remaining 8 are used for
testing. The experiments show that when the false accept rate is 0.005, the false
reject rate is less than 0.045.
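The template construction and verification described above can be sketched as below. This is an illustration rather than the implementation of [17]: the text does not specify what "sufficiently close" means, so the per-coefficient `tolerances` argument is an assumption, as are the function names.

```python
def build_template(enrollment):
    """enrollment: one vector of the n most significant coefficients per image."""
    n = len(enrollment[0])
    mins = [min(vec[i] for vec in enrollment) for i in range(n)]
    maxs = [max(vec[i] for vec in enrollment) for i in range(n)]
    # a_i is the midpoint of the observed range of the i-th coefficient.
    template = [(lo + hi) / 2 for lo, hi in zip(mins, maxs)]
    return template, mins, maxs

def verify(coeffs, template, tolerances):
    """Accept when every coefficient is within its (assumed) tolerance of a_i."""
    return all(abs(c - a) <= tol
               for c, a, tol in zip(coeffs, template, tolerances))
```

A natural choice for the tolerance is a multiple of $(\mathit{max}_i - \mathit{min}_i)/2$, but that choice belongs to the verifier's design and is not fixed by the description here.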
To apply our sketch scheme, for each coefficient, we further compute the minimum
$\mathit{min}$ and the maximum $\mathit{max}$ of all the templates in the database (assuming
that the number of templates is large). Hence, we can compute our biometric
template $X$ as a sequence of $n$ points, where the $i$-th point is
$x_i = \frac{a_i - \mathit{min}}{\mathit{max} - \mathit{min}}$. We
set the noise level $\delta_i = \frac{k(\mathit{max}_i - a_i)}{\mathit{max} - \mathit{min}}$ for some constant $k \ge 1$. In this way, each
point $x_i$ will be between 0 and 1 so that our scheme can be applied. One
difference, however, is that we have a different $\delta_i$ for each point, which has to
be included in the sketch. Nevertheless, our analysis on the entropy loss can be
easily adapted to this case, and the difference here will not affect the security of
the scheme. Here we choose $\lambda_i = \delta_i$ for all $1 \le i \le n$.
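The normalization of one coefficient can be written out directly from the two formulas above. In this sketch the parameter names `g_min` and `g_max` (the database-wide minimum and maximum for the coefficient) and the function name are hypothetical; the step size is set equal to the noise level, as chosen in the text.

```python
from fractions import Fraction

def normalize_point(a_i, max_i, g_min, g_max, k=1):
    """Map one template coefficient into [0, 1] and derive its noise level."""
    span = g_max - g_min
    x_i = (a_i - g_min) / span           # normalized template point
    delta_i = k * (max_i - a_i) / span   # per-point noise level
    lam_i = delta_i                      # step size chosen equal to the noise level
    return x_i, delta_i, lam_i

# Example with exact rationals: a_i = 11, max_i = 12, database range [5, 25].
example = normalize_point(Fraction(11), Fraction(12), Fraction(5), Fraction(25))
```

In the example the coefficient is mapped to 3/10 with noise level and step size 1/20.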
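In this way, the sketch produced by our proposed scheme would be the tuple
$$P = (\mathit{min}, \mathit{max}, \lambda_1, \cdots, \lambda_n, \hat{x}_1 - C_{\lambda_1}(\hat{x}_1), \cdots, \hat{x}_n - C_{\lambda_n}(\hat{x}_n)),$$
where $\hat{x}_i = Q_{\lambda_i}(x_i)$ for $1 \le i \le n$. By applying the arguments in Theorem 6 and
Corollary 7 to each point in $X$, we have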
Corollary 8. The entropy loss in the quantized domain for the aforementioned
scheme is at most $n \log 3$. Let $m$ be the left-over entropy. When $\lambda_i < \delta_i$ for any
$i$, $1 \le i \le n$, let the left-over entropy be $m'$. We have $m' - m \le n \log 3$. If all
points are uniformly distributed, we have $m' - m \le n \log(3/2)$.
When $n = 20$, the above bounds are approximately 31.7 and 11.7 respectively.
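These two figures can be reproduced with a one-line computation, assuming the logarithms are taken base 2 so that the bounds are expressed in bits. The function name is hypothetical.

```python
import math

def relative_loss_bounds(n):
    """Return (n * log2(3), n * log2(3/2)): the general and the uniform-case bounds."""
    return n * math.log2(3), n * math.log2(1.5)

general_bound, uniform_bound = relative_loss_bounds(20)
```

Rounded to one decimal place, the two values agree with the figures quoted above.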
References
1. X. Boyen, Y. Dodis, J. Katz, R. Ostrovsky, and A. Smith. Secure remote
authentication using biometric data. In Eurocrypt, 2005.
2. Xavier Boyen. Reusable cryptographic fuzzy extractors. In ACM CCS, pages
82–91, Washington DC, USA, 2004. ACM Press.
3. Ee-Chien Chang, Vadym Fedyukovych, and Qiming Li. Secure sketch for multi-set
difference. Cryptology ePrint Archive, Report 2006/090, 2006.
http://eprint.iacr.org/.
4. Ee-Chien Chang and Qiming Li. Small secure sketch for point-set difference.
Cryptology ePrint Archive, Report 2005/145, 2005. http://eprint.iacr.org/.
5. Ee-Chien Chang and Qiming Li. Hiding secret points amidst chaff. In Eurocrypt,
volume 4004 of LNCS, pages 59–72, 2006.
6. Yong-Qing Cheng. Human face recognition method based on the statistical model of
small sample size. In SPIE Proc. Intell. Robot and Compu. Vision, pages 85–95, 1991.
7. Yevgeniy Dodis, Leonid Reyzin, and Adam Smith. Fuzzy extractors: How to
generate strong keys from biometrics and other noisy data. In Eurocrypt, volume 3027
of LNCS, pages 523–540. Springer-Verlag, 2004.
8. F. Hao and C.W. Chan. Private key generation from on-line handwritten
signatures. Information Management and Computer Security, 10(2), 2002.
9. Feng Hao, Ross Anderson, and John Daugman. Combining cryptography with
biometrics effectively. Technical Report UCAM-CL-TR-640, University of Cambridge,
2005.
10. Z. Hong. Algebraic feature extraction of image for recognition. Pattern Recognition,
24:211–219, 1991.
11. Ari Juels and Madhu Sudan. A fuzzy vault scheme. In IEEE Intl. Symp. on
Information Theory, 2002.
12. Ari Juels and Martin Wattenberg. A fuzzy commitment scheme. In ACM CCS,
pages 28–36, 1999.
13. J.-P. Linnartz and P. Tuyls. New shielding functions to enhance privacy and
prevent misuse of biometric templates. In AVBPA, pages 393–402, 2003.
14. Ueli Maurer and Stefan Wolf. Information-theoretic key agreement: From weak to
strong secrecy for free. In Eurocrypt, 2000.
15. F. Monrose, M.K. Reiter, Q. Li, and S. Wetzel. Cryptographic key generation from
voice. In IEEE Symp. on Security and Privacy, 2001.
16. Libor Spacek. The Essex Faces94 database. http://cswww.essex.ac.uk/mv/allfaces/.
17. Y. Sutcu, T. Sencar, and N. Memon. A secure biometric authentication scheme
based on robust hashing. In ACM MM-SEC Workshop, 2005.
18. P. Tuyls, A.H.M. Akkermans, T.A.M. Kevenaar, G.J. Schrijen, A.M. Bazen, and
R.N.J. Veldhuis. Practical biometric authentication with template protection. In
AVBPA, pages 436–446, 2005.
19. P. Tuyls and J. Goseling. Capacity and examples of template-protecting biometric
authentication systems. In ECCV Workshop BioAW, pages 158–170, 2004.
20. Shenglin Yang and Ingrid Verbauwhede. Automatic secure fingerprint verification
system based on fuzzy vault scheme. In IEEE Intl. Conf. on Acoustics, Speech,
and Signal Processing (ICASSP), pages 609–612, 2005.