A Fast Provably Secure Cryptographic
Hash Function
Daniel Augot, Matthieu Finiasz, and Nicolas Sendrier
Projet Codes, INRIA Rocquencourt
BP 105, 78153 Le Chesnay - Cedex, France
[Daniel.Augot,Matthieu.Finiasz,Nicolas.Sendrier]@inria.fr
Abstract. We propose a family of fast and provably secure cryptographic hash functions. The security of these functions relies directly on the well-known syndrome decoding problem for linear codes. Attacks on this problem are well identified and their complexity is known. This enables us to study precisely the practical security of the hash functions and propose valid parameters for implementation. Furthermore, the design proposed here is fully scalable, with respect to security, hash size and output rate.
Key Words: hash functions, syndrome decoding, NP-completeness.
1 Introduction
The main cryptographic hash function design in use today iterates a so-called
compression function according to Merkle's and Damgård's constructions [5, 13].
Classical compression functions are very fast [3, 14, 16] but cannot be proven
secure. However, provable security may be achieved with compression functions
designed according to public key principles, at the cost of a poor efficiency.
Unlike most other public key cryptosystems, the encryption function of the
McEliece cryptosystem [11] (or of Niederreiter’s version [15]) is nearly as fast as a
symmetric cipher. Using this function with a random matrix instead of the usual
Goppa code parity check matrix, we obtain a provably secure one-way function
with no trap. The purpose of this paper is to use this function to obtain a fast
cryptographic hash function whose security is assessed by a difficult algorithmic
problem.
For didactic purposes, we introduce the Syndrome Based (SB) compression
function, which directly relies on Niederreiter’s scheme and the syndrome de-
coding problem. However, this function can hardly be simultaneously fast and
secure for practical parameters. Hence we introduce the Fast Syndrome Based
(FSB) compression function, derived from the previous one and relying on a
similar hard problem. Section 2 is devoted to the description of both functions.
In Section 3 we show that, as for McEliece’s and Niederreiter’s cryptosystems,
the security of SB can be reduced to the hardness of syndrome decoding in the
average case. Similarly, we prove that the security of FSB is reduced to the av-
erage case difficulty of two new NP-complete problems. Finally, in Section 4, we
show how the best known decoding techniques can be adapted to the cryptanal-
ysis of our functions. From that we can evaluate the practical security and the
scalability of the system, and eventually propose a choice of parameters. Note
that, for clarity of the presentation, NP-completeness proofs are postponed to
the appendix.
2 The Hash Functions
We will present two different versions of the hash function: the first is the
Syndrome Based hash function (SB); the second, a modified version called Fast
Syndrome Based (FSB), is much faster in practice and also more secure.
2.1 General Construction
There is one main construction for designing hash functions: it consists in iterating a compression function which takes as input s bits and returns r bits, with s > r, so that, using such a chaining, the resulting function can operate on strings of arbitrary length (see Fig. 1). The validity of such a design has been well established [5, 13] and its security is not worse than the security of the compression function. Therefore we will concentrate on the security of the latter.
Fig. 1. A standard hash function construction: the padded document D is split into blocks which are fed, together with the chaining value (initialized with an I.V.), through successive rounds of the compression function; the output of the last round is the hash value.
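As an illustration of this chaining, here is a minimal sketch in Python (the function name, the zero-padding rule and the byte-level layout are ours, not the paper's); `compress` stands for any s-bit to r-bit compression function.

```python
def hash_document(document: bytes, compress, s: int, r: int, iv: bytes) -> bytes:
    """Chain an s-bit -> r-bit compression function over a document.

    `compress` maps s bits (given as bytes) to r bits (as bytes), with s > r.
    This sketch assumes s and r are multiples of 8 and uses plain zero-padding
    of the last block; a real construction would use a proper padding rule.
    """
    block_bytes = (s - r) // 8              # document bytes read per round
    chaining = iv                           # r-bit chaining value (I.V.)
    padded = document + b"\x00" * ((-len(document)) % block_bytes)
    for i in range(0, len(padded), block_bytes):
        block = padded[i:i + block_bytes]
        # input of one round: r chaining bits followed by s - r document bits
        chaining = compress(chaining + block)
    return chaining                          # hash value
```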
2.2 Description of the Syndrome Based Compression Function
The core of the compression function is a random binary matrix H of size r × n.
Hashing a document will consist in adding (using binary XORs) w columns of
this matrix to finally obtain a hash of length r.
The parameters for the hash function are:
– n, the number of columns of the matrix H;
– r, the number of rows of the matrix H and the size in bits of the function output;
– w, the number of columns of H added at each round.
Once these parameters are chosen, a (truly) random r × n matrix H is generated. This matrix is chosen once and for all for the hash function. Using the scheme explained next, a function with input size $s = \log_2\binom{n}{w}$ and output size r is obtained.
As we use a standard chaining method, the input of the compression function will consist of r bits taken from the output of the previous round and s − r bits taken from the file. Of course, w must be such that s > r.
The compression is performed in two steps:
Input: s bits of data
1. encode the s bits in a word e of length n and weight w;
2. compute $He^T$ to obtain a binary string of length r.
Output: r bits of hash
The first step requires converting the s input bits into a binary n-tuple containing exactly w ones (and zeros everywhere else). This word is then multiplied by H (that is, the corresponding w columns of H are added) to obtain the r-bit hash.
This function is expected to be very fast as only a few operations are required:
input encoding and a few XORs. In practice, the second step will be very fast,
but the first step is much slower. Indeed, the best algorithm for embedding data
in a constant-weight word [6, 7] makes extensive use of large integer arithmetic
and is by far the most expensive part.
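As an illustration, here is a minimal sketch of the second step in Python (names and data layout are ours); the constant-weight encoder of step 1, which relies on the techniques of [6, 7], is deliberately left out and its output is taken as input.

```python
import secrets

def make_matrix(r: int, n: int) -> list[int]:
    """A random binary r x n matrix H, stored as n column bitmasks of r bits."""
    return [secrets.randbits(r) for _ in range(n)]

def sb_compress(H: list[int], positions: list[int]) -> int:
    """Second step of the SB compression function: XOR the chosen columns of H.

    `positions` is the support (the w indices of the 1s) of the weight-w word e
    produced by the constant-weight encoder of step 1.
    """
    h = 0
    for j in positions:
        h ^= H[j]      # add column j of H over GF(2)
    return h           # the r-bit hash, as an integer
```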
2.3 Description of the Fast Syndrome Based Compression Function
Definition 1. A word of length n and weight w is called regular if it has exactly one non-zero position in each of the w intervals $\left](i-1)\frac{n}{w};\, i\frac{n}{w}\right]_{i=1..w}$.
To improve the speed, we embed less data in each constant-weight word by using a faster, no longer one-to-one, constant-weight encoder. Instead of using any word of weight w we embed the input bits in a regular word of weight w. Hence we will have $s = w \log_2(n/w)$.
The matrix H is split into w sub-blocks $H_i$ of size $r \times \frac{n}{w}$ and the algorithm is the following:
Input: s bits of data
1. split the s input bits into w parts $s_1, \ldots, s_w$ of $\log_2(n/w)$ bits each;
2. convert each $s_i$ to an integer between 1 and $\frac{n}{w}$;
3. choose the corresponding column in each $H_i$;
4. add the w chosen columns to obtain a binary string of length r.
Output: r bits of hash
Using this encoder the cost of the first step becomes negligible as it only
consists in reading the input bits a fixed number at a time. This compression
function is hence very fast and its speed is directly linked to the number of XORs
required for a round.
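A minimal sketch of one FSB round in Python follows (names and data layout are ours); it assumes n/w is a power of two, so that each block index is read from exactly $\log_2(n/w)$ consecutive input bits.

```python
import secrets

def make_blocks(r: int, n: int, w: int) -> list[list[int]]:
    """The w random sub-blocks H_i, each an r x (n/w) binary matrix
    stored as a list of r-bit column bitmasks."""
    assert n % w == 0
    return [[secrets.randbits(r) for _ in range(n // w)] for _ in range(w)]

def fsb_compress(blocks: list[list[int]], data: int, n: int, w: int) -> int:
    """One round of the FSB compression function.

    `data` holds the s = w * log2(n/w) input bits (chaining bits plus
    document bits) as an integer. Each group of log2(n/w) bits selects one
    column in the corresponding sub-block H_i; the w columns are XORed.
    """
    bits_per_block = (n // w).bit_length() - 1   # log2(n/w), n/w a power of 2
    mask = (1 << bits_per_block) - 1
    h = 0
    for i in range(w):
        index = (data >> (i * bits_per_block)) & mask   # column chosen in H_i
        h ^= blocks[i][index]                           # add that column of H_i
    return h                                            # r bits of hash
```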
2.4 Related Work
In Merkle's construction [13], the compression function is an encryption function (modular squaring, knapsack, etc.) used as a one-way function. In the case of the SB hash function the compression function is very similar to the encoding function of Niederreiter's version of the McEliece cryptosystem [11, 15]. The only difference is that, instead of using the parity check matrix of a permuted Goppa code, SB uses a random matrix H. By doing this, the trap in the one-way function of the cryptosystem is removed.
From a security point of view this can only strengthen the system as all
attacks on the trap no longer hold. Breaking Niederreiter’s cryptosystem can be
reduced to the two problems of inverting the one-way function or recovering the
trap. In the case of SB, only the inversion problem remains.
3 Theoretical Security
As stated in [12], a cryptographic hash function has to be pre-image resistant, second pre-image resistant and collision resistant. As second pre-image resistance is strictly weaker than collision resistance, we will only check that both hash functions are collision free and resistant to inversion. In the SB hashing
scheme this can be reduced to solving an instance of Syndrome Decoding, which
is NP-complete [2]. In the FSB version, these two kinds of attack can be re-
duced to two very close new problems. We will first describe them and show (in
appendix) that they are also NP-complete.
We will then show that finding a collision or an inversion is at least as hard
as solving one of these two new problems. This is what we call provable security.
3.1 Two New NP-complete Problems
In this section we will recall the problems of syndrome decoding and null syn-
drome decoding and then describe two closely related new problems.
Syndrome Decoding (SD)
Input: a binary matrix H of dimension r × n and a bit string S of length r.
Property: there exists a set of w′ columns of H adding to S (with 0 < w′ ≤ w).
Null Syndrome Decoding (NSD)
Input: a binary matrix H of dimension r × n.
Property: there exists a set of w′ columns of H adding to 0 (with 0 < w′ ≤ w).
These two problems are NP-complete [2], which means that at least some instances of the problem are difficult. However, it is a common belief that they should be difficult on average (for well-chosen parameter ranges), which means that random instances are difficult. For cryptographic purposes this is much more interesting than simple NP-completeness.
The same comment can be made about the following two problems: they are NP-complete (see Appendix A) and we believe that they are hard on average.
Regular Syndrome Decoding (RSD)
Input: w matrices $H_i$ of dimension r × n and a bit string S of length r.
Property: there exists a set of w columns, one in each $H_i$, adding to S.
2-Regular Null Syndrome Decoding (2-RNSD)
Input: w matrices $H_i$ of dimension r × n.
Property: there exists a set of 2w′ columns (with 0 < w′ ≤ w), 0 or 2 in each $H_i$, adding to 0.
3.2 Security Reduction
In this section we will show that finding collisions for, or inverting, either of the two proposed hash functions is as hard as solving an instance of one of the NP-complete problems described in the previous section.
We will prove the security of the compression function, which is enough when using a standard construction (see [12], p. 333).
Security of the Syndrome Based Hash Function. Finding an inversion
for this compression function consists in finding an input (of length s) which will
hash to a given bit string S. Now suppose an algorithm A is able to compute inversions for this function, and an instance (H, w, S) of the SD problem has to be solved. Then, using A, it is possible to compute inverses for the compression function using H, and so obtain an input with hash S. This means that the w columns corresponding to this input, when added together, sum to S. A solution to the given instance of SD has been found.
Finding a collision for this scheme consists in finding two different inputs hashing to the same string. Now suppose an algorithm A′ is able to find collisions for this compression function, and a given instance (H, 2w) of the NSD problem has to be solved. The algorithm A′ can compute two inputs hashing (through H) to the same string. These inputs correspond to two different words of weight w which, when added together, give a non-zero word m of even weight at most 2w. By linearity, the product $Hm^T$ is 0. The word m is a solution to the instance of NSD.
So, finding either a collision or an inversion for the SB hash function is at
least as hard as solving an instance of the SD or NSD problems.
Security of the FSB Hash Function. For this version of the hash function
the security reduction can be done exactly like for the SB function. Inverting
the compression function can be reduced to the RSD problem, and collision to
the 2-RNSD problem.
These reductions to NP-complete problems do not prove that all instances are difficult, only that some instances are difficult. For cryptographic security this is clearly not enough. However, in the same manner as Gurevich and Levin [8, 10] have discussed it for SD, we believe that all these NP-complete problems are difficult on average (for well-chosen parameters).
4 Practical Security
This section is dedicated to the study of the practical security of the two versions
of the hash function. As for the security reduction, attacking the hash function
as a whole is equivalent to attacking a single round and takes the same amount
of computation. Therefore we need to identify the possible attacks on one round
of the system, and then study the minimal workfactors required to perform these
attacks.
4.1 Existing Attacks
Decoding in a random linear code is at least as hard as giving an answer to SD. This problem has been extensively studied over the years and many attacks against it have been developed (see [1]): split syndrome decoding, gradient-like decoding, information set decoding, etc. All these attacks are exponential. Still, as stated by Sendrier [17], the most efficient attacks all seem to be derived from Information Set Decoding (ISD).
Definition 2. An information set is a set of k = n − r (the dimension of the code) positions among the n positions of the support.
Definition 3. Let (H, w, S) be an instance of SD. An information set will be called valid if there exists a solution to this SD problem which has no 1s among the chosen k positions.
The ISD technique consists in picking information sets until a valid one is found. Checking whether the information set is valid or not mainly consists in performing a Gaussian elimination on an r × r submatrix of the parity check matrix H of the code. Once this is done, if a solution exists it is found in constant time. Let GE(r) be the cost of this Gaussian elimination and $P_w$ the probability for a random information set to be valid. Then the complexity of this algorithm is $GE(r)/P_w$.
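To fix ideas, here is a minimal sketch of this basic ISD loop in Python (ours, with columns of H stored as r-bit integers); the actual attacks [4, 9, 18] are considerably more optimized.

```python
import random

def gauss_solve(cols: list[int], S: int, r: int):
    """Solve M x = S over GF(2), where the j-th column of the r x r matrix M
    is the r-bit integer cols[j]. Returns x as a list of bits, or None if M
    is singular."""
    # Row i of the augmented matrix [M | S], packed as an (r+1)-bit integer.
    rows = []
    for i in range(r):
        row = sum(((cols[j] >> i) & 1) << j for j in range(r))
        rows.append(row | (((S >> i) & 1) << r))
    for j in range(r):                       # Gauss-Jordan elimination
        piv = next((i for i in range(j, r) if (rows[i] >> j) & 1), None)
        if piv is None:
            return None                      # singular submatrix, pick again
        rows[j], rows[piv] = rows[piv], rows[j]
        for i in range(r):
            if i != j and (rows[i] >> j) & 1:
                rows[i] ^= rows[j]
    return [(rows[j] >> r) & 1 for j in range(r)]

def isd_attack(H: list[int], S: int, r: int, w: int, max_tries: int = 10**6):
    """Plain ISD: pick r positions at random (the complement of an information
    set of size k = n - r) and hope all w ones of a solution fall among them."""
    n = len(H)
    for _ in range(max_tries):
        chosen = random.sample(range(n), r)
        x = gauss_solve([H[j] for j in chosen], S, r)
        if x is not None and sum(x) == w:
            return sorted(chosen[j] for j in range(r) if x[j])  # support of e
    return None
```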
This technique was improved several times [4, 9, 18]. The effect of these improvements on the complexity of the algorithm is to reduce the degree of the polynomial part. The complexity is then $g(r)/P_w$, with deg(g) < deg(GE). However, all the different versions were designed for instances of SD having only one (or a few) solutions. For SB, as we will see in Section 4.4, the range of parameters we are interested in leads to instances with many more solutions (over $2^{400}$). Anyhow, ISD attacks remain the most suitable and behave the same way as when there is a single solution.
Applied to the FSB hash function, this technique will, in addition, have to return regular words. This considerably decreases the number of solutions and, in this way, decreases the probability for a given information set to be valid, thus enhancing security.
4.2 Analysis of Information Set Decoding
We have seen that the complexity of an information set decoding attack can be expressed as $g(r)/P_w$. Hence, it is very important to evaluate $P_w$ precisely in both versions of the hash function and for both kinds of attacks. The polynomial part has less importance and is approximately the same in all four cases.
The probability $P_w$ we want to calculate will depend on two things: the probability $P_{w,1}$ that a given information set is valid for one given solution of SD, and the expected number $N_w$ of valid solutions for SD. Even though the probabilities we deal with are not independent, we shall consider
$$P_w = 1 - (1 - P_{w,1})^{N_w}.$$
It is important to note that $N_w$ is the average number of valid solutions to SD. For small values of w, $N_w$ can be much smaller than one; the formula is valid though.
In this section, for the sake of simplicity, we will use the approximation $P_w \simeq P_{w,1} \times N_w$. When calculating the security curves (see Section 4.3) and choosing the final parameters (see Section 4.4), we have used the exact formulas for the calculations.
Solving SD: attacking the SB hash function. For inversion, the problem is the following: we have a syndrome which is a string S of r bits. We want to find an information set (of size k = n − r) which is valid for one inverse of S of weight w.
For one given inverse of weight w the probability for a given information set to be valid is:
$$P_{w,1} = \frac{\binom{n-w}{k}}{\binom{n}{k}} = \frac{\binom{n-k}{w}}{\binom{n}{w}} = \frac{\binom{r}{w}}{\binom{n}{w}}.$$
The average number of solutions (of inverses) of weight w is:
$$N_w = \frac{\binom{n}{w}}{2^r}.$$
In the end, we get a total probability of choosing an information set valid for inversion of:
$$P_{inv} = P_{w,1} \times N_w = \frac{\binom{r}{w}}{2^r}.$$
To find a collision, it is enough to find a word of even weight 2i with 0 < i ≤ w. We have exactly the same formulas as for inversion, but this time the final probability is:
$$P_{col} = \sum_{i=1}^{w} \frac{\binom{r}{2i}}{2^r}.$$
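These probabilities are easy to evaluate numerically; the following short Python sketch (ours) computes the $\log_2$ of $1/P_{inv}$ and $1/P_{col}$ from the simplified formulas above (note that, after simplification, neither depends on n).

```python
from math import comb, log2

def sb_inversion_security(r: int, w: int) -> float:
    """log2 of 1/P_inv for SB, with P_inv = C(r, w) / 2^r."""
    return r - log2(comb(r, w))

def sb_collision_security(r: int, w: int) -> float:
    """log2 of 1/P_col for SB, with P_col = sum_{i=1..w} C(r, 2i) / 2^r."""
    return r - log2(sum(comb(r, 2 * i) for i in range(1, w + 1)))

if __name__ == "__main__":
    r = 160
    for w in (16, 32, 64):
        print(f"w={w}: inversion {sb_inversion_security(r, w):.1f} bits, "
              f"collision {sb_collision_security(r, w):.1f} bits")
```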
Solving RSD: inverting the FSB hash function. In that case, one needs to find a regular word of weight w having a given syndrome S. Since the word is regular there will be fewer solutions for each syndrome, and even if each of them is easier to find, in the end the security is increased compared to SB.
The number of regular solutions to RSD is, on average:
$$N_w = \frac{\left(\frac{n}{w}\right)^w}{2^r}.$$
The probability of finding a valid information set for a given solution is however a little more intricate. For instance, as the solutions are not random words, the attacker should not choose the information set at random, but should rather choose the sets which have the best chance of being valid. In our case it is easy to see that the attacker will maximize his chances when taking the same number of positions in each block, that is, taking k/w positions w times. The probability of success is then:
$$P_{w,1} = \left(\frac{\binom{n/w - 1}{k/w}}{\binom{n/w}{k/w}}\right)^{w} = \frac{\left(\frac{r}{w}\right)^w}{\left(\frac{n}{w}\right)^w}.$$
The final probability is:
$$P_{inv} = P_{w,1} \times N_w = \frac{\left(\frac{r}{w}\right)^w}{2^r}.$$
One can check that this probability is much smaller than for the SB hash function (a ratio of roughly $w!/w^w$). Using the fast constant-weight encoder, and restricting the set of solutions to RSD, has strengthened the system.
Solving 2-RNSD: collisions in the FSB hash function. When looking for collisions one needs to find two regular words of weight w having the same syndrome. However, these two words can coincide on some positions. With the block structure of regular words, this means that we are looking for words with a null syndrome having some blocks (say i) with a weight of 2 and the remaining blocks with a weight of 0. The number of such words is:
$$N_i = \frac{\binom{w}{i}\binom{n/w}{2}^i}{2^r}.$$
Once again, the attacker can choose his strategy when choosing information sets. However, this time the task is a little more complicated as there is not a single optimal strategy. If the attacker is looking for words of weight up to 2w then the strategy is the same as for RSD, choosing an equal amount of positions in each set. For each value of i, the probability of validity is then:
$$P_{i,1} = \frac{\binom{w}{i}\binom{n/w - k/w}{2}^i}{\binom{w}{i}\binom{n/w}{2}^i} = \frac{\binom{r/w}{2}^i}{\binom{n/w}{2}^i}.$$
The total probability of success for one information set is then:
$$P_{col\,total} = \sum_{i=1}^{w} \frac{\binom{w}{i}\binom{r/w}{2}^i}{2^r} = \frac{1}{2^r}\left[\binom{r/w}{2} + 1\right]^{w}.$$
But the attacker can also decide to focus on some particular words. For example, he could limit himself to words with non-zero positions only in a given set of w′ < w blocks, take all the information set points available in the remaining w − w′ blocks and distribute the rest of the information set in the w′ chosen blocks. He then works on fewer solutions, but with a greater probability of success. This probability is:
$$P_{col\,w'} = \frac{1}{2^r}\left[\binom{\frac{n}{w} - \frac{k'}{w'}}{2} + 1\right]^{w'},$$
with $k' = k - (w - w')\times\frac{n}{w} = n\frac{w'}{w} - r$. This can be simplified into:
$$P_{col\,w'} = \frac{1}{2^r}\left[\binom{\frac{r}{w'}}{2} + 1\right]^{w'}.$$
Surprisingly, this no longer depends on w and n. As the attacker has the possibility to choose the strategy he prefers (depending on the parameters of the system), he will be able to choose the most suitable value for w′. However, as w′ ≤ w, he might not always be able to take the absolute maximum of $P_{col\,w'}$. The best he can do will be:
$$P_{col\,optimal} = \frac{1}{2^r}\max_{w' \in \{1,\ldots,w\}}\left[\binom{\frac{r}{w'}}{2} + 1\right]^{w'}.$$
This maximum will be reached for $w' = \alpha \cdot r$ where $\alpha \approx 0.24$ is a constant. Hence, if $w > 0.24\,r$ we have:
$$P_{col\,optimal} = \frac{1}{2^r}\left[\binom{1/\alpha}{2} + 1\right]^{\alpha r} \simeq 0.81^r.$$
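The constant $\alpha$ and the $0.81^r$ estimate are easy to check numerically; the following sketch (ours) maximizes the last expression over w′, extending $\binom{x}{2}$ to non-integer x as $x(x-1)/2$. For r = 160 and r = 224 it reproduces the probabilities $2^{-47.7}$ and $2^{-66.7}$ quoted in the next section.

```python
from math import log2

def log2_p_col(r: int, wp: int) -> float:
    """log2 of (1/2^r) * [ C(r/w', 2) + 1 ]^w', with C(x, 2) = x(x-1)/2."""
    x = r / wp
    return wp * log2(x * (x - 1) / 2 + 1) - r

def best_attack(r: int, w: int) -> tuple[int, float]:
    """Optimal w' in {1, ..., w} for the collision attack, and log2 P_col."""
    wp = max(range(1, w + 1), key=lambda v: log2_p_col(r, v))
    return wp, log2_p_col(r, wp)

if __name__ == "__main__":
    for r in (160, 224, 288):
        wp, lp = best_attack(r, r)     # w taken large enough not to constrain w'
        print(f"r={r}: w' = {wp} (~{wp / r:.2f} r), log2 P_col = {lp:.1f}, "
              f"0.81^r -> {r * log2(0.81):.1f}")
```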
4.3 Some Security Curves
These are some curves obtained when plotting the exact versions of the formulas above. For $n = 2^{14}$ and r = 160, we see on Fig. 2 that the security of the FSB hash function is much higher than that of the SB version. This is the case for the chosen parameters, but also for about any sound set of parameters.
Fig. 3 focuses on the security of the FSB hash function against inversion and collision. We see that by choosing different parameters for the hash function we can obtain different security levels. This level does not depend significantly on n or w but is mainly a function of r. With r = 160 we get a probability of $2^{-47.7}$ that an information set is valid. With r = 224 we decrease this probability to $2^{-66.7}$.
Fig. 2. These two curves show the $\log_2$ of the inverse of the probability that an information set is valid, as a function of w. The dotted line corresponds to the SB version of the scheme and the plain line to the FSB version. On the left the attack is made for inversion, on the right for collision. The curves correspond to the parameters $n = 2^{14}$, r = 160.
Fig. 3. On the left is plotted the security (inverse of the probability of success) of the FSB hash function for the parameters $n = 2^{14}$, r = 160 when using the optimal attacks. The curve on the right corresponds to $n = 3\cdot 2^{13}$, r = 224.
This may seem far below the usual security requirement of $2^{80}$ binary operations; however, once an information set is chosen, the simple fact of verifying whether it is valid or not requires performing a Gaussian elimination on a small r × r matrix. This should take at least $O(r^2)$ binary operations. This gives a final security of $2^{62.3}$ binary operations for r = 160, which is still too small. For r = 224 we get $2^{82.3}$ operations, which can be considered secure.
For extra strong security, when trying to take into account some statements in Appendix B, one could try to aim at a probability below $2^{-80}$ (corresponding to an attacker able to perform Gaussian eliminations in constant time) or $2^{-130}$ (for an attacker using an idealistic algorithm). This is achieved, for example, with r = 288 with a probability of choosing a valid information set of $2^{-85.8}$, or r = 512 with a probability of $2^{-152.5}$. These levels of security are probably far above what practical attacks could do, but it is interesting to see how they can be reached using reasonable parameters.
Fig. 4. On the left is the number of bits of input for one round of the SB (dotted line) or the FSB (plain line) hash functions as a function of w. These are calculated for a fixed $n = 2^{14}$. On the right are the curves corresponding to the number of XORs per bit of input as a function of w for the FSB hash function with $n = 2^{14}$, r = 160 (dotted line), $n = 3\cdot 2^{13}$, r = 224 (plain line) and $n = 2^{13}$, r = 288 (dashed line).
In terms of efficiency, what we are concerned with is the required number of binary XORs per bit of input. We see on Fig. 4 (left) that the FSB version of the hash function is a little less efficient than the SB version as, for the same number of XORs, it reads fewer input bits. The figure on the right shows the number of bit XORs required per bit of input. It corresponds to the following formula:
$$N_{XOR} = \frac{r \cdot w}{w \log_2(n/w) - r}.$$
This function will always reach its minimum for $w = r \cdot \ln 2$, and values around this will also be nearly optimal as the curve is quite flat. A smaller w will moreover yield a smaller block size for the same hashing cost, which is better when hashing small files.
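For illustration, a few lines of Python (ours) evaluate $N_{XOR}$ for the three parameter sets of Fig. 4 and the corresponding unconstrained optimum $w \approx r \ln 2$.

```python
from math import log, log2

def n_xor(n: int, r: int, w: int) -> float:
    """Binary XORs per input bit for one FSB round: r*w / (w*log2(n/w) - r)."""
    return r * w / (w * log2(n / w) - r)

if __name__ == "__main__":
    for n, r, w in ((2**14, 160, 64), (3 * 2**13, 224, 96), (2**13, 288, 128)):
        print(f"n={n}, r={r}, w={w}: {n_xor(n, r, w):.1f} XORs per input bit "
              f"(unconstrained optimum near w = {r * log(2):.0f})")
```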
Note that $N_{XOR}$ can always be reduced, to obtain a faster function, by increasing n; however, this will increase the block size of the function and will also increase the size of the random matrix to be used. In software, as soon as this matrix becomes larger than the machine's memory cache, the speed will drop immediately as the number of cache misses becomes too large.
4.4 Proposed Parameters for Software Implementation
The choice of all the parameters of the system should be done with great care as
a bad choice could strongly affect the performances of the system. One should
first choose rto have the output size he wishes and the security required. Then,
the choice of wand nshould verify the following simple rules:
n
wis a power of 2, so as to read an integer number of input bits at a time.
This may be 28to read the input byte by byte.
n×ris smaller than the cache size to avoid cache misses.
wis close to its optimal value (see Fig. 4).
ris a multiple of 32 (or 64 depending on the CPU architecture) for a full
use of the word-size XORs
Thus we make three proposals. Using r = 160, w = 64 and $n = 2^{14}$ and a standard, well-optimized C implementation of FSB, we obtained an algorithm faster than the standard md5sum Linux utility and nearly as fast as a C implementation of SHA-1. That is a rate of approximately 300 Mbits of input hashed per second on a 2 GHz Pentium 4. However, these parameters are not secure enough for collision resistance (only $2^{62.3}$ binary operations). They could nevertheless be used when simply looking for pre-image resistance and a higher output rate, as the complexity of inverting this function remains above the limit of $2^{80}$ operations.
With the secure r = 224, w = 96, $n = 3\cdot 2^{13}$ parameters (probability of $2^{-66.7}$ and $2^{82.3}$ binary operations) the speed is a little lower, with only up to 200 Mbits/s. With r = 288, w = 128, $n = 2^{13}$ (probability below $2^{-80}$), the speed should be just above 100 Mbits/s.
5 Conclusion
We have proposed a family of fast and provably secure hash functions. This construction enjoys some interesting features: both the block size of the hash function and the output size are completely scalable; the security depends directly on the output size and can hence be set to any desired level; and the number of XORs used by FSB per input bit can be decreased to improve speed. Note that collision resistance can be put aside in order to allow parameters giving a higher output rate.
However, reaching very high output rates requires the use of a large matrix.
This can be a limitation when trying to use FSB on memory constrained devices.
On classical architectures this will only fix a maximum speed (most probably
when the size of the matrix is just below the memory cache size).
Another important point is the existence of weak instances of this hash function: it is clear that the matrix H can be chosen with bad properties. For instance, the all-zero matrix will define a hash function with constant zero output. However, these bad instances represent only a completely negligible proportion of all the matrices, and when choosing a matrix at random there is no risk of picking a weak instance.
References
1. A. Barg. Complexity issues in coding theory. In V. S. Pless and W. C. Huffman,
editors, Handbook of Coding theory, volume I, chapter 7, pages 649–754. North-
Holland, 1998.
2. E. R. Berlekamp, R. J. McEliece, and H. C. van Tilborg. On the inherent in-
tractability of certain coding problems. IEEE Transactions on Information Theory,
24(3), May 1978.
3. J. Black, P. Rogaway, and T. Shrimpton. Black box analysis of the block ci-
pher based hash-function constructions from PGV. In Advances in Cryptology -
CRYPTO 2002, volume 2442 of LNCS. Springer-Verlag, 2002.
4. A. Canteaut and F. Chabaud. A new algorithm for finding minimum-weight words
in a linear code: Application to McEliece’s cryptosystem and to narrow-sense BCH
codes of length 511. IEEE Transactions on Information Theory, 44(1):367–378,
January 1998.
5. I. B. Damgård. A design principle for hash functions. In Gilles Brassard, editor,
Advances in Cryptology - Crypto’ 89, LNCS, pages 416–426. Springer-Verlag, 1989.
6. J.-B. Fischer and J. Stern. An efficient pseudo-random generator provably as
secure as syndrome decoding. In Ueli M. Maurer, editor, Advances in Cryptology -
EUROCRYPT ’96, volume 1070 of LNCS, pages 245–255. Springer-Verlag, 1996.
7. P. Guillot. Algorithmes pour le codage `a poids constant. Unpublished.
8. Y. Gurevich. Average case completeness. Journal of Computer and System Sci-
ences, 42(3):346–398, 1991.
9. P. J. Lee and E. F. Brickell. An observation on the security of McEliece’s public-
key cryptosystem. In C. G. Günther, editor, Advances in Cryptology – EURO-
CRYPT’88, volume 330 of LNCS, pages 275–280. Springer-Verlag, 1988.
10. L. Levin. Average case complete problems. SIAM Journal on Computing,
15(1):285–286, 1986.
11. R. J. McEliece. A public-key cryptosystem based on algebraic coding theory. DSN
Prog. Rep., Jet Prop. Lab., California Inst. Technol., Pasadena, CA, pages 114–116,
January 1978.
12. A. Menezes, P. van Oorschot, and S. Vanstone. Handbook of Applied Cryptography.
CRC Press, 1996.
13. R. C. Merkle. One way hash functions and DES. In Gilles Brassard, editor,
Advances in Cryptology - Crypto’ 89, LNCS. Springer-Verlag, 1989.
14. National Institute of Standards and Technology. FIPS Publication 180: Secure Hash
Standard, 1993.
15. H. Niederreiter. Knapsack-type cryptosystems and algebraic coding theory. Prob.
Contr. Inform. Theory, 15(2):157–166, 1986.
16. R.L. Rivest. The MD4 message digest algorithm. In A.J. Menezes and S.A.
Vanstone, editors, Advances in Cryptology - CRYPTO ’90, LNCS, pages 303–311.
Springer-Verlag, 1991.
17. N. Sendrier. On the security of the McEliece public-key cryptosystem. In
M. Blaum, P.G. Farrell, and H. van Tilborg, editors, Information, Coding and
Mathematics, pages 141–163. Kluwer, 2002. Proceedings of Workshop honoring
Prof. Bob McEliece on his 60th birthday.
18. J. Stern. A method for finding codewords of small weight. In G. Cohen and
J. Wolfmann, editors, Coding theory and applications, volume 388 of LNCS, pages
106–113. Springer-Verlag, 1989.
A NP-completeness Proofs
The most general problem we want to study concerning syndrome decoding with
regular words is:
b-Regular Syndrome Decoding (b-RSD)
Input: w binary matrices $H_i$ of dimension r × n and a bit string S of length r.
Property: there exists a set of b × w′ columns (with 0 < w′ ≤ w), 0 or b columns in each $H_i$, adding to S.
Note that in this problem b is not an input parameter. The fact that for any
value of b this problem is NP-complete is much stronger than simply saying that
the problem where b is an instance parameter is NP-complete. This also means
that there is not one, but an infinity of such problems (one for each value of b).
However we consider them as a single problem as the proof is the same for all
values of b.
The two following sub-problems are derived from the previous one. They
correspond more precisely to the kind of instances that an attacker on the FSB
hash function would need to solve.
Regular Syndrome Decoding (RSD)
Input: w matrices $H_i$ of dimension r × n and a bit string S of length r.
Property: there exists a set of w columns, one per $H_i$, adding to S.
2-Regular Null Syndrome Decoding (2-RNSD)
Input: w matrices $H_i$ of dimension r × n.
Property: there exists a set of 2 × w′ columns (with 0 < w′ ≤ w), taking 0 or 2 columns in each $H_i$, adding to 0.
It is easy to see that all of these problems are in NP. To prove that they
are NP-complete we will use a reduction similar to the one given by Berlekamp,
McEliece and van Tilborg for Syndrome Decoding [2]. We will use the following
known NP-complete problem.
Three-Dimensional Matching (3DM)
Input: a subset $U \subseteq T \times T \times T$ where T is a finite set.
Property: there is a set $V \subseteq U$ such that |V| = |T| and no two elements of V agree on any coordinate.
Let us study the following example: let T = {1, 2, 3} and |U| = 5, with
U1 = (1, 2, 2)
U2 = (2, 2, 3)
U3 = (1, 3, 2)
U4 = (2, 1, 3)
U5 = (3, 3, 1)
However if you remove U1from Uthen no solution exist. In our case it is more
convenient to represent an instance of this problem in another way: we associate
a 3|T|× |U|binary incidence matrix Ato the instance. For the previous example
it would give:
      122  223  132  213  331
  1     1    0    1    0    0
  2     0    1    0    1    0
  3     0    0    0    0    1
  1     0    0    0    1    0
  2     1    1    0    0    0
  3     0    0    1    0    1
  1     0    0    0    0    1
  2     1    0    1    0    0
  3     0    1    0    1    0
A solution to the problem will then be a subset of |T| columns adding to the all-one column. Using this representation, we will now show that any instance of this problem can be reduced to solving an instance of RSD, hence proving that RSD is NP-complete.
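For concreteness, the example above can be checked with a few lines of Python (ours): the first function builds the incidence matrix A, the second brute-forces the matching by testing whether some |T| columns sum to the all-one column.

```python
from itertools import combinations

def incidence_matrix(T: list, U: list[tuple]) -> list[list[int]]:
    """3|T| x |U| incidence matrix of a 3DM instance: one row per pair
    (coordinate position, element of T), one column per triple of U."""
    return [[1 if u[pos] == t else 0 for u in U]
            for pos in range(3) for t in T]

def has_matching(T: list, U: list[tuple]) -> bool:
    """Brute force: is there a set of |T| columns adding to the all-one column?"""
    A = incidence_matrix(T, U)
    return any(all(sum(row[c] for c in cols) == 1 for row in A)
               for cols in combinations(range(len(U)), len(T)))

if __name__ == "__main__":
    T = [1, 2, 3]
    U = [(1, 2, 2), (2, 2, 3), (1, 3, 2), (2, 1, 3), (3, 3, 1)]
    print(has_matching(T, U))      # True:  U1, U4, U5 form a matching
    print(has_matching(T, U[1:]))  # False: no matching once U1 is removed
```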
Reduction of 3DM to RSD. Given an input $U \subseteq T \times T \times T$ of the 3DM problem, let A be the 3|T| × |U| incidence matrix described above. For i from 1 to |T| we take $H_i = A$.
If we try to solve the RSD problem on these matrices with w = |T| and S = (1, ..., 1), a solution will exist if and only if we are able to add w = |T| columns of A (possibly several times the same one) and obtain a column of 1s. As all the columns of A contain exactly three 1s, the only way to have 3 × |T| 1s at the end is that, during the adding, no two columns have a 1 on the same line (each time two columns have a 1 on the same line the final weight decreases by 2). Hence the |T| chosen columns will form a suitable subset V for the 3DM problem.
This means that if we are able to give an answer to this RSD instance, we
will be able to answer the 3DM instance we wanted to solve. Thus RSD is NP-
complete.
Reduction of 3DM to b-RSD. This proof will be exactly the same as the
one above. The input is the same, but this time we build the following matrix:
$$B = \begin{pmatrix} A & & 0 \\ & \ddots & \\ 0 & & A \end{pmatrix}$$
(the block-diagonal matrix with b copies of A on the diagonal and zeros elsewhere).
Once again we take $H_i = B$ and use S = (1, ..., 1). The same arguments as above apply here and prove that, for any given value of b, if we are able to give an answer to this b-RSD instance, we will be able to answer the 3DM instance we wanted to solve. Hence, for any b, b-RSD is NP-complete.
Reduction of 3DM to 2-RNSD. We need to construct a matrix for which solving a 2-RNSD instance is equivalent to solving a given 3DM instance. A difficulty is that, this time, we cannot choose S = (1, ..., 1) as this problem is restricted to the case S = 0. For this reason we need to construct a somewhat more complicated matrix H, which is the concatenation of the matrices $H_i$ we will use. It is constructed as follows:
[Block matrix H, not reproduced here: as described below, it has a top part containing the A matrices (and a block of all-1 columns), a middle part containing pairs of |U| × |U| identity matrices, and a bottom part containing short rows of 1s.]
This matrix is composed of three parts: the top part with the A matrices, the middle part with pairs of |U| × |U| identity matrices, and the bottom part with small rows of 1s.
The aim of this construction is to ensure that a solution to 2-RNSD on this matrix (with w = |T| + 1) exists if and only if one can add |T| columns of A and a column of 1s to obtain 0. This is then equivalent to having a solution to the 3DM problem.
The top part of the matrix is where the link to 3DM is placed: in the 2-RNSD problem, 2 columns are taken in some of the blocks; our aim is to force the solution to take two columns in each block, each time one in the A sub-block and one in the 0 sub-block. The middle part ensures that when a solution chooses a column of H, it also has to choose the only other column having a 1 on the same line, so that the final sum on this line is 0. This means that any time a column is chosen in one of the A sub-blocks, the "same" column is chosen in the 0 sub-block. Hence, among the final 2w′ columns, w′ will be taken in the A sub-blocks (or the 1 sub-block) and w′ in the 0 sub-blocks. One then has a sum of w′ columns of A or 1 (not necessarily distinct) adding to 0. Finally, the bottom part of the matrix is there to ensure that if w′ > 0 (as requested in the formulation of the problem) then w′ = w. Indeed, each time a column is picked in block number i, the middle part forces one to pick a column in the other half of the block, creating two 1s in the final sum. The only way to eliminate these 1s is to pick some columns in blocks i − 1 and i + 1, and so on, until columns are picked in all of the w blocks.
As a result, we see that solving an instance of 2-RNSD on H is equivalent to choosing |T| columns in A (not necessarily different) all adding to 1. As in the previous proof, this concludes the reduction and 2-RNSD is now proven NP-complete.
It is interesting to note that instead of using 3DM we could directly have used RSD for this reduction: simply replace the A matrices with the w blocks of the RSD instance to be solved and, instead of a matrix of 1s, use a matrix whose columns are all equal to S. The reduction then works in the same way.
B Modeling Information Set Decoding
Using a classical ISD attack, we have seen that the average amount of computation required to find a solution to an instance of SD is $g(r)/P_w$. This is true when a complete Gaussian elimination is done for each information set chosen. However, some additional computations can be performed so that each choice of information set allows more words to be tested. For instance, in [9], each time an information set is chosen, the validity of this set is tested, but at the same time partial validity is tested: if there exists a solution with a few 1s among the k positions of the information set, it will also be found by the algorithm. Of course, the more solutions one wants to test for each Gaussian elimination, the more additional computation has to be performed.
In a general way, if $K_1$ and $K_2$ denote respectively the complexities in space and time of the algorithm performing the additional computations, and M denotes the number of additional possible solutions explored, we should have:
$$K_1 \times K_2 \geq M.$$
Moreover, if $P_w$ is the probability of finding a solution for one information set, then, when observing M times more solutions at a time, the total probability of success is not greater than $M P_w$.
Hence, the total time complexity of such an attack would be:
$$K \geq \frac{g(r) + K_2}{M P_w} \geq \frac{g(r)}{M P_w} + \frac{1}{K_1 P_w}.$$
When M becomes large, the $g(r)/(M P_w)$ term becomes negligible (the cost of the Gaussian elimination no longer counts) and we have:
$$K \geq \frac{1}{K_1 P_w}.$$
This would mean that, in order to be out of reach of any possible attack, the inverse of the probability $P_w$ should be at least as large as $K \times K_1$. Allowing complexities up to $2^{80}$ in time and $2^{50}$ in space, we would need $P_w \leq 2^{-130}$.
However, this is only theoretical. In practice there is no known algorithm for which $K_1 \times K_2 = M$. Using existing algorithms this would rather be:
$$K_1 \times K_2 = M \times Loss \quad\text{and}\quad K \geq \frac{Loss}{K_1 P_w},$$
where Loss denotes a function of $\log M$, n, r and w. For this reason, the optimal values for the attack will often correspond to only a small amount of extra calculation for each information set. This will hence save some time on the Gaussian eliminations but will hardly gain anything on the rest. The time complexity K will always remain larger than $1/P_w$ and will most probably even be a little above.