A Fast Provably Secure Cryptographic
Hash Function
Daniel Augot, Matthieu Finiasz, and Nicolas Sendrier
Projet Codes, INRIA Rocquencourt
BP 105, 78153 Le Chesnay - Cedex, France
[Daniel.Augot,Matthieu.Finiasz,Nicolas.Sendrier]@inria.fr
Abstract. We propose a family of fast and provably secure cryptographic hash functions. The security of these functions relies directly on the well-known syndrome decoding problem for linear codes. Attacks on this problem are well identified and their complexity is known. This enables us to study precisely the practical security of the hash functions and propose valid parameters for implementation. Furthermore, the design proposed here is fully scalable, with respect to security, hash size and output rate.
Key Words: hash functions, syndrome decoding, NP-completeness.
1 Introduction
The main cryptographic hash function design in use today iterates a so-called
compression function according to Merkle's and Damgård's constructions [5, 13].
Classical compression functions are very fast [3, 14, 16] but cannot be proven
secure. However, provable security may be achieved with compression functions
designed according to public key principles, at the cost of a poor efficiency.
Unlike most other public key cryptosystems, the encryption function of the
McEliece cryptosystem [11] (or of Niederreiter’s version [15]) is nearly as fast as a
symmetric cipher. Using this function with a random matrix instead of the usual
Goppa code parity check matrix, we obtain a provably secure one-way function
with no trap. The purpose of this paper is to use this function to obtain a fast
cryptographic hash function whose security is assessed by a difficult algorithmic
problem.
For didactic purposes, we introduce the Syndrome Based (SB) compression
function, which directly relies on Niederreiter’s scheme and the syndrome de-
coding problem. However, this function can hardly be simultaneously fast and
secure for practical parameters. Hence we introduce the Fast Syndrome Based
(FSB) compression function, derived from the previous one and relying on a
similar hard problem. Section 2 is devoted to the description of both functions.
In Section 3 we show that, as for McEliece’s and Niederreiter’s cryptosystems,
the security of SB can be reduced to the hardness of syndrome decoding in the
average case. Similarly, we prove that the security of FSB is reduced to the av-
erage case difficulty of two new NP-complete problems. Finally, in Section 4, we
show how the best known decoding techniques can be adapted to the cryptanal-
ysis of our functions. From that we can evaluate the practical security and the
scalability of the system, and eventually propose a choice of parameters. Note
that, for clarity of the presentation, NP-completeness proofs are postponed to
the appendix.
2 The Hash Functions
We will present two different versions of the hash function: the first is the
Syndrome Based hash function (SB); the second, a modified version called Fast
Syndrome Based (FSB), is much faster in practice and also more secure.
2.1 General Construction
There is one main construction for designing hash functions: it consists in iterating a compression function which takes as input s bits and returns r bits, with s > r, so that, using such a chaining, the resulting function can operate on strings of arbitrary length (see Fig. 1). The validity of such a design has been well established [5, 13] and its security is not worse than the security of the compression function. Therefore we will concentrate on the security of the latter.
Fig. 1. A standard hash function construction: the padded document D is split into blocks which are fed, together with the chaining value (initialized with an I.V.), through successive rounds of the compression function; the output of the last round is the hash value.
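As an illustration of this chaining, here is a minimal sketch in Python (the function name, the zero-padding rule and the byte-level layout are ours, not the paper's); `compress` stands for any s-bit to r-bit compression function.

```python
def hash_document(document: bytes, compress, s: int, r: int, iv: bytes) -> bytes:
    """Chain an s-bit -> r-bit compression function over a document.

    `compress` maps s bits (given as bytes) to r bits (as bytes), with s > r.
    This sketch assumes s and r are multiples of 8 and uses plain zero-padding
    of the last block; a real construction would use a proper padding rule.
    """
    block_bytes = (s - r) // 8              # document bytes read per round
    chaining = iv                           # r-bit chaining value (I.V.)
    padded = document + b"\x00" * ((-len(document)) % block_bytes)
    for i in range(0, len(padded), block_bytes):
        block = padded[i:i + block_bytes]
        # input of one round: r chaining bits followed by s - r document bits
        chaining = compress(chaining + block)
    return chaining                          # hash value
```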
2.2 Description of the Syndrome Based Compression Function
The core of the compression function is a random binary matrix H of size r × n.
Hashing a document will consist in adding (using binary XORs) w columns of
this matrix to finally obtain a hash of length r.
The parameters for the hash function are:
– n, the number of columns of the matrix H;
– r, the number of rows of the matrix H and the size in bits of the function output;
– w, the number of columns of H added at each round.
Once these parameters are chosen, a (truly) random r × n matrix H is generated. This matrix is chosen once and for all for the hash function. Using the scheme explained next, a function with input size $s = \log_2\binom{n}{w}$ and output size r is obtained.
As we use a standard chaining method, the input of the compression function will consist of r bits taken from the output of the previous round and s − r bits taken from the file. Of course, w must be such that s > r.
The compression is performed in two steps:
Input: s bits of data
1. encode the s bits in a word e of length n and weight w;
2. compute $He^T$ to obtain a binary string of length r.
Output: r bits of hash
The first step requires converting the s input bits into a binary n-tuple containing exactly w ones (and zeros everywhere else). This word is then multiplied by H (that is, the corresponding w columns of H are added) to obtain the r-bit hash.
This function is expected to be very fast as only a few operations are required:
input encoding and a few XORs. In practice, the second step will be very fast,
but the first step is much slower. Indeed, the best algorithm for embedding data
in a constant-weight word [6, 7] makes extensive use of large integer arithmetic
and is by far the most expensive part.
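As an illustration, here is a minimal sketch of the second step in Python (names and data layout are ours); the constant-weight encoder of step 1, which relies on the techniques of [6, 7], is deliberately left out and its output is taken as input.

```python
import secrets

def make_matrix(r: int, n: int) -> list[int]:
    """A random binary r x n matrix H, stored as n column bitmasks of r bits."""
    return [secrets.randbits(r) for _ in range(n)]

def sb_compress(H: list[int], positions: list[int]) -> int:
    """Second step of the SB compression function: XOR the chosen columns of H.

    `positions` is the support (the w indices of the 1s) of the weight-w word e
    produced by the constant-weight encoder of step 1.
    """
    h = 0
    for j in positions:
        h ^= H[j]      # add column j of H over GF(2)
    return h           # the r-bit hash, as an integer
```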
2.3 Description of the Fast Syndrome Based Compression Function
Definition 1. A word of length n and weight w is called regular if it has exactly one non-zero position in each of the w intervals $\left](i-1)\frac{n}{w};\, i\frac{n}{w}\right]_{i=1..w}$.
To improve the speed, we embed less data in each constant-weight word by using a faster, no longer one-to-one, constant-weight encoder. Instead of using any word of weight w we embed the input bits in a regular word of weight w. Hence we will have $s = w \log_2(n/w)$.
The matrix H is split into w sub-blocks $H_i$ of size $r \times \frac{n}{w}$ and the algorithm is the following:
Input: s bits of data
1. split the s input bits into w parts $s_1, \ldots, s_w$ of $\log_2(n/w)$ bits each;
2. convert each $s_i$ to an integer between 1 and $\frac{n}{w}$;
3. choose the corresponding column in each $H_i$;
4. add the w chosen columns to obtain a binary string of length r.
Output: r bits of hash
Using this encoder the cost of the first step becomes negligible as it only
consists in reading the input bits a fixed number at a time. This compression
function is hence very fast and its speed is directly linked to the number of XORs
required for a round.
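A minimal sketch of one FSB round in Python follows (names and data layout are ours); it assumes n/w is a power of two, so that each block index is read from exactly $\log_2(n/w)$ consecutive input bits.

```python
import secrets

def make_blocks(r: int, n: int, w: int) -> list[list[int]]:
    """The w random sub-blocks H_i, each an r x (n/w) binary matrix
    stored as a list of r-bit column bitmasks."""
    assert n % w == 0
    return [[secrets.randbits(r) for _ in range(n // w)] for _ in range(w)]

def fsb_compress(blocks: list[list[int]], data: int, n: int, w: int) -> int:
    """One round of the FSB compression function.

    `data` holds the s = w * log2(n/w) input bits (chaining bits plus
    document bits) as an integer. Each group of log2(n/w) bits selects one
    column in the corresponding sub-block H_i; the w columns are XORed.
    """
    bits_per_block = (n // w).bit_length() - 1   # log2(n/w), n/w a power of 2
    mask = (1 << bits_per_block) - 1
    h = 0
    for i in range(w):
        index = (data >> (i * bits_per_block)) & mask   # column chosen in H_i
        h ^= blocks[i][index]                           # add that column of H_i
    return h                                            # r bits of hash
```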
2.4 Related Work
In Merkle's construction [13], the compression function is an encryption function (modular squaring, knapsack, etc.) used as a one-way function. In the case of the SB hash function the compression function is very similar to the encoding function of Niederreiter's version of the McEliece cryptosystem [11, 15]. The only difference is that, instead of using the parity check matrix of a permuted Goppa code, SB uses a random matrix H. By doing this, the trap in the one-way function of the cryptosystem is removed.
From a security point of view this can only strengthen the system as all
attacks on the trap no longer hold. Breaking Niederreiter’s cryptosystem can be
reduced to the two problems of inverting the one-way function or recovering the
trap. In the case of SB, only the inversion problem remains.
3 Theoretical Security
As stated in [12], a cryptographic hash function has to be pre-image resistant, second pre-image resistant and collision resistant. As second pre-image resistance is strictly weaker than collision resistance, we will only check that both hash functions are collision free and resistant to inversion. In the SB hashing
scheme this can be reduced to solving an instance of Syndrome Decoding, which
is NP-complete [2]. In the FSB version, these two kinds of attack can be re-
duced to two very close new problems. We will first describe them and show (in
appendix) that they are also NP-complete.
We will then show that finding a collision or an inversion is at least as hard
as solving one of these two new problems. This is what we call provable security.
3.1 Two New NP-complete Problems
In this section we will recall the problems of syndrome decoding and null syn-
drome decoding and then describe two closely related new problems.
Syndrome Decoding (SD)
Input: a binary matrix H of dimension r × n and a bit string S of length r.
Property: there exists a set of w′ columns of H adding to S (with 0 < w′ ≤ w).
Null Syndrome Decoding (NSD)
Input: a binary matrix H of dimension r × n.
Property: there exists a set of w′ columns of H adding to 0 (with 0 < w′ ≤ w).
These two problems are NP-complete [2], which means that at least some instances of the problem are difficult. However, it is a common belief that they should be difficult on average (for well-chosen parameter ranges), which means that random instances are difficult. For cryptographic purposes this is much more interesting than simple NP-completeness.
The same comment can be made about the following two problems: they are NP-complete (see Appendix A) and we believe that they are hard on average.
Regular Syndrome Decoding (RSD)
Input: w matrices $H_i$ of dimension r × n and a bit string S of length r.
Property: there exists a set of w columns, one in each $H_i$, adding to S.
2-Regular Null Syndrome Decoding (2-RNSD)
Input: w matrices $H_i$ of dimension r × n.
Property: there exists a set of 2w′ columns (with 0 < w′ ≤ w), 0 or 2 in each $H_i$, adding to 0.
3.2 Security Reduction
In this section we will show that finding collisions for, or inverting, either of the two proposed hash functions is as hard as solving an instance of one of the NP-complete problems described in the previous section.
We will prove the security of the compression function, which is enough when using a standard construction (see [12], p. 333).
Security of the Syndrome Based Hash Function. Finding an inversion
for this compression function consists in finding an input (of length s) which will
hash to a given bit string S. Now suppose an algorithm A is able to compute inversions for this function, and an instance (H, w, S) of the SD problem has to be solved. Then, using A, it is possible to compute inverses for the compression function using H, and so obtain an input with hash S. This means that the w columns corresponding to this input, when added together, sum to S. A solution to the given instance of SD has been found.
Finding a collision for this scheme consists in finding two different inputs hashing to the same string. Now suppose an algorithm A′ is able to find collisions for this compression function, and a given instance (H, 2w) of the NSD problem has to be solved. The algorithm A′ can compute two inputs hashing (through H) to the same string. These inputs correspond to two different words of weight w which, when added together, give a non-zero word m of even weight at most 2w. By linearity, the product $Hm^T$ is 0. The word m is a solution to the instance of NSD.
So, finding either a collision or an inversion for the SB hash function is at
least as hard as solving an instance of the SD or NSD problems.
Security of the FSB Hash Function. For this version of the hash function
the security reduction can be done exactly like for the SB function. Inverting
the compression function can be reduced to the RSD problem, and collision to
the 2-RNSD problem.
These reductions to NP-complete problems do not prove that all instances are difficult, only that some instances are difficult. For cryptographic security this is clearly not enough. However, in the same manner as Gurevich and Levin [8, 10] have discussed it for SD, we believe that all these NP-complete problems are difficult on average (for well-chosen parameters).
4 Practical Security
This section is dedicated to the study of the practical security of the two versions
of the hash function. As for the security reduction, attacking the hash function
as a whole is equivalent to attacking a single round and takes the same amount
of computation. Therefore we need to identify the possible attacks on one round
of the system, and then study the minimal workfactors required to perform these
attacks.
4.1 Existing Attacks
Decoding in a random linear code is at least as hard as giving an answer to SD. This problem has been extensively studied over the years and many attacks against it have been developed (see [1]): split syndrome decoding, gradient-like decoding, information set decoding, etc. All these attacks are exponential. Still, as stated by Sendrier [17], the most efficient attacks all seem to be derived from Information Set Decoding (ISD).
Definition 2. An information set is a set of k = n − r (the dimension of the code) positions among the n positions of the support.
Definition 3. Let (H, w, S) be an instance of SD. An information set will be called valid if there exists a solution to this SD problem which has no 1s among the chosen k positions.
The ISD technique consists in picking information sets until a valid one is found. Checking whether the information set is valid or not mainly consists in performing a Gaussian elimination on an r × r submatrix of the parity check matrix H of the code. Once this is done, if a solution exists it is found in constant time. Let GE(r) be the cost of this Gaussian elimination and $P_w$ the probability for a random information set to be valid. Then the complexity of this algorithm is $GE(r)/P_w$.
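To fix ideas, here is a minimal sketch of this basic ISD loop in Python (ours, with columns of H stored as r-bit integers); the actual attacks [4, 9, 18] are considerably more optimized.

```python
import random

def gauss_solve(cols: list[int], S: int, r: int):
    """Solve M x = S over GF(2), where the j-th column of the r x r matrix M
    is the r-bit integer cols[j]. Returns x as a list of bits, or None if M
    is singular."""
    # Row i of the augmented matrix [M | S], packed as an (r+1)-bit integer.
    rows = []
    for i in range(r):
        row = sum(((cols[j] >> i) & 1) << j for j in range(r))
        rows.append(row | (((S >> i) & 1) << r))
    for j in range(r):                       # Gauss-Jordan elimination
        piv = next((i for i in range(j, r) if (rows[i] >> j) & 1), None)
        if piv is None:
            return None                      # singular submatrix, pick again
        rows[j], rows[piv] = rows[piv], rows[j]
        for i in range(r):
            if i != j and (rows[i] >> j) & 1:
                rows[i] ^= rows[j]
    return [(rows[j] >> r) & 1 for j in range(r)]

def isd_attack(H: list[int], S: int, r: int, w: int, max_tries: int = 10**6):
    """Plain ISD: pick r positions at random (the complement of an information
    set of size k = n - r) and hope all w ones of a solution fall among them."""
    n = len(H)
    for _ in range(max_tries):
        chosen = random.sample(range(n), r)
        x = gauss_solve([H[j] for j in chosen], S, r)
        if x is not None and sum(x) == w:
            return sorted(chosen[j] for j in range(r) if x[j])  # support of e
    return None
```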
This technique was improved several times [4, 9, 18]. The effect of these improvements on the complexity of the algorithm is to reduce the degree of the polynomial part. The complexity is then $g(r)/P_w$, with deg(g) < deg(GE). However, all the different versions were designed for instances of SD having only one (or a few) solutions. For SB, as we will see in Section 4.4, the range of parameters we are interested in leads to instances with many more solutions (over $2^{400}$). Anyhow, ISD attacks remain the most suitable and behave the same way as when there is a single solution.
Applied to the FSB hash function, this technique will, in addition, have to return regular words. This considerably decreases the number of solutions and, in this way, decreases the probability for a given information set to be valid, thus enhancing security.
4.2 Analysis of Information Set Decoding
We have seen that the complexity of an information set decoding attack can be expressed as $g(r)/P_w$. Hence, it is very important to evaluate $P_w$ precisely in both versions of the hash function and for both kinds of attacks. The polynomial part has less importance and is approximately the same in all four cases.
The probability $P_w$ we want to calculate will depend on two things: the probability $P_{w,1}$ that a given information set is valid for one given solution of SD, and the expected number $N_w$ of valid solutions for SD. Even though the probabilities we deal with are not independent, we shall consider
$$P_w = 1 - (1 - P_{w,1})^{N_w}.$$
It is important to note that $N_w$ is the average number of valid solutions to SD. For small values of w, $N_w$ can be much smaller than one; the formula is valid though.
In this section, for the sake of simplicity, we will use the approximation $P_w \simeq P_{w,1} \times N_w$. When calculating the security curves (see Section 4.3) and choosing the final parameters (see Section 4.4), we have used the exact formulas for the calculations.
Solving SD: attacking the SB hash function. For inversion, the problem is the following: we have a syndrome which is a string S of r bits. We want to find an information set (of size k = n − r) which is valid for one inverse of S of weight w.
For one given inverse of weight w the probability for a given information set to be valid is:
$$P_{w,1} = \frac{\binom{n-w}{k}}{\binom{n}{k}} = \frac{\binom{n-k}{w}}{\binom{n}{w}} = \frac{\binom{r}{w}}{\binom{n}{w}}.$$
The average number of solutions (of inverses) of weight w is:
$$N_w = \frac{\binom{n}{w}}{2^r}.$$
In the end, we get a total probability of choosing an information set valid for inversion of:
$$P_{inv} = P_{w,1} \times N_w = \frac{\binom{r}{w}}{2^r}.$$
To find a collision, it is enough to find a word of even weight 2i with 0 < i ≤ w. We have exactly the same formulas as for inversion, but this time the final probability is:
$$P_{col} = \sum_{i=1}^{w} \frac{\binom{r}{2i}}{2^r}.$$
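These probabilities are easy to evaluate numerically; the following short Python sketch (ours) computes the $\log_2$ of $1/P_{inv}$ and $1/P_{col}$ from the simplified formulas above (note that, after simplification, neither depends on n).

```python
from math import comb, log2

def sb_inversion_security(r: int, w: int) -> float:
    """log2 of 1/P_inv for SB, with P_inv = C(r, w) / 2^r."""
    return r - log2(comb(r, w))

def sb_collision_security(r: int, w: int) -> float:
    """log2 of 1/P_col for SB, with P_col = sum_{i=1..w} C(r, 2i) / 2^r."""
    return r - log2(sum(comb(r, 2 * i) for i in range(1, w + 1)))

if __name__ == "__main__":
    r = 160
    for w in (16, 32, 64):
        print(f"w={w}: inversion {sb_inversion_security(r, w):.1f} bits, "
              f"collision {sb_collision_security(r, w):.1f} bits")
```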
Solving RSD: inverting the FSB hash function. In that case, one needs to find a regular word of weight w having a given syndrome S. Since the word is regular there will be fewer solutions for each syndrome, and even if each of them is easier to find, in the end the security is increased compared to SB.
The number of regular solutions to RSD is, on average:
$$N_w = \frac{\left(\frac{n}{w}\right)^w}{2^r}.$$
The probability of finding a valid information set for a given solution is however a little more intricate. For instance, as the solutions are not random words, the attacker should not choose the information set at random, but should rather choose the sets which have the best chance of being valid. In our case it is easy to see that the attacker will maximize his chances when taking the same number of positions in each block, that is, taking k/w positions w times. The probability of success is then:
$$P_{w,1} = \left(\frac{\binom{n/w - 1}{k/w}}{\binom{n/w}{k/w}}\right)^{w} = \frac{\left(\frac{r}{w}\right)^w}{\left(\frac{n}{w}\right)^w}.$$
The final probability is:
$$P_{inv} = P_{w,1} \times N_w = \frac{\left(\frac{r}{w}\right)^w}{2^r}.$$
One can check that this probability is much smaller than for the SB hash function (a ratio of roughly $w!/w^w$). Using the fast constant-weight encoder, and restricting the set of solutions to RSD, has strengthened the system.
Solving 2-RNSD: collisions in the FSB hash function. When looking for collisions one needs to find two regular words of weight w having the same syndrome. However, these two words can coincide on some positions. With the block structure of regular words, this means that we are looking for words with a null syndrome having some blocks (say i) with a weight of 2 and the remaining blocks with a weight of 0. The number of such words is:
$$N_i = \frac{\binom{w}{i}\binom{n/w}{2}^i}{2^r}.$$
Once again, the attacker can choose his strategy when choosing information sets. However, this time the task is a little more complicated as there is not a single optimal strategy. If the attacker is looking for words of weight up to 2w then the strategy is the same as for RSD, choosing an equal amount of positions in each set. For each value of i, the probability of validity is then:
$$P_{i,1} = \frac{\binom{w}{i}\binom{n/w - k/w}{2}^i}{\binom{w}{i}\binom{n/w}{2}^i} = \frac{\binom{r/w}{2}^i}{\binom{n/w}{2}^i}.$$
The total probability of success for one information set is then:
$$P_{col\,total} = \sum_{i=1}^{w} \frac{\binom{w}{i}\binom{r/w}{2}^i}{2^r} = \frac{1}{2^r}\left[\binom{r/w}{2} + 1\right]^{w}.$$
But the attacker can also decide to focus on some particular words. For example, he could limit himself to words with non-zero positions only in a given set of w′ < w blocks, take all the information set points available in the remaining w − w′ blocks and distribute the rest of the information set in the w′ chosen blocks. He then works on fewer solutions, but with a greater probability of success. This probability is:
$$P_{col\,w'} = \frac{1}{2^r}\left[\binom{\frac{n}{w} - \frac{k'}{w'}}{2} + 1\right]^{w'},$$
with $k' = k - (w - w')\times\frac{n}{w} = n\frac{w'}{w} - r$. This can be simplified into:
$$P_{col\,w'} = \frac{1}{2^r}\left[\binom{\frac{r}{w'}}{2} + 1\right]^{w'}.$$
Surprisingly, this no longer depends on w and n. As the attacker has the possibility to choose the strategy he prefers (depending on the parameters of the system), he will be able to choose the most suitable value for w′. However, as w′ ≤ w, he might not always be able to take the absolute maximum of $P_{col\,w'}$. The best he can do will be:
$$P_{col\,optimal} = \frac{1}{2^r}\max_{w' \in \{1,\ldots,w\}}\left[\binom{\frac{r}{w'}}{2} + 1\right]^{w'}.$$
This maximum will be reached for $w' = \alpha \cdot r$ where $\alpha \approx 0.24$ is a constant. Hence, if $w > 0.24\,r$ we have:
$$P_{col\,optimal} = \frac{1}{2^r}\left[\binom{1/\alpha}{2} + 1\right]^{\alpha r} \simeq 0.81^r.$$
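The constant $\alpha$ and the $0.81^r$ estimate are easy to check numerically; the following sketch (ours) maximizes the last expression over w′, extending $\binom{x}{2}$ to non-integer x as $x(x-1)/2$. For r = 160 and r = 224 it reproduces the probabilities $2^{-47.7}$ and $2^{-66.7}$ quoted in the next section.

```python
from math import log2

def log2_p_col(r: int, wp: int) -> float:
    """log2 of (1/2^r) * [ C(r/w', 2) + 1 ]^w', with C(x, 2) = x(x-1)/2."""
    x = r / wp
    return wp * log2(x * (x - 1) / 2 + 1) - r

def best_attack(r: int, w: int) -> tuple[int, float]:
    """Optimal w' in {1, ..., w} for the collision attack, and log2 P_col."""
    wp = max(range(1, w + 1), key=lambda v: log2_p_col(r, v))
    return wp, log2_p_col(r, wp)

if __name__ == "__main__":
    for r in (160, 224, 288):
        wp, lp = best_attack(r, r)     # w taken large enough not to constrain w'
        print(f"r={r}: w' = {wp} (~{wp / r:.2f} r), log2 P_col = {lp:.1f}, "
              f"0.81^r -> {r * log2(0.81):.1f}")
```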
4.3 Some Security Curves
These are some curves obtained when plotting the exact versions of the formulas above. For $n = 2^{14}$ and r = 160, we see on Fig. 2 that the security of the FSB hash function is much higher than that of the SB version. This is the case for the chosen parameters, but also for about any sound set of parameters.
Fig. 3 focuses on the security of the FSB hash function against inversion and collision. We see that by choosing different parameters for the hash function we can obtain different security levels. This level does not depend significantly on n or w but is mainly a function of r. With r = 160 we get a probability of $2^{-47.7}$ that an information set is valid. With r = 224 we decrease this probability to $2^{-66.7}$.
Fig. 2. These two curves show the $\log_2$ of the inverse of the probability that an information set is valid, as a function of w. The dotted line corresponds to the SB version of the scheme and the plain line to the FSB version. On the left the attack is made for inversion, on the right for collision. The curves correspond to the parameters $n = 2^{14}$, r = 160.
Fig. 3. On the left is plotted the security (inverse of the probability of success) of the FSB hash function for the parameters $n = 2^{14}$, r = 160 when using the optimal attacks. The curve on the right corresponds to $n = 3\cdot 2^{13}$, r = 224.
This may seem far below the usual security requirement of $2^{80}$ binary operations; however, once an information set is chosen, the simple fact of verifying whether it is valid or not requires performing a Gaussian elimination on a small r × r matrix. This should take at least $O(r^2)$ binary operations. This gives a final security of $2^{62.3}$ binary operations for r = 160, which is still too small. For r = 224 we get $2^{82.3}$ operations, which can be considered secure.
For extra strong security, when trying to take into account some statements in Appendix B, one could try to aim at a probability below $2^{-80}$ (corresponding to an attacker able to perform Gaussian eliminations in constant time) or $2^{-130}$ (for an attacker using an idealistic algorithm). This is achieved, for example, with r = 288 with a probability of choosing a valid information set of $2^{-85.8}$, or r = 512 with a probability of $2^{-152.5}$. These levels of security are probably far above what practical attacks could do, but it is interesting to see how they can be reached using reasonable parameters.
Fig. 4. On the left is the number of bits of input for one round of the SB (dotted line) or the FSB (plain line) hash functions as a function of w. These are calculated for a fixed $n = 2^{14}$. On the right are the curves corresponding to the number of XORs per bit of input as a function of w for the FSB hash function with $n = 2^{14}$, r = 160 (dotted line), $n = 3\cdot 2^{13}$, r = 224 (plain line) and $n = 2^{13}$, r = 288 (dashed line).
In terms of efficiency, what we are concerned with is the required number of binary XORs per bit of input. We see on Fig. 4 (left) that the FSB version of the hash function is a little less efficient than the SB version as, for the same number of XORs, it reads fewer input bits. The figure on the right shows the number of bit XORs required per bit of input. It corresponds to the following formula:
$$N_{XOR} = \frac{r \cdot w}{w \log_2(n/w) - r}.$$
This function will always reach its minimum for $w = r \cdot \ln 2$, and values around this will also be nearly optimal as the curve is quite flat. A smaller w will moreover yield a smaller block size for the same hashing cost, which is better when hashing small files.
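For illustration, a few lines of Python (ours) evaluate $N_{XOR}$ for the three parameter sets of Fig. 4 and the corresponding unconstrained optimum $w \approx r \ln 2$.

```python
from math import log, log2

def n_xor(n: int, r: int, w: int) -> float:
    """Binary XORs per input bit for one FSB round: r*w / (w*log2(n/w) - r)."""
    return r * w / (w * log2(n / w) - r)

if __name__ == "__main__":
    for n, r, w in ((2**14, 160, 64), (3 * 2**13, 224, 96), (2**13, 288, 128)):
        print(f"n={n}, r={r}, w={w}: {n_xor(n, r, w):.1f} XORs per input bit "
              f"(unconstrained optimum near w = {r * log(2):.0f})")
```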
Note that $N_{XOR}$ can always be reduced, to obtain a faster function, by increasing n; however, this will increase the block size of the function and will also increase the size of the random matrix to be used. In software, as soon as this matrix becomes larger than the machine's memory cache, the speed will drop immediately as the number of cache misses becomes too large.
4.4 Proposed Parameters for Software Implementation
The choice of all the parameters of the system should be done with great care as
a bad choice could strongly affect the performances of the system. One should
first choose rto have the output size he wishes and the security required. Then,
the choice of wand nshould verify the following simple rules:
n
wis a power of 2, so as to read an integer number of input bits at a time.
This may be 28to read the input byte by byte.
n×ris smaller than the cache size to avoid cache misses.
wis close to its optimal value (see Fig. 4).
ris a multiple of 32 (or 64 depending on the CPU architecture) for a full
use of the word-size XORs
Thus we make three proposals. Using r = 160, w = 64 and $n = 2^{14}$ and a standard, well-optimized C implementation of FSB, we obtained an algorithm faster than the standard md5sum Linux utility and nearly as fast as a C implementation of SHA-1. That is a rate of approximately 300 Mbits of input hashed per second on a 2 GHz Pentium 4. However, these parameters are not secure enough for collision resistance (only $2^{62.3}$ binary operations). They could nevertheless be used when simply looking for pre-image resistance and a higher output rate, as the complexity of inverting this function remains above the limit of $2^{80}$ operations.
With the secure r = 224, w = 96, $n = 3\cdot 2^{13}$ parameters (probability of $2^{-66.7}$ and $2^{82.3}$ binary operations) the speed is a little lower, with only up to 200 Mbits/s. With r = 288, w = 128, $n = 2^{13}$ (probability below $2^{-80}$), the speed should be just above 100 Mbits/s.
5 Conclusion
We have proposed a family of fast and provably secure hash functions. This construction enjoys some interesting features: both the block size of the hash function and the output size are completely scalable; the security depends directly on the output size and can hence be set to any desired level; and the number of XORs used by FSB per input bit can be decreased to improve speed. Note that collision resistance can be put aside in order to allow parameters giving a higher output rate.
However, reaching very high output rates requires the use of a large matrix.
This can be a limitation when trying to use FSB on memory constrained devices.
On classical architectures this will only fix a maximum speed (most probably
when the size of the matrix is just below the memory cache size).
Another important point is the existence of weak instances of this hash function: it is clear that the matrix H can be chosen with bad properties. For instance, the all-zero matrix will define a hash function with constant zero output. However, these bad instances represent only a completely negligible proportion of all the matrices, and when choosing a matrix at random there is no risk of picking a weak instance.
References
1. A. Barg. Complexity issues in coding theory. In V. S. Pless and W. C. Huffman,
editors, Handbook of Coding theory, volume I, chapter 7, pages 649–754. North-
Holland, 1998.
2. E. R. Berlekamp, R. J. McEliece, and H. C. van Tilborg. On the inherent in-
tractability of certain coding problems. IEEE Transactions on Information Theory,
24(3), May 1978.
3. J. Black, P. Rogaway, and T. Shrimpton. Black box analysis of the block ci-
pher based hash-function constructions from PGV. In Advances in Cryptology -
CRYPTO 2002, volume 2442 of LNCS. Springer-Verlag, 2002.
4. A. Canteaut and F. Chabaud. A new algorithm for finding minimum-weight words
in a linear code: Application to McEliece’s cryptosystem and to narrow-sense BCH
codes of length 511. IEEE Transactions on Information Theory, 44(1):367–378,
January 1998.
5. I. B. Damgård. A design principle for hash functions. In Gilles Brassard, editor,
Advances in Cryptology - Crypto’ 89, LNCS, pages 416–426. Springer-Verlag, 1989.
6. J.-B. Fischer and J. Stern. An efficient pseudo-random generator provably as
secure as syndrome decoding. In Ueli M. Maurer, editor, Advances in Cryptology -
EUROCRYPT ’96, volume 1070 of LNCS, pages 245–255. Springer-Verlag, 1996.
7. P. Guillot. Algorithmes pour le codage `a poids constant. Unpublished.
8. Y. Gurevich. Average case completeness. Journal of Computer and System Sci-
ences, 42(3):346–398, 1991.
9. P. J. Lee and E. F. Brickell. An observation on the security of McEliece’s public-
key cryptosystem. In C. G. Günther, editor, Advances in Cryptology – EURO-
CRYPT’88, volume 330 of LNCS, pages 275–280. Springer-Verlag, 1988.
10. L. Levin. Average case complete problems. SIAM Journal on Computing,
15(1):285–286, 1986.
11. R. J. McEliece. A public-key cryptosystem based on algebraic coding theory. DSN
Prog. Rep., Jet Prop. Lab., California Inst. Technol., Pasadena, CA, pages 114–116,
January 1978.
12. A. Menezes, P. van Oorschot, and S. Vanstone. Handbook of Applied Cryptography.
CRC Press, 1996.
13. R. C. Merkle. One way hash functions and DES. In Gilles Brassard, editor,
Advances in Cryptology - Crypto’ 89, LNCS. Springer-Verlag, 1989.
14. National Institute of Standards and Technology. FIPS Publication 180: Secure Hash
Standard, 1993.
15. H. Niederreiter. Knapsack-type cryptosystems and algebraic coding theory. Prob.
Contr. Inform. Theory, 15(2):157–166, 1986.
16. R.L. Rivest. The MD4 message digest algorithm. In A.J. Menezes and S.A.
Vanstone, editors, Advances in Cryptology - CRYPTO ’90, LNCS, pages 303–311.
Springer-Verlag, 1991.
17. N. Sendrier. On the security of the McEliece public-key cryptosystem. In
M. Blaum, P.G. Farrell, and H. van Tilborg, editors, Information, Coding and
Mathematics, pages 141–163. Kluwer, 2002. Proceedings of Workshop honoring
Prof. Bob McEliece on his 60th birthday.
18. J. Stern. A method for finding codewords of small weight. In G. Cohen and
J. Wolfmann, editors, Coding theory and applications, volume 388 of LNCS, pages
106–113. Springer-Verlag, 1989.
A NP-completeness Proofs
The most general problem we want to study concerning syndrome decoding with
regular words is:
b-Regular Syndrome Decoding (b-RSD)
Input: w binary matrices $H_i$ of dimension r × n and a bit string S of length r.
Property: there exists a set of b × w′ columns (with 0 < w′ ≤ w), 0 or b columns in each $H_i$, adding to S.
Note that in this problem b is not an input parameter. The fact that for any
value of b this problem is NP-complete is much stronger than simply saying that
the problem where b is an instance parameter is NP-complete. This also means
that there is not one, but an infinity of such problems (one for each value of b).
However we consider them as a single problem as the proof is the same for all
values of b.
The two following sub-problems are derived from the previous one. They
correspond more precisely to the kind of instances that an attacker on the FSB
hash function would need to solve.
Regular Syndrome Decoding (RSD)
Input: w matrices $H_i$ of dimension r × n and a bit string S of length r.
Property: there exists a set of w columns, one per $H_i$, adding to S.
2-Regular Null Syndrome Decoding (2-RNSD)
Input: w matrices $H_i$ of dimension r × n.
Property: there exists a set of 2 × w′ columns (with 0 < w′ ≤ w), taking 0 or 2 columns in each $H_i$, adding to 0.
It is easy to see that all of these problems are in NP. To prove that they
are NP-complete we will use a reduction similar to the one given by Berlekamp,
McEliece and van Tilborg for Syndrome Decoding [2]. We will use the following
known NP-complete problem.
Three-Dimensional Matching (3DM)
Input: a subset $U \subseteq T \times T \times T$ where T is a finite set.
Property: there is a set $V \subseteq U$ such that |V| = |T| and no two elements of V agree on any coordinate.
Let us study the following example: let T = {1, 2, 3} and |U| = 5, with
U1 = (1, 2, 2)
U2 = (2, 2, 3)
U3 = (1, 3, 2)
U4 = (2, 1, 3)
U5 = (3, 3, 1)
However if you remove U1from Uthen no solution exist. In our case it is more
convenient to represent an instance of this problem in another way: we associate
a 3|T|× |U|binary incidence matrix Ato the instance. For the previous example
it would give:
      122  223  132  213  331
  1     1    0    1    0    0
  2     0    1    0    1    0
  3     0    0    0    0    1
  1     0    0    0    1    0
  2     1    1    0    0    0
  3     0    0    1    0    1
  1     0    0    0    0    1
  2     1    0    1    0    0
  3     0    1    0    1    0
A solution to the problem will then be a subset of |T| columns adding to the all-one column. Using this representation, we will now show that any instance of this problem can be reduced to solving an instance of RSD, hence proving that RSD is NP-complete.
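For concreteness, the example above can be checked with a few lines of Python (ours): the first function builds the incidence matrix A, the second brute-forces the matching by testing whether some |T| columns sum to the all-one column.

```python
from itertools import combinations

def incidence_matrix(T: list, U: list[tuple]) -> list[list[int]]:
    """3|T| x |U| incidence matrix of a 3DM instance: one row per pair
    (coordinate position, element of T), one column per triple of U."""
    return [[1 if u[pos] == t else 0 for u in U]
            for pos in range(3) for t in T]

def has_matching(T: list, U: list[tuple]) -> bool:
    """Brute force: is there a set of |T| columns adding to the all-one column?"""
    A = incidence_matrix(T, U)
    return any(all(sum(row[c] for c in cols) == 1 for row in A)
               for cols in combinations(range(len(U)), len(T)))

if __name__ == "__main__":
    T = [1, 2, 3]
    U = [(1, 2, 2), (2, 2, 3), (1, 3, 2), (2, 1, 3), (3, 3, 1)]
    print(has_matching(T, U))      # True:  U1, U4, U5 form a matching
    print(has_matching(T, U[1:]))  # False: no matching once U1 is removed
```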
Reduction of 3DM to RSD. Given an input $U \subseteq T \times T \times T$ of the 3DM problem, let A be the 3|T| × |U| incidence matrix described above. For i from 1 to |T| we take $H_i = A$.
If we try to solve the RSD problem on these matrices with w = |T| and S = (1, ..., 1), a solution will exist if and only if we are able to add w = |T| columns of A (possibly several times the same one) and obtain a column of 1s. As all the columns of A contain exactly three 1s, the only way to have 3 × |T| 1s at the end is that, during the adding, no two columns have a 1 on the same line (each time two columns have a 1 on the same line the final weight decreases by 2). Hence the |T| chosen columns will form a suitable subset V for the 3DM problem.
This means that if we are able to give an answer to this RSD instance, we
will be able to answer the 3DM instance we wanted to solve. Thus RSD is NP-
complete.
Reduction of 3DM to b-RSD. This proof will be exactly the same as the
one above. The input is the same, but this time we build the following matrix:
$$B = \begin{pmatrix} A & & 0 \\ & \ddots & \\ 0 & & A \end{pmatrix}$$
(the block-diagonal matrix with b copies of A on the diagonal and zeros elsewhere).
Once again we take $H_i = B$ and use S = (1, ..., 1). The same arguments as above apply here and prove that, for any given value of b, if we are able to give an answer to this b-RSD instance, we will be able to answer the 3DM instance we wanted to solve. Hence, for any b, b-RSD is NP-complete.
Reduction of 3DM to 2-RNSD. We need to construct a matrix for which solving a 2-RNSD instance is equivalent to solving a given 3DM instance. A difficulty is that, this time, we cannot choose S = (1, ..., 1) as this problem is restricted to the case S = 0. For this reason we need to construct a somewhat more complicated matrix H, which is the concatenation of the matrices $H_i$ we will use. It is constructed as follows:
[Block matrix H, not reproduced here: as described below, it has a top part containing the A matrices (and a block of all-1 columns), a middle part containing pairs of |U| × |U| identity matrices, and a bottom part containing short rows of 1s.]
This matrix is composed of three parts: the top part with the A matrices, the middle part with pairs of |U| × |U| identity matrices, and the bottom part with small rows of 1s.
The aim of this construction is to ensure that a solution to 2-RNSD on this matrix (with w = |T| + 1) exists if and only if one can add |T| columns of A and a column of 1s to obtain 0. This is then equivalent to having a solution to the 3DM problem.
The top part of the matrix is where the link to 3DM is placed: in the 2-RNSD problem, 2 columns are taken in some of the blocks; our aim is to force the solution to take two columns in each block, each time one in the A sub-block and one in the 0 sub-block. The middle part ensures that when a solution chooses a column of H, it also has to choose the only other column having a 1 on the same line, so that the final sum on this line is 0. This means that any time a column is chosen in one of the A sub-blocks, the "same" column is chosen in the 0 sub-block. Hence, among the final 2w′ columns, w′ will be taken in the A sub-blocks (or the 1 sub-block) and w′ in the 0 sub-blocks. One then has a sum of w′ columns of A or 1 (not necessarily distinct) adding to 0. Finally, the bottom part of the matrix is there to ensure that if w′ > 0 (as requested in the formulation of the problem) then w′ = w. Indeed, each time a column is picked in block number i, the middle part forces one to pick a column in the other half of the block, creating two 1s in the final sum. The only way to eliminate these 1s is to pick some columns in blocks i − 1 and i + 1, and so on, until columns are picked in all of the w blocks.
As a result, we see that solving an instance of 2-RNSD on H is equivalent to choosing |T| columns in A (not necessarily different) all adding to 1. As in the previous proof, this concludes the reduction and 2-RNSD is now proven NP-complete.
It is interesting to note that instead of using 3DM we could directly have used RSD for this reduction: simply replace the A matrices with the w blocks of the RSD instance to be solved and, instead of a matrix of 1s, use a matrix whose columns are all equal to S. The reduction then works in the same way.
B Modeling Information Set Decoding
Using a classical ISD attack, we have seen that the average amount of computation required to find a solution to an instance of SD is $g(r)/P_w$. This is true when a complete Gaussian elimination is done for each information set chosen. However, some additional computations can be performed so that each choice of information set allows more words to be tested. For instance, in [9], each time an information set is chosen, the validity of this set is tested, but at the same time partial validity is tested: if there exists a solution with a few 1s among the k positions of the information set, it will also be found by the algorithm. Of course, the more solutions one wants to test for each Gaussian elimination, the more additional computation has to be performed.
In a general way, if $K_1$ and $K_2$ denote respectively the complexities in space and time of the algorithm performing the additional computations, and M denotes the number of additional possible solutions explored, we should have:
$$K_1 \times K_2 \geq M.$$
Moreover, if $P_w$ is the probability of finding a solution for one information set, then, when observing M times more solutions at a time, the total probability of success is not greater than $M P_w$.
Hence, the total time complexity of such an attack would be:
$$K \geq \frac{g(r) + K_2}{M P_w} \geq \frac{g(r)}{M P_w} + \frac{1}{K_1 P_w}.$$
When M becomes large, the $g(r)/(M P_w)$ term becomes negligible (the cost of the Gaussian elimination no longer counts) and we have:
$$K \geq \frac{1}{K_1 P_w}.$$
This would mean that, in order to be out of reach of any possible attack, the inverse of the probability $P_w$ should be at least as large as $K \times K_1$. Allowing complexities up to $2^{80}$ in time and $2^{50}$ in space, we would need $P_w \leq 2^{-130}$.
However, this is only theoretical. In practice there is no known algorithm for which $K_1 \times K_2 = M$. Using existing algorithms this would rather be:
$$K_1 \times K_2 = M \times Loss \quad\text{and}\quad K \geq \frac{Loss}{K_1 P_w},$$
where Loss denotes a function of $\log M$, n, r and w. For this reason, the optimal values for the attack will often correspond to only a small amount of extra calculation for each information set. This will hence save some time on the Gaussian eliminations but will hardly gain anything on the rest. The time complexity K will always remain larger than $1/P_w$ and will most probably even be a little above.