Conference Paper

Integer Syndrome Decoding in the Presence of Noise

... Workshop (ITW) 2022 [21]. We extend this short version by providing the following contributions. ...
... - We give full proofs of the results in [21] with additional comments. We extend the results by providing sharper statements, quantifying exactly the error probability, e.g., Corollary 2 and Theorem 2 from [21]. On top of that, the technical details of the proofs reflect the gain of our method compared to [25]. ...
Article
Full-text available
Code-based cryptography received attention after the NIST started the post-quantum cryptography standardization process in 2016. A central NP-hard problem is the binary syndrome decoding problem, on which the security of many code-based cryptosystems relies. The best known methods to solve this problem all stem from the information-set decoding strategy, first introduced by Prange in 1962. A recent line of work considers augmented versions of this strategy, with hints typically provided by side-channel information. In this work, we consider the integer syndrome decoding problem, where the integer syndrome is available but might be noisy. We study how the performance of the decoder is affected by the noise. First, we identify the noise model as being close to a binomial distribution centered at zero. Second, we model the probability of success of the ISD-score decoder in the presence of binomial noise. Third, we demonstrate that with high probability our algorithm finds the solution as long as the noise parameter d is linear in t (the Hamming weight of the solution) and t is sub-linear in the code length. We provide experimental results on cryptographic parameters for the BIKE and Classic McEliece cryptosystems, which are both candidates for the fourth round of the NIST standardization process.
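As a rough illustration of the setting described in this abstract, the Python sketch below plants a weight-t error, computes its integer syndrome, adds binomial noise centered at zero, and ranks code positions by a simple correlation score. It is a toy stand-in for the ISD-score decoder, not the authors' algorithm; the dimensions, the noise parameter d and the scoring rule are illustrative assumptions.

```python
import numpy as np

# Toy sketch (not the authors' ISD-score decoder): rank code positions by the
# sum of the noisy integer syndrome entries of the parity checks they appear
# in, then guess the top-t positions as the error support.
rng = np.random.default_rng(0)
n, r, t, d = 200, 100, 10, 3                  # illustrative parameters

H = rng.integers(0, 2, size=(r, n))           # random binary parity-check matrix
support = rng.choice(n, size=t, replace=False)
e = np.zeros(n, dtype=int)
e[support] = 1                                # planted error of Hamming weight t

s_int = H @ e                                 # integer syndrome, computed over Z
noise = rng.binomial(2 * d, 0.5, size=r) - d  # binomial noise centered at zero, range [-d, d]
s_noisy = s_int + noise

scores = H.T @ s_noisy                        # score_j = sum of noisy syndrome entries over checks containing j
guess = np.argsort(scores)[-t:]               # top-t scoring positions
print("recovered", len(set(guess) & set(support)), "of", t, "error positions")
```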
Chapter
Among the fourth-round finalists of the NIST post-quantum cryptography standardization process for public-key encryption algorithms and key encapsulation mechanisms, three rely on hard problems from coding theory. Key encapsulation mechanisms are frequently used in hybrid cryptographic systems: a public-key algorithm for key exchange and a secret-key algorithm for communication. A major point is thus the initial key exchange, which is performed by means of a key encapsulation mechanism. In this paper, we analyze side-channel vulnerabilities of the key encapsulation mechanism implemented by the Classic McEliece cryptosystem, whose security is based on the syndrome decoding problem. We use side-channel leakages to reduce the complexity of the syndrome decoding problem by reducing the length of the code considered. The columns punctured from the original code reduce the complexity of a hard problem from coding theory. This approach leads to efficient profiled side-channel attacks that recover the session key with high success rates, even in noisy scenarios.
Keywords: Post-quantum cryptography, Code-based cryptography, Side-channel attacks
Technical Report
Full-text available
We introduce HQC, an efficient encryption scheme based on coding theory. HQC stands for Hamming Quasi-Cyclic. This proposal has been published in IEEE Transactions on Information Theory. HQC is a code-based public-key cryptosystem with several desirable properties: • It is proved IND-CPA assuming the hardness of (a decisional version of) Syndrome Decoding on structured codes. By construction, HQC perfectly fits the recent KEM-DEM transformation of [23], and allows one to obtain a hybrid encryption scheme with strong security guarantees (IND-CCA2), • In contrast with most code-based cryptosystems, the assumption that the family of codes being used is indistinguishable from random codes is no longer required, and • It features a detailed and precise upper bound for the decryption failure probability analysis.
Chapter
Full-text available
The selection of secure parameter sets requires an estimation of the attack cost to break the respective cryptographic scheme instantiated under these parameters. The current NIST standardization process for post-quantum schemes makes this an urgent task, especially considering the announcement to select final candidates by the end of 2021. For code-based schemes, recent estimates seemed to contradict the claimed security of most proposals, leading to a certain doubt about the correctness of those estimates. Furthermore, none of the available estimates include the most recent algorithmic improvements on decoding linear codes, which are based on information set decoding (ISD) in combination with nearest neighbor search. In this work we observe that all major ISD improvements are built on nearest neighbor search, explicitly or implicitly. This allows us to derive a framework from which we obtain practical variants of all relevant ISD algorithms, including the most recent improvements. We derive formulas for the practical attack costs and make them available online in an easy-to-use estimator tool written in Python and C. Eventually, we provide classical and quantum estimates for the bit security of all parameter sets of current code-based NIST proposals.
Keywords: ISD, syndrome decoding, nearest neighbor, estimator, code-based
Article
Full-text available
In this article, we model a variant of the well-known syndrome decoding problem as a linear optimization problem. Most common algorithms used for solving optimization problems, e.g. the simplex algorithm, fail to find a valid solution for the syndrome decoding problem over a finite field. However, our simulations show that a slightly modified version of the syndrome decoding problem can be solved by the simplex algorithm. More precisely, the algorithm returns a valid error vector when the syndrome vector is an integer vector, i.e., the matrix-vector multiplication is realized over Z instead of F_q.
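A minimal sketch of this linear-programming formulation, assuming a random binary parity-check matrix and SciPy's HiGHS-based linprog solver (illustrative choices, not the authors' exact experimental setup): minimize the weight of e subject to H e = s, where s is the integer syndrome computed over Z and each coordinate of e is relaxed to the interval [0, 1].

```python
import numpy as np
from scipy.optimize import linprog

# Sketch of the LP relaxation: min sum(e) s.t. H e = s, 0 <= e_j <= 1,
# with the syndrome s computed over Z. For small error weight the relaxation
# typically returns the planted 0/1 error vector.
rng = np.random.default_rng(1)
n, r, t = 60, 30, 4                           # illustrative parameters

H = rng.integers(0, 2, size=(r, n))
e = np.zeros(n)
e[rng.choice(n, size=t, replace=False)] = 1
s = H @ e                                     # integer syndrome over Z, not over F_2

res = linprog(c=np.ones(n), A_eq=H, b_eq=s, bounds=[(0, 1)] * n, method="highs")
e_hat = np.round(res.x).astype(int)
print("LP solution matches planted error:", np.array_equal(e_hat, e.astype(int)))
```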
Article
Full-text available
We propose an algorithm to compute the nearest codeword in a BCS. For a linear code of length n and rate R, the algorithm executes in time of order 2^{n(1-R)/2}. For codes with distance d increasing linearly with length, we propose an algorithm capable of correcting [(d-1)/2]+const errors which involves a linearly increasing number of attempts to correct [(d-1)/2] errors.
Conference Paper
Full-text available
Decoding random linear codes is a fundamental problem in complexity theory and lies at the heart of almost all code-based cryptography. The best attacks on the most prominent code-based cryptosystems such as McEliece directly use decoding algorithms for linear codes. The asymptotically best decoding algorithm for random linear codes of length n was for a long time Stern’s variant of information-set decoding running in time \(\tilde{\mathcal{O}}\left(2^{0.05563n}\right)\). Recently, Bernstein, Lange and Peters proposed a new technique called Ball-collision decoding which offers a speed-up over Stern’s algorithm by improving the running time to \(\tilde{\mathcal{O}}\left(2^{0.05558n}\right)\). In this paper, we present a new algorithm for decoding linear codes that is inspired by a representation technique due to Howgrave-Graham and Joux in the context of subset sum algorithms. Our decoding algorithm offers a rigorous complexity analysis for random linear codes and brings the time complexity down to \(\tilde{\mathcal{O}}\left(2^{0.05363n}\right)\).
Article
Full-text available
Background: Genome-wide association studies have revealed that rare variants are responsible for a large portion of the heritability of some complex human diseases. This highlights the increasing importance of detecting and screening for rare variants. Although the massively parallel sequencing technologies have greatly reduced the cost of DNA sequencing, the identification of rare variant carriers by large-scale re-sequencing remains prohibitively expensive because of the huge challenge of constructing libraries for thousands of samples. Recently, several studies have reported that techniques from group testing theory and compressed sensing could help identify rare variant carriers in large-scale samples with few pooled sequencing experiments and a dramatically reduced cost.
Results: Based on quantitative group testing, we propose an efficient overlapping pool sequencing strategy that allows the efficient recovery of variant carriers in numerous individuals with much lower costs than conventional methods. We used random k-set pool designs to mix samples, and optimized the design parameters according to an indicative probability. Based on a mathematical model of sequencing depth distribution, an optimal threshold was selected to declare a pool positive or negative. Then, using the quantitative information contained in the sequencing results, we designed a heuristic Bayesian probability decoding algorithm to identify variant carriers. Finally, we conducted in silico experiments to find variant carriers among 200 simulated Escherichia coli strains. With the simulated pools and publicly available Illumina sequencing data, our method correctly identified the variant carriers for 91.5–97.9% of variants with the variant frequency ranging from 0.5 to 1.5%.
Conclusions: Using the number of reads, variant carriers could be identified precisely even though samples were randomly selected and pooled. Our method performed better than the published DNA Sudoku design and compressed sequencing, especially in reducing the required data throughput and cost.
Conference Paper
Full-text available
Decoding random linear codes is a well studied problem with many applications in complexity theory and cryptography. The security of almost all coding and LPN/LWE-based schemes relies on the assumption that it is hard to decode random linear codes. Recently, there has been progress in improving the running time of the best decoding algorithms for binary random codes. The ball collision technique of Bernstein, Lange and Peters lowered the complexity of Stern's information set decoding algorithm to 2^{0.0556n}. Using representations this bound was improved to 2^{0.0537n} by May, Meurer and Thomae. We show how to further increase the number of representations and propose a new information set decoding algorithm with running time 2^{0.0494n}.
Conference Paper
Full-text available
We examine the tradeoff between privacy and usability of statistical databases. We model a statistical database by an n-bit string d_1, ..., d_n, with a query being a subset q ⊆ [n] to be answered by Σ_{i∈q} d_i. Our main result is a polynomial reconstruction algorithm of data from noisy (perturbed) subset sums. Applying this reconstruction algorithm to statistical databases, we show that in order to achieve privacy one has to add perturbation of magnitude Ω(√n). That is, smaller perturbation always results in a strong violation of privacy. We show that this result is tight by exemplifying access algorithms for statistical databases that preserve privacy while adding perturbation of magnitude Õ(√n). For time-T bounded adversaries we demonstrate a privacy-preserving access algorithm whose perturbation magnitude is ≈ √T.
Chapter
This paper studies how to incorporate small information leakages (called "hints") into information-set decoding (ISD) algorithms. In particular, the influence of these hints on solving the (n, k, t)-syndrome-decoding problem (SDP), i.e., generic syndrome decoding of a code of length n, dimension k, and an error of weight t, is analyzed. We motivate all hints by leakages obtainable through realistic side-channel attacks on code-based post-quantum cryptosystems. One class of studied hints consists of partial knowledge of the error or message, which allows the length, dimension, or error weight to be reduced using a suitable transformation of the problem. As a second class of hints, we assume that the Hamming weights of sub-blocks of the error are known, which can be motivated by a template attack. We present adapted ISD algorithms for this type of leakage. For each third-round code-based NIST submission (Classic McEliece, BIKE, HQC), we show how many hints of each type are needed to reduce the work factor below the claimed security level. E.g., for Classic McEliece mceliece348864, the work factor is reduced below \(2^{128}\) for 9 known error locations, 650 known error-free positions, or known Hamming weights of 29 sub-blocks of roughly equal size.
Keywords: Post-quantum cryptography, Code-based cryptography, Information set decoding, Side-channel attacks
Chapter
Code-based public-key cryptosystems are promising candidates for standardization as quantum-resistant public-key cryptographic algorithms. Their security is based on the hardness of the syndrome decoding problem. Computing the syndrome in a finite field, usually F_2, guarantees the security of the constructions. We show in this article that the problem becomes considerably easier to solve if the syndrome is computed in N instead. By means of laser fault injection, we illustrate how to compute the matrix-vector product in N by corrupting specific instructions, and validate it experimentally. To solve the syndrome decoding problem in N, we propose a reduction to an integer linear programming problem. We leverage the computational efficiency of linear programming solvers to obtain real-time message-recovery attacks against the code-based proposal to the NIST Post-Quantum Cryptography standardization challenge. We perform our attacks in the worst-case scenario, i.e. considering random binary codes, and retrieve the initial message within minutes on a desktop computer. Our attack targets the reference implementation of the Niederreiter cryptosystem in the NIST PQC competition finalist Classic McEliece and is practically feasible for all proposed parameter sets of this submission. For example, for the 256-bit security parameter sets, we successfully recover the message in a couple of seconds on a desktop computer. Finally, we highlight the fact that the attack is still possible if only a fraction of the syndrome entries are faulty. This makes the attack feasible even though the fault injection does not have perfect repeatability, and it reduces the computational complexity of the attack, making it even more practical overall.
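A toy example (illustrative matrix and error, not the fault-injection attack itself) of why the integer syndrome leaks so much more than the binary one: reducing the same matrix-vector product over N reveals how many error positions hit each parity check, instead of only their parity.

```python
import numpy as np

# Same product H e, reduced two ways.
H = np.array([[1, 1, 0, 1],
              [0, 1, 1, 1],
              [1, 0, 1, 0]])
e = np.array([1, 1, 0, 1])

s_f2 = (H @ e) % 2   # binary syndrome: parities only
s_n = H @ e          # integer syndrome: counts of error positions per check
print("over F_2:", s_f2)  # [1 0 1]
print("over N:  ", s_n)   # [3 2 1]
```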
Conference Paper
We propose a new decoding algorithm for random binary linear codes. The so-called information set decoding algorithm of Prange (1962) achieves worst-case complexity \(2^{0.121n}\). In the late 80s, Stern proposed a sort-and-match version of Prange's algorithm, on which all variants of the currently best known decoding algorithms are built. The fastest algorithm of Becker, Joux, May and Meurer (2012) achieves running time \(2^{0.102n}\) in the full distance decoding setting and \(2^{0.0494n}\) with half (bounded) distance decoding. In this work we point out that the sort-and-match routine in Stern's algorithm is carried out in a non-optimal way, since the matching is done in a two-step manner to realize an approximate matching up to a small number of error coordinates. Our observation is that such an approximate matching can be done by a variant of the so-called High Dimensional Nearest Neighbor Problem. Namely, out of two lists with entries from \({\mathbb F}_2^m\) we have to find a pair with closest Hamming distance. We develop a new algorithm for this problem with sub-quadratic complexity which might be of independent interest in other contexts. Using our algorithm for full distance decoding improves Stern's complexity from \(2^{0.117n}\) to \(2^{0.114n}\). Since the techniques of Becker et al. apply to our algorithm as well, we eventually obtain the fastest decoding algorithm for binary linear codes with complexity \(2^{0.097n}\). In the half distance decoding scenario, we obtain a complexity of \(2^{0.0473n}\).
Chapter
Group testing, introduced by Dorfman in 1943, increases the efficiency of screening individuals for low-prevalence diseases. A wider use of this kind of methodology is restricted by the loss of sensitivity inherent to the mixture of samples. Moreover, as this methodology attains greater cost reduction in the cases of lower prevalence (and, consequently, a higher optimal batch size), the phenomenon of rarefaction is crucial to understand that sensitivity reduction. Suppose, with no loss of generality, that an experimental individual test consists in determining whether the amount of substance exceeds some prefixed threshold l. For a pooled sample of size n, the amount of substance of interest is represented by \((Y_1, \cdots, Y_n)\), with mean \(\overline{Y}_n\) and maximum \(M_n\). The goal is to know if any of the individual samples exceeds the threshold l, that is, \(M_n > l\). It is shown that the dependence between \(\overline{Y}_n\) and \(M_n\) has a crucial role in deciding the use of group testing, since a higher dependence corresponds to more information about \(M_n\) given by the observed value of \(\overline{Y}_n\).
Article
We introduce a variation of the classic group testing problem referred to as group testing under sum observations. In this new formulation, when a test is carried out on a group of items, the result reveals not only whether the group is contaminated, but also the number of defective items in the tested group. We establish the optimal nested test plan within a minimax framework that minimizes the total number of tests for identifying all defective items in a given population. This optimal test plan and its performance are given in closed forms. It guarantees to identify all $d$ defective items in a population of $n$ items in ${O\left(d\log_2{\left( n/d \right)}\right)}$ tests. This new formulation is motivated by the heavy hitter detection problem in traffic monitoring in Internet and general communication networks. For such applications, it is often the case that a few abnormal traffic flows with exceptionally high volume (referred to as heavy hitters) make up most of the traffic seen by the entire network. To detect the heavy hitters, it is more efficient to group subsets of flows together and measure the aggregated traffic rather than testing each flow one by one. Since the volume of heavy hitters is much higher than that of normal flows, the number of heavy hitters in a group can be accurately estimated from the aggregated traffic load.
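An illustrative sketch of the idea (a simple halving strategy, not the paper's optimal nested test plan): each pooled test returns the number of defectives in the pool, which lets the search discard zero-count pools and stop early on pools that are entirely defective, so the test count grows roughly like d log(n/d) rather than n.

```python
# Adaptive splitting with sum observations: count_defectives(group) returns the
# number of defective items in the pooled group (the "sum observation").
def find_defectives(items, count_defectives):
    tests = 0

    def search(group):
        nonlocal tests
        tests += 1
        k = count_defectives(group)
        if k == 0:                      # clean pool: nothing to do
            return []
        if k == len(group):             # every item in the pool is defective
            return list(group)
        mid = len(group) // 2
        return search(group[:mid]) + search(group[mid:])

    return search(items), tests


defective = {3, 17, 42}                 # illustrative ground truth
items = list(range(64))
found, tests = find_defectives(items, lambda g: sum(i in defective for i in g))
print(sorted(found), "found in", tests, "tests")
```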
Conference Paper
The best known cryptanalytic attack on McEliece's public-key cryptosystem based on algebraic coding theory is to repeatedly select k bits at random from an n-bit ciphertext vector, which is corrupted by at most t errors, in the hope that none of the selected k bits are in error, until the cryptanalyst recovers the correct message. The method of determining whether the recovered message is the correct one has not been thoroughly investigated. In this paper, we suggest a systematic method of checking, and describe a generalized version of the cryptanalytic attack which reduces the work factor significantly (a factor of 2^{11} for the commonly used example of the n=1024 Goppa code case). Some further improvements are also given. We also note that these cryptanalytic algorithms can be viewed as generalized probabilistic decoding algorithms for any linear error-correcting codes.
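The attack loop sketched in this abstract is essentially plain information-set decoding. The toy Python sketch below (random binary code with made-up dimensions, no Goppa structure, and none of the paper's refinements) shows the basic idea: repeatedly guess k error-free coordinates, invert the corresponding sub-matrix of the generator matrix over GF(2), and accept the candidate message if re-encoding it lands within t errors of the received word.

```python
import numpy as np


def gf2_solve(A, b):
    """Solve A x = b over GF(2) by Gaussian elimination; return None if A is singular."""
    A, b = A.copy() % 2, b.copy() % 2
    m = A.shape[0]
    for col in range(m):
        pivot = next((r for r in range(col, m) if A[r, col]), None)
        if pivot is None:
            return None
        A[[col, pivot]], b[[col, pivot]] = A[[pivot, col]], b[[pivot, col]]
        for r in range(m):
            if r != col and A[r, col]:
                A[r] ^= A[col]
                b[r] ^= b[col]
    return b


rng = np.random.default_rng(2)
n, k, t = 40, 20, 2                            # illustrative toy parameters
G = rng.integers(0, 2, size=(k, n))            # random binary generator matrix
m_true = rng.integers(0, 2, size=k)
c = (m_true @ G) % 2
c[rng.choice(n, size=t, replace=False)] ^= 1   # received word with t bit errors

for _ in range(1000):
    I = rng.choice(n, size=k, replace=False)   # guessed error-free information set
    m_hat = gf2_solve(G[:, I].T, c[I])         # solve m_hat @ G[:, I] = c[I] over GF(2)
    if m_hat is not None and np.sum((m_hat @ G) % 2 != c) <= t:
        print("recovered message:", np.array_equal(m_hat, m_true))
        break
```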
Conference Paper
We describe a probabilistic algorithm, which can be used to discover words of small weight in a linear binary code. The work-factor of the algorithm is asymptotically quite large but the method can be applied for codes of a medium size. Typical instances that are investigated are codewords of weight 20 in a code of length 300 and dimension 150.
Article
This paper considers the problem of providing security to statistical databases against disclosure of confidential information. Security-control methods suggested in the literature are classified into four general approaches: conceptual, query restriction, data perturbation, and output perturbation. Criteria for evaluating the performance of the various security-control methods are identified. Security-control methods that are based on each of the four approaches are discussed, together with their performance with respect to the identified evaluation criteria. A detailed comparative analysis of the most promising methods for protecting dynamic online statistical databases is also presented. To date, no single security-control method prevents both exact and partial disclosures. There are, however, a few perturbation-based methods that prevent exact disclosure and enable the database administrator to exercise "statistical disclosure control." Some of these methods, however, introduce bias into query responses or suffer from the 0/1 query-set-size problem (i.e., partial disclosure is possible in the case of a null query set or a query set of size 1). We recommend directing future research efforts toward developing new methods that prevent exact disclosure and provide statistical-disclosure control, while at the same time not suffering from the bias problem and the 0/1 query-set-size problem. Furthermore, efforts directed toward developing a bias-correction mechanism and solving the general problem of small query-set size would help salvage a few of the current perturbation-based methods.
Article
The fact that the general decoding problem for linear codes and the general problem of finding the weights of a linear code are both NP-complete is shown. This strongly suggests, but does not rigorously imply, that no algorithm for either of these problems which runs in polynomial time exists.
Article
A class of decoding algorithms using encoding-and-comparison is considered for error-correcting code spaces. Code words, each of which agrees on some information set for the code with the word r to be decoded, are constructed and compared with r. An operationally simple algorithm of this type is studied for cyclic code spaces A. Let A have length n, dimension k over some finite field, and minimal Hamming distance m. The construction of fewer than n^2/2 code words is required in decoding a word r. The procedure seems to be most efficient for small minimal distance m, but somewhat paradoxically it is suggested on operational grounds that it may prove most useful in those cases where m is relatively large with respect to the code length n.
Article
A computer is generally considered to be a universal computational device; i.e., it is believed able to simulate any physical computational device with an increase in computation time of at most a polynomial factor. It is not clear whether this is still true when quantum mechanics is taken into consideration. Several researchers, starting with David Deutsch, have developed models for quantum mechanical computers and have investigated their computational properties. This paper gives Las Vegas algorithms for finding discrete logarithms and factoring integers on a quantum computer that take a number of steps which is polynomial in the input size, e.g., the number of digits of the integer to be factored. These two problems are generally considered hard on a classical computer and have been used as the basis of several proposed cryptosystems. (We thus give the first examples of quantum cryptanalysis.)
Hamming Quasi-Cyclic (HQC)
  • C. A. Melchor
  • N. Aragon
  • S. Bettaieb
  • L. Bidoux
  • O. Blazy
  • J.-C. Deneuville
Decoding random linear codes in O(2^{0.054n})
  • A. May
  • A. Meurer
  • E. Thomae
Message-recovery profiled side-channel attack on the Classic McEliece cryptosystem
  • B. Colombier
  • V.-F. Dragoi
  • P.-L. Cayrel
  • V. Grosso
Quantitative group testing and the rank of random matrices
  • U. Feige
  • A. Lellouche
Security-control methods for statistical databases: A comparative study
  • N. R. Adam
  • J. C. Worthmann