Multiplier Stage with CarryPropagate Adders and Non-redundant Representation

Source publication

Hardware Factorization Based on Elliptic Curve Method

Conference Paper

Full-text available

May 2005

The security of the most popular asymmetric cryptographic scheme RSA depends on the hardness of factoring large numbers. The best known method for factorization large integers is the general number field sieve (GNFS). Recently, architectures for special purpose hardware for the GNFS have been proposed. One important step within the GNFS is the fact...

Context 1

... inter- nal word additions are performed by simple adders. Figure 3 shows the architectures of one stage. While in [13] a structure with carry-save adders and redundant representation of operands has been implemented, we have chosen a configuration with carry-propagate adders and non-redundant repre- sentation that makes possible more effective imple- mentation especially when target platform supports fast carry chain logic. ...

View in full-text

Table 2 Synthesis Summary of Proposed Cryptoprocessor

FPGA Implementation of Elliptic Curve Cryptoprocessor for Perceptual Layer of the Internet of Things

Article

Full-text available

Oct 2018

Today’s developing era data and information security plays an important role in unsecured communication between Internet of Things (IoT) elements. In IoT, data are transmitted in plaintext for many reasons. One of the most common reason is the availability of hardware. Many IoT products are inexpensive components with limited memory and computation...

High-Speed Elliptic Curve Cryptography Accelerator for Koblitz Curves

Conference Paper

Full-text available

May 2008

We present an FPGA-based accelerator for elliptic curve cryptography on a Koblitz curve targeting for applications requiring very high speed. The accelerator supports fast computation of point multiplication by using window methods as well as multiple point multiplications with joint sparse form representations. Optimized operation-specific process...

Area/performance trade-off analysis of an FPGA digit-serial GF(2m)GF(2m) Montgomery multiplier based on LFSR

Article

Full-text available

Feb 2013

Montgomery Multiplication is a common and important algorithm for improving the efficiency of public key cryptographic algorithms, like RSA and Elliptic Curve Cryptography (ECC). A natural choice for implementing this time consuming multiplication defined on finite fields, mainly over GF(2m)GF(2m), is the use of Field Programmable Gate Arrays (FPGA...

High-Performance Elliptic Curve Cryptography by Using the CIOS Method for Modular Multiplication

Conference Paper

Full-text available

Mar 2017

Elliptic Curve Cryptography (ECC) is becoming unavoidable, and should be used for public key protocols. It has gained increasing acceptance in practice due to the significantly smaller bit size of the operands compared to RSA for the same security level. Most protocols based on ECC imply the computation of a scalar multiplication. ECC can be perfor...

Reconfigurable Architectures for Curve-Based Cryptography on Embedded Micro-Controllers

Conference Paper

Full-text available

Sep 2006

This paper discusses architectures for embedded security to enable various cryptographic services at low cost. To realize the large bit-lengths and complex arithmetic on an 8-bit embedded micro-controller, several hardware acceleration options for elliptic and hyperelliptic curve cryptography (ECC and HECC) are studied and systematically evaluated....

Reconfigurable Computing Architectures

Article

Full-text available

Mar 2015
P IEEE

Reconfigurable architectures can bring unique capabilities to computational tasks. They offer the performance and energy efficiency of hardware with the flexibility of software. In some domains, they are the only way to achieve the required, real-time performance without fabricating custom integrated circuits. Their functionality can be upgraded and repaired during their operational lifecycle and specialized to the particular instance of a task. We survey the field of reconfigurable computing, providing a guide to the body-of-knowledge accumulated in architecture, compute models, tools, run-time reconfiguration, and applications.

On the Analysis of Public-Key Cryptologic Algorithms

Article

Jan 2015

Andrea Miele

The RSA cryptosystem introduced in 1977 by Ron Rivest, Adi Shamir and Len Adleman is the most commonly deployed public-key cryptosystem. Elliptic curve cryptography (ECC) introduced in the mid 80's by Neal Koblitz and Victor Miller is becoming an increasingly popular alternative to RSA offering competitive performance due the use of smaller key sizes. Most recently hyperelliptic curve cryptography (HECC) has been demonstrated to have comparable and in some cases better performance than ECC. The security of RSA relies on the integer factorization problem whereas the security of (H)ECC is based on the (hyper)elliptic curve discrete logarithm problem ((H)ECDLP). In this thesis the practical performance of the best methods to solve these problems is analyzed and a method to generate secure ephemeral ECC parameters is presented. The best publicly known algorithm to solve the integer factorization problem is the number field sieve (NFS). Its most time consuming step is the relation collection step. We investigate the use of graphics processing units (GPUs) as accelerators for this step. In this context, methods to efficiently implement modular arithmetic and several factoring algorithms on GPUs are presented and their performance is analyzed in practice. In conclusion, it is shown that integrating state-of-the-art NFS software packages with our GPU software can lead to a speed-up of 50%. In the case of elliptic and hyperelliptic curves for cryptographic use, the best published method to solve the (H)ECDLP is the Pollard rho algorithm. This method can be made faster using classes of equivalence induced by curve automorphisms like the negation map. We present a practical analysis of their use to speed up Pollard rho for elliptic curves and genus 2 hyperelliptic curves defined over prime fields. As a case study, 4 curves at the 128-bit theoretical security level are analyzed in our software framework for Pollard rho to estimate their practical security level. In addition, we present a novel many-core architecture to solve the ECDLP using the Pollard rho algorithm with the negation map on FPGAs. This architecture is used to estimate the cost of solving the Certicom ECCp-131 challenge with a cluster of FPGAs. Our design achieves a speed-up factor of about 4 compared to the state-of-the-art. Finally, we present an efficient method to generate unique, secure and unpredictable ephemeral ECC parameters to be shared by a pair of authenticated users for a single communication. It provides an alternative to the customary use of fixed ECC parameters obtained from publicly available standards designed by untrusted third parties. The effectiveness of our method is demonstrated with a portable implementation for regular PCs and Android smartphones. On a Samsung Galaxy S4 smartphone our implementation generates unique 128-bit secure ECC parameters in 50 milliseconds on average.

Cofactorization on Graphics Processing Units

Conference Paper

Full-text available

Sep 2014

We show how the cofactorization step, a compute-intensive part of the relation collection phase of the number field sieve (NFS), can be farmed out to a graphics processing unit. Our implementation on a GTX 580 GPU, which is integrated with a state-of-the-art NFS implementation, can serve as a cryptanalytic co-processor for several Intel i7-3770K quad-core CPUs simultaneously. This allows those processors to focus on the memory-intensive sieving and results in more useful NFS-relations found in less time.

ECM at work

Conference Paper

Full-text available

Dec 2012

The performance of the elliptic curve method (ECM) for integer factorization plays an important role in the security assessment of RSA-based protocols as a cofactorization tool inside the number field sieve. The efficient arithmetic for Edwards curves found an application by speeding up ECM. We propose techniques based on generating and combining addition-subtracting chains to optimize Edwards ECM in terms of both performance and memory requirements. This makes our approach very suitable for memory-constrained devices such as graphics processing units (GPU). For commonly used ECM parameters we are able to lower the required memory up to a factor 55 compared to the state-of-the-art Edwards ECM approach. Our ECM implementation on a GTX 580 GPU sets a new throughput record, outperforming the best GPU, CPU and FPGA results reported in literature.

On the Cryptanalysis of Public-Key Cryptography

Article

Jan 2012

Joppe Willem Bos

Nowadays, the most popular public-key cryptosystems are based on either the integer factorization or the discrete logarithm problem. The feasibility of solving these mathematical problems in practice is studied and techniques are presented to speed-up the underlying arithmetic on parallel architectures. The fastest known approach to solve the discrete logarithm problem in groups of elliptic curves over finite fields is the Pollard rho method. The negation map can be used to speed up this calculation by a factor √2. It is well known that the random walks used by Pollard rho when combined with the negation map get trapped in fruitless cycles. We show that previously published approaches to deal with this problem are plagued by recurring cycles, and we propose effective alternative countermeasures. Furthermore, fast modular arithmetic is introduced which can take advantage of prime moduli of a special form using efficient "sloppy reduction." The effectiveness of these techniques is demonstrated by solving a 112-bit elliptic curve discrete logarithm problem using a cluster of PlayStation 3 game consoles: breaking a public-key standard and setting a new world record. The elliptic curve method (ECM) for integer factorization is the asymptotically fastest method to find relatively small factors of large integers. From a cryptanalytic point of view the performance of ECM gives information about secure parameter choices of some cryptographic protocols. We optimize ECM by proposing carry-free arithmetic modulo Mersenne numbers (numbers of the form 2M – 1) especially suitable for parallel architectures. Our implementation of these techniques on a cluster of PlayStation 3 game consoles set a new record by finding a 241-bit prime factor of 21181 – 1. A normal form for elliptic curves introduced by Edwards results in the fastest elliptic curve arithmetic in practice. Techniques to reduce the temporary storage and enhance the performance even further in the setting of ECM are presented. Our results enable one to run ECM efficiently on resource-constrained platforms such as graphics processing units.

Area-Time Efficient Implementation of the Elliptic Curve Method of Factoring in Reconfigurable Hardware for Application in the Number Field Sieve

Article

Full-text available

Sep 2010
IEEE T COMPUT

A novel portable hardware architecture of the Elliptic Curve Method of factoring, designed and optimized for application in the relation collection step of the Number Field Sieve, is described and analyzed. A comparison with an earlier proof-of-concept design by Pelzl et al. has been performed, and a substantial improvement has been demonstrated in terms of both the execution time and the area-time product. The ECM architecture has been ported across five different families of FPGA devices in order to select the family with the best performance to cost ratio. A timing comparison with the highly optimized software implementation, GMP-ECM, has been performed. Our results indicate that low-cost families of FPGAs, such as Spartan-3 and Spartan-3E, offer at least an order of magnitude improvement over the same generation of microprocessors in terms of the performance to cost ratio, without the use of embedded FPGA resources, such as embedded multipliers.

Efficient SIMD Arithmetic Modulo a Mersenne Number

Article

Full-text available

Jan 2010

This paper describes carry-less arithmetic operations modulo an integer 2^M-1 in the thousand-bit range, targeted at single instruction multiple data platforms and applications where overall throughput is the main performance criterion. Using an implementation on a cluster of PlayStation 3 game consoles a new record was set for the elliptic curve method for integer factorization.

Cryptanalysis with COPACOBANA

Article

Full-text available

Dec 2008
IEEE T COMPUT

Cryptanalysis of ciphers usually involves massive computations. The security parameters of cryptographic algorithms are commonly chosen so that attacks are infeasible with available computing resources. This contribution presents a variety of cryptanalytical applications utilizing the COPACOBANA (Cost-Optimized Parallel Code Breaker) machine which is a high-performance, low-cost cluster consisting of 120 Field Programmable Gate Arrays (FPGA). COPACOBANA appears to be the only such reconfigurable parallel FPGA machine optimized for code breaking tasks reported in the open literature. Depending on the actual algorithm, the parallel hardware architecture can outperform conventional computers by several orders of magnitude. In this work, we will focus on novel implementations of cryptanalytical algorithms, utilizing the impressive computational power of COPACOBANA. We describe various exhaustive key search attacks on symmetric ciphers and demonstrate an attack on a security mechanism employed in the electronic passport. Furthermore, we describe time-memory tradeoff techniques which can, e.g., be used for attacking the popular A5/1 algorithm used in GSM voice encryption. In addition, we introduce efficient implementations of more complex cryptanalysis on asymmetric cryptosystems, e.g., Elliptic Curve Cryptosystems (ECC) and number co-factorization for RSA.

CAIRN 2: An FPGA Implementation of the Sieving Step in the Number Field Sieve Method

Conference Paper

Full-text available

Sep 2007

The hardness of the integer factorization problem assures the security of some public-key cryptosystems including RSA, and the number field sieve method (NFS), the most efficient algorithm for factoring large integers currently, is a threat for such cryptosystems. Recently, dedicated factoring devices attract much attention since it might reduce the computing cost of the number field sieve method. In this paper, we report implementational and experimental results of a dedicated sieving device “CAIRN 2” with Xilinx’s FPGA which is designed to handle up to 768-bit integers. Used algorithm is based on the line sieving, however, in order to optimize the efficiency, we adapted a new implementational method (the pipelined sieving). In addition, we actually factored a 423-bit integer in about 30 days with the developed device CAIRN 2 for the sieving step and usual PCs for other steps. As far as the authors know, this is the first FPGA implementation and experiment of the sieving step in NFS.

FPGA Implementation of High Throughput Circuit for Trial Division by Small Primes

Article

Full-text available

Jan 2007

Trial division is the most straightforward way to determine the prime fac- tors of a number, but the execution time is exponentially dependent on the size of the number. We have developed a novel hardware architecture which performs trial division of large dividends by small prime divisors at a much higher throughput than previously reported architectures. Our de- sign is implemented in FPGA devices and provides a speed-up of between one and two orders of magnitude over an optimized software implemen- tation of the same algorithm. These results can be employed to speed up factoring algorithms like the quadratic sieve or number field sieve when implemented in reconfigurable computers.

Multiplier Stage with CarryPropagate Adders and Non-redundant Representation

Context in source publication

Similar publications

Citations