Fig 11 - uploaded by Manfred Schimmler
VHDL implementation of carry save adder

Source publication
Conference Paper
The computational fundament of most public-key cryptosystems is the modular multiplication. Improving the efficiency of the modular multiplication is directly associated with the efficiency of the whole cryptosystem. This paper presents an implementation and comparison of three recently proposed, highly efficient architectures for modular multiplic...

Similar publications

Article
An error correction code (ECC) design for a wired serial digital multi-gigabit communication system is presented. The code design combines a maximum run length code and a 2-error-correcting primitive BCH code. The implementation of the design in a field programmable gate array (FPGA) and the logic design of this code for low latency is discussed. R...
Conference Paper
This paper presents a new single event upset (SEU), multiple bit upset (MBU) and single hardware error (SHE) mitigation strategy to be used in Virtex-4 FPGAs. This strategy aims to increase not only the effectiveness of traditional triple module redundancy (TMR), but also the overall system availability. Frame readback with ECC detection and frame...

Citations

... However, it requires external circuits to transfer data to the Montgomery domain and correct the result to an acceptable range. Amanor et al. [43] proposed a new modular multiplication method based on interleaved modular multiplication, utilizing carry-save adders instead of normal ones. It can efficiently solve the high latency generated by the series connection of adders. ...
Article
In this paper, we present a novel lightweight elliptic curve scalar multiplication architecture for random Weierstrass curves over prime field Fp. The elliptic curve scalar multiplication is executed in Jacobian coordinates based on the Montgomery ladder algorithm with (X,Y)-only common Z coordinate arithmetic. At the finite field operation level, the adder-based modular multiplier and modular divider are optimized by the pre-calculation method to reduce the critical path while maintaining low resource consumption. At the group operation level, the point addition and point doubling methods in (X,Y)-only common Z coordinate arithmetic are modified to improve computation parallelism. A compact scheduling method is presented to improve the architecture’s performance, which includes appropriate scheduling of finite field operations and specific register connections. Compared with existing works, our design is implemented on the FPGA platform without using DSPs or BRAMs for higher portability. It utilizes 6.4k~6.5k slices in Kintex-7, Virtex-7, and ZYNQ FPGA and executes an elliptic curve scalar multiplication for a field size of 256-bit in 1.73 ms, 1.70 ms, and 1.80 ms, respectively. Additionally, our design is resistant to timing attacks, simple power analysis attacks, and safe-error attacks. This architecture outperforms most state-of-the-art lightweight designs in terms of area-time products.
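The citation excerpt above describes replacing ordinary adders with carry-save adders to avoid the latency of chained carry propagation. The following is a minimal software sketch of that idea, not the VHDL from the figure: the function names `carry_save_add` and `add_many` are hypothetical, and Python integers stand in for fixed-width hardware words.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """Compress three non-negative integers into a (sum, carry) pair.

    Every bit position behaves like an independent full adder, so no carry
    ripples across the word. The invariant is a + b + c == s + cy.
    """
    s = a ^ b ^ c                              # per-bit sum, carries ignored
    cy = ((a & b) | (a & c) | (b & c)) << 1    # per-bit majority, weight 2
    return s, cy


def add_many(operands: list[int]) -> int:
    """Accumulate operands in carry-save form and resolve carries once."""
    s, cy = 0, 0
    for x in operands:
        s, cy = carry_save_add(s, cy, x)
    # Only this final step needs a carry-propagating addition.
    return s + cy
```

In hardware the benefit is that each accumulation step has a delay of one full adder regardless of operand width; the single carry-propagating addition is deferred to the end.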
... The NIST SECP256K1 curve is a short Weierstrass elliptic curve defined as $E: y^2 = x^3 + ax + b \bmod p$, where $a = 0$, $b = 7$, and $p = 2^{256} - 2^{32} - 2^9 - 2^8 - 2^7 - 2^6 - 2^4 - 1$, and provides efficient implementation of group operations. Moreover, the interleaved modular multiplier is area efficient and possesses low power consumption characteristics [41][42]. ...
Article
The processing of locally harvested data at the physically accessible edge devices opens a new avenue of security threats for edge enhanced analytics. Cryptographic algorithms are used to secure the data being processed on the edge device. However, the implementation weakness of the algorithms on the edge devices can lead to side-channel attack vulnerability, which is exacerbated with the application of machine-learning techniques. This research proposes a deep learning-based system integrated at the edge device to identify the side-channel leakages. To design such a deep learning-based system, one of the challenges is formulating the suitable attack model for the underlying target algorithm. Based on the previous findings, three machine learning-based side-channel attack models are curated and investigated for the edge device security evaluations. As a test case, the standard elliptic-curve cryptographic algorithm is selected. Moreover, quantitative analysis is provided for the best attack model selection using standard machine-learning evaluation metrics. A comparative analysis is performed on the raw unaligned data samples and reduced feature-engineered samples using edge enhanced security analytics. The investigation concludes that the vulnerable algorithm implementation can lead to the secret key recovery from the edge device, with 96% accuracy, using a neural-network-based algorithm to analyse side-channel attacks.
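The citation excerpt preceding this abstract quotes the secp256k1 parameters (a = 0, b = 7, and the prime p). As a small worked check of that definition, the sketch below builds p from the quoted powers of two and finds one point on the curve. The helper names `is_on_curve` and `find_point` are hypothetical, and the square-root shortcut relies on p being congruent to 3 modulo 4, which the code asserts in a comment and the reader can verify directly.

```python
# Prime and coefficients as quoted in the excerpt.
P = 2**256 - 2**32 - 2**9 - 2**8 - 2**7 - 2**6 - 2**4 - 1
A, B = 0, 7


def is_on_curve(x: int, y: int) -> bool:
    """Check y^2 = x^3 + A*x + B (mod P)."""
    return (y * y - (x * x * x + A * x + B)) % P == 0


def find_point() -> tuple[int, int]:
    """Return an affine point with the smallest x >= 1 that lies on the curve.

    Because P % 4 == 3, a square root of a quadratic residue r modulo P
    is r^((P + 1) / 4) mod P. Non-residues are detected by squaring back.
    """
    x = 1
    while True:
        rhs = (pow(x, 3, P) + A * x + B) % P
        y = pow(rhs, (P + 1) // 4, P)
        if (y * y) % P == rhs:
            return x, y
        x += 1
```

Calling `find_point()` and passing the result to `is_on_curve` confirms that the quoted equation and prime are mutually consistent.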
... For ECC, there have been a large number of hardware architectures [7][8][9][10][11][12][13][14][15][16][17]. Among them, there are two methods for the realization of modular multiplication (MM), namely, the multiplier and the adder. ...
... In this study, the interleaved modular multiplication algorithm is selected. The standard interleaved modulo multiplication in [16] (Algorithm 2) has certain shortcomings. Since steps 5, 6, and 7 carry out addition operations with carry propagation and steps 6 and 7 check all lengths of the operands, there is a large latency. In response to this problem, the improved algorithm in [16] (Algorithm 3) performs addition operations with carry-save adders in the loop. Moreover, the modified algorithm in [16] (Algorithm 4) reduces the area and time by a lookup-table method. ...
Article
This paper proposes a hardware-efficient elliptic curve cryptography (ECC) architecture over GF(p), which uses adders to achieve scalar multiplication (SM) through a hardware-reuse method. In terms of algorithm, the improved interleaved modular multiplication (IMM) algorithm and binary modular inverse (BMI) algorithm need two adders. In addition to the adder, the data register is another optimization target. The design is synthesized with Design Compiler on a 0.13 µm CMOS ASIC platform. The time to perform scalar multiplication over 160, 192, 224, and 256 field orders at 150 MHz ranges from 1.99 to 3.17 ms. Moreover, the gate area required for different field orders in this design is in the range of 35.65k–59.14k, which is 50%–91% fewer hardware resources than other processors.
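The excerpts above refer to an interleaved modular multiplication loop whose additions and comparisons are slow when done with carry propagation. For orientation, here is a minimal sketch of the textbook interleaved (shift, add, conditionally subtract) form. It is an assumption that this matches the cited Algorithm 2 in spirit; the step numbers in the excerpt refer to the cited paper, not to this code, and the name `interleaved_modmul` is hypothetical.

```python
def interleaved_modmul(x: int, y: int, m: int) -> int:
    """Return (x * y) % m using only shifts, additions, and subtractions.

    Assumes 0 <= x < m and 0 <= y < m. The bits of x are scanned from most
    significant to least significant, and the partial result is reduced in
    every iteration so it never grows beyond 3 * m.
    """
    if not (0 <= x < m and 0 <= y < m):
        raise ValueError("operands must satisfy 0 <= x, y < m")
    p = 0
    for i in reversed(range(x.bit_length())):
        p <<= 1                 # p < 2m after doubling
        if (x >> i) & 1:
            p += y              # p < 3m after adding y
        # Two conditional subtractions bring p back below m.
        if p >= m:
            p -= m
        if p >= m:
            p -= m
    return p
```

Each of the additions and comparisons in the loop body is a full-width operation, which is the latency the carry-save variants mentioned in the excerpt aim to remove.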
... Specific prime field multiplication and Montgomery multiplication are used in multiplier-based structures [21]. Interleaved multiplication algorithm is usually applied in the adder-based structures [22]. The processors in [9,10,12,17,18] are based on adders and aim at low hardware and power consumption. ...
Article
Elliptic curve cryptography (ECC) is widely used in practical applications because ECC has far fewer bits for operands at the same level of security than other public-key cryptosystems such as RSA. The performance of an ECC processor is usually determined by modular multiplication (MM) and point multiplication (PM) operations. For recommended prime field, MM operation can consist of multiplication and fast reduction operations. In this paper, a 256-bit multiplication operation is implemented by a 129-bit (half-word) multiplier using Karatsuba–Ofman multiplication algorithm. The fast reduction is a modulo operation, which gets 512-bit input data from multiplication and outputs a 256-bit result ($0 \le Z < p$). We propose a two-stage fast reduction algorithm (TSFR) over SCA-256 prime field, which can obtain an intermediate result of $0 \le Z < 2p$ instead of $0 \le Z < 14p$ in traditional algorithm, avoiding a lot of repetitive subtraction operations. The PM operation is implemented in width nonadjacent form (NAF) algorithm and its operational schedules are improved to increase the parallelism of multiplication and fast reduction operations. Synthesized with a 0.13 μm complementary metal oxide semiconductor (CMOS) standard cell library, the proposed processor costs an area of 280 k gates and PM operation takes 0.057 ms at the frequency of 250 MHz. The design is also implemented on Xilinx Virtex-6 platform, which consumes 27.655 k LUTs and takes 0.37 ms to perform one 256-bit PM operation, attaining six times speed-up over the state-of-the-art. The processor makes a tradeoff between area and performance, thus it is better than other methods.
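The abstract above mentions a 256-bit multiplication built from a 129-bit multiplier via Karatsuba–Ofman. A one-level sketch shows why 129 bits suffice: the middle product multiplies two sums of 128-bit halves, and each sum is at most 129 bits wide. This is an illustration under that assumption only, not the paper's datapath, and `karatsuba_256` is a hypothetical name.

```python
HALF = 128
MASK = (1 << HALF) - 1


def karatsuba_256(a: int, b: int) -> int:
    """Multiply two integers of up to 256 bits with three half-word products."""
    if a < 0 or b < 0 or a >> 256 or b >> 256:
        raise ValueError("operands must fit in 256 bits")
    a_hi, a_lo = a >> HALF, a & MASK
    b_hi, b_lo = b >> HALF, b & MASK

    z2 = a_hi * b_hi                      # 128 x 128 bits
    z0 = a_lo * b_lo                      # 128 x 128 bits
    # The sums below are at most 129 bits each, hence a 129-bit multiplier.
    z1 = (a_hi + a_lo) * (b_hi + b_lo) - z2 - z0

    return (z2 << (2 * HALF)) + (z1 << HALF) + z0
```

Compared with the schoolbook method, this trades one of four half-word multiplications for a few extra additions and subtractions.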
... A modular reduction unit is designed based on an interleaved modular multiplier architecture similar to the one proposed in [33,34]. Based on the implementation results in [33,34], an interleaved modular multiplier has more efficient area and timing characteristics. For fast and area-efficient implementation of such a multiplier, we use just one CSA adder and a look-up table. ...
Article
Security of embedded systems is the need of the hour. A mathematically secure algorithm runs on a cryptographic chip on these systems, but secret private data can be at risk due to side-channel leakage information. This research focuses on retrieving secret-key information, by performing machine-learning-based analysis on leaked power-consumption signals, from Field Programmable Gate Array (FPGA) implementation of the elliptic-curve algorithm captured from a Kintex-7 FPGA chip while the elliptic-curve cryptography (ECC) algorithm is running on it. This paper formalizes the methodology for preparing an input dataset for further analysis using machine-learning-based techniques to classify the secret-key bits. Research results reveal how pre-processing filters improve the classification accuracy in certain cases, and show how various signal properties can provide accurate secret classification with a smaller feature dataset. The results further show the parameter tuning and the amount of time required for building the machine-learning models.
... The design required major changes to make it routable and to pipeline it (mainly the rounds steps). -A 1 Kb modular multiplier based on the interleaved modular multiplication algorithm [51]. -A modular exponentiation block (mexp) based on the Square-and-Multiply algorithm [52]. ...
... Those two operations cannot be pipelined without delay. Researchers have previously tried to address these problems, as shown in [22], in which Algorithm 4 performs modular multiplication using carry-save addition and Algorithm 5 is an optimized version of the new algorithm. ...
Article
In this paper, a low hardware consumption design of elliptic curve cryptography (ECC) over GF(p) in embedded applications is proposed. The adder-based architecture is explored to reduce the hardware consumption of performing scalar multiplication (SM). The Interleaved Modular Multiplication Algorithm and Binary Modular Inversion Algorithm are improved and implemented with two full-word adder units. The full-word register units for data storage are also optimized. The design is based on two full-word adder units and twelve full-word register units of pipeline structure and was implemented on Xilinx Virtex-4 platform. Design Compiler is used to synthesize the proposed architecture with 0.13 μm CMOS standard cell library. For 160, 192, 224, 256 field order, the proposed architecture consumes 5595, 7080, 8423, 9370 slices, respectively, and saves 17.58∼54.93% slice resources on FPGA platform when compared with other design architectures. The synthesized result uses 35.43 k, 43.37 k, 50.38 k, 57.05 k gate area and saves 52.56∼91.34% in terms of gate count in comparison. The design takes 2.56∼4.07 ms to perform SM operation over different field order under 150 MHz frequency. The proposed architecture is safe from simple power analysis (SPA). Thus, it is a good choice for embedded applications.
... These values were obtained from the analysis for FPGA Xilinx XC2V6000. FPGA Xilinx XCV2000E-6 was used in Paar (2005), 1024 bits RSA exponentiation can be performed with frequency 69.4 MHz and with time 6.1 ms. ...
Article
This article deals with encryption on Field Programmable Gate Array (FPGA). The first part describes the current state of symmetric and asymmetric cryptography. The following part focuses on the AES algorithm and its implementation in the VHDL language. The last part shows testing results of the mentioned implementation on the NFB-40G2 card, which contains an FPGA from the Xilinx Virtex-7 series.
... The main difficulty of the Blakley algorithm is the computation of addition on large operands. The modified Blakley algorithm for large operands is shown in [5] and [6]. The use of carry save adder (CSA) helps to speed up the repeated additions on large operands. ...
Article
This paper is devoted to the design of a dual-core crypto processor for executing both prime field and binary field instructions. The proposed design is specifically optimized for the field programmable gate array (FPGA) platform. The execution of a combination of instructions from two different fields (prime field GF(p) and binary field $GF(2^m)$) is analysed. The design is implemented on Spartan-3E and Virtex-5, and the performance results of both are compared. The implementation results show parallel execution using dual-field instructions.
... The main difficulty of the Blakley algorithm is the computation of addition on large operands. The modified Blakley algorithm for large operands is shown in [8] and [9]. The use of carry save adder (CSA) helps to speed up the repeated additions on large operands. ...