Three hardware architectures of a hash function: a) basic iterative: x1, b) folded horizontally by a factor of 2: /2(h), c) folded vertically by a factor of 2: /2(v). R: round; S1, S2: selection functions.

Source publication

Throughput vs. Area Trade-offs in High-Speed Architectures of Five Round 3 SHA-3 Candidates Implemented Using Xilinx and Altera FPGAs

Conference Paper

Full-text available

Sep 2011

In this paper we present a comprehensive comparison of all Round 3 SHA-3 candidates and the current standard SHA-2 from the point of view of hardware performance in modern FPGAs. Each algorithm is implemented using multiple architectures based on the concepts of folding, unrolling, and pipelining. Trade-offs between speed and area are investigated,...

Context 1

... starting point for our exploration of various architectures of hash functions is the basic iterative architecture, shown in Fig. 1a. The characteristic features of this architecture are as follows: a) datapath width = state size (denoted by s), b) one round is performed in a single clock cycle, c) only one message is processed at a time. The minimum block processing time is typically given by ...
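The three listed features can be modelled in a few lines of Python. This is only a sketch: `toy_round`, the 64-bit state width, and the round count are hypothetical stand-ins, not taken from any SHA-3 candidate. The excerpt truncates the paper's expression for block processing time, so the model counts round cycles only and says nothing about load or finalization overhead.

```python
from typing import Callable, Tuple

MASK64 = (1 << 64) - 1  # assumed state size s = 64 bits (illustrative only)


def toy_round(state: int, round_index: int) -> int:
    """Hypothetical round function; it does not come from any real hash."""
    state = (state + 0x9E3779B97F4A7C15 + round_index) & MASK64
    state ^= state >> 29
    return ((state << 7) | (state >> 57)) & MASK64  # rotate left by 7


def run_basic_iterative(
    state: int,
    rounds: int,
    round_fn: Callable[[int, int], int] = toy_round,
) -> Tuple[int, int]:
    """Model the x1 architecture: one full-width round per clock cycle,
    one message in flight at a time."""
    cycles = 0
    for r in range(rounds):
        state = round_fn(state, r)  # the whole state is updated in one cycle
        cycles += 1
    return state, cycles


final_state, cycles = run_basic_iterative(0x0123456789ABCDEF, rounds=10)
print(cycles)  # 10: one clock cycle per round
```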


Context 2

... a round of a hash function has a symmetric structure, with two or more similar operations performed one after another, horizontal folding is possible. In Fig. 1b, horizontal folding by a factor of two is demonstrated. We will denote this architecture by /2(h). ...
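A toy model may help make horizontal folding concrete. In the sketch below, a round is assumed to consist of two applications of the same sub-operation (`half_round`, a hypothetical function on a 32-bit state). The x1 version evaluates both in one clock cycle, while the /2(h) version reuses a single instance over two cycles. Both must produce the same state; only the cycle count differs.

```python
MASK32 = 0xFFFFFFFF


def half_round(state: int, const: int) -> int:
    """Hypothetical sub-operation; a full round applies it twice in a row."""
    state = (state ^ const) & MASK32
    return ((state << 5) | (state >> 27)) & MASK32  # rotate left by 5


def run_x1(state, consts):
    """Basic iterative: both sub-operations complete within one clock cycle."""
    cycles = 0
    for c1, c2 in consts:
        state = half_round(half_round(state, c1), c2)
        cycles += 1
    return state, cycles


def run_folded_h(state, consts):
    """/2(h): one shared half_round instance used on two consecutive cycles."""
    cycles = 0
    for c1, c2 in consts:
        for c in (c1, c2):  # a multiplexer would select the input for each cycle
            state = half_round(state, c)
            cycles += 1
    return state, cycles


consts = [(2 * i + 1, 2 * i + 2) for i in range(4)]  # 4 toy rounds
s_x1, cyc_x1 = run_x1(0xDEADBEEF, consts)
s_h, cyc_h = run_folded_h(0xDEADBEEF, consts)
print(s_x1 == s_h, cyc_x1, cyc_h)  # True 4 8
```

In this model the folded version needs roughly half the round logic at the cost of twice as many cycles; any change in achievable clock frequency is outside the scope of the sketch.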


Context 3

... case horizontal folding is either not possible or does not achieve the required reduction in area, vertical folding may be attempted. In Fig. 1c, we demonstrate vertical folding by a factor of 2, which we denote by /2(v). In this architecture, the datapath width is reduced by a factor of two. As a result two clock cycles are required to complete a round. In the first clock cycle, only bits of the internal state affecting the first half of the round output are provided to the ...
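Vertical folding can be sketched the same way. The names `s1` and `s2` follow the selection functions S1 and S2 in the figure caption, but their definitions here, the half-width function `g`, and the 32-bit state split into two 16-bit halves are all assumptions made for illustration. The point is that one half-width circuit computes each half of the round output on a separate cycle, reading the old state both times.

```python
MASK16 = 0xFFFF


def g(a: int, b: int) -> int:
    """Hypothetical half-width logic producing 16 bits of round output."""
    rot = ((a << 3) | (a >> 13)) & MASK16  # rotate left by 3
    return (rot + b) & MASK16


def round_x1(hi: int, lo: int):
    """Full-width round: two copies of g work in parallel, 1 cycle."""
    return (g(hi, lo), g(lo, hi)), 1


def s1(hi: int, lo: int):
    """Assumed S1: state bits feeding the first half of the output."""
    return hi, lo


def s2(hi: int, lo: int):
    """Assumed S2: state bits feeding the second half of the output."""
    return lo, hi


def round_folded_v(hi: int, lo: int):
    """/2(v): one copy of g, used twice; the old state is held for both cycles."""
    new_hi = g(*s1(hi, lo))  # clock cycle 1
    new_lo = g(*s2(hi, lo))  # clock cycle 2
    return (new_hi, new_lo), 2


def run(round_fn, hi: int, lo: int, rounds: int):
    cycles = 0
    for _ in range(rounds):
        (hi, lo), used = round_fn(hi, lo)
        cycles += used
    return (hi, lo), cycles


state_x1, cyc_x1 = run(round_x1, 0x1234, 0xABCD, rounds=5)
state_v, cyc_v = run(round_folded_v, 0x1234, 0xABCD, rounds=5)
print(state_x1 == state_v, cyc_x1, cyc_v)  # True 5 10
```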


Hardware-Efficient Architecture for Generalized Voronoi Diagram Construction Using a Prediction-Correction Approach

Article

Full-text available

Jan 2007

This paper presents a hardware-efficient scheme to construct sensor-based Generalized Voronoi Diagram (GVD) of an indoor environment. An architecture to construct the GVD using a prediction and correction strategy is presented. The approach is based on processing distance information from ultrasonic sensors. A feature of the proposed approach...

A modified MixColumn-InversMixColumn in AES algorithm suitable for hardware implementation using FPGA device

Article

Full-text available

Dec 2023

This article described the Advanced Encryption Standard (AES) encryption and decryption process without using lookup tables in the MixColumns transformation and parallelizing the transformation process implemented in the Field Programmable Gate Array (FPGA) hardware. Parallelism of the hardware process conducted to the transformation of key schedul...

A Low-Latency, Low-Power FPGA Implementation of ECG Signal Characterization Using Hermite Polynomials

Article

Full-text available

Sep 2021

Automatic ECG signal characterization is of critical importance in patient monitoring and diagnosis. This process is computationally intensive, and low-power, online (real-time) solutions to this problem are of great interest. In this paper, we present a novel, dedicated hardware implementation of the ECG signal processing chain based on Hermite fu...

Design and implementation of proposed pipelined adaptive recovery CAMP algorithm for LFMCW radar

Article

Full-text available

Mar 2021

Sameh G. Salem

Recently, Compressive Sensing (CS) theory based on the traditional CAMP reconstruction algorithm has been applied in radar systems to achieve the benefits of CS, such as a low sampling rate, small memory size, and lower hardware complexity, which in turn reduces the required processing time by using a low-speed Analog-to-Digital Converter. A modified recon...

An Efficient Design Flow for Accelerating Complicated-connected CNNs on a Multi-FPGA Platform

Conference Paper

Full-text available

Aug 2019

Convolutional Neural Networks (CNNs) have achieved impressive performance on various computer vision tasks. To facilitate better performance, some complicated-connected CNN models (e.g., GoogLeNet and DenseNet) have recently been proposed, and have achieved state-of-the-art performance in the fields of image classification and segmentation. However...

Secure Hash Algorithms and the Corresponding FPGA Optimization Techniques

Article

Oct 2020
ACM COMPUT SURV

Cryptographic hash functions are widely used primitives whose purpose is to ensure the integrity of data. Hash functions are also utilized in conjunction with digital signatures to provide authentication and non-repudiation services. The SHA has been developed over time by the National Institute of Standards and Technology for security, optimal performance, and robustness. The best-known hash standards are SHA-1, SHA-2, and SHA-3. Security is the most notable criterion for evaluating the hash functions. However, the hardware performance of an algorithm serves as a tiebreaker among the contestants when all other parameters (security, software performance, and flexibility) have equal strength. Field Programmable Gate Array (FPGA) is a reconfigurable hardware that supports a variety of design options, making it the best choice for implementing the hash standards. In this survey, particular attention is devoted to the FPGA optimization techniques for the three hash standards. The study covers several types of optimization techniques and their contributions to the performance of FPGAs. Moreover, the article highlights the strengths and weaknesses of each of the optimization methods and their influence on performance. We are optimistic that the study will be a useful resource encompassing the efforts carried out on the SHAs and FPGA optimization techniques in a consolidated form.

Field Programmable Gate Array Applications—A Scientometric Review

Article

Full-text available

Nov 2019

Field Programmable Gate Array (FPGA) is a general purpose programmable logic device that can be configured by a customer after manufacturing to perform anything from simple logic gate operations to complex systems on chip or even artificial intelligence systems. Scientific publications related to FPGA started in 1992 and, up to now, we found more than 70,000 documents in the two leading scientific databases (Scopus and Clarivate Web of Science). These publications show the vast range of applications based on FPGAs, from the new mechanism that enables the magnetic suspension system for the kilogram redefinition, to the Mars rovers’ navigation systems. This paper reviews the top FPGA applications by a scientometric analysis in ScientoPy, covering publications related to FPGAs from 1992 to 2018. Here we found the top 150 applications that we divided into the following categories: digital control, communication interfaces, networking, computer security, cryptography techniques, machine learning, digital signal processing, image and video processing, big data, computer algorithms and other applications. Also, we present an evolution and trend analysis of the related applications.

Resource Efficient Implementation of the Keccak, Skein & JH Algorithms on a Reconfigurable Platform

Article

Full-text available

Jan 2016

In this work, we present a compact hardware implementation of the cryptographic hash algorithms Keccak, Skein and JH on a Field Programmable Gate Array (FPGA) using an efficient primitive-level programming approach. The logic is not only mapped onto Look-Up Tables (LUTs) but also makes effective use of the FPGA's internal dedicated logic resources, such as Fast Carry Chain logic with MUXCY and XORCY, to reduce overall hardware resources. This approach results in a minimized chip area with a good balance between resources and speed for the selected hash algorithms. All implementations were done on the latest Xilinx FPGAs, and the results are compared with previous up-to-date implementations in terms of chip area consumption, throughput, and throughput per area. The results show a substantial improvement over all previously reported works.

SCA Resistance Analysis on FPGA Implementations of Sponge Based MAC−PHOTON

Conference Paper

Full-text available

Jun 2015

N. Nalla Anandakumar

PHOTON is a lightweight hash function which was proposed by Guo et al. in CRYPTO 2011. This is used in low-resource ubiquitous computing devices such as RFID tags, wireless sensor nodes, smart cards and mobile devices. PHOTON is built using sponge construction and it provides a new MAC function called MAC-PHOTON. This paper deals with FPGA implementations of MAC-PHOTON and their side-channel attack (SCA) resistance. First, we describe three architectures of the MAC-PHOTON based on the concepts of iterative, folding and unrolling, and we provide their performance results on the Xilinx Virtex-5 FPGAs. Second, we analyse security of the MAC-PHOTON against side-channel attack using a SASEBO-GII development board. Finally, we present an analysis of its Threshold Implementation (TI) and discuss its resistance against first-order power analysis attacks.

FPGA-based SHA-3 Acceleration on a 32-bit Processor via Instruction Set Extension

Conference Paper

Full-text available

Jun 2015

As embedded systems play more and more important roles in the Internet of Things (IoT), the integration of cryptographic functionalities is an urgent demand to ensure data and information security. Recently, KECCAK was declared as the winner of the third generation of Secure Hashing Algorithm (SHA-3). However, implementing SHA-3 on a specific 32-bit processor failed to meet the performance requirement. On the other hand, implementing it as a cryptographic coprocessor consumes a lot of extra area and requires a customized driver program. Although implementing KECCAK on a 64-bit platform is more efficient, this platform is not suitable for embedded implementation. In this paper, we propose a novel SHA-3 implementation using instruction set extension based on a 32-bit LEON3 processor (an open source processor), with the goals of reducing execution cycles and code size. Experimental results show that the proposed design reduces execution cycles by around 87% and code size by 10.5% compared to reference designs. Our design takes up only 9.44% extra area with negligible speed overhead compared to the standard LEON3 processor. Compared to the existing hardware accelerators, our proposed design occupies only half of the area resources and does not require extra driver programs to be developed when integrated into the overall system.

An Efficient FPGA-Based Architecture of Skein for Simple Hashing and MAC Function

Conference Paper

Full-text available

Sep 2013

Skein is a hash function that reached the semifinals of the NIST competition for the selection of standard SHA-3. This paper describes the implementation of Skein-512 operating as simple hash function and as MAC function. The design was coded using VHDL language and for the hardware implementation, two XILINX FPGAs, Virtex-6 and Virtex-7 were used. The proposed implementation reaches a data throughput of 894 Mbps at 110 MHz clock frequency for Virtex-6 and a throughput of 975 Mbps at 120 MHz clock frequency for Virtex-7.
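The two reported operating points can be cross-checked with the usual first-order relation throughput = block size × clock frequency / cycles per block. The sketch below assumes a 512-bit message block for Skein-512 and 1 Mbps = 10^6 bit/s; the abstract does not state a cycle count, so the value computed here is only what the reported numbers imply under those assumptions.

```python
def implied_cycles_per_block(block_bits: int, f_mhz: float, throughput_mbps: float) -> float:
    """Solve throughput = block_bits * f / cycles for cycles (first-order model)."""
    return block_bits * f_mhz / throughput_mbps


BLOCK_BITS = 512  # assumed Skein-512 message block size

virtex6 = implied_cycles_per_block(BLOCK_BITS, 110, 894)
virtex7 = implied_cycles_per_block(BLOCK_BITS, 120, 975)
print(round(virtex6, 1), round(virtex7, 1))  # 63.0 63.0
```

Both devices give about 63 cycles per block, which suggests the throughput difference between them comes from clock frequency alone.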

Online scheduling and placement of hardware tasks with multiple variants on dynamically reconfigurable field-programmable gate arrays

Article

Aug 2013
COMPUT ELECTR ENG

Thomas Marconi

DEVELOPMENT AND BENCHMARKING OF NEW HARDWARE ARCHITECTURES FOR EMERGING CRYPTOGRAPHIC TRANSFORMATIONS

Thesis

Full-text available

Jul 2013

Marcin Rogawski

Cryptography is a very active branch of science. Due to the everlasting struggle between cryptographers, designing new algorithms, and cryptanalysts, attempting to break them, the cryptographic standards are constantly evolving. In the period 2007-2012, the National Institute of Standards and Technology (NIST) held a competition to select a new cryptographic hash function standard, called SHA-3. The major outcome of this contest, apart from the winner - Keccak, is a strong portfolio of cryptographic hash functions. One of the five SHA-3 finalists, Groestl, has been inspired by Advanced Encryption Standard (AES), and thus can share hardware resources with AES. As a part of this thesis, we have developed a new hardware architecture for a high-speed coprocessor supporting HMAC (Hash Message Authentication Code) based on Groestl and AES in the counter mode. Both algorithms provide efficient hardware acceleration for the authenticated encryption functionality, used in multiple practical security protocols (e.g., IPSec, SSL, and SSH). Our coprocessor outperforms the most competitive design by Jarvinen in terms of the throughput and throughput/area ratio by 133% and 64%, respectively. Pairing-based cryptography has emerged as an important alternative and supplement to traditional public key cryptography. Pairing-based schemes can be used for identity-based encryption, tripartite key exchange protocols, short signatures, identity-based signatures, cryptanalysis, and many other important applications. Compared to other popular public key cryptosystems, such as ECC and RSA, pairing-based schemes are much more computationally intensive. Therefore, hardware acceleration based on modern high-performance FPGAs is an important implementation option. Pairing-schemes over prime fields are considered particularly resistant to cryptanalysis, but at the same time, the most challenging to implement in hardware.
One of the most promising optimization options is taking advantage of embedded resources of modern FPGAs. Practically all FPGA vendors incorporate in modern FPGAs, apart from basic reconfigurable logic blocks, also embedded components, such as DSP units, Fast Carry Chain Adders, and large memory blocks. These hardwired FPGA resources, together with meticulously selected prime numbers, such as Mersenne, Fermat, or Solinas primes, can serve as a basis of an efficient hardware implementation. In this work, we demonstrate a novel high-speed architecture for Tate pairing over prime fields, based on the use of Solinas primes, Fast Carry Chains, and DSP units of modern FPGAs. Our architecture combines Booth recoding, Barrett modular reduction, and the high-radix carry-save representation in the new design for modular multiplication over Solinas primes. Similarly, a low-latency modular adder, based on high-radix carry save addition, Fast Carry Chains, and the Kogge-Stone architecture, has been proposed. The modular multiplier and adder based on the aforementioned principles have been used as basic building blocks for a higher level application - a high-speed hardware accelerator for Tate pairing on twisted supersingular Edwards curves over prime fields. The fastest version of our design calculates Tate pairing at the 80, 120 and 128-bit security level over prime fields in 0.13, 0.54 and 0.70 ms, respectively. It is the fastest pairing implementation over prime fields in the 120-128-bit security range. Apart from the properly designed architectures for cryptographic algorithms, one more ingredient contributes to the success of a hardware coprocessor for any application - an electronic design automation software and its set of options.
Concerning this issue, Cryptographic Engineering Research Group (CERG) at Mason has developed an open-source environment, called ATHENa (Automated Tool for Hardware EvaluatioN), for fair, comprehensive, automated, and collaborative hardware benchmarking and optimization of algorithms implemented in FPGAs. One of the contributions of this thesis is the design of the heart of ATHENa: its most efficient heuristic optimization algorithm, called GMU_Optimization_1. As a basis of its development, multiple comprehensive experiments have been conducted. This algorithm has been demonstrated to provide up to 100% improvement in terms of the throughput to area ratio, when applied to 14 SHA-3 Round 2 candidates. Additionally, our optimization strategy is applicable to the optimization of dedicated hardware in any other area of science and engineering.

Throughput/Area Trade-Offs of Loop-Unrolling, Functional and Structural Pipeline for Skein Hash Function

Article

Full-text available

Feb 2013

Efficient Hardware Implementation of SHA-3 Candidate Grøstl using FPGA

Article

Oct 2012

In 2007, NIST announced a public competition to develop a new cryptographic hash algorithm. The competition was announced because several successful attacks had been reported against SHA-1 in recent years, which raised significant concerns about SHA-2. This new algorithm will replace SHA-2 and can be used in various security applications in the information infrastructure. This paper focuses on an efficient FPGA implementation of Grøstl, one of the SHA-3 candidates and a Round 3 finalist. The aim of this work is to achieve a high throughput-to-area ratio (TPA) by attaining high throughput while considering the trade-off between area and speed. The design is fully autonomous, executes both permutations P and Q in parallel, and is equipped with an I/O wrapper. The developed hardware has two designs: the first implements the S-box using Look-Up Tables (LUTs) or Distributed Memory, and the second implements the S-box as Block RAM (BRAM). On Virtex-5, the design with the S-box implemented in LUTs has a throughput of 9.360 Gbps and occupies 2253 slices including the I/O wrapper, for a TPA of 4.154; the design with the S-box implemented in BRAM has a throughput of 5.565 Gbps and occupies 1356 slices with the wrapper, for a TPA of 4.104.
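The reported TPA figures can be reproduced from the throughput and area numbers. The abstract does not state the TPA unit; the sketch below assumes Mbps per slice, which matches both reported values.

```python
def throughput_per_area(throughput_gbps: float, slices: int) -> float:
    """Throughput-to-area ratio in Mbps per slice (unit is an assumption)."""
    return throughput_gbps * 1000 / slices


tpa_lut = throughput_per_area(9.360, 2253)   # S-box in LUTs
tpa_bram = throughput_per_area(5.565, 1356)  # S-box in BRAM
print(round(tpa_lut, 3), round(tpa_bram, 3))  # 4.154 4.104
```

The LUT-based design is larger but faster, and the two designs end up with nearly the same ratio under this metric.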
