Multiply-Accumulate (MAC) RNS cell (adapted from [37])

Source publication
Article
In the last few years, the ancient residue number system has gained renewed scientific interest and has emerged as an interesting alternative in the field of secure hardware implementations. In this survey, however, we investigate some modern and non-typical applications of RNS in the areas of post-quantum cryptography, cloud infrastructures, and h...

Context in source publication

Context 1
... categories share common characteristics; for example, all calculations are reduced to modular multiply-accumulate (MAC) operations (i.e., a multiplication of small RNS digits followed by an addition and a modular reduction by the respective RNS modulus; the output is fed back to the input to repeat the process). An example of a suitable MAC hardware architecture that supports BC algorithm 3.3 is shown in Figure 3 [37]. Each MAC unit comprises a multiplier, an adder, and a modular reduction unit for each modulus of the RNS base (of the special form $2^{r} - \mu_i$). ...
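A behavioral sketch of one such MAC channel may help. It assumes, as the text states, a modulus of the form $2^{r} - \mu_i$ and uses the identity $2^{r} \equiv \mu_i \pmod{2^{r} - \mu_i}$ to fold the high bits of each intermediate result back into the low bits, which is what makes these special moduli cheap to reduce. The function names and the example parameters (r = 8, μ = 3, 5, 9) are illustrative assumptions, not taken from [37].

```python
def reduce_special(x: int, r: int, mu: int) -> int:
    """Reduce a non-negative x modulo m = 2**r - mu, assuming 0 < mu < 2**(r-1)."""
    if not 0 < mu < (1 << (r - 1)):
        raise ValueError("mu must satisfy 0 < mu < 2**(r-1)")
    m = (1 << r) - mu
    mask = (1 << r) - 1
    # x = hi * 2^r + lo  is congruent to  hi * mu + lo  (mod m); each fold shrinks x
    while x >> r:
        x = (x >> r) * mu + (x & mask)
    # Now x < 2^r = m + mu < 2m, so one conditional subtraction finishes the job
    if x >= m:
        x -= m
    return x


def mac_channel(a_digits, b_digits, r: int, mu: int) -> int:
    """One MAC unit: acc <- (a * b + acc) mod m, with acc fed back every step."""
    acc = 0
    for a, b in zip(a_digits, b_digits):
        acc = reduce_special(a * b + acc, r, mu)
    return acc


# One MAC unit per modulus of a hypothetical base {253, 251, 247} (r = 8)
r, mus = 8, (3, 5, 9)
a = [17, 200, 33, 91]
b = [250, 4, 128, 77]
print([mac_channel(a, b, r, mu) for mu in mus])
```

In hardware the fold is a shift, a small constant multiplication by μ, and an addition, so the reduction unit stays small when μ has few nonzero bits.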

Similar publications

Article
Multiplication is one of the basic operations that influence the performance of many computer applications such as cryptography. The main challenge of the multiplication operation is the cost of the operation as compared to other basic operations such as addition and subtraction, especially when the size of the numbers is large. In this work, we in...

Citations

... In recent decades, the residue number system (RNS) [1][2][3][4][5][6] has been increasingly applied in cryptography [2,7], error correction codes [8], and digital signal processing [3], owing to its carry-free nature and parallel computation. Reduced power consumption, shorter latency, and smaller hardware area can be achieved for applications based on RNS modular addition [9][10][11][12][13] and multiplication [14][15][16][17][18][19][20][21][22][23][24][25]. ...
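The carry-free, parallel character mentioned here can be seen in a few lines: once operands are in residue form, addition and multiplication act on each channel separately, and no carry ever crosses from one channel to another. The base (7, 8, 9), with dynamic range 504, is an arbitrary illustrative choice.

```python
MODULI = (7, 8, 9)  # hypothetical pairwise-coprime base, dynamic range 7 * 8 * 9 = 504


def to_rns(x, moduli=MODULI):
    """Forward conversion: one residue per channel."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli=MODULI):
    # Each channel works independently; nothing propagates between channels
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))


def rns_mul(a, b, moduli=MODULI):
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


x, y = 23, 17
s = rns_add(to_rns(x), to_rns(y))  # residues of 40
p = rns_mul(to_rns(x), to_rns(y))  # residues of 391
print(s, p)  # (5, 0, 4) (6, 7, 4)
```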
Article
A multi-modulus architecture based on radix-8 Booth encoding of a modulo (2ⁿ − 1) multiplier, a modulo 2ⁿ multiplier, and a modulo (2ⁿ + 1) multiplier is proposed in this paper. It uses the original single circuit and shares many common circuit characteristics, adding only a small extra circuit to carry out multi-modulus operations. Compared with a previous radix-4 study, the radix-8 architecture increases the modular multiplication encoding selection from three codes to four codes. This reduces the number of partial products from ⌊n/2⌋ to ⌊n/3⌋ + 1, but it requires the hard multiple 3X (multiplication by three), which increases circuit complexity. A hard multiple generator (HMG) is used to address this problem. Two judgment signals in the multi-modulus circuit allow the same circuit to perform all three operations: modulo (2ⁿ − 1), modulo 2ⁿ, and modulo (2ⁿ + 1) multiplication. The weighted representation is used to reduce the number of partial products. Compared with previously reported methods in the literature, the proposed approach achieves better performance: it is more area-efficient, faster, and consumes less power, and it has a lower area-delay product (ADP) and power-delay product (PDP). With the multi-modulus HMG, the proposed modified architecture can save 34.48–55.23% of hardware area. Compared with previous studies on the multi-modulus multiplier, the proposed architecture saves 22.78–35.46%, 4.12–11.15%, 12.59–24.73%, 27.88–38.88%, and 20.49–27.85% of hardware area, delay time, power dissipation, ADP, and PDP, respectively. The Xilinx Vivado 2019.2 field programmable gate array (FPGA) tools and the Verilog hardware description language are used for synthesis and implementation. The Xilinx Artix-7 XC7A35T-CSG324-1 device is used to evaluate the performance.
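To make the partial-product count concrete, the following sketch recodes an unsigned n-bit multiplier into ⌊n/3⌋ + 1 radix-8 Booth digits in the range −4 to 4 and uses them to form a modulo (2ⁿ − 1) product. It is a behavioral model under our own assumptions, not the paper's multi-modulus circuit: the list `multiples` stands in for the precomputed multiples of X, including the hard multiple 3X that an HMG would provide, and only the 2ⁿ − 1 channel is modeled.

```python
def booth_radix8_digits(y: int, n: int) -> list[int]:
    """Radix-8 Booth recoding of an unsigned n-bit y into n//3 + 1 digits in [-4, 4]."""

    def bit(i: int) -> int:
        return (y >> i) & 1 if i >= 0 else 0  # y_{-1} = 0; bits at or above n are 0

    digits = []
    for i in range(n // 3 + 1):
        j = 3 * i
        digits.append(-4 * bit(j + 2) + 2 * bit(j + 1) + bit(j) + bit(j - 1))
    return digits


def mul_mod_2n_minus_1(x: int, y: int, n: int) -> int:
    """Behavioral model of x * y mod (2^n - 1) built from radix-8 Booth partial products."""
    m = (1 << n) - 1
    # Multiples of x selected by the digits; 3x is the "hard multiple"
    multiples = [0, x % m, (2 * x) % m, (3 * x) % m, (4 * x) % m]
    acc = 0
    for i, d in enumerate(booth_radix8_digits(y, n)):
        pp = multiples[abs(d)]
        if d < 0:
            pp = (m - pp) % m  # negation mod 2^n - 1 (a bitwise complement in hardware)
        # Weight 8^i = 2^(3i); modulo 2^n - 1 this is an end-around rotation by 3i bits
        acc = (acc + pp * pow(2, 3 * i, m)) % m
    return acc


print(booth_radix8_digits(200, 8), mul_mod_2n_minus_1(200, 77, 8))
```

Only the five multiples 0, X, 2X, 3X, and 4X are ever selected; 2X and 4X are shifts, which is why 3X is the one that needs dedicated hardware.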
... As such, RNS is well suited to digital signal and image processing [1,2] and cryptography [1]. In addition, RNS is a practical implementation option for RSA and elliptic curve cryptography, which are the most popular public-key systems [3,4]. Moreover, performing the complete computation of convolutional neural networks [5][6][7] in RNS would reduce cost and time, which are crucial factors. ...
Article
Given the efficiency of the residue number system in high-speed and low-power applications, a wide variety of moduli sets have been introduced in the literature. Selecting an appropriate moduli set is one of the most important issues in using a residue number system. Moduli sets can be evaluated by several factors, such as dynamic range, channel balance, fast and low-power modular operations, and an efficient reverse converter. In this paper, 3- and 4-moduli sets are separately reviewed and compared from different perspectives, and the most efficient 3- and 4-moduli sets are suggested. A new parameter for average delay comparison is introduced, which gives the delay per bit of dynamic range and thus provides a fair comparison of moduli sets with different numbers of moduli. Our comparison showed that both channel balance and computation delay are essential factors in designing an efficient new moduli set.
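Some of the evaluation factors listed here can be computed directly from a moduli set. The sketch below (our own helper, not the paper's methodology) checks pairwise coprimality, the dynamic range in bits, and the spread of channel widths as a crude balance measure; the paper's delay-per-bit parameter would additionally need per-channel delay figures, which are not given in this abstract. The example sets are chosen only for illustration.

```python
from itertools import combinations
from math import gcd, log2, prod


def evaluate_moduli(moduli):
    """Report coprimality, dynamic range (bits), and channel widths of a moduli set."""
    coprime = all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))
    widths = [(m - 1).bit_length() for m in moduli]  # bits needed for residues 0..m-1
    return {
        "pairwise_coprime": coprime,
        "dynamic_range_bits": log2(prod(moduli)),
        "channel_widths": widths,
        "imbalance": max(widths) - min(widths),  # 0 means perfectly balanced widths
    }


# {2^n - 1, 2^n, 2^n + 1} for n = 4, and two four-moduli variants
for ms in ([15, 16, 17], [15, 16, 17, 31], [31, 32, 33, 63]):
    print(ms, evaluate_moduli(ms))
```

The last set shows why such checks matter: with n = 5, both 33 and 63 are divisible by 3, so the set is not pairwise coprime.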
... However, to the best of our knowledge, this goal has not yet been achieved. In this regard, homomorphic encryption (HE) [13] is designed to allow arithmetic operations to be performed on encrypted data. Therefore, designing a process mining algorithm that works with homomorphically encrypted data is a novel research area. ...
... Definition 3 (Homomorphic encryption): HE allows arithmetic operations, including addition and multiplication, to be performed over encrypted data without decryption, and it can be used as a basis for computing complex functions [13]. Two types of HE have received particular attention: partially homomorphic and fully homomorphic encryption cryptosystems. ...
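As a concrete instance of partially homomorphic encryption, textbook RSA is multiplicatively homomorphic: $E(a)\,E(b) \bmod n = E(ab \bmod n)$. The sketch below uses the well-known toy parameters p = 61, q = 53, e = 17; it is insecure (no padding, tiny primes) and only illustrates the property, not the HE scheme of [13].

```python
# Textbook RSA with toy parameters (insecure; illustration only)
p, q = 61, 53
n = p * q                 # 3233
phi = (p - 1) * (q - 1)   # 3120
e = 17
d = pow(e, -1, phi)       # private exponent 2753 (modular inverse, Python 3.8+)


def enc(m: int) -> int:
    return pow(m, e, n)


def dec(c: int) -> int:
    return pow(c, d, n)


a, b = 12, 7
c = (enc(a) * enc(b)) % n  # multiply ciphertexts only; a and b are never exposed
print(dec(c))              # 84, i.e. a * b mod n
```

Only multiplication is supported this way; adding ciphertexts does not yield an encryption of a + b, which is what distinguishes partially from fully homomorphic schemes.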
Preprint
Novel technological achievements in the fields of business intelligence, business management, and data science are based on real-time and complex virtual networks. Data sharing among a large number of organizations, which leads to systems with high computational complexity, is a notable characteristic of current business networks. Discovery, conformance, and enhancement of business processes are performed using the generated event logs. In this regard, one overlooked challenge in industrial process mining is privacy preservation. To preserve data privacy with the low-complexity structure that current digital business technology requires, this paper proposes a novel lightweight encryption method based on the Haar transform and a private key. We compare the proposed method with the well-known homomorphic cryptosystem and Walsh-Hadamard encryption (WHE) in terms of cryptography, computational complexity, and structural vulnerability. The analyses show that the proposed method anonymizes the event logs with significantly lower complexity and higher accuracy than the two aforementioned cryptosystems.
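The abstract does not describe how the private key is combined with the transform, so the sketch below shows only the building block it names: one level of the (unnormalized) Haar transform, which replaces each pair of samples by their average and half-difference and is exactly invertible. Function names and the sample data are our own.

```python
def haar_forward(x):
    """One level of the unnormalized Haar transform: pairwise averages and half-differences."""
    if len(x) % 2:
        raise ValueError("length must be even")
    avg = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    diff = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return avg, diff


def haar_inverse(avg, diff):
    """Undo haar_forward: each pair is recovered as (a + d, a - d)."""
    out = []
    for a, d in zip(avg, diff):
        out.extend([a + d, a - d])
    return out


data = [4, 6, 10, 12, 8, 8]
coeffs = haar_forward(data)
print(coeffs)                 # ([5.0, 11.0, 8.0], [-1.0, -1.0, 0.0])
print(haar_inverse(*coeffs))  # [4.0, 6.0, 10.0, 12.0, 8.0, 8.0]
```

Each level costs only additions and halvings, which is consistent with the low computational complexity the abstract emphasizes.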
... RNS represents integers only by their residues, which allows data to be processed very quickly. Residue multiplication is an inherent operation in RNS-based applications such as cryptosystems and signal processing [1] [2], as well as many other complex applications [3] [4]. The Modified Booth Encoded (MBE) residue multiplication method is comparatively faster than the array-based modulo multiplication method. ...
Conference Paper
Residue multiplication operations are extensively used in Residue Number System (RNS)-based cryptosystem architectures. To increase the speed of RNS crypto processors, a new parallel unsigned 2ⁿ+1 residue multiplier is designed in this work, covering the mathematical model, algorithm, architecture, and FPGA implementation. The proposed residue multiplier is described in Verilog HDL and synthesized in an Application-Specific Integrated Circuit (ASIC) environment. Cadence RTL Compiler estimates the area, power, and delay performance parameters using various CMOS libraries. The proposed 2ⁿ+1 residue multiplication scheme saves 13% of the area and improves the speed by 19% and the PDP by 23% compared with recent 2ⁿ+1 residue multipliers.
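Modulo 2ⁿ + 1 reduction relies on $2^n \equiv -1 \pmod{2^n + 1}$: splitting a product into n-bit chunks and summing them with alternating signs gives the residue. The sketch below is a plain behavioral model of that identity with our own function names; it does not reproduce the paper's parallel multiplier architecture, and hardware designs often use other encodings (for example diminished-1) that are not shown here.

```python
def reduce_2n_plus_1(x: int, n: int) -> int:
    """Reduce a non-negative x modulo 2**n + 1 using 2**n ≡ -1."""
    m = (1 << n) + 1
    mask = (1 << n) - 1
    acc, sign = 0, 1
    while x:
        acc += sign * (x & mask)  # chunk j contributes (-1)^j * chunk_j
        x >>= n
        sign = -sign
    return acc % m  # final correction into [0, 2^n]


def mul_mod_2n_plus_1(a: int, b: int, n: int) -> int:
    """Residue multiplication for operands in [0, 2^n] (note: n + 1 bits wide)."""
    return reduce_2n_plus_1(a * b, n)


print(mul_mod_2n_plus_1(16, 16, 4), mul_mod_2n_plus_1(200, 123, 8))
```

The operand 2ⁿ itself needs an extra bit, which is one reason 2ⁿ + 1 channels are harder to implement than 2ⁿ − 1 channels.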
... Since cryptosystems require computations with large integers (or $\mathbb{F}_p$ elements), hardware support is needed, and RNS properties can be exploited to speed up cryptographic computations. For example, RNS has been used to speed up computations with large operands for RSA in [2,15,16], for elliptic curve cryptography in [11,1,3], and for lattice-based cryptography in [4,18]. ...
Preprint
We establish a connection between semi-primitive roots of the multiplicative group of integers modulo $2^{k}$ where $k\geq 3$, and the logarithmic base in the algorithm introduced by Fit-Florea and Matula (2004) for computing the discrete logarithm modulo $2^{k}$. Fit-Florea and Matula used properties of the semi-primitive root 3 modulo $2^{k}$ to obtain their results and provided a conversion formula for other possible bases. We show that their results can be extended to any semi-primitive root modulo $2^{k}$ and also present a generalized version of their algorithm to find the discrete logarithm modulo $2^{k}$. Various applications in cryptography, symbolic computation, and others can potentially benefit from higher precision hardware integer arithmetic. The algorithm is suitable for hardware support of applications where fast arithmetic computation is desirable.
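For intuition about the quantity being computed: for k ≥ 3, every odd residue modulo $2^{k}$ can be written as $(-1)^{s} g^{e}$, where g is a semi-primitive root (g ≡ 3 or 5 mod 8) and $0 \le e < 2^{k-2}$. The sketch below recovers (s, e) one exponent bit per step with a generic Pohlig–Hellman-style method. It is not Fit-Florea and Matula's algorithm or the generalized version described in the abstract, and the function name is our own.

```python
def dlog_2k(x: int, k: int, g: int = 3) -> tuple[int, int]:
    """Return (s, e) with x ≡ (-1)**s * g**e (mod 2**k) and 0 <= e < 2**(k-2)."""
    if k < 3 or x % 2 == 0 or g % 8 not in (3, 5):
        raise ValueError("need k >= 3, odd x, and g ≡ 3 or 5 (mod 8)")
    mod = 1 << k
    x %= mod
    # The subgroup generated by g is exactly the residues ≡ 1 or g (mod 8)
    s = 0 if x % 8 in (1, g % 8) else 1
    y = x if s == 0 else (-x) % mod
    g_inv = pow(g, -1, mod)
    e = 0
    for j in range(k - 2):  # g has order 2^(k-2): one exponent bit per iteration
        # t = g^(E - e), where E - e is a multiple of 2^j
        t = (y * pow(g_inv, e, mod)) % mod
        # Raising to 2^(k-3-j) gives 1 exactly when bit j of E is 0
        if pow(t, 1 << (k - 3 - j), mod) != 1:
            e |= 1 << j
    return s, e


print(dlog_2k(7, 5), dlog_2k(7, 5, g=5))
```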
... This mathematical operation is illustrated by the following representation of the output sequence, defined as Y(n), and its input sequence X(n). The FIR filter formula is given by (13). ...
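For reference, the textbook direct-form relation for an N-tap FIR filter is $Y(n) = \sum_{k=0}^{N-1} h(k)\,X(n-k)$; equation (13) itself is not reproduced in this excerpt, so the cited paper's notation may differ. In RNS, the same sum can be evaluated independently in every channel and converted back once at the end. The sketch assumes the base (255, 256, 257), i.e. {2ⁿ − 1, 2ⁿ, 2ⁿ + 1} with n = 8, non-negative 8-bit samples and coefficients, and CRT for the final reverse conversion; none of these choices comes from the paper.

```python
from math import prod

MODULI = (255, 256, 257)  # hypothetical {2^n - 1, 2^n, 2^n + 1} base, n = 8


def fir_direct(x, h):
    """Y(n) = sum_k h(k) * X(n - k), with X(n) = 0 for n < 0."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_rns(x, h, moduli=MODULI):
    """Run the FIR sum separately in every residue channel, then apply CRT."""
    channels = []
    for m in moduli:
        xm = [v % m for v in x]
        hm = [c % m for c in h]
        channels.append([sum(hm[k] * xm[n - k] for k in range(len(hm)) if n - k >= 0) % m
                         for n in range(len(xm))])
    M = prod(moduli)  # about 2^24, far above the worst 16-tap output 16 * 255 * 255
    out = []
    for residues in zip(*channels):
        y = sum(r * (M // m) * pow(M // m, -1, m) for r, m in zip(residues, moduli)) % M
        out.append(y)
    return out


x = [10, 20, 30, 40, 50, 60, 70, 80]
h = [1, 2, 3, 4]  # a hypothetical 4-tap filter
print(fir_rns(x, h) == fir_direct(x, h))  # True
```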
Article
The primary aim of this paper is to present the design and implementation of area-efficient, high-performance RNS (Residue Number System)-based 4-tap, 8-tap, and 16-tap FIR filters with 8-bit inputs. Additionally, RNS arithmetic is a valuable tool for theoretical research on the limits of fast arithmetic. The proposed designs also require several addition operations; using a conventional adder would decrease the speed of operation and increase the number of logic gates. To overcome these issues, we use a Ladner-Fischer parallel prefix adder to lower the delay and area. First, the multiplier is designed using the RNS approach, for which the delay is decreased by 78.57% and the power dissipation is reduced to 64.65% for the RNS_PPA multiplier. Combining these algorithms yields a new high-speed, low-area structure in a single multiplier for the FIR filter, implemented using Xilinx 14.7.
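A parallel prefix adder computes all carries in a logarithmic number of levels by combining (generate, propagate) pairs with the associative operator $(g, p) \circ (g', p') = (g \lor (p \land g'),\; p \land p')$. The bit-level model below uses the Sklansky divide-and-conquer pattern, a close relative of the Ladner-Fischer network; the exact topology used in the paper may differ, and the function name is our own.

```python
def prefix_add(a: int, b: int, n: int) -> int:
    """n-bit addition via a log-depth (Sklansky-style) parallel prefix carry network."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]  # generate bits
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # propagate bits
    G, P = g[:], p[:]
    d = 0
    while (1 << d) < n:  # ceil(log2 n) prefix levels
        for i in range(n):
            if (i >> d) & 1:
                j = ((i >> d) << d) - 1  # top bit of the lower half-block (unchanged this level)
                G[i] = G[i] | (P[i] & G[j])
                P[i] = P[i] & P[j]
        d += 1
    # G[i] is now the carry out of bit i, so the carry into bit i is G[i - 1]
    s = 0
    for i in range(n):
        c_in = G[i - 1] if i > 0 else 0
        s |= (p[i] ^ c_in) << i
    return s | (G[n - 1] << n)  # include the final carry-out


print(prefix_add(200, 100, 8))  # 300
```

The depth grows with log₂ n rather than n, which is the source of the delay savings over a ripple-carry adder.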
... We aim to directly generate randomly distributed integers with the desired precision using the PWLCM. The proposed algorithm combines chaos, modular arithmetic, and lattice-based cryptography [3,8,35]. The latter makes it easy to extend the external key length without duplication. ...
... The latter makes it easy to extend the external key length without duplication. Such a property is required because lattice-based ciphers are assumed to resist future attacks in the era of post-quantum computing [3,17,28,35]. So that the algorithm can be implemented even on low-end processors, we consider only 4-bit precision random numbers generated from a 4D PWLCM for performing the confusion and diffusion operations. ...
Article
This paper presents a multiplierless image cipher, with an extendable 2048-bit key space, based on a 4-dimensional (4D) quantized piece-wise linear cat map (PWLCM). The quantized PWLCM exhibits limit cycles of 4-bit encoded integers with periods greater than 10⁷. Synthesizing the PWLCM in a finite state space eliminates the undesirable finite-precision effects due to hardware realization. The proposed image cipher combines chaos, modular arithmetic, and lattice-based cryptography to encrypt a color image by performing pixel permutation and diffusion in a single operation. Further, an image-dependent confusion operation based on an 8-bit 2D-PWLCM is performed on the whole image to enhance security. To increase the key space without key duplication, 16 × 16 sub-images are modified using sub-keys of different lattice length vectors generated from the external key. Both simulations and security analyses confirm that the proposed algorithm can resist common cipher attacks, in addition to advantages such as simplicity, ease of implementation on low-end processors, and an extensible key space that lets it adapt easily even to future post-quantum computing attacks.
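The point that a map defined on a finite integer state space has exact, reproducible limit cycles can be illustrated with the classical 2D Arnold cat map on 4-bit values, $(x, y) \mapsto (x + y,\; x + 2y) \bmod 16$. This is a simpler stand-in chosen by us, not the paper's 4D PWLCM, whose equations are not given in the abstract. Because the map's matrix has determinant 1, it is a bijection on the state space, so every orbit returns to its starting state.

```python
def cat_map(state, mod=16):
    """Classical integer Arnold cat map on 4-bit values (not the paper's PWLCM)."""
    x, y = state
    return ((x + y) % mod, (x + 2 * y) % mod)


def period(state, mod=16):
    """Length of the cycle through `state`; terminates because the map is a bijection."""
    s, steps = cat_map(state, mod), 1
    while s != state:
        s, steps = cat_map(s, mod), steps + 1
    return steps


print(period((1, 0)), period((0, 0)))
```

Integer arithmetic makes every iteration bit-exact on any hardware, which is the property the abstract relies on; reaching periods above 10⁷ requires the larger 4D state space described in the paper.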
... The three main areas of application are signal processing [9], [10], [11], cryptography, and theoretical computer science, where RNS is used to reach complexity bounds [12], [13]. The work in this paper is relevant to all applications involving large numbers, including cryptography since the 1990s [14] with RSA, DH, ECC [15], [16], pairings [17], Euclidean lattices, homomorphic protocols [18], [19], etc. ...
Article
Residue Number Systems (RNS) are proven to be effective in speeding up computations involving additions and products. For these representations, there exist efficient modular reduction algorithms that can be used for arithmetic over finite fields or modulo large numbers, especially in cryptographic engineering. The independence of the RNS channels allows bases to be drawn at random, which also makes it possible to protect against side-channel attacks, or even to detect them using redundancy. These systems are easily scalable; however, the existence of large bases for some specific uses remains a difficult question. In this article, we present four techniques to extract RNS bases from specific sets of integers, offering better performance and flexibility than previous works in the literature. While our techniques do not solve every possible case efficiently, we provide techniques to provably and efficiently find the largest possible available RNS bases in several cases, improving the state of the art on various works in the recent literature.
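To show what "extracting an RNS base from a specific set of integers" means, the sketch below scans the set {2^r − c : 1 ≤ c ≤ max_c} and greedily keeps each candidate that is coprime to all moduli chosen so far. This simple greedy scan is our own illustration: it is not one of the article's four techniques and does not guarantee the largest possible base.

```python
from math import gcd


def greedy_pseudo_mersenne_base(r: int, count: int, max_c: int) -> list[int]:
    """Pick up to `count` pairwise-coprime moduli 2**r - c, scanning c = 1..max_c."""
    base = []
    for c in range(1, max_c + 1):
        m = (1 << r) - c
        if all(gcd(m, q) == 1 for q in base):  # keep m only if coprime to all chosen
            base.append(m)
            if len(base) == count:
                break
    return base


print(greedy_pseudo_mersenne_base(16, 8, 200))
```

A greedy choice can block better combinations later (accepting a composite early may exclude several candidates), which is why provably maximal bases need the more careful techniques the article develops.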
... Despite existing research, many unsolved problems still reduce trust in modern cloud and remote data storage services [3][4][5]. One way to increase the reliability and security of data storage systems is to use the residue number system (RNS) [6,7]. ...
... Consider examples of implementing each of the reverse conversion methods mentioned above. Take the moduli set with minimal values $p_i = [3, 5, 7, 11]$, $n = 4$. ...
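A worked instance of the classical CRT conversion for this moduli set follows, with $M = 3 \cdot 5 \cdot 7 \cdot 11 = 1155$, $M_i = M / p_i$, and $X = \left(\sum_i x_i M_i \left(M_i^{-1} \bmod p_i\right)\right) \bmod M$. The value 1000 is an arbitrary example; CRT I and CRT II are not shown.

```python
from math import prod

moduli = [3, 5, 7, 11]
M = prod(moduli)  # 1155


def forward(x):
    """Direct conversion: residues of x for each modulus."""
    return [x % m for m in moduli]


def crt(residues):
    """Reverse conversion by the classical Chinese remainder theorem."""
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # pow(.., -1, m): modular inverse (Python 3.8+)
    return total % M


res = forward(1000)
print(res, crt(res))  # [1, 0, 6, 10] 1000
```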
Article
The methods of conversion from the residue number system to the decimal number system based on the classical Chinese remainder theorem (CRT) and its improvements CRT I and CRT II are considered in this paper. Analytical expressions for the time complexity of these methods are derived and analyzed. The investigation establishes that CRT II is more efficient than the other methods mentioned above. Examples of direct and reverse RNS conversion based on CRT, CRT I, and CRT II are given.
... If the moduli are pairwise coprime, the dynamic range M is maximized (i.e., $M = \prod_{i=1}^{k} m_i$). In RNS, operations such as addition, subtraction, and multiplication are performed in $k$ parallel, independent channels, which makes RNS a promising candidate for applications with frequent add/multiply operations, such as finite impulse response digital filters [4], data transmission [1], cryptography [18], and image processing [27]. Furthermore, digital signal processing (DSP) has employed RNS because of properties such as carry-free operations, parallelism, and modularity [2]. ...
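Strictly, the number of distinct residue vectors is $\operatorname{lcm}(m_1, \ldots, m_k)$, which equals the product exactly when the moduli are pairwise coprime. A brute-force check makes the difference visible; the example sets are arbitrary.

```python
from math import lcm, prod  # math.lcm with several arguments requires Python 3.9+


def distinct_representations(moduli):
    """Count the distinct residue vectors taken by the integers 0 .. prod(moduli) - 1."""
    return len({tuple(x % m for m in moduli) for x in range(prod(moduli))})


for ms in ([4, 6], [4, 5], [3, 5, 7]):
    print(ms, prod(ms), lcm(*ms), distinct_representations(ms))
# [4, 6] 24 12 12
# [4, 5] 20 20 20
# [3, 5, 7] 105 105 105
```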
Article
Comparison, division, and sign detection are considered complicated operations in the residue number system (RNS). A straightforward solution is to convert RNS numbers into binary format and then perform these operations using conventional binary operators. If efficient circuits are provided for comparison, division, and sign detection, the application of RNS can be extended to cases that include these operations. For RNS comparison in the 3-moduli set , we have found only one hardware realization. In this paper, an efficient RNS comparator is proposed for the moduli set and employs a sign detection method; it operates more efficiently than its counterparts. The proposed sign detector and comparator use dynamic range partitioning (DRP), which was recently presented for unsigned RNS comparison. The delay and cost of the proposed comparator are lower than those of previous works, which makes it appropriate for RNS applications with limited delay and cost budgets.
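The "straightforward solution" mentioned in the abstract, converting to binary and then using ordinary comparisons, can be sketched as follows. The specific moduli set is not named in this excerpt, so the base {15, 16, 17} (that is, {2ⁿ − 1, 2ⁿ, 2ⁿ + 1} with n = 4) is an assumption, and the paper's DRP method is not reproduced. Signed values in [−M/2, M/2) are mapped to x mod M, so the upper half of [0, M) encodes negatives.

```python
from math import prod

MODULI = (15, 16, 17)  # hypothetical base {2^n - 1, 2^n, 2^n + 1}, n = 4
M = prod(MODULI)       # 4080; signed range is [-2040, 2039]


def to_rns(x: int):
    """Residues of x; negative x is represented as x mod M."""
    return tuple(x % m for m in MODULI)


def from_rns(r) -> int:
    """CRT reverse conversion to the unsigned value in [0, M)."""
    return sum(ri * (M // m) * pow(M // m, -1, m) for ri, m in zip(r, MODULI)) % M


def is_negative(r) -> bool:
    return from_rns(r) >= M // 2  # upper half of the dynamic range encodes negatives


def compare(a, b) -> int:
    """Return -1, 0, or 1 comparing two residue vectors as unsigned values."""
    x, y = from_rns(a), from_rns(b)
    return (x > y) - (x < y)


print(is_negative(to_rns(-5)), compare(to_rns(100), to_rns(2000)))  # True -1
```

The full reverse conversion is exactly the costly step that dedicated sign detectors and comparators try to avoid.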