Proposed 4-bit Carry Look-Ahead Adder

Source publication

Efficient Reversible Logic Design of BCD Subtractors

Article

Jan 2009

Reversible logic is emerging as a promising computing paradigm, with applications in low-power CMOS, quantum computing, nanotechnology and optical computing. Firstly, we showed a modified design of conventional BCD subtractors and also proposed designs of carry look-ahead and carry skip BCD subtractors. The proposed designs of carry look-ahe...

Context 2

... MCLA [11] uses the modified full adder (MFA) as shown in Fig.4. In our proposed carry look-ahead adder shown in Fig.5, we propose replacing the 4th MFA in the MCLA by a full adder to reduce the area (number of gates) without sacrificing the speed improvement. It can easily be verified that there will be a reduction in the number of gates to generate the final carry, as shown in Fig.5. In order to have further savings in terms of the number of gates, the proposed 4-bit CLA can be cascaded in series to design an expanded-width CLA, as shown in Fig.6. ...
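
The look-ahead and cascading ideas in this excerpt can be illustrated with a minimal Python sketch. It models a conventional (irreversible) 4-bit carry look-ahead adder, not the MFA-based MCLA or the proposed design of Fig.5, whose gate-level details are not visible here. The function names `cla4` and `cla_cascade` are hypothetical.

```python
def cla4(a, b, cin=0):
    """Add two 4-bit values with carry look-ahead; return (sum, carry_out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # generate: a_i AND b_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # propagate: a_i XOR b_i
    # Each carry is written directly in terms of g, p and cin (no rippling).
    c1 = g[0] | (p[0] & cin)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & cin)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & cin))
    carries = [cin, c1, c2, c3]
    total = 0
    for i in range(4):
        total |= (p[i] ^ carries[i]) << i              # sum bit: p_i XOR c_i
    return total, c4


def cla_cascade(a, b, width, cin=0):
    """Chain 4-bit CLA blocks, feeding each block's carry-out to the next."""
    total, carry = 0, cin
    for shift in range(0, width, 4):
        s, carry = cla4((a >> shift) & 0xF, (b >> shift) & 0xF, carry)
        total |= s << shift
    return total, carry
```

Only the final carry `c4` leaves each block, which is why a cheaper circuit for that one signal pays off again in every block of a cascaded adder.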

Context 3

... of the prominent functionalities of the TSG gate is that it can work singly as a reversible full adder unit. Figure 15 shows the implementation of the TSG gate as a reversible full adder. TSG can implement the reversible full adder with a bare minimum of two garbage outputs (at least two garbage outputs are required to realize a reversible full adder). ...
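
The lower bound of two garbage outputs can be checked with a short counting argument, sketched here in Python. This is the standard argument and is assumed, not confirmed, to match the one in the source paper: if k distinct inputs produce the same (sum, carry) pair, at least ceil(log2 k) extra output bits are needed to tell them apart. The function name `min_garbage_outputs` is hypothetical.

```python
from collections import Counter
from itertools import product
from math import ceil, log2


def min_garbage_outputs():
    """Lower bound on garbage outputs for a reversible full adder."""
    counts = Counter()
    for a, b, cin in product((0, 1), repeat=3):
        total = a + b + cin
        counts[(total & 1, total >> 1)] += 1   # key is (sum, carry_out)
    worst = max(counts.values())               # most inputs sharing one output
    return ceil(log2(worst))
```

The inputs 001, 010 and 100 all give sum 1 and carry 0, so three input patterns collide and two additional bits are needed to keep the mapping one-to-one.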

Context 4

... designing the individual reversible components of the carry look-ahead BCD subtractor, the components are combined to design the complete reversible carry look-ahead BCD subtractor, as shown in Fig.25. It is to be noted that we have used the same strategy of connecting the Feynman gates as chains for generating the XOR, copying and NOT functions, with zero garbage (please refer to Fig.8, in which four XOR gates and one NOT gate are required in the middle of the CLA BCD subtractor). ...
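
How a single Feynman gate yields the XOR, copy and NOT functions can be sketched in Python. This uses the standard definition of the Feynman (controlled-NOT) gate, (A, B) -> (A, A XOR B). The helper names are hypothetical, and the chain wiring of Fig.8 is not reproduced.

```python
def feynman(a, b):
    """Feynman (CNOT) gate: maps (a, b) to (a, a XOR b)."""
    return a, a ^ b


def xor_bits(a, b):
    """XOR: read the second output."""
    return feynman(a, b)[1]


def copy_bit(a):
    """Copy: with the second input fixed to 0, both outputs equal a."""
    return feynman(a, 0)


def not_bit(a):
    """NOT: with the second input fixed to 1, the second output is NOT a."""
    return feynman(a, 1)[1]
```

The first output always passes `a` through unchanged, so in a chain it can drive the next gate instead of being discarded, which is presumably how the chains avoid garbage outputs.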

Figure 6. A 4-level decomposition of this 2-bit multiplier

Figure 8. A 4-level decomposition of this 2-bit Vedic multiplier

Figure 10. 1-bit adder block schematic using Peres gates

Figure 11. 1-bit adder quantum circuit using Peres Gates

Optimized Design of Reversible Adder and Multiplier Using Peres Gates

Preprint

Apr 2024

This paper details the approach to the design and optimization of reversible adder and multiplier utilizing Peres gates, which is a three-input, three-output gate. Peres gates are recognized for their universality and energy-efficient properties and present an intriguing option for constructing reversible circuits. Reversible logic characterized by its ab...

Design Methodologies for Reversible Logic Based Barrel Shifters

Article

Jan 2012

Saurabh Kotiyal

Reversible logic has promising applications in emerging computing paradigms such as quantum computing, quantum dot cellular automata, optical computing, etc. In reversible logic gates there is a unique one-to-one mapping between the inputs and outputs. To generate a useful gate function the reversible gates require some constant ancillary i...

Realization of Peres Gate as Universal Structure using Quantum Dot Cellular Automata

Article

May 2016

The scaling problems that conventional technologies were facing were successfully addressed by the novel technology known as Quantum-dot Cellular Automata (QCA). The principle of spin and anti-spin has already been employed for representation of the binary logic states. The QCA has exploited the concept of spin and evolved a new co...

Design of Reversible Fault Tolerant Programmable Logic Arrays with Vector Orientation

Article

Dec 2011

In recent years, reversible logic has emerged as a promising computing paradigm with applications in low-power CMOS, quantum computing and error detection. Reversible computing dissipates zero energy in terms of information loss and can also detect circuit errors by keeping a unique input-output mapping. In this paper, we have proposed a regula...

Design of Efficient Reversible Multiplier

Chapter

Jan 2013

Rangaraju H. G, Aakash Babu Suresh, Muralidhara K. N

Reversible logic is an emerging computing paradigm with applications in ultra-low-power nanocomputing, quantum computing, low-power CMOS design, optical information processing, bioinformatics, etc. In this paper, a 4x4 reversible multiplier circuit is proposed with the design of a new reversible gate called RAM. The proposed multiplier circuit is effici...

Design and analysis of efficient QCA reversible adders

Article

Apr 2019
J SUPERCOMPUT

Quantum-dot cellular automata (QCA) as an emerging nanotechnology are envisioned to overcome the scaling and the heat dissipation issues of the current CMOS technology. In a QCA structure, information destruction plays an essential role in the overall heat dissipation, and in turn in the power consumption of the system. Therefore, reversible logic, which significantly controls the information flow of the system, is deemed suitable to achieve ultra-low-power structures. In order to benefit from the opportunities QCA and reversible logic provide, in this paper, we first review and implement prior reversible full-adder art in QCA. We then propose a novel reversible design based on three- and five-input majority gates, and a robust one-layer crossover scheme. The new full-adder significantly advances previous designs in terms of the optimization metrics, namely cell count, area, and delay. The proposed efficient full-adder is then used to design reversible ripple-carry adders (RCAs) with different sizes (i.e., 4, 8, and 16 bits). It is demonstrated that the new RCAs lead to 33% less garbage outputs, which can be essential in terms of lowering power consumption. This along with the achieved improvements in area, complexity, and delay introduces an ultra-efficient reversible QCA adder that can be beneficial in developing future computer arithmetic circuits and architectures.

A stochastic link-fault-tolerant routing algorithm in folded hypercubes

Article

Oct 2018
J SUPERCOMPUT

A folded hypercube is obtained by adding complementary links to a hypercube. The diameter of the folded hypercube is almost half of that of the hypercube, while its degree is larger than the degree of the hypercube by only one. In this paper, we propose a stochastic link-fault-tolerant routing algorithm in a folded hypercube by introducing a limited global information called routing probabilities. For an n-dimensional folded hypercube, we have proved that the routing probabilities for all distances can be calculated in \(O(n^2\log n)\) time at each node and the message can be forwarded to its neighbor node at each node in O(n) time. We also conducted a computer experiment, the results of which show that our algorithm achieves a better performance than the best algorithm for a hypercube.

An efficient parallel algorithm for the coupling of global climate models and regional climate models on a large-scale multi-core cluster

Article

Aug 2018
J SUPERCOMPUT

High-performance computing for climate models has always been an interesting research area. It is valuable to nest a regional climate model within a global climate model, but large-scale simulation of the nesting or coupling severely challenges the development of efficient parallel algorithms that fit well into multi-core clusters. This paper first presents research on the coupling of the Institute of Atmospheric Physics of Chinese Academy of Sciences Atmospheric General Circulation Model version 4.0 and the Weather Research and Forecasting model, then proposes an efficient parallel algorithm for the coupling. The algorithm includes initialization of input data, decomposition of computing grid and processes, parallel computing of component models, and data exchange by a coupler. By calling some subroutines of the Model Coupling Toolkit, the parallelization of the proposed algorithm is implemented. Experiments show that the parallel algorithm is very effective and scalable. The parallel efficiency of the algorithm on 1,024 CPU cores can reach up to 70%. Moreover, its parallel efficiency with respect to weak scalability is 72.56% on a multi-core cluster.

Performance of preconditioned iterative solvers in MFiX–Trilinos for fluidized beds

Article

Aug 2018
J SUPERCOMPUT

MFiX, a general-purpose Fortran-based suite, simulates the complex flow in fluidized bed applications via BiCGStab and GMRES methods along with plane relaxation preconditioners. Trilinos, an object-oriented framework, contains various first- and second-generation Krylov subspace solvers and preconditioners. We developed a framework to integrate MFiX with Trilinos as MFiX does not possess advanced linear methods. The framework allows MFiX to access advanced linear solvers and preconditioners in Trilinos. The integrated solver is called MFiX–Trilinos hereafter. In the present work, we study the performance of variants of GMRES and CGS methods in MFiX–Trilinos and BiCGStab and GMRES solvers in MFiX for a 3D gas–solid fluidized bed problem. Two right preconditioners employed along with various solvers in MFiX–Trilinos are Jacobi and smoothed aggregation. The flow from MFiX–Trilinos is validated against the same from MFiX for BiCGStab and GMRES methods. The effect of the preconditioning on the iterative solvers in MFiX–Trilinos is also analyzed. In addition, the effect of left and right smoothed aggregation preconditioning on the solvers is studied. The performance of the first- and second-generation solver stacks in MFiX–Trilinos is studied as well for two different problem sizes.

SRAM- and STT-RAM-based hybrid, shared last-level cache for on-chip CPU–GPU heterogeneous architectures

Article

Jul 2018
J SUPERCOMPUT

Shared last-level cache (LLC) in on-chip CPU–GPU heterogeneous architectures is critical to the overall system performance, since CPU and GPU applications usually show completely different characteristics on cache accesses. Therefore, when co-running with CPU applications, GPU ones can easily occupy the majority of the LLC, making CPU applications starve severely. This imposes significant challenges to the design and management of the shared LLC in CPU–GPU heterogeneous architectures. To improve the overall system performance, we consider integrating conventional SRAM and a new memory technology (i.e., STT-RAM) to enlarge the shared LLC. Furthermore, we propose comprehensive management policies to reduce the contention between CPU and GPU units. Experimental results show that, compared with the conventional SRAM-only LLC design, our proposal improves the performance of CPU workloads by 17% while not hurting GPU ones and reduces the LLC energy consumption by 30% on average.

A fault-tolerant computing method for Xdraw parallel algorithm

Article

Jun 2018
J SUPERCOMPUT

Viewshed analysis has been widely used in various spatial analysis applications. But the expense of viewshed computation remains high in both time and space complexity for large-scale terrain data, so parallel computing techniques have been introduced to improve its performance. However, a failure in such a parallel computing system with many computing nodes or processors may lead to an increase in the execution time and cost of running viewshed computation. Highly fault-tolerant parallel computing will greatly enhance the reliability of the algorithm without losing its performance. In this article, we present a fault-tolerant computing framework for parallel viewshed computation in a parallel computing system using a redundancy computing strategy. Two schedule strategies, layer and axis direction schedule, are adopted, respectively, as primary process and slave process to check whether errors occur during the computation. A rollback and re-computation process is presented to correct these errors when an error is found by comparing the results of the primary process and its slave process. The fault-tolerant algorithm in this article is implemented using process-level and thread-level parallelization. Our method can make full use of multiple processors provided by the parallel computing environment without losing the computation efficiency of the algorithm. To illustrate the usefulness of our approach, several experiments are executed using the Xdraw viewshed algorithm. The results demonstrate that our approach achieves a speedup ratio of 14.91 with 16 processes and an average precision rate of 99.4% in comparison with the simple checkpoint Xdraw algorithm.

Novel parity-preserving reversible logic array multipliers

Article

Nov 2017
J SUPERCOMPUT

Mojtaba Valinataj

Reversible logic as a new promising design domain can be used for DNA computations, nanocomputing, and especially constructing quantum computers. However, the vulnerability to different external effects may lead to deviation from producing correct results. The multiplication is one of the most important operations because of its huge usage in different computing systems. Thus, in this paper, some novel reversible logic array multipliers are proposed with error detection capability through the usage of parity-preserving gates. By utilizing the new arrangements of existing reversible gates, some new circuits are presented for partial product generation and multi-operand addition required in array multipliers which results in two unsigned and three signed parity-preserving array multipliers. The experimental results show that the best of signed and unsigned proposed multipliers have the lowest values among the existing designs regarding the main reversible logic criteria including quantum cost, gate count, constant inputs, and garbage outputs. For \(4\times 4\) multipliers, the proposed designs achieve up to 28 and 46% reduction in the quantum cost and gate count, respectively, compared to the existing designs. Moreover, the proposed unsigned multipliers can reach up to 58% gate count reduction in \(16\times 16\) multipliers.

A simple token-based algorithm for the mutual exclusion problem in distributed systems

Article

Sep 2017
J SUPERCOMPUT

Solving the problem of mutually exclusive access to a critical resource is a major challenge in distributed systems. In some solutions, there is a unique token in the whole system which acts as a privilege to access a critical resource. Practical and easily implemented, the token-ring algorithm is one of the most popular token-based mutual exclusion algorithms known in this field’s literature. However, it suffers from low scalability and a high average waiting time for resource seekers. The present paper proposes a new algorithm which employs a two-dimensional torus logical structure of N processes and the token-ring algorithm concept. It performs in a way that increasingly raises scalability and reduces the average waiting time of the token-ring algorithm. The token makes a circular movement along the columns of the two-dimensional torus (vertical ring), while the requests for the critical resource make a circular movement along the rows of the torus (horizontal ring). In this algorithm, the number of messages exchanged is between 2√N + 1 and 3√N + 1 under light load situations and, under heavy load situations, is at the most three messages per critical section invocation. Thus, in contrast with the leading algorithms, the proposed algorithm has gained significant improvements, in addition to having been proved to operate correctly.

GPU parallelization of the sequential matrix diagonalization algorithm and its application to high-dimensional data

Article

Aug 2017
J SUPERCOMPUT

This paper presents the parallelization on a GPU of the sequential matrix diagonalization (SMD) algorithm, a method for diagonalizing polynomial covariance matrices, which is the most recent technique for polynomial eigenvalue decomposition. We first parallelize with CUDA the calculation of the polynomial covariance matrix. Then, following a formal transformation of the polynomial matrix multiplication code—extensively used by SMD—we insert in this code the cublasDgemm function of CUBLAS library. Furthermore, a specialized cache memory system is implemented within the GPU to greatly limit the PC-to-GPU transfers of slices of polynomial matrices. The resulting SMD code can be applied efficiently over high-dimensional data. The proposed method is verified using sequences of images of airplanes with varying spatial orientation. The performance of the parallel codes for polynomial covariance matrix generation and SMD is evaluated and reveals speedups of up to 161 and 67, respectively, relative to sequential execution on a PC.

A parallel solving method for block-tridiagonal equations on CPU–GPU heterogeneous computing systems

Article

May 2017
J SUPERCOMPUT

Solving block-tridiagonal systems is one of the key issues in numerical simulations of many scientific and engineering problems. Non-zero elements are mainly concentrated in the blocks on the main diagonal for most block-tridiagonal matrices, and the blocks above and below the main diagonal have few non-zero elements. Therefore, we present a solving method which mixes direct and iterative methods. In our method, the submatrices on the main diagonal are solved by the direct methods in the iteration processes. Because the approximate solutions obtained by the direct methods are closer to the exact solutions, the convergence speed of solving the block-tridiagonal system of linear equations can be improved. Some direct methods have good performance in solving small-scale equations, and the sub-equations can be solved in parallel. We present an improved algorithm to solve the sub-equations by thread blocks on GPU, and the intermediate data are stored in shared memory, so as to significantly reduce the latency of memory access. Furthermore, we analyze a cloud resource scheduling model and obtain ten block-tridiagonal matrices which are produced by the simulation of the cloud-computing system. The computing performance of solving these block-tridiagonal systems of linear equations can be improved using our method.
