Article

Principles of CMOS VLSI Design

Authors:

... However, area increases and the circuit becomes complex as the number of bits increases [16]. Many methods were later introduced to improve the performance of the CSLA, such as modifying the internal full-adder structure [19]; this improves performance, but the architecture stays the same, so it again becomes bulky and complex as the number of bits increases. Another type of CSLA replaces one of the RCA blocks with a BEC, as in the square-root (SQRT) adder [1,5,6]. ...
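The BEC idea can be illustrated with a behavioural Python sketch. It is not the cited designs. In each block, an ordinary integer addition stands in for the RCA with carry-in 0. A binary-to-excess-1 converter (BEC) then derives the carry-in-1 result instead of a second RCA, and a 2:1 multiplexer driven by the previous block's carry selects between the two. The uniform 4-bit block size and the names `bec` and `csla_bec_add` are assumptions for illustration; square-root CSLAs use growing block sizes.

```python
def bec(x: int, width: int) -> int:
    """Binary-to-excess-1 converter: returns (x + 1) mod 2**width.

    Mirrors the gate-level BEC: output bit i is x_i XOR (AND of all lower bits).
    """
    out, carry = 0, 1
    for i in range(width):
        bit = (x >> i) & 1
        out |= (bit ^ carry) << i
        carry &= bit
    return out


def csla_bec_add(a: int, b: int, n: int = 16, block: int = 4) -> tuple[int, int]:
    """Behavioural n-bit carry select adder with BEC (equal-width blocks assumed)."""
    assert n % block == 0, "this sketch assumes equal-width blocks"
    mask = (1 << block) - 1
    carry, result = 0, 0
    for lo in range(0, n, block):
        a_blk = (a >> lo) & mask
        b_blk = (b >> lo) & mask
        v0 = a_blk + b_blk          # stands in for the RCA with carry-in 0 (block + 1 bits)
        v1 = bec(v0, block + 1)     # BEC supplies the carry-in 1 result, v0 + 1
        v = v1 if carry else v0     # 2:1 mux selected by the previous block's carry-out
        result |= (v & mask) << lo
        carry = v >> block
    return result, carry


print(csla_bec_add(12345, 54321))  # (1130, 1)
```

The BEC needs only XOR and AND gates per bit. That is why it costs less area than a second RCA, although the multiplexers remain.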
... A comparative analysis is carried out by implementing a conventional CSLA, a CSLA using a D-latch, and a CSLA using a CLA. Sumalatha [19] designed an FIR filter based on a CSLA-BEC, in which area and power increased. ...
... This architecture further improves both speed and area. In a CLA, however, complexity grows with the number of bits because the Boolean equations become longer [19]; the group CLA method [22] reduces this complexity. ...
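The growth of the CLA equations follows from expanding the standard recurrence c_{i+1} = g_i + p_i·c_i, where g_i = a_i·b_i and p_i = a_i ⊕ b_i. The Python sketch below uses hypothetical helper names and counts the product terms of a flat, single-level CLA. A group (block) CLA limits this growth by computing group generate/propagate signals instead of one flat equation per carry.

```python
def cla_carry_terms(i: int) -> list[list[str]]:
    """Sum-of-products terms of carry c_i in a flat single-level CLA.

    Starts from c0 and repeatedly expands c_{k+1} = g_k + p_k * c_k.
    """
    terms = [["c0"]]
    for k in range(i):
        terms = [[f"g{k}"]] + [[f"p{k}"] + t for t in terms]
    return terms


def show(i: int) -> str:
    return " + ".join("·".join(t) for t in cla_carry_terms(i))


print("c2 =", show(2))  # c2 = g1 + p1·g0 + p1·p0·c0
for i in (4, 8, 16, 32):
    terms = cla_carry_terms(i)
    literals = sum(len(t) for t in terms)
    print(f"c{i}: {len(terms)} product terms, widest AND has {len(terms[-1])} inputs, {literals} literals")
```

For carry c_i, the flat form has i + 1 product terms and (i + 1)(i + 2)/2 literals. The widest AND gate therefore needs i + 1 inputs, which is impractical beyond a few bits.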
Article
Adders are among the basic arithmetic circuits of any processor, microcomputer, multiplication circuit, etc. At present, the most substantial areas of research in VLSI design are area-efficient, power-efficient, and high-speed circuits. In this paper, two new architectures, a carry select adder (CSLA) using a D-latch and a CSLA using multiplexers, are proposed to reduce power and delay, and their efficiency is compared with the conventional CSLA and with existing literature. Architecture-level power reduction is the most important area, as it plays a vital role in improving the speed and power of the overall circuit. The existing and proposed CSLAs are synthesized with the Synopsys EDA tool at the 32 nm technology node. The values obtained across technology nodes underline the superiority of the proposed adder architectures in delay, energy, and area efficiency. The evaluation results for the proposed architectures show a 60%–75% improvement in delay and 22%–56% in power compared with other architectures. The proposed architectures show an increase in area, but their overall area-delay product is reduced compared with existing designs. For 16 bits, the proposed CSLA-DLATCH-FIR achieved significant reductions in power (46.4 µW), delay (0.66 ns), ADP (359.8 × 10⁻¹⁵), and PDP (30.63 × 10⁻¹⁵ J), while the proposed CSLA-MUX-FIR attained power of 52.44 µW, delay of 0.82 ns, ADP of 457.392 × 10⁻¹⁵, and PDP of 42.9 × 10⁻¹⁵ J. Using the proposed CSLA, an FIR filter was able to significantly reduce its power and delay.
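The reported PDP figures can be cross-checked, since power-delay product is simply power × delay. The short Python sketch below reproduces them from the abstract's numbers. It assumes both delays are in nanoseconds, which is consistent with the reported PDP values.

```python
# Reported 16-bit figures from the abstract; delays assumed to be in ns.
designs = {
    "CSLA-DLATCH-FIR": {"power_w": 46.4e-6, "delay_s": 0.66e-9, "reported_pdp_j": 30.63e-15},
    "CSLA-MUX-FIR": {"power_w": 52.44e-6, "delay_s": 0.82e-9, "reported_pdp_j": 42.9e-15},
}

for name, d in designs.items():
    pdp = d["power_w"] * d["delay_s"]  # power-delay product = energy per operation (J)
    print(f"{name}: PDP = {pdp * 1e15:.2f} fJ (reported {d['reported_pdp_j'] * 1e15:.2f} fJ)")
```

This gives about 30.6 fJ and 43.0 fJ, which agree with the reported 30.63 × 10⁻¹⁵ and 42.9 × 10⁻¹⁵ to within rounding.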
... Because the most important function of a PFD is to sense the phase and frequency error between the input reference (F_REF) and the output signal (F_VCO) of the voltage-controlled oscillator (VCO), it plays a vital role in the performance of a CDR circuit. A conventional PFD is generally built as a state machine with memory elements, presented in Fig. 1b. Its major drawback is a large dead zone in the phase characteristic, which in turn leads to more jitter (Weste and Eshraghian 1993; Winterstein and Nossek 2018). A faster, lower-power, blind/dead-zone-free PFD with a broad input range is therefore needed to reduce the design complexity of a high-speed CDR (Weste and Eshraghian 1993; Kondoh et al. 1995; Soliman et al. 2002). ...
... Another major challenge in designing a CDR is that the CP is expected to produce stable, matched currents in the pull-up network (charging current) and the pull-down network (discharging current). ...
... A PFD is preferred in most CDR applications because it detects both frequency and phase errors, unlike a phase detector, which detects only the phase error (Soliman et al. 2002; Razavi 1996). The reset path in a conventional PFD not only creates a large dead zone in the phase characteristic, resulting in jitter in the locked state, but also limits the maximum operating speed (Weste and Eshraghian 1993; Razavi 1996). This is because the conventional PFD is an asynchronous state machine, and the delay needed to reset every internal node slows down the overall circuit. ...
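The conventional three-state PFD described above can be modelled at the event level. This is an idealised Python sketch, not any of the cited circuits. A REF edge sets UP, a VCO edge sets DN, and once both are high the reset path clears them after a delay `t_reset`, which is an assumed parameter. Edges arriving during that reset interval are lost, and this is the speed limitation the text attributes to the reset path. The charge pump and its response to narrow pulses (the dead-zone mechanism) are not modelled.

```python
def tri_state_pfd(ref_edges, vco_edges, t_reset):
    """Idealised three-state PFD; returns a list of (signal, start, end) pulses.

    A rising REF edge sets UP, a rising VCO edge sets DN; when both are high,
    the reset path clears both after t_reset. Edges that arrive before the
    reset completes are missed, which limits the maximum operating frequency.
    """
    events = sorted([(t, "ref") for t in ref_edges] + [(t, "vco") for t in vco_edges])
    up_start = dn_start = None
    busy_until = float("-inf")
    pulses = []
    for t, src in events:
        if t < busy_until:
            continue  # edge lost while the reset is still in progress
        if src == "ref" and up_start is None:
            up_start = t
        elif src == "vco" and dn_start is None:
            dn_start = t
        if up_start is not None and dn_start is not None:
            busy_until = max(up_start, dn_start) + t_reset
            pulses.append(("UP", up_start, busy_until))
            pulses.append(("DN", dn_start, busy_until))
            up_start = dn_start = None
    return pulses


# REF leads VCO by 2.0 time units; reset delay 0.5.
for sig, start, end in tri_state_pfd([0.0, 10.0], [2.0, 12.0], t_reset=0.5):
    print(sig, start, end)
```

The net UP-minus-DN width equals the phase difference (2.0). Both outputs also carry a reset-width pulse.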
Article
Full-text available
This paper explores a faster, lower-power, dead-zone-free, low-gate-count CMOS phase frequency detector with charge pump (PFD-CP) for clock and data recovery applications. Implemented in 90 nm CMOS technology, the proposed circuit occupies an estimated layout area of 420.66 µm² and consumes as little as 172.10 µW when simulated at 5 GHz with a 1.2 V supply on the Cadence Virtuoso platform. By eliminating the reset path of the conventional PFD, this architecture not only becomes blind-zone free but also offers lower phase noise and output noise of -142.46 dBc/Hz and -131.145 dBc/Hz, respectively, at 1 MHz offset. We have also studied the performance metrics with and without skew at different extreme corners, for both schematic and post-layout simulations, to demonstrate the variation awareness and robustness of the circuit. The scalability of the circuit is also confirmed at lower CMOS technology nodes.
... In an ASIC environment, each generation attempts to squeeze as many I/O circuits as possible onto the periphery by reducing the width of the I/O cell and compensating by increasing the height of the ASIC I/O cell [11][12][13][14][15]. With the long, narrow ASIC I/O cell, the receiver network is moved farther away from the bond pad and the first stage of the ESD network. ...
... In an ASIC system, there is a given chip area specified for the I/O circuitry [9][10][11][12][13][14][15]. This is planned as a certain percentage of the total chip area. ...
... ESD networks are required between ground power rails for every independent domain. In an ASIC system, analog and digital circuits are in separate power domains [7,8,[11][12][13][14][15][16][17][18]. These domains must be interconnected through ground-to-ground ESD networks. ...
... The abstraction levels in the digital design flow are the system level, module level, register transfer level (RTL), gate level, and transistor level. Moving from the system level down to the device level, the amount of abstraction decreases [35]. Approaches like power-down and partitioning address the power minimization problem at the system level. ...
... At the circuit level, performance is improved by choosing logic styles to suit the requirement and by approaches such as clock gating and energy recovery [38]. Methods such as dual-VT and threshold reduction are applied to minimize power dissipation at the technology level [35]. Table IV. ...
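The first-order subthreshold leakage model shows why dual-VT helps. In this model, off-current falls exponentially with threshold voltage. A dual-VT flow keeps low-VT cells on critical paths for speed and uses high-VT cells elsewhere to cut leakage. The Python sketch below is illustrative only: the slope factor n = 1.5, the 300 K temperature, and the neglect of DIBL and body effect are assumptions, not values from the cited work.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K
Q_E = 1.602176634e-19     # C


def subthreshold_leakage_ratio(delta_vt: float, n: float = 1.5, temp_k: float = 300.0) -> float:
    """Off-current ratio I_off(low-VT) / I_off(high-VT) for a threshold gap delta_vt (V).

    First-order model: I_off is proportional to exp(-VT / (n * kT/q)).
    The slope factor n and the omission of DIBL/body effect are assumptions.
    """
    thermal_v = BOLTZMANN * temp_k / Q_E
    return math.exp(delta_vt / (n * thermal_v))


for dvt in (0.05, 0.1, 0.2):
    print(f"ΔVT = {dvt:.2f} V -> low-VT device leaks ~{subthreshold_leakage_ratio(dvt):.1f}x more")
```

Under these assumptions, a 100 mV threshold gap corresponds to roughly an order of magnitude in leakage.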
Article
Full-text available
Recently, there has been a steep rise in the use of handheld gadgets and high-speed applications. VLSI designers often choose the static CMOS logic style for low-power applications. This logic style provides low power dissipation and is free from signal-integrity (noise) issues. However, designs based on this logic style are often slow and cannot be used in high-performance circuits. On the other hand, designs based on the Domino logic style yield high performance and occupy less area, yet they dissipate more power than their static CMOS counterparts. In practice, designers judiciously mix more than one logic style during circuit synthesis to obtain the advantages of each. A carefully designed mixed static-Domino CMOS circuit can tap the advantages of both static and Domino logic styles while overcoming their individual shortcomings.
... There are several different ways to implement such a circuit. In this section we consider the design of a basic full adder circuit using equations (1) and (2) given in references [9,10]. The equation for the sum S of a traditional 1-bit full adder is: ...
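The cited equations (1) and (2) are cut off in this snippet. The Python sketch below therefore assumes the common textbook form of the 1-bit full adder, S = A ⊕ B ⊕ Cin and Cout = A·B + Cin·(A ⊕ B), and prints its truth table.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """1-bit full adder using the common textbook equations (assumed form)."""
    s = a ^ b ^ cin                   # S = A xor B xor Cin
    cout = (a & b) | (cin & (a ^ b))  # Cout = A·B + Cin·(A xor B)
    return s, cout


print(" A B Cin | S Cout")
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    print(f" {a} {b}  {cin}  | {s}  {cout}")
```

The carry expression is equivalent to the majority function A·B + B·Cin + A·Cin. Reusing the A ⊕ B term is what lets XOR/XNOR-based adder cells share logic between S and Cout.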
... In nanomechanical systems of sufficiently large scale, the ~10¹² W/m³ power dissipation density of ~1 GHz nanomechanical logic systems exceeds any possible means of cooling [1,2]. To overcome these limitations and the speed constraints due to mechanical design, parallel architectures are one possible solution [9]. Reversible computer logic also presents an appropriate solution to the problem of energy dissipation and thermal noise [5,6]. ...
Article
Full-text available
Nanomechanical computational systems, proposed by K. Drexler, are one approach to molecular-scale electronic circuits for memory and logic applications. The proposed nanomechanical rod logic is distinguished by its small size and very low energy dissipation compared with transistors. From the sources available to us, the idea appears to be largely unexplored and still in its nascent stages. This paper presents the use of nanorods as an alternative to existing silicon technology for the design of a full adder circuit, along with its speed, power, and energy dissipation characteristics. The full adder circuit is then extended to an n-bit adder (which forms the basis of an ALU circuit). The paper also addresses the use of parallel architectures to overcome the limitations of switching speed.
... This section attempts a comparison of the four techniques presented in the previous sections in terms of their hardware cost, assuming a CMOS VLSI target technology [136]. In this technology, it is known that typically Cost_AND/OR = 6 transistors (implemented as a 2-input NAND/NOR followed by an inverter), and Cost_OR(k) = 2×(k+1) transistors (implemented as a k-input NOR followed by an inverter). ...
... It is also assumed that the XOR gates are implemented as transmission-gate XORs, thus yielding Cost_XOR = 6 transistors [136]. It can be argued that the transmission-gate XOR, although particularly cheap, is not the best implementation of an XOR function; indeed, the realisation using three NAND gates and two inverters is usually preferred by most designers. ...
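The transistor-count model quoted above can be turned into a small cost calculator. In the Python sketch below, the function names and the example gate mix are hypothetical. Only the per-gate costs (6 for a 2-input AND/OR, 2×(k+1) for a k-input OR, 6 for a transmission-gate XOR) come from the text.

```python
# Per-gate transistor costs quoted above for a CMOS VLSI target [136].
COST_AND_OR = 6  # 2-input NAND/NOR followed by an inverter
COST_XOR = 6     # transmission-gate XOR


def cost_or(k: int) -> int:
    """k-input OR built as a k-input NOR (2k transistors) plus an inverter (2)."""
    return 2 * (k + 1)


def circuit_cost(n_and_or: int = 0, n_xor: int = 0, or_fan_ins: tuple[int, ...] = ()) -> int:
    """Total transistor count for a netlist described by gate counts (hypothetical helper)."""
    return n_and_or * COST_AND_OR + n_xor * COST_XOR + sum(cost_or(k) for k in or_fan_ins)


# Hypothetical checker: 3 XORs, 2 two-input AND/OR gates and one 4-input OR.
print(circuit_cost(n_and_or=2, n_xor=3, or_fan_ins=(4,)))  # 40
```

As a consistency check, cost_or(2) evaluates to 6, which matches Cost_AND/OR for a 2-input gate.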
Thesis
On-line testing increases hardware reliability, which is essential in safety-critical applications, particularly in hostile operating conditions. High-level synthesis, on the other hand, offers fast time-to-market and allows quick and painless design space exploration. This thesis details the realisation of on-line testability, in the form of self-checking design, within a high-level synthesis environment. The MOODS (Multiple Objective Optimisation in Data and control path Synthesis) high-level synthesis suite is used for the implementation of this concept. A high-level synthesis tool typically outputs controller / datapath hardware architectures. These two parts pose different self-checking problems that require different solutions. Datapath self-checking is realised using duplication and inversion testing schemes within the circuit data-flow graph. The challenge therein is to identify and implement suitable high-level transformations and algorithms to enable the automatic addition of self-checking properties to the system functionality. This further involves the introduction of an expression quantifying on-line testability and including it in the standard high-level synthesis cost function, thus materialising a three-dimensional design space, to be explored by the designer feeding the synthesis tool with the problem specifications and constraints. In contrast, controller self-checking is not implemented within the synthesis process, but is rather the result of a post-processing synthesis step, directly applying an appropriate checker to the system control signals. Nevertheless, challenges include choosing suitable self-checking techniques, achieving the Totally Self-Checking (TSC) goal, and investigating ways to reuse any existing datapath self-checking resources for controller on-line testability. 
Solutions based both on parity checking and on straightforward 1-hot checking are given, again providing the designer with enhanced opportunities for time-efficient experimentation in search of the best solution in every given synthesis project. The self-checking structures are finally verified theoretically and experimentally, through fault simulation. Overall, the enhanced version of the MOODS system, produced as a result of this research work, enables the efficient implementation of reliable electronics, so that reliability-critical applications can be accommodated in a mass production context.
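The two controller-checking ideas mentioned, 1-hot checking and parity checking, can be illustrated behaviourally. The Python sketch below uses hypothetical function names and only flags invalid codewords. A real Totally Self-Checking checker is built in hardware (for example, with dual-rail outputs) so that it also exposes its own faults, and this sketch does not model that.

```python
def is_one_hot(state: int) -> bool:
    """True if exactly one bit of a one-hot controller state vector is set."""
    return state != 0 and (state & (state - 1)) == 0


def parity(word: int) -> int:
    """Parity bit of a non-negative word: 1 if it has an odd number of 1s."""
    return bin(word).count("1") & 1


def parity_ok(word: int, stored_parity: int) -> bool:
    """Check a control word against the parity bit generated alongside it."""
    return parity(word) == stored_parity


print(is_one_hot(0b00100), is_one_hot(0b00110))  # True False (second has two bits set)
ctrl = 0b1011
p = parity(ctrl)
print(parity_ok(ctrl, p), parity_ok(ctrl ^ 0b0100, p))  # True False (single-bit flip caught)
```

Parity catches any odd number of bit flips in a control word. The 1-hot check catches any state vector with zero or more than one active bit.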
... Equation (38) to Equation (51)) consists of multiplication by a constant factor of 0.707. Normally, the general-purpose multiplier architectures present in FPGAs (Wayne Wolf, FPGA-Based System Design, 2004) are used for this purpose; these have a complex architecture and require a large amount of hardware resources (Neil & Eshraghian, 1994). This increases the overall hardware utilisation of the design and decreases the maximum operating frequency. ...
... The use of non-separable equations with the complex-conjugate property helps to build an optimised architecture that requires far fewer hardware resources than existing designs. Moreover, using the novel Multiplier Equivalent block in place of a general multiplier architecture also reduces the overall hardware utilisation and worst-path delay (Neil & Eshraghian, 1994). ...
Article
Full-text available
The Fast Fourier Transform (FFT) is widely used in image and video processing applications to convert image or video frames into the transform domain, which helps extract accurate features of each frame for various real-time applications. In this paper, an efficient non-separable 8-point FFT architecture (DIT-FFT) is proposed and implemented on a Spartan-6 (xc6slx45-3csg324) FPGA. The proposed architecture consists of Data Format Conversion, Addition, Subtraction, Multiplier Equivalent, and D-FF blocks. The non-separable equations of the 8-point DIT-FFT are derived from the butterfly diagram and then implemented using basic logic gates, which optimises hardware utilisation with the help of the complex-conjugate property. The constant multiplications present in the non-separable DIT-FFT equations are implemented through the adders and shifters in the Multiplier Equivalent block, which further optimises overall hardware utilisation. Moreover, the Q-format is used to increase the data accuracy of the architecture. The comparison results show that the proposed architecture is better than existing designs in several respects.
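The exact shift/add decomposition of the Multiplier Equivalent block is not given here. The Python sketch below shows one common way to multiply a fixed-point (Q-format) value by 0.707 without a multiplier, assuming an 8-bit fractional approximation: 0.707 ≈ 181/256 = 0.70703125, and 181 = 128 + 32 + 16 + 4 + 1. The function name is hypothetical.

```python
def mul_0707(x: int) -> int:
    """Approximate x * 0.707 using only shifts and adds (assumed 181/256 constant).

    (x<<7) + (x<<5) + (x<<4) + (x<<2) + x == 181 * x; the final >> 8 divides by 256
    and truncates toward minus infinity, as an arithmetic right shift would in hardware.
    """
    return ((x << 7) + (x << 5) + (x << 4) + (x << 2) + x) >> 8


print(mul_0707(1000))  # 707
print(mul_0707(256))   # 181 -> 0.70703125 when 256 represents 1.0 in Q8
```

Keeping the full sum before the final shift avoids accumulating truncation error from each partial shift. The result differs from the exact x·181/256 by less than one LSB.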
... To efficiently overcome the drawbacks of other designs, the true single-phase clock design technique was proposed. True single-phase clock (TSPC) circuits enable fast, fully pipelined digital circuits using a single clock signal [8,[15][16][17]. Because only one clock signal is used, the TSPC dynamic CMOS circuit technique avoids the skew problem between complementary clock phases. ...
... TSPC logic is also more effective in implementation and customisation [15]. In [18], the authors studied the impact of clock slope and, on that basis, presented acceptable limits and the need for clock buffers. ...
Article
Full-text available
A dynamic circuit design technique based on true single-phase clock logic is presented in this paper to minimize leakage power consumption. The circuit incorporates a pair of diode-connected transistors and a pair of stacked transistors. Active-mode and idle-mode power consumption and delay are analysed at low and high die temperatures. A saving of 89–17% in power-delay product is obtained, along with higher unity noise gain and reduced voltage bouncing noise. The analysis also includes investigation of voltage variation effects, process corner analysis, and sizing effect analysis. The proposed technique is compared with several previously proposed dynamic circuit design techniques and is found to have the best power-delay product. Further, it is applied to a 32-output decoder to validate the technique. Comprehensive simulation using 90 nm technology in Cadence Spectre shows that the proposed design outperforms conventional and other previously proposed dynamic circuit design techniques in terms of power, delay, and noise, and is robust against parameter and process corner variations.
... In a CMOS inverter, the drains of the PMOS and NMOS transistors are connected to the output terminal, and both gates are connected to the input terminal. The source terminal of the PMOS transistor is connected to the supply voltage, and the source terminal of the NMOS transistor is connected to ground [5]. The CMOS inverter diagram is shown in Figure 1. ...
Article
Full-text available
An energy-efficient design using adiabatic logic, in the form of two-phase adiabatic static CMOS logic (2PASCL), is presented. In this research work, adiabatic logic is used mainly to minimize the energy lost during circuit operation. Adiabatic circuits are low-power circuits that use "reversible logic" to conserve energy, which yields efficient power dissipation. The full adder plays an important role in many arithmetic units such as adders, multipliers, dividers, and processors. This paper proposes a 2PASCL full adder using adiabatic logic that follows the principles of adiabatic switching and energy recovery. A full adder has three inputs and two outputs. The proposed 2PASCL full adder has been simulated in 125 nm technology using the Tanner EDA tool. The average power dissipation and transistor count are reduced using this technique.
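The benefit of adiabatic switching can be seen from the standard first-order model. Abruptly charging a capacitance C to V through a switch dissipates C·V²/2. Charging it with a slow ramp of duration T through resistance R dissipates about (R·C/T)·C·V², valid only for T ≫ R·C. The Python sketch below is a generic illustration, not a 2PASCL model (2PASCL uses two-phase supply clocks). The component values in the example are assumptions.

```python
def conventional_energy(c: float, v: float) -> float:
    """Energy dissipated per abrupt charge of C to V through a switch: C*V^2/2 (J)."""
    return 0.5 * c * v * v


def adiabatic_energy(r: float, c: float, v: float, t_ramp: float) -> float:
    """Energy dissipated charging C to V with a ramp of duration t_ramp through R:
    (R*C / T) * C * V^2 (J). First-order result, valid only for t_ramp >> R*C."""
    assert t_ramp > 10 * r * c, "first-order formula needs T >> RC"
    return (r * c / t_ramp) * c * v * v


# Assumed example: C = 10 fF, V = 1 V, R = 1 kOhm (RC = 10 ps), ramp T = 1 ns.
e_conv = conventional_energy(10e-15, 1.0)
e_adi = adiabatic_energy(1e3, 10e-15, 1.0, 1e-9)
print(f"conventional {e_conv:.2e} J, adiabatic {e_adi:.2e} J, ratio {e_conv / e_adi:.0f}")
```

The saving grows linearly with the ramp time T. This is why adiabatic logic trades speed for energy.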
... Compared with fixed-function solutions, they have the advantage of potentially being reprogrammed in the field, allowing product upgrades or fixes. They are often more cost-effective (and less risky) than custom hardware, particularly for low-volume applications, where the development cost of custom ICs [13] may be prohibitive. And compared with other types of microprocessors, DSP processors often have an advantage in speed, cost, and energy efficiency [1]. ...
Article
Full-text available
This paper presents the design and implementation of a signed-unsigned Modified Booth Encoding (SUMBE) multiplier. Existing Modified Booth Encoding (MBE) multipliers and Baugh-Wooley multipliers perform multiplication on signed numbers only. The modified Booth encoder circuit generates half the partial products in parallel. The SUMBE multiplier is obtained by extending the sign bit of the operands and generating an additional partial product. A carry save adder (CSA) tree and a final carry look-ahead (CLA) adder are used to speed up the multiplier operation. Since signed and unsigned multiplication are performed by the same multiplier unit, the required hardware and chip area are reduced, which in turn reduces the power dissipation and cost of the system. The proposed radix-2 modified Booth algorithm MAC with SPST gives a factor of 5 lower delay and 7% lower power consumption than an array MAC.
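Modified Booth encoding produces half as many partial products as a plain array multiplier. The behavioural Python sketch below shows the standard radix-4 recoding: each digit d_i = -2·y[2i+1] + y[2i] + y[2i-1] lies in {-2, ..., 2}, so each partial product is just a shift and/or negation of x. To handle an unsigned y, the sketch zero-extends it by two bits so its MSB is not read as a sign bit, which adds one partial product. This approach is assumed to resemble the SUMBE idea of an extra partial product, but it is not the paper's circuit. Function names are hypothetical.

```python
def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Radix-4 (modified) Booth recoding of an n-bit two's-complement y (n even).

    Digit i = -2*y[2i+1] + y[2i] + y[2i-1], with y[-1] = 0; digits lie in {-2..2}.
    """
    assert n % 2 == 0
    digits, prev = [], 0
    for i in range(0, n, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_multiply(x: int, y: int, n: int) -> int:
    """Sum the n/2 partial products (d_i * x) << 2i, each a shift/negate of x."""
    return sum(d * x << (2 * i) for i, d in enumerate(booth_radix4_digits(y, n)))


print(booth_radix4_digits(7, 4))    # [-1, 2]  (7 = -1 + 2*4)
print(booth_multiply(-5, 7, 4))     # -35, signed 4-bit y: 2 partial products
print(booth_multiply(13, 15, 4 + 2))  # 195, unsigned 4-bit y zero-extended: 3 partial products
```

In hardware, these partial products would feed a CSA tree and a final CLA, as the abstract describes.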
... The static CMOS full adder has the advantages of robustness with respect to voltage scaling and transistor sizing, along with ease of use and generality; however, the large number of pMOS transistors results in a high input load. It also has weak output driving capability due to the series transistors in the output stage [1]. Complementary pass-transistor logic (CPL) has small input loads and good output driving capability, but it has high wiring overhead and a substantial number of nodes [1]. Other design styles include the transmission-function full adder (TFA) [2] and the transmission-gate full adder (TGA) [3]. The TFA and TGA consume little power, but their performance degrades rapidly when cascaded and they lack driving capability. The 14-T adder uses only 14 transistors to implement the adder logic, substantially reducing the transistor count; however, at low supply voltages the circuit does not operate reliably [4]. Adders implemented using a combination of more than one logic style are said to use the hybrid-CMOS logic style [5]. ...
Article
Full-text available
In this paper, 16-bit and 32-bit carry select adders were designed using the hybrid CMOS logic style. The circuits were simulated in 180 nm technology using Cadence Virtuoso. The full adders are designed using XOR-XNOR circuits. This hybrid full adder circuit provides good driving capability, good noise robustness, and low power consumption. An improvement in power-delay product is also observed. It is also energy efficient, as it operates at low voltages, and offers better performance than other standard full adders. Index Terms-Carry select full adder, Hybrid CMOS logic style, low power consumption.
... Hence, the number of transistors increased. Another alternative design approach is complementary pass-transistor logic (CPL), which uses 32 transistors and has better driving ability but requires high power [5]. The different pass-transistor logic (PTL) families [6,7] are used in many integrated circuits to eliminate redundant transistors, which effectively reduces the transistor count. ...
Article
Full-text available
In this paper, we propose an efficient full adder circuit using 16 transistors. The proposed high-speed adder circuit can operate at very low voltage while maintaining the proper output voltage swing and balancing power consumption and speed. The proposed design is based on CMOS mixed threshold voltage logic (MTVL) and implemented in 180 nm CMOS technology. In the proposed technique, the most time- and power-consuming parts, the XOR gates and the multiplexer, are designed using the MTVL scheme. The maximum average power consumed by the proposed circuit is 6.94 µW at a 1.8 V supply voltage and a frequency of 500 MHz, which is less than that of other conventional methods. Power, delay, and area are optimized by using pass-transistor logic and verified with the SPICE simulation tool over the desired broad frequency range. The proposed design may be successfully used in many cases, especially where the lowest power consumption and delay are the aim. Keywords-Low-power full-adder, Low-power CMOS design, multiplexer based full-adder design, multi-threshold voltage based Full-adder design, pass transmission logic.
... Hence, improving its efficiency is essential for enhancing the performance of most circuits/systems. Classical FAs usually use only one logic structure, such as static CMOS logic [15], complementary pass-transistor logic (CPL) [15], transmission-gate (TG) logic [16], or transmission-function (TF) logic [17]. Other FAs use more than one classical logic style in their implementation, known as the hybrid-logic style. ...
Article
The sharp increase in the leakage part of the total power of very large scale integration (VLSI) circuits is a significant concern in deep-submicron CMOS processes. NOT gates, Gate Diffusion Input (GDI) cells, restorer NMOS–PMOS transistors for full-swing operation, and any path from the supply voltage to ground are the main sources of leakage power dissipation as well as short-circuit current in VLSI CMOS circuits/chips. The input controlled leakage restrainer transistor (ICLRT) is a new circuit-level method proposed in this paper. ICLRTs can be deliberately added to any VLSI CMOS circuit to greatly reduce total power dissipation, especially by reducing its leakage and short-circuit parts. Full adders are vital parts of various VLSI circuits/systems, especially circuits that perform arithmetic operations. They are often placed in the critical paths of multiplication and division and so influence the overall efficiency of the system. To test the proposed technique, ICLRTs were added to five of the best 1-bit hybrid full adders in a deep-submicron process. The efficiency of the proposed method is evaluated using SPICE simulations in a 22 nm CMOS BSIM4 process. Evaluation results with a 1 V power supply verified that the power dissipation and power-delay product (PDP) of the ICLRT-based hybrid full adders are reduced by 65.67–95.7% and 35.85–87.37%, respectively, relative to the corresponding original designs. Mismatch analysis and Monte Carlo simulations demonstrate the robustness and stability of the presented circuits in the presence of process, voltage, and temperature (PVT) variations.
Article
Full-text available
This paper presents the design and implementation of a signed-unsigned Modified Booth Encoding (SUMBE) multiplier. Existing Modified Booth Encoding (MBE) multipliers and Baugh-Wooley multipliers perform multiplication on signed numbers only. The modified Booth encoder circuit generates half the partial products in parallel. The SUMBE multiplier is obtained by extending the sign bit of the operands and generating an additional partial product. A carry save adder (CSA) tree and a final carry look-ahead (CLA) adder are used to speed up the multiplier operation. Since signed and unsigned multiplication are performed by the same multiplier unit, the required hardware and chip area are reduced, which in turn reduces the power dissipation and cost of the system. The proposed radix-2 modified Booth algorithm MAC with SPST gives a factor of 5 lower delay and 7% lower power consumption than an array MAC. Simulation results were obtained with ModelSim, physical design was done with the Cadence Encounter tool, and area, power, and timing reports were obtained from Cadence RTL Compiler.
... FA circuits have so far been broadly classified into static and dynamic logic styles [4][5][6][7]. Each logic style has some good aspects, but at the cost of a few other critical parameters. ...
Chapter
Full-text available
In this paper, optimized designs of two 4-bit adders, a ripple-carry adder and a carry look-ahead adder, are presented. These adder circuits are highly efficient in terms of delay, power, and PDP. A 1-bit hybrid full adder, the basic unit of the presented designs, is constructed using XOR and XNOR gates. Energy-efficient XOR-XNOR gates are thus employed in the 1-bit full adder, which shows superior performance compared with the conventional CMOS-based full adder. This hybrid full adder cell is then used to implement the two 4-bit adders with the Cadence Virtuoso EDA tool. Simulations carried out in 45 nm CMOS process technology over a 0.6–1.2 V supply range indicate that the presented designs are superior in speed and power to their conventional CMOS-based counterparts.
... N. Weste et al. [1] proposed a CMOS-logic adder using 28 transistors with a delay of five MOS transistors. The implemented design uses 26 transistors with a delay of six MOS transistors. ...
Chapter
Full-text available
Different adder structures using CMOS logic and hybrid logic styles have been reviewed in this paper. The XOR/XNOR cell, which is the key element in full adder design, was also reviewed. Hybrid adders have the advantage of low delay and small area owing to their lower transistor count, and a lower PDP can also be achieved with hybrid structures. Adder structures are implemented in this paper by designing XOR and XNOR cells. Critical-path estimates are made, and the number of transistors in the critical path is used to estimate the critical-path delay. Delay comparisons for several adders were reviewed.
... This happens because the distance signals must travel inside the chip decreases while electron speed remains constant. Additionally, smaller implementations have reduced capacitance, which also contributes to faster switching of logic units and ultimately higher operating frequencies (Weste and Eshraghian, 2010). Data collected from Berkeley's extended database (Danowitz et al., 2012). ...
Thesis
Modern mobile processors are constrained by their limited energy resource and demanding applications that require fast execution. Single core designs use voltage/frequency throttling techniques that allow the system to switch between performant and efficient configurations to address this problem. Heterogeneous multicores also need to decide on which core to run rather than just adjust voltage and frequency. Consequently, they remain an open subject in terms of near optimal design and operation. This thesis investigates the performance and energy trade-off when migrating between heterogeneous cores and presents designs that enable low overhead transitions between cores through a series of contributions. The first contribution is based on a novel methodology, that can directly compare the execution of heterogeneous cores. With it, an in-depth investigation of the effects on the memory system when migrating between such cores is conducted. The analysis reveals that heterogeneous multiprocessor system slowdown is asymmetrical. In-Order core performance relies on the memory subsystem, while Out-of-Order execution depends on accurate speculation. A proposed design minimises migration overheads in In-Order cores without making the design prohibitively complex to implement. This is achieved by only sharing the larger caches and the translation state. The second contribution is a branch predictor design that transfers state when a migration occurs between heterogeneous cores. The design eliminates the warm up for Out-of-Order cores by transferring only minimal state. This improves post migration accuracy, potentially enabling better performance and energy efficiency. Finally, security has become a major concern for shared or transferable components in multicore systems, namely the branch predictor. The third contribution in this thesis investigates mitigation techniques of recently discovered side channel attacks. 
The proposed design flushes all but the most useful branch predictor state, ensuring isolation with minimal performance loss.
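To make "flushing all but the most useful state" concrete, here is a minimal Python sketch under a loud assumption: the predictor is modeled as a table of 2-bit saturating counters, and "useful" is taken to mean a saturated (strongly biased) counter. Both this policy and the name `selective_flush` are hypothetical illustrations, not the thesis's actual design.

```python
# 2-bit saturating counter states (a common bimodal-predictor encoding)
STRONG_NOT_TAKEN, WEAK_NOT_TAKEN, WEAK_TAKEN, STRONG_TAKEN = 0, 1, 2, 3


def selective_flush(counters: list[int]) -> list[int]:
    """Keep saturated counters; reset every other entry to weakly not-taken.

    Hypothetical policy: saturated entries are assumed to be the "most useful"
    state, so only they survive a flush or a core migration.
    """
    return [
        c if c in (STRONG_NOT_TAKEN, STRONG_TAKEN) else WEAK_NOT_TAKEN
        for c in counters
    ]
```

For example, `selective_flush([0, 1, 2, 3])` keeps the two saturated entries and resets the weakly-taken one, so a trade-off between retained accuracy and cleared state can be explored by changing which states count as "useful".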
... CMOS technology (Allen and Holberg, 2011) has been around for many decades and is used in a number of products, such as microprocessors, microcontrollers and other digital logic circuits. CMOS devices are extremely popular because they have high noise immunity and dissipate minimal static power (Weste and Eshraghian, 1985). In order to fully understand the power dissipation of a CMOS circuit, this section will discuss the dynamic power consumption, the static power consumption, and the power dissipation caused by short circuits (Rabaey et al., 2003). ...
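The three components listed above are commonly estimated with first-order expressions: dynamic power as α·C·V_DD²·f, static power as leakage current times supply voltage, and short-circuit power approximated here as an average short-circuit current times supply voltage. The sketch below is illustrative only; the function name and example values are hypothetical, and some texts fold a factor of 1/2 into the dynamic term.

```python
def cmos_power(alpha: float, c_load: float, vdd: float, freq: float,
               i_leak: float, i_sc: float = 0.0) -> dict[str, float]:
    """First-order CMOS power estimate, all results in watts.

    alpha  -- switching activity factor (0..1)
    c_load -- switched load capacitance in farads
    vdd    -- supply voltage in volts
    freq   -- clock frequency in hertz
    i_leak -- total leakage current in amperes (static component)
    i_sc   -- average short-circuit current in amperes (assumed known)
    """
    dynamic = alpha * c_load * vdd ** 2 * freq
    static = i_leak * vdd
    short_circuit = i_sc * vdd
    return {
        "dynamic": dynamic,
        "static": static,
        "short_circuit": short_circuit,
        "total": dynamic + static + short_circuit,
    }
```

With hypothetical values α = 0.1, C = 1 pF, V_DD = 1 V and f = 1 GHz, the dynamic term is 0.1 mW, which illustrates why dynamic power dominates in actively switching logic.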
Thesis
This research is the first of its kind to investigate the use of a multi-threading software-based countermeasure to mitigate Side Channel Analysis (SCA) attacks, with a particular focus on the AES-128 cryptographic algorithm. This investigation is novel because, to our knowledge, no previous software-based countermeasure has relied on multi-threading. The countermeasure has been tested on Atmel microcontrollers, as well as on a more fully featured system in the form of the popular Raspberry Pi, which uses the ARM7 processor. The main contribution of this research is the introduction of a multi-threading software-based countermeasure used to mitigate SCA attacks on both an embedded device and a Raspberry Pi. The threads consist of various mathematical operations that generate electromagnetic (EM) noise, obfuscating the execution of the AES-128 algorithm. A novel EM noise generator, known as the FRIES noise generator, is implemented to obfuscate data captured in the EM field. FRIES hides the execution of the AES-128 algorithm within the EM noise generated by the SHA-512 (Secure Hash Algorithm) implementations from the libcrypto++ and OpenSSL libraries. To evaluate the proposed countermeasure, a novel attack methodology was developed in which the entire secret AES-128 encryption key was recovered from a Raspberry Pi, which had not been achieved before. The FRIES noise generator was pitted against this new attack vector and against other known noise generators. The results showed that the FRIES noise generator withstood this attack, whereas other existing techniques still leaked secret information. Both visually locating the AES-128 encryption algorithm in the EM spectrum and recovering the key were prevented. 
These results demonstrate that the proposed multi-threading software-based countermeasure is resistant to existing and new forms of attack, verifying that a multi-threading software-based countermeasure can serve to mitigate SCA attacks.
... [1]. The full adder is the basic element of arithmetic circuits, since subtraction, multiplication, division and address calculation in DSP architectures and microprocessors are implemented using addition [2], so its design needs to be improved. ...
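Because addition underpins these operations, a multi-bit adder is usually built by chaining full adders. The following minimal Python sketch models a ripple-carry adder at the bit level; it is an illustrative behavioural model with hypothetical function names, not a circuit from the cited work.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x: int, y: int, width: int, cin: int = 0) -> tuple[int, int]:
    """Add two unsigned width-bit numbers by chaining full adders LSB first.

    Returns (sum modulo 2**width, final carry_out).
    """
    total = 0
    carry = cin
    for i in range(width):
        # Each stage waits for the previous carry: this chain is the critical path
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry
```

For example, `ripple_carry_add(200, 100, 8)` returns `(44, 1)`, since 300 overflows an 8-bit word. The carry chain grows linearly with width, which is the delay that carry-select and carry-look-ahead designs aim to shorten.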
... With the advancement of technology, the demand for digital electronic devices has increased enormously. According to Moore's law [1][2][3], the number of transistors doubles every 36 months. As a result, the interconnect across the circuit increases, and therefore delay and power increase rapidly. Nowadays, dynamic power consumption has become a serious issue in most microprocessors and digital signal processors (DSPs). As power dissipation increases, the reliability of the device degrades. ...
... Normally, the register logic consists of 2n D-type flip-flops (DFFs) for n-bit resolution [143] (see Figure 4.12). It is assumed that each DFF can be modeled with a minimum of twelve transistors if they are replaced with NMOS pass transistors (roughly six inverters) [144]. ...
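The transistor budget implied by these figures (2n DFFs, at least twelve transistors each) reduces to 24n transistors for n-bit resolution. A hypothetical helper that makes both assumptions explicit as parameters:

```python
def register_transistor_estimate(n_bits: int, dffs_per_bit: int = 2,
                                 transistors_per_dff: int = 12) -> int:
    """Minimum transistor count of register logic built from 2n DFFs.

    Defaults follow the quoted assumptions: two DFFs per bit of resolution
    and twelve transistors per DFF (pass-transistor implementation).
    """
    return n_bits * dffs_per_bit * transistors_per_dff
```

For a 10-bit converter this gives 240 transistors for the register logic alone, under the twelve-transistor DFF assumption.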
Thesis
Recent advances in neuroscience suggest that new-generation brain-computer interfaces require a critical step in biomedical signal processing: online, on-chip spike sorting. Spike sorting is the process of attributing signals to individual neurons by grouping action potentials (spikes) into clusters based on the similarity of their shapes. The extraction of single-unit activity by sensors at a distance from specific neurons is necessary for a wide range of clinical applications such as disorder treatment, muscular stimulation (e.g., epidural spinal cord stimulation for treatment acceleration), cochlear implants and neural prostheses. A brain-machine interface, for example, can potentially substitute for the missing motor pathway or sensory information between the motor cortex and an artificial limb. With the aim of developing an energy-efficient spike sorting chip for implantable hardware systems, this thesis introduces a new feature extraction method based on extrema analysis (positive and negative peaks) of spike shapes and their discrete derivatives. The proposed method runs in real time and does not require any offline training. Compared with other methods, it offers a better trade-off between accuracy and computational complexity for online sorting. It additionally eliminates multiplications, which are computationally expensive, power hungry and require appreciable silicon area. A minimum power limit for implantable neural front-end interfaces is also derived. This involved: 1) system-level optimization, in which the front-end specifications, including bandwidth, data converter resolution and sampling rate, were defined by exploring the effect of these parameters on spike sorting using a standard spike bank; 2) block-level optimization, in which the front-end power was minimized by using an opamp-less cyclic converter; and 3) estimating the power limit equation of the front-end. 
The new optimization methodology addresses the future demands of neural recording interfaces. Finally, the thesis presents the design, implementation and testing of the first generation of an adaptive spike sorting processor, which enhances the accuracy-power characteristics by employing self-calibration of processing features. The chip prototype was fabricated in a 180-nm CMOS technology. It achieves an overall clustering accuracy of 84.5% on a standard spike data bank and consumes 148 μW from a 1.8 V supply. The fabricated spike processor has almost 10% higher clustering accuracy than the state of the art. Measurements show good power-performance characteristics compared with state-of-the-art online and offline clustering methods.
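As an illustration of a multiplication-free, extrema-based feature set, the Python sketch below takes the positive and negative peaks of a sampled spike and of its first difference (a discrete derivative). The exact features, any alignment or windowing, and the function name are assumptions for illustration; the thesis's processor may select or combine them differently.

```python
def extrema_features(spike: list[float]) -> tuple[float, float, float, float]:
    """Positive/negative peaks of a spike and of its first discrete derivative.

    Uses only subtraction and comparison, so no multipliers are needed.
    Returns (max, min, max of derivative, min of derivative).
    """
    if len(spike) < 2:
        raise ValueError("need at least two samples")
    # First difference: x[k+1] - x[k]
    derivative = [b - a for a, b in zip(spike, spike[1:])]
    return max(spike), min(spike), max(derivative), min(derivative)
```

For the toy waveform `[0, 3, 5, -4, -1, 0]`, the features are `(5, -4, 3, -9)`; a clustering stage would then group spikes whose feature tuples lie close together.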
Article
The Earth’s infrared energy storage is huge, and if it could be used on a large scale, it would effectively alleviate the Earth’s greenhouse effect. In this paper, using water as a good energy storage medium, a long-wave energy storage system (LWESS) is designed that can radiate long-wave infrared with high energy density and a fixed bandwidth or fixed wavelength. 1. The feasibility of using hollow glass and infrared fiber to deliver LWESS radiation for cooling and heating is presented. 2. Based on the literature, the principle by which the system achieves thermoelectric power generation is described. 3. Antenna theory is used to derive the structure of a planar antenna and to propose selection requirements for the rectifier diode, which provides a theoretical and practical basis for converting long-wave radiation using a rectenna. Finally, the quantum theory of long-wave radiation is used to propose the idea of manufacturing a long-wave radiation module that can convert energy directly. The long-wave energy storage system has unique advantages and can quickly realize harmony between humans and nature.
Conference Paper
Circular arrays are widely preferred, notably for their inherent ability to control the main beam position. In this paper, synthesis of a circular array using the amplitude spacing technique is demonstrated. The analysis is carried out on simulated radiation patterns with sidelobe level suppression. The simulation is carried out in MATLAB.
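For context, the quantity such a synthesis shapes is the array factor. The sketch below evaluates the normalized array factor of N equally spaced elements on a circle in the azimuth plane (θ = 90°), with phases chosen so the main beam points at φ0. It assumes real, non-negative amplitude weights; the paper's specific amplitude spacing technique is not reproduced, and any tapered weights passed in are hypothetical.

```python
import cmath
import math


def circular_array_factor(phi: float, weights: list[float],
                          radius_wl: float, phi0: float) -> float:
    """Normalized |AF| of a circular array in the azimuth plane (theta = 90 deg).

    weights   -- real, non-negative amplitude excitations, one per element
    radius_wl -- array radius in wavelengths
    phi0      -- main-beam (steering) direction in radians
    """
    n_el = len(weights)
    ka = 2 * math.pi * radius_wl  # k * a, with k = 2*pi / wavelength
    total = 0j
    for n, w in enumerate(weights):
        phi_n = 2 * math.pi * n / n_el  # angular position of element n
        # Phase compensation makes every term add in phase at phi = phi0
        phase = ka * (math.cos(phi - phi_n) - math.cos(phi0 - phi_n))
        total += w * cmath.exp(1j * phase)
    return abs(total) / sum(weights)
```

The pattern equals 1 at φ0 and cannot exceed 1 elsewhere; tapering `weights` reshapes the sidelobes, which is the knob amplitude-based synthesis adjusts.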
Article
In this article, novel circuits for the Exclusive-OR (XOR) and Exclusive-NOR (XNOR) gates are designed. The designed logic is highly optimized in terms of power consumption and speed, owing to the minimal load capacitance (CL) at the output and low leakage power. Based on the new XOR-XNOR gates, six novel hybrid 1-bit full-adder designs are presented. Each design offers its own advantages in terms of delay, power dissipation, speed and related metrics. To validate the performance of the introduced designs, SPICE and Tanner EDA simulations were performed. The simulation results, obtained in a 65-nm process using the hybrid technique, show that the introduced architecture achieves the best speed and power compared with other full-adder architectures. The proposed design has a minimum power of 0.8 nW and a delay of 9.4 ns, which is considerably more efficient than the reference design, whose power is 4.08 μW. The design was customized with 22 transistors (22T), and the design methodology was changed to optimize the results.
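A bit-level behavioural model helps show how an XOR-XNOR cell drives a hybrid full adder: the complementary XOR/XNOR outputs select the sum, and the carry is chosen between the carry-in and an input bit. This is a generic textbook-style formulation sketched in Python with a hypothetical function name; it is not the transistor-level 22T circuit from the article.

```python
def hybrid_full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Bit-level model of an XOR-XNOR-based full adder: returns (sum, carry_out)."""
    xor_ab = a ^ b        # XOR output of the XOR-XNOR cell
    xnor_ab = 1 - xor_ab  # complementary XNOR output
    # Sum = (a XOR b) XOR cin, realized as a 2:1 mux selected by the XNOR signal
    s = cin if xnor_ab else 1 - cin
    # Carry: if a != b the carry-in propagates; if a == b the carry equals a (= b)
    cout = cin if xor_ab else a
    return s, cout
```

Checking all eight input combinations against ordinary integer addition confirms the mux-based formulation, which is why having both XOR and XNOR available (without an extra inverter stage) is attractive in hybrid designs.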
Conference Paper
This paper presents the design and implementation of a signed-unsigned Modified Booth Encoding (SUMBE) multiplier. The existing Modified Booth Encoding (MBE) multiplier and the Baugh-Wooley multiplier perform multiplication on signed numbers only, while the array multiplier and the Braun array multiplier perform multiplication on unsigned numbers only. Modern computer systems therefore require a dedicated, very high-speed multiplier unit that handles both signed and unsigned numbers, and this paper presents the design and implementation of such a SUMBE multiplier. The Modified Booth Encoder circuit halves the number of partial products and generates them in parallel. The SUMBE multiplier is obtained by extending the sign bit of the operands and generating an additional partial product. A Carry Save Adder (CSA) tree and a final Carry Look-Ahead (CLA) adder are used to speed up the multiplication. Because signed and unsigned multiplication are performed by the same multiplier unit, the required hardware and chip area are reduced, which in turn reduces the power dissipation and cost of the system. In this project, an 8x8 multiplier was designed and simulated at the gate level and at the transistor level using the AMS simulator in Cadence Design Systems. We optimized the multiplier for speed by implementing the fundamental building blocks directly in CMOS in the IBM CMRF7SF 0.18 µm process. Booth's multiplication algorithm was used to reduce the number of partial products, and thus the number of adders, providing a speed advantage. Furthermore, the adder circuit, which is the primary source of delay, was constructed with two layers of carry look-ahead logic (CLA) to decrease propagation delay. A sign-extension trick is used to further decrease the number of logic gates between the input and output. By using transistor-level implementations for the CLA logic and the full adder, our design also reduces the total area required compared with gate-level designs. 
Layout was constructed for each block and for the full architecture. The worst-case delay with a 100 fF load capacitance was approximately 2.98 ns. This propagation delay is 46% lower than that of the reference gate-level design, whose delay was 5.50 ns. The final schematic occupies a total area of less than 200 × 200 µm². Total power consumption at an input signal rate of 200 MHz was 2.5 mW.
I. INTRODUCTION
Multipliers have an important effect on the design of arithmetic, signal and image processors. Many mandatory functions in such processors make use of multipliers, and advanced digital processors now have fast bit-parallel multipliers embedded in them. Multipliers for unsigned numbers are designed using a dizzying array of methods, each with its own advantages and tradeoffs. In recent years, high-speed multipliers have played an important role in the design of any architecture, and researchers are still working on many factors to increase the speed of these basic elements. Algorithms for designing high-speed multipliers have been modified and developed for better efficiency. The increased complexity of various applications demands not only faster multiplier chips but also smarter and more efficient multiplication algorithms that can be implemented in them. The choice depends on the need of the hour, the application in which the multiplier is implemented, and the tradeoffs that must be considered. Generally, the efficiency of multipliers is classified based on variations in speed, area and configuration. Due to the rapidly growing system-on-chip industry, not only faster units but also smaller area and lower power have become major concerns in designing very large scale integration (VLSI) circuits. Digital circuits make use of digital arithmetic. Among the various arithmetic operations, multiplication is one of the fundamental operations, and it is performed using adders. 
There are many ways to build a multiplier, each providing a trade-off between delay and other characteristics such as area and energy dissipation. The objective of a good multiplier and accumulator (MAC) is to provide a physically compact, fast and low-power chip. To significantly reduce the power consumption of a VLSI design, a good direction is to reduce its dynamic power, which is the major part of total power dissipation.
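The radix-4 Booth recoding and the signed/unsigned trick described in this abstract can be checked with a short Python model. Assumptions: operands are n-bit with n even; signed operands are read as two's complement, and unsigned operands are zero-extended by two bits, which produces exactly one extra partial product. The partial products are simply summed here, standing in for the CSA tree and final CLA adder, and the function names are hypothetical.

```python
def booth_radix4_digits(y: int, width: int) -> list[int]:
    """Recode y (read as a width-bit two's-complement value) into radix-4 Booth digits.

    Each digit lies in {-2, -1, 0, 1, 2} and is formed from the overlapping
    bit triplet (y[2i+1], y[2i], y[2i-1]), with y[-1] = 0.
    """
    if width % 2:
        raise ValueError("width must be even")
    digits = []
    prev = 0  # y[2i-1]; the implicit 0 to the right of the LSB for i = 0
    for i in range(0, width, 2):
        b0 = (y >> i) & 1        # Python's >> is arithmetic, so this also
        b1 = (y >> (i + 1)) & 1  # yields two's-complement bits of negative y
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_multiply(x: int, y: int, n: int, signed: bool) -> int:
    """Multiply two n-bit operands by summing radix-4 Booth partial products.

    Signed: n/2 partial products. Unsigned: y is zero-extended by two bits,
    giving n/2 + 1 partial products (the "additional partial product").
    """
    width = n if signed else n + 2
    digits = booth_radix4_digits(y, width)
    # Partial product i is digit * x shifted left by 2*i bits
    partial_products = [d * x * 4 ** i for i, d in enumerate(digits)]
    return sum(partial_products)
```

For an 8x8 unit, `booth_multiply(-128, 127, 8, signed=True)` gives -16256 using four partial products, and `booth_multiply(255, 255, 8, signed=False)` gives 65025 using five.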
Article
The project titled “Designing of Modified Booth Encoder with power suppression technique” presents a technique that is useful for reducing area, delay and power consumption. Recent studies in the design of DSP systems have revealed a loss of performance due to large area utilization and delay. In the radix-4 Modified Booth Encoder, the number of partial products is reduced to half. Hence fewer adders are needed, so area consumption is reduced and the output delay is also reduced. The power suppression technique used along with the Modified Booth Encoder is SPST (Spurious Power Suppression Technique). This technique reduces glitches and avoids unwanted operations such as the addition of repeated zeros, so power consumption is lower. Hence this project is useful in the field of signal processing and especially in portable systems.
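The SPST idea of keeping part of an adder idle when its inputs carry no useful information can be sketched behaviourally. In this minimal Python model (a simplification with hypothetical names), a 16-bit unsigned addition is split into 8-bit halves, and the upper half is treated as frozen whenever both operands' upper bits are zero. SPST as usually described also detects sign-extension (all-ones) patterns and latches the adder inputs in hardware, which is not modeled here.

```python
def spst_style_add(a: int, b: int, width: int = 16, split: int = 8) -> tuple[int, bool]:
    """Add two unsigned width-bit values, idling the upper adder half when possible.

    Returns (sum modulo 2**width, whether the upper (MSP) adder had to switch).
    Simplification: only the all-zero MSP case is detected.
    """
    lsp_mask = (1 << split) - 1
    lsp_sum = (a & lsp_mask) + (b & lsp_mask)  # lower-half adder always runs
    carry = lsp_sum >> split                   # carry out of the lower half
    msp_a, msp_b = a >> split, b >> split
    if msp_a == 0 and msp_b == 0:
        # Detection logic: both upper halves are zero, so the MSP adder inputs
        # stay frozen (no switching, no glitches); the upper result is the carry.
        msp_sum, msp_active = carry, False
    else:
        msp_sum, msp_active = msp_a + msp_b + carry, True
    result = ((msp_sum << split) | (lsp_sum & lsp_mask)) & ((1 << width) - 1)
    return result, msp_active
```

For small operands such as 200 + 100 the sum (300) is still correct while the upper half reports as idle, which is where the power saving would come from in hardware.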
Chapter
The adder is a basic building block of the arithmetic logic unit (ALU), so designing an optimized adder circuit paves the way for an optimized ALU design. The implementation of metal–oxide–semiconductor field-effect transistor (MOSFET)-based very large-scale integration (VLSI) circuits in the nanoscale range has reached saturation. This is because MOSFETs face significant issues at the nanoscale, such as increased leakage current and strong dependence on process, voltage and temperature (PVT) variations during fabrication. The carbon nanotube field-effect transistor (CNTFET) can overcome these drawbacks of the MOSFET and supports low-power, delay-optimized VLSI circuit design. In this paper, different types of full adders are implemented using CNTFETs, and their power-delay product (PDP) is analysed for single and multiple CNTFET threshold voltages. The simulation identifies the full adders with the lowest and highest PDP, and the PDP of the full adders is then optimized by varying the CNTFET threshold voltage. The simulation is carried out using the HSPICE simulation tool with the Stanford University 32-nm CNTFET model.
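PDP is average power multiplied by propagation delay, i.e. the energy consumed per operation. A one-line Python helper (hypothetical name) makes the units explicit; for instance, the 0.8 nW and 9.4 ns figures quoted for the XOR-XNOR full adder earlier in this list give about 7.52 × 10⁻¹⁸ J.

```python
def power_delay_product(power_w: float, delay_s: float) -> float:
    """Energy per operation in joules: average power (W) times delay (s)."""
    return power_w * delay_s


# Example with the figures quoted for the 65-nm XOR-XNOR full adder
pdp_joules = power_delay_product(0.8e-9, 9.4e-9)
```

Because PDP rewards lowering either factor, comparing designs by PDP rather than by power or delay alone avoids favouring a circuit that is fast only because it burns more power.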
Chapter
This study analyzes the effect of the structural metal of a metamaterial on the performance of the absorber. Absorbers are an important part of various applications because of the increased demand for radiation absorption. Various metals are studied and compared based on their absorption performance. Bismuth is observed to provide the best absorption characteristics among all the metals considered.
Article
As the integration scale grows, greater power and area consumption limit the usefulness of a circuit, while the market for battery-powered devices such as mobile phones, tablets and laptops has grown. This article introduces two proposed full-adder structures using XOR-XNOR gates. These circuits are optimized through the low output capacitance of the adder and its reduced power consumption and delay. Compared with other conventional full-adder structures, the proposed structures consume only 0.32 μW and 0.34 μW of power, respectively. These full adders not only achieve low power and high speed but also provide full voltage swing with fewer transistors. Tanner Tools are used to test the performance of the circuits, and the simulation is based on 25 nm technology.
Article
In the current age of technological advancement, it is necessary to devise new concepts that reduce cell area as well as power consumption. Adders are among the most fundamental building blocks of high-performance processors and other multicore devices. In this paper, novel circuits for XOR/XNOR and simultaneous XOR-XNOR functions are proposed. The proposed circuits are highly optimized in terms of power consumption and delay, owing to low output capacitance and low short-circuit power dissipation. We also propose six new hybrid 1-bit full-adder (FA) circuits based on the novel full-swing XOR-XNOR or XOR/XNOR gates. KEY WORDS: Full adder (FA), noise, particle swarm optimization (PSO), transistor sizing method, XOR-XNOR.
Chapter
This chapter consists of a brief review or introduction, depending on the reader’s background, of the basics of computer arithmetic. The first two sections are on algorithms and designs of hardware units for addition and multiplication. (Subtraction is another fundamental operation, but it is almost always realized as the addition of the negation of the subtrahend.) For each of the two operations, a few architectures for hardware implementation are sketched that are sufficiently exemplary of the variety of possibilities. The third section of the chapter is on division, an operation that in its direct form is (in this book) not as significant as addition and multiplication but which may nevertheless be useful in certain cases. The discussions on algorithms and architectures for division are therefore limited.
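The remark that subtraction is realized as addition of the negated subtrahend corresponds, in two's complement, to a - b = a + (NOT b) + 1, so the same adder is reused with inverted inputs and a carry-in of 1. A minimal Python sketch over fixed-width words (the function name is hypothetical):

```python
def subtract_via_adder(a: int, b: int, width: int) -> int:
    """Compute (a - b) mod 2**width using only addition and bitwise NOT.

    Two's complement: -b == (~b) + 1 within the word, so a - b == a + ~b + 1.
    """
    mask = (1 << width) - 1
    inverted_b = (~b) & mask       # bitwise NOT restricted to the word width
    return (a + inverted_b + 1) & mask  # "+ 1" plays the role of carry-in = 1
```

In 8 bits, `subtract_via_adder(5, 3, 8)` yields 2, and `subtract_via_adder(3, 5, 8)` yields 254, the two's-complement encoding of -2.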
Article
This paper presents an artificial neural network (ANN), a mathematical model, for modelling dynamic circuits in circuit simulation. The ANN is used to learn models for dynamic circuit simulation. In modelling for circuit simulation, there are major applications that must be distinguished because of their different requirements. The artificial neural network model is constructed with layers of units and is thus termed a multilayer ANN. Dynamic circuits are widely used in custom ANN circuits to achieve high speed in a smaller area and, effectively, to lower power consumption due to glitch-free operation. The classical approach is to obtain a suitable compact physical model or table model for circuit simulation.