Serial Addition: Locally Connected Architectures
Valeriu Beiu, Senior Member, IEEE, Snorre Aunet, Senior Member, IEEE, Jabulani Nyathi, Member, IEEE,
Robert R. Rydberg III, Student Member, IEEE, and Walid Ibrahim, Member, IEEE
Abstract—This paper will briefly review nanoelectronic chal-
lenges while focusing on reliability. We shall present and analyze
a series of CMOS-based examples for addition starting from the
device level and moving up to the gate, the circuit, and the block
level. Our analysis, backed by simulation results, on comparing
parallel and serial addition shows that serial adders are more reli-
able while also dissipating less. Their reliability can be improved
by using reliability-enhanced gates and/or other redundancy
techniques (like e.g., multiplexing). Additionally, the architectural
technique of short-circuiting the outputs (of several redundant
devices/gates/blocks) exhibits “vanishing” voting and an inherent
fault detection mechanism, as both transient and permanent faults
could be detected based on current changes. The choice of CMOS
is due to the broad design base available (but the ideas can be
applied to other technologies), while addition was chosen due to its
very solid background (both theoretical and practical). The design
approach will constantly be geared towards enhancing reliability
as much as possible at all the levels. Theory and simulations will
support the claim that a serial adder is a very serious candidate for
highly reliable and low power operations. Finally, our simulations
will identify the V_DD range where the power-delay-product and energy-delay-product are minimized. All of these suggest that a reliable (redundant) solution can also be a low power one if using serial architectures, while speed could still be traded for power (e.g., by dynamically varying the supply voltage both above and below V_th).
Index Terms—Addition, fault/defect tolerance, multiplexing,
nanoarchitectures, reliability, serial architectures.
I. INTRODUCTION

CMOS scaling has been the means by which the semicon-
ductor industry has achieved its historically unprecedented
gains in productivity and performance quantified by the highly
cited Moore’s Law [3]. Scaling CMOS technology to the next
generation has always increased transistor densities, improved
performance, and reduced power consumption. The most recent
International Technology Roadmap for Semiconductors (ITRS) report [4] predicts that the semiconductor industry will still continue its success in downscaling CMOS for a few more generations. It also predicts that the scaling will become quite difficult as the industry approaches the 16-nm technology node. Scaling might continue further, but it is expected that alternative nanodevices will start to be integrated with CMOS onto a silicon platform. The alternative nanodevices currently being evaluated can be classified into solid-state (e.g., rapid single flux quanta, 1-D structures like nanowires and carbon nanotubes, resonant tunneling devices, single-electron technology (SET), ferromagnetic devices, spin devices, etc.) and molecular ones. However, there are many fundamental and technical challenges that must be resolved to continue the scaling of CMOS technology deep into the nanometer regime [5]–[12]. Probably the three greatest challenges are: power [13] (and the associated heat dissipation), reliability, and interconnectivity [14], [15]. Some other difficult challenges include (see [14]–[18]): verification, as well as logic encoding and hybrid integration, and the overall complexity (of design, test, and fabrication).

Manuscript received January 15, 2007; revised July 16, 2007. This paper was recommended by Guest Editor C. Lau. V. Beiu and W. Ibrahim are with the College of Information Technology, United Arab Emirates University, Al Ain 17555, U.A.E. (e-mail: vbeiu@uaeu.ac.ae; walidibr@uaeu.ac.ae). S. Aunet is with the Department of Informatics, University of Oslo, Oslo 0316, Norway (e-mail: sa@ifi.uio.no). J. Nyathi and R. R. Rydberg III are with the School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99163 USA (e-mail: jabu@eecs.wsu.edu; rrydberg@eecs.wsu.edu). Digital Object Identifier 10.1109/TCSI.2007.907885

This paper is a significantly expanded version of a conference paper entitled "On the advantages of serial architectures for low-power reliable computations" [1], and incorporates parts from an unpublished presentation, "The vanishing majority gate: Trading power and speed for reliability" [2], which have also been extended and updated.
The remainder of this paper is organized as follows. In
Section II we review some of the challenges outlined above. In
Section III we go over the design of reliable gates, which was
started a long time ago, including rad-hard by design solutions.
We further present details of our own work on reliability. Four
different adders are analyzed in Section IV, with multiplexing
used for enhancing reliability. Three different solutions for im-
plementing multiplexing are introduced and compared. Finally,
in Section V, the power-performances of the four different
adders are estimated, and simulations are used to identify the supply voltage (V_DD) range where the power-delay-product and energy-delay-product are minimized. These show that a reliable design approach could also yield a promising balance between the conflicting metrics of power and speed (which will be improved by scaling). In Section VI, we summarize our findings and discuss implications for further research.
II. NANOELECTRONIC CHALLENGES
A. Challenges Due to Scaling
Power dissipation (and the associated heat) is strongly
affected by the increasing leakage currents. With the advent
of the sub-100-nm CMOS technology (i.e., the “nanoera”),
leakage currents have reached a level that cannot be ignored
anymore. Leakage will continue to increase the static power dissipation exponentially (about 5× at each generation at 30 °C), till multigate transistors and high-κ dielectrics, which are expected to reduce leakage down to about 10% and 1%, respectively, become mainstream. Nevertheless, with a forecasted 10^12 devices per chip [16] (for comparison, the human body has roughly 10^14 cells, while the brain has about 10^11 neurons and 10^14 synapses), even SET, which is advocated for
its ultra-low-power will become power constrained (see, e.g.,
[19]).
With device geometries scaling below the 65-nm range,
the
available reliability margins are drastically being reduced
[20]. As a result, the reliability community will be forced to thoroughly investigate accurate metrics able to determine these margins, and to reconsider how reliability assessment methodologies can be changed to gain new reliability space for the most advanced technologies. Currently, from the chip designer's perspective, reliability manifests itself more and more as time-dependent uncertainties in electrical parameters. In the sub-65-nm era, these device-level parametric uncertainties will be too high to handle with prevailing worst-case design techniques without incurring significant penalties in terms of area, delay, and energy. Additionally, with continued scaling, the copper resistivity is starting to increase sharply due to interfacial and grain boundary scattering. Besides the performance issues associated with interconnect scaling, several interconnect-related reliability issues are becoming troublesome (electromigration, stress migration, and heating), while, equally important, others are increasing with scaling (poor pattern definition, line-edge roughness, nanoscale corrosion, low-κ dielectric cracks, post-chemical-mechanical polishing residues, etc.). The global picture is that reliability looks like one of the greatest threats to the design of future integrated computing systems. For emerging nanodevices and their associated interconnects, the expected higher probabilities of failures, as well as the higher sensitivities to noise and variations, could make future chips prohibitively unreliable. The result is that the current IC design approach, based on the conventional zero-defect foundation, might simply not work. Therefore,
fault- and defect-tolerance techniques that allow the system to
recover from manufacturing and operational errors will have to
be considered from the (very) early design phases.
Finally, complexity will certainly be the name-of-the-game for nanoelectronics. Miniaturization will increase the device density, which will subsequently increase the complexity of every aspect related to the design and the manufacturing of future chips. For example, the modeling complexity of a multilevel interconnect network in a gigascale chip involves an enormous number of coupling inductances and capacitances throughout a nine-to-ten-level metal stack [11]. This complexity aggravates many other problems like: testing and verification [21], integration and packaging [11], [16], and hybridization.
Before going further, it is important to highlight here that all the challenges enumerated above are intimately entangled: redundancy, for example, translates into higher power and higher connectivity.
B. Reliability Challenge
The reasons chip reliability is becoming a major hurdle are, on the one hand, the continuous increase in internal electrical fields and current densities and, on the other hand, the introduction of new materials and devices with unknown reliability behavior (let alone the decrease in the number of electrons per device and the increase in the number of devices and interconnects). Another reason for the heightened importance of reliability is that, in spite of these scaling trends, the market has continuously been demanding higher reliability levels due to the emergence of new applications. In the past, the reliability margins have always been sufficiently high and have been guaranteed at the technology level (e.g., based on accelerated stress tests). Currently, the semiconductor industry approach is to extensively test the fabricated circuits and abandon most (if not all) of those not operating correctly. Still, in 1994 Intel had to start a massive recall campaign that cost US$475 million when it was discovered that the Pentium processor generated slightly incorrect results for some floating-point operations. Very recently, Microsoft announced that it expects to spend more than US$1 billion (up to US$1.15 billion) to repair widespread hardware problems (no technical details currently available) in its Xbox 360 video game console. In the future, larger numbers of devices will be deployed in many applications and embedded systems, and intrinsic reliability could turn out to be a showstopper for economically viable technology scaling: the cost to perform a recall (especially in the realm of failure-sensitive and energy-conscious real-time embedded systems) will be exponentially higher. Thus, there is very high pressure to make sure that future nanoelectronic systems will be functioning correctly over their lifetime, even if not free of defects and faults!
There is also an increasing concern that the massive scaling of the CMOS devices will introduce extreme static and dynamic parameter fluctuations at the material, device, and circuit levels [8], [9], [16], [22]. Extreme parameter variations are a major barrier to achieving reliable and predictable system implementations [23]. Typical increases in propagation delay and power dissipation due to such fluctuations are expected to be 30% to 50% above nominal for the 45-nm generation CMOS logic circuits [16].
Additionally, soft errors will occur due to material decay, interference, or electrostatic discharge. Since capacitances and voltages will be only a fraction of what they are today, very small charges will be needed to flip a bit in memory, the output of a gate, or the voltage on a wire. Although such an event is highly unlikely for a single device, soft errors are becoming another reliability concern for future systems based on nanodevices [24]–[28], due to the expected massive number of devices the system will have. For instance, a hypothetical one-terabyte memory chip with a soft error probability per single bit of one per million years will experience a soft error about every 4 s! Typically, data in memory is protected using error-correcting codes, on top of stand-by spare rows and columns, and off-line reconfiguration. However, mechanisms to protect latches and flip-flops (that store state) and random logic have only recently started to appear on the researchers' agendas [25]–[28].
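As a back-of-the-envelope check of the "every 4 s" figure, under the assumption that one terabyte corresponds to 8 × 10^12 independent bits (this conversion is not spelled out above):

```python
# Sanity check of the "soft error every ~4 s" figure for a 1-TB memory,
# assuming 1 terabyte = 8 * 10**12 bits and independent bit failures.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

bits = 8e12                  # one terabyte of memory, in bits
rate_per_bit = 1 / 1e6       # one error per bit per million years

errors_per_year = bits * rate_per_bit            # ~8e6 errors/year
seconds_between_errors = SECONDS_PER_YEAR / errors_per_year

print(f"{seconds_between_errors:.1f} s between soft errors")  # ~3.9 s
```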
According to classical scaling theory, the gate insulator thickness should shrink with the other transistor dimensions. The current flowing through the gate oxide causes reliability problems by leading to long-term parameter shifts and eventually to oxide breakdown. On top of these, interconnect scaling raises another reliability concern for future nanodevices. In [11], Davis et al. mentioned that the miniaturization of interconnects, unlike transistors, does not enhance their performance (for fresh perspectives see [29] and [30]). Interconnect scaling will also significantly affect the circuit reliability due to increased crosstalk and latency. Sakurai [7] concluded in 2000 that the interconnects, rather than transistors, will be the major factor in determining the cost, delay, power, reliability, and turn-around time of the semiconductor industry.
Unfortunately, reliability problems for beyond-CMOS technologies [31], [32] are expected to get even worse. The introduction of new materials could sharply decrease reliability margins. Beyond-CMOS device failure rates are predicted to be as high as 10% (e.g., background charge for SET [33]), going up to 30% (e.g., self-assembled DNA [34], [35]). As a fresh example, [36] has reported defect rates of 60% for a 160-kilobit molecular electronic memory. Clearly, achieving 100% correctness at the system level using such devices and interconnects will be not only outrageously expensive but might be plainly impossible. Hence, relaxing the requirement of 100% correctness for devices and interconnects might reduce the costs of manufacturing, verification, and test [21]. Still, this will lead to more transient and permanent failures of signals, logic values, devices, and interconnects. These conflicting trends will render technologists unable to meet failure rate targets, and impose the delegation of reliability qualification to designers, i.e., failures will have to be compensated at the architectural level [14], [15], [17], [18], [37]–[42].
Previously, fault tolerance has been an issue only for safety-critical designs, but as argued above it looks like it is here to stay and will become part of any future design [43], [44]. Any architecture that disregards the fact that the underlying devices and interconnects are unreliable is anticipated to be impractical [43], [45].
From the system design perspective, errors fall into one of
the following three classes: permanent, intermittent, and tran-
sient [44]. The oldest and most commonly used fault model
is the stuck-at model. It is not clear if emerging technolo-
gies will not require new fault models [43], [46], or if mul-
tiple errors might have to be dealt with [27]. Recently, even the
well-established assumption of a bounding constant probability
of failure for each gate was challenged [47][49]. These papers
have shown that approximating the gates probabilities of failure
by (bounding) constants introduce sizeable errors, leading to
over-design.
The well-known approach for developing fault-tolerant
architectures in the face of uncertainties (both permanent and
transient faults) is to incorporate redundancy [50]. Redundancy
can be either static (in space, time, or information) or dynamic
(requiring fault detection, location, containment, and recovery).
Space (hardware) redundancy relies on voters (e.g., generic,
inexact, midvalue, median, weighted average, analog, hybrid,
etc.) and includes among others the well-known: modular
redundancy, cascaded modular redundancy, and multiplexing
(including von Neumann multiplexing [50], enhanced von Neumann multiplexing [51], [52], and parallel restitution [53]). Time redundancy is trading space for time (e.g., alternating logic, recomputing with shifted operands, recomputing with swapped operands, etc.), while information redundancy is based on error detection and error correction codes. Hybrid approaches are also known, e.g., time-shared triple modular redundancy, recomputing with triplication with voting, hardware partitioning in time redundancy, recomputing with partitioning and voting, quadruple time redundancy (see [54]–[59]), as well as reconfiguration [40].
Some of the reliability-enhanced schemes enumerated above
can be implemented at several different levels: device, gate, cir-
cuit, block, and system. All of them have in common that im-
proved reliability is traded off for increased area (number of de-
vices) and higher connectivity, while it is expected that these will
lead to higher power consumption and/or slower computations
[60]. As an early example for nanoelectronics, Roychowdhury et al. [61] suggested that a quantum-dot cellular automata circuit implemented with a sufficient redundancy factor (i.e., the number of replicated identical copies) would be able to perform correctly with very high probability even if 15% of the devices failed.
Till now, VLSI designers did not (have/want to) care about reliability, which was characterized at the technology level. In the future, material and device engineering alone will not suffice for tackling reliability. No silver bullet will be able to cope with all the types of faults in nanoscale circuits and systems, and a combination of several techniques will certainly be needed [62]. Boosting reliability will require more and more a cooperative involvement of the logic designers and architects, where high-level techniques will rely upon lower-level support based on novel modeling and electronic design automation (EDA) tools.
In the following, we shall go through a series of examples
starting from the device level and going towards the system
level, by trying to emphasize the synergy between the design and the technology levels, and by qualitatively and quantitatively analyzing the benefits such an approach would bring.
III. MORE RELIABLE GATES
The design of more reliable gates was already of high interest when vacuum tubes were the elementary devices [50], [63], [64]. At that time, threshold logic (including majority and ternary logic) was an active research topic, even used in building some of the early computers. During 1957 and 1958, Rosenblatt together with Charles Wightman and others constructed the Mark I Perceptron having 512 adjustable weights (see [65]). Shortly afterwards, Bernard Widrow together with his students developed the ADALINE (ADAptive LINear Element) [66]. The next threshold logic computer was DONUT [67], followed later by Setun [68], [69]. Due to the low reliability of vacuum tubes, fast elements on miniature ferrite cores and semiconductor diodes were designed for implementing ternary logic. Brousentsov even stated that: "Ternary threshold logic gates, as compared with the binary ones, provide more speed and reliability, and required less equipment and power. These were the reasons to design a ternary computer" [68].
With the advent of MOS and CMOS integrated circuits, such topics were quickly forgotten. Still, radiation hardening [70] has constantly been in demand for special applications. Rad-hard by design solutions can be classified into either layout-level (i.e., based on modifying the layout) or switch-level (i.e., based on modifying the circuit at the transistor level) techniques, while hybrid and adaptive techniques have also been developed.
Most circuit designers have used the switch-level approach.
A CMOS fault tolerant gate based on encoding the circuit out-
puts with Berger codes (error correcting codes) requires the in-
troduction of additional networks to provide tolerance to single
stuck-at faults, as well as to a number of multiple faults, while
Fig. 1. Radiation hardened by design. (a) Differential fault-tolerant CMOS gate [73]. (b) Soft error suppression technique for domino logic [74]. (c) Single event upset hardened inverter in silicon-on-insulator [75]. (d) Hardened majority voter [75] (based on the classical output-wired-inverters [78]).
Fig. 2. Device-level built-in redundancy. (a) High matching techniques (used in analog designs) [77]. (b) Interesting enhancement of the output-wired principle [82]. (c) High matching technique applied to multiplexed SET circuits [52], [83]. (d) PLA-style circuit with device-level duplication [84], [85].
also reducing unidirectional faults [71]. Another device-level
approach is quadruplication applied to combinatorial CMOS
gates both at the net (the p-stack and the n-stack are separately
quadruplicated as a whole) and at the transistor level (every tran-
sistor is quadruplicated preserving the interconnection topology
of the net) [72]. An alternate solution can be seen in Fig. 1(a).
It duplicates the n-stack and the p-stack and adds cross-coupled
transistors, achieving fault tolerance with marginal performance
degradation [73]. A low-power soft error suppression technique
for dynamic logic can be seen in Fig. 1(b). It adds pass transistor
device(s) as isolation and weak keeper(s) to standard domino
logic [74]. Optimizing the size of the keeper (layout-level) can
also help. As can be seen from these few examples, the solu-
tions developed are quite elaborate. Under switch-level redun-
dancy, we should also include active biasing and isolated well
transistors [75] [see Fig. 1(c)]. These prevent transients in com-
binational logic from reaching the output node. Such approaches
complement noise-immune designs like [76], [77], and can be
combined with hardwired voting [see Fig. 1(d)], e.g., based on
the classical output-wired-inverters idea [78].
In parallel, the analysis of the failure mechanisms of integrated circuits has led to the definition of layout rules that improve the testability of circuits [79]. Later, layout-level design-for-testability rules were used for avoiding some hard-to-detect faults or even undetectable faults [80]. Such layout rules include: redundancy of contacts, ring-shaped or closed-loop conductive layers, and duplication of interconnections and I/O conductive paths. These avoid some open faults, or reduce their appearance probability. Another layout technique, borrowed from analog designs, is high matching [Fig. 2(a)]. This has been used in a combined switch- and layout-level approach for enhancing the noise immunity of threshold logic gates [77]. Additionally, gate-level redundancy and shorted outputs
were suggested in [81]. Having their roots in hardwired voting
[78], other solutions have been detailed for CMOS [82] [see
Fig. 2(b)], as well as for SET [52], [83] [Fig. 2(c)], and very
recently for programmable logic-array (PLA) nanocircuits
[84], [85] [Fig. 2(d)]. These rely on built-in transistor-level
redundancy (e.g., [86] quadruplicates the transistors), with
redundant signals added on top. Such designs can be combined
at higher levels with system-level voting (like e.g., [51]), and/or
with error correcting codes [62].
Among the few other robust circuit and system design methodologies targeting nanodevices, we should mention here the use of threshold logic gates for evaluating an analog average [87], [88], while [81] advocates for real-time reconfigurable threshold elements (see also [89]).
Other novel process variation compensation techniques being designed and evaluated range from [90], which optimally adjusts the strength of the keepers for domino logic gates (based on an on-die leakage current sensor), to [54], which uses active body bias and transient noise attenuation via voltage division.
Fig. 3. Interconnect pattern for parallel adders. (a) BK. (b) HC. (c) KS. (d) Layout of a 64-bit KS adder.
All these solutions are highly innovative, but there is a clear
need for comparing such non-standard CMOS designs. Only a
few results based on Monte Carlo simulations for CMOS have
started to be reported (see e.g., [91]), while there are almost none
for beyond CMOS nanotechnologies [47], [48], [83].
We conclude this section by mentioning that a growing number of publications are dealing with such problems. This is encouraging, although there is a tendency to move too quickly to the higher levels [24], [26], [55]–[57], [92]. We believe that this should be done at a slower pace, as the expectation is that the highest reliability rewards will be at the lowest level, so one should first of all take advantage of (all of) the low-hanging fruit. The implication is that many versions of the same gate, having different reliability performances, should be designed and tested. These could lead to designs equivalent to multiple-V_th ones, which have long been advocated and used in the bid for low power. In the future, EDA tools might use such extended libraries of gates for optimizing circuit reliability, in the same way current EDA tools are using multiple-V_DD and multiple-V_th for optimizing power consumption.
IV. RELIABLE ADDERS
A. Theoretical Analysis
Binary addition has been studied extensively, starting with the classical (serial) ripple carry (RC) adder and going towards parallel implementations [93]–[97]. It is commonly accepted that RC is the slowest, while Kogge–Stone (KS) [94] is, theoretically, the fastest, but requires considerably more transistors (which translates into larger area and power, both dynamic due to longer wires, and leakage due to more transistors/gates). Still, only a few recent studies have analyzed the reliability of adders [98]–[108]. To get a clear understanding, four different adders will be analyzed in this section. The four adders under investigation are:
the classical RC;
Brent–Kung (BK) [96] [Fig. 3(a)];
Han–Carlson (HC) [97] [Fig. 3(b)];
KS [94] [Fig. 3(c)].
All four adders have been characterized by their number of layers, their number of nodes (i.e., blocks), their number of gates, and the length of their wires on the longest (critical) path (for details, see, e.g., [109]). The number of layers grows linearly with the number of input bits n for RC, while being logarithmic in n for BK, HC, and KS. The number of nodes grows linearly with n for RC and BK, while growing as n·log₂(n) for HC and KS; the number of gates follows the number of nodes (each node being a small, fixed collection of gates). Finally, the length of the wires on the critical path was estimated geometrically from the interconnect patterns (Fig. 3). The factor 2 used for multiplying (the number of) Layers when estimating the Length of BK, HC, and KS is conservative, and accounts for the height of the nodes and for the routing space between adjacent layers. For supporting this claim, compare the schematic of a 16-bit KS adder [Fig. 3(c)] with the layout of a 64-bit KS adder [Fig. 3(d)], which was drawn at the same vertical scale (other layouts can be found in [110]–[113]; see also [114]). Remark: All the estimates enumerated above have been refined using ceilings and floors (when appropriate), while for even more accurate estimates the interested reader should consult [115], [116].
The reliability of these four adders was quickly (but roughly) estimated as

P_fail = 1 - (1 - p_fail)^Gates    (1)

where p_fail is the gate failure probability, and Gates is the number of gates of the adder (given by the estimates detailed above). The results are plotted in Fig. 4, and support the intuition that a simpler structure is more reliable. The interested reader can find a few more simulation results for RC, BK, and KS in [99], [100], as well as some theoretical results (for RC) in [106]. Additionally, any redundancy scheme is much easier to integrate with RC than with a parallel adder [2], [117] (or, in general, with any locally connected architecture like: systolic arrays, cellular neural networks, cellular automata, etc.).
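To make the estimate (1) concrete, the following sketch evaluates it for the four adders; the gate-count expressions used here are standard prefix-adder cell counts and are only illustrative assumptions, not the exact refined expressions referred to above:

```python
# Sketch of the reliability estimate (1): P_fail = 1 - (1 - p_fail)**Gates.
# The gate counts below are illustrative (standard prefix-adder cell
# counts used as proxies), not the paper's exact fitted expressions.
import math

def gates(adder: str, n: int) -> int:
    log2n = int(math.log2(n))
    counts = {
        "RC": n,                      # one FA block per bit
        "BK": 2 * n - log2n - 2,      # Brent-Kung prefix cells
        "HC": (n // 2) * log2n + n,   # Han-Carlson (approximate)
        "KS": n * log2n - n + 1,      # Kogge-Stone prefix cells
    }
    return counts[adder]

def p_fail_adder(adder: str, n: int, p_fail_gate: float) -> float:
    return 1.0 - (1.0 - p_fail_gate) ** gates(adder, n)

# simpler structures (fewer gates) come out more reliable, as in Fig. 4
for name in ("RC", "BK", "HC", "KS"):
    print(name, round(p_fail_adder(name, 64, 1e-4), 4))
```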
B. Practical Multiplexed Adders
Multiplexing has been advocated as a powerful solution for enhancing reliability, unfortunately one that requires very large redundancy factors [50], [53], [118]. Fresh detailed analyses and exact simulations have revealed that the required redundancy factors are considerably smaller than those predicted by theory. A detailed performance evaluation of enhanced multiplexing (MUX) schemes has been reported in [51], [52], with simulation results detailed in [47], [83].

Fig. 4. Estimated probability of failure of different adders (RC, BK, HC, and KS) versus the number of input bits for a given gate failure probability p_fail.

These papers confirm that there is a maximum threshold for p_fail up to which MUX schemes improve on the reliability of the individual gates. For SET, the threshold values obtained from Monte Carlo simulations for NAND-2 MUX and for MAJ-3 MUX (when implemented using capacitive SET) are much higher than the theoretical predictions (0.0107 for NAND-2 and, respectively, 0.0197 for MAJ-3), and are also confirmed by recent PTM simulations [48]. For a clear understanding of these unexpected results, the interested reader is referred to [119] and [120]: the fact that each gate is made of unreliable devices is used in explaining (both theoretically and through exact simulations) the detailed behavior of the system. An obvious solution would be to increase the redundancy factor to R = 5, 7, 9, 11 (which will be briefly presented later), while even better approaches are possible.
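The existence of such a threshold can be illustrated with a toy fixed-point model of one vN-MUX stage (an idealized sketch assuming independent, symmetric gate failures; the numerical threshold it produces is not the exact 0.0107/0.0197 values quoted above):

```python
# Toy fixed-point view of MAJ-3 von Neumann multiplexing: each stage is
# an executive gate (inheriting the input bundle's error probability p
# and adding its own failure rate eps) followed by a restorative MAJ-3
# vote (also failing with eps). Iterating the map shows whether errors
# are damped (below some threshold eps) or amplified (above it).
def stage(p: float, eps: float) -> float:
    q = eps + (1 - 2 * eps) * p                # error after executive gate
    maj_wrong = 3 * q**2 * (1 - q) + q**3      # majority of 3 lines wrong
    return eps + (1 - 2 * eps) * maj_wrong     # vote, by a faulty MAJ-3

def steady_state_error(eps: float, stages: int = 500) -> float:
    p = 0.0
    for _ in range(stages):
        p = stage(p, eps)
    return p

for eps in (0.01, 0.05, 0.10):
    print(f"eps = {eps:.2f} -> steady-state error ~ {steady_state_error(eps):.3f}")
# below the threshold errors settle near eps; above it they grow towards 0.5
```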
Based on all of the above (and on the simulation results from [121]), we decided to focus our attention on MUX RC (see Fig. 5). In Fig. 5, the main block of the RC is the well-known full adder (FA). For a detailed PTM reliability analysis of four FAs see [49], [122]. The FA we will consider here is the standard CMOS implementation (Fig. 6 shows only the carry-out circuitry), but many other implementations are possible [49], [122] (with MAJ FAs more reliable than classical FAs). For enhancing reliability, a MUX RC with a redundancy factor (at the adder level) of R = 3 is used (3-RC). Fig. 5(a) presents a block diagram of the standard RC. A 3-RC [see Fig. 5(b)] has three FAs (squares) per stage as follows:
three FAs (used in parallel) represent the execution stage of a von Neumann MUX (vN-MUX);
three MAJ-3 gates (circles) represent the restorative stage of a vN-MUX (for the three carry outputs coming from the three FAs at bit position i);
the output of each of these three MAJ-3 gates (circles) is used to drive the next three FAs (i.e., they represent the three carry inputs at bit position i+1).
This is the standard MAJ-3 vN-MUX. Still, this basic idea can be implemented in several different ways, out of which

Fig. 5. (a) Classical RC adder where the square blocks represent FAs. Three different multiplexed 3-RC adders: (b) using MAJ-3 gates (circles); (c) short circuiting the outputs of three FAs and using three inverters (triangles) to recover the voltage; and (d) short circuiting the outputs of three FAs (the voltage is recovered by the next FAs).

three are detailed here, with each subsequent configuration being simpler than the previous one. The first of the three structures [Fig. 5(b)] properly implements three MAJ-3 gates for the restorative stage (represented as circles). This solution roughly doubles the delay and increases power significantly. The second solution [Fig. 5(c)] is simpler, as the outputs of the FAs are fed to restorative inverters (triangles). The MAJ-3 gates have now been replaced by output-wired-inverters (remember [78]). This solution will be faster, and will dissipate less than the previous one, as long as there are no faults/defects. In case of faults/defects, there will be fighting, which will increase the power consumption, while the inverters will try to restore the correct logic levels. The simplest structure [Fig. 5(d)] eliminates even the restorative inverters and relies upon the next (stage of) FAs for providing the needed signal restoration (see also [81]). The restorative MAJ-3 gates have now completely vanished. This solution will be the fastest, as long as there are no faults/defects. In case of faults/defects, the shorting of the outputs will result in fighting, increasing the current and the signal propagation delay, while the system could still operate correctly (this behavior is commonly known as graceful degradation). These three structures have been tested for stuck-at faults. This is a simplistic scenario, as in practice a fault/defect could manifest itself as an analog value (see [2], [46], [78], [88], [117], [121]).
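A purely behavioral (digital) sketch of the first structure [Fig. 5(b)] shows how per-stage MAJ-3 voting masks a stuck-at fault in one FA copy; the analog fighting of the short-circuited variants [Fig. 5(c) and (d)] is not modeled here:

```python
# Behavioral sketch of one multiplexed RC (3-RC) with per-stage MAJ-3
# voting [Fig. 5(b)]: three FA copies per bit position, each copy's carry
# restored by majority. A stuck-at fault is injected on one FA copy's
# carry output. The sum bits are also voted here for readout.
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def maj3(x: int, y: int, z: int) -> int:
    return (x & y) | (x & z) | (y & z)

def rc3_add(a: int, b: int, n: int, stuck: dict[tuple[int, int], int]) -> int:
    """n-bit 3-RC; stuck maps (bit, copy) -> forced carry value."""
    carries = [0, 0, 0]                     # one carry per redundant copy
    result = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        outs = []
        for copy in range(3):
            s, c = full_adder(ai, bi, carries[copy])
            c = stuck.get((i, copy), c)     # inject stuck-at fault, if any
            outs.append((s, c))
        # restorative stage: with fault-free voters, all three MAJ-3 gates
        # produce the same vote, which drives the next three FA copies
        voted = maj3(outs[0][1], outs[1][1], outs[2][1])
        carries = [voted, voted, voted]
        result |= maj3(outs[0][0], outs[1][0], outs[2][0]) << i
    return result | (maj3(*carries) << n)

# one FA copy's carry stuck at 0 in bit 2: the sum is still correct
print(rc3_add(13, 7, 8, stuck={(2, 1): 0}), 13 + 7)   # -> 20 20
```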
MUX using the short-circuited outputs of three FAs (MAJ-3 gates implemented as mirrored adders) is shown in Fig. 6, and corresponds to one stage of MAJ-3 vN-MUX. In principle, the failure of one transistor can make a circuit malfunction. This is not the case when redundancy is introduced. All the transistors on the schematic have been labeled to enable ease of tracking when analyzing defects. The schematic in Fig. 6, using a 90-nm CMOS process, has been used for simulations. The supply voltage was lowered down to V_DD = 275 mV. Output voltages, in millivolts, for all the eight possible input combinations are shown in Table I. The last column, labeled Defect(s), lists the transistors that were removed from the schematic (representing defects within the circuit). In this example, the only error appears for the input combination "001" when transistors P3, P8, and P13 are removed. This gives 15.04 mV at the output, while the output should be logic 1.
Fig. 6. Multiplexed MAJ-3 gates (mirrored adders) with short-circuited outputs.
TABLE I
OUTPUT OF THREE SHORT-CIRCUITED MINORITY (MIRRORED ADDER) GATES (SEE FIG. 6) IN 90-NM CMOS AT V_DD = 275 MV
Fig. 7. Estimated probability of failure of four different adders (RC, BK, HC, and KS) and the multiplexed RC (3-RC, 5-RC, 7-RC, 9-RC, and 11-RC) versus the
number of input bits for: (a) gate failure probability of 1%; (b) gate failure probability of 10%.
Fig. 7 compares the results of the MUXed RC (3-RC) adder with the standard RC, BK, HC, and KS adders (see Fig. 4). We have used p_fail = 0.01 in Fig. 7(a), and p_fail = 0.1 in Fig. 7(b) (i.e., 1% and respectively 10%). These two values have been used in (1) for RC, BK, HC, and KS. For 3-RC we have used the probability transfer matrix (PTM, see [123], [124]) to calculate exactly the reliability of one FA block. We have used PTM to evaluate the reliability of one MUXed block [48], [122], and have used 1 - (1 - P_block)^n (with P_block the probability of failure of one MUXed block, and n the number of bits) to estimate the probability of failure of the 3-RC as a whole. Using PTM (which gives exact results), as well as exhaustive counting [119] simultaneously, we have determined P_block for p_fail = 0.01 and for p_fail = 0.1 (see also [48], [49]). These simulation results show that implementing MUX at the smallest redundancy factor could still improve reliability. The case when p_fail = 0.01 (i.e., 1%) is presented in Fig. 7(a). For 16-bit adders, MUX was able to reduce the probability of failure of the RC adder from 0.35 to about 0.01 (3-RC), i.e., about 35 times. Except for 3-RC, none of the other adders would be able to operate correctly when p_fail = 0.01. When p_fail is increased to 0.1 (i.e., 10% errors), the simulation results from Fig. 7(b) show that not even 3-RC is good enough. This suggests that more redundancy is needed.

Fig. 8. Current (worst case) for MAJ-3 MUX RCs (90 nm at 275 mV): (a) using MAJ-3 gates [Fig. 5(b)]; (b) when short circuiting the outputs and using inverters [Fig. 5(c)]; (c) with short-circuited outputs [Fig. 5(d)]; and (d) when short circuiting the outputs and using inverters (70-nm BPTM [158] at 200 mV).

The normal solution would be to increase the redundancy factor R (long advocated by von Neumann [50]). Using exact counting
arguments [119], we have determined the threshold failure probabilities for the case of MAJ gates of fan-in 3, for redundancy factors R = 3, 5, 7, 9, and 11. These have been used to plot 3-RC, 5-RC, 7-RC, 9-RC, and respectively 11-RC in Fig. 7(b). It can be easily seen that 11-RC for p_fail = 0.1 [Fig. 7(b)] is about as reliable as 3-RC for p_fail = 0.01 [Fig. 7(a)]. This represents a sizeable increase of the redundancy factor (from R = 3 to R = 11). We advocate for a less costly solution, namely to keep R = 3 and use more reliable gates (like the ones presented in Section III) in the design of the elementary FAs. Such an approach would first of all reduce p_fail, and should make the combined approach (reliable gates by design and enhanced low-redundancy MUX) perform similarly to the 3-RC presented in Fig. 7(a). Such claims require massive Monte Carlo simulations for the MUX RC with hardened (reliability-enhanced) gates, which we have started to investigate.
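For readers unfamiliar with the PTM machinery used above, the following toy sketch illustrates the method on a single NAND-2 gate (assuming a symmetric output-flip error eps; the FA-level analyses of [48], [49], [122] are, of course, considerably larger):

```python
# Toy probability transfer matrix (PTM) computation [123], [124]: a
# gate's PTM has one row per input combination and one column per output
# value; a faulty gate is modeled as (1-eps)*ideal + eps*flipped.
# Composing PTMs (tensor products for parallel gates, matrix products
# for series) yields exact circuit error probabilities.
import numpy as np

def ptm_nand(eps: float) -> np.ndarray:
    # rows: inputs 00, 01, 10, 11; columns: output 0, output 1
    ideal = np.array([[0, 1], [0, 1], [0, 1], [1, 0]], float)
    return (1 - eps) * ideal + eps * (1 - ideal)

def error_prob(ptm: np.ndarray, ideal: np.ndarray, in_dist: np.ndarray) -> float:
    # probability that the faulty gate disagrees with the ideal one,
    # averaged over the input distribution
    return float(np.sum(in_dist[:, None] * ptm * (1 - ideal)))

eps = 0.01
uniform = np.full(4, 0.25)
print(error_prob(ptm_nand(eps), ptm_nand(0.0), uniform))  # = eps = 0.01
```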
The last two solutions [Fig. 5(c) and (d)] have one more advantage, namely the fact that an error will cause fighting. Apparently this would seem to be a disadvantage, as it is going to increase the current and the power consumption (also heat and temperature) when a fault or defect occurs. This is absolutely true, and simulations for two technology nodes can be seen in Fig. 8. These simulations show that each error (defect or fault) translates into an increase in current. In case of a transient fault, the current will increase but will return to its nominal value once the fault disappears. In case of a defect, the current will increase and remain high. This means that we have a quite simple way to detect errors: by monitoring the current [2], [90], [125] (or, equivalently but less precisely, heat). This will allow us to log faults, and also to identify the case when a fault becomes a defect (see the stair-stepping behavior of the current in three of the four simulations detailed in Fig. 8, when the number of faults increases). Knowing that the circuit can tolerate a few defects, we could set a certain threshold value for a current sensor (built-in I_DDQ testing). If the current becomes larger than a threshold, the sensing circuit will automatically send a request to the higher level. A local control scheme could automatically:
reconfigure the I/O connections;
power up a spare unit;
shut down the defective circuit (hot swap).
Such an approach combines several reliability schemes:
low area (i.e., small redundancy transistor-level schemes) highly reliable gates;
small redundancy factor gate-level designs;
automatic (current-based) detection at the circuit/block level; followed by
reconfiguration at the block/system level.
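The detection logic itself can be summarized behaviorally as follows (a sketch; all names, current levels, and the single fixed threshold are hypothetical placeholders for an actual built-in current sensor):

```python
# Behavioral sketch of the current-monitoring idea: each fault/defect
# adds a roughly fixed "fighting" current step. A transient fault's step
# vanishes; a defect's step persists. A simple threshold on the sampled
# supply current can thus log faults, flag defects, and request
# reconfiguration. All names and numbers here are hypothetical.
NOMINAL_UA = 10.0      # fault-free supply current, microamps
STEP_UA = 25.0         # extra current per fighting output (one error)

def classify(samples: list[float], threshold: float = NOMINAL_UA + STEP_UA / 2):
    events = []
    above = False
    for t, i in enumerate(samples):
        if i > threshold and not above:
            events.append((t, "fault detected"))
            above = True
        elif i <= threshold and above:
            events.append((t, "current back to nominal: transient fault"))
            above = False
    if above:
        events.append((len(samples), "current still high: defect -> reconfigure"))
    return events

trace = [10, 10, 36, 35, 10, 10, 36, 37, 36, 36]   # a transient, then a defect
for event in classify([float(x) for x in trace]):
    print(event)
```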
This would span not only all the design levels, but also four different schemes for enhancing reliability, leading to a self-healing system. Such an approach would be similar to the way the brain is minimizing energy. We should decrease the supply voltage till the circuit starts making errors, and continue decreasing the voltage a bit more. This will force the circuit to make errors (always), but the redundancy scheme used will take care of them. Fundamentally, as a redundant circuit can tolerate (a few) errors, it should try to make (a few) errors. Hence, one way for achieving minimum energy when using a redundant circuit is to make errors, and this is also how the system knows that it is running close to the minimum. One last remark is that such an approach is not at all straightforward when using the classical MAJ-3 vN-MUX presented in Fig. 5(b). This is because such a solution does not lead to fighting, so the current cannot be used (easily) for detecting faults and defects [see corresponding simulations in Fig. 8(a)]. This suggests that certain circuits (designs) are better fitted for local (automatic) current sensing techniques than others.
V. POWER-PERFORMANCE CONSIDERATIONS
Reliability by itself cannot be the one and only design goal,
and it should certainly support performance as represented by
the well-known speed-power or energy-delay tradeoffs. Many
results concerning adders have been reported over the years,
with recent ones including both simulations and measurements
[114]–[116], [126], [127]. On top of these, other results have
been reported using threshold logic instead of Boolean logic
[19], [128], [129], and even mixed Boolean-threshold logic so-
lutions have been advocated [130]. Originally, the speed was
the only goal, with speed-power and energy-delay optimizations
emerging later [116].
One approach in the bid to lower power dissipation is the reduction of V_DD, obviously the most effective way of reducing all the components of power (dynamic, static, and leakage) [131]. The aggressive scaling of V_DD to below V_th (known as subthreshold operation) has been known since 1972 and used in ultra-low-power designs [132]–[140]. The major disadvantage of subthreshold operation is its very slow speed. Therefore, subthreshold operation has been considered a poor approach where the much-needed speed is sacrificed for ultra-low power. This has drastically limited its application range. Still, while reducing V_DD might save the day for power consumption, it is detrimentally affecting reliability. That is because in subthreshold noise plays a significant role [24], [28], [141], let alone the higher sensitivity to variations [22], and the reduced noise margins. In spite of these, quite a few subthreshold results have been reported recently [142]–[147]. Falling in this trend, an RC and a KS operated in subthreshold have been compared
in [121]. The main conclusions were that:
the wires reduce the speed advantage of KS over RC from
4.5
to 2.2 (other results showing that wire delays in
parallel adders are non-negligible were presented in [116]);
the higher speed of KS at a given
can be matched by
an RC at a
which is only 10% to 20% higher;
at equal speeds the RC still maintains both its power and
energy advantages.
Obviously, wires are playing an important role in subthreshold, affecting the delay and influencing the dynamic power. The delay of the adders was evaluated taking into account both the number of gates on the longest path (Gates_LP, which grows logarithmically with the number of input bits for KS, HC, and BK, while growing linearly for RC) and the length of the wires on the longest path (Length)

Delay = Gates_LP + α · Length.    (2)

When α = 0, only the gates are contributing to the Delay, while the wires are not. By increasing α, the wires (Length) start affecting the Delay more and more.
For estimating Power, both the length on the longest path (Length) and the total number of gates (Gates) were considered. These account for part of the switching capacitance (Length), hence dynamic power is underestimated for the parallel adders, and for the leakage power (Gates). A factor b was used for specifying (indirectly) the ratio between dynamic and leakage power, leading to the following estimate:

Power = Length + b · Gates.    (3)
Spice simulation results (from [121]) were used to fine-tune the estimates for the Delay [given by (2)] and Power [given by (3)]. The values of α and b have been determined by fitting the simulation results obtained for RC and KS (equivalently, for a leakage power of about 33% of the dynamic power). Probably both α and b could be raised even higher (up to 0.50), while for power more accurate estimates are available (e.g., [115], [116], [148]), but these will not (substantially) change the trends.
Finally, the power-delay product (PDP) and the energy-delay product (EDP) have been estimated in a straightforward manner as

PDP = Power · Delay    (4)

EDP = PDP · Delay = Power · Delay².    (5)
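A compact way to reproduce the qualitative trends of Fig. 9 is to evaluate (2)–(5) directly (a sketch assuming the model forms given above, with illustrative depth/gate/length expressions and arbitrary α and b values, not the fitted ones):

```python
# Sketch of the first-order model (2)-(5), under the assumption that
# Delay = Gates_LP + alpha*Length and Power = Length + b*Gates (the
# forms implied by the surrounding text). The depth/gate/length
# expressions and the alpha, b values are illustrative only.
import math

def metrics(n: int, alpha: float, b: float, serial: bool):
    log2n = max(1, int(math.log2(n)))
    if serial:   # RC: linear depth, short local wires
        depth, gates, length = n, n, n
    else:        # KS-like: logarithmic depth, many cells, long wires
        depth, gates, length = log2n, n * log2n, 2 * log2n * n
    delay = depth + alpha * length        # (2)
    power = length + b * gates            # (3)
    return delay, power * delay, power * delay ** 2   # Delay, PDP (4), EDP (5)

for alpha, b in ((0.0, 0.0), (0.25, 0.33)):   # no wires/leakage vs. with both
    d_rc, pdp_rc, edp_rc = metrics(64, alpha, b, serial=True)
    d_ks, pdp_ks, edp_ks = metrics(64, alpha, b, serial=False)
    print(f"alpha={alpha}, b={b}: RC PDP/EDP = {pdp_rc:.0f}/{edp_rc:.0f}, "
          f"KS PDP/EDP = {pdp_ks:.0f}/{edp_ks:.0f}")
# with alpha = b = 0 the parallel adder wins on EDP; once wires and
# leakage are accounted for, the serial RC wins both PDP and EDP
```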
The results of these approximations can be seen in Fig. 9, where Delays, PDPs, and EDPs are shown for two cases:
without wires and with no leakage power (i.e., for α = 0 and b = 0), see Fig. 9(a)–(c);
with a delay and power comparable with that of low-voltage subthreshold operation (i.e., for the fitted, nonzero α and b), see Fig. 9(d)–(f).
Obviously, when scaling down CMOS, α will increase, so the Delay of the KS, HC, and BK adders will increase, but KS, HC, and BK are always going to be faster than RC (as long as α remains small enough). Still, the more interesting results are the ones showing PDPs and EDPs. When wires and leakage are being accounted for (i.e., for nonzero α and b), the RC always gets the best PDP [Fig. 9(e)], while achieving the best EDP for all but the smallest word lengths [Fig. 9(f)]. These results should get even better for practical implementations, as the power for KS, HC, and BK was underestimated, while the estimates for RC are quite accurate. Even more, these results will improve with scaling, as α and b should increase.
The plots in Fig. 9 support the claim that serial adders achieve better PDP and EDP than parallel adders in nanotechnologies, and bear more weight when one or more of the following are true:
Fig. 9. Estimates of the Delay [(a) and (d)], the PDP [(b) and (e)], and the EDP [(c) and (f)] for the four different adders analyzed (RC, BK, HC, and KS) without [(a), (b), and (c)] and with [(d), (e), and (f)] wire effects.
TABLE II
32-BIT RCA SIMULATIONS FOR V_DD VARYING FROM 100 MV TO 700 MV (70-NM BPTM)
the CMOS circuits are operated in subthreshold;
elementary devices have small gain (e.g., SET, molecular, DNA);
leakage power represents a significant part of the power consumption;
wires are introducing considerable delays.
Detailed Spice simulations for RC are shown in Fig. 10, with
data presented in Table II, while results for 3-RC are detailed in
Table III.
The four different adders we have analyzed show very different power-delay tradeoffs, both when working correctly and when faulty. The somewhat unexpected result is that the most reliable adder can also be a low-power one, but unfortunately it is not the fastest one. Still, it looks like the speed advantage will not be sustainable when gain will become very small (e.g., SET, molecular, DNA). Remark: Some improvements will still be possible by using carry-bypass adders and special layouts (e.g., 8-like snail geometries) which would allow for all the wires to be very short (somehow equivalent to the folding of the cortex).
Finally, subthreshold operation might become an interesting design approach, particularly because the operation speeds are improving as scaling proceeds towards smaller technology nodes [121], but also because it might allow for an easier hybridization with ultra-low-voltage technologies (e.g., SET, molecular, DNA). It is not difficult to envision a situation in which designs in older CMOS technology nodes operating at standard power supply voltages (nominal V_DD) would have comparable operation speeds to those in advanced technologies but operating in subthreshold. This would mean, for example, that a microprocessor or DSP designed to run at 1 GHz in
Fig. 10. Effect of varying V_DD (32-bit RC) on power, PDP, and EDP (90-nm top, and 70-nm [158] bottom). Optimum operating point is marked with a circle.
TABLE III
PERFORMANCES OF DIFFERENT MAJ-3 MUX (REDUNDANT) RC ADDERS (SEE FIG. 6)
0.18-µm (or 0.13-µm) CMOS at nominal V_DD could be redesigned to operate at 1 GHz in subthreshold in, say, 22 nm (or 16 nm). The main advantage would be a power reduction of one to two orders of magnitude. Even incorporating redundancy, such a solution (1 GHz, 22-nm, subthreshold) could still dissipate significantly less than the one we have started from (1 GHz, 0.18-µm, nominal V_DD), while also being highly reliable. The research path is being opened for portable applications that could enjoy highly extended operation times [131], and also towards low-power analog computations [117], [149]–[151].
VI. CONCLUSION
Although we have not discussed memory and communications, some would say that these will be non-volatile and probably electro-optical, and they might be right. Still, it is far from clear how future computations will be implemented. Irrespective of the technology, reliability will be a major concern. Based on the examples presented in this paper we should expect the following.
Serial solutions (or, more generally, locally connected ones like systolic arrays, cellular neural networks, cellular automata, etc.) seem to be a very good bet from the power-reliability perspective.
Reliable designs for implementing computations can be
low-power (e.g., [55], [57], [90], [152]–[156]). Possible
explanations can be found in neural computations and com-
munications [157], which suggests that unreliable devices
(i.e., neurons and synapses) can compute and communicate
reliably by cleverly combining redundancy and encoding,
while simultaneously minimizing energy. Additionally,
encoding could help both for reliable communications and
for reducing power.
For certain designs, defects and faults could manifest themselves as reasonably well-defined current steps. If this is the case, current sensors could automatically trigger reconfiguration at the higher levels [2].
Redundant designs should not be weighed with respect to (their) redundancy factors (as commonly done in the literature), but with respect to power, energy, and area, as is customary in the VLSI community. This also implies that
circuit optimization will have to be done in four dimensions:
delay, area, power, and reliability (for more details on these
lines see [15] and [17]).
Algorithms for quickly and accurately estimating the relia-
bility of large (complex) systems will have to be developed
[159].
Optimization will become even more difficult, as there are many more options when trading power and speed versus reliability.
Subthreshold operation might be a simple and practical way to test novel reliability ideas, as a subthreshold design in 65 nm might be as sensitive as a CMOS design in 22 nm at nominal V_DD, or a molecular, or a SET one, etc.
Finally, one more aspect that certainly deserves attention (and was not discussed here) is that of asynchronous or self-timed circuits. MAJ gates are well positioned here again [160], as a Muller C-element is nothing else but a MAJ gate with feedback. Such designs will be needed for two reasons: (i) the fact that delay variations will have to be compensated (and it might be cheaper to use self-timed circuits than hardwired redundancy, but as far as we know nobody seems to have a good/detailed answer for this tradeoff); and (ii) the fact that the power required for clock distribution is already prohibitive. It follows that many more items should be added to an already busy reliability-related research agenda.
R
EFERENCES
[1] V. Beiu et al., On the advantages of serial architectures for
low-power reliable computations, in Proc. Int. Conf. Appl.-Specific
Syst., Arch. Processors (ASAP’05), Samos, Greece, Jul. 2005, pp.
276281.
[2] V. Beiu et al., The vanishing majority gate: Trading power and
speed for reliability, in Proc. Defect Fault Tolerant Nanoscale Arch.
(NanoArch’05), Palm Springs, CA, May 2005 [Online]. Available:
www.eecs.wsu.edu/~vbeiu/Publications/2005NanoArch.pdf
[3] G. E. Moore, Cramming more components onto integrated circuits,
Electron. Mag., vol. 38, pp. 114117, Apr. 1965.
[4] International technology roadmap for semiconductors, ITRS, San Jose,
CA [Online]. Available: http://public.itrs.net
[5] K. F. Goser et al., Aspects of systems and circuits for nanoelectronics,
Proc. IEEE, vol. 85, pp. 558573, Apr. 1997.
[6] S. Borkar, Design challenges of technology scaling, IEEE Micro, vol.
19, no. 4, pp. 2329, Jul. 1999.
[7] T. Sakurai, "Design challenges for 0.1-µm and beyond," in Proc. Asia South Pacific Design Autom. Conf. (ASP-DAC'00), Tokyo, Japan, Jan. 2000, pp. 553–558.
[8] D. J. Frank et al., "Device scaling limits of Si MOSFETs and their application dependencies," Proc. IEEE, vol. 89, no. 3, pp. 259–288, Mar. 2001.
[9] R. W. Keyes, "Fundamental limits of silicon technology," Proc. IEEE, vol. 89, no. 3, pp. 227–239, Mar. 2001.
[10] R. E. Bryant et al., "Limitations and challenges for computer-aided design technology for CMOS VLSI," Proc. IEEE, vol. 89, no. 3, pp. 341–365, Mar. 2001.
[11] J. A. Davis et al., "Interconnect limits on gigascale integration (GSI) in the 21st century," Proc. IEEE, vol. 89, no. 3, pp. 305–324, Mar. 2001.
[12] Q. Chen and J. D. Meindl, "Nanoscale metal-oxide-semiconductor field-effect transistors: Scaling limits and opportunities," Nanotechnology, vol. 15, pp. S549–S555, Jul. 2004.
[13] M. Horowitz, E. Alon, D. Patil, S. Naffziger, R. Kumar, and K. Bernstein, "Scaling, power, and the future of CMOS," in Proc. Int. Electron Devices Meeting (IEDM'05), Washington, DC, Dec. 2005, pp. 7–15.
[14] V. Beiu et al., "On nanoelectronic architectural challenges and solutions," in Proc. IEEE Conf. Nanotech. (NANO'04), Munich, Germany, Aug. 2004, pp. 628–631.
[15] V. Beiu, "Limits, challenges, and issues in nanoscale and bio-inspired computing," in Bio-Inspired and Nano-Scale Integrated Computing, M. M. Eshaghian-Wilner, Ed. New York: Wiley, 2007/8.
[16] J. D. Meindl, Q. Chen, and J. A. Davis, "Limits on silicon nanoelectronics for terascale integration," Science, vol. 293, pp. 2044–2049, Sep. 2001.
[17] V. Beiu and U. Rückert, Eds., Emerging Brain Inspired Nano Architectures. Singapore: World Scientific Press, 2008.
[18] S. Tiwari et al., "Electronics at nanoscale: Fundamental and practical challenges, and emerging directions," in Proc. Int. Conf. Emerging Tech. Nanoelectr. (Nano'06), Singapore, Jan. 2006, pp. 481–486.
[19] M. H. Sulieman and V. Beiu, "Characterization of a 16-bit threshold logic single-electron adder," in Proc. Int. Symp. Circuits Syst. (ISCAS'04), Vancouver, BC, Canada, May 2004, pp. 681–684.
[20] C. Constantinescu, "Trends and challenges in VLSI circuit reliability," IEEE Micro, vol. 23, no. 4, pp. 14–19, Jul. 2003.
[21] S. K. Shukla et al., "Nano, quantum, and molecular computing: Are we ready for the validation and test challenges?," in Proc. 8th IEEE Int. High-Level Design Validation Test Workshop, San Francisco, CA, Nov. 2003, pp. 3–7.
[22] A. R. Brown, A. Asenov, and J. R. Watling, "Intrinsic fluctuations in sub 10-nm double-gate MOSFETs introduced by discreteness of charge and matter," IEEE Trans. Nanotechnol., vol. 1, pp. 195–200, Dec. 2002.
[23] S. Borkar, "Designing reliable systems from unreliable components: The challenges of transistor variability and degradation," IEEE Micro, vol. 25, no. 6, pp. 10–16, Nov.–Dec. 2005.
[24] V. Degalahal et al., "The effect of threshold voltages on the soft error rate," in Proc. Int. Symp. Quality Electron. Design (ISQED'04), San Jose, CA, Mar. 2004, pp. 503–508.
[25] M. Nicolaidis, "Design for soft error mitigation," IEEE Trans. Device Mater. Reliab., vol. 5, no. 3, pp. 405–418, Sep. 2005.
[26] K. Constantinides et al., "Assessing SEU vulnerability via circuit-level timing analysis," in Proc. Workshop Arch. Rel. (WAR-1), Nov. 2005.
[27] D. Rossi et al., "Multiple transient faults in logic: An issue for next generation ICs?," in Proc. 20th IEEE Int. Symp. Defect Fault Tolerance VLSI Syst. (DFT'05), Monterey, CA, Oct. 2005, pp. 352–360.
[28] P. Shivakumar et al., "Modeling the effect of technology trends on the soft error rate of combinational logic," in Proc. Int. Conf. Dependable Syst. Networks (DSN'02), Washington, DC, Jun. 2002, pp. 389–398.
[29] R. Ho, "On-chip wires: Scaling and efficiency," Ph.D. thesis, Elect. Eng. Dept., Stanford Univ., Stanford, CA, 2003 [Online]. Available: www-vlsi.stanford.edu/papers/rh_thesis.pdf
[30] W. Burleson and A. Maheshwari, VLSI Interconnects: A Design Perspective. San Francisco, CA: Elsevier/Morgan Kaufmann, 2007/8.
[31] J. A. Hutchby et al., "Extending the road beyond CMOS," IEEE Circuits Dev. Mag., vol. 18, no. 2, pp. 28–41, Mar. 2002.
[32] R. Waser, Ed., Nanoelectronics and Information Technology, 2nd ed. New York: Wiley, 2005.
[33] K. K. Likharev, "Single-electron devices and their applications," Proc. IEEE, vol. 87, no. 4, pp. 606–632, Apr. 1999.
[34] U. Feldkamp and C. M. Niemeyer, "Rational design of DNA nanoarchitectures," Angew. Chem. Int. Ed., vol. 45, pp. 1856–1876, Mar. 2006.
[35] C. Lin et al., "DNA tile based self-assembly: Building complex nanoarchitectures," ChemPhysChem, vol. 7, pp. 1641–1647, Aug. 2006.
[36] J. E. Green et al., "A 160-kilobit molecular electronic memory patterned at 10^11 bits per square centimeter," Nature, vol. 445, pp. 414–417, Jan. 2007.
[37] M. Forshaw et al., "A short review of nanoelectronic architectures," Nanotechnology, vol. 15, pp. S220–S223, Feb. 2004.
[38] M. Forshaw et al., "A review of the status of research and training into architectures for nanoelectronic and nanophotonic systems in the European research area," Univ. College London, London, U.K., Tech. Rep. FP6/2002/IST/1, Contract #507519, 2004.
[39] V. Beiu, "Neural inspired architectures for nanoelectronics: Highly reliable, ultra-low-power, reconfigurable, asynchronous," in Proc. Neural Inf. Process. Syst. (NIPS'03), Whistler, Canada, Dec. 2003 [Online]. Available: www.eecs.wsu.edu/~vbeiu/workshop_nips03
[40] J. R. Heath et al., "A defect-tolerant computer architecture: Opportunities for nanotechnology," Science, vol. 280, pp. 1716–1721, Jun. 1998.
[41] K. Nikolic, A. S. Sadek, and M. Forshaw, "Architectures for reliable computing with unreliable nanodevices," in Proc. IEEE Conf. Nanotech. (NANO'01), Maui, HI, Oct. 2001, pp. 254–259.
[42] K. Nikolic, A. S. Sadek, and M. Forshaw, "Fault-tolerant techniques for nanocomputers," Nanotechnology, vol. 13, pp. 357–362, May 2002.
[43] J. A. B. Fortes, "Future challenges in VLSI system design," in Proc. IEEE Comput. Soc. Annu. Symp. VLSI (ISVLSI'03), Tampa, FL, Feb. 2003, pp. 5–7.
[44] T. Lehtonen, J. Plosila, and J. Isoaho, "On fault tolerance techniques towards nanoscale circuits and systems," Dept. of Inf. Technol., Univ. of Turku, Turku, Finland, Tech. Rep., 2005.
[45] J. E. Harlow III, "Toward design technology in 2020: Trends, issues, and challenges," in Proc. Int. Symp. VLSI (ISVLSI'03), Tampa, FL, Feb. 2003, pp. 3–4.
[46] M. T. Niemier, M. Crocker, X. Sharon Hu, and M. Lieberman, "Using CAD to shape experiments in molecular QCA," in Proc. Int. Conf. Comp.-Aided Design (ICCAD'06), San Jose, CA, Nov. 2006, pp. 907–914.
[47] V. Beiu and M. H. Sulieman, "On practical multiplexing issues," in Proc. IEEE Conf. Nanotech. (NANO'06), Cincinnati, OH, Jul. 2006, pp. 310–313.
[48] V. Beiu et al., "Gate failures effectively shape multiplexing," in Proc. 21st IEEE Int. Symp. Defect Fault Tolerance VLSI Syst. (DFT'06), Arlington, VA, Oct. 2006, pp. 29–40.
[49] W. Ibrahim, V. Beiu, and Y. A. Alkhawwar, "On the reliability of four full adder cells," in Proc. Int. Design Test Workshop (IDT'06), Dubai, U.A.E., Nov. 2006.
[50] J. von Neumann, "Probabilistic logics and the synthesis of reliable organisms from unreliable components," in Automata Studies, C. E. Shannon and J. McCarthy, Eds. Princeton, NJ: Princeton Univ. Press, 1956, pp. 43–98.
[51] S. Roy and V. Beiu, "Multiplexing schemes for cost-effective fault-tolerance," in Proc. 4th IEEE Conf. Nanotech. (NANO'04), Munich, Germany, Aug. 2004, pp. 589–592.
[52] S. Roy and V. Beiu, "Majority multiplexing: Economical redundant fault-tolerant designs for nanoarchitectures," IEEE Trans. Nanotechnol., vol. 4, no. 4, pp. 441–451, Jul. 2005.
[53] A. S. Sadek, K. Nikolic, and M. Forshaw, "Parallel information and computation with restitution for noise-tolerant nanoscale logic networks," Nanotechnology, vol. 15, pp. 192–210, Jan. 2004.
[54] M. Zhang and N. R. Shanbhag, "A CMOS design style for logic circuit hardening," in Proc. Int. Reliab. Phys. Symp. (IRPS'05), San Jose, CA, Apr. 2005, pp. 223–229.
[55] K. Constantinides et al., "BulletProof: A defect-tolerant CMP switch architecture," in Proc. Int. Symp. High-Perf. Comp. Arch. (HPCA'06), Austin, TX, Feb. 2006, pp. 5–16.
[56] S. Mitra et al., "Soft error resilient system design through error correction," in Proc. IFIP Int. Conf. Very Large Scale Integration, Nice, France, Oct. 2006, pp. 332–337.
[57] S. Shyam et al., "Ultra low-cost defect protection for microprocessor pipelines," in Proc. Int. Conf. Arch. Support Prog. Lang. Op. Sys. (ASPLOS'06), San Jose, CA, Oct. 2006, pp. 73–82.
[58] A. J. KleinOsowski and D. J. Lilja, "The NanoBox project: Exploring fabrics of self-correcting logic blocks for high defect rate molecular device technologies," in Proc. IEEE Comput. Soc. Annu. Symp. VLSI (ISVLSI'04), Lafayette, LA, Feb. 2004, pp. 19–24.
[59] A. J. KleinOsowski et al., "The recursive NanoBox processor grid: A reliable system architecture for unreliable nanotechnology devices," in Proc. Int. Conf. Dependable Syst. Networks (DSN'04), Florence, Italy, Jun. 2004, pp. 167–176.
[60] V. Beiu, "The quest for practical redundant computations," in Proc. Int. Conf. Microelectr. (ICM'05), Islamabad, Pakistan, Dec. 2005, pp. xi–xxix.
[61] V. P. Roychowdhury, D. B. Janes, and S. Bandyopadhyay, "Nanoelectronic architectures for Boolean logic," Proc. IEEE, vol. 85, no. 4, pp. 574–588, Apr. 1997.
[62] R. Reischuk and B. Schmeltz, "Area efficient methods to increase the reliability of combinatorial circuits," in Proc. Int. Symp. Theor. Aspects Comp. Sci. (STACS'89), Paderborn, Germany, Feb. 1989, pp. 314–326.
[63] E. F. Moore and C. E. Shannon, "Reliable circuits using less reliable relays," J. Franklin Inst., vol. 262, pp. 191–208, Sep.–Oct. 1956.
[64] S. Winograd and J. D. Cowan, Reliable Computation in the Presence of Noise. Cambridge, MA: MIT Press, 1963.
[65] N. J. Nilsson, Learning Machines. New York: McGraw-Hill, 1965.
[66] B. Widrow and M. E. Hoff, "Adaptive switching circuits," IRE WESCON Conv. Rec., vol. 4, pp. 96–104, Aug. 1960.
[67] C. Coates and P. Lewis, "DONUT: A threshold gate computer," IEEE Trans. Electron. Comput., vol. EC-13, pp. 240–247, Jun. 1964.
[68] N. P. Brousentsov, "Computing machine Setun of Moscow State University," New Develop. Comp. Tech., pp. 226–234, 1960.
[69] N. P. Brousentsov, "Threshold realization of three-valued logic on electromagnetic elements," Comp. Probls. Cyber., vol. 9, pp. 3–35, 1972.
[70] H. L. Hughes and J. M. Benedetto, "Radiation effects and hardening of MOS technology: Devices and circuits," IEEE Trans. Nucl. Sci., vol. 50, pp. 500–521, Jun. 2003.
[71] C. Bolchini et al., "A CMOS fault tolerant architecture for switch-level faults," in Proc. IEEE Int. Workshop Defect Fault Tolerance VLSI Syst., Montreal, QC, Canada, Oct. 1994, pp. 10–18.
[72] C. Bolchini et al., "Static redundancy techniques for CMOS gates," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS'96), Atlanta, GA, May 1996, vol. 4, pp. 576–579.
[73] C. Bolchini et al., "An improved fault tolerant architecture at CMOS level," in Proc. Int. Symp. Circuits Syst. (ISCAS'97), Kowloon, Hong Kong, Jun. 1997, pp. 2737–2740.
[74] J. Kumar and M. B. Tahoori, "A low power soft error suppression technique for dynamic logic," in Proc. 20th IEEE Int. Symp. Defect Fault Tolerance VLSI Syst. (DFT'05), Monterey, CA, Oct. 2005, pp. 454–462.
[75] M. P. Baze, S. P. Buchner, and D. McMorrow, "A digital CMOS design technique for SEU hardening," IEEE Trans. Nucl. Sci., vol. 47, no. 6, pp. 2603–2608, Dec. 2000.
[76] V. Beiu, "Ultra-fast noise immune CMOS threshold gates," in Proc. Int. Midwest Symp. Circuits Syst. (MWSCAS'00), Lansing, MI, Aug. 2000, pp. 1310–1313.
[77] S. Tatapudi and V. Beiu, "Split-precharge differential noise-immune threshold logic gate (SPD-NTL)," in Proc. Int. Work-Conf. Artif. Neural Networks (IWANN'03), Menorca, Spain, Jun. 2003, pp. 49–56.
[78] J. B. Lerch, "Threshold gate circuits employing field-effect transistors," U.S. Patent 3715603, Feb. 6, 1973.
[79] J. Galiay, Y. Crouzet, and M. Vergniault, "Physical versus logical fault models in MOS LSI circuits: Impact on their testability," IEEE Trans. Comput., vol. C-29, no. 6, pp. 527–531, Jun. 1980.
[80] F. C. Blom et al., "Layout level design for testability strategy applied to a CMOS cell library," in Proc. Int. Workshop Defect Fault Tolerance VLSI Syst. (DFT'93), Venice, Italy, Oct. 1993, pp. 199–206.
[81] S. Aunet and M. Hartmann, "Real-time reconfigurable linear threshold elements and some applications to neural hardware," in Proc. Int. Conf. Evolvable Syst. (ICES'03), Trondheim, Norway, Mar. 2003, pp. 365–376.
[82] S. Aunet, Y. Berg, and V. Beiu, "Ultra-low-power redundant logic based on majority-3 gates," in Proc. IFIP Conf. VLSI Syst.-on-Chip (VLSI-SoC'05), Perth, Australia, Oct. 2005, pp. 553–558.
[83] M. H. Sulieman and V. Beiu, "Design and analysis of SET circuits: Using MATLAB modules and SIMON," in Proc. IEEE Conf. Nanotech. (NANO'04), Munich, Germany, Aug. 2004, pp. 618–621.
[84] C. A. Moritz and T. Wang, "Towards defect-tolerant nanoscale architectures," in Proc. IEEE Conf. Nanotech. (NANO'06), Cincinnati, OH, Jul. 2006, pp. 331–334.
[85] T. Wang et al., "Combining circuit level and system level techniques for defect-tolerant architectures," in Proc. Int. Workshop Defect Fault Tolerant Nanoscale Arch. (NanoArch'06), Boston, MA, Jun. 2006.
[86] L. Anghel and M. Nicolaidis, "Defects tolerant logic gates for unreliable future nanotechnologies," in Proc. Int. Work-Conf. Artif. Neural Networks (IWANN'07), San Sebastián, Spain, Jun. 2007, pp. 422–429.
[87] A. Schmid and Y. Leblebici, "Robust circuit and system design methodologies for nanometer-scale devices and single-electron transistors," in Proc. IEEE Conf. Nanotech. (NANO'03), San Francisco, CA, Aug. 2003, vol. 2, pp. 516–519.
[88] A. Schmid and Y. Leblebici, "Robust circuit and system design methodologies for nanometer-scale devices and single-electron transistors," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 11, pp. 1156–1166, Nov. 2004.
[89] S. Aunet and V. Beiu, "Ultra-low-power fault tolerant neural inspired CMOS logic," in Proc. Int. Joint Conf. Neural Networks (IJCNN'05), Montréal, Canada, Aug. 2005, pp. 2843–2848.
[90] C. H. Kim et al., "A process variation compensation technique with an on-die leakage current sensor for nanometer scale dynamic circuits," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 6, pp. 646–649, Jun. 2006.
[91] K. Granhaug and S. Aunet, "Improving yield and defect tolerance in multifunction subthreshold CMOS gates," in Proc. Int. Symp. Defect Fault Tolerance VLSI Syst. (DFT'06), Arlington, VA, Oct. 2006, pp. 20–28.
[92] Y. Cao et al., "Yield optimization with energy-delay constraints in low-power digital circuits," in Proc. Int. Conf. Electron Dev. Solid-State Circuits, Kowloon, Hong Kong, Dec. 2003, pp. 285–288.
[93] A. Weinberger and J. L. Smith, "A logic for high-speed addition," Natl. Bur. Stand. Circular 591, pp. 3–12, 1958.
[94] P. M. Kogge and H. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations," IEEE Trans. Comput., vol. C-22, no. 8, pp. 786–793, Aug. 1973.
[95] R. E. Ladner and M. J. Fischer, "Parallel prefix computation," J. ACM, vol. 27, pp. 831–838, Oct. 1980.
[96] R. P. Brent and H. T. Kung, "A regular layout for parallel adders," IEEE Trans. Comput., vol. C-31, pp. 260–264, Mar. 1982.
[97] T. Han and D. A. Carlson, "Fast area-efficient VLSI adders," in Proc. Int. Symp. Comp. Arithmetic (ARITH'87), Como, Italy, May 1987, pp. 49–56.
[98] M. Nicolaidis, "Carry checking/parity prediction adders and ALUs," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 11, pp. 121–128, Feb. 2003.
[99] R. Ramanarayanan et al., "Soft errors in adder circuits," in Proc. Mil. Aerosp. Appls. Prog. Logic Devs. Tech. (MAPLD'04), Washington, DC, Sep. 2004 [Online]. Available: klabs.org/mapld04/abstracts/ramanarayanan_a.pdf
[100] S. Tosun et al., "An ILP formulation for reliability-oriented high-level synthesis," in Proc. 6th Int. Symp. Quality Electron. Design (ISQED'05), San Jose, CA, Mar. 2005, pp. 364–369.
[101] D. B. Strukov and K. K. Likharev, "CMOL FPGA: A reconfigurable architecture for hybrid digital circuits with two-terminal nanodevices," Nanotechnology, vol. 16, pp. 888–900, Jun. 2005.
[102] S. Peng and R. Manohar, "Fault tolerant asynchronous adder through dynamic self-reconfiguration," in Proc. Int. Conf. Comp. Design: VLSI Comp. (ICCD'05), San Jose, CA, Oct. 2005, pp. 171–178.
[103] T. Hogg and G. S. Snider, "Defect-tolerant adder circuits with nanoscale crossbars," IEEE Trans. Nanotechnol., vol. 5, no. 2, pp. 97–100, Mar. 2006.
[104] W. Rao, A. Orailoglu, and R. Karri, "Fault identification in reconfigurable carry lookahead adders targeting nanoelectronic fabrics," in Proc. Eur. Test Symp. (ETS'06), Southampton, U.K., May 2006, pp. 63–68.
[105] F. Worm, P. Thiran, and P. Ienne, "Designing robust checkers in the presence of massive timing errors," in Proc. 12th IEEE Int. On-Line Testing Symp. (IOLTS'06), Como, Italy, Jul. 2006, pp. 281–286.
[106] J. P. Hayes, I. Polian, and B. Becker, "A model for transient faults in logic circuits," in Proc. Int. Design Test Workshop (IDT'06), Dubai, U.A.E., Nov. 2006.
[107] T. Hogg and G. S. Snider, "Defect-tolerant logic with nanoscale crossbar circuits," J. Electr. Testing: Theor. Appls., vol. 23, pp. 117–129, Jun. 2007.
[108] D. Patil et al., "Robust energy-efficient adder topologies," in Proc. Int. Symp. Comp. Arith. (ARITH-18), Montpellier, France, Jun. 2007, pp. 16–28.
[109] R. Zimmermann, "Binary adder architectures for cell-based VLSI and their synthesis," Ph.D. thesis, Swiss Federal Inst. Technol., Zurich, Switzerland, 1997.
[110] Y. Shimazaki, R. Zlatanovici, and B. Nikolic, "A shared-well dual-supply-voltage 64-bit ALU," in Proc. Int. Solid-State Circuits Conf. (ISSCC'03), San Francisco, CA, Feb. 2003, pp. 104–105.
[111] Q.-W. Kuo, V. Sharma, and C. C.-P. Chen, "Substrate-bias optimized 0.18-µm 2.5-GHz 32-bit adder with post-manufacture tunable clock," in Proc. Int. Symp. VLSI Design Autom. Test (DAT'05), Hsinchu, Taiwan, R.O.C., Apr. 2005, pp. 341–344.
[112] G. Yang et al., "A 32-bit carry lookahead adder using dual-path all-N logic," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 8, pp. 992–996, Aug. 2005.
[113] A. Glodovsky et al., "A folded 32-bit prefix tree adder in 0.16-µm static CMOS," in Proc. Midwest Symp. Circuits Syst. (MWSCAS'00), Lansing, MI, Aug. 2000, pp. 268–373.
[114] M. Ziegler and M. R. Stan, "Optimal logarithmic adder structures with a fan-out of two for minimizing the area-delay product," in Proc. Int. Symp. Circuits Syst. (ISCAS'01), Sydney, Australia, May 2001, pp. 657–660.
[115] R. A. Freking and K. K. Parhi, "Theoretical estimation of power consumption in binary adders," in Proc. Int. Symp. Circuits Syst. (ISCAS'98), Monterey, CA, Jun. 1998, pp. 453–457.
[116] V. G. Oklobdzija and R. Krishnamurthy, "Design of power efficient VLSI arithmetic: Speed and power tradeoffs," in Proc. Int. Symp. Comp. Arith. (ARITH'03), Santiago de Compostela, Spain, Jun. 2003 [Online]. Available: www.acsel-lab.com/Presentations/ARITH-Tutorial-Vojin.pps
[117] V. Beiu, "A novel highly reliable low-power nano architecture: When von Neumann augments Kolmogorov," in Proc. Int. Conf. Application-Specific Syst., Arch. Processors (ASAP'04), Galveston, TX, Sep. 2004.
[118] M. Forshaw, K. Nikolic, and A. S. Sadek, "ANSWERS: Autonomous nanoelectronic systems with extended replication and signaling," Univ. College London, London, U.K., MEL-ARI #28667, 3rd Year Report, 2001.
[119] V. Beiu, W. Ibrahim, and S. Lazarova-Molnar, "What von Neumann did not say about multiplexing: The gory details," in Proc. Int. Work-Conf. Artif. Neural Networks (IWANN'07), San Sebastián, Spain, Jun. 2007, pp. 487–496.
[120] V. Beiu, W. Ibrahim, and S. Lazarova-Molnar, "A fresh look at majority multiplexing: When devices get into the picture," in Proc. IEEE Int. Conf. Nanotech. (NANO'07), Hong Kong, Aug. 2007, pp. 883–888.
[121] V. Beiu, A. Djupdal, and S. Aunet, "Ultra-low-power neural inspired addition: When serial might outperform parallel architectures," in Proc. Int. Work-Conf. Artif. Neural Networks (IWANN'05), Barcelona, Spain, Jun. 2005, pp. 486–493.
[122] W. Ibrahim, V. Beiu, and M. H. Sulieman, "On the reliability of majority gates full adders," IEEE Trans. Nanotechnol., accepted for publication.
[123] K. N. Patel, I. L. Markov, and J. P. Hayes, "Evaluating circuit reliability under probabilistic gate-level fault models," in Proc. Int. Workshop Logic Synthesis (IWLS'03), Laguna Beach, CA, May 2003, pp. 59–64.
[124] S. Krishnaswamy et al., "Accurate reliability evaluation and enhancements via probabilistic transfer matrices," in Proc. Design Autom. Test Eur. (DATE'05), Munich, Germany, Mar. 2005, pp. 282–287.
[125] D. F. Hepner and A. D. Walls, "Predictive failure analysis and failure isolation using current sensing," U.S. Patent 7003409, Feb. 21, 2006.
[126] D. M. Markovic, "A power/area optimal approach to VLSI signal processing," Univ. California, Berkeley, CA, Tech. Rep. UCB/EECS-2006-65, 2006.
[127] S. Kao, R. Zlatanovici, and B. Nikolic, "A 240-ps 64-b carry-lookahead adder in 90-nm CMOS," in Proc. Int. Solid-State Circuits Conf. (ISSCC'06), San Francisco, CA, Feb. 2006, pp. 1735–1744.
[128] V. Beiu, "Constructive threshold logic addition: A synopsis of the last decade," in Proc. Int. Conf. Artif. Neural Networks (ICANN'03), Istanbul, Turkey, Jul. 2003, pp. 745–752.
[129] V. Beiu, J. M. Quintana, and M. J. Avedillo, "VLSI implementations of threshold logic: A comprehensive survey," IEEE Trans. Neural Netw., vol. 14, no. 5, pp. 1217–1243, Sep. 2003.
[130] P. Celinski, S. F. Al-Sarawi, and D. Abbott, "Logical effort based design exploration of 64-bit adders using a mixed dynamic-CMOS/threshold-logic approach," in Proc. IEEE Comput. Soc. Annu. Symp. VLSI (ISVLSI'04), Lafayette, LA, Feb. 2004, pp. 127–132.
[131] V. Beiu et al., "Femto joule switching for nano electronics," in Proc. ACS/IEEE Int. Conf. Comp. Sys. Appls. (AICCSA'06), Sharjah, U.A.E., Mar. 2006, pp. 415–423.
[132] R. M. Swanson and J. D. Meindl, "Ion-implanted complementary MOS transistors in low-voltage circuits," IEEE J. Solid-State Circuits, vol. SC-7, no. 2, pp. 146–153, Apr. 1972.
[133] E. A. Vittoz and J. Fellrath, "CMOS analog integrated circuits based on weak inversion operation," IEEE J. Solid-State Circuits, vol. SC-12, no. 3, pp. 224–231, Jun. 1977.
[134] C. A. Mead, "Neuromorphic electronic systems," Proc. IEEE, vol. 78, no. 10, pp. 1629–1636, Oct. 1990.
[135] E. A. Vittoz, "Very low power circuit design: Fundamentals and limits," in Proc. Int. Symp. Circuits Syst. (ISCAS'93), Chicago, IL, May 1993, pp. 1439–1442.
[136] E. A. Vittoz, "Low-power design: Ways to approach the limits," in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC'94), San Francisco, CA, Feb. 1994, pp. 14–18.
[137] G. Schrom and S. Selberherr, "Ultra-low-power CMOS technologies," in Proc. Int. Semicond. Conf. (CAS'96), Sinaia, Romania, Oct. 1996, vol. 1, pp. 237–246.
[138] J. B. Burr and J. Shott, "A 200 mV self-testing encoder/decoder using Stanford ultra-low-power CMOS," in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC'94), San Francisco, CA, Feb. 1994, pp. 84–85.
[139] T. S. Lande et al., "FLOGIC: Floating-gate logic for low-power operation," in Proc. 3rd Int. Conf. Electronics, Circuits, Syst. (ICECS'96), Rodos, Greece, Oct. 1996, vol. 2, pp. 1041–1044.
[140] C. H. Kim, H. Soeleman, and K. Roy, "Ultra-low-power DLMS adaptive filter for hearing aid applications," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 11, no. 6, pp. 1058–1067, Dec. 2003.
[141] P. Shivakumar et al., "Exploiting microarchitectural redundancy for defect tolerance," in Proc. 21st Int. Conf. Comput. Design, San Jose, CA, Oct. 2003, pp. 481–488.
[142] S. Aunet et al., "Reconfigurable subthreshold CMOS perceptron," in Proc. Int. Joint Conf. Neural Networks (IJCNN'04), Budapest, Hungary, Jul. 2004, pp. 1983–1988.
[143] B. H. Calhoun, A. Wang, and A. Chandrakasan, "Device sizing for minimum energy operation in subthreshold circuits," in Proc. Custom IC Conf. (CICC'04), Orlando, FL, Oct. 2004, pp. 95–98.
[144] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-power CMOS digital design," IEEE J. Solid-State Circuits, vol. 27, no. 4, pp. 473–484, Apr. 1992.
[145] A. Wang, A. P. Chandrakasan, and S. Kosonocky, "Optimal supply and threshold scaling for subthreshold CMOS circuits," in Proc. Annu. Symp. VLSI (ISVLSI'02), Pittsburgh, PA, Apr. 2002, pp. 5–9.
[146] D. D. Wentzloff et al., "Design considerations for next generation wireless power-aware microsensor nodes," in Proc. 17th Int. Conf. VLSI Design (VLSID'04), Mumbai, India, Jan. 2004, pp. 361–367.
[147] H. Soeleman, K. Roy, and B. Paul, "Robust subthreshold logic for ultra-low power operation," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 9, no. 1, pp. 90–99, Feb. 2001.
[148] K. Johansson, O. Gustafsson, and L. Wanhammar, "Power estimation for ripple-carry adders with correlated input data," in Proc. Int. Conf. IC Sys. Design (ICSD'04), Toulouse, France, Jul. 2004, pp. 662–674.
[149] G. E. R. Cowan, R. Melville, and Y. Tsividis, "A VLSI analog computer/digital computer accelerator," IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 42–53, Jan. 2006.
[150] G. E. R. Cowan, "A VLSI analog computer/math co-processor for a digital computer," Ph.D. thesis, Columbia Univ., New York, 2005 [Online]. Available: digitalcommons.libraries.columbia.edu/dissertations/AAI3174769
[151] R. Sarpeshkar, "Brain power: Borrowing from biology makes for low-power computing," IEEE Spectr., vol. 43, no. 5, pp. 24–29, May 2006.
[152] J. Gambles et al., "An ultra-low-power, radiation-tolerant Reed-Solomon encoder for space applications," in Proc. Custom IC Conf. (CICC'03), San Jose, CA, Sep. 2003, pp. 631–634.
[153] J. Donald and M. Martonosi, "Power efficiency for variation-tolerant multicore processors," in Proc. Int. Symp. Low Power Electron. Design (ISLPED'06), Tegernsee, Germany, Oct. 2006, pp. 304–309.
[154] A. Datta et al., "Delay modeling and statistical design of pipelined circuit under process variation," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 25, no. 11, pp. 2427–2436, Nov. 2006.
[155] J. K. McIver III and L. T. Clark, "Reducing radiation-hardened digital circuit power consumption," IEEE Trans. Nucl. Sci., vol. 52, no. 6, pp. 2503–2509, Dec. 2005.
[156] K. Ishibashi et al., "Low-voltage and low-power logic, memory, and analog circuit techniques for SoCs using 90-nm technology and beyond," IEICE Trans. Electron., vol. E89-C, no. 3, pp. 250–262, Mar. 2006.
[157] W. B. Levy and R. A. Baxter, "Energy-efficient neuronal computation via quantal synaptic failures," J. Neurosci., vol. 22, pp. 4746–4755, Jun. 2002.
[158] Y. Cao et al., "New paradigm of predictive MOSFET and interconnect modeling for early circuit design," in Proc. Int. Custom IC Conf. (CICC'00), Orlando, FL, May 2000, pp. 201–204.
[159] S. Lazarova-Molnar, V. Beiu, and W. Ibrahim, "Reliability: The fourth optimization pillar of nanoelectronics," in Proc. Int. Conf. Signal Proc. Comm. (ICSPC'07), Dubai, U.A.E., Nov. 2007.
[160] V. Beiu, "On brain, yield, energy, and delays: There are plenty of opportunities at the top," presented at Semiconductor Research Corp., Research Triangle Park, NC, Mar. 22, 2007.
Valeriu Beiu (S'92–M'95–SM'96) received the M.Sc. degree in computer engineering from the Politehnica University of Bucharest, Bucharest, Romania, in 1980, and the Ph.D. degree (summa cum laude) in electrical engineering from the Katholieke Universiteit Leuven, Leuven, Belgium, in 1994.
After graduation, he worked for two years with the Research Institute for Computer Techniques, Bucharest, Romania, focusing on high-speed CPUs and FPUs. He then rejoined the Politehnica University of Bucharest. Since 1991, he has been on leave of absence as a visiting researcher with the Katholieke Universiteit Leuven, Leuven, Belgium (1991–1994), King's College London, London, U.K. (1994–1996), and Los Alamos National Laboratory, Los Alamos, NM (1996–1998). In 1998, he co-founded RN2R, Dallas, TX, a VLSI IP startup company, and was its Chief Technical Officer (1998–2001). In 2001, he joined the School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, and in 2005, he became Visiting Professor with the School of Computing and Intelligent Systems, University of Ulster, Londonderry, U.K. Since 2006, he has been the Associate Dean for Research, College of Information Technology, United Arab Emirates University, Al Ain, U.A.E. He was the principal investigator (PI) of over 40 research contracts totaling over US$6M. He holds 11 patents, has received over 40 grants, given over 120 invited talks, and authored over 160 technical papers in refereed journals and international conferences, receiving six Best Paper Awards.
Dr. Beiu has received five fellowships, including a Fulbright Fellowship (1991), a Human Capital and Mobility Fellowship (1994–1996) with King's College London (Programmable Neural Arrays project), a Director's Funded Postdoctoral Fellowship (1996–1998) with Los Alamos National Laboratory (Field Programmable Neural Arrays project, under the Deployable Adaptive Processing Systems initiative), and a Fellowship of Rose Research (1999–2001). He has authored 12 book chapters (six of them invited) and is working on three forthcoming books, one on emerging brain-inspired nano-architectures and another on the VLSI complexity of discrete neural networks. His main research interests are VLSI-efficient designs (i.e., very low-power and highly reliable) and emerging nanoarchitectures (massively parallel, communication starved, adaptive/reconfigurable, regular, fault-tolerant), as well as their optimized designs (inspired by systolic arrays, artificial neural networks, and hybrid combinations of these); he is the founder of the Centers for Neural Inspired Nano Architectures (www.cnina.org). Dr. Beiu is a founding member of the European Neural Network Society (ENNS, since 1991), and a member of the International Neural Network Society (INNS), the Association for Computing Machinery (ACM), and the Marie Curie Fellowship Association (MCFA). He is a member of the SRC-NNI Working Group on Novel Nano-architectures (since 2003) and of the IEEE CS Task Force on Nanoarchitectures (since 2005), and a contributor to the International Technology Roadmap for Semiconductors – Emerging Research Devices (since 2004). He has organized over 20 conferences and over 40 conference sessions, was the Program Chairman of the IEEE Los Alamos Section (1997), and is an Associate Editor of the IEEE Transactions on Neural Networks (since 2005).
Snorre Aunet (M'94–SM'06) received the degree in electronics engineering from Trondheim Technical College, Trondheim, Norway, in 1987, the Cand. Scient. degree from the University of Oslo (UiO), Oslo, Norway, in 1993, and the Dr. Ing. degree from the Norwegian University of Science and Technology (NTNU), Trondheim, Norway, in 2002.
From 1994 to 1997, he was an Analog ASIC Designer with Nordic VLSI, Norway. He has held short-term positions as Assistant Professor and Associate Professor at NTNU in 2002 and 2003, and was a Postdoctoral Research Fellow at UiO from December 2003 to June 2006. He is currently a Research Scientist in the Department of Informatics, UiO. He has published more than 50 scientific papers. His research interests include ultra-low-power biologically inspired defect-tolerant nanoarchitectures.
Dr. Aunet has co-organized and chaired sessions at international conferences and has served as an expert evaluator for the EU Commission. He is a member of the Centers for Neural Inspired Nano Architectures (www.cnina.org).
Jabulani Nyathi (M'02) received the B.Sc. degree from Morgan State University, Baltimore, MD, in 1994, and the M.Sc. and Ph.D. degrees from the State University of New York (SUNY), Binghamton, in 1996 and 2000, respectively, all in electrical engineering.
He is currently an Assistant Professor in the School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA. He has held academic positions at SUNY, Binghamton (Adjunct Lecturer and Visiting Assistant Professor, 1998–2001). His research interests include VLSI design, interconnection networks, embedded systems, and computer architecture.
Dr. Nyathi is a member of Tau Beta Pi.
Robert R. Rydberg III (S'01) received the B.Sc. degree in computer science and computer engineering from Pacific Lutheran University, Tacoma, WA, in 2003, and the M.Sc. degree in electrical engineering from Washington State University, Pullman, WA, in 2005. He is currently working toward the Ph.D. degree in the School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA.
He has held intern positions at Micron Technology (CMOS Imaging Product Engineering), Sandia National Laboratories (ASIC/SoC Design), and the Intel Corporation (Firmware Validation). His research interests include VLSI design, interconnect networks, embedded systems, and concurrent algorithms.
Walid Ibrahim (M'06) received the B.Eng. degree in electrical engineering from Cairo University, Cairo, Egypt, in 1992, and the Ph.D. degree in systems and computer engineering from Carleton University, Ottawa, ON, Canada, in 2002.
In September 2004, he joined the College of Information Technology, United Arab Emirates University, Al Ain, U.A.E., as an Assistant Professor with the Computer System Engineering Department. He is also an Adjunct Research Professor with the Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada. Before joining the United Arab Emirates University, he held several software design and research positions with worldwide leading telecommunication and semiconductor companies, including Nortel Networks, Alcatel, PMC-Sierra, and Siemens. Dr. Ibrahim's research interests include the reliability of nano-architectures, VLSI testing, resource allocation and pricing in wireless data networks, applied optimization techniques, and the feasibility of nonlinear programming models.