Serial Addition: Locally Connected Architectures
Valeriu Beiu, Senior Member, IEEE, Snorre Aunet, Senior Member, IEEE, Jabulani Nyathi, Member, IEEE,
Robert R. Rydberg III, Student Member, IEEE, and Walid Ibrahim, Member, IEEE
Abstract—This paper will briefly review nanoelectronic chal-
lenges while focusing on reliability. We shall present and analyze
a series of CMOS-based examples for addition starting from the
device level and moving up to the gate, the circuit, and the block
level. Our analysis, backed by simulation results, on comparing
parallel and serial addition shows that serial adders are more reli-
able while also dissipating less. Their reliability can be improved
by using reliability-enhanced gates and/or other redundancy
techniques (like e.g., multiplexing). Additionally, the architectural
technique of short-circuiting the outputs (of several redundant
devices/gates/blocks) exhibits “vanishing” voting and an inherent
fault detection mechanism, as both transient and permanent faults
could be detected based on current changes. The choice of CMOS
is due to the broad design base available (but the ideas can be
applied to other technologies), while addition was chosen due to its
very solid background (both theoretical and practical). The design
approach will constantly be geared towards enhancing reliability
as much as possible at all the levels. Theory and simulations will
support the claim that a serial adder is a very serious candidate for
highly reliable and low power operations. Finally, our simulations
will identify the V_DD range where the power-delay-product and energy-delay-product are minimized. All of these suggest that a reliable (redundant) solution can also be a low power one if using serial architectures, while speed could still be traded for power (e.g., by dynamically varying the supply voltage both above and below V_th).
Index Terms—Addition, fault/defect tolerance, multiplexing,
nanoarchitectures, reliability, serial architectures.
I. INTRODUCTION

CMOS scaling has been the means by which the semicon-
ductor industry has achieved its historically unprecedented
gains in productivity and performance quantified by the highly
cited Moore’s Law [3]. Scaling CMOS technology to the next
generation has always increased transistor densities, improved
performance, and reduced power consumption. The most recent
International Technology Roadmap for Semiconductors (ITRS) report [4] predicts that the semiconductor industry will still continue its success in downscaling CMOS for a few more generations. It also predicts that the scaling will become quite difficult as the industry approaches the 16-nm technology node. Scaling might continue further, but it is expected that alternative nanodevices will start to be integrated with CMOS onto a silicon platform. The alternative nanodevices currently being evaluated can be classified into solid-state (e.g., rapid single flux quanta, 1-D structures like nanowires and carbon nanotubes, resonant tunneling devices, single-electron technology (SET), ferromagnetic devices, spin devices, etc.) and molecular ones. However, there are many fundamental and technical challenges that must be resolved to continue the scaling of CMOS technology deep into the nanometer regime [5]–[12]. Probably the three greatest challenges are: power [13] (and the associated heat dissipation), reliability, and interconnectivity [14], [15]. Some other difficult challenges include (see [14]–[18]): verification, as well as logic encoding and hybrid integration, and the overall complexity (of design, test, and fabrication).

Manuscript received January 15, 2007; revised July 16, 2007. This paper was recommended by Guest Editor C. Lau. V. Beiu and W. Ibrahim are with the College of Information Technology, United Arab Emirates University, Al Ain 17555, U.A.E. (e-mail: vbeiu@uaeu.ac.ae; walidibr@uaeu.ac.ae). S. Aunet is with the Department of Informatics, University of Oslo, Oslo 0316, Norway (e-mail: sa@ifi.uio.no). J. Nyathi and R. R. Rydberg III are with the School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99163 USA (e-mail: jabu@eecs.wsu.edu; rrydberg@eecs.wsu.edu). Digital Object Identifier 10.1109/TCSI.2007.907885

This paper is a significantly expanded version of a conference paper entitled "On the advantages of serial architectures for low-power reliable computations" [1], and incorporates parts from an unpublished presentation, "The vanishing majority gate: Trading power and speed for reliability" [2], which have also been extended and updated.
The remainder of this paper is organized as follows. In
Section II we review some of the challenges outlined above. In
Section III we go over the design of reliable gates, which was
started a long time ago, including rad-hard by design solutions.
We further present details of our own work on reliability. Four
different adders are analyzed in Section IV, with multiplexing
used for enhancing reliability. Three different solutions for im-
plementing multiplexing are introduced and compared. Finally,
in Section V, the power-performances of the four different
adders are estimated, and simulations are used to identify the supply voltage (V_DD) range where the power-delay-product and energy-delay-product are minimized. These show that a reliable design approach could also yield a promising balance between the conflicting metrics of power and speed (which will be improved by scaling). In Section VI, we summarize our findings and discuss implications for further research.
II. NANOELECTRONIC CHALLENGES
A. Challenges Due to Scaling
Power dissipation (and the associated heat) is strongly
affected by the increasing leakage currents. With the advent
of the sub-100-nm CMOS technology (i.e., the “nanoera”),
leakage currents have reached a level that cannot be ignored
anymore. Leakage will continue to increase the static power dissipation exponentially (about 5× at each generation at 30 °C), till multigate transistors and high-κ dielectrics, which are expected to reduce leakage down to about 10% and 1%, respectively, become mainstream. Nevertheless, with a forecasted 10^12 devices per chip [16] (for comparison, the human body has roughly 10^14 cells, while the brain has about 10^11 neurons and 10^14 synapses), even SET, which is advocated for
its ultra-low-power will become power constrained (see, e.g.,
[19]).
With device geometries scaling below the 65-nm range,
the
available reliability margins are drastically being reduced
[20]. As a result, the reliability community will be forced to thoroughly investigate accurate metrics able to determine these margins, and to reconsider how reliability assessment methodologies can be changed to gain new reliability space for the most advanced technologies. Currently, from the chip designer's perspective, reliability manifests itself more and more as time-dependent uncertainties in electrical parameters. In the sub-65-nm era, these device-level parametric uncertainties will be too high to handle with prevailing worst-case design techniques without incurring significant penalties in terms of area, delay, and energy. Additionally, with continued scaling, the copper resistivity is starting to increase sharply due to interfacial and grain boundary scattering. Besides the performance issues associated with interconnect scaling, several interconnect-related reliability issues are becoming troublesome (electromigration, stress migration, and heating), while, equally important, others are increasing with scaling (poor pattern definition, line-edge roughness, nanoscale corrosion, low-κ dielectric cracks, post-chemical-mechanical polishing residues, etc.). The global picture is that reliability looks like one of the greatest threats to the design of future integrated computing systems. For emerging nanodevices and their associated interconnects, the expected higher probabilities of failures, as well as the higher sensitivities to noise and variations, could make future chips prohibitively unreliable. The result is that the current IC design approach, based on the conventional zero-defect foundation, might simply not work. Therefore,
fault- and defect-tolerance techniques that allow the system to
recover from manufacturing and operational errors will have to
be considered from the (very) early design phases.
Finally, complexity will certainly be the name-of-the-game for nanoelectronics. Miniaturization will increase the device density, which will subsequently increase the complexity of every aspect related to the design and the manufacturing of future chips. For example, the modeling complexity of a multilevel interconnect network in a gigascale chip involves an enormous number of coupling inductances and capacitances throughout a nine-to-ten-level metal stack [11]. This complexity aggravates many other problems like: testing and verification [21], integration and packaging [11], [16], and hybridization.
Before going further, it is important to highlight here that all the challenges enumerated above are intimately entangled: redundancy, for example, translates into higher power and higher connectivity.
B. Reliability Challenge
The reasons chip reliability is becoming a major hurdle are, on the one hand, the continuous increase in internal electrical fields and current densities and, on the other hand, the introduction of new materials and devices with unknown reliability behavior (let alone the decrease in the number of electrons per device and the increase in the number of devices and interconnects). Another reason for the heightened importance of reliability is that, in spite of these scaling trends, the market has continuously been demanding higher reliability levels due to the emergence of new applications. In the past, the reliability margins have always been sufficiently high and have been guaranteed at the technology level (e.g., based on accelerated stress tests). Currently, the semiconductor industry approach is to extensively test the fabricated circuits and abandon most (if not all) of those not operating correctly. Still, in 1994 Intel had to start a massive recall campaign that cost US$475 million when it was discovered that the Pentium processor generated slightly incorrect results for some floating-point operations. Very recently, Microsoft announced that it expects to spend more than US$1 billion (up to US$1.15 billion) to repair widespread hardware problems (no technical details currently available) in its Xbox 360 video game console. In the future, larger numbers of devices will be deployed in many applications and embedded systems, and intrinsic reliability could turn out to be a showstopper for economically viable technology scaling: the cost to perform a recall (especially in the realm of failure-sensitive and energy-conscious real-time embedded systems) will be exponentially higher. Thus, there is very high pressure to make sure that future nanoelectronic systems will be functioning correctly over their lifetime, even if not free of defects and faults!
There is also an increasing concern that the massive scaling of the CMOS devices will introduce extreme static and dynamic parameter fluctuations at the material, device, and circuit levels [8], [9], [16], [22]. Extreme parameter variations are a major barrier to achieving reliable and predictable system implementations [23]. Typical increases in propagation delay and power dissipation due to such fluctuations are expected to be 30% to 50% above nominal for the 45-nm generation CMOS logic circuits [16].
Additionally, soft errors will occur due to material decay, interference, or electrostatic discharge. Since capacitances and voltages will be only a fraction of what they are today, very small charges will be needed to flip a bit in memory, the output of a gate, or the voltage on a wire. Although such an event is highly unlikely for a single device, soft errors are becoming another reliability concern for future systems based on nanodevices [24]–[28], due to the expected massive number of devices the system will have. For instance, a hypothetical one-terabyte memory chip with a soft error probability per single bit of one per million years will experience a soft error about every 4 s! Typically, data in memory is protected using error-correcting codes, on top of stand-by spare rows and columns, and off-line reconfiguration. However, mechanisms to protect latches and flip-flops (that store state) and random logic have only recently started to appear on the researchers' agendas [25]–[28].
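As a back-of-the-envelope check of the "every 4 s" figure, under the assumption that one terabyte corresponds to 8 × 10^12 independent bits (this conversion is not spelled out above):

```python
# Sanity check of the "soft error every ~4 s" figure for a 1-TB memory,
# assuming 1 terabyte = 8 * 10**12 bits and independent bit failures.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

bits = 8e12                  # one terabyte of memory, in bits
rate_per_bit = 1 / 1e6       # one error per bit per million years

errors_per_year = bits * rate_per_bit            # ~8e6 errors/year
seconds_between_errors = SECONDS_PER_YEAR / errors_per_year

print(f"{seconds_between_errors:.1f} s between soft errors")  # ~3.9 s
```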
According to classical scaling theory, the gate insulator thickness should shrink with the other transistor dimensions. The current flowing through the gate oxide causes reliability problems by leading to long-term parameter shifts and eventually to oxide breakdown. On top of these, interconnect scaling raises another reliability concern for future nanodevices. In [11], Davis et al. mentioned that the miniaturization of interconnects, unlike transistors, does not enhance their performance (for fresh perspectives see [29] and [30]). Interconnect scaling will also significantly affect the circuit reliability due to increased crosstalk and latency. Sakurai [7] concluded in 2000 that the interconnects, rather than transistors, will be the major factor in determining the cost, delay, power, reliability, and turn-around time of the semiconductor industry.
Unfortunately, reliability problems for beyond-CMOS technologies [31], [32] are expected to get even worse. The introduction of new materials could sharply decrease reliability margins. Beyond-CMOS device failure rates are predicted to be as high as 10% (e.g., background charge for SET [33]), going up to 30% (e.g., self-assembled DNA [34], [35]). As a fresh example, [36] has reported defect rates of 60% for a 160-kilobit molecular electronic memory. Clearly, achieving 100% correctness at the system level using such devices and interconnects will be not only outrageously expensive but might be plainly impossible. Hence, relaxing the requirement of 100% correctness for devices and interconnects might reduce the costs of manufacturing, verification, and test [21]. Still, this will lead to more transient and permanent failures of signals, logic values, devices, and interconnects. These conflicting trends will render technologists unable to meet failure rate targets, and impose the delegation of reliability qualification to designers, i.e., failures will have to be compensated at the architectural level [14], [15], [17], [18], [37]–[42].
Previously, fault tolerance has been an issue only for safety-critical designs, but as argued above it looks like it is here to stay and will become part of any future design [43], [44]. Any architecture that disregards the fact that the underlying devices and interconnects are unreliable is anticipated to be impractical [43], [45].
From the system design perspective, errors fall into one of
the following three classes: permanent, intermittent, and tran-
sient [44]. The oldest and most commonly used fault model
is the stuck-at model. It is not clear if emerging technolo-
gies will not require new fault models [43], [46], or if mul-
tiple errors might have to be dealt with [27]. Recently, even the
well-established assumption of a bounding constant probability
of failure for each gate was challenged [47][49]. These papers
have shown that approximating the gates probabilities of failure
by (bounding) constants introduce sizeable errors, leading to
over-design.
The well-known approach for developing fault-tolerant
architectures in the face of uncertainties (both permanent and
transient faults) is to incorporate redundancy [50]. Redundancy
can be either static (in space, time, or information) or dynamic
(requiring fault detection, location, containment, and recovery).
Space (hardware) redundancy relies on voters (e.g., generic,
inexact, midvalue, median, weighted average, analog, hybrid,
etc.) and includes among others the well-known: modular
redundancy, cascaded modular redundancy, and multiplexing
(including von Neumann multiplexing [50], enhanced von Neumann multiplexing [51], [52], and parallel restitution [53]). Time redundancy is trading space for time (e.g., alternating logic, recomputing with shifted operands, recomputing with swapped operands, etc.), while information redundancy is based on error detection and error correction codes. Hybrid approaches are also known, e.g., time-shared triple modular redundancy, recomputing with triplication with voting, hardware partitioning in time redundancy, recomputing with partitioning and voting, quadruple time redundancy (see [54]–[59]), as well as reconfiguration [40].
Some of the reliability-enhanced schemes enumerated above
can be implemented at several different levels: device, gate, cir-
cuit, block, and system. All of them have in common that im-
proved reliability is traded off for increased area (number of de-
vices) and higher connectivity, while it is expected that these will
lead to higher power consumption and/or slower computations
[60]. As an early example for nanoelectronics, Roychowdhury et al. [61] suggested that a quantum-dot cellular automata circuit implemented with a sufficient redundancy factor (i.e., the number of replicated identical copies) would be able to perform correctly with very high probability even if 15% of the devices failed.
Till now, VLSI designers did not (have/want to) care about reliability, which was characterized at the technology level. In the future, material and device engineering alone will not suffice for tackling reliability. No silver bullet will be able to cope with all the types of faults in nanoscale circuits and systems, and a combination of several techniques will certainly be needed [62]. Boosting reliability will require more and more a cooperative involvement of the logic designers and architects, where high-level techniques will rely upon lower-level support based on novel modeling and electronic design automation (EDA) tools.
In the following, we shall go through a series of examples
starting from the device level and going towards the system
level, by trying to emphasize the synergy between the design and the technology levels, and by qualitatively and quantitatively analyzing the benefits such an approach would bring.
III. MORE RELIABLE GATES
The design of more reliable gates was already of high interest when vacuum tubes were the elementary devices [50], [63], [64]. At that time, threshold logic (including majority and ternary logic) was an active research topic, even used in building some of the early computers. During 1957 and 1958, Rosenblatt together with Charles Wightman and others constructed the Mark I Perceptron having 512 adjustable weights (see [65]). Shortly afterwards, Bernard Widrow together with his students developed the ADALINE (ADAptive LINear Element) [66]. The next threshold logic computer was DONUT [67], followed later by Setun [68], [69]. Due to the low reliability of vacuum tubes, fast elements on miniature ferrite cores and semiconductor diodes were designed for implementing ternary logic. Brousentsov even stated that: "Ternary threshold logic gates, as compared with the binary ones, provide more speed and reliability, and required less equipment and power. These were the reasons to design a ternary computer" [68].
With the advent of MOS and CMOS integrated circuits, such topics were quickly forgotten. Still, radiation hardening [70] has constantly been in demand for special applications. Rad-hard by design solutions can be classified into either layout-level (i.e., based on modifying the layout) or switch-level (i.e., based on modifying the circuit at the transistor level) techniques, while hybrid and adaptive techniques have also been developed.
Most circuit designers have used the switch-level approach.
A CMOS fault tolerant gate based on encoding the circuit out-
puts with Berger codes (error correcting codes) requires the in-
troduction of additional networks to provide tolerance to single
stuck-at faults, as well as to a number of multiple faults, while
Fig. 1. Radiation hardened by design. (a) Differential fault-tolerant CMOS gate [73]. (b) Soft error suppression technique for domino logic [74]. (c) Single event upset hardened inverter in silicon-on-insulator [75]. (d) Hardened majority voter [75] (based on the classical output-wired-inverters [78]).
Fig. 2. Device-level built-in redundancy. (a) High matching techniques (used in analog designs) [77]. (b) Interesting enhancement of the output-wired principle [82]. (c) High matching technique applied to multiplexed SET circuits [52], [83]. (d) PLA-style circuit with device-level duplication [84], [85].
also reducing unidirectional faults [71]. Another device-level
approach is quadruplication applied to combinatorial CMOS
gates both at the net (the p-stack and the n-stack are separately
quadruplicated as a whole) and at the transistor level (every tran-
sistor is quadruplicated preserving the interconnection topology
of the net) [72]. An alternate solution can be seen in Fig. 1(a).
It duplicates the n-stack and the p-stack and adds cross-coupled
transistors, achieving fault tolerance with marginal performance
degradation [73]. A low-power soft error suppression technique
for dynamic logic can be seen in Fig. 1(b). It adds pass transistor
device(s) as isolation and weak keeper(s) to standard domino
logic [74]. Optimizing the size of the keeper (layout-level) can
also help. As can be seen from these few examples, the solu-
tions developed are quite elaborate. Under switch-level redun-
dancy, we should also include active biasing and isolated well
transistors [75] [see Fig. 1(c)]. These prevent transients in com-
binational logic from reaching the output node. Such approaches
complement noise-immune designs like [76], [77], and can be
combined with hardwired voting [see Fig. 1(d)], e.g., based on
the classical output-wired-inverters idea [78].
In parallel, the analysis of the failure mechanisms of integrated circuits has led to the definition of layout rules that improve the testability of circuits [79]. Later, layout-level design-for-testability rules were used for avoiding some hard-to-detect faults or even undetectable faults [80]. Such layout rules include: redundancy of contacts, ring-shaped or closed-loop conductive layers, and duplication of interconnections and I/O conductive paths. These avoid some open faults, or reduce their appearance probability. Another layout technique, borrowed from analog designs, is high matching [Fig. 2(a)]. This has been used in a combined switch- and layout-level approach for enhancing the noise immunity of threshold logic gates [77]. Additionally, gate-level redundancy and shorted outputs
were suggested in [81]. Having their roots in hardwired voting
[78], other solutions have been detailed for CMOS [82] [see
Fig. 2(b)], as well as for SET [52], [83] [Fig. 2(c)], and very
recently for programmable logic-array (PLA) nanocircuits
[84], [85] [Fig. 2(d)]. These rely on built-in transistor-level
redundancy (e.g., [86] quadruplicates the transistors), with
redundant signals added on top. Such designs can be combined
at higher levels with system-level voting (like e.g., [51]), and/or
with error correcting codes [62].
Among the few other robust circuit and system design methodologies targeting nanodevices, we should mention here the use of threshold logic gates for evaluating an analog average [87], [88], while [81] advocates for real-time reconfigurable threshold elements (see also [89]).
Other novel process variation compensation techniques being designed and evaluated range from [90], which optimally adjusts the strength of the keepers for domino logic gates (based on an on-die leakage current sensor), to [54], which uses active body bias and transient noise attenuation via voltage division.
Fig. 3. Interconnect pattern for parallel adders. (a) BK. (b) HC. (c) KS. (d) Layout of a 64-bit KS adder.
All these solutions are highly innovative, but there is a clear
need for comparing such non-standard CMOS designs. Only a
few results based on Monte Carlo simulations for CMOS have
started to be reported (see e.g., [91]), while there are almost none
for beyond CMOS nanotechnologies [47], [48], [83].
We conclude this section by mentioning that a growing number of publications are dealing with such problems. This is encouraging, although there is a tendency to move too quickly to the higher levels [24], [26], [55]–[57], [92]. We believe that this should be done at a slower pace, as the expectation is that the highest reliability rewards will be at the lowest level, so one should first of all take advantage of (all of) the low-hanging fruit. The implication is that many versions of the same gate, having different reliability performances, should be designed and tested. These could lead to designs equivalent to multiple-V_th ones, which have long been advocated and used in the bid for low power. In the future, EDA tools might use such extended libraries of gates for optimizing circuit reliability, in the same way current EDA tools are using multiple-V_DD and multiple-V_th for optimizing power consumption.
IV. RELIABLE ADDERS
A. Theoretical Analysis
Binary addition has been studied extensively, starting with the classical (serial) ripple carry (RC) adder and going towards parallel implementations [93]–[97]. It is commonly accepted that RC is the slowest, while Kogge–Stone (KS) [94] is, theoretically, the fastest, but requires considerably more transistors (which translates into larger area and power, both dynamic due to longer wires, and leakage due to more transistors/gates). Still, only a few recent studies have analyzed the reliability of adders [98]–[108]. To get a clear understanding, four different adders will be analyzed in this section. The four adders under investigation are:
the classical RC;
Brent–Kung (BK) [96] [Fig. 3(a)];
Han–Carlson (HC) [97] [Fig. 3(b)];
KS [94] [Fig. 3(c)].
All four adders have been characterized by their number of layers, their number of nodes (i.e., blocks), their number of gates, and the length of their wires on the longest (critical) path (for details, see, e.g., [109]). The number of layers grows linearly with the number of input bits n for RC, while being logarithmic in n for BK, HC, and KS. The number of nodes grows linearly with n for RC and BK, while growing as n·log₂(n) for HC and KS; the number of gates follows the number of nodes (each node being a small, fixed collection of gates). Finally, the length of the wires on the critical path was estimated geometrically from the interconnect patterns (Fig. 3). The factor 2 used for multiplying (the number of) Layers when estimating the Length of BK, HC, and KS is conservative, and accounts for the height of the nodes and for the routing space between adjacent layers. For supporting this claim, compare the schematic of a 16-bit KS adder [Fig. 3(c)] with the layout of a 64-bit KS adder [Fig. 3(d)], which was drawn at the same vertical scale (other layouts can be found in [110]–[113]; see also [114]). Remark: All the estimates enumerated above have been refined using ceilings and floors (when appropriate), while for even more accurate estimates the interested reader should consult [115], [116].
The reliability of these four adders was quickly (but roughly) estimated as

P_fail = 1 - (1 - p_fail)^Gates    (1)

where p_fail is the gate failure probability, and Gates is the number of gates of the adder (given by the estimates detailed above). The results are plotted in Fig. 4, and support the intuition that a simpler structure is more reliable. The interested reader can find a few more simulation results for RC, BK, and KS in [99], [100], as well as some theoretical results (for RC) in [106]. Additionally, any redundancy scheme is much easier to integrate with RC than with a parallel adder [2], [117] (or, in general, with any locally connected architecture like: systolic arrays, cellular neural networks, cellular automata, etc.).
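To make the estimate (1) concrete, the following sketch evaluates it for the four adders; the gate-count expressions used here are standard prefix-adder cell counts and are only illustrative assumptions, not the exact refined expressions referred to above:

```python
# Sketch of the reliability estimate (1): P_fail = 1 - (1 - p_fail)**Gates.
# The gate counts below are illustrative (standard prefix-adder cell
# counts used as proxies), not the paper's exact fitted expressions.
import math

def gates(adder: str, n: int) -> int:
    log2n = int(math.log2(n))
    counts = {
        "RC": n,                      # one FA block per bit
        "BK": 2 * n - log2n - 2,      # Brent-Kung prefix cells
        "HC": (n // 2) * log2n + n,   # Han-Carlson (approximate)
        "KS": n * log2n - n + 1,      # Kogge-Stone prefix cells
    }
    return counts[adder]

def p_fail_adder(adder: str, n: int, p_fail_gate: float) -> float:
    return 1.0 - (1.0 - p_fail_gate) ** gates(adder, n)

# simpler structures (fewer gates) come out more reliable, as in Fig. 4
for name in ("RC", "BK", "HC", "KS"):
    print(name, round(p_fail_adder(name, 64, 1e-4), 4))
```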
B. Practical Multiplexed Adders
Multiplexing has been advocated as a powerful solution for enhancing reliability, unfortunately one that requires very large redundancy factors [50], [53], [118]. Fresh detailed analyses and exact simulations have revealed that the required redundancy factors are considerably smaller than those predicted by theory. A detailed performance evaluation of enhanced multiplexing (MUX) schemes has been reported in [51], [52], with simulation results detailed in [47], [83].

Fig. 4. Estimated probability of failure of different adders (RC, BK, HC, and KS) versus the number of input bits for a given gate failure probability p_fail.

These papers confirm that there is a maximum threshold for p_fail up to which MUX schemes improve on the reliability of the individual gates. For SET, the threshold values obtained from Monte Carlo simulations for NAND-2 MUX and for MAJ-3 MUX (when implemented using capacitive SET) are much higher than the theoretical predictions (0.0107 for NAND-2 and, respectively, 0.0197 for MAJ-3), and are also confirmed by recent PTM simulations [48]. For a clear understanding of these unexpected results, the interested reader is referred to [119] and [120]: the fact that each gate is made of unreliable devices is used in explaining (both theoretically and through exact simulations) the detailed behavior of the system. An obvious solution would be to increase the redundancy factor to R = 5, 7, 9, 11 (which will be briefly presented later), while even better approaches are possible.
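The existence of such a threshold can be illustrated with a toy fixed-point model of one vN-MUX stage (an idealized sketch assuming independent, symmetric gate failures; the numerical threshold it produces is not the exact 0.0107/0.0197 values quoted above):

```python
# Toy fixed-point view of MAJ-3 von Neumann multiplexing: each stage is
# an executive gate (inheriting the input bundle's error probability p
# and adding its own failure rate eps) followed by a restorative MAJ-3
# vote (also failing with eps). Iterating the map shows whether errors
# are damped (below some threshold eps) or amplified (above it).
def stage(p: float, eps: float) -> float:
    q = eps + (1 - 2 * eps) * p                # error after executive gate
    maj_wrong = 3 * q**2 * (1 - q) + q**3      # majority of 3 lines wrong
    return eps + (1 - 2 * eps) * maj_wrong     # vote, by a faulty MAJ-3

def steady_state_error(eps: float, stages: int = 500) -> float:
    p = 0.0
    for _ in range(stages):
        p = stage(p, eps)
    return p

for eps in (0.01, 0.05, 0.10):
    print(f"eps = {eps:.2f} -> steady-state error ~ {steady_state_error(eps):.3f}")
# below the threshold errors settle near eps; above it they grow towards 0.5
```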
Based on all of the above (and on the simulation results from [121]), we decided to focus our attention on MUX RC (see Fig. 5). In Fig. 5, the main block of the RC is the well-known full adder (FA). For a detailed PTM reliability analysis of four FAs see [49], [122]. The FA we will consider here is the standard CMOS implementation (Fig. 6 shows only the carry-out circuitry), but many other implementations are possible [49], [122] (with MAJ FAs more reliable than classical FAs). For enhancing reliability, a MUX RC with a redundancy factor (at the adder level) of R = 3 is used (3-RC). Fig. 5(a) presents a block diagram of the standard RC. A 3-RC [see Fig. 5(b)] has three FAs (squares) per stage as follows:
three FAs (used in parallel) represent the execution stage of a von Neumann MUX (vN-MUX);
three MAJ-3 gates (circles) represent the restorative stage of a vN-MUX (for the three carry outputs coming from the three FAs at bit position i);
the output of each of these three MAJ-3 gates (circles) is used to drive the next three FAs (i.e., they represent the three carry inputs at bit position i+1).
This is the standard MAJ-3 vN-MUX. Still, this basic idea can be implemented in several different ways, out of which

Fig. 5. (a) Classical RC adder where the square blocks represent FAs. Three different multiplexed 3-RC adders: (b) using MAJ-3 gates (circles); (c) short circuiting the outputs of three FAs and using three inverters (triangles) to recover the voltage; and (d) short circuiting the outputs of three FAs (the voltage is recovered by the next FAs).

three are detailed here, with each subsequent configuration being simpler than the previous one. The first of the three structures [Fig. 5(b)] properly implements three MAJ-3 gates for the restorative stage (represented as circles). This solution roughly doubles the delay and increases power significantly. The second solution [Fig. 5(c)] is simpler, as the outputs of the FAs are fed to restorative inverters (triangles). The MAJ-3 gates have now been replaced by output-wired-inverters (remember [78]). This solution will be faster, and will dissipate less than the previous one, as long as there are no faults/defects. In case of faults/defects, there will be fighting, which will increase the power consumption, while the inverters will try to restore the correct logic levels. The simplest structure [Fig. 5(d)] eliminates even the restorative inverters and relies upon the next (stage of) FAs for providing the needed signal restoration (see also [81]). The restorative MAJ-3 gates have now completely vanished. This solution will be the fastest, as long as there are no faults/defects. In case of faults/defects, the shorting of the outputs will result in fighting, increasing the current and the signal propagation delay, while the system could still operate correctly (this behavior is commonly known as graceful degradation). These three structures have been tested for stuck-at faults. This is a simplistic scenario, as in practice a fault/defect could manifest itself as an analog value (see [2], [46], [78], [88], [117], [121]).
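A purely behavioral (digital) sketch of the first structure [Fig. 5(b)] shows how per-stage MAJ-3 voting masks a stuck-at fault in one FA copy; the analog fighting of the short-circuited variants [Fig. 5(c) and (d)] is not modeled here:

```python
# Behavioral sketch of one multiplexed RC (3-RC) with per-stage MAJ-3
# voting [Fig. 5(b)]: three FA copies per bit position, each copy's carry
# restored by majority. A stuck-at fault is injected on one FA copy's
# carry output. The sum bits are also voted here for readout.
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def maj3(x: int, y: int, z: int) -> int:
    return (x & y) | (x & z) | (y & z)

def rc3_add(a: int, b: int, n: int, stuck: dict[tuple[int, int], int]) -> int:
    """n-bit 3-RC; stuck maps (bit, copy) -> forced carry value."""
    carries = [0, 0, 0]                     # one carry per redundant copy
    result = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        outs = []
        for copy in range(3):
            s, c = full_adder(ai, bi, carries[copy])
            c = stuck.get((i, copy), c)     # inject stuck-at fault, if any
            outs.append((s, c))
        # restorative stage: with fault-free voters, all three MAJ-3 gates
        # produce the same vote, which drives the next three FA copies
        voted = maj3(outs[0][1], outs[1][1], outs[2][1])
        carries = [voted, voted, voted]
        result |= maj3(outs[0][0], outs[1][0], outs[2][0]) << i
    return result | (maj3(*carries) << n)

# one FA copy's carry stuck at 0 in bit 2: the sum is still correct
print(rc3_add(13, 7, 8, stuck={(2, 1): 0}), 13 + 7)   # -> 20 20
```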
MUX using the short-circuited outputs of three FAs (MAJ-3 gates implemented as mirrored adders) is shown in Fig. 6, and corresponds to one stage of MAJ-3 vN-MUX. In principle, the failure of one transistor can make a circuit malfunction. This is not the case when redundancy is introduced. All the transistors on the schematic have been labeled to enable ease of tracking when analyzing defects. The schematic in Fig. 6, using a 90-nm CMOS process, has been used for simulations. The supply voltage was lowered down to V_DD = 275 mV. Output voltages, in millivolts, for all the eight possible input combinations are shown in Table I. The last column, labeled Defect(s), lists the transistors that were removed from the schematic (representing defects within the circuit). In this example, the only error appears for the input combination "001" when transistors P3, P8, and P13 are removed. This gives 15.04 mV at the output, while the output should be logic 1.
Fig. 6. Multiplexed MAJ-3 gates (mirrored adders) with short-circuited outputs.
TABLE I
OUTPUT OF THREE SHORT-CIRCUITED MINORITY (MIRRORED ADDER) GATES (SEE FIG. 6) IN 90-NM CMOS AT V_DD = 275 MV
Fig. 7. Estimated probability of failure of four different adders (RC, BK, HC, and KS) and the multiplexed RC (3-RC, 5-RC, 7-RC, 9-RC, and 11-RC) versus the
number of input bits for: (a) gate failure probability of 1%; (b) gate failure probability of 10%.
Fig. 7 compares the results of the MUXed RC (3-RC) adder with the standard RC, BK, HC, and KS adders (see Fig. 4). We have used p_fail = 0.01 in Fig. 7(a), and p_fail = 0.1 in Fig. 7(b) (i.e., 1% and respectively 10%). These two values have been used in (1) for RC, BK, HC, and KS. For 3-RC we have used the probability transfer matrix (PTM, see [123], [124]) to calculate exactly the reliability of one FA block. We have used PTM to evaluate the reliability of one MUXed block [48], [122], and have used 1 - (1 - P_block)^n (with P_block the probability of failure of one MUXed block, and n the number of bits) to estimate the probability of failure of the 3-RC as a whole. Using PTM (which gives exact results), as well as exhaustive counting [119] simultaneously, we have determined P_block for p_fail = 0.01 and for p_fail = 0.1 (see also [48], [49]). These simulation results show that implementing MUX at the smallest redundancy factor could still improve reliability. The case when p_fail = 0.01 (i.e., 1%) is presented in Fig. 7(a). For 16-bit adders, MUX was able to reduce the probability of failure of the RC adder from 0.35 to about 0.01 (3-RC), i.e., about 35 times. Except for 3-RC, none of the other adders would be able to operate correctly when p_fail = 0.01. When p_fail is increased to 0.1 (i.e., 10% errors), the simulation results from Fig. 7(b) show that not even 3-RC is good enough. This suggests that more redundancy is needed.

Fig. 8. Current (worst case) for MAJ-3 MUX RCs (90 nm at 275 mV): (a) using MAJ-3 gates [Fig. 5(b)]; (b) when short circuiting the outputs and using inverters [Fig. 5(c)]; (c) with short-circuited outputs [Fig. 5(d)]; and (d) when short circuiting the outputs and using inverters (70-nm BPTM [158] at 200 mV).

The normal solution would be to increase the redundancy factor R (long advocated by von Neumann [50]). Using exact counting
arguments [119], we have determined the threshold failure probabilities for the case of MAJ gates of fan-in 3, for redundancy factors R = 3, 5, 7, 9, and 11. These have been used to plot 3-RC, 5-RC, 7-RC, 9-RC, and respectively 11-RC in Fig. 7(b). It can be easily seen that 11-RC for p_fail = 0.1 [Fig. 7(b)] is about as reliable as 3-RC for p_fail = 0.01 [Fig. 7(a)]. This represents a sizeable increase of the redundancy factor (from R = 3 to R = 11). We advocate for a less costly solution, namely to keep R = 3 and use more reliable gates (like the ones presented in Section III) in the design of the elementary FAs. Such an approach would first of all reduce p_fail, and should make the combined approach (reliable gates by design and enhanced low-redundancy MUX) perform similarly to the 3-RC presented in Fig. 7(a). Such claims require massive Monte Carlo simulations for the MUX RC with hardened (reliability-enhanced) gates, which we have started to investigate.
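For readers unfamiliar with the PTM machinery used above, the following toy sketch illustrates the method on a single NAND-2 gate (assuming a symmetric output-flip error eps; the FA-level analyses of [48], [49], [122] are, of course, considerably larger):

```python
# Toy probability transfer matrix (PTM) computation [123], [124]: a
# gate's PTM has one row per input combination and one column per output
# value; a faulty gate is modeled as (1-eps)*ideal + eps*flipped.
# Composing PTMs (tensor products for parallel gates, matrix products
# for series) yields exact circuit error probabilities.
import numpy as np

def ptm_nand(eps: float) -> np.ndarray:
    # rows: inputs 00, 01, 10, 11; columns: output 0, output 1
    ideal = np.array([[0, 1], [0, 1], [0, 1], [1, 0]], float)
    return (1 - eps) * ideal + eps * (1 - ideal)

def error_prob(ptm: np.ndarray, ideal: np.ndarray, in_dist: np.ndarray) -> float:
    # probability that the faulty gate disagrees with the ideal one,
    # averaged over the input distribution
    return float(np.sum(in_dist[:, None] * ptm * (1 - ideal)))

eps = 0.01
uniform = np.full(4, 0.25)
print(error_prob(ptm_nand(eps), ptm_nand(0.0), uniform))  # = eps = 0.01
```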
The last two solutions [Fig. 5(c) and (d)] have one more advantage, namely the fact that an error will cause fighting. Apparently this would seem to be a disadvantage, as it is going to increase the current and the power consumption (also heat and temperature) when a fault or defect occurs. This is absolutely true, and simulations for two technology nodes can be seen in Fig. 8. These simulations show that each error (defect or fault) translates into an increase in current. In case of a transient fault, the current will increase but will return to its nominal value once the fault disappears. In case of a defect, the current will increase and remain high. This means that we have a quite simple way to detect errors: by monitoring the current [2], [90], [125] (or, equivalently but less precisely, heat). This will allow us to log faults, and also to identify the case when a fault becomes a defect (see the stair-stepping behavior of the current in three of the four simulations detailed in Fig. 8, when the number of faults increases). Knowing that the circuit can tolerate a few defects, we could set a certain threshold value for a current sensor (built-in I_DDQ testing). If the current becomes larger than a threshold, the sensing circuit will automatically send a request to the higher level. A local control scheme could automatically:
reconfigure the I/O connections;
power up a spare unit;
shut down the defective circuit (hot swap).
Such an approach combines several reliability schemes:
low area (i.e., small redundancy transistor-level schemes) highly reliable gates;
small redundancy factor gate-level designs;
automatic (current-based) detection at the circuit/block level; followed by
reconfiguration at the block/system level.
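The detection logic itself can be summarized behaviorally as follows (a sketch; all names, current levels, and the single fixed threshold are hypothetical placeholders for an actual built-in current sensor):

```python
# Behavioral sketch of the current-monitoring idea: each fault/defect
# adds a roughly fixed "fighting" current step. A transient fault's step
# vanishes; a defect's step persists. A simple threshold on the sampled
# supply current can thus log faults, flag defects, and request
# reconfiguration. All names and numbers here are hypothetical.
NOMINAL_UA = 10.0      # fault-free supply current, microamps
STEP_UA = 25.0         # extra current per fighting output (one error)

def classify(samples: list[float], threshold: float = NOMINAL_UA + STEP_UA / 2):
    events = []
    above = False
    for t, i in enumerate(samples):
        if i > threshold and not above:
            events.append((t, "fault detected"))
            above = True
        elif i <= threshold and above:
            events.append((t, "current back to nominal: transient fault"))
            above = False
    if above:
        events.append((len(samples), "current still high: defect -> reconfigure"))
    return events

trace = [10, 10, 36, 35, 10, 10, 36, 37, 36, 36]   # a transient, then a defect
for event in classify([float(x) for x in trace]):
    print(event)
```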
This would span not only all the design levels, but also four different schemes for enhancing reliability, leading to a self-healing system. Such an approach would be similar to the way the brain is minimizing energy. We should decrease the supply voltage till the circuit starts making errors, and continue decreasing the voltage a bit more. This will force the circuit to make errors (always), but the redundancy scheme used will take care of them. Fundamentally, as a redundant circuit can tolerate (a few) errors, it should try to make (a few) errors. Hence, one way for achieving minimum energy when using a redundant circuit is to make errors, and this is also how the system knows that it is running close to the minimum. One last remark is that such an approach is not at all straightforward when using the classical MAJ-3 vN-MUX presented in Fig. 5(b). This is because such a solution does not lead to fighting, so the current cannot be used (easily) for detecting faults and defects [see corresponding simulations in Fig. 8(a)]. This suggests that certain circuits (designs) are better fitted for local (automatic) current sensing techniques than others.
V. POWER-PERFORMANCE CONSIDERATIONS
Reliability by itself cannot be the one and only design goal,
and it should certainly support performance as represented by
the well-known speed-power or energy-delay tradeoffs. Many
results concerning adders have been reported over the years,
with recent ones including both simulations and measurements
[114]–[116], [126], [127]. On top of these, other results have
been reported using threshold logic instead of Boolean logic
[19], [128], [129], and even mixed Boolean-threshold logic so-
lutions have been advocated [130]. Originally, the speed was
the only goal, with speed-power and energy-delay optimizations
emerging later [116].
One approach in the bid to lower power dissipation is the reduction of V_DD, obviously the most effective way of reducing all the components of power (dynamic, static, and leakage) [131]. The aggressive scaling of V_DD to below V_th (known as subthreshold operation) has been known since 1972 and used in ultra-low-power designs [132]–[140]. The major disadvantage of subthreshold operation is its very slow speed. Therefore, subthreshold operation has been considered a poor approach where the much-needed speed is sacrificed for ultra-low power. This has drastically limited its application range. Still, while reducing V_DD might save the day for power consumption, it is detrimentally affecting reliability. That is because in subthreshold noise plays a significant role [24], [28], [141], let alone the higher sensitivity to variations [22], and the reduced noise margins. In spite of these, quite a few subthreshold results have been reported recently [142]–[147]. Falling in this trend, an RC and a KS operated in subthreshold have been compared
in [121]. The main conclusions were that:
the wires reduce the speed advantage of KS over RC from
4.5
to 2.2 (other results showing that wire delays in
parallel adders are non-negligible were presented in [116]);
the higher speed of KS at a given
can be matched by
an RC at a
which is only 10% to 20% higher;
at equal speeds the RC still maintains both its power and
energy advantages.
Obviously, wires are playing an important role in subthreshold, affecting the delay and influencing the dynamic power. The delay of the adders was evaluated taking into account both the number of gates on the longest path (Gates_LP, which grows logarithmically with the number of input bits for KS, HC, and BK, while growing linearly for RC) and the length of the wires on the longest path (Length)

Delay = Gates_LP + α · Length.    (2)

When α = 0, only the gates are contributing to the Delay, while the wires are not. By increasing α, the wires (Length) start affecting the Delay more and more.
For estimating Power, both the length on the longest path (Length) and the total number of gates (Gates) were considered. These account for part of the switching capacitance (Length), hence dynamic power is underestimated for the parallel adders, and for the leakage power (Gates). A factor b was used for specifying (indirectly) the ratio between dynamic and leakage power, leading to the following estimate:

Power = Length + b · Gates.    (3)
Spice simulation results (from [121]) were used to fine-tune the estimates for the Delay [given by (2)] and Power [given by (3)]. The values of α and b have been determined by fitting the simulation results obtained for RC and KS (equivalently, for a leakage power of about 33% of the dynamic power). Probably both α and b could be raised even higher (up to 0.50), while for power more accurate estimates are available (e.g., [115], [116], [148]), but these will not (substantially) change the trends.
Finally, the power-delay product (PDP) and the energy-delay product (EDP) have been estimated in a straightforward manner as

PDP = Power · Delay    (4)

EDP = PDP · Delay = Power · Delay².    (5)
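A compact way to reproduce the qualitative trends of Fig. 9 is to evaluate (2)–(5) directly (a sketch assuming the model forms given above, with illustrative depth/gate/length expressions and arbitrary α and b values, not the fitted ones):

```python
# Sketch of the first-order model (2)-(5), under the assumption that
# Delay = Gates_LP + alpha*Length and Power = Length + b*Gates (the
# forms implied by the surrounding text). The depth/gate/length
# expressions and the alpha, b values are illustrative only.
import math

def metrics(n: int, alpha: float, b: float, serial: bool):
    log2n = max(1, int(math.log2(n)))
    if serial:   # RC: linear depth, short local wires
        depth, gates, length = n, n, n
    else:        # KS-like: logarithmic depth, many cells, long wires
        depth, gates, length = log2n, n * log2n, 2 * log2n * n
    delay = depth + alpha * length        # (2)
    power = length + b * gates            # (3)
    return delay, power * delay, power * delay ** 2   # Delay, PDP (4), EDP (5)

for alpha, b in ((0.0, 0.0), (0.25, 0.33)):   # no wires/leakage vs. with both
    d_rc, pdp_rc, edp_rc = metrics(64, alpha, b, serial=True)
    d_ks, pdp_ks, edp_ks = metrics(64, alpha, b, serial=False)
    print(f"alpha={alpha}, b={b}: RC PDP/EDP = {pdp_rc:.0f}/{edp_rc:.0f}, "
          f"KS PDP/EDP = {pdp_ks:.0f}/{edp_ks:.0f}")
# with alpha = b = 0 the parallel adder wins on EDP; once wires and
# leakage are accounted for, the serial RC wins both PDP and EDP
```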
The results of these approximations can be seen in Fig. 9, where Delays, PDPs, and EDPs are shown for two cases:
without wires and with no leakage power (i.e., for α = 0 and b = 0), see Fig. 9(a)–(c);
with a delay and power comparable with that of low-voltage subthreshold operation (i.e., for the fitted, nonzero α and b), see Fig. 9(d)–(f).
Obviously, when scaling down CMOS, α will increase, so the Delay of the KS, HC, and BK adders will increase, but KS, HC, and BK are always going to be faster than RC (as long as α remains small enough). Still, the more interesting results are the ones showing PDPs and EDPs. When wires and leakage are being accounted for (i.e., for nonzero α and b), the RC always gets the best PDP [Fig. 9(e)], while achieving the best EDP for all but the smallest word lengths [Fig. 9(f)]. These results should get even better for practical implementations, as the power for KS, HC, and BK was underestimated, while the estimates for RC are quite accurate. Even more, these results will improve with scaling, as α and b should increase.
The plots in Fig. 9 support the claim that serial adders achieve better PDP and EDP than parallel adders in nanotechnologies, and bear more weight when one or more of the following are true:
Fig. 9. Estimates of the Delay [(a) and (d)], the PDP [(b) and (e)], and the EDP [(c) and (f)] for the four different adders analyzed (RC, BK, HC, and KS) without [(a), (b), and (c)] and with [(d), (e), and (f)] wire effects.
TABLE II
32-BIT RCA SIMULATIONS FOR V_DD VARYING FROM 100 MV TO 700 MV (70-NM BPTM)
the CMOS circuits are operated in subthreshold;
elementary devices have small gain (e.g., SET, molecular, DNA);
leakage power represents a significant part of the power consumption;
wires are introducing considerable delays.
Detailed Spice simulations for RC are shown in Fig. 10, with
data presented in Table II, while results for 3-RC are detailed in
Table III.
The four different adders we have analyzed show very different power-delay tradeoffs, both when working correctly and when faulty. The somewhat unexpected result is that the most reliable adder can also be a low-power one, but unfortunately it is not the fastest one. Still, it looks like the speed advantage will not be sustainable when gain will become very small (e.g., SET, molecular, DNA). Remark: Some improvements will still be possible by using carry-bypass adders and special layouts (e.g., 8-like snail geometries) which would allow for all the wires to be very short (somehow equivalent to the folding of the cortex).
Finally, subthreshold operation might become an interesting design approach, particularly because the operation speeds are improving as scaling proceeds towards smaller technology nodes [121], but also because it might allow for an easier hybridization with ultra-low-voltage technologies (e.g., SET, molecular, DNA). It is not difficult to envision a situation in which designs in older CMOS technology nodes operating at standard power supply voltages (nominal V_DD) would have comparable operation speeds to those in advanced technologies but operating in subthreshold. This would mean, for example, that a microprocessor or DSP designed to run at 1 GHz in
Fig. 10. Effect of varying V_DD (32-bit RC) on power, PDP, and EDP (90-nm top, and 70-nm [158] bottom). Optimum operating point is marked with a circle.
TABLE III
PERFORMANCES OF DIFFERENT MAJ-3 MUX (REDUNDANT) RC ADDERS (SEE FIG. 6)
0.18-µm (or 0.13-µm) CMOS at nominal V_DD could be redesigned to operate at 1 GHz in subthreshold in, say, 22 nm (or 16 nm). The main advantage would be a power reduction of one to two orders of magnitude. Even incorporating redundancy, such a solution (1 GHz, 22-nm, subthreshold) could still dissipate significantly less than the one we have started from (1 GHz, 0.18-µm, nominal V_DD), while also being highly reliable. The research path is being opened for portable applications that could enjoy highly extended operation times [131], and also towards low-power analog computations [117], [149]–[151].
VI. CONCLUSION
Although we have not discussed memory and communications, some would say that these will be non-volatile and probably electro-optical, and they might be right. Still, it is far from clear how future computations will be implemented. Irrespective of the technology, reliability will be a major concern. Based on the examples presented in this paper we should expect the following.
Serial solutions (or, more generally, locally connected ones like systolic arrays, cellular neural networks, cellular automata, etc.) seem to be a very good bet from the power-reliability perspective.
Reliable designs for implementing computations can be
low-power (e.g., [55], [57], [90], [152]–[156]). Possible
explanations can be found in neural computations and com-
munications [157], which suggests that unreliable devices
(i.e., neurons and synapses) can compute and communicate
reliably by cleverly combining redundancy and encoding,
while simultaneously minimizing energy. Additionally,
encoding could help both for reliable communications and
for reducing power.
For certain designs, defects and faults could manifest themselves as reasonably well-defined current steps. If this is the case, current sensors could automatically trigger reconfiguration at the higher levels [2].
Redundant designs should not be weighed with respect to (their) redundancy factors (as commonly done in the literature), but with respect to power, energy, and area, as is customary in the VLSI community. This also implies that
circuit optimization will have to be done in four dimensions:
delay, area, power, and reliability (for more details on these
lines see [15] and [17]).
Algorithms for quickly and accurately estimating the relia-
bility of large (complex) systems will have to be developed
[159].
Optimization will become even more difficult, as there are many more options when trading power and speed versus reliability.
Subthreshold operation might be a simple and practical way to test novel reliability ideas, as a subthreshold design in 65 nm might be as sensitive as a CMOS design in 22 nm at nominal V_DD, or a molecular, or a SET one, etc.
Finally, one more aspect that certainly deserves attention (and was not discussed here) is that of asynchronous or self-timed circuits. MAJ gates are well positioned here again [160], as a Muller C-element is nothing else but a MAJ gate with feedback. Such designs will be needed for two reasons: (i) the fact that delay variations will have to be compensated (and it might be cheaper to use self-timed circuits than hardwired redundancy, but as far as we know nobody seems to have a good/detailed answer for this tradeoff); and (ii) the fact that the power required for clock distribution is already prohibitive. It follows that many more items should be added to an already busy reliability-related research agenda.
R
EFERENCES
[1] V. Beiu et al., On the advantages of serial architectures for
low-power reliable computations, in Proc. Int. Conf. Appl.-Specific
Syst., Arch. Processors (ASAP’05), Samos, Greece, Jul. 2005, pp.
276281.
[2] V. Beiu et al., The vanishing majority gate: Trading power and
speed for reliability, in Proc. Defect Fault Tolerant Nanoscale Arch.
(NanoArch’05), Palm Springs, CA, May 2005 [Online]. Available:
www.eecs.wsu.edu/~vbeiu/Publications/2005NanoArch.pdf
[3] G. E. Moore, Cramming more components onto integrated circuits,
Electron. Mag., vol. 38, pp. 114117, Apr. 1965.
[4] International technology roadmap for semiconductors, ITRS, San Jose,
CA [Online]. Available: http://public.itrs.net
[5] K. F. Goser et al., Aspects of systems and circuits for nanoelectronics,
Proc. IEEE, vol. 85, pp. 558573, Apr. 1997.
[6] S. Borkar, Design challenges of technology scaling, IEEE Micro, vol.
19, no. 4, pp. 2329, Jul. 1999.
[7] T. Sakurai, "Design challenges for 0.1-µm and beyond," in Proc. Asia South Pacific Design Autom. Conf. (ASP-DAC'00), Tokyo, Japan, Jan. 2000, pp. 553–558.
[8] D. J. Frank et al., "Device scaling limits of Si MOSFETs and their application dependencies," Proc. IEEE, vol. 89, no. 3, pp. 259–288, Mar. 2001.
[9] R. W. Keyes, "Fundamental limits of silicon technology," Proc. IEEE, vol. 89, no. 3, pp. 227–239, Mar. 2001.
[10] R. E. Bryant et al., "Limitations and challenges for computer-aided design technology for CMOS VLSI," Proc. IEEE, vol. 89, no. 3, pp. 341–365, Mar. 2001.
[11] J. A. Davis et al., "Interconnect limits on gigascale integration (GSI) in the 21st century," Proc. IEEE, vol. 89, no. 3, pp. 305–324, Mar. 2001.
[12] Q. Chen and J. D. Meindl, "Nanoscale metal-oxide-semiconductor field-effect transistors: Scaling limits and opportunities," Nanotechnology, vol. 15, pp. S549–S555, Jul. 2004.
[13] M. Horowitz, E. Alon, D. Patil, S. Naffziger, R. Kumar, and K. Bernstein, "Scaling, power, and the future of CMOS," in Proc. Int. Electron Devices Meeting (IEDM'05), Washington, DC, Dec. 2005, pp. 7–15.
[14] V. Beiu et al., "On nanoelectronic architectural challenges and solutions," in Proc. IEEE Conf. Nanotech. (NANO'04), Munich, Germany, Aug. 2004, pp. 628–631.
[15] V. Beiu, "Limits, challenges, and issues in nanoscale and bio-inspired computing," in Bio-Inspired and Nano-Scale Integrated Computing, M. M. Eshaghian-Wilner, Ed. New York: Wiley, 2007/8.
[16] J. D. Meindl, Q. Chen, and J. A. Davis, "Limits on silicon nanoelectronics for terascale integration," Science, vol. 293, pp. 2044–2049, Sep. 2001.
[17] V. Beiu and U. Rückert, Eds., Emerging Brain Inspired Nano Architectures. Singapore: World Scientific Press, 2008.
[18] S. Tiwari et al., "Electronics at nanoscale: Fundamental and practical challenges, and emerging directions," in Proc. Int. Conf. Emerging Tech. Nanoelectr. (Nano'06), Singapore, Jan. 2006, pp. 481–486.
[19] M. H. Sulieman and V. Beiu, "Characterization of a 16-bit threshold logic single-electron adder," in Proc. Int. Symp. Circuits Syst. (ISCAS'04), Vancouver, BC, Canada, May 2004, pp. 681–684.
[20] C. Constantinescu, "Trends and challenges in VLSI circuit reliability," IEEE Micro, vol. 23, no. 4, pp. 14–19, Jul. 2003.
[21] S. K. Shukla et al., "Nano, quantum, and molecular computing: Are we ready for the validation and test challenges?," in Proc. 8th IEEE Int. High-Level Design Validation Test Workshop, San Francisco, CA, Nov. 2003, pp. 3–7.
[22] A. R. Brown, A. Asenov, and J. R. Watling, "Intrinsic fluctuations in sub 10-nm double-gate MOSFETs introduced by discreteness of charge and matter," IEEE Trans. Nanotechnol., vol. 1, pp. 195–200, Dec. 2002.
[23] S. Borkar, "Designing reliable systems from unreliable components: The challenges of transistor variability and degradation," IEEE Micro, vol. 25, no. 6, pp. 10–16, Nov.–Dec. 2005.
[24] V. Degalahal et al., "The effect of threshold voltages on the soft error rate," in Proc. Int. Symp. Quality Electron. Design (ISQED'04), San Jose, CA, Mar. 2004, pp. 503–508.
[25] M. Nicolaidis, "Design for soft error mitigation," IEEE Trans. Device Mater. Reliab., vol. 5, no. 3, pp. 405–418, Sep. 2005.
[26] K. Constantinides et al., "Assessing SEU vulnerability via circuit-level timing analysis," in Proc. Workshop Arch. Rel. (WAR-1), Nov. 2005.
[27] D. Rossi et al., "Multiple transient faults in logic: An issue for next generation ICs?," in Proc. 20th IEEE Int. Symp. Defect Fault Tolerance VLSI Syst. (DFT'05), Monterey, CA, Oct. 2005, pp. 352–360.
[28] P. Shivakumar et al., "Modeling the effect of technology trends on the soft error rate of combinational logic," in Proc. Int. Conf. Dependable Syst. Networks (DSN'02), Washington, DC, Jun. 2002, pp. 389–398.
[29] R. Ho, "On-chip wires: Scaling and efficiency," Ph.D. thesis, Elect. Eng. Dept., Stanford Univ., Stanford, CA, 2003 [Online]. Available: www-vlsi.stanford.edu/papers/rh_thesis.pdf
[30] W. Burleson and A. Maheshwari, VLSI Interconnects: A Design Perspective. San Francisco, CA: Elsevier/Morgan Kaufmann, 2007/8.
[31] J. A. Hutchby et al., "Extending the road beyond CMOS," IEEE Circuits Dev. Mag., vol. 18, no. 2, pp. 28–41, Mar. 2002.
[32] R. Waser, Ed., Nanoelectronics and Information Technology, 2nd ed. New York: Wiley, 2005.
[33] K. K. Likharev, "Single-electron devices and their applications," Proc. IEEE, vol. 87, no. 4, pp. 606–632, Apr. 1999.
[34] U. Feldkamp and C. M. Niemeyer, "Rational design of DNA nanoarchitectures," Angew. Chem. Int. Ed., vol. 45, pp. 1856–1876, Mar. 2006.
[35] C. Lin et al., "DNA tile based self-assembly: Building complex nanoarchitectures," ChemPhysChem, vol. 7, pp. 1641–1647, Aug. 2006.
[36] J. E. Green et al., "A 160-kilobit molecular electronic memory patterned at 10^11 bits per square centimeter," Nature, vol. 445, pp. 414–417, Jan. 2007.
[37] M. Forshaw et al., "A short review of nanoelectronic architectures," Nanotechnology, vol. 15, pp. S220–S223, Feb. 2004.
[38] M. Forshaw et al., "A review of the status of research and training into architectures for nanoelectronic and nanophotonic systems in the European research area," Univ. College London, London, U.K., Tech. Rep. FP6/2002/IST/1, Contract #507519, 2004.
[39] V. Beiu, "Neural inspired architectures for nanoelectronics: Highly reliable, ultra-low-power, reconfigurable, asynchronous," in Proc. Neural Inf. Process. Syst. (NIPS'03), Whistler, Canada, Dec. 2003 [Online]. Available: www.eecs.wsu.edu/~vbeiu/workshop_nips03
[40] J. R. Heath et al., "A defect-tolerant computer architecture: Opportunities for nanotechnology," Science, vol. 280, pp. 1716–1721, Jun. 1998.
[41] K. Nikolic, A. S. Sadek, and M. Forshaw, "Architectures for reliable computing with unreliable nanodevices," in Proc. IEEE Conf. Nanotech. (NANO'01), Maui, HI, Oct. 2001, pp. 254–259.
[42] K. Nikolic, A. S. Sadek, and M. Forshaw, "Fault-tolerant techniques for nanocomputers," Nanotechnology, vol. 13, pp. 357–362, May 2002.
[43] J. A. B. Fortes, "Future challenges in VLSI system design," in Proc. IEEE Comput. Soc. Annu. Symp. VLSI (ISVLSI'03), Tampa, FL, Feb. 2003, pp. 5–7.
[44] T. Lehtonen, J. Plosila, and J. Isoaho, "On fault tolerance techniques towards nanoscale circuits and systems," Dept. of Inf. Technol., Univ. of Turku, Turku, Finland, Tech. Rep., 2005.
[45] J. E. Harlow III, "Toward design technology in 2020: Trends, issues, and challenges," in Proc. Int. Symp. VLSI (ISVLSI'03), Tampa, FL, Feb. 2003, pp. 3–4.
[46] M. T. Niemier, M. Crocker, X. Sharon Hu, and M. Lieberman, "Using CAD to shape experiments in molecular QCA," in Proc. Int. Conf. Comp.-Aided Design (ICCAD'06), San Jose, CA, Nov. 2006, pp. 907–914.
[47] V. Beiu and M. H. Sulieman, "On practical multiplexing issues," in Proc. IEEE Conf. Nanotech. (NANO'06), Cincinnati, OH, Jul. 2006, pp. 310–313.
[48] V. Beiu et al., "Gate failures effectively shape multiplexing," in Proc. 21st IEEE Int. Symp. Defect Fault Tolerance VLSI Syst. (DFT'06), Arlington, VA, Oct. 2006, pp. 29–40.
[49] W. Ibrahim, V. Beiu, and Y. A. Alkhawwar, "On the reliability of four full adder cells," in Proc. Int. Design Test Workshop (IDT'06), Dubai, U.A.E., Nov. 2006.
[50] J. von Neumann, "Probabilistic logics and the synthesis of reliable organisms from unreliable components," in Automata Studies, C. E. Shannon and J. McCarthy, Eds. Princeton, NJ: Princeton Univ. Press, 1956, pp. 43–98.
[51] S. Roy and V. Beiu, "Multiplexing schemes for cost-effective fault-tolerance," in Proc. 4th IEEE Conf. Nanotech. (NANO'04), Munich, Germany, Aug. 2004, pp. 589–592.
[52] S. Roy and V. Beiu, "Majority multiplexing: Economical redundant fault-tolerant designs for nanoarchitectures," IEEE Trans. Nanotechnol., vol. 4, no. 4, pp. 441–451, Jul. 2005.
[53] A. S. Sadek, K. Nikolic, and M. Forshaw, "Parallel information and computation with restitution for noise-tolerant nanoscale logic networks," Nanotechnology, vol. 15, pp. 192–210, Jan. 2004.
[54] M. Zhang and N. R. Shanbhag, "A CMOS design style for logic circuit hardening," in Proc. Int. Reliab. Phys. Symp. (IRPS'05), San Jose, CA, Apr. 2005, pp. 223–229.
[55] K. Constantinides et al., "BulletProof: A defect-tolerant CMP switch architecture," in Proc. Int. Symp. High-Perf. Comp. Arch. (HPCA'06), Austin, TX, Feb. 2006, pp. 5–16.
[56] S. Mitra et al., "Soft error resilient system design through error correction," in Proc. IFIP Int. Conf. Very Large Scale Integration, Nice, France, Oct. 2006, pp. 332–337.
[57] S. Shyam et al., "Ultra low-cost defect protection for microprocessor pipelines," in Proc. Int. Conf. Arch. Support Prog. Lang. Op. Sys. (ASPLOS'06), San Jose, CA, Oct. 2006, pp. 73–82.
[58] A. J. KleinOsowski and D. J. Lilja, "The NanoBox project: Exploring fabrics of self-correcting logic blocks for high defect rate molecular device technologies," in Proc. IEEE Comput. Soc. Annu. Symp. VLSI (ISVLSI'04), Lafayette, LA, Feb. 2004, pp. 19–24.
[59] A. J. KleinOsowski et al., "The recursive NanoBox processor grid: A reliable system architecture for unreliable nanotechnology devices," in Proc. Int. Conf. Dependable Syst. Networks (DSN'04), Florence, Italy, Jun. 2004, pp. 167–176.
[60] V. Beiu, "The quest for practical redundant computations," in Proc. Int. Conf. Microelectr. (ICM'05), Islamabad, Pakistan, Dec. 2005, pp. xi–xxix.
[61] V. P. Roychowdhury, D. B. Janes, and S. Bandyopadhyay, "Nanoelectronic architectures for Boolean logic," Proc. IEEE, vol. 85, no. 4, pp. 574–588, Apr. 1997.
[62] R. Reischuk and B. Schmeltz, "Area efficient methods to increase the reliability of combinatorial circuits," in Proc. Int. Symp. Theor. Aspects Comp. Sci. (STACS'89), Paderborn, Germany, Feb. 1989, pp. 314–326.
[63] E. F. Moore and C. E. Shannon, "Reliable circuits using less reliable relays," J. Franklin Inst., vol. 262, pp. 191–208, Sep.–Oct. 1956.
[64] S. Winograd and J. D. Cowan, Reliable Computation in the Presence of Noise. Cambridge, MA: MIT Press, 1963.
[65] N. J. Nilsson, Learning Machines. New York: McGraw-Hill, 1965.
[66] B. Widrow and M. E. Hoff, "Adaptive switching circuits," IRE WESCON Conv. Rec., vol. 4, pp. 96–104, Aug. 1960.
[67] C. Coates and P. Lewis, "DONUT: A threshold gate computer," IEEE Trans. Electron. Comput., vol. EC-13, pp. 240–247, Jun. 1964.
[68] N. P. Brousentsov, "Computing machine Setun of Moscow State University," New Develop. Comp. Tech., pp. 226–234, 1960.
[69] N. P. Brousentsov, "Threshold realization of three-valued logic on electromagnetic elements," Comp. Probls. Cyber., vol. 9, pp. 3–35, 1972.
[70] H. L. Hughes and J. M. Benedetto, "Radiation effects and hardening of MOS technology: Devices and circuits," IEEE Trans. Nucl. Sci., vol. 50, pp. 500–521, Jun. 2003.
[71] C. Bolchini et al., "A CMOS fault tolerant architecture for switch-level faults," in Proc. IEEE Int. Workshop Defect Fault Tolerance VLSI Syst., Montreal, QC, Canada, Oct. 1994, pp. 10–18.
[72] C. Bolchini et al., "Static redundancy techniques for CMOS gates," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS'96), Atlanta, GA, May 1996, vol. 4, pp. 576–579.
[73] C. Bolchini et al., "An improved fault tolerant architecture at CMOS level," in Proc. Int. Symp. Circuits Syst. (ISCAS'97), Kowloon, Hong Kong, Jun. 1997, pp. 2737–2740.
[74] J. Kumar and M. B. Tahoori, "A low power soft error suppression technique for dynamic logic," in Proc. 20th IEEE Int. Symp. Defect Fault Tolerance VLSI Syst. (DFT'05), Monterey, CA, Oct. 2005, pp. 454–462.
[75] M. P. Baze, S. P. Buchner, and D. McMorrow, "A digital CMOS design technique for SEU hardening," IEEE Trans. Nucl. Sci., vol. 47, no. 6, pp. 2603–2608, Dec. 2000.
[76] V. Beiu, "Ultra-fast noise immune CMOS threshold gates," in Proc. Int. Midwest Symp. Circuits Syst. (MWSCAS'00), Lansing, MI, Aug. 2000, pp. 1310–1313.
[77] S. Tatapudi and V. Beiu, "Split-precharge differential noise-immune threshold logic gate (SPD-NTL)," in Proc. Int. Work-Conf. Artif. Neural Networks (IWANN'03), Menorca, Spain, Jun. 2003, pp. 49–56.
[78] J. B. Lerch, "Threshold gate circuits employing field-effect transistors," U.S. Patent 3715603, Feb. 6, 1973.
[79] J. Galiay, Y. Crouzet, and M. Vergniault, "Physical versus logical fault models in MOS LSI circuits: Impact on their testability," IEEE Trans. Comput., vol. C-29, no. 6, pp. 527–531, Jun. 1980.
[80] F. C. Blom et al., "Layout level design for testability strategy applied to a CMOS cell library," in Proc. Int. Workshop Defect Fault Tolerance VLSI Syst. (DFT'93), Venice, Italy, Oct. 1993, pp. 199–206.
[81] S. Aunet and M. Hartmann, "Real-time reconfigurable linear threshold elements and some applications to neural hardware," in Proc. Int. Conf. Evolvable Syst. (ICES'03), Trondheim, Norway, Mar. 2003, pp. 365–376.
[82] S. Aunet, Y. Berg, and V. Beiu, "Ultra-low-power redundant logic based on majority-3 gates," in Proc. IFIP Conf. VLSI Syst.-on-Chip (VLSI-SoC'05), Perth, Australia, Oct. 2005, pp. 553–558.
[83] M. H. Sulieman and V. Beiu, "Design and analysis of SET circuits: Using MATLAB modules and SIMON," in Proc. IEEE Conf. Nanotech. (NANO'04), Munich, Germany, Aug. 2004, pp. 618–621.
[84] C. A. Moritz and T. Wang, "Towards defect-tolerant nanoscale architectures," in Proc. IEEE Conf. Nanotech. (NANO'06), Cincinnati, OH, Jul. 2006, pp. 331–334.
[85] T. Wang et al., "Combining circuit level and system level techniques for defect-tolerant architectures," in Proc. Int. Workshop Defect Fault Tolerant Nanoscale Arch. (NanoArch'06), Boston, MA, Jun. 2006.
[86] L. Anghel and M. Nicolaidis, "Defects tolerant logic gates for unreliable future nanotechnologies," in Proc. Int. Work-Conf. Artif. Neural Networks (IWANN'07), San Sebastián, Spain, Jun. 2007, pp. 422–429.
[87] A. Schmid and Y. Leblebici, "Robust circuit and system design methodologies for nanometer-scale devices and single-electron transistors," in Proc. IEEE Conf. Nanotech. (NANO'03), San Francisco, CA, Aug. 2003, vol. 2, pp. 516–519.
[88] A. Schmid and Y. Leblebici, "Robust circuit and system design methodologies for nanometer-scale devices and single-electron transistors," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 11, pp. 1156–1166, Nov. 2004.
[89] S. Aunet and V. Beiu, "Ultra-low-power fault tolerant neural inspired CMOS logic," in Proc. Int. Joint Conf. Neural Networks (IJCNN'05), Montréal, Canada, Aug. 2005, pp. 2843–2848.
[90] C. H. Kim et al., "A process variation compensation technique with an on-die leakage current sensor for nanometer scale dynamic circuits," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 6, pp. 646–649, Jun. 2006.
[91] K. Granhaug and S. Aunet, "Improving yield and defect tolerance in multifunction subthreshold CMOS gates," in Proc. Int. Symp. Defect Fault Tolerance VLSI Syst. (DFT'06), Arlington, VA, Oct. 2006, pp. 20–28.
[92] Y. Cao et al., "Yield optimization with energy-delay constraints in low-power digital circuits," in Proc. Int. Conf. Electron Dev. Solid-State Circuits, Kowloon, Hong Kong, Dec. 2003, pp. 285–288.
[93] A. Weinberger and J. L. Smith, "A logic for high-speed addition," Natl. Bur. Stand. Circular 591, pp. 3–12, 1958.
[94] P. M. Kogge and H. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations," IEEE Trans. Comput., vol. C-22, no. 8, pp. 786–793, Aug. 1973.
[95] R. E. Ladner and M. J. Fischer, "Parallel prefix computation," J. ACM, vol. 27, pp. 831–838, Oct. 1980.
[96] R. P. Brent and H. T. Kung, "A regular layout for parallel adders," IEEE Trans. Comput., vol. C-31, pp. 260–264, Mar. 1982.
[97] T. Han and D. A. Carlson, "Fast area-efficient VLSI adders," in Proc. Int. Symp. Comp. Arithmetic (ARITH'87), Como, Italy, May 1987, pp. 49–56.
[98] M. Nicolaidis, "Carry checking/parity prediction adders and ALUs," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 11, pp. 121–128, Feb. 2003.
[99] R. Ramanarayanan et al., "Soft errors in adder circuits," in Proc. Mil. Aerosp. Appls. Prog. Logic Devs. Tech. (MAPLD'04), Washington, DC, Sep. 2004 [Online]. Available: klabs.org/mapld04/abstracts/ramanarayanan_a.pdf
[100] S. Tosun et al., "An ILP formulation for reliability-oriented high-level synthesis," in Proc. 6th Int. Symp. Quality Electron. Design (ISQED'05), San Jose, CA, Mar. 2005, pp. 364–369.
[101] D. B. Strukov and K. K. Likharev, "CMOL FPGA: A reconfigurable architecture for hybrid digital circuits with two-terminal nanodevices," Nanotechnology, vol. 16, pp. 888–900, Jun. 2005.
[102] S. Peng and R. Manohar, "Fault tolerant asynchronous adder through dynamic self-reconfiguration," in Proc. Int. Conf. Comp. Design: VLSI Comp. (ICCD'05), San Jose, CA, Oct. 2005, pp. 171–178.
[103] T. Hogg and G. S. Snider, "Defect-tolerant adder circuits with nanoscale crossbars," IEEE Trans. Nanotechnol., vol. 5, no. 2, pp. 97–100, Mar. 2006.
[104] W. Rao, A. Orailoglu, and R. Karri, "Fault identification in reconfigurable carry lookahead adders targeting nanoelectronic fabrics," in Proc. Eur. Test Symp. (ETS'06), Southampton, U.K., May 2006, pp. 63–68.
[105] F. Worm, P. Thiran, and P. Ienne, "Designing robust checkers in the presence of massive timing errors," in Proc. 12th IEEE Int. On-Line Testing Symp. (IOLTS'06), Como, Italy, Jul. 2006, pp. 281–286.
[106] J. P. Hayes, I. Polian, and B. Becker, "A model for transient faults in logic circuits," in Proc. Int. Design Test Workshop (IDT'06), Dubai, U.A.E., Nov. 2006.
[107] T. Hogg and G. S. Snider, "Defect-tolerant logic with nanoscale crossbar circuits," J. Electr. Testing: Theor. Appls., vol. 23, pp. 117–129, Jun. 2007.
[108] D. Patil et al., "Robust energy-efficient adder topologies," in Proc. Int. Symp. Comp. Arith. (ARITH-18), Montpellier, France, Jun. 2007, pp. 16–28.
[109] R. Zimmermann, "Binary adder architectures for cell-based VLSI and their synthesis," Ph.D. thesis, Swiss Federal Inst. Technol., Zurich, Switzerland, 1997.
[110] Y. Shimazaki, R. Zlatanovici, and B. Nikolic, "A shared-well dual-supply-voltage 64-bit ALU," in Proc. Int. Solid-State Circuits Conf. (ISSCC'03), San Francisco, CA, Feb. 2003, pp. 104–105.
[111] Q.-W. Kuo, V. Sharma, and C. C.-P. Chen, "Substrate-bias optimized 0.18-µm 2.5-GHz 32-bit adder with post-manufacture tunable clock," in Proc. Int. Symp. VLSI Design Autom. Test (DAT'05), Hsinchu, Taiwan, R.O.C., Apr. 2005, pp. 341–344.
[112] G. Yang et al., "A 32-bit carry lookahead adder using dual-path all-N logic," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 8, pp. 992–996, Aug. 2005.
[113] A. Glodovsky et al., "A folded 32-bit prefix tree adder in 0.16-µm static CMOS," in Proc. Midwest Symp. Circuits Syst. (MWSCAS'00), Lansing, MI, Aug. 2000, pp. 268–373.
[114] M. Ziegler and M. R. Stan, "Optimal logarithmic adder structures with a fan-out of two for minimizing the area-delay product," in Proc. Int. Symp. Circuits Syst. (ISCAS'01), Sydney, Australia, May 2001, pp. 657–660.
[115] R. A. Freking and K. K. Parhi, "Theoretical estimation of power consumption in binary adders," in Proc. Int. Symp. Circuits Syst. (ISCAS'98), Monterey, CA, Jun. 1998, pp. 453–457.
[116] V. G. Oklobdzija and R. Krishnamurthy, "Design of power efficient VLSI arithmetic: Speed and power tradeoffs," in Proc. Int. Symp. Comp. Arith. (ARITH'03), Santiago de Compostela, Spain, Jun. 2003 [Online]. Available: www.acsel-lab.com/Presentations/ARITH-Tutorial-Vojin.pps
[117] V. Beiu, "A novel highly reliable low-power nano architecture: When von Neumann augments Kolmogorov," in Proc. Int. Conf. Application-Specific Syst., Arch. Processors (ASAP'04), Galveston, TX, Sep. 2004.
[118] M. Forshaw, K. Nikolic, and A. S. Sadek, "ANSWERS: Autonomous nanoelectronic systems with extended replication and signaling," Univ. College London, London, U.K., MEL-ARI #28667, 3rd Year Report, 2001.
[119] V. Beiu, W. Ibrahim, and S. Lazarova-Molnar, "What von Neumann did not say about multiplexing: The gory details," in Proc. Int. Work-Conf. Artif. Neural Networks (IWANN'07), San Sebastián, Spain, Jun. 2007, pp. 487–496.
[120] V. Beiu, W. Ibrahim, and S. Lazarova-Molnar, "A fresh look at majority multiplexing: When devices get into the picture," in Proc. IEEE Int. Conf. Nanotech. (NANO'07), Hong Kong, Aug. 2007, pp. 883–888.
[121] V. Beiu, A. Djupdal, and S. Aunet, "Ultra-low-power neural inspired addition: When serial might outperform parallel architectures," in Proc. Int. Work-Conf. Artif. Neural Networks (IWANN'05), Barcelona, Spain, Jun. 2005, pp. 486–493.
[122] W. Ibrahim, V. Beiu, and M. H. Sulieman, "On the reliability of majority gates full adders," IEEE Trans. Nanotechnol., accepted for publication.
[123] K. N. Patel, I. L. Markov, and J. P. Hayes, "Evaluating circuit reliability under probabilistic gate-level fault models," in Proc. Int. Workshop Logic Synthesis (IWLS'03), Laguna Beach, CA, May 2003, pp. 59–64.
[124] S. Krishnaswamy et al., "Accurate reliability evaluation and enhancements via probabilistic transfer matrices," in Proc. Design Autom. Test Eur. (DATE'05), Munich, Germany, Mar. 2005, pp. 282–287.
[125] D. F. Hepner and A. D. Walls, "Predictive failure analysis and failure isolation using current sensing," U.S. Patent 7003409, Feb. 21, 2006.
[126] D. M. Markovic, "A power/area optimal approach to VLSI signal processing," Univ. California, Berkeley, CA, Tech. Rep. UCB/EECS-2006-65, 2006.
[127] S. Kao, R. Zlatanovici, and B. Nikolic, "A 240-ps 64-b carry-lookahead adder in 90-nm CMOS," in Proc. Int. Solid-State Circuits Conf. (ISSCC'06), San Francisco, CA, Feb. 2006, pp. 1735–1744.
[128] V. Beiu, "Constructive threshold logic addition: A synopsis of the last decade," in Proc. Int. Conf. Artif. Neural Networks (ICANN'03), Istanbul, Turkey, Jul. 2003, pp. 745–752.
[129] V. Beiu, J. M. Quintana, and M. J. Avedillo, "VLSI implementations of threshold logic: A comprehensive survey," IEEE Trans. Neural Netw., vol. 14, no. 5, pp. 1217–1243, Sep. 2003.
[130] P. Celinski, S. F. Al-Sarawi, and D. Abbott, "Logical effort based design exploration of 64-bit adders using a mixed dynamic-CMOS/threshold-logic approach," in Proc. IEEE Comput. Soc. Annu. Symp. VLSI (ISVLSI'04), Lafayette, LA, Feb. 2004, pp. 127–132.
[131] V. Beiu et al., "Femto joule switching for nano electronics," in Proc. ACS/IEEE Int. Conf. Comp. Sys. Appls. (AICCSA'06), Sharjah, U.A.E., Mar. 2006, pp. 415–423.
[132] R. M. Swanson and J. D. Meindl, "Ion-implanted complementary MOS transistors in low-voltage circuits," IEEE J. Solid-State Circuits, vol. SC-7, no. 2, pp. 146–153, Apr. 1972.
[133] E. A. Vittoz and J. Fellrath, "CMOS analog integrated circuits based on weak inversion operation," IEEE J. Solid-State Circuits, vol. SC-12, no. 3, pp. 224–231, Jun. 1977.
[134] C. A. Mead, "Neuromorphic electronic systems," Proc. IEEE, vol. 78, no. 10, pp. 1629–1636, Oct. 1990.
[135] E. A. Vittoz, "Very low power circuit design: Fundamentals and limits," in Proc. Int. Symp. Circuits Syst. (ISCAS'93), Chicago, IL, May 1993, pp. 1439–1442.
[136] E. A. Vittoz, "Low-power design: Ways to approach the limits," in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC'94), San Francisco, CA, Feb. 1994, pp. 14–18.
[137] G. Schrom and S. Selberherr, "Ultra-low-power CMOS technologies," in Proc. Int. Semicond. Conf. (CAS'96), Sinaia, Romania, Oct. 1996, vol. 1, pp. 237–246.
[138] J. B. Burr and J. Shott, "A 200 mV self-testing encoder/decoder using Stanford ultra-low-power CMOS," in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC'94), San Francisco, CA, Feb. 1994, pp. 84–85.
[139] T. S. Lande et al., "FLOGIC: Floating-gate logic for low-power operation," in Proc. 3rd Int. Conf. Electronics, Circuits, Syst. (ICECS'96), Rodos, Greece, Oct. 1996, vol. 2, pp. 1041–1044.
[140] C. H. Kim, H. Soeleman, and K. Roy, "Ultra-low-power DLMS adaptive filter for hearing aid applications," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 11, no. 6, pp. 1058–1067, Dec. 2003.
[141] P. Shivakumar et al., "Exploiting microarchitectural redundancy for defect tolerance," in Proc. 21st Int. Conf. Comput. Design, San Jose, CA, Oct. 2003, pp. 481–488.
[142] S. Aunet et al., "Reconfigurable subthreshold CMOS perceptron," in Proc. Int. Joint Conf. Neural Networks (IJCNN'04), Budapest, Hungary, Jul. 2004, pp. 1983–1988.
[143] B. H. Calhoun, A. Wang, and A. Chandrakasan, "Device sizing for minimum energy operation in subthreshold circuits," in Proc. Custom IC Conf. (CICC'04), Orlando, FL, Oct. 2004, pp. 95–98.
[144] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-power CMOS digital design," IEEE J. Solid-State Circuits, vol. 27, no. 4, pp. 473–484, Apr. 1992.
[145] A. Wang, A. P. Chandrakasan, and S. Kosonocky, "Optimal supply and threshold scaling for subthreshold CMOS circuits," in Proc. Annu. Symp. VLSI (ISVLSI'02), Pittsburgh, PA, Apr. 2002, pp. 5–9.
[146] D. D. Wentzloff et al., "Design considerations for next generation wireless power-aware microsensor nodes," in Proc. 17th Int. Conf. VLSI Design (VLSID'04), Mumbai, India, Jan. 2004, pp. 361–367.
[147] H. Soeleman, K. Roy, and B. Paul, "Robust subthreshold logic for ultra-low power operation," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 9, no. 1, pp. 90–99, Feb. 2001.
[148] K. Johansson, O. Gustafsson, and L. Wanhammar, "Power estimation for ripple-carry adders with correlated input data," in Proc. Int. Conf. IC Sys. Design (ICSD'04), Toulouse, France, Jul. 2004, pp. 662–674.
[149] G. E. R. Cowan, R. Melville, and Y. Tsividis, "A VLSI analog computer/digital computer accelerator," IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 42–53, Jan. 2006.
[150] G. E. R. Cowan, "A VLSI analog computer/math co-processor for a digital computer," Ph.D. thesis, Columbia Univ., New York, 2005 [Online]. Available: digitalcommons.libraries.columbia.edu/dissertations/AAI3174769
[151] R. Sarpeshkar, "Brain power: Borrowing from biology makes for low-power computing," IEEE Spectr., vol. 43, no. 5, pp. 24–29, May 2006.
[152] J. Gambles et al., "An ultra-low-power, radiation-tolerant Reed-Solomon encoder for space applications," in Proc. Custom IC Conf. (CICC'03), San Jose, CA, Sep. 2003, pp. 631–634.
[153] J. Donald and M. Martonosi, "Power efficiency for variation-tolerant multicore processors," in Proc. Int. Symp. Low Power Electron. Design (ISLPED'06), Tegernsee, Germany, Oct. 2006, pp. 304–309.
[154] A. Datta et al., "Delay modeling and statistical design of pipelined circuit under process variation," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 25, no. 11, pp. 2427–2436, Nov. 2006.
[155] J. K. McIver III and L. T. Clark, "Reducing radiation-hardened digital circuit power consumption," IEEE Trans. Nucl. Sci., vol. 52, no. 6, pp. 2503–2509, Dec. 2005.
[156] K. Ishibashi et al., "Low-voltage and low-power logic, memory, and analog circuit techniques for SoCs using 90-nm technology and beyond," IEICE Trans. Electron., vol. E89-C, no. 3, pp. 250–262, Mar. 2006.
[157] W. B. Levy and R. A. Baxter, "Energy-efficient neuronal computation via quantal synaptic failures," J. Neurosci., vol. 22, pp. 4746–4755, Jun. 2002.
[158] Y. Cao et al., "New paradigm of predictive MOSFET and interconnect modeling for early circuit design," in Proc. Int. Custom IC Conf. (CICC'00), Orlando, FL, May 2000, pp. 201–204.
[159] S. Lazarova-Molnar, V. Beiu, and W. Ibrahim, "Reliability: The fourth optimization pillar of nanoelectronics," in Proc. Int. Conf. Signal Proc. Comm. (ICSPC'07), Dubai, U.A.E., Nov. 2007.
[160] V. Beiu, "On brain, yield, energy, and delays: There are plenty of opportunities at the top," presented at Semiconductor Research Corp., Research Triangle Park, NC, Mar. 22, 2007.
Valeriu Beiu (S'92–M'95–SM'96) received the M.Sc. degree in computer engineering from the Politehnica University of Bucharest, Bucharest, Romania, in 1980, and the Ph.D. degree (summa cum laude) in electrical engineering from the Katholieke Universiteit Leuven, Leuven, Belgium, in 1994.
After graduation, he worked for two years with the Research Institute for Computer Techniques, Bucharest, Romania, focusing on high-speed CPUs and FPUs. He then rejoined the Politehnica University of Bucharest. Since 1991, he has been on leave of absence as a visiting researcher with the Katholieke Universiteit Leuven, Leuven, Belgium (1991–1994), King's College London, London, U.K. (1994–1996), and Los Alamos National Laboratory, Los Alamos, NM (1996–1998). In 1998, he co-founded RN2R, Dallas, TX, a VLSI IP startup company, and was its Chief Technical Officer (1998–2001). In 2001, he joined the School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, and in 2005, he became Visiting Professor with the School of Computing and Intelligent Systems, University of Ulster, Londonderry, U.K. Since 2006, he has been the Associate Dean for Research, College of Information Technology, United Arab Emirates University, Al Ain, U.A.E. He was the principal investigator (PI) of over 40 research contracts totaling over US$6M. He holds 11 patents, has received over 40 grants, given over 120 invited talks, and authored over 160 technical papers in refereed journals and international conferences, receiving six Best Paper Awards.
Dr. Beiu has received five fellowships, including a Fulbright Fellowship (1991), a Human Capital and Mobility Fellowship (1994–1996) with King's College London (Programmable Neural Arrays project), a Director's Funded Postdoctoral Fellowship (1996–1998) with Los Alamos National Laboratory (Field Programmable Neural Arrays project, under the Deployable Adaptive Processing Systems initiative), and a Fellowship of Rose Research (1999–2001). He has authored 12 book chapters (six of them invited) and is working on three forthcoming books, one on emerging brain-inspired nano-architectures and another on the VLSI complexity of discrete neural networks. His main research interests are VLSI-efficient designs (i.e., very low-power and highly reliable) and emerging nanoarchitectures (massively parallel, communication starved, adaptive/reconfigurable, regular, fault-tolerant), as well as their optimized designs (inspired by systolic arrays, artificial neural networks, and hybrid combinations of these); he is the founder of the Centers for Neural Inspired Nano Architectures (www.cnina.org). Dr. Beiu is a founding member of the European Neural Network Society (ENNS, since 1991), and a member of the International Neural Network Society (INNS), the Association for Computing Machinery (ACM), and the Marie Curie Fellowship Association (MCFA). He is a member of the SRC-NNI Working Group on Novel Nano-architectures (since 2003) and of the IEEE CS Task Force on Nanoarchitectures (since 2005), and a contributor to the International Technology Roadmap for Semiconductors – Emerging Research Devices (since 2004). He has organized over 20 conferences and over 40 conference sessions, was the Program Chairman of the IEEE Los Alamos Section (1997), and is an Associate Editor of the IEEE Transactions on Neural Networks (since 2005).
Snorre Aunet (M'94–SM'06) received the degree in electronics engineering from Trondheim Technical College, Trondheim, Norway, in 1987, the Cand. Scient. degree from the University of Oslo (UiO), Oslo, Norway, in 1993, and the Dr. Ing. degree from the Norwegian University of Science and Technology (NTNU), Trondheim, Norway, in 2002.
From 1994 to 1997, he was an Analog ASIC Designer with Nordic VLSI, Norway. He has held short-term positions as Assistant Professor and Associate Professor at NTNU in 2002 and 2003, and was a Postdoctoral Research Fellow at UiO from December 2003 to June 2006. He is currently a Research Scientist in the Department of Informatics, UiO. He has published more than 50 scientific papers. His research interests include ultra-low-power biologically inspired defect-tolerant nanoarchitectures.
Dr. Aunet has co-organized and chaired sessions at international conferences and has served as an expert evaluator for the EU Commission. He is a member of the Centers for Neural Inspired Nano Architectures (www.cnina.org).
Jabulani Nyathi (M'02) received the B.Sc. degree from Morgan State University, Baltimore, MD, in 1994, and the M.Sc. and Ph.D. degrees from the State University of New York (SUNY), Binghamton, in 1996 and 2000, respectively, all in electrical engineering.
He is currently an Assistant Professor in the School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA. He has held academic positions at SUNY, Binghamton (Adjunct Lecturer and Visiting Assistant Professor, 1998–2001). His research interests include VLSI design, interconnection networks, embedded systems, and computer architecture.
Dr. Nyathi is a member of Tau Beta Pi.
Robert R. Rydberg III (S'01) received the B.Sc. degree in computer science and computer engineering from Pacific Lutheran University, Tacoma, WA, in 2003, and the M.Sc. degree in electrical engineering from Washington State University, Pullman, WA, in 2005. He is currently working toward the Ph.D. degree in the School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA.
He has held intern positions at Micron Technology (CMOS Imaging Product Engineering), Sandia National Laboratories (ASIC/SoC Design), and the Intel Corporation (Firmware Validation). His research interests include VLSI design, interconnect networks, embedded systems, and concurrent algorithms.
Walid Ibrahim (M'06) received the B.Eng. degree in electrical engineering from Cairo University, Cairo, Egypt, in 1992, and the Ph.D. degree in systems and computer engineering from Carleton University, Ottawa, ON, Canada, in 2002.
In September 2004, he joined the College of Information Technology, United Arab Emirates University, Al Ain, U.A.E., as an Assistant Professor with the Computer System Engineering Department. He is also an Adjunct Research Professor with the Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada. Before joining the United Arab Emirates University, he held several software design and research positions with worldwide leading telecommunication and semiconductor companies, including Nortel Networks, Alcatel, PMC-Sierra, and Siemens. Dr. Ibrahim's research interests include the reliability of nano-architectures, VLSI testing, resource allocation and pricing in wireless data networks, applied optimization techniques, and the feasibility of nonlinear programming models.