Program Schemes For Multilevel Flash Memories

MARCO GROSSI, MASSIMO LANZONI, AND BRUNO RICCÒ, FELLOW, IEEE
Invited Paper
This paper presents a concise overview of multilevel (ML) Flash memory program methods. The problem of program time increasing with the number of bits stored in each cell is discussed, and methods based on both channel hot electrons (CHE) and Fowler–Nordheim tunneling (FNT) are reviewed. In the case of CHE, the use of an increasing voltage rather than a constant one on the control gate (CG) leads to narrower threshold voltage distributions and smaller current absorption, with positive effects on the degree of parallelism and program throughput. As for FNT, much faster programming than that commonly used today can be achieved using high CG voltages without producing intolerable degradation of cell reliability.
Keywords—Flash, memories, multilevel, programming.
I. INTRODUCTION
Emerging applications for Flash memories (e.g., audio and video storage) have greatly increased the demand for high-density, low-cost memories. In this context, multilevel (ML) storage [1] allows more than one bit to be stored in each cell, thus offering a significant reduction in cost per bit for the same cell dimension. ML storage, however, implies more critical constraints in terms of program and sensing accuracy, charge retention, and read and write disturbs.
In particular, accurate programming requires the placement of the right amount of charge on the cell floating gate (FG) to produce tight threshold voltage ($V_T$) distributions. If $n$ denotes the number of bits per cell, $2^n$ such distributions, adequately separated from each other, must cover a total voltage window (TVW) (in practice, the difference between the highest and the lowest value of $V_T$) that tends to shrink with new technologies aimed at low-voltage operation.
Manuscript received July 1, 2002; revised January 5, 2003.
The authors are with the Department of Electronics, Computer Science, and Systems, University of Bologna, 40136 Bologna, Italy (e-mail: mgrossi@deis.unibo.it).
Digital Object Identifier 10.1109/JPROC.2003.811714
Accurate charge placement is normally obtained by means of program and verify (P&V) algorithms featuring a sequence of small steps, each followed by a read operation to determine whether or not further programming is needed. This approach leads to the required accuracy, provided that the individual program steps are small enough. On the other hand, precision is paid for heavily in terms of program throughput (PT), i.e., the number of bits that can be programmed per second, since the number of P&V steps increases with decreasing $V_T$ distribution widths. This is particularly true for increasing values of $n$ (3, 4, …), since the width of the distribution decreases essentially as $2^{-n}$ (for the same TVW).
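The scaling argument above can be made concrete with a short sketch. It is only illustrative: the 4.5-V window is borrowed from the Fig. 1 example, while the equal slot per level, the fraction of each slot occupied by the distribution (`fill_fraction`), and the rule that one P&V step moves $V_T$ by one target distribution width are simplifying assumptions introduced here, not figures from the paper.

```python
import math

def level_slot_width(tvw_volts, bits_per_cell):
    """Voltage slot available to each of the 2**n levels in the window."""
    return tvw_volts / (2 ** bits_per_cell)

def pv_steps_full_window(tvw_volts, bits_per_cell, fill_fraction=0.5):
    """Rough count of P&V steps needed to sweep the whole window.

    Assumption (hypothetical): each program step shifts V_T by one
    target distribution width, taken as fill_fraction of the level slot.
    """
    step = fill_fraction * level_slot_width(tvw_volts, bits_per_cell)
    return math.ceil(tvw_volts / step)

if __name__ == "__main__":
    TVW = 4.5  # volts, as in the Fig. 1 example
    for n in (1, 2, 3, 4):
        print(n, level_slot_width(TVW, n), pv_steps_full_window(TVW, n))
```

Under these assumptions the step count doubles with every additional bit per cell, which is the throughput penalty the text refers to.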
In spite of these problems, ML programming with 2 b/cell in both NOR [2], [3] and NAND [4], [5] technology is already a reality, while a substantial research effort is dedicated to the cases with $n = 3$ and $n = 4$.
As for architectures, the NOR solution has so far been the mainstream Flash technology since: 1) it allows cells to be programmed by both channel hot electrons (CHE) and Fowler–Nordheim tunneling (FNT); and 2) the absence of serially connected cells allows faster programming and reading and avoids write disturbs (which seriously affect the NAND case).
On the other hand, the NAND solution is gaining increasing interest due to: 1) its more compact layout (leading to higher memory density and lower cost per bit); and 2) the possibility of using very low (or even negative) $V_T$ values, thus effectively eliminating the problem of overerased cells and the consequent need for an erase and verify algorithm. A symmetrical problem exists in NAND memories for overprogramming. Since unselected cells act as pass transistors, a cell whose $V_T$ is too high can be prevented from turning on. The problem is, however, less important than overerase in ML memories, since high accuracy in programming must be guaranteed in both NOR and NAND architectures to allow many levels to be stored in the same TVW.
In the case of a NOR Flash memory, Fig. 1 illustrates the $V_T$ distributions required for 4, 8, and 16 levels, respectively. The need to avoid read disturbs due to excessively low $V_T$ values, as well as undesired programming of low-$V_T$ cells during reading, imposes a minimum and a maximum $V_T$ value, thus effectively determining the TVW.
0018-9219/03$17.00 © 2003 IEEE
594 PROCEEDINGS OF THE IEEE, VOL. 91, NO. 4, APRIL 2003
Fig. 1. $V_T$ distributions for ML programming of NOR Flash memory in a TVW of 4.5 V. (a) Four-level programming. (b) Eight-level programming. (c) 16-level programming.
In the case of Fig. 1, where a TVW of 4.5 V is considered, the maximum gate voltage ($V_{CG}$) applied during reading is 5.25, 5.4, and 5.85 V for 4, 8, and 16 levels, respectively.
In the NAND architecture, Fig. 2 illustrates the $V_T$ distributions for the eight-level NAND memory discussed in [5]: the distributions are well separated (0.4 V), and, although the maximum $V_{CG}$ applied to nonselected word-lines in reading is 6 V (a trade-off between fast reading and device reliability), a reliable and efficient device is achieved.
II. MULTILEVEL PROGRAM METHODS
Flash memory programming is achieved by injecting electrons into the FG. This can be obtained by means of two different physical mechanisms.
1) CHE: electrons in the channel of the cell MOSFET gain enough energy from the driving electric field to be injected into the FG (helped by the vertical electric field, essentially due to $V_{CG}$).
2) FNT: electrons are injected into the FG by tunneling due to the high vertical electric field.
Compared with FNT, CHE requires lower voltages, with benefits for the driving circuitry and device reliability, but is also characterized by large current absorption that limits the degree of parallelism (DOP) and is problematic for low-power applications.
In the following sections, program methods for both CHE and FNT are briefly discussed.
III. CHANNEL HOT ELECTRONS
NOR Flash memories can be programmed by CHE using two different techniques: 1) conventional box programming; and 2) ramped voltage programming.
In the former method, a constant voltage is applied on the CG during the whole operation, while in the latter $V_{CG}$ is raised linearly during programming.
GROSSI et al.: PROGRAM SCHEMES FOR MULTILEVEL FLASH MEMORIES 595
Fig. 2. Target distributions for eight-level NAND Flash memory. The picture is taken from [5].
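The FG voltage ($V_{FG}$) and the injection current into the FG ($I_G$) are linked by the following equation [6]:
$$C_T \frac{dV_{FG}}{dt} = C_{PP} \frac{dV_{CG}}{dt} + C_D \frac{dV_D}{dt} - I_G \qquad (1)$$
where $C_{PP}$ is the FG to CG capacitance; $C_D$ is the FG to drain capacitance; and $C_T$ is the total capacitance between FG and the other MOSFET regions.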
In conventional box programming, $dV_{CG}/dt = 0$ (with the drain bias also held constant), thus $I_G = -C_T \, dV_{FG}/dt$. Since $I_G$ decreases with decreasing $V_{FG}$, both the programming speed and the absorbed current are high at the beginning of programming, but decrease with program time and reach a low value at the end of the operation, as schematically illustrated in Fig. 3(a) [6]. This behavior represents a problem because the high current absorbed at the beginning of the operation limits the DOP, thus the PT. Moreover, strong nonuniformities of $I_G$ produce high dispersion in programmed $V_T$, hence (relatively) wide $V_T$ distributions.
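With ramped voltage programming, instead, $dV_{CG}/dt$ is constant (hereafter referred to as the slope of the gate bias waveform) and the drain bias is constant; thus, $I_G = C_{PP} \, dV_{CG}/dt - C_T \, dV_{FG}/dt$. If the initial value of the ramp applied to the CG is set so that $I_G = C_{PP} \, dV_{CG}/dt$, the write operation takes place under equilibrium conditions ($dV_{FG}/dt = 0$), where both the injection current and the drain current are constant, as schematically illustrated in Fig. 3(b) [6].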
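The difference between the two waveforms can be visualized with a small numerical sketch. It integrates the charge balance $C_T \, dV_{FG}/dt = C_{PP} \, dV_{CG}/dt - I_G$ with a constant drain bias. Everything numerical here is an assumption made for illustration: the exponential dependence of $I_G$ on $V_{FG}$, the capacitance values, and the ramp slope are hypothetical and are not taken from [6].

```python
import math

C_T = 1.0    # fF, total FG capacitance (illustrative value)
C_PP = 0.6   # fF, FG-to-CG capacitance (illustrative value)

def injection_current(v_fg, i_ref=0.6, v_ref=4.0, v0=0.3):
    """Toy injection current in nA, increasing with V_FG (assumed model)."""
    return i_ref * math.exp((v_fg - v_ref) / v0)

def simulate(v_fg_start, slope, t_end=10.0, dt=1e-3):
    """Euler integration of C_T dV_FG/dt = C_PP*slope - I_G(V_FG).

    Units: V, us, fF, nA (fF * V/us = nA). Drain bias is held constant.
    Returns lists of V_FG and I_G samples.
    """
    v_fg, v_trace, i_trace = v_fg_start, [], []
    for _ in range(int(round(t_end / dt))):
        i_g = injection_current(v_fg)
        v_trace.append(v_fg)
        i_trace.append(i_g)
        v_fg += dt * (C_PP * slope - i_g) / C_T
    return v_trace, i_trace

if __name__ == "__main__":
    _, i_box = simulate(v_fg_start=5.0, slope=0.0)    # box waveform
    _, i_ramp = simulate(v_fg_start=4.0, slope=1.0)   # ramp started at equilibrium
    print("box : I_G first/last =", i_box[0], i_box[-1])
    print("ramp: I_G first/last =", i_ramp[0], i_ramp[-1])
```

In this toy model the box waveform starts with a large current that decays throughout the operation, whereas the ramp keeps $I_G$ pinned at $C_{PP}$ times the slope; a ramp started away from equilibrium relaxes toward the same value.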
Qualitative waveforms of the CG and drain voltages for ramped voltage programming are sketched in Fig. 4(a) [6], while Fig. 4(b) [6] shows the expected transient behavior of the corresponding cell voltages, including the time necessary to reach the equilibrium condition. In Fig. 4(c) [6], the expected waveforms for the corresponding currents are schematically described.
As already mentioned, a constant drain current helps to maximize DOP, hence PT. Furthermore, the linear relationship between programmed $V_T$ and program time produces better accuracy in programming, hence tighter $V_T$ distributions.
The $V_T$ distribution widths obtained with ramped voltage programming depend on the programming conditions, i.e., drain and substrate bias ($V_D$ and $V_B$, respectively), as well as on the ramp slope.
Fig. 5 shows the standard deviation ($\sigma$) of the programmed $V_T$ distribution measured on 10 K cells as a function of the ramp slope, for different values of $V_D$ and $V_B$.
For all considered bias configurations, the minimum $\sigma$ is obtained at low program speeds (low ramp slope), and $\sigma$ increases with the slope. Thus, a tradeoff is in order between high program speeds and good accuracy in achieving the final $V_T$ value.
From this point of view, the ramped voltage programming technique has been shown to be able to program a Flash memory array on four levels (2 b/cell) without the need for P&V algorithms [7], with substantial benefits for PT. In particular, assuming the same DOP (256), the method of [7] results in a PT of 0.8 MB/s, instead of the 0.17 MB/s achieved in [2].
The obtained $V_T$ distributions are well separated, and the minimum read margin (i.e., the difference between the cell $V_T$ and the gate bias used in reading) is 0.4 V. Also, after 20 K program/erase (P/E) cycles, the read margin does not degrade much; thus, the reliability constraints for the memory are satisfied.
However, programming the memory on eight or more levels without P&V algorithms requires a significant increase in TVW that is not compatible with desirable circuit specifications. On the other hand, the use of ramped voltage programming in conjunction with P&V is problematic, because before each program step the exact value of the cell $V_T$ must be determined in order to set the correct initial value of $V_{CG}$. Since $V_T$ determination is a time-consuming operation, ramped voltage programming with P&V is more convenient than conventional box programming only if a minimum number of verifications is used.
Fig. 3. Conceptual plots of $I_G$ as a function of the FG voltage and the corresponding typical behavior during a programming operation in which the CG has (a) a box waveform or (b) a ramp waveform.
A new programming method that combines ramped voltage programming with verify operations is described in [8]. With this algorithm, programming is performed using only two steps, each preceded by a $V_T$ determination.
In detail, and with reference to Fig. 6, the program algorithm consists of the following steps. First, the initial $V_T$ value of the cell is determined. Second, the cell is programmed from this initial value to an intermediate target value using a ramped CG voltage with a given slope and the same overdrive for all cells. Third, the $V_T$ value obtained after this program step is determined. Fourth, the cell is programmed from this measured value to the final target $V_T$ value with a ramped CG voltage whose overdrive is adjusted according to the outcome of the first program step. The determination of the initial $V_T$ guarantees quasi-equilibrium conditions during the first program operation, thus avoiding initial high current absorption and loss of accuracy. The determination of the intermediate $V_T$, instead, allows one to adjust the program overdrive to account for the characteristics of each individual cell, and represents the essential element to obtain adequate program accuracy.
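The role of the second $V_T$ determination can be illustrated with a deliberately idealized sketch. It is not the algorithm of [8]: the cell model (at equilibrium, $V_T$ tracks $V_{CG}$ minus a cell-specific offset), the correction rule (subtract the error observed after the first step from the nominal overdrive), and all numerical values are hypothetical assumptions chosen only to show why a per-cell overdrive correction tightens the distribution.

```python
import random
import statistics

def ramp_step(v_t_measured, v_t_goal, overdrive, v_eq_cell):
    """Toy model of one ramped program step.

    The CG ramp starts at v_t_measured + overdrive and rises by
    (v_t_goal - v_t_measured). At equilibrium the cell V_T is assumed to
    track V_CG minus a cell-specific offset v_eq_cell, so it ends at
    v_cg_end - v_eq_cell.
    """
    v_cg_start = v_t_measured + overdrive
    v_cg_end = v_cg_start + (v_t_goal - v_t_measured)
    return v_cg_end - v_eq_cell

def two_step_program(v_t0, v_t_int, v_t_target, ov_nominal, v_eq_cell):
    """Two ramp steps, each preceded by a V_T determination."""
    v_t1 = ramp_step(v_t0, v_t_int, ov_nominal, v_eq_cell)   # first program step
    error = v_t1 - v_t_int                 # second determination vs. target
    ov_corrected = ov_nominal - error      # assumed correction rule
    return ramp_step(v_t1, v_t_target, ov_corrected, v_eq_cell)

if __name__ == "__main__":
    rng = random.Random(0)
    cells = [rng.gauss(1.0, 0.1) for _ in range(1000)]   # cell offsets, V
    one_step = [ramp_step(2.0, 4.0, 1.0, c) for c in cells]
    two_step = [two_step_program(2.0, 3.0, 4.0, 1.0, c) for c in cells]
    print(statistics.pstdev(one_step), statistics.pstdev(two_step))
```

Because the toy model has no measurement noise, the corrected step lands exactly on the target; in a real array the residual spread is set by the accuracy of the $V_T$ determination and by effects the model ignores.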
The algorithm is capable of achieving $V_T$ distribution widths and displacements of the distribution mean value from the targets smaller than 150 and 20 mV, respectively.
This method is adequate for 3 b/cell ML schemes while, for the case of 4 b/cell, the separation between $V_T$ distributions is probably insufficient for direct use in real memories, although the adoption of error correcting codes makes it possible to use it also for 16-level schemes.
The achieved program time at cell level is about six times lower than that obtained with the algorithm of [2] for 4 b/cell (70.75 instead of 400 µs). With a cell matrix scheme featuring DOP = 256 and parallel analog determination of the cell $V_T$, this results in a PT about three times larger (0.9 instead of 0.32 MB/s).
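The throughput figures quoted above follow from simple arithmetic, sketched below. The function name and the convention 1 MB = 10^6 bytes are assumptions made here. With DOP = 256 and 4 b/cell, a 400-µs operation gives the quoted 0.32 MB/s; applying the same formula to 70.75 µs gives about 1.8 MB/s, so the reported 0.9 MB/s presumably also accounts for the time spent on $V_T$ determinations (an inference, not a statement from the paper).

```python
def program_throughput_mb_per_s(dop, bits_per_cell, program_time_us):
    """PT in MB/s when `dop` cells are programmed in parallel.

    Assumes 1 MB = 1e6 bytes and that program_time_us covers the whole
    operation for one group of cells.
    """
    bytes_per_operation = dop * bits_per_cell / 8
    return bytes_per_operation / program_time_us  # bytes/us == MB/s

if __name__ == "__main__":
    print(program_throughput_mb_per_s(256, 4, 400.0))   # algorithm of [2]
    print(program_throughput_mb_per_s(256, 4, 70.75))   # program steps only
```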
IV. FOWLER–NORDHEIM TUNNELING
Compared with CHE, this programming method has the advantage of small current absorption, which is particularly interesting for low-power applications. Moreover, it allows very high DOP, thus leading to a strong increase in PT. In this regard, the NAND state of the art (based on FNT programming) produces a PT as high as 10 MB/s [9].
Fig. 4. (a) Qualitative waveforms of the CG and drain voltages for the ramped voltage programming scheme, as well as the corresponding behavior of the cell voltages (b) and currents (c).
Fig. 5. Dependence of $\sigma$ on the ramp slope for 10 K cells at different $V_D$ and $V_B$. BWP indicates the $\sigma$ for box programming.
However, as described in [10], FNT has several drawbacks that make it less effective than CHE for ML applications. In particular, programming by tunneling is more sensitive than CHE to process parameters, and this produces wider $V_T$ distributions. Furthermore, the applied voltages are higher than with CHE, and this produces high stress in the oxide, resulting in worse device reliability. In this regard, Fig. 7 shows the read disturb time, i.e., the time to produce a 0.5-V $V_T$ shift due to drain stress, as a function of the number of P/E cycles [11]. Thus, since the applied voltages cannot be too high, programming currents are low; this leads to long programming times (in the range of 10 ms, as opposed to the few µs for CHE programming).
To maintain a competitive PT, highly parallel programming is required, and this leads to high circuit complexity and die-size overhead, although parallel programming is simpler to implement for FNT than for CHE.
Compared to CHE, FNT tends to produce wider $V_T$ distributions and longer programming times; thus, efficient P&V algorithms are needed in ML programming to guarantee good program accuracy and PT.
In [12], three different P&V algorithms (schematically shown in Fig. 8) are presented for a NAND Flash memory. Fig. 8(a) illustrates the conventional P&V technique where pulses of variable width are applied on the CG, while a verify operation is carried out between two write pulses. The first write pulses are sufficiently short so as to ensure that fast cells will not be overprogrammed; the pulse width is then increased to minimize the number of verify steps for slow cells.
Fig. 8(b) shows the trapezoidal pulse algorithm that achieves much better results than that of Fig. 8(a). Higher programming speed can be obtained, while the oxide electric field ($E_{ox}$) can be reduced. Moreover, the increase in programming time with $V_T$ distribution width reduction is much weaker than in the previous case.
Fig. 8(c) instead shows the staircase pulse algorithm that uses the same approach as in Fig. 8(b) but is much easier to generate on-chip.
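A staircase P&V loop of the kind shown in Fig. 8(c) can be sketched as follows. This is a minimal illustration, not the circuit-level method of [12]: the pulse amplitudes, the step size, and the cell model (after a pulse the cell $V_T$ is at least the pulse amplitude minus a cell-specific offset) are hypothetical assumptions.

```python
def staircase_program(v_t_start, v_t_target, cell_offset,
                      v_pgm_start=15.0, v_step=0.2, max_pulses=100):
    """Staircase P&V: raise V_CG by v_step per pulse, verify after each.

    Toy cell model (assumption): after a pulse at v_pgm the cell V_T is
    at least v_pgm - cell_offset, so once tunneling starts V_T follows
    the staircase with the same step.
    Returns (final V_T, number of pulses used).
    """
    v_t, v_pgm = v_t_start, v_pgm_start
    for pulse in range(1, max_pulses + 1):
        v_t = max(v_t, v_pgm - cell_offset)   # program pulse
        if v_t >= v_t_target:                 # verify step
            return v_t, pulse
        v_pgm += v_step
    raise RuntimeError("cell did not reach the target")

if __name__ == "__main__":
    for name, offset in (("fast cell", 14.0), ("slow cell", 16.0)):
        v_t, pulses = staircase_program(-2.0, 2.0, offset)
        print(name, round(v_t, 3), pulses)
```

In this model fast and slow cells need different numbers of pulses, but both stop within one step of the target, so the step size bounds the programmed $V_T$ spread provided the first pulse does not overshoot.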
In Fig. 9, the main characteristics of both FNT and CHE are compared. Since the advantages of fewer disturbs and lower electric fields are more important than the large DOP allowed by FNT, CHE seems to be more suitable for ML applications, at least when low power consumption is not the main constraint.
Of course, with FNT it is possible to reduce program time by increasing $V_{CG}$, thus trading off program time and device reliability. In this regard, stress-induced leakage current (SILC), which degrades data retention time, is the main phenomenon, and it has conventionally been considered to increase with the oxide field, thus with the decrease of program time [13] (for the same charge fluence, i.e., total charge injected through the oxide).
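The trade-off between oxide field and program time at fixed fluence can be illustrated with the standard Fowler–Nordheim expression $J = A E^2 \exp(-B/E)$. The sketch below is only indicative: the constants `a` and `b` are textbook-order values for SiO2 chosen for illustration, and the fluence of 1 µC/cm² is an arbitrary example; none of these numbers come from the paper.

```python
import math

def fn_current_density(e_ox, a=1.0e-6, b=2.5e8):
    """Fowler-Nordheim current density J = A * E^2 * exp(-B / E).

    e_ox in V/cm; a (A/V^2) and b (V/cm) are illustrative textbook-order
    values for SiO2, not parameters from the paper.
    """
    return a * e_ox ** 2 * math.exp(-b / e_ox)

def time_for_fluence(charge_per_cm2, e_ox):
    """Time (s) to inject a given charge fluence at constant field."""
    return charge_per_cm2 / fn_current_density(e_ox)

if __name__ == "__main__":
    for field in (1.0e7, 1.2e7, 1.4e7):   # V/cm
        print(field, fn_current_density(field), time_for_fluence(1e-6, field))
```

The exponential term makes the injection time fall by orders of magnitude for modest increases in field, which is why fast FNT programming requires high CG voltages.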
However, recent studies [14] have shown that, for the same charge fluence, SILC initially increases with decreasing program time, but then tends to decrease with program time as the stress time becomes comparable to the characteristic time required for permanent oxide degradation.
Fig. 10 shows SILC characteristics of Flash memory cells for different program conditions. Fig. 10(a) shows that SILC after 10 K P/E cycles with a program time of 20 ns is not much larger than that obtained with 30 µs. Instead, Fig. 10(b) shows that SILC stops increasing for program times of about 1 µs and (slightly) decreases with program time below such a value.
This shows that FNT programming of Flash memory with program times as low as 20 ns is feasible, with good results in terms of data retention, provided that a sufficiently low read voltage is applied.
Fig. 6. Representation of the novel algorithm that combines ramped voltage programming with verify operations. Inside the boxes, the CG voltage during the two program steps is shown.
Fig. 7. A comparison between the read disturb due to CHE and FNT programming, as a function of P/E cycles. The picture is taken from [11].
In this regard, in Fig. 11 the maximum read disturb voltage compatible with a data retention time of ten years after 10 K P/E cycles is shown as a function of program time. For program times as low as 20 ns, this maximum value is about 2.5 V.
However, a significant problem for FNT is due to the high voltages needed for fast programming [in the case of Fig. 10(a), for a program time of 20 ns, the required voltage is 26.5 V], since this leads to challenging constraints for the high-voltage programming circuit.
Scaling the oxide thickness has favorable effects because it decreases the programming voltage needed for the same oxide field, but it also produces a drastic decrease in data retention time. In [15], measurements performed on 6.5-nm oxide Flash memories have shown a data retention time of 13 hours after 10 K P/E cycles with a maximum read voltage of 2.5 V.
Such a retention time is small compared to the ten-year retention of conventional nonvolatile memories, but it is more than three orders of magnitude greater than the typical DRAM refresh time, thus making fast FNT potentially interesting for DRAM-like applications.
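The orders of magnitude involved can be checked with a few lines of arithmetic. The 13-hour retention and the ten-year target are from the text; the 64-ms DRAM refresh interval is an assumption introduced here as a commonly quoted value and is not given in the paper.

```python
import math

RETENTION_S = 13 * 3600              # 13 hours, from the measurements of [15]
DRAM_REFRESH_S = 64e-3               # assumed typical refresh interval (64 ms)
TEN_YEARS_S = 10 * 365 * 24 * 3600   # conventional nonvolatile retention target

def orders_of_magnitude(a, b):
    """log10 of the ratio a / b."""
    return math.log10(a / b)

if __name__ == "__main__":
    print(orders_of_magnitude(RETENTION_S, DRAM_REFRESH_S))
    print(orders_of_magnitude(TEN_YEARS_S, RETENTION_S))
```

With the assumed refresh interval, the measured retention exceeds the refresh time by well over three decades, while remaining between three and four decades short of the ten-year target.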
Fig. 8. Conventional (a), trapezoidal (b), and staircase (c)
programming pulses. A verify step is carried out after each pulse.
The picture is taken from [12].
Fig. 9. Comparison of FNT and CHE programming mechanisms for ML applications. The picture is taken from [11].
V. CONCLUSION
This paper has presented a concise review of different program techniques for ML Flash memories based on both CHE injection and FNT.
In the case of CHE, ramped voltage programming has been shown to achieve tighter $V_T$ distributions and higher program throughput than the conventional box technique. In fact, programming on four levels is feasible without the use of P&V algorithms. Instead, with 8 or 16 levels, P&V is mandatory and problems arise because of the difficulty of combining ramped voltage programming and verify operations.
In the case of FNT, instead, fast programming with a pulse duration of 20 ns seems able to produce very high PT (comparable with DRAMs). However, problems occur because of the need to use high-voltage circuitry and/or the reduction of data retention time due to decreased tunnel oxide thickness. For these reasons, fast FNT seems more suitable for DRAM-like applications than for conventional nonvolatile memories.
Fig. 10. SILC characteristics of the Flash memory cells after 10 K P/E cycling (a) for different program conditions and (b) as a function of program time.
Fig. 11. Maximum read disturb voltage which still guarantees a data retention time of 10 years versus program time after 10 K P/E cycles.
REFERENCES
[1] B. Riccò, G. Torelli, M. Lanzoni, A. Manstretta, H. Maes, D. Mon-
tanari, and A. Modelli, “Nonvolatile multilevel memories for digital
applications,” Proc. IEEE, vol. 86, pp. 2399–2421, Dec. 1998.
[2] A. Silvagni, S. Zanardi, A. Manstretta, and M. Scotti, “Modular architecture for a family of multilevel 256/192/128/64 Mbit 2-bit/cell 3 V only NOR Flash memory devices,” IEEE Trans. Electron Devices, vol. 48, pp. 937–940, Jan. 2001.
[3] M. Bauer, “A multilevel-cell 32 Mb Flash memory,” in IEEE ISSCC
Tech. Dig., 1995, pp. 132–133.
[4] T.-S. Jung, Y.-J. Choi, and K.-D. Suh, “A 117-mm² 3.3-V only 128-Mb multilevel NAND Flash memory for mass storage applications,” IEEE J. Solid-State Circuits, vol. 31, pp. 1575–1583, Nov. 1996.
[5] H. Nobukata, S. Takagi, and K. Hiraga, “A 144-Mb, eight-level
NAND Flash memory with optimized pulsewidth programming,”
IEEE J. Solid-State Circuits, vol. 35, pp. 682–690, May 2000.
[6] D. Esseni, A. D. Strada, P. Cappelletti, and B. Riccò, “A new
and flexible scheme for hot-electron programming of nonvolatile
memory cells,” IEEE Trans. Electron Devices, vol. 46, pp. 125–133,
Jan. 1999.
[7] R. Versari, D. Esseni, G. Falavigna, M. Lanzoni, and B. Riccò, “Op-
timized programming of multilevel Flash EEPROMs,” IEEE Trans.
Electron Devices, vol. 48, pp. 1641–1646, Aug. 2001.
[8] M. Grossi, M. Lanzoni, and B. Riccò, “A novel algorithm for high throughput programming of multi-level Flash memories,” IEEE Trans. Electron Devices, submitted for publication.
[9] H. Nakamura, K. Imamiya, and T. Himeno, “A 125-mm² 1-Gb NAND Flash memory with 10-MB/s program throughput,” in IEEE ISSCC Tech. Dig., vol. 1, 2002, pp. 106–450.
[10] B. Eitan, R. Kazerounian, A. Roy, G. Crisenza, P. Cappelletti, and A. Modelli, “Multilevel Flash cells and their trade-offs,” in IEEE IEDM Tech. Dig., 1996, pp. 169–172.
[11] B. Eitan and A. Roy, “Binary and multilevel Flash cells,” in
Flash Memories, P. Cappelletti, C. Golla, P. Olivo, and E. Zanoni,
Eds. Boston, MA: Kluwer, 1999, pp. 91–152.
[12] G. Hemink, T. Tanaka, and T. Endoh, “Fast and accurate program-
ming method for multi-level NAND EEPROM’s,” in Symp. VLSI
Technology Dig. Tech. Papers, 1995, pp. 129–130.
[13] R. Moazzami and C. Hu, “Stress-induced current in thin silicon
dioxide film,” in IEEE IEDM Tech. Dig., 1992, pp. 139–141.
[14] R. Versari, A. Pieracci, D. Morigi, and B. Riccò, “Fast tunneling pro-
gramming of nonvolatile memories,” IEEE Trans. Electron Devices,
pp. 1285–1287, June 2000.
[15] R. Versari, A. Pieracci, and B. Riccò, “Fast programming/erasing of
thin-oxide EEPROMs,” IEEE Trans. Electron Devices, pp. 817–819,
Apr. 2001.
Marco Grossi was born in Bologna, Italy, in 1973. He received the Laurea degree in electronic engineering from the University of Bologna in 2000. He is currently working toward the Ph.D. degree at the Department of Electronics, Computer Science, and Systems Laboratory, University of Bologna.
His research interest is the characterization of nonvolatile memories. He is currently working in the field of Flash memories and the multilevel programming of these memories using the ramped gate technique.
Massimo Lanzoni was born in Bologna, Italy,
in 1961. He received the Laurea degree in
electronic engineering from the University of
Bologna, Bologna, Italy, in 1987.
He is with the Microelectronics Research Group, Department of Electronics, Computer Science, and Systems, University of Bologna, working on research projects in the fields of nonvolatile memories, MOS devices, virtual instrumentation, and testing. His research interests include the characterization of thin dielectric reliability, nonvolatile memory cell characteristics and reliability, experimental characterization of MOS transistors, and new techniques for IC testing, such as nonvolatile memory endurance testing and CMOS IC latch-up testing. He is now involved in projects concerning analog applications of nonvolatile memories and multilevel programming.
Bruno Riccò (Fellow, IEEE) was born in Parma,
Italy, in 1947. He received the Laurea degree
in electrical engineering from the University
of Bologna, Bologna, Italy, in 1971 and the
Ph.D. degree from the University of Cambridge,
Cambridge, U.K., in 1976, where he worked at
the Cavendish Laboratory.
In 1980, he was a Full Professor of Electronics
at the University of Padova, Padova, Italy. In
1983, he was a Full Professor of Electronics at
the University of Bologna. In 1983 and 1986,
he was Visiting Professor at the University of Stanford, Stanford, CA;
at the IBM Thomas J. Watson Research Center, Yorktown Heights, NY;
and at the University of Washington, Seattle. He is currently with the
Department of Electronics, Computer Science, and Systems, University
of Bologna. He has also been a Consultant for major companies and for
the Commission of the European Union in the definition, evaluation, and
review of research projects in microelectronics. He is author or coauthor of
more than 300 publications (more than half of which have been published
in major international journals), three books, and six patents in the field
of nonvolatile memories. His research interests include solid-state devices
and ICs. He is currently also working in the field of IC design, evaluation,
and testing.
Prof. Riccò has been President of the Group of Electron Devices, Technologies, and Circuits of the Italian Association of Electrical and Electronics Engineers (AEI) since 1996, and was President of the Italian Group of Electronics Engineers from 1998 to 2001. In 1996, he received the G. Marconi Award from the AEI. He was European Editor of the IEEE TRANSACTIONS ON ELECTRON DEVICES from 1986 to 1996, European Cochair at the International Electron Device Meeting (IEDM) from 1992 to 2001, and Vice-Chairman of the North Italy Section of IEEE from 1999 to 2001. He has been Chairman of the IEEE North Italy Section since 2002.
GROSSI et al.: PROGRAM SCHEMES FOR MULTILEVEL FLASH MEMORIES 601
... Fortunately, in the applications of permutation codes, the length of codes is usually not quite large. For example, it was indicated in [16] that, increasing the length of code utilized implies more critical constraints in terms of program and sensing accuracy. ...
... Note that we take the charge voltage level arrangment similar as in[16] for simulation in this paper. ...
Preprint
Permutation codes were extensively studied in order to correct different types of errors for the applications on power line communication and rank modulation for flash memory. In this paper, we introduce the neural network decoders for permutation codes to correct these errors with one-shot decoding, which treat the decoding as $n$ classification tasks for non-binary symbols for a code of length $n$. These are actually the first general decoders introduced to deal with any error type for these two applications. The performance of the decoders is evaluated by simulations with different error models.
... During the programming, the source is connected to the ground, while the control gate (CG) and the drain are applied with 9.1 V (1 μs) and 3.1 V (1 μs) voltage pulses, respectively. The program-and-verify algorithm [34] is adopted. Fig. 4(b) shows the measured transfer characteristics of different states. ...
... On the one hand, limited by the resolution of different storage states coming from the program algorithm [34] and the physical limit of the electron number in the FG [38], each NOR flash cell can only store finite values (0 ∼ N). On the other hand, low precision weights are sufficient for inference applications. ...
Article
Full-text available
In this article, we propose an efficient and robust spike-driven convolutional neural network (SCNN) based on the nor flash computing array (NFCA), which is mapped by the pretrained convolutional neural network with the same structure. The spike-driven system eliminates the additional analog-to-digital/digital-to-analog (AD/DA) conversion in the NFCA-based CNN. To study the performance of the hardware implementation, an NFCA-based SCNN for the recognition of the Mixed National Institute of Standards and Technology (MNIST) data set is simulated. Simulation results illustrate that the system achieves 97.94% accuracy with the computing speed of 1 x 10⁶ frame per second (fps). Compared with the typical mixed-signal NFCA-based CNN, the NFCA-based SCNN saves 97% area and 56% energy consumption. Moreover, the NFCA-based SCNN demonstrates great robustness to 30% image noise with less than 2% accuracy loss. The impact of random telegraph noise (RTN) is also greatly reduced in which less than 1% accuracy decrease can be achieved at the 32-nm technology node.
... The ‗1'-state is erased by applying V BG = +50 V. The difference of the drain current between ‗1'-state and ‗0'-state is large enough for flash memory operation (> 0.4 V, [207]). ...
Article
The evolution of electronic systems and portable devices requires innovation in both circuit design and transistor architecture. During last fifty years, the main issue in MOS transistor has been the gate length scaling down. The reduction of power consumption together with the co-integration of different functions is a more recent avenue. In bulk-Si planar technology, device shrinking seems to arrive at the end due to the multiplication of parasitic effects. The relay has been taken by novel SOI-like device architectures. In this perspective, this manuscript presents the main achievements of our work obtained with a variety of advanced fully depleted SOI MOSFETs, which are very promising candidates for next generation MOSFETs. Their electrical properties have been analyzed by systematic measurements and clarified by analytical models and/or simulations. Ultimately, appropriate applications have been proposed based on their beneficial features.In the first chapter, we briefly addressed the short-channel effects and the diverse technologies to improve device performance. The second chapter was dedicated to the detailed characterization and interesting properties of SOI devices. We have demonstrated excellent gate control and high performance in ultra-thin FD SOI MOSFET. The SCEs are efficiently suppressed by decreasing the body thickness below 7 nm. We have investigated the transport and electrostatic properties as well as the coupling mechanisms. The strong impact of body thickness and temperature range has been outlined. A similar approach was used to investigate and compare vertical double-gate and triple-gate FinFETs. DG FinFETs show enhanced coupling to back-gate bias which is applicable and suitable for dynamic threshold voltage tuning. We have proposed original models explaining the 3D coupling effect in FinFETs and the mobility behavior in ZnO TFTs. Our results pointed on the similarities and differences in SOI and ZnO transistors. 
According to our low-temperature measurements and new promoted extraction methods, the mobility in ZnO and the quality of ZnO/SiO2 interface are respectable, enabling innovating applications in flexible, transparent and power electronics. In the third chapter, we focused on the mobility behavior in planar SOI and FinFET devices by performing low-temperature magnetoresistance measurements. Unusual mobility curve with multi-branch aspect were obtained when two or more channels coexist and interplay. Another original result in the existence of the geometrical magnetoresistance in triple-gate and even double-gate FinFETs.The operation of a flash memory in FinFETs with ONO buried layer was explored in the forth chapter. Two charge injection mechanisms were proposed and systematically investigated. We have discussed the role of device geometry and temperature. Our novel ONO FinFlash concept has several distinct advantages: double-bit operation, separation of storage medium and reading interface, reliability and scalability. In the final chapter, we explored the avenue of unified memory, by combining nonvolatile and 1T-DRAM operations in a single transistor. The key result is that the transient current, relevant for 1T-DRAM operation, depends on the nonvolatile charges stored in the nitride buried layer. On the other hand, the trapped charges are not disturbed by the 1T-DRAM operation. Our experimental data offers the proof-of-concept for such advanced memory. The performance of the unified/multi-bit memory is already decent but will greatly improve in the coming years by processing dedicated devices.
Article
In multi-level-cell (MLC) memory such as Flash and Phase-change memory, shrinking cell size and the growing number of levels per cell worsen the access-rate to capacity ratio and even reduce access rate. We present Minimal Maximum-Level Programming (MMLP), a scheme for expediting cell programming by sharing physical cells among multiple data sectors and exploiting the fact that making moderate changes to a cell's charge level is faster than making large ones. Specifically, we encode the data such that in the kth writing of data to a cell, only the lowest k+1 levels are utilized. Unlike in previously proposed cell-sharing schemes, different same-size data sectors occupy different numbers of physical cells, and a cell may hold a fraction of a bit of a given data sector. Nevertheless, the exposed sector size remains unchanged. Data is encoded, but without redundancy. In a four-level cell example, we achieve up to 75% reduction in write latency. Read latency may be degraded, depending on the percentage of utilized capacity.
Article
Low-voltage programmed levels are hard to achieve in multilevel Flash memory using staircase CHEI (channel hot electron injection) programming. The reason is that low-level programming deviates from the linear relation between threshold voltage VTH and control gate voltage VCG. Forward bias enhancement of CHEI is proposed to overcome this drawback. It is demonstrated that the new technique creates a linear relation between VTH and VCG, validated down to a critical VCG that is at least 1 V lower than with traditional CHEI. Through extensive measurements, it is further argued that the most suitable magnitude of forward bias is 0.5 V since (i) it produces the lowest program level of 1.4 V; and (ii) higher biases cause not only large current consumption but also worsened drain disturb performance in a NOR array configuration. The corresponding linear relation with unity slope is maintained after 10^5 program/erase cycles.
Article
The floating-body-induced transient mechanism in advanced FinFETs was investigated for unified and multi-bit memory capability. Nonvolatile memory operation was achieved by modifying the SOI buried insulator (BOX) such that the SiO2-Si3N4-SiO2 (ONO) BOX can accumulate permanent charges. Charges are injected into or removed from the Si3N4 layer by back-gate or drain bias and sensed remotely, by gate coupling, through the modulation of the drain current flowing at the front interface. On the other hand, the isolated silicon body of the transistor can store volatile charges, generated by impact ionization and able to modulate the drain current flowing at the back interface. Our experimental results demonstrate that these two different memory modes can be advantageously combined for multi-bit volatile memory operation. The volatile memory behavior strongly depends on the distribution of the nonvolatile charges stored in the nitride buried layer. Our measurements show that the nonvolatile charges located near the drain terminal have a larger influence on the volatile memory operation than the charges located at the opposite terminal. We also show that the bias conditions and device geometry are important factors for the two memory modes.
Article
Phase change memory (PCM) is a new solid-state memory technology that promises disruptive changes in the way servers and enterprise storage systems are built. Multilevel-cell (MLC) storage is highly desirable for increasing capacity and thus lowering cost-per-bit in memory technologies. In PCM, MLC storage is hampered by noise and resistance drift. In this paper, the issue of reliability in MLC PCM is addressed. A statistical model is developed that captures the main impairments in MLC PCM cell-arrays. A signal processing and coding framework is then introduced that provides robustness to drift and noise, improving reliability and prolonging data retention. Several examples of codes are provided and practical detection schemes are described.
Article
Communication and synchronization between master controller and coprocessors are critical issues in the design of parallel system-on-chip architectures, especially when applications are developed in a high-level programming language and run on a virtual run-time environment (such as a Java Virtual Machine). In this paper we propose a design space exploration environment based on a general HW-SW architectural template and a full-system cycle-accurate simulation tool, built on top of Simics. Our flow accurately takes into account the overheads caused by the operating system, virtual environment, drivers, synchronization mechanisms and non-ideal memory system. In a top-down co-design flow, the proposed approach bridges the abstraction gap between HW-SW partitioning and HW synthesis. In particular it provides: i) a realistic evaluation of the effectiveness of a tentative partitioning, ii) guidelines for designing the HW-SW interface, iii) performance constraints for the synthesis of the HW components.
Article
In this work we introduce a hardware-validated simulation model for the exploration of real-time multimedia systems, where system components are modeled as interacting generalized semi-Markov processes (GSMPs). We apply the simulation model to explore the design space of a mobile client accessing streaming data through a wireless network. The model has been characterized and validated against power and performance measurements performed on an instrumented HP iPAQ with wireless LAN running an MPEG4 video application. We analyze the impact of tuning parameters for the real-time multimedia system (buffer sizes, channel bandwidth, power management policy) on the trade-off between power consumption and QoS.
Book
Compilers and Operating Systems for Low Power focuses on both application-level compiler-directed energy optimization and low-power operating systems. Chapters have been written exclusively for this volume by several of the leading researchers and application developers active in the field. The first six chapters focus on low-energy operating systems or, more generally, energy-aware middleware services. The next five chapters are centered on compilation and code optimization. Finally, the last chapter takes a more general viewpoint on mobile computing. The material demonstrates the state-of-the-art work and proves that to obtain the best energy/performance characteristics, compilers, system software, and architecture must work together. The relationship between energy-aware middleware and wireless microsensors, mobile computing and other wireless applications is covered. This work will be of interest to researchers in the areas of low-power computing, embedded systems, compiler optimizations, and operating systems.
Chapter
The selection of a Flash cell approach is a reflection of the market and product features that a company decides to pursue. There are two major markets for Flash memories: one is the traditional embedded memory, and the other is the new emerging market of mass storage.
Article
In this paper we present an analysis of crosstalk effects on busses implementing error correcting codes. We show that the redundancy introduced by these codes can be exploited in order to avoid the worst-case crosstalk-induced delay. Our analysis is based on the evaluation of the effective coupling capacitance that needs to be charged during bus activity. In particular, we analyze the cases of the Hamming and Dual Rail codes. We show that Hamming codes do not allow us to avoid the most delay-costly bus transitions, while this can be the case for Dual Rail codes. Furthermore, we illustrate that, by increasing the redundancy of the Dual Rail code by only one bit, even higher crosstalk-induced delay reductions can be achieved. Finally, we show that a further improvement can be obtained by an optimized placing of the bus wires.
Article
Well into the System-on-Chip era, power consumption has emerged as one of the most critical challenges to design complexity scaling. Moving from a critical assessment of current technologies and architectures, we survey the distinguishing features of a design methodology that aims at energy consumption reduction, under guaranteed quality of service (QoS), as a main objective in system design.