Program Schemes for Multilevel Flash Memories
MARCO GROSSI, MASSIMO LANZONI, AND BRUNO RICCÒ, FELLOW, IEEE
Invited Paper
This paper presents a concise overview of multilevel (ML) Flash memory program methods. The problem of program time increasing with the number of bits stored in each cell is discussed, and methods based on both channel hot electrons (CHE) and Fowler–Nordheim tunneling (FNT) are reviewed. In the case of CHE, the use of an increasing voltage rather than a constant one on the control gate (CG) leads to narrower threshold voltage distributions and smaller current absorption, with positive effects on the degree of parallelism and program throughput. As for FNT, much faster programming than that commonly used today can be achieved using high CG voltages without producing intolerable degradation of cell reliability.
Keywords—Flash, memories, multilevel, programming.
I. INTRODUCTION
Emerging applications for Flash memories (e.g., audio and video storage) have greatly increased the demand for high-density, low-cost memories. In this context, multilevel (ML) storage [1] allows more than one bit to be stored in each cell, thus offering a significant reduction in cost per bit for the same cell dimension. ML storage, however, implies more critical constraints in terms of program and sensing accuracy, charge retention, and read and write disturbs.
In particular, accurate programming requires the placement of the right amount of charge on the cell floating gate (FG) to produce tight threshold voltage ($V_T$) distributions. If $n$ denotes the number of bits per cell, $2^n$ such distributions, adequately separated from each other, must cover a total voltage window (TVW) (in practice, the difference between the highest and the lowest value of $V_T$) that tends to shrink with new technologies aimed at low-voltage operation.
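To make the scaling concrete, the short sketch below computes how many distributions must fit in the window and how much of the window each one can occupy at most. It assumes, purely for illustration, that the TVW is divided into equal slices (each slice covering one distribution plus its separation margin); the function name is hypothetical, and the 4.5-V window is the one used in Fig. 1.

```python
def level_budget(bits_per_cell, tvw_volts):
    """Return (number of levels, voltage slice per level) for an equally divided window."""
    levels = 2 ** bits_per_cell
    return levels, tvw_volts / levels

# TVW of 4.5 V, as in Fig. 1; equal slicing is an illustrative assumption.
for n in (2, 3, 4):
    levels, slice_v = level_budget(n, 4.5)
    print(f"{n} b/cell: {levels} levels, at most {slice_v:.3f} V per level")
```

Each additional bit per cell halves the slice available to a single level, which is why program accuracy becomes the dominant constraint as the number of bits per cell grows.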
Accurate charge placement is normally obtained by means of program and verify (P&V) algorithms featuring a sequence of small steps, each followed by a read operation to determine whether or not further programming is needed. This approach leads to the required accuracy, provided that the individual program steps are small enough. On the other hand, precision is paid for heavily in terms of program throughput (PT), i.e., the number of bits that can be programmed per second, since the number of P&V steps increases with decreasing $V_T$ distribution widths. This is particularly true for increasing values of $n$ (3, 4, …), since the width of the distribution decreases essentially as $2^{-n}$ (for the same TVW).
Manuscript received July 1, 2002; revised January 5, 2003.
The authors are with the Department of Electronics, Computer Science, and Systems, University of Bologna, 40136 Bologna, Italy (e-mail: mgrossi@deis.unibo.it).
Digital Object Identifier 10.1109/JPROC.2003.811714
In spite of these problems, ML programming with 2 b/cell in both NOR [2], [3] and NAND [4], [5] technology is already a reality, while a substantial research effort is dedicated to the cases with $n$ = 3 and 4.
As for architectures, the NOR solution has so far been the mainstream Flash technology since: 1) it allows one to program cells by both channel hot electrons (CHE) and Fowler–Nordheim tunneling (FNT); and 2) the absence of series-connected cells allows faster programming and reading and avoids write disturbs (which seriously affect the NAND case).
On the other hand, the NAND solution is gaining increasing interest due to: 1) its more compact layout (leading to higher memory density and lower cost per bit); and 2) the possibility to use very low (or even negative) $V_T$ values, thus effectively eliminating the problem of overerased cells and the consequent need for erase and verify algorithms. A symmetrical problem exists in NAND memories for overprogramming. Since unselected cells act as pass transistors, a cell whose $V_T$ is too high may fail to turn on. The problem is, however, less important than overerase in ML memories, since high accuracy in programming must be guaranteed in both NOR and NAND architectures to allow many levels to be stored in the same TVW.
In the case of a NOR Flash memory, Fig. 1 illustrates the $V_T$ distributions required for 4, 8, and 16 levels, respectively. The need to avoid read disturbs due to excessively low $V_T$ values, as well as undesired programming of low-$V_T$ cells during reading, imposes a minimum and a maximum $V_T$ value, thus effectively determining the TVW.
0018-9219/03$17.00 © 2003 IEEE
594 PROCEEDINGS OF THE IEEE, VOL. 91, NO. 4, APRIL 2003
Fig. 1. $V_T$ distributions for ML programming of NOR Flash memory in a TVW of 4.5 V. (a) Four-level programming. (b) Eight-level programming. (c) 16-level programming.
In the case of Fig. 1, where a TVW of 4.5 V is considered, the maximum gate voltage ($V_{CG}$) applied during reading is 5.25, 5.4, and 5.85 V for 4, 8, and 16 levels, respectively.
In the NAND architecture, Fig. 2 illustrates the $V_T$ distributions for the eight-level NAND memory discussed in [5]: the distributions are well separated (0.4 V), and, although the maximum $V_{CG}$ applied to nonselected word-lines in reading is 6 V (a trade-off between fast reading and device reliability), a reliable and efficient device is achieved.
II. MULTILEVEL PROGRAM METHODS
Flash memory programming is achieved by injecting elec-
trons into the FG. This can be obtained by means of two dif-
ferent physical mechanisms.
1) CHE: electrons in the channel of the cell MOSFET
gain enough energy by the driving electric field to be
injected into the FG (helped by the vertical electric
field, essentially due to
– ).
2) FNT: electrons are injected into the FG by tunneling
due to the high vertical electric field.
Compared with FNT, CHE requires lower voltages, with benefits for the driving circuitry and device reliability, but is also characterized by large current absorption that limits the degree of parallelism (DOP) and is problematic for low-power applications.
In the following sections, program methods for both CHE and FNT are briefly discussed.
III. CHANNEL HOT ELECTRONS
NOR Flash memories can be programmed by CHE using two different techniques: 1) conventional box programming; and 2) ramped voltage programming.
In the former method, a constant voltage is applied on the CG during the whole operation, while in the latter $V_{CG}$ is raised linearly during programming.
GROSSI et al.: PROGRAM SCHEMES FOR MULTILEVEL FLASH MEMORIES 595
Fig. 2. Target distributions for eight-level NAND Flash memory. The picture is taken from [5].
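The FG voltage ($V_{FG}$) and the injection current into the FG ($I_G$) are linked by the following equation [6]:
$$\frac{dV_{FG}}{dt} = \frac{C_{CG}}{C_T}\,\frac{dV_{CG}}{dt} + \frac{C_D}{C_T}\,\frac{dV_D}{dt} - \frac{I_G}{C_T} \qquad (1)$$
where $C_{CG}$ is the FG to CG capacitance; $C_D$ is the FG to drain capacitance; and $C_T$ is the total capacitance between FG and the other MOSFET regions.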
In conventional box programming, $dV_{CG}/dt = 0$ (and the drain voltage is constant as well), thus $dV_{FG}/dt = -I_G/C_T$. Since $I_G$ decreases with decreasing $V_{FG}$, both the programming speed ($dV_T/dt$) and $I_G$ are high at the beginning of programming, but decrease with program time and reach a low value at the end of the operation, as schematically illustrated in Fig. 3(a) [6]. This behavior represents a problem because high values of $I_G$ (hence of the drain current $I_D$) limit the DOP, thus the PT. Moreover, strong nonuniformities of $I_G$ produce high dispersion in programmed $V_T$, hence (relatively) wide $V_T$ distributions.
With ramped voltage programming, instead, the slope of the gate bias waveform, $dV_{CG}/dt$, is constant and $dV_D/dt = 0$; thus, $dV_{FG}/dt = (C_{CG}/C_T)\,dV_{CG}/dt - I_G/C_T$. If the initial value of the ramp applied to CG is set so that $I_G = C_{CG}\,dV_{CG}/dt$, the write operation takes place under equilibrium conditions ($dV_{FG}/dt = 0$), where both $V_{FG}$ and $I_G$ are constant, as schematically illustrated in Fig. 3(b) [6].
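The contrast between the two waveforms can be reproduced numerically with a minimal sketch. It integrates the FG charge balance, $dV_{FG}/dt = (C_{CG}/C_T)\,dV_{CG}/dt - I_G/C_T$ with the drain voltage held constant, using a forward-Euler step. The capacitance values and the exponential law for $I_G(V_{FG})$ are illustrative assumptions, not data from [6]; only the qualitative behavior is meaningful.

```python
import math

C_T = 1.0e-15    # total FG capacitance [F] (assumed value)
C_CG = 0.6e-15   # FG-to-CG capacitance [F] (assumed value)

def injection_current(v_fg):
    """Toy I_G(V_FG): grows monotonically with V_FG (assumed exponential law)."""
    return 1.0e-10 * math.exp((v_fg - 5.0) / 0.5)

def simulate(v_fg0, cg_slope, t_end=10e-6, dt=1e-9):
    """Forward-Euler integration of dV_FG/dt = (C_CG/C_T)*slope - I_G/C_T."""
    v_fg = v_fg0
    trace = []
    for _ in range(int(round(t_end / dt))):
        i_g = injection_current(v_fg)
        trace.append((v_fg, i_g))
        v_fg += dt * ((C_CG / C_T) * cg_slope - i_g / C_T)
    return trace

# Box pulse: constant CG voltage, so the slope is zero.
box = simulate(v_fg0=6.0, cg_slope=0.0)

# Ramp: choose the slope that balances I_G at the starting V_FG (equilibrium).
v_start = 5.0
eq_slope = injection_current(v_start) / C_CG
ramp = simulate(v_fg0=v_start, cg_slope=eq_slope)

print("box  I_G first/last:", box[0][1], box[-1][1])
print("ramp I_G first/last:", ramp[0][1], ramp[-1][1])
```

In this toy model the box pulse starts with a large injection current that decays by more than an order of magnitude, while the ramp started at equilibrium keeps $V_{FG}$ and $I_G$ essentially constant throughout.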
Qualitative waveforms of $V_{CG}$ and $V_D$ for ramped voltage programming are sketched in Fig. 4(a) [6], while Fig. 4(b) [6] shows the expected transient behavior of $V_{FG}$ and $I_G$ (here, $t_{eq}$ denotes the time necessary to reach the equilibrium condition $dV_{FG}/dt = 0$). In Fig. 4(c) [6], the expected waveforms for $V_T$ and $I_D$ are schematically described.
As already mentioned, a constant $I_G$ helps to maximize DOP, hence PT. Furthermore, the linear relationship between programmed $V_T$ and program time produces better accuracy in programming, hence tighter $V_T$ distributions.
$V_T$ distribution widths obtained with ramped voltage programming depend on programming conditions, i.e., drain and substrate bias ($V_D$ and $V_B$, respectively), as well as on the ramp slope $dV_{CG}/dt$.
Fig. 5 shows the standard deviation ($\sigma$) of the programmed $V_T$ distribution measured on 10 K cells as a function of $dV_{CG}/dt$, for different values of $V_D$ and $V_B$.
For all considered bias configurations, the minimum $\sigma$ is obtained at low program speeds (low $dV_{CG}/dt$) and $\sigma$ increases with $dV_{CG}/dt$. Thus, a tradeoff is required between high program speed and good accuracy in achieving the final $V_T$ value.
From this point of view, the ramped voltage programming technique has been shown to be able to program a Flash memory array on four levels (2 b/cell) without the need for P&V algorithms [7], with substantial benefits for PT. In particular, assuming the same DOP (256), the method of [7] results in a PT of 0.8 MB/s, instead of the 0.17 MB/s achieved in [2].
The obtained $V_T$ distributions are well separated, and the minimum read margin (i.e., the difference between the cell $V_T$ and the gate bias used in reading) is 0.4 V. Also, after 20 K program/erase (P/E) cycles, the read margin does not degrade much; thus, the reliability constraints for the memory are met.
However, programming the memory on eight or more levels without P&V algorithms requires a significant increase in TVW that is not compatible with desirable circuit specifications. On the other hand, the use of ramped voltage programming in conjunction with P&V is problematic, because before each program step the exact value of the cell $V_T$ must be determined in order to set the correct initial value of $V_{CG}$. Since $V_T$ determination is a time-consuming operation, ramped voltage programming with P&V is more convenient than conventional box programming only if a minimum number of verifications is used.
Fig. 3. Conceptual plots of $I_G$ as a function of the FG voltage and corresponding typical behavior of $I_G$ during programming operation where the CG has (a) a box waveform or (b) a ramp waveform.
A new programming method that combines ramped voltage programming with verify operations is described in [8]. With this algorithm, programming is performed using only two steps, each preceded by a $V_T$ determination.
In detail, and with reference to Fig. 6, the program algorithm consists of the following steps. First, the initial $V_T$ value of the cell is determined. Second, the cell is programmed from this initial value to an intermediate target value using a ramped CG voltage with a given slope and the same overdrive for all cells. Third, the $V_T$ value actually obtained after this program step is determined. Fourth, the cell is programmed from the obtained value to the final $V_T$ value with a ramped CG voltage whose overdrive is set according to the result of the second determination. The determination of the initial $V_T$ guarantees quasi-equilibrium conditions during the first program operation, thus avoiding initial high current absorption and loss of accuracy. The determination of the $V_T$ obtained after the first step, instead, allows one to adjust the program overdrive to account for the characteristics of each individual cell, and represents the essential element to obtain adequate program accuracy.
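The role of the second determination can be illustrated with a toy model. The sketch below is not the algorithm of [8]: the cell response (the programmed $V_T$ lands at the target plus the overdrive minus a cell-specific offset, with a small random error) and the correction rule (subtract the measured miss from the overdrive) are hypothetical assumptions chosen only to show why a per-cell adjustment tightens the final distribution. The initial $V_T$ determination, which sets the ramp starting point, is not modeled.

```python
import random
import statistics

def ramp_program(vt_target, overdrive, cell_offset, rng):
    """Toy cell response (assumed): V_T = target + overdrive - offset + small noise."""
    return vt_target + overdrive - cell_offset + rng.gauss(0.0, 0.005)

def two_step_program(vt_intermediate, vt_final, common_overdrive, cell_offset, rng):
    # Step 1: every cell is programmed with the same overdrive.
    vt_after_first = ramp_program(vt_intermediate, common_overdrive, cell_offset, rng)
    # Verify: the measured miss reveals this cell's deviation from nominal.
    miss = vt_after_first - vt_intermediate
    # Step 2: overdrive corrected per cell (hypothetical rule: subtract the miss).
    vt_after_second = ramp_program(vt_final, common_overdrive - miss, cell_offset, rng)
    return vt_after_first, vt_after_second

rng = random.Random(0)
offsets = [rng.gauss(1.0, 0.1) for _ in range(1000)]   # assumed cell-to-cell spread [V]
results = [two_step_program(2.0, 3.0, 1.0, off, rng) for off in offsets]
first_errors = [first - 2.0 for first, _ in results]
final_errors = [final - 3.0 for _, final in results]
print("spread after step 1:", statistics.pstdev(first_errors))
print("spread after step 2:", statistics.pstdev(final_errors))
```

In this model the spread after the first step reflects the full cell-to-cell variation, while after the corrected second step only the step-to-step noise remains.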
The algorithm is capable of achieving $V_T$ distribution widths and displacements of the distribution mean value from the targets smaller than 150 and 20 mV, respectively.
This method is adequate for 3 b/cell ML schemes while, for the case of 4 b/cell, the separation between $V_T$ distributions is probably insufficient for direct use in real memories, although the adoption of error correcting codes makes it possible to use it also for 16-level schemes.
The achieved program time is about six times lower than that obtained with the algorithm of [2] for 4 b/cell at cell level (70.75 instead of 400 µs) which, with a cell matrix scheme featuring DOP = 256 and parallel analog determination of the cells' $V_T$, results in a PT about three times larger (0.9 instead of 0.32 MB/s).
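As a rough consistency check on these figures, PT can be estimated as the number of bits written per parallel operation divided by the duration of that operation. The helpers below are a sketch under that assumption; the function names and the convention 1 MB = 10^6 bytes are illustrative choices, not taken from the cited works.

```python
def program_throughput(dop, bits_per_cell, op_time_s):
    """Bytes per second when `dop` cells are programmed in parallel in `op_time_s`."""
    return dop * bits_per_cell / 8 / op_time_s

def op_time_for(dop, bits_per_cell, throughput_bytes_per_s):
    """Inverse relation: time per parallel operation implied by a given throughput."""
    return dop * bits_per_cell / 8 / throughput_bytes_per_s

print(f"{program_throughput(256, 4, 400e-6) / 1e6:.2f} MB/s")
print(f"{op_time_for(256, 4, 0.9e6) * 1e6:.0f} us")
```

With DOP = 256, 4 b/cell, and 400 µs per operation, the estimate reproduces the 0.32 MB/s quoted for [2]. Conversely, 0.9 MB/s corresponds to roughly 142 µs per operation, longer than the 70.75-µs cell-level program time, presumably because the $V_T$ determinations are included.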
IV. FOWLER–NORDHEIM TUNNELING
Compared with CHE, this programming method has the advantage of small current absorption, which is particularly interesting for low-power applications. Moreover, it allows very high DOP, thus leading to a strong increase in PT. In this regard, the NAND state of the art (based on FNT programming) achieves a PT as high as 10 MB/s [9].
Fig. 4. (a) Qualitative waveforms of the CG and drain voltages for the ramped voltage programming scheme, as well as corresponding behavior of (b) $V_{FG}$, $I_G$ and (c) $V_T$, $I_D$.
Fig. 5. Dependence of $\sigma$ on $dV_{CG}/dt$ for 10 K cells at different $V_D$ and $V_B$. BWP indicates the $\sigma$ for box programming.
However, as described in [10], FNT has several drawbacks that make it less effective than CHE for ML applications. In particular, programming by tunneling is more sensitive than CHE to process parameters, and this produces wider $V_T$ distributions. Furthermore, the applied voltages are higher than with CHE, and this produces high stress in the oxide, resulting in worse device reliability. In this regard, Fig. 7 shows the read disturb time, i.e., the time to produce a 0.5-V $V_T$ shift due to drain stress, as a function of the number of P/E cycles [11]. Thus, since the applied voltages cannot be too high, programming currents ($I_G$) are low; this leads to long programming times (in the range of 10 ms, as opposed to the few µs for CHE programming).
To maintain competitive PT, highly parallel programming is required, and this leads to high circuit complexity and die-size overhead, although parallel programming for FNT is simpler to implement than for CHE.
Compared to CHE, FNT tends to produce wider $V_T$ distributions and longer programming times; thus, efficient P&V algorithms are needed in ML programming to guarantee good program accuracy and PT.
In [12], three different P&V algorithms (schematically shown in Fig. 8) are presented for a NAND Flash memory. Fig. 8(a) illustrates the conventional P&V technique, where pulses of variable width are applied on the CG, while a verify operation is carried out between two write pulses. The first write pulses are sufficiently short to ensure that fast cells will not be overprogrammed; then the pulse width is increased to minimize the number of verify steps for slow cells.
Fig. 8(b) shows the trapezoidal pulse algorithm, which achieves much better results than that of Fig. 8(a). Higher programming speed can be obtained, while the oxide electric field ($E_{ox}$) can be reduced. Moreover, the increase in programming time as the $V_T$ distribution width is reduced is much weaker than in the previous case.
Fig. 8(c) instead shows the staircase pulse algorithm, which uses the same approach as in Fig. 8(b) but is much easier to generate on-chip.
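The behavior of a staircase P&V loop can be sketched with a simple model. The code below is an illustration, not the implementation of [12]: it assumes that each pulse pulls the cell $V_T$ up to the CG voltage minus a cell-specific constant (small for fast cells, large for slow ones), and all voltage values are arbitrary. Under this assumption every cell stops within one staircase step above the verify level, regardless of its speed.

```python
import random

def staircase_program(k_cell, vt_erased, vt_target, v_start, v_step, max_pulses=50):
    """Apply staircase pulses with a verify after each one; return (V_T, pulses used)."""
    vt = vt_erased
    for pulse in range(max_pulses):
        v_cg = v_start + pulse * v_step
        # Toy cell response (assumed): the pulse pulls V_T up to v_cg - k_cell.
        vt = max(vt, v_cg - k_cell)
        if vt >= vt_target:        # verify step
            return vt, pulse + 1
    raise RuntimeError("cell did not reach the target")

rng = random.Random(1)
cells = [rng.uniform(13.5, 15.5) for _ in range(1000)]   # assumed spread of cell speeds [V]
outcomes = [staircase_program(k, -2.0, 1.0, 14.0, 0.2) for k in cells]
finals = [vt for vt, _ in outcomes]
print("V_T range:", min(finals), max(finals))
print("max pulses:", max(n for _, n in outcomes))
```

The step height therefore sets the width of the programmed distribution, while the number of pulses (and verify operations) grows as the step is reduced, which is the accuracy versus program time tradeoff discussed in the text.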
In Fig. 9, the main characteristics of both FNT and CHE are compared. Since the advantages of fewer disturbs and lower electric fields are more important than the large DOP allowed by FNT, CHE seems to be more suitable for ML applications, at least when low power consumption is not the main constraint.
Of course, with FNT it is possible to reduce the program time ($t_P$) by increasing the CG voltage, thus trading off $t_P$ against device reliability. In this regard, stress-induced leakage current (SILC), which degrades data retention time, is the main phenomenon, and it has conventionally been considered to increase with the CG voltage, thus with the decrease of $t_P$ [13] (for the same charge fluence, i.e., total charge injected through the oxide).
However, recent studies [14] have shown that, for the same charge fluence, SILC initially increases with decreasing $t_P$, but it tends to decrease with $t_P$ as the stress time becomes comparable to the characteristic time required for permanent oxide degradation.
Fig. 10 shows SILC characteristics of Flash memory cells as a function of the gate voltage and of the program time $t_P$ for different program conditions. Fig. 10(a) shows that the SILC after 10 K P/E cycles with $t_P$ = 20 ns is not much larger than that obtained with $t_P$ = 30 s. Instead, Fig. 10(b) shows that SILC stops increasing for $t_P$ of about 1 s and (slightly) decreases with $t_P$ below such a value.
Fig. 6. Representation of the novel algorithm that combines ramped voltage programming with verify operations. Inside the boxes, the CG voltage during the two program steps is shown.
Fig. 7. A comparison between the read disturb due to CHE and FNT programming, as a function of P/E cycles. The picture is taken from [11].
This shows that FNT programming of Flash memory with a program time $t_P$ as low as 20 ns is feasible, with good results in terms of data retention, provided that a sufficiently low gate voltage is applied during reading.
In this regard, Fig. 11 shows the maximum read disturb voltage compatible with a data retention time of ten years after 10 K P/E cycles as a function of $t_P$. For $t_P$ as low as 20 ns, this maximum value is about 2.5 V.
However, a significant problem for FNT is due to the high voltages needed for fast programming [in the case of Fig. 10(a), for a program time of 20 ns, it is 26.5 V], since this leads to challenging constraints for the high-voltage programming circuit.
Scaling the oxide thickness has favorable effects because it decreases the voltage required for the same oxide field, but it also produces a drastic decrease in data retention time.
In [15], measurements performed on 6.5-nm oxide Flash memories have shown a data retention time of 13 hours after 10 K P/E cycles with a maximum gate voltage of 2.5 V during reading.
Such a retention time is small compared to the ten-year retention of conventional nonvolatile memories, but it is more than three orders of magnitude greater than typical DRAM refresh time, thus making fast FNT potentially interesting for DRAM-like applications.
Fig. 8. Conventional (a), trapezoidal (b), and staircase (c)
programming pulses. A verify step is carried out after each pulse.
The picture is taken from [12].
Fig. 9. Comparison of FNT and CHE programming mechanisms for ML applications. The picture is taken from [11].
V. CONCLUSION
This paper has presented a concise review of different program techniques for ML Flash memories based on both CHE injection and FNT.
In the case of CHE, ramped voltage programming has been shown to achieve tighter $V_T$ distributions and higher program throughput than the conventional box technique. In fact, programming on four levels is feasible without the use of P&V algorithms. With 8 or 16 levels, instead, P&V is mandatory, and problems arise because of the difficulty of combining ramped voltage programming with verify operations.
In the case of FNT, instead, fast programming with a pulse duration of 20 ns seems able to produce very high PT (comparable with DRAMs). However, problems occur because of the need to use high-voltage circuitry and/or the reduction of data retention time due to decreased tunnel oxide thickness. For these reasons, fast FNT seems more suitable for DRAM-like applications than for conventional nonvolatile memories.
Fig. 10. SILC characteristics of the Flash memory cells after 10 K P/E cycling (a) as a function of the gate voltage for different program conditions and (b) as a function of the program time $t_P$.
Fig. 11. Maximum read disturb voltage which still guarantees a data retention time of 10 years versus the program time $t_P$ after 10 K P/E cycles.
REFERENCES
[1] B. Riccò, G. Torelli, M. Lanzoni, A. Manstretta, H. Maes, D. Mon-
tanari, and A. Modelli, “Nonvolatile multilevel memories for digital
applications,” Proc. IEEE, vol. 86, pp. 2399–2421, Dec. 1998.
[2] A. Silvagni, S. Zanardi, A. Manstretta, and M. Scotti, “Modular architecture for a family of multilevel 256/192/128/64 Mbit 2-bit/cell 3 V only NOR Flash memory devices,” IEEE Trans. Electron Devices, vol. 48, pp. 937–940, Jan. 2001.
[3] M. Bauer, “A multilevel-cell 32 Mb Flash memory,” in IEEE ISSCC
Tech. Dig., 1995, pp. 132–133.
[4] T.-S. Jung, Y.-J. Choi, and K.-D. Suh, “A 117 mm² 3.3 V only 128 Mb multilevel NAND Flash memory for mass storage applications,” IEEE J. Solid-State Circuits, vol. 31, pp. 1575–1583, Nov. 1996.
[5] H. Nobukata, S. Takagi, and K. Hiraga, “A 144-Mb, eight-level
NAND Flash memory with optimized pulsewidth programming,”
IEEE J. Solid-State Circuits, vol. 35, pp. 682–690, May 2000.
[6] D. Esseni, A. D. Strada, P. Cappelletti, and B. Riccò, “A new
and flexible scheme for hot-electron programming of nonvolatile
memory cells,” IEEE Trans. Electron Devices, vol. 46, pp. 125–133,
Jan. 1999.
[7] R. Versari, D. Esseni, G. Falavigna, M. Lanzoni, and B. Riccò, “Op-
timized programming of multilevel Flash EEPROMs,” IEEE Trans.
Electron Devices, vol. 48, pp. 1641–1646, Aug. 2001.
[8] M. Grossi, M. Lanzoni, and B. Riccò, “A novel algorithm for high
throughput programming of multi-level Flash memories,” IEEE
Trans. Electron Devices., submitted for publication.
[9] H. Nakamura, K. Imamiya, and T. Himeno, “A 125 mm² 1 Gb NAND Flash memory with 10 MB/s program throughput,” in IEEE ISSCC Tech. Dig., vol. 1, 2002, pp. 106–450.
[10] B. Eitan, R. Kazerounian, A. Roy, G. Crisenza, P. Cappelletti, and A. Modelli, “Multilevel Flash cells and their trade-offs,” in IEEE IEDM Tech. Dig., 1996, pp. 169–172.
[11] B. Eitan and A. Roy, “Binary and multilevel Flash cells,” in
Flash Memories, P. Cappelletti, C. Golla, P. Olivo, and E. Zanoni,
Eds. Boston, MA: Kluwer, 1999, pp. 91–152.
[12] G. Hemink, T. Tanaka, and T. Endoh, “Fast and accurate program-
ming method for multi-level NAND EEPROM’s,” in Symp. VLSI
Technology Dig. Tech. Papers, 1995, pp. 129–130.
[13] R. Moazzami and C. Hu, “Stress-induced current in thin silicon
dioxide film,” in IEEE IEDM Tech. Dig., 1992, pp. 139–141.
[14] R. Versari, A. Pieracci, D. Morigi, and B. Riccò, “Fast tunneling pro-
gramming of nonvolatile memories,” IEEE Trans. Electron Devices,
pp. 1285–1287, June 2000.
[15] R. Versari, A. Pieracci, and B. Riccò, “Fast programming/erasing of
thin-oxide EEPROMs,” IEEE Trans. Electron Devices, pp. 817–819,
Apr. 2001.
Marco Grossi was born in Bologna, Italy,
in 1973. He received the Laurea degree in
electronic engineering from the University
of Bologna in 2000. He is currently working
toward the Ph.D. degree at the Department of
Electronics, Computer Science, and Systems
Laboratory, University of Bologna.
His research interest is characterization of non-
volatile memories. He is currently working in the
field of Flash memories and the multilevel pro-
gramming of these memories using the ramped
gate technique.
Massimo Lanzoni was born in Bologna, Italy,
in 1961. He received the Laurea degree in
electronic engineering from the University of
Bologna, Bologna, Italy, in 1987.
He is with the Microelectronics Research
Group, Department of Electronics, Computer
Science, and Systems, University of Bologna,
working on research projects in the fields of
nonvolatile memories, MOS devices, virtual in-
strumentation, and testing. His research interests
include the characterization of thin dielectrics
reliability, nonvolatile memory cell characteristics and reliability, MOS
transistors’ experimental characterization and new techniques for IC
testing as nonvolatile memories endurance testing and CMOS IC latch-up
testing. He is now involved in projects concerning analog applications of
nonvolatile memories and multilevel programming.
Bruno Riccò (Fellow, IEEE) was born in Parma,
Italy, in 1947. He received the Laurea degree
in electrical engineering from the University
of Bologna, Bologna, Italy, in 1971 and the
Ph.D. degree from the University of Cambridge,
Cambridge, U.K., in 1976, where he worked at
the Cavendish Laboratory.
In 1980, he was a Full Professor of Electronics
at the University of Padova, Padova, Italy. In
1983, he was a Full Professor of Electronics at
the University of Bologna. In 1983 and 1986,
he was Visiting Professor at the University of Stanford, Stanford, CA;
at the IBM Thomas J. Watson Research Center, Yorktown Heights, NY;
and at the University of Washington, Seattle. He is currently with the
Department of Electronics, Computer Science, and Systems, University
of Bologna. He has also been a Consultant for major companies and for
the Commission of the European Union in the definition, evaluation, and
review of research projects in microelectronics. He is author or coauthor of
more than 300 publications (more than half of which have been published
in major international journals), three books, and six patents in the field
of nonvolatile memories. His research interests include solid-state devices
and ICs. He is currently also working in the field of IC design, evaluation,
and testing.
Prof. Riccò has been President of the Group of Electron Devices, Technologies, and Circuits of the Italian Association of Electrical and Electronics Engineers (AEI) since 1996, and was President of the Italian Group of Electronics Engineers from 1998 to 2001. In 1996, he received the G. Marconi Award from the AEI. He was European Editor of the IEEE TRANSACTIONS ON ELECTRON DEVICES from 1986 to 1996, European Cochair at the International Electron Device Meeting (IEDM) from 1992 to 2001, and Vice-Chairman of the North Italy Section of IEEE from 1999 to 2001. He has been Chairman of the IEEE North Italy since 2002.