
Program Schemes For Multilevel Flash Memories

May 2003
Proceedings of the IEEE 91(4):594 - 601

DOI:10.1109/JPROC.2003.811714

Source
IEEE Xplore

Authors:

Marco Grossi

University of Bologna

Massimo Lanzoni

University of Bologna

Bruno Riccò

University of Bologna

This paper presents a synthetic overview of multilevel (ML) flash memory program methods. The problem of increasing program time with the number of bits stored in each cell is discussed and methods based on both channel hot electrons (CHE) and Fowler-Nordheim tunneling (FNT) are discussed. In the case of CHE, the use of an increasing voltage rather than a constant one on the control gate (CG) leads to narrower threshold voltage distributions and smaller current absorption, with positive effects on the degree of parallelism and program throughput. As for FNT, much faster programming than that commonly used today can be done using high CG voltages without producing intolerable degradation of cell reliability.


MARCO GROSSI, MASSIMO LANZONI, AND BRUNO RICCÒ, FELLOW, IEEE

Invited Paper

Keywords—Flash, memories, multilevel, programming.

I. INTRODUCTION

Emerging new applications for Flash memories (e.g., audio and video storage) have greatly increased the demand for high-density, low-cost memories. In this context, multilevel (ML) storage [1] allows more than one bit to be stored in each cell, thus offering a significant cost per bit reduction for the same cell dimension. ML storage, however, implies more critical constraints in terms of program and sensing accuracy, charge retention, and read and write disturbs.

In particular, accurate programming requires the placement of the right amount of charge on the cell floating gate (FG) to produce tight threshold voltage ($V_T$) distributions. If $n$ denotes the number of bits per cell, $2^n$ such distributions, adequately separated from each other, must cover a total voltage window (TVW) (in practice, the difference between the highest and the lowest value of $V_T$) that tends to shrink with new technologies aimed at low-voltage operations.

Manuscript received July 1, 2002; revised January 5, 2003. The authors are with the Department of Electronics, Computer Science, and Systems, University of Bologna, 40136 Bologna, Italy (e-mail: mgrossi@deis.unibo.it). Digital Object Identifier 10.1109/JPROC.2003.811714.

Accurate charge placement is normally obtained by means of program and verify (P&V) algorithms featuring a sequence of small steps, each followed by a read operation to determine whether or not further programming is needed. This approach obviously leads to the required accuracy, provided that the individual program steps are small enough. On the other hand, precision is paid for heavily in terms of program throughput (PT), i.e., the number of bits that can be programmed per second, since the number of P&V steps increases with decreasing $V_T$ distribution widths. This, of course, is particularly true for increasing values of $n$ (3, 4, …), since the width of the distribution decreases essentially as $2^{-n}$ (for the same TVW).
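The scaling argument above can be illustrated with a minimal sketch. It assumes (these choices are not from the paper) that the window is split into $2^n$ equal slots, that each distribution may occupy a fixed fraction of its slot, and that every program step shifts $V_T$ by exactly the allowed width; the function names and the `width_fraction` parameter are hypothetical.

```python
import math


def pv_worst_case_steps(tvw, bits_per_cell, width_fraction=0.5):
    """Rough worst-case P&V step count for a cell crossing the whole window.

    Assumptions (illustrative only): 2**n equal slots inside the window,
    distribution width = width_fraction * slot, step size = allowed width.
    """
    levels = 2 ** bits_per_cell
    slot = tvw / levels                 # window share per level, scales as 2**-n
    width = width_fraction * slot       # allowed V_T distribution width
    steps = math.ceil(tvw / width)      # steps needed to sweep the window
    return levels, width, steps


def program_and_verify(vt_start, vt_target, step):
    """Idealized P&V loop: verify (read), then apply one small program step."""
    vt, pulses = vt_start, 0
    while vt < vt_target:               # verify: is more programming needed?
        vt += step                      # program: shift V_T by one step
        pulses += 1
    return vt, pulses


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, pv_worst_case_steps(4.5, n))
    print(program_and_verify(1.0, 3.0, 0.25))
```

Under these assumptions the step count doubles with every additional bit per cell, which is the throughput penalty the text describes.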

In spite of these problems, ML programming with 2 b/cell in both NOR [2], [3] and NAND [4], [5] technology is already a reality, while a substantial research effort is dedicated to the cases with $n = 3$ and 4.

As for architectures, the NOR solution has so far been the mainstream Flash technology since: 1) it allows one to program cells by both channel hot electrons (CHE) and Fowler–Nordheim tunneling (FNT); and 2) the absence of serially connected cells allows faster programming and reading and avoids write disturbs (which seriously affect the NAND case).

On the other hand, the NAND solution is gaining increasing interest due to: 1) its more compact layout (leading to higher memory density and lower cost per bit); and 2) the possibility of using very low (or even negative) $V_T$ values, thus effectively eliminating the problem of overerased cells and the consequent need for erase-and-verify algorithms. A symmetrical problem exists in NAND memories for overprogramming. Since unselected cells become pass transistors, if a cell $V_T$ is too high, this can prevent it from turning on. The problem is, however, less important than overerase in ML memories, since high accuracy in programming must be guaranteed in both NOR and NAND architectures to allow many levels to be stored in the same TVW.

In the case of a NOR Flash memory, Fig. 1 illustrates the $V_T$ distributions required for 4, 8, and 16 levels, respectively. The need to avoid read disturbs due to excessively low $V_T$ values, as well as undesired programming of low-$V_T$ cells during reading, imposes a minimum and a maximum $V_T$ value, thus effectively determining the TVW.

594 PROCEEDINGS OF THE IEEE, VOL. 91, NO. 4, APRIL 2003

Fig. 1. $V_T$ distributions for ML programming of NOR Flash memory in a TVW of 4.5 V. (a) Four-level programming. (b) Eight-level programming. (c) 16-level programming.

In the case of Fig. 1, where a TVW of 4.5 V is considered, the maximum gate voltage ($V_G$) applied during reading is 5.25, 5.4, and 5.85 V for 4, 8, and 16 levels, respectively.

In the NAND architecture, Fig. 2 illustrates the $V_T$ distributions for the eight-level NAND memory discussed in [5]: the distributions are well separated (0.4 V), and, although the maximum $V_G$ applied to nonselected word-lines in reading is 6 V (a trade-off between fast reading and device reliability), a reliable and efficient device is achieved.

II. MULTILEVEL PROGRAM METHODS

Flash memory programming is achieved by injecting electrons into the FG. This can be obtained by means of two different physical mechanisms.

1) CHE: electrons in the channel of the cell MOSFET gain enough energy from the driving electric field to be injected into the FG (helped by the vertical electric field).

2) FNT: electrons are injected into the FG by tunneling due to the high vertical electric field.

Compared with FNT, CHE requires lower voltages, with benefits for the driving circuitry and device reliability, but is also characterized by large current absorption that limits the degree of parallelism (DOP) and is problematic for low-power applications.

In the following sections, program methods for both CHE and FNT are concisely discussed.

III. CHANNEL HOT ELECTRONS

NOR Flash memories can be programmed by CHE using two different techniques: 1) conventional box programming; and 2) ramped voltage programming.

In the former method, a constant voltage is applied on the CG during the whole operation, while in the latter the CG voltage ($V_{CG}$) is raised linearly during programming.

GROSSI et al.: PROGRAM SCHEMES FOR MULTILEVEL FLASH MEMORIES 595

Fig. 2. Target $V_T$ distributions for eight-level NAND Flash memory. The picture is taken from [5].

The FG voltage ($V_{FG}$) and the injection current into the FG ($I_G$) are linked by the following equation [6]:

$$C_T \frac{dV_{FG}}{dt} = C_{CG} \frac{dV_{CG}}{dt} + C_D \frac{dV_D}{dt} - I_G \qquad (1)$$

where $C_{CG}$ is the FG to CG capacitance; $C_D$ is the FG to drain capacitance; and $C_T$ is the total capacitance between FG and the other MOSFET regions.

In conventional box programming, $dV_{CG}/dt = dV_D/dt = 0$, thus $I_G = -C_T \, dV_{FG}/dt$. Since $I_G$ decreases with $V_{FG}$ decreasing, both the programming speed ($dV_T/dt$) and the drain current ($I_D$) are high at the beginning of programming, but decrease with program time and reach a low value at the end of the operation, as schematically illustrated in Fig. 3(a) [6]. This behavior represents a problem because high values of $V_{FG}$ (hence of $I_D$) limit the DOP, thus the PT. Moreover, strong nonuniformities of $I_G$ produce high dispersion in programmed $V_T$, hence (relatively) wide $V_T$ distributions.

With ramped voltage programming instead, $dV_{CG}/dt$ is constant (it is the slope of the gate bias waveform, hereafter the ramp slope) and $dV_D/dt = 0$; thus, $I_G = C_{CG} \, dV_{CG}/dt - C_T \, dV_{FG}/dt$. If the initial value of the ramp applied to CG is set so that $I_G = C_{CG} \, dV_{CG}/dt$, the write operation takes place under equilibrium conditions ($dV_{FG}/dt = 0$), where both $V_{FG}$ and $I_G$ are constant, as schematically illustrated in Fig. 3(b) [6].
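The contrast between box and ramp programming can be sketched numerically. The example below integrates the FG charge balance with a constant drain voltage, $C_T \, dV_{FG}/dt = C_{CG} \, dV_{CG}/dt - I_G(V_{FG})$, by forward Euler. The exponential injection model, all parameter values, and the function names are hypothetical placeholders chosen only for illustration, and the threshold shift uses the standard relation $dV_T/dt = I_G / C_{CG}$, which is an added assumption rather than a formula from the text.

```python
import math

C_CG = 1.0e-15                      # FG-to-CG capacitance [F], illustrative
C_T = 2.0e-15                       # total FG capacitance [F], illustrative
I0, V0, VREF = 1.0e-9, 0.5, 3.0     # hypothetical injection-model parameters


def injection_current(v_fg):
    """Hypothetical I_G(V_FG): increases exponentially with the FG voltage."""
    return I0 * math.exp((v_fg - VREF) / V0)


def equilibrium_v_fg(slope):
    """FG voltage at which I_G equals C_CG * slope, i.e. dV_FG/dt = 0."""
    return VREF + V0 * math.log(C_CG * slope / I0)


def simulate(v_fg0, slope, t_end=10e-6, dt=1e-9):
    """Forward-Euler integration of C_T dV_FG/dt = C_CG*slope - I_G(V_FG).

    slope = 0 models box programming; slope > 0 models a CG voltage ramp.
    Returns the (V_FG, I_G) trace and the accumulated threshold shift.
    """
    v_fg, delta_vt, trace = v_fg0, 0.0, []
    for _ in range(int(round(t_end / dt))):
        i_g = injection_current(v_fg)
        trace.append((v_fg, i_g))
        v_fg += (C_CG * slope - i_g) * dt / C_T
        delta_vt += i_g * dt / C_CG     # V_T shift as seen from the CG
    return trace, delta_vt


if __name__ == "__main__":
    box, dvt_box = simulate(3.0, 0.0)
    ramp_slope = 1.0e5                  # 0.1 V/us, illustrative
    ramp, dvt_ramp = simulate(equilibrium_v_fg(ramp_slope), ramp_slope)
    print("box  I_G start/end:", box[0][1], box[-1][1])
    print("ramp I_G start/end:", ramp[0][1], ramp[-1][1])
    print("ramp V_T shift [V]:", dvt_ramp)
```

In this toy model the box case starts with a large injection current that decays during the pulse, whereas a ramp started at the equilibrium FG voltage keeps the current constant and makes the threshold shift grow linearly with time at the ramp slope.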

Qualitative waveforms of $V_{CG}$ and $V_D$ for ramped voltage programming are sketched in Fig. 4(a) [6], while Fig. 4(b) [6] shows the expected transient behavior of $V_{FG}$ and $I_G$ (including the initial transient, i.e., the time necessary to reach the equilibrium condition $dV_{FG}/dt = 0$). In Fig. 4(c) [6], the expected waveforms for $V_T$ and $I_D$ are schematically described.

As already mentioned, constant $I_D$ helps to maximize DOP, hence PT. Furthermore, the linear relationship between programmed $V_T$ and program time produces a better accuracy in programming, hence, tighter $V_T$ distributions.

$V_T$ distribution widths obtained with ramped voltage programming depend on programming conditions, i.e., drain and substrate bias ($V_D$ and $V_B$, respectively) as well as on the ramp slope.

Fig. 5 shows the standard deviation ($\sigma$) of the programmed $V_T$ distribution measured on 10 K cells as a function of the ramp slope, for different values of $V_D$ and $V_B$. For all considered bias configurations, the minimum $\sigma$ is obtained at low program speeds (low ramp slope), and $\sigma$ increases with the slope. Thus, a tradeoff is in order between high program speeds and good accuracy in achieving the final $V_T$ value.

From this point of view, the ramped voltage programming technique has been shown to be able to program a Flash memory array on four levels (2 b/cell) without the need for P&V algorithms [7], with substantial benefits for PT. In particular, assuming the same DOP (256), the method of [7] results in a PT of about 0.8 MB/s, instead of the 0.17 MB/s achieved in [2].

The obtained $V_T$ distributions are well separated, and the minimum read margin (i.e., the difference between the cell $V_T$ and the gate bias used in reading) is 0.4 V. Also, after 20 K program/erase (P/E) cycles, the read margin does not degrade much; thus, the reliability constraints for the memory are guaranteed.

However, programming the memory on eight or more levels without P&V algorithms requires a significant increase in TVW that is not compatible with desirable circuit specifications. On the other hand, the use of ramped voltage programming in conjunction with P&V is problematic, because before each program step the exact value of the cell $V_T$ must be determined in order to set the correct initial value of $V_{CG}$. Since $V_T$ determination is a time-consuming operation, ramped voltage programming with P&V is more convenient than conventional box programming only if a minimum number of verifications is used.

Fig. 3. Conceptual plots of $dV_T/dt$ as a function of the FG voltage $V_{FG}$ and corresponding typical behavior of $V_T$ during programming operation where the CG has (a) a box waveform or (b) a ramp waveform.

A new programming method that combines ramped voltage programming with verify operations is described in [8]. With this algorithm, programming is performed using only two steps, each preceded by a $V_T$ determination.

In detail, and with reference to Fig. 6, the program algorithm consists of the following steps. First, the initial $V_T$ value of the cell is determined. Second, the cell is programmed from this initial value to an intermediate target value using a ramped CG voltage with a given slope and the same overdrive for all cells. Third, the $V_T$ value obtained after this program step is determined. Fourth, the cell is programmed from the obtained value to the final target $V_T$ with a ramped CG voltage whose overdrive is set individually for each cell on the basis of the third step. The determination of the initial $V_T$ guarantees quasi-equilibrium conditions during the first program operation, thus avoiding initial high current absorption and loss of accuracy. The determination of the intermediate $V_T$, instead, allows one to adjust the program overdrive to account for the characteristics of each individual cell, and represents the essential element to obtain adequate program accuracy.

The algorithm is capable of achieving $V_T$ distribution widths and displacement of the distribution mean value from the targets smaller than 150 and 20 mV, respectively.

This method is adequate for 3 b/cell ML schemes while, for the case of 4 b/cell, the separation between $V_T$ distributions is probably insufficient for direct use in real memories, although the adoption of error correcting codes makes it possible to use it also for 16-level schemes.

The achieved program time is about six times lower than that obtained with the algorithm of [2] for 4 b/cell at cell level (70.75 instead of 400 µs) that, with a cell matrix scheme featuring a DOP of 256 and parallel analog determination of the cells' $V_T$, results in a PT about three times larger (0.9 instead of 0.32 MB/s).
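The throughput figures quoted above can be cross-checked with simple arithmetic. The sketch below assumes that PT equals the number of bytes written per parallel operation divided by the operation time, with 1 MB taken as 10^6 bytes; the function name is illustrative and not from the paper.

```python
def program_throughput_mb_s(dop, bits_per_cell, program_time_s):
    """Bytes programmed per second, in MB/s (1 MB = 1e6 bytes, assumed)."""
    bytes_per_operation = dop * bits_per_cell / 8
    return bytes_per_operation / program_time_s / 1e6


# Reference case quoted from [2]: DOP = 256, 4 b/cell, 400 us per cell.
pt_ref2 = program_throughput_mb_s(256, 4, 400e-6)

# Time per parallel operation implied by the quoted 0.9 MB/s figure.
implied_time_s = (256 * 4 / 8) / 0.9e6

if __name__ == "__main__":
    print(f"PT for [2]: {pt_ref2:.2f} MB/s")
    print(f"implied time per operation: {implied_time_s * 1e6:.1f} us")
```

Under these assumptions the 400 µs case reproduces the quoted 0.32 MB/s, while 0.9 MB/s corresponds to roughly 142 µs per parallel operation, about twice the 70.75 µs cell-level program time. Attributing the difference to the $V_T$ determinations is an inference, not a figure stated in the paper.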

IV. FOWLER–NORDHEIM TUNNELING

Compared with CHE, this programming method has the advantage of small current absorption, particularly interesting for low-power applications. Moreover, it allows very high DOP, thus leading to a strong increase in PT. In this regard, the NAND state of the art (based on FNT programming) produces a PT as high as 10 MB/s [9].

Fig. 4. (a) Qualitative waveforms of the CG and drain voltages for ramped voltage programming scheme as well as corresponding behavior of (b) $V_{FG}$, $I_G$ and (c) $V_T$, $I_D$.

Fig. 5. Dependence of $\sigma$ on the ramp slope for 10 K cells at different $V_D$ and $V_B$. BWP indicates the $\sigma$ for box programming.

However, as described in [10], FNT has several drawbacks that make it less effective than CHE for ML applications. In particular, programming by tunneling is more sensitive than CHE to process parameters, and this produces wider $V_T$ distributions. Furthermore, the applied voltages are higher than with CHE, and this produces high stress in the oxide, resulting in worse device reliability. In this regard, Fig. 7 shows the read disturb time, i.e., the time to produce a 0.5-V $V_T$ shift due to drain stress, as a function of the number of P/E cycles [11]. Thus, since the applied voltages cannot be too high, programming currents ($I_G$) are low; this leads to long programming times (in the range of 10 ms as opposed to the few µs for CHE programming).

To maintain competitive PT, highly parallel programming is required, and this leads to high circuit complexity and die-size overhead, although parallel programming for FNT is simpler to implement than for CHE.

Compared to CHE, FNT tends to produce wider $V_T$ distributions and longer programming times; thus, efficient P&V algorithms are needed in ML programming to guarantee good program accuracy and PT.

In [12], three different P&V algorithms (schematically shown in Fig. 8) are presented for a NAND Flash memory. Fig. 8(a) illustrates the conventional P&V technique, where pulses of variable width are applied on the CG, while a verify operation is carried out between two write pulses. The first write pulses are sufficiently short to ensure that fast cells will not be overprogrammed; then the pulse width is increased to minimize the number of verify steps for slow cells.

Fig. 8(b) shows the trapezoidal pulse algorithm, which achieves much better results than that of Fig. 8(a). Higher programming speed can be obtained, while the oxide electric field ($E_{ox}$) can be reduced. Moreover, the increase of programming time with $V_T$ distribution width reduction is much weaker than in the previous case.

Fig. 8(c) instead shows the staircase pulse algorithm, which uses the same approach as in Fig. 8(b) but is much easier to generate on-chip.
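A staircase P&V loop of the kind shown in Fig. 8(c) can be sketched with a deliberately simplified cell model. The example assumes (this is not taken from [12]) that after each pulse a cell's $V_T$ settles to the pulse amplitude minus a cell-dependent offset that stands in for process variations, and that cells passing the verify are inhibited from further programming. Voltages are integers in millivolts; all names and values are hypothetical.

```python
def staircase_program(offsets_mv, target_mv, start_mv, step_mv, max_pulses=100):
    """Staircase programming with a verify after every pulse (toy model).

    offsets_mv: per-cell gap between CG pulse amplitude and the V_T reached;
                fast cells have small offsets (hypothetical variation model).
    Returns the final V_T of every cell and the number of pulses applied.
    """
    vts = [0] * len(offsets_mv)             # erased cells assumed at 0 mV
    done = [False] * len(offsets_mv)
    v_pp, pulses = start_mv, 0
    while not all(done) and pulses < max_pulses:
        for i, offset in enumerate(offsets_mv):
            if not done[i]:                 # verified cells are inhibited
                vts[i] = max(vts[i], v_pp - offset)
        pulses += 1
        done = [vt >= target_mv for vt in vts]   # verify step
        v_pp += step_mv                     # next stair of the waveform
    return vts, pulses


if __name__ == "__main__":
    offsets = [14000, 14650, 15310, 16170]  # fast to slow cells, illustrative
    final_vts, n_pulses = staircase_program(offsets, 2000, 15000, 200)
    print(final_vts, n_pulses)
```

In this model every cell stops within one stair height above the verify level, regardless of its speed, provided the first pulse is low enough not to overshoot the fastest cell. This is why the stair height, rather than the cell-to-cell spread, bounds the programmed distribution width in this sketch.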

In Fig. 9, the main characteristics of both FNT and CHE are compared. Since the advantages of fewer disturbs and lower electric fields are more important than the large DOP allowed by FNT, CHE seems to be more suitable for ML applications, at least when low power consumption is not the main constraint.

Of course, with FNT it is possible to reduce the program time ($t_P$) by increasing $V_{CG}$, thus trading off $t_P$ and device reliability. In this regard, stress-induced leakage current (SILC), degrading data retention time, is the main phenomenon, and it has conventionally been considered to increase with the oxide field, thus with the decrease of $t_P$ [13] (for the same charge fluence, i.e., total charge injected through the oxide).

However, recent studies [14] have shown that, for the same charge fluence, initially SILC increases with decreasing $t_P$ but it tends to decrease with $t_P$ as the stress time becomes comparable to the characteristic time required for permanent oxide degradation.

Fig. 10 shows SILC characteristics of Flash memory cells as a function of the gate voltage and of the program time $t_P$ for different program conditions. Fig. 10(a) shows that SILC after a 10 K P/E cycling with $t_P$ = 20 ns is not much larger than the one obtained with $t_P$ = 30 µs. Instead, Fig. 10(b) shows that SILC stops increasing for $t_P$ of about 1 µs and (slightly) decreases with $t_P$ below such a value.

Fig. 6. Representation of the novel algorithm that combines ramped voltage programming with verify operations. Inside the boxes, the CG voltage during the two program steps is shown.

Fig. 7. A comparison between the read disturb due to CHE and FNT programming, as a function of P/E cycles. The picture is taken from [11].

This shows that FNT programming of Flash memory with a program time $t_P$ as low as 20 ns is feasible, with good results in terms of data retention, provided that a sufficiently low gate voltage is applied during reading.

In this regard, in Fig. 11 the maximum read disturb voltage compatible with a data retention time of ten years after 10 K P/E cycles is shown as a function of $t_P$. For $t_P$ as low as 20 ns, this maximum value is about 2.5 V.

However, a significant problem for FNT is due to the high voltages needed for fast programming [in the case of Fig. 10(a), for $t_P$ = 20 ns, it is 26.5 V], since this leads to challenging constraints for the high-voltage programming circuit.

Scaling the oxide thickness has favorable effects because it decreases the values of the programming voltage for the same oxide field, but also produces a drastic decrease in data retention time. In [15], measurements performed on 6.5-nm oxide Flash memories have shown a data retention time of 13 hours after 10 K P/E cycles with a maximum gate voltage of 2.5 V during reading.

Such a retention time is small compared to the ten-year retention of conventional nonvolatile memories, but it is more than three orders of magnitude greater than typical DRAM refresh time, thus making fast FNT potentially interesting for DRAM-like applications.
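The order-of-magnitude comparison can be checked with a short calculation. The 64 ms refresh interval used below is an assumed typical DRAM value and does not come from the paper; only the 13-hour retention time is quoted from the text.

```python
import math

retention_s = 13 * 3600          # 13 hours of data retention, from the text
dram_refresh_s = 64e-3           # assumed typical DRAM refresh interval

ratio = retention_s / dram_refresh_s
orders_of_magnitude = math.log10(ratio)

if __name__ == "__main__":
    print(f"ratio: {ratio:.0f}, orders of magnitude: {orders_of_magnitude:.1f}")
```

With this assumed refresh interval the retention time exceeds the refresh time by well over the three orders of magnitude stated above.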

Fig. 8. Conventional (a), trapezoidal (b), and staircase (c) programming pulses. A verify step is carried out after each pulse. The picture is taken from [12].

Fig. 9. Comparison of FNT and CHE programming mechanisms for ML applications. The picture is taken from [11].

V. CONCLUSION

This paper has presented a concise review of different program techniques for ML Flash memories based both on CHE injection and FNT.

In the case of CHE, ramped voltage programming has been shown to achieve tighter $V_T$ distributions and higher program throughput than the conventional box techniques. In fact, programming on four levels is feasible without the use of P&V algorithms. Instead, with 8 or 16 levels, P&V is mandatory and problems arise because of the difficulty of combining ramped voltage programming and verify operations.

In the case of FNT, instead, fast programming with a pulse duration of 20 ns seems able to produce very high PT (comparable with DRAMs). However, problems occur because of the need to use high-voltage circuitry and/or the reduction of data retention time due to decreased tunnel oxide thickness. For these reasons, fast FNT seems more suitable for DRAM-like applications than for conventional nonvolatile memories.

Fig. 10. SILC characteristics of the Flash memory cells after 10 K P/E cycling (a) as a function of the gate voltage for different program conditions and (b) as a function of the program time $t_P$.

Fig. 11. Maximum read disturb voltage which still guarantees a data retention time of 10 years versus the program time $t_P$ after 10 K P/E cycles.

REFERENCES

[1] B. Riccò, G. Torelli, M. Lanzoni, A. Manstretta, H. Maes, D. Montanari, and A. Modelli, “Nonvolatile multilevel memories for digital applications,” Proc. IEEE, vol. 86, pp. 2399–2421, Dec. 1998.

[2] A. Silvagni, S. Zanardi, A. Manstretta, and M. Scotti, “Modular architecture for a family of multilevel 256/192/128/64 Mbit 2-bit/cell 3 V only NOR Flash memory devices,” IEEE Trans. Electron Devices, vol. 48, pp. 937–940, Jan. 2001.

[3] M. Bauer, “A multilevel-cell 32 Mb Flash memory,” in IEEE ISSCC Tech. Dig., 1995, pp. 132–133.

[4] T.-S. Jung, Y.-J. Choi, and K.-D. Suh, “A 117 mm² 3.3 V only 128 Mb multilevel NAND Flash memory for mass storage applications,” IEEE J. Solid-State Circuits, vol. 31, pp. 1575–1583, Nov. 1996.

[5] H. Nobukata, S. Takagi, and K. Hiraga, “A 144-Mb, eight-level NAND Flash memory with optimized pulsewidth programming,” IEEE J. Solid-State Circuits, vol. 35, pp. 682–690, May 2000.

[6] D. Esseni, A. D. Strada, P. Cappelletti, and B. Riccò, “A new and flexible scheme for hot-electron programming of nonvolatile memory cells,” IEEE Trans. Electron Devices, vol. 46, pp. 125–133, Jan. 1999.

[7] R. Versari, D. Esseni, G. Falavigna, M. Lanzoni, and B. Riccò, “Optimized programming of multilevel Flash EEPROMs,” IEEE Trans. Electron Devices, vol. 48, pp. 1641–1646, Aug. 2001.

[8] M. Grossi, M. Lanzoni, and B. Riccò, “A novel algorithm for high throughput programming of multi-level Flash memories,” IEEE Trans. Electron Devices, submitted for publication.

[9] H. Nakamura, K. Imamiya, and T. Himeno, “A 125 mm² 1 Gb NAND Flash memory with 10 MB/s program throughput,” in IEEE ISSCC Tech. Dig., vol. 1, 2002, pp. 106–450.

[10] B. Eitan, R. Kazerounian, A. Roy, G. Crisenza, P. Cappelletti, and A. Modelli, “Multilevel Flash cells and their trade-offs,” in IEEE IEDM Tech. Dig., 1996, pp. 169–172.

[11] B. Eitan and A. Roy, “Binary and multilevel Flash cells,” in Flash Memories, P. Cappelletti, C. Golla, P. Olivo, and E. Zanoni, Eds. Boston, MA: Kluwer, 1999, pp. 91–152.

[12] G. Hemink, T. Tanaka, and T. Endoh, “Fast and accurate programming method for multi-level NAND EEPROM’s,” in Symp. VLSI Technology Dig. Tech. Papers, 1995, pp. 129–130.

[13] R. Moazzami and C. Hu, “Stress-induced current in thin silicon dioxide film,” in IEEE IEDM Tech. Dig., 1992, pp. 139–141.

[14] R. Versari, A. Pieracci, D. Morigi, and B. Riccò, “Fast tunneling programming of nonvolatile memories,” IEEE Trans. Electron Devices, pp. 1285–1287, June 2000.

[15] R. Versari, A. Pieracci, and B. Riccò, “Fast programming/erasing of thin-oxide EEPROMs,” IEEE Trans. Electron Devices, pp. 817–819, Apr. 2001.

Marco Grossi was born in Bologna, Italy, in 1973. He received the Laurea degree in electronic engineering from the University of Bologna in 2000. He is currently working toward the Ph.D. degree at the Department of Electronics, Computer Science, and Systems Laboratory, University of Bologna.

His research interest is the characterization of nonvolatile memories. He is currently working in the field of Flash memories and the multilevel programming of these memories using the ramped gate technique.

Massimo Lanzoni was born in Bologna, Italy, in 1961. He received the Laurea degree in electronic engineering from the University of Bologna, Bologna, Italy, in 1987.

He is with the Microelectronics Research Group, Department of Electronics, Computer Science, and Systems, University of Bologna, working on research projects in the fields of nonvolatile memories, MOS devices, virtual instrumentation, and testing. His research interests include the characterization of thin dielectrics reliability, nonvolatile memory cell characteristics and reliability, experimental characterization of MOS transistors, and new techniques for IC testing such as nonvolatile memory endurance testing and CMOS IC latch-up testing. He is now involved in projects concerning analog applications of nonvolatile memories and multilevel programming.

Bruno Riccò (Fellow, IEEE) was born in Parma, Italy, in 1947. He received the Laurea degree in electrical engineering from the University of Bologna, Bologna, Italy, in 1971 and the Ph.D. degree from the University of Cambridge, Cambridge, U.K., in 1976, where he worked at the Cavendish Laboratory.

In 1980, he was a Full Professor of Electronics at the University of Padova, Padova, Italy. In 1983, he was a Full Professor of Electronics at the University of Bologna. In 1983 and 1986, he was Visiting Professor at the University of Stanford, Stanford, CA; at the IBM Thomas J. Watson Research Center, Yorktown Heights, NY; and at the University of Washington, Seattle. He is currently with the Department of Electronics, Computer Science, and Systems, University of Bologna. He has also been a Consultant for major companies and for the Commission of the European Union in the definition, evaluation, and review of research projects in microelectronics. He is author or coauthor of more than 300 publications (more than half of which have been published in major international journals), three books, and six patents in the field of nonvolatile memories. His research interests include solid-state devices and ICs. He is currently also working in the field of IC design, evaluation, and testing.

Prof. Riccò has been President of the Group of Electron Devices, Technologies, and Circuits of the Italian Association of Electrical and Electronics Engineers (AEI) since 1996, and was President of the Italian Group of Electronics Engineers from 1998 to 2001. In 1996, he received the G. Marconi Award from the AEI. He was European Editor of the IEEE TRANSACTIONS ON ELECTRON DEVICES from 1986 to 1996, European Cochair at the International Electron Device Meeting (IEDM) from 1992 to 2001, and Vice-Chairman of the North Italy Section of IEEE from 1999 to 2001. He has been Chairman of the IEEE North Italy Section since 2002.

GROSSI et al.: PROGRAM SCHEMES FOR MULTILEVEL FLASH MEMORIES 601

Neural Network Decoders for Permutation Codes Correcting Different Errors

Preprint

Jun 2022

Permutation codes were extensively studied in order to correct different types of errors for the applications on power line communication and rank modulation for flash memory. In this paper, we introduce the neural network decoders for permutation codes to correct these errors with one-shot decoding, which treat the decoding as $n$ classification tasks for non-binary symbols for a code of length $n$. These are actually the first general decoders introduced to deal with any error type for these two applications. The performance of the decoders is evaluated by simulations with different error models.

Efficient and Robust Spike-Driven Deep Convolutional Neural Networks Based on NOR Flash Computing Array

Article

Full-text available

Apr 2020

In this article, we propose an efficient and robust spike-driven convolutional neural network (SCNN) based on the nor flash computing array (NFCA), which is mapped by the pretrained convolutional neural network with the same structure. The spike-driven system eliminates the additional analog-to-digital/digital-to-analog (AD/DA) conversion in the NFCA-based CNN. To study the performance of the hardware implementation, an NFCA-based SCNN for the recognition of the Mixed National Institute of Standards and Technology (MNIST) data set is simulated. Simulation results illustrate that the system achieves 97.94% accuracy with the computing speed of 1 x 10⁶ frame per second (fps). Compared with the typical mixed-signal NFCA-based CNN, the NFCA-based SCNN saves 97% area and 56% energy consumption. Moreover, the NFCA-based SCNN demonstrates great robustness to 30% image noise with less than 2% accuracy loss. The impact of random telegraph noise (RTN) is also greatly reduced in which less than 1% accuracy decrease can be achieved at the 32-nm technology node.

Characterization, mechanisms and memory applications of advanced SOI MOSFETs

Article

Oct 2013

Sungjae Chang

The evolution of electronic systems and portable devices requires innovation in both circuit design and transistor architecture. During last fifty years, the main issue in MOS transistor has been the gate length scaling down. The reduction of power consumption together with the co-integration of different functions is a more recent avenue. In bulk-Si planar technology, device shrinking seems to arrive at the end due to the multiplication of parasitic effects. The relay has been taken by novel SOI-like device architectures. In this perspective, this manuscript presents the main achievements of our work obtained with a variety of advanced fully depleted SOI MOSFETs, which are very promising candidates for next generation MOSFETs. Their electrical properties have been analyzed by systematic measurements and clarified by analytical models and/or simulations. Ultimately, appropriate applications have been proposed based on their beneficial features.In the first chapter, we briefly addressed the short-channel effects and the diverse technologies to improve device performance. The second chapter was dedicated to the detailed characterization and interesting properties of SOI devices. We have demonstrated excellent gate control and high performance in ultra-thin FD SOI MOSFET. The SCEs are efficiently suppressed by decreasing the body thickness below 7 nm. We have investigated the transport and electrostatic properties as well as the coupling mechanisms. The strong impact of body thickness and temperature range has been outlined. A similar approach was used to investigate and compare vertical double-gate and triple-gate FinFETs. DG FinFETs show enhanced coupling to back-gate bias which is applicable and suitable for dynamic threshold voltage tuning. We have proposed original models explaining the 3D coupling effect in FinFETs and the mobility behavior in ZnO TFTs. Our results pointed on the similarities and differences in SOI and ZnO transistors. 
According to our low-temperature measurements and new promoted extraction methods, the mobility in ZnO and the quality of ZnO/SiO2 interface are respectable, enabling innovating applications in flexible, transparent and power electronics. In the third chapter, we focused on the mobility behavior in planar SOI and FinFET devices by performing low-temperature magnetoresistance measurements. Unusual mobility curve with multi-branch aspect were obtained when two or more channels coexist and interplay. Another original result in the existence of the geometrical magnetoresistance in triple-gate and even double-gate FinFETs.The operation of a flash memory in FinFETs with ONO buried layer was explored in the forth chapter. Two charge injection mechanisms were proposed and systematically investigated. We have discussed the role of device geometry and temperature. Our novel ONO FinFlash concept has several distinct advantages: double-bit operation, separation of storage medium and reading interface, reliability and scalability. In the final chapter, we explored the avenue of unified memory, by combining nonvolatile and 1T-DRAM operations in a single transistor. The key result is that the transient current, relevant for 1T-DRAM operation, depends on the nonvolatile charges stored in the nitride buried layer. On the other hand, the trapped charges are not disturbed by the 1T-DRAM operation. Our experimental data offers the proof-of-concept for such advanced memory. The performance of the unified/multi-bit memory is already decent but will greatly improve in the coming years by processing dedicated devices.

Multilevel Read

Chapter

Jan 2005

Minimal Maximum-Level Programming - Combined Cell Mapping and Coding for Faster MLC Memory

Article

Sep 2016

In multi-level-cell (MLC) memory such as Flash and Phase-change memory, shrinking cell size and the growing number of levels per cell worsen the access-rate to capacity ratio and even reduce access rate. We present Minimal Maximum-Level Programming (MMLP), a scheme for expediting cell programming by sharing physical cells among multiple data sectors and exploiting the fact that making moderate changes to a cell's charge level is faster than making large ones. Specifically, we encode the data such that in the kth writing of data to a cell, only the lowest k+1 levels are utilized. Unlike in previously proposed cell-sharing schemes, different same-size data sectors occupy different numbers of physical cells, and a cell may hold a fraction of a bit of a given data sector. Nevertheless, the exposed sector size remains unchanged. Data is encoded, but without redundancy. In a four-level cell example, we achieve up to 75% reduction in write latency. Read latency may be degraded, depending on the percentage of utilized capacity.

Non Volatile Memory Devices

Chapter

Jan 2004

NOR Flash memories

Chapter

Jan 2008

Forward bias enhanced channel hot electron injection for low-level programming improvement in multilevel flash memory

Article

Jul 2004

Low-voltage programmed levels are hard to achieve in multilevel Flash memory using staircase CHEI (channel hot electron injection) programming. The reason is that low-level programming deviates marginally from the linear relation between threshold voltage VTH and control gate voltage VCG. Forward bias enhancement of CHEI is proposed to overcome this drawback. It is demonstrated that the new technique creates a linear relation between VTH and VCG, validated down to a critical VCG that is at least 1 V lower than with traditional CHEI. Through extensive measurements, it is further argued that the most suitable magnitude of forward bias is 0.5 V since (i) it produces the lowest program level of 1.4 V; and (ii) higher biases cause not only large current consumption but also worsened drain disturb performance in a NOR array configuration. The corresponding linear relation with unity slope is maintained after 10^5 program/erase cycles.

Demonstration of Unified Memory in FinFETs

Article

Sep 2014

The floating-body-induced transient mechanism in advanced FinFETs was investigated for unified and multi-bit memory capability. Nonvolatile memory operation was achieved by modifying the SOI buried insulator (BOX) such that the SiO2-Si3N4-SiO2 (ONO) BOX can accumulate permanent charges. Charges are injected into or removed from the Si3N4 layer by back-gate or drain bias and sensed remotely, by gate coupling, through the modulation of the drain current flowing at the front interface. On the other hand, the isolated silicon body of the transistor can store volatile charges, generated by impact ionization and able to modulate the drain current flowing at the back interface. Our experimental results successfully demonstrate that these two different memory modes can be advantageously combined for multi-bit volatile memory operation. The volatile memory behavior strongly depends on the distribution of the nonvolatile charges stored in the nitride buried layer. Our measurements show that the nonvolatile charges located near the drain terminal have a larger influence on the volatile memory operation than the charges located at the opposite terminal. We also show that the bias conditions and device geometry are important factors for the two memory modes.

Phase Change Memory Reliability: A Signal Processing and Coding Perspective

Article

Apr 2015

Phase change memory (PCM) is a new solid-state memory technology that promises disruptive changes in the way servers and enterprise storage systems are built. Multilevel-cell (MLC) storage is highly desirable for increasing capacity and thus lowering cost-per-bit in memory technologies. In PCM, MLC storage is hampered by noise and resistance drift. In this paper, the issue of reliability in MLC PCM is addressed. A statistical model is developed that captures the main impairments in MLC PCM cell-arrays. A signal processing and coding framework is then introduced that provides robustness to drift and noise, improving reliability and prolonging data retention. Several examples of codes are provided and practical detection schemes are described.

Dynamic power management of streaming applications over a wireless lan

Article

Jan 2003

Emanuele Lattanzi

Exploring coprocessor interfaces in an embedded java environment

Article

Jan 2003

Emanuele Lattanzi

Communication and synchronization between master controller and coprocessors are critical issues in the design of parallel system-on-chip architectures, especially when applications are developed in a high-level programming language and run on a virtual run-time environment (such as a Java Virtual Machine). In this paper we propose a design space exploration environment based on a general HW-SW architectural template and a full-system cycle-accurate simulation tool, built on top of Simics. Our flow accurately takes into account the overheads caused by the operating system, virtual environment, drivers, synchronization mechanisms and non-ideal memory system. In a top-down co-design flow, the proposed approach bridges the abstraction gap between HW-SW partitioning and HW synthesis. In particular, it provides: i) a realistic evaluation of the effectiveness of a tentative partitioning, ii) guidelines for designing the HW-SW interface, iii) performance constraints for the synthesis of the HW components.

A Simulation Model for Streaming Applications over a Power Manageable Wireless Link

Article

Jan 2003

In this work we introduce a hardware-validated simulation model for the exploration of real-time multimedia systems, where system components are modeled as interacting generalized semi-Markov processes (GSMPs). We apply the simulation model to explore the design space of a mobile client accessing streaming data through a wireless network. The model has been characterized and validated against power and performance measurements performed on an instrumented HP iPAQ with wireless LAN running an MPEG4 video application. We analyze the impact of tuning parameters for the real-time multimedia system (buffer sizes, channel bandwidth, power management policy) on the trade-off between power consumption and QoS.

Compilers and Operating Systems for Low Power

Book

Jan 2003

Compilers and Operating Systems for Low Power focuses on both application-level compiler-directed energy optimization and low-power operating systems. Chapters have been written exclusively for this volume by several of the leading researchers and application developers active in the field. The first six chapters focus on low-energy operating systems or, more generally, energy-aware middleware services. The next five chapters are centered on compilation and code optimization. Finally, the last chapter takes a more general viewpoint on mobile computing. The material demonstrates the state-of-the-art work and shows that, to obtain the best energy/performance characteristics, compilers, system software, and architecture must work together. The relationship between energy-aware middleware and wireless microsensors, mobile computing and other wireless applications is covered. This work will be of interest to researchers in the areas of low-power computing, embedded systems, compiler optimizations, and operating systems.

A novel architecture for power maskable arithmetic units

Conference Paper

Jan 2003

Flash Memories

Chapter

Jan 1999

The selection of a Flash cell approach is a reflection of the market and product features that a company decides to pursue. There are two major markets for Flash memories: one is the traditional embedded memory, and the other is the new emerging market of mass storage.

Compilers and operating systems for low power

Article

A Novel DNA Detection technique based on Integrable Electronics

Conference Paper

Oct 2003

Error Correcting Codes for Crosstalk Effect Minimization

Article

Jan 2003

In this paper we present an analysis of crosstalk effects on busses implementing error correcting codes. We show that the redundancy introduced by these codes can be exploited in order to avoid the worst-case crosstalk-induced delay. Our analysis is based on the evaluation of the effective coupling capacitance which needs to be charged during bus activity. In particular, we analyze the cases of the Hamming and Dual Rail codes. We show that Hamming codes do not allow us to avoid the most delay-costly bus transitions, while Dual Rail codes can. Furthermore, we illustrate that, by increasing the redundancy of the Dual Rail code by only one bit, even higher crosstalk-induced delay reductions can be achieved. Finally, we show that a further improvement can be obtained by an optimized placement of the bus wires.

Advanced power management techniques: going beyond intelligent shutdown

Article

Jan 2003

Luca Benini

Well into the System-on-Chip era, power consumption has emerged as one of the most critical challenges to design complexity scaling. Moving from a critical assessment of current technologies and architectures, we survey the distinguishing features of a design methodology that aims at energy consumption reduction, under guaranteed quality of service (QoS), as a main objective in system design.

Recommended publications

A new flash-erase EEPROM cell with a sidewall select-gate on its source side

Erratic Cell Behavior in Channel Hot Electron Programming of NOR Flash Memories

A novel algorithm for high-throughput programming of multilevel flash memories

Bandwidth optimization of flash memories with the RGP technique

Fast tunneling programming of nonvolatile memories