A 256Gb NAND flash memory stack with 300MB/s HLNAND interface chip for point-to-point ring topology

May 2011

DOI:10.1109/IMW.2011.5873241

Authors:

Peter Gillingham

Jin-Ki Kim

Novachips

Hong-Beom Pyeon

A 256Gb NAND Flash Memory Stack with 300MB/s HLNAND Interface Chip for Point-to-Point Ring Topology

Peter Gillingham, Jin-Ki Kim, Roland Schuetz, Hong-Beom Pyeon, HakJune Oh, Don Macdonald, Eric Choi, David Chinn

MOSAID Technologies Incorporated

11 Hines Road, Suite 203, Kanata, Ontario, CANADA K2K 2X1

gillingham@mosaid.com

Abstract- A 256Gb NAND flash device includes eight stacked 32Gb MLC die and a 16.2mm² HLNAND interface chip providing a 300MB/s synchronous DDR point-to-point ring topology system interface. Four internal busses supporting either 40MHz asynchronous NAND or 133MHz toggle mode NAND allow independent, concurrent operation of the MLC die. The device features data truncation power savings, programmable page size, and command packet error detection.

Keywords- HyperLink NAND, HLNAND, high-speed NAND flash, DDR, MCP, SSD

I. INTRODUCTION

In recent years, applications for NAND flash have grown rapidly, notably solid-state drives and enterprise storage class memory devices that rely on high storage capacity and benefit from a high-speed interface. Conventional NAND flash suffers from speed limitations due to bus loading and presents the controller with large page sizes that may yet increase [1,2]. An interface chip isolates the multi-drop NAND interface from the memory system channel and provides, instead, a 300MB/s synchronous DDR interface for connection to a point-to-point ring topology supporting up to 255 devices without speed degradation. The interface achieves an I/O throughput 7x faster than conventional asynchronous NAND interfaces and is vastly more scalable than emerging multi-drop DDR NAND interfaces [3].

II. POINT-TO-POINT RING TOPOLOGY

A serial daisy-chain ring, shown in Fig. 1, provides a unidirectional flow of data and commands from the controller, through each memory device, and back to the controller, similar to RamLink [4]. Each device sees a single load regardless of the number of devices in the ring. The HyperLink protocol [5] defines a multiple-byte command packet in which the first byte is the target device address, which limits the ring to a maximum of 255 devices with one broadcast address. Input signals consist of serial command strobe CSI, data strobe DSI, status STI, and a user-configurable 1 to 8 bit data bus D[7:0], as well as differential clock CK/CK#, reset RST#, and chip enable CE# signals distributed in parallel. Each device regenerates the serial signals to provide outputs CSO, DSO, STO, and Q[7:0] to the next device. The command strobe demarcates commands and write data to be programmed to the cell array or on-chip registers, while the data strobe demarcates memory or register read data to be output onto the ring by a selected device, as shown in Fig. 2. The serial status ring allows any device to raise a flag indicating an event such as completion of a page read, page program, or block erase operation.

Figure 1: HyperLink Point-to-Point Ring Topology

Figure 2: Command & Write Packet and Data Packet
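The addressing rule described in this section can be sketched in a few lines of Python. This is an illustration only: the text states that the first byte of a command packet is the target device address and that one address is reserved for broadcast, but it does not say which value is reserved. The constant `BROADCAST = 0xFF`, the zero-based device numbering, and the function name `executing_devices` are assumptions made for this sketch.

```python
BROADCAST = 0xFF   # assumed value; the text only says one address is reserved
MAX_DEVICES = 255  # 256 one-byte addresses minus the broadcast address


def executing_devices(packet: bytes, ring_size: int) -> list[int]:
    """Return the ring positions that act on a command packet.

    Every device forwards the packet downstream; only the addressed
    device (or every device, for a broadcast) executes it.
    """
    if not 1 <= ring_size <= MAX_DEVICES:
        raise ValueError("ring holds between 1 and 255 devices")
    if not packet:
        raise ValueError("empty packet")
    target = packet[0]  # first byte is the target device address
    if target == BROADCAST:
        return list(range(ring_size))
    # An address beyond the populated ring matches no device.
    return [target] if target < ring_size else []
```

Because the address is a single byte, ring capacity is fixed by the packet format rather than by electrical loading, which is consistent with the single-load property noted above.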

III. INTERFACE CHIP ARCHITECTURE AND ERROR DETECTION CODE

Fig. 3 shows high-level architectural details of the interface chip. The chip is divided into an I/O control block and four independent NAND flash interface blocks, each provisioned with 8KB data and mask SRAMs. All core logic operates from a nominal 1.8V supply. The I/O control block provides user-configurable data bus width, command and data pass-through, device identification, data truncation, Error Detection Code (EDC) for commands, read data output muxing, and programmable NAND flash clock generation. The I/O pins use 1.8V Low-Voltage CMOS (LVCMOS) signaling, which is fully sufficient for operation at 300MB/s. Output drivers provide user-selectable 35Ω or 50Ω source termination to deliver optimized signal integrity without static power consumption and an output access time of 2.3ns. Input setup time is 0.3ns. The interface block forwards all 3- to 6-byte command packets from input pins to output pins on the next rising or falling clock edge. The I/O control block decodes the device address within command packets and, upon a matching address, inhibits data from being forwarded to the output pins. Data truncation reduces I/O power by 50% on average and allows simultaneous read and write data transfer when the write device is upstream of the read device, achieving 600MB/s total throughput.
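A simple link-counting model suggests where the 50% average figure and the simultaneous read/write condition come from. The sketch below is not taken from the paper. It assumes a ring of n devices with n+1 links, 1-based device numbering, write targets spread uniformly over the ring, and I/O power proportional to the number of links that carry data. All function names are hypothetical.

```python
from fractions import Fraction


def write_links(target: int) -> range:
    """Links that carry write data addressed to device `target` (1-based).

    Link i is the hop into device i; link n+1 returns to the controller.
    The target inhibits forwarding, so links past it stay quiet.
    """
    return range(1, target + 1)


def read_links(source: int, n_devices: int) -> range:
    """Links that carry read data driven by device `source` to the controller."""
    return range(source + 1, n_devices + 2)


def mean_write_activity(n_devices: int) -> Fraction:
    """Average fraction of links toggled by a write, relative to no truncation."""
    total_links = n_devices + 1
    active = sum(len(write_links(k)) for k in range(1, n_devices + 1))
    return Fraction(active, n_devices * total_links)


def can_overlap(write_dev: int, read_dev: int, n_devices: int) -> bool:
    """True if write data and read data never share a link."""
    w = set(write_links(write_dev))
    r = set(read_links(read_dev, n_devices))
    return w.isdisjoint(r)
```

Under these assumptions `mean_write_activity` evaluates to exactly 1/2 for any ring size, and `can_overlap(2, 5, 8)` is true while `can_overlap(5, 2, 8)` is false, matching the requirement that the write device sit upstream of the read device.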

The I/O control block also includes an error detection mechanism that monitors the integrity of command packets. The final byte of each command packet is a Hamming code calculated on the preceding bytes of the packet. Upon detection of an error, command execution is inhibited and a status flag is raised. The controller then broadcasts a status register read command to determine whether the error occurred before or after the target device. If the error occurred before the target device, the command should be reissued. On-chip error detection prevents erroneous commands from being executed by the target device before the controller is able to detect and terminate them. Failure to terminate a command in a timely manner could have serious consequences if, for example, a device mistakenly interprets an incoming read command as a program or erase command.
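The text does not give the parameters of the Hamming code, so the following is only a generic sketch of how a one-byte check could be computed over the preceding 2 to 5 bytes of a packet. It uses a textbook Hamming construction (parity bits at power-of-two positions) plus an overall parity bit. The bit ordering, the placement of the overall parity in bit 7, and the names `edc_byte`, `make_packet`, and `packet_ok` are assumptions, not the HLNAND definition.

```python
def data_positions(n_bits: int) -> list[int]:
    """Codeword positions (1-based) that hold data; powers of two hold parity."""
    positions, p = [], 1
    while len(positions) < n_bits:
        if p & (p - 1):  # non-zero only when p is not a power of two
            positions.append(p)
        p += 1
    return positions


def edc_byte(payload: bytes) -> int:
    """Check byte: 6 Hamming parity bits plus an overall parity bit (bit 7)."""
    if not 2 <= len(payload) <= 5:
        raise ValueError("payload must be 2 to 5 bytes (3- to 6-byte packet)")
    bits = [(byte >> i) & 1 for byte in payload for i in range(8)]
    parity = 0
    for bit, pos in zip(bits, data_positions(len(bits))):
        if bit:
            parity ^= pos  # XOR of the positions of all set data bits
    overall = (sum(bits) + bin(parity).count("1")) & 1
    return parity | (overall << 7)


def make_packet(payload: bytes) -> bytes:
    """Append the check byte as the final byte of the packet."""
    return payload + bytes([edc_byte(payload)])


def packet_ok(packet: bytes) -> bool:
    """A device would inhibit execution when this returns False."""
    return edc_byte(packet[:-1]) == packet[-1]
```

With at most 40 payload bits, the highest data position is 46, so the Hamming parity fits in six bits. Any single-bit error in the packet changes either the recomputed parity or the stored check byte, so `packet_ok` rejects it.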

Figure 3. Interface Chip Architecture

IV. MEASUREMENT RESULTS

Four internal NAND flash interface blocks each contain a command converter, SRAM for intermediate storage of data, and timing control for a configurable 40MHz asynchronous NAND or 133MHz toggle mode NAND I/O port. These allow simultaneous data transfers and independent, concurrent flash commands to be carried out on each of the eight flash die. The interface chip also supports NAND devices from several technology nodes and different manufacturers, selectable by bond option. A clock for the NAND flash interface block is generated from external CK/CK# by a programmable clock divider. The SRAM supports a page size of 8192 bytes plus 448, 512, or 640 bytes of spare data. Before a program command is executed, the program data is first loaded, via the external interface, into one of the four SRAMs. Upon receiving a program command, the interface chip transfers the SRAM data to the target NAND flash device and then issues a page program command to it. To reduce the time spent transferring read data from the NAND flash devices, pages may be subdivided into smaller sub-pages ranging in size from 2048 bytes to the full physical page. Programmable sub-pages provide increased I/O operations per second (IOPS), a key system-level performance indicator. The subdivision size is programmable by register, and all internal read data transfers occur automatically based on the programmed sub-page size.
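The effect of sub-page size on read rate can be estimated from the two read times listed in Table 1 (96us for a 2KB sub-page, 146us for a full page). The sketch below assumes that read time scales linearly between those two endpoints and that the full page is 8192 bytes; neither assumption is stated in the paper, and the function names are hypothetical.

```python
T_READ_2KB_US = 96.0    # Table 1, 2KB sub-page
T_READ_FULL_US = 146.0  # Table 1, full page
FULL_PAGE = 8192        # bytes, excluding spare area
MIN_SUBPAGE = 2048      # smallest sub-page mentioned in the text


def read_time_us(subpage_bytes: int) -> float:
    """Linear interpolation between the two Table 1 endpoints (an assumption)."""
    if not MIN_SUBPAGE <= subpage_bytes <= FULL_PAGE:
        raise ValueError("sub-page must be between 2048 bytes and the full page")
    slope = (T_READ_FULL_US - T_READ_2KB_US) / (FULL_PAGE - MIN_SUBPAGE)
    return T_READ_2KB_US + slope * (subpage_bytes - MIN_SUBPAGE)


def reads_per_second(subpage_bytes: int) -> float:
    """Back-to-back reads on one bank, ignoring command overhead."""
    return 1e6 / read_time_us(subpage_bytes)
```

Under this model a 2KB sub-page completes about 1.5 times as many reads per second on a single bank as a full-page read, which illustrates why a smaller sub-page raises IOPS for small random accesses.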

Program and read throughput are a function of the page size, the external interface throughput, the internal transfer time, and the page program and page read times of the specific NAND flash devices packaged with the chip. Although an individual NAND die may provide only 5MB/s program throughput, all 8 die may be operated independently and simultaneously to achieve 40MB/s within a single HLNAND MCP. Only 8 MCPs are required to fully saturate the HyperLink ring. Fig. 4 shows test results indicating 308MB/s throughput with a 6.5ns DDR clock.

Figure 4: Output Data Shmoo Plot (tCK=6.5ns @ Vdd=1.8V; 154MHz, 308MB/s DDR)
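The throughput figures quoted in this section follow from simple arithmetic, reproduced here as a small Python check. The constants come from the text; the function names are hypothetical, and the calculation assumes the full x8 bus width with one byte transferred on each clock edge.

```python
import math

T_CK_NS = 6.5          # measured clock period (Fig. 4)
BUS_BYTES = 1          # x8 data bus
DIE_PROGRAM_MBPS = 5   # per-die program throughput quoted in the text
DIES_PER_MCP = 8


def ddr_throughput_mbps(t_ck_ns: float, bus_bytes: int = BUS_BYTES) -> float:
    """Two transfers per clock cycle; 1 byte/ns equals 1000 MB/s."""
    return 2 * bus_bytes * 1000.0 / t_ck_ns


def mcps_to_saturate(ring_mbps: float) -> int:
    """Number of MCPs whose combined program throughput fills the ring."""
    per_mcp = DIE_PROGRAM_MBPS * DIES_PER_MCP  # 40 MB/s per MCP
    return math.ceil(ring_mbps / per_mcp)
```

A 6.5ns clock corresponds to about 154MHz and roughly 308MB/s at double data rate, and eight MCPs at 40MB/s each supply 320MB/s, enough to fill the ring at either the nominal 300MB/s or the measured 308MB/s.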

V. CONCLUSION

The HLNAND MCP connects through a ring topology to provide high throughput, increased scalability, reduced I/O power, and flexible page size, delivering important system-level benefits. Placing the high-speed interface on a small separate logic die avoids the cost that each NAND die would otherwise incur from increased die size and from the process enhancements needed for higher-performance I/O transistors. Isolating the internal NAND busses from the external interface dramatically reduces loading and CV power. Fig. 5 shows a die photo of a prototype interface chip and an X-ray cross section of a 4-die stack MCP. The key features of the production version interface chip and MCP supporting up to 8 NAND die are summarized in Table 1.

Figure 5: Interface Chip Die Photo and X-ray cross section of 4 NAND die stack MCP

Feature                       Value
Technology (Interface Chip)   0.18µm CMOS 1P6M
Chip Size (Interface Chip)    16.2mm²
Organization                  8192 bytes x 128 pages x 4096 blocks x 2 LUNs x 4 banks
Power Supply                  2.7V ~ 3.6V & 1.8V
Read Time                     96µs ~ 146µs (2KB ~ full page)
Program Time                  2ms (Typ)
Erase Time                    1.5ms (Typ)
Clock Cycle Time              6.5ns
I/O Width                     x1, x2, x4, and x8
Package                       18mm x 14mm 100-Ball BGA

Table 1. 256Gb NAND Flash MCP Key Features
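As a consistency check on the Organization row, the device capacity can be recomputed from its factors. The sketch takes the page size as 8192 data bytes, the value given for the SRAM page size in Section IV, and excludes the spare area. Treating the 2 LUNs x 4 banks as the eight stacked die is an assumption made here, and the names are hypothetical.

```python
PAGE_BYTES = 8192       # data bytes per page (spare area excluded)
PAGES_PER_BLOCK = 128
BLOCKS_PER_LUN = 4096
LUNS_PER_BANK = 2
BANKS = 4


def capacity_bits() -> int:
    """Total data capacity of the MCP in bits."""
    return (8 * PAGE_BYTES * PAGES_PER_BLOCK * BLOCKS_PER_LUN
            * LUNS_PER_BANK * BANKS)


def bits_per_lun() -> int:
    """Capacity of one LUN, assumed here to correspond to one NAND die."""
    return capacity_bits() // (LUNS_PER_BANK * BANKS)
```

The product is 2^38 bits, i.e. 256Gb in total and 32Gb per LUN, which agrees with the eight 32Gb die described in the abstract.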

ACKNOWLEDGMENTS

The authors thank Dick Foss, Steven Przybylski, Roelof Salters, and John Lindgren for technical suggestions and support.

REFERENCES

[1] J.-K. Kim, K. Sakui, et al., “A 120mm² 64Mb NAND Flash Memory Achieving 180ns/Byte Effective Program Speed,” Symp. on VLSI Circuits, Digest of Technical Papers, Jun. 1996, pp. 168-169.

[2] R. Cernea, L. Pham, et al., “A 34MB/s-Program-Throughput 16Gb MLC NAND with All-Bitline Architecture in 56nm,” ISSCC Dig. Tech. Papers, Feb. 2008, pp. 420-421.

[3] D. Nobunaga, E. Abedifard, et al., “A 50nm 8Gb Flash Memory with 100MB/s Program Throughput and 200MB/s DDR Interface,” ISSCC Dig. Tech. Papers, Feb. 2008, pp. 426-427.

[4] H. Wiggers, D. Gustavson, et al., “IEEE Standard for High-Bandwidth Memory Interface Based on Scalable Coherent Interface (SCI) Signaling Technology (RamLink),” IEEE Std 1596.4-1996.

[5] R. Schuetz, H. J. Oh, et al., “HyperLink NAND Flash Architecture for Mass Storage Applications,” IEEE NVSMW, Aug. 2007, pp. 3-4.
