RADAR algorithm implementation. Note that there are limits on the number of pulses in both the coarse and fine control phases (not specified in the flowchart), beyond which the program operation is declared a "fail." Cells with the same target range can be programmed at the same time.
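The remark that cells sharing a target range can be programmed together can be illustrated with a minimal Python sketch. The function name `group_by_target_range` and the address-to-range mapping are assumptions made for illustration; they do not come from the source publication.

```python
from collections import defaultdict


def group_by_target_range(targets):
    """Group cell addresses by their target range index.

    `targets` is a hypothetical mapping from cell address to target range
    index. Cells that land in the same group could share programming pulses.
    """
    groups = defaultdict(list)
    for address, range_index in sorted(targets.items()):
        groups[range_index].append(address)
    return dict(groups)


# Four cells, two of which (addresses 0 and 2) share target range 1.
batches = group_by_target_range({0: 1, 1: 3, 2: 1, 3: 0})
print(batches)
```

Each resulting batch would then be driven through the programming flow as a unit, with per-cell verification deciding which cells still need pulses.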

Source publication

RADAR: A Fast and Energy-Efficient Programming Technique for Multiple Bits-Per-Cell RRAM Arrays

Article

Jul 2021

HfO₂-based resistive RAM (RRAM) is an emerging nonvolatile memory technology that has recently been shown capable of storing multiple bits-per-cell. The energy/delay costs of an RRAM write operation are dependent on the number of pulses required for RRAM programming. The pulse count is often large when existing programming approaches are used for m...

Context 1

... BL voltage V_BL (for fine SET) or the SL voltage V_SL (for fine RESET). This phase is similar to the GSR algorithm, except that the parameters are range-dependent, i.e., tuned specifically for each target range. Details about the choice of parameters for both resistance control phases are discussed in Section IV. The RADAR algorithm is shown in Fig. ...
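A rough sketch of a two-phase, pulse-limited write-verify loop may help make the coarse/fine structure and the "fail" condition concrete. Everything here is an assumption for illustration: the `ToyCell` model is deterministic (real RRAM cells are stochastic), the step sizes and pulse limits are invented, and the loop is not a transcription of the flowchart in the source publication.

```python
from dataclasses import dataclass


@dataclass
class ToyCell:
    """Hypothetical deterministic cell; resistance in kOhm."""
    resistance: float

    def set_pulse(self, strength):
        # A SET pulse lowers the resistance.
        self.resistance = max(0.0, self.resistance - strength)

    def reset_pulse(self, strength):
        # A RESET pulse raises the resistance.
        self.resistance += strength


def program(cell, lo, hi, coarse_lo, coarse_hi,
            coarse_step=10.0, fine_step=1.0, max_coarse=5, max_fine=20):
    """Return (status, pulse_count) for one write-verify attempt."""
    pulses = 0
    # Coarse phase: large steps until the cell is inside the coarse range.
    for _ in range(max_coarse):
        if coarse_lo <= cell.resistance <= coarse_hi:
            break
        if cell.resistance > coarse_hi:
            cell.set_pulse(coarse_step)
        else:
            cell.reset_pulse(coarse_step)
        pulses += 1
    else:
        # Pulse budget exhausted; verify once more before giving up.
        if not (coarse_lo <= cell.resistance <= coarse_hi):
            return "fail", pulses
    # Fine phase: small steps until the cell is inside the target range.
    for _ in range(max_fine):
        if lo <= cell.resistance <= hi:
            return "pass", pulses
        if cell.resistance > hi:
            cell.set_pulse(fine_step)
        else:
            cell.reset_pulse(fine_step)
        pulses += 1
    status = "pass" if lo <= cell.resistance <= hi else "fail"
    return status, pulses


print(program(ToyCell(50.0), lo=10, hi=12, coarse_lo=0, coarse_hi=35))
print(program(ToyCell(50.0), lo=10, hi=12, coarse_lo=0, coarse_hi=35, max_fine=5))
```

In this toy model the first call passes after 20 pulses, while the second exhausts its fine-phase budget and is reported as a failure, mirroring the pulse limits mentioned in the figure caption.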

Context 2

... range. We thus chose the simplest option, in which each coarse range has the largest possible width, from 0 to 35 kΩ. With this choice, after a single SET pulse in the Coarse Control phase, the algorithm switches to the Fine Control phase immediately. For this scenario, RADAR can be simplified by removing the red "Coarse Reset pulse" part of Fig. 6 (belonging to the Coarse Control phase). The coarse ranges did not provide significant benefit to pulse count in the technology we used but may be beneficial in other RRAM ...

Context 3

... voltage of 4 V in the Fine Control phase (without the cell achieving the target range), the WL voltage V_WL (which starts from the value used in the Coarse Control phase for each range) is stepped up with a 10-mV step size (about two or three steps needed for the devices we tested). This case is indicated in "Blue" (in the Fine Control phase) of Fig. 6. The reason to use V_BL in the fine SET phase can be explained as follows. Suppose the target range is Range 1, and after the coarse SET pulse (V_BL = 2 V, V_WL = 2.86 V, as shown in Table I), the cell resistance is higher than the target of 4.58 kΩ. In the fine SET phase [where the cell configuration is like a common-source amplifier, ...
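The voltage escalation described in this excerpt can be sketched as a pulse schedule. The 4-V BL cap, the 10-mV WL step, and the Range 1 coarse values (2 V, 2.86 V) come from the excerpt; the 100-mV BL step, the choice to start the fine-phase BL voltage at the coarse value, and holding the BL voltage at its cap while the WL voltage steps are assumptions made only for illustration. Voltages are kept in integer millivolts to avoid floating-point drift.

```python
def fine_set_schedule(n_pulses, v_bl_start_mv=2000, v_bl_step_mv=100,
                      v_bl_max_mv=4000, v_wl_start_mv=2860, v_wl_step_mv=10):
    """Return a list of (V_BL, V_WL) pairs in millivolts, one per pulse.

    V_BL ramps up to its cap first; once the cap is reached, V_WL is
    stepped instead. The BL step size is a hypothetical value.
    """
    schedule = []
    v_bl, v_wl = v_bl_start_mv, v_wl_start_mv
    for _ in range(n_pulses):
        schedule.append((v_bl, v_wl))
        if v_bl < v_bl_max_mv:
            v_bl = min(v_bl + v_bl_step_mv, v_bl_max_mv)
        else:
            v_wl += v_wl_step_mv
    return schedule


schedule = fine_set_schedule(24)
print(schedule[0], schedule[20], schedule[-1])
```

With these assumed parameters, the first 21 pulses ramp V_BL from 2 V to 4 V at a fixed V_WL, and the last three pulses raise V_WL in 10-mV steps, in line with the "two or three steps" reported in the excerpt.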

EMBER: Efficient Multiple-Bits-Per-Cell Embedded RRAM Macro for High-Density Digital Storage

Article

Jul 2024
IEEE J SOLID-ST CIRC

Designing compact and energy-efficient resistive RAM (RRAM) macros is challenging due to: 1) large read/write circuits that decrease storage density; 2) low-conductance cells that increase read latency; and 3) the pronounced effects of routing parasitics on high-conductance cell read energy. Multiple-bits-per-cell RRAM can boost storage density but has further challenges resulting from reliability problems due to conductance relaxation and slow write due to narrow conductance levels. This work presents a multiple-bits-per-cell RRAM macro called Efficient Multiple-Bits-per-Cell Embedded RRAM (EMBER), which: 1) demonstrates read/write circuit compaction through constrained optimization of driver and pass gate transistor sizes; 2) introduces a common-mode bleed conductance at the sense amplifier inputs, reducing read settling time by for low-conductance cells, and 3) cuts read path capacitance to further reduce read access time and energy. To address reliability and write speed, EMBER contains a configurable on-chip read/write controller. We present a level allocation scheme that uses array-level characterization data to find sufficiently reliable allocations, while simultaneously maximizing write bandwidth. EMBER is the first embedded RRAM storage macro to achieve fully integrated multiple-bits-per-cell readout and write-verification without any off-chip reference generation or sensing. The macro operates at with 64k × 48 = 3 M cells in TSMC 40-nm CMOS, achieving 1 b/cell read operation with energy at, and 2 b/cell read with at . 1 b/cell write-verify operates with energy at (BER < ), and 2 b/cell write-verify operates with at (BER < ). The array-level endurance is found to be 10 K for 1–2 b/cell. Normalizing for process scaling, the macro demonstrates the highest effective RRAM cell density to date of for 1 b/cell and for 2 b/cell, an improvement of and, respectively, over the best prior work.

Multi-Pole NEM Relays and Multiple-Bits-Per-Cell RRAM for Efficient 3-D ICs

Thesis

Mar 2024

Akash Levy

In this dissertation, I present techniques for improving the power, performance, and area of integrated circuits (ICs) through 3-D integration of two emerging nanotechnologies: (1) resistive random-access memory (RRAM), a non-volatile memory with multiple-bits-per-cell storage capability, and (2) nanoelectromechanical (NEM) relays, nano-scale mechanical relays that can be actuated electrostatically. In modern ICs for edge computing, data movement between on and off-chip memories typically consumes a large fraction of the total power. Dense, non-volatile embedded memory can reduce/eliminate off-chip data movement by keeping frequently-read application data always on chip. RRAM is a good candidate for such a memory, especially because it can store multiple bits per cell, achieving high density on-chip storage. However, efficient and reliable operation with multiple-bits-per-cell RRAM has been a challenge due to (1) stochastic device behavior during programming that results in large pulse counts with traditional write-verify methods, and (2) reliability issues arising from resistance relaxation. Towards the goal of achieving efficient and reliable multiple-bits-per-cell RRAM, I present three contributions: (1) range-dependent adaptive resistance (RADAR) tuning, a fast and energy-efficient programming method for multiple-bits-per-cell RRAM that uses an adaptive combination of coarse- and fine-grained cell resistance tuning, yielding a 2.4x reduction in pulse count over prior methods, (2) characterization of resistance relaxation behavior in three RRAM technologies and analysis of its implications for multiple-bits-per-cell storage, and (3) efficient multiple-bits-per-cell embedded RRAM (EMBER), the first demonstration of a fully-integrated multiple-bits-per-cell RRAM macro. 
EMBER contains a multiple-bits-per-cell read and write controller with a high degree of flexibility that enables good level allocation (mitigating reliability issues from resistance relaxation) and programming scheme optimization (yielding low-energy, low-latency multiple-bits-per-cell writes). Finally, in reconfigurable ICs, in addition to the memories, the routing fabric consumes a large fraction of the overall area and power. I demonstrate that replacing CMOS routing switches with 3-D integrated multi-pole nanoelectromechanical (NEM) relays in a coarse-grained reconfigurable array (CGRA) can achieve 19% lower area and 10% lower power at iso-performance.

Exploiting the State Dependency of Conductance Variations in Memristive Devices for Accurate In-Memory Computing

Article

Dec 2023
IEEE T ELECTRON DEV

Analog in-memory computing (AIMC) using memristive devices is considered a promising non-von Neumann approach for deep learning (DL) inference tasks. However, inaccuracies in the programming of devices, which are attributed to conductance variations, pose a key challenge toward achieving sufficient compute precision for DL inference. Fortunately, conductance variations in memristive devices, such as phase-change memory (PCM) devices, exhibit a strong state dependence. This state dependence can be exploited in synaptic unit cells that comprise more than one memristive device to encode positive or negative weights. For such multi-memristive unit cells, we propose a method that optimally maps the weights to the device conductance values by maximizing the number of devices at the stable SET and RESET states. We demonstrate that this method reduces the matrix-vector multiplication (MVM) error and is more resilient to non-ideal device retention characteristics. With this approach, we increase the mean experimental inference accuracy of a network trained for MNIST classification by 0.71% on two PCM-based AIMC cores, and the hardware-realistic simulated top-1 accuracy of a network trained for ImageNet classification by 0.28%, while significantly reducing variability across multiple experiment instances.

Devices and Architectures for Efficient Computing In-Memory (CIM) Design

Chapter

Nov 2023

Smart computing has demonstrated huge potential for various application sectors such as personalized healthcare and smart robotics. Smart computing aims to bring computing close to the source where the data is generated or stored. Memristor-based Computation-In-Memory (CIM) has the potential to realize such smart computing for data- and computation-intensive applications. This paper presents an overview of CIM design, covering the architecture and circuit levels down to the device level. On the circuit and device level, accelerators for machine learning will be presented and discussed, focusing on variability and reliability effects. We will discuss these aspects for Redox-based Resistive Random Access Memories (ReRAM) based on the Valence Change Mechanism (VCM) by employing the compact model JART VCM v1b.

Statistical Modeling of Metal-Oxide RRAM SET/RESET Behavior Using Deep Neural Networks

Conference Paper

Sep 2023

Boosting RRAM-based Mixed-Signal Accelerators in FD-SOI technology for ML applications

Article

Aug 2023

This paper presents the Flipped (F)-2T2R RRAM compute cell enhancing the performance of RRAM-based mixed-signal accelerators for deep neural networks (DNNs) in machine learning (ML) applications. The F-2T2R cell is designed to exploit the features of the FD-SOI technology and it achieves a large increase in cell output impedance, compared to the standard 1T1R cell. The paper also describes the modelling of an F-2T2R-based accelerator and its transistor-level implementation in a 22-nm FD-SOI technology. The modelling results and the accelerator performance are validated by simulation. The proposed design can achieve an energy efficiency of up to 1260 1b-TOPS/W, with a memory array of 256 rows and columns. From the results of our analytical framework, a ResNet18, mapped on the accelerator, can obtain an accuracy reduction below 2%, with respect to the floating point baseline, on the CIFAR-10 dataset.

One-Transistor-Multiple-RRAM Cells for Energy-Efficient In-Memory Computing

Conference Paper

Jun 2023

ANN Inference enabled by Variability Mitigation using 2T-1R Bit Cell-based Design Space Analysis

Conference Paper

May 2023

A CMOL-Like Memristor-CMOS Neuromorphic Chip-Core Demonstrating Stochastic Binary STDP

Article

Dec 2022

The advent of nanoscale memristors raised hopes of being able to build CMOL (CMOS/nanowire/molecular) type ultra-dense in-memory-computing circuit architectures. In CMOL, nanoscale memristors would be fabricated at the intersection of nanowires. The CMOL concept can be exploited in neuromorphic hardware by fabricating lower density neurons on CMOS and placing massive analog synaptic connectivity with nanowire and nanoscale-memristor fabric post-fabricated on top. However, technical problems have hindered such developments for presently available reliable commercial monolithic CMOS-memristor technologies. On one hand, each memristor needs a MOS selector transistor in series to guarantee forming and programming operations in large arrays. This results in compound MOS-memristor synapses (called 1T1R) which are no longer synapses at the crossing of nanowires. On the other hand, memristors do not yet constitute highly reliable, stable analog memories for massive analog-weight synapses with gradual learning. Here we demonstrate a pseudo-CMOL monolithic chip core that circumvents the two technical problems mentioned above by: (a) exploiting a CMOL-like geometrical chip layout technique to improve density despite the 1T1R limitation, and (b) exploiting a binary weight stochastic Spike-Timing-Dependent-Plasticity (STDP) learning rule that takes advantage of the more reliable binary memory capability of the memristors used. Experimental results are provided for a spiking neural network (SNN) CMOL-core with 64 input neurons, 64 output neurons and 4096 1T1R synapses, fabricated in 130nm CMOS with 200nm-sized Ti/HfOx/TiN memristors on top. The CMOL-core uses query-driven event read-out, which allows for memristor variability insensitive computations. 
Experimental system-level demonstrations are provided for plain template matching tasks, as well as regularized stochastic binary STDP feature-extraction learning, obtaining perfect recognition in hardware for a 4-letter recognition experiment.

Memristive Devices for Time Domain Compute-in-Memory

Article

Dec 2022

Analog compute schemes as well as compute-in-memory have emerged in an effort to reduce the increasing power hunger of convolutional neural networks, which exceeds the constraints of edge devices. Memristive device types are a relatively new offering with interesting opportunities for unexplored circuit concepts. In this work, the use of memristive devices in cascaded time domain compute-in-memory is introduced with the primary goal of reducing the size of fully unrolled architectures. The different effects influencing determinism in memristive devices are outlined together with reliability concerns. Architectures for binary as well as multi-bit multiply and accumulate cells are presented and evaluated. As more involved circuits offer more accurate compute results, a trade-off between design effort and accuracy comes into the picture. To further evaluate this trade-off, the impact of variations on overall compute accuracy is discussed. The presented cells reach an Energy/OP of 0.23 fJ at a size of 1.2 μm² for binary and 6.04 fJ at 3.2 μm² for 4x4 bit multiply and accumulate operations.
