Figure 3: Results for Quad-core System

Source publication
Conference Paper
Full-text available
Use of NVM (Non-volatile memory) devices such as ReRAM (resistive RAM) and STT-RAM (spin transfer torque RAM) for designing on-chip caches holds the promise of providing a high-density, low-leakage alternative to SRAM. However, low write endurance of NVMs, along with the write-variation introduced by existing cache management schemes may significan...

Similar publications

Technical Report
Full-text available
The limitations of SRAM viz. low-density and high leakage power have motivated the researchers to explore non-volatile memory (NVM) as an alternative. However, the write-endurance of NVMs is orders of magnitude smaller than that of SRAM, and existing cache management schemes may introduce significant write-variation, and hence, the use of NVMs for...
Article
Full-text available
Monolithic 3D (M3D) integration has emerged as a promising technology for fine-grained 3D stacking. As M3D integration offers extremely small, nanometer-scale via dimensions, it is beneficial for small microarchitectural blocks such as caches, register files, translation look-aside buffers (TLBs), etc. However, since the M3D integrat...
Article
Full-text available
While non-volatile memories (NVMs) provide high-density and low-leakage, they also have low write-endurance. This, along with the write-variation introduced by the cache management policies can lead to very small cache lifetime. In this paper, we propose ENLIVE, a technique for improving the lifetime of NVM caches. Our technique uses a small SRAM s...

Citations

... NVM technologies (e.g. STT-RAM, ReRAM, and PCM) [28], [48], [71] could be considered but they suffer from limited write endurance, high access latency, high write energy, and low write bandwidth [42], [61], [87]. These problems make NVCache more suitable for last level (or near last level) cache. ...
... These problems will be more pronounced than in NVMM because caches are written at a much higher rate than main memory, and the closer the cache is to the core, the higher the rate. Spin-Transfer Torque Random Access Memory (STT-RAM) has a relatively high write endurance of 4 × 10¹² writes, higher than alternatives such as Phase Change Memory (PCM) with 10⁸ writes [52], [61], [71], [87] and Resistive Random Access Memory (ReRAM) with 10¹¹ writes [4], [48], [61], [87]. However, the write endurance of these NVMs is still orders of magnitude lower than that of SRAM memory cells (about 10¹⁵ writes) [61], [71], [87]. ...
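As a rough, back-of-the-envelope illustration of why the endurance figures quoted above matter for caches (the hot-block write rate below is an assumed figure for illustration only, not taken from the cited papers), the expected lifetime of the most heavily written cell is simply its endurance divided by its write rate:

    # Back-of-the-envelope cell lifetime = write endurance / write rate.
    # Endurance values follow the excerpt above; the 10 MHz write rate for a
    # "hot" cache block is an assumed, workload-dependent figure.
    ENDURANCE = {
        "SRAM":    1e15,
        "STT-RAM": 4e12,
        "ReRAM":   1e11,
        "PCM":     1e8,
    }
    HOT_BLOCK_WRITES_PER_SEC = 1e7   # assumed: one write every 100 ns
    SECONDS_PER_DAY = 86_400

    for tech, endurance in ENDURANCE.items():
        lifetime_days = endurance / HOT_BLOCK_WRITES_PER_SEC / SECONDS_PER_DAY
        print(f"{tech:8s} {lifetime_days:12.4f} days")

Under this assumed rate, PCM survives only seconds, ReRAM a few hours, and STT-RAM a few days, which is why write-reduction and wear-leveling techniques are essential for NVM caches.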
... Several studies tried to extend the cache lifetime by reducing the number of writes to NVM cells [2,3,24,28,31,32] or by evenly distributing the write activities over the cells [14,15,26,27,33,42,44]. The vast majority of these studies focused on the data part of the cache. ...
... The studies in this category address the write variation among the data blocks of a cache set and try to minimize this variation. In [42] and [33], intra-set wear-leveling is provided by invalidating write-intensive data blocks without updating their age bits. The next request for that data then causes a cache miss, and the data is stored in another cache block. ...
Article
Full-text available
Emerging non-volatile memories (NVMs) are known as promising alternatives to SRAMs in on-chip caches. However, their limited write endurance is a major challenge when NVMs are employed in these frequently written caches. Early wear-out of NVM cells makes the lifetime of the caches extremely insufficient for today's computational systems. Previous studies only addressed the lifetime of the data part of the cache. This paper first demonstrates that the age-bits field of the cache replacement algorithm is the most frequently written part of a cache block and that its lifetime is shorter than that of the data part by more than 27×. Second, it investigates the effect of age-bit wear-out on cache operation and shows that performance is severely degraded after even a small portion of the age bits becomes non-operational. Third, a novel cache replacement algorithm, called Sleepy-LRU, is proposed to reduce the write activity of the age bits with negligible overheads. The evaluations show that Sleepy-LRU extends the lifetime of instruction and data caches to 3.63× and 3.00×, respectively, with an average of 0.06% performance overhead. In addition, Sleepy-LRU imposes no area or power consumption overhead.
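To see why the replacement state can wear out faster than the data array, note that under true LRU the recency (age) information of a set is updated on essentially every access to that set, while any single data block is written only when it is filled or receives a write hit. The toy model below illustrates this gap; the access mix, set geometry, and stream length are illustrative assumptions, not parameters from the paper.

    import random

    # Toy model of one 8-way LRU set: count rewrites of the set's age/recency
    # state versus writes to any single data block. In this model the age bits
    # are charged one write per access, which is the pessimistic LRU behavior.
    random.seed(0)
    WAYS = 8
    N_ACCESSES = 100_000
    WRITE_FRACTION = 0.3            # assumed fraction of accesses that are writes
    DISTINCT_BLOCKS = 16            # assumed working set mapping to this set

    lru_stack = list(range(WAYS))   # way indices, most recently used first
    tags = [None] * WAYS
    age_bit_writes = 0
    data_writes = [0] * WAYS

    for _ in range(N_ACCESSES):
        addr = random.randrange(DISTINCT_BLOCKS)
        is_write = random.random() < WRITE_FRACTION
        way = next((w for w in range(WAYS) if tags[w] == addr), None)
        if way is None:             # miss: evict the LRU way and fill it
            way = lru_stack[-1]
            tags[way] = addr
            data_writes[way] += 1   # the fill writes the data array
        elif is_write:
            data_writes[way] += 1   # a write hit updates the data array
        lru_stack.remove(way)       # every access rewrites the recency state
        lru_stack.insert(0, way)
        age_bit_writes += 1

    print("age-bit writes:       ", age_bit_writes)
    print("max data-block writes:", max(data_writes))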
... A ReRAM NUCA that addresses the lifetime problem in a performance-conscious manner is reported by Kotra et al. [14]. LastingNVCache, proposed by Mittal et al. [15], reduces the intra-set write variation by adding a write counter to each block in the cache. After the counter reaches a specified limit, the write operation is skipped by invalidating the block. ...
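A minimal sketch of this counter-based flushing idea follows; the threshold, the set geometry, and the use of per-way write totals as a stand-in for the replacement policy's choice of a cold victim are assumptions for illustration, not the exact LastingNVCache or PoLF microarchitecture. Each block carries a small write counter; once the counter reaches the limit, the block is invalidated instead of being written, so the next access misses and the data is refilled into a cold way.

    # Counter-based flushing in one cache set for intra-set wear-leveling.
    # FLUSH_THRESHOLD and the victim choice are illustrative assumptions.
    FLUSH_THRESHOLD = 16
    WAYS = 4

    class Block:
        def __init__(self):
            self.tag = None
            self.valid = False
            self.write_counter = 0   # small per-block counter, reset on refill
            self.total_writes = 0    # wear statistic (not real hardware state)

    class WearLevelingSet:
        def __init__(self):
            self.blocks = [Block() for _ in range(WAYS)]

        def _lookup(self, tag):
            for b in self.blocks:
                if b.valid and b.tag == tag:
                    return b
            return None

        def _victim(self):
            # Proxy for the cold block the replacement policy would pick when
            # the flushed block's age bits are deliberately left stale.
            return min(self.blocks, key=lambda b: b.total_writes)

        def write(self, tag):
            b = self._lookup(tag)
            if b is None:                        # miss: refill into a cold way
                b = self._victim()
                b.tag, b.valid, b.write_counter = tag, True, 0
            if b.write_counter >= FLUSH_THRESHOLD:
                b.valid = False                  # flush: skip the write here
                return                           # next access misses, refills
            b.write_counter += 1
            b.total_writes += 1

    # Hammer one address: without flushing, all writes would land on one way.
    s = WearLevelingSet()
    for _ in range(1000):
        s.write(0xABC)
    print([b.total_writes for b in s.blocks])    # writes spread over the ways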
Article
Full-text available
The attractive features exhibited by Non-Volatile Memory (NVM) technologies, such as low static power and high density, make them promising candidates in the memory hierarchy, including caches. However, their limited write endurance, together with the write variation governed by access patterns and the applied replacement policies, reduces the chances of NVMs succeeding SRAM. These write variations are of concern as they not only break down the NVM cells but also reduce the effective lifetime. This paper proposes efficient techniques to mitigate the intra-set write variation to improve the lifetime of NVM caches. Our first two techniques partition the cache into windows of equal size and distribute the writes uniformly across the cache set by employing a window as write-restricted or read-only; the window is selected by rotation or with the help of counters. In our third technique, different cache ways are employed as write-restricted over the period of execution to distribute the writes uniformly. Experimental results using full-system simulation show a significant reduction in intra-set write variation along with an improvement in cache lifetime.
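A hedged sketch of the rotation-based window idea described above; the window size, epoch length, and the rule for steering a write to a non-restricted way are illustrative assumptions rather than the paper's exact design.

    # Rotation-based window write-restriction within one cache set.
    # WAYS, WINDOW_SIZE, EPOCH and the steering rule are assumptions.
    WAYS = 8
    WINDOW_SIZE = 2                  # ways per window -> four windows
    EPOCH = 100                      # writes between window rotations

    writes_per_way = [0] * WAYS
    restricted_window = 0            # index of the write-restricted window
    writes_seen = 0

    def allowed_ways():
        """Ways outside the currently write-restricted window."""
        lo = restricted_window * WINDOW_SIZE
        return [w for w in range(WAYS) if not (lo <= w < lo + WINDOW_SIZE)]

    def place_write():
        """Steer a fill/write to the least-written allowed way."""
        global restricted_window, writes_seen
        way = min(allowed_ways(), key=lambda w: writes_per_way[w])
        writes_per_way[way] += 1
        writes_seen += 1
        if writes_seen % EPOCH == 0:             # rotate the restricted window
            restricted_window = (restricted_window + 1) % (WAYS // WINDOW_SIZE)

    for _ in range(10_000):
        place_write()
    print(writes_per_way)            # roughly uniform across the eight ways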
... Depending on the memory access behavior of a program and the cache replacement policy, cache management policies have been proposed to reduce the writes to a hybrid cache. Mittal et al. [78,75] presented techniques to increase cache lifetime by reducing intra-set write variation. The idea behind them is to change the physical cache-block location of a write-intensive data item within a set to achieve wear-leveling by periodically flushing a frequently-written data item. ...
Thesis
Traditional memories such as SRAM, DRAM and Flash have faced, during the last years, critical challenges related to what modern computing systems require: high performance, high storage density and low power. As the number of CMOS transistors is increasing, the leakage power consumption becomes a critical issue for energy-efficient systems. SRAM and DRAM consume too much energy and have low density, and Flash memories have a limited write endurance. Therefore, these technologies can no longer ensure the needs in both embedded and high-performance computing domains. Future memory systems must respect the energy and performance requirements. Since Non-Volatile Memories (NVMs) appeared, many studies have shown prominent features where such technologies can be a potential replacement of the conventional memories used on-chip and off-chip. NVMs have important qualities in storage density, scalability, leakage power, access performance and write endurance. Many research works have proposed designs based on NVMs, whether on main memory or on cache memories. Nevertheless, there are still some critical drawbacks of these new technologies. The main drawback is the cost of write operations in terms of latency and energy consumption. Ideally, we want to replace traditional technologies with NVMs to benefit from storage density and very low leakage, but eventually without the write operations overhead.

The scope of this work is to exploit the advantages of NVMs employed mainly on cache memories by mitigating the cost of write operations. Obviously, reducing the number of write operations in a program will help in reducing the energy consumption of that program. Many approaches for reducing write operations exist at the circuit level, architectural level and software level. We propose a compiler-level optimization that reduces the number of write operations by eliminating the execution of redundant stores, called silent stores. A store is silent if it writes to a memory address the same value that is already stored at this address. The LLVM-based optimization eliminates the identified silent stores in a program by not executing them.

Furthermore, the cost of a write operation is highly dependent on the used NVM and its non-volatility, called retention time; when the retention time is high, the latency and the energetic cost of a write operation are considerably high, and vice versa. Based on this characteristic, we propose an approach applicable in a multi-bank NVM where each bank is designed with a specific retention time. We analyze a program and compute the worst-case lifetime of a store instruction. The worst-case lifetime will help to allocate data to the most appropriate NVM bank.
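The thesis performs the silent-store elimination statically with an LLVM pass; the sketch below only illustrates the underlying check at run time, using a dictionary as a stand-in memory model: a store is skipped when the value to be written equals the value the location already holds.

    # Runtime illustration of silent-store elimination: a store is "silent"
    # when it writes the value the location already holds, so it is skipped.
    # The dictionary-based memory model is an illustrative stand-in only; the
    # thesis detects and removes such stores statically in an LLVM pass.
    memory = {}
    stores_executed = 0
    stores_eliminated = 0

    def store(addr, value):
        global stores_executed, stores_eliminated
        if memory.get(addr) == value:     # silent: same value already stored
            stores_eliminated += 1        # skip the expensive NVM write
            return
        memory[addr] = value
        stores_executed += 1

    for addr in range(100):
        store(addr, 0)                    # first pass: 100 real writes
    for _ in range(4):
        for addr in range(100):
            store(addr, 0)                # later passes: all silent, skipped

    print("executed:", stores_executed, "eliminated:", stores_eliminated)

In practice such a check costs an extra load per store, which is one motivation for identifying silent stores at compile time, as the thesis does.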
... Microarchitectural simulations have been performed using the Sniper simulator and workloads from the SPEC2006 suite and the HPC field (Section 4). In addition, ENLIVE has been compared with two recently proposed techniques for improving the lifetime of NVM caches, namely PoLF (probabilistic line-flush) [5] and LastingNVCache [13] (see Section 5.1). The results have shown that, compared to the other techniques, ENLIVE provides a larger improvement in cache lifetime and performance, with a smaller energy loss (Section 5.2). ...
... Note that although intra-set wear-leveling techniques (e.g., [5, 13, 19-21]) can also be used for mitigating repeated-address attacks on NVM, ENLIVE offers a distinct advantage over them. An intra-set wear-leveling technique only distributes writes uniformly within a set and does not reduce the total number of writes to the set. ...
... 1. Relative cache lifetime, where lifetime is defined as the inverse of the maximum number of writes on any cache block [4,13]; 2. Weighted speedup [14] (called relative performance); 3. Percentage energy loss; 4. Absolute increase in MPKI (misses per kilo-instruction) ...
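Restating the first metric as a small computation (the per-block write counts are made-up numbers for illustration): since lifetime is taken as the inverse of the maximum number of writes landing on any one block, the relative lifetime of a technique over a baseline is the baseline's maximum per-block write count divided by the technique's.

    # Relative lifetime = baseline max per-block writes / technique max writes,
    # following the "inverse of maximum writes on any cache block" definition.
    # The per-block write counts are made-up numbers for illustration.
    baseline_writes  = [900, 120, 80, 60]    # skewed: one hot block
    technique_writes = [310, 300, 280, 270]  # same total, spread by wear-leveling

    relative_lifetime = max(baseline_writes) / max(technique_writes)
    print(f"relative lifetime: {relative_lifetime:.2f}x")   # -> 2.90x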
Article
Full-text available
While non-volatile memories (NVMs) provide high density and low leakage, they also have low write endurance. This, along with the write variation introduced by the cache management policies, can lead to very small cache lifetime. In this paper, we propose ENLIVE, a technique for improving the lifetime of NVM caches. Our technique uses a small SRAM storage, called HotStore. ENLIVE detects frequently written blocks and transfers them to the HotStore so that they can be accessed with smaller latency and energy. This also reduces the number of writes to the NVM cache, which improves its lifetime. We present microarchitectural schemes for managing the HotStore. Simulations have been performed using an x86-64 simulator and benchmarks from the SPEC2006 suite. We observe that ENLIVE provides higher improvement in lifetime and better performance and energy efficiency than two state-of-the-art techniques for improving NVM cache lifetime. ENLIVE provides 8.47X, 14.67X and 15.79X improvement in lifetime for 2-, 4- and 8-core systems, respectively. Also, it works well for a range of system and algorithm parameters and incurs only a small overhead.
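A minimal sketch of the HotStore idea as described in the abstract; the detection threshold, the HotStore capacity, and the LRU-style eviction below are assumptions for illustration, not ENLIVE's actual microarchitectural parameters. Blocks whose write count crosses the threshold are moved into a small SRAM structure so that their subsequent writes bypass the NVM array.

    from collections import OrderedDict

    # SRAM "HotStore" in front of an NVM cache: frequently written blocks are
    # migrated to SRAM so their later writes stop wearing the NVM array.
    # HOT_THRESHOLD, CAPACITY and LRU eviction are illustrative assumptions.
    HOT_THRESHOLD = 8          # writes before a block is considered "hot"
    CAPACITY = 4               # HotStore entries

    nvm_writes = {}            # per-block writes that reached the NVM array
    write_counts = {}          # per-block detection counters
    hotstore = OrderedDict()   # block -> data, ordered for LRU eviction

    def write_block(block, data):
        if block in hotstore:                     # hot block: SRAM write only
            hotstore[block] = data
            hotstore.move_to_end(block)
            return
        write_counts[block] = write_counts.get(block, 0) + 1
        nvm_writes[block] = nvm_writes.get(block, 0) + 1
        if write_counts[block] >= HOT_THRESHOLD:  # promote to the HotStore
            if len(hotstore) >= CAPACITY:         # evict LRU entry back to NVM
                victim, _ = hotstore.popitem(last=False)
                nvm_writes[victim] = nvm_writes.get(victim, 0) + 1
                write_counts[victim] = 0
            hotstore[block] = data

    # One block written 1000 times: only the first few writes reach the NVM.
    for i in range(1000):
        write_block("A", i)
    print("NVM writes to block A:", nvm_writes["A"])   # -> 8 instead of 1000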
... This configuration could help reduce the large power and area requirements of SRAM; however, the memory system would need to use clever new algorithms to prevent the processor from creating either performance or endurance hotspots in the NVM technology, which, if left unaddressed, could lead to a very short cell lifetime. Such algorithms are currently being investigated [21,32], but numerous manufacturing and deployment hurdles remain. ...
Article
Full-text available
For extreme-scale high performance computing systems, system-wide power consumption has been identified as one of the key constraints moving forward, where the DRAM main memory systems account for about 30-50% of a node's overall power consumption. Moreover, as the benefits of device scaling for DRAM memory slow, it will become increasingly difficult to keep memory capacities balanced with increasing computational rates offered by next-generation processors. However, a number of emerging memory technologies - nonvolatile memory (NVM) devices - are being investigated as an alternative for DRAM. Moving forward, these NVM devices may offer a number of solutions for HPC architectures. First, as the name, NVM, implies, these devices retain state without continuous power, which can, in turn, reduce power costs. Second, certain NVM devices can be as dense as DRAM, facilitating more memory capacity in the same physical volume. Finally, NVM, such as contemporary NAND flash memory, can be less expensive than DRAM in terms of cost per bit. Taken together, these benefits can provide opportunities for revolutionizing the design of extreme-scale HPC systems. Researchers are investigating how to integrate these emerging technologies into future extreme-scale HPC systems, and how to expose these capabilities in the software stack and applications. Current results show a number of these strategies may offer high-bandwidth I/O, larger main memory capacities, persistent data structures, and new approaches for application resilience and output post-processing, such as transaction-based, incremental-checkpointing and in-situ visualization, respectively.
... Classification of cache WLTs: Based on their granularity, the WLTs can be classified as inter-color [29], inter-set [6], [15], intra-set [6], [16], [30]-[32] and memory-cell level [27], [28], [33]. We propose an intra-set WLT and compare our technique to other intra-set WLTs (see Sections V and VI for more details). ...
... The WLTs can also be divided according to whether they use data invalidation (also called 'flushing') [6], [15], [29], [30] or in-cache data movement (also called data migration or shifting) ([16] and PoLSwap, shown in Section V). Data invalidation increases off-chip accesses, leading to contention and endurance issues in main memory. ...
... Taken together, they evaluate a technique in a comprehensive manner. These metrics have also been used by other research works [6], [15], [16], [27], [30], [32]. ...
Article
Full-text available
Driven by the trends of increasing core counts and the bandwidth-wall problem, the size of last-level caches (LLCs) has greatly increased, and hence researchers have explored non-volatile memories (NVMs), which provide high density and consume low leakage power. Since NVMs have low write endurance and the existing cache management policies are write-variation-unaware, effective wear-leveling techniques are required for achieving reasonable cache lifetimes using NVMs. We present EqualWrites, a technique for mitigating intra-set write variation. Our technique works by recording the number of writes on a block and changing the cache-block location of a hot data item to redirect future writes to a cold block and achieve wear-leveling. Simulation experiments have been performed using an x86-64 simulator and benchmarks from SPEC06 and the HPC (high-performance computing) field. The results show that for single, dual and quad-core system configurations, EqualWrites improves cache lifetime by 6.31X, 8.74X and 10.54X, respectively. Also, its implementation overhead is very small and it provides a larger improvement in lifetime than three other intra-set wear-leveling techniques and a cache replacement policy.
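A hedged sketch of the data-movement idea as worded in the abstract; the per-way counters, the relocation threshold, and the choice of the least-written way as the 'cold' destination are illustrative assumptions, not EqualWrites' exact mechanism. When one way has absorbed enough writes since its data last moved, its data item is swapped with the data in the coldest way so that future writes to the hot item land elsewhere.

    # In-set wear-leveling by relocating a hot data item to a cold way.
    # The counters, threshold and swap rule are illustrative assumptions.
    WAYS = 4
    SWAP_THRESHOLD = 32          # writes to one way before its data is moved

    tags = list("ABCD")          # which data item sits in each way
    writes = [1] * WAYS          # per-way write counts (initial fills)
    since_move = [0] * WAYS      # writes to a way since its data last moved

    def write_item(tag):
        """Write to a data item already resident somewhere in this set."""
        way = tags.index(tag)
        writes[way] += 1
        since_move[way] += 1
        if since_move[way] >= SWAP_THRESHOLD:
            cold = min(range(WAYS), key=lambda w: writes[w])  # coldest way
            if cold != way:
                tags[way], tags[cold] = tags[cold], tags[way] # swap contents
                writes[cold] += 1        # the swap writes both ways once
                writes[way] += 1
            since_move[way] = 0
            since_move[cold] = 0

    for _ in range(10_000):
        write_item("A")          # hammer one item
    print(writes)                # writes spread across the four ways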
... Since SRAM provides high write endurance and performance, it has been conventionally used for designing LLCs. However, this has also led to an increase in the contribution of LLCs towards chip area and power consumption, since SRAM consumes high leakage power and has low density. (This technical report is an extension of our IEEE ISVLSI 2014 paper [1]; the specific extensions made in this report are listed at the end of Section 1.) ...
... Extensions from previous version: This paper makes the following extensions to the previous version [1]. ...
Technical Report
Full-text available
The limitations of SRAM, viz. low density and high leakage power, have motivated researchers to explore non-volatile memory (NVM) as an alternative. However, the write endurance of NVMs is orders of magnitude smaller than that of SRAM, and existing cache management schemes may introduce significant write variation; hence, the use of NVMs for designing on-chip caches is challenging. In this paper, we present LastingNVCache, a technique for improving the cache lifetime by mitigating the intra-set write variation. LastingNVCache works on the key idea that, by periodically flushing a frequently-written data item, the data can be made to load into a cold block in the set the next time it is accessed. Through this, future writes to that data item can be redirected from a hot block to a cold block, which leads to an improvement in the cache lifetime. Microarchitectural simulations have shown that for single, dual and quad-core systems, LastingNVCache provides 6.36X, 9.79X, and 10.94X improvement in lifetime, respectively. Also, its implementation overhead is small and it outperforms two recently proposed techniques for improving the lifetime of NVM caches.
Article
Due to the technological advancements in the last few decades, several applications have emerged that demand more computing power and larger on-chip and off-chip memories. However, the scaling of memory technologies is not on par with the computing throughput of modern-day multi-core processors. Conventional memory technologies such as SRAM and DRAM have technological limitations in meeting large on-chip memory requirements owing to their low packaging density and high leakage power. In order to meet the ever-increasing demand for memory, researchers came up with alternative solutions such as emerging non-volatile memory technologies, including STT-RAM, PCM and ReRAM. However, these memory technologies have limited write endurance and high write energy. This emphasizes the need for a policy that will reduce the writes or distribute the writes uniformly across the memory, thereby enhancing its lifetime by delaying the early wear-out of memory cells due to frequent writes. We propose two techniques, Enhanced-Virtually Split Cache (E-ViSC) and Protean-Virtually Split Cache (P-ViSC), which dynamically adjust the cache configuration to distribute the writes uniformly across the memory and enhance the lifetime. Experimental studies show that E-ViSC and P-ViSC improve the lifetime of NVM L2 caches by up to 2.31x and 1.97x, respectively.