Table 2: Projected Transistor Parameters for 11 nm Tri-Gate

Source publication
Conference Paper
Near-threshold voltage computing promises an order of magnitude improvement in energy efficiency, enabling future processors to integrate hundreds of cores running concurrently. However, such low-voltage operation is accompanied by extreme parametric variations, resulting in unreliable operation of the processor. The memory bit-cells in on-chip caches are mo...

Context in source publication

Context 1
... derive models for a tri-gate 11 nm electrical technology node using the virtual-source transport models of [18] and the parasitic capacitance model of [27]. These models are used to obtain electrical technology parameters (Table 2) used by both McPAT and DSENT. The static energy (subthreshold and gate leakage) is projected to be the dominant component of the overall energy at NTV [17]. ...
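A generic first-order sketch (not the paper's McPAT/DSENT flow, and not the Table 2 parameters) can illustrate why the static share of energy grows as Vdd approaches threshold: dynamic energy falls quadratically with Vdd, but the cycle time stretches rapidly, so leakage current integrates over a much longer period. All constants below are hypothetical placeholders; whether static energy actually dominates depends on the real device parameters.

```python
# First-order energy-per-cycle model at different supply voltages (illustrative only)
C_EFF   = 1e-15   # effective switched capacitance per cycle (F), placeholder
I_LEAK0 = 1e-7    # leakage current at nominal Vdd (A), placeholder
V_NOM   = 0.8     # nominal supply voltage (V), placeholder
V_TH    = 0.35    # threshold voltage (V), placeholder
T_NOM   = 1e-9    # cycle time at nominal Vdd (s), placeholder

def cycle_time(vdd):
    """Crude alpha-power delay model: delay ~ Vdd / (Vdd - Vth)^1.5, normalized to T_NOM."""
    return T_NOM * (vdd / (vdd - V_TH) ** 1.5) / (V_NOM / (V_NOM - V_TH) ** 1.5)

def energy_per_cycle(vdd):
    e_dyn = C_EFF * vdd ** 2                                   # switching energy
    e_stat = I_LEAK0 * (vdd / V_NOM) * vdd * cycle_time(vdd)   # leakage * Vdd * cycle time
    return e_dyn, e_stat

for vdd in (0.8, 0.6, 0.45):   # nominal vs. near-threshold operating points
    e_dyn, e_stat = energy_per_cycle(vdd)
    print(f"Vdd={vdd:.2f} V  dyn={e_dyn:.2e} J  static={e_stat:.2e} J  "
          f"static share={e_stat / (e_dyn + e_stat):.0%}")
```

With these placeholder numbers the static share rises from roughly a tenth of the per-cycle energy at nominal voltage to close to half of it near threshold, which is the trend the context above refers to.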

Similar publications

Article
Phase-change memory (PCM) is one of the promising technologies to replace DRAM, owing to attractive features such as zero leakage power and high scalability. In PCM, a SET operation needs much more time than a RESET operation. A typical write request concurrently writes 64 bytes to a PCM memory line. Therefore, write latency is mainly determined by SET op...
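Because all 64 bytes of a line are programmed concurrently, the line completes only when the slowest cell finishes, so a single cell needing a SET pulse makes the whole write pay SET latency. A minimal sketch under hypothetical timing values and an assumed bit polarity:

```python
# Hypothetical PCM pulse latencies (placeholders for illustration only)
T_RESET_NS = 50    # RESET (amorphize) pulse, fast
T_SET_NS   = 150   # SET (crystallize) pulse, slow

def line_write_latency_ns(old_line: bytes, new_line: bytes) -> int:
    """Latency of writing a 64-byte PCM line: cells are programmed in parallel,
    so the line finishes when its slowest cell finishes."""
    needs_set = False
    needs_reset = False
    for old, new in zip(old_line, new_line):
        diff = old ^ new
        # Assumption: bits flipping 0 -> 1 need SET, 1 -> 0 need RESET,
        # and only differential bits are rewritten.
        if diff & new:
            needs_set = True
        if diff & old:
            needs_reset = True
    if needs_set:
        return T_SET_NS          # any SET dominates the line latency
    if needs_reset:
        return T_RESET_NS
    return 0                     # silent write: nothing changed

old = bytes(64)                        # all zeros
new = bytes([0x01] + [0x00] * 63)      # a single set bit -> whole line pays SET latency
print(line_write_latency_ns(old, new)) # 150
```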

Citations

... Although the shared-LLC organization enables large on-chip cache capacity, the average LLC access latency is considerably higher than in a private-LLC system. Therefore, tiled multicores rely heavily on their low-latency private caches for common-case optimizations, and are relatively insensitive to the LLC latency [9,10]. ...
... On a set of parallel benchmarks, the proposed protocol reduces the overall energy by 14.7%, 10.7%, 10.5%, and 16.7% and the completion time by 2.5%, 6.5%, 4.5%, and 9.5% when compared ...
Article
The trend of increasing processor performance by boosting frequency has been halted due to excessive power dissipation. However, transistor density has continued to grow, which has enabled integration of many cores on a single chip to meet the performance requirements of future applications. Scaling to hundreds of cores on a single chip presents a number of challenges, mainly efficient data access and on-chip communication. Near-threshold voltage (NTV) operation has been identified as the most energy-efficient region to operate in. Running at NTV can facilitate efficient data access; however, it introduces bit-cell faults in the SRAMs which need to be dealt with. Another avenue to extract data-access efficiency is improving on-chip data locality. The shared memory abstraction dominates the traditional small-computer and embedded space due to its ease of programming. For efficiency, shared memory is often implemented with hardware support for synchronization and cache coherence among the cores. However, accesses to shared data with frequent writes result in wasteful invalidations, synchronous write-backs, and cache-line ping-pong, leading to low spatio-temporal locality. Moreover, communication through coherent caches and shared memory primitives is inefficient because it can take many instructions to coordinate between cores. This thesis focuses on mitigating the effects of the data access and communication challenges and makes architectural contributions to enable efficient and scalable many-core processors. The main idea is to minimize data movement and make each necessary data access more efficient. In this regard, a novel private level-1 cache architecture is presented to enable efficient and fault-free operation at near-threshold voltages. To better exploit data locality, a last-level cache (LLC) data replication scheme is proposed that co-optimizes data locality and off-chip miss rate. It utilizes an in-hardware predictive mechanism to classify data and only replicate high-reuse data in the local LLC bank. Finally, a hybrid shared-memory, explicit-messaging architecture is proposed to enable efficient on-chip communication. In this architecture the shared memory model is retained; however, a set of lightweight, in-hardware, explicit message-passing style instructions is introduced in the instruction set architecture (ISA) that enables efficient movement of computation to where data is located.
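The reuse-based replication idea described in the abstract can be sketched as a small filter: a line homed in a remote LLC bank is replicated in the requester's local bank only once its observed reuse crosses a threshold. The table organization and the threshold below are illustrative assumptions, not the thesis's actual predictor configuration.

```python
from collections import defaultdict

REPLICATION_THRESHOLD = 3   # hypothetical reuse count before a line is replicated

class ReplicationFilter:
    """Tracks per-line reuse at the requesting tile and decides when a
    remotely homed LLC line is worth replicating in the local LLC bank."""

    def __init__(self):
        self.reuse = defaultdict(int)   # line address -> observed reuse count
        self.replicated = set()         # lines currently replicated locally

    def on_llc_access(self, addr: int, is_local_home: bool) -> str:
        if is_local_home or addr in self.replicated:
            return "serve from local bank"
        self.reuse[addr] += 1           # another access to a remote home bank
        if self.reuse[addr] >= REPLICATION_THRESHOLD:
            self.replicated.add(addr)   # high-reuse line: create a local replica
            return "replicate in local bank"
        return "forward to remote home bank"

    def on_eviction_or_invalidation(self, addr: int) -> None:
        self.replicated.discard(addr)   # drop the replica and its history
        self.reuse.pop(addr, None)

f = ReplicationFilter()
for _ in range(4):
    print(f.on_llc_access(0x40, is_local_home=False))
```

The first two accesses are forwarded to the remote home bank, the third triggers replication, and the fourth is served locally, so only lines that demonstrate reuse consume local LLC capacity.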
... Redesign and careful evaluation are required at NTV. Due to the unique challenges presented at NTV, partial retrofitting of existing cache management schemes or evaluation approaches for NTV is likely to be insufficient. For example, Hijaz and Khan [2014] note that due to increased latency and reduced capacity at NTV, a cache management scheme (e.g., for placement, movement, and replication of data) that is optimized for nominal voltage may not perform optimally at NTV. This is because the reduced capacity affects replication decisions and presents a tradeoff between energy loss due to increased off-chip accesses and energy saving due to NTV operation. ...
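The tradeoff mentioned above can be made concrete with a back-of-the-envelope comparison: replication saves energy on local hits, but at NTV the reduced effective capacity means replicas displace distinct lines, so some accesses that would have hit on chip now pay the much larger off-chip energy. All numbers below are hypothetical, chosen only to show the shape of the tradeoff.

```python
# Hypothetical per-access energies (nJ), for illustration only
E_LOCAL_HIT_NTV = 0.2    # local LLC bank hit at near-threshold voltage
E_REMOTE_HIT    = 0.8    # hit in a remote LLC bank (network + bank access)
E_OFF_CHIP      = 20.0   # off-chip DRAM access

def energy_per_access(local_frac, remote_frac, miss_frac):
    return (local_frac * E_LOCAL_HIT_NTV +
            remote_frac * E_REMOTE_HIT +
            miss_frac * E_OFF_CHIP)

# Without replication: fewer local hits, but the reduced NTV capacity holds
# only distinct lines, keeping the off-chip miss rate low.
no_repl = energy_per_access(local_frac=0.10, remote_frac=0.85, miss_frac=0.05)

# With aggressive replication: more local hits, but replicas displace distinct
# lines from the reduced capacity, raising the off-chip miss rate.
repl = energy_per_access(local_frac=0.60, remote_frac=0.32, miss_frac=0.08)

print(f"no replication: {no_repl:.2f} nJ/access, replication: {repl:.2f} nJ/access")
```

With these placeholder values the extra off-chip misses outweigh the local-hit savings, which is exactly why a replication policy tuned for nominal-voltage capacity has to be re-evaluated at NTV.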
... We now discuss several of these works.
[Abella et al. 2009, 2010; Alameldeen et al. 2011; Ansari et al. 2011; Bacha and Teodorescu 2014; BanaiyanMofrad et al. 2011, 2013; Bortolotti et al. 2014; Chakraborty et al. 2010; Chishti et al. 2009; Choi et al. 2011; Dreslinkski et al. 2007; Dreslinski et al. 2008; Duwe et al. 2015; Ferrerón et al. 2014; Ghasemi et al. 2011; Gottscho et al. 2014; Han et al. 2013; Hijaz and Khan 2014; Hijaz et al. 2013; Khare and Jain 2013; Kumar and Hinton 2009; Ladas et al. 2010; Mahmood and Kim 2011; Maric et al. 2012, 2013a; Miller et al. 2010; Roberts et al. 2008; Wilkerson et al. 2008; Yalcin et al. 2014a,b; Zhang et al. 2012]
Core: [Abella et al. 2010; Dreslinkski et al. 2007; Dreslinski et al. 2013; Miller et al. 2012a, ?]
Evaluation platform: Real processor [Teodorescu 2013, 2014; Cho and Mahlke 2012]; Simulator: nearly all others.
Chishti et al. [2009] propose a technique which trades off cache capacity to enable hard/soft-error resilience at lower voltages. At high voltage, only conventional ECC is used and the entire cache is used for storing data. ...
... Dreslinski et al. 2013; Hijaz and Khan 2014; Khare and Jain 2013; Kumar and Hinton 2009]
NT-tolerant cells in tag array: [BanaiyanMofrad et al. 2011, 2013; Duwe et al. 2015; Ferrerón et al. 2014; Ladas et al. 2010; Miller et al. 2010]
Management/optimization approaches:
Disabling faulty or other specific cells: [Abella et al. 2009; Alameldeen et al. 2011; BanaiyanMofrad et al. 2011, 2013; Choi et al. 2011; Ferrerón et al. 2014; Ghasemi et al. 2011; Gottscho et al. 2014; Hijaz and Khan 2014; Hijaz et al. 2013; Krimer et al. 2010; Ladas et al. 2010; Mahmood and Kim 2011; Maric et al. 2012, 2013a; Miller et al. 2010; Roberts et al. 2008; Wilkerson et al. 2008; Zhang et al. 2012]
Use of replication: [Ashraf et al. 2014; BanaiyanMofrad et al. 2011, 2013; Chakraborty et al. 2010; Ferrerón et al. 2014; Han et al. 2013; Krimer et al. 2010; Seo et al. 2012; Yalcin et al. 2014b]
Use of error-correcting codes: [Alameldeen et al. 2011; Teodorescu 2013, 2014; Chishti et al. 2009; Duwe et al. 2015; Hijaz et al. 2013; Maric et al. 2013b; Miller et al. 2010; Yalcin et al. 2014a,b; Zhang et al. 2012]
Use of filter/victim cache or fault-buffer: [Dreslinski et al. 2008; Ladas et al. 2010; Mahmood and Kim 2011; Maric et al. 2013a]
Architectural ...
Article
Energy efficiency has now become the primary obstacle in scaling the performance of all classes of computing systems. Low-voltage computing and, specifically, near-threshold voltage computing (NTC), which involves operating the transistor very close to and yet above its threshold voltage, holds the promise of providing many-fold improvement in energy efficiency. However, use of NTC also presents several challenges such as increased parametric variation, higher failure rates, and performance loss. This paper surveys several recent techniques which aim to offset these challenges for fully leveraging the potential of NTC. By classifying these techniques along several dimensions, we also highlight their similarities and differences. It is hoped that this paper will provide insights into state-of-the-art NTC techniques to researchers and system designers and inspire further research in this field.
Article
The end of Dennard scaling puts computer systems, especially datacenters, up against both power and utilization walls. One possible solution to combat the power and utilization walls is dark silicon, in which transistors on the chip are under-utilized, but this results in diminished performance. Another solution is Near-Threshold Voltage Computing (NTC), which operates transistors in the near-threshold region and provides much more flexible tradeoffs between power and performance. However, prior efforts largely focus on specific design options based on legacy desktop applications and therefore lack a comprehensive analysis of emerging scale-out applications across multiple design options when dark silicon and/or NTC are applied. In this paper, we characterize different perspectives, including performance, energy efficiency, and reliability, in the context of NTC/dark-silicon cloud processors running emerging scale-out workloads on various architecture designs. We find that NTC is generally a more effective way than dark silicon to alleviate the power challenge for scale-out applications: it can improve performance by 1.6X and energy efficiency by 50%, and its reliability problem can be relieved by ECC. Meanwhile, we also observe that a tiled-OoO architecture improves performance by 20%~370% and energy efficiency by 40%~600% over alternative architecture designs, making it a preferable design paradigm for scale-out workloads. We believe that our observations will provide insights for the design of cloud processors under dark silicon and/or NTC.
Conference Paper
This work proposes a strategy for designing VLSI circuits to operate in an extremely wide Voltage-Frequency Scaling (VFS) range, from the supply voltage at which the minimum energy per operation (MEP) is achieved up to the nominal voltage for the process. First, the sizing methodology of two cell libraries using transistors with different threshold voltages, Regular-VT (RVT) and Low-VT (LVT), is described. Just five combinational cells (INV, NAND, NOR, OAI22, and AOI22) plus two register cells comprise the libraries, all with multiple strengths for the RVT ones. The sizing rule for the transistors of each cell is directly driven by requiring equal rise and fall times in order to attenuate variability effects at very low supply voltages. These cell libraries were characterized for typical, fast, and slow process corners, over temperature variations (-40°C, 25°C, and 125°C), and for supply voltages varying from 200 mV up to 1.2 V in small supply steps. Circuit syntheses were performed for ten VLSI circuit benchmarks (a notch filter, an 8051-compatible core, and eight ISCAS benchmark circuits), considering all VDD operating points. We show that at the MEP (near-VT) an average reduction of 54.46% and 99.01% in energy is possible, when compared with deep sub-threshold and nominal supply voltages, respectively, at room temperature. The extremely wide VFS regime enables operating frequencies varying from hundreds of kHz up to MHz/GHz at -40°C and 25°C, and from MHz up to GHz at 125°C. The near-VT designs presented herein, when compared to related work, show on average an energy reduction and performance gain of 24.1% and 152.68%, respectively, for the same circuit benchmarks. Comparison of near-VT operation at very low and high temperatures shows advantages of hotter CMOS operation in this regime.
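A minimal sketch of why a minimum-energy point exists between deep sub-threshold and nominal supply, using a generic first-order energy model rather than the characterized libraries from the paper; every constant below is a hypothetical placeholder.

```python
import math

# Hypothetical first-order constants (not from the paper's characterization)
C_EFF   = 1e-12        # switched capacitance per operation (F)
I_0     = 1e-9         # leakage scale current (A)
V_T     = 0.35         # threshold voltage (V)
N_VT    = 1.3 * 0.026  # subthreshold slope factor * thermal voltage (V)
K_DELAY = 1e-6         # delay scale constant (s*V)

def delay(vdd):
    """Above threshold: alpha-power-like delay; below: exponential subthreshold slowdown."""
    if vdd > V_T + 0.05:
        return K_DELAY * vdd / (vdd - V_T) ** 1.5
    return K_DELAY * vdd / (0.05 ** 1.5) * math.exp((V_T + 0.05 - vdd) / N_VT)

def energy_per_op(vdd):
    e_dyn  = C_EFF * vdd ** 2          # falls quadratically with Vdd
    e_leak = I_0 * vdd * delay(vdd)    # grows as the operation slows down
    return e_dyn + e_leak

# Sweep Vdd from 0.20 V to 1.20 V and locate the minimum-energy point (MEP)
points = [(v / 100, energy_per_op(v / 100)) for v in range(20, 121, 5)]
v_mep, e_mep = min(points, key=lambda p: p[1])
print(f"MEP at ~{v_mep:.2f} V with {e_mep:.2e} J/op")
```

In this toy model the energy per operation is far lower near the threshold voltage than at either the nominal or the deep sub-threshold end of the sweep, which is the qualitative behavior the abstract's MEP numbers quantify for real libraries.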
Article
In this paper, we investigate the feasibility of voltage adjustment in a large-capacity cache and propose the architecture of voltage-adaptable nonuniform cache access (VANUCA), which exploits near-threshold computing and multiple voltage domains to approach the limit of Vdd in a low-power cache. However, the adoption of near-threshold voltage (NTV) leads to a sharply rising error probability in SRAM arrays, which has to be addressed by effective fault-tolerant techniques. Instead of using error correction codes or data duplication, VANUCA exploits the natural data redundancy across the whole memory hierarchy to enable fast fault recovery in the NTV cache. Based on the discovered data resilience and the multi-Vdd architecture, VANUCA is able to match vulnerable/invulnerable data clusters to available high-/low-voltage domains by utilizing the data migration mechanism in dynamic NUCA. The proposed VANUCA includes two important architectural techniques: 1) static assignment, which assumes a fixed voltage-domain partitioning, and 2) DataMotion, which dynamically fits the working set into heterogeneous cache banks through Vdd switching. Experimental results show that VANUCA achieves considerable improvements in energy efficiency over a conventional single-voltage-domain NUCA cache.
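A simplified sketch of the placement idea described in this abstract (not the authors' implementation): lines whose data can be recovered from elsewhere in the memory hierarchy are treated as invulnerable and may live in low-Vdd (NTV) banks, while vulnerable lines, such as dirty or unique data, are kept or migrated into high-Vdd banks. The bank counts, the vulnerability rule, and the migration trigger are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class CacheLine:
    addr: int
    dirty: bool            # modified on chip only, so no clean backup copy
    backed_off_chip: bool  # a valid copy exists in a lower level or memory

def is_vulnerable(line: CacheLine) -> bool:
    """A line is 'vulnerable' if an NTV bit-cell fault could lose data that
    cannot be recovered from elsewhere in the memory hierarchy."""
    return line.dirty or not line.backed_off_chip

HIGH_VDD_BANKS = {0, 1}          # reliable, higher-energy banks (assumption)
LOW_VDD_BANKS  = {2, 3, 4, 5}    # NTV banks: cheap but fault-prone (assumption)

def static_assignment(line: CacheLine) -> int:
    """Static policy: pick a bank from the fixed voltage-domain partition that
    matches the line's vulnerability class (hashing by address within it)."""
    banks = sorted(HIGH_VDD_BANKS if is_vulnerable(line) else LOW_VDD_BANKS)
    return banks[line.addr % len(banks)]

def data_motion(line: CacheLine, current_bank: int) -> int:
    """DataMotion-style check: if a line's vulnerability class no longer matches
    its bank's voltage domain, migrate it to a matching bank."""
    wrong_domain = (is_vulnerable(line) and current_bank in LOW_VDD_BANKS) or \
                   (not is_vulnerable(line) and current_bank in HIGH_VDD_BANKS)
    return static_assignment(line) if wrong_domain else current_bank

clean = CacheLine(addr=0x80, dirty=False, backed_off_chip=True)
print(static_assignment(clean))            # lands in a low-Vdd (NTV) bank
clean.dirty = True                         # first write makes it vulnerable
print(data_motion(clean, current_bank=3))  # migrates to a high-Vdd bank
```

Note that the paper's DataMotion also switches the Vdd of banks to fit the working set; the sketch only shows the complementary migration side of matching data to voltage domains.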