Figure 2 - uploaded by Gregory Ruhl
The various components of the power-management unit (PMU) and how it interacts with the floating-point pipeline. The clock-gate and power-gate enable signals for the floating-point unit (FPU) are controlled by this unit.

Source publication
Article
Designing a microprocessor that's efficient across a wide voltage-operating range requires overcoming a variety of microarchitecture and circuit design challenges. In this article, the authors demonstrate their IA-32 processor, which is built in 32-nm CMOS technology and can operate efficiently between 280 mV and 1.2 V. They also discuss some of...

Context in source publication

Context 1
... PMU controls the distributed sleep devices and enables the clock throughout the floating-point unit (FPU). It monitors the stream of instruction IDs along with any stall or reset activity in the FPU pipeline (see Figure 2). When it detects an idle condition, it signals the programmable idle-threshold counter to clock gate and power gate the FPU after the first idle. ...
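The idle-detection behavior described in this excerpt can be illustrated with a small cycle-level model. This is only a sketch under stated assumptions, not the authors' design: the class `IdleGateController`, its `threshold` parameter, the rule that a stalled cycle does not count as idle, and the instant wake-up on new activity are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class IdleGateController:
    """Hypothetical model of a programmable idle-threshold counter."""

    threshold: int        # assumed: idle cycles to wait before gating
    idle_cycles: int = 0  # consecutive idle cycles seen so far
    gated: bool = False   # True while the FPU is clock/power gated

    def step(self, instr_valid: bool, stall: bool = False,
             reset: bool = False) -> bool:
        """Advance one cycle and return True if the FPU is gated."""
        if reset:
            # Assumption: a reset clears the counter and ungates the FPU.
            self.idle_cycles = 0
            self.gated = False
        elif instr_valid:
            # New FPU work: restart the idle count and wake the unit.
            # (Real wake-up latency is ignored in this sketch.)
            self.idle_cycles = 0
            self.gated = False
        elif not stall:
            # Assumption: a stalled pipeline still holds work, so only
            # unstalled empty cycles count toward the idle threshold.
            self.idle_cycles += 1
            if self.idle_cycles >= self.threshold:
                self.gated = True
        return self.gated


pmu = IdleGateController(threshold=3)
trace = [True, False, False, False, False, True]
states = [pmu.step(valid) for valid in trace]
print(states)  # [False, False, False, True, True, False]
```

Making the threshold programmable lets the gating policy trade wake-up overhead against leakage savings: a low threshold gates aggressively, while a high one avoids gating across short gaps between floating-point instructions.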


Citations

... Studies have found that the optimal supply voltage resides just above the threshold voltage of the transistor [7], and it has since been experimentally proven by many silicon prototypes [8,9]. To gain a more intuitive understanding of NTC, it helps to break down the entire power consumption into the dynamic power and the static power. ...
Preprint
Energy-efficient microprocessors are essential for a wide range of applications. While near-threshold computing is a promising technique to improve energy efficiency, optimal supply demands from logic core and on-chip memory are conflicting. In this paper, we perform static reliability analysis of 6T SRAM and discover the variance among different sizing configurations and asymmetric minimum voltage requirements between read and write operations. We leverage this asymmetric property in near-threshold processors equipped with voltage boosting capability by proposing an opportunistic dual-supply switching scheme with a write aggregation buffer. Our results show that the proposed technique improves energy efficiency by more than 21.45% with an approximately 10.19% performance speed-up.
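The dynamic/static breakdown mentioned in the citing excerpt above can be written out with the standard first-order CMOS power model. This is a textbook sketch rather than the cited paper's own formulation; the symbols $\alpha$ (activity factor), $C_{\mathrm{eff}}$ (effective switched capacitance), $f$ (clock frequency), and $I_{\mathrm{leak}}$ (total leakage current) are assumptions introduced here.

```latex
\begin{align}
  P_{\mathrm{total}} &= P_{\mathrm{dyn}} + P_{\mathrm{static}}, \\
  P_{\mathrm{dyn}}   &= \alpha\, C_{\mathrm{eff}}\, V_{dd}^{2}\, f, \\
  P_{\mathrm{static}} &= V_{dd}\, I_{\mathrm{leak}}, \\
  E_{\mathrm{op}} = \frac{P_{\mathrm{total}}}{f}
                     &= \alpha\, C_{\mathrm{eff}}\, V_{dd}^{2}
                        + \frac{V_{dd}\, I_{\mathrm{leak}}}{f}.
\end{align}
```

Under this model, lowering $V_{dd}$ shrinks the dynamic term quadratically, but $f$ falls steeply as $V_{dd}$ approaches the threshold voltage, so the leakage energy per operation grows. The two opposing trends are consistent with the excerpt's statement that the optimal supply voltage sits just above the transistor threshold voltage.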
... Studies have found that the optimal supply voltage resides just above the threshold voltage of the transistor [7], and it has since been experimentally proven by many silicon prototypes [8,9]. To gain a more intuitive understanding of NTC, it helps to break down the entire power consumption into the dynamic power and the static power. ...
Article
Energy-efficient microprocessors are essential for a wide range of applications. While near-threshold computing is a promising technique to improve energy efficiency, optimal supply demands from logic core and on-chip memory are conflicting. In this paper, we perform static reliability analysis of 6T SRAM and discover the variance among different sizing configurations and asymmetric minimum voltage requirements between read and write operations. We leverage this asymmetric property in near-threshold processors equipped with voltage boosting capability by proposing an opportunistic dual-supply switching scheme with a write aggregation buffer. Our results show that the proposed technique improves energy efficiency by more than 21.45% with an approximately 10.19% performance speed-up.
... Such energy efficiency improvement is desirable for a variety of applications, such as battery-powered smart phones and embedded systems, as well as data centers that pay hefty electricity bills to power their servers. Studies have found that the optimal supply voltage resides just above the threshold voltage of the transistor [2], and it has since been experimentally proven by many silicon prototypes [12,21]. ...
Conference Paper
Energy-efficient microprocessors are essential for a wide range of applications. While near-threshold computing is a promising technique to improve energy efficiency, optimal supply demands from logic core and on-chip memory are conflicting. In this paper, we perform reliability analysis of 6T SRAM and discover imbalanced minimum voltage requirements between read and write operations. We leverage this imbalance property in near-threshold processors equipped with voltage boosting capability by proposing an opportunistic dual-supply switching scheme with a write aggregation buffer. Our results show that the proposed technique improves energy efficiency by more than 18% with an approximately 8.54% performance speed-up.
... We count the number of tiles containing timing or stability failures in microarchitectural components for a sample of 100 chips where the maximum number of failure occurrences is 64 tiles × 100 chips = 6400. Fig. 3 depicts composition of yield losses for (σ/µ)Vth = 0.15. ...
... Karpuzcu et al. [7] proposed a process variation-aware thread scheduling and frequency assignment technique that exploits a heterogeneity of clusters in an NTV many-core processor. Some prior work demonstrated silicon implementations for NTV processors [15] which use different voltages for SRAM arrays for better stability; however, our designs selectively supply the boosted Vdd only to the faulty cache memories. ...
... In the case of SWBoost, we only boost wordline and SRAM cell supply voltage. SWBoost is based on dual-voltage rail (DVR); however, SWBoost operates in a finer-grained manner as compared to the conventional DVR [12][13][15][17]. SWBoost selectively applies the boosted Vdd to faulty cache components to improve yield and energy efficiency. ...
Conference Paper
Process variation is a major impediment in optimizing yield, energy, and performance in near-threshold many-core processors. In this paper, we present a comprehensive analysis on yield losses in near-threshold many-core processors. Based on our analysis, we propose energy-efficient yield improvement techniques for near-threshold many-core processors: SRAM cell arrays and Wordline driver voltage Boosting (SWBoost) and Cache voltage Boosting (CBoost). Results reveal that SWBoost and CBoost improve a chip yield by up to 66% and 83%, respectively. Furthermore, runtime energy overheads of SWBoost and CBoost are only 0.46% and 0.54%, respectively, which are much lower than conventional voltage boosting techniques.
... In particular, restricting the usage of the high-, complex, or minimum-sized logic gates [9], [10] due to their higher sensitivity to process variability can significantly increase area, leakage, and dynamic power. For the x86 core, an area penalty approaching 80% is reported in [13]. Making the sequential elements and the bitcells more tolerant to variability tends to exacerbate these negative effects. ...
Article
This paper describes the implementation of a Qualcomm Hexagon digital signal processor (DSP) in a 28 nm high-κ metal gate technology. The DSP is a multi-threaded very-long-instruction-word (VLIW) machine optimized for low leakage and energy efficiency. It uses a clock distribution network, clock gating cells, and pulsed latches that are optimized for low switching energy. The processor can be powered using a low-dropout (LDO) voltage regulator or a head switch. It operates from 255 MHz at 0.60 V to 1.24 GHz at 1.05 V. When operating from the LDO, the power consumption of the core can be as low as 58 μW/MHz, which is two to three times lower than comparable cores optimized for ultra-low voltage operation.
... Nevertheless, Near-Threshold Voltage (NTV) logic is currently being actively researched. Intel has developed an NTV x86 processor [26], [27] that shows a remarkable DVFS flexibility and can be seen as a prototypical core for future extremely energy-efficient multicore systems. ...
Conference Paper
This paper undertakes a critical review of the current challenges in multicore processor evolution, underlying trends and design decisions for future multicore processor implementations. It is first shown that, to keep up with Moore's law during the last decade, the VLSI scaling rules for processor design had to be dramatically changed. In future multicore designs, large quantities of dark silicon will be unavoidable and chip architects will have to find new ways of balancing further performance gains, energy efficiency and software complexity. The paper compares the various architectural alternatives on the basis of specific analytical models for multicore systems. Examples of leading commercial multicore processors and architectural research trends are given to underscore the dramatic changes lying ahead in computer architecture and multicore processor design.
Chapter
CPU-GPU integrated systems are emerging as a high-performance and easily-programmable heterogeneous platform to facilitate development of data-parallel software. Network-intensive GPU workloads generate high on-chip traffic, producing local congestion near hot Last Level Cache (LLC) banks, drastically harming CPU performance. Congestion-optimized on-chip network designs can mitigate this problem through their large virtual and physical channel resources. However, when there is little or no GPU traffic, such networks become suboptimal, as they exhibit higher unloaded packet latencies due to their longer critical path delays. In this chapter, we introduce BiNoCHS, a reconfigurable voltage-scalable on-chip network for CPU-GPU heterogeneous systems. Under CPU-dominated low-traffic scenarios, BiNoCHS operates at nominal-voltage and high clock frequency with a topology optimized for low hop count and simple routing strategy, maximizing CPU performance. Under high-intensity GPU/mixed workloads, it transitions to a near-threshold mode, activating additional routers/channels and adaptive routing to resolve congestion. Our evaluation results demonstrate that BiNoCHS improves CPU/GPU performance by an average of 57.3%/33.6% over a latency-optimized network under congestion, while improving CPU performance by 32.8% over high-bandwidth design in unloaded scenarios.
Article
Scaling supply voltage to the near-threshold voltage (NTV) region is an effective approach for energy-constrained circuit design at the cost of acceptable performance reduction. However, when operating in the NTV region, the sensitivity of circuits to process and runtime variations increases significantly. Therefore, the performance and power consumption of a circuit are largely impacted by the variabilities, which affects the operating voltage for the most efficient computation, i.e., the minimum energy point (MEP). Accordingly, finding an optimum operating voltage for near-threshold computing (NTC) that accounts for variabilities is very challenging. In this article, we propose an MEP calibration and adaptation approach based on machine learning to tune for minimal energy operation on a per-chip basis by considering process and runtime variations. In the proposed approach, the optimal supply voltage of each chip is determined during manufacturing tests by characterizing dynamic and leakage power and at runtime by considering the impact of temperature variation. The presented method does not require costly power measurement circuitry on chip. The simulation results show that the proposed method has high MEP prediction accuracy and achieves near-optimal operation with only 1.2% higher energy consumption compared with the optimal operation.
Article
Near-threshold computing (NTC) poses stringent constraints on designing reliable circuits, as degradations have a magnified impact at lower supply voltages (Vdd) compared with super-threshold supply voltages. Phenomena such as bias temperature instability (BTI) scale down with Vdd, so their reduced degradations offset the magnified impact, and they therefore have little impact on NTC reliability. Process variation (PV) and random telegraph noise (RTN), by contrast, do not scale with Vdd and therefore become key reliability challenges in NTC. On the other hand, in super-threshold computing (STC), PV and BTI are the dominant phenomena, as BTI induces considerable degradations at nominal Vdd and PV imposes large enough shifts to matter at any supply voltage. Therefore, to allow Vdd-scaling from super- to near-threshold, we need to consider all of BTI, RTN, and PV. Ergo, we present a unified RTN and BTI model that models their shared physical origin and is validated against experimental data across a wide voltage range. Our unified model and PV model capture the joint impact of RTN, BTI, and PV within a probabilistic reliability estimation for NTC and STC circuits. We employed our proposed model to analyze the reliability of SRAM cells, showing how taking error correction codes into account is able to mitigate the deleterious effects of BTI, RTN, and PV by 36% compared with unprotected circuits.