Components of the power-management unit (PMU) and its interaction with the floating-point pipeline. The PMU controls the clock-enable and power-gate-enable signals for the floating-point unit (FPU).

Source publication

IA-32 Processor with a Wide-Voltage-Operating Range in 32-nm CMOS

Article

Mar 2013

Designing a microprocessor that's efficient across a wide-voltage-operating range requires overcoming a variety of microarchitecture and circuit design challenges. In this article, the authors demonstrate their IA-32 processor, built in 32-nm CMOS technology, which can operate efficiently between 280 mV and 1.2 V. They also discuss some of...

Context 1

... PMU controls the distributed sleep devices and enables the clock throughout the floating-point unit (FPU). It monitors the stream of instruction IDs along with any stall or reset activity in the FPU pipeline (see Figure 2). When it detects an idle condition, it signals the programmable idle-threshold counter to clock gate and power gate the FPU after the first idle. ...
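The idle-detection behaviour described in this excerpt can be illustrated with a small cycle-level model. This is only a sketch under stated assumptions, not the authors' design: the class name `IdleGateModel`, the two separate thresholds `clock_gate_after` and `power_gate_after`, their default values, and the rule that a stall or reset counts as activity are all hypothetical, since the excerpt is truncated and gives no threshold values or wake-up behaviour.

```python
from dataclasses import dataclass


@dataclass
class IdleGateModel:
    """Toy model of a programmable idle-threshold counter (all names hypothetical)."""
    clock_gate_after: int = 2   # assumed: idle cycles before the FPU clock is gated
    power_gate_after: int = 8   # assumed: idle cycles before the sleep devices turn off
    idle_cycles: int = 0
    clock_enabled: bool = True
    power_enabled: bool = True

    def step(self, fp_instr_valid: bool, stall: bool = False, reset: bool = False) -> None:
        """Advance one cycle given the monitored FPU pipeline activity."""
        if reset or stall or fp_instr_valid:
            # Assumption: any instruction, stall, or reset activity keeps
            # (or makes) the FPU fully enabled and restarts the idle count.
            self.idle_cycles = 0
            self.clock_enabled = True
            self.power_enabled = True
            return
        # No activity this cycle: count it and gate once a threshold is reached.
        self.idle_cycles += 1
        if self.idle_cycles >= self.clock_gate_after:
            self.clock_enabled = False
        if self.idle_cycles >= self.power_gate_after:
            self.power_enabled = False


def run(trace, **thresholds):
    """Feed a list of per-cycle 'FP instruction valid' flags through the model."""
    model = IdleGateModel(**thresholds)
    states = []
    for valid in trace:
        model.step(fp_instr_valid=valid)
        states.append((model.clock_enabled, model.power_enabled))
    return states


if __name__ == "__main__":
    # One busy cycle, three idle cycles, then a new FP instruction arrives.
    for cycle, state in enumerate(run([True, False, False, False, True],
                                      clock_gate_after=2, power_gate_after=3)):
        print(cycle, state)
```

The model ignores wake-up latency and the energy cost of toggling the sleep devices, both of which a real threshold setting would have to weigh.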

The microarchitecture of a multi-threaded RISC-V compliant processing core family for IoT end-nodes

Article

Dec 2017

Internet-of-Things end-nodes demand low-power processing platforms characterized by heterogeneous dedicated units, controlled by a processor core running concurrent control threads. Such an architecture scheme fits one of the main target application domains of the RISC-V instruction set. We present an open-source processing core compliant with RISC-V o...

Towards Self-Adaptive Caches: a Run-Time Reconfigurable Multi-Core Infrastructure

Conference Paper

Jan 2014

This paper presents the first steps towards the implementation of an evolvable and self-adaptable processor cache. The implemented system consists of a run-time reconfigurable memory-to-cache address mapping engine embedded into the split level-one cache of a Leon3 SPARC processor, as well as a measurement infrastructure able to profile microarc...

Exploiting Read/Write Asymmetry to Achieve Opportunistic SRAM Voltage Switching in Dual-Supply Near-Threshold Processors

Preprint

Sep 2018

Energy-efficient microprocessors are essential for a wide range of applications. While near-threshold computing is a promising technique to improve energy efficiency, the optimal supply demands of the logic core and on-chip memory conflict. In this paper, we perform static reliability analysis of 6T SRAM and discover variance among different sizing configurations and asymmetric minimum voltage requirements between read and write operations. We leverage this asymmetric property in near-threshold processors equipped with voltage boosting capability by proposing an opportunistic dual-supply switching scheme with a write aggregation buffer. Our results show that the proposed technique improves energy efficiency by more than 21.45% with an approximately 10.19% performance speed-up.

Exploiting Read/Write Asymmetry to Achieve Opportunistic SRAM Voltage Switching in Dual-Supply Near-Threshold Processors

Article

Aug 2018

Energy-efficient microprocessors are essential for a wide range of applications. While near-threshold computing is a promising technique to improve energy efficiency, the optimal supply demands of the logic core and on-chip memory conflict. In this paper, we perform static reliability analysis of 6T SRAM and discover variance among different sizing configurations and asymmetric minimum voltage requirements between read and write operations. We leverage this asymmetric property in near-threshold processors equipped with voltage boosting capability by proposing an opportunistic dual-supply switching scheme with a write aggregation buffer. Our results show that the proposed technique improves energy efficiency by more than 21.45% with an approximately 10.19% performance speed-up.

SRAM based opportunistic energy efficiency improvement in dual-supply near-threshold processors

Conference Paper

Jun 2018

Energy-efficient microprocessors are essential for a wide range of applications. While near-threshold computing is a promising technique to improve energy efficiency, the optimal supply demands of the logic core and on-chip memory conflict. In this paper, we perform reliability analysis of 6T SRAM and discover imbalanced minimum voltage requirements between read and write operations. We leverage this imbalance in near-threshold processors equipped with voltage boosting capability by proposing an opportunistic dual-supply switching scheme with a write aggregation buffer. Our results show that the proposed technique improves energy efficiency by more than 18% with an approximately 8.54% performance speed-up.

Fine-Grained Voltage Boosting for Improving Yield in Near-Threshold Many-Core Processors

Conference Paper

May 2015

Process variation is a major impediment in optimizing yield, energy, and performance in near-threshold many-core processors. In this paper, we present a comprehensive analysis of yield losses in near-threshold many-core processors. Based on our analysis, we propose energy-efficient yield improvement techniques for near-threshold many-core processors: SRAM cell arrays and Wordline driver voltage Boosting (SWBoost) and Cache voltage Boosting (CBoost). Results reveal that SWBoost and CBoost improve chip yield by up to 66% and 83%, respectively. Furthermore, the runtime energy overheads of SWBoost and CBoost are only 0.46% and 0.54%, respectively, which are much lower than those of conventional voltage boosting techniques.

A 28 nm DSP Powered by an On-Chip LDO for High-Performance and Energy-Efficient Mobile Applications

Article

Dec 2014
IEEE J SOLID-ST CIRC

Baker Mohammad

This paper describes the implementation of a Qualcomm Hexagon digital signal processor (DSP) in a 28 nm high-κ metal gate technology. The DSP is a multi-threaded very-long-instruction-word (VLIW) machine optimized for low leakage and energy efficiency. It uses a clock distribution network, clock gating cells, and pulsed latches that are optimized for low switching energy. The processor can be powered using a low-dropout (LDO) voltage regulator or a head switch. It operates from 255 MHz at 0.60 V to 1.24 GHz at 1.05 V. When operating from the LDO, the power consumption of the core can be as low as 58 μW/MHz, which is two to three times lower than comparable cores optimized for ultra-low voltage operation.

Multicore Processors: Challenges, Opportunities, Emerging Trends, Proceedings Embedded World Conference 2014, 25-27 February, 2014, Nuremberg, Germany, Design & Elektronik, 2014

Conference Paper

Feb 2014

Christian Märtin

This paper undertakes a critical review of the current challenges in multicore processor evolution, underlying trends, and design decisions for future multicore processor implementations. It is first shown that, to keep up with Moore's law during the last decade, the VLSI scaling rules for processor design had to be dramatically changed. In future multicore designs, large quantities of dark silicon will be unavoidable, and chip architects will have to find new ways of balancing further performance gains, energy efficiency, and software complexity. The paper compares the various architectural alternatives on the basis of specific analytical models for multicore systems. Examples of leading commercial multicore processors and architectural research trends are given to underscore the dramatic changes lying ahead in computer architecture and multicore processor design.

A power-performance balanced network-on-chip for mixed CPU-GPU systems

Chapter

Jan 2022
ADV COMPUT

CPU-GPU integrated systems are emerging as a high-performance and easily programmable heterogeneous platform to facilitate development of data-parallel software. Network-intensive GPU workloads generate high on-chip traffic, producing local congestion near hot Last Level Cache (LLC) banks and drastically harming CPU performance. Congestion-optimized on-chip network designs can mitigate this problem through their large virtual and physical channel resources. However, when there is little or no GPU traffic, such networks become suboptimal, as they exhibit higher unloaded packet latencies due to their longer critical path delays. In this chapter, we introduce BiNoCHS, a reconfigurable voltage-scalable on-chip network for CPU-GPU heterogeneous systems. Under CPU-dominated low-traffic scenarios, BiNoCHS operates at nominal voltage and high clock frequency with a topology optimized for low hop count and a simple routing strategy, maximizing CPU performance. Under high-intensity GPU/mixed workloads, it transitions to a near-threshold mode, activating additional routers/channels and adaptive routing to resolve congestion. Our evaluation results demonstrate that BiNoCHS improves CPU/GPU performance by an average of 57.3%/33.6% over a latency-optimized network under congestion, while improving CPU performance by 32.8% over a high-bandwidth design in unloaded scenarios.

Achieving Energy Efficiency for Near-Threshold Circuits Through Postfabrication Calibration and Adaptation

Article

Dec 2019
IEEE T VLSI SYST

Scaling supply voltage to the near-threshold voltage (NTV) region is an effective approach for energy-constrained circuit design at the cost of an acceptable performance reduction. However, in the NTV region, the sensitivity of circuits to process and runtime variations increases significantly. Therefore, the performance and power consumption of a circuit are largely impacted by the variabilities, which affect the operating voltage for the most efficient computation, i.e., the minimum energy point (MEP). Accordingly, finding an optimum operating voltage for near-threshold computing (NTC) that accounts for variabilities is very challenging. In this article, we propose an MEP calibration and adaptation approach based on machine learning to tune for minimal energy operation on a per-chip basis by considering process and runtime variations. In the proposed approach, the optimal supply voltage of each chip is determined during manufacturing tests by characterizing dynamic and leakage power and at runtime by considering the impact of temperature variation. The presented method does not require costly on-chip power measurement circuitry. The simulation results show that the proposed method has high MEP prediction accuracy and achieves near-optimal operation, with only 1.2% higher energy consumption than the optimal operation.

Reliability in Super- and Near-Threshold Computing: A Unified Model of RTN, BTI and PV

Article

Jun 2017
IEEE T CIRCUITS-I

Near-threshold computing (NTC) poses stringent constraints on designing reliable circuits, as degradations have a magnified impact at lower supply voltages (Vdd) compared with super-threshold supply voltages. Phenomena such as bias temperature instability (BTI) scale down with Vdd; the reduced degradation mitigates their magnified impact, so they have little effect on NTC reliability. Process variation (PV) and random telegraph noise (RTN), by contrast, do not scale with Vdd and therefore become key reliability challenges in NTC. On the other hand, in super-threshold computing (STC), PV and BTI are the dominant phenomena, as BTI induces considerable degradations at nominal Vdd and PV imposes large enough shifts to matter at any supply voltage. Therefore, to allow Vdd scaling from super- to near-threshold, we need to consider all of BTI, RTN, and PV. We present a unified RTN and BTI model that models their shared physical origin and is validated against experimental data across a wide voltage range. Our unified model and PV model capture the joint impact of RTN, BTI, and PV within a probabilistic reliability estimation for NTC and STC circuits. We employed our proposed model to analyze the reliability of SRAM cells, showing how taking error correction codes into account mitigates the deleterious effects of BTI, RTN, and PV by 36% compared with unprotected circuits.
