Fig. 1: A chip multiprocessor with 16 cores

Source publication
Conference Paper
Full-text available
Increased power density, hot-spots, and temperature gradients are severe limiting factors for today’s state-of-the-art microprocessors. However, the flexibility offered by the multiple cores in future Chip Multiprocessors (CMPs) results in a great opportunity for controlling the chip thermal characteristics. When a process is to be assigned to a co...

Context in source publication

Context 1
... follows is a brief analysis of the previous equation.

Inter-Core Heat Exchange: There is heat exchange between each pair of cores in the chip, which obeys both the principle of superposition and the principle of reciprocity. Superposition, modeled by the summation in equation 1, means that the total effect of inter-core heat exchange is the sum of the effects of heat exchange between each pair of cores. Reciprocity, which states that if core A is cooler than core B, the temperature of A due to heat exchange with B will increase by the same "amount" that the temperature of B decreases, is modeled by f_c(x) = -f_c(-x) in equation 1.

Heat Abduction by the Ambient: Heat transfer to the ambient, which is the only way of cooling the chip, is a function of the difference between the core's and the ambient's temperature: the larger the difference, the larger the heat abduction rate [2,12,13].

Local Power Consumption: The local power consumption of the core is the last, but probably the most important, factor in the heat equation. As mentioned earlier, applications differ significantly in their thermal behavior. TSIC takes this diversity into account by modeling applications of five different thermal types, ranging from applications with minimal impact on temperature to "thermal viruses" (section 4.1).

The first term of equation 1 models the inter-core heat exchange, implying that such an exchange exists between any pair of cores in the chip. Skadron et al. [2] found that modeling only the heat exchange between adjacent cores (cores sharing a common edge) has minimal effect on accuracy and significantly improves the computational efficiency of the algorithm. Thermal transfer between non-adjacent cores still exists, but it is now implicit. For example, in Figure 1 the temperature of core 1 is modeled as being explicitly affected only by cores 1-L, 1-R, 1-U, and 1-D, and not by the diagonally adjacent cores (1-LU, 1-LD, 1-RU, and 1-RD). Core 1-LU, for example, still affects the temperature of core 1, but implicitly, through its effect on the temperatures of cores 1-L and 1-U (the same applies to the rest of the diagonally adjacent cores). Cores neighboring the edge of the chip can dissipate more heat to the ambient than the other cores, due to their increased "free" cross-sectional area; TSIC takes this into account by modeling an increased heat abduction rate for these cores.

Whenever a process is scheduled for execution, Process Scheduling Policies are used to determine the core on which it will run. In TSIC, the user is able to choose one of the several available policies, and the modularity of TSIC allows new policies to be implemented with just a few lines of code. The scheduling policies available in TSIC are briefly described ...
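To make the factors above concrete, the following is a minimal Python sketch of one simulation step for the 16-core grid of Figure 1. It is not the actual TSIC code: the coefficients, function names, and the toy coolest-idle-core policy are illustrative assumptions only, chosen to mirror the description (4-neighbour exchange obeying superposition and reciprocity, ambient heat abduction with an increased rate for edge cores, and local power consumption).

GRID = 4                      # 4x4 grid of cores, as in Figure 1
T_AMBIENT = 45.0              # ambient temperature in degrees C (assumed value)
K_EXCHANGE = 0.05             # inter-core heat-exchange coefficient (assumed)
K_AMBIENT = 0.02              # heat-abduction coefficient toward the ambient (assumed)
EDGE_BONUS = 1.5              # edge cores dissipate more heat ("free" cross-section)

def neighbours(r, c):
    # Only the four edge-adjacent cores are modeled explicitly (Skadron et al. [2]).
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        if 0 <= r + dr < GRID and 0 <= c + dc < GRID:
            yield r + dr, c + dc

def step(temp, power):
    # One step: superposition of pairwise exchange, ambient abduction, local power.
    # Reciprocity holds because the exchange term is odd in the temperature difference.
    new = [row[:] for row in temp]
    for r in range(GRID):
        for c in range(GRID):
            exchange = sum(K_EXCHANGE * (temp[nr][nc] - temp[r][c])
                           for nr, nc in neighbours(r, c))
            on_edge = r in (0, GRID - 1) or c in (0, GRID - 1)
            abduction = K_AMBIENT * (EDGE_BONUS if on_edge else 1.0) * (temp[r][c] - T_AMBIENT)
            new[r][c] = temp[r][c] + exchange - abduction + power[r][c]
    return new

def coolest_idle_core(temp, busy):
    # Toy scheduling policy: place the incoming process on the coolest idle core.
    idle = [(temp[r][c], (r, c)) for r in range(GRID) for c in range(GRID)
            if (r, c) not in busy]
    return min(idle)[1] if idle else None

A run would repeatedly call step() with a per-core power vector derived from each application's thermal type, which is the kind of loop a TSIC-style simulator drives; the real simulator's constants and policy set differ.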

Similar publications

Conference Paper
Full-text available
The increasing number of transistors on a chip, which plays the main role in improving the performance and increasing the speed of a microprocessor, causes a rapid increase in microprocessor design complexity. Based on Moore's Law, the number of transistors should double every 24 months. The doubling of the transistor count affects inc...
Conference Paper
Full-text available
We propose minimalist new hardware additions to a microprocessor chip that protect cryptographic keys in portable computing devices which are used in the field but owned by a central authority. Our authority-mode architecture has trust rooted in two critical secrets: a Device Root Key and a Storage Root Hash, initialized in the device by the truste...
Article
Full-text available
Application-specific instruction-set processors (ASIPs) allow the designer to extend the instruction set of the base processor with selected custom instructions to tailor-fit the application. In this paper, with the help of a motivational example, we first demonstrate that different custom instructions are vulnerable to faults with varying probabil...
Article
Full-text available
Gate-level testing, also called low-level testing, is generally appropriate at design time and for small circuits. Chip-level testing and board-level testing, also called high-level testing, are preferred when the circuit complexities are too high, making it difficult to perform low-level testing in a reasonable amount of time. The cost of...

Citations

... After detecting the dead blocks, DELICIOUS proactively evicts them from the LLC and turns off LLC ways to generate on-chip thermal buffers and to reduce the core temperature in their vicinity [11], [12]. Basically, the temperature of any on-chip component is guided by the basic superposition and reciprocity principles of heat transfer and is driven by three factors: (1) the component's own power consumption, (2) heat abduction by the ambient, and (3) conductive heat transfer with its peers [45]. Hence, prudent selection of these LLC ways for shutting down on-the-fly can potentially reduce the chip temperature [12], by (a) curtailing their own power consumption and (b) incorporating heat transfer with the peers at the generated on-chip thermal buffers, while maintaining performance. ...
Article
Full-text available
Enhancing result-accuracy in approximate computing (AC) based real-time systems, without violating power constraints of the underlying hardware, is a challenging problem. Execution of such AC real-time applications can be split into two parts: (i) the mandatory part, execution of which provides a result of acceptable quality, followed by (ii) the optional part, which can be executed partially or fully to refine the initially obtained result in order to increase the result-accuracy without violating the time constraint. This paper introduces DELICIOUS, a novel hybrid offline-online scheduling strategy for AC real-time dependent tasks. By employing an efficient heuristic algorithm, DELICIOUS first generates a schedule for a task-set with the objective of maximizing result-accuracy while respecting system-wide constraints. During execution, DELICIOUS then introduces a prudent cache resizing that reduces the temperature of the adjacent cores by generating thermal buffers at the turned-off cache ways. DELICIOUS further trades this thermal benefit for enhanced processing speed of the cores for a stipulated duration, called V/F Spiking, without violating the power budget of the core, to shorten the execution length of the tasks. The reduced runtime is exploited either to enhance result-accuracy by dynamically adjusting the optional part, or to reduce temperature by enabling sleep mode at the cores. While surpassing the prior art, DELICIOUS offers 80% result-accuracy with its scheduling strategy, which is further enhanced by 8.3% online, while reducing runtime peak temperature by 5.8°C on average, as shown by a benchmark-based evaluation on a 4-core multicore.
... The temperature of on-chip components is driven by the following factors: (a) the component's own power consumption, (b) heat abduction by the ambient, and (c) heat exchange among the peer components. The temperature of a component $T_{com}(t)$ at time $t$ can be modeled as [38]: ...
... where $T_{com}(t-1)$ is the temperature of the component $com$ at time $t-1$. $f_{gen}(P_{dyn}(t) + P_{st}(t))$ denotes the temperature generated due to its power consumption, whereas $f_{rem}(T_{b}(t-1) - T_{a})$ is the change in temperature due to heat abduction or removal by the ambient, the effective way of cooling. The last component, $f_{tr}(T_{com}(t-1) - T_{m}(t-1))$, implies the temperature change due to heat transfer among the peers ($p_{com}$), which obeys the principle of superposition and reciprocity [38]. ...
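The combined equation itself is elided in the snippets above, so the following display is only a plausible reconstruction from the terms they describe; the sign conventions (removal subtracted, transfer summed over the peers $p_{com}$) and the use of the component's own temperature in the removal term are assumptions, not a formula quoted from the citing work:

$T_{com}(t) = T_{com}(t-1) + f_{gen}\big(P_{dyn}(t) + P_{st}(t)\big) - f_{rem}\big(T_{com}(t-1) - T_{a}\big) + \sum_{m \in p_{com}} f_{tr}\big(T_{com}(t-1) - T_{m}(t-1)\big)$

Reciprocity is preserved as long as $f_{tr}$ is odd, i.e. $f_{tr}(x) = -f_{tr}(-x)$, matching the property stated for $f_c$ in the source publication.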
Article
In the era of short channel lengths, Dynamic Thermal Management (DTM) has become a challenging task for the architects and designers engineering modern Chip Multi-Processors (CMPs). The ever-increasing demand for processing power, along with advances in integration technology, produces CMPs with high power density, which in turn increases the effective chip temperature. This increased temperature aggravates reliability issues for the chip circuitry and significantly increases leakage power consumption. Recent DTM techniques apply DVFS or Task Migration to reduce temperature at the cores, the hottest on-chip components, but often ignore the hot on-chip caches. To meet the high data demand of these cores, most modern CMPs are equipped with large multi-level on-chip caches, of which the on-chip Last Level Caches (LLCs) occupy the largest on-chip area. These LLCs are responsible for significantly high leakage power consumption, which can also potentially generate on-chip hotspots at the LLCs, similar to the cores. As power consumption constructs the backbone of heat dissipation, this work dynamically shrinks the cache size, while maintaining a performance constraint, primarily to reduce LLC leakage. The turned-off cache portions further work as on-chip thermal buffers for reducing the average and peak temperature of the CMP without affecting the computation. Simulation results claim that, at a minimal performance penalty, the proposed cache-based thermal management with an 8MB centralised multi-banked shared LLC gives around a 5°C reduction in peak and average chip temperature, which is comparable with a Greedy DVFS policy.
... The temperature of on-chip components is driven by the following factors: (a) the component's own power consumption, (b) heat abduction by the ambient, and (c) heat exchange among the peer components. The temperature of a component $T_{com}(t)$ at time $t$ can be modeled as [38]: ...
... $f_{gen}(P_{dyn}(t) + P_{st}(t))$ denotes the temperature generated due to its power consumption, whereas $f_{rem}(T_{b}(t-1) - T_{a})$ is the change in temperature due to heat abduction or removal by the ambient, the effective way of cooling. The last component, $f_{tr}(T_{com}(t-1) - T_{m}(t-1))$, implies the temperature change due to heat transfer among the peers ($p_{com}$), which obeys the principle of superposition and reciprocity [38]. Before modeling temperature for our CMP (ref. Figure 9), we divided the whole CMP into three zones: (i) the core area, for which the thermal status depends on other adjacent core blocks and the neighbouring cache banks; (ii) the cache banks adjacent to the cores, where heat exchanges between the core blocks and the peer cache banks; and (iii) the other cache banks, where heat flows only among the cache banks. ...
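As a concrete illustration of the three-zone split described above, the short Python sketch below classifies each tile of a hypothetical core/cache floorplan; the layout string is an assumption introduced only for illustration, since Figure 9 of the citing work is not reproduced here.

# 'C' marks a core tile, '$' marks a cache-bank tile (hypothetical floorplan).
LAYOUT = ["CC$$",
          "CC$$",
          "$$$$",
          "$$$$"]

def zone_of(r, c):
    # Zone (i): core area; zone (ii): cache bank adjacent to a core; zone (iii): other banks.
    if LAYOUT[r][c] == "C":
        return "core area"
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(LAYOUT) and 0 <= nc < len(LAYOUT[0]) and LAYOUT[nr][nc] == "C":
            return "cache bank adjacent to cores"
    return "other cache bank"

zones = {(r, c): zone_of(r, c)
         for r in range(len(LAYOUT)) for c in range(len(LAYOUT[0]))}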
Preprint
In the era of short channel lengths, Dynamic Thermal Management (DTM) has become a challenging task for the architects and designers while engineering modern Chip Multi-Processors (CMPs). The ever-increasing demand for processing power, along with advances in integration technology, produces CMPs with high power density, which in turn increases the effective chip temperature. This increased temperature aggravates reliability issues for the chip circuitry and significantly increases leakage power consumption. Recent DTM techniques apply DVFS or Task Migration to reduce temperature at the cores, the hottest on-chip components, but often ignore the hot on-chip caches. To meet the high data demand of these cores, most modern CMPs are equipped with large multi-level on-chip caches, of which the on-chip Last Level Caches (LLCs) occupy the largest on-chip area. These LLCs are responsible for significantly high leakage power consumption, which can also potentially generate on-chip hotspots at the LLCs, similar to the cores. As power consumption constructs the backbone of heat dissipation, this work dynamically shrinks the cache size, while maintaining a performance constraint, primarily to reduce LLC leakage. The turned-off cache portions further work as on-chip thermal buffers for reducing the average and peak temperature of the CMP without affecting the computation. Simulation results claim that, at a minimal performance penalty, the proposed cache-based thermal management with an 8MB centralised multi-banked shared LLC gives around a 5°C reduction in peak and average chip temperature, which is comparable with a Greedy DVFS policy.
... The temperature of on-chip components is driven by the following factors: (a) the component's own power consumption, (b) heat abduction by the ambient, and (c) heat exchange among the peer components. The temperature of a component $T_{com}(t)$ at time $t$ can be modeled as [20]: ...
... is the change in temperature due to heat abduction or removal by the ambient, the effective way of cooling. The last component, $f_{tr}(T_{com}(t-1) - T_{m}(t-1))$, implies the temperature change due to heat transfer among the peers ($p_{com}$), which obeys the principle of superposition and reciprocity [20]. ...
... Dynamic temperature of a tile is driven by the following three factors: (a) the component's own power consumption, (b) heat abduction by the ambient and (c) heat exchange among the peer components, which can be modeled as [43]: ...
... $f_{rem}(T_{i}(t-1) - T_{a})$ is the temperature change due to heat removal by the ambient, the most effective way of cooling. Finally, $f_{tr}(T_{i}(t-1) - T_{m}(t-1))$ implies the temperature change due to heat transfer among the peer tiles ($T_{p}$), which obeys the principle of superposition and reciprocity [43]. ...
Article
Full-text available
Dynamic Thermal Management (DTM) has become a major concern for chip designers, as it is a challenging task in recent power-dense, high-performance Chip Multi-Processors (CMPs), due to the integration of more on-chip components to meet the ever-increasing demand for processing power. The increased chip temperature introduces severe circuit errors along with a significant increase in leakage power consumption. Traditional DTM techniques apply DVFS or task migration to reduce core temperature, as cores are considered the hottest on-chip components. Additionally, to meet the high data demand of these high-performance cores, large on-chip Last Level Caches (LLCs) are attached, which are the principal contributors to on-chip leakage power consumption and occupy the largest on-chip area. As power consumption reduction plays the pivotal role in temperature reduction, this work dynamically shrinks the cache size not only to reduce leakage power consumption, but also to create on-chip thermal buffers that reduce the average chip temperature by exploiting heat transfer physics. Cache resizing decisions are taken based upon the generated cache hotspots and/or the access patterns during process execution. Simulation results of the proposed thermal management method are compared with an existing DVFS-based method (at the cores) and a prior drowsy-cache-based technique to show its effectiveness.
... From an engineering perspective, knowledge of a surface geometry that maximizes the transport rate offers opportunities for new designs that exhibit enhanced characteristics and properties. For example, the problem of transport across an uneven surface, described in the preceding paragraph, is relevant to a variety of engineering applications involving heat transfer across rough and irregular boundaries, such as the surface of a circuit board in microelectronics [22,23,24,25]. In general, heat transfer in slab-like configurations is of interest to problems associated with Heat Transport from Extended Surfaces (Fins) [8] and inverted high conductivity fins/inserts [26]. ...
Article
Full-text available
We consider the heat transfer problem associated with a periodic array of extended surfaces (fins) subjected to convection heat transfer with a uniform heat transfer coefficient. Our analysis differs from the classical approach as (i) we consider two-dimensional heat conduction and (ii) the base of the fin is included in the heat transfer process. The problem is modeled as an arbitrary two-dimensional channel whose upper surface is flat and isothermal, while the lower surface has a periodic array of extensions/fins which are subjected to heat convection with a uniform heat transfer coefficient. Using the generalized Schwarz-Christoffel transformation, the domain is mapped onto a straight channel where the heat conduction problem is solved using the boundary element method. The boundary element solution is subsequently used to pose a shape optimization problem, i.e. an inverse problem, where the objective function is the normalized Shape Factor and the variables of the optimization are the parameters of the Schwarz-Christoffel transformation. Numerical optimization suggests that the optimum fin is infinitely thin and that there exists a critical Biot number that characterizes whether the addition of the fin would result in an enhancement of heat transfer. The existence of a critical Biot number was investigated for the case of rectangular fins. It is concluded that a rectangular fin is effective if its thickness is less than $1.64 k/h$, where $h$ is the heat transfer coefficient and $k$ is the thermal conductivity. This result is independent of both the thickness of the base and the length of the fin.
... From an engineering perspective, knowledge of a surface geometry that maximizes the transport rate offers opportunities for new designs that exhibit enhanced characteristics and properties. For example, the problem of transport across an uneven surface, described in the preceding paragraph, is relevant to a variety of engineering applications involving heat transfer across rough and irregular boundaries, such as the surface of a circuit board in microelectronics [27,6,28,29,30]. In general, heat transfer in slab-like configurations is of interest to problems associated with heat transport from extended surfaces (fins) [8], inverted high conductivity fins/inserts [6,31], and embedded heating tubes [2,32,33,34,35]. ...
... WSN have long been regarded as a venue for distributed processing [5][4][3][6][20], where TSA policies are essential, and developing a TSA simulator is of the same importance in WSN as it is in multicore processors [9] or in grid computing [10] contexts. ETSSI, an energy-based task scheduling simulator for WSN, was presented. ...
... The TSA simulator will provide a common platform for the fair evaluation of different TSA policies and will greatly facilitate the testing of various TSA policies under different network conditions. A TSA simulator for WSN has the same importance as those in multicore systems [9] and grid computing [10] contexts, especially since WSN is being regarded as a venue for distributed processing and computing. ETSSI, an Energy-based Task Scheduling Simulator, is proposed in this work. ...
Conference Paper
Distributed processing has been a viable solution for enabling the next generation of real-time wireless sensor networks (WSN). Efficient task scheduling and allocation (TSA) policies guarantee the efficiency of that distribution. However, TSA policies in WSN face the challenges imposed by the wireless communication medium, which makes the accurate evaluation and verification of TSA policies difficult in live systems. Hence, developing a TSA simulator becomes essential to decrease the time needed for the successful development and testing of relevant algorithms. This work addresses the need for a TSA simulator for WSN and develops ETSSI, an Energy-based Task Scheduling Simulator. ETSSI is an event-driven, scalable simulator which provides a user-friendly graphical interface. Its accuracy is more than 80% compared to indoor live implementations on a test bed of TelosB nodes. Most importantly, the TSA policy designer using ETSSI is only concerned about the application model, not the actual application implementation, which is mandatory in today's WSN simulators.
... laptops and desktop computers) requires multiple CPU units, each of which is a significant source of thermal energy. Thermal management in these systems is a topic of active research [7, 8, 9]. Moreover, both the problem formulated in this work and the results are relevant to a class of manufacturing processes related to thermal processing or operation of layered structures: (i) self-curing/bonding of laminate polymer matrix composites (PMC) where the heat is produced internally by conductive strips [10, 11, 12, 13], and (ii) internal (self) rapid thermal processing of semiconductor structures through embedded strips of nanoheaters [14, 15]. ...
Article
We address the problem of two-dimensional heat conduction in a solid slab embedded with a periodic array of isothermal strips. The surfaces of the slab are subjected to a convective heat transfer boundary condition with a uniform heat transfer coefficient. Similar to the concept of critical insulation radius, associated with cylindrical and spherical configurations, we show that there exists a critical insulation thickness, associated with the slab, such that the total thermal resistance attains a minimum, i.e. a maximum heat transfer rate can be achieved. This result, which is not observed in one-dimensional heat conduction in a plane wall, is a consequence of the non-trivial coupling between conduction and convection that results in a 2D temperature distribution in the slab, and a non-uniform temperature on the surface of the slab. The findings of this work offer opportunities for improving the design of a broad range of engineering processes and products.
... The contributions of this paper are the identification of the thermal issues that arise from the technological evolution of CMP chips, as well as the proposal and evaluation of a thermal-aware scheduling algorithm with two optimizations: thermal threshold and neighborhood awareness. To evaluate the proposed techniques, we used the TSIC simulator [29]. The experimental results for future CMP chip configurations showed that simple thermal-aware scheduling algorithms may result in significant performance degradation, as the temperature of the cores often reaches the maximum allowed value, consequently triggering DTM events. ...
... The simulator used is the Thermal Scheduling SImulator for Chip Multiprocessors (TSIC) [29], which has been developed specifically to study thermal-aware scheduling on chip multiprocessors. TSIC models CMPs with different numbers of cores and enables studies exploring several other parameters, such as the maximum allowed chip temperature, chip utilization, chip size, migration events, and scheduling algorithms. ...
Article
Full-text available
The increased complexity and operating frequencies of current single-chip microprocessors are resulting in diminishing performance improvements. Consequently, major manufacturers offer chip multiprocessor (CMP) architectures in order to keep up with the expected performance gains. This architecture is successfully being introduced in many markets, including that of embedded systems. Nevertheless, the integration of several cores onto the same chip may lead to increased heat dissipation and, consequently, additional costs for cooling, higher power consumption, decreased reliability, and thermal-induced performance loss, among others. In this paper, we analyze the evolution of the thermal issues for future chip multiprocessor architectures and show that as the number of on-chip cores increases, the thermal-induced problems will worsen. In addition, we present several scenarios that result in excessive thermal stress to the CMP chip or significant performance loss. In order to minimize or even eliminate these problems, we propose thermal-aware scheduler (TAS) algorithms. When assigning processes to cores, TAS takes their temperature and cooling ability into account in order to avoid thermal stress and at the same time improve performance. Experimental results have shown that a TAS algorithm that also considers the temperatures of neighboring cores is able to significantly reduce the temperature-induced performance loss while, at the same time, decreasing the chip's temperature across many different operation and configuration scenarios.