Fig. 1: A chip multiprocessor with 16 cores

Source publication
Conference Paper
Full-text available
Increased power density, hot-spots, and temperature gradients are severe limiting factors for today’s state-of-the-art microprocessors. However, the flexibility offered by the multiple cores in future Chip Multiprocessors (CMPs) results in a great opportunity for controlling the chip thermal characteristics. When a process is to be assigned to a co...

Context in source publication

Context 1
... follows is a brief analysis of the previous equation.

Inter-Core Heat Exchange: There is heat exchange between each pair of cores in the chip, which obeys both the principle of superposition and the principle of reciprocity. Superposition, modeled by the summation in equation 1, means that the total effect of inter-core heat exchange is the sum of the effects of heat exchange between each pair of cores. Reciprocity, which states that if core A is cooler than core B, the temperature of A due to heat exchange with B will increase by the same "amount" that the temperature of B decreases, is modeled by f_c(x) = -f_c(-x) in equation 1.

Heat Abduction by the Ambient: Heat transfer to the ambient, which is the only way of cooling the chip, is a function of the difference between the core's and the ambient's temperature: the larger the difference, the larger the heat abduction rate [2,12,13].

Local Power Consumption: The local power consumption of the core is the last, but probably the most important, factor in the heat equation. As mentioned earlier, applications differ significantly in their thermal behavior. TSIC takes this diversity into account by modeling applications of five different thermal types, ranging from applications with minimal impact on temperature to "thermal viruses" (section 4.1).

The first term of equation 1 models the inter-core heat exchange, implying that such an exchange exists between any pair of cores in the chip. Skadron et al. [2] found that modeling only the heat exchange between adjacent cores (cores sharing a common edge) has minimal effect on accuracy and significantly improves the computational efficiency of the algorithm. Thermal transfer between non-adjacent cores still exists, but it is now implicit. For example, in Figure 1 the temperature of core 1 is modeled as being explicitly affected only by cores 1-L, 1-R, 1-U, and 1-D, and not by the diagonally adjacent cores (1-LU, 1-LD, 1-RU, and 1-RD). Core 1-LU, for example, still affects the temperature of core 1, but implicitly, through its effect on the temperatures of cores 1-L and 1-U (the same applies to the rest of the diagonally adjacent cores). Cores neighboring the edge of the chip can dissipate more heat to the ambient than the other cores, due to their increased "free" cross-sectional area; TSIC takes this into account by modeling an increased heat abduction rate for these cores.

Whenever a process is scheduled for execution, Process Scheduling Policies are used to determine the core on which it will run. In TSIC, the user is able to choose one of the several available policies, and the modularity of TSIC allows new policies to be implemented with just a few lines of code. The scheduling policies available in TSIC are briefly described ...
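To make the factors above concrete, the following is a minimal Python sketch of one simulation step for the 16-core grid of Figure 1. It is not the actual TSIC code: the coefficients, function names, and the toy coolest-idle-core policy are illustrative assumptions only, chosen to mirror the description (4-neighbour exchange obeying superposition and reciprocity, ambient heat abduction with an increased rate for edge cores, and local power consumption).

GRID = 4                      # 4x4 grid of cores, as in Figure 1
T_AMBIENT = 45.0              # ambient temperature in degrees C (assumed value)
K_EXCHANGE = 0.05             # inter-core heat-exchange coefficient (assumed)
K_AMBIENT = 0.02              # heat-abduction coefficient toward the ambient (assumed)
EDGE_BONUS = 1.5              # edge cores dissipate more heat ("free" cross-section)

def neighbours(r, c):
    # Only the four edge-adjacent cores are modeled explicitly (Skadron et al. [2]).
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        if 0 <= r + dr < GRID and 0 <= c + dc < GRID:
            yield r + dr, c + dc

def step(temp, power):
    # One step: superposition of pairwise exchange, ambient abduction, local power.
    # Reciprocity holds because the exchange term is odd in the temperature difference.
    new = [row[:] for row in temp]
    for r in range(GRID):
        for c in range(GRID):
            exchange = sum(K_EXCHANGE * (temp[nr][nc] - temp[r][c])
                           for nr, nc in neighbours(r, c))
            on_edge = r in (0, GRID - 1) or c in (0, GRID - 1)
            abduction = K_AMBIENT * (EDGE_BONUS if on_edge else 1.0) * (temp[r][c] - T_AMBIENT)
            new[r][c] = temp[r][c] + exchange - abduction + power[r][c]
    return new

def coolest_idle_core(temp, busy):
    # Toy scheduling policy: place the incoming process on the coolest idle core.
    idle = [(temp[r][c], (r, c)) for r in range(GRID) for c in range(GRID)
            if (r, c) not in busy]
    return min(idle)[1] if idle else None

A run would repeatedly call step() with a per-core power vector derived from each application's thermal type, which is the kind of loop a TSIC-style simulator drives; the real simulator's constants and policy set differ.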

Similar publications

Conference Paper
Full-text available
The increasing number of transistors on a chip, which plays the main role in improving the performance and increasing the speed of a microprocessor, causes a rapid increase in microprocessor design complexity. Based on Moore's Law, the number of transistors should double every 24 months. The doubling of the transistor count affects inc...
Conference Paper
Full-text available
We propose minimalist new hardware additions to a microprocessor chip that protect cryptographic keys in portable computing devices which are used in the field but owned by a central authority. Our authority-mode architecture has trust rooted in two critical secrets: a Device Root Key and a Storage Root Hash, initialized in the device by the truste...
Article
Full-text available
Application-specific instruction-set processors (ASIPs) allow the designer to extend the instruction set of the base processor with selected custom instructions to tailor-fit the application. In this paper, with the help of a motivational example, we first demonstrate that different custom instructions are vulnerable to faults with varying probabil...
Article
Full-text available
Gate-level testing, also called low-level testing, is generally appropriate at design time and for small circuits. Chip-level testing and board-level testing, also called high-level testing, are preferred when the circuit complexities are too high, making it difficult to perform low-level testing in a reasonable amount of time. The cost of...

Citations

... After detecting the dead blocks, DELICIOUS proactively evicts them from the LLC and turns off LLC ways to generate on-chip thermal buffers and to reduce the core temperature in their vicinity [11], [12]. Basically, the temperature of any on-chip component is guided by the basic superposition and reciprocity principles of heat transfer and is driven by three factors: (1) the component's own power consumption, (2) heat abduction by the ambient, and (3) conductive heat transfer with its peers [45]. Hence, prudent selection of these LLC ways for shutting down on-the-fly can potentially reduce the chip temperature [12], by (a) curtailing their own power consumption and (b) incorporating heat transfer with the peers at the generated on-chip thermal buffers, while maintaining performance. ...
Article
Full-text available
Enhancing result-accuracy in approximate computing (AC) based real-time systems, without violating power constraints of the underlying hardware, is a challenging problem. Execution of such AC real-time applications can be split into two parts: (i) the mandatory part, execution of which provides a result of acceptable quality, followed by (ii) the optional part, which can be executed partially or fully to refine the initially obtained result in order to increase the result-accuracy without violating the time constraint. This paper introduces DELICIOUS, a novel hybrid offline-online scheduling strategy for AC real-time dependent tasks. By employing an efficient heuristic algorithm, DELICIOUS first generates a schedule for a task-set with the objective of maximizing result-accuracy while respecting system-wide constraints. During execution, DELICIOUS then introduces a prudent cache resizing that reduces the temperature of the adjacent cores by generating thermal buffers at the turned-off cache ways. DELICIOUS further trades this thermal benefit for enhanced processing speed of the cores for a stipulated duration, called V/F Spiking, without violating the power budget of the core, to shorten the execution length of the tasks. The reduced runtime is exploited either to enhance result-accuracy by dynamically adjusting the optional part, or to reduce temperature by enabling sleep mode at the cores. While surpassing the prior art, DELICIOUS offers 80% result-accuracy with its scheduling strategy, which is further enhanced by 8.3% online, while reducing runtime peak temperature by 5.8°C on average, as shown by a benchmark-based evaluation on a 4-core multicore.
... The temperature of on-chip components is driven by the following factors: (a) the component's own power consumption, (b) heat abduction by the ambient, and (c) heat exchange among the peer components. The temperature of a component $T_{com}(t)$ at time $t$ can be modeled as [38]: ...
... where $T_{com}(t-1)$ is the temperature of the component $com$ at time $t-1$. $f_{gen}(P_{dyn}(t) + P_{st}(t))$ denotes the temperature generated due to its power consumption, whereas $f_{rem}(T_{b}(t-1) - T_{a})$ is the change in temperature due to heat abduction or removal by the ambient, the effective way of cooling. The last component, $f_{tr}(T_{com}(t-1) - T_{m}(t-1))$, implies the temperature change due to heat transfer among the peers ($p_{com}$), which obeys the principle of superposition and reciprocity [38]. ...
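The combined equation itself is elided in the snippets above, so the following display is only a plausible reconstruction from the terms they describe; the sign conventions (removal subtracted, transfer summed over the peers $p_{com}$) and the use of the component's own temperature in the removal term are assumptions, not a formula quoted from the citing work:

$T_{com}(t) = T_{com}(t-1) + f_{gen}\big(P_{dyn}(t) + P_{st}(t)\big) - f_{rem}\big(T_{com}(t-1) - T_{a}\big) + \sum_{m \in p_{com}} f_{tr}\big(T_{com}(t-1) - T_{m}(t-1)\big)$

Reciprocity is preserved as long as $f_{tr}$ is odd, i.e. $f_{tr}(x) = -f_{tr}(-x)$, matching the property stated for $f_c$ in the source publication.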
Article
In the era of short channel lengths, Dynamic Thermal Management (DTM) has become a challenging task for the architects and designers engineering modern Chip Multi-Processors (CMPs). The ever-increasing demand for processing power, along with advances in integration technology, produces CMPs with high power density, which in turn increases the effective chip temperature. This increased temperature aggravates reliability issues for the chip circuitry and significantly increases leakage power consumption. Recent DTM techniques apply DVFS or Task Migration to reduce temperature at the cores, the hottest on-chip components, but often ignore the hot on-chip caches. To meet the high data demand of these cores, most modern CMPs are equipped with large multi-level on-chip caches, of which the on-chip Last Level Caches (LLCs) occupy the largest on-chip area. These LLCs are responsible for significantly high leakage power consumption, which can also potentially generate on-chip hotspots at the LLCs, similar to the cores. As power consumption constructs the backbone of heat dissipation, this work dynamically shrinks the cache size, while maintaining a performance constraint, primarily to reduce LLC leakage. The turned-off cache portions further work as on-chip thermal buffers for reducing the average and peak temperature of the CMP without affecting the computation. Simulation results claim that, at a minimal performance penalty, the proposed cache-based thermal management with an 8MB centralised multi-banked shared LLC gives around a 5°C reduction in peak and average chip temperature, which is comparable with a Greedy DVFS policy.
... The temperature of on-chip components is driven by the following factors: (a) the component's own power consumption, (b) heat abduction by the ambient, and (c) heat exchange among the peer components. The temperature of a component $T_{com}(t)$ at time $t$ can be modeled as [38]: ...
... $f_{gen}(P_{dyn}(t) + P_{st}(t))$ denotes the temperature generated due to its power consumption, whereas $f_{rem}(T_{b}(t-1) - T_{a})$ is the change in temperature due to heat abduction or removal by the ambient, the effective way of cooling. The last component, $f_{tr}(T_{com}(t-1) - T_{m}(t-1))$, implies the temperature change due to heat transfer among the peers ($p_{com}$), which obeys the principle of superposition and reciprocity [38]. Before modeling temperature for our CMP (ref. Figure 9), we divided the whole CMP into three zones: (i) the core area, for which the thermal status depends on other adjacent core blocks and the neighbouring cache banks; (ii) the cache banks adjacent to the cores, where heat exchanges between the core blocks and the peer cache banks; and (iii) the other cache banks, where heat flows only among the cache banks. ...
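As a concrete illustration of the three-zone split described above, the short Python sketch below classifies each tile of a hypothetical core/cache floorplan; the layout string is an assumption introduced only for illustration, since Figure 9 of the citing work is not reproduced here.

# 'C' marks a core tile, '$' marks a cache-bank tile (hypothetical floorplan).
LAYOUT = ["CC$$",
          "CC$$",
          "$$$$",
          "$$$$"]

def zone_of(r, c):
    # Zone (i): core area; zone (ii): cache bank adjacent to a core; zone (iii): other banks.
    if LAYOUT[r][c] == "C":
        return "core area"
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(LAYOUT) and 0 <= nc < len(LAYOUT[0]) and LAYOUT[nr][nc] == "C":
            return "cache bank adjacent to cores"
    return "other cache bank"

zones = {(r, c): zone_of(r, c)
         for r in range(len(LAYOUT)) for c in range(len(LAYOUT[0]))}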
Preprint
In the era of short channel lengths, Dynamic Thermal Management (DTM) has become a challenging task for the architects and designers while engineering modern Chip Multi-Processors (CMPs). The ever-increasing demand for processing power, along with advances in integration technology, produces CMPs with high power density, which in turn increases the effective chip temperature. This increased temperature aggravates reliability issues for the chip circuitry and significantly increases leakage power consumption. Recent DTM techniques apply DVFS or Task Migration to reduce temperature at the cores, the hottest on-chip components, but often ignore the hot on-chip caches. To meet the high data demand of these cores, most modern CMPs are equipped with large multi-level on-chip caches, of which the on-chip Last Level Caches (LLCs) occupy the largest on-chip area. These LLCs are responsible for significantly high leakage power consumption, which can also potentially generate on-chip hotspots at the LLCs, similar to the cores. As power consumption constructs the backbone of heat dissipation, this work dynamically shrinks the cache size, while maintaining a performance constraint, primarily to reduce LLC leakage. The turned-off cache portions further work as on-chip thermal buffers for reducing the average and peak temperature of the CMP without affecting the computation. Simulation results claim that, at a minimal performance penalty, the proposed cache-based thermal management with an 8MB centralised multi-banked shared LLC gives around a 5°C reduction in peak and average chip temperature, which is comparable with a Greedy DVFS policy.
... The temperature of on-chip components is driven by the following factors: (a) the component's own power consumption, (b) heat abduction by the ambient, and (c) heat exchange among the peer components. The temperature of a component $T_{com}(t)$ at time $t$ can be modeled as [20]: ...
... is the change in temperature due to heat abduction or removal by the ambient, the effective way of cooling. The last component, $f_{tr}(T_{com}(t-1) - T_{m}(t-1))$, implies the temperature change due to heat transfer among the peers ($p_{com}$), which obeys the principle of superposition and reciprocity [20]. ...
... Dynamic temperature of a tile is driven by the following three factors: (a) the component's own power consumption, (b) heat abduction by the ambient and (c) heat exchange among the peer components, which can be modeled as [43]: ...
... $f_{rem}(T_{i}(t-1) - T_{a})$ is the temperature change due to heat removal by the ambient, the most effective way of cooling. Finally, $f_{tr}(T_{i}(t-1) - T_{m}(t-1))$ implies the temperature change due to heat transfer among the peer tiles ($T_{p}$), which obeys the principle of superposition and reciprocity [43]. ...
Article
Full-text available
Dynamic Thermal Management (DTM) has become a major concern for chip designers, as it is a challenging task in recent power-dense, high-performance Chip Multi-Processors (CMPs), due to the integration of more on-chip components to meet the ever-increasing demand for processing power. The increased chip temperature introduces severe circuit errors along with a significant increase in leakage power consumption. Traditional DTM techniques apply DVFS or task migration to reduce core temperature, as cores are considered the hottest on-chip components. Additionally, to meet the high data demand of these high-performance cores, large on-chip Last Level Caches (LLCs) are attached, which are the principal contributors to on-chip leakage power consumption and occupy the largest on-chip area. As power consumption reduction plays the pivotal role in temperature reduction, this work dynamically shrinks the cache size not only to reduce leakage power consumption, but also to create on-chip thermal buffers that reduce the average chip temperature by exploiting heat transfer physics. Cache resizing decisions are taken based upon the generated cache hotspots and/or the access patterns during process execution. Simulation results of the proposed thermal management method are compared with an existing DVFS-based method (at the cores) and a prior drowsy-cache-based technique to show its effectiveness.
... From an engineering perspective, knowledge of a surface geometry that maximizes the transport rate offers opportunities for new designs that exhibit enhanced characteristics and properties. For example, the problem of transport across an uneven surface, described in the preceding paragraph, is relevant to a variety of engineering applications involving heat transfer across rough and irregular boundaries, such as the surface of a circuit board in microelectronics [22,23,24,25]. In general, heat transfer in slab-like configurations is of interest to problems associated with Heat Transport from Extended Surfaces (Fins) [8] and inverted high conductivity fins/inserts [26]. ...
Article
Full-text available
We consider the heat transfer problem associated with a periodic array of extended surfaces (fins) subjected to convection heat transfer with a uniform heat transfer coefficient. Our analysis differs from the classical approach as (i) we consider two-dimensional heat conduction and (ii) the base of the fin is included in the heat transfer process. The problem is modeled as an arbitrary two-dimensional channel whose upper surface is flat and isothermal, while the lower surface has a periodic array of extensions/fins which are subjected to heat convection with a uniform heat transfer coefficient. Using the generalized Schwarz-Christoffel transformation, the domain is mapped onto a straight channel where the heat conduction problem is solved using the boundary element method. The boundary element solution is subsequently used to pose a shape optimization problem, i.e. an inverse problem, where the objective function is the normalized Shape Factor and the variables of the optimization are the parameters of the Schwarz-Christoffel transformation. Numerical optimization suggests that the optimum fin is infinitely thin and that there exists a critical Biot number that characterizes whether the addition of the fin would result in an enhancement of heat transfer. The existence of a critical Biot number was investigated for the case of rectangular fins. It is concluded that a rectangular fin is effective if its thickness is less than $1.64 k/h$, where $h$ is the heat transfer coefficient and $k$ is the thermal conductivity. This result is independent of both the thickness of the base and the length of the fin.
... From an engineering perspective, knowledge of a surface geometry that maximizes the transport rate offers opportunities for new designs that exhibit enhanced characteristics and properties. For example, the problem of transport across an uneven surface, described in the preceding paragraph, is relevant to a variety of engineering applications involving heat transfer across rough and irregular boundaries, such as the surface of a circuit board in microelectronics [27,6,28,29,30]. In general, heat transfer in slab-like configurations is of interest to problems associated with heat transport from extended surfaces (fins) [8], inverted high conductivity fins/inserts [6,31], and embedded heating tubes [2,32,33,34,35]. ...
... WSN have long been regarded as a venue for distributed processing [5][4][3][6][20], where TSA policies are essential, and developing a TSA simulator is of the same importance in WSN as it is in multicore processors [9] or in grid computing [10] contexts. ETSSI, an energy-based task scheduling simulator for WSN, was presented. ...
... The TSA simulator will provide a common platform for the fair evaluation of different TSA policies and will greatly facilitate the testing of various TSA policies under different network conditions. A TSA simulator for WSN has the same importance as those in multicore systems [9] and grid computing [10] contexts, especially since WSN is being regarded as a venue for distributed processing and computing. ETSSI, an Energy-based Task Scheduling Simulator, is proposed in this work. ...
Conference Paper
Distributed processing has been a viable solution for enabling the next generation of real-time wireless sensor networks (WSN). Efficient task scheduling and allocation (TSA) policies guarantee the efficiency of that distribution. However, TSA policies in WSN face the challenges imposed by the wireless communication medium, which makes the accurate evaluation and verification of TSA policies difficult in live systems. Hence, developing a TSA simulator becomes essential to decrease the time needed for the successful development and testing of relevant algorithms. This work addresses the need for a TSA simulator for WSN and develops ETSSI, an Energy-based Task Scheduling Simulator. ETSSI is an event-driven, scalable simulator which provides a user-friendly graphical interface. Its accuracy is more than 80% compared to indoor live implementations on a test bed of TelosB nodes. Most importantly, the TSA policy designer using ETSSI is only concerned about the application model, not the actual application implementation, which is mandatory in today's WSN simulators.
... laptops and desktop computers) requires multiple CPU units, each of which is a significant source of thermal energy. Thermal management in these systems is a topic of active research [7, 8, 9]. Moreover, both the problem formulated in this work and the results are relevant to a class of manufacturing processes related to thermal processing or operation of layered structures: (i) self-curing/bonding of laminate polymer matrix composites (PMC) where the heat is produced internally by conductive strips [10, 11, 12, 13], and (ii) internal (self) rapid thermal processing of semiconductor structures through embedded strips of nanoheaters [14, 15]. ...
Article
We address the problem of two-dimensional heat conduction in a solid slab embedded with a periodic array of isothermal strips. The surfaces of the slab are subjected to a convective heat transfer boundary condition with a uniform heat transfer coefficient. Similar to the concept of critical insulation radius, associated with cylindrical and spherical configurations, we show that there exists a critical insulation thickness, associated with the slab, such that the total thermal resistance attains a minimum, i.e. a maximum heat transfer rate can be achieved. This result, which is not observed in one-dimensional heat conduction in a plane wall, is a consequence of the non-trivial coupling between conduction and convection that results in a 2D temperature distribution in the slab, and a non-uniform temperature on the surface of the slab. The findings of this work offer opportunities for improving the design of a broad range of engineering processes and products.
... The contributions of this paper are the identification of the thermal issues that arise from the technological evolution of CMP chips, as well as the proposal and evaluation of a thermal-aware scheduling algorithm with two optimizations: thermal threshold and neighborhood awareness. To evaluate the proposed techniques, we used the TSIC simulator [29]. The experimental results for future CMP chip configurations showed that simple thermal-aware scheduling algorithms may result in significant performance degradation, as the temperature of the cores often reaches the maximum allowed value, consequently triggering DTM events. ...
... The simulator used is the Thermal Scheduling SImulator for Chip Multiprocessors (TSIC) [29], which has been developed specifically to study thermal-aware scheduling on chip multiprocessors. TSIC models CMPs with different numbers of cores and enables studies exploring several other parameters, such as the maximum allowed chip temperature, chip utilization, chip size, migration events, and scheduling algorithms. ...
Article
Full-text available
The increased complexity and operating frequencies of current single-chip microprocessors are resulting in diminishing performance improvements. Consequently, major manufacturers offer chip multiprocessor (CMP) architectures in order to keep up with the expected performance gains. This architecture is successfully being introduced in many markets, including that of embedded systems. Nevertheless, the integration of several cores onto the same chip may lead to increased heat dissipation and, consequently, additional costs for cooling, higher power consumption, decreased reliability, and thermal-induced performance loss, among others. In this paper, we analyze the evolution of the thermal issues for future chip multiprocessor architectures and show that as the number of on-chip cores increases, the thermal-induced problems will worsen. In addition, we present several scenarios that result in excessive thermal stress to the CMP chip or significant performance loss. In order to minimize or even eliminate these problems, we propose thermal-aware scheduler (TAS) algorithms. When assigning processes to cores, TAS takes their temperature and cooling ability into account in order to avoid thermal stress and at the same time improve performance. Experimental results have shown that a TAS algorithm that also considers the temperatures of neighboring cores is able to significantly reduce the temperature-induced performance loss while, at the same time, decreasing the chip's temperature across many different operation and configuration scenarios.