Figure 1 - uploaded by Masud Chowdhury
Performance comparison between a single core and multi-core processor 


Source publication
Conference Paper
Full-text available
Continuous effort to achieve higher performance without driving up power consumption and thermal effects has led researchers to look for alternative microprocessor architectures. Like parallel processing, which is extensively used in nearly all of today's microprocessors, multi-core architecture, which combines several independent microprocess...

Context in source publication

Context 1
... order to track the continuing performance improvement following Moore's law, successive technologies have relied on scaling of different device and interconnect parameters. Historically, these performance gains have been accomplished through efficient exploitation of sophisticated process technology and innovative architecture or micro-architecture [2]. To keep up circuit speed, the focus has mainly been on increasing operating frequency, and device dimensions have been scaled to support higher integration density and greater functionality. However, this scaling brings several critical challenges in current sub-65nm technology. Although the power supply has been scaled to keep dynamic power at bay, the aggressive scaling of MOS geometry has made leakage power a major part of total power, and the growing number of device components further aggravates heat generation in a small chip. Thermal challenges have therefore emerged as one of the major obstacles to the continued advancement of CMOS technology [6]. A new approach called "parallel processing" was proposed in the early 1990s to save power [9]. Since then, this method has gained wide acceptance among architecture designers, and almost every processor today runs on this principle. However, this approach alone cannot keep up with the growing speed demands of microprocessors: clock frequency must be increased, which in turn generates more heat in the processor. This is the main reason all commercial processors remain at around 4 GHz. Given the growing concern over power dissipation, the multi-core processor is a new step forward and has become the technology for the current and coming decades [11], [18]. A multi-core chip-level processor combines two or more independent cores on a single die: a dual-core processor contains two cores, a quad-core processor contains four, and so on. A multi-core processor thus implements multiple processing units in a single physical package.
One basic difference between a single-core and a multi-core processor lies in the cache hierarchy: a single-core processor has its own L1 cache along with an L2 cache, whereas each core in a multi-core system has an individual L1 cache and shares a common L2 cache with the other cores. For single-core processors, 45nm technology is currently in production, with 32nm and 22nm to follow, and the 10nm node is most likely the limiting technology node given strong quantum effects. Consequently, the multi-core processor is a promising architectural technique. IBM introduced the first multi-core processor chip, Power4, in 2001 [15], through which designers achieved much greater communication bandwidth and correspondingly higher performance. In mid-2006, Intel reached new levels of energy-efficient performance with its Intel Core™ 2 Duo processors, using 65nm technology and its latest micro-architecture [2]. Although multi-core has become a widely used architecture, numerous challenges accompany it, and they must be addressed by researchers. The rest of this paper is organized as follows. Section II briefly presents the major advantages of multi-core processors; it concludes that multi-core processors have become the standard for delivering greater performance, improved performance per watt, and new capabilities across different electronic applications. Section III describes the leading interconnect challenges in multi-core processors. Challenges arising from design automation and verification and from software adaptability are briefly studied in Sections IV and V, respectively. Finally, Section VI wraps up the paper. The key driving force behind adopting the multi-core processor architecture was the need to address power and cooling challenges. Figure 1 gives a performance comparison between a single-core and a multi-core processor [2].
This analysis, performed by Intel using the SPECint2000 and SPECfp2000 benchmarks, reports that multi-core processors perform much better than a single-core processor, and it is projected that the relative advantage of multi-core systems will grow over the next several years. Historically, chip manufacturers have met the demand for increasing processor speed by raising the operating clock frequency along with the integration density. In current technology nodes, this approach has resulted in unmanageable heat dissipation. With heat rising faster than the rate at which clock signals propagate through the processor, designers have been prompted to seek alternative methodologies. Multi-core processors take advantage of a fundamental relationship between power and frequency: by incorporating multiple cores, each core can run at a lower frequency, dividing among the cores the power normally given to a single core. The result is a large performance increase over a single-core processor. It can be observed that increasing the clock frequency of a single core by 20% delivers a 13% performance gain but requires 73% more power. Conversely, decreasing the clock frequency by 20% reduces power usage by 49% but causes only a 13% performance loss [2]. Adding a second core to the single-core architecture yields a dual-core processor that, at a 20% reduced clock frequency, can effectively deliver 73% more performance while using approximately the same power as a single-core processor at maximum frequency. As stated earlier, each core in a multi-core architecture has its own L1 cache, and all cores on the die share a common L2 cache. Therefore, fewer caches and memories are required than if single-core processors were used for the same number of jobs.
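The power figures quoted above are consistent with the standard first-order model of dynamic CMOS power, P ∝ C·V²·f, with supply voltage tracking frequency, which gives P ∝ f³. The following is a back-of-the-envelope check of that approximation, not a calculation taken from the paper:

```python
def relative_power(freq_ratio: float) -> float:
    """Dynamic power relative to baseline, assuming supply voltage
    scales with frequency, so P ~ f**3 (first-order approximation)."""
    return freq_ratio ** 3

# A 20% overclock costs ~73% more power, matching the figure quoted above.
print(f"{relative_power(1.2) - 1:.0%}")      # -> 73%

# A 20% underclock saves ~49% power, also matching the quoted figure.
print(f"{1 - relative_power(0.8):.0%}")      # -> 49%

# Two cores at 0.8x frequency, each delivering 0.87x single-core
# performance (the quoted 13% loss per core):
print(round(2 * relative_power(0.8), 2))     # ~1.02x baseline power
print(round(2 * 0.87, 2))                    # ~1.74x baseline performance
```

The cubic rule reproduces all four numbers in the text, which explains why two slower cores can deliver roughly 73% more throughput for roughly the same power budget as one fast core.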
For example, Intel Advanced Smart Cache works by sharing the L2 cache among cores so that data are stored in one place that every core can access. Sharing the L2 cache enables each core to dynamically utilize up to 100% of the available L2 cache, thus optimizing cache resources [2]. Intel® Smart Memory Access improves system performance by optimizing the available data bandwidth from the memory subsystem and hiding the latency of memory accesses through two techniques: a new capability called memory disambiguation, and an instruction-pointer-based pre-fetcher that fetches memory contents before they are requested [2].
C. Performance enhancement by multi-threading
Along with parallel processing, multi-threading technology is extensively used in single-core processors. In this approach, multi-threading on a single processor generally works on the principle of time-division multiplexing, much like the parallel execution of multiple tasks, with the processor switching between different threads [13]. This context switching happens so fast that it creates the illusion of simultaneity to the end user. On a multiprocessor system, threading can be achieved via multiprocessing, where different threads and processes run simultaneously on different processor cores. Threading a task on parallel processing machines thus not only increases the number of tasks executed per unit time but also enhances the accuracy of the task. Consequently, significant performance improvement can be achieved using multi-core systems coupled with advances in memory, I/O, and storage devices. Although device performance in a single processor has increased with the continued scaling of technology parameters over the generations, interconnect performance has degraded, since interconnect scaling exhibits exactly the opposite trend. The overall performance of a microprocessor is therefore determined by its interconnect characteristics [4].
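The distinction above between time-sliced threading and true multi-core parallelism can be sketched with the Python standard library. This is an illustrative toy, not code from the paper; `busy` is a made-up CPU-bound task, and worker processes (rather than threads) are used so the two tasks genuinely land on separate cores:

```python
import time
from concurrent.futures import ProcessPoolExecutor

def busy(n: int) -> int:
    """A made-up CPU-bound workload: sum the first n integers in a loop."""
    total = 0
    for i in range(n):
        total += i
    return total

if __name__ == "__main__":
    n = 2_000_000

    # One core runs the tasks back to back. For CPU-bound work, time-slicing
    # amounts to the same thing: no two instructions execute simultaneously.
    t0 = time.perf_counter()
    serial = [busy(n), busy(n)]
    t_serial = time.perf_counter() - t0

    # Two worker processes: the OS can schedule each on its own core,
    # so the tasks truly overlap in time.
    t0 = time.perf_counter()
    with ProcessPoolExecutor(max_workers=2) as pool:
        parallel = list(pool.map(busy, [n, n]))
    t_parallel = time.perf_counter() - t0

    assert serial == parallel
    print(f"serial: {t_serial:.2f}s  parallel: {t_parallel:.2f}s")
```

On a machine with two or more free cores, the parallel timing should approach half the serial timing, though process start-up overhead eats into the gain for short tasks.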
The main bottleneck in the interconnection network of a multi-core processor is the interfacing of the different cores on a single die. Several interconnection mechanisms have been proposed in [10] to improve interconnect performance. Among them, the most commonly used are a shared bus fabric (SBF) that connects various modules and can source and sink coherence traffic, a point-to-point link that connects two SBFs, and a crossbar connection system. A shared bus fabric is a high-speed link that communicates data between processors, caches, I/O, and memory in a multi-processor system. The effectiveness of such an approach depends on the probability that an L2 miss is serviced by a local cache (an L2 connected to the same SBF) rather than a cache on a remote SBF. This condition is not easy to maintain in a complex multi-core system. Another problem is that the interconnect fabric itself is large and power-hungry, consuming resources that would otherwise be available for more cores and caches. ...
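The local-versus-remote SBF trade-off described above can be captured in a toy expected-latency model. The latency constants here are illustrative assumptions, not figures from [10]:

```python
# Assumed service latencies (nanoseconds), for illustration only.
LOCAL_NS, REMOTE_NS = 30, 120

def expected_miss_latency(p_local: float) -> float:
    """Expected service latency of an L2 miss: serviced by a cache on the
    local SBF with probability p_local, otherwise by a remote SBF over
    the point-to-point link."""
    return p_local * LOCAL_NS + (1 - p_local) * REMOTE_NS

# The benefit erodes quickly as traffic spills onto remote SBFs.
for p in (0.9, 0.5, 0.1):
    print(f"p_local={p}: {expected_miss_latency(p):.0f} ns")
```

Under these assumed numbers, dropping the local-service probability from 0.9 to 0.1 nearly triples the expected miss latency, which is why the condition is hard to maintain profitably in a complex multi-core system.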

Similar publications

Article
Full-text available
The ever-increasing parallelism demand of General-Purpose Graphics Processing Unit (GPGPU) applications pushes toward larger and more energy-hungry register files in successive GPU generations. Reducing the supply voltage beyond its safe limit is an effective way to improve the energy efficiency of register files. However, at these operating voltag...

Citations

... On the other hand, as stated by Amdahl's law, the performance of parallel computing is limited by its serial components [29,30]. Although multi-core CPUs offer outstanding instruction execution speed with reduced power consumption, optimizing the performance of the individual processors and then integrating them, with interconnection between processors and access to shared resources on a single die, is a non-trivial task [31,32]. ...
Article
Full-text available
Medical imaging is considered one of the most important advances in the history of medicine and has become an essential part of the diagnosis and treatment of patients. Earlier prediction and treatment have been driving the acquisition of higher image resolutions as well as the fusion of different modalities, raising the need for sophisticated hardware and software systems for medical image registration, storage, analysis, and processing. In this scenario, and given the new clinical pipelines and the huge clinical burden on hospitals, these systems are often required to provide both highly accurate and real-time processing of large amounts of imaging data. Additionally, lowering the price of each piece of imaging equipment, along with its development and implementation costs, and increasing its lifespan are crucial to minimizing cost and making healthcare more accessible. This paper focuses on the evolution and application of different hardware architectures (namely CPU, GPU, DSP, FPGA, and ASIC) in medical imaging through specific examples, discussing the different options depending on the application. The main purpose is to provide a general introduction to hardware acceleration techniques for medical imaging researchers and developers who need to accelerate their implementations.
... With the rise of issues such as power dissipation, design complexity, and high energy consumption [1] in single-core processors, multi-core architectures were proposed to address these problems. Multi-core architectures [2] opened a new door for high-performance computing, dividing each task among different cores during execution. The multi-core architecture also plays a crucial role in developing parallel applications. ...
... Multi-core architecture is a growing trend today, as single-core processors rapidly reach their physical performance limits [1] [2]. Multi-core processors are now used in various areas such as virtualization, high-performance computing, databases, and the cloud, as well as in many consumer gadgets [3]. ...
Article
Full-text available
Performance is a critical concern of multi-core systems. Several issues affect the performance of multi-core systems, especially shared-resource contention and application-to-core mapping. To address these performance issues, various software- and hardware-based policies have been proposed in the literature. Each policy addresses a particular performance issue through some specific approach in isolation. However, with many performance issues and a corresponding number of policies to solve them, it is not clear which policy would be beneficial in a particular situation for application execution. There is a need to investigate and classify existing policies along various axes, such as the approach used to address the performance issue, the tools used for profiling the application, and the metrics used to find the source of performance degradation. The classification of policies could help make static and runtime decisions for addressing the different performance issues that arise from resource allocation and contention. In this paper, we review various policies employed for performance improvement of multi-core systems. Policies such as application-to-core scheduling, memory allocation, bandwidth allocation, parameter tuning, and self-awareness are investigated from various angles, resulting in an in-depth classification presented in the tables. This classification could further be used to design a holistic policy scheduler that schedules a policy considering the application workload characteristics in totality. Such a scheduler could also improve performance by scheduling or switching to the appropriate policies at run time for application execution while considering system status.
... Multi-core and multi-processing technologies were first introduced in specialized and embedded systems long ago. With its tremendous benefits [1,2,4], multi-core architecture has become the de facto design for commercial chips in PCs, mobile phones, servers, etc. These technologies, however, increase the heterogeneity and complexity of computing systems and bring new challenges for service providers. ...
... Assume that a job's inter-arrival time and service time are exponentially distributed with means 1/λ and 1/μ, respectively. We also suppose that the number of tasks composing a gang job is uniformly distributed over the closed interval [1, 20]. Thus the average number of tasks per job is T_avg = (1 + 20)/2 = 10.5. ...
Conference Paper
The benefits and necessity of multi-core technology are undeniable and make it a critical trend in chip manufacturing. This shift, however, also brings complexities to computer science, especially in the job scheduling problem. Additionally, energy bills have become a major concern due to the recently increasing population of computing systems. The trade-off between performance and energy efficiency in such systems makes scheduling optimization more challenging. This study proposes an energy-efficient scheduling solution that exploits resource heterogeneity and utilization in computing clusters of multi-core processors. Numerical results show that the proposed policy saves significant energy in a heterogeneous cluster.
... Multi-core processors have been applied in commercial electronic systems for a long time [1,2,3]. It has been shown that low-frequency multi-core processors enhance the overall performance capacity of systems with low energy consumption. ...
Article
Full-text available
In this paper, we study a question related to the execution of jobs in computing clusters built from servers with multi-core processors. Scenarios where a single job is executed or multiple jobs are simultaneously processed by multi-core processors are investigated. Workloads based on captured traces are used in our study. Numerical results demonstrate that computing resources are efficiently utilized in the multiple-job execution scenario and that the setup time has only a slight impact on the average response time of a computing cluster. Furthermore, a scenario where multiple jobs are executed simultaneously by cores, with Dynamic Power Management (DPM) applied to each processor core, yields the most efficient energy consumption. Consequently, schedulers should take into account the features of multi-core processors to reduce the energy consumption of computing clusters.
... However, according to Pollack's rule, the performance increase from microarchitecture alone is roughly proportional to the square root of the increase in complexity [16]; thus the performance of a single processor core does not scale linearly with the amount of logic on the core. As the transistor size shrinks, the leakage current becomes larger [17]. And with higher integration density, power dissipation becomes the bottleneck of the architecture [16], [17]. ...
... As the transistor size shrinks, the leakage current becomes larger [17]. And with higher integration density, power dissipation becomes the bottleneck of the architecture [16], [17]. Alternatively, a performance boost could be achieved by increasing the clock speed, i.e., the frequency at which the processor operates. ...
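Pollack's rule as cited in this excerpt is easy to illustrate numerically. This is a sketch of the rule itself, not code from any of the cited works:

```python
import math

def pollack_speedup(complexity_ratio: float) -> float:
    """Pollack's rule: single-core performance grows roughly with the
    square root of the logic complexity (transistor budget) of the core."""
    return math.sqrt(complexity_ratio)

# Doubling the transistors devoted to one big core buys only ~41% more
# performance, whereas spending the same budget on a second core could
# (ideally, for parallel workloads) double throughput.
print(f"{pollack_speedup(2.0) - 1:.0%}")   # -> 41%
```

This diminishing return on single-core complexity is one of the quantitative arguments for the multi-core shift discussed throughout this page.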
Conference Paper
Full-text available
We present an efficient and scalable scheme for implementing agent-based modeling (ABM) simulation with In Situ visualization of large complex systems on heterogeneous computing platforms. The scheme is designed to make optimal use of the resources available on a heterogeneous platform consisting of a multicore CPU and a GPU, resulting in minimal to no resource idle time. Furthermore, the scheme was implemented under a client-server paradigm that enables remote users to visualize and analyze simulation data as it is being generated at each time step of the model. Performance of a simulation case study of vocal fold inflammation and wound healing with 3.8 million agents shows 35× and 7× speedup in execution time over single-core and multi-core CPU respectively. Each iteration of the model took less than 200 ms to simulate, visualize and send the results to the client. This enables users to monitor the simulation in real-time and modify its course as needed.
... In recent years, multi-core processing technology (Blake et al., 2009; Roy et al., 2008; Geer, 2005) has been widely used in many computational fields, including general-purpose, embedded, network, and digital signal processing. In hardware, a multi-core processor is a single computing component consisting of two or more independent processing units, each of which can perform different operations at the same time. ...
Article
Cryptography has been widely used as a means to secure message communication. A cryptosystem is made up of a publicly available algorithm and a secretly kept key. The algorithm is responsible for transforming the original message into something unintelligible. The result of losing the key or having the algorithm cracked can be catastrophic, as all secret communications become known to adversaries. One way to find the key is by brute-force attack, which tries every possible combination of keys. The only way to prevent this is to make the key sufficiently large that finding the right key cannot be done in a reasonable time frame. However, a large key size imposes extra computational work, which results in larger energy consumption and thus more heat dissipated to the environment. Therefore, the selection of key size depends not only on the required security level but also on factors such as the capability of the processor and the available memory resources. The advent of multi-core technology promises improvements in the utilization of computational resources. Many reports support the idea that multi-core technology brought a significant improvement over single-core technology. In this study, we investigate this hypothesis on the RSA cryptosystem in relation to key size. Earlier studies reported multi-core efficiency in normal applications, but the question arises whether multi-core architecture remains superior to a single-core architecture when dealing with applications involving large integers. From our experimentation, we observe that the higher the number of cores, the better the performance of the encryption and decryption processes. Quad-core technology can smoothly handle operations involving an 8192-bit key. © 2016 Mohamad A. Mohamed, Ammar Y. Tuama, Mokhairi Makhtar, Mohd K. Awang and Mustafa Mamat.
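The brute-force search described in this abstract parallelizes naturally across cores by partitioning the keyspace. The following toy sketch is not from the study: the 16-bit "key" and XOR "cipher" are stand-ins chosen purely so the search finishes instantly, and the partitioning scheme is an assumption for illustration:

```python
from concurrent.futures import ProcessPoolExecutor

# Toy stand-ins: a 16-bit key and an XOR "cipher" (illustration only).
SECRET_KEY = 0xBEEF
PLAIN = 0x1234
CIPHER = PLAIN ^ SECRET_KEY

def search(span):
    """Try every key in [lo, hi); return the one that 'decrypts' correctly."""
    lo, hi = span
    for k in range(lo, hi):
        if PLAIN ^ k == CIPHER:
            return k
    return None

def crack(workers: int = 4, keybits: int = 16):
    """Split the keyspace into equal slices, one per worker process."""
    step = (1 << keybits) // workers
    spans = [(i * step, (i + 1) * step) for i in range(workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for hit in pool.map(search, spans):
            if hit is not None:
                return hit

if __name__ == "__main__":
    print(hex(crack()))   # recovers 0xbeef
```

Doubling the worker count roughly halves the wall-clock search time for an exhaustive search, which is the flip side of the abstract's point: defenders must size keys against the attacker's aggregate core count, not a single core.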
... Along with advanced chip fabrication techniques comes another major bottleneck: power dissipation. Studies have shown that transistor leakage current increases as chip geometry shrinks further and further, which raises static power dissipation to large values, as shown in Figure 1 [11,12]. ...
Article
Full-text available
Microprocessors have revolutionized the world we live in, and continuous efforts are being made to manufacture not only faster chips but also smarter ones. A number of techniques, such as data-level parallelism, instruction-level parallelism, and hyper-threading (Intel's HT), already exist and have dramatically improved the performance of microprocessor cores. This paper briefly reviews the evolution of multi-core processors, then introduces the technology and its advantages in today's world. It concludes by detailing the challenges currently faced by multi-core processors and how the industry is trying to address them.