Figure 8 - uploaded by Resit Sendag
Percentage reduction in branch misprediction rate with BMP (baseline bimodal or gshare branch predictor is 1KB) for SPECfp 2000 benchmarks.

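For context, the bimodal and gshare baselines named in the caption are standard table-of-counters predictors. The sketch below is a minimal illustration of such a 1 KB gshare baseline, assuming 4096 two-bit counters indexed by the XOR of the branch PC and the global history; it is an illustrative model, not the paper's exact configuration. Dropping the XOR with the history register turns it into the corresponding bimodal baseline.

/* Minimal gshare sketch (illustrative): 4096 x 2-bit counters = 1 KB of state,
 * indexed by PC XOR global history, updated with the actual outcome. */
#include <stdint.h>
#include <stdbool.h>

#define GSHARE_ENTRIES 4096

static uint8_t  counters[GSHARE_ENTRIES]; /* 2-bit saturating counters, 0..3 */
static uint32_t ghr;                      /* global branch history register  */

static inline uint32_t gshare_index(uint32_t pc) {
    return (pc ^ ghr) & (GSHARE_ENTRIES - 1);  /* XOR of PC and global history */
}

bool gshare_predict(uint32_t pc) {
    return counters[gshare_index(pc)] >= 2;    /* 2 or 3 => predict taken */
}

void gshare_update(uint32_t pc, bool taken) {
    uint32_t i = gshare_index(pc);
    if (taken  && counters[i] < 3) counters[i]++;
    if (!taken && counters[i] > 0) counters[i]--;
    ghr = (ghr << 1) | (taken ? 1u : 0u);      /* shift outcome into history */
}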

Source publication
Conference Paper
Although high branch prediction accuracy is necessary for high performance, it typically comes at the cost of larger predictor tables and/or more complex prediction algorithms. Unfortunately, large predictor tables and complex algorithms require more chip area and have higher power consumption, which precludes their use in embedded processors. As a...

Similar publications

Article
Shrinking feature size and diminishing supply voltage are making circuits more sensitive to supply voltage fluctuations within a microprocessor. If left unattended, voltage fluctuations can lead to timing violations or even transistor lifetime issues. A mechanism that dynamically learns to predict dangerous voltage fluctuations based on program and...

Citations

... As power budget becomes the primary design constraint in all ranges of computing systems [32], improvement in energy efficiency of BP has become vital to justify its use in modern processors. [8,72] or a side predictor [31,59] for them; not storing them in BH [2,17]; not updating all tables [1,72]; skipping the dot-product [18] or BP access [65] for them; predicting loop branches [46,73]; correlating on data values [34,41,62,74,75]; modifying/filtering global history [37,38,56,72,76]; adapting history length [13,27] ...
... Sendag et al. [46] propose a design which uses a complementary BP (CBP) along with the conventional BP. The CBP only focuses on commonly mispredicted branches, whereas the conventional BP speculates on predictable branches. ...
... Also, the confidence of the interval value with the higher confidence is decreased by one. Notice that both their interval corrector and the BMP of Sendag et al. [46] work by predicting when a misprediction will happen and then avoiding that misprediction. ...
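To make the predictor-corrector pairing described in these excerpts concrete, the sketch below shows one way a complementary predictor could sit beside a conventional one: a small side table tracks branches the base predictor keeps mispredicting and, once confident, inverts the base prediction for them. The base predictor, table size, confidence thresholds and indexing here are assumptions for illustration, not the actual BMP design of Sendag et al.

/* Sketch of a base-plus-complementary predictor pairing: a small side table
 * remembers branches the base predictor keeps mispredicting and, once
 * confident, inverts the base prediction for them. Sizes and thresholds
 * are illustrative assumptions. */
#include <stdint.h>
#include <stdbool.h>

#define BASE_ENTRIES 1024
#define SIDE_ENTRIES 256

static uint8_t base_ctr[BASE_ENTRIES];   /* 2-bit counters of the base predictor */

static bool base_predict(uint32_t pc) { return base_ctr[pc % BASE_ENTRIES] >= 2; }

static void base_update(uint32_t pc, bool taken) {
    uint8_t *c = &base_ctr[pc % BASE_ENTRIES];
    if (taken  && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
}

typedef struct {
    uint32_t tag;        /* PC of a hard-to-predict branch            */
    uint8_t  confidence; /* saturating count of base-predictor misses */
} side_entry_t;

static side_entry_t side[SIDE_ENTRIES];

/* Prediction: invert the base prediction only for branches the side table
 * is confident the base predictor keeps getting wrong. */
bool combined_predict(uint32_t pc) {
    bool base = base_predict(pc);
    side_entry_t *e = &side[pc % SIDE_ENTRIES];
    if (e->tag == pc && e->confidence >= 3)
        return !base;
    return base;
}

/* Update: train the side table on base-predictor misses, back off when the
 * base predictor is right, and always train the base predictor. */
void combined_update(uint32_t pc, bool taken) {
    bool base = base_predict(pc);        /* what the base alone would have said */
    side_entry_t *e = &side[pc % SIDE_ENTRIES];
    if (base != taken) {
        if (e->tag != pc) { e->tag = pc; e->confidence = 1; }
        else if (e->confidence < 7) e->confidence++;
    } else if (e->tag == pc && e->confidence > 0) {
        e->confidence--;
    }
    base_update(pc, taken);
}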
Article
Branch predictor (BP) is an essential component in modern processors since high BP accuracy can improve performance and reduce energy by decreasing the number of instructions executed on wrong-path. However, reducing latency and storage overhead of BP while maintaining high accuracy presents significant challenges. In this paper, we present a survey of dynamic branch prediction techniques. We classify the works based on key features to underscore their differences and similarities. We believe this paper will spark further research in this area and will be useful for computer architects, processor designers and researchers.
... [5,6]. [8], Y. Maa accessed the BTAC by hedging its address [9]. [23,24]. ...
Article
This paper proposes an improved branch predictor that reduces the number of execution cycles of applications by selectively accessing a specific element in a 4-way associative cache. When a branch instruction is fetched, the proposed branch predictor acquires the branch target address from the selected element in the cache by referring to an MRU buffer. The branch prediction rate and application execution speed are considerably improved by increasing the number of BTAC entries under a restricted power budget, compared with the previous branch predictor, which accesses all elements. The effectiveness of the proposed dynamic branch predictor is verified by executing benchmark applications on a core simulator. Experimental results show that the number of execution cycles decreases by an average of 10.1%, while power consumption increases by an average of 7.4%, compared to a core without a dynamic branch predictor. Execution cycles are reduced by 4.1% in comparison with a core that employs the previous dynamic branch predictor.
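The abstract above describes reading only one way of a 4-way BTAC, chosen by an MRU buffer. The sketch below illustrates that idea under stated assumptions: the BTAC geometry, the tag handling, and the policy on an MRU mismatch (treated here as a miss) are all illustrative, not the paper's implementation.

/* Sketch of MRU-guided way selection in a 4-way set-associative BTAC.
 * Instead of reading all 4 ways, only the way recorded in the per-set MRU
 * buffer is accessed; a tag mismatch is treated as a BTAC miss. Sizes and
 * the replacement policy are illustrative assumptions. */
#include <stdint.h>
#include <stdbool.h>

#define BTAC_SETS 128
#define BTAC_WAYS 4

typedef struct {
    uint32_t tag;
    uint32_t target;   /* predicted branch target address */
    bool     valid;
} btac_entry_t;

static btac_entry_t btac[BTAC_SETS][BTAC_WAYS];
static uint8_t      mru_way[BTAC_SETS];   /* MRU buffer: one way id per set */

/* Lookup touches only the MRU way of the set, saving the power of reading
 * the other three ways. */
bool btac_lookup(uint32_t pc, uint32_t *target) {
    uint32_t set = (pc >> 2) % BTAC_SETS;
    btac_entry_t *e = &btac[set][mru_way[set]];
    if (e->valid && e->tag == pc) {
        *target = e->target;
        return true;
    }
    return false;   /* handled like any other BTAC miss by the pipeline */
}

/* On branch resolution, install/refresh the entry and remember its way as MRU. */
void btac_update(uint32_t pc, uint32_t target) {
    uint32_t set = (pc >> 2) % BTAC_SETS;
    uint32_t way = mru_way[set];                 /* simple assumed victim choice */
    for (uint32_t w = 0; w < BTAC_WAYS; w++)     /* prefer an existing match     */
        if (btac[set][w].valid && btac[set][w].tag == pc) { way = w; break; }
    btac[set][way] = (btac_entry_t){ .tag = pc, .target = target, .valid = true };
    mru_way[set] = (uint8_t)way;
}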
... Branch predictor performance can also be evaluated using the misprediction speedup, as derived in [17]. Figure 9 shows the misprediction speedup versus hardware budget, in number of entries, for various SDFSM sizes compared to the counter-based predictor. These speedups are in line with speedups obtained for other recent innovations in branch predictors [18]-[20]. This paper proposes the shadow dynamic finite state machine (SDFSM), a new branch predictor whose FSM states are dynamically trained during run-time to learn unique branch pattern behaviors. ...
Article
We propose an adaptive learning machine-based branch predictor, the shadow dynamic finite state machine (SDFSM), that enables more accurate branch predictions by learning unique branching patterns through a self-modifying technique. SDFSM states represent branch pattern bits. If a state mispredicts a branch, the state is swapped with its shadow state, which represents the correct branching pattern bit. Therefore, the prediction accuracy can reach 100% if the number of states matches a branch's pattern length. When compared to a bimodal branch predictor using 2-bit saturating counters, the SDFSM decreases average misprediction rates by 18.3%, with individual decreases as high as 55%. Povzetek (Slovenian abstract): A method for learning branch patterns in the processor is presented.
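A minimal sketch of the shadow-state mechanism as the abstract describes it: each state stores one pattern bit, the machine steps through its states, and a mispredicting state is "swapped with its shadow", i.e. its bit is replaced by the observed outcome. The per-branch table, the state count, and the cyclic stepping order are assumptions for illustration.

/* Sketch of a shadow-FSM predictor for one branch: N states each hold one
 * pattern bit; the machine steps through the states cyclically, and a
 * mispredicting state is swapped with its shadow (its bit is flipped to the
 * observed outcome). A repeating pattern whose length matches the state
 * count is eventually predicted perfectly. Sizes are illustrative. */
#include <stdint.h>
#include <stdbool.h>

#define SDFSM_STATES 8    /* can learn repeating patterns of length 8 */

typedef struct {
    bool    bit[SDFSM_STATES];  /* learned pattern bit held by each state */
    uint8_t cur;                /* current state (position in the pattern) */
} sdfsm_t;

bool sdfsm_predict(const sdfsm_t *m) {
    return m->bit[m->cur];
}

void sdfsm_update(sdfsm_t *m, bool taken) {
    if (m->bit[m->cur] != taken)
        m->bit[m->cur] = taken;   /* "swap with shadow state": adopt the correct bit */
    m->cur = (uint8_t)((m->cur + 1) % SDFSM_STATES);  /* advance to the next state */
}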
Article
With the ever-increasing need for power-aware architecture and circuit design in recent years, how to reduce the power consumption of processors without sacrificing performance has become an important issue. In this paper, we propose a new method for low-power branch prediction, the Hedging Filter, which combines a filtering scheme that reduces dynamic power consumption with a hedging prediction mechanism that lowers static power dissipation. We analyze and empirically study the proposed scheme, embodied in a Sentry Table plus Complementary Branch Prediction combination, with respect to critical-path delay, performance, hardware overhead and power consumption. The Hedging Filter not only preserves critical-path delay and prediction accuracy, but also saves both dynamic and static power. Our evaluation shows that, for equivalent or superior performance with respect to traditional counterparts, the proposed method reduces branch prediction hardware cost by up to 71% and power consumption by up to 79%.
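The abstract does not spell out the filter's internals, but the filtering half of such a scheme can be pictured as below: a tiny "sentry" table answers strongly biased branches itself, so the large predictor is not accessed for them (saving dynamic power), and the bypassed predictor entries could in principle be held in a low-leakage state (saving static power). All sizes, the bias threshold and the stand-in main predictor are assumptions, not the paper's actual Sentry Table design.

/* Sketch of a filtering scheme: a tiny sentry table tracks strongly biased
 * branches and predicts them itself, so the large main predictor is not
 * accessed for them. Sizes, thresholds and the stand-in main predictor are
 * illustrative assumptions. */
#include <stdint.h>
#include <stdbool.h>

#define MAIN_ENTRIES   4096
#define SENTRY_ENTRIES 128
#define BIAS_THRESHOLD 15    /* consecutive identical outcomes => "biased" */

static uint8_t main_ctr[MAIN_ENTRIES];   /* stand-in for the large predictor */

static bool main_predict(uint32_t pc) { return main_ctr[pc % MAIN_ENTRIES] >= 2; }

static void main_update(uint32_t pc, bool taken) {
    uint8_t *c = &main_ctr[pc % MAIN_ENTRIES];
    if (taken  && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
}

typedef struct {
    uint8_t streak;     /* consecutive outcomes equal to `direction` */
    bool    direction;  /* current bias direction of the branch      */
} sentry_t;

static sentry_t sentry[SENTRY_ENTRIES];

/* Biased branches are answered by the sentry table; only the rest pay for a
 * main-predictor access. */
bool filtered_predict(uint32_t pc, bool *used_main) {
    sentry_t *s = &sentry[pc % SENTRY_ENTRIES];
    if (s->streak >= BIAS_THRESHOLD) {
        *used_main = false;
        return s->direction;
    }
    *used_main = true;
    return main_predict(pc);
}

void filtered_update(uint32_t pc, bool taken, bool used_main) {
    sentry_t *s = &sentry[pc % SENTRY_ENTRIES];
    if (taken == s->direction) {
        if (s->streak < 255) s->streak++;
    } else {
        s->direction = taken;   /* bias broken: restart the streak */
        s->streak = 1;
    }
    if (used_main)
        main_update(pc, taken); /* filtered branches never touch the main tables */
}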
Conference Paper
Modern branch predictors are often too large and power-hungry to be a viable option for small, embedded processors where die space, power consumption and performance are all at a premium. With embedded processors, the large cache structures required for high-performance branch prediction can easily take up more die space than the rest of the processor combined. When coupled with the large leakage energies, which are set to be an increasing issue as technologies advance to 45nm and beyond, it can often appear appealing not to use a dynamic branch predictor at all. This paper seeks to find a way of using an ultra-small branch predictor in a hybrid predictor configuration suitable for an embedded processor. We introduce a novel bias parameter to the consideration of when to predict branches statically or dynamically, further exploring the performance vs energy trade-off. We present a solution that reduces dynamic branch predictor aliasing, improves performance and requires a minimum of extra die space. The results presented relate die space requirements, energy use and performance impacts. We look at how best to optimise this balance in a way that is usually not considered, and on a lower bit budget than has previously been presented. The EEMBC 1.1 benchmark suite [1] was used to explore the energy vs performance trade-off boundary, taking averages of the results across 31 different benchmarks. We evaluate 5 traditional branch predictor configurations and 36 novel ultra-small hybrid branch predictors through the use of 9 sets of our novel bias values, combining GShare dynamic predictions with profiled backwards taken forwards not-taken (BTFN)/backwards not-taken forwards taken (BNFT) static predictions. The results demonstrate that the use of a static-dynamic hybrid is not only beneficial but necessary for very small predictors to produce a positive effect on the cycle count and overall energy use of the processor. Through the use of our novel bias parameter we explore the performance vs energy trade-off and show that through a small (0.1 seconds at 500 MHz, or 0.35%) reduction in peak performance (total runtime in the region of 28.35 seconds) for a given architecture we can gain substantial dynamic energy savings from reduced dynamic predictor accesses (removing up to an additional 16.5%, or 53 million, of the traditional hybrid predictor accesses). Our best-performing architecture showed an average improvement in run time of 2 seconds (6.7%) over a static BTFN baseline (total runtime 30.46s), at the cost of only an additional 0.01 mm² (or 1%) die space.
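One plausible reading of the static/dynamic split with a bias parameter is sketched below: profiling records each branch's taken rate, and branches more one-sided than the bias threshold are predicted statically (BTFN here), bypassing the small GShare and its access energy, while the remaining branches use the dynamic predictor. The profile format, the threshold semantics and the BTFN choice are assumptions for illustration, not the paper's exact mechanism.

/* Sketch of a profiled static/dynamic hybrid with a bias parameter:
 * branches whose profiled taken-rate is more one-sided than `bias` are
 * predicted statically (BTFN), skipping the dynamic predictor access;
 * the rest are predicted dynamically. All details are illustrative. */
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint64_t executed;   /* profiled execution count of the branch */
    uint64_t taken;      /* profiled number of taken outcomes      */
} profile_t;

/* Static BTFN rule: backwards taken, forwards not-taken. */
static bool btfn_predict(uint32_t pc, uint32_t target) {
    return target < pc;
}

/* Route to the static predictor only when the profile says the branch is
 * biased beyond the threshold; otherwise consult the dynamic predictor. */
bool hybrid_predict(uint32_t pc, uint32_t target,
                    const profile_t *prof, double bias,
                    bool (*dynamic_predict)(uint32_t pc)) {
    if (prof->executed > 0) {
        double taken_rate = (double)prof->taken / (double)prof->executed;
        if (taken_rate >= bias || taken_rate <= 1.0 - bias)
            return btfn_predict(pc, target);   /* static: no dynamic-table access */
    }
    return dynamic_predict(pc);                /* dynamic (e.g. the GShare above) */
}

With this reading, a bias of 0.5 routes every profiled branch to static prediction, while a bias close to 1.0 reserves static prediction for almost completely one-sided branches, which is one way the performance vs energy trade-off described in the abstract could be swept.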