Figure 8 - uploaded by Resit Sendag
Percentage reduction in branch misprediction rate with BMP (baseline bimodal or gshare branch predictor is 1KB) for SPECfp 2000 benchmarks.

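For context, the bimodal and gshare baselines named in the caption are standard table-of-counters predictors. The sketch below is a minimal illustration of such a 1 KB gshare baseline, assuming 4096 two-bit counters indexed by the XOR of the branch PC and the global history; it is an illustrative model, not the paper's exact configuration. Dropping the XOR with the history register turns it into the corresponding bimodal baseline.

/* Minimal gshare sketch (illustrative): 4096 x 2-bit counters = 1 KB of state,
 * indexed by PC XOR global history, updated with the actual outcome. */
#include <stdint.h>
#include <stdbool.h>

#define GSHARE_ENTRIES 4096

static uint8_t  counters[GSHARE_ENTRIES]; /* 2-bit saturating counters, 0..3 */
static uint32_t ghr;                      /* global branch history register  */

static inline uint32_t gshare_index(uint32_t pc) {
    return (pc ^ ghr) & (GSHARE_ENTRIES - 1);  /* XOR of PC and global history */
}

bool gshare_predict(uint32_t pc) {
    return counters[gshare_index(pc)] >= 2;    /* 2 or 3 => predict taken */
}

void gshare_update(uint32_t pc, bool taken) {
    uint32_t i = gshare_index(pc);
    if (taken  && counters[i] < 3) counters[i]++;
    if (!taken && counters[i] > 0) counters[i]--;
    ghr = (ghr << 1) | (taken ? 1u : 0u);      /* shift outcome into history */
}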

Source publication
Conference Paper
Although high branch prediction accuracy is necessary for high performance, it typically comes at the cost of larger predictor tables and/or more complex prediction algorithms. Unfortunately, large predictor tables and complex algorithms require more chip area and have higher power consumption, which precludes their use in embedded processors. As a...

Similar publications

Article
Shrinking feature size and diminishing supply voltage are making circuits more sensitive to supply voltage fluctuations within a microprocessor. If left unattended, voltage fluctuations can lead to timing violations or even transistor lifetime issues. A mechanism that dynamically learns to predict dangerous voltage fluctuations based on program and...

Citations

... As power budget becomes the primary design constraint in all ranges of computing systems [32], improvement in energy efficiency of BP has become vital to justify its use in modern processors. [8,72] or a side predictor [31,59] for them; not storing them in BH [2,17]; not updating all tables [1,72]; skipping the dot-product [18] or BP access [65] for them; predicting loop branches [46,73]; correlating on data values [34,41,62,74,75]; modifying/filtering global history [37,38,56,72,76]; adapting history length [13,27] ...
... Sendag et al. [46] propose a design which uses a complementary BP (CBP) along with the conventional BP. The CBP only focuses on commonly mispredicted branches, whereas the conventional BP speculates on predictable branches. ...
... Also, the confidence of the interval value with the higher confidence is decreased by one. Notice that both their interval corrector and the BMP of Sendag et al. [46] work by predicting when a misprediction will happen and then avoiding that misprediction. ...
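To make the predictor-corrector pairing described in these excerpts concrete, the sketch below shows one way a complementary predictor could sit beside a conventional one: a small side table tracks branches the base predictor keeps mispredicting and, once confident, inverts the base prediction for them. The base predictor, table size, confidence thresholds and indexing here are assumptions for illustration, not the actual BMP design of Sendag et al.

/* Sketch of a base-plus-complementary predictor pairing: a small side table
 * remembers branches the base predictor keeps mispredicting and, once
 * confident, inverts the base prediction for them. Sizes and thresholds
 * are illustrative assumptions. */
#include <stdint.h>
#include <stdbool.h>

#define BASE_ENTRIES 1024
#define SIDE_ENTRIES 256

static uint8_t base_ctr[BASE_ENTRIES];   /* 2-bit counters of the base predictor */

static bool base_predict(uint32_t pc) { return base_ctr[pc % BASE_ENTRIES] >= 2; }

static void base_update(uint32_t pc, bool taken) {
    uint8_t *c = &base_ctr[pc % BASE_ENTRIES];
    if (taken  && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
}

typedef struct {
    uint32_t tag;        /* PC of a hard-to-predict branch            */
    uint8_t  confidence; /* saturating count of base-predictor misses */
} side_entry_t;

static side_entry_t side[SIDE_ENTRIES];

/* Prediction: invert the base prediction only for branches the side table
 * is confident the base predictor keeps getting wrong. */
bool combined_predict(uint32_t pc) {
    bool base = base_predict(pc);
    side_entry_t *e = &side[pc % SIDE_ENTRIES];
    if (e->tag == pc && e->confidence >= 3)
        return !base;
    return base;
}

/* Update: train the side table on base-predictor misses, back off when the
 * base predictor is right, and always train the base predictor. */
void combined_update(uint32_t pc, bool taken) {
    bool base = base_predict(pc);        /* what the base alone would have said */
    side_entry_t *e = &side[pc % SIDE_ENTRIES];
    if (base != taken) {
        if (e->tag != pc) { e->tag = pc; e->confidence = 1; }
        else if (e->confidence < 7) e->confidence++;
    } else if (e->tag == pc && e->confidence > 0) {
        e->confidence--;
    }
    base_update(pc, taken);
}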
Article
Branch predictor (BP) is an essential component in modern processors since high BP accuracy can improve performance and reduce energy by decreasing the number of instructions executed on wrong-path. However, reducing latency and storage overhead of BP while maintaining high accuracy presents significant challenges. In this paper, we present a survey of dynamic branch prediction techniques. We classify the works based on key features to underscore their differences and similarities. We believe this paper will spark further research in this area and will be useful for computer architects, processor designers and researchers.
... [5,6]. [8], Y. Maa accessed the BTAC by hedging its address [9]. [23,24]. ...
Article
This paper proposes an improved branch predictor that reduces the number of execution cycles of applications by selectively accessing a specific element in a 4-way associative cache. When a branch instruction is fetched, the proposed branch predictor acquires the branch target address from the selected element in the cache by referring to an MRU buffer. The branch prediction rate and application execution speed are considerably improved by increasing the number of BTAC entries under a restricted power budget, compared with the previous branch predictor, which accesses all elements. The effectiveness of the proposed dynamic branch predictor is verified by executing benchmark applications on a core simulator. Experimental results show that the number of execution cycles decreases by an average of 10.1%, while power consumption increases by an average of 7.4%, compared to a core without a dynamic branch predictor. Execution cycles are reduced by 4.1% in comparison with a core that employs the previous dynamic branch predictor.
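The abstract above describes reading only one way of a 4-way BTAC, chosen by an MRU buffer. The sketch below illustrates that idea under stated assumptions: the BTAC geometry, the tag handling, and the policy on an MRU mismatch (treated here as a miss) are all illustrative, not the paper's implementation.

/* Sketch of MRU-guided way selection in a 4-way set-associative BTAC.
 * Instead of reading all 4 ways, only the way recorded in the per-set MRU
 * buffer is accessed; a tag mismatch is treated as a BTAC miss. Sizes and
 * the replacement policy are illustrative assumptions. */
#include <stdint.h>
#include <stdbool.h>

#define BTAC_SETS 128
#define BTAC_WAYS 4

typedef struct {
    uint32_t tag;
    uint32_t target;   /* predicted branch target address */
    bool     valid;
} btac_entry_t;

static btac_entry_t btac[BTAC_SETS][BTAC_WAYS];
static uint8_t      mru_way[BTAC_SETS];   /* MRU buffer: one way id per set */

/* Lookup touches only the MRU way of the set, saving the power of reading
 * the other three ways. */
bool btac_lookup(uint32_t pc, uint32_t *target) {
    uint32_t set = (pc >> 2) % BTAC_SETS;
    btac_entry_t *e = &btac[set][mru_way[set]];
    if (e->valid && e->tag == pc) {
        *target = e->target;
        return true;
    }
    return false;   /* handled like any other BTAC miss by the pipeline */
}

/* On branch resolution, install/refresh the entry and remember its way as MRU. */
void btac_update(uint32_t pc, uint32_t target) {
    uint32_t set = (pc >> 2) % BTAC_SETS;
    uint32_t way = mru_way[set];                 /* simple assumed victim choice */
    for (uint32_t w = 0; w < BTAC_WAYS; w++)     /* prefer an existing match     */
        if (btac[set][w].valid && btac[set][w].tag == pc) { way = w; break; }
    btac[set][way] = (btac_entry_t){ .tag = pc, .target = target, .valid = true };
    mru_way[set] = (uint8_t)way;
}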
... Branch predictor performance can also be evaluated using the misprediction speedup, as derived in [17]. Figure 9 shows the misprediction speedup versus hardware budget, in number of entries, for various SDFSM sizes compared to the counter-based predictor. These speedups are in line with speedups obtained for other recent innovations in branch predictors [18]-[20]. This paper proposes the shadow dynamic finite state machine (SDFSM), a new branch predictor whose FSM states are dynamically trained during run-time to learn unique branch pattern behaviors. ...
Article
We propose an adaptive learning machine-based branch predictor, the shadow dynamic finite state machine (SDFSM), that enables more accurate branch predictions by learning unique branching patterns through a self-modifying technique. SDFSM states represent branch pattern bits. If a state mispredicts a branch, the state is swapped with its shadow state, which represents the correct branching pattern bit. Therefore, the prediction accuracy can reach 100% if the number of states matches a branch's pattern length. When compared to a bimodal branch predictor using 2-bit saturating counters, the SDFSM decreases average misprediction rates by 18.3%, with individual decreases as high as 55%. Povzetek (Slovenian abstract): A method for learning branch patterns in the processor is presented.
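A minimal sketch of the shadow-state mechanism as the abstract describes it: each state stores one pattern bit, the machine steps through its states, and a mispredicting state is "swapped with its shadow", i.e. its bit is replaced by the observed outcome. The per-branch table, the state count, and the cyclic stepping order are assumptions for illustration.

/* Sketch of a shadow-FSM predictor for one branch: N states each hold one
 * pattern bit; the machine steps through the states cyclically, and a
 * mispredicting state is swapped with its shadow (its bit is flipped to the
 * observed outcome). A repeating pattern whose length matches the state
 * count is eventually predicted perfectly. Sizes are illustrative. */
#include <stdint.h>
#include <stdbool.h>

#define SDFSM_STATES 8    /* can learn repeating patterns of length 8 */

typedef struct {
    bool    bit[SDFSM_STATES];  /* learned pattern bit held by each state */
    uint8_t cur;                /* current state (position in the pattern) */
} sdfsm_t;

bool sdfsm_predict(const sdfsm_t *m) {
    return m->bit[m->cur];
}

void sdfsm_update(sdfsm_t *m, bool taken) {
    if (m->bit[m->cur] != taken)
        m->bit[m->cur] = taken;   /* "swap with shadow state": adopt the correct bit */
    m->cur = (uint8_t)((m->cur + 1) % SDFSM_STATES);  /* advance to the next state */
}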
Article
With the ever-increasing need for power-aware architecture and circuit design in recent years, how to reduce the power consumption of processors without sacrificing performance has become an important issue. In this paper, we propose a new method for low-power branch prediction, the Hedging Filter, which combines a filtering scheme that reduces dynamic power consumption with a hedging prediction mechanism that lowers static power dissipation. We analyze and empirically study the proposed scheme, embodied in a Sentry Table plus Complementary Branch Prediction combination, with respect to critical-path delay, performance, hardware overhead and power consumption. The Hedging Filter not only preserves critical-path delay and prediction accuracy, but also saves both dynamic and static power. Our evaluation shows that, for equivalent or superior performance with respect to traditional counterparts, the proposed method reduces branch prediction hardware cost by up to 71% and power consumption by up to 79%.
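The abstract does not spell out the filter's internals, but the filtering half of such a scheme can be pictured as below: a tiny "sentry" table answers strongly biased branches itself, so the large predictor is not accessed for them (saving dynamic power), and the bypassed predictor entries could in principle be held in a low-leakage state (saving static power). All sizes, the bias threshold and the stand-in main predictor are assumptions, not the paper's actual Sentry Table design.

/* Sketch of a filtering scheme: a tiny sentry table tracks strongly biased
 * branches and predicts them itself, so the large main predictor is not
 * accessed for them. Sizes, thresholds and the stand-in main predictor are
 * illustrative assumptions. */
#include <stdint.h>
#include <stdbool.h>

#define MAIN_ENTRIES   4096
#define SENTRY_ENTRIES 128
#define BIAS_THRESHOLD 15    /* consecutive identical outcomes => "biased" */

static uint8_t main_ctr[MAIN_ENTRIES];   /* stand-in for the large predictor */

static bool main_predict(uint32_t pc) { return main_ctr[pc % MAIN_ENTRIES] >= 2; }

static void main_update(uint32_t pc, bool taken) {
    uint8_t *c = &main_ctr[pc % MAIN_ENTRIES];
    if (taken  && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
}

typedef struct {
    uint8_t streak;     /* consecutive outcomes equal to `direction` */
    bool    direction;  /* current bias direction of the branch      */
} sentry_t;

static sentry_t sentry[SENTRY_ENTRIES];

/* Biased branches are answered by the sentry table; only the rest pay for a
 * main-predictor access. */
bool filtered_predict(uint32_t pc, bool *used_main) {
    sentry_t *s = &sentry[pc % SENTRY_ENTRIES];
    if (s->streak >= BIAS_THRESHOLD) {
        *used_main = false;
        return s->direction;
    }
    *used_main = true;
    return main_predict(pc);
}

void filtered_update(uint32_t pc, bool taken, bool used_main) {
    sentry_t *s = &sentry[pc % SENTRY_ENTRIES];
    if (taken == s->direction) {
        if (s->streak < 255) s->streak++;
    } else {
        s->direction = taken;   /* bias broken: restart the streak */
        s->streak = 1;
    }
    if (used_main)
        main_update(pc, taken); /* filtered branches never touch the main tables */
}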
Conference Paper
Modern branch predictors are often too large and power-hungry to be a viable option for small, embedded processors where die space, power consumption and performance are all at a premium. With embedded processors, the large cache structures required for high-performance branch prediction can easily take up more die space than the rest of the processor combined. When coupled with the large leakage energies, which are set to be an increasing issue as technologies advance to 45nm and beyond, it can often appear appealing not to use a dynamic branch predictor at all. This paper seeks to find a way of using an ultra-small branch predictor in a hybrid predictor configuration suitable for an embedded processor. We introduce a novel bias parameter to the consideration of when to predict branches statically or dynamically, further exploring the performance vs energy trade-off. We present a solution that reduces dynamic branch predictor aliasing, improves performance and requires a minimum of extra die space. The results presented relate die space requirements, energy use and performance impacts. We look at how best to optimise this balance in a way that is usually not considered, and on a lower bit budget than has previously been presented. The EEMBC 1.1 benchmark suite [1] was used to explore the energy vs performance trade-off boundary, taking averages of the results across 31 different benchmarks. We evaluate 5 traditional branch predictor configurations and 36 novel ultra-small hybrid branch predictors through the use of 9 sets of our novel bias values, combining GShare dynamic predictions with profiled backwards taken forwards not-taken (BTFN)/backwards not-taken forwards taken (BNFT) static predictions. The results demonstrate that the use of a static-dynamic hybrid is not only beneficial but necessary for very small predictors to produce a positive effect on the cycle count and overall energy use of the processor. Through the use of our novel bias parameter we explore the performance vs energy trade-off and show that through a small (0.1 seconds at 500 MHz, or 0.35%) reduction in peak performance (total runtime in the region of 28.35 seconds) for a given architecture we can gain substantial dynamic energy savings from reduced dynamic predictor accesses (removing up to an additional 16.5%, or 53 million, of the traditional hybrid predictor accesses). Our best-performing architecture showed an average improvement in run time of 2 seconds (6.7%) over a static BTFN baseline (total runtime 30.46s), at the cost of only an additional 0.01 mm² (or 1%) die space.
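One plausible reading of the static/dynamic split with a bias parameter is sketched below: profiling records each branch's taken rate, and branches more one-sided than the bias threshold are predicted statically (BTFN here), bypassing the small GShare and its access energy, while the remaining branches use the dynamic predictor. The profile format, the threshold semantics and the BTFN choice are assumptions for illustration, not the paper's exact mechanism.

/* Sketch of a profiled static/dynamic hybrid with a bias parameter:
 * branches whose profiled taken-rate is more one-sided than `bias` are
 * predicted statically (BTFN), skipping the dynamic predictor access;
 * the rest are predicted dynamically. All details are illustrative. */
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint64_t executed;   /* profiled execution count of the branch */
    uint64_t taken;      /* profiled number of taken outcomes      */
} profile_t;

/* Static BTFN rule: backwards taken, forwards not-taken. */
static bool btfn_predict(uint32_t pc, uint32_t target) {
    return target < pc;
}

/* Route to the static predictor only when the profile says the branch is
 * biased beyond the threshold; otherwise consult the dynamic predictor. */
bool hybrid_predict(uint32_t pc, uint32_t target,
                    const profile_t *prof, double bias,
                    bool (*dynamic_predict)(uint32_t pc)) {
    if (prof->executed > 0) {
        double taken_rate = (double)prof->taken / (double)prof->executed;
        if (taken_rate >= bias || taken_rate <= 1.0 - bias)
            return btfn_predict(pc, target);   /* static: no dynamic-table access */
    }
    return dynamic_predict(pc);                /* dynamic (e.g. the GShare above) */
}

With this reading, a bias of 0.5 routes every profiled branch to static prediction, while a bias close to 1.0 reserves static prediction for almost completely one-sided branches, which is one way the performance vs energy trade-off described in the abstract could be swept.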