Fig. 2 - uploaded by Walid A. Najjar
VALVE encoder block diagram. The figure shows three segments: 32-bit segment, 24-bit MSB segment, and 16-bit MSB segment. The segment selector selects the best hits among various hits and puts the value on the data bus.


Source publication
Article
Full-text available
Off-Chip buses constitute a significant portion of the total system power in embedded systems. Many research works have focused on reducing power consumption in the off-chip buses. While numerous techniques exist for reducing bus power in address buses, only a handful of techniques have been proposed for off-chip data bus power reduction. In this p...

Context in source publication

Context 1
... algorithm for the VALVE encoder is described in Algorithm 1. The VALVE encoder, shown in Fig. 2, can encode bit patterns of 32, 24, and 16 bits. For every data value, masks are applied to extract the 32-, 24-, and 16-bit patterns. These bit patterns are then looked up in the appropriate segments of the VALVE table. In the event of a hit in multiple segments, the segment selector picks the hit from a segment with the ...
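The lookup step described above can be sketched in Python. The segment widths follow the figure; the table size, FIFO replacement policy, and the shape of the encoded output are illustrative assumptions, not the paper's exact design.

```python
class ValveEncoderSketch:
    """Sketch of the VALVE lookup: the 32-, 24-, and 16-bit MSB patterns
    of a 32-bit word are looked up in three table segments (modeling CAM
    segments); the segment selector prefers the widest matching pattern."""

    # (segment name, extraction mask, right-shift to isolate the pattern)
    SEGMENTS = [
        ("32-bit", 0xFFFFFFFF, 0),
        ("24-bit MSB", 0xFFFFFF00, 8),
        ("16-bit MSB", 0xFFFF0000, 16),
    ]

    def __init__(self, entries_per_segment=8):
        self.entries = entries_per_segment  # size is an assumption
        self.tables = {name: [] for name, _, _ in self.SEGMENTS}

    def encode(self, word):
        """Return ('hit', segment, index, residual_lsbs) on a table hit,
        else ('miss', word) after installing the patterns."""
        # Segment selector: check the widest pattern first, so the first
        # hit is the best one.
        for name, mask, shift in self.SEGMENTS:
            pattern = (word & mask) >> shift
            table = self.tables[name]
            if pattern in table:
                return ("hit", name, table.index(pattern), word & ~mask)
        # Miss: install all three patterns with FIFO replacement (assumption).
        for name, mask, shift in self.SEGMENTS:
            pattern = (word & mask) >> shift
            table = self.tables[name]
            if pattern not in table:
                table.append(pattern)
                if len(table) > self.entries:
                    table.pop(0)
        return ("miss", word)
```

On a hit, only the segment identifier, table index, and the unmatched low-order bits need to cross the bus, which is where the transition savings come from.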

Similar publications

Article
Full-text available
We propose a method to estimate the data bus width to the requirements of an application that is to run on a custom processor. The proposed estimation method is a simulation-based tool that uses Extreme Value Theory to estimate the width of an off-chip or on-chip data bus based on the characteristics of the application. It finds the minimum number...

Citations

... The techniques discussed so far focus on increasing cache or memory capacity, reducing miss rates, and consequently lowering bandwidth usage; however, compression techniques can also be applied to directly improve effective bandwidth, reduce transfers' power requirements, or narrow ... Citron et al. [CR95] propose Bus-Expander, a component that compresses and decompresses data words between the ends of the buses (Figure 3.3). ... A similar approach is adopted by Yang et al. [YGZ04], where the whole word or bytes are stored in a ... The compared compression schemes include SWC [TSS08] (significance width), TUBE [SAYN09] (partial dictionary), and VALVE [SAYN09] (partial dictionary). ...
... They also propose extra techniques to reduce bit flipping, such as XOR-ing values from sequential transfers of different values, and disallowing values with a low Hamming distance from their one-hot encoding to be stored in the table. Since multiple data types can be transferred, one can keep a different table per type to reduce negative interaction [ALYK12]. Partial dictionary matches can also be performed on the tables by embedding extra pattern information within the method's metadata [SAYN09]. Finally, Burtscher's FPC (BFPC) [BR10] mixes these approaches by matching the MSBs against the table and sending the number of differing LSBs. To reduce metadata overhead, instead of sending index bits, they predict which table entry to use. One could also store data in memories in compressed format, but not co-allocate them, in order to reduce bandwidth usage. ...
Thesis
Hardware compression techniques are typically simplifications of software compression methods. They must, however, comply with area, power and latency constraints. This study unveils the challenges of adopting compression in memory design. The goal of this analysis is not to summarize proposals, but to put in evidence the solutions they employ to handle those challenges. An in-depth description of the main characteristics of multiple methods is provided, as well as criteria that can be used as a basis for the assessment of such schemes. Typically, these schemes are not very efficient, and those that do compress well decompress slowly. This work explores their granularity to redefine their perspectives and improve their efficiency, through a concept called Region-Chunk compression. Its goal is to achieve low (good) compression ratio and fast decompression latency. The key observation is that by further sub-dividing the chunks of data being compressed one can reduce data duplication. This concept can be applied to several previously proposed compressors, resulting in a reduction of their average compressed size. In particular, a single-cycle-decompression compressor is boosted to reach a compressibility level competitive to state-of-the-art proposals. Finally, to increase the probability of successfully co-allocating compressed lines, Pairwise Space Sharing (PSS) is proposed. PSS can be applied orthogonally to compaction methods at no extra latency penalty, and with a cost-effective metadata overhead. The proposed system (Region-Chunk+PSS) further enhances the normalized average cache capacity by 2.7% (geometric mean), while featuring short decompression latency.
... To the best of our knowledge, this is the first work to control the compression and approximation of raw (pre-encoded) video frames based on video quality while minimizing overheads. Energy Optimization Techniques: Multiple energy optimization techniques have been studied on edge devices [2,21,26,30,50,51,56,57,62,64,66,81,83,87]. Zhu et al. [88,89], Nachiappan et al. [38] and Yedlapalli et al. [81] have proposed mobile-SoC energy optimization through scheduling and virtualization. ...
Conference Paper
Video broadcast and streaming are among the most widely used applications for edge devices. Roughly 82% of the mobile internet traffic is made up of video data. This is likely to worsen with the advent of 5G that will open up new opportunities for high resolution videos, virtual and augmented reality-based applications. The raw video data produced and consumed by edge devices is considerably higher than what is transmitted out of them. This leads to huge memory bandwidth and energy requirements from such edge devices. Therefore, optimizing the memory bandwidth and energy consumption needs is imperative for further improvements in energy efficiency of such edge devices. In this paper, we propose two mechanisms for on-the-fly compression and approximation of raw video data that is generated by the image sensors. The first mechanism, MidVB, performs lossless compression of the video frames coming out of the sensors and stores the compressed format into the memory. The second mechanism, Distill, builds on top of MidVB and further reduces memory consumption by approximating the video frame data. On an average, across 20 raw videos, MidVB and Distill are able to reduce the memory bandwidth by 43% and 72%, respectively, over the raw representation. They outperform a well known memory saving mechanism by 7% and 36%, respectively. Furthermore, MidVB and Distill reduce the energy consumption by 40% and 67%, respectively, over the baseline.
... Multi-dimensional optical data storage uses a spatial region of a recording medium. Dinesh C. Suresh, Jun Yang, et al. [13] present two novel data bus encoding schemes to reduce power consumption in the data bus. The first encoding scheme is called the variable-length value encoder (VALVE). ...
... Table 15 highlights the differences between the techniques discussed in this section and those discussed in Section 6.1. We now review FV encoding techniques that exploit the locality of full values only [52], or of both full and partial values [62,64,73,91]. • Yang et al. [52] present an FV encoding technique which transmits FVs in encoded form and infrequent values without encoding. ...
Article
Full-text available
In modern processors, data-movement consumes two orders of magnitude higher energy than a floating-point operation and hence, data-movement is becoming the primary bottleneck in scaling the performance of modern processors within the fixed power budget. Intelligent data-encoding techniques hold the promise of reducing the data-movement energy. In this paper, we present a survey of encoding techniques for reducing data-movement energy. By classifying the works on key metrics, we bring out their similarities and differences. This paper is expected to be useful for computer architects, processor designers and researchers in the area of interconnect and memory system design.
... Data Encoding Techniques Leveraging Data Equality. Several previously proposed techniques [27,28,29,30,31,32,33,34] store specific data in a cache repository and transfer an energy-efficient encoded form of the data when a transaction is matched to the cached data (e.g., sending the index information of matching data in the cache). Our mechanism is different from this approach in several aspects. ...
... Architecture-level optimization methods include voltage/frequency scaling [4][5][6][7], clock gating [8][9], and low-power coding techniques [10][11], etc. II. PROSPECTIVE DYNAMIC FREQUENCY SCALING POWER OPTIMIZATION METHOD ...
Article
Full-text available
Power has become one of the most restrictive barriers to processor development, and I/O system power is an important part of a processor's total power. This paper targets the I/O dynamic power of multi-core processors and puts forward a prospective dynamic frequency scaling with clock gating power optimization method. The experimental results show that the proposed method can greatly reduce the dynamic power of the I/O system.
... Most of them are effective for random data, which have high activity. On the other hand, unfortunately, most of the proposed coding algorithms (e.g., [11,15-17]) only consider self transitions and hence are incapable of reducing the power consumption of data buses in future CMOS technologies. ...
... There are various low-power coding methods for data buses: BI (Bus Invert) coding [8] for uncorrelated data patterns, and probability-based mapping [9][10][11] for patterns with non-uniform probability densities. Probability-based mapping is reviewed first, followed by BI-based coding. ...
... In other words, for wide on-chip buses in future high-performance computing systems, implementing table-based encoding schemes enforces intolerable overheads. While the WCM algorithm considers both the self and coupling capacitances of the bus wires, Suresh et al. [11] have proposed two novel data bus encoding schemes, named VALVE and TUBE, to reduce self transitions in the data buses. VALVE and TUBE use a Content Addressable Memory (CAM) table to store a finite set of fixed-width codes. ...
Article
Two main sources for power dissipation in parallel buses are data transitions on each wire and coupling between adjacent wires. So far, many techniques have been proposed for reducing the self and coupling powers. Most of these methods utilize one (or more) control bit(s) to manage the behavior of data transitions on the parallel bus. In this paper, we propose a new coding scheme, referred to as GPH, to reduce power dissipation of these control bits. GPH coding scheme employs partitioned Bus Invert and Odd Even Bus-Invert coding techniques. This method benefits from Particle Swarm Optimization (PSO) algorithm to efficiently partition the bus. In order to reduce self and coupling powers of the control bits, it finds partitions with similar transition behaviors and groups them together. One extra control bit is added to each group of partitions. Properly managing number of transitions on control bits of each partition and that of each group, GPH reduces total power consumption, including coupling power. It also locates control bits of each partition such that total power consumption is minimized. We evaluate the efficiency of the proposed method for coding data and address buses under various hardware platforms. Experimental results show 43% average power saving in coded data compared to the original one. We also show the prominence of our coding scheme over previously proposed techniques.
Conference Paper
Data movement over long and highly capacitive interconnects is responsible for a large fraction of the energy consumed in nanometer ICs. DDRx, the most broadly adopted family of DRAM interfaces, contributes significantly to the overall system energy in a wide range of computer systems. To reduce the energy cost of data transfers, DDR4 adopts a pseudo open-drain IO circuit that consumes power only when transmitting or receiving a 0, which makes the IO energy proportional to the number of 0s transferred over the data bus. A data bus invert (DBI) coding technique is therefore supported by the DDR4 standard to encode each byte using a small number of 0s. Although sparse coding techniques that are more advanced than DBI can reduce the IO power further, the relatively high bandwidth overhead of these codes has heretofore prevented their application to the DDRx bus. This paper presents MiL (More is Less), a novel data communication framework built on top of DDR4, which exploits the data bus under-utilization caused by DRAM timing constraints to selectively apply sparse codes, thereby reducing the IO energy without compromising system performance. Evaluation results on a set of eleven parallel applications show that MiL can reduce the average IO interface energy by 49%, and the average DRAM system energy by 8% when added on top of a conventional DDR4 system, with less than 2% performance degradation on average.
Article
As DRAM data bandwidth increases, tremendous energy is dissipated in the DRAM data bus. To reduce the energy consumed in the data bus, DRAM interfaces with asymmetric termination, such as Pseudo Open Drain (POD) and Low Voltage Swing Terminated Logic (LVSTL), have been adopted in modern DRAMs. In interfaces using asymmetric termination, the amount of termination energy is proportional to the hamming weight of the data words. In this work, we propose Bitwise Difference Encoding (BD-Encoding), which decreases the hamming weight of data words, leading to a reduction in energy consumption in the modern DRAM data bus. Since smaller hamming weight of the data words also reduces switching activity, switching energy and power noise are also both reduced. BD-Encoding exploits the similarity in data words in the DRAM data bus. We observed that similar data words (i.e. data words whose hamming distance is small) are highly likely to be sent over at similar times. Based on this observation, BD-coder stores the data recently sent over in both the memory controller and DRAMs. Then, BD-coder transfers the bitwise difference between the current data and the most similar data. In an evaluation using SPEC 2006, BD-Encoding using 64 recent data reduced termination energy by 58.3% and switching energy by 45.3%. In addition, 55% of the LdI/dt noise was decreased with BD-Encoding.
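The mechanism described in this abstract can be sketched as follows: both ends of the bus keep the recently transferred words, and the sender transmits the index of the most similar (minimum-Hamming-distance) cached word together with its bitwise XOR difference. The seed value and FIFO replacement below are illustrative assumptions; the paper's evaluation uses the 64 most recent data words.

```python
def hamming_weight(x):
    """Number of 1 bits; proportional to termination energy under
    asymmetric-termination interfaces such as POD/LVSTL."""
    return bin(x).count("1")

class BDCodecSketch:
    """One end of a bitwise-difference link. Sender and receiver each
    hold an instance and update their histories identically, so the
    transmitted (index, diff) pair is enough to reconstruct the word."""

    def __init__(self, history=64):
        self.history = history
        self.recent = [0]  # agreed-upon seed word on both sides (assumption)

    def encode(self, word):
        """Pick the closest cached word, send its index plus the XOR diff."""
        idx = min(range(len(self.recent)),
                  key=lambda i: hamming_weight(word ^ self.recent[i]))
        diff = word ^ self.recent[idx]
        self.recent.append(word)
        if len(self.recent) > self.history:
            self.recent.pop(0)  # FIFO replacement (assumption)
        return idx, diff

    def decode(self, idx, diff):
        """Mirror of encode: rebuild the word and update the history."""
        word = self.recent[idx] ^ diff
        self.recent.append(word)
        if len(self.recent) > self.history:
            self.recent.pop(0)
        return word
```

When consecutive words are similar, the transmitted diff has a small Hamming weight, cutting both termination and switching energy.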