Fig. 2 - uploaded by Walid A. Najjar
VALVE encoder block diagram. The figure shows three segments: 32-bit segment, 24-bit MSB segment, and 16-bit MSB segment. The segment selector selects the best hits among various hits and puts the value on the data bus.


Source publication
Article
Full-text available
Off-Chip buses constitute a significant portion of the total system power in embedded systems. Many research works have focused on reducing power consumption in the off-chip buses. While numerous techniques exist for reducing bus power in address buses, only a handful of techniques have been proposed for off-chip data bus power reduction. In this p...

Context in source publication

Context 1
... algorithm for the VALVE encoder is described in Algorithm 1. The VALVE encoder, shown in Fig. 2, can encode bit patterns of 32, 24, and 16 bits. For every data value, masks are applied to extract the 32-, 24-, and 16-bit patterns. These bit patterns are then looked up in the appropriate segments of the VALVE table. In the event of a hit in multiple segments, the segment selector picks the hit from a segment with the ...
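The lookup step described above can be sketched in Python. The segment widths follow the figure; the table size, FIFO replacement policy, and the shape of the encoded output are illustrative assumptions, not the paper's exact design.

```python
class ValveEncoderSketch:
    """Sketch of the VALVE lookup: the 32-, 24-, and 16-bit MSB patterns
    of a 32-bit word are looked up in three table segments (modeling CAM
    segments); the segment selector prefers the widest matching pattern."""

    # (segment name, extraction mask, right-shift to isolate the pattern)
    SEGMENTS = [
        ("32-bit", 0xFFFFFFFF, 0),
        ("24-bit MSB", 0xFFFFFF00, 8),
        ("16-bit MSB", 0xFFFF0000, 16),
    ]

    def __init__(self, entries_per_segment=8):
        self.entries = entries_per_segment  # size is an assumption
        self.tables = {name: [] for name, _, _ in self.SEGMENTS}

    def encode(self, word):
        """Return ('hit', segment, index, residual_lsbs) on a table hit,
        else ('miss', word) after installing the patterns."""
        # Segment selector: check the widest pattern first, so the first
        # hit is the best one.
        for name, mask, shift in self.SEGMENTS:
            pattern = (word & mask) >> shift
            table = self.tables[name]
            if pattern in table:
                return ("hit", name, table.index(pattern), word & ~mask)
        # Miss: install all three patterns with FIFO replacement (assumption).
        for name, mask, shift in self.SEGMENTS:
            pattern = (word & mask) >> shift
            table = self.tables[name]
            if pattern not in table:
                table.append(pattern)
                if len(table) > self.entries:
                    table.pop(0)
        return ("miss", word)
```

On a hit, only the segment identifier, table index, and the unmatched low-order bits need to cross the bus, which is where the transition savings come from.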

Similar publications

Article
Full-text available
We propose a method to estimate the data bus width to the requirements of an application that is to run on a custom processor. The proposed estimation method is a simulation-based tool that uses Extreme Value Theory to estimate the width of an off-chip or on-chip data bus based on the characteristics of the application. It finds the minimum number...

Citations

... The techniques discussed so far focus on increasing cache or memory capacity, reducing miss rates, and consequently lowering bandwidth usage; however, compression techniques can also be applied to directly improve effective bandwidth, reduce transfers' power requirements, or narrow ... Citron et al. [CR95] propose Bus-Expander, a component that compresses and decompresses data words between the ends of the buses (Figure 3.3). ... A similar approach is adopted by Yang et al. [YGZ04], where the whole word or bytes are stored in a ... The compared compression schemes include SWC [TSS08] (significance width), TUBE [SAYN09] (partial dictionary), and VALVE [SAYN09] (partial dictionary). ...
... They also propose extra techniques to reduce bit flipping, such as XOR-ing values from sequential transfers of different values, and disallowing values with a low Hamming distance from their one-hot encoding to be stored in the table. Since multiple data types can be transferred, one can keep a different table per type to reduce negative interaction [ALYK12]. Partial dictionary matches can also be performed on the tables by embedding extra pattern information within the method's metadata [SAYN09]. Finally, Burtscher's FPC (BFPC) [BR10] mixes these approaches by matching the MSBs against the table and sending the number of differing LSBs. To reduce metadata overhead, instead of sending index bits, they predict which table entry to use. One could also store data in memories in compressed format, but not co-allocate them, in order to reduce bandwidth usage. ...
Thesis
Hardware compression techniques are typically simplifications of software compression methods. They must, however, comply with area, power and latency constraints. This study unveils the challenges of adopting compression in memory design. The goal of this analysis is not to summarize proposals, but to put in evidence the solutions they employ to handle those challenges. An in-depth description of the main characteristics of multiple methods is provided, as well as criteria that can be used as a basis for the assessment of such schemes. Typically, these schemes are not very efficient, and those that do compress well decompress slowly. This work explores their granularity to redefine their perspectives and improve their efficiency, through a concept called Region-Chunk compression. Its goal is to achieve low (good) compression ratio and fast decompression latency. The key observation is that by further sub-dividing the chunks of data being compressed one can reduce data duplication. This concept can be applied to several previously proposed compressors, resulting in a reduction of their average compressed size. In particular, a single-cycle-decompression compressor is boosted to reach a compressibility level competitive to state-of-the-art proposals. Finally, to increase the probability of successfully co-allocating compressed lines, Pairwise Space Sharing (PSS) is proposed. PSS can be applied orthogonally to compaction methods at no extra latency penalty, and with a cost-effective metadata overhead. The proposed system (Region-Chunk+PSS) further enhances the normalized average cache capacity by 2.7% (geometric mean), while featuring short decompression latency.
... To the best of our knowledge, this is the first work to control the compression and approximation of raw (pre-encoded) video frames based on video quality while minimizing overheads. Energy Optimization Techniques: Multiple energy optimization techniques have been studied on edge devices [2,21,26,30,50,51,56,57,62,64,66,81,83,87]. Zhu et al. [88,89], Nachiappan et al. [38] and Yedlapalli et al. [81] have proposed mobile-SoC energy optimization through scheduling and virtualization. ...
Conference Paper
Video broadcast and streaming are among the most widely used applications for edge devices. Roughly 82% of the mobile internet traffic is made up of video data. This is likely to worsen with the advent of 5G that will open up new opportunities for high resolution videos, virtual and augmented reality-based applications. The raw video data produced and consumed by edge devices is considerably higher than what is transmitted out of them. This leads to huge memory bandwidth and energy requirements from such edge devices. Therefore, optimizing the memory bandwidth and energy consumption needs is imperative for further improvements in energy efficiency of such edge devices. In this paper, we propose two mechanisms for on-the-fly compression and approximation of raw video data that is generated by the image sensors. The first mechanism, MidVB, performs lossless compression of the video frames coming out of the sensors and stores the compressed format into the memory. The second mechanism, Distill, builds on top of MidVB and further reduces memory consumption by approximating the video frame data. On an average, across 20 raw videos, MidVB and Distill are able to reduce the memory bandwidth by 43% and 72%, respectively, over the raw representation. They outperform a well known memory saving mechanism by 7% and 36%, respectively. Furthermore, MidVB and Distill reduce the energy consumption by 40% and 67%, respectively, over the baseline.
... Multi-dimensional optical data storage uses a spatial region of a recording medium. Dinesh C. Suresh, Jun Yang, et al. [13] present two novel data bus encoding schemes to reduce power consumption in the data bus. The first encoding scheme is called the variable-length value encoder (VALVE). ...
... Table 15 highlights the differences between the techniques discussed in this section and those discussed in Section 6.1. We now review FV encoding techniques that exploit the locality of full values only [52], or of both full and partial values [62,64,73,91]. • Yang et al. [52] present an FV encoding technique which transmits FVs in encoded form and infrequent values without encoding. ...
Article
Full-text available
In modern processors, data-movement consumes two orders of magnitude higher energy than a floating-point operation and hence, data-movement is becoming the primary bottleneck in scaling the performance of modern processors within the fixed power budget. Intelligent data-encoding techniques hold the promise of reducing the data-movement energy. In this paper, we present a survey of encoding techniques for reducing data-movement energy. By classifying the works on key metrics, we bring out their similarities and differences. This paper is expected to be useful for computer architects, processor designers and researchers in the area of interconnect and memory system design.
... Data Encoding Techniques Leveraging Data Equality. Several previously proposed techniques [27,28,29,30,31,32,33,34] store specific data in a cache repository and transfer an energy-efficient encoded form of the data when a transaction is matched to the cached data (e.g., sending the index information of matching data in the cache). Our mechanism is different from this approach in several aspects. ...
... Architecture-level optimization methods include voltage/frequency scaling [4][5][6][7], clock gating [8][9], and low-power coding techniques [10][11], etc. II. PROSPECTIVE DYNAMIC FREQUENCY SCALING POWER OPTIMIZATION METHOD ...
Article
Full-text available
Power has become one of the most restrictive barriers to processor development, and I/O system power is an important part of a processor's total power. This paper targets the I/O dynamic power of multi-core processors and puts forward a prospective dynamic frequency scaling with clock gating power optimization method. The experimental results show that the proposed method can greatly reduce the dynamic power of the I/O system.
... Most of them are effective for random data, which have high activity. On the other hand, unfortunately, most of the proposed coding algorithms (e.g., [11,15-17]) only consider self transitions and hence are incapable of reducing the power consumption of data buses in future CMOS technologies. ...
... There are various low-power coding methods for data buses: BI (Bus Invert) coding [8] for uncorrelated data patterns, and probability-based mapping [9][10][11] for patterns with non-uniform probability densities. Probability-based mapping is reviewed first, followed by BI-based coding. ...
... In other words, for wide on-chip buses in future high-performance computing systems, implementing table-based encoding schemes enforces intolerable overheads. While the WCM algorithm considers both the self and coupling capacitances of the bus wires, Suresh et al. [11] have proposed two novel data bus encoding schemes, named VALVE and TUBE, to reduce self transitions in the data buses. VALVE and TUBE use a Content Addressable Memory (CAM) table to store a finite set of fixed-width codes. ...
Article
Two main sources for power dissipation in parallel buses are data transitions on each wire and coupling between adjacent wires. So far, many techniques have been proposed for reducing the self and coupling powers. Most of these methods utilize one (or more) control bit(s) to manage the behavior of data transitions on the parallel bus. In this paper, we propose a new coding scheme, referred to as GPH, to reduce power dissipation of these control bits. GPH coding scheme employs partitioned Bus Invert and Odd Even Bus-Invert coding techniques. This method benefits from Particle Swarm Optimization (PSO) algorithm to efficiently partition the bus. In order to reduce self and coupling powers of the control bits, it finds partitions with similar transition behaviors and groups them together. One extra control bit is added to each group of partitions. Properly managing number of transitions on control bits of each partition and that of each group, GPH reduces total power consumption, including coupling power. It also locates control bits of each partition such that total power consumption is minimized. We evaluate the efficiency of the proposed method for coding data and address buses under various hardware platforms. Experimental results show 43% average power saving in coded data compared to the original one. We also show the prominence of our coding scheme over previously proposed techniques.
Conference Paper
Data movement over long and highly capacitive interconnects is responsible for a large fraction of the energy consumed in nanometer ICs. DDRx, the most broadly adopted family of DRAM interfaces, contributes significantly to the overall system energy in a wide range of computer systems. To reduce the energy cost of data transfers, DDR4 adopts a pseudo open-drain IO circuit that consumes power only when transmitting or receiving a 0, which makes the IO energy proportional to the number of 0s transferred over the data bus. A data bus invert (DBI) coding technique is therefore supported by the DDR4 standard to encode each byte using a small number of 0s. Although sparse coding techniques that are more advanced than DBI can reduce the IO power further, the relatively high bandwidth overhead of these codes has heretofore prevented their application to the DDRx bus. This paper presents MiL (More is Less), a novel data communication framework built on top of DDR4, which exploits the data bus under-utilization caused by DRAM timing constraints to selectively apply sparse codes, thereby reducing the IO energy without compromising system performance. Evaluation results on a set of eleven parallel applications show that MiL can reduce the average IO interface energy by 49%, and the average DRAM system energy by 8% when added on top of a conventional DDR4 system, with less than 2% performance degradation on average.
Article
As DRAM data bandwidth increases, tremendous energy is dissipated in the DRAM data bus. To reduce the energy consumed in the data bus, DRAM interfaces with asymmetric termination, such as Pseudo Open Drain (POD) and Low Voltage Swing Terminated Logic (LVSTL), have been adopted in modern DRAMs. In interfaces using asymmetric termination, the amount of termination energy is proportional to the hamming weight of the data words. In this work, we propose Bitwise Difference Encoding (BD-Encoding), which decreases the hamming weight of data words, leading to a reduction in energy consumption in the modern DRAM data bus. Since smaller hamming weight of the data words also reduces switching activity, switching energy and power noise are also both reduced. BD-Encoding exploits the similarity in data words in the DRAM data bus. We observed that similar data words (i.e. data words whose hamming distance is small) are highly likely to be sent over at similar times. Based on this observation, BD-coder stores the data recently sent over in both the memory controller and DRAMs. Then, BD-coder transfers the bitwise difference between the current data and the most similar data. In an evaluation using SPEC 2006, BD-Encoding using 64 recent data reduced termination energy by 58.3% and switching energy by 45.3%. In addition, 55% of the LdI/dt noise was decreased with BD-Encoding.
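The mechanism described in this abstract can be sketched as follows: both ends of the bus keep the recently transferred words, and the sender transmits the index of the most similar (minimum-Hamming-distance) cached word together with its bitwise XOR difference. The seed value and FIFO replacement below are illustrative assumptions; the paper's evaluation uses the 64 most recent data words.

```python
def hamming_weight(x):
    """Number of 1 bits; proportional to termination energy under
    asymmetric-termination interfaces such as POD/LVSTL."""
    return bin(x).count("1")

class BDCodecSketch:
    """One end of a bitwise-difference link. Sender and receiver each
    hold an instance and update their histories identically, so the
    transmitted (index, diff) pair is enough to reconstruct the word."""

    def __init__(self, history=64):
        self.history = history
        self.recent = [0]  # agreed-upon seed word on both sides (assumption)

    def encode(self, word):
        """Pick the closest cached word, send its index plus the XOR diff."""
        idx = min(range(len(self.recent)),
                  key=lambda i: hamming_weight(word ^ self.recent[i]))
        diff = word ^ self.recent[idx]
        self.recent.append(word)
        if len(self.recent) > self.history:
            self.recent.pop(0)  # FIFO replacement (assumption)
        return idx, diff

    def decode(self, idx, diff):
        """Mirror of encode: rebuild the word and update the history."""
        word = self.recent[idx] ^ diff
        self.recent.append(word)
        if len(self.recent) > self.history:
            self.recent.pop(0)
        return word
```

When consecutive words are similar, the transmitted diff has a small Hamming weight, cutting both termination and switching energy.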