Figure 2 - uploaded by Mauro Olivieri
Implementation of the swap operation (M = 4). Left: swap box. Right: complete swap unit. N is the bus width. 

Source publication
Article
Full-text available
We present a novel coding scheme for reducing bus power dissipation. The presented approach is well suited to driving off-chip buses, where the line capacitance is a dominant factor. A distinctive feature of the technique is the dynamic reordering of bus line positions, in order to minimize the toggling activity on physical bus wires. The effective...

Contexts in source publication

Context 1
... swapping patterns can be sequentially generated by a finite state machine (FSM) very similar to a binary counter. The direct binary representation of a swapping pattern is a vector of M binary numbers, each ranging from 0 to M − 1, therefore requiring M · log2(M) bits. The swap operation is performed by a set of multiplexers as in Fig. 2. Referring to M = 4, the 8-bit pattern is partitioned into four 2-bit numbers, namely A, B, C, and D in the left part of Fig. 2. In practice, the extra lines to transmit the pattern are drastically reduced by means of a combinational pattern encoder, exploiting the fact that the allowed individual patterns are at most M!. The proposed coding function (Definition 2) is implemented by a twin swap unit, illustrated in Fig. 3; the conversion from a swapping pattern to its inverse is directly implemented by a dedicated two-level combinational logic unit, PConv. In order to perform the M! attempts to find the best pattern, a partially or fully parallel implementation of the BS decoder can be pursued, employing L units, each performing M!/L attempts. We refer to such a solution as an L-way parallel architecture. The architecture of the single unit is shown in Fig. 4. PatGen is the FSM that generates the set of allowed patterns to be tried; H produces the Hamming distance between two words by performing a population count after XORing. The Cmp unit compares the actual Hamming distance with the temporary minimum. When all the patterns have been tried and the minimum distance found, the threshold unit stores the pattern, the encoded word and the distance value on output registers. Fig. 5 shows the top view of the encoder ...
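The behaviour described in this context can be sketched in software. The Python model below is only an illustration under stated assumptions: the function names (swap, hamming, best_pattern) are hypothetical, the pattern is modelled as M multiplexer selects (the four 2-bit numbers A, B, C, D for M = 4), and the search compares the plainly swapped word with the previous bus word, because the coding function of Definition 2 is not reproduced on this page.

```python
from itertools import permutations

M = 4  # cluster size used in Fig. 2

def swap(bits, pattern):
    """Model of the swap unit: output line i is driven by input line pattern[i].

    Each entry of `pattern` plays the role of one multiplexer select,
    i.e. a log2(M)-bit number in the range 0..M-1.
    """
    return tuple(bits[sel] for sel in pattern)

def hamming(a, b):
    """Hamming distance: population count of the bitwise XOR of two words."""
    return sum(x ^ y for x, y in zip(a, b))

def best_pattern(word, previous):
    """Try all M! permutation patterns and keep the one with the fewest toggles.

    Assumption: the candidate sent on the bus is the swapped word itself;
    the paper's actual coding function (Definition 2) may differ.
    """
    best = None
    for pattern in permutations(range(len(word))):   # role of PatGen
        candidate = swap(word, pattern)
        distance = hamming(candidate, previous)      # role of H
        if best is None or distance < best[0]:       # role of Cmp
            best = (distance, pattern, candidate)
    return best

if __name__ == "__main__":
    distance, pattern, encoded = best_pattern((1, 0, 1, 0), (0, 1, 0, 1))
    print(distance, pattern, encoded)
```

A hardware L-way parallel architecture would split the M! iterations of the loop across L units; the sequential loop here only models the result, not the timing.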

Citations

... The purpose of EV is spotted as an imperative means to minimize the emissions of CO2 and has gained attention from industries and academia [10]. EV covers numerous design issues based on power consumption, battery technologies, and energy-optimized digital control system [2,13,24,29]. An EV consumes small power as being static in roads intersections, and it charges batteries. ...
Article
Full-text available
The modernization in Electric Vehicles (EVs) has acquired immense interest amongst several researchers as the EV is termed a supreme mode of transportation. In addition, EV is imperative to preserve classical fuel, but EV poses short driving that are restricted by insufficient batteries that obstruct reliability, and there exist lesser charging applications, which are irregularly dispersed. A new model is devised for optimal routing to charge EV using server-hosted VANET. The goal is to discover optimal routes to charge EV with Vehicular Adhoc Network (VANET). The server-hosted VANET contains roadside and vehicle units such that roadside and vehicle units are operated with a cloud server. Here, optimal routes for attaining charging stations are discovered using the proposed Fractional-Social Ski Driver (Fractional-SSD), which is obtained by integrating Fractional calculus (FC) and Social Ski Driver optimization (SSD). In addition, the fitness function is newly developed using battery power, traffic density and distance. Thus, routing decisions are made to route the EV for charging the battery by adapting multi-objective factors. Hence, the proposed Fractional-SSD is employed to choose the optimal route for charging EV. As a result, the proposed Fractional-SSD acquired improved performance with the maximal battery power of 13,884.19 J, smallest traffic density of 6.5, delay of 10.973 min, and fitness of 24.800, respectively.
... The idea was to aggregate sensors data using an ultralow power binary operator that scrambles digital data stream according to a permutation pattern. The S operator (S=swap) has been formerly introduced in [14] for dynamic energy reduction in off-chip buses. We recover the original information applying an inverse permutation pattern to swapped data. ...
Article
In this work, we propose a novel data-aggregation system for gathering heterogeneous and nonsparse signals from a cluster-based sensor network. The aggregation algorithm uses an ultra-low-energy binary operator that performs the bit line permutation of the source data. The data detection introduces a binary noise, which is reduced by probabilistic process profiling and further low-pass filtering. The proposed aggregation system compresses sensor data and enables secure (against passive attacks) transmission toward the base station. Single-user binary data permutation dissipates 2.64 fJ/cycle of dynamic energy in 32 nm CMOS technology, whereas noise profiling dissipates an average of 117.16 pJ/cycle of total energy in the same technology. Static power is the most important source in both scenarios when the data rate is 1 MHz.
... Since the late 80s, the challenge to contain the energy dissipation in an embedded system is critical for most of the silicon foundries [1]. Today, the processors dissipate few pico-Joules reducing the toggle activity [2] in the logic, optimizing the consumption of analog parts: phase locked loop [3], non volatile memories [4], voltage pumps and off-chip system buses [5]. The L1 cache hierarchy actually dissipates from 15% up to 40% of energy in an embedded processor [6]. ...
Article
Full-text available
In this work, we propose an architecture-level power optimization technique for L1 caches. The idea is to unify the DATA and TAG fields in a unique embedded static RAM and an intelligent cache controller to minimize the latency penalty. Moreover, an intermediate high-speed pre-fetch buffer optimizes the whole system. We apply this approach to direct-mapped instruction cache and set-associative data cache. Experimental results indicate the power saving by 20% with latency overhead by 12%.
... Unfortunately, the BI coding becomes ineffective for high bandwidth traffic, while the same approach is competitive for compressed data transmission on short buses (entropy 0.5 [16]). In 2004, we introduced a novel approach based on bus lines scrambling hereafter "Bus-Switch" (BS) [17] [18]. ...
Article
Full-text available
This paper introduces the best architecture for a novel low-power encoding system suitable for high bandwidth off-chip data buses. The technique, known as Bus Switch, reorders dynamically the lines of a bus in agreement to a permutation scheme such to minimize the total bus switching activity, responsible for the consumption of dynamic energy. The idea was to reduce the area, power and latency of the permutation circuits using fixed-scheme scrambling units. Moreover, I replaced the toggle count calculation and evaluation circuits with a hierarchical arrangement of analog comparators, representing the bus toggle binary string as a voltage value. I designed the Bus Switch encoder and decoder in semiconductor technologies at 90, 65 and 45 nanometers. The results confirmed that the proposed Bus Switch minimized the required transistors number and the related area and energy consumptions, extending the Bus Switch's field of application.
... Prior works fall into different categories: power reduction in address or data buses [2][13][19][20][24], on-chip [1][21] or off-chip [6][12] buses, high or low data rate. Bus invert represented a first attempt to encode information for low power in data buses [21]. This simple circuit has been employed in different design contexts: parallel off-chip buses, low-data-rate transmission and embedded processors [16]. Bus switch (BS) coding was introduced to address the problem of reducing dynamic energy on external buses [12][13][26][27]. Bus switch reduces energy better than prior approaches in particular operating conditions. ...
... In the work [12] the authors defined the energy saving compared to original bus power dissipation. The used coding approach reduced bus energy dissipation conveniently, when the parameter E% , defined in the equation 2.1, is less than 100.0%. the bus capacitance and Vdd is the power supply. ...
... In the work [12] authors evaluated different coding/decoding functions. ANSI C simulation showed the best activity reduction using B(t) and b(t) as follows: ...
Article
Full-text available
In this paper, we propose a high-speed and low-power off-chip data bus interface based on the best coding schemes for this demanding operating condition. We analyze the clustered bus invert method and bus switch coding, a newly proposed approach based on logically reordering the bus lines. The interface combines these two approaches under the control of a 9-rule Takagi-Sugeno analog fuzzy controller. The controller analyzes the statistical properties of the binary traffic, changing the coding scheme in use on the fly. The fuzzy controller has been designed with attention to total energy dissipation, so as not to compromise the benefit of the coding approaches. The controller is also able to reconfigure the bus switch sub-section in operating conditions where the original approach introduces strong power losses. We demonstrate the effectiveness of the approach by designing the analog fuzzy controller and the digital part of the bus interface at transistor level. Simulations conducted with H-SPICE and NANOSIM confirm that the bus interface is the optimal trade-off for reducing dynamic energy in off-chip buses.
... The novel approach compresses data avoiding standard mathematical operators (+,*) that forces the MCU to work as digital signal processor. In particular, starting from a previous work on bus encoding [15], we re-utilize an ultra low-power binary operator (S= Swap Operator) that re-order a binary string according to a permutation pattern. This new operator permits data recovery, applying the inverse permutation pattern. ...
... The swap operator has been formerly introduced facing the problem to reduce power in off-chip buses [15]. Let be x (n) the input bus with res n lines; the swap operator S [], with permutation pattern p (n), produces the reordered bus y (n), in short: ...
Article
The paper introduced a novel methodology, for reducing energetic consumption, during data compression in homogenous sensor nodes organized in a cluster based network. Our approach employed a bit-wise operator previously used in the context of the reduction of dynamic energy in external buses. The document defined the compression and decompression laws based on this operator, in a conceptual way much similar to the code division multiple access (CDMA) systems, used in the telecommunication scenario. Each sensor has internally associated a digital signature, used in the compression stage. The host computer tries to recover the original waveform executing the cited operator and applying the inverse signature. The original data has been corrupted by an interference process, which depends on the presence of the other users in the same cluster. The host computer is able to select the best signatures, mostly reducing the energy of the interfering process. Simulations conducted with Matlab and Simple Power indicated our approach gains an 85% in energy consumption compared to the simpler algorithm up to now known (Least Mean Squares). Moreover, simulations verified the host has the capability to recover the transmitted waveforms in their fundamental harmonic members.
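The recovery step mentioned in these contexts (applying the inverse permutation pattern to the swapped data) can be illustrated with a minimal Python sketch. The names swap and invert are hypothetical, and the convention that output line i takes input line pattern[i] is an assumption made for this example; the interference between several users described in the abstract is not modelled.

```python
def swap(bits, pattern):
    """Swap operator S: output line i takes the value of input line pattern[i]."""
    return tuple(bits[sel] for sel in pattern)

def invert(pattern):
    """Build the inverse pattern q such that swap(swap(x, pattern), q) == x."""
    inverse = [0] * len(pattern)
    for out_line, in_line in enumerate(pattern):
        # Input line `in_line` ended up on output line `out_line`,
        # so the inverse must fetch it back from there.
        inverse[in_line] = out_line
    return tuple(inverse)

if __name__ == "__main__":
    x = (1, 1, 0, 0)
    p = (2, 0, 3, 1)
    y = swap(x, p)
    print(y, invert(p), swap(y, invert(p)))
```

In the hardware described in the source publication this conversion is done by the combinational PConv unit; the loop above is only a functional stand-in for it.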
... As a general result, while statistical approaches assume data statistics known in advance, such assumption does not hold in many applications, where the more general adaptive techniques are preferable. In [6], a novel encoding scheme devoted to high capacitance off-chip buses was proposed, based on clustering, reordering and encoding the bus lines according to a runtime-defined reordering pattern and coding function. The performance results demonstrated the effectiveness of the technique in reducing the switching activity, with respect to other adaptive strategies. ...
Conference Paper
Full-text available
This paper analyzes the performance and timing overhead trade-off for a recently proposed data bus encoding scheme for low-power based on data lines reordering. The bus switch (BS) mechanism introduces greater activity savings than previous approaches; the hardware complexity of the encoder suggests to apply BS in off-chip buses, where the parasitic capacitance makes dynamic power dissipation in the bus lines the dominant contribution to power consumption. In the basic BS implementation, the encoding circuits included extra bus lines which degrade the energy saving. This paper illustrates and analyzes a circuit implementation with only one extra line, at the cost of a small time overhead. This solution strongly enhances the advantage in off-chip communications, where the available number of pads represents a key resource in low-cost packages. Our results indicate that the effectiveness of the approach strongly depend on an a-priori traffic analysis.
... The proposed encoding technique derives from a novel transition based bus encoding for low-power electrical buses [7]. It can be logically expressed as a four steps process: ...
Conference Paper
Full-text available
The increased demands of high data-rate communications could be satisfied by optical semiconductor elements. Actually, these devices represent an important role in the total energy budget available for the chip. This work presents a low-power encoding technique which optimizes the statistical distribution so as to reduce the energy dissipated in optical communications. We evaluated the encoding circuits referring to 180 nm, 130 nm and 90 nm CMOS technologies. Our results show an up to 12% electrical current reduction in the on-chip light emitter.
... Additionally, a reduced bus frequency introduces a strong "bottleneck" with high-speed core at elevate information demand. The "Bus Switch" (BS) [8] [12] mechanism represents a possible answer for a very low-power bus encoding, preserving the required bandwidth for high-speed transmission. It is based on tentatively encoding, clustering and reordering the input lines according to a "reordering" scheme. ...
...  The off-chip buses represent the ideal field of application for the BS mechanism. The physical implementation, in 130nm low-leakage technology, suggests a convenient use of BS systems in buses with loads from 2 to 4pF, typical values for off-chip buses [8]. ...
Article
Full-text available
The dynamic power management (DPM) represents an important challenge for extending the battery lifetime in a portable system. The power management, based on static and off-line approaches, does not consider the basic property of a modern battery, which recovers a fraction of its charge during the idle time. The DPM approach profiles a complex system in different power figures depending on a reduced set of macro-states. The DPM problem gives a sequence of macro-states which increases the battery lifetime. The dynamic power management is also required in complex systems where the power dissipation in communication channels represent a dominant factor. The modern communication arrangements operate at rate of some G Bit/sec, which implicated high transition activities, responsible of the dynamic power consumption. Moreover, the signal level involved in the output pads has a quadratic contribution in the dynamic power. The problem of low-power bus encoding has been extensively tackled in the past. The basic approach minimizes the transition density, directly related to the lines switching activities, responsible of the load/unload of the parasitic capacities. The current literature on low-power bus encoding provides solutions, which do not guarantee a good activity saving increasing the bus lines; this issue represents a huge limitation in the modern communication channels, which require high transmission bandwidth. The paper introduced a novel low-power bus encoding approach, based on tentatively encoding, clustering and re-ordering the lines of a wide system data bus used in multi-processor scenario. The "Bus-Switch" mechanism, as novel bus encoding approach, drastically reduces the transition activity, preserving the required bandwidth for high data-rate communications.
Since the optimal bus switch encoder complexity grows significantly decreasing the level of clustering, a sub-optimal approach requires power management policy in order to effectively control the battery life. The paperwork presents an overview of the bus-switch mechanism, including the required architecture for encoding/decoding the input lines. The RTL-model has been translated in a modern technology library at 90nm low-leakage using the Synopsys Tool for placement and CTS.
... Unfortunately, the hardware complexity restricts the BS field of application to off-chip buses with line capacitance above 6 pF and 4 pF in 180 nm and 130 nm technology, respectively (32-bit bus, 4-bit cluster size [6]). In particular, the main limitation for power reduction and circuit feasibility concerns the use of the full set of possible reordering patterns, whose number grows factorially with the cluster size. ...
... [6], the authors found the minimal bus capacitance for BS convenient utilization be: ...
... This property does not hold in many applications, where the bus transmits a large amount of information with varying statistical properties. The Bus Switch (BS) mechanism [6] represents a recently proposed adaptive approach for low-energy data bus encoding. This approach divides a large data bus into identical clusters, applying a reordering scheme and a fixed coding function to the input lines of the bus. ...
Conference Paper
Full-text available
The Bus Switch mechanism is a recently proposed bus encoding technique for low-power off-chip data buses. The approach is based on clustering, reordering and encoding the bus input lines according to a reordering pattern and a fixed coding function. This work presents a statistical approach for reducing the hardware overhead of the bus switch technique, by operating with a sub-set of the possible reordering patterns. We demonstrate the effectiveness and robustness of the proposed approach by ANSI C simulations, measuring the average switching activity savings. Our results show a modest switching activity degradation while saving 90% computation time, thus obtaining a sub-optimal encoder configuration satisfactory for a large variety of benchmarks.
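The trade-off described in this abstract, fewer reordering patterns tried in exchange for some loss in switching-activity savings, can be explored with a small Python experiment. This is a sketch under assumptions, not the authors' method: the subset below is an arbitrary slice of the M! patterns (the paper selects its subset statistically), the traffic is uniformly random, and the names min_toggles, average_toggles, FULL_SET and SUBSET are hypothetical.

```python
import random
from itertools import permutations

M = 4  # cluster size

def swap(bits, pattern):
    """Output line i takes the value of input line pattern[i]."""
    return tuple(bits[sel] for sel in pattern)

def hamming(a, b):
    """Number of bus lines that toggle between words a and b."""
    return sum(x ^ y for x, y in zip(a, b))

def min_toggles(word, previous, patterns):
    """Fewest toggles reachable for `word` using only the given patterns."""
    return min(hamming(swap(word, p), previous) for p in patterns)

FULL_SET = list(permutations(range(M)))  # all M! = 24 patterns
SUBSET = FULL_SET[::4]                   # hypothetical 6-pattern subset (includes identity)

def average_toggles(patterns, samples=1000, seed=0):
    """Average toggles per word on random traffic, encoding each word greedily."""
    rng = random.Random(seed)
    previous = (0,) * M
    total = 0
    for _ in range(samples):
        word = tuple(rng.randint(0, 1) for _ in range(M))
        best = min(patterns, key=lambda p: hamming(swap(word, p), previous))
        sent = swap(word, best)
        total += hamming(sent, previous)
        previous = sent  # the bus now holds the word just sent
    return total / samples

if __name__ == "__main__":
    print("patterns tried per word:", len(FULL_SET), "vs", len(SUBSET))
    print("average toggles:", average_toggles(FULL_SET), "vs", average_toggles(SUBSET))
```

For any single word, the subset can never beat the full set, since every subset pattern is also in the full set; the experiment only shows how large the gap is for one arbitrary subset and one traffic model.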