August 2016
·
10 Reads
February 2014
·
266 Reads
·
25 Citations
IEEE Journal of Solid-State Circuits
Following the rapid expansion of mobile computing in the past decade, mobile system-on-a-chip (SoC) designs have off-loaded most compute-intensive tasks to dedicated accelerators to improve energy efficiency. An increasing number of accelerators in power-limited SoCs results in large regions of “dark silicon.” Such accelerators lack flexibility, so any design change requires an SoC re-spin, significantly impacting cost and timeline. To address the need for both efficiency and flexibility, this work presents a multi-granularity FPGA suitable for mobile computing. Occupying 20.5 mm² in 40 nm CMOS, the chip incorporates 2,760 fine-grained configurable logic blocks (CLBs) with 11,040 6-input look-up tables (LUTs) for random logic, basic arithmetic, shift registers, and distributed memories; 42 medium-grained 48b DSP processors for MAC and SIMD operations; 16 block RAMs reconfigurable from 32K×1b to 512×72b; and 2 coarse-grained kernels: a 64- to 8192-point fast Fourier transform (FFT) processor and a 16-core universal DSP (UDSP) for software-defined radio (SDR). Using a mixed-radix hierarchical interconnect, the chip achieves a 4× interconnect area reduction over commercial FPGAs for comparable connectivity, reducing overall area and leakage by 2.5× and delivering 10-50% lower active power. With coarse-grained kernels, the chip's energy efficiency reaches within 4-5× of ASIC designs.
May 2012
·
20 Reads
This chapter discusses wordlength optimization, with emphasis on automated floating-point to fixed-point conversion. Reducing the number of bits without significantly degrading algorithm performance is an important step in the hardware implementation of DSP algorithms. Designers often tune bit widths manually, an approach that is time-consuming and yields suboptimal results. This chapter discusses an automated optimization approach.
February 2011
·
1,398 Reads
·
177 Citations
IEEE Journal of Solid-State Circuits
This work presents measured results from test chips containing circuits implemented with micro-electro-mechanical (MEM) relays. The relay circuits designed on these test chips illustrate a range of important functions necessary for the implementation of integrated VLSI systems and lend insight into circuit design techniques optimized for the physical properties of these devices. To explore these techniques, a hybrid electro-mechanical model of the relays' electrical and mechanical characteristics has been developed, correlated to measurements, and then applied to predict MEM relay performance if the technology were scaled to a 90 nm technology node. A theoretical, scaled, 32-bit MEM relay-based adder, with single-bit functionality demonstrated by the measured circuits, is found to offer a factor-of-ten energy efficiency gain over an optimized CMOS adder for sub-20 MOPS throughputs at a moderate increase in area.
January 2011
·
368 Reads
·
14 Citations
ISRN Signal Processing
This paper presents an automated tool for floating-point to fixed-point conversion. The tool builds on previous work developed in the MATLAB/Simulink environment with Xilinx System Generator support. It is now extended to include Synplify DSP blocksets in a way that is seamless from the user's point of view. In addition to FPGA area estimation, the tool now also includes ASIC area estimation for end-users who choose the ASIC flow. The tool minimizes hardware cost subject to mean-squared quantization error (MSE) constraints. To obtain ASIC area estimates that better match synthesized results, 3 performance levels are available to choose from, suitable for high-performance, typical, or low-power applications. The use of the tool is first illustrated on an FIR filter, achieving over 50% area savings for an MSE specification of 10⁻⁶ compared with an all-16-bit realization. More complex optimization results for chip-level designs are also demonstrated.
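The idea of meeting an MSE specification with as few bits as possible can be illustrated with a minimal Python sketch. It is not the tool described in the abstract: it quantizes only the coefficients of an FIR filter, uses one shared wordlength instead of per-signal wordlengths, and has no hardware cost model. The tap values, the test signal, and all function names are hypothetical.

```python
import random

def quantize(x, frac_bits):
    """Round x to a fixed-point grid with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

def fir(coeffs, samples):
    """Direct-form FIR filter; samples before the start are taken as zero."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out

def mse(a, b):
    """Mean-squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def min_frac_bits(coeffs, samples, mse_target, max_bits=24):
    """Smallest fractional wordlength whose output MSE meets the target.

    Returns None if no wordlength up to max_bits is sufficient.
    """
    reference = fir(coeffs, samples)  # floating-point reference output
    for bits in range(1, max_bits + 1):
        quantized = [quantize(c, bits) for c in coeffs]
        if mse(reference, fir(quantized, samples)) <= mse_target:
            return bits
    return None

random.seed(0)  # reproducible test signal
coeffs = [0.02, 0.11, 0.24, 0.30, 0.24, 0.11, 0.02]  # hypothetical low-pass taps
samples = [random.uniform(-1.0, 1.0) for _ in range(512)]
bits = min_frac_bits(coeffs, samples, mse_target=1e-6)
print("fractional bits needed:", bits)
```

A linear search is used here because the MSE is not guaranteed to fall monotonically as bits are added; a real optimizer would search per-signal wordlengths against an area estimate.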
January 2011
·
245 Reads
·
4 Citations
A 2048 look-up-table FPGA with a radix-2 hierarchical interconnect network is realized in 3.94 mm² in 65-nm CMOS. It has an interconnect-to-logic area ratio of 1:1, which is a 3-4× reduction from modern FPGAs while allowing up to 100% resource utilization. As a proof of concept, it is designed with standard cells, achieving 16.4 GOPS/mm² at 370 MHz. Peak energy efficiency of 1.1 GOPS/mW is measured at 0.5 V.
March 2010
·
4,680 Reads
·
364 Citations
Proceedings of the IEEE
Operation in the subthreshold region is most often synonymous with minimum-energy operation. Yet the penalty in performance is huge. In this paper, we explore how design in the moderate inversion region helps to recover some of that lost performance while staying quite close to the minimum-energy point. An energy-delay modeling framework that extends over the weak, moderate, and strong inversion regions is developed. The impact of activity and design parameters such as supply voltage and transistor sizing on the energy and performance in this operational region is derived. The quantitative benefits of operating in the near-threshold region are established using some simple examples. The paper shows that a 20% increase in energy from the minimum-energy point gives back ten times in performance. Based on these observations, a pass-transistor-based logic family that excels in this operational region is introduced. The logic family operates most of its logic in the above-threshold mode (using low-threshold transistors), while confining leakage to only those devices in subthreshold. Operation below the minimum-energy point of CMOS is demonstrated. In leakage-dominated ultralow-power designs, time-multiplexing is shown to yield not only area but also energy reduction due to lower leakage. Finally, the paper demonstrates the use of ultralow-power design techniques in chip synthesis.
September 2009
·
54 Reads
·
17 Citations
IEEE Transactions on Circuits and Systems II: Express Briefs
This brief presents an improved logical-effort model to account for the slope mismatch between the input and output of a gate. The model has a simple formulation in which only one additional parameter is needed, making the analysis suitable for hand calculations. Using 65- and 90-nm complementary metal-oxide-semiconductor technologies, the model maintains less than 5% error in gate-delay estimations compared to Spectre simulations even under large variations between the input and output slopes. Using this model, a circuit optimization tool is written to optimize an adder synthesized with a 65-nm standard-cell library. The estimation error for the adder is also within the modeling accuracy of 5%, whereas the original logical-effort model and the synthesis timing libraries have errors of up to 40% and 20%, respectively.
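For context, the classic logical-effort sizing procedure that this brief refines can be sketched in Python as follows. This is the textbook method only: the slope-mismatch correction and its additional parameter are not reproduced here. The gate efforts (1 for an inverter, 4/3 for a 2-input NAND, 5/3 for a 2-input NOR), the capacitance values, and the function name are illustrative assumptions, and the path is assumed to have no branching.

```python
import math

def size_path(logical_efforts, c_in, c_load):
    """Size the gates of a branch-free path for minimum delay.

    logical_efforts: logical effort g of each stage, input to output.
    c_in:   input capacitance of the first gate.
    c_load: capacitance driven by the last gate.
    Returns (best stage effort, list of gate input capacitances).
    """
    n = len(logical_efforts)
    path_effort = math.prod(logical_efforts) * (c_load / c_in)  # F = G * H
    f_star = path_effort ** (1.0 / n)  # equal effort per stage is optimal

    # Walk backward from the load: C_in = g * C_out / f_star for each gate.
    caps = [0.0] * n
    c_out = c_load
    for i in reversed(range(n)):
        caps[i] = logical_efforts[i] * c_out / f_star
        c_out = caps[i]
    return f_star, caps

# Hypothetical path: inverter -> NAND2 -> NOR2 driving 45 units of load.
f_star, caps = size_path([1.0, 4.0 / 3.0, 5.0 / 3.0], c_in=1.0, c_load=45.0)
print("stage effort:", round(f_star, 3))
print("input capacitances:", [round(c, 3) for c in caps])
```

Because every stage carries the same effort, the backward walk returns the specified input capacitance for the first gate, which serves as a quick consistency check on the sizing.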
January 2008
·
20 Reads
·
2 Citations
This project report presents a major update to the original word-length optimization tool for floating-point to fixed-point conversion. The updated tool now supports designs in the Synplify DSP blockset; all commonly used blocks are supported. Changes from the current optimization flow are kept to a minimum, so users who are familiar with the original Xilinx System Generator flow can use the updated tool without additional knowledge. In addition to FPGA area estimation, the update also includes ASIC area estimation for end-users who choose the ASIC flow. To obtain area estimates that better match synthesized results, 3 performance levels are available to choose from, suitable for high-performance, typical, or low-power applications.
... latency and throughput) enhancement and optimization. In particular, embedding the FPGA on a system-on-chip (SoC) [9], [10] is one of the most attractive candidates for IoT applications [11]. Recently, there have been several research activities on standard-cell-based FPGAs as a flexible and portable option for embedded FPGA fabrics [12]-[17]. ...
February 2014
IEEE Journal of Solid-State Circuits
... Affine analysis of ranges [14] can cancel correlated terms (e.g., A − A has a range [0, 0]), leading to a more compact design at the expense of more complex analysis. These techniques are used in high-level synthesis flows [15], which unfortunately require users to express their designs using non-zero-cost abstractions. The benefits of these optimizations can be offset by the difficulties encountered when porting existing RTL to a new flow and/or application. ...
January 2011
ISRN Signal Processing
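The cancellation of correlated terms mentioned in the excerpt above can be illustrated with a small Python sketch that compares plain interval arithmetic with affine arithmetic on the expression A − A. The class names and the example range [-1, 3] are hypothetical, and only subtraction is implemented.

```python
import itertools

class Interval:
    """Plain range [lo, hi]; it forgets where a value came from."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __sub__(self, other):
        # Worst case: operands are treated as independent.
        return Interval(self.lo - other.hi, self.hi - other.lo)

class Affine:
    """center + sum(coeff_i * eps_i), with each noise symbol eps_i in [-1, 1]."""
    _ids = itertools.count()  # source of fresh noise-symbol identifiers

    def __init__(self, center, terms=None):
        self.center = center
        self.terms = dict(terms or {})

    @classmethod
    def from_range(cls, lo, hi):
        # A new input gets its own noise symbol.
        return cls((lo + hi) / 2, {next(cls._ids): (hi - lo) / 2})

    def __sub__(self, other):
        terms = dict(self.terms)
        for symbol, coeff in other.terms.items():
            terms[symbol] = terms.get(symbol, 0.0) - coeff  # shared symbols cancel
        return Affine(self.center - other.center, terms)

    def range(self):
        radius = sum(abs(c) for c in self.terms.values())
        return (self.center - radius, self.center + radius)

a_interval = Interval(-1, 3)
diff = a_interval - a_interval
print((diff.lo, diff.hi))      # (-4, 4): the correlation is lost

a_affine = Affine.from_range(-1, 3)
print((a_affine - a_affine).range())  # (0.0, 0.0): the correlated term cancels
```

The tighter range means fewer integer bits need to be allocated to the result, which is the source of the more compact design at the cost of tracking one coefficient per noise symbol.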
... M2000 (Abound Logic) proposed an MSSN with local crossbars based on a Clos network [16], while Leopard Logic proposed a butterfly-based hierarchical network [17]. In [10,18] an MSSN based on butterfly topology is discussed with depopulation of the upper stages of the network and an isomorphic transformation to solve the radix-boundary problem [19], a limiting factor of MSSN in the field of FPGAs. Nevertheless, the area saving is offset by the fact that this network is no longer proven to be RNB, although the authors indicate that enough bandwidth is available based on Rent's rule. ...
January 2011
... They followed a binary search algorithm to quickly arrive at a coarse optimal point and started the fine optimization from that point, thus reducing the complexity to vary linearly with N, the number of quantizers, even without signal grouping. In addition, this method is well suited to MATLAB-based design and integrates readily with automated tools like Accelchip [30] and SynplifyPro [31,32] to provide a complete automated flow from floating-point MATLAB design to FPGA hardware resources in minutes. The paper [29] identified that the MATLAB-Simulink environment can be exploited only for fixed-point modeling and simulation of AMS circuits. ...
January 2008
... After the best stage effort f* is determined, each gate within the critical path can be sized using the following formula [9]: $C_{in,n} = g_n \, C_{out,n} / f^*$ ...
September 2009
IEEE Transactions on Circuits and Systems II: Express Briefs
... Finally, laterally actuated relays typically have a smaller footprint when compared to other relay structures. NEMS relays utilizing four- and six-terminal configurations have also been proposed as logic gate building blocks to minimize the number of relays needed for any circuit path in logic circuits and thus the overall switching time for the desired logic function [11], [12], [13], [14]. The four-terminal designs introduce an additional body bias terminal and two separate contact regions on the cantilever, electrically isolated from the gate terminal, which serves as the input, as can be seen in Figure 1 (c). ...
February 2011
IEEE Journal of Solid-State Circuits
... It can quickly perform carry generation, which is computed from the propagate and generate signals. The major issue with this kind of adder is that its propagation delay increases as the bit width of the input operands increases [10]. ...
March 2010
Proceedings of the IEEE