Figure 1 - uploaded by Ali Al-Haj
Multiply-accumulate operation: (a) conventional implementation and (b) distributed arithmetic implementation

Source publication
Article
The discrete wavelet transform has gained the reputation of being a very effective signal analysis tool for many practical applications. However, due to its computation-intensive nature, current implementations of the transform fall short of meeting real-time processing requirements of most applications. This paper describes a parallel implement...

Contexts in source publication

Context 1
... the input samples are represented with B bits of precision, B clock cycles are required to complete an inner-product calculation. An example of a distributed arithmetic implementation of a 4-element inner product operation is shown in Figure 1 along with the conventional implementation of the same product operation. ...
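The bit-serial scheme described in this context can be sketched in a few lines of Python. This is a minimal illustration, not the paper's hardware design: the function names `build_lut` and `da_inner_product`, the two's-complement input format, and the example values are all assumptions made for the sketch. The loop runs once per bit of precision, which models the B clock cycles mentioned above.

```python
def build_lut(coeffs):
    """Precompute every partial sum of the fixed coefficients (2**K entries)."""
    k = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << k)]


def da_inner_product(coeffs, samples, bits):
    """Bit-serial distributed arithmetic for sum(c * x).

    Assumes each sample fits in `bits`-bit two's complement.
    """
    lut = build_lut(coeffs)
    acc = 0
    for j in range(bits):  # one iteration per bit, i.e. B "clock cycles"
        # Bit j of every sample forms the LUT address.
        addr = 0
        for i, x in enumerate(samples):
            addr |= ((x >> j) & 1) << i
        term = lut[addr] << j  # shift replaces the multiplier
        # The most significant bit has negative weight in two's complement.
        acc += -term if j == bits - 1 else term
    return acc


coeffs = [3, -1, 4, 2]      # hypothetical fixed coefficients
samples = [5, -3, 7, -8]    # hypothetical 8-bit input samples
print(da_inner_product(coeffs, samples, bits=8))  # 30
```

For a 4-element inner product the table holds 16 entries, and no multiplication is performed at run time: each step is one lookup, one shift, and one add or subtract.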
Context 2
... is noted from the results obtained above, and further illustrated in Figure 10, that the throughput of the distributed arithmetic implementation is higher than the throughput of the conventional arithmetic implementation. This is expected since the distributed arithmetic implementation replaced the time-consuming conventional multiply-accumulate operations with fast look-up tables and shift operations. ...
Context 3
... partial products of all multiply-accumulate operations were precomputed offline and stored in the LUTs, thus saving a great amount of real-time computation. As for Virtex slice utilization, distributed arithmetic uses fewer hardware resources than conventional arithmetic, as illustrated in Figure 11. ...
Context 4
... Figure 11. Comparison between the utilization of two DWT implementations (DWT and inverse DWT). This is also expected since the conventional arithmetic multiplier requires much more logic resources than the distributed arithmetic multiplier, which requires small LUTs, simple adders and shift registers. ...

Similar publications

Conference Paper
For computer-aided hardware design, models are usually used to evaluate the designed systems. But there is still a gap between models and their efficient implementations on a real architecture, like FPGAs. For example, some model characteristics may lead to a waste of resources, which can even make a design infeasible. In this paper, we focus on ho...
Conference Paper
FPGAs are attractive devices due to their low development cost and short time-to-market, and are widely used not only for reconfigurable purposes but also as application-dependent embedded devices for low-volume products. This paper presents a scan-based BIST architecture for testing of application-dependent circuits configured on FPGA. In order to bu...
Conference Paper
An FPGA implementation of a fine grain general-purpose SIMD processor array is presented. The processor architecture has a compact processing element which is encapsulated into two configurable logic blocks (CLBs) and is then replicated to form an array. A 32 × 32 processing element array is implemented on a low-cost Xilinx XC5VLX50 FPGA using four...
Conference Paper
In this paper we design and implement a complex Digital Up-Converter (DUC) using a Xilinx Virtex6 FPGA. All the steps necessary to build such circuits are thoroughly described and some valuable hints on how to overcome problems during the design time are presented. We introduce a new approach for oscillator circuits, which are an important part of...
Article
Field-Programmable Gate Arrays (FPGAs) are flexible and reusable circuits that can be easily reconfigured by the designer. One of the steps involved in the logic design with FPGA circuits is placement. In this step, the logic functions are assigned to specific cells of the circuit. In this paper we present a placement algorithm for FPGA circuits. I...

Citations

... The MAC method of FIR filter implementation is expensive to implement in a field-programmable gate array (FPGA) owing to logic complexity and resource usage [6]. To resolve this problem, many researchers have used the distributed arithmetic (DA) algorithm in the FIR filter design process [9,15]. In the MAC method, any arithmetic operation is implemented by first multiplying and then adding; the DA algorithm instead replaces this with shifting and adding. ...
... It has a basic-to-advanced converter at the front end, followed by a chip and several peripheral components, such as memory, to store data and channel coefficients [8]. Mechanical advancements have improved the various zones of DSP, one of which is the structure of all-around arranged calculations to ascertain the discrete Fourier transform [9]. The introduction of programmable digital signal processors (PDSPs) in the late 1970s was another advancement in this field that provided the option to perform multiplication and addition in one clock cycle. ...
Article
In many industries and telecommunication systems, there is a need for digital signal processing for fast transfer of data between two points or devices with low power consumption and considerable hardware resources (circuit size and speed). Finite impulse response (FIR) filters play an important role in many signal processing applications and telecommunication systems. This paper proposes the design and implementation of a 4-bit FIR filter using distributed arithmetic (DA) algorithms, which substitute multiply-and-accumulate operations with a series of look-up tables (LUTs). The proposed FIR filter is implemented in high-density field-programmable logic devices (FPGAs), designed using the very high-speed integrated circuit hardware description language (VHDL), and verified using the Xilinx ISE 14.7 tool and simulator. The proposed, modified and optimized DA provides multiplication- and accumulation-free calculation of the inner product of the FIR filter, which in turn reduces the size and power dissipation of the circuit. DA is one of the methods to implement FIR filters that impact the storage resource and the calculating speed, making the memory size smaller and the operation speed faster. The simulated proposed structure required nearly 40% fewer cells, 35% fewer LUT pairs and 4% less power than the existing structure.
... However, in some practical applications, we only need finite phase shifts and it becomes promising to reduce the design complexity by decreasing the number of phase shifts. In 2011, PIN diodes were utilized as phase shifters in a large reflect array [1,22,66,10,13]. As a result, the reflecting element structure can be simple and easily controlled. ...
... One can then insert neural layers between the two parts. The whole model is trained and then fine-tuned with the channels considered [1,13,155,114]. ...
... For example, with the development of new antenna and integration technologies, we predict that large-scale arrays such as ELAA will significantly impact channel modeling and performance evaluation. As shown in some papers [12,16,23,127,128], large-scale arrays bring new challenges in modeling, such as near-field spherical waves and nonstationary channels [13,123,124,127]. In the past, we simply modeled the far field, which could be approximated using plane waves. ...
Article
Orbital angular momentum (OAM), which provides a new angular or mode dimension for wireless communications, offers an intriguing way to achieve anti-jamming. The unprecedented demands for high-quality and seamless wireless services impose continuous challenges to existing cellular networks. Applications like enhanced mobile broadband (eMBB), ultra-reliable and low latency communications (URLLC), and massive machine type communications (mMTC) services are pushing the evolution of cellular systems towards the fifth generation (5G). We propose to use the orthogonality of OAM modes for anti-jamming in wireless communications; in particular, we propose a mode hopping (MH) scheme for anti-jamming within a narrow frequency band. We derive the closed-form expression of bit error rate (BER) for the multiple-user scenario with our developed MH scheme. Our developed MH scheme can achieve the same anti-jamming results within the narrow frequency band as the conventional wideband FH scheme. We explore the challenges in the design of next generation transport layer protocols (NGTP) in 6G Terahertz communication-based networks. Furthermore, we propose a mode-frequency hopping (MFH) scheme, which jointly uses our developed MH scheme and the conventional FH scheme to further decrease the BER for wireless communication. In contrast, our experiments on the Reconfigurable Intelligent Surface (RIS) reveal it to be economically simple and a new type of ultra-thin metamaterial inlaid with multiple sub-wavelength scatterers. We present our observations on possible favorable propagation conditions obtained by controlling the phase shifts of the reflected waves at the surface such that the received signals are directly reflected towards the receivers without any extra cost of power sources or hardware. It provides a revolutionary new approach to actively improve the link quality and coverage, which sheds light on the future 6G.
Achieving high-quality channel links in cellular communications via the design and optimization of RIS construction is explored in this work as a novel RIS-based smart radio technique. Unlike traditional antenna arrays, three unique characteristics of RIS are revealed in this work. First, the built-in programmable configuration of RIS enables analog beam forming inherently without extra hardware or signal processing. Second, the incident signals can be controlled to partly reflect and partly transmit through the RIS simultaneously, adding more flexibility to signal transmission. Third, RIS has no digital processing capability to actively send signals nor any radio frequency (RF) components. One of the considerations is the use of Terahertz communications, which aim to provide 1 Tbps (terabits per second) and air latency of less than 100 μs. Further, 6G networks are expected to provide for more stringent Quality of Service (QoS) and mobility requirements. As such, it is necessary to develop novel channel estimation and communication protocols, design joint digital and RIS-based analog beam forming schemes, and perform interference control via mixed reflection and transmission. The aforementioned innovative use cases call for redefining the requirements of upcoming 6G technology. 5G technology has abundant potential but it cannot satisfy the stringent rate-reliability-latency requirements of the new applications. This work also highlights that the requirements and KPIs of 6G technology will be stricter and more diverse. For example, we discuss a scenario in which, while the 5G network already operates in the very high frequency mm-wave region, 6G could require even higher frequencies for operation. The 6G technology will focus on achieving higher peak data rate, seamless ubiquitous connectivity, non-existent latency, high reliability, and strong security and privacy for providing the ultimate user experience.
A section is devoted to a comparative study of the KPIs of both 5G and 6G.
... Three different DWT architectures, namely the direct approach, the recursive pyramidal approach, and a new modified RPA based on a flexible filter and control unit structure, have been implemented in [5]. A parallel implementation of the DWT and its inverse, reformulated using a distributed arithmetic architecture, has been implemented on a high-density FPGA in [3]. Yazhini et al. [10] implemented DWT using modified DA with partial tables. ...
Chapter
Throughout the last 20 years, the DWT has been widely used in digital image processing applications. In this paper, a novel architecture for DWT computation based on modified distributed arithmetic and a modified multiplexer logic-based architecture is proposed, designed and implemented on an FPGA platform. The DWT architecture is designed for high throughput and latency. An HDL model is developed for the modified architecture and is validated on the FPGA platform for area, timing and power performance. The novel architecture proposed in this work is suitable for high-speed image coding.
... Implementations based on DA achieve better performance than those based on the convolution approach [4]. This paper is organized as follows: Section II covers digital FIR filters, Section III FIR implementation, Section IV simulation and results, and Section V the conclusion. ...
... The distributed arithmetic (DA) algorithm is the most suitable alternative to the convolution approach in the case of constant coefficients [1, 11, 12]. The DA algorithm is suitable for portable applications because it replaces the costly MAC units with look-up tables (LUTs) and shifts, which reduces the power consumption [4]. ...
... In Fig. 6, the m-bit LUT used in Fig. 4 has been divided into two m/2-bit LUTs. The outputs of the two LUTs are then added before being fed into the scaling accumulator [4,13]. ...
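The LUT partitioning mentioned in the last context can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than the cited design: the helper names `build_lut`, `build_split_luts` and `split_lookup` are hypothetical, the coefficient values are made up, and the number of coefficients is assumed to be even. The point is that two half-size tables plus one adder return the same value as the single large table while storing far fewer entries.

```python
def build_lut(coeffs):
    """Precompute every partial sum of the fixed coefficients (2**K entries)."""
    k = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << k)]


def build_split_luts(coeffs):
    """Build two half-size LUTs; assumes an even number of coefficients."""
    half = len(coeffs) // 2
    return build_lut(coeffs[:half]), build_lut(coeffs[half:])


def split_lookup(lut_lo, lut_hi, addr):
    """Resolve a full-width address with two small lookups and one addition."""
    mask = len(lut_lo) - 1            # low half of the address bits
    half_bits = mask.bit_length()
    return lut_lo[addr & mask] + lut_hi[addr >> half_bits]


coeffs = [3, -1, 4, 2, 7, -5, 6, 1]   # hypothetical 8-tap coefficient set
full = build_lut(coeffs)              # 2**8 = 256 entries
lut_lo, lut_hi = build_split_luts(coeffs)   # 2 * 2**4 = 32 entries in total
print(len(full), len(lut_lo) + len(lut_hi))
```

The trade-off is one extra adder per lookup in exchange for table storage that grows as 2 · 2^(m/2) instead of 2^m.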
Conference Paper
In this paper, a study was conducted on four major mobile network operators (MTN, Vodafone, Tigo and Airtel) in some selected cities (Accra, Tema and Kumasi) in Ghana. The KPIs (Call Drop rate and Audio Quality) of these networks were measured, analysed and compared with the benchmark set by the local regulator (NCA) and international standard authority-International Telecommunication Union (ITU). It was observed that some of the measured KPIs values (Call Drop Rate and Audio Quality) were fairly close to the standard set by the local (NCA) and the international regulator (ITU) indicating customers could experience fairly good service in those locations, while other values (Traffic Channel Congestion and Call Set Up Time) were outside the standard set by NCA and ITU which means customers could experience some poor QoS in these areas.
... Table 1 shows the summary of DA and SA implementation.
[66] ✓ FIR filters
[67] ✓ Entropy encoding
[68] ✓ Multiplierless filter
[69] ✓ Algorithm
[14] ✓ Modified DA based DWT architecture
[70] ✓ Scalable core architecture
[71] ✓ SA-DWT architecture
[72] ✓ Full Search Block Matching Algorithm ...
Research
The rapid development of medical imaging and the invention of various medicines have benefited mankind and the whole community. Medical image processing is a niche area concerned with the operations and processes of generating images of the human body for clinical purposes. Potential areas such as image acquisition, image enhancement, image compression and storage, and image-based visualization are also included in medical image processing analysis. Unfortunately, medical image compression dealing with three-dimensional (3-D) modalities is still at an immature stage. In addition, very few researchers take up the challenge of applying hardware in their implementations. In the previous work reviewed, most of the compression methods used were lossless rather than lossy. For software implementation, MATLAB and Verilog are the most popular choices among researchers. In terms of analysis, most previous works conducted objective rather than subjective tests. This paper thoroughly reviews the recent advances in medical image compression, mainly in terms of types of compression, software and hardware implementations, and performance evaluation. Furthermore, challenges and open research issues are discussed in order to provide perspectives for future research. In conclusion, the paper gives an overall picture of the image processing landscape, in which several researchers have focused more on software implementations and various combinations of software and hardware implementation.
... In [10], a bit-serial architecture based on a time-interleaved structure is used in the implementation of the DWT, and this results in a modular and scalable architecture allowing bit-level parameterisation. In [11], a DWT implementation is proposed that exploits the look-up table (LUT)-based architecture of FPGAs and reformulates the wavelet computation in accordance with the distributed arithmetic algorithm. In [12], speed efficiency is considered by employing a low-pass and high-pass filter pair working concurrently in each level of the transform. ...
Article
The authors aimed to develop an application for producing different architectures to implement dual tree complex wavelet transform (DTCWT) having near shift-invariance property. To obtain a low-cost and portable solution for implementing the DTCWT in multi-channel real-time applications, various embedded-system approaches are realised. For comparison, the DTCWT was implemented in C language on a personal computer and on a PIC microcontroller. However, in the former approach portability and in the latter desired speed performance properties cannot be achieved. Hence, implementation of the DTCWT on a reconfigurable platform such as field programmable gate array, which provides portable, low-cost, low-power, and high-performance computing, is considered as the most feasible solution. At first, they used the system generator DSP design tool of Xilinx for algorithm design. However, the design implemented by using such tools is not optimised in terms of area and power. To overcome all these drawbacks mentioned above, they implemented the DTCWT algorithm by using Verilog Hardware Description Language, which has its own difficulties. To overcome these difficulties, simplify the usage of proposed algorithms and the adaptation procedures, a code generator program that can produce different architectures is proposed.
... Implementations based on DA achieve better performance than those based on the convolution approach [4]. This paper is organized as follows: Section II covers digital FIR filters, Section III FIR implementation, Section IV simulation and results, and Section V the conclusion. ...
... The distributed arithmetic (DA) algorithm is the most suitable alternative to the convolution approach in the case of constant coefficients [1, 11, 12]. The DA algorithm is suitable for portable applications because it replaces the costly MAC units with look-up tables (LUTs) and shifts, which reduces the power consumption [4]. ...
... In Fig. 6, the m-bit LUT used in Fig. 4 has been divided into two m/2-bit LUTs. The outputs of the two LUTs are then added before being fed into the scaling accumulator [4,13]. ...
Conference Paper
Finite impulse response (FIR) digital filters are extensively used due to their key role in various digital signal processing (DSP) applications. Several attempts have been made to develop hardware realizations of FIR filters characterized by implementation complexity, precision and high speed. The field-programmable gate array (FPGA) offers a reconfigurable realization of FIR filters, and FPGAs are on the verge of revolutionizing digital signal processing. Many front-end DSP algorithms, such as FFTs and FIR or IIR filters, are now most often realized by FPGAs. Modern FPGA families provide DSP arithmetic support with fast-carry chains that are used to implement multiply-accumulates (MACs) at high speed, with low overhead and low costs. In this paper, serial and parallel distributed arithmetic (DA) realizations of FIR filters are discussed in terms of hardware cost and resource utilization.
... Other research reveals that DA exploits parallelism (at the vector level) and pipelining (at the bit level) and is highly suitable for FPGA implementation, owing to the FPGA's fine-grained device fabric, massive parallelism capabilities, and register-rich architecture that enables efficient implementation of ROM structures in lookup tables (LUTs) [13]. The superior performance and hardware-efficient nature of DA compared to conventional arithmetic has been demonstrated in the implementation of various algorithms, such as those based on matrix-vector multiplication [14], [15]. ...
Conference Paper
This paper describes the design and implementation of a three-dimensional (3-D) Haar wavelet transform with transpose-based computation and distributed arithmetic (DA). As a result of the separability property of the multidimensional Haar wavelet transform (HWT), the proposed architecture has been implemented using a cascade of three N-point one-dimensional (1-D) Haar transforms and two transpose memories for a 3-D volume of N × N × N, suitable for 3-D medical image compression. The 3-D HWT architecture was implemented on a National Instruments SubRIO-9632 board. Experimental results and an analysis of area, power consumption and maximum frequency are discussed in this paper.
... This modified architecture uses a parallel FIR filter structure to compute the DWT for n resolution levels. [4] presents the computation of the fast discrete wavelet transform using an FIR filter implementation based on distributed arithmetic. That paper proposes a parallel implementation of the DWT and its inverse; the wavelet transform is reformulated according to distributed arithmetic and implemented on a high-density field-programmable gate array. [5] gives the exact technique and implementation of distributed arithmetic with the use of partial tables. ...
... Generally, for the implementation of the DWT, the FIR filter is implemented using the direct-form structure, as shown in Fig. 2. In Fig. 2, each filter tap consists of a delay element, an adder, and a multiplier [4]. However, this structure has a major drawback: the filter throughput is inversely proportional to the number of filter taps. ...
... Distributed arithmetic originates from a simple equation of Boolean algebra, where the variable Y denotes the result of the product between an input data vector x and a coefficient vector a over i values. The conventional representation of the product in distributed arithmetic is given as follows [4]: ...
... where W_L(n, j) and W_H(n, j) are the n-th scaling and wavelet coefficients at the j-th stage, and h_0(n) and h_1(n) are the dilation coefficients [18] corresponding to the scaling and wavelet functions. The forward DWT has been implemented using a decimator block, which consists of a PDA FIR filter and a down-sampling operator. ...
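The decimator structure described in these contexts (an FIR filter followed by down-sampling) can be sketched as follows. This is only an illustration under stated assumptions: it uses a plain direct-form multiply-accumulate filter rather than the PDA filter of the cited work, the function names are hypothetical, and the unnormalized Haar pair h0 = [1, 1], h1 = [1, -1] is chosen purely to keep the arithmetic in integers.

```python
def fir_filter(h, x):
    """Direct-form FIR: y[n] = sum over k of h[k] * x[n - k], zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def decimate(h, x):
    """Decimator block: filter with h, then keep every second output sample."""
    return fir_filter(h, x)[1::2]


def dwt_level(x, h0, h1):
    """One analysis stage: low-pass and high-pass branches, each decimated by 2."""
    return decimate(h0, x), decimate(h1, x)


# Assumed filter pair: unnormalized Haar (integer arithmetic only).
h0, h1 = [1, 1], [1, -1]
approx, detail = dwt_level([4, 6, 10, 12], h0, h1)
print(approx, detail)  # [10, 22] [2, 2]
```

Each output sample of `fir_filter` costs one multiply-accumulate per tap, which is the per-tap cost that the distributed arithmetic formulations discussed above aim to avoid.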
Article
The development of modern integrated circuit technologies makes it feasible to develop cheaper, faster and smaller special-purpose signal processing circuits. Digital signal processing functions are generally implemented either on ASICs, which are inflexible, or on FPGAs, which suffer from a relatively lower utilization factor or lower speed compared to ASICs. The Field Programmable DSP Array (FPDA) is the proposed DSP-dedicated device, reminiscent of an FPGA but with basic fixed common modules (CMs) (such as adders, subtractors, multipliers, scaling units, and shifters) instead of CLBs. This paper introduces the development of a reconfigurable system architecture with a focus on the FPDA that integrates different DSP functions such as DFT, FFT, DCT, FIR, IIR, and DWT. Switching between DSP functions is achieved by reconfiguring the interconnection between CMs. The proposed architecture has been validated on a Virtex5 FPGA. The architecture provides a sufficient amount of flexibility, parallelism and scalability.