The basic architecture of reconfigurable complete-binary-adder-tree.


Source publication
Article
This paper presents a compact vector quantizer based on the self-organizing map (SOM), which can fulfill the data compression task for high-speed image sequences. In this vector quantizer, we address the most severe computational demands in the codebook learning mode and the image encoding mode by a reconfigurable complete-binary-adder-tree (RCBAT), w...

Contexts in source publication

Context 1
... our work, we proposed an optimized circuit named RCBAT to accomplish these operations. Figure 3 shows the basic architecture of RCBAT. Similar to the SDU, the arithmetic block in our design can also be reconfigured either in codebook learning mode or in image encoding mode. ...
Context 3
... we adopt a mechanism of PVCS in the SDU, we must sum up all of the partial SEDs to get the exact SED. Hence, we add an additional stage following the fourth stage, which is shown at the bottom right of Figure 3. In the last stage, the separation signal "S_SEP" remains "0" until all of the partial vectors have been processed (after m clock cycles). ...
Context 4
... the last stage, the separation signal "S_SEP" remains "0" until all of the partial vectors have been processed (after m clock cycles). As shown in Figure 3, when "S_SEP" is "0", the last adder is configured to sum the new partial SED and the intermediate exact SED. When "S_SEP" turns to "1", the exact value of the SED, "D_E^2", which contains m partial SEDs, is obtained and then fed into the minimum distance search circuit for winner-neuron searching. ...
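The accumulation behaviour described in these contexts can be sketched in software. This is a minimal behavioural model, not the authors' circuit: it assumes 16 inputs per clock cycle (so that four binary adder stages reduce them to one partial SED), assumes the SED is a sum of squared differences, and uses hypothetical function names (`adder_tree_sum`, `exact_sed`). Only the role of the separation signal and the m-cycle accumulation are taken from the text.

```python
def adder_tree_sum(values):
    """Reduce a power-of-two-length list pairwise, one tree stage per pass.

    Returns the total and the number of adder stages used.
    """
    stage = list(values)
    depth = 0
    while len(stage) > 1:
        # Each pass models one stage of the binary adder tree.
        stage = [stage[i] + stage[i + 1] for i in range(0, len(stage), 2)]
        depth += 1
    return stage[0], depth


def exact_sed(x, w, lanes=16):
    """Accumulate m partial SEDs into the exact SED (m = len(x) // lanes).

    `lanes` (inputs per clock cycle) is an assumption for illustration.
    """
    assert len(x) == len(w) and len(x) % lanes == 0
    m = len(x) // lanes
    acc = 0        # intermediate exact SED held by the last stage
    result = None
    for cycle in range(m):
        xs = x[cycle * lanes:(cycle + 1) * lanes]
        ws = w[cycle * lanes:(cycle + 1) * lanes]
        # Squared differences, assumed to be supplied by the SDU.
        squared = [(a - b) ** 2 for a, b in zip(xs, ws)]
        partial, _ = adder_tree_sum(squared)
        # Last adder: new partial SED + intermediate exact SED.
        acc += partial
        # Separation signal: 0 while partial vectors remain, 1 on the last one.
        s_sep = 1 if cycle == m - 1 else 0
        if s_sep:
            result = acc   # exact SED, passed on to the minimum search
            acc = 0
    return result
```

For a 32-element vector and 16 lanes, the model takes m = 2 cycles: the first cycle only accumulates, and the second raises the separation signal and emits the exact SED.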
Context 5
... the meantime, the SDU in Figure 2 is configured to compute the value of α(x_j − w_ij), and the RCBAT is reconfigured into 16 individual adders to calculate the value of w_ij + α(x_j − w_ij). The data flow of the RCBAT circuit is highlighted in blue in Figure 3. In this way, the partial new weight vector can be obtained in one clock cycle. ...
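The learning-mode update w_ij + α(x_j − w_ij) can be illustrated with a short behavioural sketch. It assumes that each clock cycle handles one 16-element partial vector, with one adder per element; the function names (`update_partial_weights`, `update_weight_vector`) and the floating-point arithmetic are illustrative assumptions, since the source does not give the number format or interfaces.

```python
def update_partial_weights(w, x, alpha, lanes=16):
    """One clock cycle: update a partial weight vector of `lanes` elements."""
    assert len(w) == len(x) == lanes
    # Role attributed to the SDU in the text: alpha * (x_j - w_ij).
    delta = [alpha * (xj - wj) for xj, wj in zip(x, w)]
    # Role attributed to the RCBAT: 16 independent adders, w_ij + delta.
    return [wj + dj for wj, dj in zip(w, delta)]


def update_weight_vector(w, x, alpha, lanes=16):
    """Update the whole weight vector in m = len(w) // lanes cycles."""
    assert len(w) == len(x) and len(w) % lanes == 0
    new_w = []
    for start in range(0, len(w), lanes):
        new_w.extend(
            update_partial_weights(
                w[start:start + lanes], x[start:start + lanes], alpha, lanes
            )
        )
    return new_w
```

With α = 1 the updated weights equal the input vector, and with α = 0 they are unchanged, which is a quick sanity check on the formula.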
Context 9
... the entire new weight vector of the winner-neuron can be computed and updated after m clock cycles. Figure 3. The basic architecture of reconfigurable complete-binary-adder-tree. ...
Context 10
... terms of our proposed vector quantizer, the clock cycles to encode one individual pixel-block can be defined as Equation (6), where d, N, and k are the pixel-block size, the codebook size and the number of sub-blocks, respectively. The number '7' represents the depth of the shift register in Figure 4, which is equal to the sum of 1 stage in SDU (shown in Figure 2), 5 stages in the RCBAT (shown in Figure 3) and 1 stage in WTA circuits (shown in Figure 4). ...

Similar publications

Article
In order to simplify the hardware design and reduce the resource requirements, this paper proposes a novel implementation of a convolutional auto-encoder (CAE) in a field programmable gate array (FPGA). Instead of the traditional framework realized in a layer-by-layer way, we designed a new periodic layer-multiplexing framework for CAE. Only one la...

Citations

... The weight vectors were trained by an off-chip computer, and the SOM implemented on FPGA performed the recall function only. Other examples of hardware SOMs working in the recall mode were proposed by Kurdthongmee [91] and Huang et al. [20] for image compression. In the former work, a memory-based BMU search unit was proposed, resulting in a reduced final number of colors. ...
... In addition, instead of using only one pixel as the input vector of the SOM map, B × B blocks of pixels can be presented as input vectors and quantized by the SOM in the same way as previously presented. The HW SOMs in [13], [17], [18], [20], [28], [40], [67], [68], [70], [71], [76], [91] were applied to the real-time color/image compression. ...
... This can be explained by the use of the most recent FPGA devices and by the adoption of the architectural choices discussed earlier in this article. It should also be highlighted that the performance of the hardware SOM with the off-chip architecture presented in [20] was given in Figs. 14 and 15 and Table III in connections per second (CPS) (evaluated only in recall phase). ...
Article
Self-organizing feature maps (SOMs) are a commonly used technique for clustering and data dimensionality reduction in many application fields. Indeed, their inherent properties of topology preservation and unsupervised learning of processed data without any prior knowledge put them at the forefront of candidates for data reduction in the Internet of Things (IoT) and big data (BD) technologies. However, the high computational cost of SOMs limits their use to offline approaches and makes online real-time high-performance SOM processing more challenging and mostly reserved to specific hardware implementations. In this article, we present a survey of hardware (HW) SOM implementations found in the literature so far: the most widely used computing blocks, architectures, design choices, adaptation, and optimization techniques that have been reported in the field of hardware SOMs. Moreover, we give an overview of the main challenges and trends for their ubiquitous adoption as hardware accelerators in many application fields. This article is expected to be useful for researchers in the areas of artificial intelligence, hardware architecture, and system design.
... SOMs have been applied to almost all kinds of scenes in the image processing domain. In the last decade, SOMs have been applied to image compression, image segmentation, feature abstraction and classification of image, and the effectiveness has been demonstrated [35][36][37]. However, there is a limited number of publications on the application of SOMs to image inpainting. ...
Article
In addition to text data analysis, image analysis is an area that has increasingly gained importance in recent years because more and more image data have spread throughout the internet and real life. As an important segment of image analysis techniques, image restoration has been attracting a lot of researchers’ attention. As one of the AI methodologies, Self-organizing Maps (SOMs) have been applied to a great number of useful applications. However, they have rarely been applied to the domain of image restoration. In this paper, we propose a novel image restoration method by leveraging the capability of SOMs, and we name it “boundary precedence image inpainting method based on SOMs”. In the proposed method, SOMs are used to separate a damaged image into different layers according to the pixel information of the image. Each pixel in the damaged area is considered to be the center of a square area, which is called a waiting-for-inpainting patch. The waiting-for-inpainting patch filling order is calculated by the boundary precedence method, in which the information of the separated image layers obtained by SOMs is analyzed and used to calculate the filling order. According to the proposed method, the waiting-for-inpainting patches on the boundaries of the damaged region are restored first, and the filling order of this proposed method depends on the precedence values of each waiting-for-inpainting patch. Case studies demonstrate the effectiveness of this proposed method. Both textural and structural information can be nicely repaired by the proposed method.
... The improvement of the learning performance was verified by the hardware SOM that was implemented on an FPGA. Huang et al. [16] proposed a vector quantizer based on the hardware SOM. The proposed vector quantizer was implemented on an FPGA and was used for high-speed image compression. ...
... The problem in using DPLL is that the computing precision was proportional to the clock frequency because the phases of the carrier signals were modulated by the clock signal. For example, the frequency of the clock must be 2^16 times higher than that of the carrier signal if 16-bit precision was required to represent vector element values. ...
Article
This paper proposes a unique hardware architecture for a self-organizing map (SOM) that mimics the biological brain by using pulse mode operation. In the proposed SOM, vector elements are given in the form of frequency-modulated signals, and digital frequency-locked loops (DFLLs) in neurons handle the computations of the vector elements. The SOM is trained by unsupervised learning, where the winner neuron that has the nearest weight vector is found first. In the proposed SOM, the winner neuron is found by counting cycle slips between the signals that carry input and weight vectors. After the winner neuron is found, weight vectors selected by a neighborhood function are updated toward the input vector. A triangular neighborhood function, implemented by using an attenuating enable signal for the DFLLs, is employed. To evaluate the proposed SOM and its building components, VHDL simulations and experiments using an FPGA were conducted. Compared to the previous work, the operation speed and learning capability were significantly improved. The novelty of the proposed architecture is that it uniquely uses a pulse-based operation that mimics the biological brain, and it was verified that unsupervised learning can be realized with neurons communicating with each other using frequency-modulated pulse signals.
... Configurable hardware appears well adapted to obtain efficient and flexible neural network implementation. Several SOM implementations on FPGA supports have been proposed [11][12][13][14][15][16][17][18]. Indeed, Porrmann et al. in [11] successfully implemented, on a Virtex FPGA support, a reconfigurable SIMD architecture of an SOM network formed by a processing element (PE). ...
... This new approach allowed reaching a maximal operating frequency of 47 MHz and a number of frames per second (fps) equal to 22. For image compression, the authors in [17] successfully integrated a completely parallel SOM on an FPGA circuit using a shared comparator to exploit the parallelism between different neuroprocessors. In [18], the authors have proposed a new scalable and adaptable SOM network hardware architecture. ...
... h_{c,l,k}(t) is the neighbourhood function already presented in Section 2 (equation (3)) and the vector ΔM_{l,k}(t) is already calculated by VEP. For the integration of the update operation, recent work in the literature [8,10,12,17] has used multiplication operators and memory modules to store the results. ...
Article
In this article, we propose to design a new modular architecture for a self-organizing map (SOM) neural network. The proposed approach, called systolic-SOM (SSOM), is based on the use of a generic model inspired by a systolic movement. This model is formed by two levels of nested parallelism of neurons and connections. Thus, this solution provides a distributed set of independent computations between the processing units called neuroprocessors (NPs) which define the SSOM architecture. The NP modules have an innovative architecture compared to those proposed in the literature. Indeed, each NP performs three different tasks without requiring additional external modules. To validate our approach, we evaluate the performance of several SOM network architectures after their integration on an FPGA support. This architecture has achieved a performance almost twice as fast as that obtained in the recent literature.
... This new approach allowed reaching a maximal operating frequency of 47 MHz and a number of frames per second (fps) equal to 22. For image compression, the authors in Ref. 28 successfully integrated an FPGA circuit and a completely parallel SOM using a shared comparator. The proposed solution achieved 28,494 MCPS with a maximal frequency of 79 MHz. ...
... As a consequence, a lot of approaches [22][23][24][25]27,28 used processing units, often called neuroprocessors (NPs) to ensure application adaptability. NPs were interconnected through connection lines that allowed the exchange of computed data with a lower number of clock cycles. ...
... (4)) and ΔM_{l,k}(t) has already been calculated by VEP. For the integration of the update operation, some recent work in the literature 19,21,23,28 has used multiplication operators and memory modules to store the results. This is generally costly in logic resources. ...
Article
In this article we present a new generic architectural approach of a Self-Organizing Map (SOM). The proposed architecture, called the Diagonal-SOM (D-SOM), is described in a hardware description language as an intellectual property kernel with easily adjustable parameters. The D-SOM architecture is based on a generic formalism that exploits two levels of the nested parallelism of neurons and connections. This solution is therefore considered as a system based on the cooperation of a distributed set of independent computations. The organization and structure of these calculations process an oriented data flow in order to find a better treatment distribution between different neuroprocessors. To validate the D-SOM architecture, we evaluate the performance of several SOM network architectures after their integration on a Xilinx Virtex-7 FPGA support. The proposed solution allows the easy adaptation of learning to a large number of SOM topologies without any considerable design effort. A 16×16 SOM hardware is validated through FPGA implementation, where temporal performance is almost twice as fast as that obtained in the recent literature. The suggested D-SOM architecture is also validated through simulation on variable-sized SOM networks applied to colour vector quantization.
... The vector quantization with predictor error was applied for the compression of medical images; a hybrid optimization comprising artificial bee colony and genetic algorithm was used for the codebook generation [20]. The FPGA implementation of self-organizing map neural network based vector quantizer was framed for the compression of images [21]. The vector quantization was found to be efficient in the medical image retrieval when coupled with the fuzzy logic [22]. ...
Article
The role of compression is vital in telemedicine for the storage and transmission of medical images. This work is based on the Contextual Vector Quantization (CVQ) compression algorithm with codebook optimization by Simulated Annealing (SA) for the compression of CT images. The region of interest (foreground) and background are separated initially by a region growing algorithm. The region of interest is encoded with low compression ratio and high bit rate; the background region is encoded with high compression ratio and low bit rate. The codebook generated from foreground and background is merged and optimized by the simulated annealing algorithm. The performance of the CVQ-SA algorithm was validated in terms of metrics like Peak Signal-to-Noise Ratio (PSNR), Mean Square Error (MSE) and Compression Ratio (CR); the result was superior when compared with classical VQ, CVQ, JPEG lossless and JPEG lossy algorithms. The algorithms are developed in Matlab 2010a and tested on real-time abdomen CT datasets. The quality of the reconstructed image was also validated by metrics like Structural Content (SC), Normalized Absolute Error (NAE), and Normalized Cross Correlation (NCC), and statistical analysis was performed by the Mann Whitney U Test. The outcome of this work will be an aid in the field of telemedicine for the transfer of medical images.
... In the meanwhile, all the weight vectors share the same ALUs rather than having individual ALUs. Overall, the proposed MRPA in [1][2][3][4][5][6] with N word-parallelism consists of one control unit (CU) and four specific function modules (SFMs), which are the parameterizable storage modules (PSMs), weight modules (WMs), summation module (SM) and comparison module (CM) as shown in Fig. 1. Rather than mapping each neuron to a dedicated PE, the MRPA shares its SFMs to all neurons. ...
... Nowadays, there are many applications [1,2], such as image transmission, acquisition, compression, enhancement and analysis, that require an efficient and accurate algorithm for image quality assessment (IQA). Generally, the IQA methods can be separated into two major classes: subjective assessment and objective assessment. ...
Article
Driven by the rapid development of digital imaging and network technologies, the opinion-unaware blind image quality assessment (BIQA) method has become an important yet very challenging task. In this paper, we design an effective novel scheme for opinion-unaware BIQA. We first utilize the convolutional maps to select high-contrast patches, and then we utilize these selected patches of pristine images to train a pristine multivariate Gaussian (PMVG) model. In the test stage, each high-contrast patch is fitted by a test MVG (TMVG) model, and the local quality score is obtained by comparing with the PMVG. Finally, we propose the deep activation pooling (DAP) to automatically emphasize the more important scores and suppress the less important ones so as to obtain the overall image quality score. We verify the proposed method on two widely used databases, that is, the computational and subjective image quality (CSIQ) and the laboratory for image and video engineering (LIVE) databases, and the experimental results demonstrate that the proposed method achieves better results than the state-of-the-art methods.