Figure 9 - uploaded by Marco Cornero
A simple bit-wise subset of the tree of adders of the convolution blocks.


Source publication
Conference Paper
In this paper we present a VHDL-based design methodology which we adopted in the design of an ASIC chip for real time image analysis in a quality control industrial environment. The design methodology is based on the following considerations: i) we explored the design space by applying some high level transformations on the VHDL specifications; ii...

Contexts in source publication

Context 1
... modeling is illustrated on the basis of an example. In Figure 9 two single bit pipelined adders together with four registers (D-s) are shown. The example shows a subset of the tree of adders of the convolution blocks. ...
Context 2
... pipeline stages in the example of Figure 9 are synchronized with the submaster clock S.CLK. The D-s (A0 add, B0 add, A1 add, B1 add) are modeled using processes synchronized on the edges of the S.CLK: each D is modeled using two processes ("master" and "slave"): the "master" process is synchronized on the rising edge of S.CLK, and the "slave" process is synchronized on the falling edge of S.CLK. ...
Context 3
... Figure 10 the timing diagram and the pipeline schema for the circuit of Figure 9 are illustrated. In Figure 9 the VHDL description is also detailed. ...
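
The master/slave register modeling described in Context 2 can be illustrated with a small behavioral sketch. This is not the VHDL from the source publication: it is a Python approximation, and the names `MasterSlaveD`, `full_adder`, and `simulate` are hypothetical. It assumes each D captures its input on the rising clock edge ("master") and updates its output on the falling edge ("slave"), with a single-bit adder reading the register outputs.

```python
class MasterSlaveD:
    """D register modeled as two edge-triggered steps (hypothetical sketch)."""

    def __init__(self):
        self.master = 0  # value captured on the rising edge of the clock
        self.q = 0       # register output, updated on the falling edge

    def rising_edge(self, d):
        # "master" step: sample the data input
        self.master = d

    def falling_edge(self):
        # "slave" step: publish the sampled value on the output
        self.q = self.master


def full_adder(a, b, carry_in):
    """Single-bit full adder: returns (sum, carry_out)."""
    total = a + b + carry_in
    return total & 1, total >> 1


def simulate(a_bits, b_bits):
    """Feed one bit pair per clock cycle through registered adder inputs."""
    reg_a, reg_b = MasterSlaveD(), MasterSlaveD()
    outputs = []
    for a, b in zip(a_bits, b_bits):
        # Rising edge: both masters sample the new operands.
        reg_a.rising_edge(a)
        reg_b.rising_edge(b)
        # Between the edges the adder still sees the previous operands.
        outputs.append(full_adder(reg_a.q, reg_b.q, 0))
        # Falling edge: both slaves expose the sampled operands.
        reg_a.falling_edge()
        reg_b.falling_edge()
    return outputs
```

Under these assumptions the adder result lags the inputs by one clock cycle, which is the pipelining effect the two-process register model is meant to capture.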

Similar publications

Conference Paper
This work proposes a VHDL generation software for optimized FIR filters. In this paper a near optimum algorithm for constant coefficient FIR filters was used. This algorithm uses general coefficient representation for the optimal sharing of partial products in Multiple Constants Multiplications (MCM). The developed tool was compared to Matlab FDA...
Article
A flow is proposed which offers a programming approach to the systems design of application specific micro-controllers. Tools have been developed for compilation and cosimulation, and a reconfigurable board has been designed which can be used for rapid prototyping. The final design can be compiled into a structural VHDL netlist for a standard cell...
Article
Pattern localization and classification are CPU time intensive being normally implemented in software, however with lower performance than custom implementations. Custom implementation in hardware (ASIC) allows real-time processing, having higher cost and time-to-market than software implementation. We present an alternative that represents a good...
Article
The industrial applications of fuzzy processors are increasing mainly in control and pattern recognition fields. Some of these applications require high speed that can not be obtained by standard commercial fuzzy processors. This paper describes the architecture of a very small size high speed fuzzy chip with two inputs and one output. The input sa...
Conference Paper
Abstract: Minimizing the data transfer time between memory and the computation blocks in image processing algorithms remains a major problem for ensuring real-time operation, especially with algorithms that increasingly use parallel data. The choice of the communication system depends on the components making up...

Citations

... The development of application specific hardware description languages, such as Silage, HIFI, and others [5,6,10], aimed at the synthesis of multi-dimensional signal processing applications, does not consider the possible parallelism across loop iterations. Other techniques have been developed to be applied when writing the VHDL code in order to overcome possible synthesis constraints [17]. Since multi-dimensional applications are highly dependent on the execution of nested loops, in this paper we present a technique which analyzes such loops, splitting them into sections that can run in parallel. ...
Conference Paper
Multi-dimensional systems, including image processing, geophysical signal processing, and fluid dynamics, are becoming one of the most important targets of computational improvement studies. Most of the optimized solutions to those problems point to the use of application specific integrated circuits (ASICs). From the analysis of the multi-dimensional programming code, one can observe that nested-loop-like structures are often the most time consuming part. Designing ASICs with multiple processing units is usually the appropriate solution to achieve the required computational performance. In this paper, a new loop transformation algorithm, which allows an efficient utilization of the multiprocessor system, is presented. Uniform nested loops are modeled as multi-dimensional data flow graphs. New loop structures are generated so that an arbitrary number of processors available in the system can run in parallel. An example demonstrates the effectiveness of the algorithm.
... This application refers to the development of a dedicated digital VLSI hardware to detect the presence of linear defects in images. The portion of the architecture with the highest computational load can be logically subdivided into two cascaded modules: a feature extractor module followed by a decision module to classify the presence/absence of linear defects in the object [13]. The first module is a constrained convolver whose 9 × 5 weights mask has been trained with defect/no-defect examples. ...
Article
The paper provides a sensitivity analysis to measure the loss in accuracy induced by perturbations affecting acyclic computational flows composed of linear convolutions and nonlinear functions. We do not assume a large number of coefficients or input independence for the convolution module, nor strict requirements on the nonlinear function. The analysis is tailored to digital VLSI implementations where perturbations, associated with data quantization, affect the device inputs, coefficients, internal values, and outputs. The sensitivity analysis can be used to measure the loss in accuracy along the computational chain, to characterize the tolerated perturbations, and to dimension the whole architecture
... Application Specific Processor (ASP) design concepts [1,7,8] gained attention after extensive developments in two different fields: VLSI design automation and parallel code generation. Related advancements have been made in layout compaction, logic synthesis, RTL and behavioral synthesis, software pipelining, and VLIW type of architectures. ...
Conference Paper
We outline general design steps of our synthesis tool to realize application specific co-processors such that for a given scientific application having intensive iterative computations especially with recurrences, a VLIW type of co-processor is synthesized and realized, and an accompanying parallel code is generated. We introduce a novel register file model, Shifting Register File (SRF), based on cyclic regularity of register file accesses; and a simple method, Expansion Scheduling, for scheduling iterative computations, which is based on cyclic regularity of loops. We also present a variable-register file allocation method and show how simple logic units can be used to activate proper registers at run time through an example
... Application Specific Processor (ASP) design concepts [1,7,8] gained attention after extensive developments in two different fields: VLSI design automation and parallel code generation. Related advancements have been made in layout compaction, logic synthesis, RTL and behavioral synthesis, software pipelining, and VLIW type of architectures. ...
Article
In this paper, we outline general design steps of our synthesis tool to realize application specific co-processors such that for a given scientific application having intensive iterative computations especially with recurrences, a VLIW type of co-processor is synthesized and realized, and an accompanying parallel code is generated. We introduce a novel register file model, Shifting Register File (SRF), based on cyclic regularity of register file accesses; and a simple method, Expansion Scheduling, for scheduling iterative computations, which is based on cyclic regularity of loops. We also present a variable-register file allocation method and show how simple logic units can be used to activate proper registers at run time through an example.
... Researchers at IMEC and other institutions have found plenty of motivation [8] to develop and/or improve application specific hardware description languages, such as Silage, HIFI, and others, in order to synthesize multi-dimensional signal processing applications [4, 5]. Others have developed techniques to be applied when writing the VHDL code in order to overcome possible synthesis constraints [13]. Since multi-dimensional applications are highly dependent on the execution of nested loops, in this paper we introduce the idea of pre-compiling specific commands used to describe such loops, in order to adapt them to current VHDL synthesizable constructs. ...
... Optimization techniques are used to improve the parallelism among the operations in order to satisfy the time constraint. In [13], loop unfolding or unrolling was used to identify parallelism in the process and to do such an optimization. However, all the required optimization was done manually, and in several different steps. ...
Conference Paper
The VHDL language is considered to be an important standard among the hardware description tools. Most of the existing loop optimization techniques that consider the parallelism inherent to multi-dimensional problems depend on loop transformations not available in the current VHDL Synthesis products. This study presents a coding technique for modeling multi-dimensional (nested) loops in VHDL, where pre-processor tools can rewrite the VHDL instructions in such a way that the optimized design can be synthesized. This new approach is expected to improve the VHDL design cycle by including multidimensional signal processing and other common applications in the scope of the VHDL Synthesis tools.
... The design methodology (see [10]) is shown in Fig. 3. We started from a set of specifications (system requirements), issued by the end user (e.g. ...
... The design methodology (see [10]) is shown in Fig. 3. ...
... The high level transformations identified quite easily the micro-architecture of the data-path for the computation of the 2-D convolution (see [10]). The chip architecture was then described at RT-level; at the same time we devised the time scheduling and the resource allocation. ...
Article
In this paper we present the design of an ASIC chip for real-time image processing in industrial applications. The chip is a module of a system for the automatic surface inspection of mechanical parts: it implements the feed-forward phase of a neural network model (multi-layer perceptron with local connections) tuned to the specific application. The design has been performed in 0.7 μm CMOS technology using an approach based on high level transformations of the VHDL specifications. Special emphasis was given to achieve real-time speed. As a result, the architecture is based on a deep pipeline and the performance is beyond the real-time specifications.
Article
In this paper, we present a VLSI architecture for real-time image processing in quality control industrial applications: automation of the visual inspection phase of mechanical parts treated by the Fluorescent Magnetic Particle Inspection method for structural-defect detection. The VLSI architecture implements a highly constrained neural network tailored for this specific application: the multi-layer perceptron with strictly local connections. The learning of the weights is performed off line by using the adaptive simulated-annealing algorithm. The neural network has been trained on real plant data: recognition results of the training and classification tasks compare favorably with those obtained by expert human operators. The VLSI architecture receives as input the image (taken on-line on the plant) of a mechanical part and determines whether at least one structural surface defect is present. The VLSI architecture was optimized, through a set of transformations on the high-level VHDL specifications of the neural network algorithm, to reach real-time operating conditions. Following the proposed approach and the designed architecture, we designed and successfully tested a custom VLSI chip for the real-time implementation of the recognition task.