Article

An integrated system for developing regular array designs


Abstract

This paper describes an integrated system for developing regular array designs based on the block description language Ruby. Ruby supports concise design description and formal verification. A parametrised Ruby description can be used in simulating, refining and visualising designs, and in compiling hardware implementations such as field programmable gate arrays. Our system enables rapid design production, while good design quality is achieved by (a) the efficient instantiation of device-specific libraries, (b) the size optimisation of bit-level components using the design refiner, and (c) the exploitation of regularity information at source level in the library composition process. The development and implementation of several median filters are used to illustrate the system.


... It has been shown that, despite advances in automatic placement methods, user-supplied placement information can often significantly improve FPGA performance and resource utilization for common applications [S00]. Relative placement for hardware design has been developed for languages such as µFP [LJS89], Ruby [GL01] and Lava [S00]. However, all hardware languages that support such placement techniques are compiled into a netlist in a single stage. ...
... This facility will enable speed and energy optimisation, since recent work [WAL04] has shown that, for reconfigurable hardware technology, pipelined designs can run faster or can consume lower energy per operation than non-pipelined designs. Further work will generalise our approach to deal with relational descriptions [GL01]. ...
Conference Paper
Full-text available
This paper presents a framework for verifying compilation tools based on parametrised hardware libraries expressed in Pebble, a simple declarative language. An approach based on pass separation techniques is described for specifying and verifying Pebble abstraction mechanisms, such as the loop statement. We show how this approach can be used to verify the correctness of the flattening procedure in the Pebble compiler, which also results in a more efficient implementation than a non-verified version. The approach is useful for guiding compiler implementations for Pebble and related languages such as VHDL; it may also form the basis for automating the generation of provably-correct tools for hardware development.
... For instance, the median value cannot be calculated without knowing the size of the data to which it is applied. This requires a small modification inside the sorter (according to the sorter design [16]) to append median-related information, such as group cardinality, alongside the data passed on to the aggregation engine. This information can already be available, as the sorting part reads the window in its entirety before flushing its output to the aggregation engine. ...
Preprint
Aggregation queries are a series of computationally demanding analytics operations on grouped and/or time series (streaming) data. They include tasks such as summation or finding the mean among the items of a group (sharing a group ID) or within the last N observed tuples. They have a wide range of applications, including database analytics, operating systems, bank security and medical sensors. Existing challenges include the increased hardware utilisation and random memory access patterns that result from hash-based approaches or multi-tasking as a way to introduce parallelism. There are also challenges relating to the degree to which the function can be calculated incrementally for sliding windows, such as with overlapping windows. This paper presents a pipelined and reconfigurable approach for calculating a wide range of aggregation queries with minimal hardware overhead.
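As an illustration of what calculating an aggregate incrementally over a sliding window means, here is a minimal software sketch for the mean over the last N tuples. It is not the pipelined architecture of the preprint; the function name `sliding_mean` and its parameters are assumptions made for this example.

```python
from collections import deque


def sliding_mean(stream, n):
    """Yield the mean of the last `n` items seen so far (fewer at the start).

    The running total is updated by adding the new item and subtracting the
    evicted one, so each step costs O(1) instead of re-summing the window.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    window = deque()  # items currently inside the window, oldest first
    total = 0
    for x in stream:
        window.append(x)
        total += x
        if len(window) > n:
            total -= window.popleft()  # evict the oldest item
        yield total / len(window)
```

The sum and the mean can be updated this way because the contribution of an evicted item can be subtracted; a function such as the median has no such inverse step, which is one reason it needs different treatment.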
... Karaman Another method is that of threshold decomposition, as used in [BT04], however the architecture proposed relies on the window being of size 3×3 and uses 3-input adders and so is not scalable to large windows. Systolic median architectures based on insertion sort have also been proposed [GL01]; in this case, the amount of hardware is proportional to the window size. In [VRSPGP02], the authors take advantage of the wide data buses on the development board to allow the median calculations for multiple pixels in parallel. ...
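The idea of an insertion-sort-based median, with storage proportional to the window size, can be sketched in software as follows. This is only a behavioural model under stated assumptions: the name `sliding_median` is hypothetical, and a sorted Python list stands in for the chain of systolic cells rather than modelling their timing.

```python
from bisect import bisect_left, insort
from collections import deque


def sliding_median(samples, window):
    """Yield the median of each full window of odd length `window`.

    One stored entry per window position is kept in sorted order, so the
    storage grows linearly with the window size.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    order = deque()     # samples in arrival order, oldest first
    sorted_cells = []   # the same samples kept in ascending order
    for x in samples:
        order.append(x)
        insort(sorted_cells, x)  # insertion step for the new sample
        if len(order) > window:
            oldest = order.popleft()
            # Retire the oldest sample from the sorted store.
            del sorted_cells[bisect_left(sorted_cells, oldest)]
        if len(order) == window:
            yield sorted_cells[window // 2]
```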
Thesis
Full-text available
Computer Vision is a rapidly developing field in which machines process visual data to extract some meaningful information. Digitised images in their pixels and bits serve no purpose of their own. It is only by interpreting the data, and extracting higher level information that a scene can be understood. The algorithms that enable this process are often complex, and data-intensive, limiting the processing rate when implemented in software. Hardware-accelerated implementations provide a significant performance boost that can enable real-time processing. The Trace Transform is a newly proposed algorithm that has been proven effective in image categorisation and recognition tasks. It is flexibly defined allowing the mathematical details to be tailored to the target application. However, it is highly computationally intensive, which limits its applications. Modern heterogeneous FPGAs provide an ideal platform for accelerating the Trace transform for real-time performance, while also allowing an element of flexibility, which highly suits the generality of the Trace transform. This thesis details the implementation of an extensible Trace transform architecture for vision applications, before extending this architecture to a full flexible platform suited to the exploration of Trace transform applications. As part of the work presented, a general set of architectures for large-windowed median and weighted median filters are presented as required for a number of Trace transform implementations. Finally an acceleration of Hidden Markov Model decoding for person detection is presented. Such a system can be used to extract frames of interest from a video sequence, to be subsequently processed by the Trace transform. All these architectures emphasise the need for considered, platform-driven design in achieving maximum performance through hardware acceleration.
... Wires are treated as supporting multiple types themselves, depending on the final compilation target environment. For example, for simulation using the re [10] tool, wires can carry string, integer or boolean values (this is useful for simulating word-level descriptions of designs). ...
Article
The increasing complexity of electronic circuits is leading to a crisis in hardware design. While the number of transistors that can be fabricated on an integrated circuit is doubling roughly every 18 months, the ability of circuit designers to take advantage of the extra hardware is growing much more slowly. This "design productivity gap" is a serious threat to future electronic development. Proposed solutions focus on "code re-use" and allowing designers to work from a higher level of abstraction when designing circuits. Quartz is a new declarative language for structural hardware description supporting high levels of abstraction through the use of higher order combinators. A compilation workflow exists to compile Quartz designs to hardware through a series of transformations resulting in VHDL output. In this work we investigate three areas of importance to the compilation of Quartz designs. We investigate ways in which Quartz blocks can be overloaded and distinguished at compile-time from their type signatures. The Quartz type system is expanded to incorporate predicated types and we present a new algorithm based on satisfiability matrix predicates to manage overloading. Our algorithm uses matrices of type data to simultaneously process multiple possible typings of a Quartz program, eliminating possibilities that can not be used until a single overloaded instance can be resolved. Our approach is presented in the context of Quartz but is a general algorithm that could be adapted to any language with similar requirements. We present a static directional inference mechanism that can be used to support the compilation of Quartz designs while maintaining parameterisation and the block hierarchy. We show that a complete directional inference algorithm can not exist, due to the possible usage of vectors of internal signals within blocks.
We present several possible optimisations to our static inference algorithm that can be used to successfully process common circuit patterns and demonstrate that non-static methods can be used as a last resort to compile Quartz circuits where directions can not be inferred. We discuss the issues with determining the size of Quartz vectors during compilation and present a number of mechanisms that can be used to aid the compilation of irregular circuits by inferring sizes using Sized Types and handling unknown vector sizes in a non-static list-like manner.
... • a symbolic simulator based on the Rebecca system [14], which can simulate bit-level and word-level designs, and combine symbolic, numeric and logical inputs and outputs; ...
Conference Paper
As design complexity grows, verification becomes a bottleneck in design development and implementation. This paper describes a novel approach for verifying reconfigurable streaming designs, based on symbolic simulation and equivalence checking. Compared with numerical simulation, symbolic simulation provides a more informative way of showing that a design behaves as expected; equivalence checking enables automatic checking of equivalence of symbolic expressions. Our approach has been implemented for designs targeting Maxeler technologies, using an easy-to-use symbolic simulator and the Yices equivalence checker, together with other facilities such as an output combiner to support an automated verification flow. Several benchmarks, including one-dimensional convolution and finite difference computation, are used to evaluate the proposed approach.
... There are several languages which allow structured hardware description with layout information encoded: µFP [She83], Ruby [JS90,GL01], Lava [CS00], Pebble [LM98,MLD02], Quartz [PL05], Hydra [O'D95] and CADIC [BHK+87], to mention a few. Most of these are not primarily intended as layout languages, but as languages for structural netlist description. ...
Article
Full-text available
The semiconductor industry is facing increasing problems with designing complex circuits with tight constraints on area, performance and power consumption. Worse still, these circuits must be designed and verified very quickly. The existing design tools have great problems with handling the complexity of the designs, and time consuming manual intervention is often needed in order to reach a satisfactory result. One factor in this increasing complexity is the fact that routing wires dominate logical gates in today’s high-performance circuits when non-functional properties such as signal delay and power consumption are considered. In conventional design methods, information about routing wires is not included until the later design stages, so that bad choices early on are not discovered until after the time consuming physical design stage. In order to overcome this problem, we need design methods which take wire effects into account right from the start. This requires better abstraction techniques that faithfully model the lower-level implications, even when working at a high level
... To capture information about architecture and behaviour of cell in a regular array we advocate a hardware description notation that can express topological information. There already exists a hardware description language called Ruby [2] that allows behaviour and layout to be elegantly expressed, resulting in powerful architectural descriptions such as in [3,4] particularly. Topologically, a basic Ruby component is a tile that consists of four edges, namely East (e), West (w), South (s) and North (n). ...
Article
Full-text available
The square cells specified by the Ruby language, called Ruby cells, can be relocated by rotation, horizontal flip, vertical flip or shifting in a regular array structure such as an FPGA (Field Programmable Gate Array) or Cell Matrix™. In this paper, the behaviours of those relocations are shown at two different levels, called the architectural level and the logic level. In other words, the behavioural views describe the function of the configuration relocation of the target circuits regardless of its implementation. Under the behavioural view of configuration relocation at the architectural level, the cell relocations can be specified and reasoned about formally by algebraic laws of Ruby algebra and group theory to create an abstract description of the configuration relocations of the target circuit without reference to particular elements within the reconfiguration device. Under the behavioural view of configuration relocation at the logic level, the above abstract description is considered as a Partial Order-based Model (POM) whose dependencies are given by the transition relation; this is our approach to synthesising the algorithm for the reconfiguration micro-controller automatically from the information contained in the high-level specification languages.
... To capture information about architecture and behaviour we advocate a hardware description notation that can express topological information. There already exists a hardware description language called Ruby [2] that allows behaviour and layout to be elegantly expressed, resulting in powerful architectural descriptions such as in [3,4] particularly. ...
Article
Full-text available
Although partially reconfigurable FPGA design is powerful, if two different configurations are mapped at compile time to overlapping locations in the FPGA, only one of these configurations can be present in the array at any given moment; they cannot operate simultaneously. However, if the final FPGA location can be determined at runtime, one or both of these overlapping configurations can be relocated to a new, previously unused location to allow for simultaneous use. The configurations can be relocated by either rotation or shifting in an FPGA fabric. In this paper, we show that relocating configurations can be specified and reasoned about formally by algebraic laws for checking whether a chip of given size and a given feasible schedule allow a feasible placement. Our examination is done on a generic partially reconfigurable FPGA, and Ruby algebra is used for specification and reasoning.
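To make the relocation scenario concrete, the following sketch models configurations as axis-aligned rectangles on a chip and checks whether a set of them can coexist. It is a plain geometric check, not the Ruby algebra used in the paper, and it ignores scheduling; the names `Config`, `rotate`, `shift` and `feasible` are assumptions introduced for this example.

```python
from typing import NamedTuple


class Config(NamedTuple):
    """A configuration occupying a w-by-h rectangle with corner (x, y)."""
    x: int
    y: int
    w: int
    h: int


def rotate(c):
    """Rotate by 90 degrees about the corner: width and height swap."""
    return c._replace(w=c.h, h=c.w)


def shift(c, dx, dy):
    """Translate the configuration by (dx, dy) cells."""
    return c._replace(x=c.x + dx, y=c.y + dy)


def overlaps(a, b):
    """True if the two rectangles share at least one cell."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)


def feasible(configs, chip_w, chip_h):
    """True if every configuration fits on the chip and none overlap."""
    inside = all(c.x >= 0 and c.y >= 0 and
                 c.x + c.w <= chip_w and c.y + c.h <= chip_h
                 for c in configs)
    disjoint = all(not overlaps(a, b)
                   for i, a in enumerate(configs)
                   for b in configs[i + 1:])
    return inside and disjoint
```

For example, two 4-by-2 configurations mapped to overlapping locations on a 6-by-4 chip become compatible once one of them is rotated and shifted to the free columns.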
... The use of relative placement information, such as placing components beside or below one another, has been proposed for producing designs. Languages and systems that support this technique include µFP [8], Ruby [5], [17], T-Ruby [15], Lava [1], and Rebecca [3]. All these systems produce, from declarative descriptions, circuit layouts in the form of VHDL or EDIF descriptions with explicit coordinates which can be mapped efficiently into hardware. ...
Conference Paper
Full-text available
Placement information is useful in producing efficient circuit layout, especially for hardware libraries or for run-time reconfigurable designs. Relative placement information enables control of circuit layout at a higher level of abstraction than placement information in the form of explicit coordinates. We present a functional specification of a procedure for compiling programs with relative placement information in Pebble, a simple language based on Structural VHDL, into programs with explicit placement coordinate information. This procedure includes source-level transformation for compiling into descriptions that support conditional compilation based on symbolic placement constraints, a feature essential for parametrised library elements. Partial evaluation is used to optimise a description using relative placement to improve its size and speed. We illustrate our approach using a DES encryption design, which results in a 60% reduction in area and a 6% improvement in speed.
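The step from relative placement to explicit coordinates can be illustrated with a small sketch. It does not use Pebble syntax and is not the compilation procedure of the paper: the tuple encoding, the function name `place`, and the convention that `("below", a, b)` puts `a` beneath `b` are all assumptions made for this example.

```python
def place(design, x=0, y=0):
    """Return (width, height, coords) for `design` with its origin at (x, y).

    A design is one of:
      ("cell", name, width, height)  a leaf block with a unique name
      ("beside", a, b)               b is placed to the right of a
      ("below", a, b)                a is placed beneath b
    `coords` maps each cell name to its explicit (x, y) position.
    """
    kind = design[0]
    if kind == "cell":
        _, name, w, h = design
        return w, h, {name: (x, y)}
    if kind not in ("beside", "below"):
        raise ValueError(f"unknown placement operator: {kind!r}")
    _, first, second = design
    w1, h1, coords = place(first, x, y)
    if kind == "beside":
        w2, h2, more = place(second, x + w1, y)   # offset by first's width
        width, height = w1 + w2, max(h1, h2)
    else:
        w2, h2, more = place(second, x, y + h1)   # offset by first's height
        width, height = max(w1, w2), h1 + h2
    coords.update(more)
    return width, height, coords
```

Because each operator only needs the size of its first operand to position the second, the offsets could equally be kept as symbolic expressions in the block parameters rather than evaluated to numbers.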
... The use of relative placement information, such as placing components beside or below one another, has been proposed for producing designs. Languages and systems that support this technique include Lava [1], Rebecca [2], µFP [7], Ruby [4],[15], and T-Ruby [13]. All these systems produce, from declarative descriptions, circuit layouts in the form of VHDL or EDIF descriptions with explicit coordinates which can be mapped efficiently into hardware. ...
Conference Paper
Full-text available
This paper presents a framework for verifying compilation tools for parameterised hardware libraries with placement information. Such libraries are captured in Pebble, a simple declarative language based on structural VHDL, and can contain placement information to guide circuit layout. Relative placement information enables control of circuit layout at a higher level of abstraction than placement information in the form of explicit coordinates. We provide a functional specification of a procedure for compiling Pebble programs with relative placement information into Pebble programs with explicit placement coordinate information. We present an overview of the steps for verifying this procedure based on pass separation techniques. The compilation procedure can be used in conjunction with partial evaluation to optimise the size and speed of circuits described using relative placement. Our approach has been used for optimising a pattern matcher design, which results in a 33% reduction in resource usage.
... Another method is that of threshold decomposition, as used in [13]; however, the architecture proposed relies on the window being of size 3 × 3 and uses three-input adders and thus is not scalable to large windows. Systolic median architectures based on insertion sort have also been proposed [14]; in this case, the amount of hardware is proportional to the window size. Similarly for the implementation in [15]. ...
Article
Full-text available
Most effort in designing median filters has focused on two-dimensional filters with small window sizes, used for image processing. However, recent work on novel image processing algorithms, such as the trace transform, has highlighted the need for architectures that can compute the median and weighted median of large one-dimensional windows, to which the optimisations in the aforementioned architectures do not apply. A set of architectures for computing both the median and weighted median of large, flexibly sized windows through parallel cumulative histogram construction is presented. The architecture uses embedded memories to control the highly parallel bank of histogram nodes, and can implicitly determine window sizes for median and weighted median calculations. The architecture is shown to perform at 72 Msamples, and has been integrated within a trace transform architecture.
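The principle of finding a median or weighted median from a cumulative histogram can be sketched sequentially as below. This is a software model only, not the parallel bank of histogram nodes described in the abstract; the function name `histogram_median` and its parameters are assumptions for this example, and for an even total weight it returns the lower median.

```python
def histogram_median(values, levels, weights=None):
    """Return the (weighted) median of integer `values` in range(levels).

    A histogram of weights is accumulated bin by bin; the median is the
    first bin at which the cumulative weight reaches half of the total.
    The window size is never passed in: it is implied by the total weight.
    """
    if weights is None:
        weights = [1] * len(values)   # unweighted median
    if not values:
        raise ValueError("values must not be empty")
    hist = [0] * levels
    for v, w in zip(values, weights):
        hist[v] += w
    total = sum(hist)
    running = 0
    for level, count in enumerate(hist):
        running += count
        if 2 * running >= total:      # cumulative weight >= total / 2
            return level
    raise ValueError("total weight must be positive")
```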
... Currently, fast runtime partial reconfiguration features of embedded systems [6], [7] are available for programming the concept of dynamically reconfigurable computing by dynamic relocation of available cells in a regular array structure, whereby the system tries to avoid the lack of contiguous free cells preventing the configuration of new functions (provided that the total number of cells available is sufficient). Note that spreading the components of an incoming function, due to fragmentation of available cells, would degrade its performance, delaying tasks and reducing effective utilization of the regular array structure. ...
Conference Paper
Full-text available
Dynamically reconfigurable computing within embedded computer-based systems can be partially modified at runtime without stopping the operation of the whole system. In this paper, a provable algorithm for runtime evolution of a logical configuration is formally represented by the appropriate graph transformation. In other words, programming is considered as a visual transformation of the logical configuration by the formulated rules. Their soundness is proved. A logical configuration in evolution is provable from another by applying these rules. Subsequently, an algorithmic approach to programming is formally developed and analyzed
... Fast runtime partial reconfiguration features of embedded systems [9], [11] are available for programming the concept of dynamically reconfigurable computing by on-the-fly reorganizing of available cells in regular array structures such as those in FPGAs (Field-Programmable Gate Arrays). This is not only relevant to the composition of new embedded systems using architectures and procedural models that stem from the "designed for change" methodology but also can be of decisive importance during the perpetual process of upgrading existing evolution-capable embedded systems. ...
Article
Full-text available
In embedded systems, dynamically reconfigurable computing can be partially modified at runtime without stopping the operation of the whole system. In this paper, we consider a reorganization mechanism for dynamically reconfigurable computing in embedded systems to guarantee that invariants of the design are respected. This reorganization is considered as a visual transformation of the logical configuration by the formulated rules. The invariant is recognized under the restructuring of the configuration using reconfiguration rules.
... Our Quartz compiler targets the latest version of Pebble, which has a more concise syntax than earlier versions. This allows us to use much of the infrastructure that has been previously developed for the compilation and simulation of Pebble and Ruby designs [4]. Our tools include word-level symbolic simulation and bit-level simulation of designs, as well as the full range of VHDL simulation and synthesis tools. ...
Conference Paper
Full-text available
We present Quartz, the first language supporting advanced features such as polymorphism, overloading, formal reasoning and generic VHDL library compilation, for correct and efficient reconfigurable design. Quartz is designed to support formal reasoning for design verification and generic optimisation strategies can be captured as algebraic transformations; the correctness of such transformations has been established using the Isabelle theorem prover. The parameterisation supported by Quartz higher-order combinators makes the expression of regular designs with a parameterised level of pipelining much easier than the equivalent in VHDL. The language also supports reconfiguration through the use of virtual multiplexer blocks. We have used Quartz to describe a range of designs with parameterised pipelining, and investigated the different tradeoffs in speed, size and power consumption. For designs where pipeline registers can reduce glitch propagation, increasing the level of pipelining can reduce power consumption by as much as 90%
... (ii) The Lava system [130] can convert designs into a form suitable for input to a model checker; a number of FPGA design libraries have been verified in this way [131]. (iii) The Ruby language [132] supports correctness-preserving transformations, and a wide variety of hardware designs have been produced. (iv) The Pebble [133] hardware design language has been formally specified [134], so that provably-correct design tools can be developed. ...
Article
Reconfigurable computing is becoming increasingly attractive for many applications. This survey covers two aspects of reconfigurable computing: architectures and design methods. The paper includes recent advances in reconfigurable architectures, such as the Altera Stratix II and Xilinx Virtex 4 FPGA devices. The authors identify major trends in general-purpose and special-purpose design methods. It is shown that reconfigurable computing designs are capable of achieving up to 500 times speedup and 70% energy savings over microprocessor implementations for specific applications.
Conference Paper
Full-text available
Survey Paper
Article
This paper examines how mobile technologies can play an important role in the evolution of M-learning. In the second generation, GSM can be extended in three ways. First, general packet radio service provides a continuous connection to the internet, without requiring a dial-up connection, at up to 171.2 Kbits/s. Secondly, Bluetooth is a chip technology that allows short-range data and voice transfer among mobile devices. Thirdly, multimedia messaging service incorporates graphics and images into text messages. 2G services are frequently referred to as personal communications service (PCS), and SMS messaging is available for data transmission. In the third generation, EDGE can be introduced in two ways: as a packet-switched enhancement for general packet radio service (GPRS), known as enhanced GPRS or EGPRS, and as a circuit-switched data enhancement called enhanced circuit-switched data (ECSD). The services associated with 3G provide the ability to transfer both voice data (such as making a telephone call) and non-voice data (such as downloading information, exchanging e-mail, and instant messaging). 4G will be the successor to 3G. It will feature high-speed mobile wireless access with a very high data transmission speed. It also addresses the notion of pervasive networks, an entirely hypothetical concept in which the user can be simultaneously connected to several wireless access technologies and can seamlessly move between them.
Article
Full-text available
This paper presents a framework for verifying compilation tools for parametrised hardware designs with placement information. The framework involves Pebble, a simple declarative language based on Structural VHDL which supports the use of placement information to guide circuit layout; such information often leads to efficient designs that are particularly important for hardware libraries. Relative placement information enables control of circuit layout at a higher level of abstraction than placement information in the form of explicit coordinates. An approach based on pass separation techniques is adopted for specifying and verifying two Pebble abstraction mechanisms: a flattening procedure and a relative placement method. For the flattening procedure, which takes a set of parametrised blocks and unfolds the circuit description into a netlist, we provide semantic descriptions of both the hierarchical and the flattened Pebble languages to prove its functional correctness. For the relative placement method, we specify the compilation procedure from Pebble programs with relative placement information to Pebble programs with explicit coordinate expressions, often in the form of symbolic placement constraints. This compilation procedure can be used in conjunction with partial evaluation to optimise the size and speed of parametrised circuit descriptions using relative placement, without flattening the original hierarchical descriptions. Our approach has been used for optimising a pattern matcher design, which results in a 33% reduction in resource usage. For DES encryption, our method can reduce the size of a DES design by 60%.
Conference Paper
This paper reviews techniques and tools for customising processors at design time and at run time. We use several examples to illustrate customisation for particular application domains, and explore the use of declarative and imperative languages for describing and customising data processors. We then consider run-time customisation, which necessitates additional work at compile time such as production of multiple configurations for downloading at run time. The customisation of instruction processors and design tools is also discussed.
Article
Manual placement of components is often used in FPGA circuit design in order to achieve better results than would be generated by automatic place and route algorithms. However, explicit placement of basic elements in parametrized hardware descriptions is tedious and error-prone. We describe a framework for the description and verification of parametrized hardware libraries with layout information, supporting both placing components with explicit symbolic coordinates and ‘neighboring’ placement directives such as A beside B. The correctness of generated layouts is established by proof in higher-order logic, automated by using the Isabelle theorem prover. We have developed an extensive library of theorems describing properties of layouts that are combined by our compiler and the theorem prover to achieve a high level of automation in the verification of complete circuit layouts, making formal verification of circuit layouts practical with minimal user effort. Our system has been used to verify layout descriptions for a range of circuits that have been mapped to Xilinx FPGAs.
Conference Paper
Full-text available
HML (Hardware ML) is an innovative hardware description language based on the functional programming language SML. HML is a high-order language with polymorphic types. It uses advanced type checking and type inference techniques. We have implemented an HML type checker and a translator to VHDL. We generate a synthesizable subset of VHDL and automatically infer types and interfaces. This paper gives an overview of HML and discusses its typechecking techniques and the translation from HML to VHDL. We present a non-restoring integer square-root example to illustrate the HML system
Article
Full-text available
Lava is a tool to assist circuit designers in specifying, designing, verifying and implementing hardware. It is a collection of Haskell modules. The system design exploits functional programming language features, such as monads and type classes, to provide multiple interpretations of circuit descriptions. These interpretations implement standard circuit analyses such as simulation, formal verification and the generation of code for the production of real circuits. Lava also uses polymorphism and higher order functions to provide more abstract and general descriptions than are possible in traditional hardware description languages. Two Fast Fourier Transform circuit examples illustrate this.
Article
Full-text available
A language of relations and combining forms is presented in which to describe both the behaviour of circuits and the specifications which they must meet. We illustrate a design method that starts by selecting representations for the values on which a circuit operates, and derive the circuit from these representations by a process of refinement entirely within the language. Formal methods have always been used in circuit design. It would be unthinkable to attempt to design combinational circuits without using Boolean algebra. This means that circuit designers, unlike programmers, already use mathematical tools as a matter of course. It also means that we have a good basis on which to build higher level formal design methods. Encouraged by these observations, we have been investigating the application of formal program development techniques to circuit design. We view circuit design as the transformation of a program describing the required behaviour into an equivalent program that is s...
Article
Full-text available
This paper describes a framework and tools for visualising hardware libraries for Field-Programmable Gate Arrays (FPGAs), which should also be useful for circuit design in general. Our approach integrates the visualisation of design behaviour and structure, supports various simulation modes, and assists the development of run-time reconfigurable designs in FPGAs such as Xilinx 6200 devices. Our tools can automatically generate a block diagram from a concise parametrised description. Design operations are animated by projecting a dataflow model on the block diagram. The user can select to view data values on specific input and output ports and internal paths. Numerical, symbolic and bit-level simulation and their combination are supported, and the animation speed can be adjusted. The tools should benefit both library users and suppliers, since they can be used (a) to show the internal structure of a design, (b) to illustrate effective usage of library components, and (c) to present the ...
Article
Full-text available
We present an overview of a prototype system based on a functional language for developing regular array circuits. The features of a simulator, floorplanner and expression transformer are discussed and illustrated. INTRODUCTION Implementing algorithms on a regular array of processors has many advantages. Besides offering an efficient realisation of parallel structures, regular patterns of interconnections also provide an opportunity for simplifying their description and their development. Various approaches for regular array design have been proposed; examples include methods based on dependence graphs [5], recurrence equations [14], and algebraic techniques [16]. This paper presents an overview of a prototype system for regular array development. The system is based on µFP [15], a functional language with mechanisms for abstracting spatial and temporal iteration. These abstractions result in a succinct and precise notation for specifying designs. Moreover, the explicit representat...
Article
This paper describes the use of Ruby, a language of functions and relations, to develop serialised implementations of array-based architectures. Our Ruby expressions contain parameters which can be varied to produce a wide range of designs with different space-time trade-offs. Such expressions can be obtained by applying correctness-preserving transformations to an initial simple description. This approach provides a unified treatment of serialisation schemes similar to LPGS (Locally Parallel Globally Sequential) and LSGP (Locally Sequential Globally Parallel) partitioning methods, and will be illustrated by the development of a variety of circuits for convolution.
Conference Paper
This work describes a single-chip VLSI median filter in which a new algorithm of complexity linearly dependent on the filter window length is implemented as a bit-level systolic array. The filter has a window of 25 samples and has been tested at a clock frequency over 70 MHz.
Conference Paper
For pragmatic reasons it is useful to exclude the identity relation from the ‘implementable subset’ of Ruby. However there are many expressions in the relational calculus whose natural meaning is just this identity relation. This note gives an identity-free account of some of these expressions, and shows that there is no satisfactory identity-free account of some others. This is an exercise in writing about Ruby without drawing any pictures, in part because it is about those expressions which would correspond to blank pictures.
Conference Paper
We suggest that the productivity of FPGA users can be improved by adopting design libraries which are optimally implemented, rich in variety, easy to use, compatible with incremental development techniques and carefully validated. These requirements motivate our research into a framework for developing FPGA libraries involving the industrial-standard VHDL language and the declarative language Ruby. This paper describes the main elements in our framework, and illustrates its application to the Xilinx 6200 series FPGAs.
Conference Paper
In this paper we have proposed a new approach to the design of media intensive appliances using a CPU and a modest amount of FPGA for hardware acceleration. From implementations of representative algorithms (a digital filter and a dynamic programming match), we have demonstrated that a small amount of reconfigurable logic can be used to achieve high performance. For these algorithms the equivalent of around 500 Xilinx CLBs, plus a small local memory, can increase performance 12 to 21 times when compared to an optimised CPU-only implementation. The FPGA is able to exploit parallelism available in an algorithm. The FPGA can also be “bit-width efficient”, using no more bits of precision than are necessary. Internal store can be used for local data, state and control. The system architecture must however be able to support this increased computation rate. We have explored three architectures that enable this. In functional unit mode the FPGA is a slave coprocessor to the CPU. This is the simplest mode for partitioning but performance can be limited by marshalling overheads on the CPU. It can be used where FPGA resources do not enable direct memory access or where only portions of an algorithm computation can be accommodated. Lockstep mode is very similar to functional unit mode but increases performance by having a direct data path between the FPGA and memory. Datapath mode offers the best potential performance but, as the CPU and FPGA execute as two independent units, there are many more issues that must be considered when partitioning. Datapath mode is particularly effective when the FPGA can process a large independent part of the algorithm without need for complex control and synchronisation with the CPU. These architectures provide templates for the synthesis of CPU/FPGA systems. They provide contexts within which partitions can be evaluated. 
Although the process is manual at the moment, we expect the architectures and techniques discussed in this paper to form the basis for future tools which will automate this process. Research at HPLabs Bristol is targeted at providing these design tools.
Article
The ALPHA language results from research on automatic synthesis of systolic algorithms. It is based on the recurrence equation formalism introduced by Karp, Miller and Winograd in 1967. The basic objects of ALPHA are variables indexed on integral points of a convex set. It is a functional/equational language, whose definition is particularly well-suited to express regular algorithms, as well as transformations of these algorithms from their initial mathematical specification to an implementation on a synchronous parallel architecture. In particular, ALPHA makes it easy to define, prove and implement basic transformations such as Leiserson and Saxe's retiming, space-time reindexing, localization, and partitioning. We describe ALPHA, its use for expressing and deriving systolic arrays, and the design environment ALPHA DU CENTAUR for this language.
Conference Paper
This paper presents a method, based on the formalism of affine recurrence equations, for the synthesis of digital circuits exploiting parallelism at the bit-level. In the initial specification of a numerical algorithm, the arithmetic operators are replaced with their yet unscheduled (schedule-free) binary implementation as recurrence equations. This allows a bit-level dependency analysis yielding a bit-parallel array. The method is demonstrated on the example of the matrix-vector product, and discussed
Conference Paper
FPGA-based synthesis tools require information about behaviour and architecture to make effective use of the limited number of cells typically available. A hardware description language which models layout and behaviour is used to elegantly specify circuit architecture. This source level information is used to efficiently translate circuit descriptions onto FPGA devices.
Conference Paper
We report our current research in a computer assisted methodology for synthesizing regular array processors using the ALPHA language and design environment. The design process starts from an algorithmic level description of the function and finishes with a netlist of an array processor which performs the specified function. To illustrate the proposed approach, we present the design of an array processor to do polynomial division
Conference Paper
We present an experimental framework for mapping declarative programs, written in a language known as Ruby, into various combinations of hardware and software. Strategies for parametrised partitioning into hardware and software can be captured concisely in this framework, and their validity can be checked using algebraic reasoning. The method has been used to guide the development of prototype compilers capable of producing, from a Ruby expression, a variety of implementations involving field programmable gate arrays (FPGAs) and microprocessors. The viability of this approach is illustrated using a number of examples for two reconfigurable systems, one containing an array of Algotronix devices and a PC host, and the other containing a transputer and a Xilinx device.
Conference Paper
We describe a cost-effective method for developing parallel architectures which increase the performance of range and image sensors. A parametrised edge detector and its systolic implementation using Field-Programmable Gate Arrays (FPGAs) are presented. Experiments and analyses indicate that our circuits can satisfy the performance requirements, and some of the designs out-perform the software equivalent on a 486-based PC by nearly two orders of magnitude. 1 Introduction This paper describes an approach for developing architectures for range and image sensors, which have applications in industrial inspection and in autonomous vehicles and robots. Our work has been inspired by three developments: the need to include powerful processing in sensing and control systems, the availability of programmable hardware like Field-Programmable Gate Arrays (FPGAs), and the advance in languages and tools for hardware synthesis. Custom hardware is often used in real-time sensing; for example, Graefe ...
Conference Paper
The authors consider the use of a nonstandard interpretation to analyze parametrized circuit descriptions, in particular for array based architectures. Various metrics are employed to characterize the performance tradeoffs for generic designs. The objective is to facilitate the comparison of feasible design alternatives at an early stage of development. The research centers on techniques for extracting various performance attributes, such as critical path and latency, from a single generic design representation. The features of this approach include uniformity, modularity, reusability, flexibility, and computerized support.
Article
Median filtering is a simple digital technique for smoothing signals. One main characteristic of the filter is that it maps the input signal space into a root signal space, where signals invariant to median filters are called roots of the signal. In this paper, we develop the theory for the root signal set of median filters. A tree structure for the root signal set is obtained for binary signals. The number of roots R(n) for a signal of length n and window size filter 2s - 1 is exactly represented by the difference equation R(n) = R(n - 1) + R(n - s). A general solution is obtained in a Z domain approach. Finally, a method for faster one dimensional median filter operation is introduced.
Article
We present a fast algorithm for two-dimensional median filtering. It is based on storing and updating the gray level histogram of the picture elements in the window. The algorithm is much faster than conventional sorting methods. For a window size of m × n, the computer time required is O(n).
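The histogram idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 256-level default and the restriction to the valid (unpadded) region are assumptions made for illustration. The median is also found by rescanning the histogram at each step, a simplification of the running-count bookkeeping a tuned version would use.

```python
def median_filter_rows(image, win_h, win_w, levels=256):
    """Median-filter the valid region of `image`.

    `image` is a list of equal-length rows of ints in [0, levels).
    `win_h` x `win_w` is the window size (assumed to hold an odd number of pixels).
    All names here are hypothetical and chosen for this sketch.
    """
    rows, cols = len(image), len(image[0])
    half = (win_h * win_w) // 2  # 0-based rank of the median in the window
    out = []
    for top in range(rows - win_h + 1):
        # Build the histogram once for the leftmost window of this band of rows.
        hist = [0] * levels
        for r in range(top, top + win_h):
            for c in range(win_w):
                hist[image[r][c]] += 1
        out_row = [_median_from_hist(hist, half)]
        # Slide right: only one column leaves and one column enters per step.
        for left in range(1, cols - win_w + 1):
            for r in range(top, top + win_h):
                hist[image[r][left - 1]] -= 1          # pixel leaving the window
                hist[image[r][left + win_w - 1]] += 1  # pixel entering the window
            out_row.append(_median_from_hist(hist, half))
        out.append(out_row)
    return out


def _median_from_hist(hist, half):
    """Return the gray level whose cumulative count first exceeds `half`."""
    count = 0
    for level, n in enumerate(hist):
        count += n
        if count > half:
            return level
    raise ValueError("empty histogram")
```

Each window move touches only the pixels of the two changed columns, which is where the saving over re-sorting the whole window comes from.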
Article
In this paper a nonlinear smoothing algorithm recently proposed by Tukey is described and evaluated for speech processing applications. Simple linear smoothing routines generally fail to provide adequate smoothing for data which exhibit both local roughness and sharp discontinuities. The proposed nonlinear smoothing algorithm can effectively smooth such data by using a combination of median smoothing routines and linear filtering. The concept of double smoothing is introduced as a refinement on the smoothing algorithm. Examples of the application of the nonlinear smoothing methods to typical speech parameters are included in this paper.
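A rough Python sketch of the combination described in this abstract is given below. The window widths (5, then 3), the three-point linear weights (0.25, 0.5, 0.25), the untouched endpoints and all function names are illustrative assumptions; the abstract does not specify them.

```python
from statistics import median


def running_median(x, width):
    """Running median of odd `width`; endpoints are left unchanged (assumption)."""
    half = width // 2
    out = list(x)
    for i in range(half, len(x) - half):
        out[i] = median(x[i - half:i + half + 1])
    return out


def linear_smooth(x):
    """Three-point linear filter with weights 0.25, 0.5, 0.25 (assumed weights)."""
    out = list(x)
    for i in range(1, len(x) - 1):
        out[i] = 0.25 * x[i - 1] + 0.5 * x[i] + 0.25 * x[i + 1]
    return out


def smooth(x):
    """Median smoothing followed by linear filtering."""
    return linear_smooth(running_median(running_median(x, 5), 3))


def double_smooth(x):
    """Smooth the signal, smooth the residual, and add the two results."""
    s = smooth(x)
    residual = [a - b for a, b in zip(x, s)]
    return [a + b for a, b in zip(s, smooth(residual))]
```

In this sketch an isolated spike in the interior of the signal is removed by the median stage before the linear filter runs, so it is not smeared into its neighbours.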
Article
An algorithm for VLSI median filtering of one-dimensional signals of complexity linearly dependent on the filter window length is described. The algorithm is implemented as a bit-level systolic array (BLSA), in order to achieve high performance. A single-chip median filter characterized by a window length of 25 8-b samples, and by operation on three interleaved independent sequences for a total of 75 samples, is presented as a demonstration of the concept. The throughput relevant to one sequence is 1/3 for this chip, whereas the theoretical maximum allowed by the algorithm is 1/2. Prototypes designed with a 2-μm CMOS technology have been successfully tested at a clock frequency over 70 MHz
Article
This paper describes the T-Ruby system for designing VLSI circuits, starting from formal specifications in which they are described in terms of relational abstractions of their behaviour. The design process involves correctness-preserving transformations based on proved equivalences between relations, together with the addition of constraints. A class of implementable relations is defined. The tool enables such relations to be simulated or translated into a circuit description in VHDL. The design process is illustrated by the derivation of a circuit for 2-dimensional convolution. Keywords: Formal methods; Design by transformation; Integration of formal systems with CAD. 1 Introduction This paper describes a computer-based system, known as T-Ruby [12], for designing VLSI circuits starting from a high-level, mathematical specification of their behaviour: A circuit is described by a binary relation between appropriate, possibly complex domains of values, and simple rela...
Article
This paper describes a tool for use in user-directed synthesis of circuits specified using the relational VLSI description language Ruby. The synthesis method is based on transformational rewriting of Ruby terms in accordance with previously defined term equivalences. The tool permits the introduction of constraints into the specification, thus enhancing the usefulness of the rewrite system in relation to simple rewriting. Keyword Codes: B.7.2, D.1.1. Keywords: Integrated Circuits, Design Aids; Applicative Programming. 1. Introduction Ruby [3] is a language intended for specifying VLSI circuits in terms of relational abstractions of their behaviour. A circuit is described by a binary relation, and the language permits simple relations to be composed into more complex ones by the use of a variety of combining forms which are higher-order functions. Similarly, simple combining forms can be composed into more complex ones, as in conventional languages for functional programming. The basi...
Article
For pragmatic reasons it is useful to exclude the identity relation from the ‘implementable subset’ of Ruby. However there are many expressions in the relational calculus whose natural meaning is just this identity relation. This note gives an identity-free account of some of these expressions, and shows that there is no satisfactory identity-free account of some others. This is an exercise in writing about Ruby without drawing any pictures, in part because it is about those expressions which would correspond to blank pictures. What there is when there is nothing there In Ruby one uses relations to represent circuit components, and the composition R ; S of relations corresponds to some connection of two components in which the parts of the R represented by its range are connected to the parts of the S represented by its domain. With this interpretation, the repeated composition R^n naturally represents a ‘pipeline’ of n components, each an R, connected in a linear array. A...
Article
This paper presents an overview of a prototype hardware compiler which compiles a design expressed in the Ruby language into FPGAs. The features of two important modules, the refinement module and the floorplanning module, are discussed and illustrated. Target code can be produced in various formats, including device-specific formats such as XNF or CFG, and device-independent formats such as VHDL. The viability of our floorplanning scheme is demonstrated by a compiler backend for Algotronix's CAL1024 FPGAs. The implementation of a priority queue is used to illustrate our approach. 1 Introduction Compiling selected parts of application programs into hardware, such as FPGAs, has recently attracted much interest. This method holds promise of producing better special-purpose systems more rapidly than existing techniques. A number of hardware compilers (see, for example, [8], [11]) have been developed for designs described in various languages into hardware netlists, which can then be ma...
Article
We examine the use of non-standard interpretation to analyse parametrised circuit descriptions, in particular for array-based architectures. Various metrics are employed to characterise the performance trade-offs of generic designs. The objective is to facilitate the evaluation of such metrics for estimating design quality, so that feasible design alternatives can be compared at an early stage of development. INTRODUCTION Constructing digital systems involves two challenges: to develop one or more circuits that perform the desired function, and to analyse design alternatives in order to select the optimal design. Our previous work [3], [4], [5] has described an algebraic framework and the associated computer-based tools for developing array-based architectures. We have shown how such a framework can be used to simplify the parametrisation, structuring and refinement of designs. This paper builds on this algebraic framework and examines the analysis of parametrised descriptions by n...
Article
The declarative language Ruby provides a coherent framework for representing and developing designs. Sketching diagrams for Ruby programs by hand is, however, time-consuming and error-prone. This paper describes a design sketcher which automates the production of a diagram from a Ruby description. 1 INTRODUCTION Text-based languages, such as VHDL, 3 are becoming increasingly popular for developing designs. Their popularity is mainly due to their facilities for parametrising designs, and it is a great bonus if both behaviour and structure can be expressed in a single notation. Moreover, pictorial representations such as circuit schematics can be tedious to create and to modify. Providing visual aid in hardware design is, nevertheless, important. Circuit diagrams, when appropriately drawn, make explicit the basic structure and size of components, allowing designers to obtain rapidly an overview of a design and to locate specific parts on which they can focus. There have been attempts ...
Article
We suggest that the productivity of FPGA users can be improved by adopting design libraries which are optimally implemented, rich in variety, easy to use, compatible with incremental development techniques and carefully validated. These requirements motivate our research into a framework for developing FPGA libraries involving the industrial-standard VHDL language and the declarative language Ruby. This paper describes the main elements in our framework, and illustrates its application to the Xilinx 6200 series FPGAs. 1 Introduction FPGA users often face a dilemma. The advance in microprocessor and custom hardware technologies is increasing the pressure for improving functionality and performance of FPGA-based implementations; such improvements, however, may necessitate the use of low-level, device-specific FPGA features, thus lengthening design cycles and reducing the opportunity for design reuse. The productivity of FPGA users can be greatly enhanced by having library elements -- ...
Article
this paper is fourfold: first, to describe some observations about how array-based designs can be optimised by transposition -- a method of rearranging components and their interconnections; second, to provide concise parametric representations of such designs; third, to present simple equations that correspond to correctness-preserving transformations of these parametric representations; and finally, to suggest quantitative measures of design trade-offs involved in this kind of transformation. Motivation
Circuit design in Ruby, Formal Methods for VLSI Design
  • G Jones
  • M Sheeran
G. Jones, M. Sheeran, Circuit design in Ruby, in: J. Staunstrup (Ed.), Formal Methods for VLSI Design, North-Holland, Amsterdam, 1990, pp. 13–70.
A model for representing Ruby circuits
  • C J Block
C.J. Block, A model for representing Ruby circuits, in: Proceedings of the Glasgow Workshop on Functional Programming, 1996, http://www.dcs.gla.ac.uk/research/fpga/papers/ps/Model.ps.
Transformational rewriting with Ruby, in: Computer Hardware Description Languages and Their Applications (CHDL'93)
  • R Sharp
  • O Rasmussen
R. Sharp, O. Rasmussen, Transformational rewriting with Ruby, in: Computer Hardware Description Languages and Their Applications (CHDL'93), Elsevier, Amsterdam, 1993, pp. 243–360.
Regular array synthesis using Alpha
  • D Wilde
  • O Sie
D. Wilde, O. Sie, Regular array synthesis using Alpha, in: Proceedings of the International Conference on Application-specific Array Processors, IEEE Computer Society
...es such as field-programmable gate arrays. He received his M.A., M.Sc., and D.Phil. in engineering and computing science from University of Oxford.
S. Guo, W. Luk / Journal of Systems Architecture 47 (2001) 315–337
A reconfigurable approach to low cost media processing, Field-Programmable Logic and Applications
  • I Kostarnov
  • S Morley
  • J Osmany
  • C Soloman
I. Kostarnov, S. Morley, J. Osmany, C. Soloman, A reconfigurable approach to low cost media processing, in: W. Luk, P.Y.K. Cheung, M. Glesner (Eds.), Field-Programmable Logic and Applications, Lecture Notes in Computer Science, vol. 1304, Springer, Berlin, 1997, pp. 79–90.
Ruby - a language of relations and high-order functions
  • M Sheeran
M. Sheeran, Ruby - a language of relations and high-order functions, in: G. Birtwistle (Ed.), Proceedings of the Third Banff workshop on hardware verification, Springer, Berlin, 1990.
Tcl and Tk toolkit, Addison-Wesley professional computing series
  • J K Ousterhout
J.K. Ousterhout, Tcl and Tk toolkit, Addison-Wesley professional computing series, 1996.
Techniques and tools for developing Ruby design, D.Phil. thesis, Computing laboratory
  • S Guo
S. Guo, Techniques and tools for developing Ruby design, D.Phil. thesis, Computing laboratory, University of Oxford, 1997.