Conference Paper

Compiling Higher Order Functional Programs to Composable Digital Hardware

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

This work demonstrates the capabilities of a high-level synthesis tool-chain that allows the compilation of higher order functional programs to gate-level hardware descriptions. Higher order programming allows functions to take functions as parameters. In a hardware context, the latency-insensitive interfaces generated between compiled modules enable late-binding with libraries of pre-existing functions at the place-and-route compilation stage. We demonstrate the completeness and utility of our approach using a case study, a recursive k-means clustering algorithm. The algorithm features complex data-dependent control flow and opportunities to exploit both coarse and fine-grained parallelism.

No full-text available

Request Full-text Paper PDF

To read the full-text of this research,
you can request a copy directly from the authors.

Article
Full-text available
Modern embedded image processing deployment systems are heterogeneous combinations of general-purpose and specialized processors, custom ASIC accelerators and bespoke hardware accelerators. This paper offers a primer on hardware acceleration of image processing, focusing on embedded, real-time applications. We then survey the landscape of High Level Synthesis technologies that are amenable to the domain, as well as new-generation Hardware Description Languages, and present our ongoing work on IMP-lang, a language for early stage design of heterogeneous image processing systems. We show that hardware acceleration is not just a process of converting a piece of computation into an equivalent hardware system: that naive approach offers, in most cases, little benefit. Instead, acceleration must take into account how data is streamed throughout the system, and optimize that streaming accordingly. We show that the choice of tooling plays an important role in the results of acceleration. Different tools, in function of the underlying language paradigm, produce wildly different results across performance, size, and power consumption metrics. Finally, we show that bringing heterogeneous considerations to the language level offers significant advantages to early design estimation, allowing designers to partition their algorithms more efficiently, iterating towards a convergent design that can then be implemented across heterogeneous elements accordingly.
Preprint
Full-text available
Modern embedded image processing deployment systems are heterogeneous combinations of general-purpose and specialized processors, custom ASIC accelerators and bespoke hardware accelerators. This paper offers a primer on hardware acceleration of image processing, focusing on embedded, real-time applications. We then survey the landscape of High Level Synthesis technologies that are amenable to the domain, and present our ongoing work on IMP-Lang, a language for early stage design of heterogeneous image processing systems. We show that hardware acceleration is not just a process of converting a piece of computation into an equivalent hardware system: that naive approach offers, in most cases, little benefit. Instead, acceleration must take into account how data is streamed throughout the system, and optimize that streaming accordingly. We show that the choice of tooling plays an important role in the results of acceleration. Different tools, in function of the underlying language paradigm, produce wildly different results across performance, size, and power consumption metrics. Finally, we show that bringing heterogeneous considerations to the language level offers significant advantages to early design estimation, allowing designers to partition their algorithms more efficiently, iterating towards a convergent design that can then be implemented across heterogeneous elements accordingly.
Article
Full-text available
High-level synthesis (HLS) is an increasingly popular approach in electronic design automation (EDA) that raises the abstraction level for designing digital circuits. With the increasing complexity of embedded systems, these tools are particularly relevant in embedded systems design. In this paper, we present our evaluation of a broad selection of recent HLS tools in terms of capabilities, usability and quality of results. Even though HLS tools are still lacking some maturity, they are constantly improving and the industry is now starting to adopt them into their design flows.
Article
Full-text available
Behavioral synthesis is an automated design process that compiles functional and/or algorithmic descriptions into optimized hardware architectures. It has long been identified as one of the critical technologies for enabling the transition to the higher level of abstraction. Unfortunately, it stumbled in its debut on the EDA marketplace during the mid-1990s and so far has had a limited adoption among chip designers. With the billion-transistor chip on the horizon, the design com-plexity of integrated circuit systems is outgrowing the capabilities of current RTL methods. This brought about a renewed interest in behavioral synthesis, and innovations are required to address the new challenges in nanometer IC designs. In this paper we present the xPilot behavioral synthesis system being developed at UCLA. xPilot aims to provide novel platform-based synthesis technologies to optimize logic, interconnects, performance, and power simultaneously at the high level. The ultimate goal is to improve both design productivity and quality of results. Preliminary experiments on FPGAs demonstrate the efficacy of our approach on a wide range of applications and its value in exploring various design tradeoffs.
Chapter
The rapid increase of complexity in System-on-a-Chip design urges the design community to raise the level of abstraction beyond RTL. Automated behavior-level and system-level synthesis are naturally identified as next steps to replace RTL synthesis and will greatly boost the adoption of electronic system-level (ESL) design. High-level executable specifications, such as C, C++, or SystemC, are also preferred for system-level verification and hardware/software co-design. In this chapter we present a commercial platform-based ESL synthesis system, named AutoPilot™ offered by AutoESL Design Technologies, Inc. AutoPilot is based on the xPilot system originally developed at UCLA. It automatically generates efficient RTL code from C, C++ or SystemC descriptions for a given system platform and simultaneously optimize logic, interconnects, performance, and power. Preliminary experiments demonstrate very promising results for a wide range of applications, including hardware synthesis, system-level design exploration, and reconfigurable accelerated computing.
Chapter
The rapid increase of complexity in System-on-a-Chip design urges the design community to raise the level of abstraction beyond RTL. Automated behavior-level and system-level synthesis are naturally identified as next steps to replace RTL synthesis and will greatly boost the adoption of electronic system-level (ESL) design. High-level executable specifications, such as C, C++, or SystemC, are also preferred for system-level verification and hardware/software co-design. In this chapter we present a commercial platform-based ESL synthesis system, named AutoPilot™ offered by AutoESL Design Technologies, Inc. AutoPilot is based on the xPilot system originally developed at UCLA. It automatically generates efficient RTL code from C, C++ or SystemC descriptions for a given system platform and simultaneously optimize logic, interconnects, performance, and power. Preliminary experiments demonstrate very promising results for a wide range of applications, including hardware synthesis, system-level design exploration, and reconfigurable accelerated computing.
Conference Paper
Geometry of Synthesis is a technique for compiling higher-level programming languages into digital circuits via their game semantic model. Ghica (2007) first presented the key idea, then Ghica and Smith (2010) gave a provably correct compiler into asynchronous circuits for Syntactic Control of Interference (SCI), an affine-typed version of Reynolds's Idealized Algol. Affine typing has the dual benefits of ruling out race conditions through the type system and having a finite-state game-semantic model for any term, which leads to a natural circuit representation and simpler correctness proofs. In this paper we go beyond SCI to full Idealized Algol, enhanced with shared-memory concurrency and semaphores. Compiling ICA proceeds in three stages. First, an intermediate type system called Syntactic Control of Concurrency (SCC), is used to statically determine "concurrency bounds" on all identifiers in the program. Then, a program transformation called serialization is applied to the program to translate it into an equivalent SCC program in which all concurrency bounds are set to the unit. Finally, the resulting program can be then compiled into asynchronous circuits using a slightly enhanced version of the GoS II compiler, which can handle assignable variables used in non-sequential contexts.
Conference Paper
Abramsky's Geometry of Interaction interpretation (GoI) is a logical-directed way to reconcile the process and functional views of computation, and can lead to a dataflow-style semantics of programming languages that is both operational (i.e. effective) and denotational (i.e. inductive on the language syntax). The key idea of Ghica's Geometry of Synthesis (GoS) approach is that for certain programming languages (namely Reynolds's affine Syntactic Control of Interference - SCI) the GoI processes-like interpretation of the language can be given a finitary representation, for both internal state and tokens. A physical realisation of this representation becomes a semantics-directed compiler for SCI into hardware. In this paper we examine the issue of compiling affine recursive programs into hardware using the GoS method. We give syntax and compilation techniques for unfolding recursive computation in space or in time and we illustrate it with simple benchmark-style examples. We examine the performance of the benchmarks against conventional CPU-based execution models.