Fig 2 - uploaded by François Berry
An example of actor description in CAPH  


Source publication
Conference Paper
Full-text available
We introduce CAPH, a new domain-specific language (DSL) suited to the implementation of stream-processing applications on field programmable gate arrays (FPGA). CAPH relies upon the actor/dataflow model of computation. Applications are described as networks of purely dataflow actors exchanging tokens through unidirectional channels. The behavior o...

Similar publications

Chapter
Full-text available
We introduce CAPH, a new domain-specific language (DSL) suited to the implementation of stream-processing applications on field programmable gate arrays (FPGA). CAPH relies upon the actor/dataflow model of computation. Applications are described as networks of purely dataflow actors exchanging tokens through unidirectional channels. The behavior of...

Citations

... Separated-memory FIFOs with minimal size have been used. The minimal size that ensures reaching the end of the computation has been evaluated through a SystemC simulation of the system generated with CAPH [19], which automatically reports the maximal usage of each buffer. ...
Conference Paper
Full-text available
Multithreading is a well-known technique for general-purpose systems to deliver a substantial performance gain, raising resource efficiency by exploiting underutilization periods. With the increase of specialized hardware, resource efficiency has become fundamental to mastering the overhead introduced by such devices. In this work, we propose a model-based approach for designing specialized multithread hardware accelerators. This novel approach exploits dataflow models of applications and tagged tokens to let the resulting hardware support concurrent threads without the need to replicate the whole accelerator. Assessment is carried out over different versions of an accelerator for a compute-intensive step of modern video coding algorithms, under several feeding configurations. Results highlight that the proposed multithread accelerators achieve a valuable tradeoff: saving computational resources with respect to replicated parallel single-thread accelerators, while guaranteeing shorter waiting, response, and elaboration time than a unique single-thread accelerator multiplexed in time.
... These hardware architectures are able to commence processing of the image as soon as the necessary pixels are received and continue processing the rest of the arriving image as a pipeline, giving rise to both low-latency and high-throughput operations. Indeed, to facilitate the design of complex streaming image-processing hardware, some FPGA-hardware generators have already been proposed, often relying on the use of domain-specific languages (DSLs) as a bridge between the algorithm designer and the lower-level hardware [5][6][7][8]. In our previous work, SWIM [9], a streaming line buffer generator, was also proposed to address the complexities of rearranging misaligned multi-pixel blocks for ultra high-input throughput applications. ...
Article
Full-text available
Parallel hardware designed for image processing promotes vision-guided intelligent applications. With the advantages of high-throughput and low-latency, streaming architecture on FPGA is especially attractive to real-time image processing. Notably, many real-world applications, such as region of interest (ROI) detection, demand the ability to process images continuously at different sizes and resolutions in hardware without interruptions. FPGA is especially suitable for implementation of such flexible streaming architecture, but most existing solutions require run-time reconfiguration, and hence cannot achieve seamless image size-switching. In this paper, we propose a dynamically-programmable buffer architecture (D-SWIM) based on the Stream-Windowing Interleaved Memory (SWIM) architecture to realize image processing on FPGA for image streams at arbitrary sizes defined at run time. D-SWIM redefines the way that on-chip memory is organized and controlled, and the hardware adapts to arbitrary image size with sub-100 ns delay that ensures minimum interruptions to the image processing at a high frame rate. Compared to the prior SWIM buffer for high-throughput scenarios, D-SWIM achieved dynamic programmability with only a slight overhead on logic resource usage, but saved up to 56% of the BRAM resource. The D-SWIM buffer achieves a maximum operating frequency of 329.5 MHz and a 45.7% reduction in power consumption compared with the SWIM scheme. Real-world image processing applications, such as 2D-Convolution and the Harris Corner Detector, have also been used to evaluate D-SWIM’s performance, where a pixel throughput of 4.5 Giga Pixel/s and 4.2 Giga Pixel/s were achieved respectively in each case. Compared to the implementation with prior streaming frameworks, the D-SWIM-based design not only realizes seamless image size-switching, but also improves hardware efficiency up to 30×.
... CAPH [29], Pyrope [30] and Chisel [31] are HDLs which provide data-flow based design models. CAPH and Pyrope are independent languages while Chisel is an EDSL in Scala. ...
Conference Paper
Full-text available
Synchronous Message Exchange (SME) is a CSP-derived model for hardware designs implementing globally synchronous message passing. SME implementations currently exist for several general-purpose languages, some of which are translatable to VHDL for subsequent implementation on hardware. A common SME language could reduce the duplication and feature disparity present in these independent implementations. This paper introduces a domain-specific language for implementing SME designs. It is usable both as a primary implementation language for SME models and as an intermediate target for general-purpose languages. We describe the language, its implementation and its features. Furthermore, we explain the specific requirements for a language within this domain. Finally, we evaluate the language through a number of simple, but realistic, hardware designs by showing how they may be implemented and tested.
... In general, the code has to undergo various optimizations and transformations before the actual HDL generation. Optimized software C is not equal to optimized hardware C. Mappers can parallelize and pipeline C code; however, they generally cannot automatically instantiate multiple functional units. The compiler has to identify parallelism in the sequential code before mapping it onto the target hardware, because C is intrinsically sequential whereas hardware is truly concurrent [65,60]. ...
Article
Full-text available
FPGAs have achieved quick acceptance, spread and growth over the past years because they can be applied to a variety of applications. Some of these applications include: random logic, bioinformatics, video and image processing, device controllers, communication encoding, modulation, and filtering, limited size systems with RAM blocks, and many more. For example, for video and image processing applications it is very difficult and time consuming to use traditional HDL languages, so it is necessary to search for other efficient synthesis tools to implement a design. The question is which language or tool is best suited to implementing the desired application. This research is also helpful for language developers who want to know the strengths, weaknesses, ease of use and efficiency of each tool or language. This research faced many challenges; one of them is that there is no complete reference of all FPGA languages and tools, and the available references and guides are few and of limited quality. Searching for a simple example to learn some of these tools or languages would be time consuming. This paper presents a review study or guide of almost all PLD languages, interpreters and tools that can be used for programming, simulating and synthesizing PLDs for analog, digital and mixed signals and systems, supported with simple examples. In addition, their features and applications are discussed. The paper also presents a new classification for FPGA languages and tools. Interpreters like Perl and MyHDL (Python based) are also described here. In the end, the paper summarizes what one needs to know about FPGA languages, tools, compilers, and interpreters in one research paper.
... CONCLUSION AND FUTURE WORK We have described the application of a real-time object detection application on a FPGA-based smart camera architecture. For this application, we have shown that an efficient implementation can be obtained with a dataflow based language whose abstraction level is significantly higher than that of traditional HDL languages such as VHDL or Verilog. Together with previous experimentations [10], [26], this confirms that, for applications having to operate on the fly on video data streams, the dataflow model of computation, used jointly as a programming model and an execution model, can offer a very effective way to conciliate abstraction and efficiency when programming FPGAs. This in turn opens significant opportunities to exploit this kind of devices in architectures such as smart cameras, since, in the mid and long term, it is not realistic to require that programmers of these architectures rely on low-level hardware description languages. ...
Article
Full-text available
Embedded computer vision based smart systems raise challenging issues in many research fields, including real-time vision processing, communication protocols or distributed algorithms. The amount of data generated by cameras using high resolution image sensors requires powerful computing systems to be processed at digital video frame rates. Consequently, the design of efficient and flexible smart cameras, with on-board processing capabilities, has become a key issue for the expansion of smart vision systems relying on decentralized processing at the image sensor node level. In this context, FPGA-based platforms, supporting massive data parallelism, offer large opportunities to match real-time processing constraints compared to platforms based on general purpose processors. In this paper, we describe the implementation, on such a platform, of a configurable object detection application, reformulated according to the dataflow model of computation. The application relies on the computation of the histogram of oriented gradients (HOG) and a linear SVM-based classification. It is described using the CAPH programming language, allowing efficient hardware descriptions to be generated automatically from high level dataflow specifications without prior knowledge of hardware description languages such as VHDL or Verilog. Results show that the performance of the generated code does not suffer from a significant overhead compared to handwritten HDL code.
... In response to the challenge of obtaining efficient FPGA implementation from high-level specifications, the authors have recently introduced CAPH, a high-level, domain specific language (DSL) built upon the dataflow computation model for programming FPGAs. Preliminary results [17,18] have already shown that, using CAPH, efficient implementations of simple vision applications can be obtained with a language whose abstraction level is significantly higher than that of traditional HDL languages such as VHDL or Verilog. The goal of this paper is to provide a more thorough assessment of the CAPH language by applying it to the implementation of a complex, realistic real-time image processing (RTIP) application on a FPGA-based smart camera. ...
... Together with previous experiments [1,17], this confirms that, at least for this kind of application, operating on the fly on data streams, the dataflow model of computation can offer a very effective way to conciliate abstraction and efficiency when programming FPGAs. This in turn opens significant opportunities to exploit this kind of devices in architectures such as smart cameras, since, in the mid and long term, it is not realistic to require that programmers of these architectures rely on low-level hardware description languages. ...
Article
Full-text available
We describe the application of the CAPH programming language to the implementation of a complex real-time image processing application on an FPGA embedded in a smart camera architecture. CAPH is based upon the dataflow model of computation. Applications are described as networks of purely dataflow actors exchanging tokens through unidirectional channels and the behavior of each actor is defined as a set of transition rules using pattern matching. We show that this model is naturally suited to the description of applications operating on the fly on digital video streams and supports a fully automated compilation path producing efficient VHDL code. This is demonstrated on an application performing the extraction of HOG (histogram of oriented gradient) feature vectors in real time on the DreamCam smart camera, an experimental platform developed at our institute.
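The abstract above describes actors whose behavior is a set of transition rules selected by pattern matching on incoming tokens. The following Python sketch illustrates that idea only; it is not CAPH syntax and not the authors' implementation. The names `Channel`, `Actor`, `sum_frames`, and the `"start"`/`"end"` delimiter tokens are hypothetical and chosen for illustration.

```python
from collections import deque

START, END = "start", "end"  # hypothetical frame delimiter tokens


class Channel:
    """Unidirectional FIFO channel carrying tokens between actors."""

    def __init__(self):
        self._fifo = deque()

    def put(self, token):
        self._fifo.append(token)

    def peek(self):
        # Return the head token without consuming it, or None if empty.
        return self._fifo[0] if self._fifo else None

    def get(self):
        return self._fifo.popleft()

    def drain(self):
        # Remove and return all tokens currently in the channel.
        tokens = list(self._fifo)
        self._fifo.clear()
        return tokens


class Actor:
    """Fires the first rule whose pattern matches (state, head token)."""

    def __init__(self, inp, out, rules, state=None):
        self.inp, self.out, self.rules, self.state = inp, out, rules, state

    def step(self):
        """Try to fire one rule; return True if a rule fired."""
        token = self.inp.peek()
        if token is None:
            return False
        for pattern, action in self.rules:
            if pattern(self.state, token):
                self.inp.get()  # consume the matched token
                self.state, produced = action(self.state, token)
                for t in produced:
                    self.out.put(t)
                return True
        return False  # no rule matches: the actor stalls


# Rules for an actor that sums the values of each delimited frame.
# State is None while idle, or the running sum while inside a frame.
SUM_RULES = [
    (lambda s, t: s is None and t == START, lambda s, t: (0, [])),
    (lambda s, t: s is not None and isinstance(t, int), lambda s, t: (s + t, [])),
    (lambda s, t: s is not None and t == END, lambda s, t: (None, [s])),
]


def sum_frames(tokens):
    """Feed tokens to the summing actor and return what it emits."""
    inp, out = Channel(), Channel()
    for t in tokens:
        inp.put(t)
    actor = Actor(inp, out, SUM_RULES)
    while actor.step():
        pass
    return out.drain()


print(sum_frames([START, 1, 2, 3, END, START, 10, END]))
```

Each rule pairs a guard on the actor's local state and the head token with an action that returns the next state and the tokens to emit, so the actor is a small state machine driven purely by token availability.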
... In this paper we define a hardware block as the implementation of a certain computer vision algorithm in any HDL language (e.g., Verilog, VHDL, CAPH [8]). Every block requires at least one input and one output with the related data-valid signals, that notify the validity of the data to the following blocks. ...
Conference Paper
Full-text available
Smart Camera Networks (SCNs) are nowadays an emerging research field which represents the natural evolution of centralized computer vision applications towards fully distributed and pervasive systems. In such a scenario, one of the biggest efforts is in the definition of a flexible and reconfigurable SCN node architecture able to remotely support updating the application parameters and changing the running computer vision applications at run-time. In this respect, this paper presents a novel SCN node architecture based on a device in which a microcontroller manages all the network functionality as well as the remote configuration, while an FPGA implements all the necessary modules of a full computer vision pipeline. In the paper the envisioned architecture is first detailed in general terms, then a real implementation is presented to show the feasibility and the benefits of the proposed solution. Finally, performance evaluation results prove the potential of hardware software codesign in reaching flexibility and reduced latency time.
... One idea introduced by the research community is to restrict the domain of applications by designing a domain specific language (DSL) to better utilize FPGAs for that specific domain. The CAPH [79] language is an example of this approach. CAPH is a high-level language for implementing stream-processing applications on FPGAs relying on the dataflow/actor-oriented model of computation (MoC). ...
... This intermediate representation is then transformed into VHDL. The transformation process, described in detail in [79,83], is illustrated in Fig. 4.7-a using the sub actor as an example. In this figure, the small number beside each transition gives the number of the corresponding rule in the CAPH code. ...
... This section will compare FPGA implementations of JPEG encoder parts using different development methodologies/tools. These methodologies include handwritten VHDL code and automatically generated code from two dataflow compilers, CAPH [79] and CAL [123]. The JPEG encoder is selected as application for comparison because of its complex implementation and intensive computations. ...
Article
Field Programmable Gate Arrays (FPGAs) are reconfigurable devices which can outperform General Purpose Processors (GPPs) for applications exhibiting parallelism. Traditionally, FPGAs are programmed using Hardware Description Languages (HDLs) such as Verilog and VHDL. Using these languages generally offers the best performance but the programmer must be familiar with digital design. This creates a barrier for the software community to use FPGAs and limits their adoption as a computing solution. To make FPGAs accessible to both software and hardware programmers, a number of tools have been proposed both by academia and industry providing high-level programming environments. A widely used approach is to convert C-like languages to HDLs, making it easier for software programmers to use FPGAs. But these approaches generally do not provide performance on a par with that obtained with HDL languages. The primary reason is the inability of C-like approaches to express parallelism. Our claim is that in order to have a high level programming language for FPGAs as well as not to compromise on performance, a shift in programming paradigm is required. We think that the Dataflow / actor programming model is a good candidate for this. This thesis explores the adoption of the Dataflow / actor programming model for programming FPGAs. More precisely, we assess the suitability of CAPH, a domain-specific language based on this programming model, for the description and implementation of stream-processing applications on FPGAs. The expressivity of the language and the efficiency of the generated code are assessed experimentally using a set of test bench applications ranging from very simple applications (basic image filtering) to more complex realistic applications such as motion detection, Connected Component Labeling (CCL) and JPEG encoder.
... This article is not intended to give a complete presentation of the language. Such a presentation can be found, for example, in [5] and, more formally, in the language reference manual [6]. This section gives only an informal presentation by example. ...