Figure 2.2: A block diagram showing the memory hierarchy of a computer. The reconfigurable fabric (RF) can be loosely coupled, tightly coupled, or attached as a co-processor. Source: Reconfigurable Computing [19].

Source publication
Conference Paper
Full-text available
In this paper we describe a new generic approach for accelerating software functions using a reconfigurable device connected through a high-speed link to a general-purpose system. As opposed to related ISA-extension approaches, we insert system calls into the original program to control the reconfigurable accelerator. The reconfigurable devic...
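As a rough picture of the system-call-based control flow described in the abstract, the sketch below drives a reconfigurable accelerator exposed as a character device. The device path, ioctl request codes, and job descriptor are hypothetical stand-ins, not the interface of the paper's actual driver.

```c
/* Minimal sketch (hypothetical device path and ioctl codes): the host program
 * delegates a function to the reconfigurable accelerator via system calls
 * instead of ISA extensions. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define ACC_IOCTL_LOAD  0x1001  /* hypothetical: select/configure the accelerator   */
#define ACC_IOCTL_START 0x1002  /* hypothetical: start execution of the loaded job  */
#define ACC_IOCTL_WAIT  0x1003  /* hypothetical: block until the accelerator is done */

struct acc_job {                /* hypothetical job descriptor passed to the driver */
    const void *input;
    void       *output;
    size_t      length;
};

int main(void)
{
    int fd = open("/dev/reconf_acc0", O_RDWR);   /* hypothetical device node */
    if (fd < 0) { perror("open"); return 1; }

    unsigned int in[256] = {0}, out[256] = {0};
    struct acc_job job = { .input = in, .output = out, .length = sizeof in };

    /* The compiler-inserted system calls would correspond to steps like these: */
    if (ioctl(fd, ACC_IOCTL_LOAD, "my_kernel") < 0 ||   /* configure */
        ioctl(fd, ACC_IOCTL_START, &job) < 0 ||         /* run       */
        ioctl(fd, ACC_IOCTL_WAIT, NULL) < 0)            /* synchronize */
        perror("ioctl");

    close(fd);
    return 0;
}
```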

Contexts in source publication

Context 1
... Section 2.2 we discuss how a reconfigurable fabric can be integrated into a traditional computing system. Subsequently, in Section 2.3 we discuss high performance computing with reconfigurable acceleration, and take a more ... Figure 2.1: An example of how computational blocks can be connected in a reconfigurable fabric. ...
Context 2
... reconfigurable fabric is one of the basic requirements for reconfigurable computing. The reconfigurable fabric is a specially designed chip consisting of computing elements and an interconnect, such as in Figure 2.1. The grid formation shown in the figure is not necessarily the way these devices are implemented, but the underlying principles are the same. ...
Context 3
... general purpose computing with reconfigurable acceleration, the reconfigurable fabric must somehow be connected to an existing host processor. Figure 2.2 shows the different levels at which the fabric can be connected in the memory hierarchy. ...
Context 4
... Altix provides support for reconfigurable computing in the form of its Reconfigurable Application Specific Computing (RASC) program [20]. In RASC, an FPGA (Figure 2.3) is connected to the SGI NUMAlink [20] interconnect as a co-processor. NUMAlink is a high-bandwidth, low-latency interconnect which is used to connect processors, memory, and other components in Altix machines. ...
Context 5
... Core Services block provides the interface between the Algorithmic block and the host system, and to do so implements the following features: it implements the Scalable System Port (SSP), which allows communication over the NUMAlink; provides read and write access to the SRAMs from both the host system and the Algorithmic block; allows single- and multi-step debugging of the Algorithmic block; and provides access to the algorithm's debug port and registers. Figure 2.4 shows that a device driver provides access to the FPGA's Core Services from software through system calls. ...
Context 6
... Convey architecture (Figure 2.5) consists of off-the-shelf Intel processors in combination with a reconfigurable co-processor. ...
Context 7
... the co-processor shares a cache-coherent view of the global virtual memory with the host processor. The co-processor consists of three components (Figure 2.6): the Application Engine Hub (AEH), Memory Controllers (MCs), and the Application Engines (AEs). The AEH is responsible for the interface to the host processor and I/O chipset, and for fetching instructions. ...
Context 8
... how to make an architecture where applications can be easily ported to different hardware using the same platform without major redesign. The MOLEN architecture (Figure 2.7) consists of a GPP and a reconfigurable accelerator which can communicate through a set of registers called Exchange Registers (XREG). A program is executed on the GPP, with certain computationally intensive functions implemented as accelerators. ...
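To make the register-based communication concrete, the sketch below shows how a software function call could be mirrored on the accelerator through a small exchange-register file. The register map, the start/done bits, and the memory-mapped access are assumptions made for illustration; the actual MOLEN organization drives the accelerator through dedicated instructions (such as SET and EXECUTE) rather than through a C routine like this.

```c
/* Illustrative sketch only: parameter exchange through a memory-mapped XREG
 * file. The base address, register layout, and control bits are hypothetical. */
#include <stdint.h>

#define XREG_BASE   ((volatile uint32_t *)0x80000000u)  /* assumed XREG window */
#define XREG_ARG0   0   /* argument registers                                  */
#define XREG_ARG1   1
#define XREG_RESULT 2   /* result register written back by the accelerator     */
#define XREG_CTRL   3   /* control/status: bit0 = start, bit1 = done (assumed) */

uint32_t accel_add(uint32_t a, uint32_t b)
{
    volatile uint32_t *xreg = XREG_BASE;
    xreg[XREG_ARG0] = a;                 /* pass operands through the XREGs */
    xreg[XREG_ARG1] = b;
    xreg[XREG_CTRL] = 1u;                /* start the hardware function     */
    while ((xreg[XREG_CTRL] & 2u) == 0)  /* busy-wait for completion        */
        ;
    return xreg[XREG_RESULT];            /* fetch the result                */
}
```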

Citations

... In [12], a reconfigurable architecture is proposed which employs a single 512-entry TLB for address translation alongside a Direct Memory Access (DMA) unit. On a miss, the TLB is locked and the FPGA is interrupted. ...
Article
This paper presents the integration issues of a proposed run-time configurable Memory Management Unit (MMU) for the COFFEE processor developed by our group at Tampere University of Technology. The MMU consists of three Translation Lookaside Buffers (TLBs) in two levels of hierarchy. The MMU and its integration into the processor are prototyped on a Field Programmable Gate Array (FPGA) device. Furthermore, analytical results of scaling the second-level Unified TLB (UTLB) to three configurations (with 16, 32, and 64 entries) with respect to the effect on overall hit rate as well as the energy consumption are shown. The critical path analysis of the logical design running on the target FPGA is presented together with a description of optimization techniques to improve static timing performance, which yields a 22.75% speed-up. We reached our target operating frequency of 200 MHz for the 64-entry UTLB and, thus, it is our preferred option. The 32-entry UTLB configuration provides a decent trade-off for resource-constrained or speed-critical hardware designs, while the 16-entry configuration shows unsatisfactory performance. Next, integration challenges and how to resolve each of them (such as employing a wrapper around the MMU, modifying the hardware description of the COFFEE core, etc.) are investigated in detail. This paper not only provides invaluable information regarding the implementation and integration of the MMU into a RISC processor, it also opens a new horizon for our processor to provide virtual memory for its operating system without degrading the operating frequency. This work is also intended as a general reference for future integrations into the COFFEE core as well as other similar processor architectures.
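As a rough illustration of a two-level lookup like the one described above, the following software model probes a first-level TLB and then the Unified TLB, refilling the first level on a second-level hit. The direct-mapped indexing, entry counts, and page size are simplifying assumptions, not details of the cited hardware design.

```c
/* Software model of a two-level TLB lookup (first-level TLB backed by a
 * Unified TLB). Direct-mapped indexing is assumed only to keep the sketch short. */
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT   12        /* 4 KiB pages assumed            */
#define L1_ENTRIES   8         /* assumed first-level TLB size   */
#define UTLB_ENTRIES 64        /* the paper's preferred UTLB size */

typedef struct { uint32_t vpn, pfn; bool valid; } tlb_entry_t;

static tlb_entry_t l1[L1_ENTRIES];
static tlb_entry_t utlb[UTLB_ENTRIES];

/* Returns true on a hit and fills *paddr; false means both levels missed and
 * a miss handler (hardware or software walk) must refill the TLBs. */
bool translate(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    uint32_t off = vaddr & ((1u << PAGE_SHIFT) - 1);

    tlb_entry_t *e = &l1[vpn % L1_ENTRIES];              /* level 1 lookup      */
    if (!(e->valid && e->vpn == vpn)) {
        e = &utlb[vpn % UTLB_ENTRIES];                   /* level 2: UTLB       */
        if (!(e->valid && e->vpn == vpn))
            return false;                                /* overall miss        */
        l1[vpn % L1_ENTRIES] = *e;                       /* promote to level 1  */
    }
    *paddr = (e->pfn << PAGE_SHIFT) | off;
    return true;
}
```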
... In [9], a reconfigurable architecture is proposed which employs a single 512-entry TLB for address translation alongside a Direct Memory Access (DMA) unit. On a miss, the TLB is locked and the FPGA is interrupted. ...
... They take advantage of the large logic fabrics provided by today's FPGA boards, which can be connected easily to host machines using fast links. Among these projects, Brandon et al. [5] proposed a generic platform solution for reconfigurable acceleration in a general-purpose system. They also focused on techniques to integrate a reconfigurable device more efficiently into a traditional computer system, addressing issues related to memory accesses, the programming interface, and system-level support for high performance computing, but not for real-time requirements. ...
Conference Paper
Full-text available
Real-time computing systems are increasingly used in the aerospace and avionics industries. In the face of the power wall and real-time requirements, hardware designers are directed towards reconfigurable computing with the usage of heterogeneous CPU/FPGA systems. However, there is a lack of real-time environments able to deal with the execution of applications on such heterogeneous systems dedicated to avionic Test and Simulation (T&S). This research investigates the problem of soft real-time environments for CPU/FPGA systems and first proposes a high-performance hardware architecture used to implement intimately coupled hardware and software avionic models. Second, this paper presents the description of an efficient real-time software environment for the models' execution, multi-core CPU monitoring, and runtime task re-allocation to avoid timing constraint violations. Experimental results underpin the industrial relevance of the presented approach for avionic T&S systems with real-time support.
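The monitoring and re-allocation idea can be pictured with a small host-side loop: measure each model step against its timing budget and move the task to another core when the budget is close to being violated. The step function, the 1 ms budget, and the migration policy below are placeholders, not the environment described in the paper.

```c
/* Illustrative monitoring loop using standard POSIX/Linux calls: time each
 * model step and migrate the task when it approaches its (assumed) budget. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static void model_step(void) { /* placeholder for one simulation step */ }

static double elapsed_ms(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int main(void)
{
    const double budget_ms = 1.0;     /* assumed per-cycle timing budget */
    int core = 0;

    for (int cycle = 0; cycle < 1000; cycle++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        model_step();
        clock_gettime(CLOCK_MONOTONIC, &t1);

        if (elapsed_ms(t0, t1) > 0.8 * budget_ms) {       /* budget nearly violated */
            core = (core + 1) % sysconf(_SC_NPROCESSORS_ONLN);
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(core, &set);
            sched_setaffinity(0, sizeof set, &set);        /* re-allocate the task  */
            fprintf(stderr, "cycle %d: migrated to core %d\n", cycle, core);
        }
    }
    return 0;
}
```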
... As opposed to previous approaches which require ISA extensions to the host processor, such as Molen [14], Garp [4] and others, we describe an approach that requires only an FPGA device driver, a few compiler extensions and a hardware wrapper in the FPGA. We extend our previous work on reconfigurable acceleration for general purpose computing, presented in [15], by adding dynamic partial self-reconfiguration support. The current version of our system includes hardware instances of functions in the executable binary, which are installed and executed on demand. ...
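The on-demand installation described here can be sketched as a dispatch routine that configures a function's partial bitstream only when that function is not already present on the fabric. Every name in the sketch (the bitstream array and the two driver calls) is hypothetical and stubbed so the example compiles.

```c
/* Hedged sketch of on-demand installation of a hardware function instance:
 * configure the partial bitstream only if that function is not already on the
 * fabric, then dispatch to it. All names below are hypothetical. */
#include <stdbool.h>
#include <stddef.h>

static const unsigned char fir_bitstream[] = {0}; /* stand-in for a bitstream embedded in the binary */

static bool hw_install(int id, const unsigned char *bits, size_t len)
{   /* hypothetical driver call, stubbed so the sketch compiles */
    (void)id; (void)bits; (void)len; return true;
}

static void hw_execute(int id, const void *in, void *out, size_t n)
{   /* hypothetical driver call, stubbed so the sketch compiles */
    (void)id; (void)in; (void)out; (void)n;
}

static int installed_id = -1;                      /* function currently configured on the fabric */

void call_hw_function(int id, const void *in, void *out, size_t n)
{
    if (installed_id != id) {                      /* hardware instance not yet installed */
        if (!hw_install(id, fir_bitstream, sizeof fir_bitstream))
            return;                                /* reconfiguration failed: bail out here */
        installed_id = id;                         /* partial self-reconfiguration done    */
    }
    hw_execute(id, in, out, n);                    /* execute the installed hardware instance */
}
```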
... They take advantage of the large logic fabrics provided by today's FPGA boards, which can be connected easily to host machines using fast links. Among these projects, Brandon et al. [7] proposed a generic platform solution for reconfigurable acceleration in a general-purpose system. They also focused on techniques to integrate a reconfigurable device more efficiently into a traditional computer system, addressing issues related to memory accesses, the programming interface, and system-level support. ...
Conference Paper
Full-text available
In the face of the power wall and high performance requirements, designers of hardware architectures are directed more and more towards reconfigurable computing with the usage of heterogeneous CPU/FPGA systems. In such architectures, multi-core processors provide high computation rates while the reconfigurable logic offers high performance per watt and adaptability to the application constraints. However, the design of heterogeneous architectures faces extremely challenging requirements such as the appropriate programming model, design tools, and rapid system prototyping. Addressing this issue, we present a prototyping environment for heterogeneous CPU/FPGA systems. Within this environment, we conceived a generic and scalable architecture based on a multi-core processor tightly connected to an FPGA in order to meet performance, power, and flexibility goals. Furthermore, front-end interfaces are presented in order to establish communication, data sharing, and synchronisation between the different software and hardware processing units. Finally, we defined a design methodology that eases the development of applications onto heterogeneous systems. Our environment is built using a standard host machine coupled with a Xilinx Virtex 6 FPGA through the PCI Express standard bus. In the experimental part, we first evaluate the reliability of different CPU/FPGA communication solutions in order to bring real-time capabilities to our system. Second, we demonstrate the efficiency of the presented design methodology for heterogeneous systems through the FIR signal processing application.
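The FIR application used in the evaluation is a convenient kernel to picture as the offload candidate in such a CPU/FPGA system; a plain software reference version is given below, with the tap count and data types chosen arbitrarily.

```c
/* Reference FIR filter as it might look before being offloaded to the FPGA
 * side of a CPU/FPGA system; tap count and types are arbitrary choices. */
#include <stddef.h>

#define NTAPS 16

/* y[n] = sum over k of h[k] * x[n-k], treating x[n-k] as 0 for n-k < 0. */
void fir(const float *x, float *y, size_t n, const float h[NTAPS])
{
    for (size_t i = 0; i < n; i++) {
        float acc = 0.0f;
        for (size_t k = 0; k < NTAPS && k <= i; k++)
            acc += h[k] * x[i - k];
        y[i] = acc;
    }
}
```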
... Kelm et al. [22] used a model based on local input/output buffers on the accelerator with DMA support to access external memory. Brandon et al. [23] propose a platform-independent approach by managing the virtual address space inside their accelerator. Several commercially available machines, like the SGI Altix-4700 [24] or the Convey HC-1 [25], propose system-level models to accelerate application kernels using FPGAs. ...
Conference Paper
Full-text available
In the race towards computational efficiency, accelerators are achieving prominence. Among the different types, accelerators built using reconfigurable fabric, such as FPGAs, have a tremendous potential due to the ability to customize the hardware to the application. However, the lack of a standard design methodology hinders the adoption of such devices and makes the portability and reusability across designs difficult. In addition, generation of highly customized circuits does not integrate nicely with high level synthesis tools. In this work, we introduce TARCAD, a template architecture to design reconfigurable accelerators. TARCAD enables high customization in the data management and compute engines while retaining a programming model based on generic programming principles. The template provides generality and scalable performance over a range of FPGAs. We describe the template architecture in detail and show how to implement five important scientific kernels: MxM, Acoustic Wave Equation, FFT, SpMV and Smith Waterman. TARCAD is compared with other High Level Synthesis models and is evaluated against GPUs, a well-known architecture that is far less customizable and, therefore, also easier to target from a simple and portable programming model. We analyze the TARCAD template and compare its efficiency on a large Xilinx Virtex-6 device to that of several recent GPU studies.
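Of the kernels listed, SpMV is the easiest to show compactly; the CSR-based routine below only indicates the kind of computation mapped onto such an accelerator template and is not TARCAD's implementation.

```c
/* Sparse matrix-vector multiply in CSR form, shown as a plain software kernel
 * of the kind mapped onto reconfigurable accelerator templates. */
#include <stddef.h>

/* y = A * x, with A stored as row_ptr/col_idx/val in CSR layout. */
void spmv_csr(size_t nrows, const size_t *row_ptr, const size_t *col_idx,
              const double *val, const double *x, double *y)
{
    for (size_t i = 0; i < nrows; i++) {
        double sum = 0.0;
        for (size_t j = row_ptr[i]; j < row_ptr[i + 1]; j++)
            sum += val[j] * x[col_idx[j]];
        y[i] = sum;
    }
}
```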
... Another approach [BrSG10] that achieves full virtual memory integration attaches the HA to a host PC via AMD HyperTransport [Hype05]. To this end, the HA is co-located on a Virtex-4 100 FPGA with an open-source HTX interface core [SGNB08] and a separate TLB for virtual-to-physical address translation. ...
Article
We developed a point-to-point, low-latency, 3D torus Network Controller integrated in an FPGA-based PCIe board which implements a Remote Direct Memory Access (RDMA) communication protocol. RDMA requires the ability to directly access the remote node's application memory with minimal OS or CPU intervention. For this purpose, a key element is the design of a direct memory writing mechanism to address the destination buffers; on OSes supporting virtual memory this corresponds to a number of page-segmented DMAs. To minimally affect overall performance, mechanisms with the lowest possible latency are needed for both virtual-to-physical address translation and registered-buffer list scanning. In a first implementation these tasks ran on a soft-core on the FPGA, leading to a 1.6 µs latency to process a single packet and limiting the peak bandwidth. As a second trial, we present an accelerated version of these time-critical network functions exploiting an application-specific processor (ASIP) designed using a retargetable ASIP development toolsuite that allows architectural exploration. Benchmark results for the Buffer Search and Virtual-to-Physical tasks on the ASIP show up to ten times lower cycle cost compared with the soft-core.
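The two time-critical functions named above, buffer search and virtual-to-physical translation, amount to locating the registered buffer that contains a destination virtual address and splitting the transfer into per-page DMA segments. The following software model illustrates both; the buffer and descriptor layouts are invented for the sketch.

```c
/* Simplified model of the two time-critical RDMA functions: find the
 * registered buffer containing a destination virtual address, then emit one
 * DMA descriptor per page crossed. The descriptor layout is invented. */
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

typedef struct {                /* one registered (pinned) buffer       */
    uint64_t vaddr;             /* virtual base address                 */
    size_t   length;
    const uint64_t *pfn;        /* physical frame number for each page  */
} reg_buf_t;

typedef struct { uint64_t phys; size_t len; } dma_desc_t;  /* invented descriptor */

/* Linear scan of the registered-buffer list (the part the ASIP accelerates). */
const reg_buf_t *buffer_search(const reg_buf_t *bufs, size_t n, uint64_t va)
{
    for (size_t i = 0; i < n; i++)
        if (va >= bufs[i].vaddr && va < bufs[i].vaddr + bufs[i].length)
            return &bufs[i];
    return NULL;
}

/* Split a write of 'len' bytes at 'va' into page-segmented DMA descriptors. */
size_t build_dma(const reg_buf_t *b, uint64_t va, size_t len, dma_desc_t *out)
{
    size_t ndesc = 0;
    while (len > 0) {
        uint64_t off   = va - b->vaddr;
        uint64_t page  = off / PAGE_SIZE;
        uint64_t inpg  = off % PAGE_SIZE;
        size_t   chunk = PAGE_SIZE - inpg;
        if (chunk > len) chunk = len;
        out[ndesc].phys = b->pfn[page] * PAGE_SIZE + inpg;  /* virtual-to-physical */
        out[ndesc].len  = chunk;
        ndesc++;
        va  += chunk;
        len -= chunk;
    }
    return ndesc;
}
```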
Conference Paper
In the embedded era, reconfigurable components come in three forms of IP (Intellectual Property) cores: i) soft core, ii) firm core, and iii) hard core. This paper presents a new technique of embedding a multigrain parallel processing HPRC using an FPGA in the CPU/DSP unit of OR1200, a soft-core RISC processor. The core performance is increased by placing a multigrain parallel processing HPRC inside the Integer Execution Pipeline unit of the CPU/DSP core. Depending on the complexity/depth of the code, the dependency level of vertices (DL) is determined and a number of threads (N) is created to run the code in parallel in the HPRC. Multigrain parallel processing in the HPRC is achieved by two functions: i) HPRC_Parallel_Start to trigger the parallel threads, and ii) HPRC_Parallel_End to stop the threads. In the first phase of this paper a Verilog HDL functional code is developed and synthesised using Xilinx ISE, and in the second phase the CoreMark processor benchmark is used to test the performance of the reconfigured IP soft core.
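Since the abstract names HPRC_Parallel_Start and HPRC_Parallel_End but not their signatures, the sketch below assumes a minimal form (a thread-count argument and a void return) purely to show how application code would be bracketed by the two calls; the bodies are stubs so the example compiles.

```c
/* Assumed usage of the two bracketing calls named in the abstract; their true
 * signatures are not given in the text, so this form is a guess. */
void HPRC_Parallel_Start(int n_threads);   /* trigger N parallel threads in the HPRC */
void HPRC_Parallel_End(void);              /* stop the threads and rejoin            */

/* Stub bodies only so the sketch compiles; the real functions belong to the
 * HPRC hardware/runtime. */
void HPRC_Parallel_Start(int n_threads) { (void)n_threads; }
void HPRC_Parallel_End(void) {}

void process_block(int *data, int len)
{
    HPRC_Parallel_Start(4);                /* N derived from the code's dependency level */
    for (int i = 0; i < len; i++)          /* body intended to run in parallel in the HPRC */
        data[i] = data[i] * data[i];
    HPRC_Parallel_End();
}
```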