A brief description of the design flow of a hardware and software heterogeneous system highlighting key features. More detail of the flow is contained in reference [11].

Source publication
Article
FPGA-based embedded image processing systems offer considerable computing resources but present programming challenges when compared to software systems. The paper describes an approach based on an FPGA-based soft processor called Image Processing Processor (IPPro), which can operate at up to 337 MHz on a high-end Xilinx FPGA family, and gives details o...

Context in source publication

Context 1
... developed tool flow (Figure 3) starts with a user-defined RVC-CAL description composed of actors selected to execute in FPGA-based soft cores, with the rest to be run on the host CPUs. Software/hardware partitioning is decided by analysing actor behaviour, based on two main factors: the actors with the worst execution time (determined exactly by the number of instructions and the average waiting time to receive input tokens and send produced tokens), and the overheads incurred in transferring the image data to/from the accelerator. ...
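As an illustration of the two-factor partitioning decision described above, the sketch below scores hypothetical actors by an execution estimate (instruction count plus token waiting time) against the accelerator transfer overhead. The Actor fields, the bytes-per-cycle figure, and the threshold are assumptions for illustration only, not the tool flow's actual cost model.

```python
# Hypothetical sketch of a two-factor software/hardware partitioning score.
# 'instructions', 'token_wait' and 'transfer_bytes' are assumed actor metrics;
# the real tool flow derives such figures from the RVC-CAL actor analysis.
from dataclasses import dataclass

@dataclass
class Actor:
    name: str
    instructions: int      # instruction count of the actor
    token_wait: float      # average cycles waiting to receive/send tokens
    transfer_bytes: int    # image data moved to/from the accelerator

def execution_cost(a: Actor) -> float:
    """Execution estimate: instructions plus token waiting time."""
    return a.instructions + a.token_wait

def transfer_overhead(a: Actor, bytes_per_cycle: float = 4.0) -> float:
    """Cycles spent moving data across the host/accelerator boundary."""
    return a.transfer_bytes / bytes_per_cycle

def offload_to_fpga(a: Actor, threshold: float = 1.5) -> bool:
    """Offload only if the compute saving outweighs the transfer overhead."""
    return execution_cost(a) > threshold * transfer_overhead(a)

actors = [Actor("sobel", 120_000, 300.0, 64_000),
          Actor("histogram", 8_000, 50.0, 64_000)]
for a in actors:
    print(a.name, "-> FPGA" if offload_to_fpga(a) else "-> host CPU")
```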

Similar publications

Conference Paper
This paper introduces ZUCL 2.0, which extends abstraction services for FPGA applications on ARM-FPGA hybrids. The ZUCL 2.0 management services include 1) FPGA multi-tasking and context-switching based on dynamic reconfiguration and cooperative scheduling, 2) communication abstraction based on the ARM AMBA standard, and 3) memory isolation for priva...

Citations

... A pivotal shift is underway as we migrate from a computer-based infrastructure to a Field Programmable Gate Array (FPGA) device. By leveraging FPGA's parallel processing power, we anticipate a profound boost in computational efficiency, paving the way for more intricate and resource-demanding image processing algorithms [22,23]. ...
Article
In this study, we introduce the concept and construction of an innovative Digital Miniature Cathode Ray Magnetometer designed for the precise detection of magnetic fields. This device addresses several limitations inherent to magnetic probes such as D.C. offset, nonlinearity, temperature drift, sensor aging, and the need for frequent recalibration, while capable of operating in a wide range of magnetic fields. The core principle of this device involves the utilization of a charged particle beam as the sensitivity medium. The system leverages the interaction of an electron beam with a scintillator material, which then emits visible light that is captured by an imager. The emitted scintillation light is captured by a CMOS sensor. This sensor not only records the scintillation light but also accurately determines the position of the electron beam, providing invaluable spatial information crucial for magnetic field mapping. The key innovation lies in the combination of electron beam projection, CMOS imager scintillation-based detection, and digital image signal processing. By employing this synergy, the magnetometer achieves remarkable accuracy, sensitivity and dynamic range. The precise position registration enabled by the CMOS sensor further enhances the device’s utility in capturing complex magnetic field patterns, allowing for 2D field mapping. In this work, the optimization of the probe’s performance is tailored for applications related to the characterization of insertion devices in light sources, including undulators.
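As a rough illustration of the position-registration step described in the abstract, the sketch below locates a bright scintillation spot on a CMOS frame with an intensity-weighted centroid. This is an assumed, generic method for the illustration; the magnetometer's actual image signal processing chain is not detailed here.

```python
# Illustrative only: locate a scintillation spot on a CMOS frame by an
# intensity-weighted centroid after background subtraction. The actual
# magnetometer's image-processing chain may differ.
import numpy as np

def spot_centroid(frame: np.ndarray, background: float = 0.0):
    """Return the (row, col) centre of mass of the bright spot in `frame`."""
    img = np.clip(frame.astype(float) - background, 0.0, None)
    total = img.sum()
    if total == 0:
        raise ValueError("no signal above background")
    rows, cols = np.indices(img.shape)
    return (rows * img).sum() / total, (cols * img).sum() / total

# Synthetic example: a Gaussian spot centred at (40, 60) on a 100x100 frame.
y, x = np.mgrid[0:100, 0:100]
frame = np.exp(-((y - 40) ** 2 + (x - 60) ** 2) / (2 * 3.0 ** 2))
print(spot_centroid(frame))   # approximately (40.0, 60.0)
```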
... DESIGN BASED ON ZYNQ7000: The system used a Zedboard, a Zynq-7000 development board, an Avnet FMC-HDMI-CAM module, and a Python 1300-C camera [8]. FMC stands for FPGA Mezzanine Card, a standardized interface that facilitates high-speed connections between FPGAs and peripheral devices [9]. ...
... In the future, the system will have reduced on-chip components to optimize space utilization and streamline software complexity. The IP blocks that were first built were redesigned and integrated into Simulink [8]. The edge detection algorithm was integrated into a unified IP block. ...
Article
This paper presents a novel approach for fast FPGA prototyping of the Canny edge detection algorithm using High-Level Synthesis (HLS) based on the HDL Coder. Traditional RTL-based design methodologies for implementing image processing algorithms on FPGAs can be time-consuming and error-prone. HLS offers a higher level of abstraction, enabling designers to focus on algorithmic functionality while the tool automatically generates efficient hardware descriptions. This advantage was exploited by implementing the Canny edge detection algorithm in MATLAB/Simulink and utilizing the HDL Coder to automatically convert it into synthesizable VHDL code. This design flow significantly reduces development time and complexity compared to the traditional RTL approach. The experimental results showed that the HLS-based Canny edge detector achieved real-time performance on a Xilinx FPGA platform, showcasing the effectiveness of the proposed approach for fast FPGA prototyping in image processing applications.
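For orientation, a simplified software sketch of the core Canny stages (Gaussian smoothing, Sobel gradients, double thresholding) is given below. It omits non-maximum suppression and hysteresis edge linking, and it is not the MATLAB/Simulink/HDL Coder model used in the paper.

```python
# Simplified sketch of the Canny stages: smoothing, gradients, double
# threshold. Non-maximum suppression and hysteresis linking are omitted;
# this is not the Simulink/HDL Coder model described in the paper.
import numpy as np
from scipy import ndimage

def canny_core(image, sigma=1.4, low=0.1, high=0.3):
    smoothed = ndimage.gaussian_filter(image.astype(float), sigma)
    gx = ndimage.sobel(smoothed, axis=1)          # horizontal gradient
    gy = ndimage.sobel(smoothed, axis=0)          # vertical gradient
    magnitude = np.hypot(gx, gy)
    magnitude /= max(magnitude.max(), 1e-12)      # normalise to [0, 1]
    strong = magnitude >= high
    weak = (magnitude >= low) & ~strong
    return strong, weak   # hysteresis would keep weak pixels touching strong ones

img = np.zeros((64, 64))
img[:, 32:] = 1.0   # vertical step edge
strong, weak = canny_core(img)
print(int(strong.sum()), int(weak.sum()))
```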
... Similar to our work, Siddiqui et al. [15] proposed an FPGA-based soft processor called Image Processing Processor (IPPro) for general-purpose image processing. The authors designed and tested an instantiable component, since they focused on multi-core operation. ...
Article
Computer vision plays a critical role in many applications, particularly in the domain of autonomous vehicles. To achieve high-level image processing tasks such as image classification and object tracking, it is essential to extract low-level features from the image data. However, in order to integrate these compute-intensive tasks into a control loop, they must be completed as quickly as possible. This paper presents a novel FPGA-based system for fast and accurate image feature extraction, specifically designed to meet the constraints of data fusion in autonomous vehicles. The system computes a set of generic statistical image features, including contrast, homogeneity, and entropy, and is implemented on two Xilinx FPGA platforms - an Alveo U200 Data Center Accelerator Card and a Zynq UltraScale+ MPSoC ZCU104 Evaluation Kit. Experimental results show that the proposed system achieves high-speed image feature extraction with low latency, making it well-suited for use in autonomous vehicle systems that require real-time image processing. The presented system can also be easily extended to extract additional features for various image and data fusion applications.
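As a point of reference, the sketch below computes grey-level co-occurrence (GLCM) based contrast, homogeneity, and entropy for a single pixel offset, which is one common way such statistical features are defined; the paper's exact feature definitions and FPGA datapath may differ.

```python
# Illustrative NumPy sketch of GLCM-based contrast, homogeneity and entropy
# for one pixel offset. The paper's exact feature definitions and hardware
# implementation are not reproduced here.
import numpy as np

def glcm_features(img, levels=8, offset=(0, 1)):
    q = (img.astype(float) / img.max() * (levels - 1)).astype(int)
    dr, dc = offset
    glcm = np.zeros((levels, levels))
    rows, cols = q.shape
    for r in range(rows - dr):
        for c in range(cols - dc):
            glcm[q[r, c], q[r + dr, c + dc]] += 1   # count co-occurring pairs
    p = glcm / glcm.sum()                            # normalise to probabilities
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    homogeneity = np.sum(p / (1.0 + np.abs(i - j)))
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return contrast, homogeneity, entropy

rng = np.random.default_rng(0)
print(glcm_features(rng.integers(0, 256, size=(64, 64))))
```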
... Hardware implementations of SHA-3 are preferred over software because of their superior power, speed, and throughput. Field-programmable gate arrays (FPGAs) are preferred over application-specific integrated circuits (ASICs) as a hardware platform due to their lower price and shorter development time [10][11][12][13][14]. ...
Article
In sensitive communications, the cryptographic hash function plays a crucial role, including in the military, healthcare, and banking, ensuring secure transmission by verifying data integrity and carrying out other vital tasks. Compared to other cryptographic hash algorithms, such as SHA-1 and SHA-2, the Keccak hash function (SHA-3) boasts superior hardware performance and is more resilient to modern cryptanalysis techniques. Nonetheless, hardware performance enhancements, such as boosting speed or reducing area usage, are constantly required. This research focuses on increasing the Keccak hash algorithm’s throughput rate by introducing a novel architecture that reduces the total number of clock cycles required to obtain the result of a hash function. Additionally, the new simplified structure of the round constant (RC) generator design assures a reasonably low area and achieves the highest throughput and efficiency. Thus, when implemented, it achieved the highest throughput of 19.515 Gbps, 24.428 Gbps, 33.393 Gbps, and 36.358 Gbps on FPGA devices with the Virtex-5, Artix-7, Virtex-6, and Virtex-7, respectively. Finally, our approach is compared to recently published designs.
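The throughput gains described above follow from the usual relation throughput = block size x clock frequency / cycles per block. The sketch below works this through with placeholder figures (Keccak rate of 1088 bits for SHA3-256, 24 rounds as the baseline cycle count); these are not the paper's reported synthesis numbers.

```python
# Throughput of a hash core under the common definition
#   throughput = block_size_bits * f_max / cycles_per_block.
# The clock frequency and cycle counts below are placeholders for
# illustration, not the paper's synthesis results.

def throughput_gbps(block_bits: int, f_max_mhz: float, cycles: int) -> float:
    return block_bits * f_max_mhz * 1e6 / cycles / 1e9

# Keccak-f[1600] with rate r = 1088 bits (SHA3-256). Halving the cycles per
# block at the same clock doubles the throughput, which is the lever the
# proposed architecture targets.
print(throughput_gbps(1088, 300.0, 24))   # baseline: ~13.6 Gbps
print(throughput_gbps(1088, 300.0, 12))   # fewer cycles per block: ~27.2 Gbps
```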
... An FPGA is a type of integrated circuit that contains programmable logic gates, memory, and other elements on a chip. FPGAs are used in a wide range of applications, including security [46,47], image processing [48,49], face and object recognition [50][51][52][53][54][55][56][57][58][59][60][61][62][63][64], quantum computing [65][66][67][68][69][70][71][72][73][74][75][76][77], and artificial intelligence [64], [78][79][80][81][82][83][84][85][86]. To demonstrate the performance of the system and filter, we used a Zynq-7000 FPGA [78,79,87]. ...
... One of the most significant limitations of current embedded GPUs, as addressed by the majority of researchers [10,28,32,34], is their limited resources (memory, cache, registers, and cores). As a result, GPUs are challenging to deploy in embedded environments, motivating a shift toward FPGA implementations. ...
Article
The Support Vector Machine (SVM) can be used to perform linear and nonlinear operations to solve regression and classification problems. The SVM algorithm is straightforward, generating a line or a hyperplane that separates different classes of data. However, due to its high computational complexity, SVM is time-consuming when modeled solely in software, making it unsuitable for embedded real-time applications. Various researchers have attempted to implement SVM in hardware, particularly on field-programmable gate array (FPGA) platforms, in order to achieve high performance at lower cost and power consumption. Therefore, an SVM linear classifier is implemented in hardware, which decreases latency and executes the task in real time. In this paper, an SVM linear classifier with a pipelined architecture is proposed for fast processing, written in Verilog HDL using the single-precision IEEE 754 number format. A study of hardware resource utilization and timing is performed for the WBCD breast cancer dataset, and performance metrics such as resource utilization, on-chip power consumption, and static timing analysis with constraints are evaluated. The accuracy rate is computed in both software and hardware for performance evaluation. The pipelined SVM architecture is designed in Verilog HDL and then synthesized using the Vivado tool. The design is configured for the Xilinx KC705 Kintex-7 evaluation board for implementation. This paper mainly focuses on the design of an SVM linear classifier with a pipelined architecture for FPGA implementation. The FPGA-based two-class SVM classifier can perform fast data classification due to the advanced parallel calculation feature provided by the FPGA. The classification system operates in a linear fashion. The simulation and synthesis results show that the SVM linear classification system can classify data effectively.
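The decision rule that such a pipeline evaluates is the standard linear SVM classifier y = sign(w·x + b). The sketch below shows it in software with placeholder weights; the paper's Verilog pipeline and single-precision IEEE 754 datapath are not reproduced.

```python
# The linear SVM decision rule the hardware pipelines: y = sign(w . x + b).
# Weights and bias here are placeholders; in the paper they would come from
# training on the WBCD dataset and be evaluated in single-precision floats,
# with each multiply-accumulate mapped to a pipeline stage.
import numpy as np

def svm_classify(x: np.ndarray, w: np.ndarray, b: float) -> int:
    """Return +1 or -1 for a single feature vector."""
    return 1 if float(np.dot(w, x) + b) >= 0.0 else -1

w = np.array([0.8, -0.5, 0.3], dtype=np.float32)   # placeholder weights
b = -0.1                                            # placeholder bias
print(svm_classify(np.array([1.0, 2.0, 0.5], dtype=np.float32), w, b))
```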
... The high computing latency is due to the high number of cores and low cache memory to control these cores. In contrast to GPUs, FPGAs are customizable according to the user's needs, achieving better computing performance and lower latency [15,54,55]. However, FPGA hardware development is usually complex and takes a long time. ...
Article
In bioinformatics, alignment is an essential technique for finding similarities between biological sequences. Usually, the alignment is performed with the Smith-Waterman (SW) algorithm, a well-known sequence alignment technique of high-level precision based on dynamic programming. However, given the massive data volume in biological databases and their continuous exponential increase, high-speed data processing is necessary. Therefore, this work proposes a parallel hardware design for the SW algorithm with a systolic array structure to accelerate the forward and backtracking steps. For this purpose, the architecture calculates and stores the paths in the forward stage for pre-organizing the alignment, which reduces the complexity of the backtracking stage. The backtracking starts from the maximum score position in the matrix and generates the optimal SW sequence alignment path. The architecture was validated on Field-Programmable Gate Array (FPGA), and synthesis analyses have shown that the proposed design reaches up to 79.5 Giga Cell Updates per Second (GCPUS).
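For context, the sketch below gives a plain software version of the Smith-Waterman forward fill and backtracking recurrence; the paper's systolic-array parallelization and path pre-organization are not modeled.

```python
# Plain-software sketch of Smith-Waterman forward fill and backtracking.
# The paper's contribution (a systolic array that also pre-organizes the
# traceback path) is not modeled here; this only shows the recurrence.
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Backtrack from the maximum score until a zero cell is reached.
    i, j = best_pos
    aln_a, aln_b = "", ""
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch):
            aln_a, aln_b, i, j = a[i-1] + aln_a, b[j-1] + aln_b, i - 1, j - 1
        elif H[i][j] == H[i-1][j] + gap:
            aln_a, aln_b, i = a[i-1] + aln_a, "-" + aln_b, i - 1
        else:
            aln_a, aln_b, j = "-" + aln_a, b[j-1] + aln_b, j - 1
    return best, aln_a, aln_b

print(smith_waterman("GGTTGACTA", "TGTTACGG"))
```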
... Unfortunately, vendors such as Intel (Altera) and Xilinx offer no soft processors optimized directly for image processing. Two soft processors developed specifically for image processing are, for example, IPPro [30] and a RISC-V soft processor [31]; these require fewer resources than Nios II and Microblaze. ...
... Point operations usually do not need image buffers; neighborhood operations require line buffers to hold the relevant pixels within the window depending on the size of the kernel. Some global operations do not require any buffering; however, some function-specific global SCPs may need a whole frame buffer to hold the frame until the frame has been processed, such as Otsu adaptive thresholding [30]. When creating an instance of one type of SCP, the optimized data handling then comes for free. ...
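The buffering distinction above can be illustrated with a small software model of a streamed 3x3 neighbourhood operation: only two previous image rows need to be stored alongside the incoming line. The code is a generic sketch, not the SCP data-handling logic itself.

```python
# Software model of the line-buffer idea for a streamed 3x3 neighbourhood
# operation: two previous rows plus the current row are buffered, matching
# what a hardware SCP would hold, while point operations need no buffering.
from collections import deque

def stream_3x3(rows, kernel):
    """rows: iterable of equal-length pixel lists; yields filtered rows."""
    buf = deque(maxlen=3)          # two stored lines + the current line
    for row in rows:
        buf.append(row)
        if len(buf) == 3:
            out = []
            for c in range(1, len(row) - 1):
                acc = sum(kernel[i][j] * buf[i][c - 1 + j]
                          for i in range(3) for j in range(3))
                out.append(acc)
            yield out

box = [[1 / 9] * 3 for _ in range(3)]                           # 3x3 box filter
image = [[float(r * 10 + c) for c in range(6)] for r in range(5)]
for filtered in stream_3x3(image, box):
    print(filtered)
```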
Article
Developing Field Programmable Gate Array (FPGA)-based applications is typically a slow and multi-skilled task. Research in tools to support application development has gradually reached a higher level. This paper describes an approach which aims to further raise the level at which an application developer works in developing FPGA-based implementations of image and video processing applications. The starting concept is a system of streamed soft coprocessors. We present a set of soft coprocessors which implement some of the key abstractions of Image Algebra. Our soft coprocessors are designed for easy chaining, and allow users to describe their application as a dataflow graph. A prototype implementation of a development environment, called SCoPeS, is presented. An application can be modified even during execution without requiring re-synthesis. The paper concludes with performance and resource utilization results for different implementations of a sample algorithm. We conclude that the soft coprocessor approach has the potential to deliver better performance than the soft processor approach, and can improve programmability over dedicated HDL cores for domain-specific applications while achieving competitive real time performance and utilization.
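As a toy illustration of the chained-coprocessor idea, the sketch below composes streamed stages into a small dataflow pipeline in software. The stage names are invented for illustration; the actual SCoPeS environment instantiates hardware soft coprocessors.

```python
# Toy dataflow model of chained streamed coprocessors: each stage consumes a
# pixel stream and produces one, so an application is just a composition of
# stages. Stage names are invented; SCoPeS chains hardware SCPs instead.
def threshold(stream, t):
    for px in stream:
        yield 255 if px >= t else 0

def invert(stream):
    for px in stream:
        yield 255 - px

def chain(source, *stages_with_args):
    s = iter(source)
    for stage, args in stages_with_args:
        s = stage(s, *args)   # connect the next stage to the previous output
    return s

pixels = [12, 200, 90, 255, 0, 130]
print(list(chain(pixels, (threshold, (128,)), (invert, ()))))
```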
... FPGA architectures offer both pipeline structures and parallel arrays. Therefore, whether an algorithm can be restructured for an FPGA depends on mapping that algorithm onto these FPGA structures [24,25]. ...
Article
Only a few effective methods can detect internal defects and monitor the internal state of complex structural parts. On the basis of the principle of PET (positron emission computed tomography), a new measurement method that uses γ photons to detect defects of an inner surface is proposed. This method has the characteristics of strong penetration, anti-corrosion, and anti-interference. With the aim of improving detection accuracy and imaging speed, this study also proposes image reconstruction algorithms combining the classic FBP (filtered back projection) with the MLEM (maximum likelihood expectation maximization) algorithm. The proposed scheme can reduce the number of iterations required during imaging to achieve the same image quality. According to the operational demands of FPGAs (field-programmable gate arrays), a BPML (back projection maximum likelihood) algorithm is adapted to the structural characteristics of an FPGA, which makes it feasible to test the proposed algorithms therein. Furthermore, edge detection and defect recognition are conducted after reconstructing the inner image. The effectiveness and superiority of the algorithm are verified, and the performance of the FPGA is evaluated by the experiments.
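For reference, one MLEM update step has the well-known form x_{k+1} = x_k · A^T(y / (A x_k)) / (A^T 1). The sketch below applies it to a random placeholder system matrix; the paper's BPML variant and the γ-photon detection geometry are not reproduced.

```python
# One MLEM update step, x_{k+1} = x_k / (A^T 1) * A^T( y / (A x_k) ),
# on a placeholder system matrix A. The paper's BPML variant and the actual
# gamma-photon detection geometry are not reproduced here.
import numpy as np

def mlem_step(x, A, y, eps=1e-12):
    forward = A @ x                       # forward projection of the estimate
    ratio = y / np.maximum(forward, eps)  # measured / estimated counts
    sensitivity = A.T @ np.ones_like(y)   # normalisation term A^T 1
    return x * (A.T @ ratio) / np.maximum(sensitivity, eps)

rng = np.random.default_rng(1)
A = rng.random((32, 16))                  # placeholder system matrix
x_true = rng.random(16)
y = A @ x_true                            # noiseless synthetic projections
x = np.ones(16)
for _ in range(50):
    x = mlem_step(x, A, y)
print(np.round(np.abs(x - x_true).mean(), 3))   # error shrinks with iterations
```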
... The high computing latency is due to the high number of cores and low cache memory to control these cores. In contrast to GPUs, FPGAs are customizable according to the user's needs, achieving better computing performance and lower latency [14,44,45]. However, FPGA hardware development is usually complex and takes a long time. ...
Preprint
In bioinformatics, alignment is an essential technique for finding similarities between biological sequences. Usually, the alignment is performed with the Smith-Waterman (SW) algorithm, a well-known sequence alignment technique of high-level precision based on dynamic programming. However, given the massive data volume in biological databases and their continuous exponential increase, high-speed data processing is necessary. Therefore, this work proposes a parallel hardware design for the SW algorithm with a systolic array structure to accelerate the Forward and Backtracking steps. For this purpose, the architecture calculates and stores the paths in the Forward stage for pre-organizing the alignment, which reduces the complexity of the Backtracking stage. The backtracking starts from the maximum score position in the matrix and generates the optimal SW sequence alignment path. The architecture was validated on Field-Programmable Gate Array (FPGA), and synthesis analyses have shown that the proposed design reaches up to 79.5 Giga Cell Updates per Second (GCPUS).