A brief description of the design flow of a hardware and software heterogeneous system highlighting key features. More detail of the flow is contained in reference [11].

Source publication
Article
FPGA-based embedded image processing systems offer considerable computing resources but present programming challenges when compared to software systems. The paper describes an approach based on an FPGA-based soft processor called Image Processing Processor (IPPro), which can operate at up to 337 MHz on a high-end Xilinx FPGA family, and gives details o...

Context in source publication

Context 1
... developed tool flow (Figure 3) starts with a user-defined RVC-CAL description composed of actors selected to execute in FPGA-based soft cores, with the rest to be run on the host CPUs. Software/hardware partitioning is decided by analysing actor behaviour, based on two main factors: the actors with the worst execution time (determined exactly by the number of instructions and the average waiting time to receive input tokens and send produced tokens), and the overheads incurred in transferring the image data to/from the accelerator. ...
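As an illustration of the two-factor partitioning decision described above, the sketch below scores hypothetical actors by an execution estimate (instruction count plus token waiting time) against the accelerator transfer overhead. The Actor fields, the bytes-per-cycle figure, and the threshold are assumptions for illustration only, not the tool flow's actual cost model.

```python
# Hypothetical sketch of a two-factor software/hardware partitioning score.
# 'instructions', 'token_wait' and 'transfer_bytes' are assumed actor metrics;
# the real tool flow derives such figures from the RVC-CAL actor analysis.
from dataclasses import dataclass

@dataclass
class Actor:
    name: str
    instructions: int      # instruction count of the actor
    token_wait: float      # average cycles waiting to receive/send tokens
    transfer_bytes: int    # image data moved to/from the accelerator

def execution_cost(a: Actor) -> float:
    """Execution estimate: instructions plus token waiting time."""
    return a.instructions + a.token_wait

def transfer_overhead(a: Actor, bytes_per_cycle: float = 4.0) -> float:
    """Cycles spent moving data across the host/accelerator boundary."""
    return a.transfer_bytes / bytes_per_cycle

def offload_to_fpga(a: Actor, threshold: float = 1.5) -> bool:
    """Offload only if the compute saving outweighs the transfer overhead."""
    return execution_cost(a) > threshold * transfer_overhead(a)

actors = [Actor("sobel", 120_000, 300.0, 64_000),
          Actor("histogram", 8_000, 50.0, 64_000)]
for a in actors:
    print(a.name, "-> FPGA" if offload_to_fpga(a) else "-> host CPU")
```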

Similar publications

Conference Paper
This paper introduces ZUCL 2.0, which extends abstraction services for FPGA applications on ARM-FPGA hybrids. The ZUCL 2.0 management services include 1) FPGA multi-tasking and context-switching based on dynamic reconfiguration and cooperative scheduling, 2) communication abstraction based on the ARM AMBA standard, and 3) memory isolation for priva...

Citations

... A pivotal shift is underway as we migrate from a computer-based infrastructure to a Field Programmable Gate Array (FPGA) device. By leveraging FPGA's parallel processing power, we anticipate a profound boost in computational efficiency, paving the way for more intricate and resource-demanding image processing algorithms [22,23]. ...
Article
In this study, we introduce the concept and construction of an innovative Digital Miniature Cathode Ray Magnetometer designed for the precise detection of magnetic fields. This device addresses several limitations inherent to magnetic probes such as D.C. offset, nonlinearity, temperature drift, sensor aging, and the need for frequent recalibration, while capable of operating in a wide range of magnetic fields. The core principle of this device involves the utilization of a charged particle beam as the sensitivity medium. The system leverages the interaction of an electron beam with a scintillator material, which then emits visible light that is captured by an imager. The emitted scintillation light is captured by a CMOS sensor. This sensor not only records the scintillation light but also accurately determines the position of the electron beam, providing invaluable spatial information crucial for magnetic field mapping. The key innovation lies in the combination of electron beam projection, CMOS imager scintillation-based detection, and digital image signal processing. By employing this synergy, the magnetometer achieves remarkable accuracy, sensitivity and dynamic range. The precise position registration enabled by the CMOS sensor further enhances the device’s utility in capturing complex magnetic field patterns, allowing for 2D field mapping. In this work, the optimization of the probe’s performance is tailored for applications related to the characterization of insertion devices in light sources, including undulators.
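As a rough illustration of the position-registration step described in the abstract, the sketch below locates a bright scintillation spot on a CMOS frame with an intensity-weighted centroid. This is an assumed, generic method for the illustration; the magnetometer's actual image signal processing chain is not detailed here.

```python
# Illustrative only: locate a scintillation spot on a CMOS frame by an
# intensity-weighted centroid after background subtraction. The actual
# magnetometer's image-processing chain may differ.
import numpy as np

def spot_centroid(frame: np.ndarray, background: float = 0.0):
    """Return the (row, col) centre of mass of the bright spot in `frame`."""
    img = np.clip(frame.astype(float) - background, 0.0, None)
    total = img.sum()
    if total == 0:
        raise ValueError("no signal above background")
    rows, cols = np.indices(img.shape)
    return (rows * img).sum() / total, (cols * img).sum() / total

# Synthetic example: a Gaussian spot centred at (40, 60) on a 100x100 frame.
y, x = np.mgrid[0:100, 0:100]
frame = np.exp(-((y - 40) ** 2 + (x - 60) ** 2) / (2 * 3.0 ** 2))
print(spot_centroid(frame))   # approximately (40.0, 60.0)
```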
... DESIGN BASED ON ZYNQ7000: The system used a Zedboard, a Zynq-7000 development board, an Avnet FMC-HDMI-CAM module, and a Python 1300-C camera [8]. FMC stands for FPGA Mezzanine Card, a standardized interface that facilitates high-speed connections between FPGAs and peripheral devices [9]. ...
... In the future, the system will have reduced on-chip components to optimize space utilization and streamline software complexity. The IP blocks that were first built were redesigned and integrated into Simulink [8]. The edge detection algorithm was integrated into a unified IP block. ...
Article
This paper presents a novel approach for fast FPGA prototyping of the Canny edge detection algorithm using High-Level Synthesis (HLS) based on the HDL Coder. Traditional RTL-based design methodologies for implementing image processing algorithms on FPGAs can be time-consuming and error-prone. HLS offers a higher level of abstraction, enabling designers to focus on algorithmic functionality while the tool automatically generates efficient hardware descriptions. This advantage was exploited by implementing the Canny edge detection algorithm in MATLAB/Simulink and utilizing the HDL Coder to automatically convert it into synthesizable VHDL code. This design flow significantly reduces development time and complexity compared to the traditional RTL approach. The experimental results showed that the HLS-based Canny edge detector achieved real-time performance on a Xilinx FPGA platform, showcasing the effectiveness of the proposed approach for fast FPGA prototyping in image processing applications.
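For orientation, a simplified software sketch of the core Canny stages (Gaussian smoothing, Sobel gradients, double thresholding) is given below. It omits non-maximum suppression and hysteresis edge linking, and it is not the MATLAB/Simulink/HDL Coder model used in the paper.

```python
# Simplified sketch of the Canny stages: smoothing, gradients, double
# threshold. Non-maximum suppression and hysteresis linking are omitted;
# this is not the Simulink/HDL Coder model described in the paper.
import numpy as np
from scipy import ndimage

def canny_core(image, sigma=1.4, low=0.1, high=0.3):
    smoothed = ndimage.gaussian_filter(image.astype(float), sigma)
    gx = ndimage.sobel(smoothed, axis=1)          # horizontal gradient
    gy = ndimage.sobel(smoothed, axis=0)          # vertical gradient
    magnitude = np.hypot(gx, gy)
    magnitude /= max(magnitude.max(), 1e-12)      # normalise to [0, 1]
    strong = magnitude >= high
    weak = (magnitude >= low) & ~strong
    return strong, weak   # hysteresis would keep weak pixels touching strong ones

img = np.zeros((64, 64))
img[:, 32:] = 1.0   # vertical step edge
strong, weak = canny_core(img)
print(int(strong.sum()), int(weak.sum()))
```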
... Similar to our work, Siddiqui et al. [15] proposed an FPGA-based soft processor called Image Processing Processor (IPPro) for general-purpose image processing. The authors designed and tested an instantiable component, since they focused on multi-core operation. ...
Article
Computer vision plays a critical role in many applications, particularly in the domain of autonomous vehicles. To achieve high-level image processing tasks such as image classification and object tracking, it is essential to extract low-level features from the image data. However, in order to integrate these compute-intensive tasks into a control loop, they must be completed as quickly as possible. This paper presents a novel FPGA-based system for fast and accurate image feature extraction, specifically designed to meet the constraints of data fusion in autonomous vehicles. The system computes a set of generic statistical image features, including contrast, homogeneity, and entropy, and is implemented on two Xilinx FPGA platforms - an Alveo U200 Data Center Accelerator Card and a Zynq UltraScale+ MPSoC ZCU104 Evaluation Kit. Experimental results show that the proposed system achieves high-speed image feature extraction with low latency, making it well-suited for use in autonomous vehicle systems that require real-time image processing. The presented system can also be easily extended to extract additional features for various image and data fusion applications.
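As a point of reference, the sketch below computes grey-level co-occurrence (GLCM) based contrast, homogeneity, and entropy for a single pixel offset, which is one common way such statistical features are defined; the paper's exact feature definitions and FPGA datapath may differ.

```python
# Illustrative NumPy sketch of GLCM-based contrast, homogeneity and entropy
# for one pixel offset. The paper's exact feature definitions and hardware
# implementation are not reproduced here.
import numpy as np

def glcm_features(img, levels=8, offset=(0, 1)):
    q = (img.astype(float) / img.max() * (levels - 1)).astype(int)
    dr, dc = offset
    glcm = np.zeros((levels, levels))
    rows, cols = q.shape
    for r in range(rows - dr):
        for c in range(cols - dc):
            glcm[q[r, c], q[r + dr, c + dc]] += 1   # count co-occurring pairs
    p = glcm / glcm.sum()                            # normalise to probabilities
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    homogeneity = np.sum(p / (1.0 + np.abs(i - j)))
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return contrast, homogeneity, entropy

rng = np.random.default_rng(0)
print(glcm_features(rng.integers(0, 256, size=(64, 64))))
```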
... Hardware implementations of SHA-3 are preferred over software because of their superior power, speed, and throughput. Field-programmable gate arrays (FPGAs) are preferred over application-specific integrated circuits (ASICs) as a hardware platform due to their lower price and shorter development time [10][11][12][13][14]. ...
Article
In sensitive communications, the cryptographic hash function plays a crucial role, including in the military, healthcare, and banking, ensuring secure transmission by verifying data integrity and carrying out other vital tasks. Compared to other cryptographic hash algorithms, such as SHA-1 and SHA-2, the Keccak hash function (SHA-3) boasts superior hardware performance and is more resilient to modern cryptanalysis techniques. Nonetheless, hardware performance enhancements, such as boosting speed or reducing area usage, are constantly required. This research focuses on increasing the Keccak hash algorithm’s throughput rate by introducing a novel architecture that reduces the total number of clock cycles required to obtain the result of a hash function. Additionally, the new simplified structure of the round constant (RC) generator design assures a reasonably low area and achieves the highest throughput and efficiency. Thus, when implemented, it achieved the highest throughput of 19.515 Gbps, 24.428 Gbps, 33.393 Gbps, and 36.358 Gbps on FPGA devices with the Virtex-5, Artix-7, Virtex-6, and Virtex-7, respectively. Finally, our approach is compared to recently published designs.
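The throughput gains described above follow from the usual relation throughput = block size x clock frequency / cycles per block. The sketch below works this through with placeholder figures (Keccak rate of 1088 bits for SHA3-256, 24 rounds as the baseline cycle count); these are not the paper's reported synthesis numbers.

```python
# Throughput of a hash core under the common definition
#   throughput = block_size_bits * f_max / cycles_per_block.
# The clock frequency and cycle counts below are placeholders for
# illustration, not the paper's synthesis results.

def throughput_gbps(block_bits: int, f_max_mhz: float, cycles: int) -> float:
    return block_bits * f_max_mhz * 1e6 / cycles / 1e9

# Keccak-f[1600] with rate r = 1088 bits (SHA3-256). Halving the cycles per
# block at the same clock doubles the throughput, which is the lever the
# proposed architecture targets.
print(throughput_gbps(1088, 300.0, 24))   # baseline: ~13.6 Gbps
print(throughput_gbps(1088, 300.0, 12))   # fewer cycles per block: ~27.2 Gbps
```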
... An FPGA is a type of integrated circuit that contains programmable logic gates, memory, and other elements on a chip. FPGAs are used in a wide range of applications, including security [46,47], image processing [48,49], face and object recognition [50][51][52][53][54][55][56][57][58][59][60][61][62][63][64], quantum computing [65][66][67][68][69][70][71][72][73][74][75][76][77], and artificial intelligence [64], [78][79][80][81][82][83][84][85][86]. To demonstrate the performance of the system and filter, we used a Zynq-7000 FPGA [78,79,87]. ...
... One of the most significant limitations of current embedded GPUs, as addressed by the majority of researchers [10,28,32,34], is their limited resources (memory, cache, registers, and cores). As a result, GPUs are challenging to deploy in embedded environments, motivating a shift toward FPGA implementations. ...
Article
The Support Vector Machine (SVM) can be used to perform linear and nonlinear operations to solve regression and classification problems. The SVM algorithm is straightforward, generating a line or a hyperplane that separates different classes of data. However, due to its high computational complexity, SVM is time-consuming when modeled solely in software, making it unsuitable for embedded real-time applications. Various researchers have attempted to implement SVM in hardware, particularly on field-programmable gate array (FPGA) platforms, in order to achieve high performance at lower cost and power consumption. Therefore, an SVM linear classifier is implemented in hardware, which decreases latency and executes the task in real time. In this paper, an SVM linear classifier with a pipelined architecture is proposed for fast processing, written in Verilog HDL using the single-precision IEEE 754 number format. A study of hardware resource utilization and timing is performed for the WBCD breast cancer dataset, and performance metrics such as resource utilization, on-chip power consumption, and static timing analysis with constraints are evaluated. The accuracy rate is computed in both software and hardware for performance evaluation. The pipelined SVM architecture is designed in Verilog HDL and then synthesized using the Vivado tool. The design is configured for the Xilinx KC705 Kintex-7 evaluation board for implementation. This paper mainly focuses on the design of an SVM linear classifier with a pipelined architecture for FPGA implementation. The FPGA-based two-class SVM classifier can perform fast data classification due to the advanced parallel calculation feature provided by the FPGA. The classification system operates in a linear fashion. The simulation and synthesis results show that the SVM linear classification system can classify data effectively.
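The decision rule that such a pipeline evaluates is the standard linear SVM classifier y = sign(w·x + b). The sketch below shows it in software with placeholder weights; the paper's Verilog pipeline and single-precision IEEE 754 datapath are not reproduced.

```python
# The linear SVM decision rule the hardware pipelines: y = sign(w . x + b).
# Weights and bias here are placeholders; in the paper they would come from
# training on the WBCD dataset and be evaluated in single-precision floats,
# with each multiply-accumulate mapped to a pipeline stage.
import numpy as np

def svm_classify(x: np.ndarray, w: np.ndarray, b: float) -> int:
    """Return +1 or -1 for a single feature vector."""
    return 1 if float(np.dot(w, x) + b) >= 0.0 else -1

w = np.array([0.8, -0.5, 0.3], dtype=np.float32)   # placeholder weights
b = -0.1                                            # placeholder bias
print(svm_classify(np.array([1.0, 2.0, 0.5], dtype=np.float32), w, b))
```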
... The high computing latency is due to the high number of cores and low cache memory to control these cores. In contrast to GPUs, FPGAs are customizable according to the user's needs, achieving better computing performance and lower latency [15,54,55]. However, FPGA hardware development is usually complex and takes a long time. ...
Article
In bioinformatics, alignment is an essential technique for finding similarities between biological sequences. Usually, the alignment is performed with the Smith-Waterman (SW) algorithm, a well-known sequence alignment technique of high-level precision based on dynamic programming. However, given the massive data volume in biological databases and their continuous exponential increase, high-speed data processing is necessary. Therefore, this work proposes a parallel hardware design for the SW algorithm with a systolic array structure to accelerate the forward and backtracking steps. For this purpose, the architecture calculates and stores the paths in the forward stage for pre-organizing the alignment, which reduces the complexity of the backtracking stage. The backtracking starts from the maximum score position in the matrix and generates the optimal SW sequence alignment path. The architecture was validated on Field-Programmable Gate Array (FPGA), and synthesis analyses have shown that the proposed design reaches up to 79.5 Giga Cell Updates per Second (GCPUS).
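For context, the sketch below gives a plain software version of the Smith-Waterman forward fill and backtracking recurrence; the paper's systolic-array parallelization and path pre-organization are not modeled.

```python
# Plain-software sketch of Smith-Waterman forward fill and backtracking.
# The paper's contribution (a systolic array that also pre-organizes the
# traceback path) is not modeled here; this only shows the recurrence.
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Backtrack from the maximum score until a zero cell is reached.
    i, j = best_pos
    aln_a, aln_b = "", ""
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch):
            aln_a, aln_b, i, j = a[i-1] + aln_a, b[j-1] + aln_b, i - 1, j - 1
        elif H[i][j] == H[i-1][j] + gap:
            aln_a, aln_b, i = a[i-1] + aln_a, "-" + aln_b, i - 1
        else:
            aln_a, aln_b, j = "-" + aln_a, b[j-1] + aln_b, j - 1
    return best, aln_a, aln_b

print(smith_waterman("GGTTGACTA", "TGTTACGG"))
```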
... Unfortunately, vendors such as Intel (Altera) and Xilinx offer no soft processors optimized directly for image processing. Two soft processors developed specifically for image processing are, for example, IPPro [30] and a RISC-V soft processor [31]; these require fewer resources than Nios II and Microblaze. ...
... Point operations usually do not need image buffers; neighborhood operations require line buffers to hold the relevant pixels within the window depending on the size of the kernel. Some global operations do not require any buffering; however, some function-specific global SCPs may need a whole frame buffer to hold the frame until the frame has been processed, such as Otsu adaptive thresholding [30]. When creating an instance of one type of SCP, the optimized data handling then comes for free. ...
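The buffering distinction above can be illustrated with a small software model of a streamed 3x3 neighbourhood operation: only two previous image rows need to be stored alongside the incoming line. The code is a generic sketch, not the SCP data-handling logic itself.

```python
# Software model of the line-buffer idea for a streamed 3x3 neighbourhood
# operation: two previous rows plus the current row are buffered, matching
# what a hardware SCP would hold, while point operations need no buffering.
from collections import deque

def stream_3x3(rows, kernel):
    """rows: iterable of equal-length pixel lists; yields filtered rows."""
    buf = deque(maxlen=3)          # two stored lines + the current line
    for row in rows:
        buf.append(row)
        if len(buf) == 3:
            out = []
            for c in range(1, len(row) - 1):
                acc = sum(kernel[i][j] * buf[i][c - 1 + j]
                          for i in range(3) for j in range(3))
                out.append(acc)
            yield out

box = [[1 / 9] * 3 for _ in range(3)]                           # 3x3 box filter
image = [[float(r * 10 + c) for c in range(6)] for r in range(5)]
for filtered in stream_3x3(image, box):
    print(filtered)
```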
Article
Developing Field Programmable Gate Array (FPGA)-based applications is typically a slow and multi-skilled task. Research in tools to support application development has gradually reached a higher level. This paper describes an approach which aims to further raise the level at which an application developer works in developing FPGA-based implementations of image and video processing applications. The starting concept is a system of streamed soft coprocessors. We present a set of soft coprocessors which implement some of the key abstractions of Image Algebra. Our soft coprocessors are designed for easy chaining, and allow users to describe their application as a dataflow graph. A prototype implementation of a development environment, called SCoPeS, is presented. An application can be modified even during execution without requiring re-synthesis. The paper concludes with performance and resource utilization results for different implementations of a sample algorithm. We conclude that the soft coprocessor approach has the potential to deliver better performance than the soft processor approach, and can improve programmability over dedicated HDL cores for domain-specific applications while achieving competitive real time performance and utilization.
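As a toy illustration of the chained-coprocessor idea, the sketch below composes streamed stages into a small dataflow pipeline in software. The stage names are invented for illustration; the actual SCoPeS environment instantiates hardware soft coprocessors.

```python
# Toy dataflow model of chained streamed coprocessors: each stage consumes a
# pixel stream and produces one, so an application is just a composition of
# stages. Stage names are invented; SCoPeS chains hardware SCPs instead.
def threshold(stream, t):
    for px in stream:
        yield 255 if px >= t else 0

def invert(stream):
    for px in stream:
        yield 255 - px

def chain(source, *stages_with_args):
    s = iter(source)
    for stage, args in stages_with_args:
        s = stage(s, *args)   # connect the next stage to the previous output
    return s

pixels = [12, 200, 90, 255, 0, 130]
print(list(chain(pixels, (threshold, (128,)), (invert, ()))))
```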
... FPGA architectures offer both pipeline structures and parallel arrays. Therefore, whether an algorithm can be restructured for an FPGA depends on mapping that algorithm onto these FPGA structures [24,25]. ...
Article
Only a few effective methods can detect internal defects and monitor the internal state of complex structural parts. On the basis of the principle of PET (positron emission computed tomography), a new measurement method that uses γ photons to detect defects of an inner surface is proposed. This method has the characteristics of strong penetration, anti-corrosion, and anti-interference. With the aim of improving detection accuracy and imaging speed, this study also proposes image reconstruction algorithms combining the classic FBP (filtered back projection) with the MLEM (maximum likelihood expectation maximization) algorithm. The proposed scheme can reduce the number of iterations required during imaging to achieve the same image quality. According to the operational demands of FPGAs (field-programmable gate arrays), a BPML (back projection maximum likelihood) algorithm is adapted to the structural characteristics of an FPGA, which makes it feasible to test the proposed algorithms therein. Furthermore, edge detection and defect recognition are conducted after reconstructing the inner image. The effectiveness and superiority of the algorithm are verified, and the performance of the FPGA is evaluated by the experiments.
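For reference, one MLEM update step has the well-known form x_{k+1} = x_k · A^T(y / (A x_k)) / (A^T 1). The sketch below applies it to a random placeholder system matrix; the paper's BPML variant and the γ-photon detection geometry are not reproduced.

```python
# One MLEM update step, x_{k+1} = x_k / (A^T 1) * A^T( y / (A x_k) ),
# on a placeholder system matrix A. The paper's BPML variant and the actual
# gamma-photon detection geometry are not reproduced here.
import numpy as np

def mlem_step(x, A, y, eps=1e-12):
    forward = A @ x                       # forward projection of the estimate
    ratio = y / np.maximum(forward, eps)  # measured / estimated counts
    sensitivity = A.T @ np.ones_like(y)   # normalisation term A^T 1
    return x * (A.T @ ratio) / np.maximum(sensitivity, eps)

rng = np.random.default_rng(1)
A = rng.random((32, 16))                  # placeholder system matrix
x_true = rng.random(16)
y = A @ x_true                            # noiseless synthetic projections
x = np.ones(16)
for _ in range(50):
    x = mlem_step(x, A, y)
print(np.round(np.abs(x - x_true).mean(), 3))   # error shrinks with iterations
```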
... The high computing latency is due to the high number of cores and low cache memory to control these cores. In contrast to GPUs, FPGAs are customizable according to the user's needs, achieving better computing performance and lower latency [14,44,45]. However, FPGA hardware development is usually complex and takes a long time. ...
Preprint
In bioinformatics, alignment is an essential technique for finding similarities between biological sequences. Usually, the alignment is performed with the Smith-Waterman (SW) algorithm, a well-known sequence alignment technique of high-level precision based on dynamic programming. However, given the massive data volume in biological databases and their continuous exponential increase, high-speed data processing is necessary. Therefore, this work proposes a parallel hardware design for the SW algorithm with a systolic array structure to accelerate the Forward and Backtracking steps. For this purpose, the architecture calculates and stores the paths in the Forward stage for pre-organizing the alignment, which reduces the complexity of the Backtracking stage. The backtracking starts from the maximum score position in the matrix and generates the optimal SW sequence alignment path. The architecture was validated on Field-Programmable Gate Array (FPGA), and synthesis analyses have shown that the proposed design reaches up to 79.5 Giga Cell Updates per Second (GCPUS).