Fig 1 - uploaded by Syed Manzoor Qasim
Pseudocode for matrix multiplication.

Source publication
Article
Full-text available
Matrix multiplication is a computationally intensive and fundamental matrix operation in many algorithms used in scientific computation. It serves as the basic building block for signal processing, image processing, graphics, and robotics applications. To improve the performance of these applications, a high-performance matrix multiplier is required. Tradition...

Context in source publication

Context 1
... end for; end for; end for; end MatrixMultiplication [15]. Matrix multiplication is based on the pseudocode shown in Fig. 1, in which i, j, and k are the loop indices. The loop body consists of a single recurrence equation, given by (2) ...
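The triple-loop structure described above can be sketched in Python as follows. This is only an illustrative software rendering of the Fig. 1 pseudocode, not the source's hardware design; the function name and the list-of-lists matrix representation are assumptions.

```python
def matrix_multiplication(A, B):
    """Naive triple-loop matrix product C = A * B.

    Assumes A is n x m and B is m x p (conformable for multiplication).
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):            # row index of C
        for j in range(p):        # column index of C
            for k in range(m):    # inner (reduction) index
                # the single recurrence of the loop body:
                # C[i][j] accumulates A[i][k] * B[k][j]
                C[i][j] += A[i][k] * B[k][j]
    return C
```

The loop body performs exactly one multiply-accumulate per iteration, which is why a hardware multiplier-accumulator maps naturally onto it.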

Similar publications

Technical Report
Full-text available
Reconfigurable systems are more and more employed in many application fields, including aerospace. SRAM-based FPGAs represent an extremely interesting hardware platform for this kind of systems, because they offer flexibility as well as processing power. Furthermore, the ability of run time reconfiguration of SRAM-based FPGAs can provide advantages...
Article
Full-text available
Rate matching and channel interleavers play a pivotal role in code rate adaptation and in minimizing burst errors in communication systems. The data and control channels in Fifth Generation (5G) New Radio (NR) employ distinct channel interleavers/de-interleavers for reducing the bit error rate. However, the independent implementation results in a sub...
Article
Full-text available
Encoding data with a Walsh sequence is a primary step in generating encoded data to serve as inputs for ciphers. The encoding function is based on a Walsh sequence generator. The Walsh sequence is generated from Rademacher functions stored in the device and a counter that produces the index of the Walsh sequence, which is used as the input of a Gray code function. In th...
Article
Full-text available
This paper analyzes a hardware-accelerated image processing module for visual inspection systems. These systems are essential for maintaining product quality and decreasing manual inspection. The proposed system harnesses FPGA technology to enhance the efficiency of image processing tasks, with a specific focus on filtering and labeling processes....
Article
Full-text available
This paper provides the essential details of implementing 4-phase bundled data and speed independent asynchronous circuits on FPGAs. The required Xilinx synthesis tools including attributes, constraints and hardware implementation of basic asynchronous elements like Cgate, delay line, and handshaking modules are discussed. Finally, two design and i...

Citations

... Two of the most common manufacturers of FPGAs or Complex Programmable Logic Devices (CPLDs) one might encounter are Altera and Xilinx. According to [8], the three parameters used to evaluate a Field Programmable Gate Array (FPGA) are speed, area, and power (energy). Meanwhile, the languages of choice for FPGAs are Verilog and VHDL (VHSIC Hardware Description Language). ...
... Nevertheless, the best ASIC can always outperform the fastest FPGA as these two technologies progress. Syed M. Qasim [8] introduced a Field Programmable Gate Array (FPGA) based matrix multiplier that provides speed-up in computation time and flexibility compared to software- and ASIC-based methods. Two advantages of using a Field Programmable Gate Array (FPGA) are the ability to run at speeds above 500 MHz [45] and the potential for dynamic reconfiguration [46]; that is, reprogramming part of the device at run time so that resources can be reused through time multiplexing. ...
Article
Full-text available
This paper will provide some insights on the application of Field Programmable Gate Array (FPGA) in process tomography. The focus of this paper will be to investigate the performance of the technology with respect to various tomography systems and comparison to other similar technologies including the Application Specific Integrated Circuit (ASIC), Graphics Processing Unit (GPU) and the microcontroller. Fundamentally, the FPGA is primarily used in the Data Acquisition System (DAQ) due to its better performance and better trade-off as compared to competitor technologies. However, the drawback of using FPGA is that it is relatively more expensive.
... Classical dense matrix multiplication algorithms for a single processor are computed within three nested loops, with n iterations each. Such algorithms have O(n³) complexity, which represents a steep growth in the number of operations with the matrix dimension [19]. ...
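The cubic growth mentioned in the excerpt above can be made concrete with a small counting sketch. The helper name `matmul_op_count` is hypothetical; it simply counts the multiply-accumulate operations performed by the classical three-nested-loop algorithm for n x n matrices.

```python
def matmul_op_count(n):
    """Count multiply-accumulate operations in the classical
    triple-loop n x n matrix multiplication algorithm."""
    ops = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                ops += 1  # one MAC per innermost iteration
    return ops

# n^3 growth: doubling the matrix dimension multiplies the work by 8.
```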
Article
Constrained model predictive control (MPC) usually requires the solution of a quadratic programming (QP) problem at each sampling instant. This is computationally expensive and becomes a limitation to embedding and using MPC in plants with fast sampling rates. Several special solvers for MPC problems have been proposed in recent years, but most of them focus on state-space formulations, which are very popular in academia. This paper proposes a solution based on the alternating direction method of multipliers (ADMM), tailored for embedded systems and applied to generalized predictive control (GPC), which is a very popular formulation in industry. Implementation issues of parallel computation are discussed in order to accelerate the time required for the operations. The implementation in an FPGA proved to be quite fast, with an observed worst-case execution time of 11.54 µs for the presented example. These results contribute to embedding GPC applications in processes that are typically controlled by classical controllers because of their fast dynamics.
Conference Paper
Matrix multiplication is an important basic operation that is used in a vast range of applications like image processing and DSP. The design and implementation of a new matrix multiplication module is the main focus of this paper. Our proposed matrix multiplier hardware can easily be re-configured in order to accept any pair of input matrices that are mathematically allowed to multiply. The proposed hardware not only is able to multiply both square and non-square matrices, but it also utilizes a scalable systolic architecture to enhance the computation speed in terms of clock cycles compared to a previously established work in this area. Non-square multiplication and re-configurability of the proposed matrix multiplier make it capable of being used in higher level system applications such as a filter. The corresponding RTL code was developed, compiled, and simulated using the SystemC library. The implemented design is also synthesized for different matrix dimensions and the cost of hardware in terms of basic logic elements is reported.
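The systolic approach mentioned in the abstract above can be illustrated with a minimal software model. This is not the paper's actual architecture; it is a generic output-stationary n x n systolic array sketch (square matrices assumed, function name illustrative): each processing element PE(i, j) keeps a running sum for C[i][j], rows of A flow rightward, columns of B flow downward, and inputs are skewed so matching operands meet at the right cycle.

```python
def systolic_matmul(A, B):
    """Cycle-by-cycle sketch of an n x n output-stationary systolic array.

    a_reg[i][j] / b_reg[i][j] model the operand held in PE(i, j) in the
    current cycle (0 = pipeline bubble). The full product takes 3n - 2
    cycles: the pipeline fills, computes, and drains.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    a_reg = [[0] * n for _ in range(n)]
    b_reg = [[0] * n for _ in range(n)]
    for t in range(3 * n - 2):
        # Shift operands one PE per cycle (A right, B down), injecting
        # skewed inputs at the array edges: row i of A enters column 0
        # delayed by i cycles, column i of B enters row 0 delayed by i.
        for i in range(n):
            for j in range(n - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]   # A moves right along row i
                b_reg[j][i] = b_reg[j - 1][i]   # B moves down along column i
            k = t - i
            a_reg[i][0] = A[i][k] if 0 <= k < n else 0
            b_reg[0][i] = B[k][i] if 0 <= k < n else 0
        # Every PE performs one multiply-accumulate per cycle; bubbles
        # contribute 0, so only matching A[i][k] * B[k][j] terms accumulate.
        for i in range(n):
            for j in range(n):
                C[i][j] += a_reg[i][j] * b_reg[i][j]
    return C
```

The payoff is parallelism: n² multiply-accumulates happen every cycle, so the latency is O(n) cycles instead of the O(n³) sequential operations of the triple-loop version.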
Article
Matrix multiplication is a basic operation that can be used in many DSP applications. Since raw matrix data cannot be fed into a Simulink Xilinx block directly, a new module needs to be designed to perform the matrix multiplication. The original method is straightforward but consumes considerable hardware resources. To reduce this consumption, we propose a new method to design the matrix multiplication module on the Simulink Xilinx platform, which is also implemented on a Spartan 3E FPGA (Field Programmable Gate Array). The main idea of the proposal is to reuse resources and input the data serially. In this way, the hardware cost can be dramatically decreased; however, more time is needed for the computation.
Article
Application Robustification, a promising approach for reducing processor power, converts applications into numerical optimization problems and solves them using gradient descent and conjugate gradient algorithms (1). The improvement in robustness, however, comes at the expense of performance when compared to the baseline non-iterative versions of these applications. To mitigate the performance loss from robustification, we present the design of a hardware accelerator and corresponding software support that accelerate gradient descent and conjugate gradient based iterative implementations of applications. Unlike traditional accelerators, our design accelerates different types of linear algebra operations found in many algorithms and is capable of efficiently handling sparse matrices that arise in applications such as graph matching. We show that the proposed accelerator can provide significant speedups for iterative versions of several applications and that for some applications, such as least squares, it can substantially improve the computation time as compared to the baseline non-iterative implementation.