Fig 1 - uploaded by Syed Manzoor Qasim
Pseudocode for matrix multiplication.

Source publication
Article
Full-text available
Matrix multiplication is a computationally intensive and fundamental matrix operation in many algorithms used in scientific computation. It serves as the basic building block for signal processing, image processing, graphics, and robotics applications. To improve the performance of these applications, a high-performance matrix multiplier is required. Tradition...

Context in source publication

Context 1
... end for; end for; end for; end MatrixMultiplication [15]. Matrix multiplication is based on the pseudocode shown in Fig. 1, in which i, j, and k are the loop indices. The loop body consists of a single recurrence equation, given by (2) ...
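The triple-loop structure described above can be sketched in Python as follows. This is only an illustrative software rendering of the Fig. 1 pseudocode, not the source's hardware design; the function name and the list-of-lists matrix representation are assumptions.

```python
def matrix_multiplication(A, B):
    """Naive triple-loop matrix product C = A * B.

    Assumes A is n x m and B is m x p (conformable for multiplication).
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):            # row index of C
        for j in range(p):        # column index of C
            for k in range(m):    # inner (reduction) index
                # the single recurrence of the loop body:
                # C[i][j] accumulates A[i][k] * B[k][j]
                C[i][j] += A[i][k] * B[k][j]
    return C
```

The loop body performs exactly one multiply-accumulate per iteration, which is why a hardware multiplier-accumulator maps naturally onto it.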

Similar publications

Technical Report
Full-text available
Reconfigurable systems are more and more employed in many application fields, including aerospace. SRAM-based FPGAs represent an extremely interesting hardware platform for this kind of systems, because they offer flexibility as well as processing power. Furthermore, the ability of run time reconfiguration of SRAM-based FPGAs can provide advantages...
Article
Full-text available
Rate matching and channel interleavers play a pivotal role in code rate adaptation and in minimizing burst errors in communication systems. The data and control channels in Fifth Generation (5G) New Radio (NR) employ distinct channel interleavers/de-interleavers for reducing the bit error rate. However, the independent implementation results in a sub...
Article
Full-text available
Encoding data with a Walsh sequence is a primary step in generating encoded data to serve as inputs for ciphers. The encoding function is based on a Walsh sequence generator. The Walsh sequence is generated from Rademacher functions stored in the device and a counter that produces the index of the Walsh sequence, which is used as the input of a Gray code function. In th...
Article
Full-text available
This paper analyzes a hardware-accelerated image processing module for visual inspection systems. These systems are essential for maintaining product quality and decreasing manual inspection. The proposed system harnesses FPGA technology to enhance the efficiency of image processing tasks, with a specific focus on filtering and labeling processes....
Article
Full-text available
This paper provides the essential details of implementing 4-phase bundled data and speed independent asynchronous circuits on FPGAs. The required Xilinx synthesis tools including attributes, constraints and hardware implementation of basic asynchronous elements like Cgate, delay line, and handshaking modules are discussed. Finally, two design and i...

Citations

... Two of the most common manufacturers of FPGAs or Complex Programmable Logic Devices (CPLDs) one might encounter are Altera and Xilinx. According to [8], the three parameters used to evaluate a Field Programmable Gate Array (FPGA) are speed, area, and power (energy). Meanwhile, the languages of choice for FPGAs are Verilog and VHDL (VHSIC Hardware Description Language). ...
... Nevertheless, the best ASIC can always outperform the fastest FPGA as these two technologies progress. Syed M. Qasim [8] introduced a Field Programmable Gate Array (FPGA) based matrix multiplier that provides speed-up in computation time and flexibility compared to software- and ASIC-based methods. Two advantages of using a Field Programmable Gate Array (FPGA) are the ability to run at speeds above 500 MHz [45] and the potential for dynamic reconfiguration [46]; that is, reprogramming part of the device at run time so that resources can be reused through time multiplexing. ...
Article
Full-text available
This paper will provide some insights on the application of Field Programmable Gate Array (FPGA) in process tomography. The focus of this paper will be to investigate the performance of the technology with respect to various tomography systems and comparison to other similar technologies including the Application Specific Integrated Circuit (ASIC), Graphics Processing Unit (GPU) and the microcontroller. Fundamentally, the FPGA is primarily used in the Data Acquisition System (DAQ) due to its better performance and better trade-off as compared to competitor technologies. However, the drawback of using FPGA is that it is relatively more expensive.
... Classical dense matrix multiplication algorithms for a single processor are computed within three nested loops, with n iterations each. Such algorithms have O(n³) complexity, which represents a steep growth in the number of operations with the matrix dimension [19]. ...
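The cubic growth mentioned in the excerpt above can be made concrete with a small counting sketch. The helper name `matmul_op_count` is hypothetical; it simply counts the multiply-accumulate operations performed by the classical three-nested-loop algorithm for n x n matrices.

```python
def matmul_op_count(n):
    """Count multiply-accumulate operations in the classical
    triple-loop n x n matrix multiplication algorithm."""
    ops = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                ops += 1  # one MAC per innermost iteration
    return ops

# n^3 growth: doubling the matrix dimension multiplies the work by 8.
```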
Article
Constrained model predictive control (MPC) usually requires the solution of a quadratic programming (QP) problem at each sampling instant. This is computationally expensive and becomes a limitation to embedding and using MPC in plants with fast sampling rates. Several special solvers for MPC problems have been proposed in recent years, but most of them focus on state-space formulations, which are very popular in academia. This paper proposes a solution based on the alternating direction method of multipliers (ADMM), tailored for embedded systems and applied to generalized predictive control (GPC), which is a very popular formulation in industry. Implementation issues of parallel computation are discussed in order to accelerate the time required for the operations. The implementation in an FPGA proved to be quite fast, with an observed worst-case execution time of 11.54 µs for the presented example. These results contribute to embedding GPC applications in processes that are typically controlled by classical controllers because of their fast dynamics.
Conference Paper
Matrix multiplication is an important basic operation that is used in a vast range of applications like image processing and DSP. The design and implementation of a new matrix multiplication module is the main focus of this paper. Our proposed matrix multiplier hardware can easily be re-configured in order to accept any pair of input matrices that are mathematically allowed to multiply. The proposed hardware not only is able to multiply both square and non-square matrices, but it also utilizes a scalable systolic architecture to enhance the computation speed in terms of clock cycles compared to a previously established work in this area. Non-square multiplication and re-configurability of the proposed matrix multiplier make it capable of being used in higher level system applications such as a filter. The corresponding RTL code was developed, compiled, and simulated using the SystemC library. The implemented design is also synthesized for different matrix dimensions and the cost of hardware in terms of basic logic elements is reported.
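The systolic approach mentioned in the abstract above can be illustrated with a minimal software model. This is not the paper's actual architecture; it is a generic output-stationary n x n systolic array sketch (square matrices assumed, function name illustrative): each processing element PE(i, j) keeps a running sum for C[i][j], rows of A flow rightward, columns of B flow downward, and inputs are skewed so matching operands meet at the right cycle.

```python
def systolic_matmul(A, B):
    """Cycle-by-cycle sketch of an n x n output-stationary systolic array.

    a_reg[i][j] / b_reg[i][j] model the operand held in PE(i, j) in the
    current cycle (0 = pipeline bubble). The full product takes 3n - 2
    cycles: the pipeline fills, computes, and drains.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    a_reg = [[0] * n for _ in range(n)]
    b_reg = [[0] * n for _ in range(n)]
    for t in range(3 * n - 2):
        # Shift operands one PE per cycle (A right, B down), injecting
        # skewed inputs at the array edges: row i of A enters column 0
        # delayed by i cycles, column i of B enters row 0 delayed by i.
        for i in range(n):
            for j in range(n - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]   # A moves right along row i
                b_reg[j][i] = b_reg[j - 1][i]   # B moves down along column i
            k = t - i
            a_reg[i][0] = A[i][k] if 0 <= k < n else 0
            b_reg[0][i] = B[k][i] if 0 <= k < n else 0
        # Every PE performs one multiply-accumulate per cycle; bubbles
        # contribute 0, so only matching A[i][k] * B[k][j] terms accumulate.
        for i in range(n):
            for j in range(n):
                C[i][j] += a_reg[i][j] * b_reg[i][j]
    return C
```

The payoff is parallelism: n² multiply-accumulates happen every cycle, so the latency is O(n) cycles instead of the O(n³) sequential operations of the triple-loop version.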
Article
Matrix multiplication is a basic operation that can be used in many DSP applications. Since raw matrix data cannot be fed into a Simulink Xilinx block directly, a new module needs to be designed to perform the matrix multiplication. The original method is straightforward but consumes considerable hardware resources. To reduce this consumption, we propose a new method to design the matrix multiplication module on the Simulink Xilinx platform, which is also implemented on a Spartan 3E FPGA (Field Programmable Gate Array). The main idea of the proposal is to reuse resources and input the data serially. In this way, the hardware cost can be dramatically decreased; however, more time is needed for the computation.
Article
Application Robustification, a promising approach for reducing processor power, converts applications into numerical optimization problems and solves them using gradient descent and conjugate gradient algorithms (1). The improvement in robustness, however, comes at the expense of performance when compared to the baseline non-iterative versions of these applications. To mitigate the performance loss from robustification, we present the design of a hardware accelerator and corresponding software support that accelerate gradient descent and conjugate gradient based iterative implementations of applications. Unlike traditional accelerators, our design accelerates different types of linear algebra operations found in many algorithms and is capable of efficiently handling sparse matrices that arise in applications such as graph matching. We show that the proposed accelerator can provide significant speedups for iterative versions of several applications and that for some applications, such as least squares, it can substantially improve the computation time as compared to the baseline non-iterative implementation.