Fig 1 - uploaded by Joao Andrade
Generic Tanner graph and iterative decoding using a thread-per-node parallel approach.

Source publication
Conference Paper
Full-text available
It is well known that LDPC decoding is computationally demanding and one of the hardest signal processing operations to parallelize. Beyond the data dependencies that restrict the decoding of a single word, it requires a large number of memory accesses. In this paper we propose parallel algorithms for performing on CPUs the most demanding case of irregular and lo...

Contexts in source publication

Context 1
... are usually represented by bipartite Tanner graphs [14], connecting Bit Nodes (BN) and Check Nodes (CN). The information received from the channel is propagated, processed and exchanged between neighboring nodes of the graph, as depicted by the arrows in figure 1. If at the end of an iteration the codeword does not verify all parity-check equations, a new iteration is launched until the maximum number of allowed iterations occurs or a valid codeword is reached. ...
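The stopping rule described in this excerpt (declare success as soon as every parity-check equation holds) amounts to checking that H·ĉ = 0 over GF(2). A minimal sketch, using a hypothetical toy 3×6 parity-check matrix that is not taken from the paper:

```python
# Hypothetical toy parity-check matrix: 3 check nodes, 6 bit nodes.
# Each 1-entry is an edge of the Tanner graph connecting a check node
# (row) to a bit node (column).
H = [[1, 1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1, 0],
     [1, 0, 1, 0, 0, 1]]

def is_valid_codeword(H, word):
    """A word is a valid codeword iff every parity-check equation
    (one per row of H) sums to 0 modulo 2."""
    return all(sum(h * b for h, b in zip(row, word)) % 2 == 0 for row in H)

print(is_valid_codeword(H, [0, 0, 0, 0, 0, 0]))  # True: all-zero word
print(is_valid_codeword(H, [1, 0, 0, 0, 0, 0]))  # False: checks 1 and 3 fail
```

In the decoder loop this test runs once per iteration on the hard-decision word, terminating early when it passes or when the iteration budget is exhausted.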
Context 2
... Min-Sum algorithm was adopted in this work to perform the decoding of computationally intensive long LDPC codes because it is less complex than the well-known Sum-Product algorithm [14]. As figure 1 indicates, the inputs of the decoder are log-likelihood ratios (LLRs), which represent the logarithm of the ratio of two complementary probabilities (LLR(x) = ln(p(x = 0)/p(x = 1))) at the input of the decoder [14]. ...
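A minimal, illustrative sketch of the two ingredients named above: the LLR definition, and the Min-Sum check-node update, which replaces the Sum-Product tanh-based kernel with a cheaper sign/minimum rule. The function names and toy values are assumptions for illustration, not taken from the paper:

```python
import math

def llr(p0):
    """LLR(x) = ln(p(x=0)/p(x=1)); positive values mean bit 0 is more likely."""
    return math.log(p0 / (1.0 - p0))

def min_sum_check_update(incoming):
    """Min-Sum CN update: the message sent back to each neighbor is the
    product of the signs of all *other* incoming messages, scaled by the
    minimum of their magnitudes (an approximation of the Sum-Product kernel)."""
    out = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        sign = 1
        for m in others:
            if m < 0:
                sign = -sign
        out.append(sign * min(abs(m) for m in others))
    return out

print(round(llr(0.9), 3))                      # 2.197: bit 0 strongly favored
print(min_sum_check_update([2.0, -1.5, 0.5]))  # [-0.5, 0.5, -1.5]
```

The min/sign rule is what makes Min-Sum attractive on parallel hardware: each outgoing message needs only comparisons and sign flips, with no transcendental functions.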

Similar publications

Article
Full-text available
When video is transmitted over 3G networks, the video quality might suffer from impairments caused by packet losses. Video quality feature extraction involves a set of algorithms, among which the inverse discrete cosine transform is an important one. To improve the performance and be suitable for evaluating 3G video quality in real-t...
Data
Full-text available
Although the use of iterative algorithms for image reconstruction in 3D Positron Emission Tomography (PET) has been shown to produce images with better quality than analytical methods, they are computationally expensive. New Graphics Processing Units (GPUs) provide high performance at low cost and programming tools that make it possible to execute paralle...
Article
Full-text available
Over the past few years, clusters equipped with GPUs have become attractive tools for high performance computing. In this thesis, we have designed parallel iterative algorithms for solving large sparse linear and nonlinear systems on GPU clusters. First, we have focused on solving sparse linear systems using the CG and GMRES iterative methods. The ex...

Citations

... Fountain codes first appeared for the distribution of bulk data in 1998 [33], and LDPC codes were later used in standards such as WiMAX IEEE 802.16 [34,35], 10GBase-T Ethernet IEEE 802.3 [36], Wi-Fi 802.11n [34] and the DVB-S2 Digital Video Broadcasting standard [37,38]. After such communication models, researchers dug deeper into the use of these LDPC codes in data storage systems. ...
Article
Full-text available
The need for highly scalable and reliable big data storage systems stems from the explosive growth of data everywhere, particularly data generated by social networking sites and IoT technology for various applications. Hence, most information technology and medical organizations, large industries, social networking companies, and government organizations (such as ISRO) require storage capacities of 100 PB (petabytes) of data. To store this kind of large data securely and efficiently, research on the application of erasure codes for both cloud storage and network or distributed storage systems has recently been considered. The traditional triple-replication method, which stores 3 copies of every file and requires an extra 200% storage overhead, is highly expensive. Erasure codes are considered for parallel storage systems as an alternative to traditional storage systems. This paper presents the various techniques of applying LDPC codes for big data storage and identifies the research gap in the application of LDPC codes for big data storage.
... The disadvantages of LDPC codes include higher encoding complexity, longer latency than turbo codes, and poorer performance compared to turbo codes when the code length is short [12]. LDPC codes have been adopted in several standards, including IEEE 802.16 (WiMAX) [27], IEEE 802.3 (10GBase-T Ethernet) [28] and DVB-S2 (satellite transmission of digital television) [29], [30]. The algorithm to decode LDPC codes is known under different names; the most common are the belief propagation algorithm, the message passing algorithm and the sum-product algorithm. ...
Article
Full-text available
Forward error detection and correction codes have been widely used for many years, both in storage applications and for data transferred over wireline or wireless communication systems. Due to unreliable wireless links, the broadcast nature of wireless transmissions, interference, noisy transmission channels, frequent topology changes, and varying wireless channel quality, it is challenging to provide high-data-rate service, high throughput, a high packet delivery ratio (PDR), low end-to-end delay and reliable services. To address these challenges, several channel coding schemes have been proposed. In this paper, a detailed overview of the major concepts in error detection and correction codes is presented. The paper covers the fundamentals of Low Density Parity Check (LDPC) codes and provides a comprehensive survey of binary and non-binary LDPC codes.
... However, [12], [42] demonstrated that turbo decoding is the most processor-intensive operation of base-station processing, requiring at least 64% of the processing resources used for receiving a message frame, where the remaining 36% includes the FFT, demapping, demodulation and other operations. Motivated by this, a number of previous research efforts [13], [14], [17], [18], [21], [38], [39], [43]-[45] have proposed GPGPU implementations dedicated to turbo decoding, as shown in Figure 3. Additionally, the authors of [27]-[30], [36], [40], [46], [47] have proposed GPGPU implementations of LDPC decoders. ...
Article
Full-text available
Turbo codes comprising a parallel concatenation of upper and lower convolutional codes are widely employed in state-of-the-art wireless communication standards, since they facilitate transmission throughputs that closely approach the channel capacity. However, this necessitates high processing throughputs in order for the turbo code to support real-time communications. In state-of-the-art turbo code implementations, the processing throughput is typically limited by the data dependencies that occur within the forward and backward recursions of the Log-BCJR algorithm, which is employed during turbo decoding. In contrast to the highly-serial Log-BCJR turbo decoder, we have recently proposed a novel Fully Parallel Turbo Decoder (FPTD) algorithm, which can eliminate the data dependencies and perform fully parallel processing. In this paper, we propose an optimized FPTD algorithm, which reformulates the operation of the FPTD algorithm so that the upper and lower decoders have identical operation, in order to support Single Instruction Multiple Data (SIMD) operation. This allows us to develop a novel General Purpose Graphics Processing Unit (GPGPU) implementation of the FPTD, which has application in Software-Defined Radios (SDRs) and virtualized Cloud-Radio Access Networks (C-RANs). As a benefit of its higher degree of parallelism, we show that our FPTD improves the processing throughput of the Log-BCJR turbo decoder by a factor of between 2.3 and 9.2, when employing a high-specification GPGPU. However, this is achieved at the cost of a moderate increase in overall complexity, by a factor of between 1.7 and 3.3.
... The memory footprint is not the most pressing issue in programmable architectures, as the memory addressing space of modern CPU and GPU systems can be larger than the Tanner graph indexing memory footprint (1) (2) (3). However, the indexing method becomes a source of memory contention if every computed message requires loading a memory index location, reducing the overall bandwidth to memory; it contributes to poorer cache hit ratios on CPUs [75] and adds further pressure to GPU memory engines [30]. The best-performing LDPC decoders are those employing structured sparse storage that exploits the Tanner graph structure, as opposed to a generic sparse matrix storage method. ...
... The best-performing LDPC decoders are those employing structured sparse storage that exploits the Tanner graph structure, as opposed to a generic sparse matrix storage method. For instance, LDPC decoders implementing the former methodology achieve much higher throughputs than those implementing the latter [30]. ...
... The decoder forcibly defined a 2-D texture mapping of the log-likelihood ratio (LLR) messages, which contributed to the poor performance yielded [31]. Under a more general-purpose computing memory mapping, the authors were able to elevate the decoding throughputs to ∼87 Mbit/s for the normal-frame DVB-S2 codes [30], and to ∼40 Mbit/s for rate-1/2 Mackay codes (1024 to 20000 bits) [32]. The difference in the attained performance shows how data-parallelism design decisions and Tanner graph indexing methods are pivotal to elevating the decoding throughputs attained by GPU-based LDPC decoders. ...
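The indexing trade-off discussed in these excerpts can be illustrated by contrasting a generic sparse (COO-style) edge list, which loads one index pair from memory per message, with a structure-aware layout for a hypothetical regular code, where neighbor addresses are computed rather than loaded. The matrices and connection pattern below are illustrative assumptions, not the schemes used in the cited decoders:

```python
# Generic sparse (COO-style) storage: one explicit (check, bit) pair per
# Tanner-graph edge. Every message computed requires fetching an index
# from memory -- the contention issue noted in the excerpt above.
coo_edges = [(0, 0), (0, 3), (1, 1), (1, 4), (2, 2), (2, 5)]

def bits_of_check_coo(edges, check):
    """Gather the bit-node neighbors of a check node by scanning indices."""
    return [b for c, b in edges if c == check]

# Structure-aware storage for a hypothetical regular code in which check
# node c is connected to bits c and c + N_CHECKS: neighbor addresses are
# computed on the fly, so no index array has to be loaded at all.
N_CHECKS = 3

def bits_of_check_structured(check):
    return [check, check + N_CHECKS]

# Both layouts describe the same Tanner graph.
for c in range(N_CHECKS):
    assert bits_of_check_coo(coo_edges, c) == bits_of_check_structured(c)
```

On a GPU the structured variant also yields predictable, coalesced access patterns, which is one reason structure-exploiting decoders outperform generic sparse-matrix ones.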
Article
Full-text available
Low-density parity-check (LDPC) block codes are popular forward error correction schemes due to their capacity-approaching characteristics. However, the realization of LDPC decoders that meet both low latency and high throughput is not a trivial challenge. Usually, this has been solved with ASIC and FPGA technology that enables meeting the decoder design constraints. But the rise of parallel architectures, such as graphics processing units, and the scaling of CPU streaming extensions, has shown that multi-core and many-core technology can provide a flexible alternative to the development of dedicated LDPC decoders for the compute-intensive prototyping phase of the design of new codes. In this light, this paper surveys the most relevant publications of the past decade on programmable LDPC decoders. It looks at the advantages and disadvantages of parallel architectures and data-parallel programming models, and assesses how the design space exploration is pursued regarding key characteristics of the underlying code and decoding algorithm features. The paper concludes with a set of open problems in the field of communication systems on parallel programmable and reconfigurable architectures.
... However, as the DVB-S2 codes have been specifically designed to enable high-speed and low-complexity implementation, decoding speeds can be very high. Reported decoding speeds are around 80-200 Mb/s on a GPU [20] and around 300 Mb/s on an ASIC [21]. Encoding speeds are much faster. ...
Article
Full-text available
This paper investigates the design of low-complexity error correction codes for the verification step in continuous variable quantum key distribution (CVQKD) systems. We design new coding schemes based on quasi-cyclic repeat-accumulate codes which demonstrate good performances for CVQKD reconciliation.
... Two application examples are Quantum Key Distribution (QKD), a cryptographic primitive that applies quantum mechanics for establishing secure communications, and video transmission systems that are capable of working over the erasure channel under severe Signal-to-Noise Ratio (SNR) conditions. Although we have witnessed the recent development of binary LDPC decoders on Graphics Processing Units (GPUs) [3, 4, 5, 6, 7, 8], the importance of the non-binary case still seems to be underestimated. In this paper, we propose a parallel decoder based on the Fast-Fourier Transform Sum-Product Algorithm (FFT-SPA) that exploits the multithread capabilities of the GPU to parallelize the intensive computation of Variable Nodes (VNs) and, more importantly, of Check Nodes (CNs). ...
Conference Paper
Full-text available
It is well known that non-binary LDPC codes outperform the BER performance of binary LDPC codes for the same code length. The superior BER performance of non-binary codes comes at the expense of more complex decoding algorithms that demand higher computational power. In this paper, we propose parallel signal processing algorithms for performing the FFT-SPA and the corresponding decoding of non-binary LDPC codes over GF(q). The constraints imposed by the complex nature of the associated subsystems and kernels, in particular the Check Nodes, present computational challenges regarding multicore systems. Experimental results obtained on GPU for a variety of GF(q) show throughputs on the order of 2 Mbps, which is far above the minimum throughput required, for example, for real-time video applications that can benefit from such error-correcting capabilities.
Thesis
Digital communication systems are ubiquitous in our daily lives. Evolving needs drive the research and development of innovative solutions for future communication systems. In satellite communications, most satellites use radio-frequency links to communicate with Earth. To limit bandwidth usage and increase data rates, digital communication over optical links is an attractive alternative. These technologies use lasers for transmission and telescopes for reception. However, the light energy is absorbed or deflected by particles in the Earth's atmosphere. These disturbances raise new problems, and new coding schemes must be devised to overcome them. LDPC codes are a family of error-correcting codes. Their performance close to the Shannon limit makes them very attractive for digital communication systems. They have notably been selected for the Wi-Fi standard and for 5G, enabling very high throughputs (several Gbit/s). They have also been adopted by the CCSDS and DVB-S2 standards for space applications. This thesis studies the hardware implementation of coding schemes for satellite digital communications over optical links. The first contribution is the study of a coding scheme for an optical downlink with soft-input channel decoding on the ground. As part of this study, a hardware architecture was developed that implements the decoding process on an FPGA and reaches an expected throughput of 10 Gbit/s. A second contribution concerns the optical uplink, involving a hard-input channel decoder embedded in a satellite.
The resulting constraints led to rethinking the extended Gallager B algorithm. This enabled the design of a new architecture that performs hard-input decoding efficiently while meeting the space constraints on hardware complexity, heat dissipation, and throughput (10 Gbit/s).
Conference Paper
Demodulation and decoding of second-generation terrestrial digital video broadcasting (DVB-T2) signals on general-purpose processor platforms is challenging in terms of both complexity and power. FPGA-based runtime acceleration for DVB-T2 allows unwrapping the iterative structures of modern channel decoding schemes by using parallel hardware designs. Additionally, due to the sequential nature of the DVB-T2 receiver chain, we can use partial reconfiguration to switch between different decoding modules. We show in a theoretical analysis that this time-multiplexing approach can be used to realize resource-efficient DVB-T2 receiver chains at much lower resource and power consumption compared to solely processor-based solutions.
Article
Because layered low-density parity-check (LDPC) decoding algorithm was proposed, one can exploit the diversity gain to achieve performance comparable to the traditional two-phase message passing (TPMP) decoding but with about twice faster decoding convergence compared to TPMP. In order to reduce the decoding time of layered LDPC decoder, a graphics processing unit (GPU) is exploited as the modem processor so that the decoding procedure can be processed in parallel using numerous threads in the GPU. In this paper, we present the parallel algorithms and efficient implementations on the GPU for two different layered message passing schemes, the row-layered and column-layered decoding. In the experiments, the quasicyclic LDPC codes for WiFi (802.11n) and WiMAX (802.16e) are decoded by the proposed layered LDPC decoders. The experimental results show that our decoder has good bit error ratio (BER) performance comparable to TPMP decoder. The peak throughput is 712 Mbps, which is about two orders of magnitude faster than that of CPU implementation and comparable to the dedicated hardware solutions. Compared to the existing fastest GPU-based implementation, the presented decoder can achieve a performance improvement of 2.3 times. Copyright © 2013 John Wiley & Sons, Ltd.