CCSDS 131.2-B-1 Serial Concatenated Convolutional Turbo Decoder Architecture for Efficient FPGA Implementation

MIGUEL ÁNGEL PÉREZ NARANJO¹ and VÍCTOR P. GIL JIMÉNEZ² (Senior Member, IEEE)
¹Department of Signal Theory and Communications, Universidad Carlos III de Madrid, 28911 Leganés, Madrid (e-mail: mpnaranjo@tsc.uc3m.es)
²Department of Signal Theory and Communications, Universidad Carlos III de Madrid, 28911 Leganés, Madrid (e-mail: vgil@ing.uc3m.es)
Corresponding author: Miguel Ángel Pérez Naranjo (e-mail: mpnaranjo@tsc.uc3m.es).
This work was partly funded by Project “IRENE” (PID2020-115323RB-C33) (MINECO/AEI/FEDER, UE) and project MFOC Madrid
Flight on Chip - Innovation Cooperative Projects Comunidad of Madrid - HUBS 2018/ Madrid Flight on Chip.
ABSTRACT Most turbo coding schemes adopted in standards are based on parallel concatenation, so architectures for their efficient implementation are common in the literature. Serial turbo decoders, however, are far less common. A serial scheme is used in the CCSDS 131.2-B-1 standard, which has recently attracted considerable attention because of its high performance in satellite communications. In this paper, an efficient architecture for the decoder is proposed and analyzed. The aim is an architecture that can be modeled in a circuit description language (such as VHDL or Verilog) and easily implemented on a Field Programmable Gate Array (FPGA). The architecture is described in detail, explaining the encoding operations performed at the transmitter and how they are undone at the receiver. The proposed design uses independent components to divide the tasks and form a pipeline that improves efficiency. Results of simulating and implementing the proposed architecture on a Xilinx Zynq UltraScale+ RFSoC ZCU28DR board with an XCZU28DR-2FFVG1517E RFSoC are shown. They demonstrate that the hardware operations give results equivalent to the software simulation and do not consume board resources as aggressively as turbo decoders usually do.
INDEX TERMS DSP, Coding, FEC, serial Turbodecoder, Pipeline, Convolutional, BCJR, VHDL,
Synthesis, Implementation, Hardware, FPGA, CCSDS.
I. INTRODUCTION
Forward Error Correction (FEC) is nowadays a mandatory technique in the design of high-performance communications systems, as it makes it possible to detect and correct errors in the transmitted information and to approach the well-known Shannon limit [1].
Among all channel coding techniques, turbo encoding/decoding [2] is one of the most promising strategies for improving performance [3], although at the cost of increased complexity. For this reason, many standards use turbo encoding/decoding [4]–[6]. The idea behind turbo encoding is to concatenate two simple convolutional encoders, preferably based on a recursive systematic code (RSC) model, and to improve the error correction capability by means of an iterative algorithm. Very low error rates are thus achieved without a large number of shift registers in the encoder architecture, which would exponentially increase the complexity of the decoding process [3].
The origins of turbo decoder technology date back to the late 1980s. In 1989, Alain Glavieux proposed a modification of the Viterbi algorithm called the Soft-Output Viterbi Algorithm (SOVA) [7], [8], which produced soft outputs after decoding a single convolutional code. This led to the observation that working with soft-input soft-output (SISO) decoders [9]–[11] improved the signal-to-noise ratio (SNR). When the next stages of the general turbo decoder structure were developed, the concatenation of the encoders made SOVA unfeasible for decoding, and the BCJR algorithm [12], [13], also known as the forward-backward algorithm, which follows the Maximum a Posteriori (MAP) criterion, began to be used. It was created in 1974 by Bahl, Cocke, Jelinek, and Raviv, and was adapted and improved in 1993.
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3235966
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The first commercial use of turbo codes occurred in 1997 with Inmarsat's M4 multimedia satellite service. This new service used Turbo4 circuits [14] (the successor of the CAS 5093) with 16-QAM modulation, enabling the user to communicate with an Inmarsat-3 spot-beam satellite from a terminal at 64 kbit/s. The narrowband technology, based on 16-QAM constellation mapping and turbo coding, significantly reduces (> 50 %) the bandwidth required for mobile satellite channels while improving the satellite power efficiency [15]. This marked the beginning of turbo codes in commercial applications, which demanded considerable effort from several teams in hardware implementation [16], [17] and in adapting communications standards to these technologies so that they could be realised in commercial electronics [18], [19]. As hardware technology evolved and the resources available in the target devices grew, it became possible to implement more complex turbo decoder variants, driven by the demand for increasingly higher transmission rates and by decoding operations applied to much longer arrays of symbols, as is the case in the IEEE 802.16 family of wireless communications standards [20] or in HomePlug AV, which became an IEEE standard in 2010 [21]. Turbo decoding has also been a key player in more recent technologies such as faster-than-Nyquist (FTN) signaling [22]–[24] and coherent decoding [25], [26].
For satellite communications, the performance of the turbo decoder combined with Frequency-Hopped Spread Spectrum (FH-SS) has been analysed in [27] and compared with the combined performance of several dynamic power allocation algorithms. There, the classical structure is modified to develop a new iterative algorithm for channel variance and carrier phase estimation (side information), which was shown to outperform the case where no side information is available [28]. Turbo trellis coded modulation in conjunction with continuous phase modulation has also been used in a frequency-hopping packet radio structure to further reduce the error probability [29]. Also of great interest is the high throughput achievable with error correction algorithms based on newer techniques such as Turbo-Hadamard coding [30]. However, none of these algorithms has yet been validated in a direct hardware implementation; that is, they remain complex to transcribe directly into a circuit description language, either VHDL or Verilog, without resorting to other elements, such as a microprocessor or a Universal Software Radio Peripheral (USRP), that allow certain parts of the algorithm to be implemented in software. For this reason, the usual practice when using a turbo decoder scheme is to avoid adding blocks beyond those of the classical schemes [31]: the RSCs, an interleaver and, in some cases, a puncturing block to achieve the rates recommended in the standards.
This leads to the specifications recommended by the Consultative Committee for Space Data Systems (CCSDS) in [32], where optimal combinations of coding rates and frame lengths are pursued to use bandwidth efficiently and maximise spectral efficiency.
The aim of this work is to present a valid and efficient architecture for decoding the Serial Concatenated Convolutional Code (SCCC) block, described as a Serial Concatenated Convolutional (SCC) Turbo Coding Scheme, proposed in the aforementioned standard and described in [33], showing the pipelined components of the complete decoder and the connections between them. In addition, detailed simulation results of the VHDL code are shown, corresponding to the waveforms that would be obtained from the proposed circuit. These validate the likelihoods computed in each iteration and the final bits obtained after the hard decision process, and are compared with a software simulation in MATLAB. Synthesis and implementation figures are also reported for the evaluation board, in this case a Xilinx Zynq UltraScale+ RFSoC ZCU28DR with an XCZU28DR-2FFVG1517E RFSoC, to evaluate its resource usage and the efficiency of the proposed architecture.
In other words, this paper presents a novel architecture, not explored in the literature, that provides a decoding scheme customized to [32] and optimized for electronics to facilitate its implementation in real systems. The main contributions of this paper are summarized as follows:
• A complete description of the proposed architecture, including each component with its block diagram and the connections and signals between components for their subsequent hardware implementation. In addition, the optimized algorithm for decoding each RSC individually is specified.
• A software simulation comparing the theoretical and optimized versions of the algorithm, showing that with an appropriate number of iterations only 0.5 dB of $E_b/N_0$ is lost between the two versions, the latter being much simpler to implement in hardware and more efficient in terms of FPGA resource consumption.
• An exhaustive comparison of the software simulation results with the hardware simulation, to demonstrate that the values are the same and to verify that the architecture works correctly.
• The synthesis results of the proposed architecture, to evaluate the resources it consumes on the evaluation board.
• The results of implementing the synthesised circuit, to evaluate its timing and temperature performance and certify that it can be realised on the FPGA.
This paper is organized as follows: Section II describes the SCC Turbo Coding block of the reference standard, gives an overview of the BCJR algorithm for a single RSC and describes methods for simplifying this algorithm, which make it possible to perform such signal processing efficiently in hardware; Section III presents the proposed turbo decoder architecture, detailing its pipelined structure, each of its components and how the flow is controlled to avoid information mismatches in the memory blocks, so that the design fits generically the different lengths proposed for the input and output frames; Section IV presents and discusses the hardware simulation, synthesis and implementation results; conclusions are presented in Section V.
FIGURE 1. Block Diagram of the SCC Turbo Coding Scheme in the transmitter. Extracted from [32].
II. SYSTEM MODEL
A. SCC TURBO CODING SCHEME DESCRIPTION
The classic turbo encoder architecture is based on two RSCs concatenated through a memory called the interleaver, which scrambles the input data to avoid bursts of consecutive encoding errors. Most commonly the encoders are arranged in parallel, i.e. the original input and the interleaved input are encoded at the same time. Alternatively, the RSCs can be placed in series with the interleaver in between, so that the interleaver permutes the already encoded information and therefore needs a larger memory than in the parallel case. Each variant has its advantages and disadvantages, which are discussed in [34], but one important advantage of the serial configuration over the parallel model is that both the data and the parity bits can exploit the extrinsic information. This advantage is one of the main reasons why the reference standard of this work adopts a modification of the serial turbo scheme, the above-mentioned SCCC configuration.
The SCCC is intended mainly for high data rate applications. The Forward Error Correction (FEC) scheme is based on the concatenation of two simple four-state encoder structures. The SCCC scheme implies a Physical Layer frame of constant length, with pilots inserted in fixed positions. This architecture simplifies the synchronization procedure, allowing fast and efficient acquisition at very high rates in the receiver [32]. The following paragraphs describe the blocks that make up the complete encoder. As explained above, the turbo encoding proposed in this standard is serial, which requires a completely different architecture at the decoder side. As can be seen in Fig. 1, the puncturing operation after CC1 removes only one bit out of every 4, and that bit belongs to the redundancy, so the systematic bits are always transmitted.
Fig. 1 shows the block diagram of the complete encoder. It consists of two convolutional encoders, the outer one referred to as CC1 and the inner one as CC2; a fixed puncturing block; an interleaver block; and a de-multiplexer that splits the systematic and parity bits produced by encoder CC2. These bits are finally reorganised to enter another interleaver, named the Row-Column interleaver, with its own puncturing patterns that affect the systematic and parity information separately, again preventing possible concatenation errors and making the system more robust. It should be emphasised that the puncturing pattern, the interleaving pattern and the additional puncturing patterns of the Row-Column interleaver depend on the mode of operation selected from those available in the reference standard. These modes of operation also determine the lengths of the frames exchanged between the blocks of the SCC Turbo Coding Scheme. Although explaining in detail how the individual blocks work is outside the scope of this paper, some additional information about the encoders and the Row-Column interleaver is necessary to facilitate the understanding of the architecture proposed for decoding the complete structure.
Both encoders have the same features, with coding rate 1/2 and 4 possible states in the encoding process. Fig. 2 shows the architecture of this type of encoder, where the boxes with a D inside represent shift registers and the circles enclosing the '+' symbol represent the logical operation "Exclusive OR". The registers are initialised to '0', and encoding takes place with the switch in the upper position. Once the frame is encoded, the switch moves to the lower position for the final two bit times to receive feedback from the registers. This feedback cancels the same feedback sent (unswitched) to the leftmost exclusive OR gate and causes both registers to be filled with zeros after the final two bit times. In this way a terminated trellis is obtained, which simplifies the decoding process.
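The encoding and termination procedure just described can be sketched in software. The following is a minimal illustration, not the encoder of the standard: the text does not state the generator polynomials, so the feedback polynomial 1 + D + D^2 and the feedforward polynomial 1 + D^2 used here are assumptions, and the function name `rsc_encode_terminated` is hypothetical.

```python
from typing import List, Tuple


def rsc_encode_terminated(bits: List[int]) -> Tuple[List[int], List[int], Tuple[int, int]]:
    """Encode `bits` with a 4-state, rate-1/2 RSC encoder and terminate the trellis.

    ASSUMED polynomials (not given in the text): feedback 1 + D + D^2,
    feedforward 1 + D^2. Returns (systematic bits, parity bits, final state).
    """
    s1, s2 = 0, 0                      # the two shift registers, initialised to '0'
    systematic, parity = [], []
    for step in range(len(bits) + 2):  # frame plus the final two bit times
        feedback = s1 ^ s2
        if step < len(bits):
            u = bits[step]             # switch in upper position: information bit
        else:
            u = feedback               # switch in lower position: register feedback
        a = u ^ feedback               # bit entering the first register (0 while terminating)
        systematic.append(u)
        parity.append(a ^ s2)
        s1, s2 = a, s1                 # shift the registers
    return systematic, parity, (s1, s2)


sys_bits, par_bits, final_state = rsc_encode_terminated([1, 0, 1, 1, 0, 1])
```

During the last two steps the input equals the feedback, so the value entering the first register is always zero and both registers end up cleared, which is the terminated trellis mentioned above.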
FIGURE 2. Convolutional Encoder Block Diagram for CC1 and CC2. Extracted from [32].
The goal of the Row-Column interleaver is to reorganise the information so that the output consists first of all the systematic bits, with a certain puncturing pattern applied to them, followed by all the parity bits of CC2, which have also undergone an additional puncturing operation. Fig. 3 presents a graphical representation of the reverse action as implemented in the receiver, in a block called the Row-Column de-interleaver. It can be seen how the systematic and parity bits of CC2, the orange and blue positions in Fig. 3 respectively, are interleaved, and how after passing through the block all the systematic bits are grouped together first, followed by all the parity bits. At this point, the block also applies independent puncturing patterns to the systematic and parity information to restore the positions that were deleted in the transmitter when the original puncturing was applied, shown as black filled positions in Fig. 3. The number of systematic and parity bits depends on the transmission mode.
FIGURE 3. Schematic representation of the action performed by the Row-Column de-interleaver block.
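The restoration of punctured positions can be illustrated with a short sketch. It assumes that the soft values have already been grouped (systematic part first, parity part second), that the puncturing patterns are periodic masks, and that a deleted position is refilled with a neutral soft value of 0. None of these details is specified in the text, and the patterns and function names below are hypothetical, not those of the standard.

```python
from typing import List, Sequence, Tuple


def depuncture(kept: Sequence[float], pattern: Sequence[int], length: int,
               erasure: float = 0.0) -> List[float]:
    """Rebuild `length` soft values, inserting `erasure` where `pattern` is 0."""
    values = iter(kept)
    return [next(values) if pattern[k % len(pattern)] else erasure
            for k in range(length)]


def split_and_depuncture(frame: Sequence[float], n_sys_kept: int,
                         sys_pattern: Sequence[int], par_pattern: Sequence[int],
                         n_sys: int, n_par: int) -> Tuple[List[float], List[float]]:
    """Split a grouped frame and restore both parts with independent patterns."""
    systematic = depuncture(frame[:n_sys_kept], sys_pattern, n_sys)
    parity = depuncture(frame[n_sys_kept:], par_pattern, n_par)
    return systematic, parity


# Hypothetical example: no systematic puncturing, every second parity value deleted.
frame = [0.9, -1.1, 0.4, 2.0, -0.3, 0.7]
systematic, parity = split_and_depuncture(frame, 4, [1, 1], [1, 0], 4, 4)
```

A neutral value is a natural choice for a deleted position because it carries no preference for either bit value, but the value actually used by the hardware is not stated in this section.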
B. BCJR ALGORITHM OVERVIEW
As mentioned in the introduction, the algorithm of a turbo decoder design should be selected as a trade-off between decoding performance and implementation complexity. The BCJR algorithm works with soft inputs and soft outputs, which allows it to obtain more accurate results than applying a hard decision to the output. In addition, a soft output makes the structure more flexible for developing an iterative architecture.
Let us consider an input sequence $\mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \ldots \mathbf{x}_N$ of $N$ $n$-bit symbols, and let $u_i$ be a binary random variable with possible values $\{0, 1\}$ that represents the information (message) input bit associated with symbol $\mathbf{x}_i$. From now on, a '1' bit is mapped to $+1$ and a '0' bit to $-1$. Thus, taking into account the a priori probability $P(u_i)$ of the input bit, we define the log-likelihood ratio (LLR)
$$L(u_i) = \log\frac{P(u_i = +1)}{P(u_i = -1)}, \tag{1}$$
which is zero at the beginning of the algorithm. The reason is that no prior information is available and the input bits are assumed to be i.i.d., so $P(u_i = +1) = P(u_i = -1) = 1/2$.
The BCJR algorithm needs certain information to estimate the correct bit. This information is the sequence of received symbols, denoted as $\mathbf{y}$, for which the algorithm computes the a posteriori LLR
$$L(u_i|\mathbf{y}) = \log\frac{P(u_i = +1|\mathbf{y})}{P(u_i = -1|\mathbf{y})}. \tag{2}$$
Consider the transition from the previous state $\psi'$ to the current state $\psi$, and define two sets, $U_1$ and $U_0$, containing the transitions from state $S_{i-1} = \psi'$ to state $S_i = \psi$ caused by $u_i = +1$ and $u_i = -1$, respectively, with $i = 1, 2, \ldots, N$. The a posteriori LLR can then be expressed as
$$L(u_i|\mathbf{y}) = \log\frac{P(u_i = +1|\mathbf{y})}{P(u_i = -1|\mathbf{y})} = \log\frac{\sum_{U_1} P(\psi', \psi|\mathbf{y})}{\sum_{U_0} P(\psi', \psi|\mathbf{y})}. \tag{3}$$
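Since the transitions are mutually exclusive, applying Bayes' theorem to (3) finally gives
$$L(u_i|\mathbf{y}) = \log\frac{\sum_{U_1} P(\psi', \psi, \mathbf{y})/P(\mathbf{y})}{\sum_{U_0} P(\psi', \psi, \mathbf{y})/P(\mathbf{y})} = \log\frac{\sum_{U_1} P(\psi', \psi, \mathbf{y})}{\sum_{U_0} P(\psi', \psi, \mathbf{y})}. \tag{4}$$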
Equation (4) involves the joint probability of receiving the sequence $\mathbf{y}$ and being in state $\psi'$ at time $i-1$ and in state $\psi$ at the current time $i$. The final expression for the LLR is a ratio of two sums of such joint probabilities: the numerator covers the set of transitions caused by $u_i = +1$ and the denominator the set caused by $u_i = -1$. In turn, this joint probability can be factored into the product of three probabilities associated with different time intervals, reflecting the temporal character of the trellis diagram of a convolutional encoder. Assuming that we are at the $i$-th position of the trellis, these probabilities refer to the past, present and future of that position in the diagram. Applying Bayes' theorem again and assuming a memoryless channel, where the current symbol does not depend on past information, the joint probability decomposes into the three temporal subsequences as follows
$$\begin{aligned}
P(\psi', \psi, \mathbf{y}) &= P(\mathbf{y}_{>i}|\psi', \psi, \mathbf{y}_{<i}, \mathbf{y}_i)\,P(\psi', \psi, \mathbf{y}_{<i}, \mathbf{y}_i) \\
&= P(\mathbf{y}_{>i}|\psi)\,P(\psi', \psi, \mathbf{y}_{<i}, \mathbf{y}_i) \\
&= P(\mathbf{y}_{>i}|\psi)\,P(\mathbf{y}_i, \psi|\psi', \mathbf{y}_{<i})\,P(\psi', \mathbf{y}_{<i}) \\
&= P(\mathbf{y}_{>i}|\psi)\,P(\mathbf{y}_i, \psi|\psi')\,P(\psi', \mathbf{y}_{<i}) \\
&= \beta_i(\psi)\,\gamma_i(\psi', \psi)\,\alpha_{i-1}(\psi').
\end{aligned} \tag{5}$$
According to (5), these time subsequences are described by conditional and joint probabilities: $\beta_i(\psi) = P(\mathbf{y}_{>i}|\psi)$ is the conditional probability that, given that the current state is $\psi$, the future sequence will be $\mathbf{y}_{>i}$; $\gamma_i(\psi', \psi) = P(\mathbf{y}_i, \psi|\psi')$ is the probability that the next state is $\psi$ and the received symbol is $\mathbf{y}_i$, given that the previous state is $\psi'$; and $\alpha_{i-1}(\psi') = P(\psi', \mathbf{y}_{<i})$ is the joint probability that the state at time $i-1$ is $\psi'$ and the sequence received until then is $\mathbf{y}_{<i}$. In terms of the trellis diagram, $\beta_i(\psi)$ refers to the future transitions, after the $i$-th instant, and is called the backward metric; $\gamma_i(\psi', \psi)$ refers to the current transition and is called the branch metric; and $\alpha_{i-1}(\psi')$ refers to the previous transitions, before the $i$-th instant, and is called the forward metric. Finally, inserting (5) into (4) gives the final LLR expression that will be implemented in the FPGA to calculate the a posteriori probabilities of each decoder of the proposed architecture
$$L(u_i|\mathbf{y}) = \log\frac{\sum_{U_1} \beta_i(\psi)\,\gamma_i(\psi', \psi)\,\alpha_{i-1}(\psi')}{\sum_{U_0} \beta_i(\psi)\,\gamma_i(\psi', \psi)\,\alpha_{i-1}(\psi')}. \tag{6}$$
The derivation of an approximate closed-form expression for each of the time subsequences is shown below, to support the simplification discussed in the next subsection, where the final expressions to be implemented in VHDL are given.
1) Branch metric computation
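Applying the definition of conditional probability, the branch metric can be written as
$$\begin{aligned}
\gamma_i(\psi', \psi) &= P(\mathbf{y}_i, \psi|\psi') \\
&= P(\mathbf{y}_i|\psi', \psi)\,P(\psi|\psi') \\
&= P(\mathbf{y}_i|\psi', \psi)\,P(u_i).
\end{aligned} \tag{7}$$
Regarding the first factor, $P(\mathbf{y}_i|\psi', \psi)$, it should be noted that the joint occurrence of the consecutive states $S_{i-1} = \psi'$ and $S_i = \psi$ is equivalent to the occurrence of the corresponding coded symbol $\mathbf{x}_i$ in the transmitter, so $P(\mathbf{y}_i|\psi', \psi) = P(\mathbf{y}_i|\mathbf{x}_i)$. Substituting this in (7) we obtain
$$\gamma_i(\psi', \psi) = P(\mathbf{y}_i|\mathbf{x}_i)\,P(u_i). \tag{8}$$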
To obtain an expression for $P(\mathbf{y}_i|\mathbf{x}_i)$, note that in a memoryless channel successive transmissions are statistically independent, so the probability factors over the $n$ bits of the symbol and can be written as
$$P(\mathbf{y}_i|\mathbf{x}_i) = \prod_{m=1}^{n} P(y_{im}|x_{im}). \tag{9}$$
In this work, to facilitate the hardware implementation, the channel is approximated as an Additive White Gaussian Noise (AWGN) channel, for which [35] shows that (9) becomes
$$P(\mathbf{y}_i|\mathbf{x}_i) = C_i^{(0)} \exp\left\{2FR\frac{E_b}{N_0}(\mathbf{x}_i \cdot \mathbf{y}_i)\right\}, \tag{10}$$
where $C_i^{(0)}$ is a constant computed from channel characteristics such as fading or measured from the received sequence, $F$ is the channel fading amplitude, $R$ is the coding rate and $E_b/N_0$ is the energy-per-bit to noise ratio of the system. The calculation of $P(u_i)$ is much simpler, since by defining
$$P(u_i = \pm 1) = \frac{\exp\{u_i L(u_i)\}}{1 + \exp\{u_i L(u_i)\}}, \tag{11}$$
and substituting this value in (1), we obtain, after regrouping the terms,
$$P(u_i) = C_i^{(1)} \exp\left\{\frac{u_i L(u_i)}{2}\right\}. \tag{12}$$
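The regrouping that leads from (11) to (12) can be written out explicitly. The following is a sketch of the algebra, assuming that the constant $C_i^{(1)}$ collects every factor that does not depend on the sign of $u_i$; it uses only $u_i = \pm 1$, for which $e^{u_i L/2} + e^{-u_i L/2} = e^{L/2} + e^{-L/2}$.

```latex
\begin{aligned}
P(u_i = \pm 1)
  &= \frac{e^{u_i L(u_i)}}{1 + e^{u_i L(u_i)}}
   = \frac{e^{u_i L(u_i)/2}}{e^{-u_i L(u_i)/2} + e^{u_i L(u_i)/2}} \\
  &= \frac{e^{u_i L(u_i)/2}}{e^{L(u_i)/2}\left(1 + e^{-L(u_i)}\right)}
   = \underbrace{\frac{e^{-L(u_i)/2}}{1 + e^{-L(u_i)}}}_{C_i^{(1)}}\;
     e^{u_i L(u_i)/2}.
\end{aligned}
```

As a check, $u_i = +1$ gives $1/(1 + e^{-L(u_i)})$ and $u_i = -1$ gives $e^{-L(u_i)}/(1 + e^{-L(u_i)})$, whose ratio is $e^{L(u_i)}$, in agreement with (1).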
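Finally, substituting (10) and (12) into (8), we arrive at the final expression of the branch metrics
$$\begin{aligned}
\gamma_i(\psi', \psi) &= C_i^{(0)} C_i^{(1)} \exp\left\{2FR\frac{E_b}{N_0}(\mathbf{x}_i \cdot \mathbf{y}_i) + \frac{u_i L(u_i)}{2}\right\} \\
&= C_i \exp\left\{FR\frac{E_b}{N_0}(\mathbf{x}_i \cdot \mathbf{y}_i) + u_i L(u_i)\right\}.
\end{aligned} \tag{13}$$
For other channel models the metric can be adapted accordingly, or even used as it is in (13) at the cost of a slight degradation in performance.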
2) Forward and Backward metric computation
Let us now define the expressions to be implemented for the past and future transitions with respect to the $i$-th position. The subsequence corresponding to the past was previously defined as
$$\alpha_{i-1}(\psi') = P(\psi', \mathbf{y}_{<i}) \;\Rightarrow\; \alpha_i(\psi) = P(\psi, \mathbf{y}_{<i}, \mathbf{y}_i), \tag{14}$$
and, applying definitions of probability theory [36] and assuming again a memoryless channel, the forward metric reduces to the recursive calculation defined below
$$\begin{aligned}
\alpha_i(\psi) &= P(\psi, \mathbf{y}_{<i}, \mathbf{y}_i) \\
&= \sum_{\psi'} P(\psi', \psi, \mathbf{y}_{<i}, \mathbf{y}_i) \\
&= \sum_{\psi'} P(\psi, \mathbf{y}_i|\psi')\,P(\psi', \mathbf{y}_{<i}) \\
&= \sum_{\psi'} \gamma_i(\psi', \psi)\,\alpha_{i-1}(\psi').
\end{aligned} \tag{15}$$
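It is worth noting that the memoryless channel constraint is not mandatory for good performance; it is assumed only to ease the explanation. For the subsequences corresponding to the future, the process is analogous. From the probabilistic definition
$$\beta_i(\psi) = P(\mathbf{y}_{>i}|\psi) \;\Rightarrow\; \beta_{i-1}(\psi') = P(\mathbf{y}_{>i-1}|\psi'), \tag{16}$$
and following the constraints of the previous case, the recursive expression for calculating the future transitions is defined as follows
$$\begin{aligned}
\beta_{i-1}(\psi') &= P(\mathbf{y}_{>i-1}|\psi') \\
&= \sum_{\psi} P(\psi, \mathbf{y}_i, \mathbf{y}_{>i}|\psi') \\
&= \sum_{\psi} P(\mathbf{y}_{>i}|\psi', \psi, \mathbf{y}_i)\,P(\psi, \mathbf{y}_i|\psi') \\
&= \sum_{\psi} P(\mathbf{y}_{>i}|\psi)\,P(\psi, \mathbf{y}_i|\psi') \\
&= \sum_{\psi} \beta_i(\psi)\,\gamma_i(\psi', \psi).
\end{aligned} \tag{17}$$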
Since (15) and (17) are recursive expressions, an initial value must be defined, at $i = 0$ for the forward metrics and at $i = N$ for the backward metrics. In both cases, the initial value is
$$\alpha_0(\psi) = \begin{cases} 1 & \text{if } \psi = 0, \\ 0 & \text{if } \psi \neq 0. \end{cases} \tag{18}$$
$$\beta_N(\psi) = \begin{cases} 1 & \text{if } \psi = 0, \\ 0 & \text{if } \psi \neq 0. \end{cases} \tag{19}$$
However, as discussed in [37], all these expressions are
too expensive to be implemented in hardware, and therefore
numerical methods have been developed that are able to
present similar results to those achieved with the previous
expressions but with lower computational cost.
C. BCJR ALGORITHM SIMPLIFICATION METHODS
To implement the BCJR-based convolutional decoder for an RSC, it is necessary to apply (13) first and then, recursively and preferably with a parallel structure, (15) and (17), initialised with (18) and (19).
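To achieve this efficiently, a method based on natural logarithms is proposed in [38], which transforms the expressions above into
$$\begin{aligned}
L^{(\gamma)}_i(\psi', \psi) &= \log \gamma_i(\psi', \psi) \\
&= C_i + FR\frac{E_b}{N_0}(\mathbf{x}_i \cdot \mathbf{y}_i) + u_i L(u_i).
\end{aligned} \tag{20}$$
$$\begin{aligned}
L^{(\alpha)}_i(\psi) &= \log \alpha_i(\psi) \\
&= \max_{[\psi']}\left[L^{(\alpha)}_{i-1}(\psi') + L^{(\gamma)}_i(\psi', \psi)\right].
\end{aligned} \tag{21}$$
$$\begin{aligned}
L^{(\beta)}_{i-1}(\psi') &= \log \beta_{i-1}(\psi') \\
&= \max_{[\psi]}\left[L^{(\beta)}_i(\psi) + L^{(\gamma)}_{N-i}(\psi', \psi)\right].
\end{aligned} \tag{22}$$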
From (20), (21) and (22) it can be seen that the greatest advantage is that the multiplications have been transformed into sums evaluated by the $\max_{[S_i]}(\cdot)$ function, which is simply the ordinary maximum function evaluated over the current trellis state, where the highest value is sought. It is simple to implement in VHDL'93 and is already predefined if VHDL'08 is used. This simplification is known as the max-log-MAP algorithm. Since logarithms are applied to the recursive expressions, they must also be applied to (18) and (19), obtaining
$$L^{(\alpha)}_0(\psi) = \begin{cases} 0 & \text{if } \psi = 0, \\ -\infty & \text{otherwise.} \end{cases} \tag{23}$$
$$L^{(\beta)}_N(\psi) = \begin{cases} 0 & \text{if } \psi = 0, \\ -\infty & \text{otherwise.} \end{cases} \tag{24}$$
The drawback of this approximation is that the numerical values are slightly worse than those obtained with the original expressions. However, the degradation in performance is small, and in practice this suboptimal solution is perfectly viable thanks to its simplicity of implementation and computational efficiency: in hardware, the original expressions would use auxiliary Digital Signal Processors (DSPs), which this implementation replaces with adders that are much less expensive and faster, leaving the DSPs free for other tasks of the receiver.
The original BCJR algorithm also suffers from numerical instability in (15) and (17), since these expressions must be normalised at each time $i$ to avoid overflow. This normalisation amounts to a division by previous results that must be stored for the subsequent iteration because of the recursion, and therefore entails extra memory and auxiliary DSPs in hardware. The max-log-MAP algorithm solves this problem and does not require such normalisation.
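The range problem of the unnormalised recursion can be reproduced with a few lines of floating-point code. This is only an illustration under stated assumptions: the branch probabilities are random hypothetical values rather than real channel metrics, all state transitions are allowed, and the effect appears here as underflow in double precision, whereas a fixed-point implementation would lose range in its own way.

```python
import numpy as np

rng = np.random.default_rng(0)
N, S = 2000, 4                                    # frame length and number of states
gamma = rng.uniform(1e-4, 1e-3, size=(N, S, S))   # hypothetical gamma_i(psi', psi)


def forward_probability(gamma: np.ndarray) -> np.ndarray:
    """Recursion (15) in the probability domain, without normalisation."""
    alpha = np.zeros(gamma.shape[1])
    alpha[0] = 1.0                                # initial condition (18)
    for g in gamma:
        alpha = alpha @ g                         # sum over psi' of alpha(psi') * gamma(psi', psi)
    return alpha


def forward_max_log(log_gamma: np.ndarray) -> np.ndarray:
    """Recursion (21): products become sums and the sum over psi' becomes a max."""
    log_alpha = np.full(log_gamma.shape[1], -np.inf)
    log_alpha[0] = 0.0                            # initial condition (23)
    for g in log_gamma:
        log_alpha = np.max(log_alpha[:, None] + g, axis=0)
    return log_alpha


alpha = forward_probability(gamma)                # collapses to zero for long frames
log_alpha = forward_max_log(np.log(gamma))        # stays finite
```

In this sketch the probability-domain metrics vanish long before the end of the frame, while the log-domain metrics remain finite values of moderate magnitude, which is why no per-step division is needed.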
Algorithm 1: Max-log-MAP algorithm steps
Data: $u_i$, $L(u_i)$, $i = 0$
Result: $L(u_i|\mathbf{y}) \in \mathbb{R}$
/* Branch metrics computation */
while $i \neq N$ do
    Compute (20) for each input $u_i$ and $L(u_i)$;
    i++;
i = 0;
/* Forward and backward metrics computation */
while $i \neq N$ do
    Compute (21) with $L^{(\gamma)}_i(\psi', \psi)$;
    Compute (22) with $L^{(\gamma)}_{N-i}(\psi', \psi)$;
    i++;
i = 0;
/* A posteriori probability computation */
while $i \neq N$ do
    Compute (26) with $L^{(\alpha)}_i(\psi')$, $L^{(\gamma)}_i(\psi', \psi)$ and $L^{(\beta)}_{N-i}(\psi)$;
    i++;
i = 0;
It is important to note that in [39] another simplification
method is proposed that gives the same numerical results as
the original solution but with less computational expense,
using the max* function, based on the Jacobian logarithm
operation and defined as
\[ \max{}^{*}(\theta, \phi) = \max(\theta, \phi) + \log\left(1 + \exp\{-|\theta - \phi|\}\right). \tag{25} \]
This variant is known as the log-MAP algorithm. Although it gives the exact results of the original BCJR algorithm, it requires external auxiliary modules based on Look-Up Tables (LUTs), which would consume more board resources and add clock-cycle delays. Integrating such a block with the rest of the structure would also complicate the complete turbo decoder and the receiver itself, since the latter already uses counter-based LUTs to synchronize the transfer of information from one block to another. For these reasons we have preferred the max-log-MAP version, which is simpler to implement and allows the calculations to be performed in the same clock cycle.
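The trade-off between the two variants can be illustrated numerically. The following Python sketch compares the max* operator of (25) with the plain max used by max-log-MAP; the function names are illustrative and are not taken from the VHDL design.

```python
import math

def max_star(theta, phi):
    """Jacobian logarithm of (25): exact value of log(exp(theta) + exp(phi))."""
    return max(theta, phi) + math.log1p(math.exp(-abs(theta - phi)))

def max_log(theta, phi):
    """max-log-MAP approximation: the correction term of (25) is dropped."""
    return max(theta, phi)

for theta, phi in [(0.0, 0.0), (1.0, 0.5), (4.0, -4.0)]:
    exact = max_star(theta, phi)
    approx = max_log(theta, phi)
    print(f"theta={theta:+.1f} phi={phi:+.1f} "
          f"exact={exact:.4f} approx={approx:.4f} error={exact - approx:.4f}")
```

The dropped term is largest (log 2) when both arguments are equal and vanishes quickly as they move apart. It is this correction term that a log-MAP implementation would typically read from a LUT.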
Finally, changes made in (20),(21) and (22) have an effect
on (6) where the a posteriori probabilities computation is
simplified as
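\[ L(u_i|y) = \max_{U_1}\left[L^{(\alpha)}_{i}(\psi') + L^{(\gamma)}_{i}(\psi', \psi) + L^{(\beta)}_{N-i}(\psi)\right] - \max_{U_0}\left[L^{(\alpha)}_{i}(\psi') + L^{(\gamma)}_{i}(\psi', \psi) + L^{(\beta)}_{N-i}(\psi)\right]. \tag{26} \]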
FIGURE 4. SCC-Decoder full hardware block schematic.
FIGURE 5. Main block diagram of the two major stages of the proposed turbo decoder architecture.
For implementation purposes, the steps of the max-log-MAP algorithm are summarized in Algorithm 1. It must be emphasised that the while statements in Algorithm 1 are not meant to be implemented as loops in hardware; they only indicate that the implementation should be a pipelined structure with flow control.
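As a software reference for the three steps of Algorithm 1, the following Python sketch runs max-log-MAP on a toy trellis. Everything specific in it is an assumption made for illustration: the 4-state rate-1/2 code with octal generators (7, 5) is a stand-in and not the CC1 or CC2 code of the standard, the branch metric is a generic LLR correlation rather than the paper's expression (20), and the sign convention is that a positive LLR favours '1'.

```python
NEG_INF = float("-inf")

def build_trellis():
    """Transitions (prev_state, next_state, u, c0, c1) of the toy (7, 5) code."""
    trellis = []
    for state in range(4):
        d1, d2 = state >> 1, state & 1        # the two previous input bits
        for u in (0, 1):
            c0 = u ^ d1 ^ d2                  # generator 7 = 1 + D + D^2
            c1 = u ^ d2                       # generator 5 = 1 + D^2
            trellis.append((state, (u << 1) | d1, u, c0, c1))
    return trellis

def encode(bits):
    """Encode from state 0 (no termination); returns one (c0, c1) pair per bit."""
    step = {(p, u): (q, c0, c1) for p, q, u, c0, c1 in build_trellis()}
    state, out = 0, []
    for u in bits:
        state, c0, c1 = step[(state, u)]
        out.append((c0, c1))
    return out

def max_log_map(llr_code, llr_apriori):
    trellis, n = build_trellis(), len(llr_code)
    # Step 1: branch metrics for every time index and transition.
    gamma = [[u * llr_apriori[i] + c0 * llr_code[i][0] + c1 * llr_code[i][1]
              for (_, _, u, c0, c1) in trellis] for i in range(n)]
    # Step 2: forward and backward recursions, with sums replaced by max.
    alpha = [[NEG_INF] * 4 for _ in range(n + 1)]
    beta = [[NEG_INF] * 4 for _ in range(n + 1)]
    alpha[0][0] = 0.0                         # the encoder starts in state 0
    beta[n] = [0.0] * 4                       # any end state is allowed
    for i in range(n):
        for t, (p, q, *_rest) in enumerate(trellis):
            alpha[i + 1][q] = max(alpha[i + 1][q], alpha[i][p] + gamma[i][t])
    for i in range(n - 1, -1, -1):
        for t, (p, q, *_rest) in enumerate(trellis):
            beta[i][p] = max(beta[i][p], gamma[i][t] + beta[i + 1][q])
    # Step 3: a posteriori LLR = best path with u=1 minus best path with u=0.
    llr_out = []
    for i in range(n):
        best = {0: NEG_INF, 1: NEG_INF}
        for t, (p, q, u, _, _) in enumerate(trellis):
            best[u] = max(best[u], alpha[i][p] + gamma[i][t] + beta[i + 1][q])
        llr_out.append(best[1] - best[0])
    return llr_out

bits = [1, 0, 1, 1, 0, 0, 1, 0]
# Noise-free channel LLRs: +4 for a transmitted '1', -4 for a '0'.
llr = [tuple(4.0 if c else -4.0 for c in pair) for pair in encode(bits)]
posterior = max_log_map(llr, [0.0] * len(bits))
decoded = [1 if value >= 0 else 0 for value in posterior]
print(decoded)
```

The three Python loops are written sequentially for readability only. In hardware each step would be a pipelined stage with its own flow control rather than a loop.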
III. PROPOSED PIPELINED ARCHITECTURE
This section shows the schematics of the proposed design to implement the complete turbo decoder structure in hardware. First, Fig. 4 shows a schematic of the inputs and outputs of the above modules. The Row-Column de-interleaver input, which is the input of the complete SCC-decoding block, takes the output values of the preceding soft demodulator and a trigger signal that is active while those soft values are being calculated. The demodulator is designed to provide, on the same clock edge, the soft bit corresponding to the systematic bit and the one corresponding to the parity bit coming out of the CC2 encoder, according to the transmission scheme shown in Fig. 2. While the trigger signal arriving from the demodulator is enabled, the storing circuit is also active and stores the incoming soft bits until the end of the frame has been reached and the demodulator is deactivated. At that point the incoming trigger signal is no longer enabled, and the storing circuit stops so that the de-interleaving and puncturing operations corresponding to this block can be performed.
FIGURE 6. Full block diagram for the hardware components and connections corresponding to the Row-Column de-interleaver block.
FIGURE 7. Complete block diagram of the proposed hardware architecture for the corresponding turbo decoder block.
The proposal for this next block can be summarized in the diagram in Fig. 5, where the decoder operation has been divided into two main stages. The first stage consists of decoding by inverting the steps of the transmission scheme: a block with the VHDL implementation of the max-log-MAP algorithm corresponding to CC2 is applied, then the effect of the interleaver and the fixed puncturing block is undone, and finally the max-log-MAP algorithm corresponding to CC1 is applied to the processed information. These four large blocks are collected in the purple area of Fig. 5, named the feedforward stage. The next step is to check the current iteration: if it was the last one, the calculated likelihoods are taken out and a hard decision is applied to obtain the estimated bits; otherwise, the second stage is entered.
The second stage updates the a priori observations of the first decoding block. It takes the inputs of the second decoding block, updates their values and applies puncturing to them to match the frame size of the first decoding block. That size is fixed because the inputs of the first decoding block are always the observations from the Row-Column de-interleaver, which are themselves stored in auxiliary block RAMs. The updated a priori observations are also reordered by the original interleaver. This stage corresponds to the green area in Fig. 5, named the feedback stage. Data in this stage always return to the feedforward stage.
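The two-stage schedule described above can be sketched in Python as a control skeleton. The block names, the `blocks` dictionary and the return shape of the CC1 decoder (a posteriori likelihoods plus updated input) are hypothetical choices for this sketch; the blocks themselves are passed in as callables and only the order in which they are enabled follows the text.

```python
def scc_turbo_decode(y, apriori0, n_iter, blocks):
    """Run the feedforward/feedback schedule for n_iter >= 1 iterations.

    y        : stored observations from the Row-Column de-interleaver
    apriori0 : initial a priori LLRs (all zero before the first iteration)
    blocks   : dict of callables, one per sub-circuit (hypothetical names)
    """
    apriori = apriori0
    for iteration in range(1, n_iter + 1):
        # Feedforward stage (purple area of Fig. 5)
        x = blocks["decoder_cc2"](y, apriori)
        x = blocks["deinterleaver"](x)
        x = blocks["depuncturer"](x)
        app, updated_input = blocks["decoder_cc1"](x)
        if iteration == n_iter:
            # Last iteration: hard decision on the a posteriori likelihoods
            return [1 if llr >= 0 else 0 for llr in app]
        # Feedback stage (green area of Fig. 5)
        x = blocks["updater"](updated_input)
        x = blocks["puncturer"](x)
        apriori = blocks["interleaver"](x)

# Stub blocks that only record the order in which they are enabled.
trace = []

def stub(name, fn):
    def wrapped(*args):
        trace.append(name)
        return fn(*args)
    return wrapped

identity = lambda x: x
blocks = {
    "decoder_cc2": stub("decoder_cc2", lambda y, a: [yi + ai for yi, ai in zip(y, a)]),
    "deinterleaver": stub("deinterleaver", identity),
    "depuncturer": stub("depuncturer", identity),
    "decoder_cc1": stub("decoder_cc1", lambda x: (x, x)),
    "updater": stub("updater", identity),
    "puncturer": stub("puncturer", identity),
    "interleaver": stub("interleaver", identity),
}
bits = scc_turbo_decode([1.0, -2.0], [0.0, 0.0], n_iter=2, blocks=blocks)
print(bits)
print(trace)
```

With two iterations the trace shows one full feedforward pass, one feedback pass and a final feedforward pass, which is the pattern that the synchronizer process has to reproduce with trigger signals.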
Further details of the Row-Column de-interleaver can be seen in Fig. 6, where the pipelined components that compose it are shown in yellow. The same figure also shows several rectangular prisms in dark turquoise blue that represent the RAM blocks where the information used in the blocks is stored for processing. The block consists of a master process, called deinterleaver synchronizer, which is responsible for activating each component at the right time. A parallel architecture with independent blocks has been followed so that FPGA resources are allocated more efficiently at implementation. The master process first activates the storing circuit so that the input information is stored in two RAM blocks. When no more data enters, the de-interleaving + puncturing process is activated to perform the behavior shown in Fig. 3. For this, the data is stored in another auxiliary RAM block in the appropriate order. The puncturing pattern corresponding to the operation mode is then read from a LUT, which stores '1' if the corresponding position was not deleted and '0' if the position was punctured in transmission and therefore has to be added back, and the missing positions are inserted accordingly. Finally, when the end of the puncturing-pattern LUT is reached, the information stored in the auxiliary RAM block is sent to the turbo decoder block.
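The LUT-driven insertion of punctured positions can be sketched as follows. This is a minimal Python model, assuming that the inserted value is a neutral (zero) likelihood, which the text does not state; the function name and the example pattern are hypothetical and do not correspond to any operation mode of the standard.

```python
def depuncture(values, pattern, fill=0):
    """Rebuild a frame from the stored values and a keep/insert pattern.

    pattern[k] == 1: position k was transmitted, so the next stored value is used.
    pattern[k] == 0: position k was punctured, so `fill` is inserted (assumed 0).
    """
    if sum(pattern) != len(values):
        raise ValueError("the pattern must contain exactly one '1' per stored value")
    stored = iter(values)
    return [next(stored) if keep else fill for keep in pattern]

frame = depuncture([5, -3, 7, 2], [1, 1, 0, 1, 0, 1])
print(frame)  # [5, -3, 0, 7, 0, 2]
```

In the hardware block the pattern is read sequentially from the LUT, so the list comprehension above corresponds to one pattern entry per clock cycle.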
The separate components that make up the turbo decoder block (a hardware-oriented implementation of the complete decoding algorithm), and the connections that control the data flow between the two stages, are presented in more detail in Fig. 7. As in the Row-Column de-interleaver block, a master process, here called turbodecoder synchronizer, is responsible for enabling each of the independent components that perform the operations mentioned above (decoder, de-interleaver and depuncturer circuits for the feedforward stage, and updater, puncturer and interleaver circuits for the feedback stage). As in the previous case, every pipelined component is shown in yellow and every RAM block as a rectangular prism in dark turquoise blue. In addition, two data flows have been marked: a purple path showing the route followed in the feedforward stage and a green path showing the route followed in the feedback stage, in accordance with the diagram in Fig. 5. The remaining blue markers refer to the trigger signals of the independent components, which are handled by the turbodecoder synchronizer process.
Before explaining the control logic over the data flow, it should be noted that the dimensions of the frames within the block vary with the transmission mode. These dimensions are given in the reference standard and can be summarised in this block with two variables: an integer S representing the number of systematic bits generated in the complete encoder, and an integer K representing the encoder input frame size [32], both referring to the transmitter components. Accordingly, in the block diagram in Fig. 7 the frame dimensions have been added, in terms of these variables, to all the paths of the two main stages of the block.
The operation is as follows. In the first iteration, the synchronising signals between the Row-Column de-interleaver and the turbo decoder are active only while information is sent from the former to the latter, where it is stored in RAM blocks. Once these signals are deactivated, the 1st convolutional decoder (corresponding to CC2 and applying the max-log-MAP algorithm) starts to operate with null a priori observations, since the inputs are considered i.i.d. and therefore their LLR is zero. This produces the first set of a posteriori observations with respect to the initial input information. These observations are reordered by reversing the original interleaving pattern, which is read from a LUT selected by the transmission mode. The resulting frame has 2 observations fewer than the input, as the interleaver length is S-2 according to the standard regardless of the transmission mode. The de-interleaved observations are then passed through the depuncturer block, where the dimension of the incoming frame is increased to undo the effect of the fixed puncturing pattern applied in transmission and to match the length of the output frame of CC1 in transmission. As explained above, this puncturing pattern is fixed in the standard: since the transmitter removes one out of every four observations coming out of CC1, this block generates one extra observation for every 3 received from the de-interleaver block.
Next, the 2nd convolutional decoder (corresponding to CC1 and also applying the max-log-MAP algorithm) is applied and the first set of a posteriori observations of the complete turbo decoder is obtained, which finishes the first iteration of the feedforward stage. This decoder also works with null a priori likelihoods but, unlike the first one, these are not updated in any iteration: the information about the changes introduced by the interleaver and puncturer blocks is used only to update the 1st decoder at the beginning of a new iteration, i.e. the modifications are included in the first decoder. This yields a simpler and more efficient architecture, since no extra computation is needed for this block.
At this point, it is evaluated whether the configured maximum number of iterations has been reached. If so, the hard decision is made on the last a posteriori likelihoods obtained. As we are still in the first iteration, we move on to the feedback stage instead: the a posteriori likelihoods (frame of length K) are discarded and the updated input of the 2nd convolutional decoder (frame of length 2(K+2)) is used. The original puncturing pattern and the interleaving operation are applied to these observations, which finally return to the 1st convolutional decoder block and become the a priori observations to be added to the original inputs in the next iteration.
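The frame lengths quoted in this description can be followed with a few lines of Python. The sketch below only tracks lengths, not data. The relation between S and K in `implied_k` is an inference obtained by reading two statements of the text literally (the depunctured frame feeds the 2nd decoder, whose input has length 2(K+2)); it is not given in this form in the text, and the value S = 14 is a toy number, not a frame size of the standard.

```python
def feedforward_lengths(s):
    """Frame length after each feedforward block for S observations at the input."""
    deinterleaved = s - 2                     # the interleaver length is S - 2
    if deinterleaved % 3:
        raise ValueError("S - 2 must be a multiple of 3 for the 3-to-4 depuncturer")
    depunctured = deinterleaved // 3 * 4      # one extra value per 3 received
    return {"decoder_1_out": s, "deinterleaved": deinterleaved, "depunctured": depunctured}

def implied_k(s):
    """K for which the depunctured frame has the 2(K + 2) length quoted in the text."""
    return feedforward_lengths(s)["depunctured"] // 2 - 2

lengths = feedforward_lengths(14)
print(lengths)
print(implied_k(14))
```

For the toy value S = 14 the de-interleaver delivers 12 observations, the depuncturer expands them to 16, and the literal reading gives K = 6, since 2(6 + 2) = 16.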
This process is repeated until the maximum number of iterations is reached. Then, as mentioned above, a hard decision is applied to the a posteriori likelihoods coming from the 2nd decoder, such that a '0' is chosen if the likelihood is negative, and a '1' otherwise. An efficient way to perform this operation in hardware is simply to invert the sign bit of the observation: if the observation is negative as a real number, its sign bit is always '1', so applying a NOT gate to this bit yields a '0'. The same operation applied to a non-negative likelihood, whose sign bit is '0', yields a '1'.
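The sign-bit rule can be modelled in a few lines of Python. The sketch assumes two's-complement likelihoods and a word width of 8 bits; the actual word width of the design is not stated here, so `width` is a hypothetical parameter.

```python
def hard_decision(llr, width=8):
    """Decoded bit = NOT(sign bit) of a two's-complement likelihood of `width` bits."""
    word = llr & ((1 << width) - 1)           # two's-complement bit pattern
    sign_bit = (word >> (width - 1)) & 1
    return sign_bit ^ 1                       # NOT gate on a single bit

print([hard_decision(value) for value in (-17, -1, 0, 42)])  # [0, 0, 1, 1]
```

A likelihood of exactly zero has a sign bit of '0' and is therefore decoded as '1', which matches the rule that '0' is chosen only for negative likelihoods.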
IV. RESULTS AND ANALYSIS
This section describes the results obtained with the proposed architecture. Before analysing the simulation and implementation results, it is important to note the degradation suffered by the max-log-MAP approximation with respect to the optimal version described by (25). For this purpose, Fig. 8 shows the software simulation results of the Bit Error Rate (BER) obtained with the complete architecture for different numbers of iterations and for a low range of Eb/N0 values. The figure is intended to show the difference in error-correction gain between the simplified algorithm and the theoretical, or optimal, algorithm (much more expensive and difficult to implement in hardware), since both versions correct all frame errors but at different values of Eb/N0 (in dB).
Two groups of curves can be seen in the figure: the solid curves correspond to the optimal version of the BCJR algorithm and the dashed curves to the max-log-MAP algorithm. When only one iteration of the turbo decoder is performed (both blue curves), there is no difference between the optimal version and the simplified approach, since the system has not yet been fed back with updated a priori information, so the differences between the two versions of the algorithm cannot be appreciated. However, when a second iteration is performed, the difference in system performance is already noticeable, as shown by the red curves. This trend is maintained as the number of iterations increases, with the system correcting all errors more sharply for a higher number of iterations. With 7 and 10 iterations (magenta and black solid lines, respectively) the system converges and corrects all errors at Eb/N0 = -1 dB, while the simplified version (dashed curves of the same colors) shows the same behavior at -0.5 dB. Therefore, it has been decided to implement the system using 7 iterations instead of 10 in order to realize a more efficient system that spends less processing time and fewer FPGA resources.
FIGURE 8. BER results using the optimal version of the BCJR algorithm (solid lines) and using the simplified version based on the max-log-MAP algorithm (dashed lines) for the complete turbo decoder architecture. [Plot: Bit Error Rate (10^-7 to 10^0) versus Eb/N0 (-3 to 3 dB); optimum and max-log-MAP curves for 1, 2, 4, 7 and 10 iterations.]
A. HDL SIMULATION
This subsection shows the hardware simulation of the architecture described in this paper, written in VHDL. The simulation processes a complete frame of transmission mode 1, consisting of 16000 observations from the soft demodulator, with the turbo decoder set to run 7 iterations at a clock frequency of 100 MHz. To verify the results and for simplicity for the reader, integer arithmetic is considered. A random synthetic signal of logic vectors is generated and displayed as signed integers using the Vivado converter, and the same input signal is replicated in MATLAB to show that the likelihoods and the final logic signal are correct, i.e. that both tools give the same results.
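When comparing waveform values with a software model, the logic vectors have to be read as signed integers. The following Python sketch shows that conversion, assuming two's-complement vectors written MSB first; the 8-bit examples are illustrative and do not reflect the word width of the design.

```python
def to_signed(bits):
    """Interpret a string of '0'/'1' characters (MSB first) as two's complement."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)               # subtract 2^width for negative words
    return value

print([to_signed(v) for v in ("00000101", "11111011", "10000000")])  # [5, -5, -128]
```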
The result of the hardware simulation of the Row-Column de-interleaver block is shown in Figs. 9 and 10. Fig. 9 shows the input of the Row-Column de-interleaver, which is mapped to the samples received from the soft demodulator. It can be seen in this figure how the inputs of this block (marked with a yellow marker) correspond to the outputs of the demodulator (named 'LL1' and 'LL2' in the figure). Fig. 10 presents the time transition between all the operations of this complete block. The yellow vertical time marker indicates the beginning of the de-interleaving and puncturing process, and the blue marker indicates its end and the beginning of the transfer of the modified information to the turbo decoder. The same figure also shows the beginning and the end of the modified Row-Column de-interleaver frame, with blue arrows marking the positions where extra values have been added by the depuncturer process. The beginning of the frame is delimited by a red dashed line and the end of the same frame by an orange dashed line.
FIGURE 9. Beginning of the Row-Column de-interleaver input frame in the hardware simulation.
FIGURE 10. Action time of the block corresponding to the Row-Column de-interleaver.
FIGURE 11. Numeric values of the beginning of the Row-Column de-interleaver input frame (left) and of the beginning (middle) and end (right) of the Row-Column de-interleaver output frame.
The results shown can be validated against the software results in Fig. 11, where the systematic observations have been highlighted in blue and the unmarked ones correspond to parity. Blue arrows again indicate the action of the depuncturer on the corresponding positions.
FIGURE 12. Waveforms corresponding to the turbo decoder 1st iteration
process.
Fig. 12 shows all the trigger signals of the subsystems involved in the decoding for a single iteration. The signals 'y1' and 'y2' are the outputs of the Row-Column de-interleaver, while the signals highlighted with an orange box mark the activation of each of the blocks that make up the turbo decoder architecture. These signals are activated sequentially to maintain the flow of the complete turbo decoding algorithm. The end of the second decoder (where the max-log-MAP algorithm has been applied for the second time in this first iteration) has been marked with a circle and a red arrow, because this is where the likelihoods of interest are obtained after a fixed number of iterations (7, according to the gain simulation measurements, as indicated at the end of the second paragraph of Section IV). Finally, the numerical values of the beginning and end of the frame of final likelihoods obtained in the first iteration are shown, separated by a red and an orange dashed line, respectively. These values are to be compared with those obtained in the software simulation to check the correct functioning of the architecture in hardware.
These trigger signals are responsible for enabling each of the sub-circuits marked in yellow in Fig. 7 in the appropriate order to maintain the correct flow of the algorithm. Table 1 summarizes, for each signal label used in Fig. 12, which sub-circuit is activated and which function the activation signal performs.
Fig. 13 presents the numerical values of the output of the
first decoder for each of the iterations in software simulation,
to validate the data of the algorithm written in VHDL. The
left column shows the values at the beginning of the frame
and the right column shows the values at the end of the frame.
TABLE 1. Associated circuits and functions of each trigger signal shown in Fig. 12.
| Stage | Signal name | Fig. 7 sub-circuit | Operation done |
|---|---|---|---|
| FEEDFORWARD | SET-UP-F | SET UP BRAMs | Starts the process and saves the initial observations in memory for use at the beginning of each iteration. |
| FEEDFORWARD | DCDR-1-F | OUTER DECODER | Enables the outer decoder circuit. |
| FEEDFORWARD | DEC-IN-F | OUTER DECODER | Prevents information from being lost after the feedback stage. |
| FEEDFORWARD | DEINTL-F | DEINTERLEAVER | Enables the De-Interleaver circuit. |
| FEEDFORWARD | DEPUN-F | DEPUNCTURER | Enables the De-Puncturer circuit. |
| FEEDFORWARD | DCDR-2-F | INNER DECODER | Enables the inner decoder circuit. |
| FEEDFORWARD | HARD-D-F | HARD DECISION | Enables the hard decision circuit at the end of the algorithm. |
| FEEDBACK | NEW-IT-F | NEW ITERATOR FLAG | Signals when a new iteration is on (non-first or final iteration). |
| FEEDBACK | UPDATE-F | UPDATE | Enables the full feedback action with the INNER DECODER results (a posteriori estimated likelihoods). |
| FEEDBACK | PUNCTU-F | PUNCTURER | Applies the original puncturing pattern. |
| FEEDBACK | FEEDBK-F | FEEDBACK INTERLEAVER | Applies the original interleaver pattern and feeds back the OUTER DECODER. |
FIGURE 13. Beginning (left) and end (right) of the numerical values frame for the feedforward stage output observations for all iterations.
FIGURE 14. Waveforms corresponding to performing the turbo decoder iterations.
FIGURE 15. Bits obtained after performing the hard decision process in software simulation. Initial (left) and final values (right).
Fig. 14 shows the waveforms obtained when the complete decoding process has finished, i.e. when the maximum number of configured iterations has been reached. An orange grid has been added over the trigger signals to separate the activations of each iteration. The patterns of the activation signals are repeated in all iterations, following the behavior seen in Fig. 12. In addition, a trigger signal is activated that is held at '1' for the rest of the iterations and returns to '0' when the last iteration is finished and the hard decision is about to be made. Below, separated by a dashed red line, the figure shows the update of the likelihoods of interest, on which the hard decision is applied in the last iteration to obtain the estimated bits of the frame. The behavior of the previous figure is maintained without anomalies in each iteration. At the beginning, a red circle marks the activation of the signal called 'SET-UP-F', which is activated only in the first iteration and remains off in the rest, since this signal only indicates that the original input values on which it is necessary to iterate have been stored in memory.
FIGURE 16. Beginning of the bit frame obtained at the output of the SCC-decoder block.
FIGURE 17. End of the bit frame obtained at the output of the SCC-decoder block.
Likewise, a purple circle marks the end of the last iteration, where only the signal 'HARD-D-F' is activated. This signal applies the hard decision to the last likelihoods calculated to obtain the bits, with the method explained in the last paragraph of Section III. The markers at the beginning and at the end of the decoding process therefore show that the behavior is correct.
Finally, the results of the hard decision process are shown in Figs. 15-17. Fig. 15 shows the result of applying the proposed architecture in software to the signed integers used in this simulation. Figs. 16 and 17 present the start and end, respectively, of the bit frame obtained after the complete decoding process. In these figures the value of the corresponding bit at each clock edge has been marked to make the results easier to compare.
B. SYNTHESIS RESULTS
Once it has been shown that the hardware description of
the proposed architecture shows waveforms that suit our
objectives, we move on to synthesise the circuit to see the
resources it consumes. The synthesis tool used is Vivado
and, as mentioned before, the FPGA is the Xilinx Zynq
UltraScale+ RFSoC ZCU28DR model. The results of the
synthesis are summarised in Table 2.
TABLE 2. Resources used by the FPGA after synthesising the proposed architecture.
| Resource | Estimation | Available | Utilization (%) |
|---|---|---|---|
| LUT | 6914 | 425280 | 1.63 |
| FF | 2542 | 850560 | 0.3 |
| BRAM | 273.5 | 1080 | 25.33 |
| IO | 28 | 347 | 8.07 |
| BUFG | 1 | 696 | 0.14 |
Table 2 shows that the proposed architecture as a whole uses 6914 LUTs, 2542 Flip-Flops (FF), 273.5 RAM blocks (BRAM), 28 input-output (IO) nets and 1 clock buffer (BUFG). It should be emphasised that the BRAM figure refers to the whole set of memories used in the design. The architecture uses 39 memories, distributed as follows: 12 RAMs of 207 Kbits, 12 RAMs of 180 Kbits, 2 RAMs of 174 Kbits, 3 RAMs of 168 Kbits, 3 RAMs of 126 Kbits, 2 RAMs of 92 Kbits, 2 RAMs of 87 Kbits and 3 RAMs of 84 Kbits. These sizes are related to the turbo decoder parameters in the standard, which define the frame lengths with which it operates. In this way, the RAMs can be customised to use just the number of bits needed to implement the hardware. Thus, a component such as the decoder, which usually consumes a large share of the receiver resources, has been optimized to use around 25% of the FPGA's available BRAMs. The rest of the resources used are marginal compared with those available.
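The memory figures quoted above can be cross-checked with a short Python calculation. The counts and sizes are taken from the text; the capacity of 36 Kbits per BRAM block is an assumption about the device family, and the resulting fill ratio is only a rough indication, because the mapping of each memory onto blocks depends on its width and depth.

```python
# (count, size in Kbits) for each group of memories listed in the text
memories = [(12, 207), (12, 180), (2, 174), (3, 168),
            (3, 126), (2, 92), (2, 87), (3, 84)]

total_memories = sum(count for count, _ in memories)
total_kbits = sum(count * size for count, size in memories)

BRAM_BLOCK_KBITS = 36                         # assumed capacity of one BRAM block
allocated_kbits = 273.5 * BRAM_BLOCK_KBITS    # 273.5 blocks reported in Table 2
fill_percent = 100 * total_kbits / allocated_kbits

print(total_memories)                         # 39
print(total_kbits)                            # 6484
print(round(fill_percent, 1))
```

The count of 39 memories matches the text, and the requested storage adds up to 6484 Kbits. Under the stated assumption, roughly two thirds of the allocated block capacity holds useful data.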
The synthesis results shown can be graphically comple-
mented with the RTL models that Vivado generates from
the VHDL code written to implement the equations cor-
responding to the proposed architecture. Figure 18 shows
the two major components of the architecture, which are
the Row-Column de-interleaver (marked in green) and the
turbodecoder block (marked in red).
FIGURE 18. Main view of synthesized circuit. The Row-Column De-interleaver is marked in green and the turbodecoder block is marked in red.
Within the turbodecoder block, the submodules of Figure 7 are implemented, some of which are shown in Figure 19. The full design generates a large number of primitives that clutter the canvas, so a zoomed view of part of the complete block is provided. In this figure, the initial RAMs have been marked in red; they store the initial frames coming from the Row-Column de-interleaver so that they can be reused at the beginning of each iteration of the algorithm and are not lost in the following clock edges. The circuits implementing the max-log-MAP algorithm have been marked in green. Other blocks of the architecture, such as the de-interleaver or the depuncturer, can also be seen, together with the connections between them and how the signals enter according to the workflow shown in Figure 7. Without cropping and zooming, the large number of primitives generated occupies the entire canvas, as mentioned above. This case is shown in Figure 20, where very little information can be extracted from the full RTL implementation. This is not a problem at the hardware level, since the FPGA can work in parallel and use different areas of the device to run operations and modules at the same time. For this reason, and to keep the paper readable, only some implementation details are shown.
TABLE 3. Summary of results of timing test on the proposed implemented architecture running at 100 MHz (design timing summary).
| Metric | Setup | Hold | Pulse Width |
|---|---|---|---|
| Worst slack | Worst Negative Slack (WNS): 2.354 ns | Worst Hold Slack (WHS): 0.004 ns | Worst Pulse Width Slack (WPWS): 4.458 ns |
| Total slack | Total Negative Slack (TNS): 0 ns | Total Hold Slack (THS): 0 ns | Total Pulse Width Slack (TPWS): 0 ns |
| Number of failing endpoints | 0 | 0 | 0 |
| Total number of endpoints | 16447 | 16447 | 3119 |
FIGURE 19. Some implemented and encapsulated sub-circuits of Figure 7 inside the turbodecoder block.
FIGURE 20. Full RTL implementation of turbodecoder block.
C. IMPLEMENTATION RESULTS
Finally, the results of implementing the architecture described in this work are shown. The final resource usage is the same as in Table 2, except that the implemented design uses 6807 LUTs. The results of evaluating the timing constraints are shown in Table 3. It should be remembered that the implementation has been carried out for a clock frequency of 100 MHz. The WNS and WPWS values show that the timing requirements are met very comfortably, so higher frequencies could be used. The WHS value is tighter, but this is to be expected, since it corresponds to the worst-case hold path in the device and the design uses many BRAM components, which require a large number of connections between LUTs, nets and these blocks. As the results show, it is still positive and is not a problem for the implementation. Regarding power consumption, the results are quite good, as there is still a large thermal margin. These results are shown in Table 4.
TABLE 4. Summary of results of the power test carried out on the proposed implemented architecture.
| Power test parameter | Value |
|---|---|
| Total On-Chip Power | 1.386 W |
| Junction Temperature | 26.2 °C |
| Thermal Margin | 73.8 °C (84.7 W) |
| Effective ΘJA | 0.8 °C/W |
| Power supplied to off-chip devices | 0 W |
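Returning to the timing results of Table 3, the statement that higher frequencies could be used can be quantified with the usual first-order estimate, in which the minimum clock period is the constrained period minus the worst setup slack. The Python sketch below applies it to the reported values; it is only an estimate, since the design would have to be re-implemented at the new constraint, and the tight hold slack does not improve when the period changes.

```python
def max_frequency_mhz(clock_mhz, wns_ns):
    """First-order f_max estimate: 1 / (constrained period - worst setup slack)."""
    period_ns = 1e3 / clock_mhz
    return 1e3 / (period_ns - wns_ns)

print(round(max_frequency_mhz(100.0, 2.354), 1))  # 130.8
```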
FIGURE 21. Summary of power expended by each on-chip component.
In addition, as the resources used in the implemented
design have been analysed, a breakdown of the individual
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3235966
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
FIGURE 22. Implemented design on the FPGA device.
power consumed by each on-chip component is given in
Fig. 21. As expected, the BRAM blocks account for the highest
consumption, since they are the most numerous components
used in the proposed design. All these implementation results
correspond to the placement and routing shown in Fig. 22,
which depicts a schematic of the FPGA with the primitives
assigned to the processes of the proposed architecture marked
in blue. The implementation tool has placed the primitives in
the northern part of the device, i.e., in the areas with higher
Y coordinates with respect to the different clock regions.
V. CONCLUSIONS
This paper has proposed a simple architecture for implementing,
on the Xilinx Zynq UltraScale+ RFSoC ZCU28DR evaluation
board, a decoder for the SCC turbo coding scheme defined in
the CCSDS 131.2-B-1 standard. The algorithms to be implemented
have been detailed, together with complete, annotated schemes
of the data flow and of the net connections needed for a
hardware implementation. In addition, a structure made of
independent components has been presented. It favours a
pipeline architecture and separates, as far as possible, the
resources used on the FPGA, which benefits the efficiency of
the design. Results from a hardware simulation based on integer
arithmetic, chosen to simplify validation, match those of the
software simulation and confirm the effectiveness of the
proposed architecture. Finally, it has been verified that the
design is synthesizable and passes the timing and thermal
checks with the device running at 100 MHz. The resources
consumed on the FPGA, which amount to a very small percentage
of those available on this board, and a layout of the
implemented design on the device have also been presented.
Although the scheme proposed in this paper targets the CCSDS
131.2-B-1 standard, the ideas and the architecture can be
easily extrapolated to other serial turbo decoder schemes,
which broadens the contribution of this paper.
REFERENCES
[1] C. E. Shannon, A mathematical theory of communication,” The Bell
System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948.
[2] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near shannon limit error-
correcting coding and decoding: Turbo-codes. 1, in Proceedings of ICC
’93 - IEEE International Conference on Communications, vol. 2, pp. 1064–
1070 vol.2, 1993.
[3] C. Berrou and A. Glavieux, Turbo Codes. John Wiley & Sons, Ltd, 2003.
[4] “Btm synchronization and channel coding: Blue book,” standard, CCSDS
131.2-B-0, Consultative Committee for Space Data Systems, Mar. 2013.
[5] “Low-density parity-check codes for use in near-earth and deep space
applications. experimental specification,” standard, CCSDS 131.1-O-0.4,
Consultative Committee for Space Data Systems, Mar. 2006.
[6] “Use of dvb-s2 etsi standard in high data rate telemetry for near space-earth
transmissions. experimental specification,” standard, CCSDS 131.1-O-0.4,
Consultative Committee for Space Data Systems, Mar. 2006.
[7] G. Battail, “Weighting of the symbols decoded by the viterbi algorithm (in
french),” Ann. Télécommun., vol. 42, p. 31–38, 01 1987.
[8] J. Heller and I. Jacobs, “Viterbi decoding for satellite and space communi-
cation,” IEEE Transactions on Communication Technology, vol. 19, no. 5,
pp. 835–848, 1971.
[9] S. Benedetto, G. Montorsi, D. Divsalar, and F. Pollara, “A soft-input soft-
output maximum a posteriori (map) module to decode parallel and serial
concatenated codes,” Telecommunications and Data Acquisition Progress
Report, vol. 42, 07 1996.
[10] A. Viterbi, An intuitive justification and a simplified implementation of
the map decoder for convolutional codes, IEEE Journal on Selected Areas
in Communications, vol. 16, no. 2, pp. 260–264, 1998.
[11] S. Benedetto and G. Montorsi, “Performance of continuous and blockwise
decoded turbo codes,” IEEE Communications Letters, vol. 1, no. 3, pp. 77–
79, 1997.
[12] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear
codes for minimizing symbol error rate (corresp.),” IEEE Transactions on
Information Theory, vol. 20, no. 2, pp. 284–287, 1974.
[13] P. Robertson, P. Hoeher, and E. Villebrun, “Optimal and sub-optimal
maximum a posteriori algorithms suitable for turbo decoding,” European
Transactions on Telecommunications, vol. 8, no. 2, pp. 119–125, 1997.
[14] M. Jezequel, C. Berrou, C. Douillard, and P. PENARD, “Characteristics
of a sixteen-state turbo-encoder/decoder (Turbo4), in International Sym-
posium on Turbo Codes & Related Topics, (Brest, France), pp. 280–283,
Télécom Bretagne, Sept. 1997.
[15] S. Barbulescu, Turbo Codes on Satellite Communications, p. 44. 01 2005.
[16] S. Barbulescu, W. Farrell, P. Gray, and M. Rice, “Bandwidth efficient turbo
coding for high speed mobile satellite communications,” 11 1997.
[17] S. S. Pietrobon, “Implementation and performance of a turbo/map de-
coder, International Journal of Satellite Communications, vol. 16, no. 1,
pp. 23–46, 1998.
[18] “Introduction to cdma2000 standards for spread spectrum systems,” stan-
dard, 3rd generation partnership Project 2, July 1999.
[19] “Physical layer standard for cdma2000 spread spectrum systems,” stan-
dard, 3rd generation partnership Project 2, July 1999.
[20] D. Wisdom, E. Ajayi, U. Arinze, O. Aladesote, A. Ganya, H. Idris,
and D. Wisdom, “Ieee compter society -nigeria -technical paper series a
comprehensive survey on power saving schemes (cspss) in ieee 802.16e/m
networks,” 07 2021.
[21] H. Latchman, S. Katar, L. Yonge, and S. Gavette, Homeplug AV and IEEE
1901: A handbook for PLC designers and users. 09 2013.
[22] S. Li, B. Bai, J. Zhou, P. Chen, and Z. Yu, “Reduced-complexity equaliza-
tion for faster-than-nyquist signaling: New methods based on ungerboeck
observation model,”IEEE Transactions on Communications, vol. 66, no. 3,
pp. 1190–1204, 2018.
[23] F.-L. Luo and C. Zhang, Faster-than-Nyquist Signaling for 5G Communi-
cation, pp. 24–46. 2016.
VOLUME 4, 2016 15
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3235966
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Perez-Naranjo and Gil Jiménez: CCSDS 131.2-B-1 SCC turbo decoder architecture for efficient FPGA implementation
[24] J. Fan, S. Guo, X. Zhou, Y. Ren, G. Li, and X. Chen, “Faster-than-nyquist
signaling: An overview,” IEEE Access, vol. PP, pp. 1–1, 02 2017.
[25] V. K. Veludandi, “Bcjr vs sova for a practical coherent turbo coded
ofdm system,” in 2019 10th International Conference on Computing,
Communication and Networking Technologies (ICCCNT), pp. 1–5, 2019.
[26] K. Vasudevan, “Coherent detection of turbo-coded OFDM signals trans-
mitted through frequency selective rayleigh fading channels with receiver
diversity and increased throughput, Wireless Personal Communications,
vol. 82, pp. 1623–1642, feb 2015.
[27] J. Kang and W. Stark, “Turbo codes for noncoherent fh-ss with partial band
interference,” IEEE Transactions on Communications, vol. 46, no. 11,
pp. 1451–1458, 1998.
[28] H. El Gamal and E. Geraniotis, “Turbo codes with channel estima-
tion and dynamic power allocation for anti-jam fh/ssma,” in IEEE
Military Communications Conference. Proceedings. MILCOM 98 (Cat.
No.98CH36201), vol. 1, pp. 170–175 vol.1, 1998.
[29] J. Gass, P. Curry, and C. Langford, “An application of turbo trellis-coded
modulation to tactical communications,” in MILCOM 1999. IEEE Military
Communications. Conference Proceedings (Cat. No.99CH36341), vol. 1,
pp. 530–533 vol.1, 1999.
[30] S. Jiang, P. W. Zhang, F. Lau, C.-W. Sham, and K. Huang, “A turbo-
hadamard encoder/decoder system with hundreds of mbps throughput,”
in 2018 IEEE 10th International Symposium on Turbo Codes Iterative
Information Processing (ISTC), pp. 1–5, 2018.
[31] A. Louliej, Y. Jabrane, V. Gil Jiménez, and A. Garcia Armada, “Practical
guidelines for approaching the implementation of neural networks on fpga
for papr reduction in vehicular networks,” Sensors, vol. 19, p. 116, 12
2018.
[32] “Flexible advanced coding and modulation scheme for high rate telemetry
applications: Blue book,” standard, CCSDS 131.2-B-1. Consultative Com-
mittee for Space Data Systems, Mar. 2012.
[33] A. Lamoral Coines and V. P. G. Jiménez, “Ccsds 131.2-b-1 transmitter
design on fpga with adaptive coding and modulation schemes for satellite
communications,” Electronics, vol. 10, no. 20, 2021.
[34] E. Boutillon, C. Douillard, and G. Montorsi, “Iterative decoding of con-
catenated convolutional codes: Implementation issues, Proceedings of the
IEEE, vol. 95, no. 6, pp. 1201–1227, 2007.
[35] P. Robertson, “Illuminating the structure of code and decoder of parallel
concatenated recursive systematic (turbo) codes, in 1994 IEEE GLOBE-
COM. Communications: The Global Bridge, vol. 3, pp. 1298–1303 vol.3,
1994.
[36] F. Dekking, C. Kraaikamp, H. Lopuhaä, and L. Meester, A Modern
Introduction to Probability and Statistics: Understanding Why and How.
Springer Texts in Statistics, Springer, 2005.
[37] J. Erfanian, S. Pasupathy, and G. Gulak, “Reduced complexity symbol
detectors with parallel structure for isi channels,” IEEE Transactions on
Communications, vol. 42, no. 234, pp. 1661–1671, 1994.
[38] W. Koch and A. Baier, “Optimum and sub-optimum detection of coded
data disturbed by time-varying intersymbol interference (applicable to
digital mobile radio receivers), in [Proceedings] GLOBECOM ’90: IEEE
Global Telecommunications Conference and Exhibition, pp. 1679–1684
vol.3, 1990.
[39] P. Robertson, E. Villebrun, and P. Hoeher, “A comparison of optimal and
sub-optimal map decoding algorithms operating in the log domain,” in
Proceedings IEEE International Conference on Communications ICC ’95,
vol. 2, pp. 1009–1013 vol.2, 1995.
MIGUEL ÁNGEL PÉREZ NARANJO received
the B.S. and M.S. degrees in telecommunication
from the University Carlos III of Madrid in 2019
and 2021, respectively. In addition to working in
the private sector in Spain, he has spent most of
his research career as a senior research assistant
at the University Carlos III of Madrid, from 2020
to 2022, where he also participated in international
projects. His research interests include advanced
beamforming techniques and hardware implementation
of hybrid algorithms applied to satellite communications.
VÍCTOR P. GIL JIMÉNEZ (Senior Mem-
ber, IEEE) received the B.S. degree (Hons.) in
telecommunication from the University of Alcalá
in 1998 and the M.S. degree (Hons.) in telecom-
munication and the Ph.D. degree (Hons.) from the
University Carlos III of Madrid in 2001 and 2005,
respectively. He was with the Spanish Antarctica
Base in 1999 as a member of the communications staff.
He visited the University of Leeds, U.K., in 2003,
Chalmers Technical University, Sweden, in 2004,
and the Instituto de Telecomunicações, Portugal, from 2008 to 2010. He
is currently with the Department of Signal Theory and Communications,
University Carlos III of Madrid, as an Associate Professor. He has also
led several private and national Spanish projects and has participated in
several European and international projects. He holds one patent. He has
published over 80 journal articles/conference papers and 8 book chapters.
His research interests include advanced multicarrier systems for wireless
radio, satellite and visible light communications. He held the IEEE Spanish
Communications and Signal Processing Joint Chapter Chair from 2015 to
2022. He received the Master Thesis and the Ph.D. Thesis Award from the
Professional Association of Telecommunication Engineers of Spain in 1998
and 2006, respectively.