Low-Complexity Software Stack Decoding of Polar Codes

Harsh Aurora, Carlo Condo, Warren J. Gross
Department of Electrical and Computer Engineering, McGill University, Montréal, Québec, Canada
Email: harsh.aurora@mail.mcgill.ca, carlo.condo@mcgill.ca, warren.gross@mcgill.ca
Abstract—Polar codes are a recent class of linear error-correcting codes that asymptotically achieve the channel capacity at infinite code length. The Successive Cancellation List (SCL) algorithm yields very good error-correction performance, at the cost of high implementation complexity. The Stack (SCS) decoding algorithm provides similar error-correction performance at a lower complexity. In this work, we propose an efficient software implementation of the SCS decoding algorithm, along with techniques to further reduce its computational complexity. In particular, we reduce the SCS memory requirements through efficient path switching, replace the stack sorting with a linear search, and explore the use of a partial CRC along with an early termination criterion. Using the proposed methods, we are able to reduce the computational complexity of the SCS decoder, lowering the number of estimated bits by up to 97% with respect to SCL, while maintaining error-correction performance similar to that of SCL.
I. INTRODUCTION
Polar codes [1] are the first error-correcting codes that can
provably achieve channel capacity, and they have been selected
as a coding scheme for the 5th generation wireless systems
standards (5G) [2]. The first proposed decoding algorithm is
the successive-cancellation (SC) algorithm [1]. While its error-
correction performance is able to reach channel capacity at
infinite code length, it is mediocre at practical code lengths.
Thus, many improvements to SC have been proposed in the
past years: list SC (SCL) [3] and its evolutions [4]–[6] have
gathered the interest of academia and industry alike thanks
to their substantial error-correction performance gains. They
rely on multiple parallel SC decoders working on different
possible candidate codewords, and on dedicated metrics to
identify the most likely one. SCL decoders thus suffer from
high computational complexity.
Similar to the concept used in SCL, SCS has been proposed
in [7] and improved upon in [8], [9]. It relies on a set
of codeword candidates, of which only the most likely is
extended. Unlike SCL, the amount of memory required by
SCS is variable. This cannot easily lead to actual memory
reduction in hardware decoders, where memory usually is
sized at design time considering the worst case. The flexible
nature of SCS is instead well suited for software decoders,
whose inherent adaptability can be exploited in base stations.
Current polar code software decoders suffer from longer latency and lower throughput with respect to hardware decoders [10], [11]. Fast software decoders such as [12] require parallel implementations on powerful, power-hungry platforms.
In this work, we present an efficient software implementation of the SCS algorithm in which the decoder tree has the same memory requirement as that of SC, improving over [13]. Our software implementation replaces the stack sorting with a linear search. We then propose an early CRC check on the message bits, which provides a reduction in computational complexity and latency. Lastly, we describe an early termination criterion based on this CRC check, which enables us to further reduce the computational complexity of the SCS decoder while maintaining error-correction performance similar to that of SCL.
II. PRELIMINARIES
A polar code PC(N, K) of code length N = 2^n and rate R = K/N is a linear block code that identifies K reliable bit-channels, used to transmit information, and N − K unreliable ones, frozen at a known value. Polar codes are encoded by multiplying the information/frozen bit vector by the generator matrix G_n, i.e. the n-th Kronecker product of the polarization matrix

G = [ 1 0 ; 1 1 ].
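As a concrete illustration, the encoding step can be sketched in a few lines of Python for small N (a minimal sketch; the helper names are ours, not from the paper):

```python
def kron(a, b):
    # Kronecker product of two matrices given as lists of lists
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def polar_encode(u):
    # multiply the information/frozen bit vector u by the n-th Kronecker
    # product of G = [[1, 0], [1, 1]], with arithmetic over GF(2)
    G = [[1, 0], [1, 1]]
    n = len(u).bit_length() - 1  # code length N = 2^n
    Gn = G
    for _ in range(n - 1):
        Gn = kron(Gn, G)
    N = len(u)
    return [sum(u[i] * Gn[i][j] for i in range(N)) % 2 for j in range(N)]
```

For N = 4, for example, `polar_encode([1, 0, 1, 0])` yields `[0, 0, 1, 0]`.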
The SC decoding algorithm can be viewed as a recursive binary tree search. A node receives from its parent a vector of log-likelihood ratios (LLRs) α: at tree stage λ, nodes compute the left α^l = {α^l_0, α^l_1, ..., α^l_{2^{λ−1}−1}} and right α^r = {α^r_0, α^r_1, ..., α^r_{2^{λ−1}−1}} LLR vectors. These are transmitted to the child nodes:

α^l_i = sgn(α_i) sgn(α_{i+2^{λ−1}}) min(|α_i|, |α_{i+2^{λ−1}}|),   (1)
α^r_i = α_{i+2^{λ−1}} + (1 − 2β^l_i) α_i,   (2)

with LLRs at the root node initialized as the LLRs received from the channel. The right-hand terms in Eq. (1) and (2) are also known as the f and g functions, respectively. The partial sums β received from the left and right child nodes are combined as

β_i = β^l_i ⊕ β^r_i,   if i < 2^{λ−1},
β_i = β^r_{i−2^{λ−1}},   otherwise,   (3)

where ⊕ is the XOR operation and 0 ≤ i < 2^λ. At leaf nodes, the β value and the estimated bit vector û_0^{N−1} are computed as

β_i = 0,   when α_i ≥ 0 or i is frozen,
β_i = 1,   otherwise.   (4)
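In a software decoder these two LLR updates reduce to a pair of one-line functions. The sketch below assumes float LLRs and the min-sum form of f from Eq. (1):

```python
import math

def f(a, b):
    # left-branch update, Eq. (1): sgn(a) * sgn(b) * min(|a|, |b|)
    return math.copysign(1.0, a) * math.copysign(1.0, b) * min(abs(a), abs(b))

def g(a, b, beta_l):
    # right-branch update, Eq. (2): b + (1 - 2*beta_l) * a,
    # where beta_l is the partial sum fed back from the left child
    return b + (1 - 2 * beta_l) * a
```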
The SCL decoding algorithm [3] improves the error-correction performance of SC by relying on L parallel SC decoding paths. Every time an information bit is estimated, both possible values 0 and 1 are investigated and 2L paths are created. Each path is associated with a path metric PM, and the L paths with the highest PM are discarded. In the LLR-based formulation of SCL [4], the PM can be computed as

PM^l_{−1} = 0,
PM^l_i = PM^l_{i−1} + |α^l_i|,   if û^l_i ≠ (1/2)(1 − sgn(α^l_i)),
PM^l_i = PM^l_{i−1},   otherwise,   (5)

where l is the path index and û^l_j is the estimate of bit j at path l. The main limitation of the SCL decoder is its high complexity: it has a space complexity of O(LN) and a time complexity of O(LN log2 N).
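The metric update of Eq. (5) can be written directly as a small helper (a sketch; the function name is ours):

```python
import math

def update_pm(pm_prev, llr, u_hat):
    # Eq. (5): add |llr| as a penalty when the estimated bit u_hat
    # disagrees with the hard decision implied by the sign of the LLR
    hard_decision = (1 - math.copysign(1.0, llr)) / 2
    if u_hat != hard_decision:
        return pm_prev + abs(llr)
    return pm_prev
```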
The SCS algorithm addresses the high complexity of the SCL decoder by employing a priority queue (PQ) of size D, in which the candidate paths are stored. Every time a bit is estimated, the decoder extends only the most probable path in the queue. An additional list-like parameter L is used to limit the number of paths in the queue: if a path of length φ has been extracted L times from the queue, all paths with length less than φ are deleted from the queue.
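The core SCS step — pop the best path, extend it with both bit guesses, push the extensions back — can be sketched with a binary heap (our illustration only; the implementation described in Section III avoids sorting the queue entirely):

```python
import heapq

# toy priority queue of (path metric, path) entries; lower metric = better
pq = [(0.0, (0, 1))]
heapq.heapify(pq)

pm, path = heapq.heappop(pq)               # extract the most probable path
for bit, penalty in ((0, 0.0), (1, 1.2)):  # hypothetical Eq.-(5) penalties
    heapq.heappush(pq, (pm + penalty, path + (bit,)))

best = pq[0]  # the next path to extend
```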
III. MEMORY EFFICIENT SOFTWARE STACK DECODER
In this section, we describe our software implementation of the SCS decoder. The main improvements over existing work in [7]–[9], [13] include reducing the decoding tree spatial complexity to O(N) and replacing the stack sorting step with a linear search over the stack. We calculate our bit probabilities in the LLR domain, and make use of the path metric from Eq. (5). The probability calculation and bit propagation are based on the approach in [3]. We begin by outlining the data structures used in our SCS implementation.
- P: a 2-D float array with which the LLR of a bit index is recursively calculated. It consists of n + 1 rows, where row λ is a probability array of size 2^{n−λ}, λ ∈ [0, n].
- C: a 3-D bit array where the estimated bits are stored and recursively propagated for g function calculations.
- PM: array of size D that stores path metrics.
- PL: array of size D that stores path lengths.
- PL_hits: array of size N in which the value at each index φ indicates the number of times a path of length φ was extracted from the PQ.
- paths: a 2-D bit array that stores the paths in the PQ.
- inactive_path_indices: an integer stack of depth D that contains inactive path indices.
- active_path: a boolean array of size D that indicates whether a path is active or not.

In addition to these, the SCS decoder makes use of the following variables:

- T: total number of active paths in the stack.
- min_index: index of the path with minimum path metric.
- max_index: index of the path with maximum path metric.
- path_switch: boolean that indicates a path switch.
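In Python, the fixed-size parts of this state could be grouped as follows (a sketch with our own field names and defaults, not the paper's memory layout):

```python
from dataclasses import dataclass, field

@dataclass
class SCSState:
    N: int                                        # code length, N = 2**n
    D: int                                        # stack (PQ) depth
    PM: list = field(default_factory=list)        # path metrics, size D
    PL: list = field(default_factory=list)        # path lengths, size D
    PL_hits: list = field(default_factory=list)   # extraction counts, size N
    active_path: list = field(default_factory=list)
    T: int = 0                                    # number of active paths

    def __post_init__(self):
        self.PM = [0.0] * self.D
        self.PL = [0] * self.D
        self.PL_hits = [0] * self.N
        self.active_path = [False] * self.D

state = SCSState(N=512, D=16384)
```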
The main loop of the SCS decoder is described in Algorithm 1,
while the most important functions are detailed in Algorithms
2-6. First, the data structures are initialized. The memory for
P,C,PL,PM and paths does not need to be initialized, as it
Algorithm 1: SCS Decoder, Main Loop
Input : received vector y_0^{N−1}
Output: estimated message bits m̂_0^{K−1}
 1: initialize_data_structures();
 2: min_index = assign_initial_path();
 3: for φ = 0, 1, ..., N − 1 do
 4:     P[0][φ] = L_0(y_φ);
 5: while (1) do
 6:     recursively_calc_P(n, PL[min_index]);
 7:     pm0 = calc_new_pm(PM[min_index], P[n][0], 0);
 8:     pm1 = calc_new_pm(PM[min_index], P[n][0], 1);
 9:     if (PL[min_index] ∈ A^c) then
10:         extend_path(min_index, 0, pm0);
11:     else
12:         if (T == D) then
13:             if (PM[max_index] > max(pm0, pm1)) then
14:                 kill_path(max_index);
15:         if pm0 < pm1 then
16:             if (T < D) then
17:                 max_index = clone_path(min_index);
18:                 extend_path(max_index, 1, pm1);
19:             extend_path(min_index, 0, pm0);
20:         else
21:             if (T < D) then
22:                 max_index = clone_path(min_index);
23:                 extend_path(max_index, 0, pm0);
24:             extend_path(min_index, 1, pm1);
25:     update_min_max_index();
26:     update_length_info();
27:     if (end_check() == 1) then
28:         break;
29:     if path_switch then
30:         load_path();
31:     φ = PL[min_index] − 1;
32:     C[φ mod 2][n][0] = paths[min_index][φ];
33:     if ((φ mod 2) == 1) then
34:         recursively_update_C(n, PL[min_index] − 1);
35: for φ = 0, 1, ..., K − 1 do
36:     m̂_φ = paths[min_index][A_φ];
is set up as new paths are created. The initial path is assigned to min_index and the channel LLRs are populated at the top of the probability tree P.
In the while loop (lines 5 to 34), the LLR for the current bit of the most reliable path is calculated. Lines 9 and 10 extend this path in the event of a frozen bit (i.e. the bit index belongs to the frozen set A^c). In the case of a message bit, lines 12-14 first check if the PQ is full and if both new guesses are better than the worst path in the PQ. If this is true, then the
Algorithm 2: initialize_data_structures()
1: clear(inactive_path_indices);
2: for p = 0, 1, ..., D − 1 do
3:     push(inactive_path_indices, p);
4:     active_path[p] = false;
5: for φ = 0, 1, ..., N − 1 do
6:     PL_hits[φ] = 0;

Algorithm 3: assign_initial_path()
Output: Index p of initial path
1: p = pop(inactive_path_indices);
2: active_path[p] = true;
3: PM[p] = 0.0;
4: PL[p] = 0;
5: T = 1;
worst path is killed. Lines 15-24 extend the best path along
the more reliable guess and place the other guess in the PQ if
there is space.
The function update_min_max_index is then called to update min_index, max_index and path_switch. The indices of the paths with the maximum and minimum path metrics are identified in a single loop of at most O(D) complexity, which eliminates the need to sort all the paths in the PQ, since these are the only paths that will have to be extended or deleted in the current iteration of the decoder. Furthermore, by keeping track of path switching it is possible to reuse the values in the P and C memory just like an SC decoder, as long as SCS keeps extending the same path. In case of a path switch, the new path needs to be loaded into the P and C memory only once, and it can then be reused until the path switches again. This enables us to reduce the space complexity while maintaining the computational complexity between switches.
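The linear search that replaces the stack sorting can be sketched as a single pass over the D path slots (our helper; only active entries participate):

```python
def update_min_max_index(PM, active_path):
    # one O(D) pass finds both the best path (to extend next) and the
    # worst path (to possibly drop), so a full sort of the PQ is never needed
    min_index = max_index = -1
    for p, active in enumerate(active_path):
        if not active:
            continue
        if min_index < 0 or PM[p] < PM[min_index]:
            min_index = p
        if max_index < 0 or PM[p] > PM[max_index]:
            max_index = p
    return min_index, max_index
```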
Next, update_length_info is called, which checks if the current path length has been investigated L times, and kills all shorter paths if so. Then, the call to end_check causes the algorithm to break out of the while loop if the PQ is empty or if the length of the current path has reached N. Finally, a new path is loaded in case of a switch, and the last bit of the current path is updated in the C memory. Upon exiting the while loop, the index of the decoded path is in min_index: the decoder copies the bits of the unfrozen set A into the estimated message bit vector, and the algorithm terminates.
The probability and bit trees P and C have a space complexity of O(N), equal to that of the SC decoder. The PL and PM arrays have a space complexity of O(D), while the paths memory has a space complexity of O(ND). Since the frozen values are already known and only the message bits in the path need to be saved, the paths memory can be further compressed to a space complexity of O(KD), at the cost of the decoder only being able to support a maximum fixed rate.
Algorithm 4: clone_path()
Input : Index p of path to clone
Output: Index p′ of cloned path
1: p′ = pop(inactive_path_indices);
2: active_path[p′] = true;
3: PM[p′] = PM[p];
4: PL[p′] = PL[p];
5: T = T + 1;
6: for φ = 0, 1, ..., PL[p] − 1 do
7:     paths[p′][φ] = paths[p][φ];

Algorithm 5: recursively_calc_P()
Input: Layer λ and phase φ
1: if λ == 0 then
2:     return;
3: ψ = ⌊φ/2⌋;
4: if ((φ mod 2) == 0) or (path_switch == 1) then
5:     recursively_calc_P(λ − 1, ψ);
6: for β = 0, 1, ..., 2^{n−λ} − 1 do
7:     if ((φ mod 2) == 0) then
8:         P[λ][β] = f(P[λ−1][2β], P[λ−1][2β+1]);
9:     else
10:         u = C[0][λ][β];
11:         P[λ][β] = g(P[λ−1][2β], P[λ−1][2β+1], u);

Algorithm 6: load_path()
1: for φ = 0, 1, ..., PL[min_index] − 1 do
2:     C[φ mod 2][n][0] = paths[min_index][φ];
3:     if ((φ mod 2) == 1) then
4:         recursively_update_C(n, φ);
IV. FURTHER COMPLEXITY REDUCTION
We define an “iteration” as the decoder estimating a particular bit index in a candidate path. Thus, the SC and SCL decoding algorithms have a fixed number of iterations, N and NL respectively, while the SCS decoder has a variable number of iterations depending on Eb/N0. This number converges to N iterations as Eb/N0 increases.
Studies presented in [14] have shown that decoding failures are typically caused by a limited number of errors introduced by the channel (1-3 channel errors). These errors are more likely to occur at bit indices with low reliability, which are found early in the polar codeword and are thus decoded earlier.
We propose to protect the first γ information bits encountered along the SC decoding tree with a CRC of length Cγ. When the SCS decoder reaches a candidate path with γ message bits, it can perform a CRC check and kill the path in case the CRC fails. Paths that fail the CRC still result in an increment of PL_hits, and therefore the SCS decoder will have at most L paths that have passed this initial CRC.
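A minimal bitwise CRC-8 check over the first γ message bits might look as follows (a sketch using polynomial 0xD5 as in Section V; the paper does not specify the bit ordering or register conventions, so those are our assumptions, as is the placement of the CRC bits directly after the γ message bits):

```python
def crc8(bits, poly=0xD5):
    # MSB-first bitwise CRC-8 over a list of 0/1 message bits
    reg = 0
    for b in bits:
        msb = (reg >> 7) & 1
        reg = (reg << 1) & 0xFF
        if msb ^ b:
            reg ^= poly
    return reg

def partial_crc_ok(path_bits, gamma, c_gamma=8):
    # kill a candidate path whose first gamma message bits fail the CRC;
    # the c_gamma CRC bits are assumed to follow the gamma message bits
    msg = path_bits[:gamma]
    crc_bits = path_bits[gamma:gamma + c_gamma]
    received = 0
    for b in crc_bits:
        received = (received << 1) | b
    return crc8(msg) == received
```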
It is possible, especially at low Eb/N0, that incorrect paths
[Figure] Fig. 1. FER curves for different decoding algorithms, PC(512, 256). Curves shown: SC, SCL (L=32), SCS, SCS-ET; FER from 10^0 down to 10^−5 over Eb/N0 from 0 to 3 dB.
pass this initial CRC, or that the correct path gets killed before or shortly after the CRC check, due to errors in the CRC bits. In such cases the SCS decoder performs many useless iterations only to end in a decoding failure. We propose to introduce an early termination criterion by defining a maximum number of iterations M_it the decoder is allowed to take before failure is declared. M_it is initialized to 2LN: in the event of an initial CRC failure, M_it is penalized by N iterations, corresponding to the path that has just been removed from consideration. An early termination criterion for SCS decoders has also been proposed in [15]. However, the parameters of the method described in [15] depend on channel conditions, and the early termination comes at a cost in FER; our approach (SCS-ET) is instead channel-independent and causes negligible error-correction performance degradation.
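The termination rule can be captured by a small bookkeeping object (a sketch; the class and method names, and the per-iteration accounting granularity, are our choices):

```python
class IterationBudget:
    """Early termination for SCS-ET: start from M_it = 2*L*N and shrink
    the budget by N whenever the partial CRC kills a candidate path."""

    def __init__(self, N, L):
        self.N = N
        self.M_it = 2 * L * N
        self.iterations = 0

    def on_partial_crc_failure(self):
        # penalize the budget by N for the path just removed
        self.M_it -= self.N

    def tick(self):
        # count one iteration; False means failure must be declared
        self.iterations += 1
        return self.iterations <= self.M_it
```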
V. SIMULATION RESULTS
Simulation results are presented for PC(512, 256) constructed for an AWGN channel with σ^2 = 0.5. The parameter L is set to 32 for SCL, SCS, and SCS-ET. The stack depth D is set to LN = 16,384 for the SCS and SCS-ET decoders. Finally, SCS-ET has initial CRC parameters set to γ = 16, Cγ = 8, and CRC polynomial 0xD5.
Fig. 1 shows the frame error rate (FER) for the considered algorithms. It can be seen that SCS and SCS-ET provide error-correction performance similar to that of SCL. Fig. 2 shows that on average the SCS decoder takes fewer iterations than the SCL decoder, with a gain ranging between 48% and 97%, and that at high Eb/N0 it converges to SC complexity. It can be observed that by using a CRC on γ information bits and the early termination criterion, the complexity of the SCS-ET decoder is further reduced, gaining 1% to 50% over SCS and 71% to 97% over SCL. Finally, Fig. 3 shows that the CRC check in the SCS-ET decoder reduces the number of iterations by 1% to 28% with respect to SCS in case of a successful decoding, while the CRC combined with the early termination criterion yields a gain ranging between 31% and 53% in iterations over SCS in case of failed decoding. SCS-ET
[Figure] Fig. 2. Average number of iterations for different decoding algorithms, PC(512, 256). Curves shown: SC, SCL (L=32), SCS, SCS-ET; average iterations between 10^3 and 10^4 over Eb/N0 from 0 to 3 dB.
[Figure] Fig. 3. Average number of iterations for decoder success/failure, with and without early termination, PC(512, 256). Curves shown: SCS Pass, SCS Fail, SCS-ET Pass, SCS-ET Fail; Eb/N0 from 0 to 3 dB.
thus requires 66% to 97% and 75% to 95% fewer iterations than SCL in case of successful and failed decoding, respectively.
VI. CONCLUSION
In this work, we have presented an efficient software implementation of the SCS decoding algorithm for polar codes. It replaces the stack sorting step with a linear search over the stack, and guarantees the same spatial complexity as SC to compute the path probabilities, with additional memory required only for storing paths in the queue. We have also proposed a partial CRC check as an effective noise-independent method to reduce the SCS time complexity, along with an early termination criterion. Simulation results show up to a 97% iteration gain with respect to SCL, with negligible degradation in error-correction performance.
REFERENCES
[1] E. Arikan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 3051–3073, July 2009.
[2] “Final report of 3GPP TSG RAN WG1 #87 v1.0.0,” http://www.3gpp.org/ftp/tsg_ran/WG1_RL1/TSGR1_87/Report/Final_Minutes_report_RAN1%2387_v100.zip, Reno, USA, November 2016.
[3] I. Tal and A. Vardy, “List decoding of polar codes,” IEEE Transactions on Information Theory, vol. 61, no. 5, pp. 2213–2226, May 2015.
[4] A. Balatsoukas-Stimming, M. B. Parizi, and A. Burg, “LLR-based successive cancellation list decoding of polar codes,” IEEE Transactions on Signal Processing, vol. 63, no. 19, pp. 5165–5179, Oct 2015.
[5] S. A. Hashemi, C. Condo, and W. J. Gross, “Simplified successive-cancellation list decoding of polar codes,” in 2016 IEEE International Symposium on Information Theory (ISIT), July 2016, pp. 815–819.
[6] ——, “Fast simplified successive-cancellation list decoding of polar codes,” in 2017 IEEE Wireless Communications and Networking Conference Workshops (WCNCW), March 2017, pp. 1–6.
[7] K. Niu and K. Chen, “Stack decoding of polar codes,” Electronics Letters, vol. 48, no. 12, pp. 695–697, June 2012.
[8] ——, “CRC-aided decoding of polar codes,” IEEE Communications Letters, vol. 16, no. 10, pp. 1668–1671, October 2012.
[9] K. Chen, K. Niu, and J. Lin, “Improved successive cancellation decoding of polar codes,” IEEE Transactions on Communications, vol. 61, no. 8, pp. 3100–3107, August 2013.
[10] Y. Shen, C. Zhang, J. Yang, S. Zhang, and X. You, “Low-latency software successive cancellation list polar decoder using stage-located copy,” in 2016 IEEE International Conference on Digital Signal Processing (DSP), Oct 2016, pp. 84–88.
[11] P. Giard, G. Sarkis, C. Leroux, C. Thibeault, and W. J. Gross, “Low-latency software polar decoders,” in Journal of Signal Processing Systems, to appear. [Online]. Available: http://arxiv.org/abs/1504.00353
[12] B. L. Gal, C. Leroux, and C. Jego, “Multi-Gb/s software decoding of polar codes,” IEEE Transactions on Signal Processing, vol. 63, no. 2, pp. 349–359, Jan 2015.
[13] V. Miloslavskaya and P. Trifonov, “Sequential decoding of polar codes,” IEEE Communications Letters, vol. 18, no. 7, pp. 1127–1130, July 2014.
[14] O. Afisiadis, A. Balatsoukas-Stimming, and A. Burg, “A low-complexity improved successive cancellation decoder for polar codes,” in 2014 48th Asilomar Conference on Signals, Systems and Computers, Nov 2014, pp. 2116–2120.
[15] P. Trifonov, V. Miloslavskaya, and R. Morozov, “Fast sequential decoding of polar codes,” CoRR, vol. abs/1703.06592, 2017. [Online]. Available: http://arxiv.org/abs/1703.06592
... In order to facilitate the employment of polar decoders in practical communication systems relying on logarithmic domain, processing the logarithmic likelihood ratio (LLR)-based SCL or SCS decoding of polar codes has been investigated in [9][10][11][12], which generate hard bit decisions based on their LLR inputs. For facilitating iterative detection and decoding (IDD), soft-output decoders have also been proposed, such as the belief propagation (BP) [13] and soft cancellation (SCAN) The [14] polar decoders, which outperformed the non-iterative hardoutput SCL decoder for transmission over fading channels by exploiting extrinsic information exchange between the detector and decoder. ...
... However, to the best of our knowledge, there exists no open literature on how to generate the soft-output of SCS decoders, even though this would have the potential of reducing the detection complexity of the SCL decoder [10,11]. Motivated by filling this knowledge gap in the open literature, we propose a soft-output SCS (SSCS) decoder, which facilitates the iterative decoding of polar-coded systems. ...
... When P V [n] reaches a pre-set value, which is S/2 in this letter, all the candidates that have a shorter length than n in the stack will be popped out from the stack for releasing extra memory, as shown in Lines 26-27 of the Algorithm 1. The search process is demonstrated in Lines 8-29 of Algorithm 1, where the sorting operation follows [10]. ...
Article
Full-text available
Polar coding has been ratified for employment in the 3GPP New Radio standard and several soft-decision decoders achieved comparable performance to that of the state-of-the-art successive cancellation list decoder. Aiming for further improving the performance of the soft-decision polar decoders, we propose a soft-output successive cancellation stack (SSCS) polar decoder, which jointly exploits the benefits of the depth-first search of the stack decoder and the soft information output of the belief propagation decoder. This has the substantial benefit of facilitating soft-input soft-output (SISO) decoding and seamless iterative information exchange in turbo-style receivers. As a further contribution, we intrinsically amalgamate our SSCS decoder into polar-coded large-scale multiple-input multiple-output (MIMO) systems and conceive an iterative turbo receiver, operating on the basis of logarithmic likelihood ratios (LLRs). Our simulation results show that the proposed SSCS decoder is capable of outperforming the state-of-the-art SISO polar decoders, despite requiring a lower complexity at moderate to high signal-to-noise ratios (SNRs). Additionally, compared with the non-iterative hard-output SCS decoder, our SSCS scheme attained 1.5 dB SNR gain at a bit error ratio level of $10^{-5}$ , when decoding the [256,512] polar code of a $(64\times 64)$ MIMO system.
... At the same time average number of decoding attempts drops rapidly as the SNR increases, and proposed decoders have lower complexity compared to CA-SCL and Fast CA-SCL decoding with L = 32 and 64 in the mediumto-high SNR region. Now let us present the detailed complexity analysis (in terms of average number of different operations) of the proposed decoders and compare them with CA-SCL-16, CA-SCL-32, fast CA-SCL decoders, proposed in [8], and stack decoder (SCS) from [25]. For all implementations of SCL decoders we use simplified LLR-based versions of them, where logarithmic and exponential calculations are omitted [26]. ...
... In the last three rows of the table we present complexity estimations for different polar codes decoded by succesive cancellation stack decoder (SCS) with list size L = 32 and stack size D = LN , where N is a code length. In accordance with [25] these parameters of SCS allow to obtain performance of SCL-32 decoder. As it can be found from the Tables IV-VI, SCS can significantly reduce number of nearly all operations applied during decoding in comparison not only with CA-SCL-32 but also with GSCLF. ...
Article
Full-text available
In this paper, an improvement for SC list flip (SCL-Flip) decoding is presented for polar codes. A novel bit-selection metric for critical set (set of information symbols of polar codes being flipped during additional decoding attempts) based on path metric of successive cancellation list (SCL) decoding is suggested. With the proposed metric, the improved SCL scheme based on special nodes (SN) decoders was developed. This decoder will be denoted by GSCLF. The main idea of the proposed decoder is joint using of two approaches: first one is a fast decoding of special nodes in binary tree representation of polar code (e.g., some special nodes in tree representation of polar code that allow efficient list decoding with low complexity) and the second one is an applying of additional decoding attempts (flips) in the case when initial decoding was erroneous. The simultaneous use of these two approaches results in both a significant reduction in spatial complexity and a significant reduction in the number of computations required for decoding whereas keeping excellent performance. Simulation results presented in this paper allow us to conclude that the computational complexity of the proposed GSCLF decoder is from 66% to 80% smaller than the one of SCL-32 decoder.
... In addition to the SCL decoder, there is a similar approach that uses a stack to store paths, the management of which allows for more efficient use of memory and the use of fewer candidates than in the case of SCL. This approach is called SCS (successive cancellation stack [2]) and provides efficient decoding with space improvements. ...
Article
Full-text available
Polar codes have emerged as a focal point in the field of error-correcting codes, owing to their remarkable capacity-achieving characteristics and their relevance in various modern communication systems. The basic successive cancellation (SC) approach is not optimal to use in terms of the trade-off between performance and decoding complexity. SC-Creeper algorithm performs better with about the same low complexity as the SC version of the algorithm. However, the SC-Creeper algorithm did not have the ability to use the candidate list as a measure to improve performance and refine the search for the true codeword. To compare with successive cancellation list (SCL) approach and the ability to use more computing memory, the SCL-Creeper method was developed, using two additional lists. This method can also be used as a development of Fano algorithms for polar codes (mainly, Fano decoding in polar decoding does not use lists). This paper addresses the challenge of computational complexity in polar code decoding by integrating a list structure with the SC-Creeper algorithm. Building on prior research that introduced the concept of SC-Creeper, the study focuses on enhancing error correction performance while mitigating computational burden. The first chapters describe the polar encoding process and basic decoding technologies, then discuss the basic Creeper algorithm. In the following chapters, the authors describe a modified version of the two-list Creeper approach (that is, the SCL-Creeper version of the algorithm). Extensive simulations and numerical analysis presented in the paper underscore the tangible advantages of this novel decoding strategy. Leveraging the basic list algorithm, renowned for its superior error correction capabilities, the research explores the integration of Creeper to systematically prune unnecessary decoding paths. The resulting SCL-Creeper hybrid approach aims to strike a balance between error correction efficiency and computational complexity. 
Finally, the optimal selection of parameters for the SCL-Creeper approach and future directions in the research of the list version of the fast Creeper algorithm are discussed.
... The Successive Cancellation List (SCL) decoder, introduced by Tal and Vardy [3], follows a similar path-wise traversal strategy as SC but maintains a list of up to L candidate paths for further exploration. The Successive Cancellation Stack (SCS) decoder [4] offers improved error correction ratios and throughput; however, this achievement comes with the trade-off of heightened space complexity. Another noteworthy development is the SC-Fano decoding algorithm [5], which integrates sequential decoding concepts into the polar decoding traversal procedure. ...
Article
Full-text available
Polar codes have established themselves as a cornerstone in modern error correction coding due to their capacity-achieving properties and practical implementation advantages. However, decoding polar codes remains a computationally intensive task. In this paper, we introduce a novel approach to improve the decoding efficiency of polar codes by integrating the threshold-based SC-Creeper decoding algorithm, originally designed for convolutional codes. Our proposed decoder with an additional cost function seamlessly merges two established decoding paradigms, namely the stack and Fano approaches. The core idea is to leverage the strengths of both decoding techniques to strike a balance between computational efficiency and performance, with an additional method of controlling movement along a code tree. Simulations demonstrate the superiority of the proposed improved SC-Creeper decoder with tuned parameters. The improved SC-Creeper decoder achieves the performance of the CA-SCL-8 decoder in terms of high code rates and overcomes it in terms of the N=1024 code length, while simultaneously surpassing the efficiency of the traditional Fano decoding algorithm.
Article
Full-text available
Introduction/purpose: The paper introduces a reduced latency stack decoding algorithm of polar codes, inspired by the bidirectional stack decoding of convolutional codes and based on the folding technique. Methods: The stack decoding algorithm (also known as stack search) that is useful for decoding tree codes, the list decoding technique introduced by Peter Elias and the folding technique for polar codes which is used to reduce the latency of the decoding algorithm. The simulation was done using the Monte Carlo procedure. Results: A new polar code decoding algorithm, suitable for parallel implementation, is developed and the simulation results are presented. Conclusions: Polar codes are a class of capacity achieving codes that have been adopted as the main coding scheme for control channels in 5G New Radio. The main decoding algorithm for polar codes is the successive cancellation decoder. This algorithm performs well at large blocklengths with a low complexity, but has very low reliability at short and medium blocklengths. Several decoding algorithms have been proposed in order to improve the error correcting performance of polar codes. The successive cancellation list decoder, in conjunction with a cyclic redundancy check, provides very good error-correction performance, but at the cost of a high implementation complexity. The successive cancellation stack decoder provides similar error-correction performance at a lower complexity. Future machine-type and ultra reliable low latency communication applications require high-speed low latency decoding algorithms with good error correcting performance. In this paper, we propose a novel decoding algorithm, inspired by the bidirectional stack decoding of classical convolutional codes, with reduced latency that achieves similar performance as the classical successive cancellation list and successive cancellation stack decoding algorithms. The results are presented analytically and verified by simulation.
Article
Proved to achieve the symmetric capacity of the binary-input discrete memoryless channels, polar codes have been chosen for the eMBB control channels in the 5th generation mobile communication systems. Besides the main decoding algorithms like successive cancellation (SC) decoding and CRC-aid SC list (CA-SCL) decoding, sphere decoder (SD) and list SD (LSD) are the alternatives for short codes with less required memory bits. Existing SD and LSD attain high calculation complexity, for SD requires a back-tracking process and LSD needs a large list size L to achieve satisfying performance. To reduce complexity, an efficient software stack sphere decoder (ESSD) based on the synchronous determination is firstly proposed in this article. With the dynamic set-by-set decoding in the stack structure, it achieves the lowest complexity in SD-based decoders (SD/LSD/ESSD) while sharing the same performance on low-rate codes and high-rate codes. Compared with the CA-SCL decoder, the complexity and latency of the proposed ESSD are also competitive at high signal-to-noise-ratio on the displayed codes. Implemented on C++, the proposed ESSD reduces 44.77% latency compared with CA-SCL-32 for P(128, 120) at the BER of 10 <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">-5</sup> with E <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">b</sub> /N <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">0</sub> = 7 dB.
Article
An extension of the stack decoding algorithm for polar codes is presented. The paper introduces a new score function, which enables one to accurately compare paths of different lengths. This results in a significant complexity reduction with respect to the original stack algorithm at the expense of negligible performance loss.
Article
This paper presents an optimized software implementation of a Successive Cancellation (SC) decoder for polar codes. Despite the strong data dependencies in SC decoding, a highly parallel software polar decoder is devised for x86 processor targets. A high level of performance is achieved by exploiting the parallelism inherent in today's processor architectures (SIMD, multicore, etc.). Some optimizations originally intended for hardware implementation (memory reduction techniques and algorithmic simplifications) were also applied to enhance the throughput of the software implementation. Finally, low-level optimizations such as explicit assembly description and data packing are used to improve the throughput even more. The resulting decoder description is implemented on different x86 processor targets. An analysis of the decoder in terms of latency and throughput is proposed. The influence of several parameters on the throughput and the latency is investigated: the selected target, the code rate, the code length, the SIMD mode (SSE/AVX), the multithreading mode, etc. The energy per decoded bit is also estimated. The proposed software decoder compares favorably with state-of-the-art software polar decoders. Extensive experiments demonstrate that the proposed software polar decoder exceeds 1 Gb/s for code lengths N ≤ 2^17 on a single core and reaches multi-Gb/s throughputs when using four cores in parallel in AVX mode.
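The data-parallel kernels that dominate SC decoding are its two LLR update rules, which is what makes SIMD vectorization over blocks of LLRs so effective. A minimal scalar sketch of the standard min-sum formulation (textbook rules, not code from the cited paper):

```python
import math

def f_minsum(a, b):
    """Left-branch LLR update: min-sum approximation of the boxplus operation."""
    return math.copysign(1.0, a) * math.copysign(1.0, b) * min(abs(a), abs(b))

def g_update(a, b, u):
    """Right-branch LLR update given the already-decided partial-sum bit u."""
    return b + (1 - 2 * u) * a

# Both kernels are branch-free arithmetic on independent LLR pairs,
# so they map directly onto SSE/AVX lanes in a software decoder.
print(f_minsum(1.5, -0.5))     # → -0.5
print(g_update(1.5, -0.5, 1))  # → -2.0
```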
Conference Paper
Under successive cancellation (SC) decoding, polar codes are inferior to other codes of similar blocklength in terms of frame error rate. While more sophisticated decoding algorithms such as list- or stack-decoding partially mitigate this performance loss, they suffer from an increase in complexity. In this paper, we describe a new flavor of the SC decoder, called the SC flip decoder. Our algorithm preserves the low memory requirements of the basic SC decoder and adjusts the required decoding effort to the signal quality. In the waterfall region, its average computational complexity is almost as low as that of the SC decoder.
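The flip-and-retry loop described above can be sketched as follows. Both `decode_once` and `crc_ok` are hypothetical interfaces standing in for a real SC decoder and a real CRC check; they are illustrative assumptions, not the paper's code.

```python
def sc_flip(decode_once, crc_ok, max_flips):
    """Sketch of an SC flip loop.

    decode_once(flip_idx) runs SC decoding, inverting the hard decision at
    position flip_idx (or none, if flip_idx is None), and returns the decoded
    info bits plus their |LLR| reliabilities. crc_ok(bits) checks an outer CRC."""
    bits, reliab = decode_once(None)
    if crc_ok(bits):
        return bits                      # SC succeeded on the first pass
    # Otherwise retry, flipping the least reliable decisions one at a time.
    order = sorted(range(len(reliab)), key=lambda i: reliab[i])
    for idx in order[:max_flips]:
        bits, _ = decode_once(idx)
        if crc_ok(bits):
            return bits
    return bits                          # give up after max_flips attempts

# Toy demo: the first pass gets bit 2 wrong, and bit 2 is also the least
# reliable decision, so a single flip corrects it.
truth = [1, 0, 1]
def decode_once(flip_idx):
    bits = [1, 0, 0]                     # hypothetical first-pass SC output
    if flip_idx is not None:
        bits[flip_idx] ^= 1
    return bits, [3.0, 2.0, 0.1]         # |LLR| reliability per info bit
crc_ok = lambda b: b == truth            # stand-in for a real CRC check
print(sc_flip(decode_once, crc_ok, max_flips=2))  # → [1, 0, 1]
```

Since the retries only trigger on a CRC failure, the average workload stays close to plain SC at high SNR, which is the effect the abstract describes.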
Conference Paper
We focus on the metric sorter unit of successive cancellation list decoders for polar codes, which lies on the critical path in all current hardware implementations of the decoder. We review existing metric sorter architectures and propose two new architectures that exploit the structure of the path metrics in a log-likelihood ratio based formulation of successive cancellation list decoding. Our synthesis results show that, for a list size of L = 32, our first proposed sorter is 14% faster and 45% smaller than existing sorters, while for smaller list sizes, our second sorter has a higher delay in exchange for up to a 36% reduction in area.
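Functionally, the sorter only has to identify the L survivors among the 2L candidate metrics; a full sort is unnecessary. A minimal software sketch of that selection step (the hardware architectures from the paper are not reproduced here):

```python
import heapq

def select_survivors(path_metrics, L):
    """Return the indices of the L best (smallest) candidate path metrics.

    With 2L candidates per decoded bit, only the membership of the best-L
    set matters for SCL decoding, which is the structure specialized
    hardware sorters exploit."""
    return heapq.nsmallest(L, range(len(path_metrics)),
                           key=path_metrics.__getitem__)

# Four candidates (2L with L = 2): survivors are the two smallest metrics.
print(select_survivors([4.0, 1.0, 3.0, 2.0], 2))  # → [1, 3]
```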
Conference Paper
We present an LLR-based implementation of the successive cancellation list (SCL) decoder. To this end, we associate each decoding path with a "metric" which (i) is a monotone function of the path's likelihood and (ii) can be computed efficiently from the channel LLRs. The LLR-based formulation leads to a more efficient hardware implementation of the decoder compared to the known log-likelihood based implementation. Synthesis results for an SCL decoder with block length N = 1024 and list sizes of L = 2 and L = 4 confirm that the LLR-based decoder has considerable area and operating frequency advantages, on the order of 50% and 30%, respectively.
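The LLR-based path metric admits a well-known per-bit update rule. The sketch below shows both the exact form and the common hardware-friendly approximation, which penalizes a path only when its decision disagrees with the LLR's hard decision; the function name and interface are illustrative, not taken from the cited paper.

```python
import math

def pm_update(pm, llr, u, approx=True):
    """LLR-based path-metric update (lower metric = more likely path).

    Exact rule:          pm + ln(1 + exp(-(1 - 2u) * llr))
    Approximation:       pm + |llr| if decision u disagrees with sign(llr),
                         pm otherwise."""
    if approx:
        disagrees = (llr < 0) != (u == 1)
        return pm + (abs(llr) if disagrees else 0.0)
    return pm + math.log1p(math.exp(-(1 - 2 * u) * llr))

# A positive LLR favors u = 0: deciding u = 1 costs |llr|, u = 0 costs nothing.
print(pm_update(0.0, 2.0, 1))  # → 2.0
print(pm_update(0.0, 2.0, 0))  # → 0.0
```

The approximate branch is the one that maps well to hardware, since it needs only a sign comparison and an addition per decoded bit.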
Chapter
The low-complexity encoding and decoding algorithms render polar codes attractive for use in SDR applications where computational resources are limited. In this chapter, we present low-latency software polar decoders that exploit modern processor capabilities. We show how adapting the algorithm at various levels can lead to significant improvements in latency and throughput, yielding polar decoders that are suitable for high-performance SDR applications on modern desktop processors and embedded-platform processors. The proposed decoders have an order of magnitude lower latency and memory footprint compared to state-of-the-art decoders, while maintaining comparable throughput. In addition, we present strategies and results for implementing polar decoders on graphics processing units. Finally, we show that the energy efficiency of the proposed decoders is comparable to that of state-of-the-art software polar decoders.
Conference Paper
Successive cancellation list (SCL) decoding for polar codes is promising for data communication. However, in addition to L times the complexity of conventional SC, both path selection and path updating add extra complexity. In particular, copying intermediate values incurs a long latency, especially when the list size L is large. In this paper, a stage-located copy algorithm is proposed to avoid copying the same contents across candidate paths, which significantly reduces the processing latency. Furthermore, the resulting data-processing speedup increases with code length. For the (2048, 1723) polar code, experimental results show that, by employing the proposed stage-located copy, a software-based SCL decoder with L = 32 achieves up to 1.1 Mb/s throughput, a 45% increase over state-of-the-art software SCL decoders.
Article
The problem of efficient decoding of polar codes is considered. A low-complexity sequential soft-decision decoding algorithm is proposed. It is based on the successive cancellation approach and employs estimates of the most likely codeword probability to select which path within the code tree to extend.
Article
CRC (cyclic redundancy check)-aided decoding schemes are proposed to improve the performance of polar codes. A unified description of successive cancellation decoding and its improved versions with a list or stack is provided, and the CRC-aided successive cancellation list/stack (CA-SCL/SCS) decoding schemes are proposed. Simulation results for the binary-input additive white Gaussian noise channel (BI-AWGNC) show that CA-SCL/SCS can provide a significant gain of 0.5 dB over the turbo codes used in the 3GPP standard with code rate 1/2 and code length 1024 at a block error probability (BLER) of 10^-4. Moreover, the time complexity of the CA-SCS decoder is much lower than that of the turbo decoder and can approach that of the successive cancellation (SC) decoder in the high-SNR regime.
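The CRC-aided selection step common to these schemes can be sketched simply: among the final candidate paths, return the most likely one that passes the CRC. All names below are illustrative, and the even-parity check is a toy stand-in for a real CRC polynomial.

```python
def ca_scl_select(paths, metrics, crc_ok):
    """CRC-aided list selection: return the best-metric path that passes
    the CRC; fall back to the overall best path if none passes."""
    order = sorted(range(len(paths)), key=lambda i: metrics[i])
    for i in order:                      # scan candidates from best to worst
        if crc_ok(paths[i]):
            return paths[i]
    return paths[order[0]]               # no candidate passed: best metric wins

# Toy demo with an even-parity "CRC": the best path fails the check,
# so the second-best passing path is selected instead.
crc_ok = lambda p: sum(p) % 2 == 0
paths = [[1, 0, 0], [1, 1, 0], [0, 0, 0]]
metrics = [0.5, 1.0, 2.0]
print(ca_scl_select(paths, metrics, crc_ok))  # → [1, 1, 0]
```

This is where the coding gain of CA-SCL/SCS over plain list or stack decoding comes from: the CRC disambiguates between nearly equally likely candidates.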