Low-Complexity Software Stack Decoding of Polar
Codes
Harsh Aurora, Carlo Condo, Warren J. Gross
Department of Electrical and Computer Engineering, McGill University, Montréal, Québec, Canada
Email: harsh.aurora@mail.mcgill.ca, carlo.condo@mcgill.ca, warren.gross@mcgill.ca
Abstract—Polar codes are a recent class of linear error-correcting codes that asymptotically achieve the channel capacity at infinite code length. The Successive Cancellation List (SCL) algorithm yields very good error-correction performance, at the cost of high implementation complexity. The Stack (SCS) decoding algorithm provides similar error-correction performance at a lower complexity. In this work, we propose an efficient software implementation of the SCS decoding algorithm, along with techniques to further reduce its computational complexity. In particular, we reduce the SCS memory requirements through efficient path switching, replace the stack sorting with a linear search, and explore the use of a partial CRC along with an early termination criterion. Using the proposed methods, we reduce the number of bit estimations by up to 97% with respect to SCL, while maintaining similar error-correction performance.
I. INTRODUCTION
Polar codes [1] are the first error-correcting codes that can
provably achieve channel capacity, and they have been selected
as a coding scheme for the 5th generation wireless systems
standards (5G) [2]. The first proposed decoding algorithm is
the successive-cancellation (SC) algorithm [1]. While its error-
correction performance is able to reach channel capacity at
infinite code length, it is mediocre at practical code lengths.
Thus, many improvements to SC have been proposed in the
past years: list SC (SCL) [3] and its evolutions [4]–[6] have
gathered the interest of academia and industry alike thanks
to their substantial error-correction performance gains. They
rely on multiple parallel SC decoders working on different
possible candidate codewords, and on dedicated metrics to
identify the most likely one. SCL decoders thus suffer from
high computational complexity.
Similar to the concept used in SCL, SCS has been proposed
in [7] and improved upon in [8], [9]. It relies on a set
of codeword candidates, of which only the most likely is
extended. Unlike SCL, the amount of memory required by
SCS is variable. This cannot easily lead to actual memory
reduction in hardware decoders, where memory usually is
sized at design time considering the worst case. The flexible
nature of SCS is instead well suited for software decoders,
whose inherent adaptability can be exploited in base stations.
Current polar code software decoders suffer from longer latency and lower throughput with respect to hardware decoders
[10], [11]. Fast software decoders such as [12] require parallel
implementations on powerful, power-hungry platforms.
In this work, we present an efficient software implementation of the SCS algorithm in which the decoder tree has the same memory requirement as that of SC, improving over [13].
Our software implementation replaces the stack sorting with
a linear search. We then propose an early CRC check on the message bits, which provides a reduction in computational complexity and latency. Lastly, we describe an early termination
criterion based on this CRC check, which enables us to further
reduce the computational complexity of the SCS decoder while
maintaining similar error-correction performance as SCL.
II. PRELIMINARIES
A polar code PC(N, K) of code length N and rate R = K/N is a linear block code that identifies K reliable bit-channels, used to transmit information, and N − K unreliable ones, frozen at a known value. Polar codes are encoded by multiplying the information/frozen bit vector by the generator matrix G^{⊗n}, i.e. the n-th Kronecker product of the polarization matrix G = [1 0; 1 1].
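As a concrete illustration of this encoding step, the sketch below builds G^{⊗n} with NumPy and multiplies the bit vector by it modulo 2; the function names are ours, not the paper's.

```python
import numpy as np

def polar_generator(n):
    """n-th Kronecker power of the 2x2 polarization matrix G = [[1, 0], [1, 1]]."""
    G = np.array([[1, 0], [1, 1]], dtype=np.uint8)
    Gn = np.array([[1]], dtype=np.uint8)
    for _ in range(n):
        Gn = np.kron(Gn, G)
    return Gn

def polar_encode(u):
    """Encode a length-N bit vector u (frozen positions already set to their
    known values) as x = u * G^(kron n) mod 2, with N = 2**n."""
    n = int(np.log2(len(u)))
    return (np.asarray(u, dtype=np.uint8) @ polar_generator(n)) % 2
```

For example, with N = 4, `polar_encode([0, 0, 0, 1])` picks the last row of G^{⊗2} and yields the all-ones codeword.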
The SC decoding algorithm can be viewed as a recursive binary tree search. A node receives from its parent a vector of log-likelihood ratios (LLRs) α: at tree stage λ, nodes compute the left α^l = {α^l_0, α^l_1, . . . , α^l_{2^{λ−1}−1}} and right α^r = {α^r_0, α^r_1, . . . , α^r_{2^{λ−1}−1}} LLR vectors. These are transmitted to child nodes:

    α^l_i = sgn(α_i) sgn(α_{i+2^{λ−1}}) min(|α_i|, |α_{i+2^{λ−1}}|),   (1)
    α^r_i = α_{i+2^{λ−1}} + (1 − 2β^l_i) α_i,   (2)

with LLRs at the root node initialized as the LLRs received from the channel. The right-hand terms in Eq. (1) and (2) are also known as the f and g functions, respectively. The partial sums β received from the left and right child nodes are calculated as

    β_i = β^l_i ⊕ β^r_i, if i < 2^{λ−1};  β^r_{i−2^{λ−1}}, otherwise,   (3)

where ⊕ is the XOR operation and 0 ≤ i < 2^λ. At leaf nodes, the β value and the estimated bit vector û_0^{N−1} are computed as

    β_i = 0, when α_i ≥ 0 or i is frozen;  1, otherwise.   (4)
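In software, these update rules reduce to a few scalar operations per element. A minimal sketch with our own helper names, operating on single LLRs:

```python
import math

def f(a, b):
    """Eq. (1): min-sum check-node update, sgn(a)*sgn(b)*min(|a|, |b|)."""
    return math.copysign(min(abs(a), abs(b)), a * b)

def g(a, b, beta_l):
    """Eq. (2): variable-node update, b + (1 - 2*beta_l)*a, where beta_l is
    the partial sum fed back from the left child."""
    return b + (1 - 2 * beta_l) * a

def hard_decision(alpha, frozen):
    """Eq. (4): leaf-node bit estimate (frozen bits are forced to 0)."""
    return 0 if (frozen or alpha >= 0) else 1
```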
The SCL decoding algorithm [3] improves the error-correction performance of SC by relying on L parallel SC decoding paths. Every time an information bit is estimated, both possible values 0 and 1 are investigated and 2L paths are created. Each path is associated with a path metric PM, and the L paths with the highest PM are discarded. In the LLR-based formulation of SCL [4], the PM can be computed as

    PM^l_{−1} = 0,
    PM^l_i = PM^l_{i−1} + |α^l_i|, if û^l_i ≠ (1 − sgn(α^l_i))/2;  PM^l_{i−1}, otherwise,   (5)

where l is the path index and û^l_j is the estimate of bit j at path l. The main limitation of the SCL decoder is its high degree of complexity: it has a space complexity of O(LN) and a time complexity of O(LN log2 N).
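Eq. (5) translates directly to a scalar update: a path is penalized by |α| whenever the chosen bit disagrees with the hard decision implied by the LLR sign. A sketch, with a hypothetical helper name:

```python
def update_pm(pm, alpha, u_hat):
    """Path-metric update of Eq. (5). The hard decision implied by the LLR is
    (1 - sgn(alpha)) / 2, i.e. 0 for alpha >= 0 and 1 otherwise; choosing the
    other value costs |alpha|. Lower metrics mean more reliable paths."""
    hard = 0 if alpha >= 0 else 1
    return pm + abs(alpha) if u_hat != hard else pm
```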
The SCS algorithm addresses the high complexity of the SCL decoder by employing a priority queue (PQ) of size D, in which the candidate paths are stored. Every time a bit is estimated, the decoder only extends the most probable path from the queue. An additional list-like parameter L is used to limit the number of paths in the queue: if a path of length φ is extracted L times from the queue, all paths with length less than φ are deleted from the queue.
III. MEMORY EFFICIENT SOFTWARE STACK DECODER
In this section, we describe our software implementation of the SCS decoder. The main improvements over existing work in [7]–[9], [13] include reducing the decoding tree spatial complexity to O(N) and replacing the stack sorting step with a linear search over the stack. We calculate our bit probabilities in the LLR domain, and make use of the path metric from Eq. (5). The probability calculation and bit propagation are based on the approach in [3]. We begin by outlining the data structures used in our SCS implementation.
• P: a 2-D float array with which the LLR of a bit index is recursively calculated. It consists of n + 1 rows, where row λ is a probability array of size 2^{n−λ}, λ ∈ [0, n].
• C: a 3-D bit array in which the estimated bits are stored and recursively propagated for g function calculations.
• PM: array of size D that stores path metrics.
• PL: array of size D that stores path lengths.
• PL_hits: array of size N in which the value at each index φ indicates the number of times a path of length φ was extracted from the PQ.
• paths: a 2-D bit array that stores the paths in the PQ.
• inactive_path_indices: an integer stack of depth D that contains inactive path indices.
• active_path: a boolean array of size D that indicates whether a path is active or not.
In addition to these, the SCS decoder makes use of the following variables:
• T: total number of active paths in the stack.
• min_index: index of the path with the minimum path metric.
• max_index: index of the path with the maximum path metric.
• path_switch: boolean that indicates a path switch.
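A possible in-memory layout for these structures is sketched below; the sizes follow the descriptions above (row λ of P holding 2^{n−λ} LLRs, consistent with Algorithm 5), while the paper's actual layout may differ:

```python
import numpy as np

def init_scs_memory(n, D):
    """Allocate the SCS data structures for a code of length N = 2**n and a
    priority queue of depth D (a sketch, not the paper's exact layout)."""
    N = 2 ** n
    return {
        "P": [np.zeros(2 ** (n - lam)) for lam in range(n + 1)],  # LLR rows
        "C": np.zeros((2, n + 1, N), dtype=np.uint8),  # partial-sum bits
        "PM": np.zeros(D),                             # path metrics
        "PL": np.zeros(D, dtype=np.int64),             # path lengths
        "PL_hits": np.zeros(N, dtype=np.int64),        # extractions per length
        "paths": np.zeros((D, N), dtype=np.uint8),     # candidate paths
        "inactive_path_indices": list(range(D)),       # used as a stack
        "active_path": np.zeros(D, dtype=bool),
        "T": 0,                                        # active path count
    }
```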
The main loop of the SCS decoder is described in Algorithm 1, while the most important functions are detailed in Algorithms 2–6. First, the data structures are initialized. The memory for P, C, PL, PM and paths does not need to be initialized, as it
Algorithm 1: SCS Decoder, Main Loop
Input: received vector y_0^{N−1}
Output: estimated message bits m̂_0^{K−1}
1: initialize_data_structures();
2: min_index = assign_initial_path();
3: for φ = 0, 1, . . . , N − 1 do
4:     P[0][φ] = L_0(y_φ);
5: while (1) do
6:     recursively_calc_P(n, PL[min_index]);
7:     pm0 = calc_new_pm(PM[min_index], P[n][0], 0);
8:     pm1 = calc_new_pm(PM[min_index], P[n][0], 1);
9:     if (PL[min_index] ∈ A^c) then
10:        extend_path(min_index, 0, pm0);
11:    else
12:        if (T == D) then
13:            if (PM[max_index] > max(pm0, pm1)) then
14:                kill_path(max_index);
15:        if pm0 < pm1 then
16:            if (T < D) then
17:                max_index = clone_path(min_index);
18:                extend_path(max_index, 1, pm1);
19:            extend_path(min_index, 0, pm0);
20:        else
21:            if (T < D) then
22:                max_index = clone_path(min_index);
23:                extend_path(max_index, 0, pm0);
24:            extend_path(min_index, 1, pm1);
25:    update_min_max_index();
26:    update_length_info();
27:    if (end_check() == 1) then
28:        break;
29:    if path_switch then
30:        load_path();
31:    φ = PL[min_index] − 1;
32:    C[φ mod 2][n][0] = paths[min_index][φ];
33:    if ((φ mod 2) == 1) then
34:        recursively_update_C(n, PL[min_index] − 1);
35: for φ = 0, 1, . . . , K − 1 do
36:    m̂_φ = paths[min_index][A_φ];
is set up as new paths are created. The initial path is assigned to min_index and the channel LLRs are populated at the top of the probability tree P.
In the while loop (lines 5 to 34), the LLR for the current bit of the most reliable path is calculated. Lines 9 and 10 extend this path in the event of a frozen bit (i.e., the bit index belongs to the frozen set A^c). In the case of a message bit, lines 12-14 first check if the PQ is full and if both the new guesses are better than the worst path in the PQ. If this is true, then the
Algorithm 2: initialize_data_structures()
1: clear(inactive_path_indices);
2: for p = 0, 1, . . . , D − 1 do
3:     push(inactive_path_indices, p);
4:     active_path[p] = false;
5: for φ = 0, 1, . . . , N − 1 do
6:     PL_hits[φ] = 0;
Algorithm 3: assign_initial_path()
Output: index p of the initial path
1: p = pop(inactive_path_indices);
2: active_path[p] = true;
3: PM[p] = 0.0;
4: PL[p] = 0;
5: T = 1;
worst path is killed. Lines 15-24 extend the best path along the more reliable guess and place the other guess in the PQ if there is space.
The function update_min_max_index is then called to update min_index, max_index and path_switch. The indices of the paths with the maximum and minimum path metrics are identified in a single loop of at most O(D) complexity, which eliminates the need to sort all the paths in the PQ, since these are the only paths that will have to be extended or deleted in the current iteration of the decoder. Furthermore, by keeping track of path switching it is possible to reuse the values in the P and C memories just like an SC decoder, as long as SCS keeps extending the same path. In case of a path switch, the new path needs to be loaded into the P and C memories only once, and then they can be reused until the path switches again. This enables us to reduce the space complexity without increasing the computational complexity between switches.
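The single-pass search can be sketched as follows; the function returns both extremes so that no ordering of the PQ is ever maintained (the names are ours):

```python
def find_min_max(PM, active_path):
    """One O(D) sweep over the queue: locate the most reliable path (lowest
    path metric, to be extended) and the least reliable one (highest path
    metric, candidate for deletion). Returns (-1, -1) if no path is active."""
    min_idx = max_idx = -1
    for p, active in enumerate(active_path):
        if not active:
            continue
        if min_idx < 0 or PM[p] < PM[min_idx]:
            min_idx = p
        if max_idx < 0 or PM[p] > PM[max_idx]:
            max_idx = p
    return min_idx, max_idx
```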
Next, update_length_info is called, which checks if the current path length has been investigated L times, and kills all shorter paths if so. Then, the call to end_check causes the algorithm to break out of the while loop if the PQ is empty or if the length of the current path has reached N. Finally, a new path is loaded in case of a switch, and the last bit of the current path is updated in the C memory. Upon exiting the while loop, the index of the decoded path is in min_index: the decoder copies the bits of the unfrozen set A into the estimated message bit vector, and the algorithm terminates.
The probability and bit trees P and C have a space complexity of O(N), equal to that of the SC decoder. The PL and PM arrays have a space complexity of O(D), while the paths memory has a space complexity of O(ND). Since the frozen values are already known and only the message bits in a path need to be saved, the paths memory can be further compressed to a space complexity of O(KD), at the cost of the decoder only being able to support a maximum fixed rate.
Algorithm 4: clone_path()
Input: index p of the path to clone
Output: index p′ of the cloned path
1: p′ = pop(inactive_path_indices);
2: active_path[p′] = true;
3: PM[p′] = PM[p];
4: PL[p′] = PL[p];
5: T = T + 1;
6: for φ = 0, 1, . . . , PL[p] − 1 do
7:     paths[p′][φ] = paths[p][φ];
Algorithm 5: recursively_calc_P()
Input: layer λ and phase φ
1: if λ == 0 then
2:     return;
3: ψ = ⌊φ/2⌋;
4: if ((φ mod 2) == 0) or (path_switch == 1) then
5:     recursively_calc_P(λ − 1, ψ);
6: for β = 0, 1, . . . , 2^{n−λ} − 1 do
7:     if ((φ mod 2) == 0) then
8:         P[λ][β] = f(P[λ−1][2β], P[λ−1][2β+1]);
9:     else
10:        u = C[0][λ][β];
11:        P[λ][β] = g(P[λ−1][2β], P[λ−1][2β+1], u);
Algorithm 6: load_path()
1: for φ = 0, 1, . . . , PL[min_index] − 1 do
2:     C[φ mod 2][n][0] = paths[min_index][φ];
3:     if ((φ mod 2) == 1) then
4:         recursively_update_C(n, φ);
IV. FURTHER COMPLEXITY REDUCTION
We define an “iteration” as a decoder estimating a particular bit index in a candidate path. Thus, the SC and SCL decoding algorithms have a fixed number of iterations, N and NL respectively, while the SCS decoder has a variable number of iterations depending on Eb/N0. This number converges to N iterations as Eb/N0 increases.
Studies presented in [14] have shown that decoding failures are typically caused by a limited number of errors introduced by the channel (1-3 channel errors). These errors are more likely to occur at bit indices with low reliability, which are found early in the polar codeword and are thus decoded earlier.
We propose to protect the first γ information bits encountered along the SC decoding tree with a CRC of length Cγ. When the SCS decoder reaches a candidate path with γ message bits, it can perform a CRC check and kill the path in case the CRC fails. Paths that fail the CRC still result in an increment of PL_hits, and therefore the SCS decoder will have at most L paths that have passed this initial CRC.
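The partial CRC can be any short CRC over the first γ estimated message bits. Below is a generic bit-serial CRC-8 sketch using the 0xD5 polynomial from Sec. V; the exact bit ordering and register initialization are implementation choices, not taken from the paper:

```python
def crc8(bits, poly=0xD5):
    """Bit-serial CRC-8 (MSB-first polynomial long division). Returns the
    8-bit remainder; a message followed by its own CRC bits checks to zero."""
    reg = 0
    for b in bits:
        feedback = ((reg >> 7) ^ b) & 1
        reg = (reg << 1) & 0xFF
        if feedback:
            reg ^= poly
    return reg

def passes_partial_crc(msg_bits, crc_bits):
    """Check performed once a candidate path has accumulated the first gamma
    message bits plus the C_gamma CRC bits (here C_gamma = 8)."""
    return crc8(list(msg_bits) + list(crc_bits)) == 0
```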
It is possible, especially at low Eb/N0, that incorrect paths
Fig. 1. FER vs. Eb/N0 for different decoding algorithms (SC, SCL with L = 32, SCS, SCS-ET), PC(512, 256).
pass this initial CRC, or that the correct path gets killed before or shortly after the CRC check, due to errors in the CRC bits. In such cases the SCS decoder performs many useless iterations only to end in a decoding failure. We propose to introduce an early termination criterion by defining a maximum number of iterations Mit the decoder is allowed to take before failure is declared. Mit is initialized to 2LN: in the event of an initial CRC failure, Mit is penalized by N iterations, corresponding to the path that has just been removed from consideration. An early termination criterion for SCS decoders has also been proposed in [15]. However, the parameters of the method described in [15] depend on channel conditions, and the early termination comes at a cost in FER; our approach (SCS-ET) is instead channel-independent and causes negligible error-correction performance degradation.
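The early termination bookkeeping amounts to a simple iteration budget; a hypothetical sketch with our own names:

```python
class IterationBudget:
    """Early termination criterion of Sec. IV (a sketch): start from
    Mit = 2*L*N and charge N iterations for every path removed by a
    partial-CRC failure. Decoding is declared failed once the iteration
    count reaches the budget."""
    def __init__(self, L, N):
        self.N = N
        self.budget = 2 * L * N   # initial Mit
        self.count = 0
    def tick(self):
        """Call once per decoder iteration; True means keep decoding."""
        self.count += 1
        return self.count < self.budget
    def on_crc_failure(self):
        """A killed path forfeits the N iterations it would have needed."""
        self.budget -= self.N
```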
V. SIMULATION RESULTS
Simulation results are presented for PC(512, 256), constructed for an AWGN channel with σ² = 0.5. The parameter L is set to 32 for SCL, SCS, and SCS-ET. The stack depth D is set to LN = 16,384 for the SCS and SCS-ET decoders. Finally, SCS-ET has its initial CRC parameters set to γ = 16, Cγ = 8, with CRC polynomial 0xD5.
Fig. 1 shows the frame error rate (FER) for the considered algorithms. It can be seen that SCS and SCS-ET provide error-correction performance similar to SCL. Fig. 2 shows that on average the SCS decoder takes fewer iterations than the SCL decoder, with a gain ranging between 48% and 97%, and at high Eb/N0 it converges to SC complexity. It can be observed that by using a CRC on γ information bits and the early termination criterion, the complexity of the SCS-ET decoder is further reduced, gaining 1% to 50% over SCS and 71% to 97% over SCL. Finally, Fig. 3 shows that the CRC check in the SCS-ET decoder reduces the number of iterations by 1% to 28% with respect to SCS in case of successful decoding, while the CRC combined with the early termination criterion yields a gain ranging between 31% and 53% in iterations over SCS in case of failed decoding. SCS-ET
Fig. 2. Average number of iterations vs. Eb/N0 for different decoding algorithms (SC, SCL with L = 32, SCS, SCS-ET), PC(512, 256).
Fig. 3. Average number of iterations for decoder success/failure (SCS and SCS-ET), with and without early termination, PC(512, 256).
thus requires 66%−97% and 75%−95% fewer iterations than
SCL in case of successful and failed decoding, respectively.
VI. CONCLUSION
In this work, we have presented an efficient software implementation of the SCS decoding algorithm for polar codes.
It replaces the stack sorting step with a linear search over
the stack, and guarantees the same spatial complexity as SC
to compute the path probabilities, with additional memory
required only for storing paths in the queue. We have also proposed a partial CRC check as an effective noise-independent method to reduce the SCS time complexity, along with an early
termination criterion. Simulation results show up to a 97%
iteration gain with respect to SCL, with negligible degradation
in error-correction performance.
REFERENCES
[1] E. Arikan, “Channel polarization: A method for constructing capacity-
achieving codes for symmetric binary-input memoryless channels,” IEEE
Transactions on Information Theory, vol. 55, no. 7, pp. 3051–3073, July
2009.
[2] “Final report of 3GPP TSG RAN WG1 #87 v1.0.0,” http://www.3gpp.org/ftp/tsg_ran/WG1_RL1/TSGR1_87/Report/Final_Minutes_report_RAN1%2387_v100.zip, Reno, USA, November 2016.
[3] I. Tal and A. Vardy, “List decoding of polar codes,” IEEE Transactions
on Information Theory, vol. 61, no. 5, pp. 2213–2226, May 2015.
[4] A. Balatsoukas-Stimming, M. B. Parizi, and A. Burg, “LLR-based
successive cancellation list decoding of polar codes,” IEEE Transactions
on Signal Processing, vol. 63, no. 19, pp. 5165–5179, Oct 2015.
[5] S. A. Hashemi, C. Condo, and W. J. Gross, “Simplified successive-
cancellation list decoding of polar codes,” in 2016 IEEE International
Symposium on Information Theory (ISIT), July 2016, pp. 815–819.
[6] ——, “Fast simplified successive-cancellation list decoding of polar
codes,” in 2017 IEEE Wireless Communications and Networking Con-
ference Workshops (WCNCW), March 2017, pp. 1–6.
[7] K. Niu and K. Chen, “Stack decoding of polar codes,” Electronics
Letters, vol. 48, no. 12, pp. 695–697, June 2012.
[8] ——, “CRC-aided decoding of polar codes,” IEEE Communications
Letters, vol. 16, no. 10, pp. 1668–1671, October 2012.
[9] K. Chen, K. Niu, and J. Lin, “Improved successive cancellation decoding
of polar codes,” IEEE Transactions on Communications, vol. 61, no. 8,
pp. 3100–3107, August 2013.
[10] Y. Shen, C. Zhang, J. Yang, S. Zhang, and X. You, “Low-latency soft-
ware successive cancellation list polar decoder using stage-located copy,”
in 2016 IEEE International Conference on Digital Signal Processing
(DSP), Oct 2016, pp. 84–88.
[11] P. Giard, G. Sarkis, C. Leroux, C. Thibeault, and W. J. Gross, “Low-latency software polar decoders,” Journal of Signal Processing Systems, to appear. [Online]. Available: http://arxiv.org/abs/1504.00353
[12] B. L. Gal, C. Leroux, and C. Jego, “Multi-Gb/s software decoding of
polar codes,” IEEE Transactions on Signal Processing, vol. 63, no. 2,
pp. 349–359, Jan 2015.
[13] V. Miloslavskaya and P. Trifonov, “Sequential decoding of polar codes,”
IEEE Communications Letters, vol. 18, no. 7, pp. 1127–1130, July 2014.
[14] O. Afisiadis, A. Balatsoukas-Stimming, and A. Burg, “A low-complexity
improved successive cancellation decoder for polar codes,” in 2014 48th
Asilomar Conference on Signals, Systems and Computers, Nov 2014, pp.
2116–2120.
[15] P. Trifonov, V. Miloslavskaya, and R. Morozov, “Fast sequential
decoding of polar codes,” CoRR, vol. abs/1703.06592, 2017. [Online].
Available: http://arxiv.org/abs/1703.06592