Continual Learning-Based Channel Estimation for 5G Millimeter-Wave Systems

Swaraj Kumar¹, Satya Kumar Vankayala¹, Biswapratap Singh Sahoo², and Seungil Yoon³
¹Networks S/W R&D Group, Samsung R&D Institute India Bangalore, Bengaluru, India
²Networks Modem R&D Group, Samsung R&D Institute India Bangalore, Bengaluru, India
³Network Business, Samsung Electronics, Suwon, South Korea
Abstract—Accurate channel estimation in millimeter-wave (mmWave) wireless communication systems is challenging and incurs high computational cost. The mmWave frequency band has both advantages and disadvantages. At the higher mmWave frequencies, the smaller wavelengths allow a large number of antennas to be packed into a given aperture compared to lower frequency bands. However, the main disadvantages of mmWave systems are the difficulty of accurate channel estimation, smaller coverage, and high signal absorption. Moreover, when multiple-input multiple-output (MIMO) systems operate over mmWave frequencies, channel estimation becomes even more intricate in terms of computational complexity and estimation accuracy. In this paper, we address these limitations and improve estimation accuracy by proposing a Continual Learning (CL)-based method for channel estimation in mmWave MIMO systems. We also propose an activation function that is numerically stable and robust against early saturation. We discuss several channel estimation algorithms from the literature and evaluate and compare their performance via numerical simulations. Our simulation results show that the proposed CL-based method outperforms existing minimum mean squared error (MMSE)-based channel estimators in terms of precision. Furthermore, based on our experiments, we give insight into spectral efficiency with respect to the number of available channel observations.
Index Terms—machine learning, channel estimation,
millimeter-wave system, MIMO, channel model
I. INTRODUCTION
Communication over the millimeter wave (mmWave) frequency band has been regarded as a promising technology to meet ever-growing data traffic [1]. However, the mmWave frequency band faces severe signal attenuation. Taking advantage of the smaller wavelength of mmWave bands, narrow high-gain beamforming is possible by deploying a large number of antennas at the transceivers to compensate for the high propagation loss [1], [2]. However, the use of a large number of antennas makes channel estimation challenging. With conventional channel estimators, the number of training pilots is a function of the number of transmit antennas as well as the number of resource blocks, which is prohibitive [2].
Channel estimation plays an integral role in the design and performance of next-generation wireless communication systems, e.g., mmWave networks. Generally speaking, channel estimation can be broadly classified into two categories: 1) blind channel estimation, and 2) training-based channel estimation. Blind channel estimation is used in the uplink channel estimation process. The performance of these algorithms is inferior to that of training-based channel estimators [3]. While blind estimators provide reasonable accuracy, residual calibration errors limit the overall system performance. Recently, training-based channel estimation has gained much attention in MIMO systems. In these methods, the training sequences that minimize the mean squared error (MMSE) of the estimate are determined with partial knowledge about the state of the unknown time-varying channel, the additive noise, and the multiuser interference. Though MMSE channel estimators are used in products, their main drawback is high computational complexity; there is a tradeoff between channel estimation complexity and performance. Learning-based algorithms capture channel imperfections better while achieving fairly low computational complexity, making them suitable for channel estimation [4], [5]. Thus, learning-based MMSE channel estimators are a good combination for studying mmWave massive MIMO systems. However, the computational complexity of learning-based schemes grows with an increasing number of layers.
The channel models for mmWave systems are often based on geometry-based stochastic models, which are complicated; thus, the MMSE estimates cannot be determined in closed form. To find the best computable estimator, it is a common strategy to restrict the estimator to a fixed class of functions and then find the best estimator in that class [5]. Thus, low-complexity MMSE estimators restricted to the class of linear estimators can minimize the mean squared error (MSE). In 5G systems, the number of antennas and the system bandwidth are much higher than in 4G systems, so there is a need for faster and more accurate channel estimation in 5G Cloud Radio Access Network (RAN) systems. Our computationally efficient machine learning algorithm can easily be implemented in Cloud-RAN systems.
In this paper, we introduce a continual learning (CL)-based MMSE estimator, called CL-MMSE, to increase the plasticity of the network. We also introduce an activation function called Noisy Sigsoftmax that allows for smooth gradient updates during backpropagation. Both solutions are carefully designed to improve the efficacy of neural networks while reducing the number of layers. Our approach differs from previous studies on the performance analysis of massive MIMO systems with reciprocity calibration imperfections, where exact knowledge of the channel is assumed to be available.
The rest of this paper is organized as follows. In Section II,
we provide a brief summary of the existing channel estimation
methods. Section III presents a typical massive MIMO system
and the investigated channel model. In Section IV, a learning-
based channel estimation method is proposed. Section V pro-
vides the performance evaluation of the proposed algorithms,
and we conclude the paper in Section VI.
II. RELATED WORKS
Several works have investigated mmWave MIMO channel estimation. The authors in [5] adopted a neural network-based model to estimate a filter matrix which, when multiplied with the observed signal, gives the channel estimate. They used a hierarchical training mechanism (first train on a small number of antennas and then progressively increase the number of antennas) and employed the popular Softmax activation function. Softmax can also be a bottleneck for representation learning in language modeling due to numerical instability.
Neural networks with Softmax are believed to have the universal approximation property but suffer from early saturation. To address these issues, the authors in [6] proposed the Noisy Softmax activation function. The early saturation behavior of Softmax impedes the exploration of backpropagation, which is sometimes a reason for the model converging at a bad local minimum. Noisy Softmax relies on injecting annealed noise into Softmax during each iteration, thus postponing the early saturation. The method proposed in [7], called Sigsoftmax and designed specifically for recurrent neural networks (RNNs), solves the numerical stability problem of Softmax. Incorporating it in a CNN requires changes such as restricting its range. Since Sigsoftmax is monotonically increasing, we use a modified version of it that restricts the upper bound of its range to close to 1. This modified Sigsoftmax suffers from early saturation. To overcome the early saturation problem in the modified Sigsoftmax, we add annealed noise as in [6], making the activation function more robust.
Several works have investigated learning-based channel estimation [4], [5], [8], however with different key focuses in terms of system model and estimator design. These solutions also do not provide computationally prudent models. The learning-based methods mainly exploit the Toeplitz structure of the covariance matrix for channel estimation [5]. We also adopt the Toeplitz structure to define the dimensions of the kernel needed in the convolutional layer. To make the learning procedure more robust, we introduce a continual learning mechanism for training the proposed MMSE estimator. Continual learning is an incremental learning mechanism that enables the model to learn on multiple tasks.
Neural networks tend to forget previously learnt tasks when introduced to a new task. This is called catastrophic forgetting and is due to the low plasticity of the neural networks used. The stability of a neural network generally decreases when it is used to solve a myriad of tasks; this is referred to as the stability-plasticity dilemma [9]. Continual learning methods can be classified into three major groups: replay, regularization-based, and parameter isolation methods [10]. We employ a regularization-based method called elastic weight consolidation (EWC) [11]. EWC introduces a regularization term in the loss function which signifies the importance of retaining the learning from previous tasks. During training of new tasks, deviation from the optimal weights of the previous task is penalised.
III. SYSTEM AND CHANNEL MODELS
A. System Model
We consider a wireless communication system operating in the mmWave frequency band. We assume the base station (BS) and the user equipment (UE) are equipped with M and N transmit-receive antennas, respectively. Antennas are separated by at least λ/2, where λ denotes the wavelength. In addition, both the BS and UE are assumed to be equipped with M_rf and N_rf radio frequency (RF) chains, respectively, where M_rf ≪ M and N_rf ≪ N. We consider uplink channel estimation for a multi-antenna-transceiver mmWave communication system.
We further assume a hybrid analog-digital architecture in which the BS transmits pilot signals periodically, with a period smaller than the coherence time. The transmitter modifies the data stream s before transmission with a digital baseband precoder and subsequently constructs the final transmit signal using an RF precoder. The time-varying discrete transmitted signal can be expressed as [2]

$$\mathbf{x} = \mathbf{P}_M \mathbf{s} \qquad (1)$$

where P_M denotes the combined precoding matrix used to construct the final transmit signal. The complex-valued received signal at the receiver can be expressed as
$$\mathbf{r} = \mathbf{H}\mathbf{x} + \mathbf{n} \qquad (2)$$
where H denotes the M × N channel gain matrix, and n represents the zero-mean additive white Gaussian noise (AWGN) vector with covariance matrix σ²I_N. The received signal r is multiplied by a complex weighting matrix W^H, which adjusts the magnitude and phase of the signals from the transmitting antennas; the superscript H on W denotes the Hermitian (conjugate) transpose. We assume that σ² is known at the BS receiver. Finally, the combined signal can be expressed as
$$\mathbf{y} = \mathbf{W}^{H}\mathbf{H}\mathbf{x} + \mathbf{W}^{H}\mathbf{n} \qquad (3)$$
In this paper, we estimate the uplink channel using CNN
with a novel activation function.
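
To make the signal model concrete, the following minimal NumPy sketch draws one realization of (1)–(3) under illustrative assumptions; the dimensions, the random precoder P_M, combiner W, channel H, and noise variance are placeholders rather than the values used in our experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

M, N = 8, 3          # BS and UE antennas (illustrative sizes)
N_rf = 2             # number of data streams / RF chains (assumed)
sigma2 = 0.1         # noise variance, assumed known at the receiver

# Illustrative channel, precoder, and combiner (placeholders)
H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
s = (rng.standard_normal(N_rf) + 1j * rng.standard_normal(N_rf)) / np.sqrt(2)
P_M = rng.standard_normal((N, N_rf)) + 1j * rng.standard_normal((N, N_rf))
W = rng.standard_normal((M, N_rf)) + 1j * rng.standard_normal((M, N_rf))

x = P_M @ s                                       # (1) transmitted signal
n = np.sqrt(sigma2 / 2) * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
r = H @ x + n                                     # (2) received signal
y = W.conj().T @ r                                # (3) combined signal, W^H r
print(y.shape)
```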
B. Millimeter-Wave Channel Model
The mmWave channel exhibits limited scattering [12]–[14]. Thus, we assume that the channel has L taps, where ℓ = 0, 1, ..., L is the subpath index. Under this channel model, the channel matrix H can be expressed mathematically as
$$\mathbf{H} = \sum_{\ell=0}^{L} g_{\ell}\, \mathbf{u}_{rx}(\theta_{\ell})\, \mathbf{u}_{tx}^{H}(\varphi_{\ell}) \qquad (4)$$
where g_ℓ is the complex small-scale fading coefficient associated with the ℓ-th propagation path, and u_rx(θ_ℓ) and u_tx(φ_ℓ) are the antenna array response vectors, with angle of departure (AoD) φ_ℓ at the transmitter and angle of arrival (AoA) θ_ℓ at the receiver for the ℓ-th path. Assuming uniform linear arrays (ULAs), the antenna array responses u_rx(θ_ℓ) and u_tx^H(φ_ℓ) are adopted as defined in [2].
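
To illustrate (4), the sketch below sums the rank-one path contributions using λ/2-spaced ULA steering vectors; the steering-vector form, the number of paths, and the angle distributions are assumptions for illustration and are not taken from [2] verbatim.

```python
import numpy as np

rng = np.random.default_rng(1)

def ula_response(num_ant, angle):
    """lambda/2-spaced ULA steering vector (assumed form, normalized)."""
    k = np.arange(num_ant)
    return np.exp(1j * np.pi * k * np.sin(angle)) / np.sqrt(num_ant)

M, N, L = 8, 3, 4                                    # antennas and path index bound (illustrative)
theta = rng.uniform(-np.pi / 2, np.pi / 2, L + 1)    # AoA per path
phi = rng.uniform(-np.pi / 2, np.pi / 2, L + 1)      # AoD per path
g = (rng.standard_normal(L + 1) + 1j * rng.standard_normal(L + 1)) / np.sqrt(2)

# H = sum_l g_l * u_rx(theta_l) * u_tx(phi_l)^H, cf. (4)
H = np.zeros((M, N), dtype=complex)
for l in range(L + 1):
    H += g[l] * np.outer(ula_response(M, theta[l]), ula_response(N, phi[l]).conj())
print(H.shape)   # (M, N)
```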
IV. LEARNING-BASED CHANNEL ESTIMATION
A. Key Idea
Although hierarchical training allows the model to learn sequentially from a small number of antennas to a large number of antennas, it does not solve the problem of catastrophic forgetting, which makes the model forget previous tasks when trained on a new task, even though there is no major variation between the tasks here. Moreover, in [5] interpolation is used to approximate the weights in the training mechanism, which alters the input significantly, so the weights (θ₁ and θ₂) change significantly between tasks. To make the learning optimal and avoid forgetting, we use EWC to enhance the retention power of our CL-MMSE. The objective loss function L_B for the proposed CL-MMSE can be expressed as
$$\mathcal{L}_B = \frac{1}{\lVert \mathcal{N} \rVert} \sum_{\mathbf{H} \in \mathcal{N}} \left\lVert \mathbf{W}_B(\theta_{B,1}, \theta_{B,2}, \mathbf{y}) - \mathbf{H} \right\rVert_2^2 + \frac{\lambda}{2} \sum_i \lvert \theta_{B,i} \rvert^2 \qquad (5)$$
where y is the observation, H is the channel response matrix, N denotes the training batch (with ‖N‖ its size), θ_{K,j} are the weights of the j-th convolutional layer for K antennas, W_B is the output of the CNN for y with B antennas, and λ is the L2 regularization coefficient. Regularization is only used in the first task. The objective function in (5) learns for B antennas while forgetting the training on the previous task. Adding the EWC regularization term gives the final loss function:
$$\mathcal{L}_{EWC} = \mathcal{L}_B + \sum_i \frac{\mu}{2} F_i \left(\theta_{B,i} - \theta^{*}_{A,i}\right)^2 \qquad (6)$$
where μ is a coefficient indicating the importance of the task with A antennas (the previous task) relative to the task with B antennas (the new task), F_i is the Fisher information matrix for layer i, and θ*_{A,i} is the interpolated weight matrix for A antennas. Interpolation is applied to make the dimensions of θ_{B,i} and θ*_{A,i} the same. When using (6), L2 regularization is turned off (λ = 0 in L_B), since L2 regularization negatively affects the EWC regularization term.
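
A minimal PyTorch-style sketch of the losses in (5) and (6) is given below. The estimator `cnn`, the per-parameter diagonal Fisher values `fisher`, and the interpolated previous-task weights `prev_params` are hypothetical placeholders, so the snippet only illustrates how the EWC penalty is attached to the batch MSE.

```python
import torch

def cl_mmse_loss(cnn, y_batch, H_batch, prev_params=None, fisher=None,
                 mu=0.3, lam=1e-6):
    """Batch MSE loss of (5), plus the EWC penalty of (6) for later tasks.

    cnn         : estimator mapping observations y -> channel estimate
    prev_params : interpolated previous-task weights theta*_{A,i} (or None)
    fisher      : per-parameter diagonal Fisher information F_i (or None)
    """
    H_est = cnn(y_batch)
    sq_err = torch.abs(H_est - H_batch) ** 2
    loss = torch.mean(torch.sum(sq_err, dim=tuple(range(1, H_batch.dim()))))

    if prev_params is None:
        # First task: plain L2 regularization, cf. (5)
        loss = loss + (lam / 2) * sum(p.pow(2).sum() for p in cnn.parameters())
    else:
        # Later tasks: EWC penalty keeps weights near the previous task, cf. (6)
        for p, p_old, f in zip(cnn.parameters(), prev_params, fisher):
            loss = loss + (mu / 2) * (f * (p - p_old) ** 2).sum()
    return loss
```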
We calculate the Fisher matrix between the weights of the two tasks and add it to the loss function (5). The EWC regularization term penalises the model if it moves away from the previous task's optimal weights. This way, when we perform the weight update, the weights do not become completely skewed toward the latest task, and the network is able to retain the prior learning. Fig. 1 depicts the CL weight-updating strategy: the optimal weights follow a trajectory that makes the network more plastic and lets it retain prior learning. This signifies that the latest task is important, but significant attention must also be given to how far the weights move away from the prior task's weights. Also, unlike hierarchical training, θ_{T,i} is initialized rather than merely interpolated from θ_{T-1,i}. This allows CL-MMSE to explore more rather than getting stuck at a bad local minimum introduced by the previous task. Continual learning can also be combined with federated learning for distributed computing [15].

Fig. 1. Illustration of the weight-update scheme with and without CL. Here, θ*_{T-1,i} is the interpolation of θ_{T-1,i}, and θ_{T,i} is the optimal weight for task T and layer i.
We introduce a new activation function to overcome the deficiencies of the Softmax function. Though there has been a surge in finding ways to alter the undesirable characteristics of Softmax, this research is limited to vision and natural language processing (NLP) tasks, where Softmax is mostly used in the last layer for classification problems. Due to its limited range, it is not suitable for the output layer of neural networks used for regression problems. Our network is limited to two convolutional blocks with an activation function between them; a small CNN ensures low computational complexity. As reported in [5], the performance of the Softmax function was worse than that of ReLU. This is because Softmax is not numerically stable and produces negative values, which is particularly damaging for an activation function used in intermediate layers. Activation functions are not required to inhibit the neurons of the successive layer. We adapt Sigsoftmax, as proposed by the authors in [7], for the MMSE estimator. Sigsoftmax is an altered version of Softmax that is non-negative and more numerically stable than Softmax:
$$[f(\mathbf{z})]_i = \frac{\exp(z_i)\,\mathrm{sig}(z_i)}{\sum_{p=1}^{P} \exp(z_p)\,\mathrm{sig}(z_p)} \qquad (7)$$
where [f(z)]_i is the i-th term of f(z), z_i is the i-th term of the vector z of size P, and sig is the sigmoid function. Sigsoftmax (7) was developed for RNNs; we modify it for use in our proposed convolutional neural network (CNN). Equation (7) is monotonically increasing, so to restrict the flow of any large values to the last layer, we need to restrict its range. This can be done by adding a large term b to the input, as in (8). This gives a modified version of Sigsoftmax:
$$[f(\mathbf{z}+b)]_i = \frac{\exp(z_i+b)\,\mathrm{sig}(z_i+b)}{\sum_{m=1}^{M} \exp(z_m+b)\,\mathrm{sig}(z_m+b)} \qquad (8)$$
Since exp(z_i + b) = exp(b) exp(z_i) and the common factor exp(b) cancels from the numerator and the denominator, simplifying further produces (9):
$$[f(\mathbf{z}+b)]_i = \frac{\exp(z_i)\,\mathrm{sig}(z_i+b)}{\sum_{m=1}^{M} \exp(z_m)\,\mathrm{sig}(z_m+b)} \qquad (9)$$
For a large b (b = 5 in our case), this allows us to overcome the shortcomings of Softmax while also restricting the range, unlike ReLU. However, with a large b in (9), the growth of (9) is very slow.
In our simulations, the modified Sigsoftmax (9) tends to perform better than Softmax and, for a low number of antennas, similarly to ReLU. The increase in f(z_i) with increasing z_i is considerably slower in (9) than in (7), which leads to better results for a low number of antennas. At a higher number of antennas, however, the performance deteriorates due to the early saturation of the modified Sigsoftmax: the gradient updates become small and cannot update the weights accurately for a larger number of antennas. To resolve this, we add annealed noise to the modified Sigsoftmax. This technique was popularized in Noisy Softmax [6], which delays the saturation of the function:
$$\tilde{z}_i = z_i - \alpha \left(\theta_i \cdot X_i\right)\left(1 - \cos(\beta_i)\right) \qquad (10)$$
where z_i is the input to the activation function and z̃_i its noise-perturbed version, θ_i denotes the weights of the preceding layer, X_i is the i-th row of X (the input to the preceding layer), β_i is the angle between the vectors θ_i and X_i, and α is the hyper-parameter for adjusting the noise level.
The annealed noise is calculated using the weights of the preceding layer and the input to that layer. Although the noise addition technique was originally devised for the dense layer of a neural network as in [6], our CNN uses only one kernel per convolutional layer, which makes the noise addition technique feasible. Adding this noise to the modified Sigsoftmax improves the performance of the model even for a large number of antennas by delaying the saturation.
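
A small sketch of the proposed activation is shown below: the bias-shifted Sigsoftmax of (9), optionally preceded by the annealed-noise perturbation of (10). Because our CNN uses a single kernel per layer, the inner product θ_i · X_i is treated as one scalar here; the way the kernel and the input row are flattened is an assumption for illustration.

```python
import torch

def modified_sigsoftmax(z, b=5.0):
    """Bias-shifted Sigsoftmax, cf. (9): exp(z_i) * sigmoid(z_i + b), normalized.

    The max-subtraction only rescales numerator and denominator by the same
    factor, so the result is unchanged while staying numerically stable.
    """
    num = torch.exp(z - z.max(dim=-1, keepdim=True).values) * torch.sigmoid(z + b)
    return num / num.sum(dim=-1, keepdim=True)

def noisy_sigsoftmax(z, w, x, alpha=0.1, b=5.0, training=True):
    """Noisy Sigsoftmax: perturb z as in (10), then apply (9).

    w : flattened kernel of the preceding layer (theta_i)  [assumed handling]
    x : flattened input row to that layer (X_i)             [assumed handling]
    """
    if training:
        w_flat, x_flat = w.flatten(), x.flatten()
        cos_beta = torch.nn.functional.cosine_similarity(w_flat, x_flat, dim=0)
        z = z - alpha * torch.dot(w_flat, x_flat) * (1.0 - cos_beta)
    return modified_sigsoftmax(z, b)
```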
The proposed activation function removes the hurdles that prevent Softmax from performing better. The activation function can enable enhanced learning on a task; however, it cannot increase the plasticity of the network. Hence, for learning new tasks (settings with more antennas), the weights need to be adjusted properly to extract the maximum from the network while retaining the previous task as well.
B. Working Procedure
Fig. 2 illustrates the flow of the proposed channel esti-
mation. To train the model, we generate the true channel
matrix Husing a channel model similar to [16], and y
observations are obtained using (3) which served as the input
to CNN-MMSE estimator. Subsequently, the neural network is
composed of two circular convolution layers each having one
kernel and having biases of its own. The size of the kernel
depends on the number of antennas and the transformation
used. In between two layers is a activation function. We are
Fig. 2. Flow diagram of the proposed CL-MMSE estimator.
going to evaluate the performance of our Noisy Sigsoftmax
with other popular functions. The estimated channel matrix,
denoted as Hest, is calculated by multiplying the observations
with the output of the CNN. Finally, we calculate the CL-
MMSE loss, denoted as L, using (5).
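
One possible realization of the two-block estimator described above is sketched below: two circular 1-D convolutions with a single kernel each and the activation in between, whose output is applied elementwise to the observation to form H_est. The kernel size, the real-valued input representation, and the final elementwise multiplication are illustrative assumptions rather than the exact architecture.

```python
import torch
import torch.nn as nn

class CLMMSEEstimator(nn.Module):
    """Two circular-convolution blocks with one kernel each (illustrative sketch)."""

    def __init__(self, kernel_size=16, activation=None):
        super().__init__()
        self.conv1 = nn.Conv1d(1, 1, kernel_size, padding="same", padding_mode="circular")
        self.conv2 = nn.Conv1d(1, 1, kernel_size, padding="same", padding_mode="circular")
        self.activation = activation or torch.relu

    def forward(self, y):
        # y: (batch, M) real-valued features of the observation (assumed representation)
        h = self.conv1(y.unsqueeze(1))       # first circular convolution block
        h = self.activation(h)               # activation between the two blocks
        w = self.conv2(h).squeeze(1)         # filter produced by the CNN
        return w * y                         # estimate = CNN output applied to the observation
```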
Algorithm 1 Continual Learning MMSE
1: Set M_max, K_max, n
2: M_0 = M_max / 2^n, K_0 = K_max / 2^n
3: Initialize θ_{0,1}, θ_{0,2}, b_{0,1}, b_{0,2}
4: Generate observation y and H using (3)
5: Calculate the loss function using (5) and update weights using ADAM [17]
6: for i = 1, 2, ..., n do
7:   Set M_i = M_max / 2^{n-i}, K_i = K_max / 2^{n-i}
8:   Interpolate θ_{i-1,1}, θ_{i-1,2}, b_{i-1,1}, b_{i-1,2}
9:   Initialize θ_{i,1}, θ_{i,2}, b_{i,1}, b_{i,2}
10:  Generate y and H using (3) for M_i antennas
11:  Calculate F_i using θ_{i,·} and θ_{i-1,·}
12:  Calculate the loss function using (6) and update weights using ADAM [17]
13: end for
During training, we use a hierarchical training mechanism combined with continual learning; the proposed mechanism is given in Algorithm 1. Step 1 takes the maximum number of antennas, denoted M_max, and the maximum kernel length, denoted K_max, as inputs. In Step 2, we split the training into n steps; in each i-th incremental step we train the network on M_max/β^{n-i} antennas, with β = 2. Step 3 initializes the weights for the first set of antennas. Step 4 uses (3) to generate y and H, and Step 5 updates the weights using (5). From Steps 6–11, we update the weights by doubling the number of antennas in each iteration. In Step 8, interpolation is used to increase the dimensions of the previous iteration's weight matrices θ_{i-1,1} and θ_{i-1,2} so that they match the current iteration's weight matrices θ_{i,1} and θ_{i,2}; the biases are interpolated similarly. With the varying number of antennas, we add the EWC regularization term, calculated in Step 11, to the CL-MMSE loss function (6). The EWC term is calculated with the current iteration's weight matrices (θ_{i,·}) and the interpolated weight matrices of the previous iteration (θ_{i-1,·}).
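
At a high level, Algorithm 1 can be mirrored by the following training-loop sketch; `model_for`, `make_dataset`, `interpolate`, and `diagonal_fisher` are hypothetical helpers standing in for the estimator construction, the data generation of (3), the weight interpolation of Step 8, and the Fisher computation of Step 11, and `cl_mmse_loss` refers to the loss sketch given earlier.

```python
import copy
import torch

def train_cl_mmse(model_for, M_max=128, K_max=16, n=4, steps_per_task=1000, lr=1e-3):
    """Hierarchical + continual training loop following Algorithm 1 (sketch)."""
    prev_model, prev_params, fisher = None, None, None
    for i in range(n + 1):
        M_i = M_max // 2 ** (n - i)                 # double the antennas each task
        K_i = K_max // 2 ** (n - i)
        model = model_for(M_i, K_i)                 # freshly initialized estimator
        if prev_model is not None:
            interpolate(prev_model, model)          # hypothetical: resize previous weights
            fisher = diagonal_fisher(prev_model)    # hypothetical: per-parameter F_i
            prev_params = [p.detach().clone() for p in prev_model.parameters()]
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(steps_per_task):
            y, H = make_dataset(M_i)                # hypothetical: observations via (3)
            if prev_model is None:
                loss = cl_mmse_loss(model, y, H)                        # (5), first task
            else:
                loss = cl_mmse_loss(model, y, H, prev_params, fisher)   # (6), later tasks
            opt.zero_grad()
            loss.backward()
            opt.step()
        prev_model = copy.deepcopy(model)
    return prev_model
```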
V. PERFORMANCE EVALUATION AND ANALYSIS
To analyze the performance of our CL-MMSE algorithm, we compare it with three other schemes: the Fast MMSE [5], Maximum Likelihood (ML) [18], and CNN-MMSE [5] estimators. The Fast MMSE technique uses the Softmax activation function and has no hierarchical training schedule. The ML method first calculates the channel covariance matrix and uses it to find the MMSE estimates of the channel vectors. We test our CL-MMSE with five different activation functions: ReLU [19], Swish [20], Sigsoftmax [7], Softmax [5], and our proposed Noisy Sigsoftmax function (9). All these CL-MMSE estimators are trained using our proposed training method described in Section IV-B. Models are trained with N = 16M samples (where M is the number of antennas). We use 70% of the samples for training, 15% for validation, and the remaining 15% for testing.
Fig. 3 depicts the gain in accuracy obtained when continual learning is incorporated into the plain hierarchical training. Using the proposed training mechanism, we train the CL-MMSE estimators with different activation functions. We chose a batch size of 20, two layers, and the ADAM optimizer [17]. Different learning rates were used for the CNNs depending on the activation function, for better precision.
We study the performance of our scheme against two metrics: MSE and spectral efficiency. Table I lists the basic system parameters used for simulation, which are based on [16]. The performance of the Toeplitz structure was found to be superior to the circulant one; thus, the Toeplitz structure has been used to analyse the activation functions. In addition, for the off-line learning procedure required by the CNN estimators to generate the necessary realizations of channel vectors and observations, the noise variance σ² and the correct model for the parameters are assumed to be known.
The channel estimation performance of the different methods, measured as the per-antenna MSE for a single snapshot (T = 1), is shown in Fig. 4 as a function of the number of antennas. The Toeplitz structure was used in conjunction with the activation functions. Fig. 4 shows the efficacy of our proposed activation function, the noisy modified Sigsoftmax, represented as ToepNoisySigsoftmax: its performance is better than that of all the other activation functions. From the simulation results, it is clear that Sigsoftmax deteriorates with a large number of antennas (i.e., M = 96 and M = 128), whereas Noisy Sigsoftmax outperforms all other schemes for a large number of antennas.
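
For reference, one common way to compute the normalized MSE plotted in Fig. 4 is sketched below; normalizing by the channel energy and averaging over realizations is an assumed convention here, as the exact normalization is not spelled out in the text.

```python
import numpy as np

def normalized_mse(H_est, H_true):
    """Normalized MSE averaged over channel realizations (assumed definition).

    H_est, H_true : arrays of shape (..., M, N) holding estimated and true channels.
    """
    err = np.sum(np.abs(H_est - H_true) ** 2, axis=(-2, -1))
    ref = np.sum(np.abs(H_true) ** 2, axis=(-2, -1))
    return float(np.mean(err / ref))
```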
TABLE I
SIMULATION PARAMETERS

Parameter                          Value
Deployment scenario                Urban
Operating frequency [GHz]          28
Subcarrier spacing [kHz]           120
Number of antennas at BS (M)       [8, 16, 32, 96, 128]
Number of antennas at UE (N)       3
UE distribution                    Random
Pathloss coefficient               3.5
Log-normal shadow fading           0 dB
SNR at maximum distance            15 dB
Annealed noise (α)                 0.1
EWC coefficient (µ)                0.3
Regularizer coefficient (λ)        10^-6 for M_0, else 0
Distance {minimum, maximum}        {1000 m, 1500 m}
Fig. 3. Comparison of training mechanism of proposed CL-MMSE and
Hierarchical learning MMSE (HL-MMSE) [5].
Fig. 4. MSE per antenna at an SNR of 15 dB against a single observation (normalized MSE vs. number of antennas at the base station; curves: ToepSwish, ToepReLU, ToepSigSoftmax, FastMMSE, CircML, ToepNoisySigSoftmax, ToepSoftmax).
As shown in Fig. 5, in our evaluation with varying SNR, the Noisy Sigsoftmax outperforms the other schemes. At higher SNR values, the performance of the estimators converges. In addition, the Softmax function also outperforms ReLU.
Fig. 5. MSE per antenna for M = 64 antennas against a single observation (normalized MSE vs. SNR in dB; curves: FastMMSE, ToepSigSoftmax, ToepNoisySigSoftmax, ToepReLU, ToepSwish, ToepSoftmax).
In Fig. 6, we show the spectral efficiency against the number of observations T. We observe a significant gain with the use of Noisy Sigsoftmax. The results also illustrate the impact that improving the performance of CL-MMSE can have on the information rate.
Fig. 6. Spectral efficiency for M = 64 antennas and an SNR of 15 dB (spectral efficiency in bit/s/Hz vs. number of observations T; curves: FastMMSE, ToepSigSoftmax, ToepNoisySigSoftmax, ToepReLU, ToepSwish, ToepSoftmax).
VI. CONCLUSION
In this paper, we presented a continual learning-based channel estimator called CL-MMSE. In contrast to other MMSE-based approaches in the literature, our proposed method differs in its training mechanism and its newly introduced activation function. We carefully designed the channel estimator by combining hierarchical training and continual learning to improve the efficacy of the neural network while reducing the number of layers. Despite the use of a lower number of layers, our proposed method outperformed the existing schemes in the literature in terms of MSE and computational complexity. Our simulations suggest that neural networks provide a low-complexity solution for blind channel estimation. In the future, one can consider the proposed solution for 5G systems.
REFERENCES
[1] T. S. Rappaport, S. Sun, R. Mayzus, H. Zhao, Y. Azar, K. Wang, G. N.
Wong, J. K. Schulz, M. Samimi, and F. Gutierrez, “Millimeter Wave
Mobile Communications for 5G Cellular: It Will Work!,” IEEE Access,
vol. 1, pp. 335–349, 2013.
[2] Y. Wu, Y. Gu, and Z. Wang, “Efficient Channel Estimation for mmWave
MIMO With Transceiver Hardware Impairments,” IEEE Trans. Veh.
Technol., vol. 68, no. 10, pp. 9883–9895, 2019.
[3] R. Chopra, C. R. Murthy, H. A. Suraweera, and E. G. Larsson,
“Blind Channel Estimation for Downlink Massive MIMO Systems with
Imperfect Channel Reciprocity,” IEEE Trans. Signal Process., 2020.
[4] S. K. Vankayala, S. Kumar, and I. Kommineni, “Optimizing deep
learning based channel estimation using channel response arrangement,”
in 2020 IEEE International Conference on Electronics, Computing and
Communication Technologies (CONECCT), pp. 1–5, 2020.
[5] D. Neumann, T. Wiese, and W. Utschick, “Learning the MMSE Channel
Estimator,” IEEE Trans. Signal Process., vol. 66, no. 11, pp. 2905–2917,
2018.
[6] B. Chen, W. Deng, and J. Du, “Noisy Softmax: Improving the General-
ization Ability of DCCN via Postponing the Early Softmax Saturation,”
Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition,
pp. 5372–5381, 2017.
[7] S. Kanai, Y. Fujiwara, Y. Yamanaka, and S. Adachi, “Sigsoftmax:
Reanalysis of the Softmax Bottleneck,” Proc. of the 32nd Int. Conf.
on Neural Information Processing Systems, p. 284–294, 2018.
[8] M. A. Amirabadi, M. H. Kahaei, S. A. Nezamalhosseini, and V. T.
Vakili, “Deep Learning for Channel Estimation in FSO Communication
System,” Optics Communications, vol. 459, p. 124989, 2020.
[9] S. Grossberg, “Studies of Mind and Brain: Neural Principles of
Learning, Perception, Development, Cognition, and Motor Control,”
Boston studies in the philosophy of science 70. Dordrecht: Reidel, 1982.
[10] M. De Lange, R. Aljundi, M. Masana, S. Parisot, X. Jia, A. Leonardis,
G. Slabaugh, and T. Tuytelaars, “Continual learning: A comparative
Study on How to Defy Forgetting in Classification Tasks,” arXiv preprint
arXiv:1909.08383, 2019.
[11] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins,
A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska,
et al., “Overcoming Catastrophic Forgetting in Neural Networks,”
Proceedings of the national academy of sciences, vol. 114, no. 13,
pp. 3521–3526, 2017.
[12] J. Ko, Y.-J. Cho, S. Hur, T. Kim, J. Park, A. F. Molisch, K. Haneda,
M. Peter, D.-J. Park, and D.-H. Cho, “Millimeter-Wave Channel Mea-
surements and Analysis for Statistical Spatial Channel Model in In-
building and Urban Environments at 28 GHz,” IEEE Trans. Wireless
Commun., vol. 16, no. 9, pp. 5853–5868, 2017.
[13] M. R. Akdeniz, Y. Liu, M. K. Samimi, S. Sun, S. Rangan, T. S.
Rappaport, and E. Erkip, “Millimeter Wave Channel Modeling and
Cellular Capacity Evaluation,” IEEE J. Sel. Areas Commun., vol. 32,
no. 6, pp. 1164–1179, 2014.
[14] C.-H. Yao, Y.-Y. Chen, B. P. Sahoo, and H.-Y. Wei, “Outage Reduction
with Joint Scheduling and Power Allocation in 5G mmWave Cellular
Networks,” in 2017 IEEE 28th Annual Int. Symp. on Personal, Indoor,
and Mobile Radio Communications (PIMRC), pp. 1–6, IEEE, 2017.
[15] S. Kumar, S. Dutta, S. Chatturvedi, and M. Bhatia, “Strategies for en-
hancing training and privacy in blockchain enabled federated learning,”
in 2020 IEEE Sixth International Conference on Multimedia Big Data
(BigMM), pp. 333–340, 2020.
[16] 3GPP, “Spatial channel model for multiple input multiple output
(MIMO) Simulations (Release 12),” 3GPP, 2014. TR 25.996 V12.0.0.
[17] D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,”
arXiv preprint arXiv:1412.6980., 2014.
[18] D. Neumann, M. Joham, L. Weiland, and W. Utschick, “Low-complexity
Computation of LMMSE Channel Estimates in Massive MIMO,” 19th
Int. ITG Workshop on Smart Antennas, pp. 1–6, 2015.
[19] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet Classifica-
tion with Deep Convolutional Neural Networks,” Advances in neural
information processing systems, pp. 1097–1105, 2012.
[20] P. Ramachandran, B. Zoph, and Q. V. Le, “Searching for Activation
Functions,” arXiv preprint arXiv:1710.05941, 2017.