Continual Learning-Based Channel Estimation for
5G Millimeter-Wave Systems
Swaraj Kumar1, Satya Kumar Vankayala1, Biswapratap Singh Sahoo2, and Seungil Yoon3
1Networks S/W R&D Group, Samsung R&D Institute India Bangalore, Bengaluru, India
2Networks Modem R&D Group, Samsung R&D Institute India Bangalore, Bengaluru, India
3Network Business, Samsung Electronics, Suwon, South Korea
Abstract—Accurate channel estimation in millimeter-wave (mmWave) wireless communication systems is challenging and computationally costly. The mmWave frequency band has its advantages and disadvantages. At higher mmWave frequencies, the smaller wavelengths allow a much larger number of antennas to be packed into a given aperture than at lower frequency bands. However, the main disadvantages of mmWave systems are the difficulty of accurate channel estimation, smaller coverage, and high signal absorption. Moreover, when multiple-input multiple-output (MIMO) systems operate over mmWave frequencies, channel estimation becomes even more intricate in terms of computational complexity and estimation accuracy. In this paper, we address these limitations and improve estimation accuracy: we propose a Continual Learning (CL)-based method for channel estimation in mmWave MIMO systems. We also propose an activation function that is numerically stable and robust against early saturation. We discuss several channel estimation algorithms from the literature, and evaluate and compare their performance via numerical simulations. Our simulation results show that the proposed CL-based method outperforms existing minimum mean squared error (MMSE)-based channel estimators in terms of precision. Furthermore, based on our experiments, we give insight into spectral efficiency with respect to the number of available channel observations.
Index Terms—machine learning, channel estimation,
millimeter-wave system, MIMO, channel model
I. INTRODUCTION
Communication over millimeter wave (mmWave) frequency
band has been regarded as a promising technology to meet
ever-growing data traffic [1]. However, the mmWave fre-
quency band faces severe signal attenuation challenges. Taking
the advantages of the smaller wavelength of mmWave bands,
narrow high gain beamforming is possible by deploying a high
number of antennas at the transceivers to compensate for the
high propagation loss [1], [2]. However, the use of a large
number of antennas makes the process of channel estimation
challenging. With the conventional channel estimators, the
number of training pilots is a function of the number of
transmit antennas as well as the number of resource blocks,
which is prohibitive [2].
Channel estimation plays an integral role in the design and
performance of the next-generation wireless communication
systems, e.g., mmWave networks. Generally speaking, channel estimation can be broadly classified into two categories: 1) blind channel estimation, and 2) training-based channel estimation. Blind channel estimation is used in the uplink channel estimation process. The performance of these algorithms is inferior to that of training-based channel estimators [3]. While blind estimators can provide reasonable accuracy, residual calibration errors limit the overall system performance. Recently, training-based channel estimation has gained much attention in MIMO systems. In these methods, the training sequences are determined with respect to minimum mean squared error (MMSE) estimation, using partial knowledge about the state of the unknown time-varying channel, additive noise, and multiuser interference.
Though MMSE channel estimators are used in products, the
main drawback is high computational complexity. There is
a tradeoff between channel estimation complexity and performance. Learning-based algorithms capture channel imperfections better while achieving fairly low computational complexity, making them suitable for channel estimation [4], [5]. Thus, learning-based MMSE channel estimators
make a good combination to study mmWave massive MIMO
systems. However, the computational complexity of learning-
based schemes grows with increasing number of layers.
The channel models for mmWave systems are often based
on geometry-based stochastic models which are complicated,
thus the MMSE estimates cannot be determined in closed
form. To find the best computable estimator, it is a common
strategy to limit the estimator to a fixed class of functions
and then find the best estimator in that class [5]. Thus, low-complexity MMSE estimators restricted to the class of linear estimators can be used to minimize the mean squared error (MSE). In 5G systems, the number of antennas and the system bandwidth are much higher than in 4G systems, so there is a need for faster and more accurate channel estimation in 5G Cloud Radio Access Network (RAN) systems. Our computationally efficient machine learning algorithm can be implemented easily in Cloud-RAN systems.
In this paper, we introduce a continual learning (CL)-based MMSE estimator called CL-MMSE to increase the plasticity of the
network. Besides, we also introduce an activation function
called Noisy Sigsoftmax that allows for smooth gradient
updates during backpropagation. Both these solutions have
been carefully designed for improving the efficacy of neural
networks and reducing the number of layers. Our proposed
approach differs from previous studies on the performance
analysis of massive MIMO systems with reciprocity calibra-
tion imperfections where the exact knowledge of the channel
is assumed to be available.
The rest of this paper is organized as follows. In Section II,
we provide a brief summary of the existing channel estimation
methods. Section III presents a typical massive MIMO system
and the investigated channel model. In Section IV, a learning-
based channel estimation method is proposed. Section V pro-
vides the performance evaluation of the proposed algorithms,
and we conclude the paper in Section VI.
II. RELATED WORKS
Several works have investigated mmWave MIMO channel estimation. The authors in [5] adopted a neural network-based model to estimate a filter matrix which, when multiplied with the observed signal, gives the channel estimate. They used a hierarchical training mechanism (first training on a small number of antennas and then progressively increasing the antenna count). The authors used the very popular Softmax activation function. Softmax can, however, be a bottleneck for representation learning, for instance in language modeling, due to numerical instability.
Neural networks with Softmax are believed to have the universal approximation property but suffer from early saturation. To address this issue, the authors in [6] proposed the Noisy Softmax activation function. The early saturation behavior of Softmax impedes the exploration performed by backpropagation, which can cause the model to converge to a bad local minimum. Noisy Softmax injects annealed noise into Softmax during each iteration, thus postponing early saturation. The method proposed in [7], called Sigsoftmax and designed specially for recurrent neural networks (RNNs), solves the numerical stability problem of Softmax. Incorporating it in a CNN requires changes such as restricting its range. Since Sigsoftmax is monotonically increasing, we use a modified version of it that restricts its upper bound to a value close to 1. This modified Sigsoftmax, however, suffers from early saturation. To overcome the early saturation problem in the modified Sigsoftmax, we add annealed noise as in [6], making the activation function more robust.
Several works have investigated learning-based channel estimation [4], [5], [8], however with a different key focus in terms of system model and estimator design. These solutions also do not provide computationally prudent models. The learning-based methods mainly exploit the Toeplitz structure of the covariance matrix for channel estimation [5]. We also adopt the Toeplitz structure to define the dimensions of the kernel needed in the convolutional layer. To make the learning procedure more robust, we introduce a continual learning mechanism for training the proposed MMSE estimator. Continual learning is an incremental learning mechanism which enables the model to learn multiple tasks.
Neural networks tend to forget previously learnt tasks when introduced to a new task. This is called catastrophic forgetting and is due to the low plasticity of the neural networks used. The stability of a neural network generally decreases when it is used to solve a myriad of tasks; this is referred to as the stability-plasticity dilemma [9]. Continual learning methods can be classified into three major groups: replay, regularization-based, and parameter isolation methods [10]. We employ a regularization-based method called elastic weight consolidation (EWC) [11]. EWC introduces a regularization term in the loss function which signifies the importance of retaining the learning from previous tasks. During training of new tasks, deviation from the optimal weights of previous tasks is penalised.
III. SYSTEM AND CHANNEL MODELS
A. System Model
We consider a wireless communication system operating in the mmWave frequency band. We assume the base station (BS) and the user equipment (UE) are equipped with M and N transmit-receive antennas, respectively. Antennas are separated by at least λ/2, where λ denotes the wavelength. In addition, the BS and UE are assumed to be equipped with M_rf and N_rf radio frequency (RF) chains, respectively, where M_rf ≤ M and N_rf ≤ N. We consider uplink channel estimation for a multi-antenna mmWave communication system.
We further assume a hybrid analog-digital architecture, in which the BS transmits pilot signals periodically, with a period smaller than the coherence time. Before transmission, the transmitter modifies the data streams s with a digital baseband precoder and subsequently constructs the final transmit signal using an RF precoder. The time-varying discrete transmitted signal can be expressed as [2]
x = P_M s  (1)
where P_M denotes the combined precoding matrix used to construct the final transmit signal. The complex-valued received signal at the receiver can be expressed as
r = H x + n  (2)
where H denotes the M × N channel gain matrix, and n represents the zero-mean additive white Gaussian noise (AWGN) vector with covariance matrix σ²I_N. The received signal r is multiplied by a complex weighting factor W^H, which adjusts the magnitude and phase of the signal sent from the transmitting antennas; the superscript H on W denotes the conjugate transpose. We assume that σ² is known to the BS receiver. Finally, the combined signal can be expressed as
y = W^H H x + W^H n  (3)
In this paper, we estimate the uplink channel using a CNN with a novel activation function.
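As a concrete illustration, the signal model in (1)–(3) can be sketched in NumPy; all dimensions, the Gaussian channel draw, and the random precoder/combiner below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, Ns, snr_db = 8, 3, 2, 15      # BS antennas, UE antennas, streams, SNR (assumed)

# (1): x = P_M s, with a random complex precoder standing in for P_M
s = (rng.normal(size=(Ns, 1)) + 1j * rng.normal(size=(Ns, 1))) / np.sqrt(2)
P = rng.normal(size=(N, Ns)) + 1j * rng.normal(size=(N, Ns))
x = P @ s

# (2): r = H x + n, with an i.i.d. complex Gaussian stand-in for H
H = rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N))
sigma2 = 10 ** (-snr_db / 10)
n = np.sqrt(sigma2 / 2) * (rng.normal(size=(M, 1)) + 1j * rng.normal(size=(M, 1)))
r = H @ x + n

# (3): y = W^H H x + W^H n, i.e. combining r with W^H
W = rng.normal(size=(M, Ns)) + 1j * rng.normal(size=(M, Ns))
y = W.conj().T @ r
assert y.shape == (Ns, 1)
```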
B. Millimeter-Wave Channel Model
The mmWave channel has limited scattering [12]–[14]. Thus, we assume that the channel has L taps, where ℓ = 0, 1, . . . , L is the subpath index. Under this channel model, the channel matrix H can mathematically be expressed as
H = Σ_{ℓ=0}^{L} g_ℓ u_rx(θ_ℓ) u_tx^H(φ_ℓ)  (4)
where g_ℓ is the complex small-scale fading coefficient associated with the ℓ-th propagation path, and u_tx(φ_ℓ) and u_rx(θ_ℓ) are the antenna array response vectors of the ℓ-th path, with AoD (Angle of Departure) φ_ℓ at the transmitter and AoA (Angle of Arrival) θ_ℓ at the receiver. Assuming uniform linear arrays (ULAs), the antenna array responses u_rx(θ_ℓ) and u_tx^H(φ_ℓ) are adopted as defined in [2].
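A minimal sketch of the multipath channel in (4), assuming the standard half-wavelength ULA steering vector (the paper defers the exact array response to [2]); the path gains and angles are drawn at random purely for illustration.

```python
import numpy as np

def ula_response(num_ant, angle, d=0.5):
    """Steering vector of a ULA with half-wavelength spacing (standard form)."""
    k = np.arange(num_ant)
    return np.exp(-2j * np.pi * d * k * np.sin(angle)) / np.sqrt(num_ant)

def channel(M, N, L, rng):
    """Sum over the L+1 paths as in (4): H = sum_l g_l u_rx(theta_l) u_tx(phi_l)^H."""
    H = np.zeros((M, N), dtype=complex)
    for _ in range(L + 1):
        g = (rng.normal() + 1j * rng.normal()) / np.sqrt(2)   # small-scale gain g_l
        theta = rng.uniform(-np.pi / 2, np.pi / 2)            # AoA at the receiver
        phi = rng.uniform(-np.pi / 2, np.pi / 2)              # AoD at the transmitter
        H += g * np.outer(ula_response(M, theta), ula_response(N, phi).conj())
    return H

H = channel(M=8, N=3, L=3, rng=np.random.default_rng(1))
assert H.shape == (8, 3)
```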
IV. LEARNING-BASED CHANNEL ESTIMATION
A. Key Idea
While hierarchical training allows the model to learn sequentially from a small number of antennas to a large number of antennas, it does not solve the problem of catastrophic forgetting, which makes the model forget previous tasks when it is trained on a new task, even though there is no major variation between the tasks here. Moreover, in [5], interpolation is used as an approximation in the training mechanism, which alters the input significantly, and the weights (θ₁ and θ₂) change significantly between tasks. To make the learning optimal and avoid forgetting, we use EWC to enhance the retention power of our CL-MMSE. The objective loss function L_B for the proposed CL-MMSE can be expressed as
i.e., LBfor the proposed CL-MMSE can be expressed as
L_B = (1/|N|) Σ_{H∈N} ‖W_B(θ_{B,1}, θ_{B,2}, y) − H‖²₂ + (λ/2) Σ_i |θ_{B,i}|²  (5)
where y is the observation, H is the channel response matrix, N is the training batch (of size |N|), θ_{K,j} are the weights of the j-th convolutional layer for K antennas, W_B is the output of the CNN for y when the number of antennas is B, and λ is the coefficient for L2 regularization. Regularization is only used in the first task. The objective function in (5) learns for B antennas while forgetting the training on the previous task. Adding the EWC regularization term gives the final loss function:
L_EWC = L_B + Σ_i (μ/2) F_i (θ_{B,i} − θ*_{A,i})²  (6)
where μ is the coefficient indicating the importance of the task with A antennas (the previous task) relative to the task with B antennas (the new task), F_i is the Fisher information matrix for layer i, and θ*_{A,i} is the interpolated weight matrix for A antennas. Interpolation is applied to make the dimensions of θ_{B,i} and θ*_{A,i} the same. When using (6), L2 regularization is turned off (λ = 0 in L_B), since L2 regularization negatively affects the EWC regularization term.
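The EWC-regularized loss in (6) can be sketched as follows on flattened per-layer weights; the Fisher values here are placeholders (in practice they would be estimated from squared gradients on the previous task), and μ = 0.3 follows Table I.

```python
import numpy as np

def ewc_loss(task_loss, weights, prev_weights, fishers, mu=0.3):
    """L_EWC = L_B + sum_i (mu/2) * F_i * (theta_{B,i} - theta*_{A,i})^2, as in (6)."""
    penalty = 0.0
    for theta, theta_star, F in zip(weights, prev_weights, fishers):
        penalty += 0.5 * mu * np.sum(F * (theta - theta_star) ** 2)
    return task_loss + penalty

# Toy two-layer example: only the first layer has drifted from the anchor.
w  = [np.array([1.0, 2.0]), np.array([0.5])]   # current-task weights theta_{B,i}
w0 = [np.array([1.0, 1.0]), np.array([0.5])]   # interpolated previous-task weights theta*_{A,i}
F  = [np.array([2.0, 2.0]), np.array([1.0])]   # placeholder Fisher information F_i
# penalty = 0.5 * 0.3 * (2 * (2 - 1)^2) = 0.3
assert abs(ewc_loss(1.0, w, w0, F) - 1.3) < 1e-12
```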
We calculate the Fisher information matrix between the weights of the two tasks and add the resulting term to the loss function (5). The EWC regularization term penalises the model if it moves away from the previous task's optimal weights. This way, during the weight update, the weights do not get completely skewed toward the latest task, and the network is able to retain
Fig. 1. Illustration of the weight update scheme with and without CL. Here, θ*_{T−1,i} is the interpolation of θ_{T−1,i}, and θ_{T,i} is the optimal weight for task T and layer i.
the prior learning. Fig. 1 depicts the CL weight update strategy. The optimal weights follow a trajectory which enables the network to be more plastic and retain prior learning. This signifies that the latest task is important, but significant attention must also be given to how far the weights move away from the prior tasks' weights. Also, unlike in hierarchical training, θ_{T,i} is initialized rather than merely interpolated from θ_{T−1,i}. This allows CL-MMSE to explore more rather than getting stuck at a bad local minimum introduced by the previous task. Continual learning can also be used in combination with federated learning for distributed computing [15].
We introduce a new activation function to overcome the deficiencies of the Softmax function. Though there has been a surge in finding ways to alter the undesirable characteristics of Softmax, the research is limited to vision and Natural Language Processing (NLP) tasks. For these tasks, Softmax is mostly used in the last layer for classification problems. Due to its limited range, it is not suitable for the output layer of neural networks for regression problems. Our network is limited to two convolutional blocks with the activation function between them; a small CNN ensures low computational complexity. As reported in [5], the performance of the Softmax function was worse than that of ReLU. This is due to Softmax not being numerically stable and having negative values, which is particularly damaging for an activation function, especially when used in middle layers. Activation functions need not inhibit the neurons of the successive layer. We implement Sigsoftmax, as proposed by the authors in [7], for the MMSE estimator. Sigsoftmax is an altered version of Softmax which is non-negative and more numerically stable than Softmax.
[f(z)]_i = exp(z_i) sig(z_i) / Σ_{p=1}^{P} exp(z_p) sig(z_p)  (7)
where [f(z)]_i is the i-th term in the array f(z), z_i is the i-th term in the array z of size P, and sig is the sigmoid function. Sigsoftmax (7) was developed for RNNs; we modify it for use in our proposed convolutional neural network (CNN). Equation (7) is monotonically increasing, so to restrict the flow of any large values to the last layer, we need to restrict its range. This can be done by adding a large constant b to the input, which gives a modified version of Sigsoftmax:
[f(z+b)]_i = exp(z_i + b) sig(z_i + b) / Σ_{m=1}^{M} exp(z_m + b) sig(z_m + b)  (8)
Simplifying the equation further produces (9):
[f(z+b)]_i = exp(z_i) sig(z_i + b) / Σ_{m=1}^{M} exp(z_m) sig(z_m + b)  (9)
For a large b (b = 5 in our case), this allows us to overcome the shortcomings of Softmax while also restricting the range, unlike ReLU. However, with a large b in (9), the rate of increase of (9) becomes very slow.
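A direct NumPy transcription of the modified Sigsoftmax (9); the max-shift inside the exponential is our addition for numerical safety and cancels in the normalization, so the output is unchanged.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def modified_sigsoftmax(z, b=5.0):
    """Modified Sigsoftmax (9): weights exp(z_i) * sig(z_i + b), normalized."""
    w = np.exp(z - z.max()) * sigmoid(z + b)   # max-shift for numerical stability
    return w / w.sum()

z = np.array([-1.0, 0.0, 2.0])
p = modified_sigsoftmax(z)
assert np.isclose(p.sum(), 1.0) and np.all(p >= 0)
# With large b, sig(z + b) is close to 1 and (9) approaches ordinary Softmax.
assert np.allclose(p, np.exp(z) / np.exp(z).sum(), atol=1e-2)
```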
In our simulations, the modified Sigsoftmax (9) tends to perform better than Softmax and, for a low number of antennas, similarly to ReLU. The increase in [f(z)]_i with increasing z_i is considerably slower in (9) than in (7). This leads to better results for a low number of antennas, but at higher numbers of antennas the performance deteriorates due to the early saturation of the modified Sigsoftmax: with this modification, the gradient update is small and cannot update the weights accurately for larger numbers of antennas. To resolve this, we add annealed noise to the modified Sigsoftmax. This method was popularized in Noisy Softmax [6], which delays the saturation of the function.
z_i = z_i − α (θ_i · X_i (1 − cos β_i))  (10)
where z_i is the input to the activation function, θ_i are the weights of the preceding layer, X_i is the i-th row of X (the input to the preceding layer), β_i is the angle between the vectors θ_i and X_i, and α is the hyper-parameter for adjusting the noise level.
The annealed noise is calculated using the weights of the preceding layer and the input to that layer. Although the noise addition technique used here was devised for the dense layer of a neural network as in [6], in our CNN we use only one kernel per convolutional layer, which makes the technique feasible. Adding this noise to the modified Sigsoftmax improves the performance of the model even for a large number of antennas by delaying the saturation.
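The noise term (10) can be sketched per neuron as follows; treating θ_i and X_i as flat vectors is a simplifying assumption (in the paper they are the kernel weights and input row of the preceding convolutional layer).

```python
import numpy as np

def annealed_noise_input(z, theta, X, alpha=0.1):
    """Apply (10): z <- z - alpha * (theta . X) * (1 - cos(beta)),
    where beta is the angle between the weight vector theta and the input X.
    alpha = 0.1 follows Table I; the flat-vector form is a sketch."""
    dot = float(theta @ X)
    cos_beta = dot / (np.linalg.norm(theta) * np.linalg.norm(X) + 1e-12)
    return z - alpha * dot * (1.0 - cos_beta)

z = np.array([0.5, -0.2])
theta = np.array([1.0, 0.0])
X = np.array([1.0, 0.0])          # aligned with theta: beta = 0, so no noise is added
assert np.allclose(annealed_noise_input(z, theta, X), z)
```

The more the input is misaligned with the weights (larger β_i), the larger the perturbation, which is what delays the early saturation.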
The proposed activation function removes the hurdles that restrict Softmax from performing better. An activation function can enable enhanced learning on a task; however, it cannot increase the plasticity of the network. Hence, for learning new tasks (settings with more antennas), the weights need to be adjusted properly to extract the maximum from the network while retaining the previous tasks as well.
B. Working Procedure
Fig. 2 illustrates the flow of the proposed channel estimation. To train the model, we generate the true channel matrix H using a channel model similar to [16]; the observations y are obtained using (3) and serve as the input to the CNN-MMSE estimator. The neural network is composed of two circular convolution layers, each having one kernel and its own biases. The size of the kernel depends on the number of antennas and the transformation used. Between the two layers is an activation function. We are
Fig. 2. Flow diagram of the proposed CL-MMSE estimator.
going to evaluate the performance of our Noisy Sigsoftmax against other popular activation functions. The estimated channel matrix, denoted H_est, is calculated by multiplying the observations with the output of the CNN. Finally, we calculate the CL-MMSE loss, denoted L, using (5).
Algorithm 1 Continual Learning MMSE
1: Set M_max, K_max, n
2: M_0 = M_max/2^n, K_0 = K_max/2^n
3: Initialize θ_{0,1}, θ_{0,2}, b_{0,1}, b_{0,2}
4: Generate observation y and H using (3)
5: Calculate the loss function using (5) and update weights using ADAM [17]
6: for i = 1, 2, . . . , n do
7:   Set M_i = M_max/2^{n−i}, K_i = K_max/2^{n−i}
8:   Interpolate θ_{i−1,1}, θ_{i−1,2}, b_{i−1,1}, b_{i−1,2}
9:   Initialize θ_{i,1}, θ_{i,2}, b_{i,1}, b_{i,2}
10:  Generate y and H using (3) for M_i
11:  Calculate F_i using θ_{i,∗} and θ_{i−1,∗}
12:  Calculate the loss function using (6) and update weights using ADAM [17]
13: end for
During training, we use a hierarchical training mechanism combined with continual learning; the proposed mechanism is given in Algorithm 1. Step 1 takes the maximum number of antennas, denoted M_max, and the maximum kernel length, denoted K_max, as inputs. In Step 2, we split the training into n steps; in each i-th incremental step we train the network on M_max/β^{n−i} antennas, with β = 2. Step 3 initializes the weights for the first set of antennas. Step 4 uses (3) to generate y and H. Step 5 updates the weights using (5). In Steps 6–13, we update the weights by doubling the number of antennas in each iteration. In Step 8, interpolation is used to increase the dimensions of the previous iteration's weight matrices θ_{i−1,1} and θ_{i−1,2} to match the current iteration's weight matrices θ_{i,1} and θ_{i,2}; the biases are interpolated similarly. As the number of antennas varies, we add the EWC regularization term, calculated in Step 11, to the CL-MMSE loss function (6). The EWC term is calculated with the current iteration's weight matrices (θ_{i,∗}) and the interpolated weight matrices of the previous iteration (θ_{i−1,∗}).
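The antenna schedule and weight interpolation of Algorithm 1 (Steps 2, 7, and 8) can be sketched as follows; the gradient steps themselves are stubbed out, and the use of one-dimensional kernels with linear interpolation is an illustrative assumption.

```python
import numpy as np

def interpolate(theta, new_len):
    """Resize a previous task's kernel to the next task's size (Step 8)."""
    old_grid = np.linspace(0.0, 1.0, len(theta))
    new_grid = np.linspace(0.0, 1.0, new_len)
    return np.interp(new_grid, old_grid, theta)

# Step 1-2: inputs and the smallest (first) task
M_max, K_max, n = 32, 8, 2
M, K = M_max // 2**n, K_max // 2**n
theta = np.random.default_rng(0).normal(size=K)   # Step 3: initial kernel weights

# Steps 6-13: double the number of antennas (and kernel length) each iteration
schedule = [(M_max // 2**(n - i), K_max // 2**(n - i)) for i in range(1, n + 1)]
for M_i, K_i in schedule:
    theta_star = interpolate(theta, K_i)   # interpolated anchor for the EWC term
    theta = theta_star.copy()              # initialize, then train minimizing (6)
    # ... ADAM gradient steps on L_EWC would go here ...
assert len(theta) == K_max
```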
V. PERFORMANCE EVALUATION AND ANALYSIS
To analyze the performance of our CL-MMSE algorithm, we compare it with three other schemes: the Fast MMSE [5], Maximum Likelihood (ML) [18], and CNN-MMSE [5] estimators. The Fast MMSE technique uses the Softmax activation function and does not have a hierarchical training schedule. The ML method first calculates the channel covariance matrix and uses it to find the MMSE estimates of the channel vectors. We test our CL-MMSE with five different activation functions: ReLU [19], Swish [20], Sigsoftmax [7], Softmax [5], and our proposed Noisy Sigsoftmax function (9). All these CL-MMSE estimators are trained using the proposed training method described in Section IV-B. Models are trained on N = 16M samples (M is the number of antennas). We use 70% of the samples for training, 15% for validation, and the remaining 15% for testing.
Fig. 3 depicts the gain in accuracy obtained when continual learning is incorporated into normal hierarchical training. Using the proposed training mechanism, we train the CL-MMSE estimators with different activation functions. We chose a batch size of 20, two layers, and the ADAM optimizer [17]. Different learning rates were used for the CNNs depending on the activation function, for more precision.
We study the performance of our scheme against two metrics: MSE and spectral efficiency. Table I lists the basic system parameters used for simulation, which are based on [16]. The performance of the Toeplitz structure was found to be superior to the Circulant one; thus, the Toeplitz structure has been used to analyse the activation functions. In addition, the quantities required by the off-line learning procedure of the CNN estimators to generate the necessary realizations of channel vectors and observations, i.e., the noise variance σ² and the correct model parameters, are assumed to be known.
The channel estimation performance of the different proposed methods, in terms of per-antenna MSE for a single snapshot (T = 1), is demonstrated in Fig. 4 as a function of the number of antennas. The Toeplitz structure was used in conjunction with the activation functions. Fig. 4 shows the efficacy of our proposed activation function, the Noisy modified Sigsoftmax, represented as ToepNoisySigsoftmax: its performance is better than that of all the other activation functions. From the simulation results, it is clear that Sigsoftmax deteriorates with a large number of antennas (i.e., M = 96 and M = 128), whereas Noisy Sigsoftmax outperforms all other schemes for a large number of antennas.
TABLE I
SIMULATION PARAMETERS

Parameter                         Value
Deployment scenario               Urban
Operating frequency [GHz]         28
Subcarrier spacing [kHz]          120
Number of antennas at BS (M)      8, 16, 32, 96, 128
Number of antennas at UE (N)      3
UE distribution                   Random
Pathloss coefficient              3.5
Log-normal shadow fading          0 dB
SNR at maximum distance           15 dB
Annealed noise (α)                0.1
EWC coefficient (μ)               0.3
Regularizer coefficient (λ)       10⁻⁶ for M_0, else 0
Distance {minimum, maximum}       {1000 m, 1500 m}
Fig. 3. Comparison of the training mechanisms of the proposed CL-MMSE and the Hierarchical Learning MMSE (HL-MMSE) [5].
[Figure 4 plot: Normalized MSE vs. number of antennas at the base station (0–140), for ToepSwish, ToepReLU, ToepSigSoftmax, FastMMSE, CircML, ToepNoisySigSoftmax, and ToepSoftmax.]
Fig. 4. MSE per antenna at an SNR of 15 dB against a single observation.
As shown in Fig. 5, in our evaluation with varying SNR, the Noisy Sigsoftmax outperforms the other schemes. At higher SNR values, the performance of the estimators converges. In addition, the Softmax function also outperforms ReLU.
[Figure 5 plot: Normalized MSE vs. SNR (−15 to 15 dB), for FastMMSE, ToepSigSoftmax, ToepNoisySigSoftmax, ToepReLU, ToepSwish, and ToepSoftmax.]
Fig. 5. MSE per antenna for M = 64 antennas against a single observation.
In Fig. 6, we show the spectral efficiency against the number of observations T. We observe a significant gain with the use of Noisy Sigsoftmax. This also illustrates the impact that improving the performance of CL-MMSE can have on the information rate.
[Figure 6 plot: Spectral efficiency in bit/s/Hz vs. number of observations T (0–40), for FastMMSE, ToepSigSoftmax, ToepNoisySigSoftmax, ToepReLU, ToepSwish, and ToepSoftmax.]
Fig. 6. Spectral efficiency for M = 64 antennas and SNR of 15 dB.
VI. CONCLUSION
In this paper, we presented a continual learning-based channel estimator called CL-MMSE. In contrast to other MMSE-based approaches in the literature, our proposed method differs in its training mechanism and its newly introduced activation function. We carefully designed the channel estimator by combining hierarchical training and continual learning to improve the efficacy of the neural network while reducing the number of layers. Despite using fewer layers, our proposed method outperformed the existing schemes in the literature in terms of MSE and computational complexity. Our simulations suggest that neural networks provide a low-complexity solution for channel estimation. In the future, one can consider the proposed solution for 5G systems.
REFERENCES
[1] T. S. Rappaport, S. Sun, R. Mayzus, H. Zhao, Y. Azar, K. Wang, G. N.
Wong, J. K. Schulz, M. Samimi, and F. Gutierrez, “Millimeter Wave
Mobile Communications for 5G Cellular: It Will Work!,” IEEE Access,
vol. 1, pp. 335–349, 2013.
[2] Y. Wu, Y. Gu, and Z. Wang, “Efficient Channel Estimation for mmWave
MIMO With Transceiver Hardware Impairments,” IEEE Trans. Veh.
Technol., vol. 68, no. 10, pp. 9883–9895, 2019.
[3] R. Chopra, C. R. Murthy, H. A. Suraweera, and E. G. Larsson,
“Blind Channel Estimation for Downlink Massive MIMO Systems with
Imperfect Channel Reciprocity,” IEEE Trans. Signal Process., 2020.
[4] S. K. Vankayala, S. Kumar, and I. Kommineni, “Optimizing deep
learning based channel estimation using channel response arrangement,”
in 2020 IEEE International Conference on Electronics, Computing and
Communication Technologies (CONECCT), pp. 1–5, 2020.
[5] D. Neumann, T. Wiese, and W. Utschick, “Learning the MMSE Channel
Estimator,” IEEE Trans. Signal Process., vol. 66, no. 11, pp. 2905–2917,
2018.
[6] B. Chen, W. Deng, and J. Du, “Noisy Softmax: Improving the General-
ization Ability of DCCN via Postponing the Early Softmax Saturation,”
Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition,
pp. 5372–5381, 2017.
[7] S. Kanai, Y. Fujiwara, Y. Yamanaka, and S. Adachi, “Sigsoftmax:
Reanalysis of the Softmax Bottleneck,” Proc. of the 32nd Int. Conf.
on Neural Information Processing Systems, p. 284–294, 2018.
[8] M. A. Amirabadi, M. H. Kahaei, S. A. Nezamalhosseini, and V. T.
Vakili, “Deep Learning for Channel Estimation in FSO Communication
System,” Optics Communications, vol. 459, p. 124989, 2020.
[9] S. Grossberg, “Studies of Mind and Brain : Neural Principles of
Learning, Perception, Development, Cognition, and Motor Control,”
Boston studies in the philosophy of science 70. Dordrecht: Reidel, 1982.
[10] M. De Lange, R. Aljundi, M. Masana, S. Parisot, X. Jia, A. Leonardis,
G. Slabaugh, and T. Tuytelaars, “Continual learning: A comparative
Study on How to Defy Forgetting in Classification Tasks,” arXiv preprint
arXiv:1909.08383, 2019.
[11] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins,
A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska,
et al., “Overcoming Catastrophic Forgetting in Neural Networks,”
Proceedings of the national academy of sciences, vol. 114, no. 13,
pp. 3521–3526, 2017.
[12] J. Ko, Y.-J. Cho, S. Hur, T. Kim, J. Park, A. F. Molisch, K. Haneda,
M. Peter, D.-J. Park, and D.-H. Cho, “Millimeter-Wave Channel Mea-
surements and Analysis for Statistical Spatial Channel Model in In-
building and Urban Environments at 28 GHz,” IEEE Trans. Wireless
Commun., vol. 16, no. 9, pp. 5853–5868, 2017.
[13] M. R. Akdeniz, Y. Liu, M. K. Samimi, S. Sun, S. Rangan, T. S.
Rappaport, and E. Erkip, “Millimeter Wave Channel Modeling and
Cellular Capacity Evaluation,” IEEE J. Sel. Areas Commun., vol. 32,
no. 6, pp. 1164–1179, 2014.
[14] C.-H. Yao, Y.-Y. Chen, B. P. Sahoo, and H.-Y. Wei, “Outage Reduction
with Joint Scheduling and Power Allocation in 5G mmWave Cellular
Networks,” in 2017 IEEE 28th Annual Int. Symp. on Personal, Indoor,
and Mobile Radio Communications (PIMRC), pp. 1–6, IEEE, 2017.
[15] S. Kumar, S. Dutta, S. Chatturvedi, and M. Bhatia, “Strategies for en-
hancing training and privacy in blockchain enabled federated learning,”
in 2020 IEEE Sixth International Conference on Multimedia Big Data
(BigMM), pp. 333–340, 2020.
[16] 3GPP, “Spatial channel model for multiple input multiple output
(MIMO) Simulations (Release 12),” 3GPP, 2014. TR 25.996 V12.0.0.
[17] D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,”
arXiv preprint arXiv:1412.6980., 2014.
[18] D. Neumann, M. Joham, L. Weiland, and W. Utschick, “Low-complexity
Computation of LMMSE Channel Estimates in Massive MIMO,” 19th
Int. ITG Workshop on Smart Antennas, pp. 1–6, 2015.
[19] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet Classifica-
tion with Deep Convolutional Neural Networks,” Advances in neural
information processing systems, pp. 1097–1105, 2012.
[20] P. Ramachandran, B. Zoph, and Q. V. Le, “Searching for Activation
Functions,” arXiv preprint arXiv:1710.05941, 2017.