Continual Learning-Based Channel Estimation for
5G Millimeter-Wave Systems
Swaraj Kumar1, Satya Kumar Vankayala1, Biswapratap Singh Sahoo2, and Seungil Yoon3
1Networks S/W R&D Group, Samsung R&D Institute India Bangalore, Bengaluru, India
2Networks Modem R&D Group, Samsung R&D Institute India Bangalore, Bengaluru, India
3Network Business, Samsung Electronics, Suwon, South Korea
Abstract—Accurate channel estimation in millimeter-wave (mmWave) wireless communication systems is challenging and computationally costly. The mmWave frequency band has its advantages and disadvantages. At higher mmWave frequencies, the smaller wavelengths allow a much larger number of antennas to be packed into a given aperture than at lower frequency bands. However, the main disadvantages of mmWave systems are the difficulty of accurate channel estimation, smaller coverage, and high signal absorption. Moreover, when multiple-input multiple-output (MIMO) systems operate over mmWave frequencies, channel estimation becomes even more intricate in terms of computational complexity and estimation accuracy. In this paper, we address these limitations and improve estimation accuracy: we propose a Continual Learning (CL)-based method for channel estimation in mmWave MIMO systems. We also propose an activation function that is numerically stable and robust against early saturation. We discuss several channel estimation algorithms from the literature, and evaluate and compare their performance via numerical simulations. Our simulation results show that the proposed CL-based method outperforms existing minimum mean squared error (MMSE)-based channel estimators in terms of precision. Furthermore, based on our experiments, we give insight into spectral efficiency with respect to the number of available channel observations.
Index Terms—machine learning, channel estimation,
millimeter-wave system, MIMO, channel model
I. INTRODUCTION
Communication over millimeter wave (mmWave) frequency
band has been regarded as a promising technology to meet
ever-growing data traffic [1]. However, the mmWave fre-
quency band faces severe signal attenuation challenges. Taking
the advantages of the smaller wavelength of mmWave bands,
narrow high gain beamforming is possible by deploying a high
number of antennas at the transceivers to compensate for the
high propagation loss [1], [2]. However, the use of a large
number of antennas makes the process of channel estimation
challenging. With the conventional channel estimators, the
number of training pilots is a function of the number of
transmit antennas as well as the number of resource blocks,
which is prohibitive [2].
Channel estimation plays an integral role in the design and
performance of the next-generation wireless communication
systems, e.g., mmWave networks. Generally speaking, channel estimation can be broadly classified into two categories: 1) blind channel estimation, and 2) training-based channel estimation. Blind channel estimation is used in the uplink channel estimation process. The performance of these algorithms is inferior to that of training-based channel estimators [3]. While blind estimators can provide reasonable accuracy, residual calibration errors limit the overall system performance. Recently, training-based channel estimation has gained much attention in MIMO systems. In these methods, the training sequences are determined with respect to minimum mean squared error (MMSE) estimation, using partial knowledge about the state of the unknown time-varying channel, additive noise, and multiuser interference.
Though MMSE channel estimators are used in products, the
main drawback is high computational complexity. There is
a tradeoff between channel estimation complexity and performance. Learning-based algorithms capture channel imperfections better while achieving fairly low computational complexity, making them suitable for channel estimation [4], [5]. Thus, learning-based MMSE channel estimators
make a good combination to study mmWave massive MIMO
systems. However, the computational complexity of learning-
based schemes grows with increasing number of layers.
The channel models for mmWave systems are often based
on geometry-based stochastic models which are complicated,
thus the MMSE estimates cannot be determined in closed
form. To find the best computable estimator, it is a common
strategy to limit the estimator to a fixed class of functions
and then find the best estimator in that class [5]. Thus, low-complexity MMSE estimators restricted to the class of linear estimators can be used to minimize the mean squared error (MSE). In 5G systems, the number of antennas and the system bandwidth are much higher than in 4G systems, so there is a need for faster and more accurate channel estimation in 5G Cloud Radio Access Network (RAN) systems. Our computationally efficient machine learning algorithm can be implemented easily in Cloud-RAN systems.
In this paper, we introduce a continual learning (CL)-based MMSE estimator called CL-MMSE to increase the plasticity of the
network. Besides, we also introduce an activation function
called Noisy Sigsoftmax that allows for smooth gradient
updates during backpropagation. Both these solutions have
been carefully designed for improving the efficacy of neural
networks and reducing the number of layers. Our proposed
approach differs from previous studies on the performance
analysis of massive MIMO systems with reciprocity calibra-
tion imperfections where the exact knowledge of the channel
is assumed to be available.
The rest of this paper is organized as follows. In Section II,
we provide a brief summary of the existing channel estimation
methods. Section III presents a typical massive MIMO system
and the investigated channel model. In Section IV, a learning-
based channel estimation method is proposed. Section V pro-
vides the performance evaluation of the proposed algorithms,
and we conclude the paper in Section VI.
II. RELATED WORKS
Several works have investigated mmWave MIMO channel estimation. The authors in [5] adopted a neural network-based model to estimate a filter matrix which, when multiplied with the observed signal, gives the channel estimate. They used a hierarchical training mechanism (first training on a small number of antennas and then progressively increasing the antenna count). The authors used the very popular Softmax activation function. Softmax can, however, be a bottleneck for representation learning, for instance in language modeling, due to numerical instability.
Neural networks with Softmax are believed to have the universal approximation property but suffer from early saturation. To address this issue, the authors in [6] proposed the Noisy Softmax activation function. The early saturation behavior of Softmax impedes the exploration performed by backpropagation, which can cause the model to converge to a bad local minimum. Noisy Softmax injects annealed noise into Softmax during each iteration, thus postponing early saturation. The method proposed in [7], called Sigsoftmax and designed specially for recurrent neural networks (RNNs), solves the numerical stability problem of Softmax. Incorporating it in a CNN requires changes such as restricting its range. Since Sigsoftmax is monotonically increasing, we use a modified version of it that restricts its upper bound to a value close to 1. This modified Sigsoftmax, however, suffers from early saturation. To overcome the early saturation problem in the modified Sigsoftmax, we add annealed noise as in [6], making the activation function more robust.
Several works have investigated learning-based channel estimation [4], [5], [8], however with a different key focus in terms of system model and estimator design. These solutions also do not provide computationally prudent models. The learning-based methods mainly exploit the Toeplitz structure of the covariance matrix for channel estimation [5]. We also adopt the Toeplitz structure to define the dimensions of the kernel needed in the convolutional layer. To make the learning procedure more robust, we introduce a continual learning mechanism for training the proposed MMSE estimator. Continual learning is an incremental learning mechanism which enables the model to learn multiple tasks.
Neural networks tend to forget previously learnt tasks when introduced to a new task. This is called catastrophic forgetting and is due to the low plasticity of the neural networks used. The stability of a neural network generally decreases when it is used to solve a myriad of tasks; this is referred to as the stability-plasticity dilemma [9]. Continual learning methods can be classified into three major groups: replay, regularization-based, and parameter isolation methods [10]. We employ a regularization-based method called elastic weight consolidation (EWC) [11]. EWC introduces a regularization term in the loss function which signifies the importance of retaining the learning from previous tasks. During training of new tasks, deviation from the optimal weights of previous tasks is penalised.
III. SYSTEM AND CHANNEL MODELS
A. System Model
We consider a wireless communication system operating in the mmWave frequency band. We assume the base station (BS) and the user equipment (UE) are equipped with M and N transmit-receive antennas, respectively. Antennas are separated by at least λ/2, where λ denotes the wavelength. In addition, the BS and UE are assumed to be equipped with M_rf and N_rf radio frequency (RF) chains, respectively, where M_rf ≤ M and N_rf ≤ N. We consider uplink channel estimation for a multi-antenna mmWave communication system.
We further assume a hybrid analog-digital architecture, in which the BS transmits pilot signals periodically, with a period smaller than the coherence time. Before transmission, the transmitter modifies the data streams s with a digital baseband precoder and subsequently constructs the final transmit signal using an RF precoder. The time-varying discrete transmitted signal can be expressed as [2]
x = P_M s  (1)
where P_M denotes the combined precoding matrix used to construct the final transmit signal. The complex-valued received signal at the receiver can be expressed as
r = H x + n  (2)
where H denotes the M × N channel gain matrix, and n represents the zero-mean additive white Gaussian noise (AWGN) vector with covariance matrix σ²I_N. The received signal r is multiplied by a complex weighting factor W^H, which adjusts the magnitude and phase of the signal sent from the transmitting antennas; the superscript H on W denotes the conjugate transpose. We assume that σ² is known to the BS receiver. Finally, the combined signal can be expressed as
y = W^H H x + W^H n  (3)
In this paper, we estimate the uplink channel using a CNN with a novel activation function.
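As a concrete illustration, the signal model in (1)–(3) can be sketched in NumPy; all dimensions, the Gaussian channel draw, and the random precoder/combiner below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, Ns, snr_db = 8, 3, 2, 15      # BS antennas, UE antennas, streams, SNR (assumed)

# (1): x = P_M s, with a random complex precoder standing in for P_M
s = (rng.normal(size=(Ns, 1)) + 1j * rng.normal(size=(Ns, 1))) / np.sqrt(2)
P = rng.normal(size=(N, Ns)) + 1j * rng.normal(size=(N, Ns))
x = P @ s

# (2): r = H x + n, with an i.i.d. complex Gaussian stand-in for H
H = rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N))
sigma2 = 10 ** (-snr_db / 10)
n = np.sqrt(sigma2 / 2) * (rng.normal(size=(M, 1)) + 1j * rng.normal(size=(M, 1)))
r = H @ x + n

# (3): y = W^H H x + W^H n, i.e. combining r with W^H
W = rng.normal(size=(M, Ns)) + 1j * rng.normal(size=(M, Ns))
y = W.conj().T @ r
assert y.shape == (Ns, 1)
```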
B. Millimeter-Wave Channel Model
The mmWave channel has limited scattering [12]–[14]. Thus, we assume that the channel has L taps, where ℓ = 0, 1, . . . , L is the subpath index. Under this channel model, the channel matrix H can mathematically be expressed as
H = Σ_{ℓ=0}^{L} g_ℓ u_rx(θ_ℓ) u_tx^H(φ_ℓ)  (4)
where g_ℓ is the complex small-scale fading coefficient associated with the ℓ-th propagation path, and u_tx(φ_ℓ) and u_rx(θ_ℓ) are the antenna array response vectors of the ℓ-th path, with AoD (Angle of Departure) φ_ℓ at the transmitter and AoA (Angle of Arrival) θ_ℓ at the receiver. Assuming uniform linear arrays (ULAs), the antenna array responses u_rx(θ_ℓ) and u_tx^H(φ_ℓ) are adopted as defined in [2].
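A minimal sketch of the multipath channel in (4), assuming the standard half-wavelength ULA steering vector (the paper defers the exact array response to [2]); the path gains and angles are drawn at random purely for illustration.

```python
import numpy as np

def ula_response(num_ant, angle, d=0.5):
    """Steering vector of a ULA with half-wavelength spacing (standard form)."""
    k = np.arange(num_ant)
    return np.exp(-2j * np.pi * d * k * np.sin(angle)) / np.sqrt(num_ant)

def channel(M, N, L, rng):
    """Sum over the L+1 paths as in (4): H = sum_l g_l u_rx(theta_l) u_tx(phi_l)^H."""
    H = np.zeros((M, N), dtype=complex)
    for _ in range(L + 1):
        g = (rng.normal() + 1j * rng.normal()) / np.sqrt(2)   # small-scale gain g_l
        theta = rng.uniform(-np.pi / 2, np.pi / 2)            # AoA at the receiver
        phi = rng.uniform(-np.pi / 2, np.pi / 2)              # AoD at the transmitter
        H += g * np.outer(ula_response(M, theta), ula_response(N, phi).conj())
    return H

H = channel(M=8, N=3, L=3, rng=np.random.default_rng(1))
assert H.shape == (8, 3)
```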
IV. LEARNING-BASED CHANNEL ESTIMATION
A. Key Idea
While hierarchical training allows the model to learn sequentially from a small number of antennas to a large number of antennas, it does not solve the problem of catastrophic forgetting, which makes the model forget previous tasks when it is trained on a new task, even though there is no major variation between the tasks here. Moreover, in [5], interpolation is used as an approximation in the training mechanism, which alters the input significantly, and the weights (θ₁ and θ₂) change significantly between tasks. To make the learning optimal and avoid forgetting, we use EWC to enhance the retention power of our CL-MMSE. The objective loss function L_B for the proposed CL-MMSE can be expressed as
i.e., LBfor the proposed CL-MMSE can be expressed as
L_B = (1/|N|) Σ_{H∈N} ‖W_B(θ_{B,1}, θ_{B,2}, y) − H‖²₂ + (λ/2) Σ_i |θ_{B,i}|²  (5)
where y is the observation, H is the channel response matrix, N is the training batch (of size |N|), θ_{K,j} are the weights of the j-th convolutional layer for K antennas, W_B is the output of the CNN for y when the number of antennas is B, and λ is the coefficient for L2 regularization. Regularization is only used in the first task. The objective function in (5) learns for B antennas while forgetting the training on the previous task. Adding the EWC regularization term gives the final loss function:
L_EWC = L_B + Σ_i (μ/2) F_i (θ_{B,i} − θ*_{A,i})²  (6)
where μ is the coefficient indicating the importance of the task with A antennas (the previous task) relative to the task with B antennas (the new task), F_i is the Fisher information matrix for layer i, and θ*_{A,i} is the interpolated weight matrix for A antennas. Interpolation is applied to make the dimensions of θ_{B,i} and θ*_{A,i} the same. When using (6), L2 regularization is turned off (λ = 0 in L_B), since L2 regularization negatively affects the EWC regularization term.
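The EWC-regularized loss in (6) can be sketched as follows on flattened per-layer weights; the Fisher values here are placeholders (in practice they would be estimated from squared gradients on the previous task), and μ = 0.3 follows Table I.

```python
import numpy as np

def ewc_loss(task_loss, weights, prev_weights, fishers, mu=0.3):
    """L_EWC = L_B + sum_i (mu/2) * F_i * (theta_{B,i} - theta*_{A,i})^2, as in (6)."""
    penalty = 0.0
    for theta, theta_star, F in zip(weights, prev_weights, fishers):
        penalty += 0.5 * mu * np.sum(F * (theta - theta_star) ** 2)
    return task_loss + penalty

# Toy two-layer example: only the first layer has drifted from the anchor.
w  = [np.array([1.0, 2.0]), np.array([0.5])]   # current-task weights theta_{B,i}
w0 = [np.array([1.0, 1.0]), np.array([0.5])]   # interpolated previous-task weights theta*_{A,i}
F  = [np.array([2.0, 2.0]), np.array([1.0])]   # placeholder Fisher information F_i
# penalty = 0.5 * 0.3 * (2 * (2 - 1)^2) = 0.3
assert abs(ewc_loss(1.0, w, w0, F) - 1.3) < 1e-12
```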
We calculate the Fisher information matrix between the weights of the two tasks and add the resulting term to the loss function (5). The EWC regularization term penalises the model if it moves away from the previous task's optimal weights. This way, during the weight update, the weights do not get completely skewed toward the latest task, and the network is able to retain
Fig. 1. Illustration of the weight update scheme with and without CL. Here, θ*_{T−1,i} is the interpolation of θ_{T−1,i}, and θ_{T,i} is the optimal weight for task T and layer i.
the prior learning. Fig. 1 depicts the CL weight update strategy. The optimal weights follow a trajectory which enables the network to be more plastic and retain prior learning. This signifies that the latest task is important, but significant attention must also be given to how far the weights move away from the prior tasks' weights. Also, unlike in hierarchical training, θ_{T,i} is initialized rather than merely interpolated from θ_{T−1,i}. This allows CL-MMSE to explore more rather than getting stuck at a bad local minimum introduced by the previous task. Continual learning can also be used in combination with federated learning for distributed computing [15].
We introduce a new activation function to overcome the deficiencies of the Softmax function. Though there has been a surge in finding ways to alter the undesirable characteristics of Softmax, the research is limited to vision and Natural Language Processing (NLP) tasks. For these tasks, Softmax is mostly used in the last layer for classification problems. Due to its limited range, it is not suitable for the output layer of neural networks for regression problems. Our network is limited to two convolutional blocks with the activation function between them; a small CNN ensures low computational complexity. As reported in [5], the performance of the Softmax function was worse than that of ReLU. This is due to Softmax not being numerically stable and having negative values, which is particularly damaging for an activation function, especially when used in middle layers. Activation functions need not inhibit the neurons of the successive layer. We implement Sigsoftmax, as proposed by the authors in [7], for the MMSE estimator. Sigsoftmax is an altered version of Softmax which is non-negative and more numerically stable than Softmax.
[f(z)]_i = exp(z_i) sig(z_i) / Σ_{p=1}^{P} exp(z_p) sig(z_p)  (7)
where [f(z)]_i is the i-th term in the array f(z), z_i is the i-th term in the array z of size P, and sig is the sigmoid function. Sigsoftmax (7) was developed for RNNs; we modify it for use in our proposed convolutional neural network (CNN). Equation (7) is monotonically increasing, so to restrict the flow of any large values to the last layer, we need to restrict its range. This can be done by adding a large constant b to the input, which gives a modified version of Sigsoftmax:
[f(z+b)]_i = exp(z_i + b) sig(z_i + b) / Σ_{m=1}^{M} exp(z_m + b) sig(z_m + b)  (8)
Simplifying the equation further produces (9):
[f(z+b)]_i = exp(z_i) sig(z_i + b) / Σ_{m=1}^{M} exp(z_m) sig(z_m + b)  (9)
For a large b (b = 5 in our case), this allows us to overcome the shortcomings of Softmax while also restricting the range, unlike ReLU. However, with a large b in (9), the rate of increase of (9) becomes very slow.
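A direct NumPy transcription of the modified Sigsoftmax (9); the max-shift inside the exponential is our addition for numerical safety and cancels in the normalization, so the output is unchanged.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def modified_sigsoftmax(z, b=5.0):
    """Modified Sigsoftmax (9): weights exp(z_i) * sig(z_i + b), normalized."""
    w = np.exp(z - z.max()) * sigmoid(z + b)   # max-shift for numerical stability
    return w / w.sum()

z = np.array([-1.0, 0.0, 2.0])
p = modified_sigsoftmax(z)
assert np.isclose(p.sum(), 1.0) and np.all(p >= 0)
# With large b, sig(z + b) is close to 1 and (9) approaches ordinary Softmax.
assert np.allclose(p, np.exp(z) / np.exp(z).sum(), atol=1e-2)
```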
In our simulations, the modified Sigsoftmax (9) tends to perform better than Softmax and, for a low number of antennas, similarly to ReLU. The increase in [f(z)]_i with increasing z_i is considerably slower in (9) than in (7). This leads to better results for a low number of antennas, but at higher numbers of antennas the performance deteriorates due to the early saturation of the modified Sigsoftmax: with this modification, the gradient update is small and cannot update the weights accurately for larger numbers of antennas. To resolve this, we add annealed noise to the modified Sigsoftmax. This method was popularized in Noisy Softmax [6], which delays the saturation of the function.
z_i = z_i − α (θ_i · X_i (1 − cos β_i))  (10)
where z_i is the input to the activation function, θ_i are the weights of the preceding layer, X_i is the i-th row of X (the input to the preceding layer), β_i is the angle between the vectors θ_i and X_i, and α is the hyper-parameter for adjusting the noise level.
The annealed noise is calculated using the weights of the preceding layer and the input to that layer. Although the noise addition technique used here was devised for the dense layer of a neural network as in [6], in our CNN we use only one kernel per convolutional layer, which makes the technique feasible. Adding this noise to the modified Sigsoftmax improves the performance of the model even for a large number of antennas by delaying the saturation.
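The noise term (10) can be sketched per neuron as follows; treating θ_i and X_i as flat vectors is a simplifying assumption (in the paper they are the kernel weights and input row of the preceding convolutional layer).

```python
import numpy as np

def annealed_noise_input(z, theta, X, alpha=0.1):
    """Apply (10): z <- z - alpha * (theta . X) * (1 - cos(beta)),
    where beta is the angle between the weight vector theta and the input X.
    alpha = 0.1 follows Table I; the flat-vector form is a sketch."""
    dot = float(theta @ X)
    cos_beta = dot / (np.linalg.norm(theta) * np.linalg.norm(X) + 1e-12)
    return z - alpha * dot * (1.0 - cos_beta)

z = np.array([0.5, -0.2])
theta = np.array([1.0, 0.0])
X = np.array([1.0, 0.0])          # aligned with theta: beta = 0, so no noise is added
assert np.allclose(annealed_noise_input(z, theta, X), z)
```

The more the input is misaligned with the weights (larger β_i), the larger the perturbation, which is what delays the early saturation.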
The proposed activation function removes the hurdles that restrict Softmax from performing better. An activation function can enable enhanced learning on a task; however, it cannot increase the plasticity of the network. Hence, for learning new tasks (settings with more antennas), the weights need to be adjusted properly to extract the maximum from the network while retaining the previous tasks as well.
B. Working Procedure
Fig. 2 illustrates the flow of the proposed channel estimation. To train the model, we generate the true channel matrix H using a channel model similar to [16]; the observations y are obtained using (3) and serve as the input to the CNN-MMSE estimator. The neural network is composed of two circular convolution layers, each having one kernel and its own biases. The size of the kernel depends on the number of antennas and the transformation used. Between the two layers is an activation function. We are
Fig. 2. Flow diagram of the proposed CL-MMSE estimator.
going to evaluate the performance of our Noisy Sigsoftmax against other popular activation functions. The estimated channel matrix, denoted H_est, is calculated by multiplying the observations with the output of the CNN. Finally, we calculate the CL-MMSE loss, denoted L, using (5).
Algorithm 1 Continual Learning MMSE
1: Set M_max, K_max, n
2: M_0 = M_max/2^n, K_0 = K_max/2^n
3: Initialize θ_{0,1}, θ_{0,2}, b_{0,1}, b_{0,2}
4: Generate observation y and H using (3)
5: Calculate the loss function using (5) and update weights using ADAM [17]
6: for i = 1, 2, . . . , n do
7:   Set M_i = M_max/2^{n−i}, K_i = K_max/2^{n−i}
8:   Interpolate θ_{i−1,1}, θ_{i−1,2}, b_{i−1,1}, b_{i−1,2}
9:   Initialize θ_{i,1}, θ_{i,2}, b_{i,1}, b_{i,2}
10:  Generate y and H using (3) for M_i
11:  Calculate F_i using θ_{i,∗} and θ_{i−1,∗}
12:  Calculate the loss function using (6) and update weights using ADAM [17]
13: end for
During training, we use a hierarchical training mechanism combined with continual learning; the proposed mechanism is given in Algorithm 1. Step 1 takes the maximum number of antennas, denoted M_max, and the maximum kernel length, denoted K_max, as inputs. In Step 2, we split the training into n steps; in each i-th incremental step we train the network on M_max/β^{n−i} antennas, with β = 2. Step 3 initializes the weights for the first set of antennas. Step 4 uses (3) to generate y and H. Step 5 updates the weights using (5). In Steps 6–13, we update the weights by doubling the number of antennas in each iteration. In Step 8, interpolation is used to increase the dimensions of the previous iteration's weight matrices θ_{i−1,1} and θ_{i−1,2} to match the current iteration's weight matrices θ_{i,1} and θ_{i,2}; the biases are interpolated similarly. As the number of antennas varies, we add the EWC regularization term, calculated in Step 11, to the CL-MMSE loss function (6). The EWC term is calculated with the current iteration's weight matrices (θ_{i,∗}) and the interpolated weight matrices of the previous iteration (θ_{i−1,∗}).
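The antenna schedule and weight interpolation of Algorithm 1 (Steps 2, 7, and 8) can be sketched as follows; the gradient steps themselves are stubbed out, and the use of one-dimensional kernels with linear interpolation is an illustrative assumption.

```python
import numpy as np

def interpolate(theta, new_len):
    """Resize a previous task's kernel to the next task's size (Step 8)."""
    old_grid = np.linspace(0.0, 1.0, len(theta))
    new_grid = np.linspace(0.0, 1.0, new_len)
    return np.interp(new_grid, old_grid, theta)

# Step 1-2: inputs and the smallest (first) task
M_max, K_max, n = 32, 8, 2
M, K = M_max // 2**n, K_max // 2**n
theta = np.random.default_rng(0).normal(size=K)   # Step 3: initial kernel weights

# Steps 6-13: double the number of antennas (and kernel length) each iteration
schedule = [(M_max // 2**(n - i), K_max // 2**(n - i)) for i in range(1, n + 1)]
for M_i, K_i in schedule:
    theta_star = interpolate(theta, K_i)   # interpolated anchor for the EWC term
    theta = theta_star.copy()              # initialize, then train minimizing (6)
    # ... ADAM gradient steps on L_EWC would go here ...
assert len(theta) == K_max
```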
V. PERFORMANCE EVALUATION AND ANALYSIS
To analyze the performance of our CL-MMSE algorithm, we compare it with three other schemes: the Fast MMSE [5], Maximum Likelihood (ML) [18], and CNN-MMSE [5] estimators. The Fast MMSE technique uses the Softmax activation function and does not have a hierarchical training schedule. The ML method first calculates the channel covariance matrix and uses it to find the MMSE estimates of the channel vectors. We test our CL-MMSE with five different activation functions: ReLU [19], Swish [20], Sigsoftmax [7], Softmax [5], and our proposed Noisy Sigsoftmax function (9). All these CL-MMSE estimators are trained using the proposed training method described in Section IV-B. Models are trained on N = 16M samples (M is the number of antennas). We use 70% of the samples for training, 15% for validation, and the remaining 15% for testing.
Fig. 3 depicts the gain in accuracy obtained when continual learning is incorporated into normal hierarchical training. Using the proposed training mechanism, we train the CL-MMSE estimators with different activation functions. We chose a batch size of 20, two layers, and the ADAM optimizer [17]. Different learning rates were used for the CNNs depending on the activation function, for more precision.
We study the performance of our scheme against two metrics: MSE and spectral efficiency. Table I lists the basic system parameters used for simulation, which are based on [16]. The performance of the Toeplitz structure was found to be superior to the Circulant one; thus, the Toeplitz structure has been used to analyse the activation functions. In addition, the quantities required by the off-line learning procedure of the CNN estimators to generate the necessary realizations of channel vectors and observations, i.e., the noise variance σ² and the correct model parameters, are assumed to be known.
The channel estimation performance of the different proposed methods, in terms of per-antenna MSE for a single snapshot (T = 1), is demonstrated in Fig. 4 as a function of the number of antennas. The Toeplitz structure was used in conjunction with the activation functions. Fig. 4 shows the efficacy of our proposed activation function, the Noisy modified Sigsoftmax, represented as ToepNoisySigsoftmax: its performance is better than that of all the other activation functions. From the simulation results, it is clear that Sigsoftmax deteriorates with a large number of antennas (i.e., M = 96 and M = 128), whereas Noisy Sigsoftmax outperforms all other schemes for a large number of antennas.
TABLE I
SIMULATION PARAMETERS

Parameter                         Value
Deployment scenario               Urban
Operating frequency [GHz]         28
Subcarrier spacing [kHz]          120
Number of antennas at BS (M)      8, 16, 32, 96, 128
Number of antennas at UE (N)      3
UE distribution                   Random
Pathloss coefficient              3.5
Log-normal shadow fading          0 dB
SNR at maximum distance           15 dB
Annealed noise (α)                0.1
EWC coefficient (μ)               0.3
Regularizer coefficient (λ)       10⁻⁶ for M_0, else 0
Distance {minimum, maximum}       {1000 m, 1500 m}
Fig. 3. Comparison of the training mechanisms of the proposed CL-MMSE and the Hierarchical Learning MMSE (HL-MMSE) [5].
[Figure 4 plot: Normalized MSE vs. number of antennas at the base station (0–140), for ToepSwish, ToepReLU, ToepSigSoftmax, FastMMSE, CircML, ToepNoisySigSoftmax, and ToepSoftmax.]
Fig. 4. MSE per antenna at an SNR of 15 dB against a single observation.
As shown in Fig. 5, in our evaluation with varying SNR, the Noisy Sigsoftmax outperforms the other schemes. At higher SNR values, the performance of the estimators converges. In addition, the Softmax function also outperforms ReLU.
[Figure 5 plot: Normalized MSE vs. SNR (−15 to 15 dB), for FastMMSE, ToepSigSoftmax, ToepNoisySigSoftmax, ToepReLU, ToepSwish, and ToepSoftmax.]
Fig. 5. MSE per antenna for M = 64 antennas against a single observation.
In Fig. 6, we show the spectral efficiency against the number of observations T. We observe a significant gain with the use of Noisy Sigsoftmax. This also illustrates the impact that improving the performance of CL-MMSE can have on the information rate.
[Figure 6 plot: Spectral efficiency in bit/s/Hz vs. number of observations T (0–40), for FastMMSE, ToepSigSoftmax, ToepNoisySigSoftmax, ToepReLU, ToepSwish, and ToepSoftmax.]
Fig. 6. Spectral efficiency for M = 64 antennas and SNR of 15 dB.
VI. CONCLUSION
In this paper, we presented a continual learning-based channel estimator called CL-MMSE. In contrast to other MMSE-based approaches in the literature, our proposed method differs in its training mechanism and its newly introduced activation function. We carefully designed the channel estimator by combining hierarchical training and continual learning to improve the efficacy of the neural network while reducing the number of layers. Despite using fewer layers, our proposed method outperformed the existing schemes in the literature in terms of MSE and computational complexity. Our simulations suggest that neural networks provide a low-complexity solution for channel estimation. In the future, one can consider the proposed solution for 5G systems.
REFERENCES
[1] T. S. Rappaport, S. Sun, R. Mayzus, H. Zhao, Y. Azar, K. Wang, G. N.
Wong, J. K. Schulz, M. Samimi, and F. Gutierrez, “Millimeter Wave
Mobile Communications for 5G Cellular: It Will Work!,” IEEE Access,
vol. 1, pp. 335–349, 2013.
[2] Y. Wu, Y. Gu, and Z. Wang, “Efficient Channel Estimation for mmWave
MIMO With Transceiver Hardware Impairments,” IEEE Trans. Veh.
Technol., vol. 68, no. 10, pp. 9883–9895, 2019.
[3] R. Chopra, C. R. Murthy, H. A. Suraweera, and E. G. Larsson,
“Blind Channel Estimation for Downlink Massive MIMO Systems with
Imperfect Channel Reciprocity,” IEEE Trans. Signal Process., 2020.
[4] S. K. Vankayala, S. Kumar, and I. Kommineni, “Optimizing deep
learning based channel estimation using channel response arrangement,”
in 2020 IEEE International Conference on Electronics, Computing and
Communication Technologies (CONECCT), pp. 1–5, 2020.
[5] D. Neumann, T. Wiese, and W. Utschick, “Learning the MMSE Channel
Estimator,” IEEE Trans. Signal Process., vol. 66, no. 11, pp. 2905–2917,
2018.
[6] B. Chen, W. Deng, and J. Du, “Noisy Softmax: Improving the General-
ization Ability of DCCN via Postponing the Early Softmax Saturation,”
Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition,
pp. 5372–5381, 2017.
[7] S. Kanai, Y. Fujiwara, Y. Yamanaka, and S. Adachi, “Sigsoftmax:
Reanalysis of the Softmax Bottleneck,” Proc. of the 32nd Int. Conf.
on Neural Information Processing Systems, p. 284–294, 2018.
[8] M. A. Amirabadi, M. H. Kahaei, S. A. Nezamalhosseini, and V. T.
Vakili, “Deep Learning for Channel Estimation in FSO Communication
System,” Optics Communications, vol. 459, p. 124989, 2020.
[9] S. Grossberg, “Studies of Mind and Brain : Neural Principles of
Learning, Perception, Development, Cognition, and Motor Control,”
Boston studies in the philosophy of science 70. Dordrecht: Reidel, 1982.
[10] M. De Lange, R. Aljundi, M. Masana, S. Parisot, X. Jia, A. Leonardis,
G. Slabaugh, and T. Tuytelaars, “Continual learning: A comparative
Study on How to Defy Forgetting in Classification Tasks,” arXiv preprint
arXiv:1909.08383, 2019.
[11] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins,
A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska,
et al., “Overcoming Catastrophic Forgetting in Neural Networks,”
Proceedings of the national academy of sciences, vol. 114, no. 13,
pp. 3521–3526, 2017.
[12] J. Ko, Y.-J. Cho, S. Hur, T. Kim, J. Park, A. F. Molisch, K. Haneda,
M. Peter, D.-J. Park, and D.-H. Cho, “Millimeter-Wave Channel Mea-
surements and Analysis for Statistical Spatial Channel Model in In-
building and Urban Environments at 28 GHz,” IEEE Trans. Wireless
Commun., vol. 16, no. 9, pp. 5853–5868, 2017.
[13] M. R. Akdeniz, Y. Liu, M. K. Samimi, S. Sun, S. Rangan, T. S.
Rappaport, and E. Erkip, “Millimeter Wave Channel Modeling and
Cellular Capacity Evaluation,” IEEE J. Sel. Areas Commun., vol. 32,
no. 6, pp. 1164–1179, 2014.
[14] C.-H. Yao, Y.-Y. Chen, B. P. Sahoo, and H.-Y. Wei, “Outage Reduction
with Joint Scheduling and Power Allocation in 5G mmWave Cellular
Networks,” in 2017 IEEE 28th Annual Int. Symp. on Personal, Indoor,
and Mobile Radio Communications (PIMRC), pp. 1–6, IEEE, 2017.
[15] S. Kumar, S. Dutta, S. Chatturvedi, and M. Bhatia, “Strategies for en-
hancing training and privacy in blockchain enabled federated learning,”
in 2020 IEEE Sixth International Conference on Multimedia Big Data
(BigMM), pp. 333–340, 2020.
[16] 3GPP, “Spatial channel model for multiple input multiple output
(MIMO) Simulations (Release 12),” 3GPP, 2014. TR 25.996 V12.0.0.
[17] D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,”
arXiv preprint arXiv:1412.6980., 2014.
[18] D. Neumann, M. Joham, L. Weiland, and W. Utschick, “Low-complexity
Computation of LMMSE Channel Estimates in Massive MIMO,” 19th
Int. ITG Workshop on Smart Antennas, pp. 1–6, 2015.
[19] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet Classifica-
tion with Deep Convolutional Neural Networks,” Advances in neural
information processing systems, pp. 1097–1105, 2012.
[20] P. Ramachandran, B. Zoph, and Q. V. Le, “Searching for Activation
Functions,” arXiv preprint arXiv:1710.05941, 2017.