SUBMIT TO IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS 1
A Lite Distributed Semantic Communication
System for Internet of Things
Huiqiang Xie and Zhijin Qin
Abstract—The rapid development of deep learning (DL) and
widespread applications of Internet-of-Things (IoT) have made
the devices smarter than before, and enabled them to perform
more intelligent tasks. However, it is challenging for any IoT
device to train and run a DL model independently due to its
limited computing capability. In this paper, we consider an IoT
network where the cloud/edge platform performs the DL based
semantic communication (DeepSC) model training and updating
while IoT devices perform data collection and transmission based
on the trained model. To make it affordable for IoT devices, we
propose a lite distributed semantic communication system based
on DL, named L-DeepSC, for text transmission with low com-
plexity, where the data transmission from the IoT devices to the
cloud/edge works at the semantic level to improve transmission
efficiency. Particularly, by pruning the model redundancy and
lowering the weight resolution, the L-DeepSC becomes affordable
for IoT devices and the bandwidth required for model weight
transmission between IoT devices and the cloud/edge is reduced
significantly. Through analyzing the effects of fading channels in
forward-propagation and back-propagation during the training
of L-DeepSC, we develop a channel state information (CSI) aided
training process to mitigate the effects of fading channels on
transmission. Meanwhile, we tailor the semantic constellation
by quantization for the current antenna design. Simulation
results demonstrate that the proposed L-DeepSC achieves competitive
performance compared with traditional methods, especially in
the low signal-to-noise ratio (SNR) region. In particular, it can
reach a compression ratio as large as 20x without performance
degradation.
Index Terms—Internet of Things, neural network compression,
pruning, quantization, semantic communication.
I. INTRODUCTION
With the wide deployment of connected devices, Internet of Things (IoT) networks are providing more and more intelligent services, e.g., smart home, intelligent manufacturing, and smart cities, by processing the massive amount of data generated by those connected devices [1], [2]. Deep learning (DL) [3] has demonstrated great potential in processing various types of data, e.g., images and texts. DL-enabled IoT devices are capable of exploiting and processing different types of data more effectively and of handling more intelligent tasks than before. Although some IoT devices have a certain capability to process simple DL models, their limited memory, computing, and battery capability still prevents the wide application of DL [4]. Therefore, the burden of DL model updates is usually transferred to the cloud/edge platform [5]. Particularly, the DL model is trained at the cloud/edge platform based on data from the IoT devices, and then the trained model is distributed to IoT devices. However, data transmitted over the air could be distorted by wireless channels, which may lead to improperly trained models, e.g., convergence to a local optimum. Moreover, the large number of parameters in DL models leads to high latency when distributing the DL models with limited bandwidth. Therefore, transmitting accurate data to the cloud/edge platform over wireless channels for model training and reducing the number of parameters in DL models for lower latency and power consumption at the IoT devices are two crucial problems.

Huiqiang Xie and Zhijin Qin are with the School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, UK (e-mail: h.xie@qmul.ac.uk, z.qin@qmul.ac.uk).
To address the first problem of accurate data transmission in an IoT network, a semantic communication system, which interprets information at the semantic level rather than as bit sequences [6], is promising. To make a decision from the received information, there are usually three steps: i) the traditional communication receiver recovers the raw data [7], ii) the feature extractor obtains and interprets the meanings of the raw data for the decision [8], and iii) the effects result from decisions made according to the extracted features [9], [10]. Corresponding to these three steps, communication is categorized into three levels [11], namely the transmission level, the semantic level, and the effectiveness level, as illustrated in Fig. 1. The traditional communication system works at the transmission level shown in Fig. 1(a), which aims to transmit and receive symbols accurately [12]. The subsequent feature extractor network and effect network are designed separately based on applications. However, designing these modules separately may lead to error propagation and prevent the system from reaching joint optimality. For example, the feature network is not able to correct errors from the traditional receiver, which will affect the subsequent decision making in the effect network. Thus, by designing the traditional receiver and feature extractor network jointly (the semantic level) or by merging the traditional receiver, feature extractor network, and effect network together (the effectiveness level), communication systems gain the capability of error correction at the semantic level and the effectiveness level, respectively. In this paper, we focus on distributed semantic communications for IoT networks and leave effectiveness level communication to future research.
With the recent advancements in DL, it is promising to represent a traditional transceiver or each individual signal processing block by a deep neural network (DNN) [13]. There have been some initial works related to deep semantic communications [14]–[17]. Bourtsoulatze et al. [14] proposed joint source-channel coding for wireless image transmission based on the convolutional neural network (CNN), where peak signal-to-noise ratio (PSNR) is used to measure the accuracy of image recovery at the receiver. Taking image classification tasks into consideration, Lee et al. [18] developed a transmission-recognition communication system by merging wireless image transmission with the effect network, i.e., image classification, as DNNs, which achieves higher image classification accuracy than performing them separately. For texts, Farsad et al. [16] designed joint source-channel coding for the erasure channel by using a recurrent neural network (RNN) and a fully-connected neural network (FCN), where the system recovers the text directly rather than performing channel and source decoding separately. In order to understand texts better and apply the system in dynamic environments, Xie et al. [17] developed a semantic communication system based on Transformer, named DeepSC, which clarifies the concepts of semantic information and semantic error at the sentence level for the first time. In brief, compared with traditional approaches, semantic communication systems are more robust to channel variation and are able to achieve better performance in terms of source recovery and image classification, especially in the low signal-to-noise ratio (SNR) regime.

arXiv:2007.11095v1 [eess.SP] 21 Jul 2020

Fig. 1. Illustration of three communication levels at the receiver: (a) transmission level (traditional receiver, feature extractor network, effect network); (b) semantic level (semantic receiver, effect network); (c) effectiveness level (effect receiver).
To deal with the second problem of reducing the number of parameters, network slimming has attracted extensive attention as a way to compress DL models without degrading performance, since neural networks are usually over-parameterized [19]. Parameter pruning and quantization are the two main approaches for DL model compression. Parameter pruning removes the unnecessary connections between two neurons or the unimportant neurons. Han et al. [20] proposed an iterative pruning approach, where the model is trained first, then pruned by a given threshold, and finally fine-tuned to recover performance in terms of image classification. This approach could reduce the connections without losing accuracy. Liu et al. [21] proposed to prune the filters in CNN by training the model with a regularized loss function so that redundant weights converge to zero directly without sacrificing the performance. By analyzing the connection sensitivity among neurons and layers, Li et al. [22] removed the insensitive layers, which further increases inference speed. By applying these pruning approaches, DL models can be compressed by 13 to 20 times. Quantization aims to represent a weight parameter with lower precision (fewer bits), which reduces the required bit-width of data flowing through the neural network model in order to shrink the model size for memory saving and simplify the operations for computing acceleration [23]. Gong et al. [24] quantized DL models with vector quantization. Similarly, Zhou et al. [25] investigated an iterative quantization, which starts with a trained full-resolution model and then quantizes only a portion of the model, followed by several epochs of re-training to recover the accuracy loss from quantization. A mixed-precision quantization by Li et al. [26] quantizes weights while keeping the activations at full resolution. The training algorithm by Jacob et al. [27] preserves the model accuracy after quantization. With quantization, the weights can generally be compressed from 32-bit to 8-bit without performance loss. Pruning and quantization can also be used in DL-enabled communication systems. For example, Guo et al. [28] have shown that model compression can accelerate the processing of channel state information (CSI) acquisition and signal detection in massive multiple-input multiple-output (MIMO) systems without performance degradation.
By applying network slimming to our existing work, DeepSC, the aforementioned two challenges in IoT networks can be effectively addressed. Although the above works validate the feasibility, we still face the following issues in making it affordable for IoT devices:

Question 1: How to design a semantic communication system over wireless fading channels?
Question 2: How to form the constellation to reduce the burden on the antenna?
Question 3: How to compress semantic models for fast model transmission and low-cost implementation on IoT devices?

In this paper, we design a distributed semantic communication system for IoT networks. Specifically, a lite DeepSC (L-DeepSC) is proposed to address the above questions. The main contributions of this paper are summarized as follows.

• We design a distributed semantic communication network under power and latency constraints, in which the receiver and feature extractor networks are jointly optimized by overcoming fading channels.
• By identifying the impacts of CSI on DL model training over fading channels, we propose a CSI-aided semantic communication system to speed up convergence, where the CSI is refined by a de-noise neural network. This addresses the aforementioned Question 1.
• To alleviate the burden on the antenna for data transmission and reception, we design a finite-bit constellation to solve Question 2.
• Because the model is over-parameterized, we propose a model compression algorithm, including network sparsification and quantization, to reduce the size of DL models by pruning the redundant connections and quantizing the weights, which addresses the aforementioned Question 3.
The rest of this paper is organized as follows. The dis-
tributed semantic communication system model is introduced
and the corresponding problems are identified in Section II.
Section III presents the proposed L-DeepSC. Numerical results
are used to verify the performance of the proposed L-DeepSC
in Section IV. Finally, Section V concludes this paper.
Notation: $\mathbb{C}^{n \times m}$ and $\mathbb{R}^{n \times m}$ represent the sets of complex and real matrices of size $n \times m$, respectively. Bold-font variables denote matrices or vectors. $x \sim \mathcal{CN}(\mu, \sigma^2)$ means that variable $x$ follows the circularly-symmetric complex Gaussian distribution with mean $\mu$ and covariance $\sigma^2$. $(\cdot)^T$ and $(\cdot)^H$ denote the transpose and Hermitian of a vector or a matrix, respectively. $\Re\{\cdot\}$ and $\Im\{\cdot\}$ refer to the real and the imaginary parts of a complex number.

Fig. 2. The framework of semantic communications for IoT networks: (a) proposed distributed semantic communication network; (b) semantic communication system.
II. SYSTEM MODEL AND PROBLEM FORMULATION
Text is an important type of source data, which can be sensed from speaking and typing, environmental monitoring, etc. By training DL models with these text data at the cloud/edge platform, DL-based IoT devices gain the capability to understand text data and generate semantic features to be transmitted to the center to perform intelligent tasks, e.g., intelligent assistants, human emotion understanding, and humidity and temperature adjustment of the environment based on human preference [29].
As shown in Fig. 2(a), we focus on distributed semantic communications for IoT networks. The considered system consists of various IoT networks with two layers, the cloud/edge platform and distributed IoT devices. The cloud/edge platform is equipped with huge computation power and large memory, which can be used to train the DL model with the received semantic features. The semantic communication enabled IoT devices perform intelligent tasks by understanding sensed texts. These devices have limited memory and power but are expected to have a long lifetime, e.g., up to 10 years. Particularly, our considered distributed semantic communication system consists of the following three steps:
1) Model Initialization/Update: The cloud/edge platform first trains the semantic communication model with an initial dataset. The trained model is updated in the subsequent iterations with the semantic features received from IoT devices.
2) Model Broadcasting: The cloud/edge platform broadcasts the trained DL model to each IoT device.
3) Semantic Features Upload: The IoT devices constantly capture text data, which are encoded by the proposed semantic transmitter shown in Fig. 2(b). The extracted semantic features are then transmitted to the cloud/edge for model update and subsequent processing.
The aforementioned Questions 1-3 correspond to model initialization/update, semantic features upload, and model broadcasting, respectively. Different from traditional information transmission, semantic features can be used not only for recovering the text accurately at the semantic level, but also as the input of other modules, e.g., emotion classification, dialog systems, and human-robot interaction, for training effect networks and performing various intelligent tasks directly. The devices can also exchange semantic features, which has been previously discussed in our work in [17]. We focus on the communication between the cloud/edge platform and local IoT devices to make the semantic communication model affordable.
A. Semantic Communication System
The DeepSC shown in Fig. 2(b) can be divided into three main parts: the transmitter network, the physical channel, and the receiver network. The transmitter network includes the semantic encoder and the channel encoder, and the receiver network consists of the semantic decoder and the channel decoder.

We assume that the input of the DeepSC is a sentence, $\mathbf{s} = [w_1, w_2, \cdots, w_N]$, where $w_n$ represents the $n$-th word in the sentence. The encoded symbol stream can be represented as

$$\mathbf{X} = C_{\alpha}\left(S_{\beta}(\mathbf{s})\right), \tag{1}$$

where $S_{\beta}(\cdot)$ is the semantic encoder network with parameter set $\beta$ and $C_{\alpha}(\cdot)$ is the channel encoder with parameter set $\alpha$. If $\mathbf{X}$ is sent over a wireless fading channel, the signal received at the receiver can be given by

$$\mathbf{Y} = f_{\mathbf{H}}(\mathbf{X}) = \mathbf{H}\mathbf{X} + \mathbf{N}, \tag{2}$$

where $\mathbf{H}$¹ represents the channel gain between the transmitter and the receiver, and $\mathbf{N} \sim \mathcal{CN}(0, \sigma_n^2)$ is additive white Gaussian noise (AWGN).

The decoded signal can be represented as

$$\hat{\mathbf{s}} = S_{\chi}^{-1}\left(C_{\delta}^{-1}(\mathbf{Y})\right), \tag{3}$$

where $\hat{\mathbf{s}}$ is the recovered sentence, $C_{\delta}^{-1}(\cdot)$ is the channel decoder with parameter set $\delta$, and $S_{\chi}^{-1}(\cdot)$ is the semantic decoder network with parameter set $\chi$.

The whole semantic communication system can be trained with the cross-entropy (CE) loss function, which is given by

$$L_{\mathrm{CE}}(\mathbf{s}, \hat{\mathbf{s}}) = \sum_{i=1} \left(q(w_i) - 1\right) \log\left(1 - p(w_i)\right) - \sum_{i=1} q(w_i) \log\left(p(w_i)\right), \tag{4}$$

where $q(w_i)$ is the real probability that the $i$-th word, $w_i$, appears in the source sentence $\mathbf{s}$, and $p(w_i)$ is the predicted probability that the $i$-th word, $w_i$, appears in $\hat{\mathbf{s}}$. CE measures the difference between the two distributions. By minimizing the CE loss, the network can learn the word distribution, $q(w_i)$, in the source sentence, $\mathbf{s}$. Consequently, the syntax, phrases, and the meaning of words in context can be learnt by DNNs.
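As a minimal numerical sketch of the loss in (4), the snippet below evaluates the two sums for a toy sentence. It assumes that q is a 0/1 indicator per candidate word and that p holds the predicted probabilities; the function name `cross_entropy` and the clipping constant `eps` are illustrative choices, not part of the original system.

```python
import numpy as np

def cross_entropy(q, p, eps=1e-12):
    """Evaluate (4): sum_i (q_i - 1) log(1 - p_i) - sum_i q_i log(p_i)."""
    q = np.asarray(q, dtype=float)
    # Clip predictions away from 0 and 1 so the logarithms stay finite (assumption).
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return float(np.sum((q - 1.0) * np.log(1.0 - p) - q * np.log(p)))

q = np.array([1.0, 0.0, 1.0])                 # toy "real" probabilities
loss_good = cross_entropy(q, [0.9, 0.1, 0.8])  # predictions close to q
loss_bad = cross_entropy(q, [0.4, 0.6, 0.3])   # predictions far from q
print(loss_good, loss_bad)
```

Predictions that are closer to q give a smaller loss, which is the behaviour the training procedure relies on.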
B. Problem Description
Because the input of DeepSC is a sentence, $\mathbf{s}$, rather than bits, the learned constellation is no longer limited to a few points. After $\mathbf{X}$ is transmitted, the fading channel makes model training more difficult than the AWGN channel does. Meanwhile, the huge number of parameters, $\alpha$, $\beta$, $\chi$, $\delta$, indicates the complexity of the whole model. These factors limit the use of DeepSC in IoT networks and give rise to the aforementioned Questions 1-3, i.e., training over fading channels, feasible constellation design, and model compression.
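1) Training over fading channels: In DL, the training process can be divided into forward-propagation, which predicts the target, and back-propagation, which updates the neural network toward convergence, as stated in the following.

Forward-propagation: To recover the semantic information from the received signal, the estimated sentence is given by

$$\hat{\mathbf{s}} = S_{\chi}^{-1}\left(C_{\delta}^{-1}(\mathbf{Y})\right). \tag{5}$$

Back-propagation: Taking the semantic encoder as an example, the parameter vector at the $t$-th iteration is updated by

$$\beta(t) = \beta(t-1) - \eta \frac{\partial L_{\mathrm{CE}}}{\partial \beta}, \tag{6}$$

where $\eta$ is the learning rate and $\frac{\partial L_{\mathrm{CE}}}{\partial \beta}$ is the gradient, computed by

$$\frac{\partial L_{\mathrm{CE}}}{\partial \beta} = \frac{\partial L_{\mathrm{CE}}}{\partial \hat{\mathbf{s}}} \frac{\partial \hat{\mathbf{s}}}{\partial \mathbf{Y}} \frac{\partial \mathbf{Y}}{\partial \mathbf{X}} \frac{\partial \mathbf{X}}{\partial \beta} = \frac{\partial L_{\mathrm{CE}}}{\partial \hat{\mathbf{s}}} \frac{\partial \hat{\mathbf{s}}}{\partial \mathbf{Y}} \mathbf{H} \frac{\partial \mathbf{X}}{\partial \beta}. \tag{7}$$

In (7), $\mathbf{H}$ introduces stochasticity into the weight updates. For an AWGN channel, $\mathbf{H} = \mathbf{I}$ does not affect the update, and the DL model can thus achieve the global optimum. For fading channels, however, $\mathbf{H}$ is random, so $\beta$ fails to converge to the global optimum, and the forward-propagation in (5) is unable to recover the semantic information accurately from the resulting local optimum. Thus, it is critical to design a training process that mitigates the effects of $\mathbf{H}$, which also makes the DeepSC applicable to fading channels.

¹ Here, we have avoided discussion of complex channels. If the complex channel is $\bar{\mathbf{H}}$, then $\bar{\mathbf{H}} = [\Re(\mathbf{H}), -\Im(\mathbf{H}); \Im(\mathbf{H}), \Re(\mathbf{H})]$.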
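The role of the channel in the chain rule of (7) can be illustrated with a toy linear model. The sketch below is an illustration under strong assumptions: a single linear encoder `W` stands in for the parameters β, the receiver is the identity, the channel is noiseless, and a squared-error loss replaces the CE loss. None of these simplifications come from the original system; they only make the gradient easy to verify.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
s = rng.normal(size=(n, 1))        # toy embedded source
W = rng.normal(size=(n, n))        # stand-in for the encoder parameters beta
target = rng.normal(size=(n, 1))   # toy quantity the receiver should reproduce

def loss(W, H):
    Y = H @ (W @ s)                # noiseless channel output, Y = H X with X = W s
    return 0.5 * float(np.sum((Y - target) ** 2))

def grad(W, H):
    # Chain rule as in (7): the back-propagated error is multiplied by H
    # (here H^T because of the column-vector convention).
    Y = H @ (W @ s)
    return H.T @ (Y - target) @ s.T

H_fading = rng.normal(size=(n, n))     # one random fading realisation
g_awgn = grad(W, np.eye(n))            # H = I, as for an AWGN channel
g_fading = grad(W, H_fading)           # gradient perturbed by the random H
print(np.linalg.norm(g_awgn - g_fading))
```

Each new draw of `H_fading` changes the gradient seen by the encoder, whereas with H = I the update depends only on the data.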
2) Feasible constellation design: Generally, DL models run on floating-point operations (FLOPs), which means that the inputs, outputs, and weights lie in a large range, from $\pm 1.40129 \times 10^{-45}$ to $\pm 3.40282 \times 10^{+38}$. Although DeepSC can learn the constellations from the source information and channel statistics, the learned constellation points, such as the cluster constellation [30], are scattered over this whole range, which brings an additional burden to the antenna design for IoT devices. Therefore, it is desirable to form a feasible constellation with only a finite number of points for current radio frequency (RF) systems. In other words, we have to design a smaller constellation for the DeepSC.
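The range quoted above matches the limits of the 32-bit floating-point format, which can be checked with NumPy. This is a small sketch for orientation only; the variable names are illustrative.

```python
import numpy as np

info = np.finfo(np.float32)
largest = float(info.max)
# Smallest positive (subnormal) float32: the next representable value after 0.
smallest = float(np.nextafter(np.float32(0), np.float32(1)))
print(smallest, largest)
```

A constellation whose points may take any of these values is far denser than what a finite-resolution RF front end can generate, which motivates restricting the points to a small finite set.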
3) Model communication: The more parameters DeepSC has, the stronger its signal processing ability, which however increases the computational complexity and model size and results in high power consumption. In the distributed DeepSC system, the trained DeepSC model deployed at local IoT devices is frequently updated to perform intelligent tasks better. The IoT application limits the bandwidth and cost of distributing the DeepSC model. Furthermore, to extend the IoT network lifetime, especially the battery lifetime, most local devices have finite storage and computation capability, which limits the size of DeepSC. Therefore, compressing DeepSC not only reduces the latency of model transmission between the cloud/edge platform and local devices but also makes it possible to run the DL model on local devices.
III. PROPOSED LITE DISTRIBUTED SEMANTIC COMMUNICATION SYSTEM
To address the challenges identified in Section II, we propose a lite distributed semantic communication system, named L-DeepSC. We analyze the effects of CSI in model training under fading channels and design a CSI-aided training process to overcome the fading effects, which deals with Question 1. Besides, weight pruning and quantization are investigated to address Question 3. Finally, our finite-point constellation design solves Question 2.
A. Deep De-noise Network based CSI Refinement and Cancellation
The most common method to reduce the effects of fading
channel in wireless communication is to use known channel
properties of a communication link, CSI. Similarly, CSI can
also reduce the channel impacts in training L-DeepSC. Next,
we will first analyze the role of CSI in L-DeepSC training.
In order to simplify the analysis, we assume that the transmitter and the receiver each consist of one dense layer with sigmoid activation, where the transmitter has an additional untrainable embedding layer and the receiver has an untrainable de-embedding layer. The IoT devices hold the trained transmitter model and the cloud/edge platform works as the receiver, as shown in the system model in Fig. 2. The IoT devices and the cloud/edge platform are equipped with the same number of antennas. After the embedding layer, the source message, $\mathbf{s}$, is embedded into $\mathbf{S}$. Then, the IoT device encodes $\mathbf{S}$ into

$$\mathbf{X} = \sigma(\mathbf{W}_T \mathbf{S} + \mathbf{b}_T), \tag{8}$$

where $\mathbf{X}$² is the semantic features transmitted from the IoT devices to the cloud/edge platform, $\mathbf{W}_T$ and $\mathbf{b}_T$ are the trainable parameters that extract the features from the source message $\mathbf{s}$, and $\sigma(\cdot)$ is the sigmoid activation function.

The received symbol at the cloud/edge platform is affected by the channel $\mathbf{H}$ and AWGN as in (2). From the received symbol, the cloud/edge platform recovers the embedding matrix by

$$\hat{\mathbf{S}} = \sigma(\mathbf{W}_R \mathbf{Y} + \mathbf{b}_R), \tag{9}$$

where the estimated source message, $\hat{\mathbf{s}}$, can be obtained after the de-embedding layer, and $\mathbf{W}_R$ and $\mathbf{b}_R$ are the trainable parameters that learn to recover $\mathbf{s}$. The L-DeepSC can be optimized by the loss function in (4). The fading channel not only contaminates the gradients in back-propagation, but also restricts the representation power in forward-propagation.
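Back-propagation: It updates the parameter $\mathbf{W}_T$ by its gradient

$$\frac{\partial L_{\mathrm{CE}}(\hat{\mathbf{s}}, \mathbf{s})}{\partial \mathbf{W}_T} = \left(\mathbf{F}_R \mathbf{W}_R \mathbf{H} \mathbf{F}_T\right)^T \nabla_{\hat{\mathbf{s}}} L_{\mathrm{CE}}(\hat{\mathbf{s}}, \mathbf{s}) \, \mathbf{s}^T, \tag{10}$$

where $\mathbf{F}_R \triangleq \mathrm{diag}\left(\sigma'(\mathbf{W}_R \mathbf{y} + \mathbf{b}_R)\right)$ and $\mathbf{F}_T \triangleq \mathrm{diag}\left(\sigma'(\mathbf{W}_T \mathbf{s} + \mathbf{b}_T)\right)$. In (10), $\mathbf{H}$ is untrainable and random, and therefore perturbs the gradient. If the transmitter consists of very deep neural networks, this gradient contamination affects the back-propagation of the whole transmitter network.

Forward-propagation: With the received signal $\mathbf{Y}$, the source messages can be recovered by

$$\hat{\mathbf{S}} = \sigma(\mathbf{W}_R \mathbf{Y} + \mathbf{b}_R) = \sigma(\mathbf{W}_R \mathbf{H} \mathbf{X} + \mathbf{W}_R \mathbf{N} + \mathbf{b}_R). \tag{11}$$

In (11), $\mathbf{W}_R$ has to learn how to deal with the channel effects and decode at the same time, which increases the training burden and reduces the expression capability of the network. Meanwhile, for an L-DeepSC receiver with multiple layers, the errors caused by channel effects also propagate to the subsequent layers.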
The impact of the channel can be mitigated by exploiting CSI at the cloud/edge. If the channel $\mathbf{H}$ is known, then the received symbol can be processed by

$$\tilde{\mathbf{Y}} = \left(\mathbf{H}^H \mathbf{H}\right)^{-1} \mathbf{H}^H \mathbf{Y} = \mathbf{X} + \tilde{\mathbf{N}}, \tag{12}$$

where $\tilde{\mathbf{N}} = \left(\mathbf{H}^H \mathbf{H}\right)^{-1} \mathbf{H}^H \mathbf{N}$. In (12), the channel effect is transferred from multiplicative noise to additive noise, $\tilde{\mathbf{N}}$, which provides the possibility of stable back-propagation as well as a stronger capability of network representation. With (12), back-propagation and forward-propagation can be performed by setting $\mathbf{H} = \mathbf{I}$ in (10) and (11), respectively. Therefore, when $\mathbf{H}$ is known exactly, the multiplicative channel effects can be completely removed.

² Here, we have avoided discussion of complex signals. If the complex signal is $\bar{\mathbf{X}}$, then $\bar{\mathbf{X}} = [\Re(\mathbf{X}), \Im(\mathbf{X})]$.

Fig. 3. The proposed CSI refinement and cancellation based on de-noise neural networks (blocks: LS estimator, ADNet, channel cancellation, receiver).
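The cancellation step in (12) can be checked numerically. The following sketch assumes a real-valued toy channel with perfectly known H and small Gaussian noise; the function name `cancel_channel` and all dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_ant, n_sym = 4, 8
X = rng.normal(size=(n_ant, n_sym))         # transmitted semantic features (toy)
H = rng.normal(size=(n_ant, n_ant))         # channel, assumed perfectly known
N = 0.01 * rng.normal(size=(n_ant, n_sym))  # additive noise
Y = H @ X + N                               # received signal as in (2)

def cancel_channel(Y, H):
    """Apply (H^H H)^{-1} H^H to Y, as in (12), without forming the inverse."""
    Hh = H.conj().T
    return np.linalg.solve(Hh @ H, Hh @ Y)

Y_tilde = cancel_channel(Y, H)   # equals X plus transformed noise
N_tilde = cancel_channel(N, H)   # the additive noise term in (12)
print(np.max(np.abs(Y_tilde - (X + N_tilde))))
```

After this step only an additive term remains, although that term can be amplified when H is poorly conditioned.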
The above discussion shows the importance of CSI in model training. However, in general CSI can only be estimated, e.g., by least-squares (LS), linear minimum mean-squared error (LMMSE), or minimum mean-squared error (MMSE) estimators. Because they exploit prior channel statistics, LMMSE and MMSE estimators usually perform better than the LS estimator. However, LMMSE and MMSE estimators are sensitive to the accuracy of the channel statistics, while the LS estimator requires no prior channel information.
For simplicity, we initially use the LS estimator. Then, we adopt a deep de-noise network to increase the resolution of the LS estimator as in [31], as shown in Fig. 3. Particularly, the rough CSI estimated by the LS estimator with few pilots is first denoted by

$$\mathbf{H}_{\mathrm{rough}} = \mathbf{H} + \mathbf{N}. \tag{13}$$

From (13), $\mathbf{H}_{\mathrm{rough}}$ consists of the exact $\mathbf{H}$ and the noise, $\mathbf{N}$. De-noise neural networks are used to recover $\mathbf{H}$ more accurately from $\mathbf{H}_{\mathrm{rough}}$ by considering $\mathbf{H}$ and $\mathbf{H}_{\mathrm{rough}}$ as the original picture and the noisy picture, respectively. Here, we exploit the attention-guided denoising convolutional neural network (ADNet) [32] to refine the CSI, where the refined CSI, $\mathbf{H}_{\mathrm{refine}}$, is denoted by

$$\mathbf{H}_{\mathrm{refine}} = \mathrm{ADNet}\left(\mathbf{H}_{\mathrm{rough}}\right). \tag{14}$$

In (14), $\mathrm{ADNet}(\cdot)$ is trained with the loss function $L(\mathbf{H}_{\mathrm{refine}}, \mathbf{H}) = \frac{1}{2} \left\| \mathbf{H}_{\mathrm{refine}} - \mathbf{H} \right\|_F^2$. Since the performance of the LS estimator is similar to that of the LMMSE and MMSE estimators in the high SNR region, we pay more attention to the low SNR region when training ADNet. With proper training, ADNet can mitigate the impact of noise without any prior channel information, especially in the low SNR region. Such a design provides a good solution for Question 1.
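To make the training target concrete, the sketch below evaluates the Frobenius-norm loss for the rough estimate in (13) at two SNR values. It does not implement ADNet; the mapping from SNR to noise standard deviation and the names `frob_loss` and `rough_csi` are assumptions made only for illustration.

```python
import numpy as np

def frob_loss(H_est, H):
    """Loss used to train the de-noise network: 0.5 * ||H_est - H||_F^2."""
    return 0.5 * np.linalg.norm(H_est - H, ord="fro") ** 2

rng = np.random.default_rng(2)
H = rng.normal(size=(4, 4))           # exact channel (toy, real-valued)
unit_noise = rng.normal(size=(4, 4))  # unit-variance estimation noise

def rough_csi(snr_db):
    # Assumed noise scaling for a unit-power reference signal.
    sigma = 10 ** (-snr_db / 20)
    return H + sigma * unit_noise     # rough estimate as in (13)

loss_low = frob_loss(rough_csi(0), H)    # low SNR: large estimation error
loss_high = frob_loss(rough_csi(18), H)  # high SNR: small estimation error
print(loss_low, loss_high)
```

The rough estimate already has a small loss at high SNR, so a refinement network has the most room to help in the low SNR region.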
B. Model Compression
By exploiting CSI in model training, the cloud/edge platform can extract the semantic features from L-DeepSC. However, the size and complexity of the trained L-DeepSC model are still very large, which causes high latency when the cloud/edge platform broadcasts the updated L-DeepSC. Since both weight pruning and quantization can reduce the model size and complexity, we compress the DeepSC model by a joint pruning-quantization scheme to make it affordable for IoT devices. As shown in Fig. 4, the original weights are first pruned at a high-precision level by identifying and removing the unnecessary weights, which makes the network sparse. Quantization is then used to convert the trained L-DeepSC model into a low-precision level. The proposed network sparsification and quantization can address Question 3 and are introduced in detail in the following.

(a)
 0.88   0.19   0.35  -2.34
-1.08   0.55   0.93  -0.97
 0.53   0.41   0.32  -0.49
-0.79  -0.84  -1.27   0.24

(b)
 0.88   0      0     -2.34
-1.08   0.55   0.93  -0.97
 0.53   0      0      0
-0.79  -0.84  -1.27   0

(c)
 1   0   0  -1
-1   1   1  -1
 1   0   0   0
-1  -1  -1   0

Fig. 4. Flowchart of the proposed joint pruning-quantization: (a) the original weight matrix; (b) the weights after pruning, where the example pruning function is x = 0 for |x| < 0.5; (c) the weights after quantization, where the example quantization function is x = sign(x).
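The toy example of Fig. 4 can be reproduced in a few lines. The sketch below uses the example functions stated in the caption, i.e., zeroing entries with magnitude below 0.5 and then taking the sign; the variable names are illustrative.

```python
import numpy as np

# Original weight matrix from Fig. 4(a).
W = np.array([[ 0.88,  0.19,  0.35, -2.34],
              [-1.08,  0.55,  0.93, -0.97],
              [ 0.53,  0.41,  0.32, -0.49],
              [-0.79, -0.84, -1.27,  0.24]])

# Example pruning function: set entries with |x| < 0.5 to zero, Fig. 4(b).
W_pruned = np.where(np.abs(W) < 0.5, 0.0, W)

# Example quantization function: keep only the sign, Fig. 4(c).
W_quant = np.sign(W_pruned).astype(int)

print(W_pruned)
print(W_quant)
```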
1) Network Sparsification: A proper criterion for disabling neural connections is important. Connections with small weight magnitudes contribute little and can be pruned. Therefore, the pruning issue here turns into setting a proper pruning threshold.

As shown in Fig. 2(b), the DeepSC consists of neural networks with parameter sets $\alpha$, $\beta$, $\chi$, $\delta$, where each network includes multiple layers. Assume there are $N$ layers in total in the pre-trained DeepSC model, with $W_{i,j}^{(n)}$ being the weight of the connection between the $i$-th neuron of the $(n+1)$-th layer and the $j$-th neuron of the $n$-th layer. With a pruning threshold $w_{\mathrm{thre}}$, the model weights can be pruned by

$$W_{i,j}^{(n)} = \begin{cases} W_{i,j}^{(n)}, & \text{if } \left|W_{i,j}^{(n)}\right| > w_{\mathrm{thre}}, \\ 0, & \text{otherwise}. \end{cases} \tag{15}$$

We determine the pruning threshold by

$$w_{\mathrm{thre}} = s_{M \times \gamma}, \tag{16}$$

where $\mathbf{s} = \mathrm{sort}\left(\left[\mathbf{W}^{(1)}, \mathbf{W}^{(2)}, \cdots, \mathbf{W}^{(N)}\right]\right)$ is the vector of weight values sorted from the least important one to the most important one, $M$ is the total number of connections, and $\gamma$, the sparsity ratio between 0 and 1, indicates the proportion of zero values in the weights. The weight pruning can be divided into two steps, weight pruning to disable some neuron connections and fine-tuning to recover the accuracy, as shown in Algorithm 1.
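The thresholding in (15) and (16) can be sketched as a global magnitude-pruning routine. This is a minimal illustration that assumes importance is measured by weight magnitude and that omits the fine-tuning step; `prune_by_sparsity` is a hypothetical helper name.

```python
import numpy as np

def prune_by_sparsity(layers, gamma):
    """Zero out roughly a fraction gamma of all weights, smallest magnitudes first."""
    # Sort all weight magnitudes across layers, least important first.
    all_abs = np.sort(np.concatenate([np.abs(W).ravel() for W in layers]))
    M = all_abs.size                 # total number of connections
    k = int(M * gamma)               # position M x gamma in the sorted list, as in (16)
    if k == 0:
        return [W.copy() for W in layers], 0.0
    w_thre = all_abs[k - 1]          # k-th smallest magnitude
    # Keep a weight only if its magnitude exceeds the threshold, as in (15).
    pruned = [np.where(np.abs(W) > w_thre, W, 0.0) for W in layers]
    return pruned, float(w_thre)

rng = np.random.default_rng(3)
layers = [rng.normal(size=(8, 8)), rng.normal(size=(8, 4))]
pruned, w_thre = prune_by_sparsity(layers, gamma=0.6)
zeros = sum(int(np.sum(W == 0.0)) for W in pruned)
print(w_thre, zeros)
```

Because a single threshold is shared by all layers, layers with many small weights end up sparser than others.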
2) Network Quantization: The quantization includes weight quantization and activation quantization. The weights, $W_{i,j}^{(n)}$, from a trained model can be converted from 32-bit floating point to $m$-bit integers by applying the quantization function

$$\tilde{W}_{i,j}^{(n)} = \mathrm{round}\left(q_w \left(W_{i,j}^{(n)} - \min\left(\mathbf{W}^{(n)}\right)\right)\right), \tag{17}$$
Algorithm 1 Network Sparsification.
Input: The pre-trained weights $\mathbf{W}$, the sparsity ratio $\gamma$.
Output: The pruned weights $\mathbf{W}_{\mathrm{pruned}}$.
1: Count the total number of connections, $M$.
2: Sort all connections from small to large, $\mathbf{s}$.
3: Obtain the threshold, $w_{\mathrm{thre}}$, by (16) with $M$ and $\gamma$.
4: for $n = 1$ to $N$ do
5:   Prune the connections by (15), $\mathbf{W}_{\mathrm{pruned}}^{(n)}$.
6: end for
7: Fine-tune the pruned model by loss function (4).
Algorithm 2 Network Quantization.
Input: The pre-trained weights $\mathbf{W}$, the quantization level $m$, the correlation coefficient $c$, and the calibration data $K$.
Output: The quantized weights $\mathbf{W}_{\mathrm{quantized}}$ and the range of activation, $x_{\min}$ and $x_{\max}$.
1: Phase 1: Weights Quantization.
2: for $n = 1$ to $N$ do
3:   Compute the range of weights, $\max\left(\mathbf{W}^{(n)}\right)$ and $\min\left(\mathbf{W}^{(n)}\right)$.
4:   Quantize the weights by (17), $\tilde{\mathbf{W}}^{(n)}$.
5: end for
6: Phase 2: Activations Quantization.
7: for $t = 1$ to $K$ do
8:   for $n = 1$ to $N$ do
9:     Update the dynamic range of activation by (19) and (20), $x_{\min}^{(n)}(t)$ and $x_{\max}^{(n)}(t)$.
10:   end for
11: end for
12: Quantize the activations by (21).
13: Fine-tune the quantized model by STE and loss function (4).
where qwis the scale-factor to map the dynamic range of float
points to an m-bits integer, which is given by
qw=2m1
max W(n)min W(n).(18)
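As a rough illustration of (17) and (18), the following Python sketch quantizes one layer's weights to $m$-bit integers and maps them back to floating point. The names `quantize_weights` and `dequantize_weights` are hypothetical, the dequantization step and the handling of a constant-valued layer are assumptions added for the example, and Python's built-in `round` (which rounds ties to the nearest even integer) stands in for the rounding operator.

```python
def quantize_weights(weights, m):
    """Map one layer's float weights to integers in [0, 2**m - 1].

    Returns the integer weights together with the scale-factor q_w and
    the layer minimum, which are needed to map the integers back.
    """
    flat = [w for row in weights for w in row]
    w_min, w_max = min(flat), max(flat)
    if w_max == w_min:
        # Assumption: a constant layer maps to all zeros with unit scale.
        return [[0 for _ in row] for row in weights], 1.0, w_min
    q_w = (2 ** m - 1) / (w_max - w_min)                      # (18)
    quantized = [[round(q_w * (w - w_min)) for w in row]      # (17)
                 for row in weights]
    return quantized, q_w, w_min


def dequantize_weights(quantized, q_w, w_min):
    """Approximate inverse of (17): w is roughly q / q_w + min(W)."""
    return [[q / q_w + w_min for q in row] for row in quantized]


weights = [[-1.0, 0.0], [0.5, 1.0]]
quantized, q_w, w_min = quantize_weights(weights, m=2)
restored = dequantize_weights(quantized, q_w, w_min)
print(quantized, restored)
```

Because the scale is derived from the per-layer minimum and maximum, the reconstruction error of each weight is bounded by half a quantization step, $0.5 / q_w$.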
For activation quantization, the results of matrix multiplication
are stored in accumulators. Due to the limited dynamic
range of integer formats, the accumulator can overflow
quickly if its bit-width is the same as that of the weights and
activations. Therefore, accumulators are usually implemented
with higher bit-widths, for example, INT32 += INT8 × INT8.
Besides, the range of activations is dynamic and depends on
the input data. Therefore, the output of the activations has to be
re-quantized into $m$-bit integers for the subsequent calculation.
Unlike weights, which are constant, the output of the activations
usually includes elements that are statistical outliers, which
expand the actual dynamic range. For example, even if 99%
of the data is distributed between −100 and 100, a single outlier
of 10,000 extends the dynamic range to [−100, 10,000],
which significantly reduces the mapping resolution. In
order to reduce the influence of the outliers, an exponential
moving average (EMA) is used by
$$x^{(n)}_{\min}(t+1) = (1 - c)\, x^{(n)}_{\min}(t) + c \min\big(X^{(n)}(t)\big), \tag{19}$$
and
$$x^{(n)}_{\max}(t+1) = (1 - c)\, x^{(n)}_{\max}(t) + c \max\big(X^{(n)}(t)\big), \tag{20}$$
where $x^{(n)}_{\min}(t+1)$ and $x^{(n)}_{\max}(t+1)$ are used as the range
for activation quantization, $x^{(n)}_{\min}(1) = \min\big(X^{(n)}(1)\big)$,
$x^{(n)}_{\max}(1) = \max\big(X^{(n)}(1)\big)$, $X^{(n)}(t)$ is the output of the
activations at the $n$th layer with the $t$th batch of data, and $c \in [0, 1)$
represents the correlation between the current $x^{(n)}_{\min}$/$x^{(n)}_{\max}$ and its past
value. The effects of outliers can be mitigated by the past
normal values. After $t+1$ epochs, $x^{(n)}_{\min}$ and $x^{(n)}_{\max}$ are fixed
based on $x^{(n)}_{\min}(t+1)$ and $x^{(n)}_{\max}(t+1)$. Then, the output of the
activations can be quantized by
$$\tilde{X}^{(n)} = \operatorname{clamp}\Big(\operatorname{round}\big(q_x (X^{(n)} - x^{(n)}_{\min})\big); -T, T\Big), \tag{21}$$
where $q_x = (2^m - 1)/(x^{(n)}_{\max} - x^{(n)}_{\min})$ is the scale-factor and
$\operatorname{clamp}(\cdot)$ is used to eliminate the quantized outliers, which is
given by
$$\operatorname{clamp}\big(X^{(n)}; -T, T\big) = \min\Big(\max\big(X^{(n)}, -T\big), T\Big), \tag{22}$$
where $T = 2^m - 1$ is the border of the $m$-bit integer
format.
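The calibration in (19) and (20) and the clamped quantization in (21) and (22) can be sketched as follows. This is a minimal illustration under stated assumptions: the class `ActivationRange` and the function names are hypothetical, each batch is a flat list of floats for a single layer, and the first batch initializes the range as described for $x^{(n)}_{\min}(1)$ and $x^{(n)}_{\max}(1)$.

```python
class ActivationRange:
    """Tracks the activation range of one layer with the EMA in (19)-(20)."""

    def __init__(self, c):
        self.c = c          # weight given to the current batch, 0 <= c < 1
        self.x_min = None
        self.x_max = None

    def update(self, batch):
        b_min, b_max = min(batch), max(batch)
        if self.x_min is None:
            # First calibration batch initializes the range.
            self.x_min, self.x_max = b_min, b_max
        else:
            self.x_min = (1 - self.c) * self.x_min + self.c * b_min   # (19)
            self.x_max = (1 - self.c) * self.x_max + self.c * b_max   # (20)


def clamp(value, low, high):
    """Limit `value` to the interval [low, high], as in (22)."""
    return min(max(value, low), high)


def quantize_activations(batch, x_min, x_max, m):
    """Quantize activations with the calibrated range, as in (21)."""
    T = 2 ** m - 1
    q_x = T / (x_max - x_min)
    return [clamp(round(q_x * (x - x_min)), -T, T) for x in batch]


# The second batch contains the outlier 10,000 from the example in the text.
tracker = ActivationRange(c=0.1)
tracker.update([-100.0, 0.0, 100.0])
tracker.update([-100.0, 50.0, 10000.0])
print(tracker.x_min, tracker.x_max)
print(quantize_activations([-100.0, 10000.0], tracker.x_min, tracker.x_max, m=8))
```

With $c = 0.1$ the outlier moves the tracked maximum only from 100 to about 1090 rather than to 10,000, and the outlier itself is clipped to $T$ when quantized.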
As shown in Algorithm 2, the network quantization includes
two phases: i) weight quantization; ii) activation quantization.
In phase 1, the weights of each layer are quantized
by (17) directly. In phase 2, a calibration process is applied
by running a few calibration batches in order to collect the
activation statistics. In each batch, $x^{(n)}_{\min}(t)$ and $x^{(n)}_{\max}(t)$ are
updated based on the activation statistics from the previous
batches. These quantization processes might lead to slight
accuracy degradation, so quantization-aware training (QAT)
is required to re-train the model and minimize the loss of accuracy.
Since the rounding operation is not differentiable, the straight-through
estimator (STE) is used to estimate the gradient of the quantized
weights in back-propagation [33].
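To make the role of the STE concrete, the toy example below fits a single scalar weight whose forward pass is quantized. It is only a sketch under simplifying assumptions, not the training procedure of L-DeepSC: the names `fake_quantize` and `train_with_ste`, the uniform grid with spacing `step`, the squared-error loss, and all hyperparameters are chosen for illustration.

```python
def fake_quantize(w, step):
    """Round w to the nearest multiple of `step` (non-differentiable)."""
    return round(w / step) * step


def train_with_ste(x, y, w=0.0, step=0.25, lr=0.1, iters=50):
    """Fit y = w_q * x by gradient descent, where w_q = fake_quantize(w).

    The derivative of the rounding step is zero almost everywhere, so the
    true gradient with respect to w would never move w. The STE replaces
    d(w_q)/d(w) by 1 in the backward pass.
    """
    for _ in range(iters):
        w_q = fake_quantize(w, step)      # forward pass uses quantized weight
        err = w_q * x - y
        grad_w_q = 2.0 * err * x          # gradient of err**2 w.r.t. w_q
        grad_w = grad_w_q                 # STE: pass the gradient straight through
        w -= lr * grad_w                  # update the full-precision weight
    return w, fake_quantize(w, step)


w, w_q = train_with_ste(x=1.0, y=0.75)
print(w, w_q)  # w_q settles on 0.75 in this toy setting
```

The full-precision weight keeps accumulating small updates until its quantized value crosses to the next grid point, which is the behavior QAT relies on.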
C. Constellation Design with Fewer Quantization Bits
The cloud/edge platform can further reduce the model size
through model compression after the model is trained,
which not only significantly reduces the latency of
broadcasting the updated DeepSC to IoT devices, but also turns
DeepSC into the low-complexity L-DeepSC. However, the
antenna of IoT devices is not able to generate high-resolution
waveforms; in other words, the antenna cannot afford a large number
of constellation points close to each other.
Different from bits, the source message, $s$, is more
complicated and the learned constellation is not limited to
a few points, which places an additional burden on antenna design.
Besides, DL models generally run in FP32, which also
expands the range of the constellation. Thus, we aim to reduce the
size of the learned constellation without degrading performance,
where $X$ is the learned constellation, which is also the
activation output of the last layer at the local IoT
devices. Inspired by network quantization, we convert
the learned high-resolution constellation into a low-resolution
one with few points. We use two-stage quantization to
narrow the range of the constellation, which is represented by
$$X_{\text{dequantize}} = \frac{X_{\text{quantize}}}{q_x} + x_{\min}, \tag{23}$$
where $X_{\text{quantize}}$ is the quantized $X$ from (21), $q_x$ is the
scale-factor, $x_{\min}$ is obtained by (19), and $X_{\text{dequantize}}$ is the
dequantized $X$.

TABLE I
THE SETTING OF L-DEEPSC RECEIVER.

                      Layer Name                Units              Activation
Receiver (Decoder)    Dense 1                   128                Relu
                      Dense 2                   512                Relu
                      Dense 3                   128                None
                      LayerNorm                 None               None
                      4×Transformer Decoder     128 (8 heads)      None
                      Prediction Layer          Dictionary Size    Softmax
First, we quantize $X$ into $m$-bit integers so that the range
of $X$ is narrowed to a size of $2^m$. For example, when $m = 8$,
the size of the constellation is reduced to 256. Then, $X_{\text{quantize}}$ is
dequantized to restore $X$. Such an $X_{\text{dequantize}}$ has a distribution
similar to that of $X$ but with fewer constellation points, which
helps to simplify antenna design at the receiver while preserving
the performance as much as possible, and therefore provides
the solution for Question 2.
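The two-stage mapping, quantization by (21) followed by dequantization by (23), can be sketched as below. The sketch treats the constellation as a list of real-valued components and assumes the range $[x_{\min}, x_{\max}]$ is already known from calibration; the function name `requantize_constellation` is hypothetical.

```python
def requantize_constellation(points, x_min, x_max, m):
    """Map real-valued constellation components onto at most 2**m levels.

    Stage 1 quantizes each component to an m-bit integer as in (21);
    stage 2 maps the integers back to the original scale as in (23).
    """
    T = 2 ** m - 1
    q_x = T / (x_max - x_min)
    quantized = [min(max(round(q_x * (x - x_min)), -T), T) for x in points]
    return [q / q_x + x_min for q in quantized]


# 201 evenly spaced components in [-1, 1] collapse onto 16 levels for m = 4.
points = [i / 100 - 1 for i in range(201)]
low_res = requantize_constellation(points, x_min=-1.0, x_max=1.0, m=4)
print(len(set(points)), len(set(low_res)))
```

For components that lie inside the calibrated range, each dequantized value differs from the original by at most half a quantization step, which is why the low-resolution constellation keeps a distribution similar to the original one.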
In summary, by exploiting the solutions for the afore-
mentioned Questions, we develop a lite distributed semantic
communication system, named L-DeepSC, which could reduce
the latency for model exchange under limited bandwidth, run
the models at IoT devices with low power consumption, and
deal with the distortion from fading channels when upload-
ing semantic features. As a result, the proposed L-DeepSC
becomes a good candidate for the IoT networks.
IV. NUMERICAL RESULTS
In this section, we compare the proposed L-DeepSC with
traditional methods under different fading channels, including
Rayleigh and Rician fading channels. Weight pruning
and quantization are also verified under fading channels. For
the Rayleigh fading channel, the channel coefficient follows
$\mathcal{CN}(0, 1)$; for the Rician fading channel, it follows $\mathcal{CN}(\mu, \sigma^2)$
with $\mu = \sqrt{k/(k+1)}$ and $\sigma = \sqrt{1/(k+1)}$, where $k$ is the
Rician coefficient and we use $k = 2$ in our simulation.
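A channel coefficient with these statistics can be drawn as in the following sketch. It assumes the usual convention that a circularly symmetric complex Gaussian with variance $\sigma^2$ has real and imaginary parts of variance $\sigma^2/2$ each; the function names are hypothetical, and setting $k = 0$ recovers the Rayleigh case $\mathcal{CN}(0, 1)$.

```python
import math
import random


def rician_coefficient(k, rng):
    """Draw one channel coefficient from CN(mu, sigma^2) for Rician factor k."""
    mu = math.sqrt(k / (k + 1))
    sigma = math.sqrt(1 / (k + 1))
    # Split the variance sigma^2 equally between real and imaginary parts.
    s = sigma / math.sqrt(2)
    return complex(mu + rng.gauss(0.0, s), rng.gauss(0.0, s))


def rayleigh_coefficient(rng):
    """Rayleigh fading is the special case k = 0, i.e. CN(0, 1)."""
    return rician_coefficient(0.0, rng)


rng = random.Random(0)
samples = [rician_coefficient(2.0, rng) for _ in range(20000)]
mean_power = sum(abs(h) ** 2 for h in samples) / len(samples)
print(mean_power)  # close to mu^2 + sigma^2 = 1
```

With this parameterization the average channel power is $\mu^2 + \sigma^2 = 1$ for any $k$, so the Rayleigh and Rician results are compared at the same average SNR.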
The transmitter of L-DeepSC is the same as that of DeepSC
in [17]. The parameters for the decoding network at the
receiver are shown in Table I for the fading channels, where
the sum of the outputs of Dense 1 and Dense 3 is the input
of LayerNorm layer.
The adopted dataset is the proceedings of the European
Parliament [34], which consists of around 2.0 million sentences
and 53 million words. The dataset is pre-processed
so that sentences contain 4 to 30 words and is split
into training data and testing data with a 0.1 ratio. The benchmark
approach is based on separate source coding and channel
coding, which adopts variable-length coding
(Huffman coding) and fixed-length coding (5-bit) for source
coding, Reed-Solomon (RS) coding [35] for channel coding,
and quadrature amplitude modulation (QAM). The bilingual
evaluation understudy (BLEU) score is used to measure the
performance [36].

Fig. 5. The comparison between the full-resolution constellation and 4-bits constellation. (a) Full-resolution constellation. (b) 4-bits constellation.
A. Constellation Design
Fig. 5 compares the full-resolution constellation and the
4-bits constellation. The full-resolution constellation points
in Fig. 5(a) contain more information due to the higher
resolution, but require a complicated antenna, which is almost
impossible to design. By mapping the full-resolution
constellation into a finite space, the 4-bits constellation points
in Fig. 5(b) are simplified, which makes them possible
to implement in existing RF systems. Note that the 4-bits
constellation keeps a distribution similar to that of the
full-resolution constellation. For example, there is a blank
region at the edge of the constellation in Fig. 5(a), and the 4-bits
constellation shows a similar trend in Fig. 5(b). This
similarity prevents sharp performance degradation when the
resolution of the constellation decreases significantly.
Fig. 6 shows the BLEU scores versus SNR for different
constellation sizes under AWGN, including the 4-bits constellation,
the 8-bits constellation, and the full-resolution constellation. All of them
achieve very similar performance when SNR > 9 dB,
which demonstrates that the constellation design is effective and
causes no significant performance degradation. The full-resolution
and 8-bits constellations perform slightly better than the 4-bits
constellation when the SNR is low. This is because some weights
information used for denoising is lost when the resolution of
the constellation is small.
Fig. 6. The BLEU scores of different constellation sizes versus SNR under AWGN. [Plot: BLEU score versus SNR (dB) from 0 to 18 dB; curves: 4-bits constellation, 8-bits constellation, original constellation.]

B. Performance over Fading Channels
Fig. 7 compares the channel estimation MSEs of the LS,
MMSE, and ADNet-aided LS estimators versus SNR under the
Rayleigh fading channels. Note that the MMSE estimator is equivalent to the LMMSE
estimator for AWGN channels. The MMSE and LS estimators have
similar accuracy in the high SNR region, thus the range of
training SNRs for the ADNet is set from 0 dB to 10 dB to
improve the performance of the LS estimator in the low SNR
region. As a result, the MSE of the ADNet based LS estimator
is significantly lower than that of the LS and MMSE estimators
when the SNR is low. With increasing SNR, the MSE of the ADNet
based LS estimator approaches that of the LS and MMSE
estimators. Therefore, the ADNet based LS estimator can be
replaced by the LS estimator to reduce the complexity in
the high SNR region.
Fig. 7. The MSE for MMSE estimator, LS estimator, and the proposed ADNet based LS estimator. [Plot: MSE (log scale, 10^-3 to 10^0) versus SNR (dB) from 0 to 18 dB; curves: MMSE estimator, LS estimator, LS estimator with ADNet.]

Fig. 8 and Fig. 9 illustrate the relationship between BLEU
score and SNR with the 4-bits constellation over the Rician
and the Rayleigh fading channels, respectively, where the
L-DeepSC is trained with perfect CSI, rough CSI by (13), refined
CSI by (14), and without CSI. The traditional
approaches are Huffman coding with (5,7) RS and 5-bit coding
with (7,9) RS, both with 64-QAM. We observe that all
DL-enabled approaches are more competitive under the fading
channels. The system trained without CSI performs worse
than those trained with CSI, especially under the Rayleigh
fading channels, which also confirms the analysis of (10) and
(11). Without CSI, the performance difference between the
Rayleigh channels and the Rician channels is caused by the
line-of-sight (LOS) component, which can help the systems recognize
the semantic information during training. Besides, with the
aid of CSI, the effects of the fading channels are mitigated
significantly, as we have analyzed before. When the SNR is
low, the systems with perfect CSI or refined CSI outperform
that with rough CSI. As the SNR increases, all these systems,
L-DeepSC with perfect CSI, refined CSI, and rough CSI,
gradually converge to similar performance.

Fig. 8. The BLEU scores versus SNR under Rician fading channels, with perfect CSI, rough CSI, refined CSI, and no CSI. [Plot: BLEU score versus SNR (dB) from 0 to 18 dB; curves: L-DeepSC with perfect CSI, with refined CSI, with rough CSI, and without CSI; Huffman + RS with perfect CSI; 5-bit + RS with perfect CSI.]
Fig. 9. The BLEU scores versus SNR under Rayleigh fading channels, with perfect CSI, rough CSI, refined CSI, and no CSI. [Plot: BLEU score versus SNR (dB) from 0 to 18 dB; curves: L-DeepSC with perfect CSI, with refined CSI, with rough CSI, and without CSI; Huffman + RS with perfect CSI; 5-bit + RS with perfect CSI.]

C. Model Compression
In this experiment, we investigate the performance of the
network slimmer, including network sparsification, network
quantization, and the combination of both. The pre-trained
model used for pruning and quantization is trained with the
4-bits constellation under the Rician fading channels.

Fig. 10. The BLEU scores of different SNRs versus sparsity ratio, γ, under Rician fading channels with the refined CSI. [Plot: BLEU score versus sparsity ratio from 0 to 0.99; curves: SNR = 0 dB, 6 dB, 12 dB, 18 dB.]
Fig. 10 shows the influence of the network sparsity ratio, $\gamma$,
on the BLEU scores with different SNRs under the Rician
fading channels, where the system is pruned directly as
$\gamma$ increases from 0 to 0.9 and is pruned with fine-tuning
as $\gamma$ further increases to 0.99. The proposed L-DeepSC
achieves almost the same BLEU scores when $\gamma$ increases
from 0 to 0.9, which shows that there is a large amount of
weight redundancy in the trained DeepSC model. When
$\gamma$ increases to 0.99, the BLEU scores drop only slightly thanks to
fine-tuning, where the performance loss at
0 dB and 6 dB is larger than that at 12 dB and 18 dB. Thus,
for the high SNR cases, the model can be pruned directly
with only slight performance degradation. For the low SNR
region, it is possible to prune 99% of the weights without significant
performance degradation when the system is sensitive to power
consumption.
Fig. 11 demonstrates the relationship between the BLEU
score and the quantization bit number, $m$, under the Rician
fading channels, where $m$ is defined in (18), and the system
is quantized with QAT when $m$ is smaller than 2. The
performance with $m = 8$ to $m = 20$ is similar, which indicates
the effectiveness of low-resolution neural networks. If
the system is more sensitive to power consumption and can
tolerate certain performance degradation, the resolution of
the neural networks can be further reduced to the 4-bit level.
However, the BLEU score decreases dramatically from $m = 4$
to $m = 2$ over the whole SNR range since most key
information is removed in the low-resolution neural network.

Fig. 11. The BLEU scores of different SNRs versus quantization level, m, under Rician fading channels with the refined CSI. [Plot: BLEU score versus quantization level m = 2, 4, 8, 12, 16, 20; curves: SNR = 0 dB, 6 dB, 12 dB, 18 dB.]

TABLE II
THE BLEU SCORE AND COMPRESSION RATIO, ψ, COMPARISONS VERSUS DIFFERENT SPARSITY RATIO, γ, AND QUANTIZATION LEVEL, m, IN SNR = 12 dB.

Pruned Model    m = 4                  m = 8                  m = 12                 m = 16
                BLEU score   ψ         BLEU score   ψ         BLEU score   ψ         BLEU score   ψ
γ = 0.3         0.838967     11.429    0.892745     5.714     0.908537     3.81      0.910184     2.857
γ = 0.6         0.835863     20.0      0.897143     10.0      0.90815      6.667     0.900468     5.0
γ = 0.9         0.810322     80.0      0.895306     40.0      0.898784     26.667    0.910554     20.0
γ = 0.95        0.779685     160.0     0.875814     80.0      0.873426     53.333    0.877221     40.0
Table II compares the BLEU scores and compression ratios
under different combinations of weight pruning and weight
quantization with SNR = 12 dB, where the compression ratio
is computed by
$$\psi = \frac{M \times 32}{M_{\text{pruned}} \times m}, \tag{24}$$
where $M$ is the number of weights before pruning, $M_{\text{pruned}}$
is the number of weights remaining after pruning, 32 is the
number of bits required for FP32, and $m$ is the number of
bits required after quantization. The performance decreases
when $\gamma$ increases or $m$ decreases, which is consistent with
Fig. 10 and Fig. 11. From the table, different compression
ratios can lead to similar performance. For example, the
BLEU score with $\gamma = 30\%$ and $m = 8$ is similar to that
with $\gamma = 90\%$ and $m = 12$, but the compression ratios
differ by a factor of about five, i.e., 5.714 and 26.667. By properly
choosing a suitable sparsity ratio and quantization level, the
same performance can be achieved with a higher compression
ratio.
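The compression ratios in Table II follow directly from (24), since a sparsity ratio $\gamma$ leaves $M_{\text{pruned}} = (1 - \gamma) M$ weights. The short sketch below reproduces a few table entries; the function name `compression_ratio` and the `full_bits` parameter are hypothetical names chosen for the example.

```python
def compression_ratio(gamma, m, full_bits=32):
    """Compression ratio (24) for sparsity ratio gamma and m-bit weights.

    A fraction (1 - gamma) of the M weights remains after pruning, so
    M cancels and the ratio depends only on gamma and m.
    """
    remaining = 1.0 - gamma
    return full_bits / (remaining * m)


for gamma, m in [(0.3, 8), (0.9, 12), (0.95, 4)]:
    print(gamma, m, round(compression_ratio(gamma, m), 3))
```

For instance, $\gamma = 0.3$ with $m = 8$ gives $32 / (0.7 \times 8) \approx 5.714$, while $\gamma = 0.9$ with $m = 12$ gives $32 / (0.1 \times 12) \approx 26.667$, matching the two configurations compared in the text.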
V. CONCLUSION
In this paper, we proposed a lite distributed semantic
communication system, named L-DeepSC, for Internet of Things
(IoT) networks, where the participating devices usually have
limited power and computing capabilities. Specifically, the
receiver and feature extractor were designed jointly for text
transmission. Firstly, we analyzed the effectiveness of CSI
in forward-propagation and back-propagation during system
training over the fading channels. The analytical results reveal
that the fading channels contaminate the weight updates and
restrict the model representation capability. Thus, a refined LS
estimator with less pilot overhead was developed to eliminate
the effects of fading channels. Besides, we mapped the
full-resolution original constellation into a finite-bits constellation
to match the current antenna design, which was verified
by simulation results. Finally, due to the limited
bandwidth and computational capability in IoT networks, two
model compression approaches were proposed: 1)
network sparsification to prune the unnecessary weights, and
2) network quantization to reduce the weight resolution.
The simulation results validated that the proposed L-DeepSC
outperforms the traditional methods, especially in the low
SNR regime, and provided insights into the balance among
compression ratio, sparsity ratio, and quantization level.
Therefore, our proposed L-DeepSC is a promising candidate for
intelligent IoT networks, especially in the low SNR regime.
REFERENCES
[1] L. Atzori, A. Iera, and G. Morabito, “The internet of things: a survey,”
Computer Networks, vol. 54, no. 15, pp. 2787–2805, Oct. 2010.
[2] T. Qiu, N. Chen, K. Li, M. Atiquzzaman, and W. Zhao, “How can
heterogeneous internet of things build our future: A survey,” IEEE
Commun. Surv. Tutorials, vol. 20, no. 3, pp. 2011–2027, Feb. 2018.
[3] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press,
2016.
[4] M. Mohammadi, A. Al-Fuqaha, S. Sorour, and M. Guizani, “Deep
learning for iot big data and streaming analytics: A survey,” IEEE
Commun. Surv. Tutorials, vol. 20, no. 4, pp. 2923–2960, Jun. 2018.
[5] H. Li, K. Ota, and M. Dong, “Learning iot in edge: Deep learning for
the internet of things with edge computing,” IEEE Network, vol. 32,
no. 1, pp. 96–101, Jan. 2018.
[6] R. Carnap, Y. Bar-Hillel et al., An Outline of A Theory of Semantic
Information. RLE Technical Reports 247, Research Laboratory of
Electronics, Massachusetts Institute of Technology, Cambridge MA,
Oct. 1952.
[7] D. Tse and P. Viswanath, Fundamentals of Wireless Communication.
Cambridge University Press, 2005.
[8] I. Guyon, S. Gunn, M. Nikravesh, and L. A. Zadeh, Feature Extraction:
Foundations and Applications. Springer, 2008, vol. 207.
[9] R. Szeliski, Computer Vision: Algorithms and Applications. Springer
Science & Business Media, 2010.
[10] N. Indurkhya and F. J. Damerau, Handbook of Natural Language
Processing. CRC Press, 2010, vol. 2.
[11] C. E. Shannon and W. Weaver, The Mathematical Theory of Communi-
cation. The University of Illinois Press, 1949.
[12] D. Tse and P. Viswanath, Fundamentals of Wireless Communication.
Cambridge University Press, 2005.
[13] Z. Qin, H. Ye, G. Y. Li, and B.-H. F. Juang, “Deep learning in physical
layer communications,” IEEE Wireless Commun., vol. 26, no. 2, pp.
93–99, Apr. 2019.
[14] E. Bourtsoulatze, D. B. Kurka, and D. Gündüz, “Deep joint source-
channel coding for wireless image transmission,” IEEE Trans. Cogn.
Commun. Netw., vol. 5, no. 3, pp. 567–579, May 2019.
[15] M. Jankowski, D. Gündüz, and K. Mikolajczyk, “Deep
joint transmission-recognition for power-constrained iot devices,”
arXiv:2003.02027, 2020. [Online]. Available: https://arxiv.org/abs/2003.02027
[16] N. Farsad, M. Rao, and A. Goldsmith, “Deep learning for joint source-
channel coding of text,” in Proc. IEEE Int’l. Conf. Acoustics Speech
Signal Process. (ICASSP), Calgary, AB, Canada, Apr. 2018, pp. 2326–
2330.
[17] H. Xie, Z. Qin, G. Y. Li, and B.-H. Juang, “Deep learning enabled
semantic communication systems,” arXiv:2006.10685, 2020. [Online].
Available: https://arxiv.org/abs/2006.10685
[18] C. Lee, J. Lin, P. Chen, and Y. Chang, “Deep learning-constructed joint
transmission-recognition for internet of things,” IEEE Access, vol. 7, pp.
76 547–76 561, Jun. 2019.
[19] E. L. Denton, W. Zaremba, J. Bruna, Y. LeCun, and R. Fergus,
“Exploiting linear structure within convolutional networks for efficient
evaluation,” in Proc. Adv. Neural Inf. Process. Syst. (NIPS), Montreal,
Quebec, Canada, Dec. 2014, pp. 1269–1277.
[20] S. Han, J. Pool, J. Tran, and W. Dally, “Learning both weights and
connections for efficient neural network,” in Proc. Adv. Neural Inf.
Process. Syst. (NIPS), Montreal, Quebec, Canada, Dec. 2015, pp. 1135–
1143.
[21] Z. Liu, J. Li, Z. Shen, G. Huang, S. Yan, and C. Zhang, “Learning
efficient convolutional networks through network slimming,” in Proc.
IEEE Int’l. Conf. on Comput. Vis. (ICCV), Venice, Italy, Oct. 2017, pp.
2755–2763.
[22] H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf, “Pruning
filters for efficient convnets,” in Proc. IEEE Int’l. Conf. on Learning
Representations (ICLR), Toulon, France, Apr. 2017.
[23] R. Krishnamoorthi, “Quantizing deep convolutional networks for
efficient inference: A whitepaper,” arXiv:1806.08342, 2018. [Online].
Available: http://arxiv.org/abs/1806.08342
[24] Y. Gong, L. Liu, M. Yang, and L. Bourdev, “Compressing deep
convolutional networks using vector quantization,” arXiv:1412.6115,
2014. [Online]. Available: http://arxiv.org/abs/1412.6115
[25] A. Zhou, A. Yao, Y. Guo, L. Xu, and Y. Chen, “Incremental network
quantization: Towards lossless cnns with low-precision weights,” in
Proc. IEEE Int’l. Conf. on Learning Representations (ICLR), Toulon,
France, Apr. 24-26, 2017.
[26] F. Li, B. Zhang, and B. Liu, “Ternary weight networks,”
arXiv:1605.04711, 2016. [Online]. Available: http://arxiv.org/abs/1605.04711
[27] B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. G. Howard, H. Adam,
and D. Kalenichenko, “Quantization and training of neural networks for
efficient integer-arithmetic-only inference,” in Proc. IEEE Conf. Comput.
Vis. Pattern Recognit. (CVPR), Salt Lake City, UT, USA, Jun. 18-22,
2018, pp. 2704–2713.
[28] J. Guo, J. Wang, C.-K. Wen, S. Jin, and G. Y. Li, “Compression and
acceleration of neural networks for communications,” IEEE Wireless
Commun., Early Access.
[29] D. Gil, A. Ferrández, H. Mora-Mora, and J. Peral, “Internet of things: A
review of surveys based on context aware intelligent services,” Sensors,
vol. 16, no. 7, p. 1069, Jul. 2016.
[30] B. Zhu, J. Wang, L. He, and J. Song, “Joint transceiver optimization for
wireless communication phy using neural network,” IEEE J. Sel. Areas
Commun., vol. 37, no. 6, pp. 1364–1373, Mar. 2019.
[31] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a gaussian
denoiser: Residual learning of deep cnn for image denoising,” IEEE
Trans. Image Process., vol. 26, no. 7, pp. 3142–3155, Feb. 2017.
[32] C. Tian, Y. Xu, Z. Li, W. Zuo, L. Fei, and H. Liu, “Attention-guided cnn
for image denoising,” Neural Netw., vol. 124, pp. 117–129, Apr. 2020.
[33] Y. Bengio, N. Léonard, and A. Courville, “Estimating or propagating
gradients through stochastic neurons for conditional computation,”
arXiv:1308.3432, 2013. [Online]. Available: http://arxiv.org/abs/1308.3432
[34] P. Koehn, “Europarl: A parallel corpus for statistical machine transla-
tion,” in MT summit, vol. 5, 2005, pp. 79–86.
[35] I. S. Reed and G. Solomon, “Polynomial codes over certain finite fields,”
J. the Society for Industrial and Applied Math., vol. 8, no. 2, pp. 300–
304, Jan. 1960.
[36] K. Papineni, S. Roukos, T. Ward, and W. Zhu, “Bleu: a method for
automatic evaluation of machine translation,” in Proc. Annual Meeting
Assoc. Comput. Linguistics (ACL), Philadelphia, PA, USA, Jul. 2002,
pp. 311–318.
Article
Full-text available
This paper addresses the problem of semantic communications (SemComs) in intelligent machine-to-machine (M2M) applications. Although M2M applications may employ other languages as the communication medium, natural languages are commonly used as the medium between machines and robots. One favorable characteristic of using natural languages is that it allows humans to inspect communication contents easily, which caters to the needs of security and quality of service for M2M communication. Currently, no exact solutions are available for quantifying and measuring the understanding of M2M communication. This paper identifies three specific challenges in the field: inconsistent knowledge base (KB), cross-domain interpretation, and a measure for understanding the meaning of messages. We propose a model to address these challenges in two steps. First, we propose an evidence-based shared-KB communication model for cross-domain meaning interpretation using Dewey Decimal Classification. Second, we propose a measure to quantify the understanding level through a two-stage validation between the sender and receiver. Real-life datasets and numerical experiments are used to evaluate the model’s performance. The results show that the degree of understanding (DoU) can be successfully measured by observing the performance of the sender and receiver under the same conditions. The proposed method can effectively improve mutual understanding between the two machines.
Article
Two new models for semantic communication systems are proposed. The first model incorporates the convolutional block attention module, which considers attention techniques in both the channel and spatial domains. The second model applies the efficient channel attention (ECA) network with reduced complexity. Experimental results demonstrate that the convolutional block attention module‐equipped model improved signal‐to‐distortion ratio performance by at a signal‐to‐noise ratio of while maintaining a similar number of parameters compared to the existing model using squeeze‐and‐excitation network. Meanwhile, the efficient channel attention‐equipped model reduced parameters by approximately without any degradation in performance compared to the existing model.
Article
Artificial intelligence (AI) has become a promising solution for meeting the stringent performance requirements on wireless physical layer in sixth-generation (6G) communication systems, due to its strong ability to learn complex model, achieve end-to-end optimization and adapt to dynamic environments. This article provides a comprehensive review with respect to artificial intelligence for wireless physical-layer technologies (AI4PHY). Specifically, we first analyze the characteristics of the classic AI techniques and their potential applications for physical-layer technologies. Then we study the AI-enhanced designs from the point of view of the basic physical-layer modules, including coding, modulation, multiple access, multiple-input-multiple-output (MIMO), channel estimation, as well as relay transmission. The standardization progress of AI4PHY in 3GPP is also discussed. Based on the current AI4PHY researches, we propose some potential future research directions to inspire and encourage the further exploration.
Article
Full-text available
Recently, deep learned enabled end-to-end (E2E) communication systems have been developed to merge all physical layer blocks in the traditional communication systems, which make joint transceiver optimization possible. Powered by deep learning, natural language processing (NLP) has achieved great success in analyzing and understanding a large amount of language texts. Inspired by research results in both areas, we aim to provide a new view on communication systems from the semantic level. Particularly, we propose a deep learning based semantic communication system, named DeepSC, for text transmission. Based on the Transformer, the DeepSC aims at maximizing the system capacity and minimizing the semantic errors by recovering the meaning of sentences, rather than bit-or symbol-errors in traditional communications. Moreover, transfer learning is used to ensure the DeepSC applicable to different communication environments and to accelerate the model training process. To justify the performance of semantic communications accurately, we also initialize a new metric, named sentence similarity. Compared with the traditional communication system without considering semantic information exchange, the proposed DeepSC is more robust to channel variation and is able to achieve better performance, especially in the low signal-to-noise (SNR) regime, as demonstrated by the extensive simulation results.
Article
Full-text available
In this article, we develop an end-to-end wireless communication system using deep neural networks (DNNs), where DNNs are employed to perform several key functions, including encoding, decoding, modulation, and demodulation. However, an accurate estimation of instantaneous channel transfer function, i.e., channel state information (CSI), is needed in order for the transmitter DNN to learn to optimize the receiver gain in decoding. This is very much a challenge since CSI varies with time and location in wireless communications and is hard to obtain when designing transceivers. We propose to use a conditional generative adversarial net (GAN) to represent channel effects and to bridge the transmitter DNN and the receiver DNN so that the gradient of the transmitter DNN can be back-propagated from the receiver DNN. In particular, a conditional GAN is employed to model the channel effects in a data-driven way, where the received signal corresponding to the pilot symbols is added as a part of the conditioning information of the GAN. To address the curse of dimensionality when the transmit symbol sequence is long, convolutional layers are utilized. From the simulation results, the proposed method is effective on additive white Gaussian noise (AWGN) channels, Rayleigh fading channels, and frequency-selective channels, which opens a new door for building data-driven DNNs for end-to-end communication systems.
Article
Full-text available
This paper proposes a deep learning-based channel estimation method for multi-cell interference-limited massive MIMO systems, in which base stations equipped with a large number of antennas serve multiple single-antenna users. The proposed estimator employs a specially designed deep neural network (DNN) based on the deep image prior (DIP) network to first denoise the received signal, followed by conventional least-squares (LS) estimation. We analytically prove that our LS-type deep channel estimator can approach minimum mean square error (MMSE) estimator performance for high-dimensional signals, while avoiding complex channel inversions and knowledge of the channel covariance matrix. This analytical result, while asymptotic, is observed in simulations to be operational for just 64 antennas and 64 subcarriers per OFDM symbol. The proposed method also does not require any training and utilizes several orders of magnitude fewer parameters than conventional DNNs. The proposed deep channel estimator is also robust to pilot contamination and can even completely eliminate it under certain conditions.
Article
Full-text available
The widely deployed Internet of things (IoT) devices provide intelligent services with its cognition capability. Since IoT data is usually transmitted to server for recognition (e.g., image classification) due to low computational capability and limited power supply, achieving recognition accuracy under limited bandwidth and noisy channel of wireless networks is a crucial but challenging task. In this paper, we propose a deep learning-constructed joint transmission-recognition scheme for IoT devices to effectively transmit data wirelessly to server for recognition, jointly considering transmission bandwidth, transmission reliability, complexity, and recognition accuracy. Compared to other schemes that may be deployed on IoT devices, i.e., a scheme based on JPEG compression and two compressed sensingbased schemes, the proposed deep neural network-based scheme has much higher recognition accuracy under various transmission scenarios at all signal-to-noise ratio (SNR). In particular, the proposed scheme maintains good performance at very low SNR. Moreover, the complexity of the proposed scheme is low, making it suitable for IoT applications. Finally, a transfer learning-based training method is proposed to effectively mitigate the computing burden and reduce overhead of online training.
Article
DL has achieved great success in signal processing and communications and has become a promising technology for future wireless communications. Existing works mainly focus on exploiting DL to improve the performance of communication systems. However, the high memory requirement and computational complexity constitute a major hurdle for the practical deployment of DL-based communications. In this article, we investigate how to compress and accelerate the neural networks (NNs) in communication systems. After introducing the deployment challenges for DL-based communication algorithms, we discuss some representative NN compression and acceleration techniques. Afterwards, two case studies for multiple-input-multiple-output (MIMO) communications, including DL-based channel state information feedback and signal detection, are presented to show the feasibility and potential of these techniques. We finally identify some challenges on NN compression and acceleration in DL-based communications and provide a guideline for subsequent research.
Article
Deep convolutional neural networks (CNNs) have attracted considerable interest in low-level computer vision. Research is usually devoted to improving performance via very deep CNNs. However, as the depth increases, the influence of the shallow layers on the deep layers is weakened. Inspired by this fact, we propose an attention-guided denoising convolutional neural network (ADNet), mainly including a sparse block (SB), a feature enhancement block (FEB), an attention block (AB) and a reconstruction block (RB) for image denoising. Specifically, the SB makes a tradeoff between performance and efficiency by using dilated and common convolutions to remove the noise. The FEB integrates global and local feature information via a long path to enhance the expressive ability of the denoising model. The AB is used to finely extract the noise information hidden in the complex background, which is very effective for complex noisy images, especially real noisy images and blind denoising. Also, the FEB is integrated with the AB to improve the efficiency and reduce the complexity of training a denoising model. Finally, an RB aims to construct the clean image through the obtained noise mapping and the given noisy image. Additionally, comprehensive experiments show that the proposed ADNet performs very well in three tasks (i.e., synthetic and real noisy images, and blind denoising) in terms of both quantitative and qualitative evaluations. The code of ADNet is accessible at http://www.yongxu.org/lunwen.html.
Article
The idea of end-to-end learning of communication systems through neural network (NN)-based autoencoders has the shortcoming that it requires a differentiable channel model. We present in this paper a novel learning algorithm which alleviates this problem. The algorithm enables training of communication systems with an unknown channel model or with non-differentiable components. It iterates between training of the receiver using the true gradient, and training of the transmitter using an approximation of the gradient. We show that this approach works as well as model-based training for a variety of channels and tasks. Moreover, we demonstrate the algorithm’s practical viability through hardware implementation on software defined radios (SDRs) where it achieves state-of-the-art performance over a coaxial cable and wireless channel.
Article
We propose a joint source and channel coding (JSCC) technique for wireless image transmission that does not rely on explicit codes for either compression or error correction; instead, it directly maps the image pixel values to the complex-valued channel input symbols. We parameterize the encoder and decoder functions by two convolutional neural networks (CNNs), which are trained jointly, and can be considered as an autoencoder with a non-trainable layer in the middle that represents the noisy communication channel. Our results show that the proposed deep JSCC scheme outperforms digital transmission concatenating JPEG or JPEG2000 compression with a capacity-achieving channel code at low signal-to-noise ratio (SNR) and channel bandwidth values in the presence of additive white Gaussian noise (AWGN). More strikingly, deep JSCC does not suffer from the “cliff effect”, and it provides a graceful performance degradation as the channel SNR varies with respect to the SNR value assumed during training. In the case of a slow Rayleigh fading channel, deep JSCC learns noise-resilient coded representations and significantly outperforms separation-based digital communication at all SNR and channel bandwidth values.
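The non-trainable channel layer described in this abstract can be sketched as a plain function that sits between an encoder and a decoder. The following is a minimal NumPy sketch, assuming unit average power normalization of the encoder output and complex AWGN at a chosen SNR. The name `awgn_channel` and the normalization choice are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def awgn_channel(symbols, snr_db, rng):
    """Pass complex symbols through an AWGN channel (hypothetical helper).

    Returns (noisy_output, normalized_input).
    """
    symbols = np.asarray(symbols, dtype=np.complex128)
    # Normalize to unit average power so the SNR alone sets the noise level.
    power = np.mean(np.abs(symbols) ** 2)
    x = symbols / np.sqrt(power)
    # With unit signal power, the noise variance is 10^(-SNR_dB / 10).
    noise_var = 10.0 ** (-snr_db / 10.0)
    # Complex Gaussian noise: half the variance in each of the real
    # and imaginary parts.
    noise = np.sqrt(noise_var / 2.0) * (
        rng.standard_normal(x.shape) + 1j * rng.standard_normal(x.shape)
    )
    return x + noise, x
```

In an autoencoder setting, the same operation would be written with differentiable tensor operations so that gradients can flow from the decoder back to the encoder. The layer itself has no trainable parameters.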