
A Lite Distributed Semantic Communication System for Internet of Things

July 2020
IEEE Journal on Selected Areas in Communications PP(99)


DOI:10.1109/JSAC.2020.3036968

Authors:

Huiqiang Xie

Jinan University (Guangzhou, China)

Zhijin Qin

Queen Mary, University of London


SUBMIT TO IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS 1

A Lite Distributed Semantic Communication System for Internet of Things

Huiqiang Xie and Zhijin Qin

Abstract—The rapid development of deep learning (DL) and widespread applications of Internet-of-Things (IoT) have made devices smarter than before and enabled them to perform more intelligent tasks. However, it is challenging for an IoT device to train and run a DL model independently due to its limited computing capability. In this paper, we consider an IoT network where the cloud/edge platform performs the training and updating of the DL-based semantic communication (DeepSC) model, while IoT devices perform data collection and transmission based on the trained model. To make it affordable for IoT devices, we propose a lite distributed semantic communication system based on DL, named L-DeepSC, for low-complexity text transmission, where the data transmission from the IoT devices to the cloud/edge works at the semantic level to improve transmission efficiency. In particular, by pruning model redundancy and lowering the weight resolution, L-DeepSC becomes affordable for IoT devices, and the bandwidth required for model weight transmission between IoT devices and the cloud/edge is reduced significantly. By analyzing the effects of fading channels on forward-propagation and back-propagation during the training of L-DeepSC, we develop a channel state information (CSI) aided training process to decrease the effects of fading channels on transmission. Meanwhile, we tailor the semantic constellation by quantization to suit current antenna designs. Simulations demonstrate that the proposed L-DeepSC achieves competitive performance compared with traditional methods, especially in the low signal-to-noise ratio (SNR) region. In particular, it can reach a compression ratio as large as 20x without performance degradation.

Index Terms—Internet of Things, neural network compression, pruning, quantization, semantic communication.

I. INTRODUCTION

With the widely deployed connected devices, Internet of Things (IoT) networks are providing more and more intelligent services, e.g., smart homes, intelligent manufacturing, and smart cities, by processing a massive amount of data generated by those connected devices [1], [2]. Deep learning (DL) [3] has demonstrated great potential in processing various types of data, e.g., images and texts. DL-enabled IoT devices are capable of exploiting and processing different types of data more effectively, as well as handling more intelligent tasks than before. Although some IoT devices are able to run simple DL models, their limited memory, computing, and battery capability still prevent the wide application of DL [4]. Therefore, the burden of DL model updates is usually transferred to the cloud/edge platform [5]. Specifically, the DL model is trained at the cloud/edge platform based on data from the IoT devices, and then the trained model is distributed to the IoT devices. However, data transmitted over the air could be distorted by wireless channels, which may lead to improperly trained models, e.g., convergence to a local optimum. Moreover, the large number of parameters in DL models leads to high latency when the models are distributed over limited bandwidth. Therefore, transmitting accurate data to the cloud/edge platform over wireless channels for model training, and reducing the number of parameters in DL models for lower latency and power consumption at the IoT devices, are two crucial problems.

Huiqiang Xie and Zhijin Qin are with the School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, UK (e-mail: h.xie@qmul.ac.uk, z.qin@qmul.ac.uk).

To address the first problem, accurate data transmission in an IoT network, semantic communication, which interprets information at the semantic level rather than as bit sequences [6], is promising. To make a decision from the received information, there are usually three steps: i) the traditional communication receiver recovers the raw data [7]; ii) the feature extractor obtains and interprets the meaning of the raw data for the decision [8]; and iii) the effects, i.e., the decisions made according to the extracted features [9], [10]. Corresponding to these three steps, communication is categorized into three levels [11], namely the transmission level, the semantic level, and the effectiveness level, as explained in Fig. 1. The traditional communication system works at the transmission level shown in Fig. 1(a), which aims to transmit and receive symbols accurately [12]. The subsequent feature extractor network and effect network are designed separately based on applications. However, designing these modules separately may lead to error propagation and prevent the system from reaching joint optimality. For example, the feature extractor network is not able to correct errors from the traditional receiver, which affects the subsequent decision making in the effect network. Thus, by designing the traditional receiver and the feature extractor network jointly (the semantic level), or by merging the traditional receiver, the feature extractor network, and the effect network together (the effectiveness level), communication systems gain the capability of error correction at the semantic level and the effectiveness level, respectively.

In this paper, we focus on distributed semantic communications for IoT networks and leave effectiveness-level communication to future research.

With the recent advancements in DL, it is promising to represent a traditional transceiver, or each individual signal processing block, by a deep neural network (DNN) [13]. There have been some initial works related to deep semantic communications [14]–[17]. Bourtsoulatze et al. [14] proposed joint source-channel coding for wireless image transmission based on the convolutional neural network (CNN), where the peak signal-to-noise ratio (PSNR) is used to measure the accuracy of image recovery at the receiver. Taking image classification tasks into consideration, Lee et al. [18] developed a transmission-recognition communication system by merging wireless image transmission and the effect network, i.e., image classification, into DNNs, which achieves higher image classification accuracy than performing the two separately. For texts, Farsad et al. [16] designed joint source-channel coding for erasure channels by using a recurrent neural network (RNN) and a fully-connected neural network (FCN), where the system recovers the text directly rather than performing channel and source decoding separately. In order to understand texts better and apply them in dynamic environments, Xie et al. [17] developed a semantic communication system based on the Transformer, named DeepSC, which clarifies the concepts of semantic information and semantic error at the sentence level for the first time. In brief, compared with traditional approaches, semantic communication systems are more robust to channel variation and are able to achieve better performance in terms of source recovery and image classification, especially in the low signal-to-noise ratio (SNR) regime.

Fig. 1. Illustration of three communication levels at the receiver. Panels: (a) Transmission level (traditional receiver, feature extractor network, effect network); (b) Semantic level (semantic receiver, effect network). The figure also shows an effect receiver.

arXiv:2007.11095v1 [eess.SP] 21 Jul 2020

To deal with the second problem, reducing the number of parameters, network slimming has attracted extensive attention as a way to compress DL models without degrading performance, since neural networks are usually over-parameterized [19]. Parameter pruning and quantization are the two main approaches to DL model compression. Parameter pruning removes unnecessary connections between neurons, or unimportant neurons. Han et al. [20] proposed an iterative pruning approach, where the model is first trained, then pruned by a given threshold, and finally fine-tuned to recover performance in terms of image classification. This approach reduces the connections without losing accuracy. Liu et al. [21] proposed to prune the filters in CNNs by training the model with a regularization loss function so that redundant weights converge to zero directly without sacrificing performance. By analyzing the connection sensitivity among neurons and layers, Li et al. [22] removed the insensitive layers, which further increases inference speed. By applying these pruning approaches, DL models can be compressed by 13 to 20 times. Quantization aims to represent a weight parameter with lower precision (fewer bits), which reduces the required bitwidth of data flowing through the neural network model, so as to shrink the model size for memory saving and to simplify the operations for computing acceleration [23]. Gong et al. [24] quantized DL models with vector quantization. Similarly, Zhou et al. [25] investigated an iterative quantization, which starts with a trained full-resolution model and then quantizes only a portion of the model, followed by several epochs of re-training to recover the accuracy loss from quantization. A mixed-precision quantization by Li et al. [26] quantizes the weights while keeping the activations at full resolution. The training algorithm by Jacob et al. [27] preserves the model accuracy after quantization. With quantization, the weights can generally be compressed from 32-bit to 8-bit without performance loss. Pruning and quantization can also be used in DL-enabled communication systems. For example, Guo et al. [28] have shown that model compression can accelerate channel state information (CSI) acquisition and signal detection in massive multiple-input multiple-output (MIMO) systems without performance degradation.

By applying network slimming to our existing work DeepSC, the two aforementioned challenges in IoT networks can be effectively addressed. Although the above works validate the feasibility, we still face the following issues in making DeepSC affordable for IoT devices:

• Question 1: How to design a semantic communication system over wireless fading channels?
• Question 2: How to form the constellation to reduce the burden on the antenna?
• Question 3: How to compress semantic models for fast model transmission and low-cost implementation on IoT devices?

In this paper, we design a distributed semantic communication system for IoT networks. Specifically, a lite DeepSC (L-DeepSC) is proposed to address the above questions. The main contributions of this paper are summarized as follows.

• We design a distributed semantic communication network under power and latency constraints, in which the receiver and feature extractor networks are jointly optimized to overcome fading channels.
• By identifying the impacts of CSI on DL model training over fading channels, we propose a CSI-aided semantic communication system to speed up convergence, where the CSI is refined by a de-noise neural network. This addresses the aforementioned Question 1.
• To alleviate the burden on the antenna for data transmission and reception, we design a finite-bit constellation to solve Question 2.
• To address over-parameterization, we propose a model compression algorithm, including network sparsification and quantization, that reduces the size of DL models by pruning redundant connections and quantizing the weights, which addresses the aforementioned Question 3.

The rest of this paper is organized as follows. The distributed semantic communication system model is introduced and the corresponding problems are identified in Section II. Section III presents the proposed L-DeepSC. Numerical results are used to verify the performance of the proposed L-DeepSC in Section IV. Finally, Section V concludes this paper.

Notation: $\mathbb{C}^{n\times m}$ and $\mathbb{R}^{n\times m}$ represent the sets of complex and real matrices of size $n\times m$, respectively. Bold-font variables denote matrices or vectors. $x \sim \mathcal{CN}(\mu, \sigma^2)$ means that the variable $x$ follows the circularly-symmetric complex Gaussian distribution with mean $\mu$ and covariance $\sigma^2$. $(\cdot)^T$ and $(\cdot)^H$ denote the transpose and the Hermitian of a vector or a matrix, respectively. $\Re\{\cdot\}$ and $\Im\{\cdot\}$ refer to the real and imaginary parts of a complex number.

Fig. 2. The framework of semantic communications for IoT networks. (a) Proposed distributed semantic communication network (devices, cloud/edge computing, semantic features, model initialization/update). (b) Semantic communication system (source; transmitter with semantic encoder and channel encoder; physical channel; receiver with channel decoder and semantic decoder; recovered feature/source; semantic channel; training dataset).

II. SYSTEM MODEL AND PROBLEM FORMULATION

Text is an important type of source data, which can be sensed from speaking and typing, environmental monitoring, etc. By training DL models with these text data at the cloud/edge platform, DL-based IoT devices gain the capability to understand text data and to generate semantic features that are transmitted to the center to perform intelligent tasks, e.g., intelligent assistants, human emotion understanding, and the adjustment of environmental humidity and temperature based on human preference [29].

As shown in Fig. 2(a), we focus on distributed semantic communications for IoT networks. The considered system consists of various IoT networks with two layers: the cloud/edge platform and the distributed IoT devices. The cloud/edge platform is equipped with huge computation power and large memory, which can be used to train the DL model with the received semantic features. The semantic-communication-enabled IoT devices, which have limited memory and power but are expected to have a long lifetime, e.g., up to 10 years, perform intelligent tasks by understanding the sensed texts. Specifically, our distributed semantic communication system consists of the following three steps:

1) Model Initialization/Update: The cloud/edge platform first trains the semantic communication model with an initial dataset. The trained model is updated in subsequent iterations with the semantic features received from the IoT devices.
2) Model Broadcasting: The cloud/edge platform broadcasts the trained DL model to each IoT device.
3) Semantic Features Upload: The IoT devices constantly capture text data, which are encoded by the proposed semantic transmitter shown in Fig. 2(b). The extracted semantic features are then transmitted to the cloud/edge for model update and subsequent processing.

The aforementioned Questions 1-3 correspond to model initialization/update, semantic feature uploading, and model broadcasting, respectively. Different from traditional information transmission, semantic features can not only be used to recover the text accurately at the semantic level, but can also be exploited as the input of other modules, e.g., emotion classification, dialog systems, and human-robot interaction, for training effect networks and performing various intelligent tasks directly. The devices can also exchange semantic features, which has been discussed in our previous work [17]. Here, we focus on the communication between the cloud/edge platform and local IoT devices to make the semantic communication model affordable.

A. Semantic Communication System

The DeepSC shown in Fig. 2(b) mainly consists of three parts: the transmitter network, the physical channel, and the receiver network, where the transmitter network includes the semantic encoder and the channel encoder, and the receiver network consists of the semantic decoder and the channel decoder.

We assume that the input of the DeepSC is a sentence, $\mathbf{s} = [w_1, w_2, \cdots, w_N]$, where $w_n$ represents the $n$-th word in the sentence. The encoded symbol stream can be represented as

$$\mathbf{X} = C_{\alpha}\left(S_{\beta}(\mathbf{s})\right), \tag{1}$$

where $S_{\beta}(\cdot)$ is the semantic encoder network with parameter set $\beta$ and $C_{\alpha}(\cdot)$ is the channel encoder with parameter set $\alpha$. If $\mathbf{X}$ is sent over a wireless fading channel, the signal received at the receiver is given by

$$\mathbf{Y} = f_{\mathbf{H}}(\mathbf{X}) = \mathbf{H}\mathbf{X} + \mathbf{N}, \tag{2}$$


where $\mathbf{H}$¹ represents the channel gain between the transmitter and the receiver, and $\mathbf{N} \sim \mathcal{CN}(0, \sigma_n^2)$ is additive white Gaussian noise (AWGN).

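The decoded signal can be represented as

$$\hat{\mathbf{s}} = S^{-1}_{\chi}\left(C^{-1}_{\delta}(\mathbf{Y})\right), \tag{3}$$

where $\hat{\mathbf{s}}$ is the recovered sentence, $C^{-1}_{\delta}(\cdot)$ is the channel decoder with parameter set $\delta$, and $S^{-1}_{\chi}(\cdot)$ is the semantic decoder network with parameter set $\chi$.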

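The whole semantic communication system can be trained with the cross-entropy (CE) loss function, which is given by

$$\mathcal{L}_{CE}(\mathbf{s}, \hat{\mathbf{s}}) = \sum_{i=1}\left(q(w_i) - 1\right)\log\left(1 - p(w_i)\right) - \sum_{i=1} q(w_i)\log\left(p(w_i)\right), \tag{4}$$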

where $q(w_i)$ is the true probability that the $i$-th word, $w_i$, appears in the source sentence $\mathbf{s}$, and $p(w_i)$ is the predicted probability that $w_i$ appears in $\hat{\mathbf{s}}$. CE measures the difference between these two distributions. By minimizing the CE loss, the network learns the word distribution, $q(w_i)$, in the source sentence, $\mathbf{s}$. Consequently, the syntax, phrases, and meanings of words in context can be learned by DNNs.
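A minimal NumPy sketch of the loss in (4) follows. It assumes the per-word probabilities $q(w_i)$ and $p(w_i)$ are already available as arrays. The clipping constant `eps` and the toy values are illustrative assumptions and are not part of DeepSC. Written this way, (4) is a per-word binary cross-entropy, so predictions closer to $q$ give a smaller loss.

```python
import numpy as np


def ce_loss(q, p, eps=1e-12):
    """Loss of Eq. (4): sum_i (q_i - 1) log(1 - p_i) - sum_i q_i log(p_i).

    q: true probabilities q(w_i) that each word appears in the source sentence.
    p: predicted probabilities p(w_i) that each word appears in the recovered sentence.
    """
    q = np.asarray(q, dtype=float)
    # Clip predictions away from 0 and 1 so that both logarithms stay finite.
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return np.sum((q - 1.0) * np.log(1.0 - p)) - np.sum(q * np.log(p))


q = np.array([1.0, 0.0, 1.0])           # hypothetical word-occurrence targets
good = ce_loss(q, [0.9, 0.1, 0.8])      # predictions close to q
bad = ce_loss(q, [0.2, 0.7, 0.3])       # predictions far from q
print(good < bad)                       # True
```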

B. Problem Description

Since the input of DeepSC is the sentence $\mathbf{s}$ rather than bits, the learned constellation is no longer limited to a few points. After $\mathbf{X}$ is transmitted, the fading channel makes model training more difficult than the AWGN channel does. Meanwhile, the huge number of parameters in $\alpha$, $\beta$, $\chi$, and $\delta$ reflects the high complexity of the whole model. These factors limit the use of DeepSC in IoT networks and give rise to the aforementioned Questions 1-3, namely training over fading channels, feasible constellation design, and model compression.

1) Training over fading channels: In DL, the training process can be divided into forward-propagation, which predicts the target, and back-propagation, which drives the neural network to converge, as described in the following.

Forward-propagation: To recover semantic information from the received signal, the estimated sentence is given by

$$\hat{\mathbf{s}} = S^{-1}_{\chi}\left(C^{-1}_{\delta}(\mathbf{Y})\right). \tag{5}$$

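Back-propagation: Taking the semantic encoder as an example, its parameter vector at the $t$-th iteration is updated by

$$\beta(t) = \beta(t-1) - \eta\frac{\partial \mathcal{L}_{CE}}{\partial \beta}, \tag{6}$$

where $\eta$ is the learning rate and $\frac{\partial \mathcal{L}_{CE}}{\partial \beta}$ is the gradient. Since $\frac{\partial \mathbf{Y}}{\partial \mathbf{X}} = \mathbf{H}$ from (2), the gradient is computed by

$$\frac{\partial \mathcal{L}_{CE}}{\partial \beta} = \frac{\partial \mathcal{L}_{CE}}{\partial \hat{\mathbf{s}}}\frac{\partial \hat{\mathbf{s}}}{\partial \mathbf{Y}}\frac{\partial \mathbf{Y}}{\partial \mathbf{X}}\frac{\partial \mathbf{X}}{\partial \beta} = \frac{\partial \mathcal{L}_{CE}}{\partial \hat{\mathbf{s}}}\frac{\partial \hat{\mathbf{s}}}{\partial \mathbf{Y}}\mathbf{H}\frac{\partial \mathbf{X}}{\partial \beta}. \tag{7}$$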

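¹Here, we have avoided discussion of complex channels. If the channel $\mathbf{H}$ is complex, its real-valued equivalent is $\bar{\mathbf{H}} = \begin{bmatrix} \Re(\mathbf{H}) & -\Im(\mathbf{H}) \\ \Im(\mathbf{H}) & \Re(\mathbf{H}) \end{bmatrix}$.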

In (7), $\mathbf{H}$ introduces stochasticity into the weight updates. For an AWGN channel, $\mathbf{H} = \mathbf{I}$ does not affect the gradient, so the DL model can reach the global optimum. However, for fading channels, $\mathbf{H}$ is random, which prevents $\beta$ from converging to the global optimum. The forward-propagation in (5), operating at the resulting local optimum, is then unable to recover semantic information accurately. Thus, it is critical to design a training process that mitigates the effects of $\mathbf{H}$, which also makes DeepSC applicable to fading channels.
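The real-valued representation in footnote 1, and its signal counterpart in footnote 2, can be checked numerically. The sketch below uses hypothetical toy sizes. It assumes a vector signal, whose real and imaginary parts are stacked into one column, and confirms that the real-valued model reproduces the complex product $\mathbf{H}\mathbf{x}$ exactly.

```python
import numpy as np


def real_channel(H):
    """Real-valued equivalent [[Re H, -Im H], [Im H, Re H]] of a complex channel (footnote 1)."""
    return np.block([[H.real, -H.imag], [H.imag, H.real]])


def real_signal(x):
    """Stack the real and imaginary parts of a complex vector into one real vector."""
    return np.concatenate([x.real, x.imag])


rng = np.random.default_rng(0)
n_ant = 2                                    # toy number of antennas (assumption)
H = (rng.standard_normal((n_ant, n_ant)) + 1j * rng.standard_normal((n_ant, n_ant))) / np.sqrt(2)
x = rng.standard_normal(n_ant) + 1j * rng.standard_normal(n_ant)

# The real model reproduces the complex product H x.
print(np.allclose(real_channel(H) @ real_signal(x), real_signal(H @ x)))  # True
```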

2) Feasible constellation design: Generally, DL models run on floating-point arithmetic, which means that the input, output, and weights span the large 32-bit floating-point range of $\pm 1.40129 \times 10^{-45}$ to $\pm 3.40282 \times 10^{38}$. Although DeepSC can learn constellations from the source information and channel statistics, the learned constellation points, such as a cluster constellation [30], are scattered irregularly over this range, which places an additional burden on the antenna design of IoT devices. Therefore, it is desirable to form a feasible constellation with only a finite number of points for current radio frequency (RF) systems. In other words, we have to design a smaller constellation for DeepSC.
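The range quoted above is that of IEEE 754 single-precision (32-bit) floating point. The NumPy check below is illustrative only. The paper truncates the smallest magnitude to 1.40129, which rounds to 1.40130 at five decimals. The constellation resolution `m` is a hypothetical value used only for comparison.

```python
import numpy as np

# Largest finite float32 and smallest positive (subnormal) float32.
largest = float(np.finfo(np.float32).max)
smallest = float(np.nextafter(np.float32(0), np.float32(1)))

print(f"{largest:.5e}")    # 3.40282e+38
print(f"{smallest:.5e}")   # 1.40130e-45

m = 4                      # hypothetical constellation resolution in bits
print(2 ** m, "points versus", 2 ** 32, "bit patterns of a float32")
```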

3) Model communication: The more parameters DeepSC has, the stronger its signal processing capability. However, more parameters also increase the computational complexity and model size, resulting in high power consumption. In the distributed DeepSC system, the trained DeepSC model deployed at local IoT devices is frequently updated to perform intelligent tasks better, while IoT applications limit the bandwidth and cost available for distributing the DeepSC model. Furthermore, to extend the IoT network lifetime, especially the battery lifetime, most local devices have limited storage and computation capability, which limits the size of DeepSC. Therefore, compressing DeepSC not only reduces the latency of model transmission between the cloud/edge platform and local devices but also makes it possible to run the DL model on local devices.
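As a rough illustration of why both pruning and lower precision matter when broadcasting the model, the helper below computes an idealized storage ratio. It is a sketch that rests on assumptions. It ignores the indices a sparse format must store and any per-layer metadata. It should therefore not be read as the accounting behind the compression ratios reported for L-DeepSC.

```python
def compression_ratio(sparsity, bits, full_bits=32):
    """Idealized size reduction when a fraction `sparsity` of the weights is pruned
    and each surviving weight is stored with `bits` bits instead of `full_bits`.

    Assumption: the index/metadata overhead of sparse storage is ignored.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must lie in [0, 1)")
    return full_bits / (bits * (1.0 - sparsity))


print(compression_ratio(0.0, 8))   # 4.0: quantization only, 32-bit to 8-bit
print(compression_ratio(0.5, 8))   # 8.0: half the weights pruned, the rest at 8 bits
```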

III. PROPOSED LITE DISTRIBUTED SEMANTIC COMMUNICATION SYSTEM

To address the challenges identified in Section II, we propose a lite distributed semantic communication system, named L-DeepSC. We analyze the effects of CSI on model training over fading channels and design a CSI-aided training process to overcome the fading effects, which deals with Question 1. Besides, weight pruning and quantization are investigated to address Question 3. Finally, our finite-point constellation design addresses Question 2.

A. Deep De-noise Network based CSI Refinement and Cancellation

The most common method to reduce the effects of fading channels in wireless communication is to exploit the known properties of a communication link, i.e., CSI. Similarly, CSI can also reduce the channel impacts when training L-DeepSC. Next, we first analyze the role of CSI in L-DeepSC training.


In order to simplify the analysis, we assume that the transmitter and the receiver each consist of a single dense layer with sigmoid activation, where the transmitter has an additional untrainable embedding layer and the receiver has an untrainable de-embedding layer. The IoT devices hold the trained transmitter model and the cloud/edge platform works as the receiver, as shown in the system model in Fig. 2. The IoT devices and the cloud/edge platform are equipped with the same number of antennas. After the embedding layer, the source message, $\mathbf{s}$, is embedded into $\mathbf{S}$. Then, the IoT devices encode $\mathbf{S}$ into

$$\mathbf{X} = \sigma\left(\mathbf{W}_T\mathbf{S} + \mathbf{b}_T\right), \tag{8}$$

where $\mathbf{X}$² denotes the semantic features transmitted from the IoT devices to the cloud/edge platform, $\mathbf{W}_T$ and $\mathbf{b}_T$ are the trainable parameters that extract the features from the source message $\mathbf{s}$, and $\sigma(\cdot)$ is the sigmoid activation function.

The received symbol at the cloud/edge platform is affected by the channel $\mathbf{H}$ and AWGN as in (2). From the received symbol, the cloud/edge platform recovers the embedding matrix by

$$\hat{\mathbf{S}} = \sigma\left(\mathbf{W}_R\mathbf{Y} + \mathbf{b}_R\right), \tag{9}$$

from which the estimated source message, $\hat{\mathbf{s}}$, is obtained after the de-embedding layer. $\mathbf{W}_R$ and $\mathbf{b}_R$ learn to recover $\mathbf{s}$. The L-DeepSC can be optimized with the loss function in (4). Fading channels not only contaminate the gradients in back-propagation but also restrict the representation power in forward-propagation.

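Back-propagation: The parameter $\mathbf{W}_T$ is updated with its gradient

$$\frac{\partial \mathcal{L}_{CE}(\hat{\mathbf{s}}, \mathbf{s})}{\partial \mathbf{W}_T} = \left(\mathbf{F}_R\mathbf{W}_R\mathbf{H}\mathbf{F}_T\right)^T\nabla_{\hat{\mathbf{s}}}\mathcal{L}_{CE}(\hat{\mathbf{s}}, \mathbf{s})\,\mathbf{s}^T, \tag{10}$$

where $\mathbf{F}_R \sim \operatorname{diag}\left(\sigma'(\mathbf{W}_R\mathbf{y} + \mathbf{b}_R)\right)$ and $\mathbf{F}_T \sim \operatorname{diag}\left(\sigma'(\mathbf{W}_T\mathbf{s} + \mathbf{b}_T)\right)$. In (10), $\mathbf{H}$ is untrainable and random, and it therefore perturbs the gradient. If the transmitter consists of very deep neural networks, this gradient contamination affects the back-propagation of the whole transmitter network.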
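To see where $\mathbf{H}$ enters the gradient, the sketch below implements the one-layer transmitter and receiver of (8), (9), and (2) in NumPy, evaluates (10), and compares the result with central finite differences. It makes several assumptions: a real-valued channel, a single column $\mathbf{s}$, and a squared-error loss in place of the CE loss of (4), so that $\nabla_{\hat{\mathbf{s}}}\mathcal{L} = \hat{\mathbf{s}} - \text{target}$. All sizes and values are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(1)
d = 4                                        # embedding size = antennas (toy assumption)
s = rng.standard_normal(d)                   # embedded source message, one column
target = rng.uniform(size=d)                 # hypothetical target for the receiver output
W_T, b_T = rng.standard_normal((d, d)), rng.standard_normal(d)
W_R, b_R = rng.standard_normal((d, d)), rng.standard_normal(d)
H = rng.standard_normal((d, d))              # one random real-valued channel realization
n = 0.1 * rng.standard_normal(d)             # one fixed noise realization


def forward(W):
    x = sigmoid(W @ s + b_T)                 # Eq. (8)
    y = H @ x + n                            # Eq. (2)
    s_hat = sigmoid(W_R @ y + b_R)           # Eq. (9)
    return x, s_hat


def loss(W):
    # Squared error stands in for the CE loss of Eq. (4) (assumption).
    return 0.5 * np.sum((forward(W)[1] - target) ** 2)


x, s_hat = forward(W_T)
F_T = np.diag(x * (1.0 - x))                 # sigma'(W_T s + b_T), as sigma' = sigma (1 - sigma)
F_R = np.diag(s_hat * (1.0 - s_hat))         # sigma'(W_R y + b_R)
grad_s_hat = s_hat - target                  # gradient of the squared error w.r.t. s_hat
grad_eq10 = np.outer((F_R @ W_R @ H @ F_T).T @ grad_s_hat, s)   # Eq. (10)

# Central finite differences as an independent check.
eps = 1e-6
grad_fd = np.zeros_like(W_T)
for i in range(d):
    for j in range(d):
        E = np.zeros_like(W_T)
        E[i, j] = eps
        grad_fd[i, j] = (loss(W_T + E) - loss(W_T - E)) / (2 * eps)

print(np.allclose(grad_eq10, grad_fd, atol=1e-6))   # True
```

Re-running with a new draw of `H` changes `grad_eq10` even though `s`, `target`, and all weights stay fixed. This is the perturbation described above.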

Forward-propagation: With the received signal $\mathbf{Y}$, the source messages can be recovered by

$$\hat{\mathbf{S}} = \sigma\left(\mathbf{W}_R\mathbf{Y} + \mathbf{b}_R\right) = \sigma\left(\mathbf{W}_R\mathbf{H}\mathbf{X} + \mathbf{W}_R\mathbf{N} + \mathbf{b}_R\right). \tag{11}$$

In (11), $\mathbf{W}_R$ has to learn to deal with the channel effects and to decode at the same time, which increases the training burden and reduces the network's expression capability. Meanwhile, when the L-DeepSC receiver has multiple layers, errors caused by channel effects also propagate to the subsequent layers.

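The impacts of the channel can be mitigated by exploiting CSI at the cloud/edge. If the channel $\mathbf{H}$ is known, the received symbol can be processed by

$$\tilde{\mathbf{Y}} = \left(\mathbf{H}^H\mathbf{H}\right)^{-1}\mathbf{H}^H\mathbf{Y} = \mathbf{X} + \tilde{\mathbf{N}}, \tag{12}$$

where $\tilde{\mathbf{N}} = \left(\mathbf{H}^H\mathbf{H}\right)^{-1}\mathbf{H}^H\mathbf{N}$. In (12), the channel effect is transformed from multiplicative noise into additive noise, $\tilde{\mathbf{N}}$, which enables stable back-propagation as well as stronger network representation capability. With (12), back-propagation and forward-propagation can be performed by setting $\mathbf{H} = \mathbf{I}$ in (10) and (11), respectively. Therefore, the multiplicative channel effect is completely removed, leaving only the additive noise $\tilde{\mathbf{N}}$.

²Here, we have avoided discussion of complex signals. If the signal $\mathbf{X}$ is complex, its real-valued equivalent is $\bar{\mathbf{X}} = [\Re(\mathbf{X}), \Im(\mathbf{X})]$.

Fig. 3. The proposed CSI refinement and cancellation based on de-noise neural networks. (Blocks: pilot and data inputs, estimator, $\mathbf{H}_{\text{rough}}$, ADNet, $\mathbf{H}_{\text{refine}}$, channel cancellation, receiver.)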
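A minimal NumPy sketch of the channel cancellation in (12) follows. It assumes perfect knowledge of a real-valued, full-rank $\mathbf{H}$, and the toy sizes are hypothetical. `np.linalg.solve` is used instead of forming the inverse explicitly.

```python
import numpy as np

rng = np.random.default_rng(2)
n_ant, length = 4, 8                          # antennas and symbols per block (toy sizes)
X = rng.standard_normal((n_ant, length))      # transmitted semantic features
H = rng.standard_normal((n_ant, n_ant))       # channel, assumed perfectly known here
N = 0.05 * rng.standard_normal((n_ant, length))
Y = H @ X + N                                 # Eq. (2)

H_herm = H.conj().T                           # Hermitian; equals the transpose for real H
G = np.linalg.solve(H_herm @ H, H_herm)       # (H^H H)^{-1} H^H without an explicit inverse
Y_tilde = G @ Y                               # Eq. (12)
N_tilde = G @ N                               # transformed, purely additive noise

print(np.allclose(Y_tilde, X + N_tilde))      # True
```

Because `G` amplifies the noise when `H` is ill-conditioned, $\tilde{\mathbf{N}}$ can be noticeably stronger than $\mathbf{N}$. The identity in (12) holds regardless.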

The above discussion shows the importance of CSI in model training. However, CSI can generally only be estimated, e.g., by least-squares (LS), linear minimum mean-squared error (LMMSE), or minimum mean-squared error (MMSE) estimators. Because they exploit prior channel statistics, LMMSE and MMSE estimators usually perform better than the LS estimator. On the other hand, LMMSE and MMSE estimators are sensitive to the accuracy of the channel statistics, while the LS estimator requires no prior channel information.

For simplicity, we initially use the LS estimator. Then, as in [31] and as shown in Fig. 3, we adopt a deep de-noise network to improve the accuracy of the LS estimate. Specifically, the rough CSI estimated by the LS estimator with a few pilots is first denoted by

$$\mathbf{H}_{\text{rough}} = \mathbf{H} + \mathbf{N}. \tag{13}$$
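The additive structure of (13) follows from the standard LS estimate with known pilots. The sketch below assumes orthogonal DFT pilots and as many pilot slots as antennas. Both are hypothetical choices, since the section does not specify the pilot design. It shows that the LS error is the noise projected onto the pilots.

```python
import numpy as np

rng = np.random.default_rng(3)
n_ant = 4                                     # antennas; also the number of pilot slots (assumption)
sigma = 0.1                                   # noise standard deviation (assumption)


def crandn(shape):
    """Circularly-symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


H = crandn((n_ant, n_ant))
# Hypothetical orthogonal pilots: a DFT matrix, so that P P^H = n_ant * I.
k = np.arange(n_ant)
P = np.exp(-2j * np.pi * np.outer(k, k) / n_ant)
noise = sigma * crandn((n_ant, n_ant))
Y_p = H @ P + noise                           # received pilot block

# LS estimate: H_rough = Y_p P^H (P P^H)^{-1} = H + noise P^H (P P^H)^{-1}
P_herm = P.conj().T
H_rough = Y_p @ P_herm @ np.linalg.inv(P @ P_herm)
error = H_rough - H                           # the additive term written as N in Eq. (13)

print(np.allclose(error, noise @ P_herm / n_ant))   # True, since (P P^H)^{-1} = I / n_ant
```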

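From (13), $\mathbf{H}_{\text{rough}}$ consists of the exact $\mathbf{H}$ and the noise, $\mathbf{N}$. De-noise neural networks are used to recover $\mathbf{H}$ more accurately from $\mathbf{H}_{\text{rough}}$ by treating $\mathbf{H}$ and $\mathbf{H}_{\text{rough}}$ as the original picture and the noisy picture, respectively. Here, we exploit the attention-guided denoising convolutional neural network (ADNet) [32] to refine the CSI, where the refined CSI, $\mathbf{H}_{\text{refine}}$, is given by

$$\mathbf{H}_{\text{refine}} = \operatorname{ADNet}\left(\mathbf{H}_{\text{rough}}\right). \tag{14}$$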

In (14), $\operatorname{ADNet}(\cdot)$ is trained with the loss function $\mathcal{L}(\mathbf{H}_{\text{refine}}, \mathbf{H}) = \frac{1}{2}\left\|\mathbf{H}_{\text{refine}} - \mathbf{H}\right\|_F^2$. Since the performance of the LS estimator is similar to that of the LMMSE and MMSE estimators in the high SNR region, we pay more attention to the low SNR region when training ADNet. With proper training, ADNet can mitigate the impact of noise without any prior channel information, especially in the low SNR region. Such a design provides a good solution to Question 1.
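ADNet itself is a deep CNN and is not reproduced here. As a deliberately simple stand-in, the sketch below fits a single scalar gain by minimizing the same loss, $\frac{1}{2}\|\mathbf{H}_{\text{refine}} - \mathbf{H}\|_F^2$, on simulated $(\mathbf{H}_{\text{rough}}, \mathbf{H})$ pairs. It only illustrates that a denoiser learned from data, without being given the channel statistics, lowers this loss relative to the raw LS estimate at low SNR. The data model and all values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
sigma2 = 0.5                                  # LS error variance, a low-SNR toy setting (assumption)
n_samples, n_ant = 2000, 4
H = rng.standard_normal((n_samples, n_ant, n_ant))                # unit-variance real channels
H_rough = H + np.sqrt(sigma2) * rng.standard_normal(H.shape)      # Eq. (13)


def frob_loss(H_est, H_true):
    """0.5 * ||H_est - H_true||_F^2, averaged over the training samples."""
    return 0.5 * np.mean(np.sum((H_est - H_true) ** 2, axis=(1, 2)))


# Stand-in for ADNet (NOT ADNet): one scalar gain a, fitted on (H_rough, H) pairs.
# This closed form is the exact minimizer of frob_loss(a * H_rough, H) over a.
a = np.sum(H * H_rough) / np.sum(H_rough ** 2)
H_refine = a * H_rough

print(a)                                                # close to 1 / (1 + sigma2)
print(frob_loss(H_refine, H) < frob_loss(H_rough, H))   # True
```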

B. Model Compression

By applying CSI to model training, the cloud/edge platform can extract the semantic features from L-DeepSC. However, the trained L-DeepSC model is still large and complex, which causes high latency when the cloud/edge platform broadcasts the updated L-DeepSC. Both weight pruning and quantization can reduce the model size and complexity. Therefore, we compress the DeepSC model with a joint pruning-quantization scheme to make it affordable for IoT devices. As shown in Fig. 4, the original weights are first pruned at a high-precision level by identifying and removing unnecessary weights, which makes the network sparse. Quantization is then used to convert the trained L-DeepSC model to a low-precision level. The proposed network sparsification and quantization address Question 3 and are introduced in detail in the following.

$$
\text{(a)}\ \begin{bmatrix} 0.88 & 0.19 & 0.35 & -2.34 \\ -1.08 & 0.55 & 0.93 & -0.97 \\ 0.53 & 0.41 & 0.32 & -0.49 \\ -0.79 & -0.84 & -1.27 & 0.24 \end{bmatrix}
\quad
\text{(b)}\ \begin{bmatrix} 0.88 & 0 & 0 & -2.34 \\ -1.08 & 0.55 & 0.93 & -0.97 \\ 0.53 & 0 & 0 & 0 \\ -0.79 & -0.84 & -1.27 & 0 \end{bmatrix}
\quad
\text{(c)}\ \begin{bmatrix} 1 & 0 & 0 & -1 \\ -1 & 1 & 1 & -1 \\ 1 & 0 & 0 & 0 \\ -1 & -1 & -1 & 0 \end{bmatrix}
$$

Fig. 4. Flowchart of the proposed joint pruning-quantization: (a) the original weight matrix; (b) the weights after pruning, where the example pruning function is $x = 0$ for $|x| < 0.5$; (c) the weights after quantization, where the example quantization function is $x = \operatorname{sign}(x)$.
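The toy example of Fig. 4 can be reproduced in a few lines of NumPy. Panel (b) keeps $-2.34$ and zeroes $-0.49$, so the pruning rule must act on magnitudes ($|x| < 0.5$). The literal condition $x < 0.5$ would also zero every negative weight.

```python
import numpy as np

W = np.array([[ 0.88,  0.19,  0.35, -2.34],
              [-1.08,  0.55,  0.93, -0.97],
              [ 0.53,  0.41,  0.32, -0.49],
              [-0.79, -0.84, -1.27,  0.24]])   # Fig. 4(a)

W_pruned = np.where(np.abs(W) < 0.5, 0.0, W)   # Fig. 4(b): zero out weights with |x| < 0.5
W_quant = np.sign(W_pruned)                    # Fig. 4(c): x = sign(x), with sign(0) = 0
print(W_quant)
```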

1) Network Sparsification: A proper criterion for disabling neural connections is important. Intuitively, connections with small weight magnitudes can be pruned. Therefore, the pruning problem reduces to setting a proper pruning threshold.

As shown in Fig. 2(b), the DeepSC consists of the neural networks with parameters $\alpha$, $\beta$, $\chi$, and $\delta$, each of which includes multiple layers. Assume there are $N$ layers in total in the pre-trained DeepSC model, with $W^{(n)}_{i,j}$ being the weight of the connection between the $i$-th neuron of the $(n+1)$-th layer and the $j$-th neuron of the $n$-th layer. With a pruning threshold $w_{\text{thre}}$, the model weights can be pruned by

$$W^{(n)}_{i,j} = \begin{cases} W^{(n)}_{i,j}, & \text{if } \left|W^{(n)}_{i,j}\right| > w_{\text{thre}}, \\ 0, & \text{otherwise}. \end{cases} \tag{15}$$

We determine the pruning threshold by

wthre =sM×γ,(16)

where s= sort W(1) ,W(2),· · · ,W(N), is the sorted

weights value from least important one to the most important

one, Mis the total number of connections, and γ, the sparsity

ratio between 0 and 1, indicates the proportion of zero values

in weights. The weight pruning can be divided into two steps,

weight pruning to disable some neuron connections and ﬁne-

tine to recover the accuracy, as shown in Algorithm 1.
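The pruning rule (15)-(16) and steps 1 to 6 of Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes that importance is measured by weight magnitude (as the example in Fig. 4 suggests), rounds $M\gamma$ up to an integer index, and omits the fine-tuning step. The function name `magnitude_prune` and the toy layers are hypothetical.

```python
import numpy as np

def magnitude_prune(weights, gamma):
    """Global magnitude pruning in the spirit of (15)-(16).

    weights: list of per-layer arrays W^(1), ..., W^(N)
    gamma:   sparsity ratio in [0, 1]
    """
    # s: magnitudes of all M connections across the N layers, sorted ascending
    s = np.sort(np.concatenate([np.abs(w).ravel() for w in weights]))
    M = s.size
    # Assumption: M * gamma is rounded up to an integer (1-based) index into s
    k = int(np.ceil(M * gamma))
    if k == 0:
        return [w.copy() for w in weights]      # gamma = 0 prunes nothing
    w_thre = s[k - 1]                           # threshold (16)
    # (15): keep a connection only if its magnitude exceeds the threshold
    return [np.where(np.abs(w) > w_thre, w, 0.0) for w in weights]

# Toy example with two hypothetical layers (M = 8 connections)
layers = [np.array([[0.53, -0.2], [0.1, -1.27]]),
          np.array([0.9, -0.05, 0.3, 0.7])]
pruned = magnitude_prune(layers, gamma=0.5)
sparsity = sum(int((p == 0).sum()) for p in pruned) / 8
```

With $\gamma = 0.5$, the four smallest of the eight magnitudes (0.05, 0.1, 0.2, 0.3) are zeroed, so half of the connections are disabled while the large negative weight $-1.27$ survives.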

2) Network Quantization: Quantization includes weight quantization and activation quantization. The weights $W^{(n)}_{i,j}$ of a trained model can be converted from 32-bit floating point to $m$-bit integers by applying the quantization function

$$\tilde{W}^{(n)}_{i,j} = \mathrm{round}\left(q_w \left(W^{(n)}_{i,j} - \min W^{(n)}\right)\right), \tag{17}$$

Algorithm 1 Network Sparsification.
Input: The pre-trained weights $W$, the sparsity ratio $\gamma$.
Output: The pruned weights $W_{\text{pruned}}$.
1: Count the total number of connections, $M$.
2: Sort all connections from small to large to obtain $s$.
3: Obtain the threshold $w_{\text{thre}}$ by (16) with $M$ and $\gamma$.
4: for $n = 1$ to $N$ do
5:   Prune the connections by (15) to obtain $W^{(n)}_{\text{pruned}}$.
6: end for
7: Fine-tune the pruned model with loss function (4).

Algorithm 2 Network Quantization.
Input: The pre-trained weights $W$, the quantization level $m$, the correlation coefficient $c$, and $K$ batches of calibration data.
Output: The quantized weights $W_{\text{quantized}}$ and the activation ranges $x_{\min}$ and $x_{\max}$.
1: Phase 1: Weight Quantization.
2: for $n = 1$ to $N$ do
3:   Compute the range of the weights, $\max W^{(n)}$ and $\min W^{(n)}$.
4:   Quantize the weights by (17) to obtain $\tilde{W}^{(n)}$.
5: end for
6: Phase 2: Activation Quantization.
7: for $t = 1$ to $K$ do
8:   for $n = 1$ to $N$ do
9:     Update the dynamic range of the activations, $x^{(n)}_{\min}(t)$ and $x^{(n)}_{\max}(t)$, by (19) and (20).
10:  end for
11: end for
12: Quantize the activations by (21).
13: Fine-tune the quantized model with the STE and loss function (4).

where $q_w$ is the scale factor that maps the dynamic range of the floating-point weights to an $m$-bit integer, given by

$$q_w = \frac{2^m - 1}{\max W^{(n)} - \min W^{(n)}}. \tag{18}$$

For activation quantization, the results of matrix multiplication are stored in accumulators. Due to the limited dynamic range of integer formats, the accumulator may overflow quickly if the weights and activations use the same bit-width. Therefore, accumulators are usually implemented with higher bit-widths, for example, INT32 += INT8 × INT8. Besides, the range of the activations is dynamic and depends on the input data. Therefore, the activation outputs have to be re-quantized into $m$-bit integers for the subsequent calculation. Unlike the weights, which are constant, the activation outputs usually include statistical outliers, which expand the actual dynamic range. For example, even if 99% of the data lie between $-100$ and $100$, a single outlier at 10,000 extends the dynamic range to $[-100, 10{,}000]$, which significantly reduces the mapping resolution.

SUBMIT TO IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS 7

To reduce the influence of the outliers, an exponential moving average (EMA) is applied as follows:

$$x^{(n)}_{\min}(t+1) = (1-c)\,x^{(n)}_{\min}(t) + c \min X^{(n)}(t), \tag{19}$$

and

$$x^{(n)}_{\max}(t+1) = (1-c)\,x^{(n)}_{\max}(t) + c \max X^{(n)}(t), \tag{20}$$

where $x^{(n)}_{\min}(t+1)$ and $x^{(n)}_{\max}(t+1)$ define the range used for activation quantization, initialized as $x^{(n)}_{\min}(1) = \min X^{(n)}(1)$ and $x^{(n)}_{\max}(1) = \max X^{(n)}(1)$; $X^{(n)}(t)$ is the activation output of the $n$th layer for the $t$th batch of data; and $c \in [0, 1)$ is the correlation coefficient between the current and past values of $x^{(n)}_{\min}$ and $x^{(n)}_{\max}$. In this way, the effects of outliers are mitigated by the past normal values. After $t+1$ calibration steps, $x^{(n)}_{\min}$ and $x^{(n)}_{\max}$ are fixed to $x^{(n)}_{\min}(t+1)$ and $x^{(n)}_{\max}(t+1)$. Then, the activation outputs can be quantized by

X(n)=clamp round qxX(n)−x(n)

min;−M, M ,

(21)

where qa= (2m−1)/(x(n)

max −x(n)

min)is the scale-factor and

clamp (·)is used to eliminate the quantized outliers, which is

given by

clamp X(n);−T , T = min max X(n),−T, T ,(22)

where T= 2m−1, which is the border of the m-bits integer

format.
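The calibration phase of Algorithm 2 and the clamped quantization in (21)-(22) can be sketched for a single layer as follows. The sketch assumes NumPy, reads the clamping border as $T = 2^m - 1$, and uses hypothetical toy batches in which the second batch contains an outlier (10.0); the helper names are also hypothetical.

```python
import numpy as np

def calibrate_range(batches, c):
    """EMA tracking of one layer's activation range, following (19)-(20).

    batches: list of activation outputs X(1), ..., X(K)
    c:       correlation coefficient in [0, 1)
    """
    x_min, x_max = float(batches[0].min()), float(batches[0].max())  # values at t = 1
    for X in batches:                                   # t = 1, ..., K
        x_min = (1 - c) * x_min + c * float(X.min())    # (19)
        x_max = (1 - c) * x_max + c * float(X.max())    # (20)
    return x_min, x_max

def quantize_activations(X, x_min, x_max, m):
    """Quantize activations with (21) and clamp them with (22)."""
    T = 2 ** m - 1                          # assumption: T = 2^m - 1
    q_x = (2 ** m - 1) / (x_max - x_min)    # scale factor q_x
    X_q = np.round(q_x * (X - x_min))
    return np.clip(X_q, -T, T), q_x         # np.clip computes min(max(X, -T), T)

batches = [np.array([-1.0, 2.0]), np.array([-3.0, 10.0]), np.array([-1.0, 2.0])]
x_min, x_max = calibrate_range(batches, c=0.5)
X_q, q_x = quantize_activations(np.array([-1.5, 0.0, 4.0, 10.0]), x_min, x_max, m=4)
```

Plain min/max tracking would set $x_{\max} = 10$, whereas the EMA with $c = 0.5$ settles at $x_{\max} = 4$, so the outlier is clamped to the border $T = 15$ instead of stretching the whole mapping.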

As shown in Algorithm 2, network quantization includes two phases: i) weight quantization; ii) activation quantization. In phase 1, the weights of each layer are quantized by (17) directly. In phase 2, a calibration process runs a few calibration batches to collect the activation statistics. In each batch, $x^{(n)}_{\min}(t)$ and $x^{(n)}_{\max}(t)$ are updated based on the activation statistics from the previous batches. These quantization steps may lead to slight accuracy degradation, so quantization-aware training (QAT) is required to re-train the model and minimize the accuracy loss. Since the rounding operation is not differentiable, the straight-through estimator (STE) is used to estimate the gradient of the quantized weights in back-propagation [33].
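To illustrate how the STE enables QAT, the following toy sketch trains a three-weight linear model under an MSE loss while the forward pass uses weights quantized by (17)-(18) and immediately dequantized. It is not the L-DeepSC training loop: the linear model, the learning rate, the ground-truth weights, and the helper names `fake_quantize` and `qat_linear` are assumptions made for illustration only.

```python
import numpy as np

def fake_quantize(w, m):
    """Quantize with (17)-(18), then dequantize right away (simulated quantization)."""
    w_min, w_max = float(w.min()), float(w.max())
    if w_max == w_min:
        return w.copy()
    q_w = (2 ** m - 1) / (w_max - w_min)
    return np.round(q_w * (w - w_min)) / q_w + w_min

def qat_linear(X, y, m=4, lr=0.1, steps=300, seed=0):
    """Toy QAT of a linear model y ~ X w under an MSE loss, using the STE."""
    w = np.random.default_rng(seed).normal(size=X.shape[1])  # full-precision weights
    for _ in range(steps):
        w_hat = fake_quantize(w, m)                     # forward pass sees quantized weights
        grad = 2.0 * X.T @ (X @ w_hat - y) / len(y)     # dL/dw_hat
        # STE: round() has zero gradient almost everywhere, so dw_hat/dw is
        # approximated by the identity and dL/dw_hat updates w directly.
        w = w - lr * grad
    return w, fake_quantize(w, m)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5])      # hypothetical ground-truth weights
w_float, w_quant = qat_linear(X, y)
mse = float(np.mean((X @ w_quant - y) ** 2))
```

The full-precision weights `w_float` keep accumulating small updates that rounding alone would erase; only their quantized copy `w_quant` would be deployed.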

C. Constellation Design with Fewer Quantization Bits

The cloud/edge platform can further reduce the size of L-DeepSC by model compression after the model is trained, which not only significantly reduces the latency of broadcasting the updated DeepSC to IoT devices, but also turns DeepSC into the low-complexity L-DeepSC. However, the antennas of IoT devices cannot generate high-resolution waveforms; in other words, they cannot support a large number of closely spaced constellation points.

TABLE I
THE SETTING OF L-DEEPSC RECEIVER.
Receiver (Decoder)
Layer Name | Units | Activation
Dense 1 | 128 | ReLU
Dense 2 | 512 | ReLU
Dense 3 | 128 | None
LayerNorm | None | None
4×Transformer Decoder | 128 (8 heads) | None
Prediction Layer | Dictionary Size | Softmax

Different from bits, the source message $s$ is more complicated, and the learned constellation is not limited to a few points, which brings an additional burden on antenna design. Besides, DL models generally run in FP32, which also expands the range of the constellation. Thus, we aim to reduce the size of the learned constellation without degrading performance, where $X$, the activation output of the last layer at the local IoT devices, is the learned constellation. Inspired by network quantization, we convert the learned high-resolution constellation into a low-resolution one with few points. Specifically, we use two-stage quantization to narrow the range of the constellation, which is represented by

$$X_{\text{dequantize}} = \frac{X_{\text{quantize}}}{q_x} + x_{\min}, \tag{23}$$

where $X_{\text{quantize}}$ is the quantized $X$ from (21), $q_x$ is the scale factor, $x_{\min}$ is obtained by (19), and $X_{\text{dequantize}}$ is the dequantized $X$.

First, we quantize $X$ into $m$-bit integers so that $X$ is restricted to $2^m$ levels. For example, when $m = 8$, the constellation is reduced to 256 levels. Then, $X_{\text{quantize}}$ is dequantized to restore the scale of $X$. Such an $X_{\text{dequantize}}$ has a distribution similar to that of $X$ but with fewer constellation points, which helps simplify the antenna design at the receiver while preserving the performance as much as possible, and therefore provides the solution to Question 2.
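The two-stage constellation quantization, (21) followed by (23), can be sketched as below. The Gaussian samples are a hypothetical stand-in for the encoder output $X$, the range is taken from the data itself rather than from the EMA of (19)-(20), $T = 2^m - 1$ is our reading of the clamping border, and pairing consecutive real values into I/Q components is an assumption about how the real outputs become complex symbols.

```python
import numpy as np

def two_stage_constellation(X, x_min, x_max, m):
    """Stage 1: quantize X into m-bit integers by (21)-(22).
    Stage 2: dequantize back to the original scale by (23)."""
    T = 2 ** m - 1                                             # assumption: T = 2^m - 1
    q_x = (2 ** m - 1) / (x_max - x_min)
    X_quantize = np.clip(np.round(q_x * (X - x_min)), -T, T)   # (21)
    return X_quantize / q_x + x_min                            # (23)

rng = np.random.default_rng(0)
X = rng.normal(size=20000)                   # hypothetical stand-in for the encoder output
m = 4
X_deq = two_stage_constellation(X, X.min(), X.max(), m)
# Assumption: consecutive real outputs are paired into I/Q components
symbols = X_deq[0::2] + 1j * X_deq[1::2]
```

Each real dimension now takes at most $2^m = 16$ values, so the paired symbols lie on a grid of at most 256 points, while every value moves by no more than half a quantization step.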

In summary, by exploiting the solutions to the aforementioned questions, we develop a lite distributed semantic communication system, named L-DeepSC, which can reduce the latency of model exchange under limited bandwidth, run the models on IoT devices with low power consumption, and deal with the distortion from fading channels when uploading semantic features. As a result, the proposed L-DeepSC is a good candidate for IoT networks.

IV. NUMERICAL RESULTS

In this section, we compare the proposed L-DeepSC with traditional methods under different fading channels, including Rayleigh and Rician fading channels. Weight pruning and quantization are also verified under fading channels. For the Rayleigh fading channel, the channel coefficient follows $\mathcal{CN}(0, 1)$; for the Rician fading channel, it follows $\mathcal{CN}(\mu, \sigma^2)$ with $\mu = \sqrt{k/(k+1)}$ and $\sigma = \sqrt{1/(k+1)}$, where $k$ is the Rician coefficient and we use $k = 2$ in our simulation.
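The two channel models can be sampled as follows, assuming a zero-phase LOS component. With $\mu = \sqrt{k/(k+1)}$ and $\sigma = \sqrt{1/(k+1)}$, both channels have unit average power because $\mu^2 + \sigma^2 = 1$; the function names are hypothetical.

```python
import numpy as np

def rayleigh(n, rng):
    """n samples of h ~ CN(0, 1): real and imaginary parts are N(0, 1/2)."""
    return (rng.normal(size=n) + 1j * rng.normal(size=n)) / np.sqrt(2)

def rician(n, k, rng):
    """n samples of h ~ CN(mu, sigma^2) with mu = sqrt(k/(k+1)), sigma = sqrt(1/(k+1))."""
    mu = np.sqrt(k / (k + 1))       # line-of-sight (LOS) component, assumed zero phase
    sigma = np.sqrt(1 / (k + 1))    # standard deviation of the scattered component
    return mu + sigma * rayleigh(n, rng)

rng = np.random.default_rng(0)
h_ray = rayleigh(200_000, rng)
h_ric = rician(200_000, k=2, rng=rng)   # k = 2 as in the simulations
```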

The transmitter of L-DeepSC is the same as that of DeepSC in [17]. The parameters of the decoding network at the receiver for the fading channels are shown in Table I, where the sum of the outputs of Dense 1 and Dense 3 is the input of the LayerNorm layer.
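The wiring of the first layers in Table I, where the outputs of Dense 1 and Dense 3 are summed before LayerNorm, can be sketched in NumPy. The random initialization, the input dimension `d_in = 16`, the input shape, and the LayerNorm without learnable scale and shift are all assumptions; the Transformer decoder layers and the prediction layer are omitted.

```python
import numpy as np

def dense(x, w, b, relu):
    y = x @ w + b
    return np.maximum(y, 0.0) if relu else y

def layer_norm(x, eps=1e-6):
    # LayerNorm without learnable scale/shift (a simplification)
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def init_dense(rng, d_in, d_out):
    return rng.normal(size=(d_in, d_out)) / np.sqrt(d_in), np.zeros(d_out)

def receiver_front_end(y, params):
    h1 = dense(y, *params["dense1"], relu=True)    # Dense 1: 128 units, ReLU
    h2 = dense(h1, *params["dense2"], relu=True)   # Dense 2: 512 units, ReLU
    h3 = dense(h2, *params["dense3"], relu=False)  # Dense 3: 128 units, no activation
    return layer_norm(h1 + h3)                     # Dense 1 + Dense 3 feeds LayerNorm

rng = np.random.default_rng(0)
d_in = 16                                          # hypothetical received-symbol dimension
params = {"dense1": init_dense(rng, d_in, 128),
          "dense2": init_dense(rng, 128, 512),
          "dense3": init_dense(rng, 512, 128)}
received = rng.normal(size=(2, 30, d_in))          # hypothetical (batch, symbols, features)
features = receiver_front_end(received, params)    # would feed the 4 Transformer decoders
```

The skip connection from Dense 1 lets the 128-unit features bypass the 512-unit expansion, in the same way as the residual paths inside the Transformer decoder layers.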

[Fig. 5 panels: (a) Full-resolution constellation; (b) 4-bit constellation]
Fig. 5. The comparison between the full-resolution constellation and the 4-bit constellation.

The adopted dataset is the proceedings of the European Parliament [34], which consists of around 2.0 million sentences and 53 million words. The dataset is pre-processed into sentences of 4 to 30 words and split into training and testing data with a ratio of 0.1. The benchmark approach is based on separate source coding and channel coding technologies, which adopts variable-length coding (Huffman coding) and fixed-length coding (5-bit) for source coding, Reed-Solomon (RS) coding [35] for channel coding, and quadrature amplitude modulation (QAM). The bilingual evaluation understudy (BLEU) score is used to measure the performance [36].
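For reference, the source-coding part of the benchmark can be sketched by comparing the bit count of a character-level Huffman code with a 5-bit fixed-length code. Treating characters as the source symbols and the example sentence are assumptions for illustration; the RS channel coding and QAM stages are not modeled, and `huffman_code_lengths` is a hypothetical helper.

```python
import heapq
from collections import Counter

def huffman_code_lengths(text):
    """Return {symbol: codeword length} of a Huffman code built from text."""
    freq = Counter(text)
    if len(freq) == 1:                      # degenerate case: a single symbol gets 1 bit
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth in the subtree})
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)      # merge the two lightest subtrees
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

sentence = "the european parliament adopted the resolution"   # made-up example
freq = Counter(sentence)
lengths = huffman_code_lengths(sentence)
huffman_bits = sum(freq[s] * lengths[s] for s in freq)
fixed_bits = 5 * len(sentence)              # 5-bit fixed-length code per character
```

A 5-bit fixed-length code can represent up to 32 distinct characters, whereas the Huffman code assigns shorter codewords to frequent characters such as the space and "e".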

A. Constellation Design

Fig. 5 compares the full-resolution constellation and the 4-bit constellation. The full-resolution constellation points in Fig. 5(a) contain more information due to the higher resolution, but require a complicated antenna that is almost impossible to design. By mapping the full-resolution constellation into a finite space, the 4-bit constellation points in Fig. 5(b) are simplified, which makes them possible to implement in existing RF systems. Note that the 4-bit constellation keeps a distribution similar to that of the full-resolution constellation. For example, there exist certain blank regions at the edge of the constellation in Fig. 5(a), and the 4-bit constellation shows a similar trend in Fig. 5(b). Such a similar distribution prevents sharp performance degradation when the resolution of the constellation decreases significantly.

Fig. 6 shows the BLEU scores versus SNR for different constellation sizes under AWGN, including the 4-bit constellation, the 8-bit constellation, and the full-resolution constellation. All of them achieve very similar performance when SNR > 9 dB, which demonstrates that the constellation design is effective and causes no significant performance degradation. The full-resolution and 8-bit constellations perform slightly better than the 4-bit constellation when the SNR is low. This is because some information used for denoising is lost when the resolution of the constellation is small.

B. Performance over Fading Channels

[Fig. 6 plot: BLEU score versus SNR (dB), 0 to 18 dB, for the 4-bit, 8-bit, and original constellations]
Fig. 6. The BLEU scores of different constellation sizes versus SNR under AWGN.

Fig. 7 compares the channel estimation MSEs of the LS, MMSE, and ADNet-aided LS estimators versus SNR under Rayleigh fading channels. Note that the MMSE estimator is equivalent to the LMMSE estimator for AWGN channels. The MMSE and LS estimators have similar accuracy in the high SNR region; thus, the range of training SNRs for the ADNet is set from 0 dB to 10 dB to improve the performance of the LS estimator in the low SNR region. As a result, the MSE of the ADNet-based LS estimator is significantly lower than those of the LS and MMSE estimators when the SNR is low. As the SNR increases, the MSE of the ADNet-based LS estimator approaches those of the LS and MMSE estimators. Therefore, the ADNet-based LS estimator can be replaced by the plain LS estimator in the high SNR region to reduce complexity.
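The trend in Fig. 7 for the LS and MMSE estimators can be checked with a scalar Monte Carlo sketch. It assumes a single pilot $x = 1$, a channel $h \sim \mathcal{CN}(0, 1)$, and a known channel prior for the MMSE estimator; the refinement steps (13)-(14) and the ADNet are not modeled, and `estimator_mse` is a hypothetical helper.

```python
import numpy as np

def estimator_mse(snr_db, n=100_000, seed=0):
    """Monte Carlo MSE of LS and MMSE estimates of a scalar Rayleigh channel.

    Model: y = h * x + noise, pilot x = 1, h ~ CN(0, 1), noise ~ CN(0, 1/snr).
    """
    rng = np.random.default_rng(seed)

    def complex_gaussian(var):
        return np.sqrt(var / 2) * (rng.normal(size=n) + 1j * rng.normal(size=n))

    noise_var = 10 ** (-snr_db / 10)
    h = complex_gaussian(1.0)
    y = h * 1.0 + complex_gaussian(noise_var)
    h_ls = y / 1.0                       # LS: divide by the pilot
    h_mmse = y / (1.0 + noise_var)       # MMSE with the known prior CN(0, 1)
    return (float(np.mean(np.abs(h_ls - h) ** 2)),
            float(np.mean(np.abs(h_mmse - h) ** 2)))

results = {snr_db: estimator_mse(snr_db) for snr_db in (0, 6, 12, 18)}
```

Analytically, the LS MSE is $1/\text{SNR}$ and the MMSE MSE is $1/(1+\text{SNR})$, so the two differ by a factor of two at 0 dB but almost coincide at 18 dB, consistent with the observation that both estimators have similar accuracy in the high SNR region.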

[Fig. 7 plot: MSE (log scale, $10^{-3}$ to $10^{0}$) versus SNR (dB) for the MMSE estimator, the LS estimator, and the LS estimator with ADNet]
Fig. 7. The MSE for the MMSE estimator, the LS estimator, and the proposed ADNet-based LS estimator.

[Fig. 8 plot: BLEU score versus SNR (dB) for L-DeepSC with perfect, refined, rough, and no CSI, and for Huffman + RS and 5-bit + RS with perfect CSI]
Fig. 8. The BLEU scores versus SNR under Rician fading channels, with perfect CSI, rough CSI, refined CSI, and no CSI.

Fig. 8 and Fig. 9 illustrate the relationship between the BLEU score and SNR with the 4-bit constellation over the Rician and Rayleigh fading channels, respectively, where L-DeepSC is trained with perfect CSI, rough CSI by (13), refined CSI by (14), and without CSI. The traditional approaches are Huffman coding with (5,7) RS and 5-bit coding with (7,9) RS, both with 64-QAM. We observe that all DL-enabled approaches are more competitive than the traditional approaches under the fading channels. The system trained without CSI performs worse than those trained with CSI, especially under Rayleigh fading channels, which also confirms the analysis of (10) and (11). Without CSI, the performance difference between the Rayleigh and Rician channels is caused by the line-of-sight (LOS) component, which can help the system recognize the semantic information during training. Besides, with the aid of CSI, the effects of the fading channels are mitigated significantly, as analyzed before. When the SNR is low, the systems with perfect CSI or refined CSI outperform that with rough CSI. As the SNR increases, L-DeepSC with perfect CSI, refined CSI, and rough CSI gradually converges to similar performance.

C. Model Compression

In this experiment, we investigate the performance of network slimming, including network sparsification, network quantization, and their combination. The pre-trained model used for pruning and quantization is trained with the 4-bit constellation under Rician fading channels.

[Fig. 9 plot: BLEU score versus SNR (dB) for L-DeepSC with perfect, refined, rough, and no CSI, and for Huffman + RS and 5-bit + RS with perfect CSI]
Fig. 9. The BLEU scores versus SNR under Rayleigh fading channels, with perfect CSI, rough CSI, refined CSI, and no CSI.

[Fig. 10 plot: BLEU score versus sparsity ratio $\gamma$ (0 to 0.99) at SNR = 0, 6, 12, and 18 dB]
Fig. 10. The BLEU scores of different SNRs versus sparsity ratio, $\gamma$, under Rician fading channels with the refined CSI.

Fig. 10 shows the influence of the network sparsity ratio, $\gamma$, on the BLEU scores with different SNRs under Rician fading channels, where the system is pruned directly as $\gamma$ increases from 0 to 0.9 and is pruned with fine-tuning as $\gamma$ increases further to 0.99. The proposed L-DeepSC achieves almost the same BLEU scores as $\gamma$ increases from 0 to 0.9, which shows that there is substantial weight redundancy in the trained DeepSC model. When $\gamma$ increases to 0.99, the BLEU scores still drop slightly after fine-tuning, and the performance loss at 0 dB and 6 dB is larger than that at 12 dB and 18 dB. Thus, in the high SNR cases, the model can be pruned directly with only slight performance degradation. In the low SNR region, it is still possible to prune 99% of the weights without significant performance degradation when the system is sensitive to power consumption.

Fig. 11 demonstrates the relationship between the BLEU score and the quantization bit number, $m$, under Rician fading channels, where $m$ is defined in (18), and the system is quantized with QAT when $m$ is smaller than 2. The performance from $m = 8$ to $m = 20$ is similar, which indicates the effectiveness of low-resolution neural networks. If the system is more sensitive to power consumption and can tolerate certain performance degradation, the resolution of the neural networks can be further reduced to the 4-bit level. However, the BLEU score decreases dramatically from $m = 4$ to $m = 2$ over the whole SNR range, since most of the key information is removed in the low-resolution neural network.

[Fig. 11 plot: BLEU score versus quantization level $m$ (2, 4, 8, 12, 16, 20) at SNR = 0, 6, 12, and 18 dB]
Fig. 11. The BLEU scores of different SNRs versus quantization level, $m$, under Rician fading channels with the refined CSI.

TABLE II
THE BLEU SCORE AND COMPRESSION RATIO, ψ, COMPARISONS VERSUS DIFFERENT SPARSITY RATIO, γ, AND QUANTIZATION LEVEL, m, IN SNR = 12 dB.
Pruned Model | BLEU (m = 4) | ψ (m = 4) | BLEU (m = 8) | ψ (m = 8) | BLEU (m = 12) | ψ (m = 12) | BLEU (m = 16) | ψ (m = 16)
γ = 0.3 | 0.838967 | 11.429 | 0.892745 | 5.714 | 0.908537 | 3.81 | 0.910184 | 2.857
γ = 0.6 | 0.835863 | 20.0 | 0.897143 | 10.0 | 0.90815 | 6.667 | 0.900468 | 5.0
γ = 0.9 | 0.810322 | 80.0 | 0.895306 | 40.0 | 0.898784 | 26.667 | 0.910554 | 20.0
γ = 0.95 | 0.779685 | 160.0 | 0.875814 | 80.0 | 0.873426 | 53.333 | 0.877221 | 40.0

Table II compares the BLEU scores and compression ratios under different combinations of weight pruning and weight quantization with SNR = 12 dB, where the compression ratio is computed by

$$\psi = \frac{M \times 32}{M_{\text{pruned}} \times m}, \tag{24}$$

where $M$ is the number of weights before pruning, $M_{\text{pruned}}$ is the number of weights remaining after pruning, 32 is the number of bits required for FP32, and $m$ is the number of bits required after quantization. The performance decreases when $\gamma$ increases or $m$ decreases, which is consistent with Fig. 10 and Fig. 11. The table also shows that different compression ratios can lead to similar performance. For example, the BLEU score with $\gamma = 30\%$ and $m = 8$ is similar to that with $\gamma = 90\%$ and $m = 12$, but the compression ratios differ by a factor of about five, i.e., 5.714 versus 26.667. By choosing a suitable sparsity ratio and quantization level, the same performance can be achieved with a much higher compression ratio.
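Since the sparsity ratio $\gamma$ is the proportion of zero weights, $M_{\text{pruned}} = (1-\gamma)M$ and (24) reduces to $\psi = 32/((1-\gamma)m)$. The short sketch below uses this relation (our simplification, which ignores the storage overhead of sparse indices) to reproduce entries of Table II; `compression_ratio` is a hypothetical helper.

```python
def compression_ratio(gamma, m, fp_bits=32):
    """Compression ratio (24), assuming M_pruned = (1 - gamma) * M so that M cancels."""
    return fp_bits / ((1.0 - gamma) * m)

# Reproduce a few psi entries of Table II
for gamma, m in [(0.3, 8), (0.9, 12), (0.95, 4)]:
    print(f"gamma={gamma}, m={m}: psi={compression_ratio(gamma, m):.3f}")
```

This gives 5.714, 26.667, and 160.000, matching the table, and makes explicit that pruning and quantization multiply: halving $m$ has the same effect on $\psi$ as halving the number of surviving weights.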

V. CONCLUSION

In this paper, we proposed a lite distributed semantic communication system, named L-DeepSC, for Internet of Things (IoT) networks, where the participating devices usually have limited power and computing capabilities. Specifically, the receiver and feature extractor were designed jointly for text transmission. First, we analyzed the effect of CSI on forward-propagation and back-propagation during system training over fading channels. The analytical results reveal that fading channels contaminate the weight updates and restrict the model representation capability. Thus, a refined LS estimator with lower pilot overhead was developed to eliminate the effects of fading channels. Besides, we mapped the full-resolution original constellation into a finite-bit constellation to match the current antenna design, which was verified by simulation results. Finally, due to the limited bandwidth and computational capability in IoT networks, two model compression approaches have been proposed: 1) network sparsification to prune unnecessary weights, and 2) network quantization to reduce the weight resolution. The simulation results validated that the proposed L-DeepSC outperforms the traditional methods, especially in the low SNR regime, and provided insights into the balance among compression ratio, sparsity ratio, and quantization level. Therefore, our proposed L-DeepSC is a promising candidate for intelligent IoT networks, especially in the low SNR regime.

REFERENCES

[1] L. Atzori, A. Iera, and G. Morabito, “The internet of things: a survey,”

Computer Networks, vol. 54, no. 15, pp. 2787–2805, Oct. 2010.

[2] T. Qiu, N. Chen, K. Li, M. Atiquzzaman, and W. Zhao, “How can

heterogeneous internet of things build our future: A survey,” IEEE

Commun. Surv. Tutorials, vol. 20, no. 3, pp. 2011–2027, Feb. 2018.

[3] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press,

2016.

[4] M. Mohammadi, A. Al-Fuqaha, S. Sorour, and M. Guizani, “Deep

learning for iot big data and streaming analytics: A survey,” IEEE

Commun. Surv. Tutorials, vol. 20, no. 4, pp. 2923–2960, Jun. 2018.

[5] H. Li, K. Ota, and M. Dong, “Learning iot in edge: Deep learning for

the internet of things with edge computing,” IEEE Network, vol. 32,

no. 1, pp. 96–101, Jan. 2018.

[6] R. Carnap, Y. Bar-Hillel et al.,An Outline of A Theory of Semantic

Information. RLE Technical Reports 247, Research Laboratory of

Electronics, Massachusetts Institute of Technology., Cambridge MA,

Oct. 1952.

[7] D. Tse and P. Viswanath, Fundamentals of Wireless Communication.

Cambridge University Press, 2005.

[8] I. Guyon, S. Gunn, M. Nikravesh, and L. A. Zadeh, Feature Extraction:

Foundations and Applications. Springer, 2008, vol. 207.


[9] R. Szeliski, Computer Vision: Algorithms and Applications. Springer

Science & Business Media, 2010.

[10] N. Indurkhya and F. J. Damerau, Handbook of Natural Language

Processing. CRC Press, 2010, vol. 2.

[11] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication. The University of Illinois Press, 1949.

[12] D. Tse and P. Viswanath, Fundamentals Wireless Communication.

Cambridge University Press, 2005.

[13] Z. Qin, H. Ye, G. Y. Li, and B.-H. F. Juang, “Deep learning in physical

layer communications,” IEEE Wireless Commun., vol. 26, no. 2, pp.

93–99, Apr. 2019.

[14] E. Bourtsoulatze, D. B. Kurka, and D. Gündüz, “Deep joint source-channel coding for wireless image transmission,” IEEE Trans. Cogn. Commun. Netw., vol. 5, no. 3, pp. 567–579, May 2019.

[15] M. Jankowski, D. Gündüz, and K. Mikolajczyk, “Deep joint transmission-recognition for power-constrained IoT devices,” arXiv:2003.02027, 2020. [Online]. Available: https://arxiv.org/abs/2003.02027

[16] N. Farsad, M. Rao, and A. Goldsmith, “Deep learning for joint source-

channel coding of text,” in Proc. IEEE Int’l. Conf. Acoustics Speech

Signal Process. (ICASSP), Calgary, AB, Canada, Apr. 2018, pp. 2326–

2330.

[17] H. Xie, Z. Qin, G. Y. Li, and B.-H. Juang, “Deep learning enabled

semantic communication systems,” arXiv:2006.10685, 2020. [Online].

Available: https://arxiv.org/abs/2006.10685

[18] C. Lee, J. Lin, P. Chen, and Y. Chang, “Deep learning-constructed joint

transmission-recognition for internet of things,” IEEE Access, vol. 7, pp.

76 547–76 561, Jun. 2019.

[19] E. L. Denton, W. Zaremba, J. Bruna, Y. LeCun, and R. Fergus,

“Exploiting linear structure within convolutional networks for efficient

evaluation,” in Proc. Adv. Neural Inf. Process. Syst. (NIPS), Montreal,

Quebec, Canada, Dec. 2014, pp. 1269–1277.

[20] S. Han, J. Pool, J. Tran, and W. Dally, “Learning both weights and

connections for efficient neural network,” in Proc. Adv. Neural Inf.

Process. Syst. (NIPS), Montreal, Quebec, Canada, Dec. 2015, pp. 1135–

1143.

[21] Z. Liu, J. Li, Z. Shen, G. Huang, S. Yan, and C. Zhang, “Learning

efficient convolutional networks through network slimming,” in Proc.

IEEE Int’l. Conf. on Comput. Vis. (ICCV), Venice, Italy, Oct. 2017, pp.

2755–2763.

[22] H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf, “Pruning

filters for efficient convnets,” in Proc. IEEE Int’l. Conf. on Learning

Representations (ICLR), Toulon, France, Apr. 2017.

[23] R. Krishnamoorthi, “Quantizing deep convolutional networks for

efficient inference: A whitepaper,” arXiv:1806.08342, 2018. [Online].

Available: http://arxiv.org/abs/1806.08342

[24] Y. Gong, L. Liu, M. Yang, and L. Bourdev, “Compressing deep

convolutional networks using vector quantization,” arXiv:1412.6115,

2014. [Online]. Available: http://arxiv.org/abs/1412.6115

[25] A. Zhou, A. Yao, Y. Guo, L. Xu, and Y. Chen, “Incremental network

quantization: Towards lossless cnns with low-precision weights,” in

Proc. IEEE Int’l. Conf. on Learning Representations (ICLR), Toulon,

France, Apr. 24-26, 2017.

[26] F. Li, B. Zhang, and B. Liu, “Ternary weight networks,” arXiv:1605.04711, 2016. [Online]. Available: http://arxiv.org/abs/1605.04711

[27] B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. G. Howard, H. Adam,

and D. Kalenichenko, “Quantization and training of neural networks for

efficient integer-arithmetic-only inference,” in Proc. IEEE Conf. Comput.

Vis. Pattern Recognit. (CVPR), Salt Lake City, UT, USA, Jun. 18-22,

2018, pp. 2704–2713.

[28] J. Guo, J. Wang, C.-K. Wen, S. Jin, and G. Y. Li, “Compression and

acceleration of neural networks for communications,” IEEE Wireless

Commun., Early Access.

[29] D. Gil, A. Ferrández, H. Mora-Mora, and J. Peral, “Internet of things: A review of surveys based on context aware intelligent services,” Sensors, vol. 16, no. 7, p. 1069, Jul. 2016.

[30] B. Zhu, J. Wang, L. He, and J. Song, “Joint transceiver optimization for

wireless communication phy using neural network,” IEEE J. Sel. Areas

Commun., vol. 37, no. 6, pp. 1364–1373, Mar. 2019.

[31] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a gaussian

denoiser: Residual learning of deep cnn for image denoising,” IEEE

Trans. Image Process., vol. 26, no. 7, pp. 3142–3155, Feb. 2017.

[32] C. Tian, Y. Xu, Z. Li, W. Zuo, L. Fei, and H. Liu, “Attention-guided cnn

for image denoising,” Neural Netw., vol. 124, pp. 117–129, Apr. 2020.

[33] Y. Bengio, N. Léonard, and A. Courville, “Estimating or propagating gradients through stochastic neurons for conditional computation,” arXiv:1308.3432, 2013. [Online]. Available: http://arxiv.org/abs/1308.3432

[34] P. Koehn, “Europarl: A parallel corpus for statistical machine translation,” in MT summit, vol. 5, 2005, pp. 79–86.

[35] I. S. Reed and G. Solomon, “Polynomial codes over certain finite fields,”

J. the Society for Industrial and Applied Math., vol. 8, no. 2, pp. 300–

304, Jan. 1960.

[36] K. Papineni, S. Roukos, T. Ward, and W. Zhu, “Bleu: a method for

automatic evaluation of machine translation,” in Proc. Annual Meeting

Assoc. Comput. Linguistics (ACL), Philadelphia, PA, USA, Jul. 2002,

pp. 311–318.

A Model for Quantifying the Degree of Understanding in Cross-Domain M2M Semantic Communications

Article

Full-text available

Jan 2024

This paper addresses the problem of semantic communications (SemComs) in intelligent machine-to-machine (M2M) applications. Although M2M applications may employ other languages as the communication medium, natural languages are commonly used as the medium between machines and robots. One favorable characteristic of using natural languages is that it allows humans to inspect communication contents easily, which caters to the needs of security and quality of service for M2M communication. Currently, no exact solutions are available for quantifying and measuring the understanding of M2M communication. This paper identifies three specific challenges in the field: inconsistent knowledge base (KB), cross-domain interpretation, and a measure for understanding the meaning of messages. We propose a model to address these challenges in two steps. First, we propose an evidence-based shared-KB communication model for cross-domain meaning interpretation using Dewey Decimal Classification. Second, we propose a measure to quantify the understanding level through a two-stage validation between the sender and receiver. Real-life datasets and numerical experiments are used to evaluate the model’s performance. The results show that the degree of understanding (DoU) can be successfully measured by observing the performance of the sender and receiver under the same conditions. The proposed method can effectively improve mutual understanding between the two machines.

Optimization of Image Transmission in UAV-Enabled Semantic Communication Networks

Conference Paper

Dec 2023

Enhanced semantic communication schemes for speech signals

Article

Apr 2024

Two new models for semantic communication systems are proposed. The first model incorporates the convolutional block attention module, which considers attention techniques in both the channel and spatial domains. The second model applies the efficient channel attention (ECA) network with reduced complexity. Experimental results demonstrate that the convolutional block attention module‐equipped model improved signal‐to‐distortion ratio performance by at a signal‐to‐noise ratio of while maintaining a similar number of parameters compared to the existing model using squeeze‐and‐excitation network. Meanwhile, the efficient channel attention‐equipped model reduced parameters by approximately without any degradation in performance compared to the existing model.

Time-Sensitive Semantic Communication Using Dynamic Spiking Neural Networks

Conference Paper

Dec 2023

Artificial Intelligence for Wireless Physical-Layer Technologies (AI4PHY): A Comprehensive Survey

Article

Jun 2024

Artificial intelligence (AI) has become a promising solution for meeting the stringent performance requirements on wireless physical layer in sixth-generation (6G) communication systems, due to its strong ability to learn complex model, achieve end-to-end optimization and adapt to dynamic environments. This article provides a comprehensive review with respect to artificial intelligence for wireless physical-layer technologies (AI4PHY). Specifically, we first analyze the characteristics of the classic AI techniques and their potential applications for physical-layer technologies. Then we study the AI-enhanced designs from the point of view of the basic physical-layer modules, including coding, modulation, multiple access, multiple-input-multiple-output (MIMO), channel estimation, as well as relay transmission. The standardization progress of AI4PHY in 3GPP is also discussed. Based on the current AI4PHY researches, we propose some potential future research directions to inspire and encourage the further exploration.

Joint Optimization of Trajectory and Image Transmission in Multi-UAV Semantic Communication Networks

Conference Paper

Dec 2023

SEMUAV: A Cooperative Semantic Communication Method in SATIN

Conference Paper

Dec 2023

Deep Learning Enabled Video Semantic Transmission Against Multi-Dimensional Noise

Conference Paper

Dec 2023

Energy Efficiency in Semantic Networks: A Heuristic Optimization Approach for Resource Allocation

Conference Paper

Nov 2023

Open RAN meets Semantic Communications: A Synergistic Match for Open, Intelligent, and Knowledge-Driven 6G

Conference Paper

Nov 2023

Deep Learning Enabled Semantic Communication Systems

Article

Full-text available

Apr 2021

Recently, deep learned enabled end-to-end (E2E) communication systems have been developed to merge all physical layer blocks in the traditional communication systems, which make joint transceiver optimization possible. Powered by deep learning, natural language processing (NLP) has achieved great success in analyzing and understanding a large amount of language texts. Inspired by research results in both areas, we aim to provide a new view on communication systems from the semantic level. Particularly, we propose a deep learning based semantic communication system, named DeepSC, for text transmission. Based on the Transformer, the DeepSC aims at maximizing the system capacity and minimizing the semantic errors by recovering the meaning of sentences, rather than bit-or symbol-errors in traditional communications. Moreover, transfer learning is used to ensure the DeepSC applicable to different communication environments and to accelerate the model training process. To justify the performance of semantic communications accurately, we also initialize a new metric, named sentence similarity. Compared with the traditional communication system without considering semantic information exchange, the proposed DeepSC is more robust to channel variation and is able to achieve better performance, especially in the low signal-to-noise (SNR) regime, as demonstrated by the extensive simulation results.

Deep Learning-Based End-to-End Wireless Communication Systems With Conditional GANs as Unknown Channels

Article

Full-text available

Feb 2020

In this article, we develop an end-to-end wireless communication system using deep neural networks (DNNs), where DNNs are employed to perform several key functions, including encoding, decoding, modulation, and demodulation. However, an accurate estimation of instantaneous channel transfer function, i.e., channel state information (CSI), is needed in order for the transmitter DNN to learn to optimize the receiver gain in decoding. This is very much a challenge since CSI varies with time and location in wireless communications and is hard to obtain when designing transceivers. We propose to use a conditional generative adversarial net (GAN) to represent channel effects and to bridge the transmitter DNN and the receiver DNN so that the gradient of the transmitter DNN can be back-propagated from the receiver DNN. In particular, a conditional GAN is employed to model the channel effects in a data-driven way, where the received signal corresponding to the pilot symbols is added as a part of the conditioning information of the GAN. To address the curse of dimensionality when the transmit symbol sequence is long, convolutional layers are utilized. From the simulation results, the proposed method is effective on additive white Gaussian noise (AWGN) channels, Rayleigh fading channels, and frequency-selective channels, which opens a new door for building data-driven DNNs for end-to-end communication systems.

Massive MIMO Channel Estimation with an Untrained Deep Neural Network

Article

Jan 2020

This paper proposes a deep learning-based channel estimation method for multi-cell interference-limited massive MIMO systems, in which base stations equipped with a large number of antennas serve multiple single-antenna users. The proposed estimator employs a specially designed deep neural network (DNN) based on the deep image prior (DIP) network to first denoise the received signal, followed by conventional least-squares (LS) estimation. We analytically prove that our LS-type deep channel estimator can approach minimum mean square error (MMSE) estimator performance for high-dimensional signals, while avoiding complex channel inversions and knowledge of the channel covariance matrix. This analytical result, while asymptotic, is observed in simulations to be operational for just 64 antennas and 64 subcarriers per OFDM symbol. The proposed method also does not require any training and utilizes several orders of magnitude fewer parameters than conventional DNNs. The proposed deep channel estimator is also robust to pilot contamination and can even completely eliminate it under certain conditions.
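
In the abstract, the DIP-based network first denoises the received signal and a conventional least-squares (LS) step follows. The sketch below covers only that LS step, per subcarrier, with ĥ_k = y_k / x_k for a known pilot x_k. The denoising network is omitted, and the pilots and channel values are hypothetical.

```python
from typing import List


def ls_channel_estimate(received: List[complex], pilots: List[complex]) -> List[complex]:
    """Per-subcarrier least-squares estimate h_k = y_k / x_k.

    Assumption: in the cited method `received` would already have been
    denoised by the DIP-based network; here it is used as-is.
    """
    if len(received) != len(pilots):
        raise ValueError("one pilot per received sample is required")
    if any(x == 0 for x in pilots):
        raise ValueError("pilot symbols must be non-zero")
    return [y / x for y, x in zip(received, pilots)]


# Noise-free toy example with a known channel and QPSK pilots.
h_true = [0.8 + 0.1j, -0.3 + 0.5j, 0.2 - 0.7j]
pilots = [1 + 1j, 1 - 1j, -1 + 1j]
received = [h * x for h, x in zip(h_true, pilots)]
h_hat = ls_channel_estimate(received, pilots)
```

Without noise the LS estimate recovers the channel exactly. With noise, each estimate inherits the noise scaled by 1/x_k, which is why denoising the received pilots first helps.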

Deep Learning-Constructed Joint Transmission-Recognition for Internet of Things

Article

Jun 2019

Widely deployed Internet of Things (IoT) devices provide intelligent services through their cognition capability. Because of their low computational capability and limited power supply, IoT devices usually transmit their data to a server for recognition (e.g., image classification). Achieving high recognition accuracy under the limited bandwidth and noisy channels of wireless networks is therefore a crucial but challenging task. In this paper, we propose a deep learning-constructed joint transmission-recognition scheme that lets IoT devices effectively transmit data wirelessly to a server for recognition, jointly considering transmission bandwidth, transmission reliability, complexity, and recognition accuracy. We compare it with other schemes that may be deployed on IoT devices, i.e., a scheme based on JPEG compression and two compressed sensing-based schemes. The proposed deep neural network-based scheme achieves much higher recognition accuracy than these schemes under various transmission scenarios at all signal-to-noise ratios (SNRs). In particular, the proposed scheme maintains good performance at very low SNR. Moreover, the complexity of the proposed scheme is low, making it suitable for IoT applications. Finally, a transfer learning-based training method is proposed to effectively mitigate the computing burden and reduce the overhead of online training.

Joint Device-Edge Inference over Wireless Links with Pruning

Conference Paper

May 2020

Compression and Acceleration of Neural Networks for Communications

Article

Jul 2020

Deep learning (DL) has achieved great success in signal processing and communications and has become a promising technology for future wireless communications. Existing works mainly focus on exploiting DL to improve the performance of communication systems. However, the high memory requirement and computational complexity of DL models constitute a major hurdle for the practical deployment of DL-based communications. In this article, we investigate how to compress and accelerate the neural networks (NNs) in communication systems. After introducing the deployment challenges for DL-based communication algorithms, we discuss some representative NN compression and acceleration techniques. Afterwards, we present two case studies for multiple-input multiple-output (MIMO) communications, DL-based channel state information feedback and signal detection, to show the feasibility and potential of these techniques. We finally identify some challenges in NN compression and acceleration for DL-based communications and provide a guideline for subsequent research.
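
Two representative compression techniques are magnitude pruning and low-bit weight quantization. The lite L-DeepSC described in this page's abstract also relies on these two ideas. The sketch below is a generic illustration under stated assumptions, not the specific method of the cited article: it uses unstructured pruning of the smallest-magnitude weights and symmetric uniform quantization with a single scale. The weight values are hypothetical.

```python
from typing import List, Tuple


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the fraction `sparsity` of weights with the smallest |w|."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by magnitude, smallest first; the first n_prune are removed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def uniform_quantize(weights: List[float], bits: int) -> Tuple[List[int], float]:
    """Symmetric uniform quantization to signed `bits`-bit integer codes.

    Returns (codes, scale); weights are approximately recovered as code * scale.
    """
    if bits < 2:
        raise ValueError("need at least 2 bits")
    q_max = 2 ** (bits - 1) - 1
    w_max = max(abs(w) for w in weights)
    scale = w_max / q_max if w_max > 0 else 1.0
    codes = [max(-q_max, min(q_max, round(w / scale))) for w in weights]
    return codes, scale


w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2, -0.02, 0.6]  # hypothetical layer weights
w_pruned = magnitude_prune(w, sparsity=0.5)
codes, scale = uniform_quantize(w_pruned, bits=4)
w_restored = [q * scale for q in codes]
print(codes)  # [7, 0, 3, 0, -5, 0, 0, 5]
```

Each retained weight now needs 4 bits instead of a 32-bit float, and the zeros can be stored sparsely. The reconstruction error per weight is at most half a quantization step (scale / 2).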

Deep Learning and Channel Estimation

Conference Paper

Mar 2020

Attention-guided CNN for image denoising

Article

Apr 2020
Neural Networks

Deep convolutional neural networks (CNNs) have attracted considerable interest in low-level computer vision. Research usually focuses on improving performance with very deep CNNs. However, as the depth increases, the influence of the shallow layers on the deep layers weakens. Motivated by this observation, we propose an attention-guided denoising convolutional neural network (ADNet) for image denoising. It mainly consists of a sparse block (SB), a feature enhancement block (FEB), an attention block (AB) and a reconstruction block (RB). Specifically, the SB trades off performance against efficiency by using dilated and common convolutions to remove the noise. The FEB integrates global and local feature information via a long path to enhance the expressive ability of the denoising model. The AB finely extracts the noise information hidden in complex backgrounds, which is very effective for complex noisy images, especially real noisy images and blind denoising. The FEB is also integrated with the AB to improve efficiency and reduce the complexity of training a denoising model. Finally, the RB constructs the clean image from the obtained noise mapping and the given noisy image. Comprehensive experiments show that the proposed ADNet performs very well in three tasks (i.e., synthetic noisy images, real noisy images, and blind denoising) in terms of both quantitative and qualitative evaluations. The code of ADNet is accessible at http://www.yongxu.org/lunwen.html.

Model-Free Training of End-to-End Communication Systems

Article

Aug 2019

End-to-end learning of communication systems through neural network (NN)-based autoencoders has a shortcoming: it requires a differentiable channel model. In this paper, we present a novel learning algorithm that alleviates this problem. The algorithm enables training of communication systems with an unknown channel model or with non-differentiable components. It alternates between training the receiver with the true gradient and training the transmitter with an approximation of the gradient. We show that this approach works as well as model-based training for a variety of channels and tasks. Moreover, we demonstrate the algorithm’s practical viability through a hardware implementation on software defined radios (SDRs), where it achieves state-of-the-art performance over both a coaxial cable and a wireless channel.
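
The abstract says the transmitter is trained with an approximate gradient because the channel cannot be differentiated. The sketch below shows one way to get such an approximation using only loss values. It is a scalar toy in the spirit of the paper, not its exact algorithm. The transmitter output x is perturbed with Gaussian noise w, and the resulting losses are correlated with w (a score-function, or REINFORCE-style, estimator). The quadratic `black_box_loss` is a hypothetical stand-in for channel plus receiver.

```python
import random
from typing import Callable


def perturbation_gradient(loss: Callable[[float], float], x: float,
                          sigma: float, n_samples: int,
                          rng: random.Random) -> float:
    """Estimate d/dx E[loss(x + w)], w ~ N(0, sigma^2), from loss values only.

    Uses E[(loss(x + w) - b) * w] / sigma^2 with the baseline b = loss(x),
    which leaves the estimate unbiased (E[w] = 0) but lowers its variance.
    The derivative of `loss` is never evaluated.
    """
    baseline = loss(x)
    total = 0.0
    for _ in range(n_samples):
        w = rng.gauss(0.0, sigma)
        total += (loss(x + w) - baseline) * w
    return total / (n_samples * sigma ** 2)


def black_box_loss(y: float) -> float:
    # Hypothetical non-inspectable loss; its true gradient is 2 * (y - 3).
    return (y - 3.0) ** 2


rng = random.Random(0)
g_hat = perturbation_gradient(black_box_loss, x=0.0, sigma=0.1,
                              n_samples=20000, rng=rng)
# The true gradient at x = 0 is -6; g_hat is a noisy estimate of it.
```

The receiver side, by contrast, can use ordinary backpropagation, because its loss is computed locally from the received signal.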

Deep Joint Source-Channel Coding for Wireless Image Transmission

Article

May 2019

We propose a joint source and channel coding (JSCC) technique for wireless image transmission that does not rely on explicit codes for either compression or error correction; instead, it directly maps the image pixel values to the complex-valued channel input symbols. We parameterize the encoder and decoder functions by two convolutional neural networks (CNNs), which are trained jointly. Together they can be viewed as an autoencoder with a non-trainable layer in the middle that represents the noisy communication channel. In the presence of additive white Gaussian noise (AWGN), the proposed deep JSCC scheme outperforms digital transmission that concatenates JPEG or JPEG2000 compression with a capacity-achieving channel code at low signal-to-noise ratio (SNR) and channel bandwidth values. More strikingly, deep JSCC does not suffer from the “cliff effect”, and it provides a graceful performance degradation as the channel SNR varies with respect to the SNR value assumed during training. In the case of a slow Rayleigh fading channel, deep JSCC learns noise resilient coded representations and significantly outperforms separation-based digital communication at all SNR and channel bandwidth values.
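
The non-trainable channel layer between encoder and decoder can be sketched as two steps: normalize the encoder output to an average power constraint, then add complex Gaussian noise at a given SNR. This is a minimal sketch with two assumptions. The constraint is (1/k) Σ|z_i|² = P with P = 1, and the channel is AWGN only. The random "encoder output" is hypothetical.

```python
import math
import random
from typing import List


def power_normalize(z: List[complex], power: float = 1.0) -> List[complex]:
    """Scale z so that its average symbol power (1/k) * sum |z_i|^2 equals `power`."""
    k = len(z)
    energy = sum(abs(s) ** 2 for s in z)
    if energy == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    factor = math.sqrt(k * power / energy)
    return [factor * s for s in z]


def awgn_channel(x: List[complex], snr_db: float, rng: random.Random) -> List[complex]:
    """Add circularly symmetric complex Gaussian noise, assuming unit signal power.

    Noise variance is 10^(-snr_db/10), split equally between real and imaginary parts.
    """
    noise_var = 10 ** (-snr_db / 10)
    std = math.sqrt(noise_var / 2)
    return [s + complex(rng.gauss(0.0, std), rng.gauss(0.0, std)) for s in x]


rng = random.Random(1)
encoder_output = [complex(rng.gauss(0, 3), rng.gauss(0, 3)) for _ in range(4096)]
tx = power_normalize(encoder_output)
rx = awgn_channel(tx, snr_db=10.0, rng=rng)
```

The layer has no trainable parameters and is differentiable with respect to its input, which is what allows the encoder and decoder to be trained jointly through it.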

Recommended publications

Deep Learning Enabled Semantic Communication Systems

Semantic Communications for Speech Recognition

Deep Learning based Semantic Communications: An Initial Investigation