Deep Feature Learning for Wireless Spectrum Data
Ljupcho Milosheski, Gregor Cerar, Blaž Bertalanič, Carolina Fortuna and Mihael Mohorčič
Jožef Stefan Institute, Ljubljana, Slovenia
{ljupcho.milosheski, gregor.cerar, blaz.bertalanic, carolina.fortuna, miha.mohorcic}@ijs.si
Abstract—In recent years, the traditional feature engineering
process for training machine learning models is being automated
by the feature extraction layers integrated in deep learning archi-
tectures. In wireless networks, many studies were conducted in
automatic learning of feature representations for domain-related
challenges. However, most of the existing works assume some
supervision along the learning process by using labels to optimize
the model. In this paper, we investigate an approach to learning
feature representations for wireless transmission clustering in
a completely unsupervised manner, i.e. requiring no labels in
the process. We propose a model based on convolutional neural
networks that automatically learns a reduced dimensionality
representation of the input data with 99.3% less components
compared to a baseline principal component analysis (PCA).
We show that the automatic representation learning is able to
extract fine-grained clusters containing the shapes of the wireless
transmission bursts, while the baseline enables only general
separability of the data based on the background noise.
Index Terms—spectrum, analysis, feature extraction, self-
supervised, machine learning
I. INTRODUCTION
The introduction of machine learning (ML) algorithms
in wireless communication has led to improvement of the
existing and development of completely new solutions when
sufficient data is available. Some examples include modulation
classification [1], [2], radio technology classification [3], [4],
anomaly detection [5], and device fingerprinting [6], [7]. ML
techniques that rely on manually engineered features from
the data are gradually being replaced by deep learning (DL)
algorithms which are able to extract more relevant features as
an integral part of their training process [8]. Features extracted
using deep-learning models appear to contain more meaningful
information [3] and allow scaling to larger datasets while at
the same time improving the accuracy [7].
Although these DL models provide unmatched accuracy in
domain-based classification tasks, they require large amounts
of labeled data for training, i.e. larger than classical machine
learning algorithms. Such large amounts of training data for
instance on radio spectrum usage are typically collected and
made available for the research community from real-world
environment either using wireless testbed networks such as
LOG-a-TEC [9] or crowd-sourcing initiative such as Elec-
troSense [10]. However, labeling radio spectrum data requires
domain specialists with good knowledge of the operating envi-
ronment and understanding of wireless technologies, making
it an expensive and error-prone process. To address this issue,
usage of unsupervised/semi-supervised [5], [11] models is
emerging as an alternative, but still under-explored approach.
In this paper, we adapt and propose an architecture for
learning feature representations of wireless transmissions from
spectrograms in a completely unsupervised manner. In the
absence of a similar approach for direct comparison, we use
principal component analysis (PCA) as a baseline automatic
representation learning approach. The proposed architecture
was originally developed for feature learning from color
images, known as DeepCluster [12]. We adapt this architecture
to the wireless spectrum domain and show that it is a worthy
alternative, yielding a model that outperforms the baseline in the
extraction of features that describe and distinguish the spectro-
gram patterns of different wireless transmission technologies.
Considering that spectrograms exhibit less content dynamics
than color images, we propose a
methodology for selecting the number of dimensions that
contain the relevant features in the representation provided by
convolutional neural networks (CNNs).
The main contributions of this work are as follows:
• We propose a CNN-based model that automatically learns
a reduced dimensionality representation of the input data
with 99.3% less components compared to the baseline PCA.
• We show that the proposed CNN-based representation
learning is able to extract features that represent actual
transmissions, while PCA can learn only general
representations that characterize the background noise.
• We develop a methodology for evaluating the quality
of the provided features with regard to their clustering
tendency, complementary to the clustering quality assess-
ment. This evaluation offers additional insight for the
selection of the number of clusters and the number of
dimensions of the reduced feature space, the two critical
parameters of the proposed architecture.
The rest of the paper is structured as follows. Section
II analyzes the related work. Section III elaborates on the
feature representation learning using DL while Section IV
elaborates on the experimental methodology, including the fea-
ture development and evaluation metrics. Section V presents
and discusses the experimental results. Finally, Section VI
concludes the paper.
II. RELATED WORK
Among the state-of-the-art feature representation learning
approaches in wireless communications, we identified two
related lines of work: supervised feature learning and
feature learning incorporating unsupervised architectures. The
latter can be completely unsupervised or semi-supervised.
A. Supervised feature learning
Judging by the volume of published work, supervised DL
architectures are well established for domain-related problems.
For device fingerprinting, one of the main tasks in the domain,
high classification accuracy (above 92%) is achieved in [6], [7].
The ability of CNNs to encode relevant features has also been
demonstrated in modulation classification tasks [1], [2], where
various types of CNN-based architectures are used. Supervised
solutions for wireless technology classification achieving high
accuracy are proposed in [3], [4].
It is clear that when a classification problem is being ad-
dressed and a large amount of labeled data is available, CNN-
based solutions achieve top performance. However, providing
large labeled spectrum datasets, as discussed before, is an
expensive and error-prone task.
B. Unsupervised and semi-supervised feature learning
The general unavailability of labelled spectrum data con-
strains the usage of supervised approaches. Thus, efforts
are invested in resolving this problem by using architectures
that require only a small fraction of the data to be labeled,
at the cost of some accuracy.
In [13], a dilated causal convolutional (DCC) architecture is
used in an unsupervised auto-encoder configuration to learn
features from an unlabeled dataset. A small part of the data is
labeled and used for tuning the last layers of the network
in a supervised configuration. The authors show that the auto-
encoder successfully learns the general features of the data.
In [11], an auto-encoder is compared to semi-supervised
bootstrapping of sparse representations for the modulation
classification problem. The authors show visually that the semi-
supervised approach provides better features than the
unsupervised approach and generalizes better to unseen data,
but no quantitative support is provided.
There are also completely unsupervised implementations.
In [5], automatic feature learning is proposed with
an auto-encoder network for the task of anomaly detection in
spectrum data. The network is compared with linear and robust
PCA and is shown to extract better features and provide higher
anomaly detection accuracy. However, this is still a marginal case
because the task is a binary classification.
In our work, we aim at completely automatic rep-
resentation learning from large amounts of unlabeled radio
spectrum data for the purpose of clustering, where multiple
types of spectrum activities are present.
III. FEATURE REPRESENTATION LEARNING
We propose a CNN-based feature learning and clustering
architecture as depicted in Figure 1. It was inspired by
and adapted from the existing DeepCluster model, originally
proposed in [12] for RGB image features learning. The archi-
tecture design contains a representation learning block and a
clustering block. The representation learning contains a CNN
block followed by PCA performing automatic learning of
reduced dimensionality feature representation. The clustering
block then processes the data provided by the representation
learning block. It is relevant to consider that the same dimen-
sionality reduction could be achieved by using one additional
fully connected layer (FCN) after the CNN. However, we use
PCA because it allows automatic feature ranking based on the
explained variance ratio (EVR) as an integral part of the PCA.
The ranking is made in the same Cartesian space in which the
K-means is working, allowing for better explainability of the
developed models.
Compared to the original DeepCluster model [12], we made
the following adaptations:
• Rather than using VGG [14] as the DL architecture, we
selected ResNet18 [15], motivated by the performance
improvement in a use case involving spectrum data in [1].
Thus, we achieve similar performance while reducing the
complexity of the models.
• We customized the ResNet input and output layers according
to the shape of the images and the number of classes. In
our case, the input spectrogram images have only one
channel, while ResNet was originally designed for
3-channel RGB images (a minimal sketch of this change
is given after this list).
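The second adaptation can be illustrated with a short PyTorch sketch. This is a minimal example assuming the torchvision ResNet18 implementation; the helper name build_backbone and the exact layer attributes are illustrative and not taken from the paper.

    import torch.nn as nn
    from torchvision.models import resnet18

    def build_backbone(num_classes):
        # Untrained ResNet18; the model is trained from scratch on spectrograms.
        # (Older torchvision versions use pretrained=False instead of weights=None.)
        model = resnet18(weights=None)
        # Replace the first convolution so it accepts 1-channel spectrograms
        # instead of 3-channel RGB images.
        model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        # Size the classification head to the chosen number of classes/clusters K.
        model.fc = nn.Linear(model.fc.in_features, num_classes)
        return model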
During the training process of the automatic feature repre-
sentation learning using the architecture depicted in Figure 1,
a feedback loop is used as shown with a dotted line. It
consists of a fully connected classification layer attached to the
CNN. This layer generates class predictions which, during the
iterative training process, are compared to the pseudo-labels
generated by the K-means clustering, acting as a temporary
ground truth, and the difference is propagated back to guide the training.
More specifically, clustering and CNN weights training are
performed in an alternating manner. In the initial phase, the
K-means clustering on the output of randomly initialized
CNN provides initial cluster assignments which are used as
temporary pseudo-labels (L) for the first epoch of training
of the CNN. The improved CNN is then used for feature
extraction in the next iteration, which, together with a new
clustering, provides new temporary labels. The clustering-training
sequence completes one training iteration. The procedure stops
when the predefined number of iterations (training epochs) is
reached.
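The alternating procedure described above can be summarized with the following simplified Python sketch. The helpers extract_features, reset_classifier and train_one_epoch are illustrative placeholders (not the authors' code), and the epoch structure is a sketch of the DeepCluster-style loop under those assumptions.

    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    def deep_cluster_training(model, loader, optimizer, K, N, epochs=200):
        # Alternate K-means pseudo-labelling and CNN weight updates.
        for epoch in range(epochs):
            feats = extract_features(model, loader)            # CNN features, shape (samples, d)
            feats = PCA(n_components=N).fit_transform(feats)   # reduced, ranked representation
            pseudo_labels = KMeans(n_clusters=K).fit_predict(feats)  # temporary labels L
            reset_classifier(model, num_classes=K)             # fresh K-way head each iteration
            train_one_epoch(model, loader, pseudo_labels, optimizer)  # cross-entropy on pseudo-labels
        return model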
It is important to note that through its iterative training
the proposed representation learning approach includes a tight
coupling between the values of the CNN weights, the size
N of the PCA components and the number of clusters K as
summarized in the first line of Table I. Using this architecture,
all these dimensions need to be optimized simultaneously and
they influence each other through the feedback loop. However,
in the final application, only the optimized system consisting
of the representation learning is needed. Two possible ways
of utilizing the developed representation learning model are:
1) As a feature extractor for clustering, which provides the
ability to discover new devices by varying the parameters
of the clustering algorithm.
2) For transmission classification over an already discovered
number of classes, using a fully connected layer at the
output of the CNN.
Fig. 1: Architecture of the automatic feature learning system.
TABLE I: Adaptable parameters for the Baseline and CNN-based learning architectures.

                    Function block
  Architecture      Representation learning    Clustering
  CNN-based         CNN + PCA (n = 1..N)       K-means (k = 1..K)
  Baseline          PCA (n = 1..N)             K-means (K = 2..30)
The equivalent baseline architecture not involving the CNN
learning component and the dotted training loop in Figure 1
can be optimized in a sequential manner: first optimizing the
PCA as a representation learning method and then optimizing
the K-means as clustering method. The equivalent optimization
parameters are summarized in the second line of Table I. The
baseline system employs a flattening block that reorders the
elements of the input matrix into a single row and feeds them
to PCA to learn a representation.
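A minimal sketch of this baseline pipeline with scikit-learn, assuming the spectrogram segments are stored in an array of shape (num_samples, 128, 128); the function name is illustrative.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    def baseline_pipeline(segments, n_components, n_clusters):
        # Flattening block: each 128x128 spectrogram becomes a 1x16384 row vector.
        X = segments.reshape(len(segments), -1)
        # Representation learning: PCA keeps the n_components highest-variance directions.
        reduced = PCA(n_components=n_components).fit_transform(X)
        # Clustering block: K-means on the reduced feature vectors.
        labels = KMeans(n_clusters=n_clusters).fit_predict(reduced)
        return reduced, labels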
IV. METHODOLOGY
A. Training and Evaluation Data
The dataset used for the performance analysis consists of 15
days of radio spectrum measurements acquired in the LOG-
a-TEC testbed at a sampling rate of 5 power spectral density
measurements per second using 1024 FFT bins in the 868 MHz
license-free (shared spectrum) band with a 192 kHz bandwidth.
Details of the acquisition process and a subset of data can be
found in [9]. The acquired data has a matrix form of 1024 × M,
where M is the number of measurements over time.
Fig. 2: Sample of 8 spectrogram segments from the data.
The complete data-matrix was segmented into non-
overlapping square images (spectrograms) along time and
frequency (FFT bins) for a window size W = 128. An
example of such segmentation containing 8 square images is
shown in Figure 2, corresponding to the image resolution of
25.6 seconds (128 measurements taken at 5 measurements per
second) by 24 kHz. The window size is large enough to contain
any single type of activity and small enough to avoid having
too many activities in a single image while also having in
mind the computational cost. Dividing the entire dataset of 15
days using W = 128 and zero overlap produces 423,904
images of 128 × 128 pixels. Additionally, the pixel values are
scaled to [0,1].
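The segmentation step can be reproduced with a short NumPy sketch, assuming the raw measurements are available as a 1024 × M matrix of power spectral density values. The exact normalization is not specified in the paper; per-image min-max scaling is assumed here, and the function name is illustrative.

    import numpy as np

    def segment_spectrogram(psd, window=128):
        # Cut a (1024, M) PSD matrix into non-overlapping (window x window)
        # images along frequency and time.
        n_bands = psd.shape[0] // window   # 1024 / 128 = 8 frequency sub-bands
        n_steps = psd.shape[1] // window   # number of 25.6 s windows in time
        images = []
        for b in range(n_bands):
            for t in range(n_steps):
                img = psd[b * window:(b + 1) * window, t * window:(t + 1) * window]
                # Scale pixel values to [0, 1] (assumed per-image scaling).
                img = (img - img.min()) / (img.max() - img.min() + 1e-12)
                images.append(img)
        return np.stack(images)            # shape (n_bands * n_steps, 128, 128)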
B. Optimization of the Representation Learning
CNN-based and baseline approaches can be optimized along
two dimensions: the number of PCA components that should
be used in the representation and the number of clusters for
the K-means. For the baseline model, the two parameters are
independent, meaning that the representation learning function
is not affected by the number of clusters that will be later used
on the obtained feature vectors. On the other hand, for the
CNN-based architecture, changes in the number of clusters
affect the representation learning. This is because the number
of clusters must always match the number of classes
at the output of the CNN during the learning process, so
it also affects the representation learning block. This
means that varying the number of clusters should also be
considered when choosing the number of dimensions for the
representation learning with the CNN-based model.
It is infeasible to study the influence of individual CNN
weights on the learnt representation and cluster quality due
to their large number. They are optimized in a black-box
manner during the training process consisting of 200 training
epochs. This number was determined empirically by observing
the convergence of the loss function.
C. Evaluation
As an evaluation metric for choosing the dimensionality of
the representation for both models we use EVR [16]. EVR is
a measure of how much of the variation in the feature space is
assigned to each of the principal components after performing
PCA.
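In practice, the EVR-based choice of dimensionality can be read directly from a fitted PCA, as in this scikit-learn sketch (the helper name and threshold argument are illustrative).

    import numpy as np
    from sklearn.decomposition import PCA

    def components_for_evr(features, target_evr=0.95):
        # Fit PCA on the (samples x features) matrix and return the smallest number
        # of components whose cumulative explained variance ratio reaches target_evr.
        pca = PCA().fit(features)
        cumulative = np.cumsum(pca.explained_variance_ratio_)
        return int(np.searchsorted(cumulative, target_evr) + 1)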
We analyze the quality of the representation for clustering
purposes employing visual assessment of tendency (VAT) [17].
This method produces a matrix visualisation of the dissimilarity
of a randomly selected subset of samples based on their pairwise
Euclidean distances. The samples are ordered in such a way
that groups that are closely located in the feature space,
according to the distance metric, appear as dark squares along
the diagonal of the matrix. Implementation-wise, we used an
improved version of VAT (i.e., iVAT), which provides better
visualization than the standard one.
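A sketch of the basic VAT reordering is given below; it reproduces the standard ordering rule (Prim-like traversal of the dissimilarity matrix), while the iVAT variant used here additionally applies a path-based distance transform that is not shown. Function and variable names are illustrative.

    import numpy as np
    from scipy.spatial.distance import pdist, squareform

    def vat_matrix(X):
        # Pairwise Euclidean dissimilarity matrix of the sampled feature vectors.
        D = squareform(pdist(X, metric="euclidean"))
        n = D.shape[0]
        # Start from one of the two most dissimilar samples.
        order = [int(np.unravel_index(D.argmax(), D.shape)[0])]
        remaining = [i for i in range(n) if i != order[0]]
        # Repeatedly append the remaining sample closest to the ordered set.
        while remaining:
            sub = D[np.ix_(order, remaining)]
            nxt = remaining[int(sub.min(axis=0).argmin())]
            order.append(nxt)
            remaining.remove(nxt)
        # Reordered matrix; dark blocks along the diagonal suggest clusters.
        return D[np.ix_(order, order)]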
We also evaluate the quality of the clustering, performed on
the extracted features, by using the Silhouette score metric. In
this way we provide quantification of how well the clusters
are distinguished for the analysed models.
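This quantification can be obtained with scikit-learn by sweeping the number of clusters, as in the following sketch (the helper name and the range of K are illustrative).

    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    def silhouette_curve(features, k_values=range(2, 31)):
        # Cluster the extracted features for each candidate K and score the result;
        # higher Silhouette values indicate better separated, more compact clusters.
        return {k: silhouette_score(features, KMeans(n_clusters=k).fit_predict(features))
                for k in k_values}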
Using these metrics, we evaluate and explore the repre-
sentation learning capabilities of both approaches and their
applicability for clustering. First, we analyze, for each formed
cluster, a histogram of the frequency sub-bands (resulting from
the image segmentation) from which its samples originate.
These plots provide information on whether
the learned feature representation used for the clustering is
correlated to the location of the samples along the frequency
axis. Then we plot the average of the samples assigned to
a single cluster. This provides an insight into the actual
spectrogram content that is specific for the formed clusters.
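These two diagnostics can be produced as follows. This is a sketch assuming per-image cluster labels and the sub-band index of each image recorded during segmentation; all names are illustrative placeholders.

    import numpy as np
    import matplotlib.pyplot as plt

    def cluster_diagnostics(images, labels, subbands, cluster_id, n_bands=8):
        # Histogram over frequency sub-bands and average spectrogram for one cluster.
        mask = labels == cluster_id
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
        # Where along the frequency axis do the cluster members come from?
        ax1.hist(subbands[mask], bins=np.arange(n_bands + 1) - 0.5)
        ax1.set_xlabel("frequency sub-band")
        ax1.set_ylabel("number of samples")
        # What does a typical member of the cluster look like?
        ax2.imshow(images[mask].mean(axis=0), aspect="auto")
        ax2.set_title(f"cluster {cluster_id} average")
        return fig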
V. EXPERIMENTAL RESULTS
A. Learning with the PCA baseline approach
In Figure 3, we present the evaluation of the learnt rep-
resentation according to EVR and VAT metrics discussed in
Section IV-C. Figure 3a shows EVR of the features learnt by
the baseline representation learning block consisting of PCA
only, followed by the VAT plots in Figures 3b–h. The plots
correspond to 7 different PCA-based representation learning
models, configured for different number of components se-
lected in a way to evaluate the feature vectors with wide range
of different dimensions.
It can be seen that keeping 95% of the variance ratio
in the PCA-learned representation results in feature vectors of
dimension 1×3770, which is around 23% of the flattened single-
sample input of 1×16384. Although the learned representation
has reduced dimensions by more than four times compared
to the flattened input, we still have a high dimensionality
representation. The VAT plots in Figure 3b–h show that the
baseline approach learns representation with weak clustering
tendency for all cases, except the one when using only the
first two components of the feature space. The VAT-2 plot
(Figure 3b) of the 2-dimensional feature representation shows
the existence of three well separated clusters.
B. Learning with the CNN-based approach
Figure 4a shows EVR of the features learnt by the Rep-
resentation learning block using ResNet18 (RN) and VGG11
(VGG) DL-based models with different number of clusters. A
smaller number of clusters yields higher EVR in the lower
components, while there is no significant difference in the
cumulative sum of EVR after the 20th component across
different models. All models encode features with more than
95% of EVR within 27 components, which is 0.7% of the 3770
components required by the baseline PCA.
Fig. 3: Evaluation of the baseline representation learning: (a) cumulative sum of EVR for the baseline representation; (b)–(h) VAT plots for 2, 5, 25, 250, 1000, 2000 and 3770 components.
Fig. 4: Evaluation of the DL-based representations: (a) cumulative sum of EVR for DL-based representations; (b)–(e) VAT plots for the RN-based models with 10, 20, 25 and 30 clusters; (f)–(i) VAT plots for the VGG-based models with 10, 20, 25 and 30 clusters.
The VAT plots for the proposed RN-based model are shown
in Figure 4b–e and for the VGG-based model in Figure 4f–i.
The plots correspond to 4 different automatic representation
learning models, configured for 10, 20, 25 and 30 clusters.
Both DL models achieve very similar EVR. Experimentally
it was observed that models trained with a lower number of
clusters have significantly worse clustering tendency according
to the VAT plots, so they were not considered in the subsequent
analysis. For the smallest considered number of clusters in
Figure 4b, the learnt representation of the RN-based model
contains less prominent dark squares compared to the models
with a larger number of clusters in Figures 4c–e. A similar
observation holds for the VGG-based models in Figures 4f–i.
For both DL models,
the 25-cluster models show the most distinct separation
of the feature space. This analysis indicates that the proposed
architecture is able to learn representations that can yield
11 to 25 well separated clusters. It is also able to learn
5 to 10 and 26 to 30 less clearly separated clusters, while
it is less suitable for a small number of clusters such as 2
to 4. Overall, the automatically learnt representation is able
to extract fine-grained clusters containing the shapes of the
wireless transmission bursts.
Compared to the baseline, the CNN-based model can learn
to encode the relevant information for cluster development in
only approximately 0.7% of the components required by the
PCA baseline, when the application requires higher numbers
of well defined clusters, while also enabling superior cluster
differentiation. For two clusters, the baseline model provides
a better separation according to the VAT plots at the same
feature dimensionality.
C. Cluster analysis
Next we examined the best clusters developed with the
baseline and the CNN-based approach using histograms of
samples accompanied with the average cluster spectrograms
in Figure 5 and Figure 6, respectively.
For the best baseline approach, containing 3 clusters, Fig-
ure 5 shows that cluster 0 contains mostly the samples from
sub-bands between 1 and 7, cluster 1 contains the samples
from the left-most sub-band and cluster 2 contains almost
all of the samples from the right-most sub-band. Sub-bands
refer to sections of frequency-wise segmentation as shown in
Figure 2. This observation is also aligned with the sizes of the
dark squares in the VAT plot of the 2 PCA features in Figure
3b, which contains 3 clusters: one big square corresponding to
cluster 0, and two almost equally sized smaller squares corresponding
to clusters 1 and 2. Clearly, the baseline approach clusters
the data based on the weaker signal on the left-most and right-
most samples of the full bandwidth. The weaker signal seems
to be a consequence of the nonuniform sensing capability
of the sensor. Looking back at Figure 2, the left-most and
the right-most samples have gradually vanishing brightness
towards the edges. Plotting the average of the assigned spec-
trograms from each of the clusters supports this observation.
Fig. 5: Distribution of samples from each cluster along the
frequency band for the baseline approach.
Fig. 6: Distribution of samples from each cluster along the
frequency band for the clusters obtained by the RN-based
approach.
Experimentally we identified that the 24-cluster automatic
CNN-based model using ResNet18 DL architecture provides
the best results. For this model, the average spectrograms and
histograms in Figure 6 show its effectiveness in learning gen-
eral features related both to the transmission-specific content
and the “background” of the spectrograms. The combined
clusters 19 and 23 in Figure 6 are the same as cluster
2 from the baseline approach, occupying the right-most sub-
band, while cluster 7 corresponds to cluster 1 of
the baseline. According to Figure 6, the samples assigned
to this cluster again occupy the left-most sub-band.
This means that the automatic model can also learn the
features extracted with the baseline approach. Additionally,
the automatic model learns features that are specific for the
different patterns generated by the transmissions. The clusters
0, 2, 4, 5, 8, 9–14, 17, 18 and 20–22 in Figure 6 show horizontal
line activities, which according to Figure 2 appear across
the entire bandwidth. Their histograms show that samples
assigned to these clusters are from all 8 sub-bands, and their
distribution along the entire channel is roughly uniform. The
clusters 3, 6 and 15 show the capability of the automatic model
to distinguish the transmission-free spectrograms. This can
be used to determine transmission-free sub-bands, which is
another advantage over the baseline approach. Finally, the
clusters 1 and 16 show groups of dot-like transmission bursts.
Fig. 7: Silhouette scores
The Silhouette scores depicted in Figure 7 for the baseline
and the two CNN-based approaches confirm the observations
based on the VAT plots. The baseline approach exhibits better
performance at a small number of clusters, achieving the best
score of 0.68 for three clusters. However, the baseline-
provided feature space does not show transmission specific
groups as samples are clustered based on the background
noise. The automatic CNN-based models show comparable
performance across the variation of clusters. This justifies the
usage of the proposed lower-complexity CNN, ResNet18
instead of VGG11, which preserves the performance while
significantly reducing the complexity in terms of the number
of required DL model parameters by roughly 11 times, according
to Table II.
TABLE II: Complexity comparison.

  Algorithm          RN      VGG      Baseline
  Num. parameters    11 M    133 M    /
VI. CONCLUSIONS
In this paper, an automatic feature representation learning
architecture based on CNN and PCA was explored and com-
pared to a baseline model using only PCA for the task of
clustering spectrograms from radio spectrum measurements.
Our findings show that the baseline approach is useful when
clustering based on general features of the data is required,
with only a small number of clusters. On the other hand, the
automatic learning combining CNN and PCA, although more
complex, provides a much finer distinction between closely
related groups of samples based on their actual content, which
in our case are the transmission bursts. This shows that such
architecture can be used for automatic representation learning
and is suitable when large yet unlabeled spectrogram data is
available.
ACKNOWLEDGMENTS
This work was funded in part by the Slovenian Research
Agency under the grant P2-0016 and in part by the European
Union’s Horizon Europe Framework Programme under the
grant agreement No 101096456 (NANCY). The project is sup-
ported by the Smart Networks and Services Joint Undertaking
and its members.
REFERENCES
[1] T. J. O’Shea, T. Roy, and T. C. Clancy, “Over-the-air deep learning based
radio signal classification,” IEEE Journal of Selected Topics in Signal
Processing, vol. 12, no. 1, pp. 168–179, 2018.
[2] S. Rajendran, W. Meert, D. Giustiniano, V. Lenders, and S. Pollin, “Deep
learning models for wireless signal classification with distributed low-
cost spectrum sensors,” IEEE Transactions on Cognitive Communica-
tions and Networking, vol. 4, no. 3, pp. 433–445, 2018.
[3] J. Fontaine, E. Fonseca, A. Shahid, M. Kist, L. A. DaSilva, I. Moer-
man, and E. De Poorter, “Towards low-complexity wireless technology
classification across multiple environments,” Ad Hoc Networks, vol. 91,
p. 101881, 2019.
[4] E. Fonseca, J. F. Santos, F. Paisana, and L. A. DaSilva, “Radio
access technology characterisation through object detection,” Computer
Communications, vol. 168, pp. 12–19, 2021.
[5] Q. Feng, Y. Zhang, C. Li, Z. Dou, and J. Wang, “Anomaly detection
of spectrum in wireless communication via deep auto-encoders,” The
Journal of Supercomputing, vol. 73, no. 7, pp. 3161–3178, 2017.
[6] K. Merchant, S. Revay, G. Stantchev, and B. Nousain, “Deep learning
for rf device fingerprinting in cognitive communication networks,” IEEE
Journal of Selected Topics in Signal Processing, vol. 12, no. 1, pp. 160–
167, 2018.
[7] J. Robinson, S. Kuzdeba, J. Stankowicz, and J. M. Carmack, “Dilated
causal convolutional model for rf fingerprinting,” in 10th IEEE Annual
Computing and Communication Workshop and Conference (CCWC),
2020, pp. 0157–0162.
[8] S. Riyaz, K. Sankhe, S. Ioannidis, and K. Chowdhury, “Deep learning
convolutional neural networks for radio identification,” IEEE Communi-
cations Magazine, vol. 56, no. 9, pp. 146–152, 2018.
[9] T. Šolc, C. Fortuna, and M. Mohorčič, “Low-cost testbed development
and its applications in cognitive radio prototyping,” in Cognitive Radio
and Networking for Heterogeneous Wireless Networks. Springer, 2015,
pp. 361–405.
[10] S. Rajendran, R. Calvo-Palomino, M. Fuchs, B. Van den Bergh, H. Cor-
dobés, D. Giustiniano, S. Pollin, and V. Lenders, “Electrosense: Open
and big spectrum data,” IEEE Communications Magazine, vol. 56, no. 1,
pp. 210–217, 2017.
[11] T. J. O’Shea, N. West, M. Vondal, and T. C. Clancy, “Semi-supervised
radio signal identification,” in 19th IEEE International Conference on
Advanced Communication Technology (ICACT), 2017, pp. 33–38.
[12] M. Caron, P. Bojanowski, A. Joulin, and M. Douze, “Deep clustering
for unsupervised learning of visual features,” in European Conference
on Computer Vision (ECCV), 2018, pp. 132–149.
[13] S. Kuzdeba, J. Robinson, and J. Carmack, “Transfer learning with radio
frequency signals,” in 18th IEEE Annual Consumer Communications &
Networking Conference (CCNC), 2021, pp. 1–9.
[14] K. Simonyan and A. Zisserman, “Very deep convolutional networks for
large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[15] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for
image recognition,” in IEEE Conference on computer vision and pattern
recognition, 2016, pp. 770–778.
[16] I. Jolliffe, “Principal component analysis,” Springer: Berlin, Germany,
vol. 87, pp. 41–64, 1986.
[17] J. C. Bezdek and R. J. Hathaway, “VAT: A tool for visual assessment of
(cluster) tendency,” in IEEE International Joint Conference on Neural
Networks (IJCNN’02), vol. 3, 2002, pp. 2225–2230.