
Benchmarking Audio-based Deep Learning Models for Detection and Identification of Unmanned Aerial Vehicles

Sai Srinadhu Katta, Sivaprasad Nandyala, Eduardo Kugler Viegas, Abdelrahman AlMahmoud
Secure Systems Research Centre (SSRC)
Technology Innovation Institute (TII)
Abu Dhabi, U.A.E.
{sai, siva, eduardo, abdelrahman}@ssrc.tii.ae
Abstract—Over the last few years, Unmanned Aerial Vehicles
(UAVs) have become increasingly popular for both commercial
and personal applications. As a result, security concerns in both
physical and cyber domains have been raised, as a malicious
UAV can be used for the jamming of nearby targets or even for
carrying explosive assets. UAV detection and identification is a
very important task for safety and security. In this regard, several
techniques have been proposed for the detection and identification
of UAVs, in general, through image, audio, radar, and RF based
approaches. In this paper, we benchmark the detection and identification of UAVs using the audio data of [1]. We benchmarked widely used deep learning algorithms, namely Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), Convolutional Long Short-Term Memory (CLSTM), and Transformer Encoders (TE). In addition to the dataset of [1], we collected our own diverse identification audio dataset and experimented with Deep Neural Networks (DNN) on it. In the UAV detection task, our best model (LSTM) outperformed the best model of [1] (CRNN) by over 4% in accuracy, 2% in precision, 4% in recall, and 4% in F1-score. In the UAV identification task, our best model (LSTM) outperformed the best model of [1] (CNN) by over 5% in accuracy, 2% in precision, 4% in recall, and 3% in F1-score.
Index Terms—Audio, Deep learning, Detection, Identification,
UAVs
I. INTRODUCTION
The popularity of Unmanned Aerial Vehicles (UAVs), commonly referred to as drones, has increased significantly over the last few years, driven by the technological advancement of their on-board components. In practice, modern UAVs enable the deployment of user-customized solutions that analyze data from a variety of UAV hardware components, including cameras, microphones, LiDAR, accelerometers, and GPS, among others [2]. This ease of customization has paved the way for several autonomous UAV applications, such as object delivery, field surveillance, and even border control [3].
Unfortunately, modern UAVs can also be used for malicious
purposes. Unsurprisingly, drone-based attacks can have a
significant negative impact on the economy, safety and security.
As a result, in recent years several works have been proposed
for the detection and identification of nearby UAVs [4]–[8].
Detection and identification of nearby UAVs are generally
achieved using vision-based analysis [9], radio fingerprint
detection [10], radar-based identification [11], and microphone-
based approaches [12], [13]. In this work, we focus on UAV detection and identification via audio-based approaches. Machine Learning (ML) techniques are predominantly used for analyzing acoustic UAV signatures for UAV detection and identification [12]–[17].
We benchmarked UAV detection and identification on the publicly available audio dataset [1], via commonly used deep learning algorithms, namely DNN, CNN, LSTM, CLSTM, and TE. In addition to [1], we collected our own diverse identification audio dataset and experimented with DNN models on it.
The main contributions of this paper are as follows:
• We benchmarked various deep learning models, namely DNN, CNN, LSTM, CLSTM, and TE, on the publicly available UAV detection and identification audio dataset [1]. Our models perform significantly better than those of [1].
• In addition to the dataset of [1], we collected our own diverse identification audio dataset consisting of 7 different categories of UAVs, namely no-UAV, drone, helicopter, drone-membo, drone-bebop, airplane, and drone-hovering. We experimented with DNN models on this dataset, and the results are promising.
The remainder of the paper is organized as follows. Section II presents the literature review. Section III presents the audio-based scheme for UAV detection and identification through deep learning architectures. Section IV presents the experimental results and discussion. Finally, Section V concludes our work.
II. LITERATURE REVIEW
Over the last few years, several techniques have been proposed for UAV detection and identification, ranging from video [5], radio frequency [6], [18], thermal imaging [7], and radar [11] to, more recently, audio-based approaches [12], [14], [19]–[22]. In this context, several audio-based UAV detection and identification techniques have been proposed, as they can be easily deployed due to their negligible equipment costs and the promising accuracy results reported in the literature.
To achieve such a goal, UAV audio-based detection and identification is often implemented through three sequential modules. First, the Data Acquisition module collects sound samples from a given microphone. In general, the collected data is evaluated according to a predefined time interval, e.g. every 1 second. Then, the Feature Extraction module extracts a set of behavioral features from the analyzed audio sample, compounding a feature set. Several techniques can be used to fulfill such a task in audio-based detection, including the extraction of the audio spectrogram [17], [23] and the building of the audio coefficients [8]. Finally, the Detection/Identification module classifies the built feature set into one of the selected classes.
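As an illustration of how these modules fit together, the minimal Python sketch below wires acquisition, feature extraction, and classification into one pipeline. It is a sketch only: the 48 kHz rate, 1-second window, and 40 MFCCs follow settings reported later in this paper, while the use of librosa/PyTorch and the function names are our own assumptions.

```python
import librosa
import torch

def acquire(path, sr=48000, seconds=1.0):
    """Data Acquisition: one fixed-length audio window (read from a file here;
    a deployed system would read from the microphone instead)."""
    y, _ = librosa.load(path, sr=sr, duration=seconds)
    return y

def extract_features(y, sr=48000, n_mfcc=40):
    """Feature Extraction: compound a feature set (MFCC matrix) from the sample."""
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

def classify(model, feats):
    """Detection/Identification: map the built feature set to a class label."""
    x = torch.from_numpy(feats).float().unsqueeze(0)   # (1, n_mfcc, n_frames)
    return model(x).softmax(dim=-1).argmax(dim=-1).item()
```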
In recent years, a plethora of highly accurate audio-based approaches have been proposed for UAV detection and identification tasks [1], [14], [18], [20], [24]. In general, authors resort to ML approaches, typically implemented through pattern recognition techniques. To fulfill such a task, the operator typically relies on a two-phase process, namely training and testing.
The training step aims at training the ML model with the training dataset and selecting the best model on the validation dataset. The testing phase evaluates the detection and identification performance metrics of the final model. In practice, the performance measurements obtained at the testing phase are expected to hold when the designed system is deployed in production environments.
III. PROPOSED APPROACH
The tasks at hand are the detection and identification of UAVs via their audio signatures. The overall procedure of our method is shown in Figure 1.
Almost all related works in the existing literature [12], [13], [15]–[17] follow representation learning techniques, so we adopted the same design choice for our implementation; previous results on various settings of this task were state-of-the-art. The audio data is passed through feature extraction, and the extracted features are fed to the deep learning models, which produce the identification and detection results. The end-to-end model is as follows: first, the audio is passed through Mel-Frequency Cepstral Coefficient (MFCC) feature extraction, followed by a deep learning model of choice (DNN, CNN, LSTM, CLSTM, TE), which gives us the detection and identification result. Based on the result, safety/alert actions can be performed accordingly. The detailed deep learning architectures are as follows:
1) Deep Neural Network (DNN): The DNN is made of fully-connected layers and non-linear activations. The input to the DNN is the flattened MFCC features, which feed into a stack of hidden fully-connected layers. At the output is a linear layer followed by a softmax layer generating the output probabilities of the classes.
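A minimal PyTorch sketch of such a DNN is shown below, matching the 3×256-unit configuration reported in Section IV-B; the input shape of 40 MFCCs × 33 frames is an assumption for illustration, and softmax is applied at inference (training typically folds it into the cross-entropy loss).

```python
import torch
import torch.nn as nn

N_MFCC, N_FRAMES, N_CLASSES = 40, 33, 2   # assumed input/output sizes

class DNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                                   # flatten MFCC features
            nn.Linear(N_MFCC * N_FRAMES, 256), nn.ReLU(),   # hidden layer 1
            nn.Linear(256, 256), nn.ReLU(),                 # hidden layer 2
            nn.Linear(256, 256), nn.ReLU(),                 # hidden layer 3
            nn.Linear(256, N_CLASSES),                      # linear output layer
        )

    def forward(self, x):                  # x: (batch, n_mfcc, n_frames)
        return self.net(x)

probs = DNN()(torch.randn(8, N_MFC if False else N_MFCC, N_FRAMES)).softmax(dim=-1)
```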
Fig. 1. Overview of the proposed deep learning model for audio-based detection and identification of UAVs.

2) Convolutional Neural Network (CNN): CNNs exploit the local temporal and spectral correlation in the features via 2D convolutions. The input to the CNN is the MFCC features, which feed into a stack of convolutional layers. At the output is a linear layer followed by a softmax layer generating the output probabilities of the classes.
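The sketch below mirrors that description: three 3×3, stride-1 convolutions without pooling on a 1×20×33 input (the configuration of Section IV-B); the channel widths are our assumption, since only kernel, stride, and depth are fixed by the text.

```python
import torch.nn as nn

class CNN(nn.Module):
    """Three 3x3 conv layers, stride 1, no pooling; channel widths assumed."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=1, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(32 * 20 * 33, n_classes)   # input is 1x20x33

    def forward(self, x):                 # x: (batch, 1, 20, 33)
        return self.head(self.conv(x).flatten(1))
```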
3) Long Short-Term Memory (LSTM): LSTMs are known to model long-term dependencies and have been shown to work very well on various sequence modelling tasks. The input to the LSTM is the MFCC features, and the whole flattened output sequence is fed to a linear layer followed by a softmax for the output probabilities of the classes.
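A corresponding sketch with the 128-unit LSTM layer described in Section IV-B; feeding the flattened output sequence (rather than only the last hidden state) to the linear head follows the text, while the frame and feature sizes are assumptions.

```python
import torch.nn as nn

class LSTMNet(nn.Module):
    """One 128-unit LSTM; the whole output sequence is flattened into the head."""
    def __init__(self, n_mfcc=40, n_frames=33, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_mfcc, hidden_size=128, batch_first=True)
        self.head = nn.Linear(128 * n_frames, n_classes)

    def forward(self, x):                  # x: (batch, n_frames, n_mfcc)
        seq, _ = self.lstm(x)              # (batch, n_frames, 128)
        return self.head(seq.flatten(1))
```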
4) Convolutional Long Short-Term Memory (CLSTM): CLSTMs are a combination of convolutional layers followed by LSTMs and thus offer the benefits of both CNNs and LSTMs: they exploit the local temporal and spectral correlation while modelling long-term dependencies well. The input to the CLSTM is the MFCC features, which pass through a convolution followed by an LSTM; the whole flattened output sequence is fed to a linear layer followed by a softmax for the output probabilities of the classes.
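A sketch of this combination is given below: one 3×3 convolution feeding a 128-unit LSTM, per Section IV-B; the convolution channel count and the reshaping of the feature maps into a per-frame sequence are our assumptions.

```python
import torch.nn as nn

class CLSTM(nn.Module):
    """3x3 convolution followed by a 128-unit LSTM; channel count assumed."""
    def __init__(self, n_mfcc=40, n_frames=33, n_classes=2, ch=16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, ch, kernel_size=3, stride=1, padding=1), nn.ReLU())
        self.lstm = nn.LSTM(input_size=ch * n_mfcc, hidden_size=128,
                            batch_first=True)
        self.head = nn.Linear(128 * n_frames, n_classes)

    def forward(self, x):                        # x: (batch, 1, n_mfcc, n_frames)
        z = self.conv(x)                         # (batch, ch, n_mfcc, n_frames)
        z = z.permute(0, 3, 1, 2).flatten(2)     # (batch, n_frames, ch*n_mfcc)
        seq, _ = self.lstm(z)
        return self.head(seq.flatten(1))
```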
5) Transformer Encoder (TE): Transformers have been shown to be the fundamental building block of state-of-the-art models on various sequence modelling tasks across domains. In this work we use only the Transformer Encoder. The input is the MFCC features, and the whole output sequence is fed to a linear layer followed by a softmax for the output probabilities of the classes.
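The sketch below uses one standard PyTorch encoder layer with 2 attention heads and model width 128, as in Section IV-B; a learned positional embedding stands in for the positional encoder, and the input projection is our assumption.

```python
import torch
import torch.nn as nn

class TENet(nn.Module):
    """Single Transformer encoder layer: 2 heads, width 128 (per Sec. IV-B)."""
    def __init__(self, n_mfcc=40, n_frames=33, n_classes=2):
        super().__init__()
        self.proj = nn.Linear(n_mfcc, 128)                      # MFCCs -> d_model
        self.pos = nn.Parameter(torch.zeros(1, n_frames, 128))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model=128, nhead=2, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.head = nn.Linear(128 * n_frames, n_classes)

    def forward(self, x):                  # x: (batch, n_frames, n_mfcc)
        z = self.encoder(self.proj(x) + self.pos)
        return self.head(z.flatten(1))
```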
IV. EXPERIMENTAL RESULTS AND DISCUSSION
Our proposed scheme was evaluated and compared considering both the literature and our own built dataset. The evaluation aims at answering the following research questions: (RQ1) What is the detection accuracy of audio-based techniques on a publicly available dataset? (RQ2) What is the identification accuracy when different UAV types are considered? (RQ3) What is the identification performance impact when more diverse flying devices are considered?
The proposed scheme for audio-based detection and identification of nearby UAVs is implemented through a pattern recognition pipeline. Therefore, we consider a given microphone deployed in a monitored environment (Fig. 1, Deployed Microphone), which is used for the collection and periodic sending of environment audio samples, for instance, the collection of 1-sec batches of audio in a predefined format. The main assumption is that nearby flying UAVs will
produce audio noises that will be captured by the microphone for further analysis. The collected audio sample is analyzed by the Feature Extraction module, whose goal is to extract a set of UAV-related features. To achieve such a goal, the module applies an audio filtering technique (e.g., filtering UAV-related audio signals through the Mel-Frequency Cepstral Coefficients (MFCC)) before using it for the detection and identification tasks. As a result, a portion of non-UAV-related audio can be removed from the analyzed sample, improving the system's generalization even in highly noisy environments. Finally, the extracted filtered feature vector (Fig. 1, Filtered Feature Vector) is used as input by a deep learning classifier, which outputs a corresponding event class. In such a case, the deep learning model can output the analyzed event label in a two-class setting, e.g. normal or UAV, or in a multiclass setting, e.g. outputting the type of UAV that generated the analyzed audio.
The next subsections further describe the built datasets and the performed experiments.
A. UAV Audio Datasets
In general, to properly evaluate ML-based techniques for audio detection and identification, large amounts of labeled data are required. Unfortunately, due to privacy issues, only a few datasets are publicly available.
In light of this, our work makes use of the dataset of Al-Emadi et al. [1], which provides more than 1300 audio clips of drone sounds. We used the dataset for two tasks: detection and identification. In the detection task, the classes are Drone and Not a Drone, with 1332 drone samples and 10372 not-a-drone samples. In the identification task, the classes are membo, bebop, and Not a Drone, with 666, 666, and 10372 samples, respectively. Every file is 1 sec long, and the dataset is split at the file level. The drone data is a good representation of real-world drone audio.
To increase model performance, the audio dataset is also augmented by introducing noise data, ensuring that the system is able to distinguish the drone's sound from similar noises in an environment. The SNR levels are not reported in [1]; the publicly available dataset was collected in a quiet indoor environment with the drone flying and hovering.
Apart from the publicly available dataset, we have also collected a new dataset with audio of additional, diverse flying devices from multiple open sources, to evaluate the impact on model identification performance metrics. The dataset was built with 7 UAV types, including no-UAV, drone, helicopter, drone-membo, drone-bebop, airplane, and drone-hovering. Most audio files for each UAV type were collected for 5 minutes.
For both selected datasets, the audio data is recorded with a sampling rate of 48 kHz and a linear encoding of 16 bits per sample. Each input sound window is further segmented into sub-frames of 20 ms using a moving Hamming window with an overlap of 10 ms. The sub-frames are processed by a bank of filters to compute short-term features in both the temporal and frequency domains [20]. For the deep learning algorithms, if the majority of the frames in a given audio segment is labelled with the tag "drone", a flying drone is assumed to be present in the surrounding environment.
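The framing and the majority-vote rule can be sketched as follows (a minimal NumPy illustration assuming the 48 kHz / 20 ms / 10 ms settings above; the 0.5 threshold encodes the majority rule):

```python
import numpy as np

def frame_signal(y, sr=48000, win_ms=20, hop_ms=10):
    """Segment a window into 20 ms Hamming-weighted sub-frames, 10 ms overlap."""
    win, hop = int(sr * win_ms / 1000), int(sr * hop_ms / 1000)
    starts = range(0, len(y) - win + 1, hop)
    return np.stack([y[i:i + win] * np.hamming(win) for i in starts])

def majority_vote(frame_labels, drone_label=1):
    """Recognize a flying drone when most sub-frame labels are 'drone'."""
    return np.mean(np.asarray(frame_labels) == drone_label) > 0.5
```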
Both datasets are split into 80% for training, 10% for validation, and 10% for testing. The audio files are broken into chunks of 1 second.
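Because the split is done at file level, 1-second chunks from the same recording never cross splits; a minimal sketch of such a split is:

```python
import random

def split_files(files, seed=0):
    """80/10/10 train/val/test split at the file level."""
    rng = random.Random(seed)
    files = sorted(files)
    rng.shuffle(files)
    n_tr, n_val = int(0.8 * len(files)), int(0.1 * len(files))
    return (files[:n_tr],                      # training files
            files[n_tr:n_tr + n_val],          # validation files
            files[n_tr + n_val:])              # test files
```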
B. Model Building
For the Feature Extraction module, for each audio frame we compute the Mel-Frequency Cepstral Coefficients (MFCC) commonly used in audio analysis. We use the Mel scale to capture the comparatively higher energy in the lower frequencies of the range, compounding 40 MFCC features, which are used as input by the selected deep learning model (Fig. 1, Filtered Feature Vector).
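A minimal sketch of this feature extractor, assuming librosa and the 20 ms / 10 ms Hamming framing described above:

```python
import librosa

def mfcc_features(y, sr=48000, n_mfcc=40):
    """40 MFCCs per frame; window/hop mirror the framing of Section IV-A."""
    return librosa.feature.mfcc(
        y=y, sr=sr, n_mfcc=n_mfcc,
        n_fft=int(0.020 * sr),        # 20 ms analysis window
        hop_length=int(0.010 * sr),   # 10 ms hop (50% overlap)
        window="hamming")
```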
We evaluate 5 commonly used deep learning algorithms for the audio-based classification task (Fig. 1, Deep Learning), namely the Deep Neural Network (DNN), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Convolutional Long Short-Term Memory (CLSTM), and Transformer Encoder (TE).
The DNN was implemented with 3 hidden layers, each with 256 units and a ReLU activation function, while the output layer relies on a softmax activation function. The CNN was implemented with 3 convolutional layers, each with a ReLU activation function, a kernel size of 3×3, and a 1×1 stride, followed by a hidden layer with a softmax activation function. No pooling layers are used in our CNN: the input to the CNN is of size 1×20×33, and given this small size there was no need for pooling. The CNN model architecture is not provided in [1]; because of this, we are not able to make a proper architectural comparison, and we made the design choice to keep the model as small as possible.
The LSTM was implemented with one LSTM layer with 128 units, followed by a hidden output layer with a softmax activation function. Similarly, the CLSTM makes use of a convolutional layer with a ReLU activation function, a kernel size of 3×3, and a 1×1 stride, followed by an LSTM layer with 128 units and a hidden output layer with a softmax activation function. Finally, the TE was implemented with a positional encoder and 2 attention heads, followed by an encoder layer with 128 units and an output layer with a softmax activation function.
For the model building procedure, 100 epochs are executed with a batch size of 128 for all selected deep learning algorithms; 100 epochs is in line with the related works. It is important to note that the used set of parameters was set similarly to the related works, and no significant differences were found while varying them. The data is split into 80% training, 10% validation, and 10% test data. In each epoch, both the training and validation data are used: the model is trained on the training data, and
TABLE I
DETECTION RESULTS OF THE DEEP LEARNING ALGORITHMS AS MEASURED AT THE PUBLICLY AVAILABLE DATASET [1]
Deep Learning Algorithm Accuracy (%) Precision Recall F1-Score
Recurrent Neural Network (RNN) [1] 75.00 0.7592 0.6801 0.6838
Convolutional Neural Network (CNN) [1] 96.38 0.9624 0.9560 0.9590
Convolutional Recurrent Neural Network (CRNN) [1] 94.72 0.9502 0.9308 0.9393
Deep Neural Network (DNN) 98.35 0.9661 0.9549 0.9604
Convolutional Neural Network (CNN) 98.85 0.9753 0.9696 0.9724
Long Short-Term Memory (LSTM) 98.93 0.9759 0.9731 0.9745
Convolutional Long Short-Term Memory (CLSTM) 97.78 0.9460 0.9486 0.9473
Transformer Encoder (TE) 98.35 0.9634 0.9489 0.9606
TABLE II
IDENTIFICATION RESULTS OF THE DEEP LEARNING ALGORITHMS AS MEASURED AT THE PUBLICLY AVAILABLE DATASET [1]
Deep Learning Algorithm Accuracy (%) Precision Recall F1-Score
Recurrent Neural Network (RNN) [1] 57.16 0.5964 0.5716 0.5562
Convolutional Neural Network (CNN) [1] 92.94 0.9275 0.9263 0.9263
Convolutional Recurrent Neural Network (CRNN) [1] 92.22 0.9254 0.9223 0.9225
Deep Neural Network (DNN) 98.52 0.9589 0.9439 0.9508
Convolutional Neural Network (CNN) 98.60 0.9553 0.9447 0.9510
Long Short-Term Memory (LSTM) 98.60 0.9480 0.9603 0.9540
Convolutional Long Short-Term Memory (CLSTM) 98.11 0.9457 0.9188 0.9314
Transformer Encoder (TE) 98.19 0.9405 0.9369 0.9386
the best model according to the validation data is updated. Finally, the best validation model is tested on the unseen test data.
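The training/validation protocol can be sketched as below (assuming PyTorch DataLoaders with batch size 128; the Adam optimizer is our assumption, as no optimizer is named here):

```python
import copy
import torch

def train(model, train_loader, val_loader, epochs=100):
    """Train for 100 epochs, keeping the weights that score best on validation."""
    opt = torch.optim.Adam(model.parameters())   # optimizer choice assumed
    loss_fn = torch.nn.CrossEntropyLoss()        # softmax folded into the loss
    best_acc, best_state = -1.0, None
    for _ in range(epochs):
        model.train()
        for x, y in train_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            hits = sum((model(x).argmax(-1) == y).sum().item()
                       for x, y in val_loader)
        acc = hits / len(val_loader.dataset)
        if acc > best_acc:                       # update best model on val data
            best_acc, best_state = acc, copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)            # final model, tested once on test
    return model
```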
C. Evaluation
The selected deep learning algorithms were evaluated with
respect to their accuracy, precision, recall and F1 scores. To
achieve such a goal, the following classification performance
metrics were used:
True-Positive (TP): number of UAV-related audio samples
correctly classified as UAV-related.
True-Negative (TN): number of normal samples correctly
classified as normal.
False-Positive (FP): number of normal samples incorrectly
classified as UAV-related.
False-Negative (FN): number of UAV-related audio samples incorrectly classified as normal.
The F1 score was computed as the harmonic mean of
precision and recall values while considering UAV-related as
positive samples and normal as negative samples, as shown in
Eq. 3.
$$\text{Precision} = \frac{TP}{TP + FP} \qquad (1)$$

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (2)$$

$$F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (3)$$
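For completeness, Eqs. 1–3 translate directly into code (a small helper over the confusion counts; accuracy is included as the usual companion of the three metrics):

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 (Eqs. 1-3); UAV-related = positive."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```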
In [1], no experiments were conducted on the publicly available dataset with DNN, LSTM, and Transformer techniques; we performed them for a proper comparison to assess the overall improvement on the detection and classification tasks.
TABLE III
INDIVIDUAL ACCURACIES IN OUR OWN BUILT DATASET.
UAV type Accuracy (%) Prec. Rec. F1-Score
no-UAV 85.71 1.00 0.86 0.92
drone 100 1.00 1.00 1.00
helicopter 66.66 0.67 0.67 0.67
drone-membo 100 1.00 1.00 1.00
drone-bebop 100 1.00 1.00 1.00
airplane 100 1.00 1.00 1.00
drone-hovering 100 0.88 1.00 0.93
Overall dataset 95.20 0.96 0.95 0.95
The first experiment aims at answering RQ1 and evaluates the detection accuracy of the selected deep learning algorithms on the publicly available dataset [1]. The evaluation goal is to measure how the selected techniques perform when using a publicly available dataset for audio-based detection of UAVs. Table I shows the modeling results of [1] and our own benchmark results for the drone detection task. All our models outperformed the models in [1] by 4% to more than 20% in accuracy. Our best model (LSTM) outperformed the best model of [1] (CRNN) by over 4% in accuracy, 2% in precision, 4% in recall, and 4% in F1-score. Our best detection model, the LSTM classifier, achieved the highest accuracy of 98.93%, a precision of 0.9759, and a recall of 0.9731, with an F1-score of 0.9745.
The second experiment aims at answering RQ2 and evaluates the UAV identification accuracy of the selected deep learning algorithms when different UAV types are considered. To achieve such a goal, the publicly available dataset [1] is
evaluated with different types of UAVs. Table II shows the modeling results of [1] and our own benchmark results for the drone identification task. All our models outperformed the models in [1] by 6% to more than 30% in accuracy. Our best model (LSTM) outperformed the best model of [1] (CNN) by over 5% in accuracy, 2% in precision, 4% in recall, and 3% in F1-score. Our best identification model, the LSTM classifier, achieved the highest accuracy of 98.60%, a precision of 0.9480, and a recall of 0.9603, with an F1-score of 0.9540. This identification accuracy of 98.60% represents a decrease of only 0.33% compared to the detection scenario (Table I vs. Table II).
The DNN model in Table III is trained and tested on the newly collected 7-class UAV dataset. The third experiment aims at answering RQ3 and evaluates the identification accuracy on our own built dataset. Table III shows the classification accuracy of the DNN classifier on our own built dataset. The DNN classifier achieved a high accuracy of 95.20%. Compared to its counterpart trained on the publicly available dataset, the DNN's accuracy decreased by 3.32%, a marginal decrease considering that the classifier is being used in a different and more difficult setting.
V. CONCLUSIONS
In this work, we benchmarked the detection and identification of UAVs via audio using multiple deep learning algorithms, namely DNN, CNN, LSTM, CLSTM, and TE. In addition to the dataset of [1], we also collected our own identification dataset and built a DNN model on it. We have demonstrated that the DNN, CNN, LSTM, CLSTM, and TE algorithms are all able to provide significantly higher performance metrics in comparison to [1]. In the detection task, our best model (LSTM) outperformed the best model of [1] (CRNN) by over 4% in accuracy, 2% in precision, 4% in recall, and 4% in F1-score. In the identification task, our best model (LSTM) outperformed the best model of [1] (CNN) by over 5% in accuracy, 2% in precision, 4% in recall, and 3% in F1-score.
REFERENCES
[1] S. Al-Emadi, A. Al-Ali, et al., "Drone audio dataset," https://github.com/saraalemadi/DroneAudioDataset, 2018.
[2] S. Samaras, E. Diamantidou, A. Lalas, et al., "Deep learning on multi sensor data for counter UAV applications—a systematic review," Sensors, vol. 19, no. 22, p. 4387, 2019.
[3] R. Austin, Unmanned Aircraft Systems: UAVS Design, Development and Deployment, John Wiley & Sons, pp. 1–365, 2011.
[4] Y. Seo, B. Jang, J. Jung, and S. Im, "UAV detection using the cepstral feature with logistic regression," Sensors, pp. 219–222, 2018.
[5] J. Peng, C. Zheng, L. Si, et al., "Using images rendered by PBRT to train Faster R-CNN for UAV detection," Computer Science Research Notes, CSRN 2802, pp. 770–778, 2018.
[6] M. Ezuma, F. Erden, I. Guvenc, et al., "Micro-UAV detection and classification from RF fingerprints using machine learning techniques," 2019 IEEE Aerospace Conference, pp. 1–13, 2019.
[7] E. Unlu, E. Zenou, and N. Riviere, "Using shape descriptors for UAV detection," Electronic Imaging, pp. 1–5, 2018.
[8] S. Jamil, M. Rahman, A. Ullah, S. S. Mirjavadi, et al., "Malicious UAV detection using integrated audio and visual features for public safety applications," Sensors, vol. 20, p. 3923, 2020.
[9] H. Liu, Y. Ren, et al., "Drone detection based on an audio-assisted camera array," 2017 IEEE Third International Conference on Multimedia Big Data (BigMM), pp. 402–406, 2017.
[10] M. Ezuma, F. Erden, I. Guvenc, et al., "Micro-UAV detection and classification from RF fingerprints using machine learning techniques," 2019 IEEE Aerospace Conference, pp. 1–13, 2019.
[11] P. Church, C. Grebe, J. Matheson, and B. Owens, "Aerial and surface security applications using LiDAR," in Laser Radar Technology and Applications, International Society for Optics and Photonics, vol. 10636, pp. 27–38, 2018.
[12] X. Chang, C. Yang, Z. Shi, et al., "A surveillance system for drone localization and tracking using acoustic arrays," IEEE 10th Sensor Array and Multichannel Signal Processing Workshop (SAM), pp. 573–577, 2018.
[13] A. Cabrera-Ponce, J. Martinez-Carranza, and C. Rascon, "Detection of nearby UAVs using a multi-microphone array on board a UAV," International Journal of Micro Air Vehicles, vol. 12, pp. 1–10, 2020.
[14] S. Al-Emadi, A. Al-Ali, and A. Al-Ali, "Audio-based drone detection and identification using deep learning techniques with dataset enhancement through generative adversarial networks," Sensors, vol. 21, p. 4953, 2021.
[15] C. Liu, W. Zhu, and M. Zheng, "Audio-based fault diagnosis method for quadrotors using convolutional neural network and transfer learning," 2020 American Control Conference (ACC), pp. 1367–1372, 2020.
[16] S. Al-Emadi, A. Al-Ali, A. Mohammad, and A. Al-Ali, "Audio based drone detection and identification using deep learning," 15th International Wireless Communications and Mobile Computing Conference (IWCMC), pp. 459–464, 2019.
[17] H. C. Vemula, "Multiple drone detection and acoustic scene classification with deep learning," PhD diss., Wright State University, pp. 1–48, 2018.
[18] P. Nguyen, H. Truong, T. Vu, et al., "Matthan: Drone presence detection by identifying physical signatures in the drone's RF communication," in Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services (MobiSys), ACM, pp. 211–224, 2017.
[19] S. Jeon, J. Shin, H. Yang, et al., "Empirical study of drone sound detection in real-life environment with deep neural networks," 2017 25th European Signal Processing Conference (EUSIPCO), 2017.
[20] A. Bernardini, F. Mangiatordi, E. Pallotti, and L. Capodiferro, "Drone detection by acoustic signature identification," Electronic Imaging, pp. 60–64, 2017.
[21] H. Kolamunna et al., "DronePrint: Acoustic signatures for open-set drone detection and identification with online data," Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, pp. 1–31, 2021.
[22] A. Sedunov, D. Haddad, A. Yakubovskiy, et al., "Stevens drone detection acoustic system and experiments in acoustics UAV tracking," 2019 IEEE International Symposium on Technologies for Homeland Security (HST), pp. 1–7, 2019.
[23] J. Kim, C. Park, J. C. Gallagher, et al., "Real-time UAV sound detection and analysis system," 2017 IEEE Sensors Applications Symposium (SAS), pp. 1–5, 2017.
[24] S. Al-Emadi, A. Al-Ali, A. Mohammad, and A. Al-Ali, "Audio based drone detection and identification using deep learning," 2019 15th International Wireless Communications and Mobile Computing Conference (IWCMC), pp. 459–464, 2019.