
Cluster-Based Data Aggregation in Flying Sensor Networks Enabled Internet of Things

August 2023
Future Internet 15(8):279


DOI:10.3390/fi15080279

License
CC BY 4.0

Authors:

Abdu Salam

Abdul Wali Khan University Mardan

Qaisar Javaid

International Islamic University, Islamabad

Masood Ahmed

Iqra University

Ishtiaq Wahid

Iqra University





Citation: Salam, A.; Javaid, Q.; Ahmad, M.; Wahid, I.; Arafat, M.Y. Cluster-Based Data Aggregation in Flying Sensor Networks Enabled Internet of Things. Future Internet 2023, 15, 279. https://doi.org/10.3390/fi15080279

Academic Editor: Claude Chaudet

Received: 31 July 2023
Revised: 16 August 2023
Accepted: 18 August 2023
Published: 20 August 2023

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

future internet

Article

Cluster-Based Data Aggregation in Flying Sensor Networks Enabled Internet of Things

Abdu Salam 1, Qaisar Javaid 2, Masood Ahmad 1, Ishtiaq Wahid 1 and Muhammad Yeasir Arafat 3,*

1 Department of Computer Science, Abdul Wali Khan University, Mardan 23200, Pakistan; abdusalam@awkum.edu.pk (A.S.); masood@awkum.edu.pk (M.A.); ishtiaqwahid@awkum.edu.pk (I.W.)

2 Department of Computer Science and Software Engineering, International Islamic University, Islamabad 44000, Pakistan; qaisar@iiu.edu.pk

3 IT Research Institute, Chosun University, 309 Pilmun-daero, Dong-gu, Gwangju 61452, Republic of Korea

* Correspondence: myarafat@chosun.ac.kr

Abstract:

Multiple unmanned aerial vehicles (UAVs) are organized into clusters in a flying sensor network (FSNet) to achieve scalability and prolong the network lifetime. A variety of optimization schemes can be adapted to determine the cluster heads (CHs) and to form stable and balanced clusters. In FSNet, duplicated data may be transmitted to the CHs when multiple UAVs monitor activities in the vicinity of an event of interest, and communicating duplicate data can consume more energy and bandwidth than the computation needed to aggregate it. This paper proposes a honey-bee algorithm (HBA) to select the optimal CH set and form stable and balanced clusters. The modified HBA determines CHs based on residual energy, UAV degree, and relative mobility. To transmit data, each UAV joins the nearest CH, and the proposed stable clustering procedure decreases the re-affiliation rate. Once the clusters are formed, ordinary UAVs transmit data to their UAVs-CH. An aggregation method based on dynamic programming is proposed to reduce energy consumption and save bandwidth; it is applied at the cluster level to minimize communication. Simulation experiments validated the proposed scheme, and the results were compared with recent cluster-based data aggregation schemes. The results show that the proposed scheme outperforms state-of-the-art cluster-based data aggregation schemes in FSNet.

Keywords: clustering; data aggregation; dynamic programming; flying sensor network; Internet of Things

1. Introduction

A multi-unmanned aerial vehicle (UAV)-aided flying sensor network (FSNet) is constrained by various resource factors, such as limited energy, computation, memory, and communication [ ]. The energy consumed by sensing and computation is less than the energy used for communication among the UAVs or with the ground station (GS) [ ]. The available energy is sometimes insufficient for transmission and computation during the mission, yet the collected data must be communicated to the destination for further processing. The network lifetime therefore depends on efficient energy utilization [ ]. Researchers have tried to minimize energy utilization in other wireless networks, but the problem persists. The flying speed of UAVs, rapid topology variation, terrain structure, and diverse flight directions make it difficult to collect and route information [ ]. Researchers have proposed energy-efficient schemes that consider parameters such as communication distance, computation cost, mobility, and degree. However, efficient data collection and the minimization of communication load to save bandwidth and energy have received little attention [7,8].

A data aggregation approach reduces the energy consumption of UAVs and increases their lifespan. Data aggregation in UAV networks differs from that in wireless sensor networks (WSNs) and vehicular ad hoc networks (VANETs) [ ]. In WSNs, data aggregation is used mainly to decrease energy consumption rather than to minimize network capacity usage [ ]. In VANETs, because the topology varies greatly, data aggregation is performed by many vehicles. A multi-UAV system is constrained by energy in addition to degree, mobility, density, and other parameters, so it shares requirements with both WSNs and VANETs [ ]. Like flight and communication, processing and storing more data also consume UAV energy [12].

Future Internet 2023, 15, 279. https://doi.org/10.3390/fi15080279 https://www.mdpi.com/journal/futureinternet

UAV-based data aggregation has emerged as a promising solution for collecting data from remote and hard-to-reach areas. One of the main motivations for UAV-based data aggregation is the ability to obtain data more efficiently and accurately. UAVs can cover a larger area in a shorter time than traditional data collection methods, so data can be collected faster, allowing quicker analysis and decision-making. UAV-based data aggregation can also reduce data collection costs: traditional methods often require expensive equipment and personnel, whereas UAVs can be operated by a single person with minimal equipment.

Data aggregation is a fundamental operation in FSNet, in which data are transmitted among UAVs or to the GS. It reduces communication costs and bandwidth utilization by delivering aggregated data rather than every raw reading. Data aggregation follows a many-to-few pattern, in which many sources feed a few aggregation points. A data aggregation protocol specifies how the data are gathered, how they are routed to the destination, and when they should be transmitted.

Therefore, in this study, we developed a cluster-based data aggregation scheme for UAV-based FSNet. The main contributions of this research are as follows:

• An effective mechanism based on a honey-bee algorithm (HBA) is designed to select optimal unmanned aerial vehicle cluster heads (UAVs-CH).
• Balanced and stable clusters are formed, which reduces re-affiliation rates.
• A data aggregation algorithm is proposed to limit the communication of duplicated data to the base station (BS).
• The scheme avoids transmitting unwanted packets to the BS, saving FSNet bandwidth.
• Mathematical techniques are used to measure the accuracy and correctness of the proposed scheme.

The remainder of this article is structured as follows. Section 2 reviews the existing literature. Section 3 discusses the proposed cluster formation and data aggregation schemes. Section 4 outlines the experimental evaluation and simulation outcomes. Finally, Section 6 concludes the paper and states future directions.

2. Background

This section reviews existing work on energy-efficient clustering and UAV-based data aggregation approaches. A UAV-based data aggregation network architecture differs significantly from a traditional wireless sensor network, so existing algorithms designed for stationary or low-mobility WSNs are not directly applicable [ ]. A UAV-based data aggregation algorithm must adapt to high mobility, sudden topology changes, and intermittent communication links. WSNs require different levels of quality of service from the underlying network, including bounds on delay, packet loss, and reliability. Furthermore, WSNs are limited in energy, computation power, and network resources, so an appropriate data aggregation technique is essential for meeting these requirements within WSN limitations. WSNs also require time synchronization to coordinate data, energy, and localization. To address the sensor time synchronization problem, the authors of [ ] proposed a pairwise broadcast synchronization (PBS) protocol for multi-cluster sensor networks that reduces overall energy consumption while maintaining synchronization accuracy. Reference [ ] proposed a distributed heuristic algorithm for selecting appropriate sensors in a multi-hop sensor network to reduce the number of message exchanges needed for network-wide synchronization using PBS.


2.1. Energy-Efficient Clustering

Energy-efficient clustering strategies for FSNet are still in their infancy [ ]. This section presents an overview of existing energy-efficient solutions based on different parameters, criteria, and approaches. FSNet has many military and civil applications. Nevertheless, UAVs in FSNet face limited energy and flight time, high mobility, frequent topology changes, challenging terrain, and a lack of routing efficiency [ ]. Aadil et al. [ ] addressed these issues and proposed an energy-aware link-based clustering model called EALC. For efficient clustering, parameters such as energy, degree, cluster formation, and cluster head lifetime are taken into account. Link quality and transmission range are considered first to reduce the energy consumption of flying nodes. A K-means density-clustering model, used to cope with high mobility, considers distance and energy. This model selects cluster heads (CHs) that require the least computation and provide maximum throughput, and it minimizes routing overhead. The energy-efficient selection of CHs increases the lifetime of UAVs, and a simple clustering approach is used to reduce computation overhead. Nevertheless, EALC ignores communication load, data aggregation, and bandwidth utilization.

Arafat and Moh [ ] proposed an energy-efficient clustering scheme based on particle swarm optimization for emergency missions. First, swarm-intelligence-based localization (SIL) and clustering (SIC) schemes are implemented, which define the search space with a bounding box to reduce computation. UAVs are placed randomly in the search space. SIL uses a grouping scheme to calculate the distance between target UAVs, and an estimation model measures the distance of each UAV from its CH; this node-to-CH distance is considered to reduce energy consumption. The Euclidean distance is used to locate UAVs and balance cluster size. In SIC, cluster formation and CH selection are performed with a fitness function that considers remaining energy and distance. The performance metrics used in that paper are energy consumption, communication load, and delay. Node degree and residual energy are considered to balance inter-cluster and intra-cluster energy utilization.

Yang et al. [ ] proposed a probabilistic energy-aware clustering scheme that uses ant colony optimization (ACO) to find the most efficient path for UAVs, which are used to improve WSN data-gathering efficiency. The network is divided into clusters, each with a CH, cluster members (CMs), and a UAV. The nodes are stationary with a position-aware CH, and the CH receives information from the UAVs flying around the cluster heads. The proposed model has three stages. First, a UAV senses data about a farmland event via the ground segment. Second, the data gathered by UAVs are transmitted to the data center through the CH. Finally, the data center hosts a database and a management information system.

To overcome packet loss, long delay, and increased routing overhead, Yu et al. [ ] proposed an ACO-based clustering protocol to enhance network performance. A reliable link supervision method was established to cope with UAV mobility and topology variation. A nonlinear processing scheme increases buffer size and link load to control and avoid congestion on the link. Based on UAV density, two routing strategies are proposed, namely sparse formation and concentrated formation. ACO-based polymorphism-aware routing integrates dynamic source routing with ACO.

2.2. UAV-Based Data Aggregation

In FSNets, collecting and transmitting information over multiple hops increases energy utilization [ ]. Data aggregation reduces energy utilization and increases network lifetime by minimizing the load on UAVs, lowering both communication cost and energy consumption [ ]. Researchers have developed aggregation approaches for FSNet that do not eliminate redundant data [24].

Wang et al. [ ] introduced a UAV-assisted topology-aware data aggregation protocol for WSNs (TA-UAV-DA). The approach was inspired by compressive sensing and aims to reduce the error rate in the data reconstruction process, extra overhead, and energy consumption. A balanced tree-based topology is constructed to minimize the scope of the measurement matrix and to update it. The CMs send data to the CH, and the UAV gathers data from the CHs. The simulation results show that the approach reconstructs data better and is more efficient in data aggregation and under storage constraints than random walk and intelligent compressive sensing approaches.

Wu et al. [ ] developed an energy-efficient UAV-based data aggregation protocol (EE-UAV-DA) for WSNs, in which UAVs gather data as data mules. The method uses a genetic algorithm to calculate the optimum route for the data mule through all CHs, achieving high energy efficiency. By balancing system throughput, the approach reduces the delay between the sensors and the sink node. The heuristic search identifies optimal solutions for joint CH selection and data-mule routing to decrease energy consumption, and an objective function measures the quality of each candidate path.

Thammawichai [ ] proposed OC-mUAV, which optimizes communication and computation for multi-UAV information-gathering applications. Multi-hop clustering incorporates data aggregation through a mixed-integer nonlinear programming formulation. To determine the roles of UAVs, an optimal control problem was formulated, and the framework tries to ensure that the optimal number of UAVs communicate with the BS. To keep energy consumption low during routing, each UAV acts as an aggregator. An adaptive energy consumption model minimizes energy by accounting for sensing, aggregation, transmission, and reception energy across multiple UAVs. To reduce communication and computation energy, area mapping and target tracking were addressed during testing. Target and sensor models are used to select sensor UAVs from a subset of UAVs, and the distance between UAVs is used for mapping. Because the resulting network is self-organized, it is flexible and reliable; multi-hop operation prolongs its lifetime, and clustering of heterogeneous UAVs improves performance.

Dong et al. [ ] proposed an algorithm that uses UAVs and mobile agents (MAs) to collect and process data in WSNs when searching for victims at disaster sites. MAs move around the area to collect data from sensor nodes and share information with UAVs. The UAV assigns MAs to group leaders; a group leader is chosen by the density of sensor nodes in its group, high residual energy, and an optimal link to a UAV. MA routing is based on information-driven static and dynamic mobile agent planning algorithms. The scheme is energy- and time-efficient for dense networks using MAs and UAVs.

To decrease energy utilization in a UAV-aided WSN, Liu and Zhu [ ] proposed an energy-efficient data collection method in which sensor nodes are placed randomly in the environment. The approach uses three transmission modes to cope with the small buffer size of sensor nodes, which must transmit data within the allotted time slots. In each discrete time slot, a sensor node selects one of three modes: waiting, transmission to a sink node, or uploading to the UAV. In waiting mode, the node sleeps and does not transmit. In the second mode, it uploads data to the sink node. In the third mode, it delivers data to the UAV when a threshold value and a distance condition are met during the UAV's preplanned trajectory visit. The UAV is a fixed-wing aircraft flying at constant velocity. The article uses a finite-horizon sequential Markov decision process and a dynamic programming algorithm to optimize the transmission policy. Second, the method optimizes the UAV's preplanned trajectory using a recursive random search algorithm.

The authors of [ ] analyzed the performance of an energy-constrained Internet of Things (IoT) system that uses a power beacon and a UAV for data collection, examining how different system and channel parameters affect outage probability, outage capacity, and ergodic capacity. In [ ], the authors explored the challenges and possible solutions for implementing a fully immersive and interactive industrial metaverse, a virtual space that interacts with the physical world in real time. They focused on improving key performance indicators such as the Age of Information, latency, and reliability by optimizing short-packet structures in 6G URLLC communication.

A cooperative strategy involving an unmanned ground vehicle (UGV) and a UAV is proposed in [ ] to collect data from sensor nodes (SNs) in UAV-enabled data collection systems, where SNs may be unable to upload their data because of factors such as insufficient energy and low flight altitude. A collaborative strategy selection algorithm combining multistage SN association with UAV-UGV path optimization was used to determine trajectories for the ground and aerial mobile data collection nodes so as to minimize mission completion time.

3. Flying Sensor Network Cluster Optimization

In cluster-based flying networks, CH selection and cluster formation require special attention to decrease the re-affiliation rate and save FSNet resources [33]. To select CHs, essential parameters such as remaining energy, mobility, and flying-node degree are considered to obtain optimal clusters [–]. These parameters are optimized to distribute the load among clusters. We use the clustering approach of [ ] to balance clusters and select CHs, as shown in Figure 1. Data aggregation is initiated once the clusters are formed.
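The three CH criteria named above (remaining energy, degree, and mobility) can be folded into a single candidate score. The sketch below is a minimal illustration under stated assumptions, not the paper's formula: the linear weighting, the weight values (0.5, 0.3, 0.2), the max-normalization, the use of relative speed as the mobility measure, and the names `UavState` and `rank_ch_candidates` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UavState:
    uav_id: int
    residual_energy: float    # joules remaining (assumed unit)
    degree: int               # number of one-hop neighbours
    relative_mobility: float  # mean relative speed to neighbours, m/s (assumed measure)


def rank_ch_candidates(uavs, weights=(0.5, 0.3, 0.2)):
    """Sort UAVs from best to worst CH candidate (hypothetical weighted score).

    Each criterion is normalized by its maximum over the swarm; energy and
    degree are rewarded, relative mobility is penalized so stable UAVs win.
    """
    max_e = max(u.residual_energy for u in uavs) or 1.0
    max_d = max(u.degree for u in uavs) or 1
    max_m = max(u.relative_mobility for u in uavs) or 1.0

    def score(u):
        return (weights[0] * u.residual_energy / max_e
                + weights[1] * u.degree / max_d
                + weights[2] * (1.0 - u.relative_mobility / max_m))

    return sorted(uavs, key=score, reverse=True)
```

With these weights, two UAVs with equal energy and degree are separated by mobility: the one moving more slowly relative to its neighbours ranks higher, in line with the goal of reducing re-affiliation.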


Figure 1. Flowchart of proposed system.

As shown in Figure 1, the proposed approach begins with the selection of a CH-UAV, which determines which UAV will act as the CH. If the node is the CH, it broadcasts time division multiple access (TDMA) schedules to all member nodes within its cluster. TDMA is a channel access method that allows multiple nodes to share the same communication channel by dividing it into time slots. The CH-UAV then receives data from all neighboring nodes within its cluster and performs data aggregation, that is, combining or summarizing data from multiple sources, which reduces redundancy and improves efficiency. After aggregation, the CH-UAV sends the aggregated data to the BS for further processing. If the current node is not a CH, it waits for the TDMA schedule broadcast by its CH-UAV; this schedule determines when the node can transmit or receive data. The algorithm then checks the energy level of the nodes. If energy is still available, the process returns to the CH selection step and the next round begins. When a node's energy is depleted, the flowchart terminates and the process ends.
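To make the TDMA step concrete, the following sketch shows one way a CH could give each cluster member an exclusive slot within a frame. It is an illustrative assumption rather than the paper's schedule: the slot and guard durations, the ordering of members by ID, and the names `TdmaSlot` and `build_tdma_schedule` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TdmaSlot:
    member_id: int
    start_ms: float
    end_ms: float


def build_tdma_schedule(member_ids, slot_ms=10.0, guard_ms=1.0):
    """Assign one non-overlapping slot per cluster member within a frame.

    slot_ms and guard_ms are hypothetical values; the guard interval between
    consecutive slots absorbs small clock offsets between UAVs.
    Returns (list of slots ordered by member ID, total frame length in ms).
    """
    schedule = []
    t = 0.0
    for member in sorted(set(member_ids)):
        schedule.append(TdmaSlot(member, t, t + slot_ms))
        t += slot_ms + guard_ms
    frame_ms = t  # frame length, including the trailing guard interval
    return schedule, frame_ms
```

In this sketch the CH would broadcast the returned list; each member transmits only inside its own slot and stays idle for the rest of the frame, which is what keeps simultaneous transmissions from colliding.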

In cluster-based routing for flying sensor networks, CHs may receive multiple copies of the same data from different sensors located in the vicinity of an event. Because communicating data requires more resources than computation, a method is needed to discard duplicated packets at the cluster level before data are sent to the base station. Network resources such as battery energy and bandwidth then remain available for other purposes, which increases the network lifetime. Our proposed algorithm works in two phases: cluster setup and data aggregation.
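As a simple baseline for discarding duplicated packets, a CH can drop byte-identical payloads before forwarding to the BS. This is an assumed stand-in and not the paper's method: matching SHA-256 digests catches only exact copies, whereas the level 2 aggregation described in Section 4 targets similar data. The function name `discard_duplicates` is hypothetical.

```python
import hashlib


def discard_duplicates(packets):
    """Keep the first copy of each identical payload received by a CH.

    packets: iterable of (member_id, payload_bytes) in arrival order.
    Returns the packets to forward to the BS, preserving arrival order.
    """
    seen = set()
    forwarded = []
    for member_id, payload in packets:
        digest = hashlib.sha256(payload).digest()  # fixed-size key for the payload
        if digest not in seen:
            seen.add(digest)
            forwarded.append((member_id, payload))
    return forwarded
```

Storing digests instead of payloads keeps the CH's memory cost per distinct packet constant, at the price of an (astronomically unlikely) hash collision.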

Clustering Setup

The HBA is applied in the cluster setup phase to determine the optimal CHs [37]. CH selection is based on the HBA in order to form balanced clusters. When selecting CHs, UAV mobility and neighbor criteria are considered to minimize re-clustering. In FSNet, once CHs are selected, they broadcast a message containing their ID, position, and status. All UAVs within range of a CH receive the broadcast message and join the cluster; once they join, they become CMs and share information with the CH. If a UAV receives membership messages from multiple CHs, it joins the nearest CH. If the distances are equal, the UAV selects one of those CHs at random. The cluster setup phase is shown in Algorithm 1 below. Once the clusters are formed, the data aggregation and communication phase is initiated to transmit the data to the BS.
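The join rule described above (nearest CH, random choice on a tie) is short enough to sketch directly. The function name `choose_ch`, the 3-D coordinate tuples, and the use of `math.isclose` to treat nearly equal distances as a tie are assumptions made for illustration.

```python
import math
import random


def choose_ch(uav_pos, ch_positions, rng=random):
    """Return the ID of the nearest CH, breaking distance ties uniformly at random.

    uav_pos: (x, y, z) of the joining UAV.
    ch_positions: non-empty dict mapping CH ID -> (x, y, z) for every CH whose
    membership message was received.
    rng: anything with a choice() method, e.g. random.Random(seed) for repeatability.
    """
    dists = {ch: math.dist(uav_pos, pos) for ch, pos in ch_positions.items()}
    best = min(dists.values())
    tied = [ch for ch, d in dists.items() if math.isclose(d, best)]
    return rng.choice(tied)
```

Passing a seeded `random.Random` instance makes simulation runs reproducible while still randomizing tie-breaks.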

Algorithm 1: Pseudo Code of UAVs Enabled CH Selection.

1  Procedure CH-Selection-Multi-UAVs (MUAVs)()
2    Input: swarm of UAVs SW_UAV, UAV nectar [n_UAV], and clusters C_FSNet
3    Output: UAVs-CH
4    call function calculate-UAVs-Nectar (n_UAV)
     // v1 indexes the UAVs; n_UAV is the total number of UAVs in the network
5    for (v = 1; v ≤ C_FSNet; v++) do
        // select UAVs-CH at random
6       UAVs-CH[v] = functionRand (SW_UAV)
7    end for
8    while (highest-value != yes) do
9       for (v1 = 1; v1 ≤ n_UAV; v1 = v1 + 1) do
           // compute the suitability of the current selection
10         if (v1 in UAVs-CH) then
11            FValue_UAV = FValue_UAV + 1(SW_UAV[u] + AFV_UAV)   // fitness value; AFV_UAV is the average fitness value
12         end if
13      end for
14      if (FValue_UAV < PFValue_UAV) then   // PFValue_UAV is the fitness value of the existing solution
15         swap FValue_UAV
16      end if
17      if (UAVs-CH-optimum != yes) then
18         while (empb != 0) do   // employed bees (empb)
              // employed bees visit until none remain; α_ij is the UAV affiliation in the current round and y is the neighborhood size
19            UAV_j(x+1) = UAV_j(x) + α_ij * y
              // select different UAVs from the neighborhood
20         end while
21         Pr_i = W_UAV_i / Σ_{j=1..k} W_UAV_j   // probability of the new UAVs, based on the UAV weight W_UAV
22         while (Obees ≠ ∅) do   // onlooker bees (Obees)
23            select another set of UAVs-CH according to probability Pr_i
24         end while
25      else
26         return UAVs-CH
27   end while
28 end procedure
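Algorithm 1 follows the employed-bee and onlooker-bee structure of bee-colony optimization. The sketch below is a compact artificial-bee-colony-style search over CH sets, written to show how the onlooker probability $Pr_i = W_{UAV_i} / \sum_{j} W_{UAV_j}$ steers the search toward heavier solutions. It is not the authors' implementation: the set weight (total CH energy divided by one plus the mean member-to-nearest-CH distance), the swap neighbourhood, the scout-bee abandonment limit, and all function names are hypothetical choices.

```python
import math
import random


def set_weight(chs, pos, energy):
    """Hypothetical weight W of a CH set: high CH energy, short member-to-CH distances."""
    mean_d = sum(min(math.dist(p, pos[c]) for c in chs) for p in pos) / len(pos)
    return sum(energy[c] for c in chs) / (1.0 + mean_d)


def neighbour(chs, n, rng):
    """Local move: swap one CH for a random non-CH UAV."""
    out = rng.choice(sorted(chs))
    inn = rng.choice([u for u in range(n) if u not in chs])
    return frozenset((chs - {out}) | {inn})


def hba_select_chs(pos, energy, k, sources=6, cycles=50, limit=5, seed=0):
    """Search for a set of k CH indices (requires 0 < k < len(pos))."""
    rng = random.Random(seed)
    n = len(pos)

    def fresh():
        return frozenset(rng.sample(range(n), k))

    food = [fresh() for _ in range(sources)]          # candidate CH sets
    weight = [set_weight(s, pos, energy) for s in food]
    trials = [0] * sources                            # failed improvements per source
    best_w, best_s = max(zip(weight, food), key=lambda t: t[0])

    def try_improve(i):
        cand = neighbour(food[i], n, rng)
        w = set_weight(cand, pos, energy)
        if w > weight[i]:  # greedy selection keeps the better source
            food[i], weight[i], trials[i] = cand, w, 0
        else:
            trials[i] += 1

    for _ in range(cycles):
        for i in range(sources):  # employed bees: one local move per source
            try_improve(i)
        total = sum(weight)
        probs = [w / total for w in weight] if total > 0 else None  # Pr_i
        for _ in range(sources):  # onlooker bees: roulette-wheel choice by Pr_i
            try_improve(rng.choices(range(sources), weights=probs)[0])
        for i in range(sources):  # scout bees replace stagnant sources
            if trials[i] > limit:
                food[i] = fresh()
                weight[i] = set_weight(food[i], pos, energy)
                trials[i] = 0
        w, s = max(zip(weight, food), key=lambda t: t[0])
        if w > best_w:
            best_w, best_s = w, s
    return best_s, best_w
```

For two well-separated groups of UAVs with equal energy and `k=2`, the search settles on one CH per group, because any set with both CHs in the same group has a far larger mean member-to-CH distance and therefore a much smaller weight.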

4. Data Aggregation and Communication

In the data aggregation process, we collect data from flying UAVs and match them to discard similar data sent by multiple sensors. Data aggregation is divided into two levels. In level 1, the data are collected using a TDMA schedule when multiple sensors communicate data simultaneously.

For the level 2 data aggregation problem in FSNet, this paper proposes a near-linear time algorithm. The data coming from sensors are converted into a long string. To our knowledge, this is the first work to assume nontrivial alignment between the strings and the patterns. Specifically, we relax the data aggregation problem in two ways: first, we allow the $D[i]$ values to be approximated; second, we allow an additional operation, called a partial data move, which moves a portion of the data from one position to another.

The similarity between data $X$ and $Y$ under these operations is known as data match with moves (DMM) and is denoted $d(X,Y)$ [ ]. DMM is a powerful data-matching tool for FSNet data aggregation: it allows network operators to match data from different sources, enabling a more accurate and comprehensive analysis. The DMM algorithm can take advantage of UAV mobility by assigning UAVs to different regions and optimizing their movement patterns to collect data efficiently. DMM can also reduce the amount of data, reduce bandwidth usage, increase energy efficiency, and support proximity-based data aggregation. In computational biology, moving a large subsequence is considered a primitive operation in many situations, comparable to an insertion or deletion. Similarly, in text processing, moving a large block of text together can be regarded as reordering rather than as deleting and reinserting characters.

Nontrivial alignments nevertheless remain a challenge for DMM. Formally, $d(X,Y)$ is the length of the shortest sequence of edit operations that transforms $X$ into $Y$, where the permitted operations act on data $X = X[1] \ldots X[n]$ as follows:

• The deletion of the character at location $loc$ transforms $X$ into $X[1] \ldots X[loc-1], X[loc+1] \ldots X[n]$.
• The insertion of a character $c$ at location $loc$ gives $X[1] \ldots X[loc-1], c, X[loc] \ldots X[n]$.
• The substitution of the character at location $loc$ with character $c$ results in $X[1] \ldots X[loc-1], c, X[loc+1] \ldots X[n]$.
• The partial data move with parameters $1 \le loc \le loc_2 \le loc_3 \le n$ converts $X[1] \ldots X[n]$ into $X[1] \ldots X[loc-1], X[loc_2] \ldots X[loc_3-1], X[loc] \ldots X[loc_2-1], X[loc_3] \ldots X[n]$.
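The four operations can be written directly on Python strings. This sketch keeps the text's 1-based positions $loc$, $loc_2$, $loc_3$; the function names are hypothetical, and the bounds check in `move` simply enforces $1 \le loc \le loc_2 \le loc_3 \le n$ as stated above.

```python
def delete(x: str, loc: int) -> str:
    """Remove X[loc] (1-based)."""
    return x[:loc - 1] + x[loc:]


def insert(x: str, loc: int, c: str) -> str:
    """Insert character c so that it becomes X[loc] (1-based)."""
    return x[:loc - 1] + c + x[loc - 1:]


def substitute(x: str, loc: int, c: str) -> str:
    """Replace X[loc] (1-based) with character c."""
    return x[:loc - 1] + c + x[loc:]


def move(x: str, loc: int, loc2: int, loc3: int) -> str:
    """Partial data move: swap adjacent blocks X[loc..loc2-1] and X[loc2..loc3-1]."""
    if not 1 <= loc <= loc2 <= loc3 <= len(x):
        raise ValueError("require 1 <= loc <= loc2 <= loc3 <= n")
    return x[:loc - 1] + x[loc2 - 1:loc3 - 1] + x[loc - 1:loc2 - 1] + x[loc3 - 1:]
```

A move is undone by another move: applying `move(y, loc, loc + (loc3 - loc2), loc3)` to the result `y` swaps the two blocks back, which is one way to see that each operation has an inverse of the same cost.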

Two data are identical exactly when the edit distance between them is 0. Every operation has unit cost, and each operation can be undone by an operation of the same cost: a deletion by an insertion, a substitution by another substitution, and a partial data move by another move. Hence, $d(X,Y) = d(Y,X)$. Moreover, because $d$ counts the shortest sequence of operations, concatenating a sequence that transforms $X$ into $Z$ with one that transforms $Z$ into $Y$ shows that $d(X,Y) \le d(X,Z) + d(Z,Y)$, so $d$ satisfies the triangle inequality and is a metric. No restrictions are placed on how edit operations interact. For example, a partial data move may relocate a fragment, and a subsequent move may then act on a fragment that overlaps both the relocated fragment and its neighboring characters.
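Because the abstract describes the aggregation method as based on dynamic programming, it may help to see the classic dynamic program for the move-free part of this distance. The sketch computes the Levenshtein distance (unit-cost deletion, insertion, and substitution) one row at a time. It does not model partial data moves, and computing the distance with moves exactly is NP-complete, as noted below; since allowing moves can only shorten the shortest edit sequence, the value returned here is an upper bound on $d(X,Y)$. The function name is hypothetical.

```python
def edit_distance(x: str, y: str) -> int:
    """Levenshtein distance by dynamic programming over prefixes.

    prev[j] holds the distance between the current prefix of x and y[:j].
    Partial data moves are NOT modelled, so this upper-bounds d(X, Y).
    """
    prev = list(range(len(y) + 1))  # distance from the empty prefix of x
    for i, xc in enumerate(x, start=1):
        cur = [i] + [0] * len(y)
        for j, yc in enumerate(y, start=1):
            cur[j] = min(
                prev[j] + 1,               # delete xc
                cur[j - 1] + 1,            # insert yc
                prev[j - 1] + (xc != yc),  # substitute (free when equal)
            )
        prev = cur
    return prev[-1]
```

For example, `edit_distance("kitten", "sitting")` is 3 (two substitutions and one insertion). The table uses O(|y|) memory and O(|x||y|) time, which is why the near-linear embedding approach is preferred for long sensor strings.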

We obtain a deterministic algorithm for the DMM problem with running time $O(n \log n)$. The algorithm returns an array $Ar$ in which every entry $Ar[loc]$ approximates $D[loc]$ to within a factor of $O(\log n \log^* n)$. The method relies on embedding the data into vectors under the $\ell_1$ metric: the $\ell_1$ distance between the embedded vectors approximates the DMM between the two original data to within a factor of $O(\log n \log^* n)$.

The proposed method can also solve several problems beyond the primary level 2 data aggregation problem, including data-similarity search. A study presented in [ ] showed that computing this distance between two datasets is NP-complete, and an $O(\log n)$-approximation algorithm was presented for it. However, such an approximation alone may not resolve the data-match problem.

The proposed scheme has two main components. First, we parse the data into a hierarchy of partial data using a simple hierarchical mechanism called edit-sensitive parsing (ESP), which generates a tree whose internal nodes have degree two or three [ ]. ESP is not an entirely new parsing method; rather, it simplifies the procedural details of applying deterministic coin tossing to obtain a hierarchical decomposition of the data. We expect the simplicity of ESP to expose further uses of hierarchical data decompositions. The second component is an approximately distance-preserving embedding of the data into vector spaces, based on this hierarchical parsing.

4.1. Data Embedding

We demonstrate a data-embedding scheme, which embeds data into a multidimensional array. Assume some data, $X$, over an alphabet, $\Sigma$. The data, $X$, will be embedded into $Em(X)$, an array with $O(|\Sigma|^{|X|})$ dimensions; however, the number of nonzero entries of the array will be relatively small, namely $O(|X|)$.

The embedding process takes linear time. The proposed scheme parses $X$ into different partial data and records the multiset $T(X)$ of these partial data. We confirm that the size of $T(X)$ is at most $2|X|$. $Em(X)$ is the characteristic (count) array of $T(X)$; hence, it has at most $2|X|$ nonzero entries. The parsing process that generates $T(X)$ is known as ESP, which the following subsection explores.

4.1.1. ESP

The parse tree, $EST(X)$, formed for data $X$ describes a hierarchical decomposition of $X$ into partial data corresponding to the nodes of $EST(X)$. The aim is to limit the effect of data-editing operations on this decomposition. A naive EST would contain all aligned partial data of length $2^{loc}$, namely $X[loc_2 \cdot 2^{loc} \ldots (loc_2+1) \cdot 2^{loc} - 1]$ for all $loc$ and $loc_2$; this results in a complete binary tree. However, if $X$ is updated by an addition or removal to obtain $X'$, then $X$ and $X'$ will be parsed by this technique into two dissimilar hierarchies of partial data; thus, the resulting embedding will not preserve the approximation.
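The sensitivity of the naive tree to a single edit can be checked with a short Python sketch. It rests on our own simplifying assumptions: a hypothetical `dyadic_blocks` helper that ignores single-character leaves and compares blocks by level and content only. Inserting one character at the front of the data shifts every aligned block, so the two hierarchies share no internal nodes.

```python
def dyadic_blocks(x: str) -> set[tuple[int, str]]:
    """Internal nodes of the naive tree: aligned blocks x[j*2^k:(j+1)*2^k].

    Each block is recorded as (level k, content) for k >= 1; single-character
    leaves are left out, and positions are ignored so that equal content at
    the same level counts as a shared node.
    """
    blocks = set()
    k = 1
    while 2 ** k <= len(x):
        size = 2 ** k
        for j in range(len(x) // size):
            blocks.add((k, x[j * size:(j + 1) * size]))
        k += 1
    return blocks


x = "abcdefgh"
x_prime = "z" + x          # one insertion at the front
shared = dyadic_blocks(x) & dyadic_blocks(x_prime)
print(len(dyadic_blocks(x)), len(shared))   # 7 0
```

All seven internal blocks of `x` disappear from the parse of `x_prime`, which is the behaviour a parsing scheme such as ESP is designed to avoid.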

Suppose we have data $X$; the next step is to form an ESP tree hierarchically in $Pi(X)$ iterations. Every iteration produces a different level of the EST. At every iteration, $loc$, the process starts with data $X_{loc}$ and divides them into chunks of size two or three. We substitute every chunk $X[loc_3 \ldots loc_2]$ by its name, referring to the pair $(loc, h(X[loc_3 \ldots loc_2]))$, where $h$ is a one-to-one hash function on partial data. Suppose $X_0 = X$; the iterations continue until the length of the remaining data becomes 1. The EST of $X$ consists of levels such that, for every chunk of $X_{loc-1}$, a node exists at level $loc$, and its children are the nodes in level $loc-1$ that correspond to the symbols of that chunk. Here, each data unit of $X$ is a leaf node.
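The level-by-level construction can be sketched in Python as follows. This is an illustration only, with several assumptions of our own: the chunking rule below (pairs from left to right, with one trailing chunk of three when the length is odd) is a deliberately simple stand-in for ESP's landmark-based blocking and is not edit-sensitive; a dictionary that hands out fresh integers plays the role of the one-to-one hash $h$; and the `Counter` of covered partial data stands in for the multiset $T(X)$, i.e., the nonzero entries of $Em(X)$. The names `chunk_2_or_3` and `build_tree` are hypothetical. Because every internal node has at least two children, the tree has at most $2|X|-1$ nodes.

```python
from collections import Counter


def chunk_2_or_3(seq: list) -> list[tuple]:
    """Split seq (len >= 2) into chunks of two, plus one trailing chunk of
    three when len(seq) is odd. Simplified stand-in for ESP blocking."""
    n = len(seq)
    cut = n - 3 if n % 2 else n
    chunks = [tuple(seq[i:i + 2]) for i in range(0, cut, 2)]
    if n % 2:
        chunks.append(tuple(seq[cut:]))
    return chunks


def build_tree(x: str) -> tuple[int, Counter]:
    """Build levels X_0 = X, X_1, ... until one node is left.

    Returns the number of iterations and T(X): a Counter keyed by the partial
    data each node covers (leaves included), i.e. the nonzero part of Em(X).
    """
    h: dict[tuple, int] = {}         # stand-in 1-1 hash: child names -> name
    level = [(ch, ch) for ch in x]   # (name, covered partial data); leaves
    t_x = Counter(text for _, text in level)
    iterations = 0
    while len(level) > 1:
        nxt = []
        for chunk in chunk_2_or_3(level):
            key = tuple(child for child, _ in chunk)
            name = h.setdefault(key, len(h))   # fresh id for an unseen chunk
            nxt.append((name, "".join(text for _, text in chunk)))
        t_x.update(text for _, text in nxt)
        level = nxt
        iterations += 1
    return iterations, t_x


iterations, t_x = build_tree("abcabcab")
print(iterations, sum(t_x.values()), t_x["ab"])   # 3 15 2
```

For the 8-symbol input the tree has 8 + 4 + 2 + 1 = 15 nodes, within the $2|X|$ bound, and the repeated chunk `ab` receives one name and a count of 2.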

More precisely, we partition the data into non-overlapping units. The data can be divided into non-overlapping units of three types:

(a) Maximal contiguous partial data of $X_{loc}$ that consist of a single repeated symbol (i.e., of the form $a^l$ for $a \in \Sigma_{loc}$, where $l > 1$).

(b) Long partial data, of length at least $\log^* |\Sigma_{loc-1}|$, that are not of type 1 above.

(c) Short partial data, of length less than $\log^* |\Sigma_{loc-1}|$, that are not of type 1.


The general term “meta block” (mb) is used for all such partial data. To produce

next-level parsing, we process each mb as demonstrated in the following subsections.
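A Python sketch of the three-way split into meta blocks is given below. It assumes a base-2 iterated logarithm for $\log^*$ and takes the previous-level alphabet size as a plain integer parameter; the names `log_star` and `meta_blocks` are hypothetical. Type 1 blocks are maximal runs of one repeated symbol, and the stretches between them are tagged type 2 or type 3 by comparing their length with $\log^*|\Sigma_{loc-1}|$.

```python
import math


def log_star(n: float) -> int:
    """Iterated base-2 logarithm: how often log2 is applied until n <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count


def meta_blocks(seq, sigma_prev: int) -> list[tuple[int, list]]:
    """Split seq into maximal meta blocks tagged with their type (1, 2 or 3).

    Type 1: maximal run of one repeated symbol, length > 1.
    Type 2 / 3: maximal stretch between such runs, long (>= log* sigma_prev)
    or short (< log* sigma_prev).
    """
    threshold = log_star(sigma_prev)
    out, pending = [], []

    def flush():
        if pending:
            out.append((2 if len(pending) >= threshold else 3, pending.copy()))
            pending.clear()

    i = 0
    while i < len(seq):
        j = i
        while j + 1 < len(seq) and seq[j + 1] == seq[i]:
            j += 1                      # extend the run of seq[i]
        if j > i:                       # run of length > 1: type 1
            flush()
            out.append((1, list(seq[i:j + 1])))
        else:
            pending.append(seq[i])
        i = j + 1
    flush()
    return out


print([t for t, _ in meta_blocks("abcdeffgha", 16)])   # [2, 1, 2]
print([t for t, _ in meta_blocks("abbbcdde", 16)])     # [3, 1, 3, 1, 3]
```

With a previous-level alphabet of 16 symbols, $\log^* 16 = 3$, so the stretch `abcde` is long (type 2), while the single symbols around the runs in `abbbcdde` are short (type 3).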

4.1.2. Type 2: Long Data without Duplications

We consider data in which no two consecutive symbols are duplicates, which represents an mb of type 2. Assuming a structure, $X$, without duplications (i.e., $X[loc] \neq X[loc+1]$ for $loc = 1, \ldots, |X|-1$), we select at most $|X|/2$ and at least $|X|/3$ partial data of $X$ to become nodes. Concatenating these nodes yields $X$. The first phase consists of repeating the alphabet-reduction process.

Reduction of alphabet ($\Sigma$): For every character $C[loc]$, we calculate a new label. $C[loc-1]$ is the symbol located to the left of $C[loc]$; assume that $C[loc]$ and $C[loc-1]$ are represented as binary numbers. Let $Le$ denote the index of the least-significant bit (LSB) at which $C[loc]$ differs from $C[loc-1]$, and let $bit(Le, C[loc])$ be the value of $C[loc]$ at bit position $Le$. We then form $label(C[loc])$ as $2 \cdot Le + bit(Le, C[loc])$; in other words, the label is the bit index $Le$ followed by the value of $C[loc]$ at that index.
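As a worked example with illustrative values of our own choosing (they are not taken from the experiments), take $C[loc-1] = 5$, $C[loc] = 7$, and $C[loc+1] = 6$:

```latex
\begin{aligned}
C[loc-1] &= 5 = 101_2, \quad C[loc] = 7 = 111_2, \quad C[loc+1] = 6 = 110_2\\
C[loc-1] \oplus C[loc] &= 010_2 \;\Rightarrow\; Le = 1,\; bit(1, C[loc]) = 1,\; label(C[loc]) = 2 \cdot 1 + 1 = 3\\
C[loc] \oplus C[loc+1] &= 001_2 \;\Rightarrow\; Le = 0,\; bit(0, C[loc+1]) = 0,\; label(C[loc+1]) = 2 \cdot 0 + 0 = 0
\end{aligned}
```

The two adjacent labels, 3 and 0, differ, as Lemma 1 guarantees.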

Lemma 1. For any $loc$, if $C[loc] \neq C[loc+1]$, then $label(C[loc]) \neq label(C[loc+1])$.

Proof. Suppose the LSB position at which $C[loc]$ differs from $C[loc+1]$ is the same as the position at which $C[loc]$ differs from $C[loc-1]$ (otherwise, the two labels have different bit indices, so $label(C[loc]) \neq label(C[loc+1])$ immediately). However, $C[loc]$ and $C[loc+1]$ must have different bit values at this position; therefore, $label(C[loc]) \neq label(C[loc+1])$.

Applying this labeling, we create a new sequence. If the current alphabet has size $|\eta|$, then the derived alphabet has size $2\lceil \log |\eta| \rceil$. We now repeat this step (the repetition is orthogonal to the iteration that produces the EST of $X$; here, we are repeating on a subsequence with no identical contiguous characters) and keep reducing the alphabet until its size can no longer shrink. This takes $\log^* |\eta|$ repetitions. Note that the labels for the first $\log^* |\eta|$ symbols will not exist.
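The labeling rule and its repetition can be sketched in Python. This is a minimal sketch under our own assumptions: symbols are non-negative integers, the input is non-empty with no equal neighbours, the alphabet bound starts at the largest symbol plus one, and the helper names `label` and `reduce_alphabet` are hypothetical. Each round drops the first symbol, which has no left neighbour, mirroring the remark that the first $\log^*|\eta|$ symbols receive no label.

```python
def label(prev: int, cur: int) -> int:
    """label(C[loc]) = 2*Le + bit(Le, C[loc]), where Le is the index of the
    least-significant bit in which cur differs from its left neighbour prev."""
    diff = prev ^ cur
    if diff == 0:
        raise ValueError("adjacent symbols must differ")
    le = (diff & -diff).bit_length() - 1     # index of the lowest set bit
    return 2 * le + ((cur >> le) & 1)


def reduce_alphabet(seq: list[int]) -> list[int]:
    """Repeat the labelling round while it still shrinks the alphabet bound.

    sigma is an upper bound on the alphabet size (all symbols are < sigma);
    one round maps it to 2 * ceil(log2(sigma)), and at least 2. The first
    symbol of each round has no left neighbour, so it is dropped.
    """
    sigma = max(seq) + 1
    while True:
        new_sigma = 2 * max(1, (sigma - 1).bit_length())
        if new_sigma >= sigma:
            return seq
        seq = [label(seq[i - 1], seq[i]) for i in range(1, len(seq))]
        sigma = new_sigma


result = reduce_alphabet([3, 100, 57, 8, 1000, 2, 999, 40])
print(result)   # [1, 0, 5, 3]
```

Four rounds shrink the bound from 1001 to 20, 10, 8, and finally 6, so the first four positions lose their labels; the surviving labels lie in $\{0, \ldots, 5\}$ and, consistent with Lemma 1, no two neighbours are equal (otherwise `label` would raise in the next round).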

The lemma states that if the symbol $C[loc]$ is not equal to the symbol $C[loc+1]$, then the label assigned to $C[loc]$ is not equal to the label assigned to $C[loc+1]$. Lemma 1's goal is to establish a connection between symbols and their corresponding labels based on how they differ from one another: its proof demonstrates that whenever two adjacent symbols, $C[loc]$ and $C[loc+1]$, differ, their labels also differ. This property theoretically justifies the subsequent steps of the procedure, because each round of labeling again yields a sequence with no identical contiguous symbols.

Lemma 2. After the last iteration of the $\Sigma$ reduction, the size of $\Sigma$ is 6.

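Proof. Upon every repetition of labeling, the size of the alphabet is reduced from $|\Sigma|$ to $2\lceil \log |\Sigma| \rceil$. If $|\Sigma| > 6$, then $2\lceil \log |\Sigma| \rceil$ is strictly smaller than $|\Sigma|$, so the reduction keeps shrinking the alphabet until its size is at most 6. $A$ has no duplicate contiguous symbols, and, by applying Lemma 1 iteratively, neither does the final sequence of labels on $A$.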

Lemma 2 plays a crucial role in establishing the size of the alphabet ($\Sigma$) after the last iteration of labeling. It states that, after the last repetition, the alphabet contains at most six symbols (the labels 0 to 5), irrespective of the original alphabet size ($\eta$). The proof of Lemma 2 demonstrates that as long as the alphabet size is greater than 6, each reduction round makes it strictly smaller, so the process can stop only once the alphabet size is at most 6.
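A worked instance of the shrinking bound, starting from an illustrative alphabet of $2^{16}$ symbols (our own choice) and assuming base-2 logarithms, shows both the strict decrease above 6 and the fixed point at 6. It takes $\log^* 2^{16} = 4$ rounds, in line with the $\log^*|\eta|$ repetitions stated earlier.

```latex
\begin{aligned}
|\Sigma| = 2^{16} &\;\to\; 2\lceil \log_2 2^{16} \rceil = 32\\
32 &\;\to\; 2\lceil \log_2 32 \rceil = 10\\
10 &\;\to\; 2\lceil \log_2 10 \rceil = 8\\
8 &\;\to\; 2\lceil \log_2 8 \rceil = 6\\
6 &\;\to\; 2\lceil \log_2 6 \rceil = 6 \quad (\text{no further shrinking})
\end{aligned}
```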

Lastly, three passes over the sequence of labels reduce the alphabet from $\{0, \ldots, 5\}$ to $\{0, 1, 2\}$: first, we substitute every 3 with the smallest item from $\{0, 1, 2\}$ that does not appear next to that 3, and then we perform the same operation for every 4 and then every 5. This produces a sequence of symbols drawn from $\{0, 1, 2\}$ in which no contiguous labels are equal. We designate this sequence $A'$.
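The three replacement passes can be sketched in Python as follows. The function name `reduce_to_three` is hypothetical, and the sketch assumes the input already satisfies Lemma 2 (labels in $\{0, \ldots, 5\}$, no equal neighbours). Because a symbol has at most two neighbours, at least one value in $\{0, 1, 2\}$ is always available, and a replaced symbol cannot clash with a neighbour replaced in the same pass, since equal symbols are never adjacent.

```python
def reduce_to_three(labels: list[int]) -> list[int]:
    """Map labels over {0,...,5} (no equal neighbours) onto {0, 1, 2}.

    Pass 1 replaces every 3, pass 2 every 4, pass 3 every 5, each time by the
    smallest value in {0, 1, 2} that differs from both current neighbours.
    """
    a = list(labels)
    for target in (3, 4, 5):
        for i in range(len(a)):
            if a[i] != target:
                continue
            neighbours = set()
            if i > 0:
                neighbours.add(a[i - 1])
            if i + 1 < len(a):
                neighbours.add(a[i + 1])
            a[i] = min({0, 1, 2} - neighbours)   # never empty: <= 2 neighbours
    return a


print(reduce_to_three([5, 3, 0, 4, 1, 5, 2, 3]))   # [0, 1, 0, 2, 1, 0, 2, 0]
```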

We now choose special positions, known as landmarks, from this sequence such that consecutive landmarks are close to each other. We first select as a landmark any position, $loc$, that is a local maximum, i.e., $A'[loc-1] < A'[loc] > A'[loc+1]$.


Two local maxima may still be separated by three intervening symbols. Therefore, we additionally select as a landmark any position, $loc$, that is a local minimum, i.e., $A'[loc-1] > A'[loc] < A'[loc+1]$, and that is not contiguous to a previously selected landmark. Figures 2 and 3 depict the process graphically.


Figure 2. Landmark ﬁnding and alphabet reduction process.

Figure 3. Nodes’ formation based on landmark symbols.

Lemma 3. For any two consecutive landmark locations, $loc_1$ and $loc_2$, $2 \le |loc_1 - loc_2| \le 3$.

Proof: Under our marking mechanism, no contiguous pair of symbols is marked: we cannot have two contiguous local maxima, and we specifically prohibit marking a local minimum next to an already selected landmark. Hence, consecutive landmarks are at least two positions apart. A simple case analysis demonstrates that consecutive landmark locations are separated by at most two intervening labels, i.e., they are at most three positions apart. ∎
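Landmark selection and the spacing claimed in Lemma 3 can be checked on small inputs with the Python sketch below. It assumes a sequence $A'$ over $\{0, 1, 2\}$ with no equal neighbours, considers only interior positions (the text does not describe how the two ends are treated), and uses a hypothetical function name, `landmarks`. It is an empirical check on examples, not a substitute for the proof.

```python
def landmarks(a: list[int]) -> list[int]:
    """Return landmark positions of a (values in {0,1,2}, no equal neighbours).

    Step 1: every interior local maximum is a landmark.
    Step 2: every interior local minimum that is not next to an already
    chosen landmark is added. The two ends are not treated here.
    """
    n = len(a)
    chosen = {i for i in range(1, n - 1) if a[i - 1] < a[i] > a[i + 1]}
    for i in range(1, n - 1):
        is_local_min = a[i - 1] > a[i] < a[i + 1]
        if is_local_min and i - 1 not in chosen and i + 1 not in chosen:
            chosen.add(i)
    return sorted(chosen)


for seq in ([0, 1, 2, 1, 0, 1, 2, 0, 2, 1, 0], [1, 2, 1, 0, 2, 0, 1, 2, 1]):
    marks = landmarks(seq)
    gaps = [q - p for p, q in zip(marks, marks[1:])]
    print(marks, gaps)   # Lemma 3: every gap should be 2 or 3
```

In the first sequence the local minimum at position 4 is added because it is two positions away from both neighbouring maxima, whereas the minimum at position 7 is skipped because it touches the maxima at 6 and 8.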
