Topology-Aware Cluster Configuration for Real-time Multi-access Edge Computing

January 2023

DOI:10.1145/3571306.3571417

Conference: ICDCN 2023: 24th International Conference on Distributed Computing and Networking

Authors:

Kolichala Rajashekar

Indian Institute of Technology Bhilai

Kolichala Rajashekar

kolichalar@iitbhilai.ac.in

Indian Institute of Technology Bhilai

Raipur, India

Sushanta Karmakar

sushantak@iitbhilai.ac.in

Indian Institute of Technology Guwahati

Guwahati, India

Souradyuti Paul

souradyuti@iitbhilai.ac.in

Indian Institute of Technology Bhilai

Raipur, India

Subhajit Sidhanta

subhajit@iitbhilai.ac.in

Indian Institute of Technology Bhilai

Raipur, India

ABSTRACT

We consider data-intensive real-time systems, such as mission-critical applications for forest fire detection, medical emergency services, and oil pipeline monitoring, which demand low response times in processing data from IoT (Internet of Things) devices. In such cases, the edge computing paradigm is typically leveraged to reduce the processing delay by performing the computations on edge devices placed closer to the data sources, i.e., the IoT devices. However, most edge devices, such as cellular phones, tablets, and UAVs (Unmanned Aerial Vehicles), are mobile. Hence, the cluster configuration must be dynamically adapted to the changing network topology of the edge cluster so that the overall communication delay incurred by the edge devices in processing the data from the IoT devices is minimized. To that end, we propose a Deep Reinforcement Learning-based assignment of IoT devices to non-stationary edge devices such that the communication delay is minimized and none of the edge devices is overloaded. We demonstrate, with preliminary results, that our algorithm outperforms the state of the art.

KEYWORDS

IoT, edge computing, reinforcement learning

ACM Reference Format:

Kolichala Rajashekar, Sushanta Karmakar, Souradyuti Paul, and Subhajit Sidhanta. 2018. Topology-Aware Cluster Configuration for Real-time Multi-access Edge Computing. In Proceedings of ACM Conference (Conference’17). ACM, New York, NY, USA, 2 pages. https://doi.org/XXXXXXX.XXXXXXX

1 INTRODUCTION

Though the edge computing paradigm effectively reduces the response time by processing the data from the IoT devices closer to the data sources, maintaining a low response time with an increasing number of IoT devices using resource-constrained edge devices is a challenging task. Additionally, the mobility of IoT and edge devices introduces further challenges, such as link breakdown, reconfiguration overhead, and churn [2]. Therefore, in such a dynamic edge cluster, the IoT devices must be assigned to non-stationary edge devices while taking into account the dynamic changes in the topology of the edge cluster and the relative change in the distance of the IoT devices to the corresponding edge devices. Even with stationary IoT and edge devices, the problem is NP-hard, as it is similar to a well-known NP-hard problem, the Generalised Assignment Problem (GAP) [1]. Hence, obtaining an optimal solution is computationally intractable in general. Therefore, we apply heuristic approaches to get a near-optimal solution depending on the given deployment scenario. In this paper, we present a novel approach for managing a non-stationary edge cluster such that the communication delay in processing data from IoT devices is minimized. Specifically, we demonstrate that Deep Reinforcement Learning (DRL) based approaches are able to produce a near-optimal assignment of IoT devices to edge devices, maintaining a reduced communication delay under a changing topology.

2 PROBLEM DEFINITION

Let there be $m$ IoT and $n$ mobile edge devices in the network. Let $L^{(i,j)}_t$ denote the pair of locations of the $i$th IoT device and the $j$th edge device at time $t$. The mobility function $\mathcal{M}$ takes the previous location-pair $L^{(i,j)}_{t'}$ as input and outputs the current location-pair $L^{(i,j)}_t$, where $t > t'$. We define the communication delay $C^{(i,j)}_t$ recursively using the mobility function $\mathcal{M}$ as follows.

\begin{align*}
C^{(i,j)}_t &= f\big(L^{(i,j)}_t, L^{(i,j)}_{t'}\big) \\
&= f\big(\mathcal{M}(L^{(i,j)}_{t'}, t, t'), L^{(i,j)}_{t'}\big) \\
&= g\big(L^{(i,j)}_{t'}, t, t'\big)
\end{align*}

An assignment $A_t$ is a binary matrix, where an element $A^{(i,j)}_t$ of it is defined as follows:

$$A^{(i,j)}_t = \begin{cases} 1, & \text{if the $i$th IoT device is connected to the $j$th edge device at time $t$} \\ 0, & \text{otherwise.} \end{cases}$$

The set of all possible assignments is denoted by $\mathcal{A} = \{A_t\}$, indexed by the time step $t$, and we note that an IoT device can be connected to only one edge device at a time; mathematically, $\sum_{j=1}^{n} A^{(i,j)}_t = 1, \ \forall i \in \{1, \ldots, m\}, \ \forall A_t \in \mathcal{A}$. Therefore, our problem, MA EAP, is formulated as the optimization problem defined below:

$$\min \{ \max(\mathcal{C}_t A_t^{\top}) \mid A_t \in \mathcal{A} \} \tag{1}$$

where $\mathcal{C}_t A_t^{\top}$ produces a matrix of size $m \times m$ and $\max(\mathcal{C}_t A_t^{\top})$ gives the maximum of all elements of that matrix.
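As a rough illustration of objective (1), the sketch below evaluates the maximum entry of the product of the delay matrix and the transposed assignment matrix, and then minimizes it by exhaustive search on a toy instance. The delay values and the helper names (`objective`, `one_hot_rows`, `brute_force`) are assumptions made for illustration only. Exhaustive search is viable only for very small $m$ and $n$, and it is not the method proposed in this paper.

```python
from itertools import product

def objective(C, A):
    """Return the maximum entry of C @ A^T, where C and A are m x n lists."""
    m, n = len(C), len(C[0])
    best = float("-inf")
    for i in range(m):
        for k in range(m):
            # (C A^T)[i][k] = sum_j C[i][j] * A[k][j]
            entry = sum(C[i][j] * A[k][j] for j in range(n))
            best = max(best, entry)
    return best

def one_hot_rows(choice, n):
    """Build a binary m x n assignment matrix from a tuple of edge indices."""
    return [[1 if j == e else 0 for j in range(n)] for e in choice]

def brute_force(C):
    """Enumerate all n^m assignments; return (best value, best choice)."""
    m, n = len(C), len(C[0])
    best_val, best_choice = float("inf"), None
    for choice in product(range(n), repeat=m):
        val = objective(C, one_hot_rows(choice, n))
        if val < best_val:
            best_val, best_choice = val, choice
    return best_val, best_choice

# Toy delays in ms (made-up values): 3 IoT devices, 2 edge devices.
C = [[10.0, 40.0],
     [30.0, 20.0],
     [50.0, 15.0]]
value, choice = brute_force(C)
print(value, choice)
```

Because the maximum is taken over all entries of the $m \times m$ product, entry $(i, k)$ pairs the delay of IoT device $i$ with the edge device assigned to IoT device $k$. On this toy instance the search therefore selects the single edge device whose worst delay is smallest.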

However, the aforementioned optimization problem needs to be

solved under a few constraints as described below.

(1) The maximum number of IoT devices that can be connected to an edge device is $\mathcal{T}$. (Therefore, given $m$, $n$ can be determined (or vice versa) from the following inequality: $\frac{m}{n} \leq \mathcal{T}$.)

(2) The Hamming distance between $A_t$ and $A_{t'}$ should be less than a threshold, defined as the maximum number of allowed re-configurations.¹
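The two constraints, together with the one-edge-per-IoT-device condition, can be checked for a candidate assignment roughly as follows. This is a minimal sketch: the function names and the parameters `capacity` (the per-edge limit) and `max_changes` (the re-configuration threshold) are hypothetical names introduced here, not identifiers from the paper.

```python
def edge_loads(A):
    """Number of IoT devices connected to each edge device (column sums)."""
    n = len(A[0])
    return [sum(row[j] for row in A) for j in range(n)]

def hamming_distance(A_new, A_old):
    """Number of matrix entries that differ between two assignments."""
    return sum(a != b
               for r_new, r_old in zip(A_new, A_old)
               for a, b in zip(r_new, r_old))

def is_feasible(A_new, A_old, capacity, max_changes):
    """Check row sums, per-edge capacity, and the re-configuration limit."""
    each_connected_once = all(sum(row) == 1 for row in A_new)
    within_capacity = all(load <= capacity for load in edge_loads(A_new))
    few_changes = hamming_distance(A_new, A_old) < max_changes
    return each_connected_once and within_capacity and few_changes

# Toy example: IoT device 1 (0-indexed) moves from edge 1 to edge 0.
A_old = [[1, 0], [0, 1], [0, 1]]
A_new = [[1, 0], [1, 0], [0, 1]]
print(hamming_distance(A_new, A_old))
print(is_feasible(A_new, A_old, capacity=2, max_changes=3))
```

Note that moving a single IoT device from one edge device to another flips two entries of the binary matrix, so under this entry-wise reading each re-assignment contributes 2 to the Hamming distance.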

3 HEURISTICS DESIGN

We solve our problem in the following two scenarios:

• Scenario 1: The IoT devices are stationary and the edge devices are mobile.

• Scenario 2: Both the IoT and edge devices are mobile.

Swarm Intelligence and Machine Learning are extensively used in managing the resources in edge computing. To that end, we apply Deep Reinforcement Learning (DRL) and compare it with Particle Swarm Optimization (PSO). The goal of the PSO algorithm is to find the particle position that yields the best value of the fitness (objective) function. The fitness value of an edge device $j$ ($\mathit{fitness}_j$) is defined as follows:

$$\mathit{fitness}_j = \max\{C^{(1,j)}_t * A^{(1,j)}_t, \cdots, C^{(m,j)}_t * A^{(m,j)}_t\} \tag{2}$$
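Equation (2) can be read as the largest delay among the IoT devices currently connected to edge device $j$. A minimal sketch of that computation, with made-up delay values and a hypothetical function name `fitness`, might look as follows.

```python
def fitness(C, A, j):
    """Fitness of edge device j: max over i of C[i][j] * A[i][j]."""
    return max(C[i][j] * A[i][j] for i in range(len(C)))

# Toy delays in ms (made-up values): 3 IoT devices, 2 edge devices.
C = [[10.0, 40.0],
     [30.0, 20.0],
     [50.0, 15.0]]
# IoT device 0 is connected to edge 0; IoT devices 1 and 2 to edge 1.
A = [[1, 0],
     [0, 1],
     [0, 1]]

per_edge = [fitness(C, A, j) for j in range(len(C[0]))]
print(per_edge)
```

Unassigned pairs contribute a product of zero, so only the delays of connected pairs affect the result. An edge device with no connected IoT devices would get a fitness of zero under this reading.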

In order to apply DRL, we formulate the defined problem in a Partially Observable Markov Decision Process (POMDP) framework. The elements of the POMDP are defined as follows:

• State space of an edge device $j$: $S^{(j)}_t = \{L^{(1,j)}_t * A^{(1,j)}_t, \cdots, L^{(m,j)}_t * A^{(m,j)}_t\}$

• Action space of an edge device $j$: $A_j = \{A^{(1,j)}_t, \cdots, A^{(m,j)}_t\}$

• Reward for an edge device $j$: $R^{j}_{t} = \dfrac{\sum_{i=1}^{m} A^{(i,j)}_{t+1}}{\max\{C^{(i,j)}_t * A^{(i,j)}_t\}} \ \forall i \in \{1, \cdots, m\}$

Considering the above elements of the POMDP, we observe that the state space and the action space are large. Hence, applying tabular RL methods is infeasible. Therefore, we use DRL and leverage a Deep Q-Network (DQN) for our problem.
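One way to get a feel for why a lookup table is impractical is to count only the joint assignments, ignoring the device locations that are also part of the state. If each of the $m$ IoT devices is connected to exactly one of the $n$ edge devices, there are $n^m$ such binary matrices. The sketch below prints this count for a few sizes; the sizes and the function name `count_assignments` are arbitrary choices for illustration, not values from the experiments.

```python
def count_assignments(m, n):
    """Number of binary m x n matrices in which every row has exactly one 1."""
    return n ** m

for m, n in [(5, 3), (20, 5), (100, 10)]:
    total = count_assignments(m, n)
    print(f"m={m}, n={n}: {total:.3e} joint assignments")
```

The count grows exponentially in $m$, which is consistent with replacing the table by a function approximator such as a neural network.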

4 PRELIMINARY RESULTS

We performed simulations using the CRAWDAD dataset traces for the mobility of IoT devices and random mobility for edge devices. For a realistic analysis, we obtained the communication delays from the PlanetLab dataset.

Fig. 1 shows how the communication delay changes as the number of episodes increases in Scenario 1. Initially, the PSO solution moves towards the global optimum much faster than DQN. This is because of the exploration done by the DQN agent. After a certain number of episodes, the DQN agent learns the environment and converges faster, whereas PSO converges very slowly.

[Figure 1: Changes in communication delay observed during episodes (iterations) of Scenario 1. Axes: Communication Delay (ms) versus Episodes; series: DQN, PSO.]

Fig. 2 shows the observed communication delay in Scenario 2. This experiment was conducted over different snapshots, where the DQN agent is already trained whereas PSO has to run from scratch. Due to the mobility of the IoT and edge devices, PSO is not able to give better results than DQN.

[Figure 2: Communication delay observed in different snapshots. Axes: Communication Delay (ms) versus different snapshots of Scenario 2; series: PSO, DQN.]

¹ An IoT device is reassigned to another edge device due to the mobility of the corresponding edge device. This re-assignment of an IoT device to an edge device is called re-configuration.

REFERENCES

[1] Juan A. Díaz and Elena Fernández. 2001. A tabu search heuristic for the generalized assignment problem. European Journal of Operational Research, 132, 1, 22–38.

[2] Zeeshan Hameed Mir, Deepesh Man Shrestha, Geun-Hee Cho, and Young-Bae Ko. 2006. Mobility-aware distributed topology control for mobile multi-hop wireless networks. In International Conference on Information Networking. Springer, 257–266.
