Reinforcing the Edge: Autonomous Energy
Management for Mobile Device Clouds
Venkatraman Balasubramanian, Faisal Zaman, Moayad Aloqaily§, Saed Alrabaee,
Maria Gorlatova, Martin Reisslein
Arizona State University, AZ, USA, {Vbalas11, Reisslein}@asu.edu
University of Ottawa, Ottawa, ON, Canada, Fzama075@uottawa.ca
§Canadian University Dubai, UAE, maloqaily@ieee.org
United Arab Emirates University (UAEU), UAE, Salrabaee@uaeu.ac.ae
Duke University, NC, USA, Maria.Gorlatova@duke.edu
Abstract—The collaboration among mobile devices to form an
edge cloud for sharing computation and data can drastically
reduce the tasks that need to be transmitted to the cloud. More-
over, reinforcement learning (RL) research has recently begun to
intersect with edge computing to reduce the amount of data (and
tasks) that needs to be transmitted over the network. For battery-
powered Internet of Things (IoT) devices, the energy consumption
in collaborating edge devices emerges as an important problem.
To address this problem, we propose an RL-based Droplet
framework for autonomous energy management. Droplet learns
the power-related statistics of the devices and forms a reliable
group of resources for providing a computation environment
on-the-fly. We compare the energy reductions achieved by two
different state-of-the-art RL algorithms. Further, we model a
reward strategy for edge devices that participate in the mobile
device cloud service. The proposed strategy effectively achieves
a 10% gain in the rewards earned compared to state-of-the-art
strategies.
Index Terms—Mobile Edge Computing, Device Clouds, Inter-
net of Things, Reinforcement Learning.
I. INTRODUCTION
As the IoT revolution expands further into the 5G realm, low-latency services, connected and autonomous systems, and data and service management have been the focus of today's research [1]–[6]. Computation and caching at the edge of the cellular network have been proposed as main strategies for achieving sub-millisecond latencies [7]–[9]. Past research has demonstrated that bringing computation closer to the user results in manageable response times [10], [11]. However, the deployment overheads and costs of these edge entities can be prohibitive for last-mile users. To cope with these costs, the authors of [12], [13] have proposed composing idle mobile device resources into a collaborative environment, thereby producing a computation infrastructure [14]. However, none of these solutions consider the energy drain inside mobile devices, which affects the stability of the mobile environment. Our interest in this paper is the energy consumption in Mobile Device Clouds (MDC) at the time of task execution. Consider, for example, a coffee shop where people have gathered and linger for some time. In such scenarios, instead of requesting a costly public cloud service, it is more feasible to form a local device cloud among socially bonded users [15]. However, the battery-powered devices need a local energy manager to decide whether a device can continue to participate or should be excluded.
Thus, this problem involves a decision-making process at the device end, where a decision maker module can either choose to stop (for which the device receives a reward and the system terminates) or pay a certain cost (in terms of energy dissipation) to continue. If the decision is to terminate, the device receives a termination reward, whereas if the decision is to continue, task execution proceeds until the request is complete. All these procedures should be executed without any user intervention. Our primary motive here is not to identify the battery-draining periods, but to investigate the energy constraints within the device-defined power thresholds, which are estimated over a period of time. Implicitly, this decision is also responsible for maximizing rewards over that period. Moreover, to the best of our knowledge, this work is the first of its kind to employ learning models to resolve such an issue. To this end, we simulate two state-of-the-art algorithms, namely greedy and near-greedy. We also deploy a gradient-based approach to compare and show the resulting rewards. We observe an approximately 10% increase in rewards earned using the gradient approach.
A. Motivation
To reduce the overheads in terms of the deployment costs of cloudlets and edge data centers, leveraging local idle device resources is essential. Hence, opportunistic MDC environments are an important alternative to edge cloud deployments. These devices are hand-held, user-specific nodes that change location based on user movements. Therefore, the challenges here are twofold. First, there is device movement: both the user who requests the service and the serving entity are subject to motion. Second, there are constant disruptions due to the drain-out periods of devices. Earlier works such as [16] have presented offloading-based solutions. Thus, in this paper we target only the problem of energy maintenance in such environments, where rewards are granted through a continuous learning process.
B. Problem Definition
In various scenarios, such as smart homes and smart buildings (e.g., libraries and coffee shops), where devices may or may not be connected to a power source, there are always times when devices remain idle. In such scenarios, discovering nodes that are willing to provide a cohesive service by establishing a local computation environment yields a cost-effective compute location compared to a centralized public cloud. However, it has recently been found that such opportunistic edge cloud deployments require energy maintenance. Although task execution is a priority, if a device runs out of power, the resiliency of such a model fails. Thus, our motivation to use reinforcement learning stems from the challenge of finding proper energy thresholds. Further, this technique provides the twofold benefit of reward calculation along with reliable device selection in such volatile environments. Thus, training enables a highly robust system.
C. Contributions
To this end, our contributions in this paper are as follows:
1) We design a framework for autonomous energy management relying on a reinforcement learning approach [17]. The design is based on the interaction of agents and proceeds without human input.
2) The result of such a design is the production of rewards commensurate with job completion for the nodes participating in an MDC service. As the termination of a service does not depend on any external inputs, it is completely autonomous.
The remainder of this document delineates the related work in Section II, followed by the design in Section III. In Section IV we discuss the performance evaluation. We provide concluding remarks in Section V.
Fig. 1: Droplet Architecture
II. RELATED WORKS
In [18], the authors propose a distributed topology control algorithm that combines design theories, focusing on asynchronous and asymmetric neighbor discovery. According to this concept, neighbor discovery schedules are made based on a combinatorial design of "multiples of 2," after which target duty cycling is performed. In addition, the multiples of 2 are applied to overcome the challenge of the block design and to support asymmetric operation. The proposed method achieves the smallest total number of slots and wake-up slots among existing representative neighbor discovery protocols. In contrast to this work, we propose an autonomous device environment where the decision to terminate is taken solely by the device itself, based on its native energy statistics (including duty cycling periods at the time of discovery). Further, the authors do not consider a device cloud environment, where the dynamics of movement affect the overall energy of the device due to continuous radio module operation.
In [19], Lee et al. propose a cloud-based energy management approach that eliminates the prediction overhead by offloading it to a remote cloud. This framework pre-computes the execution time by profiling web applications on dedicated mobile devices in the cloud. Behaving like typical caches, when mobile web applications request data from servers, both the data and its execution times are delivered to the users' mobile devices. A performance control agent on the mobile device then selects an operating point to meet the response time requirement. Contrary to this approach, our model does not make any decisions remotely in the cloud; the decisions are made within the composition environment.
In [15], Habak et al. propose a femto-cloud system which provides a dynamic, self-configuring, multi-device mobile cloud from a collection of mobile devices. The authors present the femto-cloud system architecture, which enables multiple mobile devices to be composed into a cloud computing service despite churn in mobile device participation. Different from this framework, we develop a learning-based approach to energy management that repeatedly updates its local energy decision controller and self-heals by withdrawing a device from the execution, rather than taking over a task and dropping it due to a lack of energy.
In [12], [14], the authors propose a novel distributed device cloud strategy with emphasis on mobility and maintaining seamless connectivity. Both of these works, however, fail to consider the energy component of the devices while providing the service, which leaves the model incomplete. Different from these, our model proposes, for the first time, an energy-aware MDC that is based on a reinforcement learning mechanism.
III. SYSTEM DESIGN
A. Overall Droplet Architecture
Fig. 1 shows the design. Our design mainly consists of three modules: the user application, the controller, and the network interface.
1) The User Application: This refers to any application that can avail itself of the MDC. Via this module, the user profiles and resource information are obtained. As elaborated before in [20], the user can configure the amount of resources to be shared before participating in the device cloud formation.
2) The Network Interface: This is the built-in network card (e.g., Wi-Fi) for communication with the other devices and network nodes.
The controller resides in every device below the application layer. It mainly comprises the following modules:
1) Network Monitor and Communication Agent: Once every device offers its network location and resource capacities (how much storage can be shared) to form the device cloud, the network monitor keeps track of this resource information. The information is passed to the decision control module, which tracks the energy dissipation and maintains a threshold for the resources that are part of the current composition. A 'renew'/'top-up' message is sent to any resource that drops below this value (a minimal sketch of this threshold check is given after this list). Due to space limitations, we do not detail the control messages exchanged at this juncture. Once a 'renew' message is received, the affected device must either find a new device before exiting the current cloud or allot its tasks back to the requester, making the requester aware of its power status via the communication agent.
The communication agent and the network monitor operate cohesively. This module performs the evaluation of the network topology and the actual maintenance updates of the task execution. Discovered neighbours and newly obtained user profiles are kept in the database through this module.
2) Decision Control: This is the core of the controller and maintains the energy information. The two modules above provide periodic updates to the decision control to maintain accuracy while the tasks are being executed. A scheduling policy that operates based on device movements and energy is required. It receives inputs from the resource estimation module about the load and the overall behavior of the resources participating in the device cloud. It performs the main learning mechanism, as elaborated in the next section.
3) Task Manager and Resource Estimation: These two modules hold the main database associations. Here, information is retrieved from all arriving tasks and mapped to the resource scheduler. The energy profiles are the sensed data of the devices participating in the MDC. When a device is connected to a power source, the task manager records a local approval that the device's 'capability' is 100%, meaning it has the most reliable device profile. This module updates the task execution progress and maintains the status of incoming jobs.
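As a minimal illustration of the renew/top-up logic described above, the following Python sketch flags devices whose remaining energy falls below the decision-control threshold; the message format, threshold value, and field names are illustrative assumptions, not specified in this paper.

    # Minimal sketch (assumption: message formats, thresholds, and field
    # names are illustrative; the paper does not specify them).
    from dataclasses import dataclass

    @dataclass
    class DeviceProfile:
        device_id: str
        energy_joules: float   # remaining energy reported by the device
        plugged_in: bool       # connected to a power source

    RENEW_THRESHOLD_J = 50.0   # hypothetical threshold kept by decision control

    def monitor_step(devices):
        """Network-monitor pass: flag members whose energy crosses the threshold."""
        messages = []
        for d in devices:
            if d.plugged_in:
                continue       # plugged-in devices are treated as 100% capable
            if d.energy_joules < RENEW_THRESHOLD_J:
                # Decision control asks the device to renew/top-up or hand tasks back.
                messages.append({"to": d.device_id, "type": "renew"})
        return messages

    if __name__ == "__main__":
        cloud = [DeviceProfile("dev-a", 120.0, False),
                 DeviceProfile("dev-b", 30.0, False),
                 DeviceProfile("dev-c", 10.0, True)]
        print(monitor_step(cloud))   # -> [{'to': 'dev-b', 'type': 'renew'}]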
B. Modelling Decision Control
In our system, a collection of devices that discover each
other over Wi-Fi is considered. These devices are idle and
provide a composed service as recognized before in [20]. We
TABLE I: Notations and Definitions

Notation: Definition
R(i): Points offered for participating in the device cloud; accrued over time for MDC services (1 point per unit time spent in task execution).
C(i): Cost in Joules/sec; the amount of energy dissipated.
i: State of execution.
O_Γ(i): Energy policy for state i.
Ō_Γ: Expected cost function.
P_ij: Probability of going from state i to state j.
O_n(i): Decision maker for n devices.
θ, π: θ is, in Alg. 3, the proxy for uncertainty, which we vary from 1 to 2; π is, in Alg. 2, the preference for an action.
ε: Probability of exploration.
S_i: Selection parameter for devices.
P_i: Initial probability.
assume that N devices together provide a device cloud service; the user may offload tasks based on the required resources. We assume that the energy constraints on a device are directly linked to the cost of resource usage. For example, if f_i is the CPU usage of a device, then the cost C(i) for a task i is directly related as C(i) = αf_i, where α is the constant of the power-computing relation. Owing to space limitations, the derivations have been omitted. Table I shows the employed notations. Consider a terminal reward R(i), 0 ≤ R(i) < ∞, and a continuation cost C(i), C(i) > 0. We define a transition probability P_ij for the times of continuation. When the execution of state i is initialized, the controller calculates the local energy profiles and either stops, receiving the terminal cost −R(i) ≤ 0, or pays the continuing cost C(i) and moves to state j ≥ 0 with probability P_ij. Now, for an energy policy Γ, let O_Γ and Ō_Γ represent the expected cost functions. For any policy Γ that stops with Pr(stopping) = 1,

    O_Γ(i) = Ō_Γ(i) + R,  where i = 0, 1, 2, . . .    (1)

This is the only process that needs to be considered for a non-negative probability. The related processes can be modelled as a Markov decision process [21]. Thus,

    O(i) = min[ −R(i), C(i) + Σ_j P_ij O(j) ],  i ≥ 0.    (2)

For n stages,

    O_n(i) = min[ −R(i), C(i) + Σ_j P_ij O_{n−1}(j) ],  i ≥ 0.    (3)

Now, let O_n(i) be the set decision for the initial state before reaching the n-th stage, where the control decision is to stop:

    O_n(i) ≥ O_{n+1}(i) ≥ O(i).    (4)

Thus, a stable state is achieved if

    lim_{n→∞} O_n(i) = O(i).    (5)
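As a minimal illustration of the recursion in Eq. (3), the following Python sketch iterates the n-stage cost function on a small synthetic state space; the state space, rewards, costs, and transition matrix are illustrative assumptions, not values from the paper.

    # Minimal sketch of the n-stage recursion in Eq. (3) (assumption: a small
    # finite state space and synthetic R, C, P values).
    import numpy as np

    def stopping_value_iteration(R, C, P, n_stages=100):
        """Iterate O_n(i) = min[-R(i), C(i) + sum_j P[i,j] * O_{n-1}(j)]."""
        O = np.zeros(len(R))               # O_0
        for _ in range(n_stages):
            O = np.minimum(-R, C + P @ O)  # Eq. (3); approaches O(i), Eq. (5)
        return O

    R = np.array([1.0, 2.0, 5.0])          # terminal rewards R(i)
    C = np.array([0.5, 0.5, 0.5])          # continuation costs C(i) in J/s
    P = np.array([[0.6, 0.3, 0.1],
                  [0.2, 0.6, 0.2],
                  [0.1, 0.3, 0.6]])        # transition probabilities P_ij
    O = stopping_value_iteration(R, C, P)
    stop = -R <= C + P @ O                 # stop in state i when -R(i) attains the min
    print(O, stop)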
C. Autonomous Energy Management
As shown in Fig. 2, an agent takes an action based on the learning process. For any action, the local agent prepares a reward R. The Learner module learns the decision control strategy as time progresses and provides resource estimate updates to the user application for the best action to be taken.

Fig. 2: Learning Agent Model (the decision control agent exchanges action, state, and reward signals with the environment).

This could be in terms of a new discovery or, in the worst case, dropping a task if keeping it would come at the cost of the device shutting down. The reward received is mapped to the present value of the action; e.g., to estimate the value of an action, we simply average all the rewards obtained when that action is selected: value = (total rewards) / (number of times the action was chosen).
As suggested in [22], a near-greedy strategy behaves greedily most of the time but, with a small probability, explores; this ensures the value estimates converge to near certainty. We explore this by randomly making functional calls to the greedy algorithm, as shown in Alg. 1. Our modeling follows a multi-armed bandit approach inspired by [22]. We model the bandit problem to make the power decision on the devices forming the cloud; this also enables the calculation of rewards. We believe the bandit formulation fits our proposed framework because the controller has no prior knowledge of the actual reward of forming the cloud, nor complete knowledge of the power requirement. The drawback of Alg. 1 is that the choice is made immediately, without any exploration of other nodes. Although it is not an economical approach, the strategy is simple to implement.
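As a concrete illustration, the following Python sketch implements near-greedy (ε-greedy) device selection with incremental value averaging; the exploration probability, reward model, and names are illustrative assumptions rather than the paper's exact procedure.

    # Minimal sketch of near-greedy (epsilon-greedy) device selection with
    # incremental reward averaging (assumption: epsilon and the reward
    # model are illustrative).
    import random

    def near_greedy(values, epsilon=0.1):
        """With probability epsilon explore a random device; else exploit the max."""
        if random.random() < epsilon:
            return random.randrange(len(values))
        return max(range(len(values)), key=lambda a: values[a])

    def update(values, counts, a, reward):
        """Incremental average: Q_{n+1} = Q_n + (r - Q_n) / n."""
        counts[a] += 1
        values[a] += (reward - values[a]) / counts[a]

    n_devices = 5
    values, counts = [0.0] * n_devices, [0] * n_devices
    true_means = [random.gauss(0, 1) for _ in range(n_devices)]  # synthetic rewards
    for t in range(2000):                                        # 2000 steps, as in Sec. IV
        a = near_greedy(values)
        update(values, counts, a, random.gauss(true_means[a], 1))
    print(max(range(n_devices), key=lambda a: values[a]))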
IV. PERFORMANCE EVALUATION
Our evaluation suggests that the design is not very sensitive to parameter changes. We observe that both the greedy and near-greedy methods use the probability of exploration to converge to an optimal selection of actions. That is, as the tasks were changed, more fluctuating rewards were noticed, requiring more training. A positive observation is that the exploration of the near-greedy strategy usually produces actions faster, at the cost of being sub-optimal. We produce synthetic measurements for our interpretation. Traffic is generated according to a Poisson distribution, with rewards of zero mean and unit variance; we use standard Python libraries for this purpose. To analyze the different algorithms, we complete 4000 independent runs, i.e., 2000 time steps for each.
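A minimal sketch of this evaluation loop, assuming NumPy and a near-greedy learner (the paper does not name the exact library or learner configuration):

    # Minimal sketch of the evaluation harness: independent runs of 2000
    # steps, rewards drawn with zero mean and unit variance (assumption:
    # NumPy and the epsilon value are illustrative choices).
    import numpy as np

    rng = np.random.default_rng(0)
    n_runs, n_steps, n_devices, eps = 4000, 2000, 5, 0.1
    avg_reward = np.zeros(n_steps)

    for _ in range(n_runs):
        true_means = rng.normal(0.0, 1.0, n_devices)   # zero mean, unit variance
        values, counts = np.zeros(n_devices), np.zeros(n_devices)
        for t in range(n_steps):
            a = rng.integers(n_devices) if rng.random() < eps else int(values.argmax())
            r = rng.normal(true_means[a], 1.0)
            counts[a] += 1
            values[a] += (r - values[a]) / counts[a]
            avg_reward[t] += r

    avg_reward /= n_runs                               # reward curve as in Fig. 3a
    print(avg_reward[-1])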
In Fig. 3a we show the average rewards obtained with varying θ values. Further, in Fig. 3b, we analyze the sensitivity of the reward with respect to θ, i.e., the proxy used to avoid the unnecessary selection of actions compared to Alg. 1. Using
Algorithm 1: Near-Greedy Approach
Input: S_i, P_i
/* Estimating reward R */
L_P^R = []  /* L_P^R keeps track of the time duration each device participates in an MDC formation */
begin
    /* In case of exploitation, i.e., with probability 1 - P, the for loop is skipped and the max of L_P^R is returned */
    foreach S_i in MobileDeviceSelectionList do
        if constraints for the device cloud are satisfied then
            /* Refer to Table I for the R_i calculation */
            L_P^R <- [update the reward R_i = R_i^previous + R_i^new]
        end
    end
    return max(L_P^R)
end
Output: Average reward for participation in device cloud formation
Algorithm 2: Gradient Approach
Input: S_i, P_i
/* Estimating reward R, which considers relative action preference */
π_t(a): preference for action a
Estimate_Reward(action)  /* returns the estimated reward for time t */
L_A: list of actions
L_P^R = []  /* L_P^R keeps track of the time duration each device participates in an MDC formation */
begin
    /* In case of exploitation, i.e., with probability 1 - P, the for loop is skipped and the max of L_P^R is returned */
    foreach S_i in MobileDeviceSelectionList do
        if constraints for the device cloud are satisfied then
            foreach action_i in L_A do
                R_i = Estimate_Reward(i)
                if R_i >= R_average then
                    increase π_t(a)
                else
                    reduce π_t(a)
                end
            end
            L_P^R <- [update the reward R_i = R_i^previous + R_i^new]
        end
    end
    return max(L_P^R)
end
Output: Average reward for participation in device cloud formation
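A minimal sketch of the preference update in Alg. 2, assuming a softmax over preferences and the standard gradient-bandit step from [22]; the step size and reward model are illustrative assumptions:

    # Minimal sketch of a gradient-bandit preference update in the spirit
    # of Alg. 2 (assumption: softmax action probabilities and step size
    # follow [22] and are not specified by the paper).
    import math, random

    def softmax(prefs):
        m = max(prefs)
        exps = [math.exp(p - m) for p in prefs]
        s = sum(exps)
        return [e / s for e in exps]

    def gradient_step(prefs, a, reward, avg_reward, alpha=0.1):
        """Increase pi_t(a) when reward beats the running average, else reduce it."""
        probs = softmax(prefs)
        for b in range(len(prefs)):
            if b == a:
                prefs[b] += alpha * (reward - avg_reward) * (1 - probs[b])
            else:
                prefs[b] -= alpha * (reward - avg_reward) * probs[b]

    prefs = [0.0] * 5                      # one preference per candidate device
    avg_reward, t = 0.0, 0
    for _ in range(2000):
        a = random.choices(range(5), weights=softmax(prefs))[0]
        reward = random.gauss(a * 0.1, 1)  # synthetic: higher index, higher mean
        t += 1
        avg_reward += (reward - avg_reward) / t
        gradient_step(prefs, a, reward, avg_reward)
    print(softmax(prefs))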
Algorithm 3: Action(S_i)-Reward(R_i): reducing the probability of selecting unnecessary actions
Input: S_i, P_i
Estimate_Reward(action)  /* returns the estimated reward for time t */
/* Estimating reward R */
L_P^R = []  /* L_P^R keeps track of the time duration each device participates in an MDC formation */
begin
    /* Every time an action is taken, exploration happens to ensure that all actions are considered when determining the optimal solution, unlike Alg. 1, where exploration happens only ε of the time */
    foreach S_i in MobileDeviceSelectionList do
        if constraints for the device cloud are satisfied then
            /* Refer to Table I for the R_i calculation */
            R_i^new = Estimate_Reward(i) + θ_i  /* θ_i is the proxy for the uncertainty estimate; it converges and reduces unnecessary action selections. It is the root-mean-square value of the action. */
            L_P^R <- [update the reward R_i = R_i^previous + R_i^new]
        end
    end
    return max(L_P^R)
end
Output: Average reward for participation in device cloud formation.
the relation E = θ × √(ln(t) / N_t(a)), we show how the uncertainty is reduced, where N_t(a) represents the number of times an action has been executed before time t and N_t(0) is the maximum action selection. Thus, as θ increases, the exploration increases. The spike at approximately step 6 arises because, for simplicity, we consider at most 5 devices forming the cloud. In the beginning, all the devices are explored and given a reward; in the next step, the maximum of the already selected 5 is chosen, hence the sudden spike. From then on, the algorithm only builds on what has already been chosen, so we do not see any more spikes.
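A minimal sketch of this uncertainty-augmented selection, where the bonus θ√(ln t / N_t(a)) is added to the estimated reward before taking the maximum; θ, the reward model, and all names are illustrative assumptions:

    # Minimal sketch of Alg. 3's uncertainty-augmented selection
    # (assumption: theta and the reward model are illustrative; the bonus
    # term follows the relation E = theta * sqrt(ln t / N_t(a))).
    import math, random

    def select_action(values, counts, t, theta=1.0):
        """Pick the argmax of estimated reward plus the uncertainty bonus."""
        def score(a):
            if counts[a] == 0:
                return float("inf")      # explore every device at least once
            return values[a] + theta * math.sqrt(math.log(t) / counts[a])
        return max(range(len(values)), key=score)

    values, counts = [0.0] * 5, [0] * 5  # 5 candidate devices, as in Sec. IV
    for t in range(1, 2001):
        a = select_action(values, counts, t)
        reward = random.gauss(a * 0.1, 1)  # synthetic reward draw
        counts[a] += 1
        values[a] += (reward - values[a]) / counts[a]
    print(counts)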
Additionally, Fig. 3b represents the percentage of optimal actions selected. A θ of zero represents a completely greedy approach, so no exploration happens; thus the solution determined with θ = 0 is sub-optimal. As θ increases, the time the system takes to explore increases, and so does the system's confidence that all the devices have been explored before exploiting. Thus, with increasing θ, the system converges more slowly to an optimal solution.
Two things can be concluded from Fig. 3c. First, the relative comparison factor added to Alg. 1 shows that the greedy approach, although it may improve performance, is not optimal. Second, a change of mean has no effect on the proposed algorithm, as it adapts to the new conditions. Fig. 3c also shows the performance of Alg. 3 with varying θ. Likewise, we evaluate the changing rewards with varying π values. In [23], Sun et al. arrive at a conclusion that is not directly comparable with our work; hence, we use as baseline the scenario with distribution parameter 4, as shown in Fig. 4. Owing to the random nature of this scenario, it needs to be investigated further. In Fig. 5, our approach (θ) produces better exploration compared to near-greedy and greedy. On the other hand, Alg. 1 and Alg. 2 follow an immediate reward selection, which is sub-optimal. Fig. 6 compares the averages of all optimal actions. It is observed that Droplet yields a 10% gain in rewards achieved.
V. CONCLUSION & FUTURE WORK
Reinforcement learning is primarily an action-based training methodology compared to other forms of learning. The energy available inside the devices is important to study in an edge device cloud setting, because the collaborating devices may drain out in the middle of the execution and computation process. In this paper, we proposed Droplet, an autonomous energy management architecture that learns the power-related statistics of the devices. To this end, it produces a reliable group of resources for providing a computation environment on-the-fly. Additionally, we compared Droplet with two state-of-the-art approaches and analyzed the energy-based improvements in each case. Further, we modeled a reward strategy for the devices participating in the mobile device cloud service. We observed that the proposed architecture effectively produces a 10% gain in the rewards earned. In future work, a real-world implementation will be investigated and compared with the simulation results.
REFERENCES
[1] A. A. Alkheir, M. Aloqaily, and H. T. Mouftah, “Connected and
autonomous electric vehicles (CAEVs),” IT Professional, vol. 20, no. 6,
pp. 54–61, 2018.
[2] M. Aloqaily, I. Al Ridhawi, H. B. Salameh, and Y. Jararweh, “Data and service management in densely crowded environments: Challenges, opportunities, and recent developments,” IEEE Commun. Mag., March 2019.
[3] M. Aloqaily et al., “Congestion mitigation in densely crowded environ-
ments for augmenting QoS in vehicular clouds,” in Proc. ACM Symp.
on Des. and Analysis of Intel. Vehi. Netw. and App., 2018, pp. 49–56.
[4] A. Nasrallah et al., “Ultra-low latency (ULL) networks: The IEEE TSN and IETF DetNet standards and related 5G ULL research,” IEEE Commun. Surv. & Tut., vol. 21, no. 1, pp. 88–145, 2019.
[5] I. Parvez et al., “A survey on low latency towards 5G: RAN, core network and caching solutions,” IEEE Commun. Surv. & Tut., vol. 20, no. 4, pp. 3098–3130, 2018.
[6] Z. Xiang et al., “Reducing latency in virtual machines: Enabling tactile
internet for human-machine co-working,” IEEE J. Sel. Areas Commun.,
vol. 37, no. 5, 2019.
(a) Reward assignment process. (b) Estimated power selection process. (c) Reward vs. time cycles with varying θ.
Fig. 3: Droplet sensitivity with respect to the given θ.
Fig. 4: Varying π values.
Fig. 5: Average of the optimal action for a selected parameter range: θ for Droplet, ε for near-greedy, and π for greedy.
Fig. 6: Average of all the optimal actions.
[7] M. Chen et al., “Data-driven computing and caching in 5G networks:
Architecture and delay analysis,” IEEE Wirel. Commun., vol. 25, no. 1,
pp. 70–75, 2018.
[8] S. Sukhmani et al., “Edge caching and computing in 5G for mobile
AR/VR and tactile internet,” IEEE MultiMedia, vol. 26, no. 1, pp. 21–
30, 2019.
[9] X. Wang et al., “Content-centric collaborative edge caching in 5G mobile
internet,” IEEE Wirel. Commun., vol. 25, no. 3, pp. 10–11, 2018.
[10] M. Satyanarayanan, P. Bahl, R. Caceres, and N. Davies, “The case for VM-based cloudlets in mobile computing,” IEEE Pervasive Computing, vol. 8, no. 4, pp. 14–23, 2009.
[11] P. Shantharama et al., “LayBack: SDN management of multi-access edge computing (MEC) for network access services and radio resource sharing,” IEEE Access, vol. 6, pp. 57545–57561, 2018.
[12] V. Balasubramanian and A. Karmouch, “Managing the mobile ad-hoc
cloud ecosystem using software defined networking principles,” in Proc.
Int. Symp. on Netw., Comp. and Commun. (ISNCC), May 2017, pp. 1–6.
[13] A. Mtibaa, K. A. Harras, and A. Fahim, “Towards computational offloading in mobile device clouds,” in Proc. IEEE Int. Conf. on Cloud Computing Technology and Science (CloudCom), 2013, pp. 331–338.
[14] V. Balasubramanian, M. Aloqaily, F. Zaman, and Y. Jararweh, “Explor-
ing computing at the edge: A multi-interface system architecture enabled
mobile device cloud,” in Proc. IEEE Int. Conf. on Cloud Networking
(CloudNet), Oct 2018, pp. 1–4.
[15] K. Habak, M. Ammar, K. A. Harras, and E. Zegura, “Femto clouds: Leveraging mobile devices to provide cloud service at the edge,” in Proc. IEEE Int. Conf. on Cloud Comp. (CLOUD), 2015, pp. 9–16.
[16] D. V. Le and C. Tham, “A deep reinforcement learning based offloading scheme in ad-hoc mobile clouds,” in Proc. IEEE Infocom Workshops, April 2018, pp. 760–765.
[17] N. Mastronarde and M. van der Schaar, “Fast reinforcement learning for energy-efficient wireless communication,” IEEE Transactions on Signal Processing, vol. 59, no. 12, pp. 6262–6266, 2011.
[18] G. Yi, J. H. Park, and S. Choi, “Energy-efficient distributed topology control algorithm for low-power IoT communication networks,” IEEE Access, vol. 4, pp. 9193–9203, 2016.
[19] W. Lee, D. Sunwoo, A. Gerstlauer, and L. K. John, “Cloud-guided QoS and energy management for mobile interactive web applications,” in Proc. IEEE Int. Conf. on Mobile Softw. Eng. and Sys., 2017, pp. 25–29.
[20] V. Balasubramanian and A. Karmouch, “An infrastructure as a service
for mobile ad-hoc cloud,” in Proc. IEEE Comp. and Commun. Workshop
and Conf. (CCWC), 2017, pp. 1–7.
[21] S. M. Ross, Introduction to Stochastic Dynamic Programming. Aca-
demic Press, 2014.
[22] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction.
MIT Press, 2018.
[23] K. Sun, Z. Chen, J. Ren, S. Yang, and J. Li, “M2C: Energy efficient mo-
bile cloud system for deep learning,” in Proc. IEEE Infocom Workshops,
April 2014, pp. 167–168.
The fifteen articles in this special section explores expositions on novel functionalities and technologies in various wireless and mobile network caching equipment, with respect to caching algorithms, caching implementations, system integration, cache modeling, optimization, and so on. In recent years, with the rapid development of information and communication technologies (ICTs), mobile network operators are increasingly suffering from the traffic explosion problem. And this problem remains to be tackled in the upcoming fifth generation (5G) networks. However, due to the limited bandwidth, the need to save energy, and the fact that advancements in transmission techniques are approaching the Shannon limit with diminishing returns, it is clear that these advancements need to be complemented by improvements in the network, transport, and application layers to provide sustainable solutions toward richer network capacity.