SAC-Based Resource Allocation for Computation
Offloading in IoV Networks
Bishmita Hazarika, Keshav Singh, Sudip Biswas, Shahid Mumtaz§, and Chih-Peng Li
Institute of Communications Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan
Department of ECE, Indian Institute of Information Technology Guwahati, India
§Instituto de Telecomunicações, P-3810-193 Aveiro, Portugal
Email: me.bishmita@gmail.com, {keshav.singh, cpli}@mail.nsysu.edu.tw, sudip.biswas@iiitg.ac.in, smumtaz@av.it.pt
Abstract—Due to the dynamic nature of a vehicular fog
computing environment, efficient real-time resource allocation in
an internet of vehicles (IoV) network without affecting the quality
of service of any of the on-board vehicles can be challenging. This
paper proposes a priority-sensitive task offloading and resource
allocation scheme in an IoV network, where vehicles periodically
exchange beacon messages to inquire about available services
and other important information necessary for making the
offloading decisions. In the proposed methodology, the vehicles are incentivized to share their idle computation resources with the task vehicles, whereby a deep reinforcement learning algorithm based on soft actor-critic (SAC) is designed to classify the tasks based on the priority and computation size of each task for optimally allocating the power. In particular, the SAC algorithm works towards achieving the optimal policy for task offloading by maximizing the mean utility of the considered network. Extensive numerical results, along with a comparison with baseline algorithms, namely the greedy and deep deterministic policy gradient algorithms, are presented to validate the feasibility of the proposed algorithm.
Index Terms—Internet of vehicles (IoV), deep reinforcement
learning (DRL), soft actor-critic (SAC), task offloading.
I. INTRODUCTION
In recent years, the advent of autonomous driving and fifth generation (5G) communications has led to rapid development of the internet of vehicles (IoV) in conjunction with artificial intelligence (AI) [1]. Accommodating the interests of autonomous driving involves heterogeneous tasks, computations, and wireless communications within the vehicular network, where most applications are mission-critical, demand intense computation, and are delay-sensitive. However, the on-board resources of most consumer vehicles are limited and cannot fulfill the service requirements of the vehicular environment. Consequently, the quality of service (QoS) requirements of the vehicular network may not be met.
To overcome the challenges of large delay, limited computation, and job scheduling in IoV, task/computation offloading in a mobile edge computing (MEC) environment has been considered [2]–[5]. In such an environment, when a vehicle does not have sufficient resources to perform the required computation, part of its computation or tasks is offloaded to the MEC server or cloud. MEC servers mostly refer to sparsely deployed base stations (BSs) equipped with some computation power. Although such infrastructure can fulfill the demand to some extent, it becomes challenging when the traffic within the range of a BS is high and multiple vehicles have tasks that require offloading. As a consequence, the communication delay increases, and latency in the execution of mission-critical tasks leads to the failure of delay-sensitive tasks. Furthermore, since the vehicles move at certain speeds while the BSs are stationary, the wireless link between a vehicle and a BS remains intact for only a short time.
To address the above issues of MEC, vehicular fog computing (VFC) has recently been proposed, which has proved to be a more adequate solution by enabling the vehicles to share resources among themselves [5], [6]. In this regard, to enhance the performance of task offloading and resource allocation in a VFC framework, the authors in [5] and [6] focused on the problem of minimizing the delay in task allocation using particle swarm optimization and Lyapunov optimization, respectively. In a similar vein, the authors in [3] and [4] proposed deep reinforcement learning (DRL)-based algorithms for resource allocation in a VFC framework. Next, the authors in [7] proposed a framework that uses beacon messages within the vehicular network to obtain information about the resources available in nearby vehicles. Beacon messaging is a fast and simple method for exchanging information, which is a crucial component of a VFC framework. Accordingly, the authors in [7]–[9] also used beacon messages to exchange information or notify neighboring nodes about node information in similar VFC frameworks. In spite of the many advantages of VFC, several issues have limited its deployment in real-time environments. In particular, the primary challenges in a VFC environment are related to efficient resource allocation, delay in mission-critical tasks, latency in communication, and the short link duration between the task vehicle and the service provider.
To address the challenges plaguing the deployment of VFC, in this paper we propose an IoV network with hybrid vehicle-to-vehicle (V2V) and vehicle-to-MEC/cloud task offloading. The key contributions are highlighted as follows:
• We design a VFC framework which uses beacon messages for information exchange rather than conventional BS-to-vehicle wireless links. The proposed framework considers parked vehicles, moving vehicles, as well as pedestrians with computation resources, all of which act as service providers to compute the offloaded tasks.
Fig. 1: An illustration of the considered IoV network.
• The vehicles are incentivized to share their idle computation resources with the task vehicles, whereby a DRL algorithm based on soft actor-critic (SAC) is designed using a Markov decision process (MDP) to classify the tasks based on the priority and computation size of each task for optimally allocating the power.
• Three utility functions are formulated based on the three priority levels of the tasks. The SAC algorithm works towards achieving the optimal policy for task offloading by maximizing the mean utility of the considered network.
• Extensive numerical results, along with a comparison with baseline algorithms, namely the greedy and deep deterministic policy gradient (DDPG) algorithms, are presented to validate the feasibility of the proposed algorithm.
Organization: The rest of the paper is organized as follows. Section II describes the considered system model, and Section III presents the task model and utility formulation. Section IV discusses the proposed DRL algorithm. Simulation results are discussed in Section V and, finally, conclusions are drawn in Section VI.
II. SYSTEM MODEL
A. System Architecture
We consider an IoV network as illustrated in Fig. 1, involving a multi-layer distributed VFC framework that consists of the physical layer and the hierarchical Fog-Cloud layer.¹ The framework supports two offloading modes: vehicle-to-vehicle (V2V) and vehicle-to-pedestrian (V2P). The vehicles that participate in task offloading are categorized into two types: task vehicles (TVs) and service vehicles (SVs). A TV either executes its tasks locally or offloads them to SVs or to the edge/cloud.
• V2V: In V2V communication, both the TV and the SV can be on the move, or the TV is on the move while the SV is parked. Regardless of the state, all SVs within the range of a TV are eligible to provide computation resources to it. In the considered model, the SV and TV can move in the same or opposite directions. The significant factors characterizing an SV are its relative distance from the TV, its direction, speed, available resources, etc.
¹In the Fog-Cloud layer, several BSs can communicate among themselves.
• V2P: Apart from vehicles, pedestrians on the road with computation resources can also participate in task offloading as service providers, since most pedestrians carry devices with high computation power, such as mobile phones and tablets.
We assume that time in the network is divided into N periods, with each period consisting of multiple frames. A TV with inadequate computation resources uses beacon messages in every time slot to obtain the information of nearby SVs through a ping-ACK type message exchange. The status of the system remains constant within one time frame but can change across time frames.²
• Beacon messages: These are broadcast messages that periodically acquire the position of nearby vehicles along with other required basic status information of the SV, such as available resources, expected delay, relative distance, speed, and direction [7]–[9]. Owing to their speed and simplicity, beacons reduce the delay of the resource allocation process.
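To make the exchange concrete, the following is a minimal sketch of the state an ACK-type reply could carry and a basic eligibility check; all field names and the filtering rule are our own illustrative assumptions, since the paper specifies what information is exchanged but not a message format (the 400 m default range follows Table I).

```python
from dataclasses import dataclass

@dataclass
class BeaconReply:
    """ACK-type reply from a candidate service vehicle (SV); fields are illustrative."""
    sv_id: int
    available_cpu_ghz: float    # idle computation resources
    expected_delay_s: float     # expected processing delay at the SV
    relative_distance_m: float  # distance from the task vehicle (TV)
    relative_speed_kmh: float   # relative speed with respect to the TV
    direction_same: bool        # True if moving in the TV's direction

def eligible(reply: BeaconReply, comm_range_m: float = 400.0) -> bool:
    """Basic eligibility filter: the SV must be in range and have idle resources."""
    return reply.relative_distance_m <= comm_range_m and reply.available_cpu_ghz > 0.0
```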
Accordingly, during time slot $n$, we consider the TV to have $S$ service providers within its communication range, where each SV is either a vehicle or a pedestrian with available computation resources. For the purpose of this study, we focus on one TV and the SVs within its range. Whenever a TV has to offload a task, it broadcasts beacon messages to all SVs within range, and all eligible SVs with available idle resources respond to the broadcast with ACK-type messages containing the required information. The agent located at the TV then determines the most suitable SV for the task to be offloaded. Since beacon messages are fast and involve few bits per transmission, the transmission delay in the assignment of an SV can be ignored. Let there be $M$ tasks, represented as $\phi_1, \phi_2, \ldots, \phi_m, \ldots, \phi_M$. Each task $\phi_m$ is characterized by its data size $D_m$, computation size (CPU cycles) $C_m$, delay constraint $\mu_m$, and task priority $\rho_m$. Now, considering that the wireless channel between the TV and an SV is static during a task, the transmission rate between them is given as
$r_{st} = b_{st}\log_2(1+\gamma_{st})$.  (1)
Here, $b_{st}$ denotes the allocated bandwidth and $\gamma_{st}$ denotes the signal-to-interference-plus-noise ratio (SINR), which is given as
$\gamma_{st} = \dfrac{\omega_{trs}\,\lambda_{st}^{-\alpha}\,|h_{st}|^2}{\sum_{j\in S,\, j\neq s}\omega_j\,\lambda_{j,v}^{-\alpha}\,|h_{j,v}|^2 + N_0}$.  (2)
Here, $\omega_{trs}$ is the transmit power, $\lambda_{st}$ denotes the relative distance between the TV and the SV, $\alpha$ is the path-loss exponent, $h_{st}$ is the desired channel gain, $N_0$ is the additive white Gaussian noise power, and $\sum_{j\in S,\, j\neq s}\omega_j\,\lambda_{j,v}^{-\alpha}\,|h_{j,v}|^2$ denotes the aggregate interference.
²The estimated time period for the two vehicles to remain in contact, along with the available resources, channel state, and expected delay, can change over different time slots. Further, the duration for which an SV remains within range of a TV depends on the velocities of both vehicles, which can be approximated by a Gaussian distribution [10], [2].
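As a numerical illustration of (1) and (2), the snippet below evaluates the SINR and the achievable rate; the parameter values (transmit power, path-loss exponent, noise power, bandwidth) are placeholders rather than the paper's simulation settings.

```python
import numpy as np

def sinr(p_tx, dist, h, interferers, alpha=3.0, n0=1e-13):
    """Eq. (2): desired received power over aggregate interference plus noise.

    interferers: iterable of (power, distance, channel gain) tuples;
    alpha (path-loss exponent) and n0 (noise power) are placeholder values.
    """
    signal = p_tx * dist ** (-alpha) * abs(h) ** 2
    interference = sum(p * d ** (-alpha) * abs(g) ** 2 for p, d, g in interferers)
    return signal / (interference + n0)

def rate(bandwidth_hz, gamma):
    """Eq. (1): r_st = b_st * log2(1 + gamma_st)."""
    return bandwidth_hz * np.log2(1.0 + gamma)

# Example: one interferer at 120 m and 10 MHz of allocated bandwidth.
gamma_st = sinr(p_tx=0.2, dist=50.0, h=1.0, interferers=[(0.2, 120.0, 1.0)])
print(f"rate = {rate(10e6, gamma_st):.2e} bit/s")
```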
III. TASK MODEL AND UTILITY
A. Task Model
In general, the tasks to be offloaded are categorized based on two factors: priority and size.
• Priority of task: Based on the delay constraint, task priority is classified into high-priority tasks ($P_H$), general tasks ($P_G$), and low-priority tasks ($P_L$). $P_H$ tasks are critical tasks with a very short tolerable delay, such as security-related tasks, navigation, traffic, and road analysis. A $P_H$ task must be finished in time; otherwise, it loses its value and fails. $P_G$ tasks have a longer tolerable delay, and their requirement for immediate action is lower than that of $P_H$ tasks, while $P_L$ tasks can tolerate very high delays and do not lose their value even if they take a long time to execute.
• Size of task: Computational tasks are classified into small ($S_S$), general ($S_G$), and large ($S_L$) tasks, where small tasks involve the fewest bits and large tasks the most. However, if a task is too large, has high priority, and must be executed within a deadline, the TV can split it into multiple sub-tasks and offload them to different SVs for parallel execution, as sketched below.
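The classification and the splitting of a large task into sub-tasks can be sketched as follows; the chunk granularity is an assumption, as the paper does not fix one.

```python
from enum import Enum

class Priority(Enum):
    HIGH = "P_H"     # delay-critical, executed locally
    GENERAL = "P_G"  # moderate delay tolerance
    LOW = "P_L"      # delay-tolerant, may go straight to the edge/cloud

def split_task(data_size_mb: float, chunk_mb: float) -> list:
    """Split a large task into sub-tasks that can be offloaded to different SVs."""
    n_full, remainder = divmod(data_size_mb, chunk_mb)
    chunks = [chunk_mb] * int(n_full)
    if remainder > 0:
        chunks.append(remainder)
    return chunks

print(split_task(1.5, 0.5))  # -> [0.5, 0.5, 0.5]
```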
A task can be executed in three ways: locally, via V2V offloading, or via edge/cloud offloading. In this work, we do not treat the edge and cloud layers separately; hence, they are considered as one. Nevertheless, the proposed model can easily be extended to a hybrid edge-and-cloud model encompassing the V2V communication layer. $P_H$ tasks are extremely delay-sensitive and hence must be processed locally. Further, since $P_H$ tasks mostly comprise first-hand computations in a vehicle, such tasks are usually small in size but require continuous computation, for example, continuously sensing the road or navigating while the TV is on the move. Hence, $P_H$ tasks of size $S_S$ are executed by the local processor. On the contrary, $P_L$ tasks are delay-tolerant and can be executed over a longer time span. Therefore, $P_L$ and $S_L$ tasks can be offloaded directly to the edge/cloud even if resources are available nearby, so as to preserve the vehicular resources for future tasks of the same or other TVs. In case there are no eligible SVs nearby, the TV offloads the computation task to the edge/cloud.
B. Utility
For calculating the utility, we assume that $P_H$ tasks have a strict maximum tolerable delay or deadline beyond which the tasks lose their value. Whenever a task's execution time exceeds the tolerable time limit, the utility becomes negative since the task offloading has failed. The utility function can be mathematically defined as [11]
$U_n^{P_H} = \begin{cases} \eta_{S_m^0}\,\log(1+\mu_m-n_m), & n_m \le \mu_m,\\ \zeta_{P_H}, & n_m > \mu_m, \end{cases}$  (3)
where $n_m$ denotes the time taken to complete task $\phi_m$, $\zeta_{P_H}$ is the negative value assigned due to failure to execute the task within the time limit, and $\eta_{S_m^0}$ denotes the number of local small-sized high-priority tasks. Next, the utility of $P_G$ tasks is given as
$U_n^{P_G} = \begin{cases} \zeta_{P_G}\,\eta_{S_x}, & n_m \le \mu_m,\\ \zeta_{P_G}\,e^{-c(n_m-\mu_m)}, & n_m > \mu_m. \end{cases}$  (4)
Here, $\eta_{S_x}$ denotes the total number of small tasks of priority class $P_G$, $S_G$-sized tasks, and sub-tasks from large ($S_L$) tasks of priority class $P_G$ that were divided for parallel or simultaneous offloading. If the task is executed within the maximum tolerable delay, a positive value is assigned to the utility, whereas if the execution time exceeds the limit, then, unlike $P_H$, the task does not completely lose its value or fail; instead, its utility decreases exponentially with time. In (4), $\zeta_{P_G}$ is the positive utility constant. Furthermore, $U_n^{P_L}$ calculates the utility of the $P_L$ and $S_L$ tasks that are directly offloaded to the MEC server or cloud and is given as
$U_n^{P_L} = \begin{cases} \zeta_{P_L}\,\eta_{S_L^0}, & n_m \le \mu_m,\\ 0, & n_m > \mu_m. \end{cases}$  (5)
Here, $\eta_{S_L^0}$ denotes the number of large tasks eligible to be directly offloaded to the cloud. Although $P_L$ tasks are delay-tolerant, they might still fail due to other technical issues, such as a communication interruption leading to $n_m = \infty$. In such cases, the utility is calculated as zero.
Let $\chi_m$ denote the energy consumed in the SV for task $\phi_m$, $f_m$ the computation frequency of the SV, $\iota_m$ the time required to compute task $\phi_m$, and $C_m$ the computation size of the task. For frequency $f_m$ in an SV, the energy consumed is proportional to the time required to complete the task, i.e., for task $\phi_m$, $C_m = f_m\,\iota_m$. Hence, the energy consumed is proportional to the computation size. The overall utility of the TV can then be calculated as
$U_n = \rho_{P_H}\,U_n^{P_H} + \rho_{P_G}\,U_n^{P_G} + \rho_{P_L}\,U_n^{P_L} - \mathbb{1}\,\chi_m C_m$,  (6)
where $\rho_{P_H}$, $\rho_{P_G}$, and $\rho_{P_L}$ are constants that represent the priority levels of high-priority, general, and low-priority tasks, respectively, and $\mathbb{1}$ is the indicator function.
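A compact sketch of the per-class utilities (3)-(5) and the weighted sum (6) follows; the constants zeta and c are placeholders, while the rho weights reuse the priority constants of Table I.

```python
import math

def class_utility(priority, n_m, mu_m, eta=1, zeta_fail=-1.0, zeta_pg=1.0,
                  zeta_pl=1.0, c=1.0):
    """Per-class utility of Eqs. (3)-(5); zeta_* and c are placeholder constants.

    priority: "P_H", "P_G" or "P_L"; n_m: completion time; mu_m: deadline;
    eta: number of tasks/sub-tasks of the class (the eta terms in the paper).
    """
    if priority == "P_H":
        # Eq. (3): log-utility of the deadline slack; hard failure past it.
        return eta * math.log(1.0 + mu_m - n_m) if n_m <= mu_m else zeta_fail
    if priority == "P_G":
        # Eq. (4): value decays exponentially past the deadline instead of failing.
        return zeta_pg * eta if n_m <= mu_m else zeta_pg * math.exp(-c * (n_m - mu_m))
    # Eq. (5): delay-tolerant; zero utility only on outright failure.
    return zeta_pl * eta if n_m <= mu_m else 0.0

def overall_utility(u_ph, u_pg, u_pl, offloaded, chi_m, c_m, rho=(0.5, 1.0, 2.0)):
    """Eq. (6): priority-weighted sum minus the energy cost when a task is offloaded."""
    indicator = 1.0 if offloaded else 0.0
    return rho[0] * u_ph + rho[1] * u_pg + rho[2] * u_pl - indicator * chi_m * c_m
```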
In a vehicular network, each SV can also act as a TV, such that an SV may have its own local $P_H$ tasks along with tasks offloaded from other vehicles. Since each vehicle has limited computation capability, the SV needs to ensure that it allocates enough resources to its local $P_H$ tasks so that they are executed within the time limit. Let a service vehicle $S_s$ carry $L$ local $P_H$ tasks, and let the computation size of these tasks and their maximum tolerable delay be $C_s$ and $\mu_s$, respectively. Then, the minimum frequency required for the local tasks and the total frequency when all resources are reserved solely for local tasks are, respectively, given as
$F_{S_s}^{\min} = \sum_{l=1}^{L}\frac{C_s}{\mu_s}$, and $F_{S_s} = \sum_{l=1}^{L}\frac{C_s}{\phi_{S_s}\mu_s}$,  (7)
where $\phi_{S_s} = F_{S_s}^{\min}/F_{S_s}$. Now, from equation (6), the total utility of the local tasks is given by
$U_{\mathrm{local}}(\phi) = \sum_{s=1}^{S_s}\log\left(1+\mu_s-\phi\mu_s\right), \quad \phi \in [\phi_{S_s}, 1]$.  (8)
When the SV executes an offloaded task along with a local task, it allocates part of its frequency to the offloaded task. Hence, the utility of the local tasks changes such that the product of the energy consumed and the computation size of the offloaded task equals the difference between the utility of the local tasks alone and their utility when both local and offloaded tasks are processed. This is given as
$\chi_m C_m = U_{\mathrm{local}}(\phi_s) - U_{\mathrm{local}}(\phi'_s)$.  (9)
Similarly, if a new task arrives while the allocated task is being executed, the utility is again updated to $U_{\mathrm{local}}(\phi''_s)$, where $\phi''_s$ depends on the energy consumed during the execution of the tasks, which is proportional to their computation size.
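The reservation rule of (7) and the accounting of (9) reduce to a few lines, as in the sketch below (local utilities are assumed to be computed elsewhere, e.g., with the class_utility helper above).

```python
def min_local_frequency(local_tasks):
    """Left term of Eq. (7): the minimum CPU frequency at which every local P_H
    task of computation size C_s finishes within its deadline mu_s.

    local_tasks: list of (C_s, mu_s) pairs in (cycles, seconds).
    """
    return sum(c_s / mu_s for c_s, mu_s in local_tasks)

def offload_cost(u_local_alone, u_local_shared):
    """Eq. (9): the energy-size product chi_m * C_m charged to an offloaded task
    equals the drop in local utility caused by diverting frequency to it."""
    return u_local_alone - u_local_shared
```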
IV. SOFT ACTOR-CRITIC (SAC) BASED TASK OFFLOADING
Since the parameters of the V2V network in the considered framework can change over different time slots, the current state of the network is known, but the future states are unknown and likely to differ from the current one. To address this change of states, the problem is modeled as an MDP³ with the objective of maximizing the utility of task offloading, and it is solved using a model-free DRL algorithm that employs the SAC framework to evaluate and improve the task offloading policy. The details of the proposed task offloading scheme are given below.
A. Preliminaries on RL
• State space: Each TV contains an agent which makes the offloading decision on the basis of the vehicular network's information at time $n$, such as the SINR of the communication channel, the availability of SVs, the remaining resources of $S_s$, the utility of locally executed tasks ($U_{\mathrm{local}}(\phi)$), data size, CPU cycles, maximum tolerable delay, etc.
• Action space: Based on the state space, the agent in the TV analyzes the environment and determines the eligible SV to which the task is offloaded.
• Reward: Let $\ell$ denote a binary variable whose value is 1 if a task is successfully offloaded. In the considered framework, we have $N$ time slots. Hence, the mean reward $R$ can be calculated as
$R = \frac{1}{N}\sum_{n=0}^{N-1}\sum_{s=1}^{S}\ell_n^{S_s}\,U_n^{S_s}$.  (10)
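Eq. (10) amounts to a masked average over slots and service vehicles:

```python
import numpy as np

def mean_reward(success, utility):
    """Eq. (10): mean reward over N time slots and S service vehicles.

    success: (N, S) binary array (1 where the offload succeeded);
    utility: (N, S) array of the corresponding utilities U_n^{S_s}.
    """
    success = np.asarray(success, dtype=float)
    utility = np.asarray(utility, dtype=float)
    return (success * utility).sum() / success.shape[0]

print(mean_reward([[1, 0], [1, 1]], [[2.0, 5.0], [1.0, 3.0]]))  # (2+1+3)/2 = 3.0
```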
B. SAC algorithm
An important feature of SAC is that it regulates the entropy of the policy. The policy in SAC is trained to maximize a trade-off between the expected return and the entropy. Hence, the SAC algorithm is based on maximum-entropy RL, which aims to maximize both the mean reward and the entropy in order to find the optimal policy $\pi^*$. Accordingly, the relationship between the soft action-value and state-value functions at state $s$, action $a$, and time $n$ for the considered V2V network is given by the Bellman equations (11) and (12):
$Q^{\pi}(s,a) = R(s_n,a_n) + \gamma\,\mathbb{E}_{s_{n+1}\sim p}\left[V(s_{n+1})\right]$,  (11)
with $V(s_n) = \mathbb{E}_{a_n\sim\pi}\left[Q(s_n,a_n) - \log\pi(a_n|s_n)\right]$,  (12)
where $p$ denotes the trajectory distribution induced by $\pi$. We set $Q^{\pi}(s,a) = Q_{\theta}(s,a)$ in the DNN, where $\theta$ denotes the network parameters. The Q-function parameters are not constant; the actor and critic are updated according to the actions and immediate rewards stored in the replay buffer. Thus, the parameters can be trained by minimizing the loss function in equation (13):
$Q_{\mathrm{val}} = \left(Q_{\theta}(s_n,a_n) - Q'_{\theta'}(s_n,a_n)\right)^2, \quad J_Q(\theta) = \mathbb{E}_{(s_n,a_n)\sim rb}\left[\tfrac{1}{2}\,Q_{\mathrm{val}}\right]$,  (13)
where $Q_{\theta}(s_n,a_n)$ denotes the soft Q-value, $rb$ denotes the replay buffer [13], and
$Q'_{\theta'}(s_n,a_n) = R(s_n,a_n) + \gamma\,\mathbb{E}_{s_{n+1}\sim p}\left[V_{\theta'}(s_{n+1})\right]$.  (14)
To stabilize the iterations of the action-value function, equation (14) defines the target action-value function, where $\theta'$ is obtained as an exponential moving average of $\theta$.
³Details of the MDP modeling are omitted due to space constraints; the reader may refer to [12].
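A minimal PyTorch sketch of this critic update is given below; q_net, q_target_net, and policy are assumed abstractions (the policy returns a sampled action with its log-probability), and the entropy weight alpha is kept explicit, with alpha = 1 matching the unweighted log-term of (12).

```python
import torch
import torch.nn.functional as F

def soft_q_loss(q_net, q_target_net, policy, batch, gamma=0.99, alpha=1.0):
    """Critic loss of Eq. (13) against the soft target of Eqs. (12) and (14)."""
    s, a, r, s_next = batch
    with torch.no_grad():
        a_next, log_p = policy(s_next)                         # a' ~ pi(.|s')
        v_next = q_target_net(s_next, a_next) - alpha * log_p  # soft value, Eq. (12)
        q_target = r + gamma * v_next                          # Eq. (14)
    return 0.5 * F.mse_loss(q_net(s, a), q_target)

def soft_update(target_net, net, tau=0.005):
    """theta' <- (1 - tau) * theta' + tau * theta: the exponential moving average
    that stabilizes the target in Eq. (14); tau is a placeholder value."""
    for p_t, p in zip(target_net.parameters(), net.parameters()):
        p_t.data.mul_(1.0 - tau).add_(tau * p.data)
```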
The performance of the DRL algorithm depends on the policy. If the policy is optimal, the offloaded tasks will be completed in time and the utility will be high. On the contrary, if the policy is not optimal, task computation failures and deadline violations become common, which decreases the utility. Therefore, to improve the policy, it is updated in terms of the Kullback-Leibler divergence as in equation (15) [13], where $\Pi$ denotes a set of policies corresponding to Gaussian distribution parameters and the function $Z^{\pi}$ normalizes the distribution without affecting the new policy:
$\pi_n = \arg\min_{\pi'\in\Pi} D_{\mathrm{KL}}\!\left(\pi'(\cdot|s_n)\,\middle\|\,\frac{\exp\!\left((1/\Lambda)\,Q^{\pi}(s_n,\cdot)\right)}{Z^{\pi}(s_n)}\right)$.  (15)
The Kullback-Leibler divergence can be further minimized by updating the policy parameters as
$J_{\pi}(\phi) = \mathbb{E}_{s_n\sim rb}\!\left[\mathbb{E}_{a_n\sim\pi_{\phi}}\!\left[\log\pi_{\phi}(a_n|s_n) - Q_{\theta}(s_n,a_n)\right]\right]$.  (16)
The policy iteration is continued until it reaches the optimal policy value and converges with maximum entropy. The detailed SAC-based task offloading procedure is illustrated in Algorithm 1.
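The actor update of (16), whose minimization realizes the KL projection of (15), can be sketched with the same assumed abstractions as the critic update above:

```python
def policy_loss(q_net, policy, states):
    """Actor loss of Eq. (16): E_s E_{a ~ pi_phi}[ log pi_phi(a|s) - Q_theta(s, a) ].

    policy(states) is assumed to return a reparameterized action sample and its
    log-probability, so the inner expectation is estimated with one sample.
    """
    actions, log_p = policy(states)
    return (log_p - q_net(states, actions)).mean()
```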
V. NUMERICAL RESULTS
In this section, we present simulation results for the considered IoV network. In particular, we consider one TV and multiple SVs within the VFC framework, where the SVs include parked vehicles, moving vehicles, and pedestrians. Every moving SV can also have its own local tasks; hence, the agent in every vehicle ensures that its local, high-priority tasks are executed on time. The mean utility and mean delay are used as the scores to measure the performance of the network. While the mean utility is defined as $U_n$ divided by the total number of local and offloaded tasks and sub-tasks,
Algorithm 1 SAC-based task offloading
1: Initialize: $Q_{\theta_1}(s,a)$, $Q_{\theta_2}(s,a)$, and targets $Q_{\theta'_1}$, $Q_{\theta'_2}$ with weights $\theta'_1 = \theta_1$ and $\theta'_2 = \theta_2$
2: Initialize: policy $\pi_{\phi}(a|s)$ with weights $\phi$
3: for each iteration do
4:   Retrieve current state $s_0$
5:   for $n = 0, 1, 2, \ldots, (N-1)$ do
6:     Examine the properties of the offloading task from the TV
7:     Broadcast beacon message
8:     Collect information from the environment
9:     Estimate $s_n$
10:    Action $a_n$ determines the SV and informs the agent in the TV
11:    Compute reward using (10) and estimate state $s_{n+1}$
12:    Save tuple $(s_n, a_n, R_n, s_{n+1})$ in replay memory
13:    Update $J_Q(\theta)$ using equation (13)
14:    Update $\phi$ using equation (16)
15:    Update soft action-value function $\theta'$
16:  end for
17: end for
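Algorithm 1 corresponds to a standard off-policy training loop, sketched below; env and agent are assumed abstractions that wrap the beacon exchange (lines 6-9), the reward of (10), and the updates of (13) and (16).

```python
def train(env, agent, episodes=2000, n_slots=50):
    """Skeleton of Algorithm 1 (structure only; env and agent are assumed objects)."""
    for _ in range(episodes):                       # line 3
        s = env.reset()                             # line 4: current state s_0
        for n in range(n_slots):                    # line 5
            a = agent.act(s)                        # line 10: pick an SV for the task
            s_next, r = env.step(a)                 # lines 6-9, 11: beacon exchange, reward of Eq. (10)
            agent.buffer.append((s, a, r, s_next))  # line 12: replay memory
            agent.update_critic()                   # line 13: minimize J_Q(theta), Eq. (13)
            agent.update_actor()                    # line 14: update phi via Eq. (16)
            agent.update_targets()                  # line 15: theta' <- EMA(theta)
            s = s_next
```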
TABLE I: Vehicular network and DRL parameters
Maximum relative speed (ν): ±55 km/hr
Maximum relative distance between two vehicles: 400 m
Maximum tolerable delay of local tasks (τ_n) (4 types): [0.5, 1, 2, 4] seconds
Computation capability of vehicle (3 types): [3–10] GHz
Data size of task (3 categories) (D_m): [0.05, 0.9, 1.5] MB
Computation size of task (3 categories) (C_m): [0.1, 0.5, 1.0]
Task priority constants (ρ): P_H = 0.5, P_G = 1, P_L = 2
Maximum number of tasks: 50
Maximum number of local tasks: 5
Hidden layers: 2
Actor hidden layer units: [1000, 1000]
Critic hidden layer units: [600, 600]
Batch size (β): 250
Epsilon decay rate denominator: 150
Buffer size: 99999
Activation function: Softmax
Training episodes: 2000
the mean delay is defined as the total overall delay⁴ divided by the number of tasks and sub-tasks. Unless otherwise stated, Table I lists the parameters considered for the simulations; any other parameter used is explicitly mentioned where relevant.
1) Mean utility with respect to varying vehicle speed: In the vehicular network, the velocity of a vehicle is considered uniform within each time slot. Fig. 2 shows the mean utility for varying vehicle speeds. The learning rate of the algorithm is set to $10^{-2}$. The speeds are set to 30 km/hr, 50 km/hr, and 70 km/hr over traffic conditions varying from 5 to 50 vehicles/km. It can be observed from the figure that when the speed of the vehicles is low, the SVs stay within the range of the TV for a longer duration. Hence, these SVs qualify to take offloaded tasks with longer time limits, and the utility is higher at slower speeds.
⁴The overall delay is given by $\sum_{m=1}^{M}\mu_m$.
Fig. 2: Mean utility of SAC vs. traffic (5–50 vehicles/km) for vehicle speeds of 30, 50, and 70 km/hr.
Fig. 3: Mean delay vs. maximum tolerable delay (0.5, 1, 2, and 4 s) in completing high-priority tasks for SAC under different learning rates (0.0002, 0.0005, 0.001) when traffic = 45 vehicles/km.
2) Mean delay with respect to varying learning rates: Fig. 3 shows the mean delay of the proposed SAC algorithm in the successful computation of high-priority tasks for different learning rates (LR) under a high traffic condition of 45 vehicles per kilometer. The x-axis represents four different values of the maximum tolerable delay in seconds. It can be observed from the figure that the computation delay increases as the learning rate increases.
3) SAC vs. DDPG vs. greedy: Next, we compare the proposed SAC-based task offloading DRL algorithm with two baseline algorithms, namely the greedy and DDPG algorithms. In the greedy algorithm, the agent in the TV randomly selects an SV among those with the maximum remaining computation power from the set of available SVs within range, without considering any other parameters such as distance, delay, or speed. DDPG, on the other hand, is a DRL algorithm similar to SAC, with the important difference that it uses a deterministic policy while SAC uses a stochastic one. Their comparative performance varies across scenarios, whereby either can outperform the other. Nevertheless, SAC acts as a bridge between stochastic policy optimization and DDPG-style methods. Accordingly, we analyze the considered V2V network with respect to various network parameters to assess the performance and feasibility of the proposed SAC algorithm relative to the conventional DDPG-based one.
Fig. 4: SAC vs. DDPG vs. greedy with respect to mean utility over traffic densities of 5–50 vehicles/km.
Fig. 5: SAC vs. DDPG vs. greedy in terms of mean delay vs. maximum tolerable delay in completing the highest-priority tasks when traffic = 45 vehicles/km.
The learning rates of SAC and DDPG are set to $10^{-2}$.
Fig. 4 shows the mean utility of the proposed SAC algorithm along with DDPG and greedy for varying traffic conditions. It can be seen that, for a given vehicle speed, the utility of SAC is always higher than that of the greedy approach under any traffic condition. Further, with SAC the difference in utility between high and low traffic is minimal. On the contrary, DDPG performs better under high traffic conditions than under low traffic conditions. Hence, it can be concluded that the traffic condition does not particularly affect SAC, and it is possible that DDPG outperforms SAC in networks with very dense traffic.
Finally, Fig. 5 shows the mean delay of the three algorithms in successfully completing the computation of high-priority tasks under a high traffic condition of 45 vehicles per kilometer. Since the greedy algorithm only considers the remaining computation power and ignores factors such as distance, expected delay, and speed, its delay is somewhat unpredictable. With regard to DDPG and SAC, the mean delays of both algorithms are similar and increase as the maximum tolerable delay of the network increases. The similarity between DDPG and SAC is due to the high-intensity traffic (i.e., 45 vehicles/km) considered in this figure, which was also observed in Fig. 4.
VI. CONCLUSION
A DRL algorithm based on SAC was proposed for efficient task offloading and resource allocation in a VFC framework. In particular, we jointly considered the priority and size of tasks along with other network parameters that affect the connectivity among vehicles. Accordingly, we formulated a framework that specifies the workflow of the vehicular network for allocating the idle computation power of nearby vehicles to task vehicles. The proposed SAC algorithm works towards the optimal task offloading policy and maximizes the mean utility of the considered network.
ACKNOWLEDGMENT
This work was supported by the Ministry of Science and
Technology of Taiwan under Grants MOST 110-2224-E-110-
001 & MOST 109-2221-E-110-050-MY3.
REFERENCES
[1] H. Ji, O. Alfarraj, and A. Tolba, "Artificial intelligence-empowered edge of vehicles: Architecture, enabling technologies, and applications," IEEE Access, vol. 8, pp. 61020–61034, Mar. 2020.
[2] Y. Hui, Z. Su, T. H. Luan, and J. Cai, "Content in motion: An edge computing based relay scheme for content dissemination in urban vehicular networks," IEEE Trans. Intell. Transp. Syst., vol. 20, no. 8, pp. 3115–3128, Nov. 2018.
[3] R. Q. Hu et al., "Mobility-aware edge caching and computing in vehicle networks: A deep reinforcement learning," IEEE Trans. Veh. Technol., vol. 67, no. 11, pp. 10190–10203, Aug. 2018.
[4] J. Shi, J. Du, J. Wang, J. Wang, and J. Yuan, "Priority-aware task offloading in vehicular fog computing based on deep reinforcement learning," IEEE Trans. Veh. Technol., vol. 69, no. 12, pp. 16067–16081, Dec. 2020.
[5] C. Chen, L. Chen, L. Liu, S. He, X. Yuan, D. Lan, and Z. Chen, "Delay-optimized V2V-based computation offloading in urban vehicular edge computing and networks," IEEE Access, vol. 8, pp. 18863–18873, Jan. 2020.
[6] L. Pu, X. Chen, G. Mao, Q. Xie, and J. Xu, "Chimera: An energy-efficient and deadline-aware hybrid edge computing framework for vehicular crowdsensing applications," IEEE Internet Things J., vol. 6, no. 1, pp. 84–99, Feb. 2018.
[7] J. Feng, Z. Liu, C. Wu, and Y. Ji, "AVE: Autonomous vehicular edge computing framework with ACO-based scheduling," IEEE Trans. Veh. Technol., vol. 66, no. 12, pp. 10660–10675, Jun. 2017.
[8] Y. Zhang et al., "Research on adaptive beacon message broadcasting cycle based on vehicle driving stability," Int. J. Netw. Manag., vol. 31, no. 2, p. e2091, Mar. 2021.
[9] F. J. Ros, P. M. Ruiz, and I. Stojmenovic, "Acknowledgment-based broadcast protocol for reliable and efficient data dissemination in vehicular ad hoc networks," IEEE Trans. Mobile Comput., vol. 11, no. 1, pp. 33–46, Dec. 2010.
[10] W. L. Tan, W. C. Lau, O. Yue, and T. H. Hui, "Analytical models and performance evaluation of drive-thru internet systems," IEEE J. Sel. Areas Commun., vol. 29, no. 1, pp. 207–222, Dec. 2010.
[11] J. Zhao, Q. Li, Y. Gong, and K. Zhang, "Computation offloading and resource allocation for cloud assisted mobile edge computing in vehicular networks," IEEE Trans. Veh. Technol., vol. 68, no. 8, pp. 7944–7956, Jun. 2019.
[12] Z. Chen and X. Wang, "Decentralized computation offloading for multi-user mobile edge computing: A deep reinforcement learning approach," EURASIP J. Wirel. Commun. Netw., vol. 2020, no. 1, pp. 1–21, 2020.
[13] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor," in Proc. International Conference on Machine Learning (ICML). PMLR, Jul. 2018, pp. 1861–1870.