SAC-Based Resource Allocation for Computation
Offloading in IoV Networks
Bishmita Hazarika†, Keshav Singh†, Sudip Biswas‡, Shahid Mumtaz§, and Chih-Peng Li†
†Institute of Communications Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan
‡Department of ECE, Indian Institute of Information Technology Guwahati, India
§Instituto de Telecomunicações, P-3810-193 Aveiro, Portugal
Email: me.bishmita@gmail.com, {keshav.singh, cpli}@mail.nsysu.edu.tw, sudip.biswas@iiitg.ac.in, smumtaz@av.it.pt
Abstract—Due to the dynamic nature of a vehicular fog
computing environment, efficient real-time resource allocation in
an internet of vehicles (IoV) network without affecting the quality
of service of any of the on-board vehicles can be challenging. This
paper proposes a priority-sensitive task offloading and resource
allocation scheme in an IoV network, where vehicles periodically
exchange beacon messages to inquire about available services
and other important information necessary for making the
offloading decisions. In the proposed methodology, the vehicles
are incentivized to share their idle computation resources with the
task vehicles, whereby a deep reinforcement learning algorithm
based on soft actor-critic (SAC) is designed to classify the
tasks based on the priority and computation size of each task and
to optimally allocate the power. In particular, the SAC algorithm
works towards achieving the optimal policy for task offloading
by maximizing the mean utility of the considered network.
Extensive numerical results, along with a comparison with
baseline greedy and deep deterministic policy
gradient algorithms, are presented to validate the feasibility of the
proposed algorithm.
Index Terms—Internet of vehicles (IoV), deep reinforcement
learning (DRL), soft actor-critic (SAC), task offloading.
I. INTRODUCTION
IN recent years, the advent of autonomous driving
and fifth generation (5G) communications has led
to rapid development in the field of internet of
vehicles (IoV) in conjunction with artificial intelligence
(AI) [1]. Accommodating the interests of autonomous driving
involves heterogeneous tasks, computations, and wireless
communications within the vehicular network, where most
applications are mission-critical, demand intensive
computation, and are delay-sensitive. However, the on-board
resources of most consumer vehicles are limited
and cannot fulfill the service requirements of the vehicular
environment. As a result, the quality of service (QoS)
requirements of the vehicular network cannot be met.
To overcome the challenges of large delay, limited
computation, and job scheduling in IoV, task/computation
offloading in a mobile edge computing (MEC) environment
has been considered [2]–[5]. In such an environment when
a vehicle does not have sufficient resources to perform the
required computation, part of its computation or tasks are
offloaded to the MEC server or cloud. MEC servers mostly
refer to base stations (BSs) with some computation power that
are sparsely deployed. Although such infrastructure can fulfill
the demand to some extent, it becomes challenging when the
traffic within the range of a BS is high and multiple vehicles
have tasks that require offloading. As a consequence, the
communication delay increases, and latency in the execution
of mission-critical tasks leads to the failure of delay-sensitive
tasks. Furthermore, since vehicles move at certain speeds
while the BSs are stationary, the wireless link between
a vehicle and a BS remains intact only for a short
time.
To address the above issues of MEC, vehicular fog
computing (VFC) has been recently proposed, which has
proved to be a more adequate solution by enabling the
vehicles to share resources among themselves [5], [6]. In
this regard, to enhance the performance of task offloading
and resource allocation in a VFC framework, the authors
in [5] and [6] focused on the problem of minimizing the
delay in task allocation using algorithms such as particle
swarm optimization and Lyapunov optimization,
respectively. In a similar vein, the authors in [3] and [4]
proposed deep reinforcement learning (DRL)-based algorithms
for resource allocation in a VFC framework. Next, the
authors in [7] proposed a framework that uses beacon
messages within the vehicular network to obtain information
about the resources available in nearby vehicles. The use
of beacon messaging is a fast and simple method for
exchanging information which is a crucial component in a
VFC framework. Accordingly, the authors in [7]–[9] also used
the method of beacon messages to exchange information or
notify neighboring nodes about the node information in a
similar VFC framework. In spite of the many advantages of
VFC, several issues have limited the deployment of VFC in
real-time environments. In particular, the primary challenges in
a VFC environment are related to efficient resource allocation,
delay in mission-critical tasks, latency in communication, and
shorter link duration between the task vehicle and the service
provider.
To address the challenges plaguing the deployment of VFC,
in this paper we propose an IoV network with hybrid vehicle-
to-vehicle (V2V) and vehicle-to-MEC/cloud task offloading.
The key contributions are highlighted as follows:
•We design a VFC framework which uses beacon
messages for information exchange rather than
conventional BS to vehicle wireless link. The proposed
framework considers parked vehicles, moving vehicles
as well as pedestrians with computation resources that
act as service providers to compute the offloaded tasks.
Fig. 1: An illustration of the considered IoV network.
•The vehicles are incentivized to share their idle
computation resources with the task vehicles, whereby
a DRL algorithm based on soft actor-critic (SAC)
is designed using a Markov decision process (MDP) to
classify the tasks based on the priority and computation size
of each task for optimally allocating the power.
•Three utility functions are formulated based on three
priority levels of the tasks. The SAC algorithm works
towards achieving the optimal policy for task offloading
by maximizing the mean utility of the considered
network.
•Extensive numerical results along with a comparison
with other baseline algorithms, namely greedy and deep
deterministic policy gradient (DDPG) algorithms are
presented to validate the feasibility of the proposed
algorithm.
Organization: The rest of the paper is organized as follows.
Section II provides a detailed explanation of the considered
system model. Section III presents the task model and the
utility functions. In Section IV the proposed DRL algorithm is
discussed and the framework is formulated. Simulation results
are discussed in Section V and finally, conclusions are
drawn in Section VI.
II. SYSTEM MODEL
A. System Architecture
We consider an IoV network as illustrated in Fig. 1 involving
a multi-layer distributed VFC framework that consists of
the physical layer and the hierarchical Fog-Cloud layer1.
The framework supports two offloading modes: vehicle-to-
vehicle (V2V) and vehicle-to-pedestrian (V2P). The vehicles
that participate in task offloading are categorized into two
types: task vehicle (TV) and service vehicle (SV). A TV either
executes its tasks locally or offloads them to SVs or to the edge/cloud.
•V2V: In a V2V communication, both the TV and SV
can be on the move or the TV is on the move and
SV is in a parked state. Regardless of the state, all
SVs within the range of a TV are eligible to provide
computation resources to the TV. In the considered
model, the SV and TV can move in the same or opposite
directions. The significant factors in selecting an SV are
its relative distance from the TV, its direction, speed,
available resources, etc.
1In the Fog-Cloud layer, several BSs can communicate among themselves.
•V2P: Apart from vehicles, pedestrians on the road with
resources can also participate in task offloading as a
service provider since most of the pedestrians carry
devices with high computation power like mobile phones,
tablets, etc.
We assume that time in the network is divided into N
periods, with each period having multiple frames. A TV with
inadequate computation resources uses beacon messages in
every time slot to gather information about the nearby SVs
through a ping-ACK type message exchange. The status of the
system remains constant within one time frame but can change
over different time frames2.
•Beacon message: These are broadcast messages that
periodically acquire the position and other basic status
information of nearby vehicles, such as available
resources, expected delay, relative distance, speed, and
direction of the SV [7]-[9]. Using beacons reduces the delay
in the resource allocation process owing to their speed and
simplicity.
Accordingly, during time n, we consider the TV to have S
service providers within its communication range, where an SV
is either a vehicle or a pedestrian with available computation
resources. For the purpose of this study, we focus on one
TV and several SVs within its range. Whenever a TV has to
offload a task, it broadcasts beacon messages to all the SVs
within range, and all the eligible SVs with available idle
resources respond to the broadcast with ACK-type messages
containing the required information. The agent located at the
TV then determines
the most eligible SV for the task to be offloaded. Since
beacon messages are fast and involve fewer bits
per transmission, the transmission delay in the assignment
of an SV can be ignored. Let there be M tasks, represented
as φ_1, φ_2, ..., φ_m, ..., φ_M. Task φ_m is
characterized by its data size D_m, computation size (CPU
cycles) C_m, delay constraint µ_m, and task priority ρ_m. Now,
considering that the wireless channel between the TV and SV is
static during a task, the transmission rate between them is
given as

r_{st} = b_{st} \log_2(1 + \gamma_{st}).   (1)
Here, b_{st} denotes the allocated bandwidth and γ_{st} denotes the
signal-to-interference-plus-noise ratio (SINR), which is given as

\gamma_{st} = \frac{\omega_{trs} \lambda_{st}^{-\alpha} |h_{st}|^2}{\sum_{j \in S, j \neq s} \omega_j \lambda_{j,v}^{-\alpha} |h_{j,v}|^2 + N_0}.   (2)

Here, ω_{trs} is the transmit power, λ_{st} denotes the
relative distance between the TV and the SV, α is the path loss
exponent, h_{st} is the desired channel gain, N_0 is the additive
white Gaussian noise power, and \sum_{j \in S, j \neq s} \omega_j \lambda_{j,v}^{-\alpha} |h_{j,v}|^2 denotes the
aggregate interference.
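As a concrete illustration of (1) and (2), the rate computation can be sketched as follows. All numerical values below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math

def sinr(p_tx, dist, alpha, h_gain, interferers, n0):
    """SINR per (2): desired received power over aggregate interference plus noise.

    interferers: list of (power, distance, channel_gain) tuples, one per j != s.
    """
    desired = p_tx * dist ** (-alpha) * abs(h_gain) ** 2
    interference = sum(p * d ** (-alpha) * abs(h) ** 2 for p, d, h in interferers)
    return desired / (interference + n0)

def tx_rate(bandwidth_hz, gamma):
    """Transmission rate per (1): r_st = b_st * log2(1 + gamma_st)."""
    return bandwidth_hz * math.log2(1.0 + gamma)

# Hypothetical link: 1 MHz bandwidth, 0.1 W transmit power, 100 m TV-SV
# distance, path-loss exponent 3, one interferer at 300 m, noise power 1e-13 W.
gamma = sinr(p_tx=0.1, dist=100.0, alpha=3.0, h_gain=1.0,
             interferers=[(0.1, 300.0, 1.0)], n0=1e-13)
rate = tx_rate(1e6, gamma)
```

With one interferer three times farther away, the SINR is near 27 and the rate is close to 4.8 Mbps, consistent with the log-scaling of (1).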
2The estimated time period for the two vehicles to remain in contact, along
with the available resources, channel state, and expected delay, can change over
different time slots. Further, the duration that an SV remains within range of
a TV depends on the velocities of both vehicles, which can be approximated by a
Gaussian distribution [2], [10].
III. TASK MODEL AND UTILITY
A. Task Model
In general, the tasks to be offloaded are categorized based
on two factors: priority and size.
•Priority of task: The task priority can be classified into
high-priority tasks (P_H), general tasks (P_G), and low-
priority tasks (P_L) based on the delay constraint. P_H tasks
are critical tasks with a very short tolerable delay, such
as security-related tasks, navigation, traffic, and road analysis.
A P_H task must be finished in time; otherwise, the
task loses its value and fails. P_G tasks have a longer
tolerable delay and the requirement of immediate action
is relatively lower than that of P_H, while P_L tasks can
tolerate very high delays. Such tasks do not lose their
value even if they take a long time to execute.
•Size of Task: The computational tasks can be
classified into small (S_{S_m}), general (S_G), and large (S_L)
tasks, where small tasks involve the fewest
bits and large tasks involve the most bits.
However, if a task is too large, has high
priority, and must be executed within a
deadline, the TV can split the task
into multiple sub-tasks and offload them to different SVs
to be executed in parallel.
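The sub-task splitting described above can be sketched as a simple near-equal partition. This is an illustrative helper of our own; the paper does not specify a concrete splitting rule.

```python
import math

def split_task(data_bits, max_subtask_bits):
    """Split a large task into near-equal sub-tasks, each no larger than
    max_subtask_bits, for parallel offloading to different SVs."""
    n = max(1, math.ceil(data_bits / max_subtask_bits))
    base, extra = divmod(data_bits, n)
    # The first `extra` sub-tasks get one extra unit so the sizes sum exactly.
    return [base + (1 if i < extra else 0) for i in range(n)]
```

For example, a 10-unit task with a 4-unit cap splits into three sub-tasks of sizes 4, 3, and 3.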
There are three ways a task can be executed: locally, via V2V
offloading, or via edge/cloud offloading. In this work, we do
not treat the edge and cloud as separate layers; hence,
they are considered as one. Nevertheless, the proposed model
can easily be extended to a hybrid edge and cloud model
encompassing the V2V communication layer. P_H tasks are
extremely delay-sensitive and hence must be processed
locally. Further, since P_H tasks mostly involve first-hand
computation in a vehicle, such tasks are mostly
small in size but need continuous computation, for example,
continuously sensing the road when the TV is on the move
or continuous navigation. Hence, P_H tasks of size S_{S_m} are
executed by the local processor. On the contrary, P_L tasks
are delay-tolerant and can be executed over a longer time
span. Therefore, P_L and S_L tasks can be offloaded directly
to the edge/cloud, even if resources are available nearby,
to preserve the vehicular resources for future tasks of the same
or other TVs. In case there are no eligible SVs nearby, the TV
offloads the computation task to the edge/cloud.
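The execution-path rules above can be condensed into a small dispatch sketch. The encoding below is our own illustration of the stated rules, not the authors' implementation.

```python
def execution_path(priority, size, eligible_svs):
    """Choose where a task runs, following the rules of Sec. III-A:
    - small high-priority (P_H) tasks run locally;
    - low-priority (P_L) or large (S_L) tasks go straight to the edge/cloud;
    - everything else is offloaded V2V if an eligible SV exists,
      falling back to the edge/cloud otherwise."""
    if priority == "PH" and size == "small":
        return "local"
    if priority == "PL" or size == "large":
        return "edge/cloud"
    return "v2v" if eligible_svs else "edge/cloud"
```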
B. Utility
For calculating the utility we assume that the P_H tasks have
a strict maximum tolerable delay, or deadline, beyond which
the tasks lose their value. Whenever a task's execution time
exceeds the tolerable limit, the utility becomes negative
since the task offloading has failed. The utility function is
mathematically defined as [11]
U_n^{P_H} = \begin{cases} \log(1 + \mu_m - n_m)/\eta^{S_0}_{S_m}, & n_m \le \mu_m, \\ -\zeta_{P_H}, & n_m > \mu_m, \end{cases}   (3)

where n_m denotes the time taken to complete task φ_m,
−ζ_{P_H} is the negative value assigned due to failure in executing
the task within the time limit, and η^{S_0}_{S_m} denotes the number
of local small-sized high-priority tasks. Next, the utility of P_G
tasks is given as
U_n^{P_G} = \begin{cases} \zeta_{P_G}/\eta_{S_{S_x}}, & n_m \le \mu_m, \\ \zeta_{P_G} e^{-c(n_m - \mu_m)}, & n_m > \mu_m. \end{cases}   (4)

Here, η_{S_{S_x}} denotes the total number of small
tasks of priority class P_G, S_G-sized tasks, and
sub-tasks from large tasks (divided for offloading for parallel
or simultaneous execution) or S_L tasks of priority class P_G.
If the task is executed within the maximum tolerable delay,
a positive value is assigned to the utility, whereas if the task
execution time exceeds the limit, unlike P_H, the task does
not completely lose its value or fail; rather, its utility
decreases exponentially with time. In (4), ζ_{P_G} is the positive
utility constant. Furthermore, U_n^{P_L} calculates the utility of the P_L
and S_L tasks that are directly offloaded to the MEC server
or cloud, and is given as

U_n^{P_L} = \begin{cases} \zeta_{P_L}/\eta^{S_0}_{L}, & n_m \le \mu_m, \\ 0, & n_m > \mu_m. \end{cases}   (5)

Here, η^{S_0}_{L} denotes the number of large tasks that are eligible to
be directly offloaded to the cloud. Although P_L tasks are delay-
tolerant, these tasks might still fail due to other technical issues
such as communication interruption, leading to n_m = ∞. In
such cases, the utility is calculated as zero. Let χ_m denote the
energy consumed in the SV for task φ_m, f_m be the computation
frequency of the SV, ι_m denote the time required to compute
task φ_m, and C_m denote the computational size of the task.
For frequency f_m in an SV, the amount of energy consumed
is proportional to the amount of time required for the task
to be completed, i.e., for task φ_m, C_m = f_m ι_m. Hence, the energy
consumed is proportional to the computational size. Then, the
overall utility of the TV can be calculated as
U_n = \mathbb{1}(\rho_{P_H}) U_n^{P_H} + \mathbb{1}(\rho_{P_G}) U_n^{P_G} + \mathbb{1}(\rho_{P_L}) U_n^{P_L} - \chi_m C_m,   (6)

where ρ_{P_H}, ρ_{P_G}, and ρ_{P_L} are constants that represent the
priority levels of high-priority tasks, general tasks, and low-
priority tasks, respectively, and \mathbb{1}(\cdot) is the indicator function.
In a vehicular network, each SV can also act as a TV,
such that an SV may have its own local P_H tasks along with
tasks offloaded from other vehicles. Since each vehicle has
limited computation capability, the SV needs to ensure that it
has allocated enough resources for its local P_H tasks so that
they are executed within the time limit. Let a service vehicle
S_s carry L local P_H tasks, and let the computation size of the
tasks and the maximum tolerable delay be C_s and µ_s, respectively.
Then, the minimum frequency required for the local tasks and the
total frequency if all the resources are solely reserved for local
tasks are respectively given as
F^{\min}_{S_s} = \sum_{l=1}^{L} \frac{C_s}{\mu_s}, \quad \text{and} \quad F_{S_s} = \sum_{l=1}^{L} \frac{C_s}{\phi_{S_s}\mu_s},   (7)

where φ_{S_s} = F^{\min}_{S_s}/F_{S_s}. Now, from equation (6), the total utility
of computing the local tasks is given by

U_{local}(\phi) = \sum_{s=1}^{S_s} \log(1 + \mu_s - \phi\mu_s), \quad \phi \in [\phi_{S_s}, 1].   (8)
When the SV executes an offloaded task along with a local
task, it allocates part of its frequency to the offloaded task.
Hence, the utility of the local task changes such that the
product of the energy consumed and the computation size of the
offloaded task equals the difference between the utility of the local
task alone and the utility when both local and offloaded tasks are processed.
This is given as

\chi_m C_m = U_{local}(\phi_s) - U_{local}(\phi'_s).   (9)
Similarly, if any new task arrives while the allocated task is being
executed, the utility is again updated to U_{local}(\phi''_s), where
φ''_s depends on the energy consumed during execution of
the tasks, which is proportional to the computation size of the
tasks.
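The minimum-frequency requirement in (7) and the local utility (8) can be sketched as follows. This is a minimal illustration in which each local task is represented as a (C_s, µ_s) pair.

```python
import math

def f_min(local_tasks):
    """Left part of (7): minimum total frequency for the local P_H tasks,
    i.e. the sum of C_s / mu_s over all L tasks."""
    return sum(c / mu for c, mu in local_tasks)

def u_local(phi, local_tasks):
    """Eq. (8): total utility of the local tasks when a fraction phi of the
    SV's frequency is reserved for them."""
    return sum(math.log(1 + mu - phi * mu) for _, mu in local_tasks)
```

At φ = 1 every log term becomes log(1) = 0, i.e. devoting all frequency to local tasks leaves no slack utility; smaller φ trades local utility for offloaded work, as in (9).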
IV. SOFT ACTOR-CRITIC (SAC) BASED TASK OFFLOADING
Since the parameters of the V2V network in the considered
framework can change over different time slots, the current
state of the network is known but the future states are
unknown and likely to differ from the current state. To
address this change in states, the problem is modeled as an MDP3
with the objective of maximizing the utility of task offloading,
and is solved using a model-free DRL algorithm that employs
the SAC framework to evaluate and improve the task offloading
policy. The details of the proposed task offloading scheme are
given below.
A. Preliminaries on RL
•State space: Each TV contains an agent which takes
the decision of offloading a task on the basis of the
vehicular network's information at time n, such as the
SINR of the communication channel, availability of SVs,
remaining resources of S_s, utility of locally executed
tasks (U_{local}(\phi)), data size, CPU cycles, maximum
tolerable delay, etc.
•Action space: The agent in the TV analyzes the
environment based on the state space and determines the
eligible SV to offload the task.
•Reward: Let Ω denote a binary variable whose value
is 1 if a task is successfully offloaded and 0 otherwise. In the
considered framework, we have N time slots. Hence, the
mean reward R can be calculated as

R = \frac{1}{N} \sum_{n=0}^{N-1} \sum_{s=1}^{S} \Omega^{S_s}_n U^{S_s}_n.   (10)
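Given per-slot, per-SV success flags and utilities, the mean reward (10) can be computed as in this sketch; the nested-list data layout is our own choice.

```python
def mean_reward(omega, utility):
    """Eq. (10): R = (1/N) * sum_n sum_s Omega_n^{S_s} * U_n^{S_s}.

    omega[n][s] is 1 if the task in slot n was successfully offloaded to SV s
    (0 otherwise); utility[n][s] is the corresponding utility U_n^{S_s}.
    """
    n_slots = len(omega)
    return sum(o * u
               for row_o, row_u in zip(omega, utility)
               for o, u in zip(row_o, row_u)) / n_slots
```

For instance, with two slots and two SVs where one offload fails, only the successful entries contribute to the average.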
B. SAC algorithm
An important feature of SAC is that it regulates the entropy of
the policy. The policy in SAC is trained such that a trade-off
between the expected return and the entropy is maximized.
Hence, the SAC algorithm is based on maximum-entropy RL,
which aims to maximize both the mean reward and the entropy to
find the optimal policy π. Accordingly, the relationship
between the soft action-value and state-value functions at
state s, action a, and time n for the considered V2V network
is given by (11) and (12) as per the Bellman equation:

Q^\pi(s, a) = R(s_n, a_n) + \gamma\,\mathbb{E}_{s_{n+1} \sim p}[V(s_{n+1})],   (11)

V(s_n) = \mathbb{E}_{a_n \sim \pi}[Q(s_n, a_n) - \log \pi(a_n|s_n)],   (12)

3The details of the MDP modelling are omitted due to space constraints.
However, the reader can refer to [12] for details on MDPs.
where p denotes the trajectory distribution induced by π. We
consider Q^\pi(s, a) = Q_\theta(s, a) in the DNN, where θ denotes
the network parameters. The Q-function parameters are not
constant, and the actor and critic are further updated according
to the actions and immediate rewards stored in the replay buffer.
Thus, the parameters can be trained by minimizing the loss function

J_Q(\theta) = \mathbb{E}_{(s_n, a_n) \sim rb}\left[\frac{1}{2}\left(Q_\theta(s_n, a_n) - Q'_{\theta'}(s_n, a_n)\right)^2\right],   (13)

where Q_\theta(s_n, a_n) denotes the soft Q-value, rb denotes the
replay buffer [13], and

Q'_{\theta'}(s_n, a_n) = R(s_n, a_n) + \gamma\,\mathbb{E}_{s_{n+1} \sim p}[V_{\theta'}(s_{n+1})].   (14)

In order to stabilize the iterations of the action-value function,
equation (14) defines the target action-value function, where θ'
is obtained as an exponential moving average of θ.
The performance of the DRL algorithm depends on the
policy. Hence, if the policy is optimal, the offloaded tasks will
be completed within time and the utility will be high. On the
contrary, if the policy is not optimal, task computation failures
or deadline violations will be common, which will
decrease the utility. Therefore, in order to improve the policy,
it is updated in terms of the Kullback-Leibler divergence as in
equation (15) [13], where Π denotes a set of policies which
corresponds to Gaussian distribution parameters and the function Z^\pi
is used to normalize the distribution without affecting the new
policy:

\pi_n = \arg\min_{\pi' \in \Pi} D_{KL}\left(\pi'(\cdot|s_n) \,\Big\|\, \frac{\exp\left((1/\Lambda) Q^\pi(s_n, \cdot)\right)}{Z^\pi(s_n)}\right).   (15)

The Kullback-Leibler divergence can be further minimized
by updating the policy parameters as

J_\pi(\phi) = \mathbb{E}_{s_n \sim rb}\left[\mathbb{E}_{a_n \sim \pi_\phi}\left[\Lambda \log(\pi_\phi(a_n|s_n)) - Q_\theta(s_n, a_n)\right]\right].   (16)
The policy iteration continues until it reaches
the optimal policy and converges with the maximum
entropy. The detailed SAC-based task offloading algorithm is
illustrated in Algorithm 1.
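For intuition, the soft targets in (11)-(14) can be written out for a discrete action set without any deep-learning framework. This is a simplified sketch of our own; the paper uses DNN approximators for Q_θ and π_φ.

```python
import math

def soft_state_value(q_values, probs):
    """Eq. (12): V(s) = E_{a~pi}[Q(s, a) - log pi(a|s)] for a discrete policy."""
    return sum(p * (q - math.log(p)) for q, p in zip(q_values, probs) if p > 0)

def soft_q_target(reward, gamma, v_next):
    """Eqs. (11)/(14): target Q'(s, a) = R(s, a) + gamma * E[V(s')]."""
    return reward + gamma * v_next

def critic_loss(q_pred, q_target):
    """Eq. (13): one-sample soft Bellman residual, (1/2)(Q - Q')^2."""
    return 0.5 * (q_pred - q_target) ** 2
```

The entropy bonus is visible in soft_state_value: a uniform two-action policy with equal Q-values of 1 yields V = 1 + log 2, exceeding the plain expected Q-value of 1.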
V. NUMERICAL RESULTS
In this section, we present the simulation results for the
considered IoV network. In particular, we consider one TV
and multiple SVs within the VFC framework where the SVs
include parked vehicles, moving vehicles and pedestrians.
Every moving SV can also have its own local tasks; hence,
the agent in every vehicle ensures that its local or high-
priority tasks are executed on time. The mean utility and mean
delay are used as the scores to measure the performance of
the network. While the mean utility is defined as U_n divided
by the total number of local and offloaded tasks and sub-tasks,
Algorithm 1 SAC-based task offloading
1: Initialize: Q_{θ_1}(s, a), Q_{θ_2}(s, a), and targets Q_{θ'_1}, Q_{θ'_2} with weights θ'_1 = θ_1 and θ'_2 = θ_2
2: Initialize: policy π_φ(a|s) with weights φ
3: for each iteration do
4:   Retrieve current state s_0
5:   for n = 0, 1, 2, ..., (N−1) do
6:     Examine the properties of the offloading task at the TV
7:     Broadcast beacon message
8:     Collect information from the environment
9:     Estimate s_n
10:    Action a_n determines the SV and informs the agent in the TV
11:    Compute the reward using (10) and estimate state s_{n+1}
12:    Save tuple (s_n, a_n, R_n, s_{n+1}) in the replay memory
13:    Update J_Q(θ) using equation (13)
14:    Update φ using equation (16)
15:    Update the soft action-value target parameters θ'
16:  end for
17: end for
TABLE I: Vehicular network and DRL parameters
Maximum relative speed (ν): ±55 km/hr
Maximum relative distance between two vehicles: 400 m
Maximum tolerable delay of local tasks (τ_n) (4 types): [0.5, 1, 2, 4] seconds
Computation capability of vehicle (3 types): [3-10] GHz
Data size of task (3 categories) (D_m): [0.05, 0.9, 1.5] MB
Computation size of task (3 categories) (C_m): [0.1, 0.5, 1.0]
Task priority constants: P_H = 0.5, P_G = 1, P_L = 2
Maximum number of tasks: 50
Maximum local tasks: 5
Hidden layers: 2
Actor hidden layer units: [1000, 1000]
Critic hidden layer units: [600, 600]
Batch size (β): 250
Epsilon decay rate denominator: 150
Buffer size: 99999
Activation function: Softmax
Training episodes: 2000
the mean delay is defined as the total overall delay4 divided
by the number of tasks and sub-tasks. Unless otherwise stated,
the simulation parameters are as shown in Table I. Any
other parameters used are explicitly mentioned where relevant.
1) Mean utility with respect to varying vehicle speed: In
the vehicular network, the velocity of a vehicle is considered
to be uniform within each time slot. Fig. 2 shows the mean utility
for varying vehicle speeds. The learning rate of the
algorithm is set to 10^{-2}. The speeds are set to 30 km/hr, 50
km/hr, and 70 km/hr over traffic conditions varying
from 5 to 50 vehicles/km. It can be observed from the figure
that when the vehicle speeds are low, the SVs stay
within the range of the TV for a longer duration. Hence, these
SVs qualify to take offloaded tasks with longer time limits,
and the utility is higher at slower speeds.
4The overall delay is given by \sum_{m=1}^{M} \mu_m.
Fig. 2: Mean utility of SAC vs traffic for different vehicle speeds.
Fig. 3: Mean delay vs maximum tolerable delay in completing high-priority
tasks for SAC for different LR when traffic = 45 vehicles/km.
2) Mean delay with respect to varying learning rates:
Fig. 3 shows the mean delay of the proposed SAC algorithm
in the successful computation of high-priority tasks for
different learning rates (LR) under the high-traffic condition of 45
vehicles per kilometer. The x-axis represents four different
cases of maximum tolerable delay in seconds. From the figure
it can be observed that the computation delay increases as
the learning rate increases.
3) SAC vs DDPG vs Greedy: Next, we compare the
proposed SAC-based task offloading DRL algorithm with
two baseline algorithms, namely the greedy and DDPG
algorithms. In the greedy algorithm, the agent in the TV
randomly selects an SV from among the in-range SVs with
the maximum remaining computation power, without
considering any other parameters such as distance, delay,
or speed. DDPG, on the other hand, is a DRL algorithm similar
to SAC, with the important difference that it uses a deterministic
policy while SAC uses a stochastic one. Under different scenarios
the comparative performance of DDPG and SAC varies,
whereby each can outperform the other. Nevertheless, SAC
acts as a bridge between stochastic policy optimization and
DDPG-style methods. Accordingly, we analyze the considered
Fig. 4: SAC vs DDPG vs Greedy with respect to mean utility.
Fig. 5: SAC vs DDPG vs Greedy in terms of mean delay vs maximum
tolerable delay in completing the highest-priority tasks when traffic = 45
vehicles/km.
V2V network with respect to various network parameters to
check the performance and feasibility of the proposed SAC
algorithm against the conventional DDPG-based one.
The learning rates of SAC and DDPG are set to 10^{-2}.
Fig. 4 shows the mean utility of the proposed SAC algorithm
along with DDPG and greedy for varying traffic conditions.
It can be seen that for a given vehicle speed, the utility
of SAC is always higher than that of the greedy approach for
any traffic condition. Further, in SAC the difference in
utility between high and low traffic is minimal. On the contrary,
DDPG performs better in high-traffic conditions than in low-traffic
conditions. Hence, it can be concluded that the traffic condition
does not particularly affect SAC, and it is possible that
DDPG outperforms SAC in networks with very dense traffic.
Finally, Fig. 5 shows the mean delay of the three algorithms
in successfully completing the computation of high-priority
tasks under the high-traffic condition of 45 vehicles per kilometer.
Since the greedy algorithm only considers the remaining
computation power and ignores factors such as distance, expected
delay, and speed, its delay is somewhat unpredictable. With
regard to DDPG and SAC, the mean delay for both
algorithms is similar and increases as the
maximum tolerable delay in the network increases. The
similarity between DDPG and SAC is due to the
consideration of higher-intensity traffic (i.e., 45 vehicles/km)
in this figure, which was also validated in Fig. 4.
VI. CONCLUSION
A DRL algorithm based on SAC was proposed for efficient
task and resource allocation in a VFC framework. In particular,
we jointly considered the priority and size of tasks along
with other network parameters that affect the connectivity
among vehicles. Accordingly, we formulated a framework that
specifies the workflow of the vehicular network to allocate
the idle computation power of nearby vehicles to certain
task vehicles. The proposed SAC algorithm works towards
achieving the optimal policy for task offloading and maximizes
the mean utility of the considered network.
ACKNOWLEDGMENT
This work was supported by the Ministry of Science and
Technology of Taiwan under Grants MOST 110-2224-E-110-
001 & MOST 109-2221-E-110-050-MY3.
REFERENCES
[1] H. Ji, O. Alfarraj, and A. Tolba, “Artificial intelligence-empowered edge
of vehicles: Architecture, enabling technologies, and applications,” IEEE
Access, vol. 8, pp. 61 020–61 034, Mar. 2020.
[2] Y. Hui, Z. Su, T. H. Luan, and J. Cai, “Content in motion: An
edge computing based relay scheme for content dissemination in urban
vehicular networks,” IEEE Trans. Intell. Transp. Syst., vol. 20, no. 8,
pp. 3115–3128, Nov. 2018.
[3] R. Q. Hu et al., “Mobility-aware edge caching and computing in vehicle
networks: A deep reinforcement learning,” IEEE Trans. Veh. Technol.,
vol. 67, no. 11, pp. 10 190–10 203, Aug. 2018.
[4] J. Shi, J. Du, J. Wang, J. Wang, and J. Yuan, “Priority-aware task
offloading in vehicular fog computing based on deep reinforcement
learning,” IEEE Trans. Veh. Technol., vol. 69, no. 12, pp. 16 067–16 081,
Dec. 2020.
[5] C. Chen, L. Chen, L. Liu, S. He, X. Yuan, D. Lan, and Z. Chen, “Delay-
optimized V2V-based computation offloading in urban vehicular edge
computing and networks,” IEEE Access, vol. 8, pp. 18 863–18 873, Jan.
2020.
[6] L. Pu, X. Chen, G. Mao, Q. Xie, and J. Xu, “Chimera: An energy-
efficient and deadline-aware hybrid edge computing framework for
vehicular crowdsensing applications,” IEEE Internet Things J., vol. 6,
no. 1, pp. 84–99, Feb. 2018.
[7] J. Feng, Z. Liu, C. Wu, and Y. Ji, “AVE: Autonomous vehicular edge
computing framework with ACO-based scheduling,” IEEE Trans. Veh.
Technol., vol. 66, no. 12, pp. 10 660–10 675, Jun. 2017.
[8] Y. Zhang et al., “Research on adaptive beacon message broadcasting
cycle based on vehicle driving stability,” Int. J. Netw. Manag., vol. 31,
no. 2, p. e2091, Mar. 2021.
[9] F. J. Ros, P. M. Ruiz, and I. Stojmenovic, “Acknowledgment-based
broadcast protocol for reliable and efficient data dissemination in
vehicular Ad Hoc networks,” IEEE Trans. Mobile Comput., vol. 11,
no. 1, pp. 33–46, Dec. 2010.
[10] W. L. Tan, W. C. Lau, O. Yue, and T. H. Hui, “Analytical models
and performance evaluation of drive-thru internet systems,” IEEE J. Sel.
Areas Commun., vol. 29, no. 1, pp. 207–222, Dec. 2010.
[11] J. Zhao, Q. Li, Y. Gong, and K. Zhang, “Computation offloading
and resource allocation for cloud assisted mobile edge computing in
vehicular networks,” IEEE Trans. Veh. Technol., vol. 68, no. 8, pp. 7944–
7956, Jun. 2019.
[12] Z. Chen and X. Wang, “Decentralized computation offloading for multi-
user mobile edge computing: A deep reinforcement learning approach,”
EURASIP J. Wirel. Commun. Netw., vol. 2020, no. 1, pp. 1–21, 2020.
[13] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-
policy maximum entropy deep reinforcement learning with a stochastic
actor,” in Proc. International conference on machine learning. PMLR,
Jul. 2018, pp. 1861–1870.