SAC-Based Resource Allocation for Computation
Offloading in IoV Networks
Bishmita Hazarika†, Keshav Singh†, Sudip Biswas‡, Shahid Mumtaz§, and Chih-Peng Li†
†Institute of Communications Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan
‡Department of ECE, Indian Institute of Information Technology Guwahati, India
§Instituto de Telecomunicações, P-3810-193 Aveiro, Portugal
Email: me.bishmita@gmail.com, {keshav.singh, cpli}@mail.nsysu.edu.tw, sudip.biswas@iiitg.ac.in, smumtaz@av.it.pt
Abstract—Due to the dynamic nature of a vehicular fog
computing environment, efficient real-time resource allocation in
an internet of vehicles (IoV) network without affecting the quality
of service of any of the on-board vehicles can be challenging. This
paper proposes a priority-sensitive task offloading and resource
allocation scheme in an IoV network, where vehicles periodically
exchange beacon messages to inquire about available services
and other important information necessary for making the
offloading decisions. In the proposed methodology, the vehicles
are incentivized to share their idle computation resources with the
task vehicles, whereby a deep reinforcement learning algorithm
based on soft actor-critic (SAC) is designed to classify the
tasks based on the priority and computation size of each task and
to optimally allocate the power. In particular, the SAC algorithm
works towards achieving the optimal policy for task offloading
by maximizing the mean utility of the considered network.
Extensive numerical results, along with a comparison with
baseline greedy and deep deterministic policy
gradient algorithms, are presented to validate the feasibility of the
proposed algorithm.
Index Terms—Internet of vehicles (IoV), deep reinforcement
learning (DRL), soft actor-critic (SAC), task offloading.
I. INTRODUCTION
IN recent years, the advent of autonomous driving
and fifth generation (5G) communications has led
to rapid development in the field of internet of
vehicles (IoV) in conjunction with artificial intelligence
(AI) [1]. Accommodating the interests of autonomous driving
involves heterogeneous tasks, computations, and wireless
communications within the vehicular network, where most
applications are mission-critical, demand intensive
computation, and are delay-sensitive. However, the on-board
resources of most consumer vehicles are limited
and cannot fulfill the service requirements of the vehicular
environment. As a result, the quality of service (QoS)
requirements of the vehicular network cannot be met.
To overcome the challenges of large delay, limited
computation, and job scheduling in IoV, task/computation
offloading in a mobile edge computing (MEC) environment
has been considered [2]–[5]. In such an environment when
a vehicle does not have sufficient resources to perform the
required computation, part of its computation or tasks are
offloaded to the MEC server or cloud. MEC servers mostly
refer to base stations (BSs) with some computation power that
are sparsely deployed. Although such infrastructure can fulfill
the demand to some extent, it becomes challenging when the
traffic within the range of a BS is high and multiple vehicles
have tasks that require offloading. As a consequence, the
communication delay increases, and latency in the execution
of mission-critical tasks leads to the failure of delay-sensitive
tasks. Furthermore, since vehicles move at certain speeds
while the BSs are stationary, the wireless link between
a vehicle and a BS remains intact only for a short
time.
To address the above issues of MEC, vehicular fog
computing (VFC) has been recently proposed, which has
proved to be a more adequate solution by enabling the
vehicles to share resources among themselves [5], [6]. In
this regard, to enhance the performance of task offloading
and resource allocation in a VFC framework, the authors
in [5] and [6] focused on the problem of minimizing the
delay in task allocation using algorithms such as particle
swarm optimization and Lyapunov optimization,
respectively. In a similar vein, the authors in [3] and [4]
proposed deep reinforcement learning (DRL)-based algorithms
for resource allocation in a VFC framework. Next, the
authors in [7] proposed a framework that uses beacon
messages within the vehicular network to obtain information
about the resources available in nearby vehicles. The use
of beacon messaging is a fast and simple method for
exchanging information which is a crucial component in a
VFC framework. Accordingly, the authors in [7]–[9] also used
the method of beacon messages to exchange information or
notify neighboring nodes about the node information in a
similar VFC framework. In spite of the many advantages of
VFC, several issues have limited the deployment of VFC in
real-time environments. In particular, the primary challenges in
a VFC environment are related to efficient resource allocation,
delay in mission-critical tasks, latency in communication, and
shorter link duration between the task vehicle and the service
provider.
To address the challenges plaguing the deployment of VFC,
in this paper we propose an IoV network with hybrid vehicle-
to-vehicle (V2V) and vehicle-to-MEC/cloud task offloading.
The key contributions are highlighted as follows:
•We design a VFC framework which uses beacon
messages for information exchange rather than
conventional BS to vehicle wireless link. The proposed
framework considers parked vehicles, moving vehicles
as well as pedestrians with computation resources that
act as service providers to compute the offloaded tasks.
Fig. 1: An illustration of the considered IoV network.
•The vehicles are incentivized to share their idle
computation resources with the task vehicles, whereby
a DRL algorithm based on soft actor-critic (SAC)
is designed using a Markov decision process (MDP) to
classify the tasks based on the priority and computation size
of each task for optimally allocating the power.
•Three utility functions are formulated based on three
priority levels of the tasks. The SAC algorithm works
towards achieving the optimal policy for task offloading
by maximizing the mean utility of the considered
network.
•Extensive numerical results along with a comparison
with other baseline algorithms, namely greedy and deep
deterministic policy gradient (DDPG) algorithms are
presented to validate the feasibility of the proposed
algorithm.
Organization: The rest of the paper is organized as follows.
Section II provides a detailed explanation of the considered
system model. Section III presents the task model and the
utility functions. In Section IV the proposed DRL algorithm is
discussed and the framework is formulated. Simulation results
are discussed in Section V and finally, conclusions are
drawn in Section VI.
II. SYSTEM MODEL
A. System Architecture
We consider an IoV network as illustrated in Fig. 1 involving
a multi-layer distributed VFC framework that consists of
the physical layer and the hierarchical Fog-Cloud layer1.
The framework supports two offloading modes: vehicle-to-
vehicle (V2V) and vehicle-to-pedestrian (V2P). The vehicles
that participate in task offloading are categorized into two
types: task vehicle (TV) and service vehicle (SV). A TV either
executes its tasks locally or offloads them to SVs or to the edge/cloud.
•V2V: In a V2V communication, both the TV and SV
can be on the move or the TV is on the move and
SV is in a parked state. Regardless of the state, all
SVs within the range of a TV are eligible to provide
computation resources to the TV. In the considered
model, the SV and TV can move in the same or opposite
directions. The significant factors in selecting an SV are
its relative distance from the TV, its direction, speed,
available resources, etc.
1In the Fog-Cloud layer, several BSs can communicate among themselves.
•V2P: Apart from vehicles, pedestrians on the road with
resources can also participate in task offloading as a
service provider since most of the pedestrians carry
devices with high computation power like mobile phones,
tablets, etc.
We assume that time in the network is divided into N
periods, with each period having multiple frames. A TV with
inadequate computation resources uses beacon messages in
every time slot to gather information about the nearby SVs
through a ping-ACK type message exchange. The status of the
system remains constant within one time frame but can change
over different time frames2.
•Beacon message: These are broadcast messages that
periodically acquire the position and other basic status
information of nearby vehicles, such as available
resources, expected delay, relative distance, speed, and
direction of the SV [7]-[9]. Using beacons reduces the delay
in the resource allocation process owing to their speed and
simplicity.
Accordingly, during time n, we consider the TV to have S
service providers within its communication range, where an SV
is either a vehicle or a pedestrian with available computation
resources. For the purpose of this study, we focus on one
TV and several SVs within its range. Whenever a TV has to
offload a task, it broadcasts beacon messages to all the SVs
within range, and all the eligible SVs with available idle
resources respond to the broadcast with ACK-type messages
containing the required information. The agent located at the
TV then determines
the most eligible SV for the task to be offloaded. Since
beacon messages are fast and involve fewer bits
per transmission, the transmission delay in the assignment
of an SV can be ignored. Let there be M tasks, represented
as φ_1, φ_2, ..., φ_m, ..., φ_M. Task φ_m is
characterized by its data size D_m, computation size (CPU
cycles) C_m, delay constraint µ_m, and task priority ρ_m. Now,
considering that the wireless channel between the TV and SV is
static during a task, the transmission rate between them is
given as

r_{st} = b_{st} \log_2(1 + \gamma_{st}).   (1)
Here, b_{st} denotes the allocated bandwidth and γ_{st} denotes the
signal-to-interference-plus-noise ratio (SINR), which is given as

\gamma_{st} = \frac{\omega_{trs} \lambda_{st}^{-\alpha} |h_{st}|^2}{\sum_{j \in S, j \neq s} \omega_j \lambda_{j,v}^{-\alpha} |h_{j,v}|^2 + N_0}.   (2)

Here, ω_{trs} is the transmit power, λ_{st} denotes the
relative distance between the TV and the SV, α is the path loss
exponent, h_{st} is the desired channel gain, N_0 is the additive
white Gaussian noise power, and \sum_{j \in S, j \neq s} \omega_j \lambda_{j,v}^{-\alpha} |h_{j,v}|^2 denotes the
aggregate interference.
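As a concrete illustration of (1) and (2), the rate computation can be sketched as follows. All numerical values below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math

def sinr(p_tx, dist, alpha, h_gain, interferers, n0):
    """SINR per (2): desired received power over aggregate interference plus noise.

    interferers: list of (power, distance, channel_gain) tuples, one per j != s.
    """
    desired = p_tx * dist ** (-alpha) * abs(h_gain) ** 2
    interference = sum(p * d ** (-alpha) * abs(h) ** 2 for p, d, h in interferers)
    return desired / (interference + n0)

def tx_rate(bandwidth_hz, gamma):
    """Transmission rate per (1): r_st = b_st * log2(1 + gamma_st)."""
    return bandwidth_hz * math.log2(1.0 + gamma)

# Hypothetical link: 1 MHz bandwidth, 0.1 W transmit power, 100 m TV-SV
# distance, path-loss exponent 3, one interferer at 300 m, noise power 1e-13 W.
gamma = sinr(p_tx=0.1, dist=100.0, alpha=3.0, h_gain=1.0,
             interferers=[(0.1, 300.0, 1.0)], n0=1e-13)
rate = tx_rate(1e6, gamma)
```

With one interferer three times farther away, the SINR is near 27 and the rate is close to 4.8 Mbps, consistent with the log-scaling of (1).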
2The estimated time period for the two vehicles to remain in contact, along
with the available resources, channel state, and expected delay, can change over
different time slots. Further, the duration that an SV remains within range of
a TV depends on the velocities of both vehicles, which can be approximated by a
Gaussian distribution [2], [10].
III. TASK MODEL AND UTILITY
A. Task Model
In general, the tasks to be offloaded are categorized based
on two factors: priority and size.
•Priority of task: The task priority can be classified into
high-priority tasks (P_H), general tasks (P_G), and low-
priority tasks (P_L) based on the delay constraint. P_H tasks
are critical tasks with a very short tolerable delay, such
as security-related tasks, navigation, traffic, and road analysis.
A P_H task must be finished in time; otherwise, the
task loses its value and fails. P_G tasks have a longer
tolerable delay and the requirement of immediate action
is relatively lower than that of P_H, while P_L tasks can
tolerate very high delays. Such tasks do not lose their
value even if they take a long time to execute.
•Size of Task: The computational tasks can be
classified into small (S_{S_m}), general (S_G), and large (S_L)
tasks, where small tasks involve the fewest
bits and large tasks involve the most bits.
However, if a task is too large, has high
priority, and must be executed within a
deadline, the TV can split the task
into multiple sub-tasks and offload them to different SVs
to be executed in parallel.
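The sub-task splitting described above can be sketched as a simple near-equal partition. This is an illustrative helper of our own; the paper does not specify a concrete splitting rule.

```python
import math

def split_task(data_bits, max_subtask_bits):
    """Split a large task into near-equal sub-tasks, each no larger than
    max_subtask_bits, for parallel offloading to different SVs."""
    n = max(1, math.ceil(data_bits / max_subtask_bits))
    base, extra = divmod(data_bits, n)
    # The first `extra` sub-tasks get one extra unit so the sizes sum exactly.
    return [base + (1 if i < extra else 0) for i in range(n)]
```

For example, a 10-unit task with a 4-unit cap splits into three sub-tasks of sizes 4, 3, and 3.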
There are three ways a task can be executed: locally, via V2V
offloading, or via edge/cloud offloading. In this work, we do
not treat the edge and cloud as separate layers; hence,
they are considered as one. Nevertheless, the proposed model
can easily be extended to a hybrid edge and cloud model
encompassing the V2V communication layer. P_H tasks are
extremely delay-sensitive and hence must be processed
locally. Further, since P_H tasks mostly involve first-hand
computation in a vehicle, such tasks are mostly
small in size but need continuous computation, for example,
continuously sensing the road when the TV is on the move
or continuous navigation. Hence, P_H tasks of size S_{S_m} are
executed by the local processor. On the contrary, P_L tasks
are delay-tolerant and can be executed over a longer time
span. Therefore, P_L and S_L tasks can be offloaded directly
to the edge/cloud, even if resources are available nearby,
to preserve the vehicular resources for future tasks of the same
or other TVs. In case there are no eligible SVs nearby, the TV
offloads the computation task to the edge/cloud.
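The execution-path rules above can be condensed into a small dispatch sketch. The encoding below is our own illustration of the stated rules, not the authors' implementation.

```python
def execution_path(priority, size, eligible_svs):
    """Choose where a task runs, following the rules of Sec. III-A:
    - small high-priority (P_H) tasks run locally;
    - low-priority (P_L) or large (S_L) tasks go straight to the edge/cloud;
    - everything else is offloaded V2V if an eligible SV exists,
      falling back to the edge/cloud otherwise."""
    if priority == "PH" and size == "small":
        return "local"
    if priority == "PL" or size == "large":
        return "edge/cloud"
    return "v2v" if eligible_svs else "edge/cloud"
```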
B. Utility
For calculating the utility we assume that the P_H tasks have
a strict maximum tolerable delay, or deadline, beyond which
the tasks lose their value. Whenever a task's execution time
exceeds the tolerable limit, the utility becomes negative
since the task offloading has failed. The utility function is
mathematically defined as [11]
U_n^{P_H} = \begin{cases} \log(1 + \mu_m - n_m)/\eta^{S_0}_{S_m}, & n_m \le \mu_m, \\ -\zeta_{P_H}, & n_m > \mu_m, \end{cases}   (3)

where n_m denotes the time taken to complete task φ_m,
−ζ_{P_H} is the negative value assigned due to failure in executing
the task within the time limit, and η^{S_0}_{S_m} denotes the number
of local small-sized high-priority tasks. Next, the utility of P_G
tasks is given as
U_n^{P_G} = \begin{cases} \zeta_{P_G}/\eta_{S_{S_x}}, & n_m \le \mu_m, \\ \zeta_{P_G} e^{-c(n_m - \mu_m)}, & n_m > \mu_m. \end{cases}   (4)

Here, η_{S_{S_x}} denotes the total number of small
tasks of priority class P_G, S_G-sized tasks, and
sub-tasks from large tasks (divided for offloading for parallel
or simultaneous execution) or S_L tasks of priority class P_G.
If the task is executed within the maximum tolerable delay,
a positive value is assigned to the utility, whereas if the task
execution time exceeds the limit, unlike P_H, the task does
not completely lose its value or fail; rather, its utility
decreases exponentially with time. In (4), ζ_{P_G} is the positive
utility constant. Furthermore, U_n^{P_L} calculates the utility of the P_L
and S_L tasks that are directly offloaded to the MEC server
or cloud, and is given as

U_n^{P_L} = \begin{cases} \zeta_{P_L}/\eta^{S_0}_{L}, & n_m \le \mu_m, \\ 0, & n_m > \mu_m. \end{cases}   (5)

Here, η^{S_0}_{L} denotes the number of large tasks that are eligible to
be directly offloaded to the cloud. Although P_L tasks are delay-
tolerant, these tasks might still fail due to other technical issues
such as communication interruption, leading to n_m = ∞. In
such cases, the utility is calculated as zero. Let χ_m denote the
energy consumed in the SV for task φ_m, f_m be the computation
frequency of the SV, ι_m denote the time required to compute
task φ_m, and C_m denote the computational size of the task.
For frequency f_m in an SV, the amount of energy consumed
is proportional to the amount of time required for the task
to be completed, i.e., for task φ_m, C_m = f_m ι_m. Hence, the energy
consumed is proportional to the computational size. Then, the
overall utility of the TV can be calculated as
U_n = \mathbb{1}(\rho_{P_H}) U_n^{P_H} + \mathbb{1}(\rho_{P_G}) U_n^{P_G} + \mathbb{1}(\rho_{P_L}) U_n^{P_L} - \chi_m C_m,   (6)

where ρ_{P_H}, ρ_{P_G}, and ρ_{P_L} are constants that represent the
priority levels of high-priority tasks, general tasks, and low-
priority tasks, respectively, and \mathbb{1}(\cdot) is the indicator function.
In a vehicular network, each SV can also act as a TV,
such that an SV may have its own local P_H tasks along with
tasks offloaded from other vehicles. Since each vehicle has
limited computation capability, the SV needs to ensure that it
has allocated enough resources for its local P_H tasks so that
they are executed within the time limit. Let a service vehicle
S_s carry L local P_H tasks, and let the computation size of the
tasks and the maximum tolerable delay be C_s and µ_s, respectively.
Then, the minimum frequency required for the local tasks and the
total frequency if all the resources are solely reserved for local
tasks are respectively given as
F^{\min}_{S_s} = \sum_{l=1}^{L} \frac{C_s}{\mu_s}, \quad \text{and} \quad F_{S_s} = \sum_{l=1}^{L} \frac{C_s}{\phi_{S_s}\mu_s},   (7)

where φ_{S_s} = F^{\min}_{S_s}/F_{S_s}. Now, from equation (6), the total utility
of computing the local tasks is given by

U_{local}(\phi) = \sum_{s=1}^{S_s} \log(1 + \mu_s - \phi\mu_s), \quad \phi \in [\phi_{S_s}, 1].   (8)
When the SV executes an offloaded task along with a local
task, it allocates part of its frequency to the offloaded task.
Hence, the utility of the local task changes such that the
product of the energy consumed and the computation size of the
offloaded task equals the difference between the utility of the local
task alone and the utility when both local and offloaded tasks are processed.
This is given as

\chi_m C_m = U_{local}(\phi_s) - U_{local}(\phi'_s).   (9)
Similarly, if any new task arrives while the allocated task is being
executed, the utility is again updated to U_{local}(\phi''_s), where
φ''_s depends on the energy consumed during execution of
the tasks, which is proportional to the computation size of the
tasks.
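The minimum-frequency requirement in (7) and the local utility (8) can be sketched as follows. This is a minimal illustration in which each local task is represented as a (C_s, µ_s) pair.

```python
import math

def f_min(local_tasks):
    """Left part of (7): minimum total frequency for the local P_H tasks,
    i.e. the sum of C_s / mu_s over all L tasks."""
    return sum(c / mu for c, mu in local_tasks)

def u_local(phi, local_tasks):
    """Eq. (8): total utility of the local tasks when a fraction phi of the
    SV's frequency is reserved for them."""
    return sum(math.log(1 + mu - phi * mu) for _, mu in local_tasks)
```

At φ = 1 every log term becomes log(1) = 0, i.e. devoting all frequency to local tasks leaves no slack utility; smaller φ trades local utility for offloaded work, as in (9).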
IV. SOFT ACTOR-CRITIC (SAC) BASED TASK OFFLOADING
Since the parameters of the V2V network in the considered
framework can change over different time slots, the current
state of the network is known but the future states are
unknown and likely to differ from the current state. To
address this change in states, the problem is modeled as an MDP3
with the objective of maximizing the utility of task offloading,
and is solved using a model-free DRL algorithm that employs
the SAC framework to evaluate and improve the task offloading
policy. The details of the proposed task offloading scheme are
given below.
A. Preliminaries on RL
•State space: Each TV contains an agent which takes
the decision of offloading a task on the basis of the
vehicular network's information at time n, such as the
SINR of the communication channel, availability of SVs,
remaining resources of S_s, utility of locally executed
tasks (U_{local}(\phi)), data size, CPU cycles, maximum
tolerable delay, etc.
•Action space: The agent in the TV analyzes the
environment based on the state space and determines the
eligible SV to offload the task.
•Reward: Let Ω denote a binary variable whose value
is 1 if a task is successfully offloaded and 0 otherwise. In the
considered framework, we have N time slots. Hence, the
mean reward R can be calculated as

R = \frac{1}{N} \sum_{n=0}^{N-1} \sum_{s=1}^{S} \Omega^{S_s}_n U^{S_s}_n.   (10)
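Given per-slot, per-SV success flags and utilities, the mean reward (10) can be computed as in this sketch; the nested-list data layout is our own choice.

```python
def mean_reward(omega, utility):
    """Eq. (10): R = (1/N) * sum_n sum_s Omega_n^{S_s} * U_n^{S_s}.

    omega[n][s] is 1 if the task in slot n was successfully offloaded to SV s
    (0 otherwise); utility[n][s] is the corresponding utility U_n^{S_s}.
    """
    n_slots = len(omega)
    return sum(o * u
               for row_o, row_u in zip(omega, utility)
               for o, u in zip(row_o, row_u)) / n_slots
```

For instance, with two slots and two SVs where one offload fails, only the successful entries contribute to the average.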
B. SAC algorithm
An important feature of SAC is that it regulates the entropy of
the policy. The policy in SAC is trained such that a trade-off
between the expected return and the entropy is maximized.
Hence, the SAC algorithm is based on maximum-entropy RL,
which aims to maximize both the mean reward and the entropy to
find the optimal policy π. Accordingly, the relationship
between the soft action-value and state-value functions at
state s, action a, and time n for the considered V2V network
is given by (11) and (12) as per the Bellman equation:

Q^\pi(s, a) = R(s_n, a_n) + \gamma\,\mathbb{E}_{s_{n+1} \sim p}[V(s_{n+1})],   (11)

V(s_n) = \mathbb{E}_{a_n \sim \pi}[Q(s_n, a_n) - \log \pi(a_n|s_n)],   (12)

3The details of the MDP modelling are omitted due to space constraints.
However, the reader can refer to [12] for details on MDPs.
where p denotes the trajectory distribution induced by π. We
consider Q^\pi(s, a) = Q_\theta(s, a) in the DNN, where θ denotes
the network parameters. The Q-function parameters are not
constant, and the actor and critic are further updated according
to the actions and immediate rewards stored in the replay buffer.
Thus, the parameters can be trained by minimizing the loss function

J_Q(\theta) = \mathbb{E}_{(s_n, a_n) \sim rb}\left[\frac{1}{2}\left(Q_\theta(s_n, a_n) - Q'_{\theta'}(s_n, a_n)\right)^2\right],   (13)

where Q_\theta(s_n, a_n) denotes the soft Q-value, rb denotes the
replay buffer [13], and

Q'_{\theta'}(s_n, a_n) = R(s_n, a_n) + \gamma\,\mathbb{E}_{s_{n+1} \sim p}[V_{\theta'}(s_{n+1})].   (14)

In order to stabilize the iterations of the action-value function,
equation (14) defines the target action-value function, where θ'
is obtained as an exponential moving average of θ.
The performance of the DRL algorithm depends on the
policy. Hence, if the policy is optimal, the offloaded tasks will
be completed within time and the utility will be high. On the
contrary, if the policy is not optimal, task computation failures
or deadline violations will be common, which will
decrease the utility. Therefore, in order to improve the policy,
it is updated in terms of the Kullback-Leibler divergence as in
equation (15) [13], where Π denotes a set of policies which
corresponds to Gaussian distribution parameters and the function Z^\pi
is used to normalize the distribution without affecting the new
policy:

\pi_n = \arg\min_{\pi' \in \Pi} D_{KL}\left(\pi'(\cdot|s_n) \,\Big\|\, \frac{\exp\left((1/\Lambda) Q^\pi(s_n, \cdot)\right)}{Z^\pi(s_n)}\right).   (15)

The Kullback-Leibler divergence can be further minimized
by updating the policy parameters as

J_\pi(\phi) = \mathbb{E}_{s_n \sim rb}\left[\mathbb{E}_{a_n \sim \pi_\phi}\left[\Lambda \log(\pi_\phi(a_n|s_n)) - Q_\theta(s_n, a_n)\right]\right].   (16)
The policy iteration continues until it reaches
the optimal policy and converges with the maximum
entropy. The detailed SAC-based task offloading algorithm is
illustrated in Algorithm 1.
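For intuition, the soft targets in (11)-(14) can be written out for a discrete action set without any deep-learning framework. This is a simplified sketch of our own; the paper uses DNN approximators for Q_θ and π_φ.

```python
import math

def soft_state_value(q_values, probs):
    """Eq. (12): V(s) = E_{a~pi}[Q(s, a) - log pi(a|s)] for a discrete policy."""
    return sum(p * (q - math.log(p)) for q, p in zip(q_values, probs) if p > 0)

def soft_q_target(reward, gamma, v_next):
    """Eqs. (11)/(14): target Q'(s, a) = R(s, a) + gamma * E[V(s')]."""
    return reward + gamma * v_next

def critic_loss(q_pred, q_target):
    """Eq. (13): one-sample soft Bellman residual, (1/2)(Q - Q')^2."""
    return 0.5 * (q_pred - q_target) ** 2
```

The entropy bonus is visible in soft_state_value: a uniform two-action policy with equal Q-values of 1 yields V = 1 + log 2, exceeding the plain expected Q-value of 1.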
V. NUMERICAL RESULTS
In this section, we present the simulation results for the
considered IoV network. In particular, we consider one TV
and multiple SVs within the VFC framework where the SVs
include parked vehicles, moving vehicles and pedestrians.
Every moving SV can also have its own local tasks; hence,
the agent in every vehicle ensures that its local or high-
priority tasks are executed on time. The mean utility and mean
delay are used as the scores to measure the performance of
the network. While the mean utility is defined as U_n divided
by the total number of local and offloaded tasks and sub-tasks,
Algorithm 1 SAC-based task offloading
1: Initialize: Q_{θ_1}(s, a), Q_{θ_2}(s, a), and targets Q_{θ'_1}, Q_{θ'_2} with weights θ'_1 = θ_1 and θ'_2 = θ_2
2: Initialize: policy π_φ(a|s) with weights φ
3: for each iteration do
4:   Retrieve current state s_0
5:   for n = 0, 1, 2, ..., (N−1) do
6:     Examine the properties of the offloading task at the TV
7:     Broadcast beacon message
8:     Collect information from the environment
9:     Estimate s_n
10:    Action a_n determines the SV and informs the agent in the TV
11:    Compute the reward using (10) and estimate state s_{n+1}
12:    Save tuple (s_n, a_n, R_n, s_{n+1}) in the replay memory
13:    Update J_Q(θ) using equation (13)
14:    Update φ using equation (16)
15:    Update the soft action-value target parameters θ'
16:  end for
17: end for
TABLE I: Vehicular network and DRL parameters
Maximum relative speed (ν): ±55 km/hr
Maximum relative distance between two vehicles: 400 m
Maximum tolerable delay of local tasks (τ_n) (4 types): [0.5, 1, 2, 4] seconds
Computation capability of vehicle (3 types): [3-10] GHz
Data size of task (3 categories) (D_m): [0.05, 0.9, 1.5] MB
Computation size of task (3 categories) (C_m): [0.1, 0.5, 1.0]
Task priority constants: P_H = 0.5, P_G = 1, P_L = 2
Maximum number of tasks: 50
Maximum local tasks: 5
Hidden layers: 2
Actor hidden layer units: [1000, 1000]
Critic hidden layer units: [600, 600]
Batch size (β): 250
Epsilon decay rate denominator: 150
Buffer size: 99999
Activation function: Softmax
Training episodes: 2000
the mean delay is defined as the total overall delay4 divided
by the number of tasks and sub-tasks. Unless otherwise stated,
the simulation parameters are as shown in Table I. Any
other parameters used are explicitly mentioned where relevant.
1) Mean utility with respect to varying vehicle speed: In
the vehicular network, the velocity of a vehicle is considered
to be uniform within each time slot. Fig. 2 shows the mean utility
for varying vehicle speeds. The learning rate of the
algorithm is set to 10^{-2}. The speeds are set to 30 km/hr, 50
km/hr, and 70 km/hr over traffic conditions varying
from 5 to 50 vehicles/km. It can be observed from the figure
that when the vehicle speeds are low, the SVs stay
within the range of the TV for a longer duration. Hence, these
SVs qualify to take offloaded tasks with longer time limits,
and the utility is higher at slower speeds.
4The overall delay is given by \sum_{m=1}^{M} \mu_m.
Fig. 2: Mean utility of SAC vs traffic for different vehicle speeds.
Fig. 3: Mean delay vs maximum tolerable delay in completing high-priority
tasks for SAC for different LR when traffic = 45 vehicles/km.
2) Mean delay with respect to varying learning rates:
Fig. 3 shows the mean delay of the proposed SAC algorithm
in the successful computation of high-priority tasks for
different learning rates (LR) under the high-traffic condition of 45
vehicles per kilometer. The x-axis represents four different
cases of maximum tolerable delay in seconds. From the figure
it can be observed that the computation delay increases as
the learning rate increases.
3) SAC vs DDPG vs Greedy: Next, we compare the
proposed SAC-based task offloading DRL algorithm with
two baseline algorithms, namely the greedy and DDPG
algorithms. In the greedy algorithm, the agent in the TV
randomly selects an SV from among the in-range SVs with
the maximum remaining computation power, without
considering any other parameters such as distance, delay,
or speed. DDPG, on the other hand, is a DRL algorithm similar
to SAC, with the important difference that it uses a deterministic
policy while SAC uses a stochastic one. Under different scenarios
the comparative performance of DDPG and SAC varies,
whereby each can outperform the other. Nevertheless, SAC
acts as a bridge between stochastic policy optimization and
DDPG-style methods. Accordingly, we analyze the considered
Fig. 4: SAC vs DDPG vs Greedy with respect to mean utility.
Fig. 5: SAC vs DDPG vs Greedy in terms of mean delay vs maximum
tolerable delay in completing the highest-priority tasks when traffic = 45
vehicles/km.
V2V network with respect to various network parameters to
check the performance and feasibility of the proposed SAC
algorithm against the conventional DDPG-based one.
The learning rates of SAC and DDPG are set to 10^{-2}.
Fig. 4 shows the mean utility of the proposed SAC algorithm
along with DDPG and greedy for varying traffic conditions.
It can be seen that for a given vehicle speed, the utility
of SAC is always higher than that of the greedy approach for
any traffic condition. Further, in SAC the difference in
utility between high and low traffic is minimal. On the contrary,
DDPG performs better in high-traffic conditions than in low-traffic
conditions. Hence, it can be concluded that the traffic condition
does not particularly affect SAC, and it is possible that
DDPG outperforms SAC in networks with very dense traffic.
Finally, Fig. 5 shows the mean delay of the three algorithms
in successfully completing the computation of high-priority
tasks under the high-traffic condition of 45 vehicles per kilometer.
Since the greedy algorithm only considers the remaining
computation power and ignores factors such as distance, expected
delay, and speed, its delay is somewhat unpredictable. With
regard to DDPG and SAC, the mean delay for both
algorithms is similar and increases as the
maximum tolerable delay in the network increases. The
similarity between DDPG and SAC is due to the
consideration of higher-intensity traffic (i.e., 45 vehicles/km)
in this figure, which was also validated in Fig. 4.
VI. CONCLUSION
A DRL algorithm based on SAC was proposed for efficient
task and resource allocation in a VFC framework. In particular,
we jointly considered the priority and size of tasks along
with other network parameters that affect the connectivity
among vehicles. Accordingly, we formulated a framework that
specifies the workflow of the vehicular network to allocate
the idle computation power of nearby vehicles to certain
task vehicles. The proposed SAC algorithm works towards
achieving the optimal policy for task offloading and maximizes
the mean utility of the considered network.
ACKNOWLEDGMENT
This work was supported by the Ministry of Science and
Technology of Taiwan under Grants MOST 110-2224-E-110-
001 & MOST 109-2221-E-110-050-MY3.
REFERENCES
[1] H. Ji, O. Alfarraj, and A. Tolba, “Artificial intelligence-empowered edge
of vehicles: Architecture, enabling technologies, and applications,” IEEE
Access, vol. 8, pp. 61 020–61 034, Mar. 2020.
[2] Y. Hui, Z. Su, T. H. Luan, and J. Cai, “Content in motion: An
edge computing based relay scheme for content dissemination in urban
vehicular networks,” IEEE Trans. Intell. Transp. Syst., vol. 20, no. 8,
pp. 3115–3128, Nov. 2018.
[3] R. Q. Hu et al., “Mobility-aware edge caching and computing in vehicle
networks: A deep reinforcement learning,” IEEE Trans. Veh. Technol.,
vol. 67, no. 11, pp. 10 190–10 203, Aug. 2018.
[4] J. Shi, J. Du, J. Wang, J. Wang, and J. Yuan, “Priority-aware task
offloading in vehicular fog computing based on deep reinforcement
learning,” IEEE Trans. Veh. Technol., vol. 69, no. 12, pp. 16 067–16 081,
Dec. 2020.
[5] C. Chen, L. Chen, L. Liu, S. He, X. Yuan, D. Lan, and Z. Chen, “Delay-
optimized V2V-based computation offloading in urban vehicular edge
computing and networks,” IEEE Access, vol. 8, pp. 18 863–18 873, Jan.
2020.
[6] L. Pu, X. Chen, G. Mao, Q. Xie, and J. Xu, “Chimera: An energy-
efficient and deadline-aware hybrid edge computing framework for
vehicular crowdsensing applications,” IEEE Internet Things J., vol. 6,
no. 1, pp. 84–99, Feb. 2018.
[7] J. Feng, Z. Liu, C. Wu, and Y. Ji, “AVE: Autonomous vehicular edge
computing framework with ACO-based scheduling,” IEEE Trans. Veh.
Technol., vol. 66, no. 12, pp. 10 660–10 675, Jun. 2017.
[8] Y. Zhang et al., “Research on adaptive beacon message broadcasting
cycle based on vehicle driving stability,” Int. J. Netw. Manag., vol. 31,
no. 2, p. e2091, Mar. 2021.
[9] F. J. Ros, P. M. Ruiz, and I. Stojmenovic, “Acknowledgment-based
broadcast protocol for reliable and efficient data dissemination in
vehicular Ad Hoc networks,” IEEE Trans. Mobile Comput., vol. 11,
no. 1, pp. 33–46, Dec. 2010.
[10] W. L. Tan, W. C. Lau, O. Yue, and T. H. Hui, “Analytical models
and performance evaluation of drive-thru internet systems,” IEEE J. Sel.
Areas Commun., vol. 29, no. 1, pp. 207–222, Dec. 2010.
[11] J. Zhao, Q. Li, Y. Gong, and K. Zhang, “Computation offloading
and resource allocation for cloud assisted mobile edge computing in
vehicular networks,” IEEE Trans. Veh. Technol., vol. 68, no. 8, pp. 7944–
7956, Jun. 2019.
[12] Z. Chen and X. Wang, “Decentralized computation offloading for multi-
user mobile edge computing: A deep reinforcement learning approach,”
EURASIP J. Wirel. Commun. Netw., vol. 2020, no. 1, pp. 1–21, 2020.
[13] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-
policy maximum entropy deep reinforcement learning with a stochastic
actor,” in Proc. International conference on machine learning. PMLR,
Jul. 2018, pp. 1861–1870.