Dynamic Resource Management in Integrated NOMA Terrestrial-Satellite Networks using Multi-Agent Reinforcement Learning

October 2023

DOI:10.48550/arXiv.2310.11814

License
CC BY 4.0

Authors:

Ali Nauman, Ph.D.

Yeungnam University

Haya Alshahrani

Princess Nora bint Abdul Rahman University

Nadhem Nemri

King Khalid University

Kamal M Othman

Umm Al-Qura University

Ali Nauman, Haya Mesfer Alshahrani, Nadhem Nemri, Kamal M. Othman, Nojood O Aljehane, Mashael Maashi, Ashit Kumar Dutta, Mohammed Assiri, Wali Ullah Khan

Abstract—The integration of terrestrial and satellite wireless communication networks offers a practical solution to enhance network coverage, connectivity, and cost-effectiveness. Moreover, in today's interconnected world, reliable and widespread connectivity is increasingly important across various domains. This is especially crucial for applications such as the Internet of Things (IoT), remote sensing, disaster management, and bridging the digital divide. However, allocating the limited network resources efficiently and ensuring seamless handover between satellite and terrestrial networks present significant challenges. Therefore, this study introduces a resource allocation framework for integrated satellite-terrestrial networks to address these challenges. The framework leverages local cache pool deployments and non-orthogonal multiple access (NOMA) to reduce time delays and improve energy efficiency. Our proposed approach utilizes a multi-agent deep deterministic policy gradient (MADDPG) algorithm to optimize user association, cache design, and transmission power control, resulting in enhanced energy efficiency. The approach comprises two phases: User Association and Power Control, where users are treated as agents, and Cache Optimization, where the satellites and base stations (BSs) are considered the agents. Through extensive simulations, we demonstrate that our approach surpasses conventional single-agent deep reinforcement learning algorithms in addressing cache design and resource allocation challenges in integrated terrestrial-satellite networks. Specifically, our proposed approach achieves significantly higher energy efficiency and reduced time delays compared to existing methods. This research highlights the importance of, and addresses the need for, efficient resource allocation and cache design in integrated terrestrial-satellite networks, paving the way for enhanced connectivity and improved network performance in various applications.

Index Terms—Satellite-terrestrial networks, non-orthogonal multiple access, resource optimization, interference management.

Acknowledgement: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through large group Research Project under grant number (RGP2/02/44). Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2023R237), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. Research Supporting Project number (RSPD2023R787), King Saud University, Riyadh, Saudi Arabia. This study is supported via funding from Prince Sattam bin Abdulaziz University project number (PSAU/2023/R/1444).

Ali Nauman is with the Department of Information and Communication Engineering, Yeungnam University, Republic of Korea (email: anauman@ynu.ac.kr).

Haya Mesfer Alshahrani is with the Department of Information Systems, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia (email: hmalshahrani@pnu.edu.sa).

Nadhem Nemri is with the Department of Information Systems, College of Science & Art at Mahayil, King Khalid University, Saudi Arabia (email: nnemri@kku.edu.sa).

Kamal M. Othman is with the Department of Electrical Engineering, College of Engineering, Umm Al-Qura University, Makkah, Saudi Arabia (email: kmothman@uqu.edu.sa).

Nojood O Aljehane is with the Department of Computer Science, Faculty of Computers and Information Technology, University of Tabuk, Tabuk, Saudi Arabia (email: naljohani@ut.edu.sa).

Mashael Maashi is with the Department of Software Engineering, College of Computer and Information Sciences, King Saud University, P.O. Box 103786, Riyadh 11543, Saudi Arabia (email: mrbwesabi@gmail.com).

Ashit Kumar Dutta is with the Department of Computer Science and Information System, College of Applied Sciences, AlMaarefa University, Riyadh 11597, Saudi Arabia (email: adota@mcst.edu.sa).

Mohammed Assiri is with the Department of Computer Science, College of Sciences and Humanities-Aflaj, Prince Sattam bin Abdulaziz University, Aflaj 16273, Saudi Arabia (email: meo.nrmo@gmail.com).

Wali Ullah Khan is with the Interdisciplinary Center for Security, Reliability and Trust (SnT), University of Luxembourg, 1855 Luxembourg City, Luxembourg (email: waliullah.khan@uni.lu).

Corresponding author: Ali Nauman (email: anauman@ynu.ac.kr)

I. INTRODUCTION

The upcoming sixth-generation (6G) communication technologies and networks are intended to provide fast connectivity all over the world [1], [2]. These networks are expected to provide ultra-high data rates, very low latency, and information security [3]. This can be achieved by exploring new sustainable frameworks and solutions. Integrated terrestrial and non-terrestrial networks represent a fusion of ground-based infrastructures, such as fibre-optic cables and cell towers, with non-terrestrial systems like satellites and drones, to establish seamless and dependable connectivity [4], [5]. These networks bring numerous benefits, including expanded coverage, enhanced redundancy, and improved resilience during natural disasters or other disruptions [6]. As the demand for high-speed and reliable connectivity continues to grow in the context of 6G networks [7], the integration of terrestrial and non-terrestrial networks is gaining paramount importance [8]. This integration enables greater flexibility and efficiency in data transmission, facilitating improved accessibility to information for individuals and devices in remote or inaccessible locations [9].

arXiv:2310.11814v1 [eess.SP] 18 Oct 2023

Non-orthogonal multiple access (NOMA) technologies, utilizing power domain multiplexing, have recently emerged as a promising candidate for forthcoming 6G networks [10]. This technology has demonstrated significant potential for enhancing energy efficiency, accommodating a larger number of concurrent users, and reducing latency, as validated by recent studies [11], [12]. However, the implementation of NOMA poses inherent complexities and several challenges that need to be addressed [13]. These challenges encompass the requirement for advanced signal processing techniques, the development of efficient power allocation algorithms [14], and the effective management of interference among users [6] sharing the same resources [15]. In response, researchers and industry professionals are actively exploring novel techniques and strategies to overcome these challenges and realize the full potential of NOMA [16].
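To make the idea of power domain multiplexing concrete, the following minimal Python sketch computes the achievable rates of a single two-user downlink NOMA pair. It is an illustration under stated assumptions, not the paper's system model: the function name, the parameter names (`alpha_far`, `g_near`, `g_far`), the unit noise power, and perfect successive interference cancellation (SIC) at the stronger user are all assumed here, and the near user's channel gain is assumed to be at least as large as the far user's.

```python
import math


def noma_two_user_rates(p_total, alpha_far, g_near, g_far, noise_power=1.0):
    """Return (rate_near, rate_far) in bit/s/Hz for one two-user NOMA pair.

    alpha_far is the (assumed) fraction of the transmit power given to the
    far user, which has the weaker channel gain g_far <= g_near.
    """
    if not 0.0 < alpha_far < 1.0:
        raise ValueError("alpha_far must lie strictly between 0 and 1")

    p_far = alpha_far * p_total            # power assigned to the weak (far) user
    p_near = (1.0 - alpha_far) * p_total   # power assigned to the strong (near) user

    # The far user decodes its own signal and treats the near user's
    # superimposed signal as interference.
    sinr_far = p_far * g_far / (p_near * g_far + noise_power)

    # The near user first removes the far user's signal via SIC
    # (assumed perfect here), so only noise remains.
    snr_near = p_near * g_near / noise_power

    return math.log2(1.0 + snr_near), math.log2(1.0 + sinr_far)


# Illustrative values only: unit total power, 80% of it given to the far user.
rate_near, rate_far = noma_two_user_rates(1.0, 0.8, g_near=10.0, g_far=1.0)
print(f"near user: {rate_near:.3f} bit/s/Hz, far user: {rate_far:.3f} bit/s/Hz")
```

Both users share the same time-frequency resource here. The far user's rate is limited by the interference term in its SINR, which is why power allocation and interference management are listed among the main NOMA challenges.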

Integrated terrestrial-satellite networks play a pivotal role in the development of the emerging 6G system, with NOMA protocols frequently employed in this context, as highlighted in recent studies [15]. This network architecture enables the provision of cost-effective communication services to both terrestrial base stations (BSs) and remote areas covered by satellites, resulting in an expanded coverage area and improved service quality [17], [18]. However, due to the limited availability of resources, practical resource allocation methods are essential to enhance the system's energy efficiency and service quality [19]–[21].

Integrated terrestrial-satellite communication networks face a significant challenge in the form of bottlenecks, which can negatively impact service quality for specific users [22]. To address this challenge, deploying cache pools for the system's BSs has emerged as a promising solution [23]. Cache pools help reduce the amount of data that needs to be transmitted across the network, thereby alleviating congestion, improving overall performance, enabling efficient file retrieval, and reducing time delays [24]. However, effectively utilizing cache pools necessitates additional storage capacity and careful management of the caches.
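The effect of a local cache pool on retrieval delay can be illustrated with a short Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: file requests are assumed to follow a Zipf popularity law, and `local_delay` and `backhaul_delay` are hypothetical per-request delays for a cache hit and a cache miss.

```python
def zipf_popularity(num_files, skew):
    """Request probability of each file, ranked from most to least popular."""
    weights = [1.0 / (rank ** skew) for rank in range(1, num_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def hit_ratio_and_delay(popularity, cached_files, local_delay, backhaul_delay):
    """Expected cache hit ratio and mean retrieval delay for one cache pool."""
    # A request is a hit when the requested file is already stored locally.
    hit_ratio = sum(popularity[f] for f in cached_files)
    # Hits are served locally; misses must be fetched over the backhaul.
    mean_delay = hit_ratio * local_delay + (1.0 - hit_ratio) * backhaul_delay
    return hit_ratio, mean_delay


# Illustrative values only: 4 files, the 2 most popular ones are cached.
popularity = zipf_popularity(num_files=4, skew=1.0)
hit_ratio, mean_delay = hit_ratio_and_delay(
    popularity, cached_files={0, 1}, local_delay=1.0, backhaul_delay=10.0
)
print(f"hit ratio: {hit_ratio:.2f}, mean delay: {mean_delay:.2f}")
```

The sketch caches the most popular files because popularity is known in advance. In practice user requests are hard to predict, which is one reason the cache placement is treated as a learning problem.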

Another key challenge in integrating terrestrial and satellite networks is ensuring seamless and efficient operation without disruptions or delays [25]. This requires meticulous coordination and management of the various systems involved. Advanced technologies, including artificial intelligence [26] and machine learning, play a crucial role in optimizing the performance of integrated networks [27]. These technologies enable intelligent decision-making, resource allocation [28], and network optimization [29], leading to enhanced system efficiency and robustness [30]. By leveraging these advanced techniques, the integration of terrestrial and satellite networks can achieve optimal performance while delivering reliable and uninterrupted connectivity [31].

To improve network performance in integrated terrestrial-to-satellite communication networks, this paper presents a cache-enabled downlink framework designed specifically for NOMA-based systems. To increase overall network efficiency, the framework optimizes user association [32], transmission power control, and caching placement. To tackle this optimization problem, the proposed approach employs a multi-agent deep reinforcement learning mechanism. The effectiveness of the proposed method is demonstrated through a comparison with benchmark algorithms, which shows its superior performance on the given problem. By leveraging advanced deep reinforcement learning techniques [33], the proposed approach offers a novel solution to complex optimization challenges in integrated networks. The study's contributions and results are discussed in the subsequent sections, providing a practical solution for managing and allocating resources in hybrid networks that integrate both terrestrial and satellite infrastructures. By incorporating caching capabilities into the framework and optimizing various network parameters, the proposed approach aims to improve overall network performance, reduce congestion, enhance data retrieval efficiency, and minimize transmission delays. This research contributes to the advancement of integrated terrestrial-to-satellite communication networks by introducing a new methodology and evaluating it against existing algorithms.

A. Recent Advances (Academia)

In the last couple of years, NOMA has been extensively investigated in different terrestrial and non-terrestrial networks. For example, in the field of backscatter-enabled multi-roadside unit vehicular-to-everything communications, the authors of [44] proposed NOMA to enhance the spectral efficiency of the system through optimal resource allocation. In the context of satellite networks, the researchers in [34] focused on optimizing the system resources to investigate the long-term utility of NOMA-enabled satellite networks. Similarly, the authors of [45] optimized the power allocation and reflection coefficient in multi-user NOMA networks to maximize the sum capacity, even under imperfect successive interference cancellation (SIC) decoding. NOMA has also been employed in satellite networks to mitigate interference and improve system fairness. For instance, in [35], the authors utilized NOMA techniques to achieve interference mitigation and enhance the max-min fairness of the system. In another study, the authors of [36] proposed a joint user pairing and power allocation scheme for NOMA-enabled satellite networks, aiming to maximize the sum capacity of the system. Moreover, energy and spectrum optimization in NOMA-enabled small-cell networks has been addressed by the authors of [46] using a multi-objective power allocation approach. Additionally, the admission control problem in NOMA-enabled satellite networks has been investigated in [37] to increase the number of supported users while guaranteeing the quality of service. More recently, the authors of [47] have explored the potential of NOMA-enabled backscatter communications in Industry 5.0.

In recent years, resource allocation in hybrid terrestrial-satellite networks has been the subject of numerous studies. One approach proposed in the literature, such as [15], focused on utilizing precoding techniques for optimization purposes. Another study [38] explored data placement and delivery strategies to minimize the number of hops required. The integration of terrestrial and satellite technologies in wireless backhaul networks has also received significant attention. Researchers, as exemplified in [39], have analyzed the impact of cross-layer design on link scheduling, flow control, and frequency assignment. To address challenges related to power and flow assignment, [40] proposed the use of convex relaxation techniques. Similarly, [48], [49] employed successive convex approximation to transform a non-convex optimization problem into a convex one, aiming to improve the security rate for users with inadequate channel state information. Furthermore, in the context of satellite ground fusion networks, the authors in [41] proposed a NOMA-based resource allocation scheme. This scheme optimized resource allocation by grouping users into clusters and employing an iterative beamforming algorithm with a penalty function. Despite the development of traditional techniques for optimal resource allocation in hybrid terrestrial-satellite communication networks, the dynamic nature of the environment poses significant challenges. Predicting users' needs for cached files is difficult, and the integrated terrestrial-satellite environment is inherently unstable. Additionally, the optimization problem space has various limitations, making it challenging to formulate an appropriate mathematical model.

TABLE I: Comparison of the proposed work with existing related works in academia.

| Ref. | System model | Satellite(s) | Work objective | OMA/NOMA | AI/non-AI | Proposed solution |
|---|---|---|---|---|---|---|
| [34] | Satellite network | Single | Maximize the long-term network utility | NOMA | Non-AI | Lyapunov optimization framework, the Karush-Kuhn-Tucker conditions, and the particle swarm optimization algorithm |
| [35] | Satellite network | Single | Improve the worst overall channel throughput rate (OCTR) | NOMA | Non-AI | Heuristic approach for joint power, decoding-order, and time slot optimization |
| [36] | Satellite network | Multiple | Maximize the sum rate and achieve fairness | NOMA | Non-AI | SCA for joint user pairing and power allocation |
| [37] | Satellite network | Single | Maximize number of users | NOMA | Non-AI | Matching theory for channel and power allocation |
| [15] | Integrated terrestrial-satellite network | Single | Maximizing system capacity | NOMA | Non-AI | ZF-beamforming at BS while SCA and dual method for power allocation |
| [38] | Integrated terrestrial-satellite network | Single | Minimizing path-length and maximizing the throughput | OMA | Non-AI | Lagrangian dual method for resource allocation |
| [39] | Integrated terrestrial-satellite network | Single | Maximizing network throughput | OMA | Non-AI | Interior point method for power and flow assignment |
| [40] | Integrated terrestrial-satellite network | Single | Maximizing the core network traffic | OMA | Non-AI | Estimation-based method for resource allocation |
| [41] | Satellite network | Single | Maximizing sum rate | NOMA | Non-AI | Precoding vector design and first-order Taylor expansion for iterative power allocation |
| [42] | Satellite network | Single | Maximizing the transmission efficiency | OMA | AI | DRL for dynamic resource allocation |
| [43] | Satellite network | Single | Maximizing the expected long-term resource utilization | NOMA | AI | DRL for dynamic resource allocation |
| [Our] | Integrated terrestrial-satellite network | Multiple | Reducing time delay and maximizing overall energy efficiency | NOMA | AI | Adopting MADDPG algorithm for optimizing user association, cache design, and transmission power control |

To tackle these challenges, researchers have explored the application of deep reinforcement learning (DRL) algorithms for optimal resource allocation and cache design in integrated terrestrial-to-satellite communication networks. DRL has shown promise in addressing optimization problems characterized by high unpredictability. Several studies have proposed cooperative multi-agent deep reinforcement learning (CMDRL) frameworks for radio resource management strategies in integrated networks. For instance, in [25], a deep Q-network (DQN) was utilized to improve user access, while [42] suggested using DQNs to formulate radio resource management plans. In cognitive radio settings, the authors of [50] employed various DRL techniques to regulate power.

B. Recent Advances (Industry/Standardization)

Standardization for terrestrial-satellite communications within 3GPP began in 2017 [52]. This standardization effort can be categorized into two primary domains: enhancements for non-terrestrial networks and enhancements for terrestrial networks. The former seeks to establish a global standard for future satellite-based communications, stimulating significant growth in the satellite industry. Activities within the latter domain ensure that mobile standards align with the connectivity requirements for safe operation on non-terrestrial platforms. The goals and outcomes of 3GPP's work spanning from Rel-15 through Rel-17, as well as the topics currently under investigation for Rel-18, are summarized in Table II.

In the terminology of 3GPP, terrestrial-satellite networks refer to the utilization of satellites or High Altitude Platform Stations (HAPS) to provide connectivity services, particularly in remote areas where traditional cellular coverage is lacking. In Rel-17, 3GPP introduced a foundational set of features to facilitate next-generation spectrum operation over terrestrial-satellite networks within the frequency range FR1, which covers frequencies up to 7.125 GHz. In the upcoming Rel-18, 3GPP aims to further enhance next-generation operations in terrestrial-satellite contexts. This enhancement will include improving coverage for handheld devices, exploring deployments in frequency bands exceeding 10 GHz, addressing mobility challenges, ensuring seamless service continuity between terrestrial and non-terrestrial networks, and examining regulatory requirements for verifying user locations within the network [53].

Back in 3GPP's Rel-15, support for non-terrestrial platforms within the previous network generation was first introduced. This encompassed various elements, including implementing signaling procedures for identifying non-terrestrial users through subscription-based methods. Additionally, mechanisms were established for reporting critical non-terrestrial platform parameters such as height, location, speed, and flight path. New measurement reports were also introduced to effectively manage non-terrestrial interference, particularly in scenarios involving a specific density of low-altitude non-terrestrial platforms.

TABLE II: 3GPP standardization works on terrestrial-satellite networks [51]–[53].

| Release | Advance in terrestrial-satellite networks |
|---|---|
| Rel-15 | Rel-15 focuses on New Radio, which is proposed for the support of terrestrial-satellite networks [TR 38.811]. It also identifies relevant use case scenarios for terrestrial-satellite networks and spectrum integration, such as S-band and Ka-band. Moreover, it defines the footprint size, elevation angle, beam configuration, and antenna design. Further, this release specifies the channel propagation model [TR 38.901]. |
| Rel-16 | This release proposes solutions for New Radio in terrestrial-satellite networks [TR 38.821]. It focuses on FR1 bands in terrestrial-satellite networks to support the Internet of Things (IoT). Moreover, it identifies the changes required in the physical layer and other layers, along with the assumptions for system-level simulations. This release also studies the impact of resource optimization on the performance of terrestrial-satellite networks. Furthermore, it incorporates the access of terrestrial-satellite networks in next-generation communications, as mentioned in [TR 22.822], for delivering various services. |
| Rel-17 | Rel-17 discusses the support of narrowband IoT and machine-type communication in terrestrial-satellite networks, as mentioned in [TR 36.763]. It is primarily tailored to the specific demands of IoT applications. In the context of 6G, significant attention has been directed towards the architectural considerations for satellite access as delineated in [TR 23.737]. This undertaking encompasses enhancements across multiple facets, including refinements in radio frequency and physical layer parameters, protocol optimizations, and more effective management of radio resources. Moreover, it involves the identification of an apt architectural framework, resolving issues about integrated-satellite roaming, and augmentation of conditional handover procedures. |
| Rel-18 | The terrestrial-satellite enhancements will examine the system coverage for practical handheld devices and access beyond 10 GHz for stationary and mobile platforms. The research will explore the prerequisites for network-validated user positioning and tackle issues related to mobility and the seamless continuity of services as users transition between terrestrial and satellite networks and between different non-terrestrial networks. |

In subsequent releases, 3GPP's focus extended to address the needs of connected non-terrestrial platforms at the application layer while strongly emphasizing security considerations. These releases also laid the foundation for defining how non-terrestrial platforms interact with the Traffic Management system, enabling coordinated and secure non-terrestrial platform operations within the network. As next-generation use cases evolve, 3GPP's Rel-18 is set to introduce dedicated next-generation spectrum support explicitly tailored for devices operating onboard aerial vehicles. This development will involve exploring additional triggers for conditional handover, using BS uptilting techniques to improve communication, and implementing signaling mechanisms to indicate non-terrestrial platform beamforming capabilities, among other enhancements [53].

C. Motivation

In recent years, DRL has emerged as a promising approach for addressing resource allocation and cache design challenges in satellite scenarios. It has been successfully applied to optimize resource allocation for throughput and bandwidth in hybrid terrestrial-to-satellite communication networks [54] and to allocate resources in multi-beam satellite communication systems [43]. DRL has also been utilized in cognitive satellite scenarios for multi-objective optimization [55] and task scheduling [56]. The use of DRL has proven beneficial in cache design as well. Actor-critic frameworks have been employed for edge caching scenarios [56], and Q-learning and DQN with value function approximation have been applied to address joint optimization problems for base station and user caching [57]. DRL-based algorithms have also been utilized for resource distribution and cache placement. For example, Zhang et al. proposed a DRL algorithm in [58] to simultaneously optimize user association, NOMA power allocation, UAV deployment, and UAV cache placement to minimize content delivery time. Additionally, a Q-learning-based algorithm was presented in [59] for resource allocation and cache placement.

However, it is important to note that existing single-agent DRL algorithms have limitations when dealing with a large number of agents in dynamic and unpredictable environments. Moreover, as the number of agents increases, the optimal allocation of resources and cache design become increasingly complex. Despite this complexity, this particular topic has not been thoroughly investigated. Building upon the preliminary findings presented in [60], this study aims to explore the issue in greater depth using three distinct approaches.

• New Cache Architecture: The study proposes a novel cache architecture specifically designed for hybrid satellite-based networks. This new architecture aims to optimize cache utilization and enhance overall network performance.

• Agent-Based Modelling: The research employs an agent-based modelling approach to represent users, base stations, and satellites within the network. By using this framework, the study investigates optimal resource allocation strategies and cache design for improved network performance.

• Simulation and Evaluation: The proposed methods are evaluated through simulations from various perspectives. The simulation results provide insights into the effectiveness and efficiency of the new cache architecture and the agent-based modelling approach.

By combining these three approaches, the study aims to advance our understanding of optimal resource allocation and cache design in multi-agent-enabled reinforcement learning-based integrated terrestrial-satellite networks. The results obtained from the simulations will contribute to the validation and evaluation of the proposed methods. In contrast to widely used approaches such as Deep Deterministic Policy Gradient (DDPG), random policy, Genetic Algorithm (GA), and Proximal Policy Optimization (PPO), our proposed approach incorporates three key differentiating factors:

• Multi-Agent Framework: Our approach employs a multi-agent framework, which is specifically tailored to address the challenges of resource allocation and cache design in integrated terrestrial-satellite networks. This framework enables the modeling of multiple interacting agents, including users, base stations, and satellites, allowing for a more realistic representation of the network dynamics and interactions.

• New Cache Architecture: The proposed optimization scheme introduces a new cache architecture for hybrid terrestrial-satellite networks. The objective of this cache architecture is to optimize cache utilization and improve the network performance. By adopting this cache design, the proposed optimization scheme tackles the unique cache-related challenges in integrated terrestrial-satellite networks.

• Simulation and Evaluation: We run extensive simulations and evaluate the system performance from various perspectives. These results let us assess in more detail both the system performance and the effectiveness of our proposed model. Moreover, they enable us to evaluate and refine the resource management strategies and cache design, ensuring their practicality in real-world scenarios.

Combining all these factors creates a comprehensive and specialized solution that advances the understanding and development of optimal resource allocation and cache management strategies in integrated terrestrial-satellite networks. By integrating the multi-agent framework, the novel cache architecture design, and the numerical assessment, the proposed scheme provides practical and efficient solutions that address the specific challenges and requirements of these networks. Table I provides a comparison of the most closely related works with the proposed optimization framework.

D. Contributions

As perceived from the above discussion, the challenges associated with efficient cache design and optimal resource allocation in integrated terrestrial-satellite communication networks are significant. To overcome these challenges, we introduce a two-stage approach to optimal resource allocation and cache design based on the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. In the first stage, users are regarded as agents and use an MADDPG-based gradient approach to optimize the allocation of resources, taking into account both user association and transmission power control. The second stage introduces a cache design plan in which the agents are the base stations and satellites, whose objective is to enhance energy efficiency. The simulation results demonstrate the effectiveness of the proposed multi-agent deep reinforcement learning (DRL) algorithm in addressing the optimization problem. This study has the potential to significantly enhance the performance of integrated terrestrial and satellite networks while also laying a solid foundation for future research in this area. This paper's main contributions are as follows:

• To provide users with NOMA services, we present a framework for integrated terrestrial and satellite communication networks that uses BSs and satellites. To improve energy efficiency and cut latency, our proposed framework uses a dedicated cache design with cache equipment at both BSs and satellites.

• We then formulate a joint optimization problem with the objective of maximizing energy efficiency by jointly optimizing caching placement at the BSs and satellites, user association, and transmission power control.

• Following that, the resource allocation and cache design aspects of the optimization problem are addressed in two steps. To accomplish this, we employ the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) multi-agent deep reinforcement learning algorithm, which permits users, base stations, and satellites to act as agents and optimize resource allocation and caching placement. On the basis of MADDPG, a novel power control and user association scheme is introduced.

• We propose a cache design strategy that relies on the MADDPG technique. This plan enables both BSs and satellites to select files from a file library and subsequently store them in their respective local cache pools. The primary benefit of this approach is that it significantly enhances the system's energy efficiency.

• Finally, we compare our proposed optimization framework with benchmark algorithms to assess its effectiveness. The experimental results show that our approach performs better than the other algorithms in terms of energy efficiency, user satisfaction, and throughput.

II. SYSTEM MODEL

Motivated by the concept of integrated terrestrial and non-terrestrial networks in [38]–[42], [61]¹, our study considers an integrated terrestrial-satellite network providing joint services to ground users, as shown in Figure 1. The network comprises a set of $M$ base stations (BSs) on the ground, denoted by $\mathcal{B}$, and $K$ low-orbit satellites, denoted by $\mathcal{K}$. The users are divided into two groups: $N_b$ users are served by ground BSs, while the remaining $N_s$ users are served by satellites.

To mitigate user interference, we utilize a Non-Orthogonal Multiple Access (NOMA) scheme for BS-connected users [62]. This approach employs superposition coding at the transmitter and successive interference cancellation (SIC) at the receiver, allowing for sequential detection, demodulation, and interference cancellation. Users associated with a single BS are clustered into NOMA groups, where decoding priority is given to those with superior channel conditions, reducing interference from the users with higher path loss.

¹ Please note that integrated terrestrial and non-terrestrial networks are an emerging research area in academia and industry. Terrestrial networks are well established and deployed, whereas non-terrestrial networks are expected to be deployed by the end of 2030. However, integrating both networks involves several challenges that need to be tackled.


Fig. 1: System Model

Moreover, user association is a critical element of the proposed integrated terrestrial and satellite communication network: it enables users to connect to either a base station or a satellite during each time slot $t$. To represent these connections, the binary variables $\alpha_n^m(t)$ and $\alpha_n^k(t)$ indicate whether the $n$th user is connected to the $m$th base station or the $k$th satellite at time $t$. These binary variables play a crucial role in optimizing network performance, as they directly impact the quality of service for users and the overall network capacity. Optimization objectives such as minimizing interference, maximizing throughput, or balancing network load can be pursued by adjusting these variables.

Furthermore, the SINR of the $n$th user connected to the $m$th BS at time $t$ is calculated using the implemented NOMA scheme [63]:

$$\gamma_n^m(t) = \frac{\alpha_n^m(t)\,|g_n^m(t)|^2\,p_n(t)}{I_{n'}^m(t) + I_{n'}^{m'}(t) + I_k(t) + N_o}, \tag{1}$$

where each user has a transmit power $p_n(t) = \beta_n(t)\,(p_b^{\max}/N_b^1)$, which depends on the power control factor $\beta_n(t)$, the maximum power available at the base station $p_b^{\max}$, and the number of users a single base station can serve, $N_b^1$. The channel between the $n$th user and the $m$th BS is represented by $g_n^m(t) = \sqrt{\hat{g}_{n,m}\,d_{n,m}^{-\xi}}$, where $\hat{g}_{n,m}$ denotes the Rayleigh fading coefficient, $d_{n,m}$ is the distance, and $\xi$ represents the path-loss exponent [64]. In practice, the distance $d_{n,m}$ can be calculated using the associated localization method, which involves cooperative positioning among multiple aircraft in cellular networks, as discussed by the authors in [65]–[67]. Interference in the network is caused by users at the same base station, users at other base stations, and users served by the satellites; these contributions are represented by $I_{n'}^m$, $I_{n'}^{m'}$, and $I_k$, respectively, while $N_o$ represents the noise spectral density. The interference caused by users at the same base station is calculated from the channel gains of those users and is given by $I_{n'}^m(t) = \sum_{n' \neq n} \alpha_{n'}^m(t)\,|g_{n'}^m(t)|^2\,p_{n'}(t)$. The interference caused by the $n'$th users of the $m'$th base stations is determined by summing over all active users at the other base stations, $I_{n'}^{m'}(t) = \sum_{m' \neq m} \sum_{n' \neq n}^{N_{m'}} \alpha_{n'}^{m'}(t)\,|g_{n'}^{m'}(t)|^2\,p_{n'}(t)$.

Moreover, the interference caused by the users connected to the satellites is determined by the channel gain between the user and the base station, $I_k(t) = \sum_{k=1}^{K} \sum_{n=1}^{N_s} \alpha_n^k(t)\,|h_n^m(t)|^2\,p_n^k(t)$, where $p_n^k(t)$ denotes the transmission power of the $n$th satellite user.
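The structure of (1) can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions: the function names, argument layout, and numerical values are hypothetical and are not taken from the simulation code of this study.

```python
def transmit_power(beta, p_max, users_per_station):
    """Transmit power p_n(t) = beta_n(t) * p_max / N_b^1 (hypothetical helper)."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("power control factor must lie in [0, 1]")
    return beta * p_max / users_per_station


def intra_cell_interference(user, alphas, gains, powers):
    """Sum of alpha * |g|^2 * p over every other user n' != n of the same BS."""
    return sum(
        a * abs(g) ** 2 * p
        for idx, (a, g, p) in enumerate(zip(alphas, gains, powers))
        if idx != user
    )


def bs_user_sinr(alpha, gain, power, intra, inter, satellite, noise):
    """SINR with the structure of Eq. (1): signal over interference plus noise."""
    signal = alpha * abs(gain) ** 2 * power
    return signal / (intra + inter + satellite + noise)


# Illustrative values only (assumed, not from the paper).
p0 = transmit_power(beta=0.5, p_max=2.0, users_per_station=4)
intra = intra_cell_interference(0, [1, 1, 0], [1.0, 0.5, 2.0], [p0, 0.4, 1.0])
sinr = bs_user_sinr(1, 1.0, p0, intra, inter=0.05, satellite=0.05, noise=0.05)
```

Users whose association indicator is zero contribute nothing to the interference sum, which mirrors the role of $\alpha_{n'}^m(t)$ in the expressions above.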

Hence, the SINR of the $n$th satellite user can be expressed as [68]:

$$\gamma_n^k(t) = \frac{\alpha_n^k(t)\,|h_n^k(t)|^2\,p_n^k(t)}{I_n^m(t) + I_{n'}^k(t) + I_{n'}^{k'}(t) + N_o}, \tag{2}$$

where $h_n^k(t)$ represents the block-faded channel between user $n$ and its associated $k$th satellite, such that $h_n^k(t) = \hat{h}_{n,k}\,e^{j\pi\vartheta}$, where $\hat{h}_{n,k}$ denotes the complex-valued channel coefficient, $\vartheta$ is the Doppler shift, and $j = \sqrt{-1}$. Moreover, the transmission power of the $n$th satellite user, $p_n^k(t) = \beta_n(t)\,(p_k^{\max}/N_s^2)$, is determined by the power control factor $\beta_n(t)$ and the maximum available power $p_k^{\max}$; each satellite can serve up to $N_s^2$ users. Interference in the network is caused by users at the base stations, other users of the same satellite, and users of other satellites, denoted by $I_n^m(t)$, $I_{n'}^k(t)$, and $I_{n'}^{k'}(t)$, respectively. The interference from users associated with the BSs can be calculated as $I_n^m(t) = \sum_{m=1}^{M} \sum_{n=1}^{N_1} \alpha_n^m(t)\,|g_n^k(t)|^2\,p_n^m(t)$.


Similarly, the interference from other users of the same satellite and from users of other satellites can be calculated as $I_{n'}^k(t) = \sum_{n' \neq n}^{N_2} \alpha_{n'}^k(t)\,|h_{n'}^k(t)|^2\,p_{n'}^k(t)$ and $I_{n'}^{k'}(t) = \sum_{k' \neq k} \sum_{n' \neq n} \alpha_{n'}^{k'}(t)\,|h_{n'}^k(t)|^2\,p_{n'}^k(t)$. Here, $n$ is the index of the current satellite user, $M$ is the total number of base stations in the network, and $K$ is the total number of satellites in the network. Assuming large- and small-scale fading [69], the complex-valued channel coefficient can be defined as:

$$\hat{h}_{n,k} = \sqrt{G_k\,G_n^k \left(\frac{c}{4\pi f_c d_n^k}\right)^2}, \tag{3}$$

where $c$ is the speed of light, $f_c$ is the carrier frequency, $d_n^k$ is the distance from the satellite, $G_n^k$ is the antenna gain at the receiver, and $G_k$ is the antenna gain at the satellite. The satellite antenna gain $G_k$ generally depends on the radiation pattern and the ground terminal location. It can be written as:

$$G_k = G_{\max} \left[\frac{J_1(\Lambda_n^k)}{2\Lambda_n^k} + 36\,\frac{J_3(\Lambda_n^k)}{(\Lambda_n^k)^3}\right]^2, \tag{4}$$

where $G_{\max}$ is the maximum gain of the satellite beam and $\Lambda_n^k = 2.07123\,\sin(\theta_n^k)/\sin(\theta_{3\mathrm{dB}})$, such that $\theta_n^k$ is the angle between the ground terminal and the satellite for any given location and $\theta_{3\mathrm{dB}}$ is the 3 dB angle of the satellite beam. Further, $J_1$ and $J_3$ represent the Bessel functions of the first kind of orders one and three, respectively.
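To make the beam-gain model concrete, the following Python sketch evaluates the Bessel-based gain pattern and the free-space channel coefficient using only the standard library. It is a minimal sketch under assumptions: the helper names are hypothetical, and the Bessel functions are approximated numerically through Bessel's integral rather than taken from a numerical library.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def bessel_j(order, x, steps=2000):
    """Bessel function of the first kind via Bessel's integral
    J_n(x) = (1/pi) * integral_0^pi cos(n*tau - x*sin(tau)) d tau (trapezoid rule)."""
    h = math.pi / steps
    total = 0.5 * (1.0 + math.cos(order * math.pi - x * math.sin(math.pi)))
    for i in range(1, steps):
        tau = i * h
        total += math.cos(order * tau - x * math.sin(tau))
    return total * h / math.pi


def beam_gain(theta, theta_3db, g_max):
    """Satellite antenna gain for off-axis angle theta (radians)."""
    lam = 2.07123 * math.sin(theta) / math.sin(theta_3db)
    if abs(lam) < 1e-9:
        return g_max  # limit of the bracket as lam -> 0 is 1/4 + 3/4 = 1
    bracket = bessel_j(1, lam) / (2.0 * lam) + 36.0 * bessel_j(3, lam) / lam ** 3
    return g_max * bracket ** 2


def channel_coefficient(g_sat, g_rx, carrier_hz, distance_m):
    """Free-space coefficient sqrt(G_k * G_n^k) * c / (4 * pi * f_c * d)."""
    return math.sqrt(g_sat * g_rx) * SPEED_OF_LIGHT / (4.0 * math.pi * carrier_hz * distance_m)
```

With this pattern, the gain at the beam center equals $G_{\max}$ and falls to roughly half of $G_{\max}$ at the 3 dB angle, which is the usual sanity check for this model.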

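Following that, the energy efficiency of the $n$th user at time slot $t$ can be expressed as follows:

$$\Psi_n(t) = \sum_{m=1}^{M} \frac{\alpha_n^m(t)\,R_n^m(t)}{p_n(t)} + \sum_{k=1}^{K} \frac{\alpha_n^k(t)\,R_n^k(t)}{p_{s,m}(t)}, \quad \forall n, l. \tag{5}$$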

In (5), $R_n^m(t) = \log_2(1 + \gamma_n^m(t))$ and $R_n^k(t) = \log_2(1 + \gamma_n^k(t))$ represent the achievable rates. The proposed system model also facilitates file retrieval for network users by utilizing cache pools at the base stations and satellites. Users request files from a file library $\mathcal{U} = \{1, \ldots, U\}$, and the cache pool size is fixed and based on the number of files and their size. Specifically, each base station and each satellite has a cache pool of size $M_u < U$ and $M_s < U$, respectively, so that a base station can store $M_u \times s$ bits of files and a satellite can store $M_s \times s$ bits of files.

When the $n$th user requests a file, the system first checks whether the requested file is available in the local cache pool of the base station. Let $L_m$ denote the set of files cached at base station $m$. If the requested file is in $L_m$, it is transmitted to the user, and the power consumed during this process is represented by $p_{m,r}(t)$. If the file is not available in $L_m$, the user obtains it from the core network, which incurs a power consumption of $p_{l,r}(t)$.

Similarly, let $L_k$ denote the set of files cached at satellite $k$. If the requested file is available in the satellite cache pool, the user retrieves it directly, and the power consumed during this process is represented by $p_{k,r}(t)$. If the requested file is unavailable in $L_k$, the user's request is forwarded to the ground gateway and the file is accessed from the core network, resulting in a power consumption of $p_{l,r}(t)$.

The caching system provides beneﬁts in terms of reducing

time delays and alleviating power consumption by prioritizing

locally cached ﬁles over ﬁles that need to be retrieved from

the core network. The caching gain is dependent on whether

the requested ﬁle is found in the cache pool (local) or not.

The indicator $J_n(t)$ specifies whether the local cache device satisfied the file request of the $n$th BS user at time $t$:

$$J_n(t) = \begin{cases} 1, & \text{request satisfied}, \\ 0, & \text{otherwise}. \end{cases} \tag{6}$$

Files in the system are assumed to follow the Zipf distribution, and their popularity affects the caching performance. A generalized Zipf distribution is used, with the parameter $\varepsilon$ ranging from 0.56 to 0.83 [56]:

$$y_u = \frac{1/u^{\varepsilon}}{\sum_{v=1}^{U} (v^{\varepsilon})^{-1}}, \quad \forall u. \tag{7}$$
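As an illustration of the Zipf popularity model in (7), the sketch below computes the request probability of every file in a library and picks the most popular files for a cache. The function names are hypothetical; the library size of 40 and cache capacity of 3 used in the example match the values reported in the experimental setup, and the exponent 0.56 is one end of the range given above.

```python
def zipf_popularity(num_files, epsilon):
    """Request probability of each file rank u = 1..U under a Zipf law."""
    weights = [1.0 / (u ** epsilon) for u in range(1, num_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def most_popular_files(popularity, cache_size):
    """Return the 1-based ids of the cache_size most popular files."""
    ranked = sorted(range(len(popularity)), key=lambda i: popularity[i], reverse=True)
    return [i + 1 for i in ranked[:cache_size]]


popularity = zipf_popularity(num_files=40, epsilon=0.56)
cached = most_popular_files(popularity, cache_size=3)
```

A static most-popular policy like this is only a baseline for comparison; in the proposed scheme the cache contents are chosen by the learning agents.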

The reward for cache deployment, obtained through the reduction in time delay, can be expressed as follows:

$$x_n(t) = J_n(t)\,C_n^s\,T_n^{-1}, \tag{8}$$

where $T_n$ is the time delay associated with downloading the requested content for user $n$ via the back-haul link, $s$ denotes the size of a cached file, and $C_n$ represents the content requested by the $n$th user. When the requested content is available in the local cache, it can be obtained directly from there, reducing the time delay.

Likewise, the satellite cache reduces latency by transmitting cached data directly, thereby improving overall performance; the corresponding reward can be expressed as:

$$x_n^k(t) = J_m(t)\,C_n^k\,(T_m^k)^{-1}, \tag{9}$$

where $T_m^k$ is the time required for the $n$th user to download the requested content through the back-haul link.

The cache policy’s effectiveness is evaluated by examining

the hit rate for the cache, which represents the proportion of

requests from users that are successfully fulﬁlled. To calculate

the cache hit rate for a given duration of time t, the following

formula can be used:

$$\Omega(t) = \frac{\sum_{n=1}^{N} J_n(t)}{N}. \tag{10}$$
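The hit-rate computation in (10) can be sketched in a few lines of Python. The helper names and the example request pattern are assumptions made for illustration only.

```python
def cache_hit_indicators(requests, cached_files):
    """J_n(t) for each user: 1 if the requested file is in the local cache, else 0."""
    cached = set(cached_files)
    return [1 if file_id in cached else 0 for file_id in requests]


def cache_hit_rate(indicators):
    """Fraction of requests served from the local cache, as in Eq. (10)."""
    if not indicators:
        return 0.0
    return sum(indicators) / len(indicators)


hits = cache_hit_indicators(requests=[1, 5, 2, 1], cached_files=[1, 2, 3])
rate = cache_hit_rate(hits)
```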

We use $P(t)$ to denote the total amount of power used by the BS users at time $t$:

$$P(t) = p_n(t) + J'_n(t)\,p_{l,r}(t) + J_n(t)\,p_{n,r}(t), \tag{11}$$

where $J'_n(t) = 1 - J_n(t)$. In our system model, the power consumption of BS and satellite users is expressed by $P(t)$ and $P_k(t)$, respectively. The power consumption of a BS user includes both the transmission power of user $n$, denoted by $p_n(t)$, and the power consumed for data retrieval. The latter is subdivided into two parts: $p_{n,r}(t)$ for data retrieval from the BS cache and $p_{l,r}(t)$ for retrieval from the core network via the back-haul link. For users connected to a satellite, $P_k(t)$ represents the total power consumption, including the power consumed for transmission and data retrieval:

$$P_k(t) = p_{k,n}(t) + (1 - J_n(t))\,p_{k,l,r}(t) + J_n(t)\,p_{k,n,r}(t). \tag{12}$$

Several factors determine the power consumption of a user in the satellite network, for example the transmit power $p_n^k(t)$ required for user $n$ to communicate with the satellite and the power consumed during data retrieval from the satellite's cache, $p_{k,n,r}(t)$, or from the core network via the gateway station, $p_{k,l,r}(t)$. Combining the BS cache with the satellite cache allows us to estimate the energy efficiency of the $n$th user at time $t$. Energy efficiency is a crucial metric for assessing how well the network performs and how to improve its design so that users obtain the greatest benefit at the lowest power. It can be expressed as follows:

$$\Psi_n(t) = \sum_{m=1}^{M} \frac{\alpha_n^m(t)\,R_n^m(t)}{p_n(t) + J'_n(t)\,p_{l,r}(t) + J_n(t)\,p_{n,r}(t)} + \sum_{k=1}^{K} \frac{\alpha_n^k(t)\,R_n^k(t)}{P_k(t)}, \tag{13}$$

where $J'_n(t) = 1 - J_n(t)$.
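The interplay between the cache indicator and the energy efficiency in (11)–(13) can be illustrated with a small Python sketch for a single associated user. The function names and numerical values are hypothetical and serve only to show why a cache hit raises the ratio of rate to power.

```python
import math


def total_power(p_tx, hit, p_cache, p_core):
    """Transmit power plus retrieval power: cache on a hit, core network on a miss."""
    return p_tx + (1 - hit) * p_core + hit * p_cache


def energy_efficiency(sinr, p_tx, hit, p_cache, p_core):
    """Achievable rate log2(1 + SINR) divided by the total consumed power."""
    rate = math.log2(1.0 + sinr)
    return rate / total_power(p_tx, hit, p_cache, p_core)


# Same link quality, with and without a cache hit (illustrative values).
ee_hit = energy_efficiency(sinr=3.0, p_tx=1.0, hit=1, p_cache=0.5, p_core=2.0)
ee_miss = energy_efficiency(sinr=3.0, p_tx=1.0, hit=0, p_cache=0.5, p_core=2.0)
```

Whenever retrieval from the cache costs less power than retrieval from the core network, the hit case yields the larger energy efficiency for the same SINR.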

III. PROBLEM FORMULATION

This work aims to maximize the system’s overall energy

efﬁciency by optimal allocation of resources, e.g., transmission

power, user association matrix, and cache layout. Furthermore,

it seeks to minimize the system’s overall energy consumption

while maintaining high performance and quality of service.

The ultimate goal is to achieve an optimal balance between

energy efﬁciency and system performance by ﬁnding the most

efﬁcient way to allocate resources while meeting operational

requirements. The associated constraint represents the restric-

tion that each user is limited to a single BS or satellite during

a speciﬁc period and can be expressed as follows:

$$\sum_{m=1}^{M} \alpha_n^m(t) + \sum_{k=1}^{K} \alpha_n^k(t) \leq 1, \quad \forall n. \tag{14}$$

A maximum power constraint exists for each user associated

with a BS or a satellite. The transmission power limit for users

associated with a BS is given by:

$$p_n(t) \leq p_b^{\max}, \quad \forall n. \tag{15}$$

The transmission power limit for users associated with a

satellite is given by:

$$p_n^k(t) \leq p_k^{\max}, \quad \forall n. \tag{16}$$

It is important to remember that the QoS limitations of each

BS and satellite limit the maximum number of users they can

accommodate. The maximum number of users for a speciﬁc

BS is:

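$$\sum_{n=1}^{N} \alpha_n^m(t) \leq N_b^1, \quad \forall m. \tag{17}$$

The corresponding quality-of-service constraint for a satellite is:

$$\sum_{n=1}^{N} \alpha_n^k(t) \leq N_s^2, \quad \forall k. \tag{18}$$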

The following constraint restricts each user's power control factor to the interval from 0 to 1:

$$\beta_n(t) \in [0, 1], \quad \forall n. \tag{19}$$

The caching strategy of base stations and satellites is

constrained by the capacity of their respective local caches.

In addition, the size of user content requests is smaller than

the available local storage capacity, which is still insufﬁcient to

accommodate the total size of all ﬁle libraries. The constraint

that describes the limitation on the local cache capacity for

BS and satellite is as follows:

$$C_n \leq M_f \leq U, \quad C_n \leq M_s \leq U. \tag{20}$$

Based on the aforementioned objective and constraints, the

optimization problem can be expressed mathematically. One

possible formulation is as follows:

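$$\begin{aligned}
\max \quad & \sum_{m=1}^{M} \alpha_n^m(t)\,\eta_n^m + \sum_{k=1}^{K} \alpha_n^k(t)\,\psi_n^k && (21) \\
\text{s.t.} \quad & \sum_{m=1}^{M} \alpha_n^m(t) + \sum_{k=1}^{K} \alpha_n^k(t) \leq 1, \quad \forall n, && (21\text{a}) \\
& \sum_{n=1}^{N} \alpha_n^m(t) \leq N_b^1, \quad \forall m, && (21\text{b}) \\
& \sum_{n=1}^{N} \alpha_n^k(t) \leq N_s^2, \quad \forall k, && (21\text{c}) \\
& p_n(t) \leq p_b^{\max}, \quad \forall n, && (21\text{d}) \\
& p_n^k(t) \leq p_k^{\max}, \quad \forall n, && (21\text{e}) \\
& \beta_n(t) \in [0, 1], \quad \forall n, && (21\text{f}) \\
& C_n \leq M_f \leq U, && (21\text{g}) \\
& C_n \leq M_s \leq U, && (21\text{h})
\end{aligned}$$

where $\eta_n^m = \dfrac{R_n^m(t)}{p_n(t) + J'_n(t)\,p_{l,r}(t) + J_n(t)\,p_{n,r}(t)}$ and $\psi_n^k = \dfrac{R_n^k(t)}{P_k(t)}$.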

The optimization problem mentioned above appears to be a

mixed-integer nonlinear optimization problem (MINLP). This

is because it involves both integer and continuous variables and

nonlinear constraints such as power constraints in constraints

(15) and (16). Additionally, the objective function is also

nonlinear. MINLPs are known to be challenging to solve as

they combine the computational difﬁculties of both nonlinear

and integer optimization problems.
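The mixed nature of the problem, with binary association variables and continuous power variables, can be seen in a simple feasibility check for constraints (14)–(19). The sketch below is an illustration under assumptions: the data layout (one association row per user) and the function name are hypothetical, and the cache constraints are left out for brevity.

```python
def is_feasible(bs_assoc, sat_assoc, bs_power, sat_power, beta,
                p_bs_max, p_sat_max, bs_capacity, sat_capacity):
    """Check a candidate solution; bs_assoc[n][m] and sat_assoc[n][k] are 0 or 1."""
    if not bs_assoc:
        return True
    for n in range(len(bs_assoc)):
        # Each user attaches to at most one BS or satellite, Eq. (14).
        if sum(bs_assoc[n]) + sum(sat_assoc[n]) > 1:
            return False
        # Transmission power limits, Eqs. (15) and (16).
        if bs_power[n] > p_bs_max or sat_power[n] > p_sat_max:
            return False
        # Power control factor in [0, 1], Eq. (19).
        if not 0.0 <= beta[n] <= 1.0:
            return False
    # Per-BS and per-satellite user limits, Eqs. (17) and (18).
    for m in range(len(bs_assoc[0])):
        if sum(row[m] for row in bs_assoc) > bs_capacity:
            return False
    for k in range(len(sat_assoc[0])):
        if sum(row[k] for row in sat_assoc) > sat_capacity:
            return False
    return True
```

Enumerating all binary association matrices and checking them in this way grows exponentially with the number of users, which is one reason a learning-based approach is attractive here.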


IV. OPTIMIZING TERRESTRIAL-SATELLITE NETWORK EFFICIENCY WITH MULTI-AGENT DRL (MADDPG)

This section outlines a MADDPG approach for enhanc-

ing the integrated terrestrial-satellite NOMA communication

network. The main aim of this work is to maximize the

objective function value by determining the optimal allocation

of resources, e.g., transmission power control, the design of

the cache, and the allocation of users. Therefore, to achieve

this, we suggest two MADDPG algorithms that concentrate

on different subproblems. To ensure peak performance, both

algorithms carefully choose the agents.

A. Reinforcement Learning

The objective of reinforcement learning (RL), a type of

machine learning, is to teach an agent how to interact with

the environment to maximize a cumulative reward signal. RL

does not always need a dataset to learn from, unlike supervised

learning, which necessitates a labeled dataset. Instead, an RL

agent can learn by interacting with its surroundings, getting

feedback through rewards or penalties, and then changing its

behavior.

The agent participates in the RL process by acting in the

world and receiving feedback as a reward signal. In order

to raise its expected cumulative reward, the agent modiﬁes

its policy, a function that links states to actions. Modifying

the parameters of the decision-making model is part of the

learning process, which involves updating the policy.

The agent continuously reﬁnes its behavior through trial and

error, one of RL’s advantages. The agent experiments with

various actions in the environment, evaluates the rewards that

result, and then modiﬁes its strategy to carry out more actions

that produce greater rewards. This cycle repeats until the agent

discovers a course of action that maximizes its cumulative

reward.

B. Enhancing NOMA Networks with MADDPG

The integrated terrestrial-satellite NOMA communication network comprises multiple agents, which makes it a complex multi-agent scenario. In such a scenario, the suggested MADDPG algorithm is well suited because of its flexibility in handling many agents, whereas traditional single-agent reinforcement learning methods may overfit to the behavior of competing agents in dynamic and unstable environments. To maximize the objective function value in an integrated terrestrial and satellite NOMA communication network, the problem is modeled as a Markov decision process (MDP), for which the state space ($S$), the action space ($A$), the reward space, and the transition probability space must all be defined. An effective solution can only be found if the problem is modeled accurately. In this configuration, each user acts as an agent by monitoring its immediate environment, selecting appropriate actions from the available action space, and carrying out those actions to satisfy the configuration's requirements. When a user completes all of its tasks, it is rewarded. The agents, actions, states, and rewards are defined separately for each of the two proposed algorithms.

1) Multi-Agent Reinforcement Learning for User Associa-

tion and Power Control in NOMA Networks: An integrated

terrestrial-satellite network’s energy efﬁciency optimization

problem can be resolved using the MADDPG algorithm,

detailed in Algorithm 1. These key concepts, which include

agents, actions, states, and rewards in this algorithm, are

deﬁned below:

Agent: Every user in the integrated terrestrial and satellite-enabled NOMA communication network is treated as an agent.

Action: In the above-mentioned system design, every agent is assigned two tasks specified by the action space $A_1 = \{A_{11}, A_{12}\}$. The first task, $A_{11}$, is the user association process, which establishes the relationship between agents and the base stations (BSs) or satellites. It is represented by a vector $A_{11} = \{a_1^n(t), \ldots, a_M^n(t)\}$, where each entry corresponds to the association decision of a particular agent. Because the association decision is discrete, the action space $A_{11}$ must be discretized. The second task, represented by a vector $A_{12} = \{\alpha_1(t), \ldots, \alpha_M(t)\}$, is the power control factor that each agent uses to determine its transmission power. In summary, each agent first selects the appropriate BS or satellite for association using the discretized user association action $A_{11}$ and subsequently determines its transmission power using the power control action $A_{12}$.

Reward: Each user aims to optimize its energy efficiency by taking appropriate actions. To evaluate the effectiveness of these actions, we define the reward for the $n$-th user at the current time slot $t$ as follows:

$$R_1(t) = \Psi_n(t). \tag{22}$$

State: The state of each agent is determined by its observation of energy efficiency. Specifically, the state of user $n$ at time $t$ is obtained by comparing its energy efficiency with that of the previous time slot; if there is an improvement, then $S_{1i}^{\Psi_n} = 1$. The state space of the entire system is defined as $S_1 = \{S_{11}^{\Psi_n}(t), \ldots, S_{1N}^{\Psi_n}(t)\}$, where $S_{1i}^{\Psi_n}$ represents the state of user $n$ in time slot $t$:

$$S_{1i}^{\Psi_n} = \begin{cases} 1, & \text{if } R_1(t) \geq R_1(t-1), \\ 0, & \text{otherwise}, \end{cases} \tag{23}$$

where $i$ is the index of the $n$-th user.
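The reward and state definitions in (22) and (23) translate directly into code. The sketch below is a minimal illustration; the function names and sample values are assumptions.

```python
def user_reward(energy_efficiency):
    """Reward of a user agent: its own energy efficiency, as in Eq. (22)."""
    return energy_efficiency


def user_state(reward_now, reward_prev):
    """Binary state of Eq. (23): 1 if the reward did not decrease, else 0."""
    return 1 if reward_now >= reward_prev else 0


def system_state(rewards_now, rewards_prev):
    """State vector S_1 collecting the binary state of every user."""
    return [user_state(now, prev) for now, prev in zip(rewards_now, rewards_prev)]


state = system_state(rewards_now=[1.0, 0.5, 2.0], rewards_prev=[0.9, 0.7, 2.0])
```

Because each entry only records whether the energy efficiency improved, the state is compact, at the cost of discarding the size of the improvement.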

2) MADDPG-based Cache Optimization: In order to allo-

cate the resource optimally in the integrated terrestrial and

satellite NOMA communication network, we employ Algo-

rithm 1. Once this optimization is complete, we proceed with

the optimization for the design of the cache of the BS as well

as for the satellites using Algorithm 2. Due to the differences

in the optimization goals, the actions, agents, rewards, and

states in Algorithm 2 may differ slightly from those used in

Algorithm 1 and can be stated as follows:

Agent: Algorithm 2 treats BSs or satellites as agents decid-

ing which ﬁles to retrieve from the library.

Action: In each time slot, every satellite or BS determines which files from the file library to cache. This set of files, denoted as $A_2 = \{A_{21}\}$, constitutes the local cache pool.

Reward: The system's goal is to maximize its energy efficiency, which is achieved by having each base station (BS) or satellite execute a set of actions. The number of agents in the system totals $M + K$. The reward for the actions of the $n$-th base station or satellite is the overall energy efficiency of the users it serves. More specifically, the reward of the $n$-th agent in time slot $t$ is defined as:

$$R_2^{\Psi_n}(t) = \begin{cases} \sum_{n=1}^{N_b} \Psi_n, & n \in [1, M], \\ \sum_{n=1}^{N_s} \Psi_n, & n \in [M+1, M+K]. \end{cases} \tag{24}$$

State: In Algorithm 2, the satellite or base station (BS) is the agent responsible for optimizing the system's energy efficiency. Thus, $S_{2i}^{\Psi_n}(t)$ is set to 1 if the reward of the $n$-th BS or satellite in time slot $t$ is greater than that in time slot $t-1$. The system's state space is denoted by $S_2 = \{S_{21}^{\Psi_n}(t), \ldots, S_{2N}^{\Psi_n}(t)\}$, and the value of $S_{2i}^{\Psi_n}(t)$ is assigned as follows:

$$S_{2i}^{\Psi_n}(t) = \begin{cases} 1, & \text{if } R_2(t) > R_2(t-1), \\ 0, & \text{otherwise}. \end{cases} \tag{25}$$

To achieve greater stability, the MADDPG algorithm lets each agent use information about the actions of the other agents. The resulting state transition probability satisfies:

$$P(s' \mid s, a_1, \ldots, a_N, X_1, \ldots, X_M) = P(s' \mid s, a_1, \ldots, a_N) = P(s' \mid s, a_1, \ldots, a_N, X'_1, \ldots, X'_M). \tag{26}$$

A MADDPG algorithm for an integrated terrestrial-satellite NOMA communication network therefore maintains stability even when the agents' policies are dynamically updated. In (26), $a_i$ represents the action taken by agent $i$, $s$ represents the current state, and $X_i$ and $X'_i$ denote the policy of agent $i$ before and after an update. The network includes $M$ agents, each with a corresponding set of parameter values $w = \{\varpi_1, \ldots, \varpi_M\}$ that are optimized to maximize returns via the MADDPG algorithm.

The MADDPG algorithm uses reinforcement learning to

enable multiple agents to learn from their experiences and

improve policies cooperatively. Each agent updates its policy

based on observations and a centralized critic estimating the

expected return. The critic considers all agents’ experiences

and policies, facilitating collaboration and performance im-

provement.

The policies of all $M$ agents are represented by $\chi = \{\chi_1, \ldots, \chi_M\}$, and each agent optimizes its policy using its own set of parameter values $\varpi_i$. By optimizing their individual policies, the agents achieve better network performance and increase the overall system capacity. The following equation gives the gradient of the objective function, which determines how the policy of each agent should be updated:

$$\nabla_{\varpi_i} J(\chi_i) = \mathbb{E}_{x, a \sim D}\left[\nabla_{\varpi_i} \chi_i(a_i \mid o_i)\,\nabla_{a_i} Q_i^{\chi}(x, a_1, \ldots, a_N)\right], \tag{27}$$

where $o_i$ denotes the observation of agent $i$.

The MADDPG algorithm uses two distinct neural networks, the actor network and the critic network, to optimize the performance of all $M$ agents in the integrated terrestrial and satellite NOMA network. The observations and actions of the agents are denoted by $x$ and $a$, respectively, while the replay memory is represented by $D$. The actor network selects actions according to the policy, using a continuous action space to choose between $A_1$ and $A_2$. The critic network evaluates the actions to be executed by updating the $Q$ function, $Q_i^{\chi}(x, a_1, \ldots, a_N)$, which appears in (27). The policy network of the actor is updated by gradient descent based on (27) [70]. Meanwhile, the critic network updates the $Q$ function by minimizing the loss function $L(\omega_i)$, whose target value is

$$y = r_i + \gamma\,Q_i^{\mu'}(x', a'_1, \ldots, a'_N)\big|_{a'_j = \mu'_j(o_j)}, \tag{28}$$

where $r_i$ is the reward of agent $i$, $\gamma$ is the discount factor, and $\mu'_j$ is the target policy of agent $j$.

C. Algorithm Description

This section presents two algorithms based on the MADDPG algorithm [60]: Algorithm 1 for system resource allocation and Algorithm 2 for system cache design. Before the algorithms start, the neural network parameters and the replay memory are initialized. The actor network selects actions probabilistically, while the critic network evaluates the chosen actions; the actor then adjusts the probability of the selected action according to the critic's evaluation. In the iterative MADDPG procedure for the terrestrial-satellite network, each agent is assigned an initial state and then observes its new state at each step of an episode, with the aim of improving energy efficiency at each step. After executing an action, each agent receives a reward and transitions to a new state. Both the policy and exploration inform the agent's choice of action. Finally, the resulting transitions are stored in the replay memory for later replay.
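The episode procedure described above (observe, act, receive a reward, move to a new state, store the transition) can be sketched as follows. This is a simplified skeleton under assumptions, not Algorithm 1 or Algorithm 2 themselves: the class and function names are hypothetical, agents are plain callables, and the actor and critic updates are omitted.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size buffer of joint transitions; the oldest entries are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, states, actions, rewards, next_states):
        self.buffer.append((states, actions, rewards, next_states))

    def sample(self, batch_size):
        """Draw a random mini-batch, or everything if fewer transitions are stored."""
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)


def run_episode(agents, env_step, initial_states, memory, steps):
    """Run one episode: every agent acts on its own state at each step."""
    states = list(initial_states)
    for _ in range(steps):
        actions = [agent(state) for agent, state in zip(agents, states)]
        rewards, next_states = env_step(states, actions)
        memory.store(states, actions, rewards, next_states)
        states = list(next_states)
    return states
```

In a full implementation, a mini-batch drawn with `sample` would feed the critic and actor updates after each step.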

V. RESULTS AND DISCUSSION

In this study, we employ a simulation-based approach to evaluate the performance of our proposed model². The experimental process involves the development of a synthetic simulation environment that closely represents the characteristics and behaviours of integrated terrestrial and satellite NOMA communication networks. The simulation environment is designed to mimic the network infrastructure, including base stations, satellites, users, and the communication channels between them. Various factors, such as user mobility, channel conditions, interference, and caching mechanisms, are considered to create a realistic simulation environment. We use MATLAB-based software tools and frameworks designed for network simulations to implement the simulation environment. These tools provide capabilities for modelling and simulating the network components, implementing multi-agent deep reinforcement learning algorithms, and evaluating performance metrics. Furthermore, the experimental setup involves the deployment of our proposed multi-agent deep reinforcement learning model within the simulation environment. We carefully configure and tune the settings of the MADDPG algorithm, such as network topology, learning rates, exploration strategies, and reward functions, to ensure effective resource allocation and cache design. Our research primarily focuses on the algorithmic and methodological aspects of resource allocation and cache design in integrated terrestrial and satellite NOMA communication networks. Therefore, the experimental setup relies on a software-based simulation environment rather than physical or hybrid infrastructure.

² Given the unique and intricate nature of the network configuration under examination, it is not straightforward to compare it directly with previous studies on integrated terrestrial-satellite networks in the existing literature. Therefore, we compare the proposed scheme with different learning algorithms, including the deep deterministic policy gradient (DDPG) algorithm, the Random Policy algorithm, the Genetic Algorithm (GA), and the Proximal Policy Optimization (PPO) algorithm.

Fig. 2: Algorithm 1 Flow Chart

The experimental environment encompasses specific configurations, including a network with 36 agents, of which 6 agents represent base stations and 2 agents represent satellites. The channels of the base stations follow a Rayleigh distribution, while the satellite parameters align with the findings of previous studies. The file library, denoted as $U$, has a capacity of 40, with 3 cache devices allocated for the base stations and a satellite cache with a capacity of 3. The user count and file content size is set at [1 2] bits. Power consumption parameters are also specified, covering data retrieval power consumption for different scenarios. The cache design optimization of the proposed scheme is conducted using Algorithm 2. Our simulation model uses the Adam optimizer with the Rectified Linear Unit (ReLU) activation function. The learning rate is set to 0.001, the discount factor is 0.95, and the batch size is 10. The experiment is conducted over 1000 iterations, with each agent completing 100 steps per episode. While physical or hybrid infrastructure implementation considerations are significant in evaluating network performance, conducting large-scale physical experiments in integrated terrestrial and satellite networks can be complex, expensive, and challenging to scale. Hence, simulations provide a feasible and efficient approach to evaluate the performance of our proposed model and to conduct extensive experiments under different scenarios.

Fig. 3: Algorithm 2 Flow Chart

Fig. 4: Algorithm convergence with varied numbers of users (objective function value in bits/sec/Hz versus iteration, for N = 40, 32, and 24 users)
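For readers who wish to reproduce a comparable setup, the reported simulation settings can be collected in a single configuration object. The sketch below is an assumption about how such a configuration might be organized in Python; the class and field names are hypothetical, and the original study reports using MATLAB-based tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationConfig:
    """Settings quoted in the experimental setup; field names are illustrative."""
    num_base_stations: int = 6
    num_satellites: int = 2
    library_size: int = 40
    # The text mentions 3 cache devices for the BSs; treated here as a
    # capacity of 3 files (assumption).
    bs_cache_capacity: int = 3
    satellite_cache_capacity: int = 3
    learning_rate: float = 1e-3
    discount_factor: float = 0.95
    batch_size: int = 10
    iterations: int = 1000
    steps_per_episode: int = 100
    optimizer: str = "adam"
    activation: str = "relu"

    def total_steps(self) -> int:
        """Total number of environment steps over the whole experiment."""
        return self.iterations * self.steps_per_episode


config = SimulationConfig()
```

Freezing the dataclass keeps a run's settings immutable, so a sweep over learning rates creates a new configuration per run instead of mutating a shared one.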

We assessed the performance of the proposed scheme in optimizing the system's objective function by evaluating its convergence with varying numbers of agents. Figure 4 shows the convergence results of the proposed scheme with 24, 32, and 40 agents. To test the scheme's effectiveness in a more challenging environment, we increased the values of band N1 bto 5 while maintaining the network parameters at N = 24, M = 6, and K = 2 for the 24-agent scenario. The curves in Figure 4 show that the proposed scheme consistently reaches its maximum reward after approximately 700 iterations for all three agent counts, demonstrating strong convergence performance. The proposed scheme exhibits robust convergence behavior, even in complex terrestrial-satellite networks with multiple agents.

Fig. 5: Effects of learning rate on Algorithm convergence (objective function value in bits/sec/Hz versus iteration; curves for learning rates 10^-3, 10^-4, and 10^-5)

Fig. 6: BS vs. Satellite users: a comparison of objective function value convergence (objective function value in bits/sec/Hz versus iteration; curves for Base-Station and Satellite)
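A statement such as "the reward settles after roughly 700 iterations" can be checked numerically rather than read off a plot. The sketch below is one possible criterion, not the one used in the paper: it smooths the reward curve with a moving average and reports the first iteration after which the smoothed curve stays within a relative tolerance of its final value. The function names, the window, the tolerance, and the synthetic curve are all assumptions.

```python
def moving_average(values, window):
    """Trailing moving average; early entries average over what is available."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out


def convergence_iteration(rewards, window=50, tolerance=0.02):
    """First iteration after which the smoothed reward stays near its final value."""
    if not rewards:
        return None
    smoothed = moving_average(rewards, window)
    final = smoothed[-1]
    band = tolerance * abs(final)
    # Walk backwards to the last point that lies outside the tolerance band.
    for i in range(len(smoothed) - 1, -1, -1):
        if abs(smoothed[i] - final) > band:
            return i + 1
    return 0


# Synthetic curve (illustrative only): linear ramp for 700 iterations, then flat.
rewards = [200 + 800 * min(i, 700) / 700 for i in range(1000)]
print(convergence_iteration(rewards))
```

On real training logs the reward is noisy, so the window and tolerance need to be tuned to the noise level before the reported iteration is meaningful.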

Furthermore, we analyzed the impact of different learning rates on the convergence performance of the proposed scheme. This experiment used network parameters M = 32, N = 6, S = 2, and M1 = M2 = 4. As illustrated in Figure 5, the curves differ in convergence speed and relative height depending on the learning rate. Notably, a higher learning rate resulted in faster convergence. This observation emphasizes the efficacy of the proposed scheme in optimizing the system's objective function across a diverse range of agent densities and learning rates.

Fig. 7: Proposed scheme versus benchmark algorithms: a comparison (objective function value in bits/sec/Hz versus iteration; curves for the proposed scheme and Benchmarks 1 to 3)

Fig. 8: The impact of user count on objective function value of different algorithms (objective function value in bits/sec/Hz versus number of ground users; curves for the proposed scheme and Benchmarks 1 to 4)

The results depicted in Figure 6 demonstrate the convergence of both base station (BS) users and satellite users. Both curves exhibit similar convergence rates and reach their steady states relatively quickly under the same experimental setup as in the previous figures. Figure 6 also highlights that users connected to BSs are more energy-efficient than satellite users. Satellite users converge to a total energy efficiency of approximately 175 bits per joule per hertz, whereas BS users achieve a higher total energy efficiency of about 750 bits per joule per hertz. This efficiency disparity can be attributed to the better channel conditions experienced by BS users.
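The link between channel quality and energy efficiency can be illustrated with a toy calculation. The sketch below assumes a Shannon-type spectral efficiency of log2(1 + SNR) per user and defines energy efficiency as the summed spectral efficiency divided by the total consumed power, which gives bits per joule per hertz. This is a simplification and not the paper's system model; the SNR and power values are invented for illustration.

```python
import math


def spectral_efficiency(snr_linear):
    """Shannon spectral efficiency in bits/s/Hz for a linear (not dB) SNR."""
    return math.log2(1.0 + snr_linear)


def energy_efficiency(snr_values, total_power_watts):
    """Summed spectral efficiency per watt, i.e. bits per joule per hertz."""
    if total_power_watts <= 0:
        raise ValueError("total power must be positive")
    total_rate = sum(spectral_efficiency(s) for s in snr_values)
    return total_rate / total_power_watts


# Hypothetical numbers: four users per link type, same total power budget.
bs_users = energy_efficiency([15.0] * 4, total_power_watts=2.0)   # good channels
sat_users = energy_efficiency([1.0] * 4, total_power_watts=2.0)   # weak channels
print(bs_users, sat_users)  # 8.0 2.0
```

With the power budget held fixed, the only difference between the two groups is the SNR, so the group with better channels ends up with the higher energy efficiency.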

To evaluate the optimization performance of the proposed scheme, we compared it against four benchmark algorithms. Figure 7 illustrates the comparison between the proposed scheme and these benchmarks: the widely used deep reinforcement learning algorithm DDPG, the Random Policy algorithm, the Genetic Algorithm (GA), and the Proximal Policy Optimization (PPO) algorithm. The GA simulates natural evolution to identify the optimal solution, while PPO is a recently developed policy gradient algorithm. The Random Policy algorithm randomly selects user association and power control actions in each episode.

Fig. 9: Cache Reward Proposed Scheme: convergence across capacity and library variations (objective function value in bits/sec/Hz versus iteration; legend entries: =4, U=40; =3, U=40; =4, U=50)

Fig. 10: Proposed scheme versus benchmark algorithms: a comparison (objective function value in bits/sec/Hz versus iteration; curves for the proposed scheme and Benchmarks 1, 3, and 4)
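A random-policy baseline of the kind described here is simple to write down. The sketch below assumes that each user's action consists of an access-point index (a BS or the satellite) and a transmit power drawn uniformly from an allowed range; the function names, the action encoding, and the parameter values are hypothetical and are not taken from the paper.

```python
import random


def random_policy_action(num_access_points, max_power_watts, rng):
    """One user's action: a uniformly random association and transmit power."""
    association = rng.randrange(num_access_points)
    power = rng.uniform(0.0, max_power_watts)
    return association, power


def random_policy_episode(num_users, num_access_points, max_power_watts, seed=None):
    """Draw one independent random action per user for a single episode."""
    rng = random.Random(seed)
    return [
        random_policy_action(num_access_points, max_power_watts, rng)
        for _ in range(num_users)
    ]


# Hypothetical setting: 32 users choosing among 6 BSs plus 1 satellite.
actions = random_policy_episode(num_users=32, num_access_points=7,
                                max_power_watts=1.0, seed=0)
print(actions[:3])
```

Because the actions ignore the observed state and the reward, such a baseline is not expected to converge; it mainly gives a floor against which learned policies can be judged.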

The performance of these algorithms in optimizing the system's objective function was evaluated based on the number of users in the network, as shown in Figure 7 for M = 32. The proposed scheme, DDPG, and PPO all achieve converging rewards. The proposed scheme demonstrates superior resource optimization performance, surpassing the other two algorithms. In terms of energy efficiency, the proposed scheme attains the highest value, 625 bits per joule per hertz. In comparison, the PPO algorithm starts at 450 bits per joule per hertz and converges to approximately 580 bits per joule per hertz after around 400 iterations.

Moreover, the DDPG algorithm achieves the same energy efficiency as the proposed scheme. Conversely, the Random Policy algorithm exhibits poor convergence, with its curve fluctuating between 400 and 500 bits per joule per hertz. Compared to the proposed scheme, the other algorithms demonstrate weaker stability and inferior performance in optimizing the objective function.

Fig. 11: Proposed scheme compared to benchmark algorithms (request completion rate versus iteration; curves for the proposed scheme and Benchmarks 1, 4, and 5)

Fig. 12: Proposed scheme and benchmark algorithms: a comparison across cache size variations (objective function value in bits/sec/Hz versus cache size; curves for the proposed scheme with U=40 and U=50 and Benchmarks 1, 3, 4, and 6)

The advantage of the proposed scheme (MADDPG) lies in its ability to achieve both high resource optimization performance and high energy efficiency, as demonstrated in Figure 7. It outperforms the benchmark algorithms with more stable and consistent performance, reaching the highest energy efficiency of 625 bits per joule per hertz, compared with around 580 bits per joule per hertz for the PPO algorithm after approximately 400 iterations. Thus, the proposed scheme stands out as an effective and efficient solution for optimizing system objectives in complex multi-agent networks.
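The two converged values quoted for Figure 7 (625 and about 580 bits per joule per hertz) can be turned into a relative gain. The helper below is a small illustrative calculation; the function name is hypothetical and the inputs are simply the rounded figures given in the text.

```python
def relative_gain(proposed, baseline):
    """Fractional improvement of `proposed` over `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (proposed - baseline) / baseline


gain = relative_gain(625.0, 580.0)
print(round(100 * gain, 1))  # 7.8 (percent)
```

Since the PPO value is approximate, the resulting figure of roughly 8 percent should be read as indicative rather than exact.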

The relationship between the number of users in the network and the objective function value of each algorithm is shown in Figure 8. The results demonstrate that the system's energy efficiency increases as the user density per BS and satellite increases. The proposed scheme outperforms the benchmark algorithms in energy efficiency when three to five users share each BS and satellite. With five users per BS and satellite, the system's energy efficiency reaches up to 1180 bits/Joule/Hz. The proposed scheme is significantly more energy-efficient than the random policy, DDPG, PPO, and GA algorithms, indicating its effectiveness.

Fig. 13: Evaluating proposed scheme and benchmark algorithms for request completion rate under different cache sizes (request completion rate on a logarithmic axis versus cache size; curves for the proposed scheme with U=40 and U=50 and Benchmarks 1, 3, 4, and 6)

Figure 9 shows the convergence analysis of Algorithm 2 for various local capacities and file libraries. Within the 1000 training iterations, Algorithm 2 converges quickly, after around 60 iterations. The figure compares four examples with different file library sizes and local capacities.

Similarly, the convergence of Algorithm 2 is investigated as the satellite and BS local cache capacities and the file library size vary. Algorithm 2 performs well with four different capacities. However, ineffective local cache deployment in the first training iterations increases power consumption, which lowers the cache reward to below 1090 bits per joule per hertz. The cache reward improves as training proceeds and converges after about 50 iterations. As the file library grows, the cache reward decreases because the local cache capacity is fixed, making it harder for Algorithm 2 to find a requested file locally. For a fixed file library, the reward declines as the local cache capacity decreases.

Figures 10 and 11 compare the convergence of energy efficiency and cache hit rate among various cache optimization algorithms at Nf = 3 and U = 40. As illustrated in Figure 10, both Algorithm 2 and the DDPG algorithm converge in energy efficiency. However, Algorithm 2 performs better than the other algorithms for a given cache and file library size. The DDPG algorithm attains a value of 1100 bits per joule per hertz at around 700 iterations, while Benchmark 3 oscillates between 600 and 650 bits per joule per hertz and does not converge. Algorithm 2 thus optimizes the system's objective function more effectively and steadily than the other algorithms.

Figure 11 shows the request completion rate of Algorithm 2. The value rises to 0.13 and converges at 0.33, while the DDPG algorithm's cache hit rate converges slowly. Therefore, the proposed scheme produces better cache hit rates.

Figures 12 and 13 show the energy efficiency and the request completion rate, respectively, of various algorithms for each base station and satellite with different cache sizes. Although the uncached strategy requires less memory, the MADDPG algorithm has lower energy efficiency when the cache capacity is only 1, as depicted in Figure 12. This is because a local cache with such limited capacity becomes a bottleneck in retrieving the files that users request, resulting in suboptimal outcomes. However, as the cache capacity increases from 1 to 6, the proposed scheme becomes more energy-efficient. The MADDPG algorithm consistently outperforms the other algorithms in energy efficiency, and the performance gap widens with larger cache sizes.

Figure 13 shows that the proposed MADDPG algorithm achieves a higher cache hit rate than the other algorithms, following the same pattern as Figure 12. Together, these figures clarify how cache capacity and library size interact. For a fixed local cache capacity, the cache reward and hit rate decline as the file library grows within a specific range, because the requested files become less likely to be found in the cache. Conversely, for a fixed file library size, increasing the local cache capacity increases both the cache reward and the cache hit rate.
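The two trends described here (hit rate falls as the library grows and rises with cache capacity) can be reproduced with a toy model. The sketch below assumes Zipf-distributed file popularity and a cache that simply stores the most popular files; neither assumption comes from the paper, and the learned MADDPG caching policy is not modelled. The function names and the Zipf exponent are hypothetical.

```python
def zipf_popularity(library_size, exponent=1.0):
    """Request probability of each file, ordered from most to least popular."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, library_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def cache_hit_rate(library_size, cache_capacity, exponent=1.0):
    """Hit rate when the cache holds the `cache_capacity` most popular files."""
    popularity = zipf_popularity(library_size, exponent)
    cached = min(cache_capacity, library_size)
    return sum(popularity[:cached])


# Sweep the cache sizes used in Figures 12 and 13 for two library sizes.
for capacity in range(1, 7):
    print(capacity,
          round(cache_hit_rate(40, capacity), 3),
          round(cache_hit_rate(50, capacity), 3))
```

In this simplified setting the hit rate is the cumulative popularity of the cached files, so adding capacity can only raise it, while enlarging the library spreads requests over more files and lowers it.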

These findings validate the effectiveness of the proposed MADDPG scheme in achieving high energy efficiency and cache hit rates, outperforming other benchmark algorithms across various network configurations and cache sizes.

VI. CONCLUSION AND FUTURE DIRECTIONS

In conclusion, our proposed approach for enhancing energy efficiency in integrated terrestrial and satellite NOMA communication networks offers several advantages over existing reference contributions. First, our multi-agent deep reinforcement learning technique, specifically the MADDPG algorithm, outperforms the benchmark algorithms and the standard DDPG algorithm, which uses only a single agent. This highlights the effectiveness of our approach in addressing the complexities of resource allocation and cache design in a multi-agent setting. By leveraging the MADDPG algorithm, we optimize user association, power management, and cache layout. It allows us to model users and BSs as agents and to solve the resource management and cache design optimization more efficiently. Moreover, the proposed MADDPG-based cache design strategy enables BSs and satellites to act as agents and intelligently select files from the library to store in local cache pools. This strategy improves the efficiency of data retrieval and maximizes the energy efficiency of the network. Our results demonstrate the benefits of the considered framework for energy efficiency optimization in integrated terrestrial-satellite NOMA networks. By outperforming the benchmark algorithms with multi-agent deep reinforcement learning, we advance this research area. Our work serves as a solid foundation for future studies and opens up opportunities to explore more complex scenarios, such as the optimal allocation of resources in a multiple-layer NOMA-enabled satellite communication network model. Specifically, we aim to focus on minimizing power consumption, which is crucial for sustainable and energy-efficient network operations.

While our proposed model showcases various advantages, certain limitations should be acknowledged. One limitation is the computational complexity of multi-agent deep reinforcement learning algorithms, which can require substantial computational resources and time. Additionally, the performance of our approach relies heavily on accurate modelling and representation of the network environment, including user behaviour and channel conditions. Ensuring the reliability and real-world applicability of these models is an ongoing challenge. In summary, our proposed approach offers significant advantages over existing contributions, including superior performance, efficient multi-agent optimization, and effective cache design. While there are limitations to be addressed, our research paves the way for future investigations and advancements in energy-efficient integrated terrestrial and satellite communication networks.


ResearchGate has not been able to resolve any citations for this publication.

Joint Optimization of 3D Placement and Radio Resource Allocation for Per-UAV Sum Rate Maximization

Article

Full-text available

Oct 2023

Unmanned aerial vehicles (UAV) have emerged as a practical solution that provides on-demand services to users in areas where the terrestrial network is non-existent or temporarily unavailable, e.g., due to natural disasters or network congestion. In general, UAVs' user-serving capacity is typically constrained by their limited battery life and the finite communication resources that highly impact their performance. This work considers the orthogonal frequency division multiple access (OFDMA) enabled multiple unmanned aerial vehicles (multi-UAV) communication systems to provide on-demand services. The main aim of this work is to derive an efficient technique for the allocation of radio resources, 3D placement of UAVs, and user association matrices. To achieve the desired objectives, we decoupled the original joint optimization problem into two sub-problems: (i) 3D placement and user association and (ii) sum-rate maximization for optimal radio resource allocation, which are solved iteratively. The proposed iterative algorithm is shown via numerical results to achieve fast convergence speed after fewer than 10 iterations. The benefits of the proposed design are demonstrated via superior sum-rate performance compared to existing reference designs. Moreover, results showed that the optimal power and sub-carrier allocation help to mitigate the inter-cell interference that directly impacts the system's performance.

Minimizing Energy Consumption for NOMA Multi-Drone Communications in Automotive-Industry 5.0

Article

Full-text available

Apr 2023

The forthcoming era of the automotive industry, known as Automotive-Industry 5.0, will leverage the latest advancements in 6G communications technology to enable reliable, computationally advanced, and energy-efficient exchange of data between diverse onboard sensors, drones and other vehicles. We propose a non-orthogonal multiple access (NOMA) multi-drone communications network in order to address the requirements of enormous connections, various quality of services (QoS), ultra-reliability, and low latency in upcoming sixth-generation (6G) drone communications. Through the use of a power optimization framework, one of our goals is to evaluate the energy efficiency of the system. In particular, we define a non-convex power optimization problem while considering the possibility of imperfect successive interference cancellation (SIC) detection. Therefore, the goal is to reduce the total energy consumption of NOMA drone communications while guaranteeing the lowest possible rate for wireless devices. We use a novel method based on iterative sequential quadratic programming (SQP) to get the best possible solution to the non-convex optimization problem so that we may move on to the next step and solve it. The standard OMA framework, the Karush–Kuhn–Tucker (KKT)-based NOMA framework, and the average power NOMA framework are compared with the newly proposed optimization framework. The results of the Monte Carlo simulation demonstrate the accuracy of our derivations. The results that have been presented also demonstrate that the optimization framework that has been proposed is superior to previous benchmark frameworks in terms of system-achievable energy efficiency.

RIS-Assisted Energy-Efficient LEO Satellite Communications With NOMA

Article

Jan 2023

Low Earth Orbit (LEO) satellite networks are expected to play a crucial role in providing high-speed internet access and low-latency communication worldwide. However, some challenges can affect the performance of LEO satellite networks. For example, they can face energy and spectral efficiency challenges, such as high power consumption and spectral congestion, due to the increasing number of satellites. Furthermore, mobile ground users tend to operate with low-directivity antennas, which pose significant challenges in closing the LEO-to-ground communication link, especially when operating at a high-frequency range. To overcome these challenges, energy-efficient technologies like reconfigurable intelligent surfaces (RIS) and advanced spectrum management techniques like non-orthogonal multiple access (NOMA) can be employed. RIS can improve signal quality and reduce power consumption, while NOMA can enhance spectral efficiency by sharing the same resources among multiple users. This paper proposes an energy-efficient RIS-assisted downlink NOMA communication framework for LEO satellite networks that ensures the required quality of service. The proposed framework simultaneously optimizes the NOMA transmit power of the LEO satellite and the passive beamforming of the RIS under the assumption of imperfect successive interference cancellation. Due to the nature of the considered system and optimization variables, the energy efficiency maximization problem is non-convex. In practice, obtaining the optimal solution for such problems is very challenging. Therefore, we adopt alternating optimization to handle the joint optimization in two steps. In step 1, for any given phase shift vector, we calculate the satellite transmit power towards each ground terminal using the Lagrangian dual method. Then, in step 2, given the transmit power, we design the passive beamforming for the RIS by solving a semi-definite program. We also compare our solution with a benchmark framework that has a fixed phase shift design and with a conventional NOMA framework without RIS. Numerical results show that the proposed optimization framework achieves 21.47% and 54.9% higher energy efficiency compared to the benchmark and conventional frameworks, respectively.

Vehicular Communication Network Enabled CAV Data Offloading: A Review

Article

Aug 2023

Traffic from connected and autonomous vehicle (CAV) applications and services places an extra burden on already congested cellular networks. Offloading is envisioned as a promising solution to tackle the traffic explosion problem in cellular networks. Notably, vehicular traffic offloading that leverages different vehicular communication network (VCN) modes is one of the potential techniques to address the data traffic problem in cellular networks. This paper surveys the state-of-the-art literature on vehicular data offloading from a communication perspective, i.e., vehicle to vehicle (V2V), vehicle to roadside infrastructure (V2I), and vehicle to everything (V2X). First, we identify the main classification of vehicular data/traffic offloading techniques, based on whether data is to be downloaded or uploaded. Next, to give better intuition for each data offloading category, we sub-classify the existing schemes based on their objectives. Then, the existing literature on vehicular data/traffic offloading is elaborated, compared, and analyzed based on approaches, objectives, merits, demerits, etc. Finally, we highlight the open research challenges in this field and predict future research trends.

Rate Splitting Multiple Access for Next Generation Cognitive Radio Enabled LEO Satellite Networks

Article

Nov 2023

Low Earth Orbit (LEO) satellite communication (SatCom) has drawn particular attention recently due to its high data rate services and low round-trip latency. It has lower launching and manufacturing costs than Medium Earth Orbit (MEO) and Geostationary Earth Orbit (GEO) satellites. Moreover, LEO SatCom has the potential to provide global coverage with a high-speed data rate and low transmission latency. However, spectrum scarcity might be one of the challenges in the growth of LEO satellites, imposing severe restrictions on the development of ground-space integrated networks. To address this issue, cognitive radio and rate splitting multiple access (RSMA) are two emerging technologies for high spectral efficiency and massive connectivity. This paper proposes a cognitive radio enabled LEO SatCom system that uses the RSMA radio access technique and coexists with a GEO SatCom network. In particular, this work aims to maximize the sum rate of LEO SatCom by simultaneously optimizing the power budget over different beams, the RSMA power allocation for users over each beam, and the subcarrier user assignment while restricting the interference temperature to GEO SatCom. The sum rate maximization problem is non-convex, so the global optimal solution is challenging to obtain. Thus, an efficient solution is obtained in three steps. First, we employ a successive convex approximation technique to reduce the complexity and make the problem more tractable. Second, for any given resource block user assignment, we adopt Karush–Kuhn–Tucker (KKT) conditions to calculate the transmit power over different beams and the RSMA power allocation of users over each beam. Third, using the allocated power, we design an efficient algorithm based on the greedy approach for resource block user assignment. For comparison, we propose two suboptimal schemes, with fixed power allocation over different beams and with random resource block user assignment, as benchmarks. Numerical results, obtained from Monte Carlo simulations, demonstrate the benefits of the proposed optimization scheme compared to the benchmark schemes.
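
As a rough illustration of the third step in the abstract above, the following Python sketch shows one simple form a greedy resource block user assignment can take: with the power allocation held fixed, each resource block is given to the user that achieves the highest rate on it. This is an assumption made for illustration only, not the algorithm of the cited paper; the function name `greedy_rb_assignment` and the `rates` table are hypothetical, and the interference temperature constraint toward GEO SatCom is not modeled.

```python
def greedy_rb_assignment(rates):
    """Assign each resource block (RB) to the user with the highest rate on it.

    rates[u][k] is the (hypothetical) achievable rate of user u on RB k,
    computed beforehand for a fixed power allocation.
    Returns a list whose k-th entry is the index of the user that owns RB k.
    """
    num_users = len(rates)
    num_rbs = len(rates[0])
    assignment = []
    for k in range(num_rbs):
        # Greedy choice: pick the best user for this RB, ignoring other RBs.
        best_user = max(range(num_users), key=lambda u: rates[u][k])
        assignment.append(best_user)
    return assignment


def sum_rate(rates, assignment):
    """Total rate obtained under a given RB-to-user assignment."""
    return sum(rates[user][k] for k, user in enumerate(assignment))


# Toy example: 2 users, 3 resource blocks (illustrative numbers only).
example_rates = [
    [1.0, 3.0, 2.0],
    [2.0, 1.0, 2.5],
]
example_assignment = greedy_rb_assignment(example_rates)
```

Because every resource block is decided independently, this per-block choice maximizes the sum rate only when there are no coupling constraints across blocks; fairness or interference constraints would require a more elaborate selection rule.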

Integration of NOMA with Reflecting Intelligent Surfaces: A Multi-cell Optimization with SIC Decoding Errors

Article

Sep 2023

Reflecting intelligent surfaces (RIS) have gained significant attention due to their high energy and spectral efficiency in next-generation wireless networks. By using low-cost passive reflecting elements, RIS can smartly reconfigure the signal propagation to extend wireless communication coverage. On the other hand, non-orthogonal multiple access (NOMA) has been proven to be a key air interface technique for supporting massive connections over limited resources. Utilizing superposition coding and successive interference cancellation (SIC), NOMA can multiplex multiple users over the same spectrum and time resources by allocating different power levels. This paper proposes a new optimization scheme in a multi-cell RIS-NOMA network to enhance the spectral efficiency under SIC decoding errors. In particular, the power budget of the base station, the transmit power of NOMA users, and the passive beamforming of RIS are simultaneously optimized in each cell. Due to the objective function and quality of service constraints, the joint problem is non-convex, which makes the global optimal solution very challenging to obtain. To reduce the complexity and make the problem tractable, we first decouple the original problem into two sub-problems for power allocation and passive beamforming. Then, an efficient solution of each sub-problem is obtained in two steps. For the power allocation sub-problem, we transform it into a convex problem by the inner approximation method in the first step and then solve it with a standard convex optimization solver in the second step. For the passive beamforming sub-problem, it is transformed into a standard semi-definite programming problem by successive convex approximation and difference of convex programming methods in the first step. Then, a penalty-based method is used to achieve a rank-1 solution for passive beamforming in the second step. Numerical results demonstrate the benefits of the proposed optimization scheme in the multi-cell RIS-NOMA network.
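
To make the power-domain multiplexing and SIC decoding errors mentioned in the abstract above concrete, here is a minimal Python sketch of the achievable rates for a single two-user downlink NOMA pair. It assumes a textbook single-cell model without RIS, where a factor `sic_residual` (0 for perfect SIC, larger values for worse cancellation) scales the leftover interference at the near user. All names and parameter values are hypothetical and are not taken from the cited paper.

```python
import math


def noma_two_user_rates(p_total, alpha_far, g_near, g_far, noise, sic_residual=0.0):
    """Return (rate_near, rate_far) in bit/s/Hz for a two-user downlink NOMA pair.

    p_total      : total transmit power shared by both users
    alpha_far    : fraction of p_total given to the far (weak-channel) user
    g_near/g_far : channel power gains of the near and far users
    noise        : noise power at each receiver
    sic_residual : fraction of the far user's signal left after SIC at the
                   near user (0.0 means perfect cancellation)
    """
    if not 0.0 < alpha_far < 1.0:
        raise ValueError("alpha_far must lie strictly between 0 and 1")
    p_far = alpha_far * p_total
    p_near = (1.0 - alpha_far) * p_total

    # Far user decodes its own signal and treats the near user's signal as noise.
    sinr_far = p_far * g_far / (p_near * g_far + noise)

    # Near user first cancels the far user's signal; imperfect SIC leaves
    # a residual interference term proportional to sic_residual.
    sinr_near = p_near * g_near / (sic_residual * p_far * g_near + noise)

    return math.log2(1.0 + sinr_near), math.log2(1.0 + sinr_far)


# Illustrative comparison of perfect and imperfect SIC.
perfect = noma_two_user_rates(1.0, 0.8, 10.0, 1.0, 0.1, sic_residual=0.0)
imperfect = noma_two_user_rates(1.0, 0.8, 10.0, 1.0, 0.1, sic_residual=0.1)
```

In this model the far user's rate does not depend on `sic_residual`, while the near user's rate drops as the residual grows, which is why imperfect SIC is commonly included as an explicit term in this kind of optimization.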

Management of Positioning Functions in Cellular Networks for Time-Sensitive Transportation Applications

Article

Nov 2023

Device positioning has generally been recognized as an enabling technology for numerous vehicular applications in intelligent transportation systems (ITS). The downlink time difference of arrival (DL-TDOA) technique in cellular networks requires range information of geographically diverse base stations (BSs) to be measured by user equipment (UE) through the positioning reference signal (PRS). However, inter-cell interference from surrounding BSs can be particularly serious under poor network planning or dense deployments. This may lead to a relatively longer measurement time to locate the UE, causing an unacceptable location update rate to time-sensitive applications. In this case, PRS muting of certain wireless resources has been envisioned as a promising solution to increase the detectability of a weak BS. In this paper, to reduce UE measurement latency while ensuring high location accuracy, we propose a muting strategy managed by positioning functions that utilizes a combination of optimized pseudo-random sequences (CO-PRS) for multiple BSs to coordinate the muting of PRS resources. The original sequence is first truncated according to the muting period, and a modified greedy selection is performed to form a set of control sequences as the muting configurations (MC) with balance and concurrency constraints. Moreover, efficient information exchange can be achieved with the seeds used for regenerating the MC. Extensive simulations demonstrate that the proposed scheme outperforms the conventional random and ideal muting benchmarks in terms of measurement latency by about 30%, especially when dealing with severe near-far problems in cellular networks.

Rate Splitting Multiple Access for Cognitive Radio GEO-LEO Co-Existing Satellite Networks

Conference Paper

Dec 2022

Integrating Terrestrial and Non-Terrestrial Networks: 3D Opportunities and Challenges

Article

Jan 2022

Integrating terrestrial and non-terrestrial networks has the potential of connecting the unconnected and enhancing the user experience for the already-connected, with technological and societal implications of the greatest long-term significance. A convergence of ground, air, and space wireless communications also represents a formidable endeavor for the mobile and satellite communications industries alike, as it entails defining and intelligently orchestrating a new 3D wireless network architecture. In this article, we present the key opportunities and challenges arising from this revolution by describing some of its disruptive use cases and key building blocks, reviewing the relevant standardization activities, and pointing to open research problems. By considering two multi-operator paradigms, we also showcase how terrestrial networks could be efficiently re-engineered to cater for aerial services, or opportunistically complemented by non-terrestrial infrastructure to augment their current capabilities.

Integration of Backscatter Communication with Multi-cell NOMA: A Spectral Efficiency Optimization under Imperfect SIC

Conference Paper

Nov 2022
