

DRAFT 1
Dynamic Resource Management in Integrated
NOMA Terrestrial-Satellite Networks using
Multi-Agent Reinforcement Learning
Ali Nauman, Haya Mesfer Alshahrani, Nadhem Nemri, Kamal M. Othman, Nojood O Aljehane,
Mashael Maashi, Ashit Kumar Dutta, Mohammed Assiri, Wali Ullah Khan
Acknowledgement: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through large group Research Project under grant number (RGP2/02/44). Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2023R237), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. Research Supporting Project number (RSPD2023R787), King Saud University, Riyadh, Saudi Arabia. This study is supported via funding from Prince Sattam bin Abdulaziz University project number (PSAU/2023/R/1444).
Ali Nauman is with the Department of Information and Communication Engineering, Yeungnam University, Republic of Korea (email: anauman@ynu.ac.kr).
Haya Mesfer Alshahrani is with the Department of Information Systems, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia (email: hmalshahrani@pnu.edu.sa).
Nadhem Nemri is with the Department of Information Systems, College of Science & Art at Mahayil, King Khalid University, Saudi Arabia (email: nnemri@kku.edu.sa).
Kamal M. Othman is with the Department of Electrical Engineering, College of Engineering, Umm Al-Qura University, Makkah, Saudi Arabia (email: kmothman@uqu.edu.sa).
Nojood O Aljehane is with the Department of Computer Science, Faculty of Computers and Information Technology, University of Tabuk, Tabuk, Saudi Arabia (email: naljohani@ut.edu.sa).
Mashael Maashi is with the Department of Software Engineering, College of Computer and Information Sciences, King Saud University, P.O. Box 103786, Riyadh 11543, Saudi Arabia (email: mrbwesabi@gmail.com).
Ashit Kumar Dutta is with the Department of Computer Science and Information System, College of Applied Sciences, AlMaarefa University, Riyadh 11597, Saudi Arabia (email: adota@mcst.edu.sa).
Mohammed Assiri is with the Department of Computer Science, College of Sciences and Humanities-Aflaj, Prince Sattam bin Abdulaziz University, Aflaj 16273, Saudi Arabia (email: meo.nrmo@gmail.com).
Wali Ullah Khan is with the Interdisciplinary Center for Security, Reliability and Trust (SnT), University of Luxembourg, 1855 Luxembourg City, Luxembourg (email: waliullah.khan@uni.lu).
Corresponding author: Ali Nauman (email: anauman@ynu.ac.kr)
Abstract—The integration of terrestrial and satellite wireless communication networks offers a practical solution to enhance network coverage, connectivity, and cost-effectiveness. Moreover, in today's interconnected world, the reliable and widespread availability of connectivity is increasingly important across various domains. This is especially crucial for applications like the Internet of Things (IoT), remote sensing, disaster management, and bridging the digital divide. However, allocating the limited network resources efficiently and ensuring seamless handover between satellite and terrestrial networks present significant challenges. Therefore, this study introduces a resource allocation framework for integrated satellite-terrestrial networks to address these challenges. The framework leverages local cache pool deployments and non-orthogonal multiple access (NOMA) to reduce time delays and improve energy efficiency. Our proposed approach utilizes a multi-agent deep deterministic policy gradient (MADDPG) algorithm to optimize user association, cache design, and transmission power control, resulting in enhanced energy efficiency. The approach comprises two phases: User Association and Power Control, where users are treated as agents, and Cache Optimization, where the satellites and base stations (BSs) are considered the agents. Through extensive simulations, we demonstrate that our approach surpasses conventional single-agent deep reinforcement learning algorithms in addressing cache design and resource allocation challenges in integrated terrestrial-satellite networks. Specifically, our proposed approach achieves significantly higher energy efficiency and reduced time delays compared to existing methods. This research highlights and addresses the need for efficient resource allocation and cache design in integrated terrestrial-satellite networks, paving the way for enhanced connectivity and improved network performance in various applications.
Index Terms—Satellite-terrestrial networks, non-orthogonal
multiple access, resource optimization, interference management.
I. INTRODUCTION
The upcoming sixth-generation (6G) communications tech-
nologies and networks are intended to provide fast connectivity
all over the world [1], [2]. This network will provide ultra-high
data rate, very low latency and information security [3]. This
can be achieved by exploring new sustainable frameworks and
solutions. Integrated terrestrial and non-terrestrial networks
represent a fusion of ground-based infrastructures, such as
fibre-optic cables and cell towers, with non-terrestrial systems
like satellites and drones, to establish seamless and dependable
connectivity [4], [5]. These networks bring forth numerous
benefits, including expanded coverage, enhanced redundancy,
and improved resilience during natural disasters or other
disruptions [6]. As the demand for high-speed and reliable
connectivity continues to grow in the context of 6G networks
[7], the integration of terrestrial and non-terrestrial networks
is gaining paramount importance [8]. This integration enables
greater flexibility and efficiency in data transmission, facilitat-
ing improved accessibility to information for individuals and
devices in remote or inaccessible locations [9].
Non-orthogonal multiple access (NOMA) technologies, utilizing power domain multiplexing, have recently emerged as a promising candidate for forthcoming 6G networks [10]. This technology has demonstrated significant potential for enhancing energy efficiency, accommodating a larger number of concurrent users, and reducing latency, as validated by recent studies [11], [12]. However, the implementation of NOMA poses inherent complexities and several challenges that need to be addressed [13]. These challenges encompass the requirement for advanced signal processing techniques, the development of efficient power allocation algorithms [14], and the effective management of interference among users [6] sharing the same resources [15]. In response, researchers and industry professionals are actively exploring novel techniques and strategies to overcome these challenges and realize the full potential of NOMA [16].
arXiv:2310.11814v1 [eess.SP] 18 Oct 2023
The integration of terrestrial-satellite networks plays a piv-
otal role in the development of the emerging 6G system,
with NOMA protocols frequently employed in this context, as
highlighted in recent studies [15]. This network architecture
enables the provision of cost-effective communication services
to both terrestrial base stations (BSs) and remote areas covered
by satellites, resulting in an expanded coverage area and
improved service quality [17], [18]. However,
due to the limited availability of resources, practical resource
allocation methods are essential to enhance the system’s
energy efficiency and service quality [19]–[21].
Integrated terrestrial satellite communication networks face
a significant challenge in the form of bottlenecks, which can
negatively impact service quality for specific users [22]. To
address this challenge, deploying cache pools for the system’s
base stations (BSs) has emerged as a promising solution [23].
Cache pools help reduce the amount of data that needs to be
transmitted across the network, thereby alleviating congestion,
improving overall performance, enabling efficient file retrieval,
and reducing time delays [24]. However, effectively utilizing
cache pools necessitates additional storage capacity and care-
ful management of the caches.
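The effect of a local cache pool on delay can be illustrated with a small sketch. This is not the paper's delay model: the file names, request probabilities, delay values, and the "cache the most popular files" rule are all hypothetical assumptions chosen only to show how a higher hit ratio lowers the expected delivery delay.

```python
def hit_ratio(request_prob, cached):
    """Probability that a request is served from the local cache pool."""
    return sum(p for f, p in request_prob.items() if f in cached)


def expected_delay(request_prob, cached, local_delay, backhaul_delay):
    """Expected delivery delay: a miss pays the backhaul delay on top of the local one."""
    h = hit_ratio(request_prob, cached)
    return h * local_delay + (1.0 - h) * (local_delay + backhaul_delay)


def most_popular(request_prob, capacity):
    """Hypothetical placement rule: cache the `capacity` most requested files."""
    ranked = sorted(request_prob, key=request_prob.get, reverse=True)
    return set(ranked[:capacity])


# Assumed request distribution over a four-file library (illustrative only).
request_prob = {"f1": 0.5, "f2": 0.3, "f3": 0.15, "f4": 0.05}
cached = most_popular(request_prob, capacity=2)

# Assumed delays in milliseconds (illustrative only).
delay_with_cache = expected_delay(request_prob, cached, local_delay=5.0, backhaul_delay=40.0)
delay_without_cache = expected_delay(request_prob, set(), local_delay=5.0, backhaul_delay=40.0)
print(cached, delay_with_cache, delay_without_cache)
```

The sketch also makes the trade-off in the paragraph above concrete: the delay reduction depends on the cache capacity and on which files are chosen, which is why cache management needs care.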
Another key challenge in integrating terrestrial and satellite
networks is ensuring seamless and efficient operation without
disruptions or delays [25]. This requires meticulous coor-
dination and management of the various systems involved.
Advanced technologies, including artificial intelligence [26]
and machine learning, play a crucial role in optimizing the
performance of integrated networks [27]. These technologies
enable intelligent decision-making, resource allocation [28],
and network optimization [29], leading to enhanced system
efficiency and robustness [30]. By leveraging these advanced
techniques, the integration of terrestrial and satellite networks
can achieve optimal performance while delivering reliable and
uninterrupted connectivity [31].
To improve network performance in integrated terrestrial-satellite
communication networks, this paper presents a cache-enabled
downlink framework designed specifically for NOMA-based systems.
To increase overall network efficiency, the framework optimizes
user association [32], transmission power control, and caching
placement. To tackle this optimization problem, the proposed
approach employs a multi-agent deep reinforcement learning
mechanism. The
effectiveness of the proposed method is demonstrated through
a comprehensive comparison with benchmark algorithms,
showcasing its superior performance in optimizing the given
problem. By leveraging advanced deep reinforcement learning
techniques [33], the proposed approach introduces a novel
and innovative solution for addressing complex optimization
challenges in integrated networks. The study’s contributions
and results are extensively discussed in the subsequent sec-
tions, providing a practical and viable solution for managing
and allocating resources in hybrid networks that integrate
both terrestrial and satellite infrastructures. By incorporating
caching capabilities into the framework and optimizing various
network parameters, the proposed approach aims to improve
overall network performance, reduce congestion, enhance data
retrieval efficiency, and minimize transmission delays. This re-
search contributes to the advancement of integrated terrestrial-
to-satellite communication networks by introducing an inno-
vative methodology and showcasing its effectiveness through
rigorous evaluation and comparison with existing algorithms.
A. Recent Advances (Academia)
In the last couple of years, NOMA has been extensively investigated in different terrestrial and non-terrestrial networks. For example, for backscatter-enabled multi-roadside-unit vehicle-to-everything communications, the authors of [44] proposed NOMA to enhance the spectral efficiency of the system through optimal resource allocation. In the context of satellite networks, the researchers in [34] optimized the system resources to investigate the long-term utility of NOMA-enabled satellite networks. Similarly, the authors of [45] optimized the power allocation and reflection coefficient in multi-user NOMA networks to maximize the sum capacity, even under imperfect successive interference cancellation (SIC) decoding. NOMA has also been employed in satellite networks to mitigate interference and improve system fairness. For instance, the authors of [35] used NOMA techniques to mitigate interference and enhance the max-min fairness of the system. In another study, the authors of [36] proposed a joint user pairing and power allocation scheme for NOMA-enabled satellite networks, aiming to maximize the sum capacity of the system. Moreover, energy and spectrum optimization in NOMA-enabled small-cell networks has been addressed in [46] using a multi-objective power allocation approach. Additionally, the admission control problem in NOMA-enabled satellite networks has been investigated in [37] to increase the number of supported users while guaranteeing quality of service. More recently, the authors of [47] explored the potential of NOMA-enabled backscatter communications in Industry 5.0.
In recent years, resource allocation in hybrid terrestrial-satellite networks has been the subject of numerous studies. The work in [15] used precoding techniques for optimization. The study in [38] explored data placement and delivery strategies to minimize the number of hops required. The integration of terrestrial and satellite technologies in wireless backhaul networks has also received significant attention. The authors of [39] analyzed the impact of cross-layer design on link scheduling, flow control, and frequency assignment. To address challenges related to power and flow assignment, [40] proposed the use of convex relaxation techniques. Similarly, [48], [49] employed successive convex approximation to transform a non-convex optimization problem into a convex one, aiming to improve the security rate for users with inadequate channel state information. Furthermore, in the context of satellite ground fusion networks, the authors of [41] proposed a NOMA-based resource allocation scheme that groups users into clusters and employs an iterative beamforming algorithm with a penalty function. Despite the development of traditional techniques for optimal resource allocation in hybrid terrestrial-satellite communication networks, the dynamic nature of the environment poses significant challenges. Predicting users' needs for cached files is difficult, and the integrated terrestrial-satellite environment is inherently unstable. Additionally, the optimization problem space has various limitations, which makes formulating an appropriate mathematical model challenging.

TABLE I: Comparison of the proposed work with existing related works in academia.
| Ref. | System model | Satellite(s) | Work objective | OMA/NOMA | AI/non-AI | Proposed solution |
|---|---|---|---|---|---|---|
| [34] | Satellite network | Single | Maximize the long-term network utility | NOMA | Non-AI | Lyapunov optimization framework, the Karush-Kuhn-Tucker conditions, and the particle swarm optimization algorithm |
| [35] | Satellite network | Single | Improve the worst overall channel throughput rate (OCTR) | NOMA | Non-AI | Heuristic approach for joint power, decoding-order, and time-slot optimization |
| [36] | Satellite network | Multiple | Maximize the sum rate and achieve fairness | NOMA | Non-AI | SCA for joint user pairing and power allocation |
| [37] | Satellite network | Single | Maximize the number of users | NOMA | Non-AI | Matching theory for channel and power allocation |
| [15] | Integrated terrestrial-satellite network | Single | Maximize system capacity | NOMA | Non-AI | ZF beamforming at the BS, with SCA and dual method for power allocation |
| [38] | Integrated terrestrial-satellite network | Single | Minimize path length and maximize throughput | OMA | Non-AI | Lagrangian dual method for resource allocation |
| [39] | Integrated terrestrial-satellite network | Single | Maximize network throughput | OMA | Non-AI | Interior point method for power and flow assignment |
| [40] | Integrated terrestrial-satellite network | Single | Maximize the core network traffic | OMA | Non-AI | Estimation-based method for resource allocation |
| [41] | Satellite network | Single | Maximize sum rate | NOMA | Non-AI | Precoding vector design and first-order Taylor expansion for iterative power allocation |
| [42] | Satellite network | Single | Maximize the transmission efficiency | OMA | AI | DRL for dynamic resource allocation |
| [43] | Satellite network | Single | Maximize the expected long-term resource utilization | NOMA | AI | DRL for dynamic resource allocation |
| [Our] | Integrated terrestrial-satellite network | Multiple | Reduce time delay and maximize overall energy efficiency | NOMA | AI | MADDPG algorithm for optimizing user association, cache design, and transmission power control |
In order to tackle these challenges, researchers have ex-
plored the application of deep reinforcement learning (DRL)
algorithms for optimal resource allocation and cache design
in integrated terrestrial-to-satellite communication networks.
DRL has shown promise in addressing optimization problems
characterized by high unpredictability. Several studies have
proposed cooperative multi-agent deep reinforcement learn-
ing (CMDRL) frameworks for radio resource management
strategies in integrated networks. For instance, in [25], a
deep Q-network (DQN) was utilized to improve user access,
while [42] suggested using DQNs to formulate radio resource
management plans. In the domain of cognitive radio settings,
[50] employed various DRL techniques to regulate power.
B. Recent Advances (Industry/Standardization)
Standardization for terrestrial-satellite communications
within 3GPP began in 2017 [52]. This standardization effort
can be categorized into two primary domains: enhancements
for non-terrestrial networks and enhancements for terrestrial
networks. The former seeks to establish a global standard for
future satellite-based communications, stimulating significant
growth in the satellite industry. Activities within the latter
domain serve a dual purpose, ensuring that mobile standards
align with the connectivity requirements for safe operation on
non-terrestrial platforms. The goals and outcomes of 3GPP’s
work spanning from Rel-15 through Rel-17, as well as the
currently under investigation topics for Rel-18, are detailed
and summarized in Table II.
In the terminology of 3GPP, terrestrial-satellite networks re-
fer to the utilization of satellites or High Altitude Platform Sta-
tions (HAPS) to provide connectivity services, particularly in
remote areas where traditional cellular coverage is lacking. In
Rel-17, 3GPP introduced a foundational set of features to
facilitate next-generation spectrum operation over terrestrial-
satellite networks within the frequency range of FR1, which
covers frequencies up to 7.125 GHz. In the upcoming Rel-18,
3GPP aims to further enhance next-generation operations in
terrestrial-satellite contexts. This enhancement will include im-
proving coverage for handheld devices, exploring deployments
in frequency bands exceeding 10 GHz, addressing mobility
challenges, ensuring seamless service continuity between ter-
restrial and non-terrestrial networks, and examining regulatory
requirements for verifying user locations within the network
[53].
Back in 3GPP's Rel-15, support for non-terrestrial platforms within the previous network generation was first introduced. This encompassed various elements, including implementing signaling procedures for identifying non-terrestrial users through subscription-based methods. Additionally, mechanisms were established for reporting critical non-terrestrial platform parameters such as height, location, speed, and flight path. New measurement reports were also introduced to effectively manage non-terrestrial interference, particularly in scenarios involving a specific density of low-altitude non-terrestrial platforms.

TABLE II: 3GPP standardization works on terrestrial-satellite networks [51]–[53].
| Release | Advance in terrestrial-satellite networks |
|---|---|
| Rel-15 | Rel-15 focuses on New Radio, which is proposed for the support of terrestrial-satellite networks [TR 38.811]. It also identifies relevant use case scenarios for terrestrial-satellite networks and spectrum integration, such as S-band and Ka-band. Moreover, it defines the footprint size, angle of elevation, beam configuration, and antenna design. Further, this release specifies the channel propagation model [TR 38.901]. |
| Rel-16 | This release proposes solutions for New Radio in terrestrial-satellite networks [TR 38.821]. It focuses on FR1 bands in terrestrial-satellite networks to support the Internet of Things (IoT). Moreover, it identifies the changes required in the physical layer and other layers, together with the assumptions used in system-level simulations. Besides that, this release also studies the impact of resource optimization on the performance of terrestrial-satellite networks. Furthermore, it incorporates the access of terrestrial-satellite networks in next-generation communications, as mentioned in [TR 22.822], for delivering various services. |
| Rel-17 | Rel-17 discusses the support of narrowband IoT and machine-type communication in terrestrial-satellite networks, mentioned in [TR 36.763]. It is primarily tailored to the specific demands of IoT applications. In the context of 6G, significant attention has been directed towards the architectural considerations for satellite access as delineated in [TR 23.737]. This undertaking encompasses enhancements across multiple facets, including refinements in radio frequency and physical layer parameters, protocol optimizations, and the more effective management of radio resources. Moreover, it involves the identification of an apt architectural framework, the resolution of issues concerning integrated-satellite roaming, and the augmentation of conditional handover procedures. |
| Rel-18 | The terrestrial-satellite enhancements will examine the system coverage for practical handheld devices and access beyond 10 GHz for stationary and mobile platforms. The research will explore the prerequisites for network-validated user positioning and tackle issues related to mobility and the seamless continuity of services as users transition between terrestrial and satellite networks and between different non-terrestrial networks. |
In subsequent releases, 3GPP’s focus extended to address
the needs of connected non-terrestrial platforms at the applica-
tion layer while strongly emphasizing security considerations.
These releases also laid the foundation for defining how non-
terrestrial platforms interact with the Traffic Management sys-
tem, enabling coordinated and secure non-terrestrial platform
operations within the network. As next-generation use cases
evolve, 3GPP’s Rel-18 is set to introduce dedicated next-
generation spectrum support explicitly tailored for devices
operating onboard aerial vehicles. This development will in-
volve exploring additional triggers for conditional handover,
using BS uptilting techniques to improve communication,
and implementing signaling mechanisms to indicate non-
terrestrial platform beamforming capabilities, among other
enhancements [53].
C. Motivation
DRL has emerged as a promising approach for addressing
resource allocation and cache design challenges in satellite
scenarios in recent years. It has been successfully applied
to optimize resource allocation for throughput and bandwidth
in hybrid terrestrial-to-satellite communication networks [54]
and to allocate resources in multi-beam satellite communica-
tion systems [43]. DRL has also been utilized in cognitive
satellite scenarios for multi-objective optimization [55] and
task scheduling [56]. The use of DRL has proven beneficial
in cache design as well. Actor-critic frameworks have been
employed for edge caching scenarios [56], and Q-learning and
DQN with value function approximation have been applied
to address joint optimization problems for base station and
user caching [57]. DRL-based algorithms have also been
utilized for resource distribution and cache placement. For
example, Zhang et al. proposed a DRL algorithm in [58]
to simultaneously optimize user association, NOMA power
allocation, UAV deployment, and UAV cache placement to
minimize content delivery time. Additionally, a Q-learning-
based algorithm was presented in [59] for resource allocation
and cache placement.
However, it is important to note that existing single-agent
DRL algorithms have limitations when dealing with a large
number of agents in dynamic and unpredictable environments.
Moreover, as the number of agents increases, the optimal
allocation of resources and cache design become increasingly
complex. Despite this complexity, this particular topic has not
been thoroughly investigated. Building upon the preliminary
findings presented in [60], this study aims to explore the issue
in greater depth using three distinct approaches.
• New Cache Architecture: The study proposes a novel cache architecture specifically designed for hybrid satellite-based networks. This new architecture aims to optimize cache utilization and enhance overall network performance.
• Agent-Based Modelling: The research employs an agent-based modelling approach to represent users, base stations, and satellites within the network. By using this framework, the study investigates optimal resource allocation strategies and cache design for improved network performance.
• Simulation and Evaluation: The proposed methods are evaluated through simulations from various perspectives. The simulation results provide insights into the effectiveness and efficiency of the new cache architecture and the agent-based modelling approach.
By combining these three approaches, the study aims to advance our understanding of optimal resource allocation and cache design in multi-agent reinforcement learning-based integrated terrestrial-satellite networks. The results obtained from the simulations contribute to the validation and evaluation of the proposed methods. In contrast to widely used baselines such as Deep Deterministic Policy Gradient (DDPG), a random policy, the Genetic Algorithm (GA), and Proximal Policy Optimization (PPO), our proposed approach incorporates three key differentiating factors:
• Multi-Agent Framework: Our approach employs a multi-agent framework tailored to the challenges of resource allocation and cache design in integrated terrestrial-satellite networks. This framework models multiple interacting agents, including users, base stations, and satellites, allowing for a more realistic representation of the network dynamics and interactions.
• New Cache Architecture: The proposed optimization scheme introduces a new cache architecture for hybrid terrestrial-satellite networks. The objective of this cache architecture is to optimize cache utilization and improve network performance. By adopting this cache design, the proposed scheme tackles the cache-related challenges specific to integrated terrestrial-satellite networks.
• Simulation and Evaluation: The proposed approach is evaluated through extensive simulations from various perspectives. These results let us assess the system performance and the effectiveness of the proposed model in more detail. Moreover, they enable us to evaluate and refine the resource management strategies and cache design, ensuring their practicality in real-world scenarios.
Combining all these factors creates a comprehensive and specialized solution that advances the understanding and development of optimal resource allocation and cache management strategies in integrated terrestrial-satellite networks. By integrating the multi-agent framework, the novel cache architecture design, and the numerical assessment, the proposed scheme provides practical and efficient solutions that address the specific challenges and requirements of these networks. Table I provides a comparison of the most related work with the proposed optimization framework.
D. Contributions
As the above discussion shows, efficient cache design and optimal resource allocation in integrated terrestrial-satellite communication networks pose significant challenges. To overcome these challenges, we introduce a two-stage approach to optimal resource allocation and cache design based on the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. In the first stage, users are regarded as agents and use an MADDPG-based approach to optimize the allocation of resources, taking into account both user association and transmission power control. The second stage introduces a cache design scheme in which the agents are the base stations and satellites, whose objective is to enhance energy efficiency. The simulation results demonstrate the effectiveness of the proposed multi-agent deep reinforcement learning (DRL) algorithm in addressing the optimization problem. This study has the potential to significantly enhance the performance of integrated terrestrial and satellite networks while also laying a solid foundation for future research in this area. This paper's main contributions are as follows:
• To provide users with NOMA services, we present a framework that integrates terrestrial and satellite communication networks using BSs and satellites. To improve energy efficiency and cut latency, the framework uses a dedicated cache design that equips both BSs and satellites with cache equipment.
• We then formulate a joint optimization problem with the objective of maximizing energy efficiency through optimal caching placement at BSs and satellites, user association, and transmission power control.
• Next, the resource allocation and cache design parts of the optimization problem are addressed in two steps. To accomplish this, we employ the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm, a multi-agent deep reinforcement learning method that allows users, base stations, and satellites to act as agents and optimize resource allocation and caching placement. On the basis of MADDPG, a novel power control and user association scheme is introduced.
• We propose a cache design strategy that relies on the MADDPG technique. This strategy enables both BSs and satellites to select files from a file library and store them in their respective local cache pools, which significantly enhances the system's energy efficiency.
• Finally, we compare the proposed optimization framework with benchmark algorithms to assess its effectiveness. The experimental results show that our approach performs better than the other algorithms in terms of energy efficiency, user satisfaction, and throughput.
II. SYSTEM MODEL
Motivated by the concept of integrated terrestrial and non-terrestrial networks in [38]–[42], [61]¹, our study considers an integrated terrestrial-satellite network providing joint services to ground users, as shown in Figure 1. The network comprises a set of $M$ base stations (BSs) on the ground, denoted by $\mathcal{B}$, and $K$ low-orbit satellites, denoted by $\mathcal{K}$. The users are divided into two groups: $N_b$ users are served by ground BSs, while the remaining $N_s$ users are served by satellites.
To mitigate user interference, we utilize a Non-Orthogonal Multiple Access (NOMA) scheme for BS-connected users [62]. This approach employs superposition coding at the transmitter and successive interference cancellation (SIC) at the receiver, allowing for sequential detection, demodulation, and interference cancellation. Users associated with a single BS are clustered into NOMA groups, where decoding priority is given to those with superior channel conditions, reducing interference from the users with higher path loss.
¹ Please note that integrated terrestrial and non-terrestrial networks are an emerging research area in academia and industry. Terrestrial networks are well established and deployed, while non-terrestrial networks are expected to be deployed by the end of 2030. However, integrating both networks involves several challenges that need to be tackled.
Fig. 1: System Model
Moreover, user association is a critical element of the proposed integrated terrestrial and satellite communication network: each user connects to either a base station or a satellite during each time slot $t$. To represent these connections, the binary variables $\alpha_n^m(t)$ and $\alpha_n^k(t)$ indicate whether the $n$th user is connected to the $m$th base station or the $k$th satellite at time $t$. These binary variables play a crucial role in optimizing network performance, as they directly impact the quality of service for users and the overall network capacity. Optimization objectives such as minimizing interference, maximizing throughput, or balancing network load can be achieved by adjusting these binary variables.
Furthermore, the SINR of the $n$th user connected to the $m$th BS at a given time $t$ is calculated using the implemented NOMA scheme [63]:
$$\gamma_n^m(t) = \frac{\alpha_n^m(t)\,|g_n^m(t)|^2\,p_n(t)}{I_n^m(t) + I_n^{m'}(t) + I^k(t) + N_o}, \tag{1}$$
where each user has a transmit power $p_n(t) = \beta_n(t)\,(p_b^{\max}/N_b^1)$, which depends on the power control factor $\beta_n(t)$, the maximum power available at the base station $p_b^{\max}$, and the number of users a single base station can serve, $N_b^1$. The channel between the $n$th user and the $m$th BS is represented by $g_n^m(t) = \sqrt{\hat{g}_{n,m}\, d_{n,m}^{-\xi}}$, where $\hat{g}_{n,m}$ denotes the Rayleigh fading coefficient, $d_{n,m}$ is the distance, and $\xi$ represents the path-loss exponent [64]. Practically, the distance $d_{n,m}$ can be calculated using the associated localization method, which involves cooperative positioning among multiple aircraft in cellular networks, as discussed by the authors in [65]–[67]. Interference in the network is caused by users at the same base station, users at other base stations, and satellite users; these contributions are denoted by $I_n^m(t)$, $I_n^{m'}(t)$, and $I^k(t)$, respectively, whereas $N_o$ represents the noise spectral density. The interference caused by users at the same base station is calculated from the channel gains of those users as $I_n^m(t) = \sum_{n' \neq n} \alpha_{n'}^m(t)\,|g_{n'}^m(t)|^2\,p_{n'}(t)$. The interference from users at other base stations is determined by summing over all active users at those base stations, $I_n^{m'}(t) = \sum_{m' \neq m} \sum_{n' \neq n}^{N_b^{m'}} \alpha_{n'}^{m'}(t)\,|g_{n'}^{m'}(t)|^2\,p_{n'}(t)$. Finally, the interference caused by users connected to the satellites is determined by the channel gain between those users and the base station, $I^k(t) = \sum_{k=1}^{K} \sum_{n=1}^{N_s} \alpha_n^k(t)\,|h_n^m(t)|^2\,p_n^k(t)$, where $p_n^k(t)$ denotes the transmission power of the $n$th satellite user.
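As a rough illustration of the structure of Eq. (1), the following Python sketch evaluates the SINR of one BS user from per-user channel gains and transmit powers. The function name, the argument layout, and the numeric values are assumptions made for this example and are not taken from the paper's simulation code; the SIC decoding order is not modelled.

```python
def bs_user_sinr(n, gains, powers, inter_bs, sat_interf, noise):
    """SINR of BS user n following the structure of Eq. (1).

    gains[i]   : |g_i^m(t)|^2 for users attached to the same BS (assumed layout)
    powers[i]  : transmit power p_i(t) of those users
    inter_bs   : aggregate interference from users of other BSs
    sat_interf : aggregate interference from satellite users
    noise      : noise term N_o
    """
    signal = gains[n] * powers[n]
    # Intra-BS interference: sum over every other user of the same BS.
    intra = sum(g * p for i, (g, p) in enumerate(zip(gains, powers)) if i != n)
    return signal / (intra + inter_bs + sat_interf + noise)


# Hypothetical example: two users attached to one BS.
gains = [1.0, 0.25]
powers = [0.5, 0.5]
sinr_0 = bs_user_sinr(0, gains, powers, inter_bs=0.05, sat_interf=0.02, noise=0.01)
```

Passing the aggregate inter-BS and satellite interference as precomputed numbers keeps the sketch short; a fuller model would compute them from the association variables.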
Hence, the SINR of the $n$th satellite user can be expressed as [68]:
$$\gamma_n^k(t) = \frac{\alpha_n^k(t)\,|h_n^k(t)|^2\,p_n^k(t)}{I_n^m(t) + I_n^k(t) + I_n^{k'}(t) + N_o}, \tag{2}$$
where $h_n^k(t)$ represents the block-faded channel between user $n$ and its associated $k$th satellite, such that $h_n^k(t) = \hat{h}_{n,k}\,e^{j\pi\vartheta}$, where $\hat{h}_{n,k}$ denotes the complex-valued channel coefficient, $\vartheta$ is the Doppler shift, and $j = \sqrt{-1}$. Moreover, the transmission power of the $n$th satellite user, $p_n^k(t) = \beta_n(t)\,(p_k^{\max}/N_s^2)$, is determined by the power control factor $\beta_n(t)$ and the maximum available power $p_k^{\max}$; each satellite can serve up to $N_s^2$ users. Interference is caused by users at the base stations, other users of the same satellite, and users of other satellites, denoted by $I_n^m(t)$, $I_n^k(t)$, and $I_n^{k'}(t)$, respectively. The interference from users associated with the BSs can be calculated as $I_n^m(t) = \sum_{m=1}^{M} \sum_{n=1}^{N_b^1} \alpha_n^m(t)\,|g_n^k(t)|^2\,p_n^m(t)$.
Similarly, the interference from other users of the same satellite and from users of other satellites can be calculated as $I_n^k(t) = \sum_{n' \neq n}^{N_s^2} \alpha_{n'}^k(t)\,|h_{n'}^k(t)|^2\,p_{n'}^k(t)$ and $I_n^{k'}(t) = \sum_{k' \neq k} \sum_{n' \neq n} \alpha_{n'}^{k'}(t)\,|h_{n'}^{k'}(t)|^2\,p_{n'}^{k'}(t)$. Here, $n$ is the index of the current satellite user, $M$ is the total number of base stations in the network, and $K$ is the total number of satellites in the network. Assuming large- and small-scale fading [69], the complex-valued channel coefficient can be defined as:
$$\hat{h}_{n,k} = \sqrt{G_k\,G_n^k \left(\frac{c}{4\pi f_c d_n^k}\right)^{2}}, \tag{3}$$
where $c$ is the speed of light, $f_c$ is the carrier frequency, $d_n^k$ represents the distance from the satellite, $G_n^k$ is the antenna gain at the receiver, and $G_k$ is the antenna gain at the satellite. The satellite antenna gain $G_k$ generally depends on the radiation pattern and the ground terminal location. It can be written as:
$$G_k = G_{\max} \left[ \frac{J_1(\Lambda_n^k)}{2\Lambda_n^k} + 36\,\frac{J_3(\Lambda_n^k)}{(\Lambda_n^k)^3} \right]^{2}, \tag{4}$$
where $G_{\max}$ is the maximum gain of the satellite beam and $\Lambda_n^k = 2.07123\,\sin(\theta_n^k)/\sin(\theta_{3\mathrm{dB}})$. Here $\theta_n^k$ is the angle between the ground terminal and the satellite for any given location, and $\theta_{3\mathrm{dB}}$ is the 3 dB angle of the satellite beam. Further, $J_1$ and $J_3$ represent the Bessel functions of the first kind of orders one and three, respectively.
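The beam pattern in Eq. (4) and the coefficient in Eq. (3) can be checked numerically with a short Python sketch. It assumes angles in radians and uses a truncated power series for the Bessel functions, which is adequate only for the small arguments that occur near the main lobe; the function names and default values are illustrative assumptions.

```python
import math


def bessel_j(order, x, terms=40):
    """Bessel function of the first kind via its power series (small |x| only)."""
    total = 0.0
    for k in range(terms):
        total += ((-1) ** k / (math.factorial(k) * math.factorial(k + order))
                  * (x / 2.0) ** (2 * k + order))
    return total


def beam_gain(theta, theta_3db, g_max=1.0):
    """Satellite antenna gain of Eq. (4) for an off-axis angle theta (radians)."""
    lam = 2.07123 * math.sin(theta) / math.sin(theta_3db)
    if abs(lam) < 1e-9:
        return g_max  # the bracket in Eq. (4) tends to 1 as lam -> 0
    bracket = bessel_j(1, lam) / (2.0 * lam) + 36.0 * bessel_j(3, lam) / lam ** 3
    return g_max * bracket ** 2


def channel_coefficient(g_sat, g_rx, f_c, d, c=3.0e8):
    """Magnitude of the channel coefficient of Eq. (3)."""
    return math.sqrt(g_sat * g_rx * (c / (4.0 * math.pi * f_c * d)) ** 2)
```

With this pattern the gain drops to roughly half of its maximum when the off-axis angle equals the 3 dB angle, which is a convenient sanity check for the constant 2.07123.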
Following that, the $n$th user's energy efficiency at time slot $t$ can be expressed as follows:
$$\Psi_n(t) = \sum_{m=1}^{M} \frac{\alpha_n^m(t)\,R_n^m(t)}{p_n^m(t)} + \sum_{k=1}^{K} \frac{\alpha_n^k(t)\,R_n^k(t)}{p_n^k(t)}, \quad \forall n. \tag{5}$$
In (5), $R_n^m(t) = \log_2\left(1 + \gamma_n^m(t)\right)$ and $R_n^k(t) = \log_2\left(1 + \gamma_n^k(t)\right)$ represent the achievable rates. The proposed system model also facilitates file retrieval for network users by utilizing cache pools at the base stations and satellites. Users request files from a file library $\mathcal{U} = \{1, \dots, U\}$, and the cache pool size is fixed and based on the number of files and their size. Specifically, each base station and each satellite has a cache pool of size $M_u < U$ and $M_s < U$, respectively, so that a base station can store $M_u \times s$ bits of files and each satellite can store $M_s \times s$ bits of files.
When the $n$th user requests a file, the system first checks whether the requested file is available in the local cache pool of the base station. Let $L_m$ denote the set of files cached at base station $m$. If the requested file is in $L_m$, it is transmitted to the user directly, and the power consumed during this process is represented by $p_{m,r}(t)$. If the file is not available in $L_m$, the user obtains it from the core network, which incurs a power consumption of $p_{l,r}(t)$. Similarly, let $L_k$ denote the set of files cached at satellite $k$. If the requested file is available in the satellite cache pool, the user retrieves it directly, and the power consumed during this process is represented by $p_{k,r}(t)$. If the requested file is unavailable in $L_k$, the user’s request is forwarded to the ground gateway and the file is fetched from the core network, resulting in a power consumption of $p_{l,r}(t)$.
The caching system provides benefits in terms of reducing
time delays and alleviating power consumption by prioritizing
locally cached files over files that need to be retrieved from
the core network. The caching gain is dependent on whether
the requested file is found in the cache pool (local) or not.
The indicator $J_n(t)$ states whether the local cache device satisfied the file request of the $n$th user at time $t$:
$$J_n(t) = \begin{cases} 1, & \text{request satisfied}, \\ 0, & \text{otherwise}. \end{cases} \tag{6}$$
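The lookup described above can be sketched in a few lines of Python. The helper name and the representation of a cache pool as a set of file indices are assumptions for illustration only.

```python
def retrieval_power(file_id, cached_files, p_cache, p_core):
    """Return (J, power) for one request.

    J is 1 when the file is found in the local cache pool (power p_cache)
    and 0 when it must be fetched from the core network (power p_core).
    """
    if file_id in cached_files:
        return 1, p_cache
    return 0, p_core
```

The same function covers both the BS and the satellite case; only the cache contents and the two power values differ.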
File popularity is assumed to follow the Zipf distribution, and popularity affects the caching gain. In this regard, a generalized Zipf distribution is used, with the exponent $\varepsilon$ ranging from 0.56 to 0.83 [56]:
$$y_u = \frac{1/u^{\varepsilon}}{\sum_{v=1}^{U} (v^{\varepsilon})^{-1}}, \quad \forall u. \tag{7}$$
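As a small worked example of Eq. (7), the Python sketch below computes the request probability of every file in the library. The function name is hypothetical; the library size of 40 and the exponent 0.56 are values that appear elsewhere in the text.

```python
def zipf_popularity(num_files, epsilon):
    """Request probability of each file under the generalized Zipf law of Eq. (7)."""
    weights = [1.0 / (u ** epsilon) for u in range(1, num_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Library of U = 40 files with exponent 0.56.
popularity = zipf_popularity(40, 0.56)
```

The probabilities sum to one and decrease with the file index, so caching the lowest-indexed files captures the largest share of requests.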
The caching reward obtained by reducing the time delay can be expressed as follows:
$$x_n(t) = J_n(t)\,C_n^{s}\,T_n^{-1}, \tag{8}$$
where $T_n$ is the time delay associated with downloading the requested content for user $n$ via the back-haul link, $s$ denotes the size of the cached file, and $C_n$ represents the content requested by the $n$th user. When the requested content is available in the local cache, it can be obtained directly from there, reducing the time delay. Likewise, a satellite cache reduces latency by transmitting cached data directly, thereby improving overall performance, which can be expressed as:
$$x_n^k(t) = J_m(t)\,C_n^k\,(T_m^k)^{-1}, \tag{9}$$
where $T_m^k$ is the time required for the $n$th user to download the requested content through the back-haul link.
The cache policy’s effectiveness is evaluated by the cache hit rate, which represents the proportion of user requests that are successfully fulfilled from the cache. The cache hit rate at time $t$ is calculated as:
$$\Omega(t) = \frac{\sum_{n=1}^{N} J_n(t)}{N}. \tag{10}$$
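The two caching metrics can be illustrated with the following Python sketch. The function names are hypothetical, and the content size and delay are treated as plain numbers for simplicity.

```python
def cache_hit_rate(hits):
    """Eq. (10): fraction of requests served from a local cache; hits[n] is J_n(t)."""
    return sum(hits) / len(hits)


def caching_reward(hit, content_size, backhaul_delay):
    """Eq. (8): reward J_n(t) * C_n / T_n, which is non-zero only on a cache hit."""
    return hit * content_size / backhaul_delay
```

For instance, three hits out of four requests give a hit rate of 0.75, and a missed request always contributes a reward of zero.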
We use $P(t)$ to denote the total amount of power used by BS users at time $t$:
$$P(t) = p_n(t) + \bar{J}_n(t)\,p_{l,r}(t) + J_n(t)\,p_{n,r}(t), \tag{11}$$
where $\bar{J}_n(t) = 1 - J_n(t)$. In our system model, the power consumption of BS and satellite users is expressed using the variables $P(t)$ and $P_k(t)$, respectively. Power consumption for BS users includes both the transmission power of user $n$, denoted as $p_n(t)$, and the power consumption for data retrieval. The latter is further subdivided into two parts: $p_{n,r}(t)$ for data retrieval from the BS cache as well as $p_{l,r}(t)$
from the core network via the back-haul link, respectively. On
the other hand, for users connected to the satellite, we utilize
the variable Pk(t)to represent the total power consumption,
including the power consumption for transmission and data
retrieval.
$$P_k(t) = p_{k,n}(t) + \left(1 - J_n(t)\right) p_{k,l,r}(t) + J_n(t)\,p_{k,n,r}(t). \tag{12}$$
Several factors determine the power consumption of a user in the satellite network, for example the transmit power $p_{k,n}(t)$ required for user $n$ to communicate with the satellite and the power consumed during data retrieval from the satellite’s cache, $p_{k,n,r}(t)$, or from the core network via the gateway station, $p_{k,l,r}(t)$. We combine the BS cache with the satellite to determine how efficiently the network uses energy, which allows us to evaluate the $n$th user's energy efficiency at time $t$. Energy efficiency is a crucial metric for assessing how well the network works and how to improve its design to give users the most benefit while using the least power. It can be expressed as follows:
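$$\Psi_n(t) = \sum_{m=1}^{M} \frac{\alpha_n^m(t)\,R_n^m}{p_n(t) + \bar{J}_n\,p_{l,r}(t) + J_n(t)\,p_{n,r}(t)} + \sum_{k=1}^{K} \frac{\alpha_n^k(t)\,R_n^k}{p_n^k(t)}, \tag{13}$$
where, in (13), $\bar{J}_n \triangleq 1 - J_n(t)$.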
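To make the effect of caching on energy efficiency concrete, the sketch below evaluates a single BS term of Eq. (13) in Python. The function name and the numeric values used in the note are illustrative assumptions.

```python
import math


def user_energy_efficiency(sinr, tx_power, hit, p_cache, p_core):
    """One BS term of Eq. (13): achievable rate over transmit plus retrieval power."""
    rate = math.log2(1.0 + sinr)                     # achievable rate R_n^m
    retrieval = hit * p_cache + (1 - hit) * p_core   # J * p_{n,r} + (1 - J) * p_{l,r}
    return rate / (tx_power + retrieval)
```

With an SINR of 3, a transmit power of 1, a cache retrieval power of 1, and a core-network retrieval power of 5, a cache hit yields an efficiency of 1, while a miss yields one third, since the rate stays the same and only the denominator grows.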
III. PROBLEM FORMULATION
This work aims to maximize the system’s overall energy
efficiency by optimal allocation of resources, e.g., transmission
power, user association matrix, and cache layout. Furthermore,
it seeks to minimize the system’s overall energy consumption
while maintaining high performance and quality of service.
The ultimate goal is to achieve an optimal balance between
energy efficiency and system performance by finding the most
efficient way to allocate resources while meeting operational
requirements. The associated constraint represents the restric-
tion that each user is limited to a single BS or satellite during
a specific period and can be expressed as follows:
$$\sum_{m=1}^{M} \alpha_n^m(t) + \sum_{k=1}^{K} \alpha_n^k(t) \le 1, \quad \forall n. \tag{14}$$
A maximum power constraint exists for each user associated with a BS or a satellite. The transmission power limit for users associated with a BS is given by:
$$p_n(t) \le \frac{p_b^{\max}}{N_b^1}, \quad \forall n. \tag{15}$$
The transmission power limit for users associated with a satellite is given by:
$$p_n^k(t) \le \frac{p_k^{\max}}{N_s^2}, \quad \forall n. \tag{16}$$
Note that the QoS limitations of each BS and satellite restrict the maximum number of users they can accommodate. The maximum number of users for a specific BS is:
$$\sum_{n=1}^{N_b^1} \alpha_n^m(t) \le N_b^1, \quad \forall m. \tag{17}$$
The corresponding quality of service constraint for a satellite is:
$$\sum_{n=1}^{N_s^2} \alpha_n^k(t) \le N_s^2, \quad \forall k. \tag{18}$$
The following constraint restricts each user’s power control factor to the interval from 0 to 1:
$$\beta_n(t) \in [0, 1], \quad \forall n. \tag{19}$$
The caching strategy of base stations and satellites is constrained by the capacity of their respective local caches. In addition, the size of a user's content request is smaller than the available local storage capacity, which in turn is insufficient to accommodate the total size of the file library. The constraint that describes the limitation on the local cache capacity for the BSs and satellites is as follows:
$$C_n \le M_f \le U, \qquad C_n \le M_s \le U. \tag{20}$$
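Based on the aforementioned objective and constraints, the optimization problem can be formulated as follows:
$$\max \; \sum_{n=1}^{N} \left( \sum_{m=1}^{M} \alpha_n^m(t)\,\eta_n^m + \sum_{k=1}^{K} \alpha_n^k(t)\,\psi_n^k \right)$$
subject to
$$\sum_{m=1}^{M} \alpha_n^m(t) + \sum_{k=1}^{K} \alpha_n^k(t) \le 1, \quad \forall n, \tag{21a}$$
$$\sum_{n=1}^{N_b^1} \alpha_n^m(t) \le N_b^1, \quad \forall m, \tag{21b}$$
$$\sum_{n=1}^{N_s^2} \alpha_n^k(t) \le N_s^2, \quad \forall k, \tag{21c}$$
$$p_n(t) \le \frac{p_b^{\max}}{N_b^1}, \quad \forall n, \tag{21d}$$
$$p_n^k(t) \le \frac{p_k^{\max}}{N_s^2}, \quad \forall n, \tag{21e}$$
$$\beta_n(t) \in [0, 1], \quad \forall n, \tag{21f}$$
$$C_n \le M_f \le U, \tag{21g}$$
$$C_n \le M_s \le U, \tag{21h}$$
where $\eta_n^m = \dfrac{R_n^m}{p_n(t) + \bar{J}_n\,p_{l,r}(t) + J_n(t)\,p_{n,r}(t)}$ and $\psi_n^k = \dfrac{R_n^k}{p_n^k(t)}$.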
The optimization problem above is a mixed-integer nonlinear programming (MINLP) problem, because it involves both integer and continuous variables together with nonlinear constraints, such as the power constraints in (15) and (16), and a nonlinear objective function. MINLPs are known to be challenging to solve, as they combine the computational difficulties of both nonlinear and integer optimization problems.
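A candidate solution can be screened against the association, load, and power-factor constraints before its objective value is evaluated. The Python sketch below is one possible way to do this; the matrix layout and the function name are assumptions for illustration, and the cache constraints are left out.

```python
def is_feasible(assoc_bs, assoc_sat, beta, max_bs_users, max_sat_users):
    """Check the association, load, and power-factor constraints for one time slot.

    assoc_bs[n][m]  : 1 if user n is attached to BS m, else 0
    assoc_sat[n][k] : 1 if user n is attached to satellite k, else 0
    beta[n]         : power control factor of user n
    """
    num_users = len(beta)
    # Each user attaches to at most one BS or satellite.
    if any(sum(assoc_bs[n]) + sum(assoc_sat[n]) > 1 for n in range(num_users)):
        return False
    # Per-BS and per-satellite load limits.
    for m in range(len(assoc_bs[0])):
        if sum(assoc_bs[n][m] for n in range(num_users)) > max_bs_users:
            return False
    for k in range(len(assoc_sat[0])):
        if sum(assoc_sat[n][k] for n in range(num_users)) > max_sat_users:
            return False
    # Power control factors lie in [0, 1]; with p_n = beta_n * p_max / N this
    # also keeps the per-user power limits satisfied.
    return all(0.0 <= b <= 1.0 for b in beta)
```

Because the power of each user is defined as a fraction of the per-user power budget, checking the power control factor is sufficient to guarantee the transmit power limits in this sketch.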
IV. OPTIMIZING TERRESTRIAL-SATELLITE NETWORK EFFICIENCY WITH MULTI-AGENT DRL (MADDPG)
This section outlines a MADDPG approach for enhanc-
ing the integrated terrestrial-satellite NOMA communication
network. The main aim of this work is to maximize the
objective function value by determining the optimal allocation
of resources, e.g., transmission power control, the design of
the cache, and the allocation of users. Therefore, to achieve
this, we suggest two MADDPG algorithms that concentrate
on different subproblems. To ensure peak performance, both
algorithms carefully choose the agents.
A. Reinforcement Learning
The objective of reinforcement learning (RL), a type of
machine learning, is to teach an agent how to interact with
the environment to maximize a cumulative reward signal. RL
does not always need a dataset to learn from, unlike supervised
learning, which necessitates a labeled dataset. Instead, an RL
agent can learn by interacting with its surroundings, getting
feedback through rewards or penalties, and then changing its
behavior.
The agent participates in the RL process by acting in the
world and receiving feedback as a reward signal. In order
to raise its expected cumulative reward, the agent modifies
its policy, a function that links states to actions. Modifying
the parameters of the decision-making model is part of the
learning process, which involves updating the policy.
The agent continuously refines its behavior through trial and
error, one of RL’s advantages. The agent experiments with
various actions in the environment, evaluates the rewards that
result, and then modifies its strategy to carry out more actions
that produce greater rewards. This cycle repeats until the agent
discovers a course of action that maximizes its cumulative
reward.
B. Enhancing NOMA Networks with MADDPG
The integrated terrestrial-satellite NOMA communication network comprises multiple agents, which makes it a complex multi-agent scenario. In such a scenario, the MADDPG algorithm is a well-suited approach due to its flexibility in handling many agents, whereas traditional single-agent reinforcement learning methods may overfit to the behavior of competing agents in dynamic and unstable environments. To maximize the objective function value in an integrated terrestrial and satellite NOMA communication network, the problem is modelled as a Markov decision process (MDP), for which the state space ($S$), action space ($A$), reward space, and transition probability space must all be defined. An effective solution can only be found if the problem is modelled accurately. In this configuration, each user acts as an agent by observing its immediate environment, selecting appropriate actions from the available action space, and carrying those actions out to satisfy the system requirements. When a user completes its actions, it is rewarded. The agents, actions, states, and rewards are defined separately for each of the two algorithms considered.
1) Multi-Agent Reinforcement Learning for User Association and Power Control in NOMA Networks: The energy efficiency optimization problem of an integrated terrestrial-satellite network can be solved using the MADDPG algorithm detailed in Algorithm 1. The key concepts of this algorithm, namely agents, actions, states, and rewards, are defined below:
Agent: Every user in the integrated terrestrial and satellite-enabled NOMA communication network is treated as an agent.
Action: Every agent is assigned two tasks, specified by the action space $A_1 = \{A_{11}, A_{12}\}$. The first task, $A_{11}$, is the user association process, which establishes the relationship between agents and the base stations (BSs) or satellites. It is represented by a vector $A_{11} = \{a_1^n(t), \dots, a_M^n(t)\}$, where each entry corresponds to the association decision of a particular agent. Because association is a discrete choice, the action space $A_{11}$ must be discretized. The second task is represented by a vector $A_{12} = \{\alpha_1(t), \dots, \alpha_M(t)\}$ and involves the power control factor that each agent uses to determine its transmission power. In summary, each agent first selects the appropriate BS or satellite for association using the discretized user association action $A_{11}$ and subsequently determines its transmission power using the transmission power control factor $A_{12}$.
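The two-part action can be illustrated by a Python sketch that maps a discrete association index and a continuous power control factor to a serving node and a transmit power. The index layout (BSs first, then satellites), the clipping step, and all names are assumptions made for this example.

```python
def decode_action(index, beta, num_bs, num_sat, p_max_bs, p_max_sat,
                  users_per_bs, users_per_sat):
    """Map a discrete association index and a factor beta to (node, power).

    Indices 0..num_bs-1 select a BS; num_bs..num_bs+num_sat-1 select a satellite.
    This index layout is an assumption made for the example.
    """
    if not 0 <= index < num_bs + num_sat:
        raise ValueError("association index out of range")
    beta = min(max(beta, 0.0), 1.0)  # clip the continuous actor output to [0, 1]
    if index < num_bs:
        return ("bs", index), beta * p_max_bs / users_per_bs
    return ("sat", index - num_bs), beta * p_max_sat / users_per_sat
```

For example, with 6 BSs and 2 satellites, index 7 refers to the second satellite, and the returned power is the factor times that satellite's per-user power budget.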
Reward: Each user aims to optimize its energy efficiency by taking appropriate actions. To evaluate the effectiveness of these actions, we define the reward for the $n$th user at the current time slot $t$ as follows:
$$R_1(t) = \Psi_n(t). \tag{22}$$
State: The state of each agent is determined by its observation of energy efficiency. Specifically, for user $n$ at time $t$, the state is determined by comparing its energy efficiency with that of the previous time slot: if there is an improvement, then $S_{1i}^{\Psi_n}$ is 1. The state space of the entire system is defined as $S_1 = \{S_{11}^{\Psi_n}(t), \dots, S_{1N}^{\Psi_n}(t)\}$, where $S_{1i}^{\Psi_n}$ represents the state of user $n$ in time slot $t$:
$$S_{1i}^{\Psi_n} = \begin{cases} 1, & \text{if } R_1(t) \ge R_1(t-1), \\ 0, & \text{else}, \end{cases} \tag{23}$$
where $i$ represents the index of the $n$th user.
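The binary state can be computed directly from two consecutive rewards, as in the following Python sketch. It assumes the comparison in Eq. (23) is non-strict, so an unchanged reward also maps to state 1; the function names are hypothetical.

```python
def improvement_state(reward_now, reward_prev):
    """Eq. (23): 1 if the reward did not decrease since the previous slot, else 0."""
    return 1 if reward_now >= reward_prev else 0


def system_state(rewards_now, rewards_prev):
    """Stack the per-user indicators into the system state S_1."""
    return [improvement_state(a, b) for a, b in zip(rewards_now, rewards_prev)]
```

A compact binary state like this keeps the observation dimension equal to the number of users, at the cost of discarding the magnitude of the change.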
2) MADDPG-based Cache Optimization: To allocate resources optimally in the integrated terrestrial and satellite NOMA communication network, we employ Algorithm 1. Once this optimization is complete, we optimize the cache design of the BSs and the satellites using Algorithm 2. Because the optimization goals differ, the agents, actions, rewards, and states in Algorithm 2 differ slightly from those used in Algorithm 1 and are defined as follows:
Agent: Algorithm 2 treats the BSs and satellites as agents, which decide which files to retrieve from the library.
Action: In each time slot, every satellite or BS determines which files from the file library to cache. This set of files, denoted as $A_2 = \{A_{21}\}$, constitutes the local cache pool.
Reward: The system’s goal is to
maximize its energy efficiency, which is achieved through the actions taken by each base station (BS) or satellite. The number of agents in the system totals $M + K$. The reward for the actions of a base station or satellite is the overall energy efficiency of the users it serves. Specifically, the reward of the $n$th agent in time slot $t$ is given by:
$$R_2(t) = \Psi_n = \begin{cases} \sum_{n=1}^{N_b} \Psi_n, & n \in [1, N_b], \\ \sum_{n=1}^{N_s} \Psi_n, & n \in [M+1, M+K]. \end{cases} \tag{24}$$
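The per-agent reward is a sum of user energy efficiencies grouped by serving node. A minimal Python sketch, assuming each user is tagged with the index of its serving agent (BSs and satellites share one index range here), could look like this; the names are hypothetical.

```python
def agent_rewards(association, efficiency, num_agents):
    """Sum user energy efficiencies per serving agent.

    association[n] is the index of the BS or satellite serving user n,
    or None if the user is not attached to any node.
    """
    rewards = [0.0] * num_agents
    for user, agent in enumerate(association):
        if agent is not None:
            rewards[agent] += efficiency[user]
    return rewards
```

An agent that serves no users receives a reward of zero, which gives it no learning signal in that slot.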
State: In Algorithm 2, the satellite or base station (BS) is the agent responsible for optimizing the system’s energy efficiency. Thus, $S_{2i}^{\Psi_n}$ is set to 1 if the reward of the $n$th BS or satellite in time slot $t$ is greater than that in slot $t-1$. The system’s state space is denoted by $S_2 = \{S_{21}^{\Psi_n}(t), \dots, S_{2N}^{\Psi_n}(t)\}$, and each entry $S_{2i}^{\Psi_n}(t)$ is assigned as follows:
$$S_{2i}^{\Psi_n}(t) = \begin{cases} 1, & \text{if the reward increases}, \\ 0, & \text{otherwise}. \end{cases} \tag{25}$$
To achieve a greater degree of stability, each agent in the MADDPG algorithm can use information about the actions of the other agents. As a result, the state transition probability satisfies:
$$P(s' \mid s, a_1, \dots, a_N, X_1, \dots, X_M) = P(s' \mid s, a_1, \dots, a_N) = P(s' \mid s, a_1, \dots, a_N, X'_1, \dots, X'_M). \tag{26}$$
The MADDPG algorithm for an integrated terrestrial-satellite NOMA communication network therefore maintains stability even when agents’ policies are dynamically updated. Equation (26) shows the state transition probability, with $a_i$ representing the action taken by agent $i$ and $s$ representing the current state. The network includes $M$ agents, each with a set of corresponding parameter values $w = \{\varpi_1, \dots, \varpi_M\}$ optimized to maximize returns via the MADDPG algorithm.
The MADDPG algorithm uses reinforcement learning to
enable multiple agents to learn from their experiences and
improve policies cooperatively. Each agent updates its policy
based on observations and a centralized critic estimating the
expected return. The critic considers all agents’ experiences
and policies, facilitating collaboration and performance im-
provement.
The policies of all $M$ agents are represented by $\chi = \{\chi_1, \dots, \chi_M\}$, and each agent optimizes its policy using its own set of parameter values $w$. By optimising their individual policies, the agents achieve better network performance and increase the overall system capacity. The following equation gives the gradient of the objective function, which determines how the policy of each agent should be updated:
$$\nabla_{\varpi_i} I(\chi_i) = \mathbb{E}_{x, a \sim D}\left[ \nabla_{\varpi_i} \chi_i(a_i \mid o_i)\, \nabla_{a_i} Q_i^{\chi}(x, a_1, \dots, a_N) \right]. \tag{27}$$
The MADDPG algorithm utilizes two distinct neural networks, the actor network and the critic network, to optimize the performance of all $M$ agents in the integrated terrestrial and satellite NOMA network. The observation and action spaces of the agents are denoted by $x$ and $a$, respectively, while the replay memory is represented by $D$. The actor network selects actions according to the policy, using a continuous action space to choose between $A_1$ and $A_2$. The critic network evaluates the actions to be executed by updating the $Q$ function, represented by $Q_i^{\chi}(x, a_1, \dots, a_N)$ in equation (27). The policy network of the actor is updated using gradient descent based on equation (27) [70]. Meanwhile, the critic network updates the $Q$ function by minimizing the loss function $L(\varpi_i)$, whose target value $y$ is given by:
$$y = r_i + \gamma\, Q_i^{\mu'}(x', a'_1, \dots, a'_N)\big|_{a'_j = \mu'_j(o_j)}. \tag{28}$$
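The critic update can be illustrated without a deep learning framework. The Python sketch below computes the target value of Eq. (28) and a mean squared error between critic estimates and targets, which is the loss form used in the standard MADDPG formulation and is assumed here; the function names are hypothetical, and the next-state Q value is passed in as a number rather than produced by a target network.

```python
def td_target(reward, gamma, next_q):
    """Eq. (28): y = r_i + gamma * Q'(x', a'_1, ..., a'_N) from the target critic."""
    return reward + gamma * next_q


def critic_loss(q_values, targets):
    """Mean squared error between critic estimates and their targets."""
    errors = [(q - y) ** 2 for q, y in zip(q_values, targets)]
    return sum(errors) / len(errors)
```

In a full implementation the loss would be averaged over a mini-batch sampled from the replay memory and minimized with respect to the critic parameters.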
C. Algorithm Description
This section presents two algorithms based on the MADDPG algorithm [60]: Algorithm 1 for system resource allocation and Algorithm 2 for system cache design. Before the algorithms start, the neural network parameters and the replay memory are initialized. The actor network selects actions probabilistically, while the critic network evaluates the chosen actions; the actor then adjusts the probability of the selected actions according to the critic's evaluation. In the iterative MADDPG procedure for the terrestrial-satellite network, each agent is assigned an initial state and then observes its new state at each step of an episode, with the aim of improving energy efficiency at each step. After executing an action, each agent receives a reward and transitions to a new state. Both the policy and exploration inform the agent’s decision on what to do. Finally, these values are stored in the replay memory for potential future replay.
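The replay memory mentioned above can be sketched as a bounded buffer with uniform sampling. The class below is a generic Python illustration; the class name, the transition layout, and the capacity handling are assumptions rather than details given in the text.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement; the batch cannot exceed the buffer.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)
```

Sampling past transitions at random breaks the correlation between consecutive steps, which is the usual motivation for experience replay in actor-critic methods.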
V. RESULTS AND DISCUSSION
In this study, we have employed a simulation-based approach to evaluate the performance of our proposed model².
The experimental process involves the development of a syn-
thetic/virtual simulation environment that closely represents
the characteristics and behaviours of integrated terrestrial and
satellite NOMA communication networks. The simulation
environment is designed to mimic the network infrastructure,
including base stations, satellites, users, and the commu-
nication channels between them. Various factors, such as
user mobility, channel conditions, interference, and caching
mechanisms, are considered to create a realistic simulation
environment. We utilize MATLAB-based software tools and
frameworks specifically designed for network simulations to
implement the simulation environment. These tools provide
² Given the unique and intricate nature of the network configuration under examination, it is not straightforward to compare it directly with previous studies on integrated terrestrial-satellite networks in the existing literature. Therefore, we compare the proposed scheme with different learning algorithms, including the deep deterministic policy gradient (DDPG) algorithm, the Random Policy algorithm, the Genetic Algorithm (GA), and the Proximal Policy Optimization (PPO) algorithm.
Fig. 2: Algorithm 1 Flow Chart
capabilities for modelling and simulating the network compo-
nents, implementing multi-agent deep reinforcement learning
algorithms, and evaluating performance metrics. Furthermore,
the experimental setup involves the deployment of our pro-
posed multi-agent deep reinforcement learning model within
the simulation environment. We carefully configure and tune
the hyperparameters of the MADDPG algorithm, such as
network topology, learning rates, exploration strategies, and
reward functions, to ensure effective resource allocation and
cache design. Our research primarily focuses on the algorith-
mic and methodological aspects of resource allocation and
cache design in integrated terrestrial and satellite NOMA
communication networks. Therefore, the experimental setup
primarily involves the software-based simulation environment
rather than physical or hybrid infrastructure.
The experimental environment uses a specific configuration: a network with 36 agents, including 6 agents representing base stations and 2 agents representing satellites. The channels of the base stations follow a Rayleigh distribution, while the satellite parameters align with previous study findings. The file library, denoted as $\mathcal{U}$, has a capacity of 40 files, with 3 cache devices allocated for the base stations and a satellite cache with a capacity of 3. The user count and file content size is set at [1 2] bits. Power consumption parameters are also specified, covering data retrieval power consumption for different scenarios. The cache design optimization of the proposed scheme is conducted using Algorithm 2. Our simulation model utilises the Adam optimizer with the Rectified Linear Unit (ReLU) activation function. The learning rate is set to 0.001, the discount factor is 0.95, and the batch size is 10. The experiment is conducted over 1000 iterations, with each agent completing 100 steps
Fig. 3: Algorithm 2 Flow Chart
Fig. 4: Algorithm convergence with varied numbers of users. [Plot: objective function value (bits/sec/Hz) versus iteration, with curves for N = 40, N = 32, and N = 24 users.]
per episode. While physical or hybrid infrastructure imple-
mentation considerations are significant in evaluating network
performance, conducting large-scale physical experiments in
integrated terrestrial and satellite networks can be complex, ex-
pensive, and challenging to scale. Hence, simulations provide a
feasible and efficient approach to evaluate the performance of
our proposed model and conduct extensive experiments under
different scenarios.
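For reproducibility, the training settings quoted above can be collected in a single configuration object. The Python sketch below uses only values stated in the text; the class and field names are hypothetical, and settings the text does not specify (for example the network layer sizes) are deliberately omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    learning_rate: float = 1e-3     # Adam learning rate
    discount_factor: float = 0.95
    batch_size: int = 10
    iterations: int = 1000
    steps_per_iteration: int = 100  # steps completed by each agent
    library_size: int = 40          # number of files in the library
    cache_capacity: int = 3         # files held by a BS or satellite cache

    def total_steps(self):
        """Total number of environment steps per agent over the whole run."""
        return self.iterations * self.steps_per_iteration
```

Keeping the configuration immutable makes it harder to change a hyperparameter by accident between runs that are meant to be compared.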
We assessed the performance of the proposed scheme in optimizing the system’s objective function by evaluating its convergence with varying numbers of agents. Figure 4 showcases the convergence results of the proposed scheme with 24, 32, and 40 agents. To test the scheme’s effectiveness in a more challenging environment, we increased the values of $N_b^1$ and $N_s^2$ to 5 while maintaining the network parameters at $N = 24$, $M = 6$, and $K = 2$ for the 24-agent scenario. The curves in Figure 4 show that the proposed scheme consistently reached its maximum reward after approximately 700 iterations for all three agent counts, demonstrating its strong convergence performance. The proposed scheme exhibited robust convergence behavior, even in complex terrestrial-satellite networks with multiple agents.
Fig. 5: Effects of learning rate on algorithm convergence. [Plot: objective function value (bits/sec/Hz) versus iteration, with curves for learning rates $10^{-3}$, $10^{-4}$, and $10^{-5}$.]
Fig. 6: BS vs. satellite users: a comparison of objective function value convergence. [Plot: objective function value (bits/sec/Hz) versus iteration, with curves for base-station and satellite users.]
Furthermore, we analysed the impact of different learning rates on the convergence performance of the proposed scheme. This experiment used network parameters $M = 32$, $N = 6$, $S = 2$, and $M_1 = M_2 = 4$. As illustrated in Figure 5, the curves exhibit different convergence speeds and relative heights depending on the learning rate. Notably, a higher learning rate resulted in faster convergence. This observation emphasizes the efficacy of the proposed scheme in optimizing the system’s objective
Fig. 7: Proposed scheme versus benchmark algorithms: a comparison. [Plot: objective function value (bits/sec/Hz) versus iteration, with curves for the proposed scheme and Benchmarks 1 to 3.]
Fig. 8: The impact of user count on objective function value of different algorithms. [Plot: objective function value (bits/sec/Hz, logarithmic axis from $10^0$ to $10^3$) versus number of ground users, with curves for the proposed scheme and Benchmarks 1 to 4.]
function across a diverse range of agent densities and learning
rates.
Figure 6 shows the convergence for both base station (BS)
users and satellite users. The two curves converge at similar
rates and reach their steady states relatively quickly under
the same experimental setup as in the previous figures.
Figure 6 also shows that users connected to BSs are more
energy-efficient than satellite users: satellite users converge
to a total energy efficiency of approximately 175 bits per
joule per hertz, whereas BS users reach about 750 bits per
joule per hertz. This disparity can be attributed to the
better channel conditions experienced by BS users.
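To make the link between channel quality and energy efficiency concrete, the sketch below computes an energy efficiency value in bits per joule per hertz as the sum spectral efficiency divided by the total consumed power. It is a simplified, interference-free model chosen for illustration, not the paper's objective function; `energy_efficiency`, the circuit-power term, and all numerical values are assumptions.

```python
import math


def energy_efficiency(tx_powers_w, channel_gains, noise_power_w, circuit_power_w=0.0):
    """Sum spectral efficiency (bits/s/Hz) divided by total power (W).

    Assumes each link sees only noise (no NOMA or inter-cell interference).
    """
    spectral_efficiency = sum(
        math.log2(1.0 + p * g / noise_power_w)
        for p, g in zip(tx_powers_w, channel_gains)
    )
    total_power = sum(tx_powers_w) + circuit_power_w
    if total_power <= 0.0:
        raise ValueError("total power must be positive")
    return spectral_efficiency / total_power


# Hypothetical values: same transmit power, but the terrestrial links have
# a much larger channel gain than the satellite links.
bs_ee = energy_efficiency([0.5, 0.5], [1e-7, 2e-7], noise_power_w=1e-10)
sat_ee = energy_efficiency([0.5, 0.5], [1e-10, 2e-10], noise_power_w=1e-10)
```

Under these assumed gains, `bs_ee` is larger than `sat_ee` for the same transmit power, which is the qualitative effect the paragraph attributes to better channel conditions.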
Fig. 9: Cache Reward Proposed Scheme: convergence across capacity
and library variations (x-axis: Iteration; y-axis: Objective Function
Value (bits/sec/Hz); curves: M_u = 4, U = 40; M_u = 3, U = 40;
M_u = 4, U = 50; M_u = 4, U = 50).
Fig. 10: Proposed scheme versus benchmark algorithms: a comparison
(x-axis: Iteration; y-axis: Objective Function Value (bits/sec/Hz);
curves: Proposed Scheme, Benchmark 1, Benchmark 3, Benchmark 4).
To evaluate the optimization performance of the proposed
scheme, we compared it against four benchmark algorithms.
Figure 7 illustrates the comparison with the widely used
deep reinforcement learning algorithm DDPG, the Random
Policy algorithm, the Genetic Algorithm (GA), and the
Proximal Policy Optimization (PPO) algorithm. The GA
simulates natural evolution to search for a good solution,
while PPO is a more recent policy gradient algorithm. The
Random Policy algorithm randomly selects user association
and power control actions in each episode.
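For reference, a random-policy baseline of the kind described above can be sketched in a few lines: each user draws an access point and a transmit power uniformly at random in every episode. The function name `random_policy_action`, the uniform power distribution, and the parameter values are assumptions for illustration; the paper does not specify these details.

```python
import random


def random_policy_action(num_access_points, max_power_w, rng):
    """Pick an access point index and a transmit power uniformly at random."""
    access_point = rng.randrange(num_access_points)
    power_w = rng.uniform(0.0, max_power_w)
    return access_point, power_w


# One episode for 32 hypothetical users choosing among 8 hypothetical
# access points, each with a 1 W power budget.
rng = random.Random(0)
episode_actions = [random_policy_action(8, 1.0, rng) for _ in range(32)]
```

Because the actions ignore the channel state and the other users, a baseline like this gives a floor against which learned policies can be compared.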
The performance of these algorithms in optimizing the
system's objective function was evaluated based on the number
of users in the network, as shown in Figure 7 for M = 32. The
proposed scheme, DDPG, and PPO all reach converging
rewards, with the proposed scheme achieving the best resource
optimization performance of the three. In terms of energy
efficiency, the proposed scheme attains the highest value,
625 bits per joule per hertz. In comparison, the PPO
algorithm starts at 450 bits per joule per hertz and converges
to approximately 580 bits per joule per hertz after around
400 iterations.
Fig. 11: Proposed scheme compared to benchmark algorithms
(x-axis: Iteration; y-axis: Request Completion Rate; curves: Proposed
Scheme, Benchmark 1, Benchmark 4, Benchmark 5).
Fig. 12: Proposed scheme and benchmark algorithms: a comparison
across cache size variations (x-axis: Cache Size; y-axis: Objective
Function Value (bits/sec/Hz); curves: Proposed Scheme U = 40,
Proposed Scheme U = 50, Benchmark 1, Benchmark 3, Benchmark 4,
Benchmark 6).
Moreover, the DDPG algorithm achieves the same energy
efficiency as the proposed scheme. Conversely, the Random
Policy algorithm exhibits poor convergence, with its curve
fluctuating between 400 and 500 bits per joule per hertz.
Compared to the proposed scheme, the other algorithms
demonstrate weaker stability and inferior performance in
optimizing the objective function.
The advantage of the proposed scheme (MADDPG) lies in
its ability to achieve both high resource optimization perfor-
mance and energy efficiency, as demonstrated in Figure 7.
The proposed scheme outperforms the benchmark algorithms,
showcasing its superior stability and consistent performance.
It achieves the highest energy efficiency rate of 625 bits per
joule per hertz, providing significant gains compared to the
PPO algorithm’s convergence to around 580 bits per joule per
hertz after approximately 400 iterations. Thus, the proposed
scheme (MADDPG) stands out as an effective and efficient
solution for optimizing system objectives in complex multi-
agent networks.
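As a quick check on the size of this gain, using only the two converged values quoted above, the relative improvement of the proposed scheme over PPO is:

```latex
\frac{625 - 580}{580} \approx 0.078 \quad (\text{about } 7.8\%)
```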
Fig. 13: Evaluating proposed scheme and benchmark algorithms for
request completion rate under different cache sizes (x-axis: Cache
Size; y-axis: Request Completion Rate, logarithmic scale; curves:
Proposed Scheme U = 40, Proposed Scheme U = 50, Benchmark 1,
Benchmark 3, Benchmark 4, Benchmark 6).
The relationship between the number of users in the network
and the objective function value of each algorithm is shown in
Figure 8. The results show that the system's energy efficiency
increases as the user density per BS and satellite increases.
The proposed scheme outperforms the benchmark algorithms
in energy efficiency when three to five users share each BS
and satellite. With five users per BS and satellite, the
system's energy efficiency reaches up to 1180 bits/Joule/Hz.
The proposed scheme is significantly more energy-efficient
than the random policy, DDPG, PPO, and GA algorithms,
indicating its effectiveness.
Figure 9 shows the convergence analysis of Algorithm 2 for
various local cache capacities and file library sizes,
comparing four cases. Within the 1000 training iterations,
Algorithm 2 converges quickly, after around 60 iterations.
The convergence of Algorithm 2 is likewise examined as
the local cache capacities of the satellite and BS and the size
of the file library vary. Algorithm 2 performs well in all four
configurations. However, ineffective local cache deployment in
the first training iteration increases power consumption,
which lowers the cache reward to below 1090 bits per joule
per hertz. The cache reward improves as training proceeds and
converges after about 50 iterations. As the file library grows,
the cache reward decreases because the local cache capacity
is fixed, making it harder for Algorithm 2 to find a requested
file in the local cache. Likewise, for the same file library,
the reward declines as the local cache capacity shrinks.
Figures 10 and 11 compare the convergence of energy
efficiency and cache hit rate for various cache optimization
algorithms at Nf = 3 and U = 40. As illustrated in Figure 10,
both Algorithm 2 and the DDPG algorithm converge in energy
efficiency, but Algorithm 2 performs better than the other
algorithms for a given cache and file library size. The DDPG
algorithm reaches a value of 1100 bits per joule per hertz at
around 700 iterations, while Benchmark 3 oscillates between
600 and 650 bits per joule per hertz and does not converge.
Algorithm 2 thus optimizes the system's objective function
more effectively and more steadily than the other algorithms.
Figure 11 shows the request completion rate of Algorithm 2.
The value rises to 0.13 and converges at 0.33, while the
DDPG algorithm's cache hit rate converges slowly. Therefore,
the proposed scheme produces better cache hit rates.
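The metric plotted in Figure 11 can be illustrated with a short sketch. It assumes that the request completion rate is the fraction of user requests that can be served from the local cache, which is how the text appears to use it alongside the cache hit rate; the function name and the toy request trace are hypothetical.

```python
def request_completion_rate(requests, cached_files):
    """Fraction of requests whose file is present in the local cache."""
    if not requests:
        return 0.0
    cached = set(cached_files)
    hits = sum(1 for file_id in requests if file_id in cached)
    return hits / len(requests)


# Toy trace: six requests over a library of five files, three files cached.
rate = request_completion_rate([0, 1, 0, 4, 2, 0], cached_files=[0, 1, 3])
```

In this toy trace four of the six requests hit the cache, so the rate is 4/6; a cache-placement agent would aim to choose `cached_files` so that this fraction is as high as possible.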
Figures 12 and 13 show the energy efficiency and the request
completion rate, respectively, of various algorithms for each
base station and satellite with different cache sizes. Although
the uncached strategy requires less memory, the MADDPG
algorithm has lower energy efficiency when the cache capacity
is only 1, as depicted in Figure 12. This is because, with so
little capacity, the local cache becomes a performance
bottleneck in retrieving the file a user requests, resulting in
suboptimal outcomes. However, as the cache capacity increases
from 1 to 6, the proposed scheme becomes more
energy-efficient. The MADDPG algorithm consistently
outperforms the other algorithms in energy efficiency, and the
performance gap widens with larger cache sizes.
Figure 13 shows that the proposed MADDPG algorithm
achieves a higher cache hit rate than the other algorithms,
following the same pattern as Figure 12. Together, these
figures clarify how cache capacity and library size interact.
With the local cache capacity held fixed, the cache reward and
hit rate decline in proportion to the size of the file library
within a specific range, because the required files become
harder to access as the library gets larger. Conversely, with
the file library size held constant, increasing the local cache
capacity increases both the cache reward and the cache hit
rate.
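The two trends described above can be reproduced with a small analytical sketch. It assumes Zipf-distributed file popularity and a cache that simply holds the most popular files, neither of which is stated in the paper; `zipf_hit_ratio` and the exponent 0.8 are illustrative choices.

```python
def zipf_hit_ratio(library_size, cache_size, exponent=0.8):
    """Hit ratio when the `cache_size` most popular files are cached and
    requests follow a Zipf law with the given exponent."""
    weights = [1.0 / rank ** exponent for rank in range(1, library_size + 1)]
    cached = min(cache_size, library_size)
    return sum(weights[:cached]) / sum(weights)


# Fixed cache of 4 files: the hit ratio drops as the library grows 40 -> 50.
hit_40 = zipf_hit_ratio(40, 4)
hit_50 = zipf_hit_ratio(50, 4)
# Fixed library of 40 files: the hit ratio rises with cache size 1 -> 6.
by_cache_size = [zipf_hit_ratio(40, c) for c in range(1, 7)]
```

Under these assumptions `hit_50` is smaller than `hit_40`, and `by_cache_size` increases monotonically, matching the direction of both trends even though the learned placement in the paper need not follow this simple rule.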
These findings validate the effectiveness of the proposed
MADDPG scheme in achieving high energy efficiency and
cache hit rates, outperforming other benchmark algorithms
across various network configurations and cache sizes.
VI. CONCLUSION AND FUTURE DIRECTIONS
In conclusion, our proposed approach for enhancing energy
efficiency in integrated terrestrial and satellite NOMA com-
munication networks offers several advantages compared to
existing reference contributions. Firstly, our use of a multi-
agent deep reinforcement learning technique, specifically the
MADDPG algorithm, outperforms benchmark algorithms and
the standard DDPG algorithm, which only utilizes a single
agent. This highlights the effectiveness of our approach in
addressing the complexities of resource allocation and cache
design in a multi-agent setting. By leveraging the MADDPG
algorithm, we jointly optimize user association, power
management, and cache layout. Modelling users and BSs as
agents allows the resource management and cache design
problems to be solved more efficiently.
Moreover, the proposed MADDPG-based cache design
strategy enables BSs and satellites to act as agents and
intelligently select files from the library to store in local cache
pools. This strategy improves the efficiency of data retrieval
and maximizes the energy efficiency of the network. Our
results demonstrate the benefits of the considered framework
of energy efficiency optimization in integrated terrestrial-
satellite NOMA networks. We significantly advance this re-
search area by outperforming the benchmark algorithms and
utilizing multi-agent deep reinforcement learning. Our work
serves as a solid foundation for future studies and opens up
opportunities to explore more complex scenarios, such as the
optimal allocation of resources in a multiple-layer NOMA-
enabled satellite communication network model. Specifically,
we aim to focus on minimizing power consumption, which is
crucial for sustainable and energy-efficient network operations.
While our proposed model showcases various advantages,
there are certain limitations that should be acknowledged. One
limitation is the computational complexity associated with
multi-agent deep reinforcement learning algorithms, which
can require substantial computational resources and time.
Additionally, the performance of our approach heavily relies
on accurate modelling and representation of the network
environment, including user behaviour and channel condi-
tions. Ensuring the reliability and real-world applicability
of these models is an ongoing challenge. In summary, our
proposed approach offers significant advantages over existing
contributions, including superior performance, efficient multi-
agent optimization, and effective cache design. While there
are limitations to be addressed, our research paves the way
for future investigations and advancements in energy-efficient
integrated terrestrial and satellite communication networks.
REFERENCES
[1] W. U. Khan, E. Lagunas, A. Mahmood, S. Chatzinotas, and B. Otter-
sten, “RIS-assisted energy-efficient LEO satellite communications with
NOMA,” arXiv preprint arXiv:2306.10422, 2023.
[2] B. Cao, M. Li, X. Liu, J. Zhao, W. Cao, and Z. Lv, “Many-objective
deployment optimization for a drone-assisted camera network,” IEEE
Transactions on Network Science and Engineering, vol. 8, no. 4, pp.
2756–2764, October 2021.
[3] B. Li, M. Zhang, Y. Rong, and Z. Han, “Transceiver optimization for
wireless powered time-division duplex MU-MIMO systems: Non-robust
and robust designs,” IEEE Transactions on Wireless Communications,
vol. 21, no. 6, pp. 4594–4607, June 2022.
[4] G. Geraci, D. Lopez-Perez, M. Benzaghta, and S. Chatzinotas, “Inte-
grating terrestrial and non-terrestrial networks: 3D opportunities and
challenges,” IEEE Communications Magazine, 2022.
[5] B. Cao, Z. Sun, J. Zhang, and Y. Gu, “Resource allocation in 5G
IoV architecture based on SDN and gog-cloud computing,” IEEE
Transactions on Intelligent Transportation Systems, vol. 22, no. 6, pp.
3832–3840, June 2021.
[6] W. U. Khan, Z. Ali, E. Lagunas, A. Mahmood, M. Asif, A. Ihsan,
S. Chatzinotas, B. Ottersten, and O. A. Dobre, “Rate splitting multi-
ple access for next generation cognitive radio enabled LEO satellite
networks,” IEEE Transactions on Wireless Communications, pp. 1–1,
2023.
[7] M. M. Azari, S. Solanki, S. Chatzinotas, O. Kodheli, H. Sallouha,
A. Colpaert, J. F. M. Montoya, S. Pollin, A. Haqiqatnejad, A. Mostaani
et al., “Evolution of non-terrestrial networks from 5G to 6G: A survey,”
IEEE communications surveys & tutorials, 2022.
[8] A. Mahmood, A. Ahmed, M. Naeem, M. R. Amirzada, and A. Al-
Dweik, “Weighted utility aware computational overhead minimization
of wireless power mobile edge cloud,” Computer Communications, vol.
190, pp. 178–189, 2022.
[9] S. Saafi, O. Vikhrova, G. Fodor, J. Hosek, and S. Andreev, “AI-aided
integrated terrestrial and non-terrestrial 6G solutions for sustainable
maritime networking,” IEEE Network, vol. 36, no. 3, pp. 183–190, 2022.
[10] W. U. Khan, F. Jameel, T. Ristaniemi, S. Khan, G. A. S. Sidhu, and
J. Liu, “Joint spectral and energy efficiency optimization for downlink
NOMA networks,” IEEE Transactions on Cognitive Communications
and Networking, vol. 6, no. 2, pp. 645–656, 2019.
[11] H. Zhang, B. Wang, C. Jiang, K. Long, A. Nallanathan, V. C. Leung,
and H. V. Poor, “Energy efficient dynamic resource optimization in noma
system,” IEEE Transactions on Wireless Communications, vol. 17, no. 9,
pp. 5671–5683, 2018.
[12] H. Zhang, H. Zhang, W. Liu, K. Long, J. Dong, and V. C. Leung, “En-
ergy efficient user clustering, hybrid precoding and power optimization
in terahertz mimo-noma systems,” IEEE Journal on selected areas in
communications, vol. 38, no. 9, pp. 2074–2085, 2020.
[13] A. Nauman, M. Obayya, M. M. Asiri, K. Yadav, M. Maashi, M. Assiri,
M. K. Ehsan, and S. W. Kim, “Minimizing energy consumption for
noma multi-drone communications in automotive-industry 5.0, Journal
of King Saud University-Computer and Information Sciences, p. 101547,
2023.
[14] A. Mahmood, A. Ahmed, M. Naeem, and Y. Hong, “Partial offloading
in energy harvested mobile edge computing: A direct search approach,
IEEE Access, vol. 8, pp. 36 757–36 763, 2020.
[15] X. Zhu, C. Jiang, L. Kuang, N. Ge, and J. Lu, “Non-orthogonal multiple
access based integrated terrestrial-satellite networks,” IEEE Journal on
Selected Areas in Communications, vol. 35, no. 10, pp. 2253–2267,
2017.
[16] W. U. Khan, J. Liu, F. Jameel, V. Sharma, R. J¨
antti, and Z. Han,
“Spectral efficiency optimization for next generation NOMA-enabled
IoT networks,” IEEE Transactions on Vehicular Technology, vol. 69,
no. 12, pp. 15 284–15 297, 2020.
[17] S. Fu, J. Gao, and L. Zhao, “Integrated resource management for
terrestrial-satellite systems,” IEEE Transactions on Vehicular Technol-
ogy, vol. 69, no. 3, pp. 3256–3266, 2020.
[18] B. Deng, C. Jiang, J. Yan, N. Ge, S. Guo, and S. Zhao, “Joint
multigroup precoding and resource allocation in integrated terrestrial-
satellite networks,” IEEE Transactions on Vehicular Technology, vol. 68,
no. 8, pp. 8075–8090, 2019.
[19] Z. Zhao, G. Xu, N. Zhang, and Q. Zhang, “Performance analysis of the
hybrid satellite-terrestrial relay network with opportunistic scheduling
over generalized fading channels, IEEE Transactions on Vehicular
Technology, vol. 71, no. 3, pp. 2914–2924, March 2022.
[20] S. Pan, M. Lin, M. Xu, S. Zhu, L.-A. Bian, and G. Li, A low-profile
programmable beam scanning holographic array antenna without phase
shifters,” IEEE Internet of Things Journal, vol. 9, no. 11, pp. 8838–8851,
June 2022.
[21] B. Li, Q. Li, Y. Zeng, Y. Rong, and R. Zhang, “3D trajectory op-
timization for energy-efficient UAV communication: A control design
perspective, IEEE Transactions on Wireless Communications, vol. 21,
no. 6, pp. 4579–4593, June 2022.
[22] M. Giordani and M. Zorzi, “Non-terrestrial networks in the 6G era:
Challenges and opportunities,” IEEE Network, vol. 35, no. 2, pp. 244–
251, 2020.
[23] A. Sattarzadeh, Y. Liu, A. Mohamed, R. Song, P. Xiao, Z. Song,
H. Zhang, R. Tafazolli, and C. Niu, “Satellite-based non-terrestrial
networks in 5G: Insights and challenges,” IEEE Access, vol. 10, pp.
11 274–11 283, 2021.
[24] F. Rinaldi, H.-L. Maattanen, J. Torsner, S. Pizzi, S. Andreev, A. Iera,
Y. Koucheryavy, and G. Araniti, “Non-terrestrial networks in 5G &
beyond: A survey,” IEEE access, vol. 8, pp. 165178–165 200, 2020.
[25] Y. Cao, S.-Y. Lien, and Y.-C. Liang, “Deep reinforcement learning for
multi-user access control in non-terrestrial networks,” IEEE Transactions
on Communications, vol. 69, no. 3, pp. 1605–1619, 2020.
[26] W. U. Khan, A. Mahmood, A. Bozorgchenani, M. A. Jamshed,
A. Ranjha, E. Lagunas, H. Pervaiz, S. Chatzinotas, B. Ottersten, and
P. Popovski, “Opportunities for intelligent reflecting surfaces in 6g-
empowered v2x communications, arXiv preprint arXiv:2210.00494,
2022.
[27] I. Ahmad, J. Suomalainen, P. Porambage, A. Gurtov, J. Huusko, and
M. H¨
oyhty¨
a, “Security of Satellite-Terrestrial Communications: Chal-
lenges and Potential Solutions,” IEEE Access, vol. 10, pp. 96038–
96 052, 2022.
[28] S. Raza, S. Wang, M. Ahmed, M. R. Anwar, M. A. Mirza, and W. U.
Khan, “Task offloading and resource allocation for iov using 5g nr-v2x
communication,” IEEE Internet of Things Journal, vol. 9, no. 13, pp.
10 397–10 410, 2021.
[29] M. Ahmed et al., “Vehicular communication network enabled cav data
offloading: A review,” IEEE Transactions on Intelligent Transportation
Systems, 2023.
[30] I. Rasheed et al., “Lstm-based distributed conditional generative ad-
versarial network for data-driven 5g-enabled maritime uav communica-
tions,” IEEE Transactions on Intelligent Transportation Systems, 2022.
[31] D. Shome et al., “Federated learning and next generation wireless
communications: A survey on bidirectional relationship, Transactions
on Emerging Telecommunications Technologies, vol. 33, no. 7, p. e4458,
2022.
[32] A. Mahmood, T. X. Vu, S. Chatzinotas, and B. Ottersten, “Joint
optimization of 3d placement and radio resource allocation for per-uav
DRAFT 16
sum rate maximization,” IEEE Transactions on Vehicular Technology,
2023.
[33] T. Hasan et al., “Securing industrial internet of things against botnet
attacks using hybrid deep learning approach,” IEEE Transactions on
Network Science and Engineering, 2022.
[34] J. Jiao, Y. Sun, S. Wu, Y. Wang, and Q. Zhang, “Network utility
maximization resource allocation for NOMA in satellite-based Internet
of Things,” IEEE Internet of Things Journal, vol. 7, no. 4, pp. 3230–
3242, 2020.
[35] A. Wang, L. Lei, E. Lagunas, A. I. P´
erez-Neira, S. Chatzinotas,
and B. Ottersten, “NOMA-enabled multi-beam satellite systems: Joint
optimization to overcome offered-requested data mismatches, IEEE
Transactions on Vehicular Technology, vol. 70, no. 1, pp. 900–913, 2020.
[36] R. Ge, D. Bian, J. Cheng, K. An, J. Hu, and G. Li, “Joint user pairing
and power allocation for noma-based geo and leo satellite network,
IEEE Access, vol. 9, pp. 93 255–93 266, 2021.
[37] R. Wang, W. Kang, G. Liu, R. Ma, and B. Li, “Admission control and
power allocation for NOMA-based satellite multi-beam network, IEEE
Access, vol. 8, pp. 33 631–33 643, 2020.
[38] Z. Ji, S. Wu, C. Jiang, and W. Wang, “Popularity-driven content place-
ment and multi-hop delivery for terrestrial-satellite networks, IEEE
Communications Letters, vol. 24, no. 11, pp. 2574–2578, 2020.
[39] E. Lagunas, L. Lei, S. Chatzinotas, and B. Ottersten, “Power and flow
assignment for 5g integrated terrestrial-satellite backhaul networks,”
in 2019 IEEE Wireless Communications and Networking Conference
(WCNC). IEEE, 2019, pp. 1–6.
[40] M. Shaat, A. I. P´
erez-Neira, G. Femenias, and F. Riera-Palou, “Joint
frequency assignment and flow control for hybrid terrestrial-satellite
backhauling networks,” in 2017 International Symposium on Wireless
Communication Systems (ISWCS). IEEE, 2017, pp. 293–298.
[41] Z. Gao, A. Liu, C. Han, and X. Liang, “Sum rate maximization of
massive mimo noma in leo satellite communication system, IEEE
Wireless Communications Letters, vol. 10, no. 8, pp. 1667–1671, 2021.
[42] X. Liao, X. Hu, Z. Liu, S. Ma, L. Xu, X. Li, W. Wang, and F. M.
Ghannouchi, “Distributed intelligence: A verification for multi-agent drl-
based multibeam satellite resource allocation,” IEEE Communications
Letters, vol. 24, no. 12, pp. 2785–2789, 2020.
[43] X. Hu, S. Liu, R. Chen, W. Wang, and C. Wang, A deep reinforcement
learning-based framework for dynamic resource allocation in multibeam
satellite systems,” IEEE Communications Letters, vol. 22, no. 8, pp.
1612–1615, 2018.
[44] W. U. Khan, F. Jameel, N. Kumar, R. J¨
antti, and M. Guizani,
“Backscatter-enabled efficient V2X communication with non-orthogonal
multiple access,” IEEE Transactions on Vehicular Technology, vol. 70,
no. 2, pp. 1724–1735, 2021.
[45] W. U. Khan, X. Li, M. Zeng, and O. A. Dobre, “Backscatter-enabled
NOMA for future 6G systems: A new optimization framework under
imperfect SIC,” IEEE Communications Letters, vol. 25, no. 5, pp. 1669–
1672, 2021.
[46] W. U. Khan, F. Jameel, X. Li, M. Bilal, and T. A. Tsiftsis, “Joint spec-
trum and energy optimization of NOMA-enabled small-cell networks
with QoS guarantee,” IEEE Transactions on Vehicular Technology,
vol. 70, no. 8, pp. 8337–8342, 2021.
[47] W. U. Khan, A. Ihsan, T. N. Nguyen, Z. Ali, and M. A. Javed,
“NOMA-enabled backscatter communications for green transportation in
automotive-industry 5.0, IEEE Transactions on Industrial Informatics,
vol. 18, no. 11, pp. 7862–7874, 2022.
[48] W. U. Khan, E. Lagunas, A. Mahmood, Z. Ali, M. Asif, S. Chatzino-
tas, and B. Ottersten, “Integration of noma with reflecting intelligent
surfaces: A multi-cell optimization with sic decoding errors,” IEEE
Transactions on Green Communications and Networking, 2023.
[49] W. U. Khan, E. Lagunas, A. Mahmood, Z. Ali, S. Chatzinotas, B. Otter-
sten, and O. A. Dobre, “Integration of backscatter communication with
multi-cell noma: a spectral efficiency optimization under imperfect sic,
in 2022 IEEE 27th International Workshop on Computer Aided Modeling
and Design of Communication Links and Networks (CAMAD). IEEE,
2022, pp. 147–152.
[50] H. Zhang, N. Yang, W. Huangfu, K. Long, and V. C. Leung, “Power
control based on deep reinforcement learning for spectrum sharing,”
IEEE Transactions on Wireless Communications, vol. 19, no. 6, pp.
4209–4219, 2020.
[51] G. Geraci, A. Garcia-Rodriguez, M. M. Azari, A. Lozano, M. Mezzav-
illa, S. Chatzinotas, Y. Chen, S. Rangan, and M. Di Renzo, “What will
the future of uav cellular communications be? a flight from 5g to 6g,”
IEEE communications surveys & tutorials, vol. 24, no. 3, pp. 1304–1335,
2022.
[52] X. Lin, S. Rommer, S. Euler, E. A. Yavuz, and R. S. Karlsson,
“5g from space: An overview of 3gpp non-terrestrial networks, IEEE
Communications Standards Magazine, vol. 5, no. 4, pp. 147–153, 2021.
[53] X. Lin, “An overview of 5g advanced evolution in 3gpp release 18,”
IEEE Communications Standards Magazine, vol. 6, no. 3, pp. 77–83,
2022.
[54] P. V. R. Ferreira, R. Paffenroth, A. M. Wyglinski, T. M. Hackett, S. G.
Bilen, R. C. Reinhart, and D. J. Mortensen, “Reinforcement learning
for satellite communications: From leo to deep space operations,” IEEE
Communications Magazine, vol. 57, no. 5, pp. 70–75, 2019.
[55] C. Zhong, M. C. Gursoy, and S. Velipasalar, “Deep reinforcement
learning-based edge caching in wireless networks,” IEEE Transactions
on Cognitive Communications and Networking, vol. 6, no. 1, pp. 48–61,
2020.
[56] Z. Zhang, H. Chen, M. Hua, C. Li, Y. Huang, and L. Yang, “Double
coded caching in ultra dense networks: Caching and multicast scheduling
via deep reinforcement learning,” IEEE Transactions on Communica-
tions, vol. 68, no. 2, pp. 1071–1086, 2019.
[57] Y. Qian, R. Wang, J. Wu, B. Tan, and H. Ren, “Reinforcement learning-
based optimal computing and caching in mobile edge network,” IEEE
Journal on Selected Areas in Communications, vol. 38, no. 10, pp. 2343–
2355, 2020.
[58] T. Zhang, Z. Wang, Y. Liu, W. Xu, and A. Nallanathan, “Joint resource,
deployment, and caching optimization for ar applications in dynamic
uav noma networks, IEEE Transactions on Wireless Communications,
vol. 21, no. 5, pp. 3409–3422, 2021.
[59] ——, “Caching placement and resource allocation for cache-enabling
uav noma networks, IEEE Transactions on Vehicular Technology,
vol. 69, no. 11, pp. 12 897–12 911, 2020.
[60] X. Li, H. Zhang, W. Li, and K. Long, “Multi-agent drl for user
association and power control in terrestrial-satellite network, in 2021
IEEE Global Communications Conference (GLOBECOM). IEEE, 2021,
pp. 1–5.
[61] G. Geraci, D. L´
opez-P´
erez, M. Benzaghta, and S. Chatzinotas, “In-
tegrating terrestrial and non-terrestrial networks: 3d opportunities and
challenges,” IEEE Communications Magazine, vol. 61, no. 4, pp. 42–
48, 2023.
[62] W. U. Khan, J. Liu, F. Jameel, M. T. R. Khan, S. H. Ahmed, and
R. J¨
antti, “Secure backscatter communications in multi-cell NOMA
networks: Enabling link security for massive IoT networks, in IEEE
INFOCOM 2020-IEEE Conference on Computer Communications Work-
shops (INFOCOM WKSHPS). IEEE, 2020, pp. 213–218.
[63] H. Jiang, X. Dai, Z. Xiao, and A. Iyengar, “Joint task offloading
and resource allocation for energy-constrained mobile edge computing,”
IEEE Transactions on Mobile Computing, vol. 22, no. 7, pp. 4000–4015,
July 2023.
[64] W. U. Khan, X. Li, A. Ihsan, M. A. Khan, V. G. Menon, and M. Ahmed,
“NOMA-enabled optimization framework for next-generation small-cell
IoV networks under imperfect SIC decoding,” IEEE Transactions on
Intelligent Transportation Systems, vol. 23, no. 11, pp. 22 442–22 451,
2021.
[65] Q. Liu, R. Liu, Z. Wang, and J. S. Thompson, “Uav swarm-enabled
localization in isolated region: a rigidity-constrained deployment per-
spective, IEEE Wireless Communications Letters, vol. 10, no. 9, pp.
2032–2036, 2021.
[66] Q. Liu, R. Liu, Z. Wang, L. Han, and J. S. Thompson, A v2x-integrated
positioning methodology in ultradense networks,” IEEE Internet of
Things Journal, vol. 8, no. 23, pp. 17 014–17 028, 2021.
[67] Q. Liu, R. Liu, Y. Zhang, Y. Yuan, Z. Wang, H. Yang, L. Ye, M. Guizani,
and J. S. Thompson, “Management of positioning functions in cellular
networks for time-sensitive transportation applications, IEEE Transac-
tions on Intelligent Transportation Systems, 2023.
[68] B. Cao, J. Zhang, X. Liu, Z. Sun, W. Cao, R. M. Nowak, and
Z. Lv, “Edge–cloud resource scheduling in space–air–ground-integrated
networks for internet of vehicles, IEEE Internet of Things Journal,
vol. 9, no. 8, pp. 5765–5772, April 2022.
[69] W. U. Khan, Z. Ali, E. Lagunas, S. Chatzinotas, and B. Ottersten,
“Rate Splitting Multiple Access for Cognitive Radio GEO-LEO Co-
Existing Satellite Networks,” in GLOBECOM 2022-2022 IEEE Global
Communications Conference. IEEE, 2022, pp. 5165–5170.
[70] K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath,
“Deep reinforcement learning: A brief survey,” IEEE Signal Processing
Magazine, vol. 34, no. 6, pp. 26–38, 2017.
ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
Unmanned aerial vehicles (UAV) have emerged as a practical solution that provides on-demand services to users in areas where the terrestrial network is non-existent or temporarily unavailable, e.g., due to natural disasters or network congestion. In general, UAVs' user-serving capacity is typically constrained by their limited battery life and the finite communication resources that highly impact their performance. This work considers the orthogonal frequency division multiple access (OFDMA) enabled multiple unmanned aerial vehicles (multi-UAV) communication systems to provide on-demand services. The main aim of this work is to derive an efficient technique for the allocation of radio resources, 3D placement of UAVs, and user association matrices. To achieve the desired objectives, we decoupled the original joint optimization problem into two sub-problems: (i) 3D placement and user association and (ii) sum-rate maximization for optimal radio resource allocation, which are solved iteratively. The proposed iterative algorithm is shown via numerical results to achieve fast convergence speed after fewer than 10 iterations. The benefits of the proposed design are demonstrated via superior sum-rate performance compared to existing reference designs. Moreover, results showed that the optimal power and sub-carrier allocation help to mitigate the inter-cell interference that directly impacts the system's performance.
Article
Full-text available
The forthcoming era of the automotive industry, known as Automotive-Industry 5.0, will leverage the latest advancements in 6G communications technology to enable reliable, computationally advanced, and energy-efficient exchange of data between diverse onboard sensors, drones and other vehicles. We propose a non-orthogonal multiple access (NOMA) multi-drone communications network in order to address the requirements of enormous connections, various quality of services (QoS), ultra-reliability, and low latency in upcoming sixth-generation (6G) drone communications. Through the use of a power optimization framework, one of our goals is to evaluate the energy efficiency of the system. In particular, we define a non-convex power optimization problem while considering the possibility of imperfect successive interference cancellation (SIC) detection. Therefore, the goal is to reduce the total energy consumption of NOMA drone communications while guaranteeing the lowest possible rate for wireless devices. We use a novel method based on iterative sequential quadratic programming (SQP) to get the best possible solution to the non-convex optimization problem so that we may move on to the next step and solve it. The standard OMA framework, the Karush–Kuhn–Tucker (KKT)-based NOMA framework, and the average power NOMA framework are compared with the newly proposed optimization framework. The results of the Monte Carlo simulation demonstrate the accuracy of our derivations. The results that have been presented also demonstrate that the optimization framework that has been proposed is superior to previous benchmark frameworks in terms of system-achievable energy efficiency.
Article
Low Earth Orbit (LEO) satellite networks are expected to play a crucial role in providing high-speed internet access and low-latency communication worldwide. However, some challenges can affect the performance of LEO satellite networks. For example, they can face energy and spectral efficiency challenges, such as high power consumption and spectral congestion, due to the increasing number of satellites. Furthermore, mobile ground users tend to operate with low directive antennas, which pose significant challenges in closing the LEO-to-ground communication link, especially when operating at a high-frequency range. To overcome these challenges, energy-efficient technologies like reconfigurable intelligent surfaces (RIS) and advanced spectrum management techniques like non-orthogonal multiple access (NOMA) can be employed. RIS can improve signal quality and reduce power consumption, while NOMA can enhance spectral efficiency by sharing the same resources among multiple users. This paper proposes an energy-efficient RIS-assisted downlink NOMA communication for LEO satellite networks while ensuring the quality of services. The proposed framework simultaneously optimizes the NOMA transmit power of the LEO satellite and the passive beamforming of RIS, considering the assumption of imperfect successive interference cancellation. Due to the nature of the considered system and optimization variables, the energy efficiency maximization problem is non-convex. In practice, obtaining the optimal solution for such problems is very challenging. Therefore, we adopt alternating optimization methods to handle the joint optimization in two steps. In step 1, for any given phase shift vector, we calculate satellite transmit power towards each ground terminal using the Lagrangian dual method. Then, in step 2, given the transmit power, we design passive beamforming for RIS by solving the semi-definite programming. 
We also compare our solution with a benchmark framework having a fixed phase shift design and a conventional NOMA framework without involving RIS. Numerical results show that the proposed optimization framework achieves 21.47% and 54.9% higher energy efficiency compared to the benchmark and conventional frameworks.
Article
Traffic generated by connected and autonomous vehicle (CAV) applications and services places an extra burden on already congested cellular networks. Offloading is envisioned as a promising solution to tackle cellular networks’ traffic explosion problem. Notably, vehicular traffic offloading leveraging different vehicular communication network (VCN) modes is one of the potential techniques to address the data traffic problem in cellular networks. This paper surveys the state-of-the-art literature for vehicular data offloading from a communication perspective, i.e., vehicle to vehicle (V2V), vehicle to roadside infrastructure (V2I), and vehicle to everything (V2X). First, we classify vehicular data/traffic offloading techniques according to whether data is to be downloaded or uploaded. Next, to give a better intuition of each data offloading category, we sub-classify the existing schemes based on their objectives. Then, the existing literature on vehicular data/traffic offloading is elaborated, compared, and analyzed based on approaches, objectives, merits, demerits, etc. Finally, we highlight the open research challenges in this field and predict future research trends.
Article
Low Earth Orbit (LEO) satellite communication (SatCom) has drawn particular attention recently due to its high data rate services and low round-trip latency. It has lower launching and manufacturing costs than Medium Earth Orbit (MEO) and Geostationary Earth Orbit (GEO) satellites. Moreover, LEO SatCom has the potential to provide global coverage with a high-speed data rate and low transmission latency. However, spectrum scarcity might be one of the challenges in the growth of LEO satellites, imposing severe restrictions on developing ground-space integrated networks. To address this issue, cognitive radio and rate splitting multiple access (RSMA) are two emerging technologies for high spectral efficiency and massive connectivity. This paper proposes a cognitive radio enabled LEO SatCom using the RSMA radio access technique in coexistence with a GEO SatCom network. In particular, this work aims to maximize the sum rate of LEO SatCom by simultaneously optimizing the power budget over different beams, RSMA power allocation for users over each beam, and subcarrier user assignment while restricting the interference temperature to GEO SatCom. The problem of sum rate maximization is formulated as non-convex, where the global optimal solution is challenging to obtain. Thus, an efficient solution is obtained in three steps. First, we employ a successive convex approximation technique to reduce the complexity and make the problem more tractable. Second, for any given resource block user assignment, we adopt Karush-Kuhn-Tucker (KKT) conditions to calculate the transmit power over different beams and RSMA power allocation of users over each beam. Third, using the allocated power, we design an efficient algorithm based on the greedy approach for resource block user assignment. For comparison, we propose two suboptimal schemes with fixed power allocation over different beams and random resource block user assignment as the benchmark. 
Numerical results provided in this work are obtained based on the Monte Carlo simulations, which demonstrate the benefits of the proposed optimization scheme compared to the benchmark schemes.
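The greedy resource block user assignment mentioned in the abstract above is not specified in detail there. The following is a generic sketch of one common greedy rule, assuming a precomputed table `rates[u][k]` (achievable rate of user `u` on block `k` under the current power allocation) and a per-user cap `max_blocks_per_user`. Both names and the rule itself are hypothetical and are not taken from the paper.

```python
def greedy_assignment(rates, max_blocks_per_user=1):
    """Assign each resource block to at most one user, best pairs first.

    rates[u][k] is the assumed achievable rate of user u on block k.
    Returns a dict mapping block index -> user index.
    """
    # Rank every (user, block) pair by rate, highest first.
    pairs = sorted(
        ((r, u, k) for u, row in enumerate(rates) for k, r in enumerate(row)),
        key=lambda t: -t[0],
    )
    assignment = {}
    load = [0] * len(rates)
    for _, u, k in pairs:
        # Take the pair only if the block is still free and the user
        # has not reached its block cap.
        if k not in assignment and load[u] < max_blocks_per_user:
            assignment[k] = u
            load[u] += 1
    return assignment
```

In an alternating scheme like the one described, such a step would be rerun after each power update, since the rate table changes with the allocated power.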
Article
Reflecting intelligent surfaces (RIS) have gained significant attention due to their high energy and spectral efficiency in next-generation wireless networks. By using low-cost passive reflecting elements, RIS can smartly reconfigure the signal propagation to extend the wireless communication coverage. On the other hand, non-orthogonal multiple access (NOMA) has been proven as a key air interface technique for supporting massive connections over limited resources. Utilizing the superposition coding and successive interference cancellation (SIC) techniques, NOMA can multiplex multiple users over the same spectrum and time resources by allocating different power levels. This paper proposes a new optimization scheme in a multi-cell RIS-NOMA network to enhance the spectral efficiency under SIC decoding errors. In particular, the power budget of the base station, the transmit power of NOMA users, and the passive beamforming of RIS are simultaneously optimized in each cell. Due to the objective function and quality of service constraints, the joint problem is formulated as non-convex, which makes the globally optimal solution very complex and challenging to obtain. To reduce the complexity and make the problem tractable, we first decouple the original problem into two sub-problems for power allocation and passive beamforming. Then, an efficient solution of each sub-problem is obtained in two steps. For the power allocation sub-problem, we transform it into a convex problem by the inner approximation method in the first step and then solve it through a standard convex optimization solver in the second step. Accordingly, in the first step of passive beamforming, the sub-problem is transformed into a standard semi-definite programming problem by successive convex approximation and difference of convex programming methods. Then, a penalty-based method is used to achieve a rank-1 solution for passive beamforming in the second step. 
Numerical results demonstrate the benefits of the proposed optimization scheme in the multi-cell RIS-NOMA network.
Article
Device positioning has generally been recognized as an enabling technology for numerous vehicular applications in intelligent transportation systems (ITS). The downlink time difference of arrival (DL-TDOA) technique in cellular networks requires range information of geographically diverse base stations (BSs) to be measured by user equipment (UE) through the positioning reference signal (PRS). However, inter-cell interference from surrounding BSs can be particularly serious under poor network planning or dense deployments. This may lead to a relatively longer measurement time to locate the UE, causing an unacceptable location update rate to time-sensitive applications. In this case, PRS muting of certain wireless resources has been envisioned as a promising solution to increase the detectability of a weak BS. In this paper, to reduce UE measurement latency while ensuring high location accuracy, we propose a muting strategy managed by positioning functions that utilizes a combination of optimized pseudo-random sequences (CO-PRS) for multiple BSs to coordinate the muting of PRS resources. The original sequence is first truncated according to the muting period, and a modified greedy selection is performed to form a set of control sequences as the muting configurations (MC) with balance and concurrency constraints. Moreover, efficient information exchange can be achieved with the seeds used for regenerating the MC. Extensive simulations demonstrate that the proposed scheme outperforms the conventional random and ideal muting benchmarks in terms of measurement latency by about 30%, especially when dealing with severe near-far problems in cellular networks.
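The idea of exchanging only seeds and regenerating the muting configuration locally can be illustrated with a small sketch. It does not reproduce the CO-PRS construction from the abstract above: Python's `random.Random` stands in for the pseudo-random sequence generator, and `muting_config`, `base_length`, `mute_ratio`, and the convention that `1` marks a muted PRS occasion are all hypothetical.

```python
import random


def muting_config(seed, period, base_length=64, mute_ratio=0.5):
    """Regenerate a binary muting pattern of length `period` from a seed.

    A base pseudo-random bit sequence is generated from the seed and then
    truncated to the muting period. 1 = muted occasion (assumed convention).
    """
    if period > base_length:
        raise ValueError("period must not exceed base_length")
    rng = random.Random(seed)  # stand-in generator, not the paper's sequence
    base = [1 if rng.random() < mute_ratio else 0 for _ in range(base_length)]
    # Truncate the base sequence according to the muting period.
    return base[:period]
```

Because the pattern is a deterministic function of the seed, two nodes that share the seed and parameters obtain the same configuration without transmitting the full bit pattern.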
Article
Integrating terrestrial and non-terrestrial networks has the potential of connecting the unconnected and enhancing the user experience for the already-connected, with technological and societal implications of the greatest long-term significance. A convergence of ground, air, and space wireless communications also represents a formidable endeavor for the mobile and satellite communications industries alike, as it entails defining and intelligently orchestrating a new 3D wireless network architecture. In this article, we present the key opportunities and challenges arising from this revolution by describing some of its disruptive use cases and key building blocks, reviewing the relevant standardization activities, and pointing to open research problems. By considering two multi-operator paradigms, we also showcase how terrestrial networks could be efficiently re-engineered to cater for aerial services, or opportunistically complemented by non-terrestrial infrastructure to augment their current capabilities.