Fig. 1. Reinforcement learning with deep Q-Network (DQN).

Source publication
Article
Full-text available
Multi-access edge computing (MEC) is an important enabling technology for 5G and 6G networks. With MEC, mobile devices can offload their computationally heavy tasks to a nearby server, which can be a simple node at a base station, a vehicle, or another device. With the increasing number of devices, slices and multiple radio access technologies, the p...

Contexts in source publication

Context 1
... depicted in Fig. 1, the target network is used for obtaining $Y_m^{\mathrm{DQN}}$ (which is stated in equation (26)). Also, as mentioned in references [16], [46], the loss function is given in terms of the difference between $Y_m^{\mathrm{DQN}}$ and the Q-value obtained from the online network. To optimize the loss function, the parameters of the online network are updated by ...
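Context 1 describes the standard DQN update: the target network supplies the target value, and the loss penalizes the gap between that target and the online network's Q-value for the chosen action. Below is a minimal PyTorch sketch of this step; the state/action dimensions, network sizes, and hyperparameters are placeholders, not the paper's.

```python
import torch
import torch.nn as nn

state_dim, n_actions = 8, 4  # hypothetical dimensions

def make_qnet():
    return nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                         nn.Linear(64, n_actions))

online_net = make_qnet()   # updated every training step
target_net = make_qnet()   # frozen copy, refreshed periodically
target_net.load_state_dict(online_net.state_dict())

def dqn_loss(s, a, r, s_next, done, gamma=0.99):
    # Target Y = r + gamma * max_a' Q_target(s', a') for non-terminal transitions
    with torch.no_grad():
        y = r + gamma * target_net(s_next).max(dim=1).values * (1.0 - done)
    # Q-value from the online network for the actions actually taken
    q = online_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    return nn.functional.mse_loss(q, y)

# Toy minibatch, e.g. sampled from a replay buffer
loss = dqn_loss(torch.randn(32, state_dim),
                torch.randint(0, n_actions, (32,)),
                torch.randn(32), torch.randn(32, state_dim), torch.zeros(32))
loss.backward()  # gradients flow into the online network only
```

Because the target is computed under `no_grad`, only the online network's parameters receive gradients, which is what keeps the bootstrap target stable between refreshes.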
Context 2
... the total average transmit power level for the users grows. However, with more than 20 users, the DJROM and MARL-DQN algorithms outperform the MARL-DQL algorithm in terms of the users' total transmit power level. This is expected, owing to the target network and the experience replay. Fig. 10 shows that the users' total average computational efficiency increases as the number of users grows. Specifically, the DJROM algorithm performs best on this metric in networks with more than 20 users. This may be because of mitigating the ...
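Context 2 credits the improvement over plain MARL-DQL to the target network and experience replay. For reference, a replay buffer of the kind usually meant is sketched below (a generic illustration; the capacity and batch size are not from the paper):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size FIFO store of transitions; sampling i.i.d. minibatches
    breaks the temporal correlation of consecutive experiences."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        batch = random.sample(self.buffer, batch_size)
        return tuple(zip(*batch))  # (states, actions, rewards, next_states, dones)

    def __len__(self):
        return len(self.buffer)
```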

Similar publications

Article
Full-text available
Recently, computation offloading has become an effective method for overcoming the resource constraints of a mobile device (MD) by offloading computation-intensive and delay-sensitive application tasks to a remote cloud-based data center. Smart cities benefit from offloading to edge points. Consider a mobile edge computing (MEC) network in multip...
Article
Full-text available
Designing 5G networks and related applications such as the Internet of Things (IoT) and cellular and autonomous vehicular networks (AVNET) is a challenge. Indeed, these networks are subject to a number of constraints that play a crucial role in the network’s quality of service (QoS). Among these constraints is the management of computing, stor...
Article
Full-text available
Mobile edge cloud networks can be used to offload computationally intensive tasks from Internet of Things (IoT) devices to nearby mobile edge servers, thereby lowering energy consumption and response time for ground mobile users or IoT devices. Integration of Unmanned Aerial Vehicles (UAVs) and the mobile edge computing (MEC) server will significan...
Article
Full-text available
Mobile Edge Computing (MEC) is a new method for overcoming the resource limitations of mobile devices by enabling Computation Offloading (CO) with low latency. This paper proposes an effective multi-user, multi-task system for offloading computations in MEC with guarantees on energy and latency. To begin, radio and computation resources are i...
Preprint
Full-text available
The fog radio access network (F-RAN) is a promising technology in which the user mobile devices (MDs) can offload computation tasks to the nearby fog access points (F-APs). Due to the limited resource of F-APs, it is important to design an efficient task offloading scheme. In this paper, by considering time-varying network environment, a dynamic co...

Citations

... Network slicing offers a more granular quality of service that can be tailored to specific applications while also providing service isolation between them. [4][5][6] When multiple applications share the same network infrastructure, allocating resources end-to-end (E2E) becomes a challenging task that requires careful management to maintain high service levels. [7][8][9] However, managing the resources of an E2E network to address changing service demands is a complex task. ...
Article
Full-text available
    6G network services demand significant computer resources. Network slicing offers a potential solution by enabling customized services on shared infrastructure. However, dynamic service needs in heterogeneous environments pose challenges to resource provisioning. 6G applications like extended reality and connected vehicles require service differentiation for optimal quality of experience (QoE). Granular resource allocation within slices is a complex issue. To address the complexity of QoE services in dynamic slicing, a deep reinforcement learning (DRL) approach called customized sub‐slicing is proposed. This approach involves splitting access, transport, and core slices into sub‐slices to handle service differentiation among 6G applications. The focus is on creating sub‐slices and dynamically scaling slices for intelligent resource allocation and reallocation based on QoS requirements for each sub‐slice. The problem is formulated as an integer linear programming (ILP) optimization problem with real‐world constraints. To effectively allocate sub‐slices and dynamically scale resources, the Advantage Actor‐Critic (A2C)‐based Network Sub‐slice Allocation and Optimization (NS‐AO) algorithm is proposed. Experimental results demonstrate that the proposed algorithm outperforms the state of the art in terms of training stability, learning time, sub‐slice acceptance rate, and resilience to topology changes.
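Since the abstract above builds on Advantage Actor-Critic, a compact sketch of the A2C update may help; it is a generic PyTorch illustration, with the slicing state/action encoding and network sizes invented for the example, not taken from the NS-AO paper.

```python
import torch
import torch.nn as nn

state_dim, n_actions = 12, 6  # placeholder encoding of a sub-slicing state/action

actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
critic = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1))

def a2c_losses(states, actions, returns):
    values = critic(states).squeeze(1)
    advantages = returns - values.detach()        # A(s,a) ~ R - V(s)
    log_probs = torch.log_softmax(actor(states), dim=1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    actor_loss = -(chosen * advantages).mean()    # policy gradient with baseline
    critic_loss = nn.functional.mse_loss(values, returns)
    return actor_loss, critic_loss
```

The critic's value estimate serves as the baseline that turns raw returns into advantages, lowering the variance of the actor's gradient.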
    ... Key studies focus on multi-domain network slicing frameworks [8], solutions to the Virtual Network Embedding (VNE) problem using Algorithm Selection (AS) and Deep Reinforcement Learning (DRL) [9], and the impact of traffic demand forecasting on DRL slicing agent performance [10]. Reviews in network security for 6G identify critical challenges in this domain [11,12]. These early works indicate the significant potential of network slicing as a foundational technology in 6G networks. ...
    Article
    Full-text available
    This research paper focuses on thoroughly examining the challenges in 6G network slicing, and on developing and evaluating the EvoNetSlice model, which supports on-demand reallocation and instantaneously adjustable QoS. The study employs evolutionary algorithms integrated with artificial-intelligence-enabled data analytics and multi-objective optimization to optimize network resource usage under minimum end-to-end delay, high transmission rates, and optimal background data management. First, network resource allocation is based on network traffic data, the quality-of-demand (QoD) value of applications, and users’ behavior. A performance-degradation detection and quality of service (QoS) adaptation mechanism is combined with a multi-layer objective fitness function to achieve a good balance between conflicting objectives. Results indicate that EvoNetSlice improves overall network efficiency, adapts to ever-shifting QoS requirements at any time, and provides crucial statistics-focused data for network management. The importance of this work lies in developing future 6G network technology: the key issues, including the resource optimization and real-time adaptation required to support modern 6G services, are addressed by EvoNetSlice. Such an exploration is an essential element in developing the flexible 6G systems that will define next-generation wireless communication.
    ... D2QN [20] has wide application in many fields. For example, Huang applied D2QN to joint relay selection and power allocation in the secure cognitive radio relay network [34], Tan designed a D2QN-based handover algorithm for the vehicle-to-network communications problem [35], Khoramnejad proposed a D2QN-based component carrier management algorithm to solve the stochastic game in the 5G coverage problem [36], Khoramnejad also applied D2QN to the joint resource allocation and offloading problem [37], and Sharma trained a D2QN agent for a fire evacuation environment [38]. Bui applied D2QN in the battery energy storage system. ...
    Article
    Full-text available
    Distributed manufacturing involving heterogeneous factories presents significant challenges to enterprises. Furthermore, the need to prioritize various jobs based on order urgency and customer importance further complicates the scheduling process. Consequently, this study addresses the practical issue by tackling the distributed heterogeneous hybrid flow shop scheduling problem with multiple priorities of jobs (DHHFSP-MPJ). The primary objective is to simultaneously minimize the total weighted tardiness and total energy consumption. To solve DHHFSP-MPJ, a double deep Q-network-based co-evolution (D2QCE) is developed with four features: i) the global and local searches are allocated to two populations to balance computational resources; ii) a hybrid heuristic strategy is proposed to obtain an initialized population with good convergence and diversity; iii) four knowledge-based neighborhood structures are proposed to accelerate convergence, and the double deep Q-network is applied to learn operator selection; and iv) an energy-efficient strategy is presented to save energy. To verify the effectiveness of D2QCE, five state-of-the-art algorithms are compared on 20 instances and a real-world case. The results of the numerical experiments indicate that: i) D2QN can learn fast while consuming few computational resources, and can select the best operator; ii) combining D2QN and co-evolution can vastly improve the performance of evolutionary algorithms for solving distributed shop scheduling; and iii) the proposed D2QCE outperforms the state of the art for DHHFSP-MPJ.
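The D2QN component named above decouples action selection from evaluation, which is what distinguishes double DQN from plain DQN. A hedged sketch of that target computation, applied to operator selection (the dimensions and state encoding are illustrative, not D2QCE's):

```python
import torch
import torch.nn as nn

n_features, n_operators = 6, 4  # e.g., four neighborhood-structure operators

online = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, n_operators))
target = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, n_operators))
target.load_state_dict(online.state_dict())

def double_dqn_target(r, s_next, done, gamma=0.9):
    with torch.no_grad():
        best_op = online(s_next).argmax(dim=1, keepdim=True)   # selection: online net
        q_eval = target(s_next).gather(1, best_op).squeeze(1)  # evaluation: target net
        return r + gamma * q_eval * (1.0 - done)
```

Evaluating the online network's argmax with the target network curbs the overestimation bias of the single-network max operator.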
    ... Moreover, DQN can efficiently solve the strategy-selection problem with a large state space without defining the state set. Due to these advantages, DQN has been widely applied to different problems, including reversible data hiding for color images [31], vehicle-to-network communications [32], joint offloading and resource allocation [33], battery energy storage systems [34], gait pattern controllers for humanoid robots [35], edge computing [36], the Internet of Things [37], scheduling agile earth observation satellites [38], and FJS [39], [40]. Thus, considering the large state space of DHFJS, DQN is applied to learn and decide how to select an optimal operator. ...
    Article
    Full-text available
    The energy-aware distributed heterogeneous flexible job shop scheduling (DHFJS) problem is an extension of the traditional FJS and is harder to solve. This work aims to minimize the total energy consumption (TEC) and makespan for DHFJS. A deep Q-network-based co-evolution algorithm (DQCE) is proposed to solve this NP-hard problem, which includes four parts: first, a new co-evolutionary framework is proposed, which allocates sufficient computation to global search and executes local search around elite solutions. Next, nine problem-feature-based local search operators are designed to accelerate convergence. Moreover, deep Q-networks are applied to learn and select the best operator for each solution. Furthermore, an efficient heuristic method is proposed to reduce TEC. Finally, 20 instances and a real-world case are employed to evaluate the effectiveness of DQCE. Experimental results indicate that DQCE outperforms the six state-of-the-art algorithms for DHFJS.
    ... To verify the performance of the DSSO algorithm proposed in this paper, DSSO is compared with LOCAL, MINCO [39], and DJROM [40] in experiments. ...
    Article
    Full-text available
    With the surge in tasks from in-vehicle terminals, the resulting network congestion and time delay mean that the service needs of users cannot be met. Offloading algorithms are introduced to handle vehicular tasks, which greatly alleviates the above problems. In this paper, the dependencies of vehicular tasks are represented as directed acyclic graphs, and network slices are integrated within the edge server. The Dynamic Selection Slicing-based Offloading algorithm for in-vehicle tasks in MEC (DSSO) is proposed. First, a computational offloading model for vehicular tasks is established based on available resources, wireless channel state, and vehicle load level. Second, the solution of the model is transformed into a Markov decision process, and the combination of the DQN algorithm and the Dueling Network from deep reinforcement learning is used to select the appropriate slices and dynamically update the optimal offloading strategy for in-vehicle tasks in the effective interval. Finally, an experimental environment is set up to compare the DSSO algorithm with LOCAL, MINCO, and DJROM; the results show that the DSSO algorithm reduces system energy consumption by 10.31%, decreases latency by 22.75%, and decreases the ratio of dropped tasks by 28.71%.
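The Dueling Network mentioned in this abstract splits the Q-function into a state value and action advantages. A minimal sketch of such a head follows, with invented dimensions (DSSO's slice/offloading state is not reproduced here):

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
    def __init__(self, state_dim=10, n_actions=5):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU())
        self.value = nn.Linear(64, 1)              # state value V(s)
        self.advantage = nn.Linear(64, n_actions)  # action advantages A(s, a)

    def forward(self, s):
        h = self.features(s)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=1, keepdim=True)  # identifiable decomposition

q_values = DuelingQNet()(torch.randn(3, 10))  # -> shape (3, 5)
```

Subtracting the mean advantage makes the value/advantage split identifiable, so the two streams learn distinct roles.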
    ... The authors have used this method in [26], [27] as well¹. To address the RL system, we employ the CA2C algorithm proposed in [17]. ...
    ... ¹ Due to the space limitation, we have dropped the details of formulating the stochastic game and MDP and refer the reader to the works above in [26], [27]. ...
    ... Specifically, $Q_b(s, a_b)$ is defined in terms of the expectation of the weighted sum of the short-term rewards for the agent [30]. Based on what is discussed in [17] and [26], the optimal policy for an agent gNB $b$ is the policy under which the Q-function for that agent is maximized, i.e., $a_b^{*} = \pi_b^{*}(s) = \arg\max ...$
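Written out in the standard form (a reconstruction consistent with the excerpt; the discount factor $\gamma$ and reward $r_b$ follow the usual definitions, which the excerpt truncates):

```latex
Q_b(s, a_b) = \mathbb{E}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t}\, r_b^{(t)} \;\middle|\; s^{(0)} = s,\ a_b^{(0)} = a_b \right],
\qquad
a_b^{*} = \pi_b^{*}(s) = \arg\max_{a_b} Q_b(s, a_b).
```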
    Article
    Full-text available
    Aggregating multiple component carriers (CCs) from different frequency bands, also known as Carrier Aggregation (CA), and Dual Connectivity (DC), i.e., concurrently transmitting and receiving from two nodes or cell groups, are employed in 5G and 6G wireless networks to enhance coverage and capacity. In wireless networks with DC and CA, the performance can be boosted by dynamically adjusting the uplink (UL) transmit power level for the user equipments (UEs) and properly activating/deactivating the CCs for the UEs. In this paper, we study the problem of joint dynamic UL power-sharing and CC management. The objective is to simultaneously minimize the delay and power consumption for the UEs. The pertinent problem is a multi-objective optimization problem with both discrete and continuous variables and therefore is hard to solve. We first model it as a multi-agent reinforcement learning (RL) system with compound action to handle the problem. Then, we employ a compound-action actor-critic algorithm to find the optimal policy and propose the Joint Power-Sharing and Carrier Aggregation (JPSCA) algorithm. The performance of the JPSCA algorithm is compared with two baseline algorithms. Our results show that the performance of the JPSCA algorithm in terms of the average rate, delay, and UL transmit power level outperforms the baselines where UL power control and CC management are performed disjointly. For 25 UEs, our proposed JPSCA algorithm decreases the UE power consumption and UE delay by about 28% and 16%, respectively, relative to the all-CC and equal power-sharing schemes.
    ... The reverse auction mechanism maximizes the saving cost of the CSP and the offloading rate under different scenarios. In [105], a multiagent double deep Q-learning (MA-DDQN) approach is presented for partial offloading and binary offloading in MEC-enabled wireless networks. It considers joint offloading and resource allocation in the uplink channels. ...
    ... Summary: Some recent works addressing task offloading using ML techniques in 6G networks are discussed. The proposed ML approaches and an overview of the advantages and limitations of the recent works are provided in Table 7. Furthermore, the recent works are categorized according to ML type, such as DL [101], uSML [102], DRL [103][104][105][106][107], and FL [108][109][110], to address task offloading problems, as explained in Fig. 8. These recent works focused on increasing computation efficiency, optimizing resource utilization, enhancing convergence speed, and reducing system delay. ...
    Article
    Full-text available
    The upcoming 6G networks are sixth-sense next-generation communication networks with an ever-increasing demand for enhanced end-to-end (E2E) connectivity towards a connected, sustainable world. Recent developments in artificial intelligence (AI) have enabled a wide range of novel technologies through the availability of advanced machine learning (ML) models, large datasets, and high computational power. In addition, intelligent resource management is a key feature of 6G networks that enables self-configuration and self-healing by leveraging the parallel computing and autonomous decision-making ability of ML techniques to enhance energy efficiency and computational capacity in 6G networks. Consequently, ML techniques will play a significant role in addressing resource management and mobility management challenges in 6G wireless networks. This article provides a comprehensive review of state-of-the-art ML algorithms applied in 6G wireless networks, categorized into learning types, including supervised and unsupervised machine learning, Deep Learning (DL), Reinforcement Learning (RL), Deep Reinforcement Learning (DRL) and Federated Learning (FL). In particular, we review the ML algorithms applied in the emerging networks paradigm, such as device-to-device (D2D) networks, vehicular networks (Vnet), and Fog-Radio Access Networks (F-RANs). We highlight the ML-based solutions to address technical challenges in terms of resource allocation, task offloading, and handover management. We also provide a detailed review of the ML techniques to improve energy efficiency and reduce latency in 6G wireless networks. To this end, we identify the open research issues and future trends concerning ML-based intelligent resource management applications in 6G networks.
    ... A fairly common solution technique is based on the use of deep-Q networks (DQN). Several resource allocation problems in computationally constrained environments [12], [13], [14], [15] and other related issues like joint server selection, task offloading, and handover [16], [17], [18], [19], [20], [21], [22], [23] in multi-access edge computing wireless networks have been tackled through DQNs. However, extending such approaches to a multi-service scenario falls into serious scalability issues. ...
    ... More deep RL approaches are proposed for resource allocation in edge applications dealing with industrial IoT and internet of medical things [16], [17]. Other related resource allocation problems in computationally constrained scenarios such as joint server selection, task offloading and handover in multi-access edge computing wireless networks have been tackled through DQNs as in [18], [19], [20], [21]. Besides, solutions for other computation resource dependent problems like content caching [23], and network function placement in edge servers [22] have also been proposed using DQN. ...
    Article
    Full-text available
    The combination of service virtualization and edge computing allows for low latency services, while keeping data storage and processing local. However, given the limited resources available at the edge, a conflict in resource usage arises when both virtualized user applications and network functions need to be supported. Further, the concurrent resource requests by user applications and network functions are often entangled, since the data generated by the former has to be transferred by the latter, and vice versa. In this paper, we first show through experimental tests the correlation between a video-based application and a vRAN. Then, owing to the complex dynamics involved, we develop a scalable reinforcement learning framework for resource orchestration at the edge, which leverages a Pareto analysis for provably fair and efficient decisions. We validate our framework, named VERA, through a real-time proof-of-concept implementation, which we also use to obtain datasets reporting real-world operational conditions and performance. Using such experimental datasets, we demonstrate that VERA meets the KPI targets for over 96% of the observation period and performs similarly when executed in our real-time implementation, with KPI differences below 12.4%. Further, its scaling cost is 54% lower than that of a centralized framework based on deep-Q networks.
    ... Li et al. [17] presented a heterogeneous MEC network based on reinforcement learning (RL) to optimize resource allocation in the wireless system. Khoramnejad et al. [18] discussed a DRL-based joint traffic offloading and resource management scheme to promote content delivery in MEC-assisted networks. Xu et al. [19] integrated collaborative caching and DRL to build an intelligent edge caching framework to reduce redundant content and transmission delay. ...
    Article
    Full-text available
    Complex dynamic services and heterogeneous network environments make asymmetrical control a crucial issue to handle on the Internet. With the advent of the Internet of Things (IoT) and the fifth generation (5G), the emerging network applications lead to the explosive growth of mobile traffic while bringing forward more challenging service requirements to future radio access networks. Therefore, how to effectively allocate limited heterogeneous network resources to improve content delivery for massive application services to ensure network quality of service (QoS) becomes particularly urgent in heterogeneous network environments. To cope with the explosive mobile traffic caused by emerging Internet services, this paper designs an intelligent optimization strategy based on deep reinforcement learning (DRL) for resource allocation in heterogeneous cloud-edge-end collaboration environments. Meanwhile, the asymmetrical control problem caused by complex dynamic services and heterogeneous network environments is discussed and overcome by distributed cooperation among cloud-edge-end nodes in the system. Specifically, the multi-layer heterogeneous resource allocation problem is formulated as a maximal traffic offloading model, where content caching and request aggregation mechanisms are utilized. A novel DRL policy is proposed to improve content distribution by making cache replacement and task scheduling decisions for arriving content requests in accordance with the information about users’ history requests, in-network cache capacity, available link bandwidth and topology structure. The performance of our proposed solution and its similar counterparts are analyzed in different network conditions.
    ... They also identified that using AI for energy efficiency would be essential in 6G. Furthermore, Khoramnejad et al. [8] proposed a deep RL approach to solve a joint optimization problem consisting of maximizing computation and minimizing energy consumption for a 5G and beyond network through offloading. Their network also makes use of MEC servers as a processing unit to assist their network in computing-intensive tasks. ...
    ... We used the Q-Learning algorithm presented in [6]. The Q-Learning algorithm uses the action set defined in (7), the reward function defined in (8), and an epsilon-greedy policy. The Q-Learning algorithm's state is the same as (6), but without the transmission delays $\Delta_{j_1 \in \mathcal{J}^{+},\, j_2 \in \mathcal{J}^{+},\, t \in \mathcal{T}}$. ...
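The cited update can be illustrated with a tabular Q-learning step under an epsilon-greedy policy; the action set (7), reward (8), and state (6) belong to the cited paper, so the table sizes below are placeholders.

```python
import random
import numpy as np

n_states, n_actions = 20, 4  # placeholder sizes
Q = np.zeros((n_states, n_actions))

def epsilon_greedy(state, epsilon=0.1):
    if random.random() < epsilon:
        return random.randrange(n_actions)  # explore
    return int(np.argmax(Q[state]))         # exploit

def q_update(s, a, r, s_next, alpha=0.1, gamma=0.95):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```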
    Preprint
    The fifth and sixth generations of wireless communication networks are enabling tools such as internet of things devices, unmanned aerial vehicles (UAVs), and artificial intelligence, to improve the agricultural landscape using a network of devices to automatically monitor farmlands. Surveying a large area requires performing a lot of image classification tasks within a specific period of time in order to prevent damage to the farm in case of an incident, such as fire or flood. UAVs have limited energy and computing power, and may not be able to perform all of the intense image classification tasks locally and within an appropriate amount of time. Hence, it is assumed that the UAVs are able to partially offload their workload to nearby multi-access edge computing devices. The UAVs need a decision-making algorithm that will decide where the tasks will be performed, while also considering the time constraints and energy level of the other UAVs in the network. In this paper, we introduce a Deep Q-Learning (DQL) approach to solve this multi-objective problem. The proposed method is compared with Q-Learning and three heuristic baselines, and the simulation results show that our proposed DQL-based method achieves comparable results when it comes to the UAVs' remaining battery levels and percentage of deadline violations. In addition, our method is able to reach convergence 13 times faster than Q-Learning.