Structure of the frame of (a) conventional TDMA and (b) dynamic TDMA.

Source publication
Article
Full-text available
The traditional method of solving nondeterministic-polynomial-time (NP)-hard optimization problems is to apply meta-heuristic algorithms. In contrast, Deep Q-Learning (DQL) uses a memory of experiences and a deep neural network (DNN) to choose steps and progress towards solving the problem. The dynamic time-division multiple access (DTDMA) scheme is a via...

Contexts in source publication

Context 1
... the LED transmits data to up to four users (u_0, u_1, u_2, u_3), and the channels between the LED and users u_0 to u_3 are indicated by h_i (i = 0 to 3), respectively. VLC networks usually contain mobile users that can change their positions. A change in position causes a change in the channel condition between the LED and the user. Fig. 2 shows the structure of the signal in both TDMA and DTDMA. In both types, the signal is divided into time slots that are assigned to different users. In TDMA, the duration of the time slots is fixed, whereas in DTDMA the duration of the time slots is variable, as shown in Fig. 2(b). DTDMA has better resource utilization and ...
Context 2
... a change in the channel condition between the LED and the user. Fig. 2 shows the structure of the signal in both TDMA and DTDMA. In both types, the signal is divided into time slots that are assigned to different users. In TDMA, the duration of the time slots is fixed, whereas in DTDMA the duration of the time slots is variable, as shown in Fig. 2(b). DTDMA offers better resource utilization and data-rates than conventional TDMA [11]. In DTDMA, the intensity of the current at the LED can also be adjusted to further improve system performance. ...
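The variable-duration slots described above can be illustrated with a minimal sketch. Here, slot fractions are chosen inversely proportional to each user's achievable rate so that every user receives roughly the same amount of data per frame; the transmit power, noise level, and the allocation rule itself are illustrative assumptions, not the paper's actual DTDMA scheme.

```python
import math

def dtdma_slot_fractions(channel_gains, tx_power=1.0, noise=1e-2):
    """Illustrative DTDMA allocation: assign each user a slot fraction
    inversely proportional to its achievable rate, so all users receive
    roughly the same amount of data per frame."""
    # Per-user spectral efficiency from a simple SNR model (assumed values).
    rates = [math.log2(1 + tx_power * h**2 / noise) for h in channel_gains]
    inv = [1.0 / r for r in rates]
    total = sum(inv)
    return [x / total for x in inv]

# Users with weaker channels (smaller h) get longer slots.
fracs = dtdma_slot_fractions([0.9, 0.6, 0.3, 0.1])
```

Under this rule, a mobile user whose channel degrades is automatically compensated with a longer slot in the next frame, which is the intuition behind DTDMA's better resource utilization.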

Similar publications

Article
Full-text available
In this paper, we explore the secrecy performance of a visible light communication (VLC) system consisting of distributed light-emitting diodes (LEDs) and multiple users (UEs) randomly positioned within an indoor environment while considering the presence of an eavesdropper. To enhance the confidentiality of the system, we formulate a problem of ma...

Citations

... ✓ DQN: A deep Q-network is a value-based algorithm used for learning discrete actions; it also learns from a replay buffer of previous experiences [30]. ...
Article
Full-text available
From a future perspective and with the current advancements in technology, deep reinforcement learning (DRL) is set to play an important role in several areas like transportation, automation, finance, medicine, and many more fields with less human interaction. With the popularity of its fast-learning algorithms, there is an exponential increase in the opportunities for handling dynamic environments without any explicit programming. Additionally, DRL sophisticatedly handles complex real-world problems in different environments. It has attracted great attention in the areas of natural language processing (NLP), speech recognition, computer vision, and image classification, which has led to a drastic increase in solving complex problems like planning, decision-making, and perception. This survey provides a comprehensive analysis of DRL and different types of neural networks, DRL architectures, and their real-world applications. Recent and upcoming trends in the field of artificial intelligence (AI) and its categories have been emphasized, and potential challenges have been discussed.
... However, in recent years, a few studies have explored an inclusive machine learning (ML)-based approach to tackle the bottleneck of the handover mechanism in hybrid VLC and RF networks. The authors in [13] investigate handover techniques in hybrid RF/VLC but fail to address network recovery and the decision process, while in [14] an intelligent AP handover technique increases handover efficiency and lowers network latency. The authors in [15] apply RL-based LTE handover parameter optimization to self-organizing networks (SON) without human intervention. Moreover, this scheme assesses HO performance in SON-based LTE systems using various time-to-trigger values dependent on the user's speed and surrounding cell types. ...
Article
Full-text available
Ever-increasing demand for mobile data results in the looming crisis of a spectrum crunch in existing wireless fidelity (WiFi) networks. To alleviate the spectrum crunch, one possible solution is to exploit and investigate the convergence of different network domains. In an indoor environment, integrating the high-speed data transmission of light fidelity (LiFi) with the large coverage of WiFi, namely hybrid LiFi and WiFi networks (HLWNets), has drawn significant attention. Meanwhile, handover issues in HLWNets become challenging because of the complete overlap between LiFi and WiFi coverage areas. Additionally, mobile user devices (UDs) in optical LiFi attocells cause frequent handover, while WiFi is vulnerable to traffic overload. In this paper, a downlink hybrid system using four LiFi APs and a WiFi AP is considered to enhance the mobility of UDs and reduce handover overhead using a reinforcement learning (RL) approach. The decision to perform handover is dictated and analyzed through different key performance parameters. Those performance metrics are the rate of change of the signal-to-interference-plus-noise ratio (SINR), the average throughput of UDs, the handover rate, and the speed of UDs. Results show that the proposed RL approach provides a noticeable performance improvement for various mobile-speed scenarios compared to the existing handover algorithm.
... For OWC systems, the work in [42] introduced an RL-based resource allocation solution for integrated VLC and VLC positioning (VLCP) systems to maximise the sum-rate of users. In [43], time-slots are allocated intelligently using RL in order to maximise the spectral efficiency of VLC systems where a dynamic time-division multiple access (DTDMA) scheme was considered. The Q-learning algorithm was also adopted in [36] to solve an optimisation problem for resource allocation in a WDM-based VLC system. ...
Conference Paper
Full-text available
Vertical Cavity Surface Emitting Lasers (VCSELs) have demonstrated suitability for data transmission in indoor optical wireless communication (OWC) systems due to the high modulation bandwidth and low manufacturing cost of these sources. Specifically, resource allocation is one of the major challenges that can affect the performance of multi-user optical wireless systems. In this paper, an optimisation problem is formulated to optimally assign each user to an optical access point (AP) composed of multiple VCSELs within a VCSEL array at a certain time to maximise the signal-to-interference-plus-noise ratio (SINR). In this context, a mixed-integer linear programming (MILP) model is introduced to solve this optimisation problem. Despite the optimality of the MILP model, it is considered impractical due to its high complexity, high memory requirements, and need for full system information. Therefore, reinforcement learning (RL) is considered, which has recently been widely investigated as a practical solution for various optimisation problems in cellular networks due to its ability to interact with environments with no previous experience. In particular, a Q-learning (QL) algorithm is investigated to perform resource management in steerable VCSEL-based OWC systems. The results demonstrate the ability of the QL algorithm to achieve optimal solutions close to the MILP model. Moreover, the adoption of beam steering, using holograms implemented by exploiting liquid crystal devices, results in further enhancement in the performance of the network considered.
... In [38] an intelligent resource allocation scheme was introduced for integrated VLC and VLC positioning (VLCP) systems using reinforcement learning to maximise the sum-rate achieved by users. The authors in [39] proposed a reinforcement learning based time-slot allocation scheme in VLC systems with dynamic time-division multiple access (DTDMA), with the objective of maximising the spectral efficiency. ...
Conference Paper
Full-text available
Visible Light Communication (VLC) has been widely investigated during the last decade due to its ability to provide high data rates with low power consumption. In general, resource management is an important issue in cellular networks that can greatly affect their performance. In this paper, an optimisation problem is formulated to assign each user to an optimal access point and a wavelength at a given time. This problem can be solved using mixed integer linear programming (MILP). However, using MILP is not considered a practical solution due to its complexity and memory requirements. In addition, accurate information must be provided to perform the resource allocation. Therefore, the optimisation problem is reformulated using reinforcement learning (RL), which has recently received tremendous interest due to its ability to interact with any environment without prior knowledge. In this paper, the resource allocation optimisation problem in VLC systems is investigated using the basic Q-learning algorithm. Two scenarios are simulated to compare the results with the previously proposed MILP model. The results demonstrate the ability of the Q-learning algorithm to provide optimal solutions close to the MILP model without prior knowledge of the system.
... With the development of artificial intelligence, algorithms like the deep Q-network have been widely used for decision making in various practical problems [18]-[20]. The use of reinforcement learning in path planning is increasing and has provided different goal-oriented path planning for various types of vehicles due to its strong performance and high applicability in decision making for path selection. Liu et al. designed a best-path selection method to help different types of intelligent driving vehicles based on a prior-knowledge-applied reinforcement learning strategy [21]. ...
Article
Full-text available
Fast road emergency response can minimize the losses caused by traffic accidents. However, emergency rescue on urban arterial roads faces a high probability of congestion caused by accidents, which makes the planning of rescue paths complicated. This paper proposes a refined path planning method for emergency rescue vehicles on congested urban arterial roads during traffic accidents. Firstly, a rescue path planning environment for emergency vehicles on congested urban arterial roads based on the Markov decision process is established, which focuses on the architecture of arterial roads, taking traffic efficiency and vehicle queue length into consideration in path planning; then, the prioritized experience replay deep Q-network (PERDQN) reinforcement learning algorithm is used for path planning under different traffic control schemes. The proposed method is tested on the section of East Youyi Road in Xi'an, Shaanxi Province, China. The results show that compared with the traditional shortest path method, the rescue route planned by PERDQN reduces the arrival time to the accident site by 67.1%, and the queue length upstream of the accident point is shortened by 16.3%, which shows that the proposed method is capable of planning rescue paths for emergency vehicles on congested urban arterial roads, shortening the arrival time, and reducing the vehicle queue length caused by accidents.
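The prioritized experience replay at the heart of PERDQN can be sketched minimally. This is a generic proportional-prioritization buffer, not the paper's exact implementation: the capacity, the exponent alpha, and the eviction policy are all illustrative assumptions.

```python
import random

class PrioritizedReplay:
    """Minimal proportional prioritized replay buffer (illustrative sketch):
    transitions with larger TD error are sampled more often."""

    def __init__(self, capacity=10000, alpha=0.6, eps=1e-5):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.buffer, self.priorities = [], []

    def add(self, transition, td_error):
        # Evict the oldest entry when full (simple FIFO, assumed policy).
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        # Priority grows with the magnitude of the TD error.
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size):
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        return random.choices(self.buffer, weights=probs, k=batch_size)
```

Sampling surprising transitions more often is what lets the agent revisit rare but informative events, such as sudden congestion at an accident point, more frequently than uniform replay would.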
... For OWC systems, the work in [42] introduced an RL-based resource allocation solution for integrated VLC and VLC positioning (VLCP) systems to maximise the sum-rate of users. In [43], time-slots are allocated intelligently using RL in order to maximise the spectral efficiency of VLC systems where a dynamic time-division multiple access (DTDMA) scheme was considered. The Q-learning algorithm was also adopted in [36] to solve an optimisation problem for resource allocation in a WDM-based VLC system. ...
Preprint
Full-text available
Vertical Cavity Surface Emitting Lasers (VCSELs) have demonstrated suitability for data transmission in indoor optical wireless communication (OWC) systems due to the high modulation bandwidth and low manufacturing cost of these sources. Specifically, resource allocation is one of the major challenges that can affect the performance of multi-user optical wireless systems. In this paper, an optimisation problem is formulated to optimally assign each user to an optical access point (AP) composed of multiple VCSELs within a VCSEL array at a certain time to maximise the signal-to-interference-plus-noise ratio (SINR). In this context, a mixed-integer linear programming (MILP) model is introduced to solve this optimisation problem. Despite the optimality of the MILP model, it is considered impractical due to its high complexity, high memory requirements, and need for full system information. Therefore, reinforcement learning (RL) is considered, which has recently been widely investigated as a practical solution for various optimisation problems in cellular networks due to its ability to interact with environments with no previous experience. In particular, a Q-learning (QL) algorithm is investigated to perform resource management in steerable VCSEL-based OWC systems. The results demonstrate the ability of the QL algorithm to achieve optimal solutions close to the MILP model. Moreover, the adoption of beam steering, using holograms implemented by exploiting liquid crystal devices, results in further enhancement in the performance of the network considered.
... The solution to RL-based optimisation problems addresses numerous applications including but not limited to link adaptation, power control, and resource allocation. In [39] an intelligent resource allocation scheme was introduced for integrated VLC and VLC positioning (VLCP) systems using reinforcement learning to maximise the sum-rate achieved by users. The authors in [40] proposed a reinforcement learning based time-slot allocation scheme in VLC systems with dynamic time-division multiple access (DTDMA), with the objective of maximising the spectral efficiency. ...
Preprint
Full-text available
Visible Light Communication (VLC) has been widely investigated during the last decade due to its ability to provide high data rates with low power consumption. In general, resource management is an important issue in cellular networks that can greatly affect their performance. In this paper, an optimisation problem is formulated to assign each user to an optimal access point and a wavelength at a given time. This problem can be solved using mixed integer linear programming (MILP). However, using MILP is not considered a practical solution due to its complexity and memory requirements. In addition, accurate information must be provided to perform the resource allocation. Therefore, the optimisation problem is reformulated using reinforcement learning (RL), which has recently received tremendous interest due to its ability to interact with any environment without prior knowledge. In this paper, we investigate solving the resource allocation optimisation problem in VLC systems using the basic Q-learning algorithm. Two scenarios are simulated to compare the results with the previously proposed MILP model. The results demonstrate the ability of the Q-learning algorithm to provide optimal solutions close to the MILP model without prior knowledge of the system.
... The model-free RL estimates the best actions for the states through interaction with the environment. Q-learning is a popular temporal difference (TD)-based model-free method of RL [2]. It uses a table to store the Q-values of all possible pairs of states and actions. ...
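The tabular update described in the excerpt above can be sketched in a few lines. The states, actions, learning rate, and discount factor here are toy placeholders, not values from the paper.

```python
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular temporal-difference (TD) Q-learning update.
    Q is a dict-of-dicts: Q[state][action] -> value."""
    best_next = max(Q[next_state].values())  # greedy bootstrap target
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
    return Q

# Toy table with hypothetical states and actions:
Q = {"s0": {"a0": 0.0, "a1": 0.0}, "s1": {"a0": 0.0, "a1": 0.0}}
q_update(Q, "s0", "a0", reward=1.0, next_state="s1")  # Q["s0"]["a0"] becomes 0.1
```

Because the table enumerates every state-action pair, this method only scales to small discrete problems, which is exactly the limitation that motivates replacing the table with a DQN in the citing works.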
... In the recent past, the deep reinforcement learning (DRL) approach has successfully been employed to solve optimization problems in many fields of engineering, such as non-convex and non-deterministic polynomial-time (NP)-hard optimization problems in wireless communications [2], [5]-[8]. The non-orthogonal multiple access (NOMA) technique is an innovative multiple-access method proposed for 5G and beyond networks [9]-[12]. ...
... , u_{N-1}}, where N is the total number of users. We use the symbol s_i to denote the signal of user u_i, where u_i ∈ U and E(s_i^2) = 1, i ∈ {0, 1, ..., N-1}. ...
Preprint
Full-text available
NOMA is a radio access technique that multiplexes several users over the frequency resource and provides high throughput and fairness among different users. The maximization of the minimum data-rate, also known as max-min, is a popular approach to ensure fairness among the users. NOMA optimizes the transmission power (or power-coefficients) of the users to perform max-min. The problem is a constrained non-convex optimization for more than two users. We propose to solve this problem using the Double Deep Q-Learning (DDQL) technique, a popular method of reinforcement learning. The DDQL technique employs a Deep Q-Network (DQN) to learn to choose optimal actions to optimize users' power-coefficients. The model of the Markov Decision Process (MDP) is critical to the success of the DDQL method and helps the DQN learn to take better actions. An MDP model is proposed in which the state consists of the power-coefficient values, the data-rates of users, and vectors indicating which of the power-coefficients can be increased or decreased. An action simultaneously increases the power-coefficient of one user and reduces another user's power-coefficient by the same amount. The amount of change can be small or large. The action-space contains all possible ways to alter the values of any two users at a time. The DQN consists of a convolutional layer and fully connected layers. We compared the proposed method with the sequential least squares programming and trust-region constrained algorithms and found that the proposed method can produce competitive results.
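The MDP action described in the abstract above, shifting a fixed amount of power-coefficient from one user to another so the total is conserved, can be sketched as follows. The nonnegativity guard and the step size are illustrative assumptions; the paper's exact constraints and step values are not given in the abstract.

```python
def apply_action(coeffs, i, j, delta):
    """Apply one DDQL-style action (illustrative sketch): increase user i's
    power-coefficient by delta and decrease user j's by the same amount,
    so the total transmit power budget is unchanged."""
    new = list(coeffs)
    new[i] += delta
    new[j] -= delta
    # Reject moves that would drive any coefficient negative (assumed rule).
    if min(new) < 0:
        return coeffs
    return new

# Hypothetical three-user example: move 0.05 of the budget from user 0 to user 1.
coeffs = apply_action([0.7, 0.2, 0.1], i=1, j=0, delta=0.05)
```

Because every action is a zero-sum transfer between exactly two users, the agent explores the simplex of feasible power allocations without ever violating the total-power constraint.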