Fig 7. Definition of state space.

Source publication
Article
Full-text available
Reinforcement learning (RL) has been widely used as a mechanism for autonomous robots to learn state-action pairs by interacting with their environment. However, most RL methods usually suffer from slow convergence when deriving an optimum policy in practical applications. To solve this problem, a stochastic shortest path-based Q-learning (SSPQL) i...
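The source article builds on standard tabular Q-learning; a minimal sketch of that baseline update is given below for reference. The environment interface (reset/step/actions) and the parameter values are illustrative assumptions, not taken from the paper.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Baseline tabular Q-learning; env is assumed to expose reset(), step(a), and actions."""
    Q = defaultdict(float)  # Q[(state, action)] -> value estimate

    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda a_: Q[(s, a_)])
            s_next, r, done = env.step(a)
            # one-step temporal-difference update toward r + gamma * max_a' Q(s', a')
            best_next = max(Q[(s_next, a_)] for a_ in env.actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```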

Similar publications

Article
Full-text available
Differential evolution (DE) is a simple yet efficient stochastic search approach for numerical optimization. However, it tends to suffer from slow convergence when tackling complicated problems. In addition, its search ability is significantly influenced by its control parameters. To improve the performance of the basic DE, this paper proposes a se...
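For context, a minimal sketch of the basic DE/rand/1/bin scheme that such work builds on is shown below; the fixed F and CR values are illustrative assumptions, whereas the cited paper adapts its control parameters during the search.

```python
import numpy as np

def differential_evolution(f, bounds, pop_size=30, F=0.5, CR=0.9, generations=200, seed=0):
    """Basic DE/rand/1/bin with fixed control parameters (the cited paper adapts them)."""
    rng = np.random.default_rng(seed)
    dim = len(bounds)
    lo, hi = np.array(bounds).T
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    fitness = np.array([f(x) for x in pop])

    for _ in range(generations):
        for i in range(pop_size):
            # mutation: combine three distinct individuals different from i
            r1, r2, r3 = rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)
            mutant = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), lo, hi)
            # binomial crossover with at least one gene taken from the mutant
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])
            # greedy selection between trial and target vectors
            ft = f(trial)
            if ft <= fitness[i]:
                pop[i], fitness[i] = trial, ft
    return pop[fitness.argmin()], fitness.min()

# usage: minimize the sphere function in 5 dimensions
best_x, best_f = differential_evolution(lambda x: float(np.sum(x**2)), [(-5, 5)] * 5)
```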

Citations

... In this section, we will briefly describe a reinforcement learning algorithm that is based on a typical Q-learning method (see [10], [14]). For more details concerning similar applications of Q-learning, see [15] or [16]. The discussed algorithm is a model-free reinforcement learning technique; it was chosen for this paper because it is transparent to analysis and allows the idea behind our approach to be illustrated clearly. ...
... Action model-based approaches use prior experiences to construct a model that mimics the behavior of the world. The models aim to find optimal future actions via dynamic programming [61] or stochastic shortest path [62], which can then be used to provide synthetic experiences. Depending on the underlying learner, these experiences can be used to perform additional refinement (e.g., additional value iteration steps in a Q-learning framework [63]) or to guide the learner toward better solutions [62]. Other approaches aim to improve learning speed and performance by introducing "imagined" experiences to the training data provided to the learner [64]. ...
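To make the idea of synthetic experiences concrete, here is a minimal Dyna-style sketch in which a learned action model replays remembered transitions to refine the Q-values; the environment interface and parameters are assumptions for illustration, not the specific methods of [61]-[64].

```python
import random
from collections import defaultdict

def dyna_q(env, episodes=200, planning_steps=10, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Dyna-style Q-learning: each real transition trains both the Q-table and a
    deterministic action model, which then replays synthetic experiences."""
    Q = defaultdict(float)
    model = {}  # (state, action) -> (reward, next_state)

    def greedy(s):
        return max(env.actions, key=lambda a: Q[(s, a)])

    def update(s, a, r, s_next):
        best_next = max(Q[(s_next, a_)] for a_ in env.actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            a = random.choice(env.actions) if random.random() < epsilon else greedy(s)
            s_next, r, done = env.step(a)
            update(s, a, r, s_next)          # learn from the real experience
            model[(s, a)] = (r, s_next)      # remember it in the action model
            for _ in range(planning_steps):  # replay synthetic experiences from the model
                (ps, pa), (pr, ps_next) = random.choice(list(model.items()))
                update(ps, pa, pr, ps_next)
            s = s_next
    return Q
```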
Thesis
Full-text available
As autonomous systems are deployed in increasingly complex and uncertain environments, safe, accurate, and robust feedback control techniques are required to ensure reliable operation. Accurate trajectory tracking is essential to complete a variety of tasks, but this may be difficult if the system’s dynamics change online, e.g., due to environmental effects or hardware degradation. As a result, uncertainty mitigation techniques are also necessary to ensure safety and accuracy. This problem is well suited to a receding-horizon optimal control formulation via Nonlinear Model Predictive Control (NMPC). NMPC employs a nonlinear model of the plant dynamics to compute non-myopic control policies, thereby improving tracking accuracy relative to reactive approaches. This formulation ensures constraints on the dynamics are satisfied and can compensate for uncertainty in the state and dynamics model via robust and adaptive extensions. However, existing NMPC techniques are computationally expensive, and many operating domains preclude reliable, high-rate communication with a base station. This is particularly difficult for small, agile systems, such as micro air vehicles, that have severely limited computation due to size, weight, and power restrictions but require high-rate feedback control to maintain stability. Therefore, the system must be able to operate safely and reliably with typically limited onboard computational resources. In this thesis, we propose a series of non-myopic, computationally-efficient, feedback control strategies that enable accurate and reliable operation in the presence of unmodeled system dynamics and state uncertainty. The key concept underlying these techniques is the reuse of past experiences to reduce online computation and enhance control performance in novel scenarios. These experiences inform an online-updated estimate of the system dynamics model and the choice of controller to optimize performance for a given scenario. We present a set of simulation and experimental studies with a small aerial robot operating in windy environments to assess the performance of the proposed control methodologies. These results demonstrate that leveraging past experiences to inform feedback control yields high-rate, constrained, robust-adaptive control and enables the deployment of predictive control techniques on systems with severe computational constraints.
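As a rough illustration of the receding-horizon idea underlying NMPC, the sketch below solves a small finite-horizon tracking problem at every step and applies only the first control input; the double-integrator model, quadratic cost, horizon, and bounds are illustrative assumptions and omit the robust/adaptive and experience-reuse machinery described in the thesis.

```python
import numpy as np
from scipy.optimize import minimize

# illustrative linear double-integrator model; real NMPC would use the nonlinear plant model
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([0.5 * dt**2, dt])

def rollout_cost(u_seq, x0, x_ref, horizon):
    """Quadratic tracking cost of one candidate control sequence."""
    x, cost = x0.copy(), 0.0
    for u in u_seq[:horizon]:
        x = A @ x + B * u
        cost += np.sum((x - x_ref) ** 2) + 0.01 * u**2
    return cost

def mpc_step(x0, x_ref, horizon=10):
    """Solve the finite-horizon problem and return only the first control input."""
    res = minimize(rollout_cost, np.zeros(horizon), args=(x0, x_ref, horizon),
                   method="L-BFGS-B", bounds=[(-1.0, 1.0)] * horizon)
    return res.x[0]

# receding-horizon loop: re-plan at every step, apply only the first input
x, x_ref = np.array([0.0, 0.0]), np.array([1.0, 0.0])
for _ in range(50):
    u = mpc_step(x, x_ref)
    x = A @ x + B * u
```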
... } can converge to the optimal values after a limited number of exploration steps [17], with π*(B) denoting the corresponding optimal strategy. Thus the action policy of the BMDP is derived, ...
Conference Paper
In cognitive radio networks (CRNs), TCP goodput is one of the key metrics used to measure performance. However, most existing research efforts on TCP performance improvement have two weaknesses. First, most of them only consider the underlying parameters to optimize physical-layer performance, while TCP performance is neglected. Second, they are largely formulated as a Markov Decision Process (MDP), which requires complete knowledge of the network and cannot be directly applied to distributed CRNs. To solve the above problems, a Q-BMDP algorithm is proposed in this paper: each user in a CRN autonomously decides the modulation type and transmitting power at the PHY layer, and the channels to access at the MAC layer, in order to find the best TCP goodput. Because of perception errors of the environment, this issue is formulated as a Partially Observable Markov Decision Process (POMDP), which is then converted to a belief-state MDP, with Q-value iteration used to find the optimal strategy. Simulation results show that the network can learn the optimal strategy and effectively improve TCP goodput in a dynamic wireless network.
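The POMDP-to-belief-MDP conversion mentioned above rests on a Bayesian belief update; a minimal numpy sketch of that standard update is shown below, with an illustrative toy model (the matrices are assumptions, not taken from the paper).

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayesian belief update: b'(s') is proportional to O[a][s', o] * sum_s T[a][s, s'] * b(s).

    b : belief vector over states, shape (S,)
    T : T[a][s, s'] transition probabilities, shape (A, S, S)
    O : O[a][s', o] observation probabilities, shape (A, S, Z)
    """
    predicted = b @ T[a]             # predict: P(s' | b, a)
    updated = O[a][:, o] * predicted # correct with the observation likelihood
    return updated / updated.sum()   # normalize (assumes the observation has nonzero probability)

# usage with a hypothetical 2-state, 1-action, 2-observation toy model
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])
O = np.array([[[0.8, 0.2], [0.3, 0.7]]])
b = np.array([0.5, 0.5])
b_next = belief_update(b, a=0, o=1, T=T, O=O)
```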
... The classification process is the last and perhaps the most important step of a recognition system. Combining learning machines [1][2][3][4][5] in general, and combining classifiers [6][7][8] in particular, is a widely known approach to improving classification performance. In order to enhance the classification accuracy, we employed the approach of combining classifiers. ...
Article
Full-text available
A modified version of the Boosted Mixture of Experts (BME) is presented in this paper. While previous related works, namely BME, attempt to improve performance by incorporating complementary features of a hybrid combining framework, they have some drawbacks. Analyzing the problems of previous approaches suggested several modifications, which have led us to propose a new method called Boost-wise Pre-loaded Mixture of Experts (BPME). We present a modification of the pre-loading (initialization) procedure of ME that addresses the previous problems and overcomes them by employing a two-stage pre-loading procedure. In this approach, both error and confidence measures are used as the difficulty criteria in boost-wise partitioning of the problem space.
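As a simple illustration of classifier combination in general (not the BPME pre-loading procedure itself), a minimal weighted soft-voting combiner might look as follows; the weighting scheme is an assumption for illustration.

```python
import numpy as np

def soft_vote(probabilities, weights=None):
    """Combine per-classifier class-probability matrices by weighted averaging.

    probabilities : list of arrays, each of shape (n_samples, n_classes)
    weights       : per-classifier weights (e.g., validation accuracy); uniform if None
    """
    P = np.stack(probabilities)               # (n_classifiers, n_samples, n_classes)
    w = np.ones(len(P)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    combined = np.tensordot(w, P, axes=1)     # weighted average over classifiers
    return combined.argmax(axis=1)            # predicted class indices
```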
... We proved that RVI (MDP case) for our problem converges in span; note that RVI, which is more stable than VI, has been shown to converge via other methods, but not in the span (in [20] via Lyapunov functions, in [5] via Cauchy sequences, and in [42] via perturbation transformations). We extended a key SSP result for MDPs from [5], used widely in the literature (see, e.g., [34]), to the SMDP case and presented a two-time-scale VI algorithm which exploits this result to solve the SMDP without discretization (conversion to an MDP). For RL, we showed that eigenvalues of the transition matrix can be exploited to show boundedness of a vanishing-discount procedure, and we also analyzed the undiscounted two-time-scale procedure. ...
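For reference, a minimal sketch of standard relative value iteration (RVI) for an average-reward MDP is shown below; it illustrates the plain algorithm only, not the span-based convergence argument or the SMDP two-time-scale variant discussed in the context, and the data layout is an assumption.

```python
import numpy as np

def relative_value_iteration(P, R, ref_state=0, tol=1e-8, max_iter=10_000):
    """Standard RVI for an average-reward MDP.

    P : P[a][s, s'] transition probabilities, shape (A, S, S)
    R : R[a][s] expected one-step rewards, shape (A, S)
    Returns the bias vector h and the gain (average reward) estimate rho.
    """
    n_actions, n_states, _ = P.shape
    h = np.zeros(n_states)
    for _ in range(max_iter):
        q = R + P @ h                # Bellman backup for every state-action pair, shape (A, S)
        h_new = q.max(axis=0)        # maximize over actions
        rho = h_new[ref_state]       # subtract the reference state's value to keep iterates bounded
        h_new = h_new - rho
        if np.abs(h_new - h).max() < tol:
            return h_new, rho
        h = h_new
    return h, rho
```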
Article
Full-text available
We develop the theory for Markov and semi-Markov control using dynamic programming and reinforcement learning, in which a form of semi-variance that computes the variability of rewards below a pre-specified target is penalized. The objective is to optimize a function of the rewards and risk, where risk is penalized. Penalizing variance, which is popular in the literature, has some drawbacks that can be avoided with semi-variance. Keywords: relative value iteration, semi-Markov control, semi-variance, stochastic shortest path problem, target-sensitive
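For concreteness, the target semi-variance penalized in such formulations is commonly defined as below, with τ the pre-specified target and λ a risk-penalty weight; this is the standard definition, stated here as an illustration rather than the paper's exact objective.

```latex
\[
\operatorname{SV}_{\tau}(R) \;=\; \mathbb{E}\!\left[\bigl(\max\{\tau - R,\, 0\}\bigr)^{2}\right],
\qquad
\text{objective:}\quad \max_{\pi}\; \mathbb{E}_{\pi}[R] \;-\; \lambda\, \operatorname{SV}_{\tau}(R).
\]
```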
Conference Paper
Routing in uncertain environments is challenging as it involves a number of contextual elements, such as different environmental conditions (forecast realizations with varying spatial and temporal uncertainty), changes in mission goals while en route, and asset status. In this paper, we use an approximate dynamic programming method with Q-factors to determine a cost-to-go approximation by treating the weather forecast realization information as a stochastic state. These types of algorithms take a large amount of offline computation time to determine the cost-to-go approximation, but once obtained, the online route recommendation is nearly instantaneous and several orders of magnitude faster than previously proposed ship routing algorithms. The proposed algorithm is robust to the uncertainty present in the weather forecasts. We compare this algorithm to a well-known shortest path algorithm and apply the approach to a real-world shipping tragedy using weather forecast realizations available prior to the event.
Article
In a cognitive radio network (CRN), TCP end-to-end throughput is one of the key metrics used to measure its performance. However, most existing research efforts devoted to TCP performance improvement have two weaknesses. First, most of them only consider the underlying parameters to optimize physical-layer performance, while TCP performance is neglected. Second, they are largely formulated as a Markov decision process (MDP), which requires complete knowledge of the network and cannot be directly applied to CRNs. To solve the above problems, a Q-BMDP algorithm is proposed in this paper. Each user in the CRN combines the modulation type and transmitting power at the physical layer, the access channels at the media access control layer, and the TCP congestion control factor to maximize the TCP throughput. Because of perception errors of the environment, this issue is formulated as a partially observable Markov decision process (POMDP), which is then converted to a belief-state MDP, with Q-value iteration used to find an approximately optimal strategy. Simulation and analysis results show that the proposed algorithm approximately converges to the optimal strategy within a maximum error bound and can effectively improve TCP throughput in a dynamic wireless network under the premise of limited power consumption.
Article
The shortest-path searching algorithm must not only find a global solution to the destination, but also solve the turn penalty problem (TPP) in an urban road transportation network (URTN). Although the Dijkstra algorithm (DA), as a representative node-based algorithm, secures a global solution to the shortest path search (SPS) in the URTN by visiting all possible paths to the destination, the DA solves neither the TPP nor the slow execution speed problem (SEP), because it must search for the temporary minimum-cost node. Potts and Oliver solved the TPP by changing the visiting unit from a node to a link in a tree-building algorithm like the DA. The Multi Tree Building Algorithm (MTBA), classified as a representative Link-Based Algorithm (LBA), does not eliminate the SEP because the MTBA must search many of the origin and destination links, as well as the candidate links, in order to find the SPS. In this paper, we propose a new Link-Based Single Tree Building Algorithm that reduces the SEP of the MTBA by applying a breaking rule to the LBA, and we prove its usefulness by comparing the proposed algorithm with other algorithms, such as the node-based DA and the link-based MTBA, in terms of error rates and execution speeds.
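A minimal sketch of the general link-based idea is given below: the search state is a directed link rather than a node, so a turn penalty can be charged on each link-to-link transition, and the search stops once a destination link is settled (an early-exit analogue of a breaking rule). The data structures are illustrative assumptions, not the article's single-tree-building algorithm.

```python
import heapq
from collections import defaultdict

def link_based_shortest_path(links, turn_penalty, origin_link, dest_links):
    """Dijkstra over directed links instead of nodes, so a penalty can be charged
    when turning from one link onto the next.

    links        : dict link_id -> (from_node, to_node, cost)
    turn_penalty : dict (in_link, out_link) -> extra cost (0 if absent)
    """
    out_of = defaultdict(list)                  # index outgoing links by their entry node
    for lid, (u, v, c) in links.items():
        out_of[u].append(lid)

    dist = {origin_link: links[origin_link][2]}
    pred = {origin_link: None}
    heap = [(dist[origin_link], origin_link)]
    while heap:
        d, lid = heapq.heappop(heap)
        if d > dist.get(lid, float("inf")):
            continue
        if lid in dest_links:                   # stop at the first settled destination link
            path = []
            while lid is not None:
                path.append(lid)
                lid = pred[lid]
            return list(reversed(path)), d
        _, v, _ = links[lid]
        for nxt in out_of[v]:                   # relax link-to-link transitions with turn penalties
            c = links[nxt][2] + turn_penalty.get((lid, nxt), 0.0)
            if d + c < dist.get(nxt, float("inf")):
                dist[nxt] = d + c
                pred[nxt] = lid
                heapq.heappush(heap, (dist[nxt], nxt))
    return None, float("inf")
```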
Article
This article demonstrates that Q-learning can be accelerated by appropriately specifying initial Q-values using a dynamic wave expansion neural network. In our method, the neural network has the same topography as the robot's workspace. Each neuron corresponds to a certain discrete state. Every neuron of the network will reach an equilibrium state according to the initial environment information. When the network is stable, the activity of a particular neuron denotes the maximum cumulative reward obtained by following the optimal policy from the corresponding state. The initial Q-values are then defined as the immediate reward plus the maximum cumulative reward obtained by following the optimal policy beginning at the succeeding state. In this way, we create a mapping between the known environment information and the initial values of the Q-table based on the neural network. The prior knowledge can be incorporated into the learning system, giving robots a better learning foundation. Results of experiments on a grid world problem show that neural network-based Q-learning enables a robot to acquire an optimal policy with better learning performance compared to conventional Q-learning and potential field-based Q-learning.
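A minimal sketch of this initialization idea is shown below, with plain value iteration on the known map standing in for the dynamic wave expansion network (an assumption made here for illustration): the resulting V* gives the maximum cumulative reward from each state, and the initial Q-table is the immediate reward plus the discounted value of the successor state.

```python
import numpy as np

def initialize_q_from_map(rewards, transitions, gamma=0.95, sweeps=500):
    """Initialize a Q-table from prior map knowledge: estimate V*(s) on the known
    environment (plain value iteration stands in for the wave expansion network),
    then set Q0(s, a) = r(s, a) + gamma * V*(s').

    rewards     : rewards[s, a] immediate reward, shape (S, A)
    transitions : transitions[s, a] -> successor state index, shape (S, A), deterministic
    """
    V = np.zeros(rewards.shape[0])
    for _ in range(sweeps):
        V = np.max(rewards + gamma * V[transitions], axis=1)  # Bellman optimality backup
    Q0 = rewards + gamma * V[transitions]                     # prior-knowledge Q-table
    return Q0
```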