Figure 5 - uploaded by David Abel
Average results from all maps.  

Source publication
Article
Full-text available
Robots that interact with people must flexibly respond to requests by planning in stochastic state spaces that are often too large to solve for optimal behavior. In this work, we develop a framework for goal and state dependent action priors that can be used to prune away irrelevant actions based on the robot's current goal, thereby greatly acceler...

Contexts in source publication

Context 1
... Table 2 shows the average Bellman updates, accumulated reward, and CPU time for RTDP, LP-RTDP and EP-RTDP after planning in 20 different maps of each goal (100 total). Figure 5 shows the results averaged across all maps. We report CPU time for completeness, but our results were run on a networked cluster where each node had differing computer and memory resources. ...
Context 2
... We report CPU time for completeness, but our results were run on a networked cluster where each node had differing computer and memory resources. As a result, the CPU results have some variance not consistent with the number of Bellman updates in Table 2. Despite this noise, the average CPU time overall shows a statistically significant improvement with our priors, as shown in Figure 5. Furthermore, we reevaluate each predicate every time the agent visits a state, which could be optimized by caching predicate evaluations, further reducing the CPU time taken for EP-RTDP and LP-RTDP. ...
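To make the pruning and caching described in these contexts concrete, a minimal sketch of goal-dependent action pruning with cached predicate evaluations could look like the following. The predicate names, the prior interface, and the RTDP details are illustrative assumptions, not the authors' implementation.

```python
from functools import lru_cache

# Illustrative predicates over grid states; the attribute names (x, width,
# carrying) are assumptions for this sketch, not the paper's feature set.
PREDICATES = {
    "near_wall": lambda s: s.x in (0, s.width - 1),
    "holding_block": lambda s: s.carrying is not None,
}

@lru_cache(maxsize=None)
def evaluate_predicates(state):
    """Cache predicate values per (hashable) state so revisits are cheap."""
    return tuple(name for name, pred in PREDICATES.items() if pred(state))

def pruned_actions(state, goal, prior, all_actions, threshold=0.05):
    """Keep only actions the goal-conditioned prior considers relevant;
    fall back to the full action set if everything would be pruned."""
    features = evaluate_predicates(state)
    kept = [a for a in all_actions if prior(a, goal, features) >= threshold]
    return kept or list(all_actions)

def rtdp_backup(state, goal, prior, all_actions, value, transition, reward, gamma=0.95):
    """One Bellman backup restricted to the pruned action set."""
    best = max(
        sum(p * (reward(state, a, s2) + gamma * value.get(s2, 0.0))
            for s2, p in transition(state, a))
        for a in pruned_actions(state, goal, prior, all_actions)
    )
    value[state] = best
    return best
```

The fallback to the full action set ensures that an over-aggressive prior can only slow planning down, never make a state unsolvable.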

Similar publications

Preprint
Full-text available
Despite achieving great success in various sequential decision tasks, deep reinforcement learning is extremely data inefficient. Many approaches have been proposed to improve data efficiency, e.g., transfer learning, which utilizes knowledge learned from related tasks to accelerate training. Previous research on transfer learning...

Citations

... The problem of discovering re-composable representations is generally motivated by combinatorial task spaces. The traditional approach to enforcing this compositional inductive bias is to compactly represent the task space with MDPs that use human-defined abstractions of entities, such as factored MDPs (Boutilier et al., 1995; 2000; Guestrin et al., 2003a), relational MDPs (Wang et al., 2008; Guestrin et al., 2003b; Gardiol & Kaelbling, 2003), and object-oriented MDPs (Diuk et al., 2008; Abel et al., 2015). Approaches building off of such symbolic abstractions (Chang et al., 2016; Battaglia et al., 2018; Zadaianchuk et al., 2022; Bapst et al., 2019; Zhang et al., 2018) do not address the problem of how such entity abstractions arise from raw data. ...
Preprint
Object rearrangement is a challenge for embodied agents because solving these tasks requires generalizing across a combinatorially large set of configurations of entities and their locations. Worse, the representations of these entities are unknown and must be inferred from sensory percepts. We present a hierarchical abstraction approach to uncover these underlying entities and achieve combinatorial generalization from unstructured visual inputs. By constructing a factorized transition graph over clusters of entity representations inferred from pixels, we show how to learn a correspondence between intervening on states of entities in the agent's model and acting on objects in the environment. We use this correspondence to develop a method for control that generalizes to different numbers and configurations of objects, which outperforms current offline deep RL methods when evaluated on simulated rearrangement tasks.
... However, according to Hafner et al. [16], using prediction models seems to be a promising way to deal with this problem. There are several examples of using prior knowledge to guide an RL agent, as in [22,30,11,1,26]. However, causal models provide several advantages: (i) intervention, to evaluate changes in the effects given interventions on the causes; (ii) explanation, to know why a certain sequence of decisions was chosen; and (iii) counterfactuals, to evaluate the potential impact of alternative actions. ...
Chapter
Full-text available
Reinforcement learning (RL) is the de facto learning-by-interaction paradigm within machine learning. One of the intrinsic challenges of RL is the trade-off between exploration and exploitation. To address this problem, in this paper we propose to improve the reinforcement learning exploration process with an agent that can exploit causal relationships of the world. A causal graphical model is used to restrict the search space by reducing the actions that an agent can take, through graph queries that check which variables are direct causes of the variables of interest. Our main contributions are a framework to represent causal information and an algorithm to guide the action selection process of a reinforcement learning agent by querying the causal graph. We test our approach on discrete and continuous domains and show that using the causal structure in the Q-learning action selection step leads to higher jump-start reward and stability. Furthermore, better performance is obtained even with partial and spurious relationships in the causal graphical model. Keywords: Reinforcement learning, Causal graphical models, Action selection
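A minimal sketch of this kind of causally restricted action selection, assuming each action is associated with a variable in the causal graph; the graph query and the epsilon-greedy wrapper are illustrative assumptions, not the authors' code.

```python
import random
import networkx as nx

def causally_allowed_actions(causal_graph: nx.DiGraph, goal_variable, actions):
    """Keep actions whose associated variable is a direct cause (parent) of
    the variable of interest in the causal graph."""
    direct_causes = set(causal_graph.predecessors(goal_variable))
    allowed = [a for a in actions if a in direct_causes]
    return allowed or list(actions)  # fall back if the graph is uninformative

def select_action(q_values, state, actions, causal_graph, goal_variable, epsilon=0.1):
    """Epsilon-greedy Q-learning action selection over the restricted set."""
    candidates = causally_allowed_actions(causal_graph, goal_variable, actions)
    if random.random() < epsilon:
        return random.choice(candidates)
    return max(candidates, key=lambda a: q_values.get((state, a), 0.0))
```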
... In the work by Nakhost and Müller (2010), actions are eliminated from a constructed plan, comprised of a sequence of actions, in order to reduce its execution complexity, but not the planning complexity. Other works (e.g., Rosman and Ramamoorthy, 2012; Sherstov and Stone, 2005; Abel et al., 2015) discuss how to remove actions from consideration by transferring previously learned knowledge across tasks. Our approach makes no assumption on prior knowledge, and does not even require the decision process to be sequential. ...
Thesis
Full-text available
The fundamental goal of artificial intelligence and robotics research is to allow agents and robots to autonomously plan and execute their actions. To achieve reliable and robust operation, these agents must account for uncertainty; this may derive from dynamic environments, noisy sensors, or inaccurate delivery of actions. Practically, these settings require reasoning over probabilistic states, known as "beliefs". At large, the focus of this work is to allow computationally efficient decision making under uncertainty, which we formulate as decision making in the belief space. To determine the optimal course of action, the agent should predict the future development of its belief, considering multiple candidate actions. Examining beliefs over high-dimensional state vectors (e.g., entire trajectories) can significantly improve estimation accuracy. Yet it also entails high computational complexity, which makes the real-time solution of this problem extremely challenging. Inherently, solving a decision problem requires evaluation of the candidate actions according to some objective function. In this work, we claim that we can often spare this exhaustive calculation and, instead, identify and solve a simplified decision problem, which leads to the same (or similar) action selection. A problem may be simplified by adapting each of its components -- initial belief, objective function, and candidate actions -- such that the objective calculation becomes "easier". In support of this concept, we present several contributions. We begin by presenting a novel and fundamental theoretical framework, which allows us to formally analyze candidate optimality and quantify the loss induced by possible sub-optimal action selection. This analysis also allows us to identify actions which are guaranteed to be sub-optimal and can be eliminated. We then show how the idea of problem simplification can be applied to decision making in the belief space. Practically, the belief is often represented with a graphical model, or with the high-dimensional upper-triangular "square root matrix". Thus, we suggest simplifying the problem by considering a sparse approximation of the initial belief (graph or matrix), which can be efficiently updated, in order to calculate the candidates' objective values. Accordingly, we present a scalable belief sparsification algorithm and analyze its effect on the decision problem using our suggested framework. We follow with another method for reducing the computational complexity of the problem, by optimizing the order of variables in the state vector before planning. This operation can help reduce the number of variables affected by the belief updates. We call this approach PIVOT: Predictive Incremental Variable Ordering Tactic. Unfortunately, changing the order of variables modifies the matrix in a non-trivial way and implies its costly complete recalculation. Thus, in support of PIVOT, we also present an efficient and parallelizable algorithm for modifying this matrix on variable reordering, by manipulating it directly. We demonstrate the benefits of our methods on active Simultaneous Localization and Mapping (SLAM) problems for mobile robots, where we manage to significantly reduce computation time with practically no loss in solution quality. Other relevant tasks include autonomous navigation, sensor placement, and robotic manipulation.
... Most of the existing IML methods can be classified into three classes: integrating action priors, sharing representations, and reusing learned models. Action prior integration methods [38], [39], [40] leverage old experience to identify a set of optimal skills so as to reduce the computational complexity of learning new tasks. Brunskill et al. [40] proposed a PAC-inspired skill-set discovery algorithm and theoretically proved how the learned subset of skills can benefit continued learning by reducing complexity. ...
Article
Full-text available
Resource allocation problems often manifest as online decision-making tasks where the proper allocation strategy depends on an understanding of the allocation environment and the resource workload. Most existing resource allocation methods are based on meticulously designed heuristics which ignore the patterns of incoming tasks, so the dynamics of incoming tasks cannot be properly handled. To address this problem, we mine the task patterns from a large volume of historical allocation data and propose a reinforcement learning model termed IRDA to learn the allocation strategy in an incremental way. We observe that historical allocation data is usually generated from daily repeated operations and is therefore not independent and identically distributed. Training on only part of this dataset can already cause the allocation strategy to converge, thereby wasting much of the remaining data. To improve learning efficiency, we partition the whole historical allocation dataset into multi-batch datasets, which forces the agent to continuously "explore" and learn on distinct state spaces. IRDA reuses the strategy learned from the previous batch dataset and adapts it to learning on the next batch dataset, so as to incrementally learn from multi-batch datasets and improve the allocation strategy. We apply the proposed method to handle baggage carousel allocation at Hong Kong International Airport (HKIA). The experimental results show that IRDA is capable of incrementally learning from multi-batch datasets, and improves baggage carousel resource utilization by around 51.86% compared to the current baggage carousel allocation system at HKIA.
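A rough sketch of the incremental multi-batch training loop described above; the agent and training interfaces are assumptions, and the actual IRDA update rule is not reproduced here.

```python
def incremental_training(batch_datasets, make_agent, train_on_batch, evaluate):
    """Train sequentially on multi-batch datasets, reusing the strategy
    learned from each batch as the starting point for the next."""
    agent = make_agent()
    history = []
    for batch in batch_datasets:
        agent = train_on_batch(agent, batch)   # warm-started from the previous batch
        history.append(evaluate(agent))        # track incremental improvement
    return agent, history
```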
... In [15], actions are eliminated from a constructed plan, comprised of a sequence of actions, in order to reduce its execution complexity, but not the planning complexity. For learning problems, a few papers [16][17][18] have attempted to remove actions from consideration by transferring previously learned knowledge across learning tasks. Our method requires no previous knowledge, and does not even require the decision making to be sequential. ...
Chapter
Full-text available
In this paper we develop a novel paradigm to efficiently solve decision making and planning problems, and demonstrate it for the challenging case of planning under uncertainty. While conventional methods tend to optimize properties of specific problems and sacrifice performance in order to reduce their complexity, our approach is not coupled to a specific problem; it relies solely on the structure of the general decision problem in order to directly reduce its computational cost, with no influence on the quality of the solution or the maintained state. Using bounded approximations of the state, we can easily eliminate unfit actions while sparing the need to exactly evaluate the objective function for all the candidate actions. The original problem can then be solved considering a minimal subset of candidates. Since the approach is especially relevant when the action domain is large and the objective function is expensive to evaluate, we later extend the discussion specifically to decision making under uncertainty and belief space planning, and present dedicated and practical tools for applying the method to a sensor deployment problem. This paper continues our previous work towards efficient decision making.
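A minimal sketch of action elimination via bounded approximations: any candidate whose objective upper bound falls below the best lower bound cannot be optimal and is discarded, so the exact objective only needs to be evaluated for the survivors. The bound functions are assumed inputs, and this generic rule is only a stand-in for the paper's specific bounds.

```python
def eliminate_candidates(candidates, lower_bound, upper_bound):
    """Discard candidates that provably cannot be optimal."""
    best_lower = max(lower_bound(a) for a in candidates)
    return [a for a in candidates if upper_bound(a) >= best_lower]

def decide(candidates, lower_bound, upper_bound, exact_objective):
    """Evaluate the expensive exact objective only on the surviving subset."""
    survivors = eliminate_candidates(candidates, lower_bound, upper_bound)
    return max(survivors, key=exact_objective)
```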
... The existing InterRL methods convert human/agent feedback signals into a reward, a value, or a decision rule, and pass the knowledge about one specific component of RL into the agent's learning process to accelerate performance [2]. Without loss of generality, InterRL methods can be categorized into four distinct types: (1) action-based methods, which use the human's feedback signals to directly affect the agent's action selection process [1,24,27]; ...
Preprint
Full-text available
Providing reinforcement learning agents with informationally rich human knowledge can dramatically improve various aspects of learning. Prior work has developed different kinds of shaping methods that enable agents to learn efficiently in complex environments. All these methods, however, tailor human guidance to agents in specialized shaping procedures, thus embodying various characteristics and advantages in different domains. In this paper, we investigate the interplay between different shaping methods for more robust learning performance. We propose an adaptive shaping algorithm which is capable of learning the most suitable shaping method in an online manner. Results in two classic domains, from both simulated and real human studies, verify its effectiveness and shed some light on the role and impact of human factors in human-robot collaborative learning.
... As can be seen, an RL process is composed of the following main components: the action A, the policy π, the reward R, and the value function V. Consequently, to guide an agent's RL process, humans can manipulate the agent's action A [1,23], policy π [9,10], reward R [6,7,19], value function V [13], or various combinations of these, as follows: ...
Chapter
Full-text available
The computational complexity of reinforcement learning algorithms increases exponentially with the size of the problem. An effective solution to this problem is to provide reinforcement learning agents with informationally rich human knowledge, so as to expedite the learning process. Various integration methods have been proposed to combine human reward with agent reward in reinforcement learning. However, the essential distinctions among these combination methods, and their respective advantages and disadvantages, remain unclear. In this paper, we propose an adaptive learning algorithm that is capable of selecting the most suitable method from a portfolio of combination methods in an adaptive manner. We show empirically that our algorithm enables better learning performance under various conditions, compared to approaches that use a single combination method alone. By analyzing different ways of integrating human knowledge into reinforcement learning, our work provides some important insights into understanding the role and impact of human factors in human-robot collaborative learning.
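One generic way to realize such a portfolio is a simple bandit-style selector over candidate combination functions. This is a hedged illustration of the idea only, not the authors' algorithm, and the combination functions listed are assumptions.

```python
import random

# Hypothetical portfolio of ways to combine environment reward with human reward.
COMBINATIONS = {
    "additive":   lambda r_env, r_human: r_env + 0.5 * r_human,
    "human_only": lambda r_env, r_human: r_human,
    "env_only":   lambda r_env, r_human: r_env,
}

class AdaptiveCombiner:
    """Pick the combination method with the best average episode return so far
    (epsilon-greedy over the portfolio)."""

    def __init__(self, epsilon=0.1):
        self.returns = {name: [] for name in COMBINATIONS}
        self.epsilon = epsilon

    def choose(self):
        if random.random() < self.epsilon:
            return random.choice(list(COMBINATIONS))
        return max(self.returns,
                   key=lambda n: sum(self.returns[n]) / max(len(self.returns[n]), 1))

    def combine(self, name, r_env, r_human):
        return COMBINATIONS[name](r_env, r_human)

    def update(self, name, episode_return):
        self.returns[name].append(episode_return)
```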
... The first is the 10 × 30 Upworld grid problem from Abel et al. (2016). The second is the Trench problem from Abel et al. (2015), in which the agent must seek out a block, pick it up, carry it to a trench, place the block in the trench, and walk across the block to the goal. In each case, we vary the size of the state space by changing the width of the grid. ...
Conference Paper
Full-text available
In lifelong reinforcement learning, agents must effectively transfer knowledge across tasks while simultaneously addressing exploration, credit assignment, and generalization. State abstraction can help overcome these hurdles by compressing the representation used by an agent, thereby reducing the computational and statistical burdens of learning. To this end, we here develop theory to compute and use state abstractions in lifelong reinforcement learning. We introduce two new classes of abstractions: (1) transitive state abstractions, whose optimal form can be computed efficiently, and (2) PAC state abstractions, which are guaranteed to hold with respect to a distribution of tasks. We show that the joint family of transitive PAC abstractions can be acquired efficiently, preserve near-optimal behavior, and experimentally reduce sample complexity in simple domains, thereby yielding a family of desirable abstractions for use in lifelong reinforcement learning. Along with these positive results, we show that there are pathological cases where state abstractions can negatively impact performance.
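As a generic illustration of how a state abstraction reduces the number of learned values, tabular Q-learning can be run over phi(s) rather than ground states. The environment interface is an assumption, and this sketch does not reproduce the transitive or PAC constructions themselves.

```python
import random
from collections import defaultdict

def q_learning_with_abstraction(env, phi, episodes=100, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning over abstract states phi(s) instead of ground states."""
    q = defaultdict(float)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            abstract = phi(state)
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: q[(abstract, a)])
            next_state, reward, done = env.step(action)   # assumed interface
            target = reward if done else reward + gamma * max(
                q[(phi(next_state), a)] for a in env.actions)
            q[(abstract, action)] += alpha * (target - q[(abstract, action)])
            state = next_state
    return q
```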
... Our approach is also related to Relational MDP and Object Oriented MDP (Hernandez-Gardiol & Kaelbling, 2003;van Otterlo, 2005;Diuk et al., 2008;Abel et al., 2015), where states are described as a set of objects, each of which is an instantiation of canonical classes, and each instantiated object has a set of attributes. Our work is especially related to (Guestrin et al., 2003a), where the aim is to show that by using a relational representation of an MDP, a policy from one domain can generalize to a new domain. ...
Article
The tasks that an agent will need to solve often are not known during training. However, if the agent knows which properties of the environment are important, then, after learning how its actions affect those properties, it may be able to use this knowledge to solve complex tasks without training specifically for them. Towards this end, we consider a setup in which an environment is augmented with a set of user-defined attributes that parameterize the features of interest. We propose a method that learns a policy for transitioning between "nearby" sets of attributes, and maintains a graph of possible transitions. Given a task at test time that can be expressed in terms of a target set of attributes, and a current state, our model infers the attributes of the current state and searches over paths through attribute space to get a high-level plan, and then uses its low-level policy to execute the plan. We show in 3D block stacking, grid-world games, and StarCraft that our model is able to generalize to longer, more complex tasks at test time by composing simpler learned policies.
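A rough sketch of the plan-then-execute loop described in this abstract, assuming the transition graph is kept as a networkx graph over attribute sets and that the attribute inference, low-level policy, and environment interfaces exist as named here; none of these are the authors' actual components.

```python
import networkx as nx

def plan_and_execute(transition_graph, infer_attributes, low_level_policy,
                     env, state, goal_attributes, max_steps_per_subgoal=200):
    """Search for a path through attribute space, then execute each attribute
    transition with the low-level policy."""
    current = infer_attributes(state)
    path = nx.shortest_path(transition_graph, source=current, target=goal_attributes)
    for subgoal in path[1:]:
        for _ in range(max_steps_per_subgoal):
            state = env.step(low_level_policy(state, subgoal))  # assumed to return the next state
            if infer_attributes(state) == subgoal:
                break
    return state
```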
... In [15], actions are eliminated from a constructed plan, comprised of a sequence of actions, in order to reduce its execution complexity, but not the planning complexity. For learning problems, a few papers [16][17][18] have attempted to remove actions from consideration by transferring previously learned knowledge across learning tasks. Our method requires no previous knowledge, and does not even require the decision making to be sequential. ...
Conference Paper
Full-text available
In this paper we develop a novel paradigm to efficiently solve decision making and planning problems, and demonstrate it for the challenging case of planning under uncertainty. While conventional methods tend to optimize properties of specific problems and sacrifice performance in order to reduce their complexity, our approach is not coupled to a specific problem; it relies solely on the structure of the general decision problem in order to directly reduce its computational cost, with no influence on the quality of the solution or the maintained state. Using bounded approximations of the state, we can easily eliminate unfit actions while sparing the need to exactly evaluate the objective function for all the candidate actions. The original problem can then be solved considering a minimal subset of candidates. Since the approach is especially relevant when the action domain is large and the objective function is expensive to evaluate, we later extend the discussion specifically to decision making under uncertainty and belief space planning, and present dedicated and practical tools for applying the method to a sensor deployment problem. This paper continues our previous work towards efficient decision making.