Figure 5 - uploaded by David Abel
Average results from all maps.  

Source publication
Article
Full-text available
Robots that interact with people must flexibly respond to requests by planning in stochastic state spaces that are often too large to solve for optimal behavior. In this work, we develop a framework for goal and state dependent action priors that can be used to prune away irrelevant actions based on the robot's current goal, thereby greatly acceler...

Contexts in source publication

Context 1
... Table 2 shows the average Bellman updates, accumulated reward, and CPU time for RTDP, LP-RTDP and EP-RTDP after planning in 20 different maps of each goal (100 total). Figure 5 shows the results averaged across all maps. We report CPU time for completeness, but our results were run on a networked cluster where each node had differing computer and memory resources. ...
Context 2
... We report CPU time for completeness, but our results were run on a networked cluster where each node had differing computer and memory resources. As a result, the CPU results have some variance not consistent with the number of Bellman updates in Table 2. Despite this noise, the average CPU time overall shows a statistically significant improvement with our priors, as shown in Figure 5. Furthermore, we reevaluate each predicate every time the agent visits a state, which could be optimized by caching predicate evaluations, further reducing the CPU time taken for EP-RTDP and LP-RTDP. ...
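To make the pruning and caching described in these contexts concrete, a minimal sketch of goal-dependent action pruning with cached predicate evaluations could look like the following. The predicate names, the prior interface, and the RTDP details are illustrative assumptions, not the authors' implementation.

```python
from functools import lru_cache

# Illustrative predicates over grid states; the attribute names (x, width,
# carrying) are assumptions for this sketch, not the paper's feature set.
PREDICATES = {
    "near_wall": lambda s: s.x in (0, s.width - 1),
    "holding_block": lambda s: s.carrying is not None,
}

@lru_cache(maxsize=None)
def evaluate_predicates(state):
    """Cache predicate values per (hashable) state so revisits are cheap."""
    return tuple(name for name, pred in PREDICATES.items() if pred(state))

def pruned_actions(state, goal, prior, all_actions, threshold=0.05):
    """Keep only actions the goal-conditioned prior considers relevant;
    fall back to the full action set if everything would be pruned."""
    features = evaluate_predicates(state)
    kept = [a for a in all_actions if prior(a, goal, features) >= threshold]
    return kept or list(all_actions)

def rtdp_backup(state, goal, prior, all_actions, value, transition, reward, gamma=0.95):
    """One Bellman backup restricted to the pruned action set."""
    best = max(
        sum(p * (reward(state, a, s2) + gamma * value.get(s2, 0.0))
            for s2, p in transition(state, a))
        for a in pruned_actions(state, goal, prior, all_actions)
    )
    value[state] = best
    return best
```

The fallback to the full action set ensures that an over-aggressive prior can only slow planning down, never make a state unsolvable.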

Similar publications

Preprint
Full-text available
Despite achieving great success in various sequential decision tasks, deep reinforcement learning is extremely data inefficient. Many approaches have been proposed to improve data efficiency, e.g., transfer learning, which utilizes knowledge learned from related tasks to accelerate training. Previous research on transfer learning...

Citations

... The problem of discovering re-composable representations is generally motivated by combinatorial task spaces. The traditional approach to enforcing this compositional inductive bias is to compactly represent the task space with MDPs that use human-defined abstractions of entities, such as factored MDPs (Boutilier et al., 1995; 2000; Guestrin et al., 2003a), relational MDPs (Wang et al., 2008; Guestrin et al., 2003b; Gardiol & Kaelbling, 2003), and object-oriented MDPs (Diuk et al., 2008; Abel et al., 2015). Approaches building off of such symbolic abstractions (Chang et al., 2016; Battaglia et al., 2018; Zadaianchuk et al., 2022; Bapst et al., 2019; Zhang et al., 2018) do not address the problem of how such entity abstractions arise from raw data. ...
Preprint
Object rearrangement is a challenge for embodied agents because solving these tasks requires generalizing across a combinatorially large set of configurations of entities and their locations. Worse, the representations of these entities are unknown and must be inferred from sensory percepts. We present a hierarchical abstraction approach to uncover these underlying entities and achieve combinatorial generalization from unstructured visual inputs. By constructing a factorized transition graph over clusters of entity representations inferred from pixels, we show how to learn a correspondence between intervening on states of entities in the agent's model and acting on objects in the environment. We use this correspondence to develop a method for control that generalizes to different numbers and configurations of objects, which outperforms current offline deep RL methods when evaluated on simulated rearrangement tasks.
... However, according to Hafner et al. [16], using prediction models seems to be a promising way to deal with this problem. There are several examples of using prior knowledge to guide an RL agent, as in [22,30,11,1,26]. However, causal models provide several advantages: (i) intervention, to evaluate changes in the effects given interventions on the causes; (ii) explanation, to know why a certain sequence of decisions was chosen; and (iii) counterfactuals, to evaluate the potential impact of alternative actions. ...
Chapter
Full-text available
Reinforcement learning (RL) is the de facto learning-by-interaction paradigm within machine learning. One of the intrinsic challenges of RL is the trade-off between exploration and exploitation. To address this problem, in this paper we propose to improve the reinforcement learning exploration process with an agent that can exploit causal relationships of the world. A causal graphical model is used to restrict the search space by reducing the actions that an agent can take, through graph queries that check which variables are direct causes of the variables of interest. Our main contributions are a framework to represent causal information and an algorithm to guide the action selection process of a reinforcement learning agent by querying the causal graph. We test our approach on discrete and continuous domains and show that using the causal structure in the Q-learning action selection step leads to higher jump-start reward and stability. Furthermore, better performance is obtained even with partial and spurious relationships in the causal graphical model. Keywords: Reinforcement learning, Causal graphical models, Action selection
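A minimal sketch of this kind of causally restricted action selection, assuming each action is associated with a variable in the causal graph; the graph query and the epsilon-greedy wrapper are illustrative assumptions, not the authors' code.

```python
import random
import networkx as nx

def causally_allowed_actions(causal_graph: nx.DiGraph, goal_variable, actions):
    """Keep actions whose associated variable is a direct cause (parent) of
    the variable of interest in the causal graph."""
    direct_causes = set(causal_graph.predecessors(goal_variable))
    allowed = [a for a in actions if a in direct_causes]
    return allowed or list(actions)  # fall back if the graph is uninformative

def select_action(q_values, state, actions, causal_graph, goal_variable, epsilon=0.1):
    """Epsilon-greedy Q-learning action selection over the restricted set."""
    candidates = causally_allowed_actions(causal_graph, goal_variable, actions)
    if random.random() < epsilon:
        return random.choice(candidates)
    return max(candidates, key=lambda a: q_values.get((state, a), 0.0))
```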
... In the work by Nakhost and Müller (2010), actions are eliminated from a constructed plan, comprised of a sequence of actions, in order to reduce its execution complexity, but not the planning complexity. Other works (e.g., Rosman and Ramamoorthy, 2012; Sherstov and Stone, 2005; Abel et al., 2015) discuss how to remove actions from consideration by transferring previously learned knowledge across tasks. Our approach makes no assumption on prior knowledge, and does not even require the decision process to be sequential. ...
Thesis
Full-text available
The fundamental goal of artificial intelligence and robotics research is to allow agents and robots to autonomously plan and execute their actions. To achieve reliable and robust operation, these agents must account for uncertainty; this may derive from dynamic environments, noisy sensors, or inaccurate delivery of actions. Practically, these settings require reasoning over probabilistic states, known as "beliefs". At large, the focus of this work is to allow computationally efficient decision making under uncertainty, which we formulate as decision making in the belief space. To determine the optimal course of action, the agent should predict the future development of its belief, considering multiple candidate actions. Examining beliefs over high-dimensional state vectors (e.g., entire trajectories) can significantly improve estimation accuracy. Yet it also entails high computational complexity, which makes the real-time solution of this problem extremely challenging. Inherently, solving a decision problem requires evaluation of the candidate actions according to some objective function. In this work, we claim that we can often spare this exhaustive calculation and, instead, identify and solve a simplified decision problem, which leads to the same (or similar) action selection. A problem may be simplified by adapting each of its components -- initial belief, objective function, and candidate actions -- such that the objective calculation becomes "easier". In support of this concept, we present several contributions. We begin by presenting a novel and fundamental theoretical framework, which allows us to formally analyze candidate optimality and quantify the loss induced by possible sub-optimal action selection. This analysis also allows us to identify actions which are guaranteed to be sub-optimal and can be eliminated. We then show how the idea of problem simplification can be applied to decision making in the belief space. Practically, the belief is often represented with a graphical model, or with the high-dimensional upper-triangular "square root matrix". Thus, we suggest simplifying the problem by considering a sparse approximation of the initial belief (graph or matrix), which can be efficiently updated, in order to calculate the candidates' objective values. Accordingly, we present a scalable belief sparsification algorithm and analyze its effect on the decision problem using our suggested framework. We follow with another method for reducing the computational complexity of the problem, by optimizing the order of variables in the state vector before planning. This operation can help reduce the number of variables affected by the belief updates. We call this approach PIVOT: Predictive Incremental Variable Ordering Tactic. Unfortunately, changing the order of variables modifies the matrix in a non-trivial way and implies its costly complete recalculation. Thus, in support of PIVOT, we also present an efficient and parallelizable algorithm for modifying this matrix on variable reordering, by manipulating it directly. We demonstrate the benefits of our methods on active Simultaneous Localization and Mapping (SLAM) problems for mobile robots, where we manage to significantly reduce computation time with practically no loss in solution quality. Other relevant tasks include autonomous navigation, sensor placement, and robotic manipulation.
... Most of the existing IML methods can be classified into three classes: integrating action priors, sharing representations, and reusing learned models. Action prior integration methods [38], [39], [40] leverage old experience to identify a set of optimal skills so as to reduce the computational complexity of learning new tasks. Brunskill et al. [40] proposed a PAC-inspired skill-set discovery algorithm and theoretically proved how the learned subset of skills can benefit continued learning by reducing complexity. ...
Article
Full-text available
Resource allocation problems often manifest as online decision-making tasks where the proper allocation strategy depends on an understanding of the allocation environment and the resource workload. Most existing resource allocation methods are based on meticulously designed heuristics which ignore the patterns of incoming tasks, so the dynamics of incoming tasks cannot be properly handled. To address this problem, we mine the task patterns from a large volume of historical allocation data and propose a reinforcement learning model termed IRDA to learn the allocation strategy in an incremental way. We observe that historical allocation data is usually generated from daily repeated operations and is therefore not independent and identically distributed. Training on only part of this dataset can already cause the allocation strategy to converge, thereby wasting much of the remaining data. To improve learning efficiency, we partition the whole historical allocation dataset into multi-batch datasets, which forces the agent to continuously "explore" and learn on distinct state spaces. IRDA reuses the strategy learned from the previous batch dataset and adapts it to learning on the next batch dataset, so as to incrementally learn from multi-batch datasets and improve the allocation strategy. We apply the proposed method to handle baggage carousel allocation at Hong Kong International Airport (HKIA). The experimental results show that IRDA is capable of incrementally learning from multi-batch datasets, and improves baggage carousel resource utilization by around 51.86% compared to the current baggage carousel allocation system at HKIA.
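A rough sketch of the incremental multi-batch training loop described above; the agent and training interfaces are assumptions, and the actual IRDA update rule is not reproduced here.

```python
def incremental_training(batch_datasets, make_agent, train_on_batch, evaluate):
    """Train sequentially on multi-batch datasets, reusing the strategy
    learned from each batch as the starting point for the next."""
    agent = make_agent()
    history = []
    for batch in batch_datasets:
        agent = train_on_batch(agent, batch)   # warm-started from the previous batch
        history.append(evaluate(agent))        # track incremental improvement
    return agent, history
```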
... In [15], actions are eliminated from a constructed plan, comprised of a sequence of actions, in order to reduce its execution complexity, but not the planning complexity. For learning problems, a few papers [16][17][18] have attempted to remove actions from consideration by transferring previously learned knowledge across learning tasks. Our method requires no previous knowledge, and does not even require the decision making to be sequential. ...
Chapter
Full-text available
In this paper we develop a novel paradigm to efficiently solve decision making and planning problems, and demonstrate it for the challenging case of planning under uncertainty. While conventional methods tend to optimize properties of specific problems and sacrifice performance in order to reduce their complexity, our approach is not coupled to a specific problem; it relies solely on the structure of the general decision problem in order to directly reduce its computational cost, with no influence on the quality of the solution or the maintained state. Using bounded approximations of the state, we can easily eliminate unfit actions while sparing the need to exactly evaluate the objective function for all the candidate actions. The original problem can then be solved considering a minimal subset of candidates. Since the approach is especially relevant when the action domain is large and the objective function is expensive to evaluate, we later extend the discussion specifically to decision making under uncertainty and belief space planning, and present dedicated and practical tools for applying the method to a sensor deployment problem. This paper continues our previous work towards efficient decision making.
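A minimal sketch of action elimination via bounded approximations: any candidate whose objective upper bound falls below the best lower bound cannot be optimal and is discarded, so the exact objective only needs to be evaluated for the survivors. The bound functions are assumed inputs, and this generic rule is only a stand-in for the paper's specific bounds.

```python
def eliminate_candidates(candidates, lower_bound, upper_bound):
    """Discard candidates that provably cannot be optimal."""
    best_lower = max(lower_bound(a) for a in candidates)
    return [a for a in candidates if upper_bound(a) >= best_lower]

def decide(candidates, lower_bound, upper_bound, exact_objective):
    """Evaluate the expensive exact objective only on the surviving subset."""
    survivors = eliminate_candidates(candidates, lower_bound, upper_bound)
    return max(survivors, key=exact_objective)
```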
... The existing InterRL methods convert human/agent feedback signals into a reward, a value, or a decision rule, and pass the knowledge about one specific component of RL into the agent's learning process to accelerate performance [2]. Without loss of generality, InterRL methods can be categorized into four distinct types: (1) action-based methods, which use the human's feedback signals to directly affect the agent's action selection process [1,24,27]; ...
Preprint
Full-text available
Providing reinforcement learning agents with informationally rich human knowledge can dramatically improve various aspects of learning. Prior work has developed different kinds of shaping methods that enable agents to learn efficiently in complex environments. All these methods, however, tailor human guidance to agents in specialized shaping procedures, thus embodying various characteristics and advantages in different domains. In this paper, we investigate the interplay between different shaping methods for more robust learning performance. We propose an adaptive shaping algorithm which is capable of learning the most suitable shaping method in an online manner. Results in two classic domains, from both simulated and real human studies, verify its effectiveness and shed some light on the role and impact of human factors in human-robot collaborative learning.
... As can be seen, an RL process is composed of the following main components: the action A, the policy π, the reward R, and the value function V. Consequently, to guide an agent's RL process, humans can manipulate the agent's action A [1,23], policy π [9,10], reward R [6,7,19], value function V [13], or various combinations of these, as follows: ...
Chapter
Full-text available
The computational complexity of reinforcement learning algorithms increases exponentially with the size of the problem. An effective solution to this problem is to provide reinforcement learning agents with informationally rich human knowledge, so as to expedite the learning process. Various integration methods have been proposed to combine human reward with agent reward in reinforcement learning. However, the essential distinctions among these combination methods, and their respective advantages and disadvantages, remain unclear. In this paper, we propose an adaptive learning algorithm that is capable of selecting the most suitable method from a portfolio of combination methods in an adaptive manner. We show empirically that our algorithm enables better learning performance under various conditions, compared to approaches that use a single combination method alone. By analyzing different ways of integrating human knowledge into reinforcement learning, our work provides some important insights into understanding the role and impact of human factors in human-robot collaborative learning.
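One generic way to realize such a portfolio is a simple bandit-style selector over candidate combination functions. This is a hedged illustration of the idea only, not the authors' algorithm, and the combination functions listed are assumptions.

```python
import random

# Hypothetical portfolio of ways to combine environment reward with human reward.
COMBINATIONS = {
    "additive":   lambda r_env, r_human: r_env + 0.5 * r_human,
    "human_only": lambda r_env, r_human: r_human,
    "env_only":   lambda r_env, r_human: r_env,
}

class AdaptiveCombiner:
    """Pick the combination method with the best average episode return so far
    (epsilon-greedy over the portfolio)."""

    def __init__(self, epsilon=0.1):
        self.returns = {name: [] for name in COMBINATIONS}
        self.epsilon = epsilon

    def choose(self):
        if random.random() < self.epsilon:
            return random.choice(list(COMBINATIONS))
        return max(self.returns,
                   key=lambda n: sum(self.returns[n]) / max(len(self.returns[n]), 1))

    def combine(self, name, r_env, r_human):
        return COMBINATIONS[name](r_env, r_human)

    def update(self, name, episode_return):
        self.returns[name].append(episode_return)
```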
... The first is the 10 × 30 Upworld grid problem from Abel et al. (2016). The second is the Trench problem from Abel et al. (2015), in which the agent must seek out a block, pick it up, carry it to a trench, place the block in the trench, and walk across the block to the goal. In each case, we vary the size of the state space by changing the width of the grid. ...
Conference Paper
Full-text available
In lifelong reinforcement learning, agents must effectively transfer knowledge across tasks while simultaneously addressing exploration, credit assignment, and generalization. State abstraction can help overcome these hurdles by compressing the representation used by an agent, thereby reducing the computational and statistical burdens of learning. To this end, we here develop theory to compute and use state abstractions in lifelong reinforcement learning. We introduce two new classes of abstractions: (1) transitive state abstractions, whose optimal form can be computed efficiently, and (2) PAC state abstractions, which are guaranteed to hold with respect to a distribution of tasks. We show that the joint family of transitive PAC abstractions can be acquired efficiently, preserve near-optimal behavior, and experimentally reduce sample complexity in simple domains, thereby yielding a family of desirable abstractions for use in lifelong reinforcement learning. Along with these positive results, we show that there are pathological cases where state abstractions can negatively impact performance.
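As a generic illustration of how a state abstraction reduces the number of learned values, tabular Q-learning can be run over phi(s) rather than ground states. The environment interface is an assumption, and this sketch does not reproduce the transitive or PAC constructions themselves.

```python
import random
from collections import defaultdict

def q_learning_with_abstraction(env, phi, episodes=100, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning over abstract states phi(s) instead of ground states."""
    q = defaultdict(float)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            abstract = phi(state)
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: q[(abstract, a)])
            next_state, reward, done = env.step(action)   # assumed interface
            target = reward if done else reward + gamma * max(
                q[(phi(next_state), a)] for a in env.actions)
            q[(abstract, action)] += alpha * (target - q[(abstract, action)])
            state = next_state
    return q
```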
... Our approach is also related to Relational MDP and Object Oriented MDP (Hernandez-Gardiol & Kaelbling, 2003;van Otterlo, 2005;Diuk et al., 2008;Abel et al., 2015), where states are described as a set of objects, each of which is an instantiation of canonical classes, and each instantiated object has a set of attributes. Our work is especially related to (Guestrin et al., 2003a), where the aim is to show that by using a relational representation of an MDP, a policy from one domain can generalize to a new domain. ...
Article
The tasks that an agent will need to solve often are not known during training. However, if the agent knows which properties of the environment are important, then, after learning how its actions affect those properties, it may be able to use this knowledge to solve complex tasks without training specifically for them. Towards this end, we consider a setup in which an environment is augmented with a set of user-defined attributes that parameterize the features of interest. We propose a method that learns a policy for transitioning between "nearby" sets of attributes, and maintains a graph of possible transitions. Given a task at test time that can be expressed in terms of a target set of attributes, and a current state, our model infers the attributes of the current state and searches over paths through attribute space to get a high-level plan, and then uses its low-level policy to execute the plan. We show in 3D block stacking, grid-world games, and StarCraft that our model is able to generalize to longer, more complex tasks at test time by composing simpler learned policies.
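A rough sketch of the plan-then-execute loop described in this abstract, assuming the transition graph is kept as a networkx graph over attribute sets and that the attribute inference, low-level policy, and environment interfaces exist as named here; none of these are the authors' actual components.

```python
import networkx as nx

def plan_and_execute(transition_graph, infer_attributes, low_level_policy,
                     env, state, goal_attributes, max_steps_per_subgoal=200):
    """Search for a path through attribute space, then execute each attribute
    transition with the low-level policy."""
    current = infer_attributes(state)
    path = nx.shortest_path(transition_graph, source=current, target=goal_attributes)
    for subgoal in path[1:]:
        for _ in range(max_steps_per_subgoal):
            state = env.step(low_level_policy(state, subgoal))  # assumed to return the next state
            if infer_attributes(state) == subgoal:
                break
    return state
```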
... In [15], actions are eliminated from a constructed plan, comprised of a sequence of actions, in order to reduce its execution complexity, but not the planning complexity. For learning problems, a few papers [16][17][18] have attempted to remove actions from consideration by transferring previously learned knowledge across learning tasks. Our method requires no previous knowledge, and does not even require the decision making to be sequential. ...
Conference Paper
Full-text available
In this paper we develop a novel paradigm to efficiently solve decision making and planning problems, and demonstrate it for the challenging case of planning under uncertainty. While conventional methods tend to optimize properties of specific problems and sacrifice performance in order to reduce their complexity, our approach is not coupled to a specific problem; it relies solely on the structure of the general decision problem in order to directly reduce its computational cost, with no influence on the quality of the solution or the maintained state. Using bounded approximations of the state, we can easily eliminate unfit actions while sparing the need to exactly evaluate the objective function for all the candidate actions. The original problem can then be solved considering a minimal subset of candidates. Since the approach is especially relevant when the action domain is large and the objective function is expensive to evaluate, we later extend the discussion specifically to decision making under uncertainty and belief space planning, and present dedicated and practical tools for applying the method to a sensor deployment problem. This paper continues our previous work towards efficient decision making.