ThesisPDF Available

Dynamic Decision-making in Continuous Partially Observable Domains: A Novel Method and its Application for Autonomous Driving

Authors:

Abstract and Figures

Decision-making is a crucial challenge on the way to fully autonomous systems. In real world tasks, assessing the consequences of decisions is aggravated by two factors: uncertainty and the continuous nature of the environment. In this work, we develop a general method for solving continuous partially observable Markov decision processes (POMDPs) that combines learning and planning. We apply it to autonomous driving in urban scenarios with hidden objects and cooperative driver interactions.
Content may be subject to copyright.
A preview of the PDF is not available
... [3] developed a MDP model, making decisions in a crowd environment. [4] proposed a scene description and scene prediction method based on partially observable markov decision process (POMDP) to cope with uncertainty environments. Although behavioral decision making based on MDP can realize the human-like decision making and deal with the uncertainty, the decision search space is too large to realize the on-line real-time decision. ...
... A threshold is applied to distinguish safety or danger in low intervehicle distance situations. The risk estimation formula for TIV is: (4) where 0 represents there is no risk for following vehicle when TIV calculated greater than the threshold, and 1 represents there is risk and following vehicle should take some action to avoid collision. ...
... To escape these issues, Bai et al. combine a particle-based belief representation with the use of α-functions encoded as policy graphs to represent the value function (Bai et al., 2011). Instead of using policy graphs, which pose some limitations for long planning horizons, Brechtel et al. propose to represent the α-functions using regression trees (Brechtel et al., 2013;Brechtel, 2015). Finally, Sunberg and Kochenderfer propose an extension of POMCP that can handle fully continuous POMDPs by applying progressive widening to the action and observation spaces ). ...
... Most of these transitions are based on a set of hand-coded rules and timeout values. For this reason, this kind of approach is also referred to as manual decision programming (Brechtel, 2015). Several decision-making approaches presented years after the DARPA Urban Challenge have continued to rely on FSMs (Ardelt et al., 2012;Ziegler et al., 2014b). ...
Thesis
During the past few decades automakers have consistently introduced technological innovations aimed to make road vehicles safer. The level of sophistication of these advanced driver assistance systems has increased parallel to developments in sensor technology and embedded computing power. More recently, a lot of the research made both by industry and institutions has concentrated on achieving fully automated driving. The potential societal benefits of this technology are numerous, including safer roads, improved traffic flows, increased mobility for the elderly and the disabled, and optimized human productivity. However, before autonomous vehicles can be commercialized they should be able to safely share the road with human drivers. In other words, they should be capable of inferring the state and intentions of surrounding traffic from the raw data provided by a variety of onboard sensors, and to use this information to make safe navigation decisions. Moreover, in order to truly navigate safely they should also consider potential obstacles not observed by the sensors (such as occluded vehicles or pedestrians). Despite the apparent complexity of the task, humans are extremely good at predicting the development of traffic situations. After all, the actions of any traffic participant are constrained by the road network, by the traffic rules, and by a risk-aversive common sense. The lack of this ability to naturally understand a traffic scene constitutes perhaps the major challenge holding back the large-scale deployment of truly autonomous vehicles in the roads.In this thesis, we address the full pipeline from driver behavior modeling and inference to decision-making for navigation. In the first place, we model the behavior of a generic driver automatically from demonstrated driving data, avoiding thus the traditional hand-tuning of the model parameters. This model encodes the preferences of a driver with respect to the road network (e.g. preferred lane or speed) and also with respect to other road users (e.g. preferred distance to the leading vehicle). Secondly, we describe a method that exploits the learned model to predict the future sequence of actions of any driver in a traffic scene up to the distant future. This model-based prediction method assumes that all traffic participants behave in a risk-aware manner and can therefore fail to predict dangerous maneuvers or accidents. To be able to handle such cases, we propose a more sophisticated probabilistic model that estimates the state and intentions of surrounding traffic by combining the model-based prediction with the dynamic evidence provided by the sensors. In a way, the proposed model mimics the reasoning process of human drivers: we know what a given vehicle is likely to do given the situation (this is given by the model), but we closely monitor its dynamics to detect deviations from the expected behavior. In practice, combining both sources of information results in an increased robustness of the intention estimates in comparison with approaches relying only on dynamic evidence. Finally, the learned driver behavioral model and the prediction model are integrated within a probabilistic decision-making framework. The proposed methods are validated with real-world data collected with an instrumented vehicle. Although focused on highway environments, this work could be easily adapted to handle alternative traffic scenarios.
... Another approach for behavior planning is the modeling the problem as a Markov Decision Process. In [13], the autonomous driving problem at unsignalized intersections is modeled as a POMDP with a dynamic bayesian network (DBN) to model the evolution of states. The framework uses Monte Carlo (MC) sampling and also learns a discrete representation of the state space. ...
... The problem is that in order to use a discrete state space, an effective enough discrete representation of the real, continuous world must be found so that every state contains an adequate level of information to allow for the Markov property to be valid. Doing this would add a lot of computational overhead as seen in [13], where a discrete representation of the continuous space is created iteratively but the step size is set to 1.5 seconds. Instead, point based methods that use a set of samples to approximate the continuous value function can be used, as shown in [16]. ...
Conference Paper
Full-text available
Abstract—In recent years, automated driving has gained the public spotlight as more prototypes are tested amongst human drivers. Since driving is a social behavior, automated vehicles (AVs) have to cooperate with other road users in the shared driving spaces. Therefore tactical behaviour planning with respect to the intentions of the surrounding road users is an essential functionality at the maneuver level. Moreover, the incorporation of uncertainty in road users intentions and sensor measurements is of high importance, especially in urban scenarios. This paper presents a generalized approach towards decision making for automated vehicles encountering by formulating the problem as a continuous Partially Observable Markov Decision Process (POMDP) by combining the advantage of two state of the art solvers - MCVI and SARSOP. Furthermore, the uncertainties in intention estimation model of the road users are incorporated using a dynamic bayesian network. Validation has been done for longitudinal behavior planning in varied unsignalized intersection-based scenarios using the simulation software IPG CarMaker.
... For example, Brechtel (2015) outlines highway driving scenarios as a generic Markov Decision Process with continuous states and action spaces based on the dynamics and inputs of a car. Nevertheless, the authors devised a novel discretization method to fit the states and actions into a solvable MDP formulation. ...
... Although the POMCP solver is capable of handling continuous state spaces, the authors used discrete states. Brechtel introduces a POMDP solver that is able to work with continuous spaces by learning a problem-specific representation of the state space [6], [7]. ...
Preprint
Decision making under uncertainty can be framed as a partially observable Markov decision process (POMDP). Finding exact solutions of POMDPs is generally computationally intractable, but the solution can be approximated by sampling-based approaches. These sampling-based POMDP solvers rely on multi-armed bandit (MAB) heuristics, which assume the outcomes of different actions to be uncorrelated. In some applications, like motion planning in continuous spaces, similar actions yield similar outcomes. In this paper, we utilize variants of MAB heuristics that make Lipschitz continuity assumptions on the outcomes of actions to improve the efficiency of sampling-based planning approaches. We demonstrate the effectiveness of this approach in the context of motion planning for automated driving.
... Interaction of (a) an MDP agent or (b) a POMDP with the world (Image credit:[39]).MDP are modeled using the DBN proposed by Gindele et al.,[82] that accounts for the interactions between vehicles. The policy is then recalculated each time a new traffic situation is encountered. ...
Thesis
Full-text available
Recent advances in Autonomous Vehicles (AV) driving raised up all the importance to ensure the complete reliability of AV maneuvers even in highly dynamic and uncertain environments/situations. This objective becomes even more challenging due to the uniqueness of every traffic situation/condition. To cope with all these very constrained and complex configurations, AVs must have appropriate control architecture with reliable and real-time Risk Assessment and Management Strategies (RAMS). These targeted RAMS must lead to reduce drastically the navigation risks (theoretically, lower than any human-like driving behavior), with a systemic way. Consequently, the aim is also to reduce the need for too extensive testing (which could take several months and years for each produced RAMS without at the end having absolute prove). Hence the goal in this Ph.D. thesis is to have a provable methodology for AV RAMS. This dissertation addresses the full pipeline from risk assessment, path planning to decision-making and control of autonomous vehicles. In the first place, an overall Probabilistic Multi-Controller Architecture (P-MCA) is designed for safe autonomous driving under uncertainties. The P-MCA is composed of several interconnected modules that are responsible for: assessing the collision risk with all observed vehicles while considering their trajectories' predictions; planning the different driving maneuvers; making the decision on the most suitable actions to achieve; control the vehicle movement; aborting safely the engaged maneuver if necessary (due for instance to a sudden change in the environment); and as last resort planning evasive actions if there is no other choice. The proposed risk assessment is based on a dual-safety stage strategy. The first stage analyzes the actual driving situation and predicts potential collisions. This is performed while taking into consideration several dynamic constraints and traffic conditions that are known at the time of planning. The second stage is applied in real-time, during the maneuver achievement, where a safety verification mechanism is activated to quantify the risks and the criticality of the driving situation beyond the remaining time to achieve the maneuver. The decision-making strategy is based on a Sequential Decision Networks for Maneuver Selection and Verification (SDN-MSV) and corresponds to an important module of the P-MCA. This module is designed to manage several road maneuvers under uncertainties. It utilizes the defined safety stages assessment to propose discrete actions that allow to: derive appropriate maneuvers in a given traffic situation and provide a safety retrospection that updates in real-time the ego-vehicle movements according to the environment dynamic, in order to face any sudden hazardous and risky situation. In the latter case, it is proposed to compute the corresponding low-level control based on the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) that allows the ego-vehicle to pursue the advised collision-free evasive trajectory to avert an accident and to guarantee safety at any time.The reliability and the flexibility of the overall proposed P-MCA and its elementary components have been intensively validated, first in simulated traffic conditions, with various driving scenarios, and secondly, in real-time with the autonomous vehicles available at Institut Pascal.
Article
Full-text available
Real-time, accurate, and robust localisation is critical for autonomous vehicles (AVs) to achieve safe, efficient driving, whilst real-time performance is essential for AVs to achieve their current position in time for decision making. To date, no review paper has quantitatively compared the real-time performance between different localisation techniques based on various hardware platforms and programming languages and analysed the relations among localisation methodologies, real-time performance and accuracy. Therefore, this paper discusses the state-of-the-art localisation techniques and analyses their overall performance in AV application. For further analysis, this paper firstly proposes a localisation algorithm operations capability (LAOC)-based equivalent comparison method to compare the relative computational complexity of different localisation techniques; then, it comprehensively discusses the relations among methodologies, computational complexity, and accuracy. Analysis results show that the computational complexity of localisation approaches differs by a maximum of about 10⁷ times, whilst accuracy varies by about 100 times. Vision- and data fusion-based localisation techniques have about 2-5 times potential for improving accuracy compared with lidar-based localisation. Lidar- and vision-based localisation can reduce computational complexity by improving image registration method efficiency. Data fusion-based localisation can achieve better real-time performance compared with lidar- and vision-based localisation because each standalone sensor does not need to develop a complex algorithm to achieve its best localisation potential. Vehicle-to-everything (V2X) technology can improve positioning robustness. Finally, the potential solutions and future orientations of AVs' localisation based on the quantitative comparison results are discussed.
Article
Full-text available
Being able to build a map of the environment and to simultaneously localize within this map is an essential skill for mobile robots navigating in unknown environments in absence of external referencing systems such as GPS. This so-called simultaneous localization and mapping (SLAM) problem has been one of the most popular research topics in mobile robotics for the last two decades and efficient approaches for solving this task have been proposed. One intuitive way of formulating SLAM is to use a graph whose nodes correspond to the poses of the robot at different points in time and whose edges represent constraints between the poses. The latter are obtained from observations of the environment or from movement actions carried out by the robot. Once such a graph is constructed, the map can be computed by finding the spatial configuration of the nodes that is mostly consistent with the measurements modeled by the edges. In this paper, we provide an introductory description to the graph-based SLAM problem. Furthermore, we discuss a state-of-the-art solution that is based on least-squares error minimization and exploits the structure of the SLAM problems during optimization. The goal of this tutorial is to enable the reader to implement the proposed methods from scratch.
Article
This work is a contribution to understanding multi-object traffic scenes from video sequences. All data is provided by a camera system which is mounted on top of the autonomous driving platform AnnieWAY. The proposed probabilistic generative model reasons jointly about the 3D scene layout as well as the 3D location and orientation of objects in the scene. In particular, the scene topology, geometry as well as traffic activities are inferred from short video sequences. © 2013 Karlsruher Institut fur Technologie (KIT). All rights reserved.
Conference Paper
We present a motion planning framework for autonomous on-road driving considering both the uncertainty caused by an autonomous vehicle and other traffic participants. The future motion of traffic participants is predicted using a local planner, and the uncertainty along the predicted trajectory is computed based on Gaussian propagation. For the autonomous vehicle, the uncertainty from localization and control is estimated based on a Linear-Quadratic Gaussian (LQG) framework. Compared with other safety assessment methods, our framework allows the planner to avoid unsafe situations more efficiently, thanks to the direct uncertainty information feedback to the planner. We also demonstrate our planner's ability to generate safer trajectories compared to planning only with a LQG framework.
Conference Paper
The problem of modeling the navigation behavior of multiple interacting agents arises in different areas including robotics, computer graphics, and behavioral science. In this paper, we present an approach to learn the composite navigation behavior of interacting agents from demonstrations. The decision process that ultimately leads to the observed continuous trajectories of the agents often also comprises discrete decisions, which partition the space of composite trajectories into homotopy classes. Therefore, our method uses a mixture probability distribution that consists of a discrete distribution over the homotopy classes and continuous distributions over the composite trajectories for each homotopy class. Our approach learns the model parameters of this distribution that match, in expectation, the observed behavior in terms of user-defined features. To compute the feature expectations over the high-dimensional continuous distributions, we use Hamiltonian Markov chain Monte Carlo sampling. We exploit that the distributions are highly structured due to physical constraints and guide the sampling process to regions of high probability. We apply our approach to learning the behavior of pedestrians and demonstrate that it outperforms state-of-the-art methods.