Conference Paper

Traffic Density-Based Discovery of Hot Routes in Road Networks


Abstract

Finding hot routes (traffic flow patterns) in a road network is an important problem. Hot routes are useful to city planners, police departments, real estate developers, and many others. Knowing the hot routes allows a city to better direct traffic or analyze the causes of congestion. In the past, this problem has largely been addressed with domain knowledge of the city. But in recent years, detailed information about vehicles in the road network has become available. With the development and adoption of RFID and other location sensors, an enormous number of moving object trajectories are being collected and can be used to find hot routes. This is a challenging problem due to the complex nature of the data. If objects traveled in organized clusters, it would be straightforward to use a clustering algorithm to find the hot routes. But in the real world, objects move in unpredictable ways. Variations in speed, time, route, and other factors cause them to travel in rather fleeting “clusters.” These properties make the problem difficult for a naive approach. To this end, we propose a new density-based algorithm named FlowScan. Instead of clustering the moving objects, FlowScan clusters road segments based on the density of common traffic they share. We implemented FlowScan and tested it under various conditions. Our experiments show that the system is both efficient and effective at discovering hot routes.


... Since hot spots can be interpreted as areas of high crowdedness of vehicles, clustering can be applied to solve the problem. In addition, trajectory clustering is used in a variety of applications such as motion prediction [163], traffic monitoring [164,165], activity understanding [166,167], and scene analysis [168,169]. ...
... Trajectory clustering has gained interest over the years, and several clustering methods explicitly tailored for trajectories have been proposed [170,171,165]. In addition, trajectory-specific clustering approaches have been developed by adapting statistical and probabilistic models to account for trajectory characteristics. ...
... It is an efficient method to analyse trajectory data and has been used to gain space, time, or semantic information inside trajectory data. Trajectory clustering is used in a variety of applications such as motion prediction [163], traffic monitoring [164,165], activity understanding [166,167], and scene analysis [168,169]. Similarly, trajectory clustering can be used to group students based on their learning path. ...
Thesis
Motivated by the increasing influence of data analytics in the higher education sector, this thesis focuses on enhancing the effectiveness and quality of an undergraduate student's journey. An undergraduate student's journey begins when they enrol at a university and ends once they are employed in the graduate labour market. The findings of this research benefit stakeholders of education, such as educational policymakers, education providers, and current and prospective students, in solving a variety of problems including student drop-out, low course satisfaction, and undesirable graduate employment outcomes.
... This enables various kinds of real-time analyses on mobility data, for instance real-time detection of regularities (e.g., typical traffic flows over a road network) and anomalies (e.g., traffic jams), thus allowing institutions to act quickly when facing urgent decision-making tasks in urban settings. Indeed, the ability to capture and monitor the evolution of clusters (i.e., groups) of moving objects that exhibit similar movement behaviors makes it possible to gain valuable insights into various kinds of mobility patterns [1]-[3]. For example, a traffic jam can be seen as a sequence of merges of distinct clusters of objects, or as an evolution of clusters of objects which exhibit similar movements in space and time. ...
... We report that such approaches limit themselves to Euclidean spaces and instantaneous positions, and thus fail to capture mobility behaviors that unravel over time, e.g., they fail to capture typically trafficked routes or traffic jams. On the other hand, other approaches [1], [2], [8], [9] consider moving objects trajectories constrained by road-networks. Even though these solutions adopt time discretization (i.e., time is partitioned in intervals to reduce computational costs), they do not employ incremental clustering as they recompute clusters from scratch at every time interval. ...
... In [1] the authors propose FlowScan, a density-based approach that aims to find hot routes, i.e., heavily trafficked paths within a road network. To achieve this goal FlowScan considers historical trajectory data and employs a trajectory clustering algorithm that relies on the notion of traffic density to detect dense road segments, i.e., road segments traversed by a number of trajectories above a given threshold. ...
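The notion of dense road segments described in this excerpt can be illustrated with a minimal Python sketch. It is not the FlowScan implementation: the function names (`dense_segments`, `shared_traffic`), the toy trajectories, and the threshold value are assumptions made for illustration, and each trajectory is assumed to be already map-matched to a list of road segment ids.

```python
from collections import defaultdict


def dense_segments(trajectories, min_traffic):
    """Map each dense segment to the set of trajectory ids that traverse it.

    A segment counts as dense when at least `min_traffic` distinct
    trajectories traverse it (hypothetical threshold name).
    """
    traffic = defaultdict(set)
    for traj_id, segments in trajectories.items():
        for segment in segments:
            traffic[segment].add(traj_id)
    return {seg: ids for seg, ids in traffic.items() if len(ids) >= min_traffic}


def shared_traffic(dense, seg_a, seg_b):
    """Number of trajectories that traverse both segments."""
    return len(dense.get(seg_a, set()) & dense.get(seg_b, set()))


# Toy data (assumed): trajectory id -> ordered list of segment ids.
trajectories = {
    "t1": ["a", "b", "c"],
    "t2": ["a", "b", "d"],
    "t3": ["a", "b", "c"],
    "t4": ["e", "c"],
}

dense = dense_segments(trajectories, min_traffic=3)
print(sorted(dense))                    # dense segment ids
print(shared_traffic(dense, "a", "b"))  # trajectories common to a and b
print(shared_traffic(dense, "b", "c"))  # trajectories common to b and c
```

Counting distinct trajectory ids rather than raw traversals keeps a single vehicle that loops over a segment from inflating its density; whether the original algorithm makes the same choice is not stated in the excerpt.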
... KONG et al., 2018; PARK et al., 2018; DING et al., 2018a; KWON et al., 2018). Thus, exploring and extracting useful information from trajectory data streams has been extensively studied in the last decade (DUAN et al., 2016; SACHARIDIS et al., 2008; LANGE et al., 2011; MAO et al., 2016 GUI et al., 2016; WU et al., 2014; SACHARIDIS et al., 2008; LI et al., 2007). ...
... This step highlights these attributes through a set of image processing operations, such as filters, smoothing, and contrast. In this way, with a predefined set of visual enhancements, it is possible to visually highlight mobility patterns, such as traffic waves (TREIBER; KESTING, 2013) and hot routes (LI et al., 2007). The next chapter presents the applications of the proposed approach to the analysis of global and local mobility patterns. ...
... The problem of discovering routes that frequently carry heavy traffic (hot routes) has recently been addressed by several works (ALMEIDA et al., 2017; GUI et al., 2016; WU et al., 2014; SACHARIDIS et al., 2008; LI et al., 2007). In addition, work on trajectory clustering can also be used to find hot routes, as discussed in Section 3.4 of Chapter 3. Analyzing how these hot routes evolve allows transit agencies to identify abnormal events at an early stage to better direct traffic. ...
Thesis
Urbanization is accelerating worldwide, giving rise to serious traffic problems. With the increasing availability of location acquisition technologies, massive movement data are collected continuously in a streaming manner. These data are a valuable source to help transit agencies to identify abnormal events that require immediate attention to better direct traffic. In this regard, visual analytics can help by combining automated analysis with interactive visualization for effective understanding, reasoning, and decision-making. Traditional approaches aggregate movement by employing the concept of time-window discretization and exploring an entire dataset. However, they can present inconsistencies in time and space with the real traffic dynamics. In this thesis, we present a novel approach to discover global and local mobility patterns in real time. Different from other existing approaches, our method tracks the evolution of the objects’ movement in real time. We believe that no other approach captures and keeps track of how the hot routes evolve in an incremental manner. Moreover, we conducted extensive experiments on real-world and simulated datasets to evaluate the effectiveness of our method. We also present the benefits and limitations of our visualization proposal based on domain expert feedback. Finally, we present performance tests with very encouraging results to support our approach in visualizing the total traffic flow of a big city. The results demonstrate that our method scales linearly with the size of the dataset, and is able to deal with large datasets and with streams of high-sampling rates.
... Usually, transit agencies combine video surveillance with computer vision methods to provide significant aid to human traffic control operators in traffic monitoring [2][3][4]. However, robust and accurate detection and tracking of moving objects remain a severe problem for most com- ... Typical approaches for monitoring trajectory data streams aim to discover the routes with large volumes of traffic in the recent past, also known as hot routes or hot motion paths [8,11,12]. Analyzing how those hot routes evolve allows transit agencies to identify abnormal events at an early stage to better direct traffic. ...
... However, monitoring tens of thousands of moving objects in real time is very challenging because it requires high-speed processing of a vast volume of data streams. There are several approaches for performing hot routes discovery, such as those proposed in [8,11,12]. Moreover, the studies of clustering trajectory data streams [9,10,13-15] could be used to capture hot routes through groups of similar sub-trajectories of moving objects that unfold across a sequence of recent time intervals, a.k.a. ...
... Moreover, our approach tracks the objects' dynamics in real time. Different from other existing approaches, our method neither explores an entire data stream [8,11,16,17] nor uses time windows [14,15]. In fact, extracting patterns from the entire data stream or within a time window is not very useful for real-time monitoring because the extracted patterns become outdated very fast. ...
Article
With the increasing availability of location acquisition technologies, massive movement data are collected continuously in a streaming manner. These data are a valuable source to help transit agencies to monitor the routes with heavy traffic (hot routes) and to identify abnormal events that require immediate attention to better direct traffic. In this regard, visual analytics can help by combining automated analysis with interactive visualization for effective understanding, reasoning, and decision-making. Traditional approaches aggregate movement by employing the concept of time-window discretization and exploring an entire dataset. However, they can present inconsistencies in time and space with the real traffic dynamics. In this paper, we present a novel approach to discover hot routes in real time. Different from other existing approaches, our method tracks the evolution of the objects’ movement in real time. We believe that no other approach captures and keeps track of how the hot routes evolve in an incremental manner. Moreover, we conducted extensive experiments on real-world and simulated datasets to evaluate the effectiveness and performance of our method. The results demonstrate that our method scales linearly with the size of the dataset, and is able to deal with large datasets and with streams of high-sampling rates.
... A hot route in this context is a sequence of adjacent edges that share at least a certain amount of traffic, minTraffic, defined as a parameter [13]. Definition 2 (Convoy): a grouping of vehicles in space-time following a common flow at the average track speed, held together in a green-wave window. ...
... This work [13] proposes a new density-based algorithm named FlowScan. Instead of clustering the moving objects, road segments are clustered based on the density of common traffic they share. ...
... To evaluate the GR algorithm we used frequent routes obtained from the data set of taxi trajectories mentioned above, selecting routes that contained traffic lights and intersections. To extract the hot routes we use the FlowScan algorithm [13] with the parameter minTraffic = 1, to ensure that the hot routes have the greatest possible coverage of the city's trajectories. We used the Webster Model to calculate cycle times based on highway type parameters and average track speed. ...
Article
The objective of this paper is to cover the research in the area of adaptive traffic control with emphasis on applied optimization methods. A distinction can be made between classical systems, which operate with a common cycle time, and the more flexible ones, phase-based approaches, which are shown to be more suitable for adaptive traffic control. Classic optimization solutions for this problem result in a model which is relatively easy to represent but may be difficult to fit into the standard mixed-integer programming (MIP) scheme. We propose an alternative approach to find an optimal global solution for the green wave problem on hot routes, which consists of reducing it to a Job Shop Scheduler problem using the Webster Model to adapt the cycles to road characteristics and average traffic speed.
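For readers unfamiliar with the Webster Model mentioned above, the classical Webster formula for the optimal signal cycle length is C0 = (1.5 L + 5) / (1 - Y), where L is the total lost time per cycle in seconds and Y is the sum of the critical flow ratios over all phases. The Python sketch below only evaluates that textbook formula. How the paper adapts it to road characteristics and average traffic speed is not stated in this abstract, and the function name and input values are hypothetical.

```python
def webster_cycle_length(lost_time_s, critical_flow_ratios):
    """Webster's optimal cycle length in seconds: (1.5 * L + 5) / (1 - Y).

    lost_time_s: total lost time per cycle, L, in seconds.
    critical_flow_ratios: one flow ratio (demand / saturation flow) per phase;
        their sum is Y and must stay below 1 for the formula to apply.
    """
    y = sum(critical_flow_ratios)
    if not 0.0 <= y < 1.0:
        raise ValueError("sum of critical flow ratios must be in [0, 1)")
    return (1.5 * lost_time_s + 5.0) / (1.0 - y)


# Hypothetical two-phase intersection: L = 10 s, Y = 0.25 + 0.25.
cycle = webster_cycle_length(10.0, [0.25, 0.25])
print(cycle)
```

The guard on Y matters in practice: as Y approaches 1 the intersection is saturated and the formula's cycle length grows without bound.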
... Then, we adopt DA.Y2 = [1, 0, 0, 1, 0] to calculate the footmark graph. Specifically, we fetch Y2 from disk and do the following calculation (lines 11-21). Since v1 is reached, the calculation is over. Note that there is a small portion of trajectories whose starting time is not in T. ...
... Informally, a hot route is a path with heavy traffic. Various trajectory clustering approaches [16,15,9,14] can be utilized to discover hot routes. In [24], an online algorithm is developed to detect the hot motion paths that have been frequently traversed by the past travelers. ...
Conference Paper
The rise of GPS-equipped mobile devices has led to the emergence of big trajectory data. In this paper, we study a new path finding query which finds the most frequent path (MFP) during user-specified time periods in large-scale historical trajectory data. We refer to this query as time period-based MFP (TPMFP). Specifically, given a time period T, a source v_s, and a destination v_d, TPMFP searches for the MFP from v_s to v_d during T. Though there exist several proposals on defining MFP, they only consider a fixed time period. Most importantly, we find that none of them reflects well people's common-sense notion, which can be described by three key properties, namely suffix-optimal (i.e., any suffix of an MFP is also an MFP), length-insensitive (i.e., MFP should not favor shorter or longer paths), and bottleneck-free (i.e., MFP should not contain infrequent edges). A TPMFP with the above properties will not only reveal the common routing preferences of past travelers, but also take time effectiveness into consideration. Therefore, our first task is to give a TPMFP definition that satisfies the above three properties. Then, given the comprehensive TPMFP definition, our next task is to find TPMFP efficiently over a huge amount of trajectory data. In particular, we propose efficient search algorithms together with novel indexes to speed up the processing of TPMFP. To demonstrate both the effectiveness and the efficiency of our approach, we conduct extensive experiments using a real dataset containing over 11 million trajectories.
... Nevertheless, GPS trajectories often contain lots of noisy GPS coordinates, owing to the inevitable errors of GPS positioning [1][2][3][4][5][6][7]. These coordinates deviate from the true positions of mobile devices, calling for the important preprocessing task of map-matching [8][9][10][11][12][13][14][15]. ...
... In the example of Figure 7, an input trajectory is projected to the segment SID 2, and the mean speed of the trajectory moving on the segment is estimated as 6 m/s. Referring to the speed histogram of SID 2, it can be observed that 6 m/s falls in the bucket interval [4,8), so the probability that the input trajectory matches this segment is 0.2. ...
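The histogram lookup in this example can be sketched in a few lines of Python. Only the bucket [4,8) and its probability 0.2 come from the excerpt. The other bucket boundaries and probabilities, and the function name `bucket_probability`, are hypothetical values chosen so that the histogram sums to 1.

```python
from bisect import bisect_right


def bucket_probability(edges, probs, speed):
    """Return the probability of the half-open bucket [lo, hi) containing speed.

    edges: sorted bucket boundaries; bucket i covers [edges[i], edges[i + 1]).
    probs: one probability per bucket, so len(probs) == len(edges) - 1.
    Speeds outside the histogram range get probability 0.0.
    """
    if speed < edges[0] or speed >= edges[-1]:
        return 0.0
    return probs[bisect_right(edges, speed) - 1]


# Hypothetical speed histogram for segment SID 2, in m/s.
edges = [0, 4, 8, 12, 16]
probs = [0.1, 0.2, 0.5, 0.2]  # only the 0.2 for [4, 8) is from the excerpt

p = bucket_probability(edges, probs, 6.0)
print(p)  # probability of the bucket that contains 6 m/s
```

Using `bisect_right` makes each bucket closed on the left and open on the right, matching the [4,8) notation in the excerpt: a speed of exactly 8 m/s falls in the next bucket.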
Article
Map-matching, an important preprocessing task in many location-based services (LBS), projects each point of the global positioning system (GPS) within a trajectory dataset onto a digital map. The state-of-the-art map-matching algorithms typically employ Hidden Markov model (HMM) via shortest path computation. But the computation of the shortest path might not work well on low-sampling-rate trajectory data (e.g., one GPS point every 1–5 min), leading to low matching precision and high running time. To solve the problem, this paper firstly identifies frequent patterns (FPs) in historical trajectories to capture meaningful mobility behaviors, and then extracts mobile behavior criterion (MBC) of mobile users. Such a criterion generally represents the route choice of mobile users on road networks. Moreover, the temporal information within trajectory data was employed to estimate the speed of mobile users on road segments. The identified FPs, coupled with MBC and moving speed, help to improve the map-matching precision of low-sampling-rate trajectories. In addition, an FP-forest structure was proposed to index the identified FPs. The structure could greatly speed up the lookup of frequent paths for shorter running time. Furthermore, the FP-forest structure was pruned to reduce redundancy with smaller space cost. Finally, experiments were carried out on real-world datasets. The results confirm that our FP-matching method outperforms state-of-the-art in terms of effectiveness and efficiency.
... Trajectory Data Mining. The bike lane planning problem is related to the trajectory data mining [30], [31], [32], [33], [34], [35], [36], [37], [38]. Many systems have been proposed to discover frequently used routes based on massive trajectory data, e.g., [30], [31], [32], [33], [37], [39]. ...
... The bike lane planning problem is related to the trajectory data mining [30], [31], [32], [33], [34], [35], [36], [37], [38]. Many systems have been proposed to discover frequently used routes based on massive trajectory data, e.g., [30], [31], [32], [33], [37], [39]. There are also some projects on clustering/summarizing trajectories on the road network [40], [41], which help urban planners to know the popular routes and improve public transportation system. ...
Article
Cycling as a green transportation mode has been promoted by many governments all over the world. As a result, constructing effective bike lanes has become a crucial task to promote the cycling life style, as well-planned bike lanes can reduce traffic congestions and safety risks. Unfortunately, existing trajectory mining approaches for bike lane planning do not consider one or more key realistic government constraints: 1) budget limitations, 2) construction convenience, and 3) bike lane utilization. In this paper, we propose a data-driven approach to develop bike lane construction plans based on the large-scale real world bike trajectory data collected from Mobike, a station-less bike sharing system. We enforce these constraints to formulate our problem and introduce a flexible objective function to tune the benefit between coverage of users and the length of their trajectories. We prove the NP-hardness of the problem and propose greedy-based heuristics to address it. To improve the efficiency of the bike lane planning system for the urban planner, we propose a novel trajectory indexing structure and deploy the system based on a parallel computing framework (Storm) to improve the system's efficiency. Finally, extensive experiments and case studies are provided to demonstrate the system efficiency and effectiveness.
... • Although some classification algorithms have been proposed to deal with spatio-temporal trajectories [8][9][10], they rely on high-resolution mobility data where the moving object frequently reports its current location. This type of data is usually generated by certain GPS feeds like vehicle-mounted navigators. ...
... Besides, the TRAOD framework [9] provides a trajectory classification method to detect abnormal trajectories. Finally, Li et al. [10] proposed a novel trajectory representation based on geographic motifs. These motifs give rise to a new feature space that is used to build a hierarchical-based classifier. ...
Article
Volunteer Geographic Information (VGI) is one of the key enablers of the mobility mining discipline. This work introduces a novel data-driven methodology to create a classifier of spatio-temporal trajectories based on VGI. Although other solutions have been proposed, they usually do not fully consider the low resolution and uncertainty of VGI due to its inherent human nature. The proposed approach introduces a classifier based on fuzzy rules that are able to deal with this kind of data. The solution is applied in a use case for real-time detection of tourists and local citizens’ flows and it is compared with a well-established trajectory classifier exhibiting quite promising results.
... Sometimes, a user wishes for a route with less traffic [11], [12] while another time shortest path is preferred. Nevertheless, the main goal is still pattern extraction and mining [3], [6], [11], [13], [14], grouping similar routes and trajectories [15], [16], route prediction [7], [8], [17] and hot path mining [18], [19] including pattern mining [3], [6], [13], [14], trajectory clustering [15], [16], hot route discovery [18], [19], trajectory prediction [7], [8], [17] etc. Nevertheless, none of the afore-mentioned research approaches the challenge of discovering the most popular routes from one given location to another based on defined user constraints and preferences, and in many cases, uses simulated data. ...
Article
Accurate analysis of tourist movement is essential for a country to devise sustainable policies for promoting and growing tourism. From the activities of tourists and the spots they visit, the amount of revenue generated for a particular region can be predicted. However, tourist preferences evolve and vary from one user to another, and thus a tourist spot favored by one set of users is not preferred by another. This paper aims to design and implement a novel application to recommend an optimal travel route based on user constraints. The user constraints can be the maximum time, distance, and popularity of a particular place. Real data were collected from the WiFi routers installed at different tourist spots of Jeju Island, South Korea. We apply a Markov Chain Model to predict the popularity of different places on a short-term and long-term basis. The popularity index alongside user constraints is used to find optimal routes. A responsive web-based prototype is developed to collect user constraints and, in response, recommend optimal routes using Google Maps directory services. Results indicate the difference between short-term and long-term popularity to prove the effectiveness of Markov Chains in forecasting long-term behavior. The system is made responsive for all sizes of screens to make it uniformly serviceable on mobile phones. The accuracy of the system is computed based on the historical data and the recommendation system, and it is found to fall between 95% and 100% all the time. Furthermore, the results are compared with popular state-of-the-art methods, and they are found to be significantly better in long-term location prediction.
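The distinction between short-term and long-term popularity in a Markov chain can be illustrated with a small Python sketch. This is not the paper's model: the two-place transition matrix, the starting distribution, and the function names are assumptions for illustration, and the abstract does not say how states or transition probabilities were defined.

```python
def step(dist, transition):
    """Advance a probability distribution over places by one transition."""
    n = len(transition)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]


def forecast(dist, transition, steps):
    """Distribution over places after the given number of transitions."""
    for _ in range(steps):
        dist = step(dist, transition)
    return dist


# Hypothetical chain over two tourist spots; row i gives the probabilities
# of moving from spot i to each spot at the next time step.
transition = [
    [0.5, 0.5],
    [0.25, 0.75],
]
start = [1.0, 0.0]  # all visitors assumed to start at spot 0

short_term = forecast(start, transition, 1)
long_term = forecast(start, transition, 200)
print(short_term)
print(long_term)  # approaches the stationary distribution of the chain
```

For this assumed matrix the long-run distribution settles near one third for spot 0 and two thirds for spot 1 regardless of the starting point, which is the sense in which long-term popularity can differ from a one-step forecast.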
... Based on the previous statements, the hot routes are represented by a sequence of grid cells. Our method is based on the method of [11], with some improvements. The following parts show the method to find the hot routes in the grid-based map. ...
... Density Reachable. Reference [11] introduces the concept of density-reachability, in both direct and indirect forms. We transfer these concepts to the trajectory model to detect hot routes on the grid map, and redefine the definitions. ...
Chapter
A perfect charging station network plays a key role in Electric Vehicle (EV) adoption. In order to find the optimal cost of constructing the network and maximize usage satisfaction, we propose a data-driven framework for solving the problem of locating charging stations. Spatial-temporal models are built to analyze EV usage behavior in the urban area. Features such as charging demand, high energy consumption areas, and highly traveled paths are captured. We evaluate the proposed models on a real-world EV dataset. The results clearly demonstrate the efficiency and accuracy of our models in locating EV charging stations.
... Two main lines of research focus on: (1) traffic monitoring at an aggregate level, e.g., to help city administration, and (2) services that road users are getting. Existing work towards traffic monitoring includes monitoring congestion [129], assessing the safety of roads and intersections [143], traffic prediction [132], evacuation routing [265], and optimizing public transportation schedules [193]. Efforts on the services provided to road users include routing queries that balance the traffic across roads [69], helping drivers find the nearest facilities [121], personalized routing [130], eco-routing for minimizing greenhouse emissions [134], and enabling multi-modal trip planning [225]. ...
Article
Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains including traffic management, urban planning, and health sciences. In this paper, we present the domain of mobility data science. Towards a unified approach to mobility data science, we present a pipeline having the following components: mobility data collection, cleaning, analysis, management, and privacy. For each of these components, we explain how mobility data science differs from general data science, we survey the current state of the art, and describe open challenges for the research community in the coming years.
... Transit companies need to ensure their services evolve with the needs of their passengers and help shape better service in their growth. Hence, transit companies have to periodically share their passengers' trajectories among their own internal departments and external transportation companies in order to perform a comprehensive analysis of passenger flow in an area, with the goal of supporting trajectory data mining [19,26,27,50,63] and traffic management [33]. By using a probabilistic flowgraph, as shown in Figure 2, an analyst can identify the major trends in passenger flow and hot paths in a traffic network. ...
Thesis
The increasing use of location-aware devices provides many opportunities for analyzing and mining human mobility. The trajectory of a person can be represented as a sequence of visited locations with different timestamps. Storing, sharing, and analyzing personal trajectories may pose new privacy threats. Previous studies have shown that employing traditional privacy models and anonymization methods often leads to low information quality in the resulting data. In this thesis we propose a method for achieving anonymity in a trajectory database while preserving the information to support effective passenger flow analysis. Specifically, we first extract the passenger flowgraph, which is a commonly employed representation for modeling uncertain moving objects, from the raw trajectory data. We then anonymize the data with the goal of minimizing the impact on the flowgraph. Extensive experimental results on both synthetic and real-life data sets suggest that the framework is effective to overcome the special challenges in trajectory data anonymization, namely, high dimensionality, sparseness, and sequentiality.
... The increasing availability of large data sets containing spatio-temporal data is driving the need for data analysis methodologies. Spatio-temporal data has been widely used in various applications: traffic monitoring [5,21,20], trajectory similarity search used in route searching and semantic understanding [11,3,10,28], weather and pollution monitoring [25], and anomaly detection [7,8,24]. ...
Preprint
With the rise of the Internet of Things, strategies for effectively processing big data are essential for discovering meaningful insights. The time series datasets produced by groups of interconnected devices contain valuable underlying patterns. Recent works have extracted patterns from spatio-temporal datasets to aid in road network generation, activity recognition, and other tasks. The speed and accuracy of the underlying geometry reconstruction are important in these applications. Existing methods such as kernel density estimation (KDE) have been used but are often computationally expensive. We propose modifying edge quadtrees to utilize their effective hierarchical structure. Our modification estimates density using a novel trajectory count function which provides mathematical guarantees on the stability of the count by enforcing an invariance to local perturbations. We evaluate our method's effectiveness at extracting the underlying geometry and representative subsample points. For verification, we compare against a KDE variant at extracting the underlying shape of noisy synthetic trajectories travelling along the shape. We compare map extraction from GPS traces against current methods. Our method significantly improves runtime while extracting the geometry better or at least comparably. We also compare against maxmin subsampling on an activity recognition data set and find a significant runtime improvement with comparable performance.
... Several studies have been proposed to discover traffic jams in a constrained road network. A traffic detection algorithm that uses the density of traffic in sequences of road segments to discover hot routes in a road network was proposed in [43]. We also found NETSCAN, a clustering algorithm for road networks [44]. ...
Article
Electric Vehicles (EVs) are regarded to be among the most environmentally and economically efficient transportation solutions. However, barriers and range limitations hinder this technology’s progress and deployment. In this paper, we examine EV route planning to derive optimal routes considering energy consumption by analyzing historical trajectory data. More specifically, we propose a novel approach for EV route planning that considers real-time traffic incidents, road topology, charging station locations during battery failure, and finally, traffic flow prediction extracted from historical trajectory data to generate energy maps. Our approach consists of four phases: the off-line phase which aims to build the energy graph, the application of the A* algorithm to deliver the optimal EV path, the NEAT trajectory clustering which aims to produce dense trajectory clusters for a given period of the day, and finally, the on-line phase based on our algorithm to plan an optimal EV path based on real traffic incidents, dense trajectory clusters, road topology information, vehicle characteristics, and charging station locations. We set up experiments on real cases to establish the optimal route for electric cars, demonstrating the effectiveness and efficiency of our proposed algorithm.
... This does not imply that all activities must necessarily be associated with a point in a street. Furthermore, other network properties such as GPS trajectories and traffic densities of road networks [24] are not considered. The objective function used in SNAS is based on maximizing the activity coverage of summary paths, not on minimizing the distance of activities to summary paths. ...
Article
Data summarization is an important concept in data mining for finding a compact representation of a dataset. In spatial network activity summarization (SNAS), we are given a spatial network and a collection of activities (e.g., pedestrian fatality reports, crime reports) and the goal is to find k shortest paths that summarize the activities. SNAS is important for applications where observations occur along linear paths such as roadways, train tracks, etc. SNAS is computationally challenging because of the large number of k subsets of shortest paths in a spatial network. Previous work has focused on either geometry or subgraph-based approaches (e.g., only one path), and cannot summarize activities using multiple paths. This paper proposes a K-Main Routes (KMR) approach that discovers k shortest paths to summarize activities. KMR generalizes K-means for network space but uses shortest paths instead of ellipses to summarize activities. To improve performance, KMR uses network Voronoi, divide and conquer, and pruning strategies. We present a case study comparing KMR's network-based output (i.e., shortest paths) to geometry-based outputs (e.g., ellipses) on pedestrian fatality data. Experimental results on synthetic and real data show that KMR with our performance-tuning decisions yields substantial computational savings without reducing summary path coverage.
... Traffic density based methods have also been proposed to find hot routes in road networks. The work in [16] proposed a density-based algorithm to group roads based on their shared common traffic and proposed a framework to find popular routes. The work in [18] proposed a fuel-efficient route plan method based on game theory. ...
Article
While the problems of finding the shortest path and k-shortest paths have been extensively researched, the research community has been shifting its focus towards discovering and identifying paths based on user preferences. Since users naturally follow some of the paths more than other paths, the popularity of a given path often reflects such user preferences. Given a set of user traversals in a road network and a set of paths between a given source and destination pair, we address the problem of performing top-k ranking of the paths in that set based on path popularity. In this paper, we introduce a new model for computing the popularity scores of paths. Our main contributions are threefold. First, we propose a framework for modeling user traversals in a road network as transactions. Second, we present an approach for efficiently computing the popularity score of any path based on the itemsets extracted from the transactions using pattern mining techniques. Third, we conducted an extensive performance evaluation with two real datasets to demonstrate the effectiveness of the proposed scheme.
... Such devices have led to large movement databases that store trajectory data, in which each point represents a position in space at a given time. These databases contain a great deal of knowledge and require analysis (Demšar et al., 2015) such as the extraction of movement patterns across diverse trajectories in order to perform human mobility or traffic monitoring (Li et al., 2007;Zheng et al., 2008), motion prediction (Sung et al., 2012), human action recognition (Yao et al., 2017) or detection of abnormal behavior in maritime routes (Vespe & Mazzarella, 2016). ...
Article
In the context of the surveillance of the maritime traffic, a major challenge is the automatic identification of traffic flows from a set of observed trajectories, in order to derive good management measures or to detect abnormal or illegal behaviours for example. In this paper, we propose a new modelling framework to cluster sequences of a large amount of trajectories recorded at potentially irregular frequencies. The model is specified within a continuous time framework, being robust to irregular sampling in records and accounting for possible heterogeneous movement patterns within a single trajectory. It partitions a trajectory into sub-trajectories, or movement modes, allowing a clustering of both individuals’ movement patterns and trajectories. The clustering is performed using non parametric Bayesian methods, namely the hierarchical Dirichlet process, and considers a stochastic variational inference to estimate the model’s parameters, hence providing a scalable method in an easy-to-distribute framework. Performance is assessed on both simulated data and on our motivational large trajectory dataset from the automatic identification system, used to monitor the world maritime traffic: the clusters represent significant, atomic motion-patterns, making the model informative for stakeholders.
... Density-based clustering approaches usually adopt density thresholds to determine groups of flows based on some well-known density clustering algorithms such as DBSCAN (Ester et al. 1996) and OPTICS (Ankerst et al. 1999). Density-based clustering approaches have been widely applied in multiple domains such as activity pattern mining (Scholz and Lu 2014; Tao and Thill 2016a; Jiang, Ferreira, and González 2017) and traffic monitoring (Atev, Masoud, and Papanikolopoulos 2006; Li et al. 2007). Distance-based clustering techniques are developed based on similarity and distance measures between flows, having the potential to reflect the grouping process of flows. ...
Article
Many types of spatial flows, including pedestrian flows and vehicle flows, are constrained by and distribute on spatial networks. In the literature, network-constrained flows are usually modeled as a direct line in planar space using methods designed for flows in planar space. Further, in spatial statistical analysis of flow patterns, distance measures and the hypothesis of spatial randomness of flows also have a significant impact on the determination of flow patterns. In this study, we extend the global and local Ripley’s K functions for planar flows to network space. Both the network and planar K-functions for flows are applied to detect the patterns of taxi Origin-Destination flow data on a road network at multiple scales. The effect of distance measures and simulation methods in the network and planar Ripley’s K functions are examined. We found that the planar K function is more sensitive to the changes in scale and tends to detect more clustered flows compared with the network K function at the same scale. Distance measures and simulation methods have a more significant influence on the detection of patterns of network-constrained flows than the selection of the network or planar Ripley’s K functions. This study suggests that distance measures and hypotheses of spatial randomness have to be chosen carefully before applying flow pattern analytic methods to network-constrained flows and interpreting the results of flow patterns.
... Some studies focused on the preprocessing of these trajectory data including data cleaning [1], interpolation [2], [3], compression [4], [5], and segmentation [6]. Other works are devoted to explore the application of trajectory data mining in route recommendation [7]- [10], motion prediction [11], behavior understanding [12], [13], abnormal detection [14], [15] and traffic monitoring [16]- [18]. ...
Article
Trajectory classification is a hot topic in the field of spatiotemporal data mining. Existing models perform spatial or temporal computation on trajectory data, which requires considerable effort and is often time-consuming and inefficient. This article proposes a model to classify unknown ship trajectories through a syntax recognition approach. By using the background semantic information in the rasterized sea chart, the model transforms the ship trajectories into symbolic sentences containing both spatiotemporal and semantic information, and reduces their scale. The class feature is expressed as a context-free grammar and the data classification is implemented through syntax parsing. The parsing requires less computation and is more efficient. Experiments are carried out to verify the model’s practicability, and the results show that it is valid and effective.
... Easing traffic congestion mainly provides support for urban traffic planning and management. For instance, Li et al. [25] proposed a density-based clustering algorithm called FlowScan to identify high-density traffic locations at road-level to alleviate traffic congestion. Anbaroglu et al. [26] proposed a Non-Recurrent Congestion (NRC) events detection methodology to support the accurate detection of NRC events on large urban road networks. ...
Article
Accurate detection of locations of indoor high-density crowds is crucial for early warning and emergency rescue during indoor safety accidents. The spatial structure of indoor environments is more complicated than outdoor environments. The locations of indoor high-density crowds are more likely to be the sites of security accidents. Existing detection methods for high-density crowd locations mostly focus on outdoor environments, and relatively few detection methods exist for indoor environments. This study proposes a novel detection framework for high-density indoor crowd locations termed IndoorSRC (Simplification-Reconstruction-Cluster). In this paper, a novel indoor spatiotemporal clustering algorithm called Indoor-STAGNES is proposed to detect the indoor trajectory stay points to simplify indoor movement trajectory. Then, we propose use of a Kalman filter algorithm to reconstruct the indoor trajectory and properly align and resample the data. Finally, an indoor spatiotemporal density clustering algorithm called Indoor-STOPTICS is proposed to detect the locations of high-density crowds in the indoor environment from the reconstructed trajectory. Extensive experiments were conducted using indoor Wi-Fi positioning datasets collected from a shopping mall. The results show that the IndoorSRC framework evidently outperforms the existing baseline method in terms of detection performance.
... Alternatively, some approaches combine the use of Bluetooth and adaptive signal control data, see, Yuan et al. (2018). Finally, related works using unsupervised approaches to detect non-recurrent traffic congestions (incidents) on urban road networks have been also presented, see, e.g., Li et al. (2007); Anbaroglu et al. (2014). The idea behind these methods is based on clustering congested links. ...
Article
A novel automatic incident detection (AID) method for freeways, based on the use of data provided by Bluetooth sensors and an unsupervised anomaly detection approach, is presented. The two main advantages of the proposed AID system are: (i) the use of Bluetooth sensors offers several practical advantages over inductive loop detectors (ILD), which are one of the preferred sensing technologies for traffic flow; and (ii) the unsupervised anomaly detection approach builds a model without the need of incident information. A common problem when designing an AID system is that incident information, i.e., ground-truth data, with enough accuracy is seldom available. Isolation forest is the unsupervised anomaly detection approach adopted in this work. This method is based on characterizing anomalous traffic conditions by exploiting the fact that anomalies tend to be isolated. The most remarkable feature of this anomaly detection method is its high detection performance while having a very simple tuning procedure and an extremely low computational demand. Finally, the effectiveness of the presented AID method is demonstrated using real traffic data collected by a network of Bluetooth sensors installed in Ayalon Highway, Tel Aviv.
... Graph-based clustering methods use the proximity graph to define a similarity measure between two points and partition a set of points into k groups based on graph-cut algorithms (e.g., min-cut) [13,23,35]. Network connectivity constraints were incorporated into the clustering methods to identify dense graph-paths or sub-networks [24,29,36]. Clustering methods that have to obey size constraints have been investigated [6,25]. ...
Article
Given a geometric space and a set of weighted spatial points, the Size Constrained k Simple Polygons (SCkSP) problem identifies k simple polygons that maximize the total weights of the spatial points covered by the polygons and meet the polygon size constraint. The SCkSP problem is important for many societal applications including hotspot area detection and resource allocation. The problem is NP-hard; it is computationally challenging because of the large number of spatial points and the polygon size constraint. Our preliminary work introduced the Nearest Neighbor Triangulation and Merging (NNTM) algorithm for SCkSP to meet the size constraint while maximizing the total weights of the spatial points. However, we find that the performance of the NNTM algorithm is dependent on the t-nearest graph. In this paper, we extend our previous work and propose a novel approach that outperforms our prior work. Experiments using Chicago crime and U.S. Federal wildfire datasets demonstrate that the proposed algorithm significantly reduces the computational cost of our prior work and produces a better solution.
... In the face of huge amounts of data, what kind of evaluation function should be used as the condition of congestion is the first issue to be considered. Researchers have proposed a large number of methods, such as BP neural network (Yu et al. 2016), deep learning (Ma et al. 2015), Markov logic networks (Lippi et al. 2010), hybrid genetic algorithm based online support vector machine (Su and Yu 2007), mixture of Gaussian trees (Šingliar and Hauskrecht 2007), density-based algorithm (Li et al. 2007), etc. to detect traffic conditions. ...
Article
Road hotspot detection is a key issue in the field of intelligent transportation research. Compared with normal hotspots caused by high traffic flow, abnormal hotspots, which result from road accidents, occur at random times and are difficult to predict. Drawing on the pulse diagnosis method, this paper constructs a regional real-time congestion factor to discover abnormal road hotspots. Taxi GPS data from Hangzhou City, China, are used to find abnormal pulses of road segments, and the relationship between the proposed congestion factor and real-time traffic data is discussed. Two accident scenarios are built to verify the validity of the proposed method. The experimental results show that the proposed method performs well in real-time abnormal hotspot detection and that its analysis output could be useful in path planning and traffic management.
... The main methods to get hot routes are moving object clustering, trajectory clustering and simple graph linkage [2]. The purpose of moving object clustering is to analyze the aggregation pattern of moving objects and find moving objects that move together for a certain time. ...
Conference Paper
Hot routes are routes that a large number of vehicles pass through in a given period of time. Mining hot routes of private vehicles can help us understand the travel behavior of private vehicles, which is of great help to urban traffic management and construction. This study aims to mine hot routes of private vehicles using Electronic Registration Identification (ERI) data, which constitutes a huge amount of traffic data. In this paper, we propose a mining algorithm, Prefix-projected Sequence Pattern Mining based on Successor Set (PSSS), which is based on the idea of the PrefixSpan algorithm, to mine hot routes. First, we extract private vehicle trips from ERI data. Then we transform the trips into string sequences and use the PSSS algorithm to mine hot routes of private vehicles. Finally, we analyze the hot routes of private vehicles and compare the efficiency of the two algorithms. The experimental results are of guiding significance to traffic management and the construction of intelligent transportation.
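The trip-to-sequence idea in this abstract can be illustrated with a much simpler stand-in. The sketch below is not PSSS and not PrefixSpan: it only counts contiguous sub-routes and keeps those that occur in at least a minimum number of trips. The function name `frequent_subroutes`, the reader identifiers (`r0` to `r4`), and the toy trips are all hypothetical.

```python
from collections import Counter


def frequent_subroutes(trips, min_support, min_len=2):
    """Return contiguous sub-routes seen in at least min_support trips.

    trips is a list of trips, each a list of (hypothetical) reader IDs
    in the order the vehicle passed them.
    """
    support = Counter()
    for trip in trips:
        seen = set()
        for start in range(len(trip)):
            for end in range(start + min_len, len(trip) + 1):
                seen.add(tuple(trip[start:end]))
        # A trip contributes at most once to the support of each sub-route.
        support.update(seen)
    return {route: n for route, n in support.items() if n >= min_support}


# Toy data (assumed): three trips over five readers.
trips = [
    ["r1", "r2", "r3", "r4"],
    ["r0", "r2", "r3", "r4"],
    ["r1", "r2", "r3"],
]
hot = frequent_subroutes(trips, min_support=2)
```

In this toy example the segment pair `("r2", "r3")` is the only sub-route shared by all three trips. Enumerating every contiguous sub-route is quadratic in trip length, which is one reason pattern-growth methods such as PrefixSpan are preferred on large datasets.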
... Density of traffic on road segments can be used to discover 'hot routes' (Li et al., 2007). In San Francisco the traffic intensity on each road segment was used to calculate an estimated kernel density, which is a land use modeling method that produces a smooth surface over the city area (SCI, 2014). ...
Article
Scooters, or gasoline powered two-wheelers, are becoming increasingly popular in the Netherlands. They provide fast, independent and affordable transportation, especially in urban congested areas. Unfortunately, they also have considerable adverse impacts on the environment and human health. The three most prominent impacts are associated with air pollution, noise pollution and traffic accidents. While the total contribution of emissions by scooters is relatively small compared to total traffic related emissions, they have a disproportionally large impact on their direct environment, especially when sharing roads with bicycles as in the Netherlands, where they are characterized as super-polluters. A scoping GIS based assessment, using theoretical and available secondary data, could identify routes with highest likelihood of scooter presence to estimate exhaust and noise impacts and related traffic accidents. Estimates are provided for the total population and the number of childcare facilities within the impact areas. For future projections four different scenarios are analyzed. For the case study of the town of Enschede in the Netherlands the present noise/exhaust environmental impact of scooters is affecting at least 30% of the population and in the future this number can increase to 38%–53%.
... Based on the massive bike trips, [3] provides bike lane planning recommendations to the government. And the frequent route mining [6,17,20,23] is beneficial to city planners and provides a guidance for congestion analysis. In addition, [18] finds top-k influential locations that cover as much trajectories. ...
Conference Paper
Illegal vehicle parking is a common urban problem faced by major cities in the world, as it incurs traffic jams, which lead to air pollution and traffic accidents. Traditional approaches to detect illegal vehicle parking events rely highly on active human efforts, e.g., police patrols or surveillance cameras. However, these approaches are extremely ineffective to cover a large city. The massive and high quality sharing bike trajectories from Mobike offer us a unique opportunity to design a ubiquitous illegal parking detection system, as most of the illegal parking events happen at curbsides and have significant impact on the bike users. Two main components are employed to mine the trajectories in our system: 1) trajectory pre-processing, which filters outlier GPS points, performs map-matching and builds indexes for bike trajectories; and 2) illegal parking detection, which models the normal trajectories, extracts features from the evaluation trajectories and utilizes a distribution test-based method to discover the illegal parking events. The system is deployed on the cloud and used internally by Mobike. Finally, extensive experiments and many insightful case studies based on the massive trajectories in Beijing are presented.
... Trajectories collected by transit companies are periodically shared with internal and external organizations for the purpose of trajectory mining [44] [53] and traffic management [52][6] [33]. Publishing or sharing raw trajectories raises privacy concerns because the data is susceptible to attacks that rely on attacker's background knowledge about some target victims whose trajectories are included in the published data. ...
Article
In recent years, the collection of spatio-temporal data that captures human movements has increased tremendously due to the advancements in hardware and software systems capable of collecting person-specific data. The bulk of the data collected by these systems has numerous applications, or it can simply be used for general data analysis. Therefore, publishing such big data is greatly beneficial for data recipients. However, in its raw form, the collected data contains sensitive information pertaining to the individuals from which it was collected and must be anonymized before publication. In this paper, we study the problem of privacy-preserving passenger trajectories publishing and propose a solution under the rigorous differential privacy model. Unlike sequential data, which describes sequentiality between data items, handling spatio-temporal data is a challenging task due to the fact that introducing a temporal dimension results in extreme sparseness. Our proposed solution introduces an efficient algorithm, called SafePath, that models trajectories as a noisy prefix tree and publishes ϵ-differentially-private trajectories while minimizing the impact on data utility. Experimental evaluation on real-life transit data in Montreal suggests that SafePath significantly improves efficiency and scalability with respect to large and sparse datasets, while achieving comparable results to existing solutions in terms of the utility of the sanitized data.
Chapter
In this chapter, we will introduce the development of trajectory clustering analysis. First, we review some related works on clustering of trajectory data, especially including the subspace clustering-based methods. Second, we depict a general framework, termed atomic-representation-based subspace clustering (ARSC), for the clustering of trajectory data. ARSC is a subspace clustering framework that first computes the atomic representations of data points and then clusters them using the representations. By using ARSC as a general platform, we introduce a robust subspace clustering method, referred to as minimum error entropy-based sparse subspace clustering (MEESSC), against outliers and heavy data noises. MEESSC computes the representation of each data point by minimizing the ℓ1 norm regularized minimum error entropy-based loss function. Experimental results are shown to validate the efficacy and robustness of MEESSC for the clustering of trajectory data.
Article
Traffic congestion increases travel time and is a major source of pollution and health damage in developing-country cities. Data scarcity frequently confines traffic improvement projects to sites where congestion can be easily measured. This article uses spatiotemporal data from new global sources to revisit the siting problem in Dhaka, Bangladesh, where local congestion measures are augmented by estimates of citywide travel time, pollution exposure, and pollution vulnerability. We combine Google Traffic data with an econometric model linking traffic, pollution readings from a local monitoring station, and weather data to estimate the spatial distribution of vehicular pollution. We explore pollution-vulnerability implications by incorporating spatial distributions of poor households, children, and the elderly. Using the Open Source Routing Machine and OpenStreetMaps, we estimate systemwide travel-time gains from reducing congestion at each point in a grid covering the Dhaka metro area. We find a large divergence of siting priorities in single-dimensional exercises that focus exclusively on local congestion, citywide travel time, vehicular pollution, or vulnerable-resident pollution exposure. By implication, optimal siting requires a social objective function with explicit weights assigned to each of the four dimensions. The new global information sources permit extending this multidimensional approach to many cities throughout the developing world.
Chapter
Ship trajectory anomaly detection, route planning, location prediction, collision detection and other issues have become the main research directions in the field of ocean navigation. Ship trajectory clustering is the key to addressing these problems. By mining the motion patterns of ship trajectories, similar trajectories are grouped into the same category. Traditional trajectory clustering methods usually need to select a spatio-temporal trajectory similarity measure based on the data volume, computational complexity, noise and other influencing factors. Selecting the optimal similarity measure requires prior knowledge and extensive experimentation, which makes the process computationally intensive and time-consuming. In this paper, we propose a ship trajectory motion pattern extraction algorithm based on a one-dimensional convolutional auto-encoder that does not rely on spatio-temporal trajectory similarity measures. By extracting the low-dimensional representation of the ship’s trajectory, our approach can keep the sequence of trajectory points and reduce the distance calculation bias. The experimental results show that our proposed algorithm has good clustering performance while preserving the main motion characteristics of ship trajectory.
Article
The discovery of moving object trajectory patterns representing high traffic density has been covered in various works using diverse approaches. These models are useful in areas such as transportation planning, traffic monitoring, and advertising on public roads. However, though studies tend to recognize the importance of these types of patterns in utility, they usually do not consider traffic congestion as a particular condition of high traffic. In this work, we present a model for the discovery of high traffic flow patterns in relation to traffic congestion. This relationship is represented in terms of traffic that is shared between different sectors of the pattern, making it possible to identify traffic flow situations causing congestion. We also complement this model by discovering alternative paths for the severe traffic depicted in these patterns. These alternative paths depend on traffic level and location inside the road network. Depending on the traffic conditions, alternative paths are commonly sought by drivers when they are approaching a traffic jam, in order to mitigate the effects of traffic congestion. We compare these models with related work from similar areas and validate them by conducting experiments using real data. We describe discovered patterns related to the main elements of the road network in the dataset and show their advantages in comparison to related models. Based on the displayed metrics, the algorithms’ implementation offers good performance execution for the given dataset volume. The results presented confirm the usefulness of the proposed patterns as a tool that helps to improve traffic, allowing the identification of problems and possible alternatives.
Article
This article comprehensively surveys the development of trajectory data classification. Considering the critical role of trajectory data classification in modern intelligent systems for surveillance security, abnormal behavior detection, crowd behavior analysis, and traffic control, trajectory data classification has attracted growing attention. According to the availability of manual labels, which is critical to the classification performances, the methods can be classified into three categories, i.e., unsupervised, semi-supervised, and supervised. Furthermore, classification methods are divided into some sub-categories according to what extracted features are used. We provide a holistic understanding and deep insight into three types of trajectory data classification methods and present some promising future directions.
Chapter
Road traffic condition information includes important traffic control information, such as U-turn or left-turn restrictions, which affects drivers’ travel. We propose a novel way to identify road intersections and traffic control information automatically and in a timely manner by analyzing floating car trajectory data. First, a difference-based algorithm is proposed to filter outlier trajectory data. Then a map matching method based on a three-level grid is applied. Finally, an automatic algorithm is developed to recognize the road traffic control information in a timely manner. The entire floating car trajectory data of Fuzhou, about 1.5 million records, are used to verify the proposed method. The experimental results indicate that the method has high efficiency and accuracy. We construct this system based on trajectory data from 6000 taxis a day. The results of the operation show that the correct rate is as high as 87.7%, which indicates that the approach is valuable.
Article
Effective public transit planning needs to address realistic travel demands, which can be illustrated by corridors across major residential areas and activity centers. It is vital to identify public transit corridors that contain the most significant transit travel demand patterns. We propose a two-stage approach to discover primary public transit corridors at high spatio-temporal resolutions using massive real-world smart card and bus trajectory data, which manifest rich transit demand patterns over space and time. The first stage was to reconstruct chained trips for individual passengers using multi-source massive public transit data. In the second stage, a shared-flow clustering algorithm was developed to identify public transit corridors based on reconstructed individual transit trips. The proposed approach was evaluated using transit data collected in Shenzhen, China. Experimental results demonstrated that the proposed approach is a practical tool for extracting time-varying corridors for many potential applications, such as transit planning and management. © 2018
Article
Studying the movements of crowds is important for understanding and predicting the behavior of large groups of people. When analyzing such crowds, one is often interested in the long-term macro-level motions of the crowd, as opposed to the micro-level individual movements at each moment in time. A high-level representation of these motions is thus desirable. In this work, we present a scalable method for detection of crowd motion patterns, i.e., spatial areas describing the dominant motions within the crowd. For measuring crowd movements, we propose a fast, scalable, and low-cost method based on proximity graphs. For analyzing crowd movements, we utilize a three-stage pipeline: (1) represent the behavior of each person at each moment in time using a low-dimensional data point, (2) cluster these data points based on spatial relations, and (3) concatenate these clusters based on temporal relations. Experiments on synthetic datasets reveal that our method can handle various scenarios including curved lanes and diverging flows. Evaluation on real-world datasets shows our method is able to extract useful motion patterns from such scenarios which could not be properly detected by existing methods. Overall, we see our work as an initial step towards rich pattern recognition.
Chapter
With the widespread use of devices with geo-positioning technologies, an unprecedented volume of trajectory data is becoming available. In this paper, we propose and study the problem of multi-range query processing over trajectories, which finds the trajectories that pass through a set of given spatio-temporal ranges. Such queries can facilitate urban planning applications by finding traffic movement flows between different parts of a city at different time intervals. To the best of our knowledge, this is the first work on answering multi-range queries on trajectories. In particular, we first propose a novel two-level index structure that preserves both the co-location of trajectories, and the co-location of points within trajectories. Next we present an efficient query processing algorithm that employs several pruning techniques at different levels of the index. The results of our extensive experimental studies on two real datasets demonstrate that our approach outperforms the baseline by 1 to 2 orders of magnitude.
Article
At present, a large amount of traffic-related data is obtained manually and through sensors and social media, e.g., traffic statistics, accident statistics, road information, and users' comments. In this paper, we propose a novel framework for mining traffic risk from such heterogeneous data. Traffic risk refers to the possibility of occurrence of traffic accidents. Specifically, we focus on two issues: 1) predicting the number of accidents on any road or at any intersection and 2) clustering roads to identify risk factors for risky road clusters. We present a unified approach for addressing these issues by means of feature-based non-negative matrix factorization (FNMF). In particular, we develop a new multiplicative update algorithm for the FNMF to handle big traffic data. Using real traffic data in Tokyo, we demonstrate that the proposed algorithm can be used to predict traffic risk at any location more accurately and efficiently than existing methods, and that a number of clusters of risky roads can be identified and characterized by two risk factors. In summary, our work can be regarded as the first step to a new research area of traffic risk mining.
Article
Incident hotspots are used as a direct indicator of the needs for road maintenance and infrastructure upgrade, and an important reference for investment location decisions. Previous incident hotspot identification methods are all region based, ignoring the underlying road network constraints. We first demonstrate how region based hotspot detection may be inaccurate. We then present Dijkstra’s-DBSCAN, a new network based density clustering algorithm specifically for traffic incidents which combines a modified Dijkstra’s shortest path algorithm with DBSCAN (density based spatial clustering of applications with noise). The modified Dijkstra’s algorithm, instead of returning the shortest path from a source to a target as the original algorithm does, returns a set of nodes (incidents) that are within a requested distance when traveling from the source. By retrieving the directly reachable neighbors using this modified Dijkstra’s algorithm, DBSCAN gains its awareness of network connections and measures distance more practically. It avoids clustering incidents that are close but not connected. The new approach extracts hazardous lanes instead of regions, and so is a much more precise approach for incident management purposes; it reduces the O(n^2) computational cost to O(n), and can process the entire U.S. network in seconds; it has routing flexibility and can extract clusters of any shape and connections; it is parallelizable and can utilize distributed computing resources. Our experiments verified the new methodology’s capability of supporting safety management on a complicated surface street configuration. It also works for customized lane configuration, such as freeways, freeway junctions, interchanges, roundabouts, and other complex combinations.
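The range-limited variant of Dijkstra's algorithm described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `nodes_within`, the adjacency-list layout, and the toy network are assumptions made for illustration, and incidents are assumed to be already snapped to network nodes.

```python
import heapq

def nodes_within(graph, source, max_dist):
    """Return {node: network distance} for every node reachable from
    `source` within `max_dist`.

    `graph` maps node -> list of (neighbor, edge_length) pairs
    (hypothetical layout, chosen for this sketch).
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            # Expand only while the travelled distance stays in range.
            if nd <= max_dist and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Toy network: "c" is 4.0 away from "a" directly, but 2.5 away via "b".
graph = {
    "a": [("b", 1.0), ("c", 4.0)],
    "b": [("a", 1.0), ("c", 1.5)],
    "c": [("a", 4.0), ("b", 1.5), ("d", 3.0)],
    "d": [("c", 3.0)],
}
neighbors = nodes_within(graph, "a", 3.0)
```

A DBSCAN-style procedure could call such a function in place of a Euclidean radius query, so that two incidents count as neighbors only when they are close along the network.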
Chapter
Within the mobility mining discipline, several solutions for the classification of spatio-temporal trajectories have been proposed. However, they usually do not fully consider the particularities of trajectories from human-generated data like online social networks. For that reason, this work introduces a novel classifier based on Support Vector Machines (SVM), which fits the low resolution of this type of geographic data. This solution is applied in a use case for the detection of tourist mobility exhibiting quite promising results.
Article
An important consideration in similarity-based retrieval of moving object trajectories is the definition of a distance function. The existing distance functions are usually sensitive to noise, shifts and scaling of data that commonly occur due to sensor failures, errors in detection techniques, disturbance signals, and different sampling rates. Cleaning data to eliminate these is not always possible. In this paper, we introduce a novel distance function, Edit Distance on Real sequence (EDR) which is robust against these data imperfections. Analysis and comparison of EDR with other popular distance functions, such as Euclidean distance, Dynamic Time Warping (DTW), Edit distance with Real Penalty (ERP), and Longest Common Subsequences (LCSS), indicate that EDR is more robust than Euclidean distance, DTW and ERP, and it is on average 50% more accurate than LCSS. We also develop three pruning techniques to improve the retrieval efficiency of EDR and show that these techniques can be combined effectively in a search, increasing the pruning power significantly. The experimental results confirm the superior efficiency of the combined methods.
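As an illustration of the kind of distance this abstract describes, the following Python sketch computes an edit distance over real-valued sequences by dynamic programming. It assumes the commonly cited formulation in which two points match when every coordinate differs by at most a threshold `eps`, a match costs 0, and a substitution, insertion, or deletion costs 1. The function name and the sample trajectories are made up for this example, and the normalization and pruning techniques mentioned in the abstract are not covered.

```python
def edr(r, s, eps):
    """Edit distance between point sequences r and s.

    Two points match when every coordinate differs by at most eps.
    Uses two rolling rows of the dynamic-programming table.
    """
    n, m = len(r), len(s)
    prev = list(range(m + 1))  # distance from an empty prefix of r
    for i in range(1, n + 1):
        cur = [i] + [0] * m
        for j in range(1, m + 1):
            match = all(abs(a - b) <= eps for a, b in zip(r[i - 1], s[j - 1]))
            sub = 0 if match else 1
            cur[j] = min(prev[j - 1] + sub,  # match or substitute
                         prev[j] + 1,        # skip a point of r
                         cur[j - 1] + 1)     # skip a point of s
        prev = cur
    return prev[m]

# The second trajectory contains one noisy outlier at (50, 50).
clean = [(0, 0), (1, 1), (2, 2)]
noisy = [(0, 0.1), (1, 1.05), (50, 50), (2, 2.1)]
distance = edr(clean, noisy, eps=0.25)
```

Because the outlier contributes a fixed cost of 1 regardless of how far away it lies, a single bad sample cannot dominate the distance, which is the robustness property the abstract emphasizes.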
Conference Paper
A moving cluster is defined by a set of objects that move close to each other for a long time interval. Real-life examples are a group of migrating animals, a convoy of cars moving in a city, etc. We study the discovery of moving clusters in a database of object trajectories. The difference of this problem compared to clustering trajectories and mining movement patterns is that the identity of a moving cluster remains unchanged while its location and content may change over time. For example, while a group of animals are migrating, some animals may leave the group or new animals may enter it. We provide a formal definition for moving clusters and describe three algorithms for their automatic discovery: (i) a straight-forward method based on the definition, (ii) a more efficient method which avoids redundant checks and (iii) an approximate algorithm which trades accuracy for speed by borrowing ideas from the MPEG-2 video encoding. The experimental results demonstrate the efficiency of our techniques and their applicability to large spatio-temporal datasets.
Conference Paper
This paper introduces a Viterbi algorithm to obtain a sub-optimal state sequence for trajectory-HMM, which is derived from HMM with explicit relationship between static and dynamic features. The trajectory-HMM can alleviate some limitations of HMM, which are (i) constant statistics within HMM state and (ii) conditional independence of observations given the state sequence, without increasing the number of model parameters. The proposed algorithm was applied to state-boundary optimization for Viterbi training and N-best rescoring. In a speaker-dependent continuous speech recognition experiment, trajectory-HMM with the proposed algorithm achieved about 14% error reduction over the standard HMM with the conventional Viterbi algorithm.
Conference Paper
This paper introduces a hierarchical Markov model that can learn and infer a user's daily movements through the community. The model uses multiple levels of abstraction in order to bridge the gap between raw GPS sensor measurements and high level information such as a user's mode of transportation or her goal. We apply Rao-Blackwellised particle filters for efficient inference both at the low level and at the higher levels of the hierarchy. Significant locations such as goals or locations where the user frequently changes mode of transportation are learned from GPS data logs without requiring any manual labeling. We show how to detect abnormal behaviors (e.g. taking a wrong bus) by concurrently tracking his activities with a trained and a prior model. Experiments show that our model is able to accurately predict the goals of a person and to recognize situations in which the user performs unknown activities.
Conference Paper
With the proliferation of mobile computing, the ability to index efficiently the movements of mobile objects becomes important. Objects are typically seen as moving in two-dimensional (x,y) space, which means that their movements across time may be embedded in the three-dimensional (x,y,t) space. Further, the movements are typically represented as trajectories, sequences of connected line segments. In certain cases, movement is restricted, and specifically in this paper, we aim at exploiting that movements occur in transportation networks to reduce the dimensionality of the data. Briefly, the idea is to reduce movements to occur in one spatial dimension. As a consequence, the movement data becomes two-dimensional (x,t). The advantages of considering such lower-dimensional trajectories are the reduced overall size of the data and the lower-dimensional indexing challenge. Since off-the-shelf systems typically do not offer higher-dimensional indexing, this reduction in dimensionality allows us to use such DBMSes to store and index trajectories. Moreover, we argue that, given the right circumstances, indexing these dimensionality-reduced trajectories can be more efficient than using a three-dimensional index. This hypothesis is verified by an experimental study that incorporates trajectories stemming from real and synthetic road networks.
Conference Paper
We investigate techniques for analysis and retrieval of object trajectories in two or three dimensional space. Such data usually contain a large amount of noise, which has made previously used metrics fail. Therefore, we formalize non-metric similarity functions based on the longest common subsequence (LCSS), which are very robust to noise and furthermore provide an intuitive notion of similarity between trajectories by giving more weight to similar portions of the sequences. Stretching of sequences in time is allowed, as well as global translation of the sequences in space. Efficient approximate algorithms that compute these similarity measures are also provided. We compare these new methods to the widely used Euclidean and time warping distance functions (for real and synthetic data) and show the superiority of our approach, especially in the strong presence of noise. We prove a weaker version of the triangle inequality and employ it in an indexing structure to answer nearest neighbor queries. Finally, we present experimental results that validate the accuracy and efficiency of our approach.
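A minimal Python sketch of an LCSS-style similarity is given below. It assumes the usual formulation with a spatial matching threshold (`eps`) and a bound on how far apart in index two matched points may be (`delta`), and it normalizes by the length of the shorter sequence. The function names and the sample data are assumptions for illustration; the translation-invariant variant and the approximate algorithms mentioned in the abstract are not shown.

```python
def lcss(a, b, eps, delta):
    """Length of the longest common subsequence of point sequences a and b.

    Points match when every coordinate differs by less than eps and
    their indices differ by at most delta (a time-stretching bound).
    """
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            close = all(abs(p - q) < eps for p, q in zip(a[i - 1], b[j - 1]))
            if close and abs(i - j) <= delta:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]

def lcss_similarity(a, b, eps, delta):
    """LCSS length normalized by the shorter sequence length."""
    if not a or not b:
        return 0.0
    return lcss(a, b, eps, delta) / min(len(a), len(b))

# The second point of `b` is an outlier and is simply left unmatched.
a = [(0, 0), (1, 1), (2, 2), (3, 3)]
b = [(0, 0.1), (9, 9), (2, 2.1), (3, 2.9)]
similarity = lcss_similarity(a, b, eps=0.5, delta=1)
```

Unmatched points are skipped rather than penalized by their magnitude, so the measure rewards the similar portions of the two sequences, as the abstract describes.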
Article
This paper presents a unifying probabilistic framework for clustering individuals or systems into groups when the available data measurements are not multivariate vectors of fixed dimensionality. For example, one might have data from a set of medical patients, where for each patient one has a set of observed time-series, each time-series of potentially different length and different sampling rate. We propose a general model-based probabilistic framework for clustering data types of this form which are non-vector in nature and may vary in size from individual to individual. The Expectation-Maximization (EM) procedure for clustering within this framework is discussed and we discuss how it can be applied in a general manner to clustering of sequences, time-series, trajectories, and other non-vector data. We show that a number of earlier algorithms can be viewed as special cases within this unifying framework. The paper concludes with several illustrations of the method, including clustering o...
Article
In this paper we address the problem of clustering trajectories, namely sets of short sequences of data measured as a function of a dependent variable such as time. Examples include storm path trajectories, longitudinal data such as drug therapy response, functional expression data in computational biology, and movements of objects or individuals in video sequences. Our clustering algorithm is based on a principled method for probabilistic modelling of a set of trajectories as individual sequences of points generated from a finite mixture model consisting of regression model components. Unsupervised learning is carried out using maximum likelihood principles. Specifically, the EM algorithm is used to cope with the hidden data problem (i.e., the cluster memberships). We also develop generalizations of the method to handle non-parametric (kernel) regression components as well as multi-dimensional outputs. Simulation results comparing our method with other clustering methods such as K-means and Gaussian mixtures are presented as well as experimental results on real data sets.
Article
Benchmarking spatiotemporal database systems requires the definition of suitable datasets simulating the typical behavior of moving objects. Previous approaches for generating spatiotemporal data do not consider that moving objects often follow a given network. Therefore, benchmarks require datasets consisting of such “network-based” moving objects. In this paper, the most important properties of network-based moving objects are presented and discussed. Essential aspects are the maximum speed and the maximum capacity of connections, the influence of other moving objects on the speed and the route of an object, the adequate determination of the start and destination of an object, the influence of external events, and time-scheduled traffic. These characteristics are the basis for the specification and development of a new generator for spatiotemporal data. This generator combines real data (the network) with user-defined properties of the resulting dataset. A framework is proposed where the user can control the behavior of the generator by re-defining the functionality of selected object classes. An experimental performance investigation demonstrates that the chosen approach is suitable for generating large data sets.
Article
This paper introduces a hierarchical Markov model that can learn and infer a user's daily movements through an urban community. The model uses multiple levels of abstraction in order to bridge the gap between raw GPS sensor measurements and high level information such as a user's destination and mode of transportation. To achieve efficient inference, we apply Rao–Blackwellized particle filters at multiple levels of the model hierarchy. Locations such as bus stops and parking lots, where the user frequently changes mode of transportation, are learned from GPS data logs without manual labeling of training data. We experimentally demonstrate how to accurately detect novel behavior or user errors (e.g. taking a wrong bus) by explicitly modeling activities in the context of the user's historical data. Finally, we discuss an application called “Opportunity Knocks” that employs our techniques to help cognitively-impaired people use public transportation safely.
Conference Paper
Spatial collocation patterns associate the co-existence of non-spatial features in a spatial neighborhood. An example of such a pattern can associate contaminated water reservoirs with certain diseases in their spatial neighborhood. Previous work on discovering collocation patterns converts neighborhoods of feature instances to itemsets and applies mining techniques for transactional data to discover the patterns. We propose a method that combines the discovery of spatial neighborhoods with the mining process. Our technique is an extension of a spatial join algorithm that operates on multiple inputs and counts long pattern instances. As demonstrated by experimentation, it yields significant performance improvements compared to previous approaches.
Conference Paper
Spatial co-location patterns represent the subsets of events whose instances are frequently located together in geographic space. We identified the computational bottleneck in the execution time of a current co-location mining algorithm. A large fraction of the join-based co-location miner algorithm is devoted to computing joins to identify instances of candidate co-location patterns. We propose a novel partial-join approach for mining co-location patterns efficiently. It transactionizes continuous spatial data while keeping track of the spatial information not modeled by transactions. It uses a transaction-based Apriori algorithm as a building block and adopts the instance join method for residual instances not identified in transactions. We show that the algorithm is correct and complete in finding all co-location rules which have prevalence and conditional probability above the given thresholds. An experimental evaluation using synthetic datasets and a real dataset shows that our algorithm is computationally more efficient than the join-based algorithm.
Conference Paper
The development of a spatiotemporal access method suitable for objects moving on fixed networks is a very attractive challenge due to the great number of real-world spatiotemporal database applications and fleet management systems dealing with this type of objects. In this work, a new indexing technique, named Fixed Network R-Tree (FNR-Tree), is proposed for objects constrained to move on fixed networks in 2-dimensional space. The general idea that describes the FNR-Tree is a forest of 1-dimensional (1D) R-Trees on top of a 2-dimensional (2D) R-Tree. The 2D R-Tree is used to index the spatial data of the network (e.g. roads consisting of line segments), while the 1D R-Trees are used to index the time interval of each object's movement inside a given link of the network. The performance study, comparing this novel access method with the traditional R-Tree under various datasets and queries, shows that the FNR-Tree outperforms the R-Tree in most cases.
Conference Paper
The problem of mining spatiotemporal patterns is finding sequences of events that occur frequently in spatiotemporal datasets. Spatiotemporal datasets store the evolution of objects over time. Examples include sequences of sensor images of a geographical region, data that describes the location and movement of individual objects over time, or data that describes the evolution of natural phenomena, such as forest coverage. The discovered patterns are sequences of events that occur most frequently. In this paper, we present DFS_MINE, a new algorithm for fast mining of frequent spatiotemporal patterns in environmental data. DFS_MINE, as its name suggests, uses a Depth-First-Search-like approach to the problem which allows very fast discoveries of long sequential patterns. DFS_MINE performs database scans to discover frequent sequences rather than relying on information stored in main memory, which has the advantage that the amount of space required is minimal. Previous approaches utilize a Breadth-First-Search-like approach and are not efficient for discovering long frequent sequences. Moreover, they require storing in main memory all occurrences of each sequence in the database and, as a result, the amount of space needed is rather large. Experiments show that the I/O cost of the database scans is offset by the efficiency of the DFS-like approach that ensures fast discovery of long frequent patterns. DFS_MINE is also ideal for mining frequent spatiotemporal sequences with various spatial granularities. Spatial granularity refers to how fine or how general our view of the space we are examining is.
Conference Paper
Given a collection of boolean spatial features, the co-location pattern discovery process finds the subsets of features frequently located together. For example, the analysis of an ecology dataset may reveal the frequent co-location of a fire ignition source feature with a needle vegetation type feature and a drought feature. The spatial co-location rule problem is different from the association rule problem. Even though boolean spatial feature types (also called spatial events) may correspond...
Book
The current trends in consumer electronics--including the use of GPS-equipped PDAs, phones, and vehicles, as well as the RFID-tag tracking and sensor networks--require the database support of a specific flavor of spatio-temporal databases. These we call Moving Objects Databases. Why do you need this book? With current systems, most data management professionals are not able to smoothly integrate spatio-temporal data from moving objects, making data from, say, the path of a hurricane very difficult to model, design, and query. Whether your field is geology, national security, urban planning, mobile computing, or almost anything in between, this book's concepts and techniques will help you solve the data management problems associated with this kind of data. + Focuses on the modeling and design of data from moving objects--such as people, animals, vehicles, hurricanes, forest fires, oil spills, armies, or other objects--as well as the storage, retrieval, and querying of that very voluminous data. + Demonstrates through many practical examples and illustrations how new concepts and techniques are used to integrate time and space in database applications. + Provides exercises and solutions in each chapter to enable the reader to explore recent research results in practice.
Article
Existing trajectory clustering algorithms group similar trajectories as a whole, thus discovering common trajectories. Our key observation is that clustering trajectories as a whole could miss common sub-trajectories. Discovering common sub-trajectories is very useful in many applications, especially if we have regions of special interest for analysis. In this paper, we propose a new partition-and-group framework for clustering trajectories, which partitions a trajectory into a set of line segments, and then, groups similar line segments together into a cluster. The primary advantage of this framework is to discover common sub-trajectories from a trajectory database. Based on this partition-and-group framework, we develop a trajectory clustering algorithm TRACLUS. Our algorithm consists of two phases: partitioning and grouping. For the first phase, we present a formal trajectory partitioning algorithm using the minimum description length (MDL) principle. For the second phase, we present a density-based line-segment clustering algorithm. Experimental results demonstrate that TRACLUS correctly discovers common sub-trajectories from real trajectory data.
Conference Paper
The modeling of user behavior patterns for personalized information services in mobile environments has recently become a popular research theme. Most of the research aims at predicting the user's future behavior (and/or location) by extracting frequent patterns from the history of location data sequences. However, user behavior sometimes changes according to external information such as date, time, and weather, and we cannot accurately predict it based on the location data sequences alone. In this paper, we propose a new travel destination prediction method that includes day and time as external information. First, the user's travel history information, including the location, date, and time, is stored. Then, from the external information, time/day categories that correlate with the user's destination are determined based on entropy. Finally, using the categories, a destination that depends on the external information can be successfully predicted. An application of the method to data collected from a car navigation system showed the possibility of improved performance compared with conventional methods. Higher destination prediction accuracy during the first several minutes after the user's departure was reported.
Article
Sequential pattern mining is an important data mining problem with broad applications. However, it is also a difficult problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Most of the previously developed sequential pattern mining methods, such as GSP, explore a candidate generation-and-test approach [R. Agrawal et al. (1994)] to reduce the number of candidates to be examined. However, this approach may not be efficient in mining large sequence databases having numerous patterns and/or long patterns. In this paper, we propose a projection-based, sequential pattern-growth approach for efficient mining of sequential patterns. In this approach, a sequence database is recursively projected into a set of smaller projected databases, and sequential patterns are grown in each projected database by exploring only locally frequent fragments. Based on an initial study of the pattern growth-based sequential pattern mining, FreeSpan [J. Han et al. (2000)], we propose a more efficient method, called PSP, which offers ordered growth and reduced projected databases. To further improve the performance, a pseudoprojection technique is developed in PrefixSpan. A comprehensive performance study shows that PrefixSpan, in most cases, outperforms the a priori-based algorithm GSP, FreeSpan, and SPADE [M. Zaki, (2001)] (a sequential pattern mining algorithm that adopts vertical data format), and PrefixSpan integrated with pseudoprojection is the fastest among all the tested algorithms. Furthermore, this mining methodology can be extended to mining sequential patterns with user-specified constraints. The high promise of the pattern-growth approach may lead to its further extension toward efficient mining of other kinds of frequent patterns, such as frequent substructures.
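The projection-based pattern growth summarized in this abstract can be sketched in a few lines of Python. The sketch is deliberately simplified and is not the published algorithm in full: each sequence element is a single item rather than an itemset, the function name `prefixspan` and the toy database are assumptions, and the projected databases are kept as (sequence index, start offset) pairs in the spirit of pseudoprojection instead of as copied suffixes.

```python
def prefixspan(sequences, min_support):
    """Return {pattern (tuple of items): support} for all sequential
    patterns contained in at least `min_support` sequences.

    Simplification: each sequence is a list of single items.
    """
    results = {}

    def mine(prefix, projected):
        # Count each item once per projected suffix (locally frequent items).
        counts = {}
        for sid, start in projected:
            for item in set(sequences[sid][start:]):
                counts[item] = counts.get(item, 0) + 1
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Project: keep only the suffix after the first occurrence of item.
            new_projected = []
            for sid, start in projected:
                seq = sequences[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((sid, pos + 1))
                        break
            mine(pattern, new_projected)

    mine((), [(sid, 0) for sid in range(len(sequences))])
    return results

# Toy sequence database (made up for this example).
db = [["a", "b", "c"], ["a", "c", "b"], ["b", "c"]]
patterns = prefixspan(db, min_support=2)
```

Each recursive call examines only the suffixes that follow the current prefix, so no candidate sequences are generated and tested against the full database.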
Article
Given a collection of boolean spatial features, the co-location pattern discovery process finds the subsets of features frequently located together. For example, the analysis of an ecology dataset may reveal the frequent co-location of a fire ignition source feature with a needle vegetation type feature and a drought feature. The spatial co-location rule problem is different from the association rule problem. Even though boolean spatial feature types (also called spatial events) may correspond to items in association rules over market-basket datasets, there is no natural notion of transactions. This creates difficulty in using traditional measures (e.g. support, confidence) and applying association rule mining algorithms which use support based pruning. We propose a notion of user-specified neighborhoods in place of transactions to specify groups of items. New interest measures for spatial co-location patterns are proposed which are robust in the face of potentially infinite overlapping neighborhoods. We also propose an algorithm to mine frequent spatial co-location patterns and analyze its correctness and completeness. We plan to carry out experimental evaluations and performance tuning in the near future.
Article
We investigate techniques for analysis and retrieval of object trajectories in a two or three dimensional space. Data of this kind usually contain a great amount of noise, which makes all previously used metrics fail. Therefore, here we formalize non-metric similarity functions based on the Longest Common Subsequence (LCSS), which are very robust to noise and furthermore provide an intuitive notion of similarity between trajectories by giving more weight to the similar portions of the sequences. Stretching of sequences in time is allowed, as well as global translation of the sequences in space. Efficient approximate algorithms that compute these similarity measures are also provided. We compare these new methods to the widely used Euclidean and Time Warping distance functions (for real and synthetic data) and show the superiority of our approach, especially under the strong presence of noise. We prove a weaker version of the triangle inequality and employ it in an indexing structure to answer nearest neighbor queries. Finally, we present experimental results that validate the accuracy and efficiency of our approach.