Conference Paper

Traffic Density-Based Discovery of Hot Routes in Road Networks

July 2007

DOI:10.1007/978-3-540-73540-3_25

Source
DBLP

Conference: Advances in Spatial and Temporal Databases, 10th International Symposium, SSTD 2007, Boston, MA, USA, July 16-18, 2007, Proceedings

Authors:

Xiaolei Li

Henan Polytechnic University

Jiawei Han

University of Illinois, Urbana-Champaign

Hector Gonzalez

Tecnológico de Monterrey

Finding hot routes (traffic flow patterns) in a road network is an important problem. Hot routes are useful to city planners, police departments, real estate developers, and many others. Knowing the hot routes allows the city to better direct traffic or analyze the causes of congestion. In the past, this problem has largely been addressed with domain knowledge of the city. But in recent years, detailed information about vehicles in the road network has become available. With the development and adoption of RFID and other location sensors, an enormous number of moving-object trajectories are being collected and can be used to find hot routes. This is a challenging problem due to the complex nature of the data. If objects traveled in organized clusters, it would be straightforward to use a clustering algorithm to find the hot routes. But in the real world, objects move in unpredictable ways. Variations in speed, time, route, and other factors cause them to travel in rather fleeting “clusters.” These properties make the problem difficult for a naive approach. To this end, we propose a new density-based algorithm named FlowScan. Instead of clustering the moving objects, FlowScan clusters road segments based on the density of common traffic they share. We implemented FlowScan and tested it under various conditions. Our experiments show that the system is both efficient and effective at discovering hot routes.
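The idea of clustering road segments by the traffic they share can be illustrated with a small sketch. This is not the FlowScan algorithm itself, whose definitions are not given in the abstract; it is a simplified stand-in that assumes each trajectory is already a list of road-segment IDs, links two consecutive segments when at least `min_shared` objects used both, and reports connected groups of linked segments. The function name, the threshold parameter, and the toy data are all hypothetical.

```python
from collections import defaultdict, deque

def shared_traffic_routes(trajectories, min_shared):
    """Group consecutive road segments that share at least `min_shared` objects."""
    traffic = defaultdict(set)     # segment -> ids of objects that used it
    successors = defaultdict(set)  # segment -> segments visited right after it
    for obj_id, segments in trajectories.items():
        for seg in segments:
            traffic[seg].add(obj_id)
        for a, b in zip(segments, segments[1:]):
            successors[a].add(b)

    # Link consecutive segments whose common traffic meets the threshold.
    links = defaultdict(set)
    for a, nexts in successors.items():
        for b in nexts:
            if len(traffic[a] & traffic[b]) >= min_shared:
                links[a].add(b)
                links[b].add(a)

    # Connected components over the dense links are the candidate hot routes.
    seen, routes = set(), []
    for start in sorted(links):
        if start in seen:
            continue
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            seg = queue.popleft()
            component.append(seg)
            for nxt in sorted(links[seg]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        routes.append(sorted(component))
    return routes

# Toy data (made up): four objects moving over segments A..E, X, Y.
trajectories = {
    "car1": ["A", "B", "C", "D"],
    "car2": ["A", "B", "C", "E"],
    "car3": ["X", "B", "C", "D"],
    "car4": ["X", "Y"],
}
hot = shared_traffic_routes(trajectories, min_shared=2)
print(hot)
```

With this toy input, segments A, B, C, and D end up in one group because each consecutive pair is shared by at least two cars, while E, X, and Y are left out even though individual cars pass through them.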

Understanding University Students’ Journey Using Advanced Data Analytics

Thesis

Aug 2021

Sameera Jayaratna

Motivated by the increasing influence of data analytics in the higher education sector, this thesis focuses on enhancing the effectiveness and quality of an undergraduate student's journey. An undergraduate student's journey begins when they enrol at a university and ends once they are employed in the graduate labour market. The findings of this research benefit stakeholders in education, such as educational policymakers, education providers, and current and prospective students, in solving a variety of problems including student drop-out, low course satisfaction, and undesirable graduate employment outcomes.

Online Clustering of Trajectories in Road Networks

Conference Paper

Jun 2020

Visualização interativa de dinâmicas de tráfego através de dados de trajetórias [Interactive visualization of traffic dynamics through trajectory data]

Thesis

Dec 2018

Urbanization is accelerating worldwide, giving rise to serious traffic problems. With the increasing availability of location acquisition technologies, massive movement data are collected continuously in a streaming manner. These data are a valuable source to help transit agencies to identify abnormal events that require immediate attention to better direct traffic. In this regard, visual analytics can help by combining automated analysis with interactive visualization for effective understanding, reasoning, and decision-making. Traditional approaches aggregate movement by employing the concept of time-window discretization and exploring an entire dataset. However, they can present inconsistencies in time and space with the real traffic dynamics. In this thesis, we present a novel approach to discover global and local mobility patterns in real time. Different from other existing approaches, our method tracks the evolution of the objects’ movement in real time. We believe that no other approach captures and keeps track of how the hot routes evolve in an incremental manner. Moreover, we conducted extensive experiments on real-world and simulated datasets to evaluate the effectiveness of our method. We also present the benefits and limitations of our visualization proposal based on domain expert feedback. Finally, we present performance tests with very encouraging results to support our approach in visualizing the total traffic flow of a big city. The results demonstrate that our method scales linearly with the size of the dataset, and is able to deal with large datasets and with streams of high-sampling rates.

Real-time discovery of hot routes on trajectory data streams using interactive visualization based on GPU

Article

Sep 2018
COMPUT GRAPH-UK

With the increasing availability of location acquisition technologies, massive movement data are collected continuously in a streaming manner. These data are a valuable source to help transit agencies to monitor the routes with heavy traffic (hot routes) and to identify abnormal events that require immediate attention to better direct traffic. In this regard, visual analytics can help by combining automated analysis with interactive visualization for effective understanding, reasoning, and decision-making. Traditional approaches aggregate movement by employing the concept of time-window discretization and exploring an entire dataset. However, they can present inconsistencies in time and space with the real traffic dynamics. In this paper, we present a novel approach to discover hot routes in real time. Different from other existing approaches, our method tracks the evolution of the objects’ movement in real time. We believe that no other approach captures and keeps track of how the hot routes evolve in an incremental manner. Moreover, we conducted extensive experiments on real-world and simulated datasets to evaluate the effectiveness and performance of our method. The results demonstrate that our method scales linearly with the size of the dataset, and is able to deal with large datasets and with streams of high-sampling rates.

Optimization of urban semaphore times turning into JSSP

Article

Aug 2018

The objective of this paper is to cover the research in the area of adaptive traffic control with emphasis on applied optimization methods. A distinction can be made between classical systems, which operate with a common cycle time, and the more flexible phase-based approaches, which are shown to be more suitable for adaptive traffic control. Classic optimization solutions for this problem result in a model which is relatively easy to represent but may be difficult to fit into the standard mixed-integer programming (MIP) scheme. We propose an alternative approach to find an optimal global solution for the green wave problem on hot routes, which consists of reducing it to a Job Shop Scheduling Problem (JSSP), using the Webster Model to adapt the cycles to road characteristics and average traffic speed.

Finding Time Period-Based Most Frequent Path in Big Trajectory Data

Conference Paper

Jun 2013

The rise of GPS-equipped mobile devices has led to the emergence of big trajectory data. In this paper, we study a new path finding query which finds the most frequent path (MFP) during user-specified time periods in large-scale historical trajectory data. We refer to this query as time period-based MFP (TPMFP). Specifically, given a time period T, a source v_s, and a destination v_d, TPMFP searches for the MFP from v_s to v_d during T. Though there exist several proposals on defining MFP, they only consider a fixed time period. Most importantly, we find that none of them reflects people's common-sense notion well, which can be described by three key properties, namely suffix-optimal (i.e., any suffix of an MFP is also an MFP), length-insensitive (i.e., MFP should not favor shorter or longer paths), and bottleneck-free (i.e., MFP should not contain infrequent edges). The TPMFP with the above properties will not only reveal common routing preferences of past travelers, but also take time effectiveness into consideration. Therefore, our first task is to give a TPMFP definition that satisfies the above three properties. Then, given the comprehensive TPMFP definition, our next task is to find TPMFP over a huge amount of trajectory data efficiently. In particular, we propose efficient search algorithms together with novel indexes to speed up the processing of TPMFP. To demonstrate both the effectiveness and the efficiency of our approach, we conduct extensive experiments using a real dataset containing over 11 million trajectories.

Map-Matching on Low Sampling Rate Trajectories through Frequent Pattern Mining

Article

Mar 2022

Map-matching, an important preprocessing task in many location-based services (LBS), projects each point of the global positioning system (GPS) within a trajectory dataset onto a digital map. State-of-the-art map-matching algorithms typically employ a hidden Markov model (HMM) via shortest path computation. But the computation of the shortest path might not work well on low-sampling-rate trajectory data (e.g., one GPS point every 1–5 min), leading to low matching precision and high running time. To solve the problem, this paper first identifies frequent patterns (FPs) in historical trajectories to capture meaningful mobility behaviors, and then extracts the mobile behavior criterion (MBC) of mobile users. Such a criterion generally represents the route choice of mobile users on road networks. Moreover, the temporal information within trajectory data was employed to estimate the speed of mobile users on road segments. The identified FPs, coupled with MBC and moving speed, help to improve the map-matching precision of low-sampling-rate trajectories. In addition, an FP-forest structure was proposed to index the identified FPs. The structure could greatly speed up the lookup of frequent paths, shortening the running time. Furthermore, the FP-forest structure was pruned to reduce redundancy and space cost. Finally, experiments were carried out on real-world datasets. The results confirm that our FP-matching method outperforms state-of-the-art methods in terms of effectiveness and efficiency.

Interactive Bike Lane Planning Using Sharing Bikes’ Trajectories

Article

Jan 2019

Cycling as a green transportation mode has been promoted by many governments all over the world. As a result, constructing effective bike lanes has become a crucial task to promote the cycling lifestyle, as well-planned bike lanes can reduce traffic congestion and safety risks. Unfortunately, existing trajectory mining approaches for bike lane planning do not consider one or more key realistic government constraints: 1) budget limitations, 2) construction convenience, and 3) bike lane utilization. In this paper, we propose a data-driven approach to develop bike lane construction plans based on the large-scale real-world bike trajectory data collected from Mobike, a station-less bike sharing system. We enforce these constraints to formulate our problem and introduce a flexible objective function to tune the benefit between coverage of users and the length of their trajectories. We prove the NP-hardness of the problem and propose greedy-based heuristics to address it. To improve the efficiency of the bike lane planning system for the urban planner, we propose a novel trajectory indexing structure and deploy the system on a parallel computing framework (Storm). Finally, extensive experiments and case studies are provided to demonstrate the system's efficiency and effectiveness.

Classification of spatio-temporal trajectories from Volunteer Geographic Information through fuzzy rules

Article

Nov 2019
APPL SOFT COMPUT

Volunteer Geographic Information (VGI) is one of the key enablers of the mobility mining discipline. This work introduces a novel data-driven methodology to create a classifier of spatio-temporal trajectories based on VGI. Although other solutions have been proposed, they usually do not fully consider the low resolution and uncertainty of VGI due to its inherent human nature. The proposed approach introduces a classifier based on fuzzy rules that are able to deal with this kind of data. The solution is applied in a use case for real-time detection of tourists and local citizens’ flows and it is compared with a well-established trajectory classifier exhibiting quite promising results.

A Stochastic Approach Towards Travel Route Optimization and Recommendation Based on Users Constraints Using Markov Chain

Article

Jul 2019

Accurate analysis of tourist movement is essential for a country to devise sustainable policies for promoting and growing tourism. From the activities of tourists and the spots they visit, the amount of revenue generated for a particular region can be predicted. However, tourist preferences evolve and vary from one user to another, and thus a tourist spot favored by one set of users may not be preferred by another. This paper aims to design and implement a novel application to recommend an optimal travel route based on user constraints. The user constraints can be the maximum time, distance, and popularity of a particular place. Real data were collected from the WiFi routers installed at different tourist spots on Jeju Island, South Korea. We apply a Markov Chain Model to predict the popularity of different places on a short-term and long-term basis. The popularity index alongside user constraints is provided to find optimal routes. A responsive web-based prototype is developed to collect user constraints and, in response, recommend optimal routes using Google Maps directory services. Results indicate the difference between short-term and long-term popularity to prove the effectiveness of Markov Chains in forecasting long-term behavior. The system is made responsive for all sizes of screens to make it uniformly serviceable on mobile phones. The accuracy of the system is computed based on the historical data and the recommendation system, and it is ascertained to fall between 95% and 100% all the time. Furthermore, the results are compared with popular state-of-the-art methods, and they are found to be significantly better in long-term location prediction.
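The distinction between short-term and long-term popularity in a Markov chain can be sketched as follows. This is only an illustration under stated assumptions, not the paper's model: the three spots, the transition matrix `P`, and the helper names `step` and `stationary` are made up. Short-term popularity is read as the distribution after one transition from a known starting spot, and long-term popularity as the stationary distribution found by repeated multiplication.

```python
def step(dist, P):
    """Advance a probability distribution over spots by one transition."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def stationary(P, iters=1000, tol=1e-12):
    """Approximate the long-run distribution by power iteration."""
    n = len(P)
    dist = [1.0 / n] * n
    for _ in range(iters):
        nxt = step(dist, P)
        if max(abs(a - b) for a, b in zip(dist, nxt)) < tol:
            return nxt
        dist = nxt
    return dist

# Hypothetical transition probabilities between three tourist spots.
# Row i gives the probability of moving from spot i to each spot.
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]

short_term = step([1.0, 0.0, 0.0], P)  # one move after starting at spot 0
long_term = stationary(P)              # popularity in the long run
print(short_term, long_term)
```

In this toy chain a visitor starting at spot 0 is equally likely to be at spot 0 or spot 1 after one move, whereas in the long run spot 1 attracts half of all visits regardless of the starting point.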

A Spatial-Temporal Model for Locating Electric Vehicle Charging Stations

Chapter

Jul 2018

A perfect charging station network plays a key role in Electric Vehicle (EV) adoption. To find the optimal cost of constructing the network while maximizing usage satisfaction, we propose a data-driven framework for solving the problem of locating charging stations. Spatial-temporal models are built to analyze EV usage behavior in the urban area. Features such as charging demand, high energy consumption areas, and highly traveled paths are captured. We evaluate the proposed models on a real-world EV dataset. The results clearly demonstrate the efficiency and accuracy of our models in locating EV charging stations.

Mobility Data Science: Perspectives and Challenges

Article

May 2024

Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains including traffic management, urban planning, and health sciences. In this paper, we present the domain of mobility data science. Towards a unified approach to mobility data science, we present a pipeline having the following components: mobility data collection, cleaning, analysis, management, and privacy. For each of these components, we explain how mobility data science differs from general data science, we survey the current state of the art, and describe open challenges for the research community in the coming years.

Privacy Preservation in High-Dimensional Trajectory Data for Passenger Flow Analysis

Thesis

May 2023

The increasing use of location-aware devices provides many opportunities for analyzing and mining human mobility. The trajectory of a person can be represented as a sequence of visited locations with different timestamps. Storing, sharing, and analyzing personal trajectories may pose new privacy threats. Previous studies have shown that employing traditional privacy models and anonymization methods often leads to low information quality in the resulting data. In this thesis, we propose a method for achieving anonymity in a trajectory database while preserving the information to support effective passenger flow analysis. Specifically, we first extract the passenger flowgraph, which is a commonly employed representation for modeling uncertain moving objects, from the raw trajectory data. We then anonymize the data with the goal of minimizing the impact on the flowgraph. Extensive experimental results on both synthetic and real-life data sets suggest that the framework is effective in overcoming the special challenges in trajectory data anonymization, namely, high dimensionality, sparseness, and sequentiality.

Robust Trajectory-based Density Estimation for Geometric Structure Recovery: Theory and Applications

Preprint

Oct 2022

With the rise of the Internet of Things, strategies for effectively processing big data are essential for discovering meaningful insights. The time series datasets produced by groups of interconnected devices contain valuable underlying patterns. Recent works have extracted patterns from spatio-temporal datasets to aid in road network generation, activity recognition, and other tasks. The speed and accuracy of the underlying geometry reconstruction are important in these applications. Existing methods such as kernel density estimation (KDE) have been used but are often computationally expensive. We propose modifying edge quadtrees to utilize their effective hierarchical structure. Our modification estimates density using a novel trajectory count function which provides mathematical guarantees on the stability of the count by enforcing an invariance to local perturbations. We evaluate our method's effectiveness at extracting the underlying geometry and representative subsample points. For verification, we compare against a KDE variant at extracting the underlying shape of noisy synthetic trajectories travelling along the shape. We compare map extraction from GPS traces against current methods. Our method significantly improves runtime while extracting the geometry better or at least comparably. We also compare against maxmin subsampling on an activity recognition data set and find a significant runtime improvement with comparable performance.

Optimal Electric Vehicles Route Planning with Traffic Flow Prediction and Real-Time Traffic Incidents

Article

Mar 2022

Electric Vehicles (EVs) are regarded as among the most environmentally and economically efficient transportation solutions. However, barriers and range limitations hinder this technology’s progress and deployment. In this paper, we examine EV route planning to derive optimal routes considering energy consumption by analyzing historical trajectory data. More specifically, we propose a novel approach for EV route planning that considers real-time traffic incidents, road topology, charging station locations during battery failure, and finally, traffic flow prediction extracted from historical trajectory data to generate energy maps. Our approach consists of four phases: the off-line phase, which aims to build the energy graph; the application of the A* algorithm to deliver the optimal EV path; the NEAT trajectory clustering, which aims to produce dense trajectory clusters for a given period of the day; and finally, the on-line phase based on our algorithm to plan an optimal EV path based on real traffic incidents, dense trajectory clusters, road topology information, vehicle characteristics, and charging station locations. We set up experiments on real cases to establish the optimal route for electric cars, demonstrating the effectiveness and efficiency of our proposed algorithm.

A K-Main Routes Approach to Spatial Network Activity Summarization

Article

Nov 2021

Data summarization is an important concept in data mining for finding a compact representation of a dataset. In spatial network activity summarization (SNAS), we are given a spatial network and a collection of activities (e.g., pedestrian fatality reports, crime reports) and the goal is to find k shortest paths that summarize the activities. SNAS is important for applications where observations occur along linear paths such as roadways, train tracks, etc. SNAS is computationally challenging because of the large number of k subsets of shortest paths in a spatial network. Previous work has focused on either geometry or subgraph-based approaches (e.g., only one path), and cannot summarize activities using multiple paths. This paper proposes a K-Main Routes (KMR) approach that discovers k shortest paths to summarize activities. KMR generalizes K-means for network space but uses shortest paths instead of ellipses to summarize activities. To improve performance, KMR uses network Voronoi, divide and conquer, and pruning strategies. We present a case study comparing KMR's network-based output (i.e., shortest paths) to geometry-based outputs (e.g., ellipses) on pedestrian fatality data. Experimental results on synthetic and real data show that KMR with our performance-tuning decisions yields substantial computational savings without reducing summary path coverage.

A framework for discovering popular paths using transactional modeling and pattern mining

Article

Mar 2022
DISTRIB PARALLEL DAT

While the problems of finding the shortest path and k-shortest paths have been extensively researched, the research community has been shifting its focus towards discovering and identifying paths based on user preferences. Since users naturally follow some of the paths more than other paths, the popularity of a given path often reflects such user preferences. Given a set of user traversals in a road network and a set of paths between a given source and destination pair, we address the problem of performing top-k ranking of the paths in that set based on path popularity. In this paper, we introduce a new model for computing the popularity scores of paths. Our main contributions are threefold. First, we propose a framework for modeling user traversals in a road network as transactions. Second, we present an approach for efficiently computing the popularity score of any path based on the itemsets extracted from the transactions using pattern mining techniques. Third, we conducted an extensive performance evaluation with two real datasets to demonstrate the effectiveness of the proposed scheme.

Scalable clustering of segmented trajectories within a continuous time framework: application to maritime traffic data

Article

Jul 2021
MACH LEARN

In the context of maritime traffic surveillance, a major challenge is the automatic identification of traffic flows from a set of observed trajectories, for example in order to derive good management measures or to detect abnormal or illegal behaviours. In this paper, we propose a new modelling framework to cluster sequences of a large number of trajectories recorded at potentially irregular frequencies. The model is specified within a continuous time framework, being robust to irregular sampling in records and accounting for possible heterogeneous movement patterns within a single trajectory. It partitions a trajectory into sub-trajectories, or movement modes, allowing a clustering of both individuals’ movement patterns and trajectories. The clustering is performed using nonparametric Bayesian methods, namely the hierarchical Dirichlet process, and uses stochastic variational inference to estimate the model’s parameters, hence providing a scalable method in an easy-to-distribute framework. Performance is assessed on both simulated data and on our motivating large trajectory dataset from the automatic identification system, used to monitor the world maritime traffic: the clusters represent significant, atomic motion patterns, making the model informative for stakeholders.

Ripley’s K‐function for Network‐Constrained Flow Data

Article

Jun 2021

Many types of spatial flows, including pedestrian flows and vehicle flows, are constrained by and distributed on spatial networks. In the literature, network-constrained flows are usually modeled as a direct line in planar space using methods designed for flows in planar space. Further, in spatial statistical analysis of flow patterns, distance measures and the hypothesis of spatial randomness of flows also have a significant impact on the determination of flow patterns. In this study, we extend the global and local Ripley’s K functions for planar flows to network space. Both the network and planar K-functions for flows are applied to detect the patterns of taxi Origin-Destination flow data on a road network at multiple scales. The effects of distance measures and simulation methods in the network and planar Ripley’s K functions are examined. We found that the planar K function is more sensitive to changes in scale and tends to detect more clustered flows compared with the network K function at the same scale. Distance measures and simulation methods have a more significant influence on the detection of patterns of network-constrained flows than the selection of the network or planar Ripley’s K functions. This study suggests that distance measures and hypotheses of spatial randomness have to be chosen carefully before applying flow pattern analytic methods to network-constrained flows and interpreting the results of flow patterns.

A Trajectory Classification Model Using Grammar Parsing

Article

Jan 2020

Lei Bao

Trajectory classification is a hot topic in the field of spatiotemporal data mining. Existing models perform spatial or temporal computation on trajectory data, which requires substantial effort and is often time-consuming and inefficient. This article proposes a model to classify unknown ship trajectories through a syntax recognition approach. By using the background semantic information in the rasterized sea chart, the model transforms the ship trajectories into symbolic sentences containing both spatiotemporal and semantic information, and reduces their scale. The class feature is expressed as a context-free grammar and the data classification is implemented through syntax parsing. The parsing requires less computation and is more efficient. Experiments are carried out to verify the model’s practicability, and the results show that it is valid and effective.

Detection of Indoor High-Density Crowds via Wi-Fi Tracking Data

Article

Sep 2020
SENSORS-BASEL

Accurate detection of locations of indoor high-density crowds is crucial for early warning and emergency rescue during indoor safety accidents. The spatial structure of indoor environments is more complicated than outdoor environments. The locations of indoor high-density crowds are more likely to be the sites of security accidents. Existing detection methods for high-density crowd locations mostly focus on outdoor environments, and relatively few detection methods exist for indoor environments. This study proposes a novel detection framework for high-density indoor crowd locations termed IndoorSRC (Simplification-Reconstruction-Cluster). In this paper, a novel indoor spatiotemporal clustering algorithm called Indoor-STAGNES is proposed to detect the indoor trajectory stay points to simplify indoor movement trajectory. Then, we propose use of a Kalman filter algorithm to reconstruct the indoor trajectory and properly align and resample the data. Finally, an indoor spatiotemporal density clustering algorithm called Indoor-STOPTICS is proposed to detect the locations of high-density crowds in the indoor environment from the reconstructed trajectory. Extensive experiments were conducted using indoor Wi-Fi positioning datasets collected from a shopping mall. The results show that the IndoorSRC framework evidently outperforms the existing baseline method in terms of detection performance.

Automatic incident detection on freeways based on Bluetooth traffic monitoring

Article

Oct 2020
ACCIDENT ANAL PREV

A novel automatic incident detection (AID) method for freeways, based on the use of data provided by Bluetooth sensors and an unsupervised anomaly detection approach, is presented. The two main advantages of the proposed AID system are: (i) the use of Bluetooth sensors offers several practical advantages over inductive loop detectors (ILD), which are among the preferred sensing technologies for traffic flow; and (ii) the unsupervised anomaly detection approach builds a model without the need for incident information. A common problem when designing an AID system is that incident information, i.e., ground-truth data, with enough accuracy is seldom available. Isolation forest is the unsupervised anomaly detection approach adopted in this work. This method is based on characterizing anomalous traffic conditions by exploiting the fact that anomalies tend to be isolated. The most remarkable feature of this anomaly detection method is its high detection performance while having a very simple tuning procedure and an extremely low computational demand. Finally, the effectiveness of the presented AID method is demonstrated using real traffic data collected by a network of Bluetooth sensors installed on the Ayalon Highway, Tel Aviv.

Size constrained k simple polygons

Article

Jan 2021
GEOINFORMATICA

Given a geometric space and a set of weighted spatial points, the Size Constrained k Simple Polygons (SCkSP) problem identifies k simple polygons that maximize the total weights of the spatial points covered by the polygons and meet the polygon size constraint. The SCkSP problem is important for many societal applications including hotspot area detection and resource allocation. The problem is NP-hard; it is computationally challenging because of the large number of spatial points and the polygon size constraint. Our preliminary work introduced the Nearest Neighbor Triangulation and Merging (NNTM) algorithm for SCkSP to meet the size constraint while maximizing the total weights of the spatial points. However, we find that the performance of the NNTM algorithm is dependent on the t-nearest graph. In this paper, we extend our previous work and propose a novel approach that outperforms our prior work. Experiments using Chicago crime and U.S. Federal wildfire datasets demonstrate that the proposed algorithm significantly reduces the computational cost of our prior work and produces a better solution.

RRCF: an abnormal pulse diagnosis factor for road abnormal hotspots detection

Article

Jan 2021

Road hotspot detection is a key issue in the field of intelligent transportation research. Compared with normal hotspots caused by high traffic flow, abnormal hotspots, which result from road accidents, occur at random times and are difficult to predict. Drawing on the pulse diagnosis method, this paper constructs a region real-time congestion factor to discover abnormal road hotspots. Taxi GPS data from Hangzhou City, China, are employed to find the abnormal pulse of a road segment, and the relationship between the proposed congestion factor and the real-time traffic data is discussed. Two accident scenarios are built to verify the validity of the proposed method. The experimental results show that the proposed method performs well in real-time abnormal hotspot detection and that the analysis output could be useful in path planning and traffic management.

Mining Private Vehicle Hot Routes Using Electronic Registration Identification Data

Conference Paper

Jun 2019

Hot routes are routes that a large number of vehicles pass through in a given period of time. Mining the hot routes of private vehicles can help us understand their travel behavior, which is of great help to urban traffic management and construction. This study aims to mine hot routes of private vehicles using Electronic Registration Identification (ERI) data, a very large source of traffic data. In this paper, we propose a mining algorithm, Prefix-projected Sequence Pattern Mining based on Successor Set (PSSS), which builds on the idea of the PrefixSpan algorithm. First, we extract private vehicle trips from ERI data. Then we transform the trips into string sequences and use the PSSS algorithm to mine hot routes of private vehicles. Finally, we analyze the hot routes of private vehicles and compare the efficiency of the two algorithms. The experimental results offer guidance for traffic management and the construction of intelligent transportation.

Analyzing the social impacts of scooters with geo-spatial methods

Article

Jul 2019

Scooters, or gasoline-powered two-wheelers, are becoming increasingly popular in the Netherlands. They provide fast, independent and affordable transportation, especially in congested urban areas. Unfortunately, they also have considerable adverse impacts on the environment and human health. The three most prominent impacts are associated with air pollution, noise pollution and traffic accidents. While the total contribution of emissions by scooters is relatively small compared to total traffic-related emissions, they have a disproportionately large impact on their direct environment, especially when sharing roads with bicycles as in the Netherlands, where they are characterized as super-polluters. A scoping GIS-based assessment, using theoretical and available secondary data, could identify routes with the highest likelihood of scooter presence to estimate exhaust and noise impacts and related traffic accidents. Estimates are provided for the total population and the number of childcare facilities within the impact areas. For future projections, four different scenarios are analyzed. For the case study of the town of Enschede in the Netherlands, the present noise/exhaust environmental impact of scooters affects at least 30% of the population, and in the future this number can increase to 38%–53%.

Detecting Vehicle Illegal Parking Events using Sharing Bikes' Trajectories

Conference Paper

Full-text available

Jul 2018

Illegal vehicle parking is a common urban problem faced by major cities around the world, as it causes traffic jams, which lead to air pollution and traffic accidents. Traditional approaches to detecting illegal vehicle parking events rely heavily on active human effort, e.g., police patrols or surveillance cameras. However, these approaches are extremely ineffective at covering a large city. The massive, high-quality sharing bike trajectories from Mobike offer us a unique opportunity to design a ubiquitous illegal parking detection system, as most illegal parking events happen at curbsides and have a significant impact on bike users. Two main components are employed to mine the trajectories in our system: 1) trajectory pre-processing, which filters outlier GPS points, performs map-matching and builds indexes for bike trajectories; and 2) illegal parking detection, which models the normal trajectories, extracts features from the evaluation trajectories and utilizes a distribution test-based method to discover the illegal parking events. The system is deployed on the cloud and used internally by Mobike. Finally, extensive experiments and many insightful case studies based on the massive trajectories in Beijing are presented.

SafePath: Differentially-Private Publishing of Passenger Trajectories in Transportation Systems

Article

Full-text available

Jul 2018
COMPUT NETW

In recent years, the collection of spatio-temporal data that captures human movements has increased tremendously due to the advancements in hardware and software systems capable of collecting person-specific data. The bulk of the data collected by these systems has numerous applications, or it can simply be used for general data analysis. Therefore, publishing such big data is greatly beneficial for data recipients. However, in its raw form, the collected data contains sensitive information pertaining to the individuals from which it was collected and must be anonymized before publication. In this paper, we study the problem of privacy-preserving passenger trajectories publishing and propose a solution under the rigorous differential privacy model. Unlike sequential data, which describes sequentiality between data items, handling spatio-temporal data is a challenging task due to the fact that introducing a temporal dimension results in extreme sparseness. Our proposed solution introduces an efficient algorithm, called SafePath, that models trajectories as a noisy prefix tree and publishes ϵ-differentially-private trajectories while minimizing the impact on data utility. Experimental evaluation on real-life transit data in Montreal suggests that SafePath significantly improves efficiency and scalability with respect to large and sparse datasets, while achieving comparable results to existing solutions in terms of the utility of the sanitized data.

Trajectory Clustering Analysis

Chapter

Feb 2023

In this chapter, we introduce the development of trajectory clustering analysis. First, we review related work on the clustering of trajectory data, with particular attention to subspace clustering-based methods. Second, we describe a general framework, termed atomic-representation-based subspace clustering (ARSC), for the clustering of trajectory data. ARSC is a subspace clustering framework that first computes the atomic representations of data points and then clusters the points using those representations. Using ARSC as a general platform, we introduce a subspace clustering method, referred to as minimum error entropy-based sparse subspace clustering (MEESSC), that is robust against outliers and heavy data noise. MEESSC computes the representation of each data point by minimizing the ℓ1 norm regularized minimum error entropy-based loss function. Experimental results are shown to validate the efficacy and robustness of MEESSC for the clustering of trajectory data.

A streaming data visualization framework for supporting decision-making in the Intensive Care Unit

Article

Apr 2023
EXPERT SYST APPL

Siting priorities for congestion-reducing projects in Dhaka: a spatiotemporal analysis of traffic congestion, travel times, air pollution, and exposure vulnerability

Article

Aug 2021

Traffic congestion increases travel time and is a major source of pollution and health damage in developing-country cities. Data scarcity frequently confines traffic improvement projects to sites where congestion can be easily measured. This article uses spatiotemporal data from new global sources to revisit the siting problem in Dhaka, Bangladesh, where local congestion measures are augmented by estimates of citywide travel time, pollution exposure, and pollution vulnerability. We combine Google Traffic data with an econometric model linking traffic, pollution readings from a local monitoring station, and weather data to estimate the spatial distribution of vehicular pollution. We explore pollution-vulnerability implications by incorporating spatial distributions of poor households, children, and the elderly. Using the Open Source Routing Machine and OpenStreetMaps, we estimate systemwide travel-time gains from reducing congestion at each point in a grid covering the Dhaka metro area. We find a large divergence of siting priorities in single-dimensional exercises that focus exclusively on local congestion, citywide travel time, vehicular pollution, or vulnerable-resident pollution exposure. By implication, optimal siting requires a social objective function with explicit weights assigned to each of the four dimensions. The new global information sources permit extending this multidimensional approach to many cities throughout the developing world.

PerRD: A System for Personalized Route Description

Conference Paper

Apr 2019

AIS Ship Trajectory Clustering Based on Convolutional Auto-encoder

Chapter

Jan 2021

Ship trajectory anomaly detection, route planning, location prediction, collision detection and other issues have become the main research directions in the field of ocean navigation. Ship trajectory clustering is the key to addressing these problems. By mining the motion patterns of ship trajectories, similar trajectories are grouped into the same category. Traditional trajectory clustering methods usually need to select a spatio-temporal trajectory measurement method based on data volume, computational complexity, noise and other influencing factors. Selecting the optimal similarity measure requires prior knowledge and extensive experimentation, which is computationally intensive and time-consuming. In this paper, we propose a ship trajectory motion pattern extraction algorithm based on a one-dimensional convolutional auto-encoder that does not rely on spatio-temporal trajectory measurement methods. By extracting a low-dimensional representation of the ship's trajectory, our approach can preserve the order of trajectory points and reduce the distance calculation bias. The experimental results show that our proposed algorithm has good clustering performance while preserving the main motion characteristics of ship trajectories.

Model Study of Traffic Congestion Impacted by Incidents

Conference Paper

Sep 2019

Discovering traffic congestion through traffic flow patterns generated by moving object trajectories

Article

Nov 2019
COMPUT ENVIRON URBAN

The discovery of moving object trajectory patterns representing high traffic density has been covered in various works using diverse approaches. These models are useful in areas such as transportation planning, traffic monitoring, and advertising on public roads. However, though studies tend to recognize the importance of these types of patterns in utility, they usually do not consider traffic congestion as a particular condition of high traffic. In this work, we present a model for the discovery of high traffic flow patterns in relation to traffic congestion. This relationship is represented in terms of traffic that is shared between different sectors of the pattern, making it possible to identify traffic flow situations causing congestion. We also complement this model by discovering alternative paths for the severe traffic depicted in these patterns. These alternative paths depend on traffic level and location inside the road network. Depending on the traffic conditions, alternative paths are commonly sought by drivers when they are approaching a traffic jam, in order to mitigate the effects of traffic congestion. We compare these models with related work from similar areas and validate them by conducting experiments using real data. We describe discovered patterns related to the main elements of the road network in the dataset and show their advantages in comparison to related models. Based on the displayed metrics, the algorithms’ implementation offers good performance execution for the given dataset volume. The results presented confirm the usefulness of the proposed patterns as a tool that helps to improve traffic, allowing the identification of problems and possible alternatives.

Trajectory Data Classification: A Review

Article

Aug 2019

This article comprehensively surveys the development of trajectory data classification. Considering the critical role of trajectory data classification in modern intelligent systems for surveillance security, abnormal behavior detection, crowd behavior analysis, and traffic control, trajectory data classification has attracted growing attention. According to the availability of manual labels, which is critical to the classification performances, the methods can be classified into three categories, i.e., unsupervised, semi-supervised, and supervised. Furthermore, classification methods are divided into some sub-categories according to what extracted features are used. We provide a holistic understanding and deep insight into three types of trajectory data classification methods and present some promising future directions.

A Novel Approach to Identify Intersection Information via Trajectory Big Data Analysis in Urban Environments

Chapter

Jan 2020

Road traffic condition information includes important traffic control information, such as U-turn or left-turn rules, which affects drivers' travel. We propose a novel way to identify road intersections and traffic control information automatically and in a timely manner by analyzing floating car trajectory data. First, a difference-based algorithm is proposed to filter outlier trajectory data. Then a map matching method based on a three-level grid is applied. Finally, an automatic algorithm is developed to recognize the road traffic control information in a timely manner. The entire floating trajectory dataset of Fuzhou, about 1.5 million records, is used to verify the proposed method. The experimental results indicate that the method has high efficiency and a high accuracy rate. We construct this system based on trajectory data from 6000 taxis a day. The results of the operation show that the correct rate is as high as 87.7%, which indicates that the method is very valuable.

Evaluation of the Use of Streaming Graph Processing Algorithms for Road Congestion Detection

Conference Paper

Dec 2018

Profiling and Grouping Space-time Activity Patterns of Urban Individuals

Conference Paper

Apr 2018

Jianan Shen

Identifying primary public transit corridors using multi-source big transit data

Article

Dec 2018

Effective public transit planning needs to address realistic travel demands, which can be illustrated by corridors across major residential areas and activity centers. It is vital to identify public transit corridors that contain the most significant transit travel demand patterns. We propose a two-stage approach to discover primary public transit corridors at high spatio-temporal resolutions using massive real-world smart card and bus trajectory data, which manifest rich transit demand patterns over space and time. The first stage was to reconstruct chained trips for individual passengers using multi-source massive public transit data. In the second stage, a shared-flow clustering algorithm was developed to identify public transit corridors based on reconstructed individual transit trips. The proposed approach was evaluated using transit data collected in Shenzhen, China. Experimental results demonstrated that the proposed approach is a practical tool for extracting time-varying corridors for many potential applications, such as transit planning and management. © 2018

Congestion Detection and Distribution Pattern Analysis Based on Spatiotemporal Density Clustering

Conference Paper

Jun 2018

Spatial Co-location Pattern Mining: 6th International Conference, BDA 2018, Warangal, India, December 18–21, 2018, Proceedings

Chapter

Nov 2018

Venkata Gunturi

Scalable Detection of Crowd Motion Patterns

Article

Nov 2018

Studying the movements of crowds is important for understanding and predicting the behavior of large groups of people. When analyzing such crowds, one is often interested in the long-term macro-level motions of the crowd, as opposed to the micro-level individual movements at each moment in time. A high-level representation of these motions is thus desirable. In this work, we present a scalable method for detection of crowd motion patterns, i.e., spatial areas describing the dominant motions within the crowd. For measuring crowd movements, we propose a fast, scalable, and low-cost method based on proximity graphs. For analyzing crowd movements, we utilize a three-stage pipeline: (1) represent the behavior of each person at each moment in time using a low-dimensional data point, (2) cluster these data points based on spatial relations, and (3) concatenate these clusters based on temporal relations. Experiments on synthetic datasets reveal that our method can handle various scenarios, including curved lanes and diverging flows. Evaluation on real-world datasets shows that our method is able to extract useful motion patterns from scenarios that could not be properly handled by existing methods. Overall, we see our work as an initial step towards rich pattern recognition.

Efficient Multi-range Query Processing on Trajectories: 37th International Conference, ER 2018, Xi'an, China, October 22–25, 2018, Proceedings

Chapter

Sep 2018

With the widespread use of devices with geo-positioning technologies, an unprecedented volume of trajectory data is becoming available. In this paper, we propose and study the problem of multi-range query processing over trajectories, which finds the trajectories that pass through a set of given spatio-temporal ranges. Such queries can facilitate urban planning applications by finding traffic movement flows between different parts of a city at different time intervals. To the best of our knowledge, this is the first work on answering multi-range queries on trajectories. In particular, we first propose a novel two-level index structure that preserves both the co-location of trajectories and the co-location of points within trajectories. Next, we present an efficient query processing algorithm that employs several pruning techniques at different levels of the index. The results of our extensive experimental studies on two real datasets demonstrate that our approach outperforms the baseline by 1 to 2 orders of magnitude.

Traffic Risk Mining From Heterogeneous Road Statistics

Article

Full-text available

Sep 2018

At present, a large amount of traffic-related data is obtained manually and through sensors and social media, e.g., traffic statistics, accident statistics, road information, and users' comments. In this paper, we propose a novel framework for mining traffic risk from such heterogeneous data. Traffic risk refers to the possibility of occurrence of traffic accidents. Specifically, we focus on two issues: 1) predicting the number of accidents on any road or at intersection and 2) clustering roads to identify risk factors for risky road clusters. We present a unified approach for addressing these issues by means of feature-based non-negative matrix factorization (FNMF). In particular, we develop a new multiplicative update algorithm for the FNMF to handle big traffic data. Using real-traffic data in Tokyo, we demonstrate that the proposed algorithm can be used to predict traffic risk at any location more accurately and efficiently than existing methods, and that a number of clusters of risky roads can be identified and characterized by two risk factors. In summary, our work can be regarded as the first step to a new research area of traffic risk mining.

Dijkstra’s-DBSCAN: Fast, Accurate, and Routable Density Based Clustering of Traffic Incidents on Large Road Network

Article

Sep 2018

Incident hotspots are used as a direct indicator of the need for road maintenance and infrastructure upgrades, and as an important reference for investment location decisions. Previous incident hotspot identification methods are all region based, ignoring the underlying road network constraints. We first demonstrate how region-based hotspot detection may be inaccurate. We then present Dijkstra’s-DBSCAN, a new network-based density clustering algorithm specifically for traffic incidents, which combines a modified Dijkstra’s shortest path algorithm with DBSCAN (density-based spatial clustering of applications with noise). The modified Dijkstra’s algorithm, instead of returning the shortest path from a source to a target as the original algorithm does, returns the set of nodes (incidents) that are within a requested distance when traveling from the source. By retrieving the directly reachable neighbors using this modified Dijkstra’s algorithm, DBSCAN gains awareness of network connections and measures distance more practically. It avoids clustering incidents that are close but not connected. The new approach extracts hazardous lanes instead of regions, and so is a much more precise approach for incident management purposes; it reduces the O(n²) computational cost to O(n) and can process the entire U.S. network in seconds; it has routing flexibility and can extract clusters of any shape and connections; it is parallelizable and can utilize distributed computing resources. Our experiments verified the new methodology’s capability of supporting safety management on a complicated surface street configuration. It also works for customized lane configurations, such as freeways, freeway junctions, interchanges, roundabouts, and other complex combinations.
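
The neighborhood query described in this abstract, a Dijkstra variant that returns every node within a requested network distance instead of a single shortest path, can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the adjacency-dict graph format, the function name `nodes_within`, and the parameter `max_dist` are assumptions made for the example.

```python
import heapq


def nodes_within(graph, source, max_dist):
    """Return {node: network distance} for every node whose shortest-path
    distance from `source` is at most `max_dist`.

    `graph` is a hypothetical adjacency dict: node -> list of (neighbor, length).
    """
    best = {source: 0.0}          # shortest distance found so far per node
    heap = [(0.0, source)]        # priority queue ordered by distance
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best.get(node, float("inf")):
            continue              # stale queue entry, a shorter path was found
        for neighbor, length in graph.get(node, []):
            candidate = dist + length
            # Prune: never expand beyond the requested radius.
            if candidate <= max_dist and candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return best


# Toy directed network (assumed data for illustration).
toy_graph = {
    "a": [("b", 1.0), ("c", 5.0)],
    "b": [("c", 1.5), ("d", 4.0)],
    "c": [],
    "d": [],
}
reachable = nodes_within(toy_graph, "a", 3.0)
```

In a DBSCAN-style clustering loop, a function of this kind would take the place of the usual Euclidean radius query, so that two incidents count as neighbors only when they are close along the road network.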

CoRe: Generating a Computationally Representative Road Skeleton - Integrating AADT with Road Structure: 20th International Conference, DaWaK 2018, Regensburg, Germany, September 3–6, 2018, Proceedings

Chapter

Aug 2018

Corridor Learning Using Individual Trajectories

Conference Paper

Jun 2018

Mining K primary corridors from vehicle GPS trajectories on a road network based on traffic flow

Conference Paper

May 2018

Classification of Spatio-Temporal Trajectories Based on Support Vector Machines

Chapter

Jun 2018

Within the mobility mining discipline, several solutions for the classification of spatio-temporal trajectories have been proposed. However, they usually do not fully consider the particularities of trajectories from human-generated data like online social networks. For that reason, this work introduces a novel classifier based on Support Vector Machines (SVM), which fits the low resolution of this type of geographic data. This solution is applied in a use case for the detection of tourist mobility exhibiting quite promising results.

Robust and fast similarity search for moving object trajectories

Article

Full-text available

Jun 2005

An important consideration in similarity-based retrieval of moving object trajectories is the definition of a distance function. The existing distance functions are usually sensitive to noise, shifts and scaling of data that commonly occur due to sensor failures, errors in detection techniques, disturbance signals, and different sampling rates. Cleaning data to eliminate these is not always possible. In this paper, we introduce a novel distance function, Edit Distance on Real sequence (EDR) which is robust against these data imperfections. Analysis and comparison of EDR with other popular distance functions, such as Euclidean distance, Dynamic Time Warping (DTW), Edit distance with Real Penalty (ERP), and Longest Common Subsequences (LCSS), indicate that EDR is more robust than Euclidean distance, DTW and ERP, and it is on average 50% more accurate than LCSS. We also develop three pruning techniques to improve the retrieval efficiency of EDR and show that these techniques can be combined effectively in a search, increasing the pruning power significantly. The experimental results confirm the superior efficiency of the combined methods.
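
To make the idea of an edit-distance-style trajectory measure concrete, the following sketch counts the insert, delete, and substitute operations needed to align two 2-D trajectories, treating two points as a match when they differ by at most a threshold in each coordinate. The abstract does not spell out the definition, so the matching rule, the threshold name `eps`, and the function name `edr` are assumptions based on the commonly cited formulation; the pruning techniques mentioned above are not shown.

```python
def edr(r, s, eps):
    """Edit-distance-style dissimilarity between two trajectories.

    `r` and `s` are sequences of (x, y) tuples; `eps` is an assumed
    per-coordinate matching threshold.
    """
    n, m = len(r), len(s)
    # dp[i][j] = distance between the first i points of r and the first j of s
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i              # delete all i points
    for j in range(m + 1):
        dp[0][j] = j              # insert all j points
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            matched = (abs(r[i - 1][0] - s[j - 1][0]) <= eps
                       and abs(r[i - 1][1] - s[j - 1][1]) <= eps)
            substitution = 0 if matched else 1
            dp[i][j] = min(dp[i - 1][j - 1] + substitution,  # match or substitute
                           dp[i - 1][j] + 1,                 # skip a point of r
                           dp[i][j - 1] + 1)                 # skip a point of s
    return dp[n][m]


clean = [(0, 0), (1, 1), (2, 2)]
noisy = [(0, 0), (100, 100), (1, 1), (2, 2)]  # one outlier point inserted
distance = edr(clean, noisy, 0.5)
```

Because a mismatch costs 1 regardless of how far apart the points are, the single outlier in `noisy` adds only one unit of distance, which illustrates why such a measure is less sensitive to noise than Euclidean distance.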

On Discovering Moving Clusters in Spatio-temporal Data

Conference Paper

Full-text available

Lect Notes Comput Sci

A moving cluster is defined by a set of objects that move close to each other for a long time interval. Real-life examples are a group of migrating animals, a convoy of cars moving in a city, etc. We study the discovery of moving clusters in a database of object trajectories. The difference of this problem compared to clustering trajectories and mining movement patterns is that the identity of a moving cluster remains unchanged while its location and content may change over time. For example, while a group of animals are migrating, some animals may leave the group or new animals may enter it. We provide a formal definition for moving clusters and describe three algorithms for their automatic discovery: (i) a straight-forward method based on the definition, (ii) a more efficient method which avoids redundant checks and (iii) an approximate algorithm which trades accuracy for speed by borrowing ideas from the MPEG-2 video encoding. The experimental results demonstrate the efficiency of our techniques and their applicability to large spatio-temporal datasets.

A Viterbi algorithm for a trajectory model derived from HMM with explicit relationship between static and dynamic features

Conference Paper

Full-text available

Jun 2004
Acoust Speech Signal Process

This paper introduces a Viterbi algorithm to obtain a sub-optimal state sequence for trajectory-HMM, which is derived from HMM with explicit relationship between static and dynamic features. The trajectory-HMM can alleviate some limitations of HMM, which are (i) constant statistics within HMM state and (ii) conditional independence of observations given the state sequence, without increasing the number of model parameters. The proposed algorithm was applied to state-boundary optimization for Viterbi training and N-best rescoring. In a speaker-dependent continuous speech recognition experiment, trajectory-HMM with the proposed algorithm achieved about 14% error reduction over the standard HMM with the conventional Viterbi algorithm.

Learning and Inferring Transportation Routines

Conference Paper

Full-text available

Jan 2004

This paper introduces a hierarchical Markov model that can learn and infer a user's daily movements through the community. The model uses multiple levels of abstraction in order to bridge the gap between raw GPS sensor measurements and high level information such as a user's mode of transportation or her goal. We apply Rao-Blackwellised particle filters for efficient inference both at the low level and at the higher levels of the hierarchy. Significant locations such as goals or locations where the user frequently changes mode of transportation are learned from GPS data logs without requiring any manual labeling. We show how to detect abnormal behaviors (e.g. taking a wrong bus) by concurrently tracking his activities with a trained and a prior model. Experiments show that our model is able to accurately predict the goals of a person and to recognize situations in which the user performs unknown activities.

Indexing of network constrained moving objects

Conference Paper

Full-text available

Nov 2003

With the proliferation of mobile computing, the ability to index efficiently the movements of mobile objects becomes important. Objects are typically seen as moving in two-dimensional (x,y) space, which means that their movements across time may be embedded in the three-dimensional (x,y,t) space. Further, the movements are typically represented as trajectories, sequences of connected line segments. In certain cases, movement is restricted, and specifically in this paper, we aim at exploiting that movements occur in transportation networks to reduce the dimensionality of the data. Briefly, the idea is to reduce movements to occur in one spatial dimension. As a consequence, the movement data becomes two-dimensional (x,t). The advantages of considering such lower-dimensional trajectories are the reduced overall size of the data and the lower-dimensional indexing challenge. Since off-the-shelf systems typically do not offer higher-dimensional indexing, this reduction in dimensionality allows us to use such DBMSes to store and index trajectories. Moreover, we argue that, given the right circumstances, indexing these dimensionality-reduced trajectories can be more efficient than using a three-dimensional index. This hypothesis is verified by an experimental study that incorporates trajectories stemming from real and synthetic road networks.

Conference Paper

Full-text available

Feb 2002

We investigate techniques for analysis and retrieval of object trajectories in two or three dimensional space. Such data usually contain a large amount of noise, which has made previously used metrics fail. Therefore, we formalize non-metric similarity functions based on the longest common subsequence (LCSS), which are very robust to noise and furthermore provide an intuitive notion of similarity between trajectories by giving more weight to similar portions of the sequences. Stretching of sequences in time is allowed, as well as global translation of the sequences in space. Efficient approximate algorithms that compute these similarity measures are also provided. We compare these new methods to the widely used Euclidean and time warping distance functions (for real and synthetic data) and show the superiority of our approach, especially in the strong presence of noise. We prove a weaker version of the triangle inequality and employ it in an indexing structure to answer nearest neighbor queries. Finally, we present experimental results that validate the accuracy and efficiency of our approach.
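
A longest-common-subsequence similarity of the kind described in this abstract can be sketched with a standard dynamic program. The sketch below is a simplified illustration under stated assumptions: `eps` is an assumed spatial matching threshold, `delta` an assumed limit on how far apart in index two matched points may be (a stand-in for the permitted stretching in time), and the result is normalized by the shorter trajectory length. The global translation in space and the approximate algorithms mentioned in the abstract are not shown.

```python
def lcss_similarity(a, b, eps, delta):
    """LCSS-based similarity in [0, 1] between two 2-D trajectories.

    `a` and `b` are sequences of (x, y) tuples. `eps` and `delta` are
    assumed parameters: the spatial matching threshold and the maximum
    index offset allowed between matched points.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        return 0.0
    # dp[i][j] = LCSS length of the first i points of a and first j of b
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            close = (abs(a[i - 1][0] - b[j - 1][0]) < eps
                     and abs(a[i - 1][1] - b[j - 1][1]) < eps
                     and abs(i - j) <= delta)
            if close:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # Unmatched points are simply skipped, which is what
                # makes the measure tolerant of outliers.
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m] / min(n, m)


track = [(0, 0), (1, 1), (2, 2), (3, 3)]
noisy_track = [(0, 0.1), (50, 50), (2, 2.1), (3, 3.1)]  # second point is an outlier
similarity = lcss_similarity(track, noisy_track, 0.5, 1)
```

Here three of the four points still match despite the outlier, so the similarity stays high, whereas a Euclidean distance between the two tracks would be dominated by the single noisy point.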

A General Probabilistic Framework for Clustering Individuals and Objects

Article

Full-text available

Jul 2000

This paper presents a unifying probabilistic framework for clustering individuals or systems into groups when the available data measurements are not multivariate vectors of fixed dimensionality. For example, one might have data from a set of medical patients, where for each patient one has a set of observed time-series, each time-series of potentially different length and different sampling rate. We propose a general model-based probabilistic framework for clustering data types of this form which are non-vector in nature and may vary in size from individual to individual. The Expectation-Maximization (EM) procedure for clustering within this framework is discussed and we discuss how it can be applied in a general manner to clustering of sequences, time-series, trajectories, and other non-vector data. We show that a number of earlier algorithms can be viewed as special cases within this unifying framework. The paper concludes with several illustrations of the method, including clustering o...

Trajectory Clustering with Mixtures of Regression Models

Article

Full-text available

Jul 1999

In this paper we address the problem of clustering trajectories, namely sets of short sequences of data measured as a function of a dependent variable such as time. Examples include storm path trajectories, longitudinal data such as drug therapy response, functional expression data in computational biology, and movements of objects or individuals in video sequences. Our clustering algorithm is based on a principled method for probabilistic modelling of a set of trajectories as individual sequences of points generated from a finite mixture model consisting of regression model components. Unsupervised learning is carried out using maximum likelihood principles. Specifically, the EM algorithm is used to cope with the hidden data problem (i.e., the cluster memberships). We also develop generalizations of the method to handle non-parametric (kernel) regression components as well as multi-dimensional outputs. Simulation results comparing our method with other clustering methods such as K-means and Gaussian mixtures are presented as well as experimental results on real data sets.

A Framework for Generating Network-Based Moving Objects

Article

Full-text available

Jun 2000

Thomas Brinkhoff

Benchmarking spatiotemporal database systems requires the definition of suitable datasets simulating the typical behavior of moving objects. Previous approaches for generating spatiotemporal data do not consider that moving objects often follow a given network. Therefore, benchmarks require datasets consisting of such “network-based” moving objects. In this paper, the most important properties of network-based moving objects are presented and discussed. Essential aspects are the maximum speed and the maximum capacity of connections, the influence of other moving objects on the speed and the route of an object, the adequate determination of the start and destination of an object, the influence of external events, and time-scheduled traffic. These characteristics are the basis for the specification and development of a new generator for spatiotemporal data. This generator combines real data (the network) with user-defined properties of the resulting dataset. A framework is proposed where the user can control the behavior of the generator by re-defining the functionality of selected object classes. An experimental performance investigation demonstrates that the chosen approach is suitable for generating large data sets.

Learning and Inferring Transportation Routines

Article

Apr 2007
ARTIF INTELL

This paper introduces a hierarchical Markov model that can learn and infer a user's daily movements through an urban community. The model uses multiple levels of abstraction in order to bridge the gap between raw GPS sensor measurements and high level information such as a user's destination and mode of transportation. To achieve efficient inference, we apply Rao–Blackwellized particle filters at multiple levels of the model hierarchy. Locations such as bus stops and parking lots, where the user frequently changes mode of transportation, are learned from GPS data logs without manual labeling of training data. We experimentally demonstrate how to accurately detect novel behavior or user errors (e.g. taking a wrong bus) by explicitly modeling activities in the context of the user's historical data. Finally, we discuss an application called “Opportunity Knocks” that employs our techniques to help cognitively-impaired people use public transportation safely.

Fast mining of spatial collocations

Conference Paper

Aug 2004

Spatial collocation patterns associate the co-existence of non-spatial features in a spatial neighborhood. An example of such a pattern can associate contaminated water reservoirs with certain diseases in their spatial neighborhood. Previous work on discovering collocation patterns converts neighborhoods of feature instances to itemsets and applies mining techniques for transactional data to discover the patterns. We propose a method that combines the discovery of spatial neighborhoods with the mining process. Our technique is an extension of a spatial join algorithm that operates on multiple inputs and counts long pattern instances. As demonstrated by experimentation, it yields significant performance improvements compared to previous approaches.

A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise

Conference Paper

Jan 1996

A partial join approach for mining co-location patterns

Conference Paper

Nov 2004

Spatial co-location patterns represent the subsets of events whose instances are frequently located together in geographic space. We identified the computational bottleneck in the execution time of a current co-location mining algorithm. A large fraction of the join-based co-location miner algorithm is devoted to computing joins to identify instances of candidate co-location patterns. We propose a novel partial-join approach for mining co-location patterns efficiently. It transactionizes continuous spatial data while keeping track of the spatial information not modeled by transactions. It uses a transaction-based Apriori algorithm as a building block and adopts the instance join method for residual instances not identified in transactions. We show that the algorithm is correct and complete in finding all co-location rules which have prevalence and conditional probability above the given thresholds. An experimental evaluation using synthetic datasets and a real dataset shows that our algorithm is computationally more efficient than the join-based algorithm.

Indexing Objects Moving on Fixed Networks

Conference Paper

Jul 2003
Lect Notes Comput Sci

Elias Frentzos

The development of a spatiotemporal access method suitable for objects moving on fixed networks is a very attractive challenge due to the great number of real-world spatiotemporal database applications and fleet management systems dealing with this type of object. In this work, a new indexing technique, named Fixed Network R-Tree (FNR-Tree), is proposed for objects constrained to move on fixed networks in 2-dimensional space. The general idea that describes the FNR-Tree is a forest of 1-dimensional (1D) R-Trees on top of a 2-dimensional (2D) R-Tree. The 2D R-Tree is used to index the spatial data of the network (e.g. roads consisting of line segments), while the 1D R-Trees are used to index the time interval of each object's movement inside a given link of the network. The performance study, comparing this novel access method with the traditional R-Tree under various datasets and queries, shows that the FNR-Tree outperforms the R-Tree in most cases.

Efficient Mining of Spatiotemporal Patterns

Conference Paper

Jul 2001

The problem of mining spatiotemporal patterns is finding sequences of events that occur frequently in spatiotemporal datasets. Spatiotemporal datasets store the evolution of objects over time. Examples include sequences of sensor images of a geographical region, data that describes the location and movement of individual objects over time, or data that describes the evolution of natural phenomena, such as forest coverage. The discovered patterns are sequences of events that occur most frequently. In this paper, we present DFS_MINE, a new algorithm for fast mining of frequent spatiotemporal patterns in environmental data. DFS_MINE, as its name suggests, uses a Depth-First-Search-like approach to the problem which allows very fast discoveries of long sequential patterns. DFS_MINE performs database scans to discover frequent sequences rather than relying on information stored in main memory, which has the advantage that the amount of space required is minimal. Previous approaches utilize a Breadth-First-Search-like approach and are not efficient for discovering long frequent sequences. Moreover, they require storing in main memory all occurrences of each sequence in the database and, as a result, the amount of space needed is rather large. Experiments show that the I/O cost of the database scans is offset by the efficiency of the DFS-like approach that ensures fast discovery of long frequent patterns. DFS_MINE is also ideal for mining frequent spatiotemporal sequences with various spatial granularities. Spatial granularity refers to how fine or how general our view of the space we are examining is.

Discovering Spatial Colocation Patterns: A Summary of Results

Conference Paper

Jul 2001
Lect Notes Comput Sci

Moving Objects Databases

Book

Jan 2005

The current trends in consumer electronics, including the use of GPS-equipped PDAs, phones, and vehicles, as well as RFID-tag tracking and sensor networks, require database support for a specific flavor of spatio-temporal databases. These we call Moving Objects Databases. Why do you need this book? With current systems, most data management professionals are not able to smoothly integrate spatio-temporal data from moving objects, making data from, say, the path of a hurricane very difficult to model, design, and query. Whether your field is geology, national security, urban planning, mobile computing, or almost anything in between, this book's concepts and techniques will help you solve the data management problems associated with this kind of data. The book focuses on the modeling and design of data from moving objects (such as people, animals, vehicles, hurricanes, forest fires, oil spills, armies, or other objects), as well as the storage, retrieval, and querying of that very voluminous data. It demonstrates through many practical examples and illustrations how new concepts and techniques are used to integrate time and space in database applications. It provides exercises and solutions in each chapter to enable the reader to explore recent research results in practice.

Trajectory Clustering: A Partition-and-Group Framework

Article

Jun 2007

Existing trajectory clustering algorithms group similar trajectories as a whole, thus discovering common trajectories. Our key observation is that clustering trajectories as a whole could miss common sub-trajectories. Discovering common sub-trajectories is very useful in many applications, especially if we have regions of special interest for analysis. In this paper, we propose a new partition-and-group framework for clustering trajectories, which partitions a trajectory into a set of line segments, and then, groups similar line segments together into a cluster. The primary advantage of this framework is to discover common sub-trajectories from a trajectory database. Based on this partition-and-group framework, we develop a trajectory clustering algorithm TRACLUS. Our algorithm consists of two phases: partitioning and grouping. For the first phase, we present a formal trajectory partitioning algorithm using the minimum description length (MDL) principle. For the second phase, we present a density-based line-segment clustering algorithm. Experimental results demonstrate that TRACLUS correctly discovers common sub-trajectories from real trajectory data.

Travel destination prediction using frequent crossing pattern from driving history

Conference Paper

Oct 2005

The modeling of user behavior patterns for personalized information services in mobile environments has recently become a popular research theme. Most of the research aims at predicting the user's future behavior (and/or location) by extracting frequent patterns from the history of location data sequences. However, user behavior sometimes changes according to external information such as date, time, and weather, and we cannot accurately predict it based on the location data sequences alone. In this paper, we propose a new travel destination prediction method that includes day and time as external information. First, the user's travel history information, including the location, date, and time, is stored. Then, from the external information, time/day categories that correlate with the user's destination are determined based on entropy. Finally, using these categories, a destination that depends on the external information can be predicted. An application of the method to data collected from a car navigation system showed the potential for improved performance compared to conventional methods. Higher destination prediction accuracy during the first several minutes after the user's departure was reported.

Mining sequential patterns by pattern-growth: The PrefixSpan approach

Article

Dec 2004

Sequential pattern mining is an important data mining problem with broad applications. However, it is also a difficult problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Most of the previously developed sequential pattern mining methods, such as GSP, explore a candidate generation-and-test approach [R. Agrawal et al. (1994)] to reduce the number of candidates to be examined. However, this approach may not be efficient in mining large sequence databases having numerous patterns and/or long patterns. In this paper, we propose a projection-based, sequential pattern-growth approach for efficient mining of sequential patterns. In this approach, a sequence database is recursively projected into a set of smaller projected databases, and sequential patterns are grown in each projected database by exploring only locally frequent fragments. Based on an initial study of the pattern growth-based sequential pattern mining, FreeSpan [J. Han et al. (2000)], we propose a more efficient method, called PSP, which offers ordered growth and reduced projected databases. To further improve the performance, a pseudoprojection technique is developed in PrefixSpan. A comprehensive performance study shows that PrefixSpan, in most cases, outperforms the Apriori-based algorithm GSP, FreeSpan, and SPADE [M. Zaki (2001)] (a sequential pattern mining algorithm that adopts vertical data format), and PrefixSpan integrated with pseudoprojection is the fastest among all the tested algorithms. Furthermore, this mining methodology can be extended to mining sequential patterns with user-specified constraints. The high promise of the pattern-growth approach may lead to its further extension toward efficient mining of other kinds of frequent patterns, such as frequent substructures.
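
The recursive projection described in this abstract can be illustrated with a minimal Python sketch. It is a simplification rather than the authors' implementation: it assumes each sequence is a flat list of single items (no itemsets), and the function name `prefixspan` and its parameters are hypothetical. Projected databases are kept as (sequence index, start offset) pairs instead of copied suffixes, in the spirit of the pseudoprojection idea mentioned above.

```python
from typing import Dict, List, Sequence, Tuple


def prefixspan(sequences: Sequence[Sequence[str]],
               min_support: int) -> Dict[Tuple[str, ...], int]:
    """Return every frequent sequential pattern with its support.

    Simplified sketch: each sequence is a flat list of single items.
    """
    patterns: Dict[Tuple[str, ...], int] = {}

    def mine(prefix: List[str], projected: List[Tuple[int, int]]) -> None:
        # Count, per item, how many projected suffixes contain it.
        counts: Dict[str, int] = {}
        for sid, start in projected:
            for item in set(sequences[sid][start:]):
                counts[item] = counts.get(item, 0) + 1

        for item in sorted(counts):
            support = counts[item]
            if support < min_support:
                continue  # only locally frequent items extend the prefix
            new_prefix = prefix + [item]
            patterns[tuple(new_prefix)] = support

            # Pseudo-projection: store only the offset that follows the
            # first occurrence of `item` in each suffix.
            new_projected: List[Tuple[int, int]] = []
            for sid, start in projected:
                seq = sequences[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((sid, pos + 1))
                        break
            mine(new_prefix, new_projected)

    mine([], [(sid, 0) for sid in range(len(sequences))])
    return patterns


if __name__ == "__main__":
    db = [list("abc"), list("acb"), list("ab")]
    for pattern, support in prefixspan(db, min_support=2).items():
        print("".join(pattern), support)
```

Each recursive call scans only the suffixes that follow the current prefix, so the search space shrinks as patterns grow, which is the property the abstract attributes to the pattern-growth approach.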

Discovering Spatial Co-location Patterns: A Summary of Results

Article

Jan 2003

Shashi Shekhar

Given a collection of boolean spatial features, the co-location pattern discovery process finds the subsets of features frequently located together. For example, the analysis of an ecology dataset may reveal the frequent co-location of a fire ignition source feature with a needle vegetation type feature and a drought feature. The spatial co-location rule problem is different from the association rule problem. Even though boolean spatial feature types (also called spatial events) may correspond to items in association rules over market-basket datasets, there is no natural notion of transactions. This creates difficulty in using traditional measures (e.g. support, confidence) and applying association rule mining algorithms which use support-based pruning. We propose a notion of user-specified neighborhoods in place of transactions to specify groups of items. New interest measures for spatial co-location patterns are proposed which are robust in the face of potentially infinite overlapping neighborhoods. We also propose an algorithm to mine frequent spatial co-location patterns and analyze its correctness and completeness. We plan to carry out experimental evaluations and performance tuning in the near future.

Article

Dec 2001

We investigate techniques for analysis and retrieval of object trajectories in a two or three dimensional space. This kind of data usually contains a great amount of noise, which makes all previously used metrics fail. Therefore, here we formalize non-metric similarity functions based on the Longest Common Subsequence (LCSS), which are very robust to noise and furthermore provide an intuitive notion of similarity between trajectories by giving more weight to the similar portions of the sequences. Stretching of sequences in time is allowed, as well as global translating of the sequences in space. Efficient approximate algorithms that compute these similarity measures are also provided. We compare these new methods to the widely used Euclidean and Time Warping distance functions (for real and synthetic data) and show the superiority of our approach, especially under the strong presence of noise. We prove a weaker version of the triangle inequality and employ it in an indexing structure to answer nearest neighbor queries. Finally, we present experimental results that validate the accuracy and efficiency of our approach.
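
To make the LCSS idea concrete, the following Python sketch computes an LCSS-style similarity between two 2D trajectories with exact dynamic programming. It is illustrative only: the abstract does not give the matching rule, so the spatial tolerance `eps`, the time-stretch window `delta`, and the normalization by the shorter trajectory's length are assumptions of this sketch, as are the function names. The approximate algorithms and the spatial translation mentioned in the abstract are not covered.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def lcss_length(a: Sequence[Point], b: Sequence[Point],
                eps: float, delta: int) -> int:
    """Length of the longest common subsequence of two 2D trajectories.

    Assumed matching rule: two samples match when both coordinates
    differ by less than `eps` and their indices differ by at most `delta`.
    """
    n, m = len(a), len(b)
    table: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        ax, ay = a[i - 1]
        for j in range(1, m + 1):
            bx, by = b[j - 1]
            close_in_space = abs(ax - bx) < eps and abs(ay - by) < eps
            close_in_time = abs(i - j) <= delta
            if close_in_space and close_in_time:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def lcss_similarity(a: Sequence[Point], b: Sequence[Point],
                    eps: float, delta: int) -> float:
    """Normalize the LCSS length by the shorter trajectory (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return lcss_length(a, b, eps, delta) / min(len(a), len(b))


if __name__ == "__main__":
    clean = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
    noisy = [(0.0, 0.1), (1.0, 1.1), (50.0, 50.0), (3.0, 3.1)]
    print(lcss_similarity(clean, noisy, eps=0.5, delta=1))
```

In the usage example, the outlier sample at (50, 50) is simply left unmatched rather than inflating a distance, which illustrates the robustness to noise that the abstract claims for LCSS-based measures.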
