Figure - available from: Journal of Big Data
Event detection results at time-stamp t = 2017-01-15 7pm: (a) constructed quad-tree using tweets in the interval [t-T : t), where T = 3 days; (b) flagged events with Poisson signal; and (c) distribution of Poisson signals for all nodes.

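The caption describes the system's core test: each quad-tree node's tweet count in the sliding window [t-T : t) is compared against a Poisson model of that node's history, and nodes with improbably high counts are flagged. A minimal sketch of such a Poisson burst test follows; the historical-mean rate estimate and the significance threshold are illustrative assumptions, not the paper's exact formulation.

```python
from scipy.stats import poisson

def poisson_burst_signal(observed_count, historical_counts, alpha=0.001):
    """Flag a quad-tree node whose tweet count in the current window
    [t-T : t) is improbably high under a Poisson model of its history.

    observed_count: tweets falling in the node during the current window
    historical_counts: tweet counts for the same node in past windows
    alpha: significance threshold (an assumption, not from the paper)
    """
    # Estimate the node's baseline rate from past windows
    lam = max(sum(historical_counts) / len(historical_counts), 1e-9)
    # Tail probability P(X >= observed_count) under Poisson(lam)
    p_value = poisson.sf(observed_count - 1, lam)
    return p_value < alpha, p_value

# A node that usually sees ~3 tweets per 3-day window suddenly sees 20
flagged, p = poisson_burst_signal(20, [2, 4, 3, 3])
print(flagged, p)  # True, with p far below 0.001
```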

Source publication
Article
Full-text available
A key challenge in mining social media data streams is to identify events which are actively discussed by a group of people in a specific local or global area. Such events are useful for early warning for accident, protest, election or breaking news. However, neither the list of events nor the resolution of both event time and space is fixed or kno...

Similar publications

Preprint
Full-text available
A key challenge in mining social media data streams is to identify events which are actively discussed by a group of people in a specific local or global area. Such events are useful for early warning for accident, protest, election or breaking news. However, neither the list of events nor the resolution of both event time and space is fixed or kno...

Citations

... Spatiotemporal pattern mining or event detection (Yu et al. 2020) can detect changes in air pollution levels by extracting meaningful patterns from accumulated datasets (Guralnik and Srivastava 1999). Event detection is a data mining method used in many applications such as social media, sensor networks, urban traffic, and video streams (George et al. 2021; Guille and Favre 2015; Souto and Liebig 2016; Medioni et al. 2001). Spatial and temporal dimensions are critical for event detection (Kisilevich et al. 2010) and extracting patterns. ...
Article
Full-text available
In recent years, our world has experienced significant disruptions due to the COVID-19 pandemic and Russia's 2022 invasion of Ukraine, impacting human activities and the global environment. This paper explored air quality changes in Ukraine due to COVID-19 and Russia's invasion of Ukraine using an on-demand, what-you-see-is-what-you-get approach. During the COVID-19 pandemic, strict quarantine policies in Ukraine led to a 2% reduction in tropospheric NO2 concentration before the lockdown and 4% during the lockdown period. Cities like Kyiv, Donetsk, and Dnipro exhibited reductions of 5%, 11%, and 16%, respectively. Total SO2 column concentration decreased by 6% before the lockdown and 2.5% during the lockdown period, except in high population density areas. Kyiv showed the highest reduction in SO2 concentration at 17%, while Donetsk and Dnipro exhibited an 11% reduction. However, during the Russian invasion, there was a significant increase in tropospheric NO2 concentration in heavily destroyed Kharkiv, while most eastern regions experienced a reduction. The total SO2 column was 48% higher before the war but decreased throughout the country after the war began, except in Kyiv and a few central regions. These findings can contribute to analyzing air pollution and building digital twin simulations for future reconstruction scenarios.
... They also suggest a new quality metric called the strength index, which automatically determines the accuracy of the reported event. To build an intrusion detection platform and gather massive amounts of data for intrusion detection, Hye-Min Lee and Sang-Joon Lee [25] recommended using deep learning and convolutional neural networks (CNNs). By collecting and analyzing user visit logs and linking them to big data, they develop an intelligent big data platform for data gathering. ...
Article
Event detection plays an important role in modern society, and it is a popular computational task that permits events to be detected automatically. Big data is especially useful for event detection due to its large volume. Multimodal event detection detects events using heterogeneous types of data. This work aims to classify diverse events using an optimized ensemble learning approach. The multi-modal event data, including text, image and audio, are sent to user devices from a cloud or server, where three models are generated for processing audio, text and image. First, the text, image and audio data are processed separately. Creating the text model involves pre-processing using imputation of missing values and data normalization, textual feature extraction using an integrated N-gram approach, and generation of the text model using a convolutional two-directional LSTM (2DCon_LSTM). Image model generation involves pre-processing using Min-Max Gaussian filtering (MMGF), image feature extraction using a VGG-16 network model, and generation of the image model using a tweaked auto-encoder (TAE). Audio model generation involves pre-processing using the discrete wavelet transform (DWT), audio feature extraction using the Hilbert-Huang transform (HHT), and generation of the audio model using an attention-based convolutional capsule network (Attn_CCNet). The features obtained from the text, image and audio models are fused by a feature ensemble approach. From the fused feature vector, the optimal features are selected through an improved battle royal optimization (IBRO) algorithm. A deep learning model called convolutional duo gated recurrent unit with auto-encoder (C-Duo GRU_AE) is used as the classifier. Finally, different types of events are classified, and the global model is sent to the user devices with high security, offering better decision making. The proposed methodology achieves accuracy of 99.93%, F1-score of 99.91%, precision of 99.93%, recall of 99.93%, processing time of 17 seconds and training time of 0.05 seconds. This performance exceeds several comparable methodologies in precision, recall, accuracy, F1-score, training time, and processing time, indicating that the proposed methodology performs better than the compared schemes. In addition, the proposed scheme detects multi-modal events accurately.
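The abstract's "feature ensemble approach" amounts to fusing the per-modality feature vectors into one vector before feature selection and classification. Below is a generic late-fusion sketch; the normalization-plus-concatenation scheme is an assumption, and the paper's actual models (2DCon_LSTM, TAE, Attn_CCNet) are not reproduced.

```python
import numpy as np

def fuse_features(text_feat, image_feat, audio_feat):
    """Generic late fusion: L2-normalize each modality's feature vector
    so no modality dominates by scale, then concatenate them into a
    single fused vector for downstream selection and classification."""
    def l2(v):
        v = np.asarray(v, dtype=float)
        n = np.linalg.norm(v)
        return v / n if n > 0 else v
    return np.concatenate([l2(text_feat), l2(image_feat), l2(audio_feat)])

# Three toy modality embeddings fused into one 9-dimensional vector
fused = fuse_features([1.0, 2.0, 2.0], [0.0, 3.0, 4.0], [1.0, 0.0, 0.0])
print(fused.shape)  # (9,)
```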
... This makes social media a viable data source for user-generated content, particularly during a disaster when traditional surveys are inconvenient. Many studies have demonstrated the effectiveness of social media data in disaster management, particularly in crisis communication, human mobility, damage valuation, and event detection, among others (16,(24)(25)(26)(27)(28)(29)(30)(31)(32)(33)(34)(35)(36). However, very few studies have looked at the social network connectivity of the local people and network-level properties of the local communities along with their crisis narrative analysis during a compound hazard like Hurricane Laura amid the COVID-19 pandemic. ...
Article
Full-text available
Online social networks allow different agencies and the public to interact and share the underlying risks and protective actions during major disasters. This study revealed such crisis communication patterns during Hurricane Laura, compounded by the COVID-19 pandemic. Hurricane Laura was one of the strongest (Category 4) hurricanes on record to make landfall in Cameron, Louisiana, U.S. Using an application programming interface (API), this study utilizes large-scale social media data obtained from Twitter through the recently released academic track that provides complete and unbiased observations. The data captured publicly available tweets shared by active Twitter users from the vulnerable areas threatened by Hurricane Laura. Online social networks were based on Twitter's user influence feature (i.e., mentions or tags) that allows notification of other users while posting a tweet. Using network science theories and advanced community detection algorithms, the study split these networks into 21 components of various sizes, the largest of which contained eight well-defined communities. Several natural language processing techniques (i.e., word clouds, bigrams, topic modeling) were applied to the tweets shared by the users in these communities to observe their risk-taking or risk-averse behavior during a major compounding crisis. Social media accounts of local news media, radio, universities, and popular sports pages were among those that were heavily involved and interacted closely with local residents. In contrast, emergency management and planning units in the area engaged less with the public. The findings of this study provide novel insights into the design of efficient social media communication guidelines to respond better in future disasters.
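The study's pipeline (build a mention network, split it into connected components, detect communities within them) can be illustrated in a few lines of networkx. The edges below are hypothetical, and greedy modularity maximization is a stand-in for whichever community detection algorithm the authors actually used.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical (author, mentioned_user) pairs parsed from tweets
mention_edges = [("alice", "wx_radio"), ("bob", "wx_radio"),
                 ("bob", "local_news"), ("carol", "local_news"),
                 ("dave", "sports_page"), ("erin", "sports_page")]

G = nx.Graph()
G.add_edges_from(mention_edges)

# Split the mention network into connected components, as in the study
components = [G.subgraph(c).copy() for c in nx.connected_components(G)]

# Detect communities inside the largest component (the study found
# eight well-defined ones; this toy graph yields far fewer)
largest = max(components, key=len)
for i, community in enumerate(greedy_modularity_communities(largest)):
    print(f"community {i}: {sorted(community)}")
```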
... The alternative lies in evaluations such as George et al. (2021)'s. George et al. (2021)'s evaluation only includes two baselines, but the lengthy experiments to tweak the parameter-laden algorithms, including the authors' own, constrained the evaluation to a small sample of the data. ...
... The alternative lies in evaluations such as George et al. (2021)'s. George et al. (2021)'s evaluation only includes two baselines, but the lengthy experiments to tweak the parameter-laden algorithms, including the authors' own, constrained the evaluation to a small sample of the data. Sacrifices like George et al. (2021)'s appear commonly. ...
... George et al. (2021)'s evaluation only includes two baselines, but the lengthy experiments to tweak the parameter-laden algorithms, including the authors' own, constrained the evaluation to a small sample of the data. Sacrifices like George et al. (2021)'s appear commonly. The 20 manual evaluations that provided dataset statistics in our review averaged just 5.95 corpora. ...
Article
Full-text available
Event tracking literature based on Twitter does not have a state-of-the-art. What it does have is a plethora of manual evaluation methodologies and inventive automatic alternatives: incomparable and irreproducible studies incongruous with the idea of a state-of-the-art. Many researchers blame Twitter's data sharing policy for the lack of common datasets and a universal ground truth–for the lack of reproducibility–but many other issues stem from the conscious decisions of those same researchers. In this paper, we present the most comprehensive review yet on event tracking literature's evaluations on Twitter. We explore the challenges of manual experiments, the insufficiencies of automatic analyses and the misguided notions on reproducibility. Crucially, we discredit the widely-held belief that reusing tweet datasets could induce reproducibility. We reveal how tweet datasets self-sanitize over time; how spam and noise become unavailable at much higher rates than legitimate content, rendering downloaded datasets incomparable with the original. Nevertheless, we argue that Twitter's policy can be a hindrance without being an insurmountable barrier, and propose how the research community can make its evaluations more reproducible. A state-of-the-art remains attainable for event tracking research.
... A spatio-temporal event is defined as an event that occurs at a specific time and location of interest to stakeholders [6,7]. Natural disasters, crimes, and business events that occurred at a specific time and location are examples of spatio-temporal events. ...
Article
Full-text available
As the scale of online news and social media expands, attempts to analyze the latest social issues and consumer trends are increasing. Research on detecting spatio-temporal event sentences in text data is being actively conducted. However, a document contains not only the important spatio-temporal events necessary for event analysis but also events that are non-critical for it. It is important to increase the accuracy of event analysis by extracting only the key events from among the many events in a document. In this study, we define 'representative spatio-temporal event documents', which capture the core subject of a document, and propose a BiLSTM-based document classification model to classify them. We build a gold-standard training dataset of 10,000 documents to train the proposed BiLSTM model. The experimental results show that our BiLSTM model improves the F1 score by 2.6% and the accuracy by 4.5% compared to the baseline CNN model.
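A minimal PyTorch sketch of the kind of BiLSTM document classifier the abstract describes: tokens are embedded, a bidirectional LSTM encodes the document, and the concatenated final forward and backward hidden states are classified. All hyperparameters here are illustrative assumptions, not the paper's.

```python
import torch
import torch.nn as nn

class BiLSTMDocClassifier(nn.Module):
    """Embed tokens, run a bidirectional LSTM, and classify from the
    concatenated final forward and backward hidden states."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=64, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):               # token_ids: (batch, seq_len)
        x = self.embedding(token_ids)
        _, (h_n, _) = self.lstm(x)               # h_n: (2, batch, hidden_dim)
        h = torch.cat([h_n[0], h_n[1]], dim=1)   # forward + backward states
        return self.fc(h)                        # logits: (batch, num_classes)

model = BiLSTMDocClassifier(vocab_size=10_000)
logits = model(torch.randint(1, 10_000, (4, 32)))  # 4 docs, 32 tokens each
print(logits.shape)  # torch.Size([4, 2])
```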
... Yang et al. [20] proposed a burst event detection method for social networks based on topological features, mining the network's structural features. George et al. [21] proposed an unsupervised online spatiotemporal event detection system that can detect events in social media data at different spatiotemporal resolutions in real time. However, current burst event detection mainly focuses on a single social media platform, using only Twitter or Weibo data without incorporating information published on other platforms. ...
Preprint
Full-text available
With the frequent occurrence of public emergencies around the world today, how to effectively use big data and artificial intelligence technologies to accurately and efficiently detect and identify burst events on the Internet has become a hot issue. Existing burst event detection methods fail to comprehensively consider the multiple social media data sources and their influence, which leads to lower accuracy. This paper proposes a novel burst event detection model based on cross-social-media influence and unsupervised clustering. In this article, we explain the basic framework of burst event detection, along with the characteristics of social media influence, the word frequency features and the growth rate features. In our proposed approach, according to the time information in the data stream, social media network data are sliced and the burst word features in each time window are calculated. Then, the three burst features are fused to compute the burst degree of words, after which the words whose burst degree exceeds a threshold are selected to form the burst word set. Finally, the agglomerative hierarchical clustering method is introduced to cluster the burst word set and extract the burst events from it. The results of an experiment on a real-world social media dataset show that the detection method significantly improves Precision and F1-score compared with the latest four burst event detection methods, proving the effectiveness of the proposed method.
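The described pipeline (per-window burst word features fused into a burst degree, thresholding, then agglomerative clustering of the burst word set into events) can be sketched as follows. The paper fuses three features including cross-media influence; this sketch fuses only word frequency and growth rate, and the fusion rule, threshold, and co-occurrence distance are assumptions.

```python
from collections import Counter
from itertools import combinations
from scipy.cluster.hierarchy import fcluster, linkage

def burst_words(prev_window, curr_window, threshold=0.5):
    """Fuse a frequency feature and a growth-rate feature into a burst
    degree per word, then keep words above the threshold. (The paper's
    third feature, cross-media influence, is omitted here.)"""
    prev, curr = Counter(prev_window), Counter(curr_window)
    scores = {}
    for word, count in curr.items():
        freq = count / len(curr_window)
        growth = count / (prev.get(word, 0) + 1)  # smoothed growth rate
        scores[word] = freq * growth              # fused burst degree
    return [w for w, s in scores.items() if s >= threshold]

def cluster_burst_words(words, docs, num_events):
    """Agglomerative hierarchical clustering of burst words, using
    1 - co-occurrence rate across documents as the distance."""
    def dist(a, b):
        both = sum(1 for d in docs if a in d and b in d)
        return 1.0 - both / len(docs)
    condensed = [dist(a, b) for a, b in combinations(words, 2)]
    labels = fcluster(linkage(condensed, method="average"),
                      t=num_events, criterion="maxclust")
    return dict(zip(words, labels))

print(burst_words(["traffic", "rain"],
                  ["quake", "quake", "tremor", "help", "rain"]))  # ['quake']
docs = [["quake", "tremor"], ["quake", "tremor", "help"],
        ["goal", "match"], ["goal", "match", "fans"]]
print(cluster_burst_words(["quake", "tremor", "goal", "match"], docs, 2))
```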
... Cao et al., 2015; X. Zheng, J. Han, and A. Sun, 2018), event detection (George et al., 2021), among others. In contrast to the previous topic quality metrics (TC and PMI), these metrics allow us to evaluate how relevant and accurate the detected topics are compared with the ground truth topics. ...
Preprint
Capturing the similarities between human language units is crucial for explaining how humans associate different objects, and therefore its computation has received extensive attention, research, and application. With the ever-increasing amount of information around us, calculating similarity becomes increasingly complex; in many cases, such as legal or medical affairs, measuring similarity requires extra care and precision, as small acts within a language unit can have significant real-world effects. My research goal in this thesis is to develop regression models that account for similarities between language units in a more refined way. Computation of similarity has come a long way, but approaches to debugging the measures are often based on continually fitting human judgment values. To this end, my goal is to develop an algorithm that precisely catches loopholes in a similarity calculation. Furthermore, most methods have vague definitions of the similarities they compute and are often difficult to interpret. The proposed framework addresses both shortcomings: it constantly improves the model by catching different loopholes, and every refinement of the model provides a reasonable explanation. The regression model introduced in this thesis is called progressively refined similarity computation, which combines attack testing with adversarial training. The similarity regression model of this thesis achieves state-of-the-art performance in handling edge cases.
... This makes social media a viable data source for user-generated content, particularly during a disaster when traditional surveys are inconvenient. Many studies have demonstrated the effectiveness of social media data in disaster management, particularly in crisis communication (16,(24)(25)(26)(27)(28)(29), human mobility (30)(31)(32)(33), damage valuation (34), and event detection (35,36), among others. However, very few studies have looked at the social network connectivity of the local people and network-level properties of the local communities along with their crisis narrative analysis during a compound hazard like Hurricane Laura amid the COVID-19 pandemic. ...
Preprint
Full-text available
Online social networks allow different agencies and the public to interact and share the underlying risks and protective actions during major disasters. This study revealed such crisis communication patterns during Hurricane Laura, compounded by the COVID-19 pandemic. Laura was one of the strongest (Category 4) hurricanes on record to make landfall in Cameron, Louisiana. Using the Application Programming Interface (API), this study utilizes large-scale social media data obtained from Twitter through the recently released academic track that provides complete and unbiased observations. The data captured publicly available tweets shared by active Twitter users from the vulnerable areas threatened by Laura. Online social networks were based on the user influence feature (mentions or tags) that allows notifying other users while posting a tweet. Using network science theories and advanced community detection algorithms, the study split these networks into twenty-one components of various sizes, the largest of which contained eight well-defined communities. Several natural language processing techniques (i.e., word clouds, bigrams, topic modeling) were applied to the tweets shared by the users in these communities to observe their risk-taking or risk-averse behavior during a major compounding crisis. Social media accounts of local news media, radio, universities, and popular sports pages were among those that were heavily involved and interacted closely with local residents. In contrast, emergency management and planning units in the area engaged less with the public. The findings of this study provide novel insights into the design of efficient social media communication guidelines to respond better in future disasters.
... In the paper [26], random forest and gradient boosting classifiers are analyzed, achieving precision rates of 81 and 79, respectively. Similarly, other methods such as PGM, SVM and DT are analyzed in [27], and a clustering method in [28]. ...
... In the paper [26], random forest and gradient boosting classifiers are analyzed, achieving precision rates of 81 and 79, respectively. Similarly, other methods such as PGM, SVM and DT are analyzed in [27], and a clustering method in [28]. Comparing the methods, the lowest accuracy, 72.5, is achieved by DT, while the highest accuracy, 91.032, is achieved by the proposed method. ...
Article
Full-text available
Nowadays, event forecasting on Twitter can be considered an essential, significant and difficult problem. Most conventional methods focus on temporal events like sports or elections and do not consider spatial features or their correlation. Hence, this paper proposes an improved Deep Belief Neural Network (iDBNN) for civil unrest event forecasting on Twitter data. The proposed method is used to forecast future events from tweets. It is designed with three phases: a pre-processing phase, a feature extraction phase, and civil unrest event forecasting. Initially, the proposed method is trained on the 2019 Hong Kong protest tweet data for forecasting events. In the pre-processing phase, removal of special symbols, removal of URLs, username removal, tokenization and stop-word removal are performed. After that, essential features such as domain weight, event weight, textual similarity, spatial similarity, temporal similarity, and Relative Document-Term Frequency Difference (RDTFD) are extracted and used for training the proposed model. To strengthen the training phase of the proposed iDBNN, the Jellyfish Algorithm is utilized to select optimal weight parameter coefficients of the DBNN. The proposed technique is validated by statistical measures and compared with conventional methods such as the Hidden Markov Model (HMM) and Random Forest (RF). Compared with other traditional methods, the proposed model shows better performance in terms of prediction and processing time. The iDBNN model shows 91% prediction accuracy, which is much higher than the traditional DBNN.
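The pre-processing phase the abstract lists (special-symbol removal, URL removal, username removal, tokenization, stop-word removal) is straightforward to sketch; the regular expressions and the tiny stop-word list below are illustrative assumptions.

```python
import re

# A tiny illustrative stop list; a real one would be much larger
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "in", "and"}

def preprocess_tweet(text):
    """Apply the listed pre-processing steps: URL removal, username
    removal, special-symbol removal, tokenization, stop-word removal."""
    text = re.sub(r"https?://\S+", " ", text)    # remove URLs
    text = re.sub(r"@\w+", " ", text)            # remove @usernames
    text = re.sub(r"[^a-zA-Z0-9\s]", " ", text)  # remove special symbols
    tokens = text.lower().split()                # tokenization
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess_tweet("@user Protest at Central Square! https://t.co/xyz"))
# ['protest', 'at', 'central', 'square']
```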
... With the rapid emergence of social media, social data with geotags (locations) open up new possibilities for many significant applications, such as location-based services ranging from targeted advertising to crisis detection (George et al. 2021). However, the low ratio of geotagged social data makes the pursuit of the aforementioned applications challenging. ...
Article
Geographical information provided in social media data is useful for many valuable applications. However, only a small proportion of social media posts are explicitly geotagged with their posting locations, which makes the pursuit of these applications challenging. Motivated by this, we propose a 2-level hierarchical classification method that builds upon a BERT model, coupled with textual information and temporal context, which we denote HierBERT. As far as we are aware, this work is the first to utilize a 2-level hierarchical classification approach alongside BERT and temporal information for geolocation prediction. Experimental results based on two social media datasets show that HierBERT outperforms various state-of-the-art baselines in terms of accuracy and distance error metrics.
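The 2-level hierarchical idea is easiest to see as a dispatch: one coarse classifier predicts a region, then a region-specific classifier predicts the city. This sketch substitutes TF-IDF features and logistic regression for HierBERT's BERT encoder and temporal context, and all texts, regions, and cities are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical (text, region, city) posts; repeated so each tiny class
# has a few training samples
train = [("snow on the harbour bridge", "AU", "Sydney"),
         ("trams are late again", "AU", "Melbourne"),
         ("tube strike downtown", "UK", "London"),
         ("rainy day by the mersey", "UK", "Liverpool")] * 5

vec = TfidfVectorizer().fit([t for t, _, _ in train])
X = vec.transform([t for t, _, _ in train])

# Level 1: coarse region classifier over all posts
coarse = LogisticRegression().fit(X, [r for _, r, _ in train])

# Level 2: one fine-grained city classifier per region
fine = {}
for region in {"AU", "UK"}:
    rows = [i for i, (_, r, _) in enumerate(train) if r == region]
    fine[region] = LogisticRegression().fit(X[rows],
                                            [train[i][2] for i in rows])

def predict_city(text):
    x = vec.transform([text])
    region = coarse.predict(x)[0]              # level 1: coarse region
    return region, fine[region].predict(x)[0]  # level 2: fine city

print(predict_city("big tube delays downtown today"))  # likely ('UK', 'London')
```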