Graphical model presentation of dDTM. C = c0, ..., cn is the resulting topic chain.

Source publication

Modeling discrete dynamic topics

Conference Paper

Full-text available

Apr 2017

Topic modeling is an important area which aims at indexing and exploring massive data streams. In this paper we introduce a discrete Dynamic Topic Modeling (dDTM) algorithm, which is able to model a dynamic topic that is not necessarily present over all time slices in a stream of documents. Our proposed model has applications in modeling dynamic to...

Context 1

... to DTM, our model is also based on LDA [7]. Given a sequential dataset (i.e., a stream of documents) divided into different time slices, as shown in Figure 3, an LDA model is applied to compute the topics in each time slice. In our model we use LDA to extract topics and chain topics over different time slices together in order to compute the topic evolutions. ...


Context 2

... Model: Our model, illustrated in Figure 3, estimates the topical chains based on the multinomial distributions over words (i.e., topics) in a non-linear fashion as opposed to DTM where topics evolve linearly. ...
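As a rough illustration of what a non-linear topic chain can look like, the sketch below links one topic across time slices and lets the chain skip slices in which no sufficiently similar topic exists. It is not the procedure used in the paper: greedy cosine-similarity matching is a stand-in chosen for brevity, and the names `cosine`, `chain_topic`, and `threshold` as well as the toy word distributions are assumptions made for this example.

```python
import math

def cosine(p, q):
    """Cosine similarity between two equal-length word-probability vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def chain_topic(slices, start, threshold=0.8):
    """Follow one topic forward through time (hypothetical helper).

    slices: list of time slices, each a list of topic vectors.
    start:  (slice_index, topic_index) where the chain begins.
    Returns a list of (slice_index, topic_index) pairs. A slice with no
    topic at or above the threshold is skipped, so the chain may have gaps.
    """
    t0, k0 = start
    chain = [(t0, k0)]
    current = slices[t0][k0]
    for t in range(t0 + 1, len(slices)):
        best_k, best_sim = None, threshold
        for k, topic in enumerate(slices[t]):
            sim = cosine(current, topic)
            if sim >= best_sim:
                best_k, best_sim = k, sim
        if best_k is not None:
            chain.append((t, best_k))
            current = slices[t][best_k]
    return chain

# Toy data (assumed): three time slices over a three-word vocabulary.
slices = [
    [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]],
    [[0.1, 0.2, 0.7]],                    # the first topic is absent here
    [[0.6, 0.3, 0.1], [0.2, 0.1, 0.7]],
]
chain = chain_topic(slices, start=(0, 0))
print(chain)
```

In this toy run the chain links slice 0 directly to slice 2, which mirrors the idea of a topic that is not present in every time slice.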


Context 3

... Figure 3 shows the graphical model representation of dDTM. The vectors of the distributions over words (i.e., topics) of each time slice are given as input to the Baum-Welch algorithm. ...
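For readers unfamiliar with Baum-Welch, the following is a minimal sketch of one re-estimation step for a discrete hidden Markov model, written in plain Python. It only illustrates the general algorithm: the excerpt does not say how the topic vectors are encoded as observations, so the integer observation symbols, the two-state model, and all initial parameter values below are assumptions. The forward and backward passes are unscaled, which is only suitable for short sequences.

```python
def forward(obs, pi, A, B):
    """alpha[t][i] = P(o_1..o_t, state_t = i)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha

def backward(obs, A, B):
    """beta[t][i] = P(o_{t+1}..o_T | state_t = i)."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta

def baum_welch_step(obs, pi, A, B):
    """One EM update; also returns the likelihood under the input parameters."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    likelihood = sum(alpha[-1])
    # gamma[t][i]: posterior probability of being in state i at time t.
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    # xi[t][i][j]: posterior probability of the transition i -> j at time t.
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, likelihood

# Assumed toy setup: two hidden states, two observation symbols.
obs = [0, 0, 1, 0, 1, 1]
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.8, 0.2], [0.3, 0.7]]
likelihoods = []
for _ in range(5):
    pi, A, B, lik = baum_welch_step(obs, pi, A, B)
    likelihoods.append(lik)
print(likelihoods)
```

Each step should leave the likelihood of the observation sequence unchanged or higher, which is the property that makes repeated re-estimation usable for fitting the transition and emission probabilities.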


Reducing tail latencies in micro-batch streaming workloads

Conference Paper

Full-text available

Sep 2017

Spark Streaming discretizes streams of data into micro-batches, each of which is further sub-divided into tasks and processed in parallel to improve job throughput. Previous work [2, 3] has lowered end-to-end latency in Spark Streaming. However, two causes of high tail latencies remain unaddressed: 1) data is not load-balanced across tasks, and 2)...

Figure 5. A sampling example of the operation procedure in KSample when...

Figure 6. Comparison of uniformity confidence between KSample and...

Figure 7. (a) Operation procedure of KSample for p = 0.3 and nine...

Figure 9. Operation procedure of (a) KSample and (b) UCKSample for p =...

Figure 10. Working procedure of UC-KSample when the sample size...

Variable size sampling to support high uniformity confidence in sensor data streams

Article

Full-text available

Apr 2018

In order to rapidly process large amounts of sensor stream data, it is effective to extract and use samples that reflect the characteristics and patterns of the data stream well. In this article, we focus on improving the uniformity confidence of KSample, which has the characteristics of random sampling in the stream environment. For this, we first...

Change Detection in Multidimensional Data Streams with Efficient Tensor Subspace Model

Chapter

Full-text available

Jun 2018

Boguslaw Cyganek

The paper presents a method for change detection in multidimensional streams of data based on a tensor model constructed from the Higher-Order Singular Value Decomposition of raw data tensors. The method was applied to the problem of video shot detection showing good accuracy and high speed of execution compared with other more time demanding tenso...

ATEM: A Topic Evolution Model for the Detection of Emerging Topics in Scientific Archives

Chapter

Full-text available

Feb 2024

This paper presents ATEM, a novel framework for studying topic evolution in scientific archives. ATEM employs dynamic topic modeling and dynamic graph embedding to explore the dynamics of content and citations within a scientific corpus. ATEM explores a new notion of citation context that uncovers emerging topics by analyzing the dynamics of citation links between evolving topics. Our experiments demonstrate that ATEM can efficiently detect emerging cross-disciplinary topics within the DBLP archive of over five million computer science articles.

ATEM: A Topic Evolution Model for the Detection of Emerging Topics in Scientific Archives

Preprint

Jun 2023

This paper presents ATEM, a novel framework for studying topic evolution in scientific archives. ATEM is based on dynamic topic modeling and dynamic graph embedding techniques that explore the dynamics of content and citations of documents within a scientific corpus. ATEM explores a new notion of contextual emergence for the discovery of emerging interdisciplinary research topics based on the dynamics of citation links in topic clusters. Our experiments show that ATEM can efficiently detect emerging cross-disciplinary topics within the DBLP archive of over five million computer science articles.

On the modeling of cyber-attacks associated with social engineering: A parental control prototype

Article

Jun 2023

ANTM: An Aligned Neural Topic Model for Exploring Evolving Topics

Preprint

Feb 2023

As the amount of text data generated by humans and machines increases, the necessity of understanding large corpora and finding a way to extract insights from them is becoming more crucial than ever. Dynamic topic models are effective methods that primarily focus on studying the evolution of topics present in a collection of documents. These models are widely used for understanding trends, exploring public opinion in social networks, or tracking research progress and discoveries in scientific archives. Since topics are defined as clusters of semantically similar documents, it is necessary to observe the changes in the content or themes of these clusters in order to understand how topics evolve as new knowledge is discovered over time. In this paper, we introduce the Aligned Neural Topic Model (ANTM), a dynamic neural topic model that uses document embeddings to compute clusters of semantically similar documents at different periods and to align document clusters to represent their evolution. This alignment procedure preserves the temporal similarity of document clusters over time and captures the semantic change of words characterized by their context within different periods. Experiments on four different datasets show that ANTM outperforms probabilistic dynamic topic models (e.g. DTM, DETM) and significantly improves topic coherence and diversity over other existing dynamic neural topic models (e.g. BERTopic).

A Survey of Computational Methods for Online Mental State Assessment on Social Media

Article

Full-text available

Mar 2021

Mental state assessment by analysing user-generated content is a field that has recently attracted considerable attention. Today, many people are increasingly utilising online social media platforms to share their feelings and moods. This provides a unique opportunity for researchers and health practitioners to proactively identify linguistic markers or patterns that correlate with mental disorders such as depression, schizophrenia or suicide behaviour. This survey describes and reviews the approaches that have been proposed for mental state assessment and identification of disorders using online digital records. The presented studies are organised according to the assessment technology and the feature extraction process conducted. We also present a series of studies which explore different aspects of the language and behaviour of individuals suffering from mental disorders, and discuss various aspects related to the development of experimental frameworks. Furthermore, ethical considerations regarding the treatment of individuals’ data are outlined. The main contributions of this survey are a comprehensive analysis of the proposed approaches for online mental state assessment on social media, a structured categorisation of the methods according to their design principles, lessons learnt over the years and a discussion on possible avenues for future research.

A Multi-Source Collection of Event-Labeled News Documents

Conference Paper

Full-text available

Sep 2019

In this paper, we present a collection of news documents labeled at the level of crisp events. Compared to other publicly-available collections, our dataset is made of heterogeneous documents published by popular news channels on different platforms in the same temporal window and, therefore, dealing with roughly the same events and topics. The collection spans 4 months and comprises 147K news documents from 27 news streams, i.e., 9 different channels and 3 platforms: Twitter, RSS portals, and news websites. We also provide relevance labels of news documents for some selected events. These relevance judgments were collected using crowdsourcing. The collection can be useful to researchers investigating challenging news-mining tasks, such as event detection and tracking, multi-stream analysis, and temporal analysis of news publishing patterns.

Augmentation of Human Memory: Anticipating Topics that Continue in the Next Meeting

Conference Paper

Full-text available

Mar 2018

Memory augmentation is the process of providing human memory with information that facilitates and complements the recall of an event in a person's past. Recently, there has been a lot of attention on processing the content of meetings for later reuse, such as reviewing a meeting for supporting failing memories, keeping in mind key issues, verification, etc. That is due to the fact that meetings are essential for sharing knowledge in organizations. In this paper, we propose four novel time-series methods for predicting the topics that one should review in preparation for a next meeting. The predicted/recommended topics can be reviewed by a user as a memory augmentation process to facilitate recall of key points of a previous meeting. With the growing number of meetings at an organization that one may attend weekly and with the growing number of topics discussed, forgetting past meetings becomes imminent, hence recommending certain topics to the user in order to prepare the user for a future meeting is beneficial and important. Our experimental results on real-world data demonstrate that our methods significantly outperform a state-of-the-art Hidden Markov Model baseline. This indicates the efficacy of our proposed methods for modeling semantics in temporal data.

Predicting Topics in Scholarly Papers

Chapter

Full-text available

Mar 2018

Linking News across Multiple Streams for Timeliness Analysis

Conference Paper

Nov 2017

Linking multiple news streams based on the reported events and analyzing the streams' temporal publishing patterns are two very important tasks for information analysis, discovering newsworthy stories, studying the event evolution, and detecting untrustworthy sources of information. In this paper, we propose techniques for cross-linking news streams based on the reported events with the purpose of analyzing the temporal dependencies among streams. Our research tackles two main issues: (1) how news streams are connected as reporting an event or the evolution of the same event and (2) how timely the newswires report related events using different publishing platforms. Our approach is based on dynamic topic modeling for detecting and tracking events over the timeline and on clustering news according to the events. We leverage the event-based clustering to link news across different streams and present two scoring functions for ranking the streams based on their timeliness in publishing news about a specific event.

Event mining and timeliness analysis from heterogeneous news streams

Article

May 2019
INFORM PROCESS MANAG

News documents published online represent an important source of information that can be used for event detection and tracking as well as for analyzing the temporal publishing relationships among different news streams. In this paper, we describe our research on detecting, tracking, and predicting events from multiple news streams. We also analyze the temporal publishing patterns of newswires on different platforms and their timeliness in reporting the events. First, we present an approach based on discrete dynamic topic modeling and Hidden Markov Model for event detection and tracking. Then, we predict the events that would persist in the next time slice, which can be important for forecasting facts that would be popular in the future. We leverage the detected events for clustering news documents according to the events they describe. This allows us to determine which newswires published news about an event and to analyze their temporal ordering in reporting events. Finally, we propose two scoring functions for ranking the newswires based on their timeliness. We tested our methodologies on different collections of news articles and tweets. Moreover, we built a collection of heterogeneous news documents with event-document labels which were manually assessed using crowdsourcing. Experimental results showed that, compared to the traditional dynamic topic model, our approach is able to detect emerging topics (events) in a timely manner. Overall, we could register an event coverage of about 90% w.r.t. the pool of labeled events. The evolution of events is captured by event chains which are highly coherent (0.76) and informative (0.60), making it possible to reconstruct the stories effectively. Furthermore, the event-based clustering of news documents has a good trade-off of precision and recall (F-score = 0.83) and the topic keywords provide a semantic description of the events represented by the clusters.
Concerning our analysis of the temporal publishing relationships among news streams, we could observe interesting patterns in the usage of the different platforms; for example, some newswires still favor their own official websites, while others tend to publish in a more timely fashion on Twitter.
