Figure 3 - uploaded by Seyed Ali Bahrainian
Graphical model representation of dDTM. C = c0, ..., cn is the resulting topic chain.

Source publication
Conference Paper
Topic modeling is an important area which aims at indexing and exploring massive data streams. In this paper we introduce a discrete Dynamic Topic Modeling (dDTM) algorithm, which is able to model a dynamic topic that is not necessarily present over all time slices in a stream of documents. Our proposed model has applications in modeling dynamic to...

Contexts in source publication

Context 1
... to DTM, our model is also based on LDA [7]. Given a sequential dataset (i.e., stream of documents) divided into different time slices, as shown in Figure 3, LDA model is applied for computing topics in each time slice. In our model we use LDA for extracting topics and chaining topics over different time slices together in order to compute the topic evolutions. ...
Context 2
... Model: Our model, illustrated in Figure 3, estimates the topical chains based on the multinomial distributions over words (i.e., topics) in a non-linear fashion as opposed to DTM where topics evolve linearly. ...
Context 3
... Figure 3 show the graphical model representation of dDTM. The vectors of the distributions over words (i.e., topics) of each time slice are given as input to the Baum-Welch algorithm. ...
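
The contexts above describe topics being computed per time slice and then chained across slices, with a topic allowed to be absent from some slices. The following is a minimal sketch of what such a chain with gaps can look like. It is not the Baum-Welch-based estimation the source describes: it swaps in a greedy nearest-neighbour link using Jensen-Shannon divergence, purely for illustration. The function names, the toy distributions, and the 0.3 threshold are all assumptions, and the per-slice topics are taken as given rather than produced by LDA.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def chain_topic(slices, start_slice, start_topic, threshold=0.3):
    """Greedily follow one topic through later time slices.

    slices: list of time slices; each slice is a list of topic-word
            distributions over a shared vocabulary (assumed precomputed).
    Returns a list of (slice_index, topic_index) pairs. A slice with no
    sufficiently similar topic is skipped, so the chain may have gaps.
    NOTE: greedy linking is an illustrative stand-in, not the HMM approach.
    """
    chain = [(start_slice, start_topic)]
    current = slices[start_slice][start_topic]
    for t in range(start_slice + 1, len(slices)):
        if not slices[t]:
            continue  # no topics in this slice at all
        best = min(
            range(len(slices[t])),
            key=lambda k: js_divergence(current, slices[t][k]),
        )
        if js_divergence(current, slices[t][best]) <= threshold:
            chain.append((t, best))
            current = slices[t][best]
    return chain


# Toy data: a 4-word vocabulary and three time slices (hypothetical values).
slices = [
    [[0.7, 0.2, 0.05, 0.05], [0.05, 0.05, 0.2, 0.7]],   # slice 0: topics A, B
    [[0.05, 0.05, 0.3, 0.6]],                           # slice 1: only B-like
    [[0.1, 0.1, 0.2, 0.6], [0.6, 0.3, 0.05, 0.05]],     # slice 2: B-like, A-like
]

chain_a = chain_topic(slices, 0, 0)
chain_b = chain_topic(slices, 0, 1)
print(chain_a)  # [(0, 0), (2, 1)]  topic A is absent from slice 1
print(chain_b)  # [(0, 1), (1, 0), (2, 0)]
```

In this toy run the first chain skips slice 1 entirely, which is the kind of intermittent topic the source says dDTM is meant to handle.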

Similar publications

Conference Paper
Spark Streaming discretizes streams of data into micro-batches, each of which is further sub-divided into tasks and processed in parallel to improve job throughput. Previous work [2, 3] has lowered end-to-end latency in Spark Streaming. However, two causes of high tail latencies remain unaddressed: 1) data is not load-balanced across tasks, and 2)...
Article
In order to rapidly process large amounts of sensor stream data, it is effective to extract and use samples that reflect the characteristics and patterns of the data stream well. In this article, we focus on improving the uniformity confidence of KSample, which has the characteristics of random sampling in the stream environment. For this, we first...
Chapter
The paper presents a method for change detection in multidimensional streams of data based on a tensor model constructed from the Higher-Order Singular Value Decomposition of raw data tensors. The method was applied to the problem of video shot detection showing good accuracy and high speed of execution compared with other more time demanding tenso...

Citations

... It uses a Bayesian approach to model the change in the relative proportions of topics within documents over time. DTM is extended by several new models, such as the Discrete-Time Dynamic Topic Model (dDTM) [15] and the Continuous-Time Dynamic Topic Model (cDTM) [16] to handle continuity in different ways. A more recent family of topic models uses novel word embedding and language models to analyze content evolution. ...
Chapter
This paper presents ATEM, a novel framework for studying topic evolution in scientific archives. ATEM employs dynamic topic modeling and dynamic graph embedding to explore the dynamics of content and citations within a scientific corpus. ATEM explores a new notion of citation context that uncovers emerging topics by analyzing the dynamics of citation links between evolving topics. Our experiments demonstrate that ATEM can efficiently detect emerging cross-disciplinary topics within the DBLP archive of over five million computer science articles.
... It uses a Bayesian approach to model the change in the relative proportions of topics within documents over time. DTM is extended by several new models, such as the Discrete-Time Dynamic Topic Model (dDTM) [15] and the Continuous-Time Dynamic Topic Model (cDTM) [16] to handle continuity in different ways. A more recent family of topic models uses novel word embedding and language models to analyze content evolution. ...
Preprint
This paper presents ATEM, a novel framework for studying topic evolution in scientific archives. ATEM is based on dynamic topic modeling and dynamic graph embedding techniques that explore the dynamics of content and citations of documents within a scientific corpus. ATEM explores a new notion of contextual emergence for the discovery of emerging interdisciplinary research topics based on the dynamics of citation links in topic clusters. Our experiments show that ATEM can efficiently detect emerging cross-disciplinary topics within the DBLP archive of over five million computer science articles.
... Topic models using semantic context to improve document classification [18][19][20], [21][22][23], [24][25][26]; Topic modeling applied to the study of short texts [27][28][29], [30][31][32], [33][34][35]; Topic models that identify topics as they emerge over time [16,36,37], [17,38]; Incorporation of topic features to improve document grouping accuracy [39]; Multi-part topic model improves information retrieval and document classification performance [40]; Topic models that automatically detect recurring patterns of expressions [41]; Modeling of topics using word co-occurrence patterns. ...
... It uses a Bayesian approach to model the change in the relative proportions of topics within documents over time. DTM is extended by several new models [6,18,31,32,33]. For instance, authors in [6] stated that time is inherently continuous and therefore they characterize their proposed model with a continuous distribution over timestamps. ...
... For instance, authors in [6] stated that time is inherently continuous and therefore they characterize their proposed model with a continuous distribution over timestamps. One other example is the Discrete-Time Dynamic Topic Model (dDTM) [32], which requires the data to have discrete time intervals, while the Continuous-Time Dynamic Topic Model (cDTM) [18] can handle any data point in time, regardless of the time resolution. ...
Preprint
As the amount of text data generated by humans and machines increases, the necessity of understanding large corpora and finding a way to extract insights from them is becoming more crucial than ever. Dynamic topic models are effective methods that primarily focus on studying the evolution of topics present in a collection of documents. These models are widely used for understanding trends, exploring public opinion in social networks, or tracking research progress and discoveries in scientific archives. Since topics are defined as clusters of semantically similar documents, it is necessary to observe the changes in the content or themes of these clusters in order to understand how topics evolve as new knowledge is discovered over time. In this paper, we introduce the Aligned Neural Topic Model (ANTM), a dynamic neural topic model that uses document embeddings to compute clusters of semantically similar documents at different periods and to align document clusters to represent their evolution. This alignment procedure preserves the temporal similarity of document clusters over time and captures the semantic change of words characterized by their context within different periods. Experiments on four different datasets show that ANTM outperforms probabilistic dynamic topic models (e.g. DTM, DETM) and significantly improves topic coherence and diversity over other existing dynamic neural topic models (e.g. BERTopic).
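
One of the citation contexts above notes that dDTM requires the data to have discrete time intervals. As a rough illustration of that preprocessing step, the sketch below groups timestamped documents into fixed-width time slices. The function name `slice_documents`, the weekly width, and the sample documents are hypothetical; the source does not say how slices were formed.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def slice_documents(docs, start, width):
    """Group (timestamp, text) pairs into fixed-width time slices.

    Returns a dict mapping slice index -> list of texts. Slices that
    contain no documents are simply absent from the result.
    """
    slices = defaultdict(list)
    for ts, text in docs:
        # Floor-dividing one timedelta by another yields an integer index.
        index = (ts - start) // width
        slices[index].append(text)
    return dict(slices)


# Hypothetical stream: weekly slices starting on 1 January 2016.
docs = [
    (datetime(2016, 1, 2), "a"),
    (datetime(2016, 1, 5), "b"),
    (datetime(2016, 1, 20), "c"),
]
weekly = slice_documents(docs, datetime(2016, 1, 1), timedelta(days=7))
print(weekly)  # {0: ['a', 'b'], 2: ['c']}
```

Slice 1 is empty here, so any topic seen in slices 0 and 2 would be absent for one interval.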
... While many researchers attempted to infer the sentiment [47], the topic [9] or even the political leaning [62] conveyed by a piece of text, we are particularly interested in analysing and understanding the psychological state of the person behind the writing, better known as Mental State Assessment. The identification of mental state alterations using language has drawn the attention of many researchers over the last years [11,43,84,106,117]. ...
... Third, the context from where the data under analysis were collected, which in this case comprises social media platforms. Several venues were explored, mostly those concerned with the study of natural language processing and information retrieval, such as ACL, EMNLP, ECIR and CLEF. In addition, we also took into consideration multi-disciplinary journals directed to a broader audience and mainly focused on advancing science for the benefit of society, such as ScienceDirect, PLoS ONE and Scientific Reports. ...
Article
Mental state assessment by analysing user-generated content is a field that has recently attracted considerable attention. Today, many people are increasingly utilising online social media platforms to share their feelings and moods. This provides a unique opportunity for researchers and health practitioners to proactively identify linguistic markers or patterns that correlate with mental disorders such as depression, schizophrenia or suicide behaviour. This survey describes and reviews the approaches that have been proposed for mental state assessment and identification of disorders using online digital records. The presented studies are organised according to the assessment technology and the feature extraction process conducted. We also present a series of studies which explore different aspects of the language and behaviour of individuals suffering from mental disorders, and discuss various aspects related to the development of experimental frameworks. Furthermore, ethical considerations regarding the treatment of individuals’ data are outlined. The main contributions of this survey are a comprehensive analysis of the proposed approaches for online mental state assessment on social media, a structured categorisation of the methods according to their design principles, lessons learnt over the years and a discussion on possible avenues for future research.
... This is beneficial for recommending news to users, story reconstruction, and news summarization [7]. Also, it can foster research on dynamic topic modeling which tries to model the topic/event evolution over time [1,2]. ...
Conference Paper
In this paper, we present a collection of news documents labeled at the level of crisp events. Compared to other publicly-available collections, our dataset is made of heterogeneous documents published by popular news channels on different platforms in the same temporal window and, therefore, dealing with roughly the same events and topics. The collection spans 4 months and comprises 147K news documents from 27 news streams, i.e., 9 different channels and 3 platforms: Twitter, RSS portals, and news websites. We also provide relevance labels of news documents for some selected events. These relevance judgments were collected using crowdsourcing. The collection can be useful to researchers investigating challenging news-mining tasks, such as event detection and tracking, multi-stream analysis, and temporal analysis of news publishing patterns.
... In our previous work [4], we introduced a topic model for tracking the evolution of intermittent topics over time. In other words, this model tracks the evolution of topics that may occur discretely over time, such that a topic does not need to be necessarily present over all time slices. ...
Conference Paper
Memory augmentation is the process of providing human memory with information that facilitates and complements the recall of an event in a person's past. Recently, there has been a lot of attention on processing the content of meetings for later reuse, such as reviewing a meeting for supporting failing memories, keeping in mind key issues, verification, etc. That is due to the fact that meetings are essential for sharing knowledge in organizations. In this paper, we propose four novel time-series methods for predicting the topics that one should review in preparation for a next meeting. The predicted/recommended topics can be reviewed by a user as a memory augmentation process to facilitate recall of key points of a previous meeting. With the growing number of meetings at an organization that one may attend weekly and with the growing number of topics discussed, forgetting past meetings becomes eminent, hence recommending certain topics to the user in order to prepare the user for a future meeting is beneficial and important. Our experimental results on real-world data demonstrate that our methods significantly outperform a state-of-the-art Hidden Markov Model baseline. This indicates the efficacy of our proposed methods for modeling semantics in temporal data.
... DTM assumes that all topics are present in all the time slices of a sequential corpus of text. The second baseline is the discrete Dynamic Topic Model (dDTM) [1] which modifies DTM by relaxing the assumption that a topic should be present in all the time slices. Thus, dDTM tracks the evolution of intermittent topics over time, hence the word "discrete" in the name. ...
... The model, based on Latent Dirichlet Allocation (LDA) [5], can capture the evolution of a topic over time and show various trends, for example, the changing probability of a term in a topic over time, or the popularity of that term at different time intervals. Based on DTM, Continuous-time Dynamic Topic model (cDTM) [13] and discrete Dynamic Topic Model (dDTM) [1] were introduced. cDTM can track any change in a topic and was shown to be effective for short time intervals and the changes in topics are often very small. ...
... In spite of being a powerful model for statistical interpretation of a sequential corpus, DTM comes with two limitations: (1) The assumption that topics change slowly over time which holds for some document collections (e.g., the articles from the journal Science) where topics evolve at a low pace, but does not hold for others (e.g., online discussions, news). Moreover, the topic evolutions may have skips in the timeline and it is reasonable to assume that in textual streams, different topics may emerge, disappear, and appear again after some time. ...
... To overcome this limitation, in [2] we presented dDTM that models the topics of time slice t_i irrespective of those of t_{i−1}. Then, it chains the discovered topics using a Hidden Markov Model [23]. ...
... This holds for non-sequential collections of documents but not for streams, where the temporal information is very important and can be leveraged for better detecting the topics and for tracking their evolution over time. To address this limitation, Blei and Lafferty proposed Dynamic Topic Model (DTM) [5], while in [2] we presented a discretized version of it for dealing with non-linear evolutions of topics. Event detection. ...
Conference Paper
Linking multiple news streams based on the reported events and analyzing the streams' temporal publishing patterns are two very important tasks for information analysis, discovering newsworthy stories, studying the event evolution, and detecting untrustworthy sources of information. In this paper, we propose techniques for cross-linking news streams based on the reported events with the purpose of analyzing the temporal dependencies among streams. Our research tackles two main issues: (1) how news streams are connected as reporting an event or the evolution of the same event and (2) how timely the newswires report related events using different publishing platforms. Our approach is based on dynamic topic modeling for detecting and tracking events over the timeline and on clustering news according to the events. We leverage the event-based clustering to link news across different streams and present two scoring functions for ranking the streams based on their timeliness in publishing news about a specific event.
Article
News documents published online represent an important source of information that can be used for event detection and tracking as well as for analyzing the temporal publishing relationships among different news streams. In this paper, we describe our research on detecting, tracking, and predicting events from multiple news streams. We also analyze the temporal publishing patterns of newswires on different platforms and their timeliness in reporting the events. First, we present an approach based on discrete dynamic topic modeling and Hidden Markov Model for event detection and tracking. Then, we predict the events that would persist in the next time slice, which can be important for forecasting facts that would be popular in the future. We leverage the detected events for clustering news documents according to the events they describe. This allows us to determine which newswires published news about an event and to analyze their temporal ordering in reporting events. Finally, we propose two scoring functions for ranking the newswires based on their timeliness. We tested our methodologies on different collections of news articles and tweets. Moreover, we built a collection of heterogeneous news documents with event-document labels which were manually assessed using crowdsourcing. Experimental results showed that, compared to the traditional dynamic topic model, our approach is able to timely detect emerging topics (events). Overall, we could register an event coverage of about 90% w.r.t. the pool of labeled events. The evolution of events is captured by event chains which are highly coherent (0.76) and informative (0.60) allowing to effectively reconstruct the stories. Furthermore, the event-based clustering of news documents has a good trade-off of precision and recall (F-score = 0.83) and the topic keywords provide a semantic description of the events represented by the clusters. 
Concerning our analysis on the temporal publishing relationships among news streams, we could observe interesting patterns on the usage of the different platforms, for example, some newswires still favor their own official websites, while others tend to publish more timely on Twitter.