Table 1 - uploaded by Zhongfei Zhang
Source publication
The latent topic model plays an important role in unsupervised learning from a corpus, providing a probabilistic interpretation of the corpus in terms of a latent topic space. An underpinning assumption on which most topic models are based is that documents are independent of each other. However, this assumption...
Similar publications
Citation analysis does not tell the whole story about the innovativeness of scientific papers. Works by prominent authors tend to receive disproportionately many citations, while publications by less well-known researchers covering the same topics may not attract as much attention. In this paper we address the shortcomings of traditional scientomet...
Citations
... Also, vsLDA can be used for object recognition, image segmentation [23,25], or collaborative filtering [11,15] because vsLDA finds topics with more discriminative power. With vsLDA, we showed one way of incorporating variable selection into LDA and improving the results, so the natural next step would be to incorporate variable selection into other topic models [1,13,18,8] for improved results. ...
In latent Dirichlet allocation (LDA), topics are multinomial distributions
over the entire vocabulary. However, the vocabulary usually contains many words
that are not relevant in forming the topics. We adopt a variable selection
method widely used in statistical modeling as a dimension reduction tool and
combine it with LDA. In this variable selection model for LDA (vsLDA), topics
are multinomial distributions over a subset of the vocabulary, and by excluding
words that are not informative for finding the latent topic structure of the
corpus, vsLDA finds topics that are more robust and discriminative. We compare
three models, vsLDA, LDA with symmetric priors, and LDA with asymmetric priors,
on heldout likelihood, MCMC chain consistency, and document classification. The
performance of vsLDA is better than symmetric LDA for likelihood and
classification, better than asymmetric LDA for consistency and classification,
and about the same in the other comparisons.
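The core idea of the abstract above is that pruning words uninformative for the latent topic structure yields more robust topics. vsLDA itself is not available in standard libraries; as a rough, hypothetical proxy (not the paper's model), one can score words by how much their per-document proportions vary and drop the near-uniform ones before fitting an ordinary LDA:

```python
# Illustrative sketch only: prune near-uniform words before topic modeling,
# mimicking the spirit of restricting topics to an informative vocabulary
# subset. Function names and the toy corpus are our own, not from the paper.
import numpy as np

def prune_vocabulary(counts, keep_ratio=0.7):
    """Keep the fraction of words whose document proportions vary most.

    counts: (n_docs, n_words) term-count matrix.
    Returns column indices of the retained words.
    """
    # Normalize to per-document proportions, then score each word by the
    # variance of its proportion across documents; near-constant words
    # carry little information about the latent topic structure.
    props = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    scores = props.var(axis=0)
    n_keep = max(1, int(keep_ratio * counts.shape[1]))
    return np.argsort(scores)[::-1][:n_keep]

# Toy corpus: 4 documents over 6 words; the last two words occur
# uniformly in every document and should be pruned first.
counts = np.array([
    [5, 0, 4, 0, 1, 1],
    [4, 1, 5, 0, 1, 1],
    [0, 5, 0, 4, 1, 1],
    [1, 4, 0, 5, 1, 1],
])
kept = prune_vocabulary(counts, keep_ratio=0.7)
print(sorted(kept.tolist()))  # prints [0, 1, 2, 3]
```

The retained columns could then be fed to any standard LDA implementation; the actual vsLDA model instead selects the vocabulary subset jointly with the topics during inference.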
In many applications, ideas that are described by a set of words often flow
between different groups. To facilitate users in analyzing the flow, we present
a method to model the flow behaviors that aims at identifying the lead-lag
relationships between word clusters of different user groups. In particular, an
improved Bayesian conditional cointegration based on dynamic time warping is
employed to learn links between words in different groups. A tensor-based
technique is developed to cluster these linked words into different clusters
(ideas) and track the flow of ideas. The main feature of the tensor
representation is that we introduce two additional dimensions to represent both
time and lead-lag relationships. Experiments on both synthetic and real
datasets show that our method is more effective than methods based on
traditional clustering techniques and achieves better accuracy. A case study
was conducted to demonstrate the usefulness of our method in helping users
understand the flow of ideas between different user groups on social media.
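The lead-lag detection above builds on dynamic time warping (DTW) as its alignment primitive. As a minimal, hypothetical sketch (not the paper's Bayesian conditional cointegration), DTW aligns two word-frequency series, and the average index offset along the warping path gives a crude lead-lag estimate:

```python
# Minimal dynamic time warping (DTW) sketch: align two 1-D series and
# read off a rough lead-lag from the warping path. All names and the toy
# series are our own illustrations, not the paper's implementation.
import numpy as np

def dtw(a, b):
    """Return the DTW cost and the alignment path between series a and b."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # skip in a
                                 cost[i, j - 1],      # skip in b
                                 cost[i - 1, j - 1])  # match
    # Backtrack to recover the warping path of index pairs.
    path, i, j = [], n, m
    while (i, j) != (1, 1):
        path.append((i - 1, j - 1))
        moves = {(i - 1, j): cost[i - 1, j],
                 (i, j - 1): cost[i, j - 1],
                 (i - 1, j - 1): cost[i - 1, j - 1]}
        i, j = min(moves, key=moves.get)
    path.append((0, 0))
    return cost[n, m], path[::-1]

# Group B echoes group A's word frequency two steps later; the path
# should pair A's burst with B's burst despite the shift.
a = [0, 0, 1, 3, 1, 0, 0, 0]
b = [0, 0, 0, 0, 1, 3, 1, 0]
total, path = dtw(a, b)
lag = np.mean([j - i for i, j in path])  # positive: B lags A
print(total, lag > 0)
```

In this toy case the alignment cost is zero because the burst shapes match exactly after warping; on real word-frequency series the residual cost also indicates how well one group's usage tracks the other's.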