Detection of Rumor in Social Media

Manan Vohra
Department of CSE, Amity University, Uttar Pradesh, India
mananvohra564@gmail.com

Misha Kakkar
Department of CSE, Amity University, Uttar Pradesh, India
mkakkar@amity.edu
Abstract - The proliferation of the internet has made social media popular, with approximately 2.34 billion users worldwide. This popularity has made social media one of the major sources of information, but it has also made rumors very easy to spread, so the information on social media carries many false claims. Most previous work on rumor detection has relied on manual verification, language processing, or the construction of directed graphs. In this paper, we propose a system for automatic rumor detection on social media. Our system collects data from social media sites such as Twitter, preprocesses this data to generate a topic, and checks the topic's veracity. Rumor detection has practical applications for journalists, news readers, emergency services, and financial markets, and can help minimize the spread of false information on social media.
Keywords- Rumor Detection, Social Media, Rumor
Spreading
I. INTRODUCTION
Today social media has taken over traditional methods of communication and, with this, has changed how information is delivered to a large audience, making it easy to spread information within a short period. But the veracity of the information or news that spreads across social media is not confirmed, and many times it has created chaos among people all over the world. Today most people get their news from social media and are exposed to a daily dose of rumors, hoaxes, conspiracy theories and misleading news, all of which gets mixed with correct information from honest or reliable sources, making the truth harder to discern.

Social media has an enormous audience, which makes false information as likely to go viral as correct information: people use social media daily, and with a few shares or retweets misleading information reaches many people, who share it further, and the process continues, spreading the rumor to a wide audience. The results of various experiments show that most people tend to trust links or information that their friends share without even verifying the source. People also believe the misinformation they reach through links they click on social media, for example fake news and ads; a huge number of people take the bait and open the page, receiving fake news while the page owner earns money from the ads on that page. Fake news can therefore make money while polluting social media with falsehood. Such sites or pages are commonly known as clickbait sites, which manufacture hoaxes to make money from advertisements, while hyper-partisan sites publish and spread rumors on social media to influence public opinion.
Social media sites like Facebook and Twitter lack a tool that can detect whether information being spread on the site is a rumor and, if necessary, shut it down as soon as possible before it creates any problem. Presently these giant social media sites depend entirely on their staff and on time to label a piece of information as a rumor, which may have worked 10 or 15 years ago when social media usage and data were limited, but social media sites now have billions of users and a huge amount of data growing exponentially, which makes it really difficult to manually check the veracity of all the information flowing on social media.
In June 2016 a fake "Facebook privacy notice" spread like wildfire on social media, urging users to copy and paste a particular piece of text on their Facebook wall, which would supposedly help them retain the privacy of their profile, such as the things they share, the photos they upload, or their personal information. It ultimately turned out to be a rumor, yet millions of Facebook users shared it and fell victim to this false claim. A rumor is a piece of information whose veracity and source are not confirmed and which can cause harm in many ways.
Thus, in this paper a rumor detection system is proposed, which determines the authenticity of a piece of information and classifies it as a rumor or not a rumor. The paper is organized as follows: Section II discusses the literature review. The problem is defined in Section III. The methodology used to develop the rumor detection system is explained in Section IV. Results and analysis are presented in Section V, followed by the conclusion in Section VI.
II. LITERATURE REVIEW
Much previous work has addressed this complex problem, such as detecting whether a meme spreading over social media is true or false [1, 2, 3, 4]. The "Truthy" system attempted to solve the meme rumor problem by categorising memes on the basis of how they spread, distinguishing "organic" spreading from an "astroturf" campaign driven by a single person or organisation, which is treated as a rumor [5, 6].
Recent studies collected images of Hurricane Sandy from Twitter, including both fake and real images. From these, the authors randomly selected 5,767 tweets and analysed them using properties of a tweet such as the number of friends, statuses and followers, its content and its metadata, on the basis of which the tweets were categorised as real or fake [7].
In recent studies, Zhao et al. proposed early detection of rumors using cue terms such as "unconfirmed", "not true" or "debunk" in the tweet content to find out whether it expresses uncertainty, denial or questioning. These terms capture the hidden implications of the tweet content, and tweets expressing questioning or uncertainty are categorized as possible rumors. They were able to describe the temporal traits of rumor and non-rumor events, but a clear-cut difference between them was not found [8].
In their work, Jing Ma et al. presented an RNN-based rumor detection model. Using Sina Weibo and Twitter data they built two microblog datasets, proposed a method that models the incoming stream of microblog posts as continuous variable-length time series, and evaluated RNNs with different kinds of layers and hidden units for classification [9].
Friggeri et al. characterized the structure of rumors spreading on Facebook. They considered copying and pasting of text as text posts, and the uploading and re-sharing of photos, as the two major technological affordances, and constructed the near-exact path that a rumor takes on the social network. Within Facebook they measured the longevity of instances and their replication, and analysed comments containing links to Snopes.com, a rumor-debunking website [10].
However, both [Jing Ma et al., 2016] and [Friggeri et al., 2014] suffer from delays and limited coverage: they only work after a rumor has gained the attention of rumor-debunking sites such as Snopes.com, whereas our model does not depend on any rumor-debunking site and aims to detect rumors early.
III. PROBLEM DEFINITION
Social media has an enormous audience, which makes false information as likely to go viral as correct information: people use social media daily, and with a few shares or retweets misleading information reaches many people, who share it further, spreading the rumor to a wide audience. A rumor detection system works in a two-phase manner. In the first phase, it extracts a piece of information from one or more social networking sites. In the second phase, it classifies whether that piece of information is correct or not.
This paper presents a system that works in the same two-phase manner. The first phase includes data collection and data preprocessing, followed by topic and text extraction. In the second phase, web scraping and text classification take place.
IV. METHODOLOGY
In this section, we present the proposed rumor detection method and the steps followed to evaluate it empirically.
Fig. 1. Methodology: dataset collection from social media, data preprocessing, topic extraction via LDA, text extraction, and web scraping for news extraction; if news is found the topic is labelled not a rumor, otherwise it is labelled a rumor.
4.1 Data Collection
We started by selecting one of the social media sites from which data was to be collected; Twitter was chosen for this purpose. The Twitter API was used to collect tweets for a particular hashtag, and only the text content of each tweet was saved to a text file. Pre-processing of the tweet content is done to remove URLs, emoticons, etc. This generated our dataset, which is unstructured in nature and contains only text content.
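A minimal sketch of this collection step is shown below, assuming the tweepy library (pre-4.0 releases, where the search call is api.search); the credentials, hashtag, output file name and tweet count are illustrative placeholders rather than the exact values used in our experiments.

import tweepy

# Illustrative placeholder credentials from a Twitter developer account.
CONSUMER_KEY = "..."
CONSUMER_SECRET = "..."
ACCESS_TOKEN = "..."
ACCESS_TOKEN_SECRET = "..."

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth, wait_on_rate_limit=True)

# Collect tweets for a single hashtag and keep only the text content.
with open("dataset.txt", "w", encoding="utf-8") as out:
    for tweet in tweepy.Cursor(api.search, q="#trump", lang="en").items(500):
        out.write(tweet.text.replace("\n", " ") + "\n")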
4.2 Data Pre-processing
The collected data is cleaned by removing URLs, usernames and punctuation marks. The cleaned data is then encoded to ASCII so that UTF-8 emoticons and symbols are removed. The removal of emoticons from the data is essential, as their occurrence is high and they tend to produce wrong output from the topic modelling algorithm. The next preprocessing step is the removal of stop words. Stop words are words which do not carry any meaning on their own and merely act as text connectors.
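A minimal sketch of this cleaning step, assuming Python's re module and NLTK's English stop-word list (the exact regular expressions in our implementation may differ):

import re
import string
from nltk.corpus import stopwords  # requires nltk.download("stopwords")

STOP_WORDS = set(stopwords.words("english"))

def clean_tweet(text):
    text = re.sub(r"http\S+|www\.\S+", "", text)                       # remove URLs
    text = re.sub(r"@\w+", "", text)                                   # remove usernames
    text = text.encode("ascii", "ignore").decode("ascii")              # drop UTF-8 emoticons and symbols
    text = text.translate(str.maketrans("", "", string.punctuation))   # remove punctuation
    tokens = [w for w in text.lower().split() if w not in STOP_WORDS]  # remove stop words
    return " ".join(tokens)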
4.3 Keyword Generation
To generate topics from the preprocessed data, topic modelling is performed via Latent Dirichlet Allocation (LDA). LDA is one of the most popular topic modelling algorithms; it takes a text corpus as input and works on that corpus to generate topic keywords.
LDA procedure:
1. Traverse all the documents.
2. Assign each word in each document to one of the k topics specified in the input.
3. This yields word distributions for each topic and topic representations for each document.
4. The topic representations are improved by estimating the probabilities:
a. p(topic t | document d)
b. p(word w | topic t)
5. A word is reassigned to a new topic t' by finding the topic that maximises p(topic t' | document d) * p(word w | topic t').
We configured LDA to generate one topic with three keywords; these keywords represent the dataset as a whole and serve as the query for our news website scraping step.
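The sketch below shows how this step could be realised with the gensim library, configured as in our system for one topic and three keywords; the corpus-building details and number of passes are illustrative assumptions.

from gensim import corpora, models

# Each document is the cleaned, tokenised text of one tweet.
texts = [line.split() for line in open("dataset_clean.txt", encoding="utf-8")]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(doc) for doc in texts]

# One topic, as configured in our system.
lda = models.LdaModel(corpus, num_topics=1, id2word=dictionary, passes=10)

# Three keywords representing the dataset as a whole; these form the search query.
keywords = [word for word, prob in lda.show_topic(0, topn=3)]
print(keywords)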
4.4 News Website Scraping
Most newspapers and news channels have developed their own news websites to take advantage of the large audience the internet has, and since then these news websites have become a major source of credible and verified news for people all over the world. Their strong brand recognition and credibility have made them popular on the internet. These news websites are trusted because they publish news articles backed by credible sources or evidence and avoid publishing hoaxes or unconfirmed content, so they are considered credible news sources; because of this, our system uses them for detecting rumors.
We chose four trusted news websites:
1. Associated Press
2. The New York Times
3. Yahoo News
4. FAROO
We scraped these four news websites using the keywords generated in the keyword generation step as the search query. An "AND" search was performed with these keywords so that only articles related to all of the keywords are retrieved, i.e. the query is searched as a whole rather than for individual keywords. An "OR" search, by contrast, would return articles matching any single keyword, which is not suitable for our system and would generate faulty results. After scraping these sites for articles, the links of the retrieved articles are saved and displayed in the GUI.
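The following sketch outlines this step with the requests and BeautifulSoup libraries. The search URL templates and the HTML selector are hypothetical placeholders, since each news site exposes its own search interface; they are not the actual endpoints used by our implementation.

import requests
from bs4 import BeautifulSoup

# Hypothetical search URL templates; each real news site has its own search endpoint.
SEARCH_TEMPLATES = [
    "https://example-news-1.com/search?q={query}",
    "https://example-news-2.com/search?q={query}",
]

def search_news(keywords):
    # "AND" search: all keywords are submitted together as a single query.
    query = "+".join(keywords)
    links_per_site = {}
    for template in SEARCH_TEMPLATES:
        url = template.format(query=query)
        resp = requests.get(url, timeout=10)
        soup = BeautifulSoup(resp.text, "html.parser")
        # Hypothetical selector for article links inside the search results page.
        links_per_site[template] = [a.get("href") for a in soup.select("a.article-link")]
    return links_per_site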
4.5 Detection
In the detection module, the veracity of the topic is checked against the results from the news sites. If, after scraping, any one of the news sites returns even one article related to the topic, the topic is classified as not a rumor, since a credible source for that topic exists. If the results from all the news sites are empty, that is, no article related to the topic is found, the topic is flagged as a possible rumor, since no credible source was found.
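The decision rule itself reduces to a few lines; the sketch below assumes the per-site link lists produced by the scraping step.

def detect(links_per_site):
    # If any trusted news site returned at least one related article,
    # a credible source exists and the topic is not a rumor.
    for site, links in links_per_site.items():
        if links:
            return "Not a rumor"
    # No credible source found on any site: flag the topic as a possible rumor.
    return "Rumor"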
V. RESULTS AND ANALYSIS
Fig. 2. GUI Main Menu
Fig. 2 illustrates the main menu of our system, from which all three modules of the rumor detection system can be opened.
Fig. 3. Data Collection
Fig. 3 illustrates the streaming of tweets from the Twitter API for the hashtag "trump".
Fig. 4. Dataset file
Fig. 4 illustrates the generated dataset file, which contains the tweet text after preprocessing.
Fig. 5. LDA output
Fig. 5 illustrates the output generated by the LDA algorithm with our dataset as input. Fig. 6 illustrates the output of the news website scraping; article links are shown for all four news sites. Fig. 7 illustrates the assignment of a topic as a rumor on the basis of the output from the news websites.
Fig. 6. News site scraping output
Fig. 7. Detection output
The system is developed using the Python programming language. The system was tested on 200 pieces of information, which included 150 rumor topics and 50 non-rumor topics. To measure the performance of the proposed system, a confusion matrix was generated, as shown in Table 1. As the table shows, out of 150 rumor topics our system detected 140 correctly and 10 incorrectly, whereas out of 50 non-rumor topics, 44 were detected correctly and 6 incorrectly.
TABLE 1: Confusion matrix for 200 sample topics

                Not a rumor   Rumor   Total
Not a rumor          44          6      50
Rumor                10        140     150
Total                54        146     200
Based on the generated confusion matrix, our system has an accuracy of 92%, with 96% precision and approximately 93% recall for the rumor class.
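As a minimal sketch, the reported metrics follow directly from the counts in Table 1 when "rumor" is treated as the positive class:

# Counts from Table 1 (rumor = positive class).
tp, fn = 140, 10   # rumors detected correctly / missed
tn, fp = 44, 6     # non-rumors detected correctly / misclassified as rumor

accuracy  = (tp + tn) / (tp + tn + fp + fn)   # 184 / 200 = 0.92
precision = tp / (tp + fp)                    # 140 / 146 ≈ 0.96
recall    = tp / (tp + fn)                    # 140 / 150 ≈ 0.93
print(accuracy, precision, recall)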
VI. CONCLUSION
Social media is a big platform for sharing and spreading information, as it is easy to use and has millions of users. But as much of a boon as social media is, it also has its disadvantage, which is the spreading of false information or rumors. Rumor detection is of major importance, as it helps reduce or stop the spreading of false information, which can cause harm or create disturbance among people.
So, for detecting rumors, we designed a system that collects information from the posts of any social media site and pre-processes it to generate a topic-specific dataset. We then applied topic modelling to our dataset to generate three keywords that convey the meaning of the dataset, that is, what the dataset is about. These keywords were searched on our four selected news websites, and news articles were extracted from the results. If no article was found on any of the four sites, the system assigned the topic as a rumor; otherwise, if an article was found, the topic was assigned as not a rumor.
There is still scope for improving the effectiveness of our rumor detection system: more news websites can be added so that the system performs a broader search. Our system is also not limited to any single social media site; data can be collected from any social media site by implementing its API, for example Facebook, LinkedIn, etc. Moreover, the more tweets collected for a particular hashtag, the more accurate the topic generated by LDA will be.
REFERENCES
[1] X. Yu, F. Yang, M. Yang and Y. Liu.
“Automatic detection of rumor on sina weibo.”
In ACM SIGKDD, page 13. ACM, 2012.
[2] M. Mendoza, C. Castillo and B. Poblete.
“Information credibility on twitter.” In the 20th
international conference on World wide web,
pages 675–684. ACM, 2011.
[3] A. Gupta and P. Kumaraguru. Credibility
ranking of tweets during high impact events. In
ACM, 2012.
[4] S. Kwon, M. Cha, K. Jung, W. Chen, and Y.
Wang. “Prominent features of rumor propagation
in online social media.” In ICDM 2013 pages
1103–1108. IEEE, 2013.
[5] M. Meiss, M. Conover, J. Ratkiewicz, F.
Menczer, B. Gonçalves and A. Flammini.
“Detecting and tracking political abuse in social
media.” In ICWSM, 2011.
[6] B. Gonçalves, S. Patil, M. Conover, F. Menczer,
M. Meiss and A. Flammini. “Truthy: mapping
the spread of astroturf in microblog streams.” In
20th international conference comp.
[7] A. Gupta, A. Joshi, P. Kumaraguru, and H.
Lamba. “Faking sandy: characterizing and
identifying fake images on twitter during
hurricane sandy”. In International World Wide
Web Conferences Steering Committee, 2013.
[8] Qiaozhu Mei, Paul Resnick and Zhe Zhao.
“Enquiring minds: Early detection of rumors in
social media from enquiry posts”. In WWW,
2015.
[9] Jing Ma, Wei Gao, Prasenjit Mitra, Sejeong Kwon, Bernard J. Jansen, Kam-Fai Wong, Meeyoung Cha. "Detecting Rumors from Microblogs with Recurrent Neural Networks." In IJCAI-16, pages 3818-3824, 2016.
[10] Justin Cheng, Adrien Friggeri, Dean Eckles, and
Lada A Adamic. “Rumor cascades.” In ICWSM,
2014.
[11] A. Friggeri, L. A. Adamic, D. Eckles, and J. Cheng. "Rumor cascades." In 8th International AAAI Conference, 2014.
[12] H. Lamba, A. Gupta and P. Kumaraguru. “$1.00
per rt# bostonmarathon# prayforboston:
Analyzing fake content on twitter”. In eCRS,
2013, pp1-12.
[13] A. Roseway, S. Counts, M. R. Morris, A. Hoff,
and J. Schwarz. “Tweeting is believing?” In
ACM 2012 conference, pages 441–450. ACM,
2012.
[14] S. Sun, H. Liu, J. He, and X. Du. “Detecting
event rumors on sina weibo automatically.” In
Springer 2013, pages 120–131.