DETECTION OF RUMOR IN SOCIAL MEDIA
Manan Vohra, Department of CSE, Amity University, Uttar Pradesh, India (mananvohra564@gmail.com)
Misha Kakkar, Department of CSE, Amity University, Uttar Pradesh, India (mkakkar@amity.edu)
Abstract - The proliferation of the internet has made social media popular, with approximately 2.34 billion users worldwide. This popularity has made social media one of the major sources of information, but it has also made the spread of rumors very easy, and hence information on social media carries many false claims. Most previous work on rumor detection has relied on manual verification, language processing, or the construction of directed graphs. In this paper, we propose a system for automatic rumor detection on social media. Our system collects data from social media sites such as Twitter, preprocesses this data to generate a topic, and checks its veracity. Rumor detection has practical applications for journalists, news readers, emergency services, and financial markets, and can help minimize the spread of false information on social media.
Keywords- Rumor Detection, Social Media, Rumor
Spreading
I. INTRODUCTION
Today social media has taken over traditional methods of communication and, with this, has changed how information is delivered to large audiences, making it easy to spread information within a short period. But the veracity of the information or news spread over social media is not confirmed, and it has often happened that such information spreads chaos among people all over the world. Today most people get their news from social media and are exposed to a daily dose of rumors, hoaxes, conspiracy theories, and misleading news, all of which gets mixed with correct information from honest or reliable sources, making the truth harder to discern. Social media has an enormous audience, which makes false information as likely to go viral as correct information: people use social media on a daily basis, and with a few shares or retweets misleading information reaches many people, who share it further, and the process continues, spreading the rumor to a wide audience.
The results of various experiments show that most people tend to trust links or information that their friends share without even verifying the source. People also believe the misinformation they get from the links they click on social media, for example fake news and ads: a huge number of people take the bait and open the page, thereby consuming fake news while the page owner earns money from the ads on that page. Fake news can thus make money while polluting social media with falsehood. Such sites or pages are commonly known as clickbait sites, which manufacture hoaxes to make money from advertisements, while hyper-partisan sites publish and spread rumors on social media to influence public opinion.
Social media sites like Facebook and Twitter lack a tool that can detect whether any information being spread on the site is a rumor and, if necessary, shut it down as soon as possible before it creates any problem. Presently these giant social media sites depend entirely on their staff and time to label a piece of information a rumor. This could have worked 10 or 15 years ago, when social media usage and the data on it were smaller, but currently social media sites have billions of users and huge amounts of data growing exponentially, which makes it really difficult to manually check the veracity of all the information flowing on social media.
In June 2016 a fake "Facebook privacy notice" spread like a forest fire on social media, urging users to copy and paste a particular piece of text on their Facebook wall, which would supposedly help them retain the privacy of their profile, the things they share, the photos they upload, and their personal information. This ultimately turned out to be a rumor, and millions of Facebook users who shared it became victims of this false claim. A rumor is a piece of information whose veracity and source are not confirmed and which can bring harm in many ways.
Thus, in this paper, a rumor detection system is proposed that determines the authenticity of a piece of information and classifies it as a rumor or not a rumor. The paper is organized as follows: Section II discusses the literature review; the problem is defined in Section III; the methodology used to develop the rumor detection system is explained in Section IV; results and analysis are presented in Section V, followed by the conclusion in Section VI.

978-1-5386-1719-9/18/$31.00 © 2018 IEEE
II. LITERATURE REVIEW
Much previous work has addressed related complex problems, such as detecting whether a meme spreading over social media is true or false [1, 2, 3, 4]. The "Truthy" system attempted to solve the meme-rumor problem by categorising memes on the basis of how they spread: those spreading "organically" versus "astroturf" campaigns spread by a single person or organisation, the latter being flagged as rumors [5, 6].
Recent studies collected images of Hurricane Sandy from Twitter, comprising both fake and real images. From these, 5,767 tweets were randomly selected and analysed using the properties of a tweet, such as the number of friends, statuses, and followers, its content, and its metadata, on the basis of which the tweets were categorised as real or fake [7].
In recent studies, Zhao et al. proposed early detection of rumors using cue terms such as "unconfirmed", "not true", or "debunk" in tweet content to determine whether the tweet expresses uncertainty, denial, or questioning. These terms capture hidden implications of the tweet content, and questioning or uncertainty tweets were categorized as possible rumors. They were able to define the temporal traits of rumor and non-rumor events, but no clear-cut difference emerged [8].
In their work, Jing Ma et al. presented an RNN-based rumor detection model. Using Sina Weibo and Twitter data they developed two microblog datasets, proposed a method that converts the incoming stream of microblog posts into continuous variable-length time series, and presented RNNs with different kinds of layers and hidden units for classification [9].
Friggeri et al. characterized the structure of rumors spread on Facebook. They considered copying-and-pasting of text as a text post and the uploading and re-sharing of photos as two major technological affordances, and they constructed the near-exact path that a rumor takes on the social network. Within Facebook they measured the longevity of instances and their replication, and analysed comments containing links to Snopes.com, a rumor-debunking website [10].
However, both [Jing Ma et al., 2016] and [Friggeri et al., 2014] suffer from delays and limited coverage: they only work after a rumor has gained the attention of rumor-debunking sites such as Snopes.com, whereas our model is not dependent on any rumor-debunking site and works to detect a rumor early.
III. PROBLEM DEFINITION
Social media has an enormous audience, which makes false information as likely to go viral as correct information: people use social media on a daily basis, and with a few shares or retweets misleading information reaches many people, who share it further, spreading the rumor to a wide audience. A rumor detection system works in a two-phase manner. In the first phase, it extracts a piece of information from social networking site(s). In the second phase, it classifies whether that piece of information is correct or not.
This paper presents a system that works in the same two-phase manner. The first phase comprises data collection and data preprocessing, followed by topic and text extraction. In the second phase, web scraping and text classification take place.
IV. METHODOLOGY
In this section, we present empirical experiments to
evaluate the proposed method of rumor detection.
Fig. 1. Methodology (flow: dataset collection from social media -> data preprocessing -> topic extraction via LDA -> text extraction -> web scraping for news extraction -> news found? yes: not a rumor; no: rumor)

2018 8th International Conference on Cloud Computing, Data Science & Engineering (Confluence)
4.1 Data Collection
We started by selecting one of the social media sites from which the data was to be collected; Twitter was chosen for this purpose. The Twitter API was used to collect tweets for a particular hashtag, and only the text content of each tweet was saved in a text file. Pre-processing of the tweet content is then done to remove URLs, emoticons, etc. This generated our dataset, which is unstructured in nature and contains only text content.
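The collection step can be sketched as follows. This is only an illustration: `fetch_tweets` is a hypothetical stand-in for the actual Twitter API client, and the sample tweets are invented.

```python
# Sketch of the collection step: pull tweets for one hashtag and keep
# only their text content in a plain text file, one tweet per line.
# `fetch_tweets` is a hypothetical stand-in for a real Twitter API client.

def fetch_tweets(hashtag):
    # Placeholder for the Twitter API call; a real client would
    # authenticate and query the search/streaming endpoint for `hashtag`.
    return [
        {"id": 1, "text": "Breaking: #trump signs new order http://t.co/x"},
        {"id": 2, "text": "Is this real? #trump :)"},
    ]

def collect(hashtag, path):
    tweets = fetch_tweets(hashtag)
    with open(path, "w", encoding="utf-8") as f:
        for tweet in tweets:
            f.write(tweet["text"] + "\n")  # keep text content only
    return len(tweets)

count = collect("#trump", "dataset.txt")
```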
4.2 Data Pre-processing
The collected data is cleaned by removing URLs, usernames, and punctuation marks. The cleaned data is then encoded to ASCII so that UTF-8 emoticons and symbols are removed. The removal of emoticons from the data is essential because their occurrence is high and they tend to produce wrong output from the topic modelling algorithm. The next preprocessing step is the removal of stop words: words which do not possess any meaning alone and are just text connectors.
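A minimal sketch of this cleaning pipeline is shown below; the stop-word list is a tiny illustrative subset, not the full list the system would use.

```python
import re

# Tiny illustrative stop-word subset; a real run would use a fuller list.
STOP_WORDS = {"a", "an", "the", "is", "this", "and", "of", "to", "on"}

def preprocess(text):
    text = re.sub(r"https?://\S+", "", text)        # remove URLs
    text = re.sub(r"@\w+", "", text)                # remove @usernames
    text = text.encode("ascii", "ignore").decode()  # drop UTF-8 emoticons/symbols
    text = re.sub(r"[^\w\s]", " ", text)            # remove punctuation
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    return " ".join(words)

cleaned = preprocess("@user Breaking: the order is signed \U0001F600 http://t.co/x")
# cleaned == "breaking order signed"
```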
4.3 Keyword Generation
To generate topics from the preprocessed data, topic modelling is performed via Latent Dirichlet Allocation (LDA), one of the most popular topic modelling algorithms, which takes a text corpus as input and generates topic keywords from it.
LDA procedure:
1. Traverse all the documents.
2. Assign each word in each document to one of the k topics specified in the input.
3. This yields word distributions for all topics and topic representations for all documents.
4. The topic representations are improved by computing the probabilities:
   a. p(topic t | document d)
   b. p(word w | topic t)
5. A word is reassigned to a new topic t' according to p(topic t' | document d) * p(word w | topic t').
We configured LDA to generate one topic with three keywords; these keywords represent the dataset as a whole and serve as the query for our news-website scraping step.
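The procedure above corresponds to collapsed Gibbs sampling for LDA. A minimal pure-Python sketch is given below; the actual system would likely use a topic-modelling library, and the hyperparameters and toy documents here are illustrative only.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, k, iters=50, alpha=0.1, beta=0.01, seed=0):
    """Minimal collapsed-Gibbs LDA sketch following the procedure above."""
    random.seed(seed)
    vocab = {w for doc in docs for w in doc}
    V = len(vocab)
    # Step 2: random initial topic assignment for every word occurrence.
    z = [[random.randrange(k) for _ in doc] for doc in docs]
    n_dt = defaultdict(int)  # words in doc d assigned to topic t
    n_tw = defaultdict(int)  # occurrences of word w assigned to topic t
    n_t = defaultdict(int)   # total words assigned to topic t
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            n_dt[d, t] += 1; n_tw[t, w] += 1; n_t[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                n_dt[d, t] -= 1; n_tw[t, w] -= 1; n_t[t] -= 1
                # Steps 4-5: reassign w in proportion to p(t'|d) * p(w|t').
                weights = [(n_dt[d, t2] + alpha) *
                           (n_tw[t2, w] + beta) / (n_t[t2] + V * beta)
                           for t2 in range(k)]
                t_new = random.choices(range(k), weights)[0]
                z[d][i] = t_new
                n_dt[d, t_new] += 1; n_tw[t_new, w] += 1; n_t[t_new] += 1
    # One topic, three keywords, matching the system's configuration.
    return sorted(vocab, key=lambda w: n_tw[0, w], reverse=True)[:3]

keywords = lda_gibbs([["trump", "order", "signed"],
                      ["trump", "order", "news"]], k=1)
```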
4.4 News Website Scraping
Most newspapers and news channels have developed news websites to take advantage of the internet's large audience, and these websites have since become a major source of credible, verified news for people all over the world. Their strong brand recognition and credibility have made them popular on the internet. These news websites are trusted because they publish news articles with credible sources or evidence and never publish hoaxes or unconfirmed content; because of this, our system uses them as credible news sources for detecting rumors.
We chose four trusted news websites:
1. Associated Press
2. The New York Times
3. Yahoo News
4. FAROO
We performed web scraping of these four news websites using the keywords generated in the keyword-generation step as the search query. An "AND" search was performed with these keywords so that only articles related to all of the keywords are retrieved, i.e., the query is searched as a whole rather than for each individual keyword. Otherwise, an "OR" search would return articles matching any individual keyword, which would not suit our system and would generate faulty results. After scraping these sites for articles, the links of the retrieved articles are saved and displayed in the GUI.
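The "AND" matching over scraped results can be sketched as follows; the article titles are made-up placeholders, not real scraped data.

```python
# An article counts as a hit only if it contains every keyword from the
# LDA query ("AND" search), not just any one of them ("OR" search).

def and_match(article_text, keywords):
    text = article_text.lower()
    return all(kw.lower() in text for kw in keywords)

keywords = ["trump", "order", "signed"]
articles = [
    "Trump signed the executive order on Tuesday",  # matches all keywords
    "Order restored after protests",                # matches only one
]
hits = [a for a in articles if and_match(a, keywords)]
```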
4.5 Detection
In the detection module, the veracity of the topic is checked against the results from the news sites. If, after scraping, even one of the news sites returns at least one article related to the topic, the topic is classified as not a rumor, since a credible source for that topic exists. Otherwise, if the results from all the news sites are empty, i.e., no article related to the topic is found, the topic is labeled a possible rumor, as no credible source was found.
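This detection rule reduces to a simple check over the per-site results; the site names and links below are illustrative.

```python
# If any site returned at least one related article, the topic is
# "Not a rumor"; if all four result lists are empty, it is flagged a rumor.

def classify(results_per_site):
    """results_per_site maps site name -> list of related article links."""
    if any(links for links in results_per_site.values()):
        return "Not a rumor"
    return "Rumor"

label = classify({"Associated Press": [], "The New York Times": [],
                  "Yahoo News": [], "FAROO": []})
# label == "Rumor", since no credible source was found
```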
V. RESULT AND ANALYSIS
Fig. 2. GUI Main Menu
Fig. 2 illustrates the main menu of our system, from which all three modules of the rumor detection system can be opened.
Fig. 3. Data Collection
Fig. 3 illustrates the streaming of tweets from the Twitter API for the hashtag "trump".
Fig. 4. Dataset file
Fig. 4 illustrates the generated dataset file, which contains the tweet text after preprocessing.
Fig. 5. LDA output
Fig. 5 illustrates the output generated by the LDA algorithm with our dataset as input. Fig. 6 illustrates the output of the news-website scraping; article links are shown for all four news sites. Fig. 7 illustrates the assignment of a topic as a rumor on the basis of the news-website output.
Fig. 6. News site scraping output
Fig. 7. Detection output
The system is developed using the Python programming language. The system was tested on 200 pieces of information, comprising 150 rumor topics and 50 non-rumor topics. To measure the performance of the proposed system, a confusion matrix was generated, as shown in Table 1. As the table shows, out of 150 rumor topics our system detected 140 correctly and 10 incorrectly, whereas of the 50 non-rumor topics, 44 were detected correctly and 6 incorrectly.
TABLE 1: Confusion matrix for 200 sample topics

                 Not a rumor   Rumor   Total
  Not a rumor         44          6      50
  Rumor               10        140     150
  Total               54        146     200
Based on the generated confusion matrix, our system has an accuracy of 92%, with approximately 96% precision and 93% recall for the rumor class.
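These figures can be reproduced from Table 1, treating "rumor" as the positive class:

```python
# Metrics from Table 1 with "rumor" as the positive class:
# TP = 140 (rumors detected), FP = 6, FN = 10, TN = 44.
tp, fp, fn, tn = 140, 6, 10, 44

accuracy = (tp + tn) / (tp + fp + fn + tn)   # 184/200 = 0.92
precision = tp / (tp + fp)                   # 140/146 ~= 0.96
recall = tp / (tp + fn)                      # 140/150 ~= 0.93
```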
VI. CONCLUSION
Social media is a big platform for sharing and spreading information, as it is easy to use and has millions of users. But as much as social media is a boon for us, it also has a disadvantage: the spreading of false information, or rumors. Detection of rumors is of major importance, as it helps reduce or stop the spreading of false information, which can cause harm or create disturbances among people.
So, to detect rumors, we designed a system that collects information from the posts of any social media site and pre-processes it to generate a topic-specific dataset. We then applied topic modelling to our dataset to generate three keywords that convey what the dataset is about. These keywords were searched on our four selected news websites and news articles were extracted from the results. If no article was found on any of the four sites, the system assigned the topic as a rumor; otherwise, if an article was found, the topic was assigned as not a rumor.
There is still the possibility of improving the effectiveness of our rumor detection system: more news websites can be added to give the system a broader search. Our system is also not limited to any one social media site; data can be collected from any social media site, for example Facebook or LinkedIn, by implementing its API. Furthermore, the more tweets collected for a particular hashtag, the more accurate the topic generated by LDA.
REFERENCES
[1] X. Yu, F. Yang, M. Yang and Y. Liu.
“Automatic detection of rumor on sina weibo.”
In ACM SIGKDD, page 13. ACM, 2012.
[2] M. Mendoza, C. Castillo and B. Poblete.
“Information credibility on twitter.” In the 20th
international conference on World wide web,
pages 675–684. ACM, 2011.
[3] A. Gupta and P. Kumaraguru. Credibility
ranking of tweets during high impact events. In
ACM, 2012.
[4] S. Kwon, M. Cha, K. Jung, W. Chen, and Y.
Wang. “Prominent features of rumor propagation
in online social media.” In ICDM 2013 pages
1103–1108. IEEE, 2013.
[5] M. Meiss, M. Conover, J. Ratkiewicz, F.
Menczer, B. Gonçalves and A. Flammini.
“Detecting and tracking political abuse in social
media.” In ICWSM, 2011.
[6] B. Gonçalves, S. Patil, M. Conover, F. Menczer,
M. Meiss and A. Flammini. “Truthy: mapping
the spread of astroturf in microblog streams.” In
20th international conference comp.
[7] A. Gupta, A. Joshi, P. Kumaraguru, and H.
Lamba. “Faking sandy: characterizing and
identifying fake images on twitter during
hurricane sandy”. In International World Wide
Web Conferences Steering Committee, 2013.
[8] Qiaozhu Mei, Paul Resnick and Zhe Zhao.
“Enquiring minds: Early detection of rumors in
social media from enquiry posts”. In WWW,
2015.
[9] Jing Ma, Wei Gao, Prasenjit Mitra, Sejeong Kwon, Bernard J. Jansen, Kam-Fai Wong, and Meeyoung Cha. "Detecting Rumors from Microblogs with Recurrent Neural Networks." In IJCAI-16, pages 3818-3824, 2016.
[10] Justin Cheng, Adrien Friggeri, Dean Eckles, and
Lada A Adamic. “Rumor cascades.” In ICWSM,
2014.
[11] A. Friggeri, L. A. Adamic, D. Eckles, and J. Cheng. "Rumor cascades." In 8th International AAAI Conference, 2014.
[12] H. Lamba, A. Gupta and P. Kumaraguru. “$1.00
per rt# bostonmarathon# prayforboston:
Analyzing fake content on twitter”. In eCRS,
2013, pp1-12.
[13] A. Roseway, S. Counts, M. R. Morris, A. Hoff,
and J. Schwarz. “Tweeting is believing?” In
ACM 2012 conference, pages 441–450. ACM,
2012.
[14] S. Sun, H. Liu, J. He, and X. Du. “Detecting
event rumors on sina weibo automatically.” In
Springer 2013, pages 120–131.