Detecting Malicious Social Bots:
Story of a Never-Ending Clash
Stefano Cresci [0000-0003-0170-2445]
Institute for Informatics and Telematics (IIT-CNR), Pisa, Italy
stefano.cresci@iit.cnr.it
Abstract. Recently, studies on the characterization and detection of social
bots have been published at an impressive rate. By looking back at over ten
years of research and experimentation on social bot detection, in this paper
we aim at understanding past, present, and future research trends in this
crucial field. In doing so, we discuss one of the nastiest features of social
bots – that is, their evolutionary nature. Then, we highlight the switch from
supervised bot detection techniques – focusing on feature engineering and on
the analysis of one account at a time – to unsupervised ones, where the focus
is on proposing new detection algorithms and on the analysis of groups of
accounts that behave in a coordinated and synchronized fashion. These
unsupervised, group-analysis techniques currently represent the state of the
art in social bot detection. Going forward, we analyze the latest research
trend in social bot detection in order to highlight a promising new
development of this crucial field.
Keywords: Social bots · bot evolution · reactive detection · proactive
detection · adversarial machine learning · generalizability.
1 Introduction
Social media and Online Social Networks (OSNs) are having a profound impact
on our everyday life, giving voice to the crowds and reshaping the information
landscape. Indeed, the deluge of real-time data spontaneously shared in OSNs
already proved valuable in many different domains, spanning tourism [7], safety
and security [4,3], transportation and politics [14,23], to name but a few notable
cases.
However, the democratizing effect of OSNs does not come without costs [6].
In 2016, “post-truth” was selected by the Oxford dictionary as the word of the
year, and in 2017 “fake news” was selected for the same purpose by Collins
dictionary. Still in 2017, the World Economic Forum raised a warning on the
potential distortion effect of OSNs on user perceptions of reality1. Moreover, the
same openness of OSNs that favored the democratization of information (e.g.,
the support for programmatic access via APIs and the support for anonymity),
1http://reports.weforum.org/global-risks-2017
Fig. 1: Trends in search queries and publications regarding social bots. (a) Normalized Google queries per month (2014–2020), with a loess fit. (b) Publications per year (2000–2020), as indexed by PubMed, Web of Science, and Dimensions.ai, with loess fits.
also inevitably favored the proliferation of social bots. Indeed, previous studies
report that social bots are as old as OSNs themselves [18]. With the term social
bot, we broadly refer to computer programs capable of automatically produc-
ing, re-sharing, and liking content in OSNs, or even capable of establishing and
maintaining social relations. In fact, any of our supposedly online friends may
instead be a fake, automated account, part of large coordinated groups [18].
Not all social bots are malicious and dangerous, and some of them also serve
beneficial purposes, such as contributing to gather accurate information in the
aftermath of emergencies [2,25]. Unfortunately, however, the vast majority ac-
tually pursue malicious goals. These malicious bots try to hide their automated
nature by imitating the behaviors of legitimate users. Moreover, they often act
in a synchronized and coordinated fashion – a strategy that collectively allows
them to increase their impact. Many recent studies concluded that social bots
played a role in strategic information operations orchestrated in the run up to
several major political elections, both in western and eastern countries [32,33].
As additional evidence for this claim, Twitter recently banned several thousand
accounts linked to many different malicious information operations perpetrated
between 2016 and 20192. Other recent studies also suggested that social bots
were used to exacerbate online social discussions about controversial topics (e.g.,
vaccination and immigration debates), thus increasing polarization and fueling
abusive and hateful speech [34]. Across the whole Twittersphere, it is reported
that social bots account for 9 to 15% of total active platform users [35]. Even
more worryingly, however, when strong political or economic incentives are at
stake, the presence of bots increases dramatically. As an example, a recent study
reported that 71% of all users mentioning stocks traded in US financial markets
are likely to be bots [10].
Since social bots have a central role in the diffusion of disinformation, spam,
and malware, both scholars and practitioners devoted much effort to the devel-
opment of detection techniques. Nowadays, new studies on the characterization
and detection of social bots are published at an impressive rate, as shown in
Figure 1. An analysis of a subset of publications from 2018 reports that more
2https://about.twitter.com/en_us/values/elections-integrity.html
than 3 new papers were published (on average) every week on the topic of social
bots3. The rapidly growing publication trend suggests that in the near future
there will be one new paper published every day, which poses a heavy burden
on researchers trying to keep pace with the evolution of this field. This issue is
also exacerbated by the lack of a thorough survey. Perhaps more importantly,
the rate at which new studies on this topic are published implies that a huge
effort is taking place worldwide in order to overcome the diffusion of social bots.
Given this picture, an important question arises: where is all this effort leading?
In the remainder of this paper we try to answer this crucial question via a
longitudinal analysis of ten years of research in the field of social bot detection.
2 Traditional social bot detection
The first work that focused on the detection of misbehaving accounts in OSNs
dates back to January 2010 [38]. Since then and until present days, the vast ma-
jority of attempts at bot detection have been based on heuristics (i.e., rule-based)
or on supervised machine learning [9]. An important implication of the adoption
of supervised machine learning is that each account is analyzed individually. In
other words, given a group of accounts to investigate (e.g., an OSN community),
the detection technique is separately applied to each account of the group, to
which it assigns a label (either bot or legitimate). In fact, the key assumption
of this large body of work is that each bot/fake/spammer has peculiar features
that make it clearly distinguishable from legitimate accounts. This approach to
the task of social bot detection, which we call “traditional”, thus revolves around
the application of off-the-shelf machine learning algorithms to the accounts un-
der investigation, rather than on developing new algorithms. Indeed, most of the
works in this branch are focused on designing machine learning features – that
is, they are focused on the task of feature engineering – capable of maximizing
detection performances of well-known algorithms, such as SVM, decision trees,
random forests, and more [9].
Regarding features to exploit for the detection, 3 classes have been mainly
considered: (i) profile features [8,15]; (ii) features extracted from the posts, such
as posting behavior and content of posted messages [28,5]; and (iii) features
derived from the social or interaction graph of the accounts [22,26]. The classes
of features exploited by the detection technique have a strong impact on both
the performance of the detector and its efficiency. For instance, on Twitter
it has been demonstrated that the features that contribute most towards the
predictive power of bot detectors (e.g., graph-based features such as measures of
centrality in the social graph) are also the most costly ones, in terms of the needed
data and computation [8].
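
For concreteness, what follows is a minimal Python sketch of this traditional pipeline: hand-crafted, per-account features fed to an off-the-shelf supervised classifier. The feature names, the toy data, and the random forest configuration are illustrative assumptions, not the exact features or datasets of the cited works [8,15,28,5].

# Minimal sketch of the "traditional" approach: hand-crafted per-account
# features fed to an off-the-shelf supervised classifier.
from sklearn.ensemble import RandomForestClassifier

def extract_features(account):
    """Map one account to a fixed-length vector of profile and posting features."""
    return [
        account["followers"],                               # profile feature
        account["friends"],                                 # profile feature
        account["followers"] / max(account["friends"], 1),  # reputation proxy
        account["tweets_per_day"],                          # posting behavior
        account["urls_per_tweet"],                          # content feature
    ]

# Toy labeled dataset (1 = bot, 0 = legitimate); real work needs a ground truth.
accounts = [
    {"followers": 12,   "friends": 980, "tweets_per_day": 310, "urls_per_tweet": 0.9},
    {"followers": 8,    "friends": 870, "tweets_per_day": 250, "urls_per_tweet": 0.8},
    {"followers": 450,  "friends": 300, "tweets_per_day": 6,   "urls_per_tweet": 0.1},
    {"followers": 1200, "friends": 400, "tweets_per_day": 3,   "urls_per_tweet": 0.0},
]
labels = [1, 1, 0, 0]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit([extract_features(a) for a in accounts], labels)   # learning phase

# Each account under investigation is then classified individually.
unseen = {"followers": 30, "friends": 950, "tweets_per_day": 280, "urls_per_tweet": 0.7}
print(clf.predict([extract_features(unseen)]))  # expected: [1] on this toy data

Whatever the specific features, the defining trait of this approach is that the classifier sees, and labels, one account at a time.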
Despite achieving promising initial results, the traditional approach – which
still comprises the majority of papers published nowadays – has a number of
drawbacks. The first challenge in developing a supervised detector is related
3Source: https://www.dimensions.ai/
to the availability of a ground truth (i.e., labeled) dataset, to be used in the
learning phase of the classifier. In most cases, a real ground truth is lacking and
the labels are simply given by human operators that manually analyze the data.
Critical issues arise since, as of 2019, we still lack a “standard” definition of what
a social bot is [21,37]. Moreover, humans have been proven to suffer from several
biases [29] and to largely fail at spotting modern, sophisticated bots, with only
≈24% of bots correctly labeled as such by humans [9].
The biggest drawback of traditional approaches, however, is due to the evo-
lutionary nature of social bots, which we discuss in the following section.
3 The issue of bot evolution
Early success at social bot detection, in turn, inevitably inspired countermea-
sures by bot developers. Because of this, newer bots often feature advanced char-
acteristics that make them much harder to detect than older ones. This
iterative process, which leads to the development of ever more sophisticated
social bots, is commonly referred to as bot evolution.
A noteworthy work published in 2011, and later extended in 2013 [36], pro-
vided the first evidence and the theoretical foundations to study social bot evo-
lution. The first wave of social bots that populated OSNs until around 2011 was
made of rather simplistic bots – mainly accounts with very low perceived rep-
utation (e.g., few social connections and posted messages) and featuring clear
signs of automation (e.g., repeated spam of the same URLs). On the contrary,
the social bots studied in [36] appeared as more popular and credible, given the
relatively large number of their social connections. In addition, they were no
longer spamming the same messages over and over again, but they were instead
posting several messages with the same meaning but with different words, in or-
der to avoid detection techniques based on content analysis. Starting from these
findings, authors of [36] also proposed a supervised machine learning classifier
that was specifically designed for detecting evolving bots. Their classifier simul-
taneously leveraged features computed from the content of posted messages,
social connections, and tweeting behaviors, and initially proved capable of ac-
curately detecting the sophisticated bots. More recently, new studies provided
evidence of a third generation of social bots that spread through OSNs from 2016
onwards [18,9]. Unfortunately, the classifier originally developed in [36] was no
longer successful at detecting the third wave of social bots, as shown in [9].
The previous example serves as anecdotal evidence of bot evolution, and of
the detrimental effect it has on bot detectors. Additional evidence is reported
in [9], where authors evaluated the survivability of different bots, and the abil-
ity of humans in spotting bots in the wild. Specifically, authors of [9] showed
that only ≈5% of evolved bots are removed from social platforms (i.e., high
survivability), whilst “old” social bots are removed ≈60% of the time (i.e.,
low/moderate survivability). Moreover, in a large-scale crowdsourcing experi-
ment, tech-savvy social media users proved unable to tell evolved bots apart
from legitimate users 76% of the time (i.e., 3 out of 4 evade detection by
humans). The same users instead failed to spot “old” social bots only
9% of the time (i.e., only 1 out of 10 evades detection) [9].
What results reported in [18,9] ultimately tell us, is that current sophisticated
bots are practically indistinguishable from legitimate accounts, if analyzed one at
a time. In other words, the results about bot evolution tell us that the assumption
of traditional (i.e., supervised) bot detection approaches, according to which
bots have features that make them distinguishable from legitimate accounts, is
no longer true.
4 Modern social bot detection
The difficulties in detecting sophisticated bots with supervised approaches that
are based on the analysis of individual accounts, recently gave rise to a new
research trend that aims to analyze groups of accounts as a whole. This new
research trend is also motivated by the interest of platform administrators in
detecting what they typically refer to as “coordinated inauthentic behavior”4,5.
Since 2013, several different research teams independently started to propose
new techniques for social bot detection. Despite being based on different key
concepts, all these new techniques – that collectively represent the “modern”
approach to social bot detection – included important contributions also from
the algorithmic point of view, thus shifting from general-purpose machine learn-
ing algorithms such as SVMs and decision trees, to ad-hoc algorithms that were
specifically designed for detecting bots. Furthermore, the majority of these new
algorithms considered groups of accounts as a whole, rather than single accounts,
thus moving in the direction of detecting the coordinated and synchronized be-
havior that characterizes malicious botnets [9].
As a consequence of this paradigm-shift, modern bot detectors are partic-
ularly effective at detecting evolving, coordinated, and synchronized bots. For
instance, the technique discussed in [13] associates each account to a sequence
of characters that encodes its behavioral information. Such sequences are then
compared between one another to find anomalous similarities among sequences
of a subgroup of accounts. The similarity is computed by measuring the longest
common subsequence shared by all the accounts of the group. Accounts that
share a suspiciously long subsequence are then labeled as bots. Instead, the fam-
ily of systems described in [22,26] builds a bipartite graph of accounts and their
interactions with content (e.g., retweets to some other tweets) or with other ac-
counts (e.g., becoming followers of other accounts). Then, they aim to detect
anomalously dense blocks in the graph, which might be representative of coor-
dinated and synchronized attacks. Another recent example of an unsupervised,
group-based technique is RTbust [27], which is tailored for detecting mass-
retweeting bots. The technique leverages unsupervised feature extraction and
clustering. An LSTM autoencoder converts the retweet time series of accounts
into compact and informative latent feature vectors, which are then clustered
4https://newsroom.fb.com/news/2018/12/inside-feed-coordinated-inauthentic-behavior/
5https://help.twitter.com/en/rules-and-policies/platform-manipulation
by a hierarchical density-based algorithm. Accounts belonging to large clusters
characterized by malicious retweeting patterns are labeled as bots, since they
are likely to represent retweeting botnets.
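
To make the group-based intuition concrete, the following Python sketch illustrates the digital-DNA idea behind [13]: each account's behavior is encoded as a string, and accounts sharing anomalously long common subsequences are flagged together. The three-letter alphabet, the pairwise simplification, and the 0.8 threshold are illustrative assumptions; the actual technique reasons on the subsequence shared by an entire group of accounts rather than on pairs.

# Sketch of group-based detection via behavioral sequence similarity.
from itertools import combinations

def encode_behavior(actions):
    """Digital-DNA-like encoding: one character per action type."""
    alphabet = {"tweet": "A", "reply": "C", "retweet": "T"}
    return "".join(alphabet[a] for a in actions)

def lcs_length(s1, s2):
    """Length of the longest common subsequence (classic dynamic programming)."""
    dp = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    for i, c1 in enumerate(s1):
        for j, c2 in enumerate(s2):
            dp[i + 1][j + 1] = dp[i][j] + 1 if c1 == c2 else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def suspicious_pairs(accounts, threshold=0.8):
    """Pairs of accounts whose behavioral sequences are anomalously similar."""
    flagged = []
    for (u1, seq1), (u2, seq2) in combinations(accounts.items(), 2):
        similarity = lcs_length(seq1, seq2) / max(len(seq1), len(seq2))
        if similarity >= threshold:   # coordinated, bot-like behavior
            flagged.append((u1, u2, similarity))
    return flagged

accounts = {
    "acct_1": encode_behavior(["retweet", "retweet", "tweet", "retweet", "reply"]),
    "acct_2": encode_behavior(["retweet", "retweet", "tweet", "retweet", "reply"]),
    "acct_3": encode_behavior(["tweet", "reply", "tweet", "retweet", "tweet"]),
}
print(suspicious_pairs(accounts))  # acct_1 and acct_2 behave identically and are flagged

The key point is that suspiciousness emerges from the comparison between accounts: no single sequence is anomalous in isolation, exactly as argued above for sophisticated bots.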
Given that bot detection techniques belonging to this modern approach still
represent the minority of all published papers on social bot detection, we still
lack a thorough and systematic study of the improvement brought by the mod-
ern approach to social bot detection. However, the first preliminary results that
compared the detection performances of traditional and modern detectors on
the same datasets, seem to support the increased effectiveness of the latter. In
particular, the technique introduced in [13] outperformed several traditional de-
tectors on two datasets, yielding an average F1 improvement of +0.37. Similarly,
RTbust [27] improved on a widely used traditional bot detector by increasing
F1 by +0.44. The promising results with modern bot detectors tell us that fo-
cusing on groups is advantageous. In fact, large groups of coordinated bots are
more likely to leave traces of automation than a single bot, independently of
how sophisticated the individual bots are [9]. By performing analyses at group
level, this modern approach appears to be able to raise the bar for bot develop-
ers to evade detection. Furthermore, the majority of modern bot detectors are
semi-supervised or unsupervised, which gives higher guarantees on the general-
izability of the detector and mitigates challenges related to the acquisition of a
reliable ground-truth.
5 The way ahead
So far, we highlighted that a shift is taking place in the development of bot
detectors, in order to counter the evolutionary nature of social bots. Now, by
looking at the latest advances in this thriving field, we aim at gaining some
insights into the future of social bot detection.
Notably, both the traditional and the modern approach to social bot detec-
tion have always followed a reactive schema. Quite naturally, the driving factor
for the development of new and better bot detectors has been bot mischiefs
themselves. As soon as scholars and OSN administrators identified a new group
of bots, possibly featuring new and advanced characteristics, they started the
development of detectors capable of spotting them. A major implication of this
reactive approach is that improvements in bot detection are possible only after
having collected evidence of new bot mischiefs. In turn, this means that scholars
and OSN administrators are constantly one step behind bot developers, and
that bots have a significant time span (i.e., the time needed to design, develop,
and deploy a new detector) during which they are essentially free to tamper with
our online environments.
However, another – radically different – approach to social bot detection is
possible, and has just started being investigated by several researchers. This
trailblazing direction of research involves the application of adversarial machine
learning [19] to bot detection. Adversarial machine learning has already been
applied to a number of fields such as computer vision [24] and speech recog-
nition [30], with exceptional results. In general, it is considered as a machine
learning paradigm that can be profitably applied to all scenarios that are intrin-
sically adversarial (i.e., with adversaries interested in fooling machine learning
models) [19], with social bot detection clearly being one such scenario [17].
In the so-called adversarial social bot detection, scholars try to find meaning-
ful adversarial examples with which to test current bot detectors [11]. In other
words, this branch of research aims at studying possible attacks to existing bot
detectors, with the goal of building more robust and more secure detectors. In
this context, adversarial examples might be sophisticated types of existing bots
that manage to evade detection by current techniques [1], or even bots that do
not exist yet, but whose behaviors and characteristics are simulated [12], or bots
developed ad-hoc for the sake of experimentation [20]. Finding good adversarial
examples can, in turn, help scholars understand the weaknesses of existing bot
detection systems, before such weaknesses are effectively exploited by bot devel-
opers. As a result, bot hunters need not wait anymore for new bot mischiefs in
order to adapt their techniques, but instead they can proactively test them, in an
effort that could quickly make them more robust. Among the positive outcomes
of adversarial approaches to bot detection, is a more rapid understanding of the
drawbacks of current detectors and the opportunity to gain insights into new
features for achieving more robust and more reliable detectors.
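
As a rough illustration of this proactive testing loop, the following Python sketch perturbs the feature vector of a known bot and checks whether a trained detector still flags it. The random local search and the numeric features are illustrative assumptions: the works cited above craft far richer adversarial examples (e.g., simulated behaviors [12], purposely developed bots [20], or evolved variants [11]), but the goal is the same, namely surfacing detector weaknesses before bot developers exploit them.

# Sketch of adversarial testing: search for small perturbations of a bot's
# features that make a trained detector label it as legitimate.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy feature vectors: [followers, friends, tweets_per_day, urls_per_tweet]
X = np.array([[12, 980, 310, 0.9], [8, 870, 250, 0.8],
              [450, 300, 6, 0.1], [1200, 400, 3, 0.0]])
y = np.array([1, 1, 0, 0])  # 1 = bot, 0 = legitimate

detector = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def find_adversarial_example(bot, n_trials=500, scale=0.15):
    """Random local search for a perturbed bot that evades the detector."""
    for _ in range(n_trials):
        candidate = bot * (1 + rng.uniform(-scale, scale, size=bot.shape))
        if detector.predict([candidate])[0] == 0:   # classified as legitimate
            return candidate
    return None  # no evasion found: the detector resists these perturbations

evasive = find_adversarial_example(X[0])
print("evasion found" if evasive is not None else "no evasion within budget")

Whether or not an evasion is found, the outcome is informative: successful adversarial examples expose weaknesses to fix, while failures provide (limited) evidence of robustness against the tested class of perturbations.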
Despite the high hopes placed on adversarial social bot detection, this re-
search direction is still in its infancy. The very first works in this field have in
fact been published just in 2018 and 2019. Adversarial approaches to social bot
detection thus represent a promising new development of this field. However,
efforts at adversarial social bot detection can only be successful if the scientific
community decides to rise to the many open challenges. Among the challenges
opened up by proactive and adversarial approaches is the development of tech-
niques for creating many different kinds of adversarial examples, with which to
test existing bot detectors. To date, this task has only been tackled by relying
on the creativity of individual researchers, and only for a few limited cases [11,20,12].
Moreover, adversarial approaches have proved computationally and data inten-
sive in some of the early tasks to which they were applied, with only a few solutions
proposed to date to boost their efficiency [31]. Another challenge thus revolves
around assessing the efficiency of adversarial social bot detection, as well as its
coverage of the possible types of attacks (i.e., how likely the adversarial approach
is to anticipate a real future attack or a real future evolution of bots).
6 Conclusions
Our longitudinal analysis of the first decade of research in social bot detection
revealed some interesting trends in the development of bot detectors. In partic-
ular, we identified 3 ages of bot detection: (i) the traditional age, characterized
by the study of account features and by the adoption of off-the-shelf supervised
machine learning algorithms; (ii) the modern age, characterized by the develop-
ment of ad-hoc unsupervised algorithms for detecting groups of colluding bots;
and (iii) the newborn adversarial age, whose promise is to apply the paradigm
of adversarial machine learning to the task of bot detection. Given the consid-
erable amount of work still needed to lay the foundations of adversarial social
bot detection, the adversarial age has not really taken off yet. However, if it lives
up to its expectations, it might blossom soon with a tremendous impact. Apart
from the adversarial age, the characteristics of currently published works in so-
cial bot detection still highlight a majority of traditional detectors. However,
the gap between newly proposed traditional and modern detectors is narrowing.
Hence we can conclude that the peak of the traditional age is probably over, and
that we are moving towards the peak of the modern age, as pictorially shown in
Table 1.

Table 1: The analysis of more than a decade of research and experimentation
in social bot detection allows us to identify 3 main directions of research, corre-
sponding to 3 different ages: the traditional, the modern, and the adversarial
age. In turn, each age is characterized by a few distinctive features reported
below. Furthermore, an analysis of recently published papers on social bot de-
tection positions current endeavors somewhere in between the traditional and
the modern ages.

                     traditional             modern                  adversarial
                     (along the time axis; the current state of development
                      lies between the traditional and the modern ages)

key concept          features allow one to   synchronization and     improve bot detectors
                     tell apart bots and     coordination allow      by finding their
                     legitimate accounts     one to detect botnets   weaknesses

development focus†   features (e.g., via     detection algorithms    adversarial examples
                     feature engineering)

method‡              supervised, off-the-    unsupervised, ad-hoc    adversarial ML
                     shelf ML (e.g.,         algorithms
                     decision trees, SVMs)

target§              single accounts         groups of accounts      bot detectors

†: what scholars aim to optimize
‡: which machine learning (ML) paradigm scholars adopt
§: to what scholars apply their method
The exponentially growing body of work on social bot detection shown in
Figure 1, somehow reassures us that much effort is bound to be devoted to the
fight against this critical issue. However, at the same time it also poses some new
challenges. Firstly, it is becoming more and more important to be able to or-
ganize this large body of work. Doing so would not only contribute to a better
exploitation of this knowledge, but would also allow researchers in bot detec-
tion to more effectively and more efficiently provide new solutions (e.g., avoid
wasting time and effort on solutions that have already proved unsuccessful).
[Figure 2 layout: the horizontal axis represents time and bot evolution (now → next evolution); the vertical axis represents the types of social bots (fake followers, which is the type marked as the detector's original target, followed by mass retweeters, fake reviewers, information polluters, ...). Adversarial social bot detection spans this space, allowing detectors to be evaluated for how well they generalize on bot evolutions, on types of bots, or on both; the resulting difficulty of detection ranges from easy to medium to hard.]
Fig. 2: A bi-dimensional theory of generalizability for social bot detectors. Let
us consider a detector developed for a specific kind of bots (the type marked in the figure). The
detector will likely achieve its best performances when used against the same bots
it was developed for (green-colored scenario). However, it would be useful to also
evaluate its detection performances against different kinds of bots, thus moving
along the y axis. Furthermore, by exploiting adversarial social bot detection,
it could also be possible to estimate its detection performances against evolved
bots, thus moving along the x axis of the generalizability space. The hardest
foreseeable evaluation scenario is the one where a detector is tested against
evolved versions of bots for which it was not originally designed (red-colored).
The vast majority of newly proposed bot detectors are only evaluated in the
easiest scenario.
Unfortunately, thorough and comprehensive surveys on bot detection are still
few and far between. In this regard, this paper aims to provide a contribution
to the critical review and analysis of the vast literature in this field. Secondly,
more papers on this topic inevitably imply that more bot detectors will be pro-
posed. With the growing number of disparate detection techniques, it is thus
becoming increasingly important to have standard tools (e.g., frameworks, ref-
erence datasets, methodologies) to evaluate and compare them. In particular,
one facet of bot detectors that is often overlooked is their generalizability – that is,
their capability to maintain good detection results also for types of bots that
were not originally considered. In this regard, the analyses carried out in
this study lay the foundations for a bi-dimensional theory of generalizability, as
shown in Figure 2. A desirable scenario for the near future would involve the
possibility to easily evaluate any new bot detector against many different types
of social bots in order to assess its strengths and weaknesses, for instance by
following the approach laid out in [16]. It would also be profitable to be able to
evaluate detectors against possible evolved versions of current bots, by applying
the adversarial approach previously described. In order to reach this ambitious
goal, we must first create reference datasets that comprise several different kinds
of bots, thus significantly adding to the sparse resources existing as of today6.
Then, as already anticipated, we should also devise additional ways for creating
a broad array of diverse adversarial examples. These challenges currently stand
as unsolved, and call for the highest effort of our scientific community.
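
As a concrete starting point for the cross-type evaluations discussed above, the following Python sketch runs a leave-one-bot-class-out check in the spirit of [16]: a detector is trained on all but one class of bots and tested on the held-out class, thus probing the 'types of bots' axis of Figure 2. The class names and the synthetic features are placeholders for real reference datasets.

# Sketch of a cross-type generalizability check (leave one bot class out).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)

def synthetic_class(center, n=50):
    """Toy feature vectors for one class of accounts (placeholder for real data)."""
    return center + rng.normal(scale=0.3, size=(n, len(center)))

bot_classes = {
    "fake_followers":  synthetic_class([0.1, 5.0, 0.2]),
    "mass_retweeters": synthetic_class([4.0, 1.0, 3.5]),
    "info_polluters":  synthetic_class([2.5, 2.5, 1.0]),
}
humans = synthetic_class([1.0, 1.0, 0.5], n=150)

for held_out, test_bots in bot_classes.items():
    train_bots = np.vstack([v for k, v in bot_classes.items() if k != held_out])
    X_train = np.vstack([train_bots, humans[:100]])
    y_train = np.concatenate([np.ones(len(train_bots)), np.zeros(100)])
    X_test = np.vstack([test_bots, humans[100:]])
    y_test = np.concatenate([np.ones(len(test_bots)), np.zeros(50)])

    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
    print(f"{held_out}: F1 on the unseen bot class = {f1_score(y_test, clf.predict(X_test)):.2f}")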
Acknowledgments
This research is supported in part by the EU H2020 Program under the scheme
INFRAIA-1-2014-2015: Research Infrastructures grant agreement #654024
SoBigData: Social Mining & Big Data Ecosystem.
References
1. Assenmacher, D., Adam, L., Frischlich, L., Trautmann, H., Grimme, C.: Openbots.
arXiv preprint arXiv:1902.06691 (2019)
2. Avvenuti, M., Bellomo, S., Cresci, S., La Polla, M.N., Tesconi, M.: Hybrid crowd-
sensing: A novel paradigm to combine the strengths of opportunistic and partici-
patory crowdsensing. In: ACM WWW Companion (2017)
3. Avvenuti, M., Cresci, S., Del Vigna, F., Fagni, T., Tesconi, M.: Crismap: a big data
crisis mapping system based on damage detection and geoparsing. Information
Systems Frontiers 20(5), 993–1011 (2018)
4. Avvenuti, M., Cresci, S., Marchetti, A., Meletti, C., Tesconi, M.: Predictability or
early warning: using social media in modern emergency response. IEEE Internet
Computing 20(6) (2016)
5. Chavoshi, N., Hamooni, H., Mueen, A.: DeBot: Twitter bot detection via warped
correlation. In: IEEE ICDM (2016)
6. Cresci, S.: Harnessing the Social Sensing revolution: Challenges and Opportunities.
PhD dissertation, University of Pisa (2018)
7. Cresci, S., D’Errico, A., Gazzé, D., Lo Duca, A., Marchetti, A., Tesconi, M.: To-
wards a DBpedia of tourism: the case of Tourpedia. In: ISWC (2014)
8. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., Tesconi, M.: Fame for sale:
Efficient detection of fake Twitter followers. Decision Support Systems 80 (2015)
9. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., Tesconi, M.: The paradigm-
shift of social spambots: Evidence, theories, and tools for the arms race. In: ACM
WWW Companion (2017)
10. Cresci, S., Lillo, F., Regoli, D., Tardelli, S., Tesconi, M.: Cashtag piggybacking: Un-
covering spam and bot activity in stock microblogs on twitter. ACM Transactions
on the Web 13(2), 11 (2019)
11. Cresci, S., Petrocchi, M., Spognardi, A., Tognazzi, S.: From Reaction to Proac-
tion: Unexplored Ways to the Detection of Evolving Spambots. In: ACM WWW
Companion (2018)
6https://botometer.iuni.iu.edu/bot-repository/datasets.html
12. Cresci, S., Petrocchi, M., Spognardi, A., Tognazzi, S.: Better safe than sorry: An
adversarial approach to improve social bot detection. In: ACM WebSci (2019)
13. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., Tesconi, M.: Social finger-
printing: Detection of spambot groups through DNA-inspired behavioral modeling.
IEEE Transactions on Dependable and Secure Computing 15(4), 561–576 (2018)
14. D’Andrea, E., Ducange, P., Lazzerini, B., Marcelloni, F.: Real-time detection of
traffic from twitter stream analysis. IEEE Transactions on Intelligent Transporta-
tion Systems 16(4), 2269–2283 (2015)
15. Davis, C.A., Varol, O., Ferrara, E., Flammini, A., Menczer, F.: BotOrNot: A system
to evaluate social bots. In: ACM WWW Companion (2016)
16. De Cristofaro, E., Kourtellis, N., Leontiadis, I., Stringhini, G., Zhou, S., et al.:
LOBO: Evaluation of generalization deficiencies in Twitter bot classifiers. In: ACM
ACSAC (2018)
17. Ferrara, E.: The history of digital spam. Communications of the ACM 62(8), 82–91
(2019)
18. Ferrara, E., Varol, O., Davis, C., Menczer, F., Flammini, A.: The rise of social
bots. Communications of the ACM 59(7) (2016)
19. Goodfellow, I.J., McDaniel, P.D., Papernot, N.: Making machine learning robust
against adversarial inputs. Communications of the ACM 61(7), 56–66 (2018)
20. Grimme, C., Assenmacher, D., Adam, L.: Changing perspectives: Is it sufficient to
detect social bots? In: SCSM (2018)
21. Grimme, C., Preuss, M., Adam, L., Trautmann, H.: Social bots: Human-like by
means of human control? Big data 5(4) (2017)
22. Jiang, M., Cui, P., Beutel, A., Faloutsos, C., Yang, S.: Catching synchronized be-
haviors in large networks: A graph mining approach. ACM Transactions on Knowl-
edge Discovery from Data 10(4) (2016)
23. Kavanaugh, A.L., Fox, E.A., Sheetz, S.D., Yang, S., Li, L.T., Shoemaker, D.J.,
Natsev, A., Xie, L.: Social media use by government: From the routine to the
critical. Government Information Quarterly 29(4), 480–491 (2012)
24. Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken,
A., Tejani, A., Totz, J., Wang, Z., et al.: Photo-realistic single image super-
resolution using a generative adversarial network. In: IEEE ICCV (2017)
25. de Lima Salge, C.A., Berente, N.: Is that social bot behaving unethically? Com-
munications of the ACM 60(9), 29–31 (2017)
26. Liu, S., Hooi, B., Faloutsos, C.: Holoscope: Topology-and-spike aware fraud detec-
tion. In: ACM CIKM (2017)
27. Mazza, M., Cresci, S., Avvenuti, M., Quattrociocchi, W., Tesconi, M.: RTbust:
Exploiting temporal patterns for botnet detection on twitter. In: ACM WebSci
(2019)
28. Miller, Z., Dickinson, B., Deitrick, W., Hu, W., Wang, A.H.: Twitter spammer
detection using data stream clustering. Information Sciences 260, 64–73 (2014)
29. Pandey, R., Castillo, C., Purohit, H.: Modeling human annotation errors to design
bias-aware systems for social stream processing. In: IEEE/ACM ASONAM (2019)
30. Pascual, S., Bonafonte, A., Serrà, J.: SEGAN: Speech Enhancement Generative
Adversarial Network. In: Interspeech (2017)
31. Sahay, R., Mahfuz, R., Gamal, A.E.: A computationally efficient method for de-
fending adversarial deep learning attacks. arXiv preprint arXiv:1906.05599 (2019)
32. Shao, C., Ciampaglia, G.L., Varol, O., Yang, K.C., Flammini, A., Menczer, F.:
The spread of low-credibility content by social bots. Nature communications 9(1)
(2018)
33. Starbird, K., Arif, A., Wilson, T.: Disinformation as collaborative work: Surfac-
ing the participatory nature of strategic information operations. In: ACM CSCW
(2019)
34. Stella, M., Ferrara, E., De Domenico, M.: Bots increase exposure to negative and in-
flammatory content in online social systems. Proceedings of the National Academy
of Sciences 115(49) (2018)
35. Varol, O., Ferrara, E., Davis, C.A., Menczer, F., Flammini, A.: Online human-bot
interactions: Detection, estimation, and characterization. In: AAAI ICWSM (2017)
36. Yang, C., Harkreader, R., Gu, G.: Empirical evaluation and new design for fight-
ing evolving twitter spammers. IEEE Transactions on Information Forensics and
Security 8(8), 1280–1293 (2013)
37. Yang, K.C., Varol, O., Davis, C.A., Ferrara, E., Flammini, A., Menczer, F.: Arming
the public with artificial intelligence to counter social bots. Human Behavior and
Emerging Technologies 1(1), 48–61 (2019)
38. Yardi, S., Romero, D., Schoenebeck, G., et al.: Detecting spam in a twitter network.
First Monday 15(1) (2010)