Detecting Malicious Social Bots:
Story of a Never-Ending Clash
Stefano Cresci [0000-0003-0170-2445]
Institute for Informatics and Telematics (IIT-CNR), Pisa, Italy
stefano.cresci@iit.cnr.it
Abstract. Recently, studies on the characterization and detection of social
bots have been published at an impressive rate. By looking back at over ten
years of research and experimentation on social bot detection, in this paper
we aim at understanding past, present, and future research trends in this
crucial field. In doing so, we discuss one of the nastiest features of social
bots – that is, their evolutionary nature. Then, we highlight the switch from
supervised bot detection techniques – focusing on feature engineering and on
the analysis of one account at a time – to unsupervised ones, where the focus
is on proposing new detection algorithms and on the analysis of groups of
accounts that behave in a coordinated and synchronized fashion. These
unsupervised, group-analysis techniques currently represent the state of the
art in social bot detection. Going forward, we analyze the latest research
trend in social bot detection in order to highlight a promising new
development of this crucial field.
Keywords: Social bots · bot evolution · reactive detection · proactive
detection · adversarial machine learning · generalizability.
1 Introduction
Social media and Online Social Networks (OSNs) are having a profound impact
on our everyday life, giving voice to the crowds and reshaping the information
landscape. Indeed, the deluge of real-time data spontaneously shared in OSNs
already proved valuable in many different domains, spanning tourism [7], safety
and security [4,3], transportation and politics [14,23], to name but a few notable
cases.
However, the democratizing effect of OSNs does not come without costs [6].
In 2016, “post-truth” was selected by the Oxford dictionary as the word of the
year, and in 2017 “fake news” was selected for the same purpose by Collins
dictionary. Still in 2017, the World Economic Forum raised a warning on the
potential distortion effect of OSNs on user perceptions of reality1. Moreover, the
same openness of OSNs that favored the democratization of information (e.g.,
the support for programmatic access via APIs and the support for anonymity),
1http://reports.weforum.org/global-risks-2017
Fig. 1: Trends in search queries and publications regarding social bots. (a) Normalized Google queries per month (2014–2020), with a loess fit. (b) Publications per year (2000–2020), as indexed by PubMed, Web of Science, and Dimensions.ai, with loess fits.
also inevitably favored the proliferation of social bots. Indeed, previous studies
report that social bots are as old as OSNs themselves [18]. With the term social
bot, we broadly refer to computer programs capable of automatically produc-
ing, re-sharing, and liking content in OSNs, or even capable of establishing and
maintaining social relations. In fact, any of our supposedly online friends may
instead be a fake, automated account, part of large coordinated groups [18].
Not all social bots are malicious and dangerous, and some of them also serve
beneficial purposes, such as contributing to gather accurate information in the
aftermath of emergencies [2,25]. Unfortunately, however, the vast majority ac-
tually pursue malicious goals. These malicious bots try to hide their automated
nature by imitating the behaviors of legitimate users. Moreover, they often act
in a synchronized and coordinated fashion – a strategy that collectively allows
them to increase their impact. Many recent studies concluded that social bots
played a role in strategic information operations orchestrated in the run up to
several major political elections, both in western and eastern countries [32,33].
As additional evidence for this claim, Twitter recently banned several thousand
accounts linked to many different malicious information operations perpetrated
between 2016 and 20192. Other recent studies also suggested that social bots
were used to exacerbate online social discussions about controversial topics (e.g.,
vaccination and immigration debates), thus increasing polarization and fueling
abusive and hateful speech [34]. Across the whole Twittersphere, it is reported
that social bots account for 9 to 15% of total active platform users [35]. Even
more worryingly, however, when strong political or economic incentives are at
stake, the presence of bots increases dramatically. As an example, a recent study
reported that 71% of all users mentioning stocks traded in US financial markets
are likely to be bots [10].
Since social bots have a central role in the diffusion of disinformation, spam,
and malware, both scholars and practitioners devoted much effort to the devel-
opment of detection techniques. Nowadays, new studies on the characterization
and detection of social bots are published at an impressive rate, as shown in
Figure 1. An analysis of a subset of publications from 2018 reports that more
2https://about.twitter.com/en_us/values/elections-integrity.html
than 3 new papers were published (on average) every week on the topic of social
bots3. The rapidly growing publication trend suggests that in the near future
there will be one new paper published every day, which poses a heavy burden
on researchers trying to keep pace with the evolution of this field. This issue is
also exacerbated by the lack of a thorough survey. Perhaps more importantly,
the rate at which new studies on this topic are published implies that a huge
effort is taking place worldwide in order to overcome the diffusion of social bots.
Given this picture, an important question arises: where is all this effort leading?
In the remainder of this paper we try to answer this crucial question via a
longitudinal analysis of ten years of research in the field of social bot detection.
2 Traditional social bot detection
The first work that focused on the detection of misbehaving accounts in OSNs
dates back to January 2010 [38]. Since then and until present days, the vast ma-
jority of attempts at bot detection have been based on heuristics (i.e., rule-based)
or on supervised machine learning [9]. An important implication of the adoption
of supervised machine learning is that each account is analyzed individually. In
other words, given a group of accounts to investigate (e.g., an OSN community),
the detection technique is separately applied to each account of the group, to
which it assigns a label (either bot or legitimate). In fact, the key assumption
of this large body of work is that each bot/fake/spammer has peculiar features
that make it clearly distinguishable from legitimate accounts. This approach to
the task of social bot detection, which we call “traditional”, thus revolves around
the application of off-the-shelf machine learning algorithms to the accounts un-
der investigation, rather than on developing new algorithms. Indeed, most of the
works in this branch are focused on designing machine learning features – that
is, they are focused on the task of feature engineering – capable of maximizing
detection performances of well-known algorithms, such as SVM, decision trees,
random forests, and more [9].
Regarding features to exploit for the detection, 3 classes have been mainly
considered: (i) profile features [8,15]; (ii) features extracted from the posts, such
as posting behavior and content of posted messages [28,5]; and (iii) features
derived from the social or interaction graph of the accounts [22,26]. The classes
of features exploited by the detection technique have a strong impact on both
the performance of the detector and its efficiency. For instance, on Twitter
it has been demonstrated that the features that contribute most towards the
predictive power of bot detectors (e.g., graph-based features such as measures of
centrality in the social graph) are also the most costly ones, in terms of the needed
data and computation [8].
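
For concreteness, what follows is a minimal Python sketch of this traditional pipeline: hand-crafted, per-account features fed to an off-the-shelf supervised classifier. The feature names, the toy data, and the random forest configuration are illustrative assumptions, not the exact features or datasets of the cited works [8,15,28,5].

# Minimal sketch of the "traditional" approach: hand-crafted per-account
# features fed to an off-the-shelf supervised classifier.
from sklearn.ensemble import RandomForestClassifier

def extract_features(account):
    """Map one account to a fixed-length vector of profile and posting features."""
    return [
        account["followers"],                               # profile feature
        account["friends"],                                 # profile feature
        account["followers"] / max(account["friends"], 1),  # reputation proxy
        account["tweets_per_day"],                          # posting behavior
        account["urls_per_tweet"],                          # content feature
    ]

# Toy labeled dataset (1 = bot, 0 = legitimate); real work needs a ground truth.
accounts = [
    {"followers": 12,   "friends": 980, "tweets_per_day": 310, "urls_per_tweet": 0.9},
    {"followers": 8,    "friends": 870, "tweets_per_day": 250, "urls_per_tweet": 0.8},
    {"followers": 450,  "friends": 300, "tweets_per_day": 6,   "urls_per_tweet": 0.1},
    {"followers": 1200, "friends": 400, "tweets_per_day": 3,   "urls_per_tweet": 0.0},
]
labels = [1, 1, 0, 0]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit([extract_features(a) for a in accounts], labels)   # learning phase

# Each account under investigation is then classified individually.
unseen = {"followers": 30, "friends": 950, "tweets_per_day": 280, "urls_per_tweet": 0.7}
print(clf.predict([extract_features(unseen)]))  # expected: [1] on this toy data

Whatever the specific features, the defining trait of this approach is that the classifier sees, and labels, one account at a time.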
Despite achieving promising initial results, the traditional approach – which
still comprises the majority of papers published nowadays – has a number of
drawbacks. The first challenge in developing a supervised detector is related
3Source: https://www.dimensions.ai/
to the availability of a ground truth (i.e., labeled) dataset, to be used in the
learning phase of the classifier. In most cases, a real ground truth is lacking and
the labels are simply given by human operators that manually analyze the data.
Critical issues arise since, as of 2019, we still lack a “standard” definition of what
a social bot is [21,37]. Moreover, humans have been proven to suffer from several
biases [29] and to largely fail at spotting modern, sophisticated bots, with only
≈24% of bots correctly labeled as such by humans [9].
The biggest drawback of traditional approaches, however, is due to the evo-
lutionary nature of social bots, which we discuss in the following section.
3 The issue of bot evolution
Early success at social bot detection, in turn, inevitably inspired countermea-
sures by bot developers. Because of this, newer bots often feature advanced char-
acteristics that make them much harder to detect than older ones. This
iterative process, which leads to the development of ever more sophisticated
social bots, is commonly referred to as bot evolution.
A noteworthy work published in 2011, and later extended in 2013 [36], pro-
vided the first evidence and the theoretical foundations to study social bot evo-
lution. The first wave of social bots that populated OSNs until around 2011 was
made of rather simplistic bots – mainly accounts with very low perceived rep-
utation (e.g., few social connections and posted messages) and featuring clear
signs of automation (e.g., repeated spam of the same URLs). On the contrary,
the social bots studied in [36] appeared as more popular and credible, given the
relatively large number of their social connections. In addition, they were no
longer spamming the same messages over and over again, but they were instead
posting several messages with the same meaning but with different words, in or-
der to avoid detection techniques based on content analysis. Starting from these
findings, authors of [36] also proposed a supervised machine learning classifier
that was specifically designed for detecting evolving bots. Their classifier simul-
taneously leveraged features computed from the content of posted messages,
social connections, and tweeting behaviors, and initially proved capable of ac-
curately detecting the sophisticated bots. More recently, new studies provided
evidence of a third generation of social bots that spread through OSNs from 2016
onwards [18,9]. Unfortunately, the classifier originally developed in [36] was no
longer successful at detecting the third wave of social bots, as shown in [9].
The previous example serves as anecdotal evidence of bot evolution, and of
the detrimental effect it has on bot detectors. Additional evidence is reported
in [9], where authors evaluated the survivability of different bots, and the abil-
ity of humans in spotting bots in the wild. Specifically, authors of [9] showed
that only ≈5% of evolved bots are removed from social platforms (i.e., high
survivability), whilst “old” social bots are removed ≈60% of the time (i.e.,
low/moderate survivability). Moreover, in a large-scale crowdsourcing experi-
ment, tech-savvy social media users proved unable to tell evolved bots apart
from legitimate users 76% of the time (i.e., 3 out of 4 evade detection by
humans). The same users instead failed to spot “old” social bots only
9% of the time (i.e., only 1 out of 10 evades detection) [9].
What results reported in [18,9] ultimately tell us, is that current sophisticated
bots are practically indistinguishable from legitimate accounts, if analyzed one at
a time. In other words, the results about bot evolution tell us that the assumption
of traditional (i.e., supervised) bot detection approaches, according to which
bots have features that make them distinguishable from legitimate accounts, is
no longer true.
4 Modern social bot detection
The difficulties in detecting sophisticated bots with supervised approaches that
are based on the analysis of individual accounts, recently gave rise to a new
research trend that aims to analyze groups of accounts as a whole. This new
research trend is also motivated by the interest of platform administrators in
detecting what they typically refer to as “coordinated inauthentic behavior”4,5.
Since 2013, several different research teams independently started to propose
new techniques for social bot detection. Despite being based on different key
concepts, all these new techniques – that collectively represent the “modern”
approach to social bot detection – included important contributions also from
the algorithmic point of view, thus shifting from general-purpose machine learn-
ing algorithms such as SVMs and decision trees, to ad-hoc algorithms that were
specifically designed for detecting bots. Furthermore, the majority of these new
algorithms considered groups of accounts as a whole, rather than single accounts,
thus moving in the direction of detecting the coordinated and synchronized be-
havior that characterizes malicious botnets [9].
As a consequence of this paradigm-shift, modern bot detectors are partic-
ularly effective at detecting evolving, coordinated, and synchronized bots. For
instance, the technique discussed in [13] associates each account to a sequence
of characters that encodes its behavioral information. Such sequences are then
compared between one another to find anomalous similarities among sequences
of a subgroup of accounts. The similarity is computed by measuring the longest
common subsequence shared by all the accounts of the group. Accounts that
share a suspiciously long subsequence are then labeled as bots. Instead, the fam-
ily of systems described in [22,26] builds a bipartite graph of accounts and their
interactions with content (e.g., retweets to some other tweets) or with other ac-
counts (e.g., becoming followers of other accounts). Then, they aim to detect
anomalously dense blocks in the graph, which might be representative of coor-
dinated and synchronized attacks. Another recent example of an unsupervised,
group-based technique is RTbust [27], which is tailored for detecting mass-
retweeting bots. The technique leverages unsupervised feature extraction and
clustering. An LSTM autoencoder converts the retweet time series of accounts
into compact and informative latent feature vectors, which are then clustered
4https://newsroom.fb.com/news/2018/12/inside-feed-coordinated-inauthentic-behavior/
5https://help.twitter.com/en/rules-and-policies/platform-manipulation
by a hierarchical density-based algorithm. Accounts belonging to large clusters
characterized by malicious retweeting patterns are labeled as bots, since they
are likely to represent retweeting botnets.
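
To make the group-based intuition concrete, the following Python sketch illustrates the digital-DNA idea behind [13]: each account's behavior is encoded as a string, and accounts sharing anomalously long common subsequences are flagged together. The three-letter alphabet, the pairwise simplification, and the 0.8 threshold are illustrative assumptions; the actual technique reasons on the subsequence shared by an entire group of accounts rather than on pairs.

# Sketch of group-based detection via behavioral sequence similarity.
from itertools import combinations

def encode_behavior(actions):
    """Digital-DNA-like encoding: one character per action type."""
    alphabet = {"tweet": "A", "reply": "C", "retweet": "T"}
    return "".join(alphabet[a] for a in actions)

def lcs_length(s1, s2):
    """Length of the longest common subsequence (classic dynamic programming)."""
    dp = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    for i, c1 in enumerate(s1):
        for j, c2 in enumerate(s2):
            dp[i + 1][j + 1] = dp[i][j] + 1 if c1 == c2 else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def suspicious_pairs(accounts, threshold=0.8):
    """Pairs of accounts whose behavioral sequences are anomalously similar."""
    flagged = []
    for (u1, seq1), (u2, seq2) in combinations(accounts.items(), 2):
        similarity = lcs_length(seq1, seq2) / max(len(seq1), len(seq2))
        if similarity >= threshold:   # coordinated, bot-like behavior
            flagged.append((u1, u2, similarity))
    return flagged

accounts = {
    "acct_1": encode_behavior(["retweet", "retweet", "tweet", "retweet", "reply"]),
    "acct_2": encode_behavior(["retweet", "retweet", "tweet", "retweet", "reply"]),
    "acct_3": encode_behavior(["tweet", "reply", "tweet", "retweet", "tweet"]),
}
print(suspicious_pairs(accounts))  # acct_1 and acct_2 behave identically and are flagged

The key point is that suspiciousness emerges from the comparison between accounts: no single sequence is anomalous in isolation, exactly as argued above for sophisticated bots.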
Given that bot detection techniques belonging to this modern approach still
represent the minority of all published papers on social bot detection, we still
lack a thorough and systematic study of the improvement brought by the mod-
ern approach to social bot detection. However, the first preliminary results that
compared the detection performances of traditional and modern detectors on
the same datasets, seem to support the increased effectiveness of the latter. In
particular, the technique introduced in [13] outperformed several traditional de-
tectors on two datasets, yielding an average F1 improvement of +0.37. Similarly,
RTbust [27] improved on a widely used traditional bot detector by increasing
F1 by +0.44. The promising results with modern bot detectors tell us that fo-
cusing on groups is advantageous. In fact, large groups of coordinated bots are
more likely to leave traces of automation than a single bot, independently of
how sophisticated the individual bots are [9]. By performing analyses at group
level, this modern approach appears to be able to raise the bar for bot develop-
ers to evade detection. Furthermore, the majority of modern bot detectors are
semi-supervised or unsupervised, which gives higher guarantees on the general-
izability of the detector and mitigates challenges related to the acquisition of a
reliable ground-truth.
5 The way ahead
So far, we highlighted that a shift is taking place in the development of bot
detectors, in order to counter the evolutionary nature of social bots. Now, by
looking at the latest advances in this thriving field, we aim at gaining some
insights into the future of social bot detection.
Notably, both the traditional and the modern approach to social bot detec-
tion have always followed a reactive schema. Quite naturally, the driving factor
for the development of new and better bot detectors has been bot mischiefs
themselves. As soon as scholars and OSN administrators identified a new group
of bots, possibly featuring new and advanced characteristics, they started the
development of detectors capable of spotting them. A major implication of this
reactive approach is that improvements in bot detection are possible only after
having collected evidence of new bot mischiefs. In turn, this means that scholars
and OSN administrators are constantly one step behind bot developers, and
that bots have a significant time span (i.e., the time needed to design, develop,
and deploy a new detector) during which they are essentially free to tamper with
our online environments.
However, another – radically different – approach to social bot detection is
possible, and has just started being investigated by several researchers. This
trailblazing direction of research involves the application of adversarial machine
learning [19] to bot detection. Adversarial machine learning has already been
applied to a number of fields such as computer vision [24] and speech recog-
nition [30], with exceptional results. In general, it is considered as a machine
learning paradigm that can be profitably applied to all scenarios that are intrin-
sically adversarial (i.e., with adversaries interested in fooling machine learning
models) [19], with social bot detection clearly being one such scenario [17].
In the so-called adversarial social bot detection, scholars try to find meaning-
ful adversarial examples with which to test current bot detectors [11]. In other
words, this branch of research aims at studying possible attacks to existing bot
detectors, with the goal of building more robust and more secure detectors. In
this context, adversarial examples might be sophisticated types of existing bots
that manage to evade detection by current techniques [1], or even bots that do
not exist yet, but whose behaviors and characteristics are simulated [12], or bots
developed ad-hoc for the sake of experimentation [20]. Finding good adversarial
examples can, in turn, help scholars understand the weaknesses of existing bot
detection systems, before such weaknesses are effectively exploited by bot devel-
opers. As a result, bot hunters need not wait anymore for new bot mischiefs in
order to adapt their techniques, but instead they can proactively test them, in an
effort that could quickly make them more robust. Among the positive outcomes
of adversarial approaches to bot detection, is a more rapid understanding of the
drawbacks of current detectors and the opportunity to gain insights into new
features for achieving more robust and more reliable detectors.
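
As a rough illustration of this proactive testing loop, the following Python sketch perturbs the feature vector of a known bot and checks whether a trained detector still flags it. The random local search and the numeric features are illustrative assumptions: the works cited above craft far richer adversarial examples (e.g., simulated behaviors [12], purposely developed bots [20], or evolved variants [11]), but the goal is the same, namely surfacing detector weaknesses before bot developers exploit them.

# Sketch of adversarial testing: search for small perturbations of a bot's
# features that make a trained detector label it as legitimate.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy feature vectors: [followers, friends, tweets_per_day, urls_per_tweet]
X = np.array([[12, 980, 310, 0.9], [8, 870, 250, 0.8],
              [450, 300, 6, 0.1], [1200, 400, 3, 0.0]])
y = np.array([1, 1, 0, 0])  # 1 = bot, 0 = legitimate

detector = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def find_adversarial_example(bot, n_trials=500, scale=0.15):
    """Random local search for a perturbed bot that evades the detector."""
    for _ in range(n_trials):
        candidate = bot * (1 + rng.uniform(-scale, scale, size=bot.shape))
        if detector.predict([candidate])[0] == 0:   # classified as legitimate
            return candidate
    return None  # no evasion found: the detector resists these perturbations

evasive = find_adversarial_example(X[0])
print("evasion found" if evasive is not None else "no evasion within budget")

Whether or not an evasion is found, the outcome is informative: successful adversarial examples expose weaknesses to fix, while failures provide (limited) evidence of robustness against the tested class of perturbations.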
Despite the high hopes placed on adversarial social bot detection, this re-
search direction is still in its infancy. The very first works in this field have in
fact been published just in 2018 and 2019. Adversarial approaches to social bot
detection thus represent a promising new development of this field. However,
efforts at adversarial social bot detection can only be successful if the scientific
community decides to rise to the many open challenges. Among the challenges
opened up by proactive and adversarial approaches is the development of tech-
niques for creating many different kinds of adversarial examples, with which to
test existing bot detectors. To date, this task has only been tackled by relying
on the creativity of individual researchers, and only for a few limited cases [11,20,12].
Moreover, adversarial approaches have proved computationally and data inten-
sive in some of the early tasks to which they were applied, with only a few solutions
proposed to date to boost their efficiency [31]. Another challenge thus revolves
around assessing the efficiency of adversarial social bot detection, as well as its
coverage of the possible types of attacks (i.e., how likely the adversarial approach
is to anticipate a real future attack or a real future evolution of bots).
6 Conclusions
Our longitudinal analysis of the first decade of research in social bot detection
revealed some interesting trends in the development of bot detectors. In partic-
ular, we identified 3 ages of bot detection: (i) the traditional age, characterized
by the study of account features and by the adoption of off-the-shelf supervised
machine learning algorithms; (ii) the modern age, characterized by the develop-
ment of ad-hoc unsupervised algorithms for detecting groups of colluding bots;
and (iii) the newborn adversarial age, whose promise is to apply the paradigm
of adversarial machine learning to the task of bot detection. Given the consid-
erable amount of work still needed to lay the foundations of adversarial social
bot detection, the adversarial age has not really taken off yet. However, if it lives
up to its expectations, it might blossom soon with a tremendous impact. Apart
from the adversarial age, the characteristics of currently published works in so-
cial bot detection still highlight a majority of traditional detectors. However,
the gap between newly proposed traditional and modern detectors is narrowing.
Hence we can conclude that the peak of the traditional age is probably over, and
that we are moving towards the peak of the modern age, as pictorially shown in
Table 1.

Table 1: The analysis of more than a decade of research and experimentation
in social bot detection allows us to identify 3 main directions of research, corre-
sponding to 3 different ages: the traditional, the modern, and the adversarial
age. In turn, each age is characterized by a few distinctive features reported
below. Furthermore, an analysis of recently published papers on social bot de-
tection positions current endeavors somewhere in between the traditional and
the modern ages.

                     traditional             modern                  adversarial
                     (along the time axis; the current state of development
                      lies between the traditional and the modern ages)

key concept          features allow one to   synchronization and     improve bot detectors
                     tell apart bots and     coordination allow      by finding their
                     legitimate accounts     one to detect botnets   weaknesses

development focus†   features (e.g., via     detection algorithms    adversarial examples
                     feature engineering)

method‡              supervised, off-the-    unsupervised, ad-hoc    adversarial ML
                     shelf ML (e.g.,         algorithms
                     decision trees, SVMs)

target§              single accounts         groups of accounts      bot detectors

†: what scholars aim to optimize
‡: which machine learning (ML) paradigm scholars adopt
§: to what scholars apply their method
The exponentially growing body of work on social bot detection shown in
Figure 1, somehow reassures us that much effort is bound to be devoted to the
fight against this critical issue. However, at the same time it also poses some new
challenges. Firstly, it is becoming more and more important to be able to or-
ganize this large body of work. Doing so would not only contribute to a better
exploitation of this knowledge, but would also allow researchers in bot detec-
tion to more effectively and more efficiently provide new solutions (e.g., avoid
wasting time and effort on solutions that have already proved unsuccessful).
[Figure 2 layout: the horizontal axis represents time and bot evolution (now → next evolution); the vertical axis represents the types of social bots (fake followers, which is the type marked as the detector's original target, followed by mass retweeters, fake reviewers, information polluters, ...). Adversarial social bot detection spans this space, allowing detectors to be evaluated for how well they generalize on bot evolutions, on types of bots, or on both; the resulting difficulty of detection ranges from easy to medium to hard.]
Fig. 2: A bi-dimensional theory of generalizability for social bot detectors. Let
us consider a detector developed for a specific kind of bots (the type marked in the figure). The
detector will likely achieve its best performances when used against the same bots
it was developed for (green-colored scenario). However, it would be useful to also
evaluate its detection performances against different kinds of bots, thus moving
along the y axis. Furthermore, by exploiting adversarial social bot detection,
it could also be possible to estimate its detection performances against evolved
bots, thus moving along the x axis of the generalizability space. The hardest
foreseeable evaluation scenario is the one where a detector is tested against
evolved versions of bots for which it was not originally designed (red-colored).
The vast majority of newly proposed bot detectors are only evaluated in the
easiest scenario.
Unfortunately, thorough and comprehensive surveys on bot detection are still
few and far between. In this regard, this paper aims to provide a contribution
to the critical review and analysis of the vast literature in this field. Secondly,
more papers on this topic inevitably imply that more bot detectors will be pro-
posed. With the growing number of disparate detection techniques, it is thus
becoming increasingly important to have standard tools (e.g., frameworks, ref-
erence datasets, methodologies) to evaluate and compare them. In particular,
one facet of bot detectors that is often overlooked is their generalizability – that is,
their capability to maintain good detection results also for types of bots that
were not originally considered. In this regard, the analyses carried out in
this study lay the foundations for a bi-dimensional theory of generalizability, as
shown in Figure 2. A desirable scenario for the near future would involve the
possibility to easily evaluate any new bot detector against many different types
of social bots in order to assess its strengths and weaknesses, for instance by
following the approach laid out in [16]. It would also be profitable to be able to
evaluate detectors against possible evolved versions of current bots, by applying
the adversarial approach previously described. In order to reach this ambitious
goal, we must first create reference datasets that comprise several different kinds
of bots, thus significantly adding to the sparse resources existing as of today6.
Then, as already anticipated, we should also devise additional ways for creating
a broad array of diverse adversarial examples. These challenges currently stand
as unsolved, and call for the highest effort of our scientific community.
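
As a concrete starting point for the cross-type evaluations discussed above, the following Python sketch runs a leave-one-bot-class-out check in the spirit of [16]: a detector is trained on all but one class of bots and tested on the held-out class, thus probing the 'types of bots' axis of Figure 2. The class names and the synthetic features are placeholders for real reference datasets.

# Sketch of a cross-type generalizability check (leave one bot class out).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)

def synthetic_class(center, n=50):
    """Toy feature vectors for one class of accounts (placeholder for real data)."""
    return center + rng.normal(scale=0.3, size=(n, len(center)))

bot_classes = {
    "fake_followers":  synthetic_class([0.1, 5.0, 0.2]),
    "mass_retweeters": synthetic_class([4.0, 1.0, 3.5]),
    "info_polluters":  synthetic_class([2.5, 2.5, 1.0]),
}
humans = synthetic_class([1.0, 1.0, 0.5], n=150)

for held_out, test_bots in bot_classes.items():
    train_bots = np.vstack([v for k, v in bot_classes.items() if k != held_out])
    X_train = np.vstack([train_bots, humans[:100]])
    y_train = np.concatenate([np.ones(len(train_bots)), np.zeros(100)])
    X_test = np.vstack([test_bots, humans[100:]])
    y_test = np.concatenate([np.ones(len(test_bots)), np.zeros(50)])

    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
    print(f"{held_out}: F1 on the unseen bot class = {f1_score(y_test, clf.predict(X_test)):.2f}")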
Acknowledgments
This research is supported in part by the EU H2020 Program under the scheme
INFRAIA-1-2014-2015: Research Infrastructures grant agreement #654024
SoBigData: Social Mining & Big Data Ecosystem.
References
1. Assenmacher, D., Adam, L., Frischlich, L., Trautmann, H., Grimme, C.: Openbots.
arXiv preprint arXiv:1902.06691 (2019)
2. Avvenuti, M., Bellomo, S., Cresci, S., La Polla, M.N., Tesconi, M.: Hybrid crowd-
sensing: A novel paradigm to combine the strengths of opportunistic and partici-
patory crowdsensing. In: ACM WWW Companion (2017)
3. Avvenuti, M., Cresci, S., Del Vigna, F., Fagni, T., Tesconi, M.: Crismap: a big data
crisis mapping system based on damage detection and geoparsing. Information
Systems Frontiers 20(5), 993–1011 (2018)
4. Avvenuti, M., Cresci, S., Marchetti, A., Meletti, C., Tesconi, M.: Predictability or
early warning: using social media in modern emergency response. IEEE Internet
Computing 20(6) (2016)
5. Chavoshi, N., Hamooni, H., Mueen, A.: DeBot: Twitter bot detection via warped
correlation. In: IEEE ICDM (2016)
6. Cresci, S.: Harnessing the Social Sensing revolution: Challenges and Opportunities.
PhD dissertation, University of Pisa (2018)
7. Cresci, S., D’Errico, A., Gazzé, D., Lo Duca, A., Marchetti, A., Tesconi, M.: To-
wards a DBpedia of tourism: the case of Tourpedia. In: ISWC (2014)
8. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., Tesconi, M.: Fame for sale:
Efficient detection of fake Twitter followers. Decision Support Systems 80 (2015)
9. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., Tesconi, M.: The paradigm-
shift of social spambots: Evidence, theories, and tools for the arms race. In: ACM
WWW Companion (2017)
10. Cresci, S., Lillo, F., Regoli, D., Tardelli, S., Tesconi, M.: Cashtag piggybacking: Un-
covering spam and bot activity in stock microblogs on twitter. ACM Transactions
on the Web 13(2), 11 (2019)
11. Cresci, S., Petrocchi, M., Spognardi, A., Tognazzi, S.: From Reaction to Proac-
tion: Unexplored Ways to the Detection of Evolving Spambots. In: ACM WWW
Companion (2018)
6https://botometer.iuni.iu.edu/bot-repository/datasets.html
12. Cresci, S., Petrocchi, M., Spognardi, A., Tognazzi, S.: Better safe than sorry: An
adversarial approach to improve social bot detection. In: ACM WebSci (2019)
13. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., Tesconi, M.: Social finger-
printing: Detection of spambot groups through DNA-inspired behavioral modeling.
IEEE Transactions on Dependable and Secure Computing 15(4), 561–576 (2018)
14. D’Andrea, E., Ducange, P., Lazzerini, B., Marcelloni, F.: Real-time detection of
traffic from twitter stream analysis. IEEE Transactions on Intelligent Transporta-
tion Systems 16(4), 2269–2283 (2015)
15. Davis, C.A., Varol, O., Ferrara, E., Flammini, A., Menczer, F.: BotOrNot: A system
to evaluate social bots. In: ACM WWW Companion (2016)
16. De Cristofaro, E., Kourtellis, N., Leontiadis, I., Stringhini, G., Zhou, S., et al.:
LOBO: Evaluation of generalization deficiencies in Twitter bot classifiers. In: ACM
ACSAC (2018)
17. Ferrara, E.: The history of digital spam. Communications of the ACM 62(8), 82–91
(2019)
18. Ferrara, E., Varol, O., Davis, C., Menczer, F., Flammini, A.: The rise of social
bots. Communications of the ACM 59(7) (2016)
19. Goodfellow, I.J., McDaniel, P.D., Papernot, N.: Making machine learning robust
against adversarial inputs. Communications of the ACM 61(7), 56–66 (2018)
20. Grimme, C., Assenmacher, D., Adam, L.: Changing perspectives: Is it sufficient to
detect social bots? In: SCSM (2018)
21. Grimme, C., Preuss, M., Adam, L., Trautmann, H.: Social bots: Human-like by
means of human control? Big data 5(4) (2017)
22. Jiang, M., Cui, P., Beutel, A., Faloutsos, C., Yang, S.: Catching synchronized be-
haviors in large networks: A graph mining approach. ACM Transactions on Knowl-
edge Discovery from Data 10(4) (2016)
23. Kavanaugh, A.L., Fox, E.A., Sheetz, S.D., Yang, S., Li, L.T., Shoemaker, D.J.,
Natsev, A., Xie, L.: Social media use by government: From the routine to the
critical. Government Information Quarterly 29(4), 480–491 (2012)
24. Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken,
A., Tejani, A., Totz, J., Wang, Z., et al.: Photo-realistic single image super-
resolution using a generative adversarial network. In: IEEE ICCV (2017)
25. de Lima Salge, C.A., Berente, N.: Is that social bot behaving unethically? Com-
munications of the ACM 60(9), 29–31 (2017)
26. Liu, S., Hooi, B., Faloutsos, C.: Holoscope: Topology-and-spike aware fraud detec-
tion. In: ACM CIKM (2017)
27. Mazza, M., Cresci, S., Avvenuti, M., Quattrociocchi, W., Tesconi, M.: RTbust:
Exploiting temporal patterns for botnet detection on twitter. In: ACM WebSci
(2019)
28. Miller, Z., Dickinson, B., Deitrick, W., Hu, W., Wang, A.H.: Twitter spammer
detection using data stream clustering. Information Sciences 260, 64–73 (2014)
29. Pandey, R., Castillo, C., Purohit, H.: Modeling human annotation errors to design
bias-aware systems for social stream processing. In: IEEE/ACM ASONAM (2019)
30. Pascual, S., Bonafonte, A., Serrà, J.: SEGAN: Speech Enhancement Generative
Adversarial Network. In: Interspeech (2017)
31. Sahay, R., Mahfuz, R., Gamal, A.E.: A computationally efficient method for de-
fending adversarial deep learning attacks. arXiv preprint arXiv:1906.05599 (2019)
32. Shao, C., Ciampaglia, G.L., Varol, O., Yang, K.C., Flammini, A., Menczer, F.:
The spread of low-credibility content by social bots. Nature communications 9(1)
(2018)
33. Starbird, K., Arif, A., Wilson, T.: Disinformation as collaborative work: Surfac-
ing the participatory nature of strategic information operations. In: ACM CSCW
(2019)
34. Stella, M., Ferrara, E., De Domenico, M.: Bots increase exposure to negative and in-
flammatory content in online social systems. Proceedings of the National Academy
of Sciences 115(49) (2018)
35. Varol, O., Ferrara, E., Davis, C.A., Menczer, F., Flammini, A.: Online human-bot
interactions: Detection, estimation, and characterization. In: AAAI ICWSM (2017)
36. Yang, C., Harkreader, R., Gu, G.: Empirical evaluation and new design for fight-
ing evolving twitter spammers. IEEE Transactions on Information Forensics and
Security 8(8), 1280–1293 (2013)
37. Yang, K.C., Varol, O., Davis, C.A., Ferrara, E., Flammini, A., Menczer, F.: Arming
the public with artificial intelligence to counter social bots. Human Behavior and
Emerging Technologies 1(1), 48–61 (2019)
38. Yardi, S., Romero, D., Schoenebeck, G., et al.: Detecting spam in a twitter network.
First Monday 15(1) (2010)