
Measuring #GamerGate: A Tale of Hate, Sexism, and Bullying

April 2017


DOI:10.1145/3041021.3053890

Conference: the 26th International Conference

Authors:

Despoina Chatzakou

The Centre for Research and Technology, Hellas

Nicolas Kourtellis

Telefónica I+D

Jeremy Blackburn

Binghamton University

Emiliano De Cristofaro

University of California, Riverside




Measuring #GamerGate:

A Tale of Hate, Sexism, and Bullying

Despoina Chatzakou†, Nicolas Kourtellis‡, Jeremy Blackburn‡

Emiliano De Cristofaro♯, Gianluca Stringhini♯, Athena Vakali†

†Aristotle University of Thessaloniki ‡Telefonica Research ♯University College London

deppych@csd.auth.gr, nicolas.kourtellis@telefonica.com, jeremy.blackburn@telefonica.com

e.decristofaro@ucl.ac.uk, g.stringhini@ucl.ac.uk, avakali@csd.auth.gr

ABSTRACT

Over the past few years, online aggression and abusive behaviors

have occurred in many different forms and on a variety of plat-

forms. In extreme cases, these incidents have evolved into hate,

discrimination, and bullying, and even materialized into real-world

threats and attacks against individuals or groups. In this paper, we

study the Gamergate controversy. Started in August 2014 in the

online gaming world, it quickly spread across various social net-

working platforms, ultimately leading to many incidents of cyber-

bullying and cyberaggression. We focus on Twitter, presenting a

measurement study of a dataset of 340k unique users and 1.6M

tweets to study the properties of these users, the content they post,

and how they differ from random Twitter users. We ﬁnd that users

involved in this “Twitter war” tend to have more friends and fol-

lowers, are generally more engaged and post tweets with negative

sentiment, less joy, and more hate than random users. We also

perform preliminary measurements on how the Twitter suspension

mechanism deals with such abusive behaviors. While we focus on

Gamergate, our methodology to collect and analyze tweets related

to aggressive and bullying activities is of independent interest.

1. INTRODUCTION

With the proliferation of social networking services and always-on, always-connected devices, social interactions have increasingly moved online, as social media has become an integral part of people’s everyday life. At the same time, however, new instantiations of negative interactions have arisen, including aggressive and bullying behavior among online users. Cyberbullying, the digital manifestation of bullying and aggressiveness in online social interactions, has spread to various platforms such as Twitter [23], YouTube [6], Ask.fm [14,15], and Facebook [27]. Other community-based services such as Yahoo Answers [16] and online gaming platforms are no exception [18]. Research has shown that bullying actions are often organized, with online users called to participate in hateful raids against other social network users [13].

© 2017 International World Wide Web Conference Committee (IW3C2),

published under Creative Commons CC BY 4.0 License.

WWW 2017 Companion April 3–7, 2017, Perth, Australia.

ACM 978-1-4503-4914-7/17/04.

http://dx.doi.org/10.1145/3041021.3053890

The Gamergate controversy [17] is one example of a coordi-

nated campaign of harassment in the online world. It started with

a blog post by an ex-boyfriend of independent game developer Zoe

Quinn, alleging sexual improprieties. 4chan boards like /r9k/ [1]

and /pol/ [2] turned it into a narrative about “ethical” concerns

in video game journalism and began organizing harassment cam-

paigns [13]. It quickly devolved into a polarizing issue, involving

sexism, feminism, and “social justice,” taking place on social me-

dia like Twitter [11]. Although held up as a pseudo-political move-

ment by its adherents, there is substantial evidence that Gamergate

is more accurately described as an organized campaign of hate and

harassment [12]. What started as “mere” denigration of women

in the gaming industry, eventually evolved into directed threats of

violence, rape, and murder [29]. Gamergate came about due to a

unique time in the digital world in general, and gaming in partic-

ular. The recent democratization of video game development and

distribution via platforms such as Steam has allowed for a new gen-

eration of “indie” game developers who often have a more intimate

relationship with their games and the community of gamers that

play them. With the advent of ubiquitous social media and a com-

munity born in the digital world, the Gamergate controversy pro-

vides us a unique point of view into online harassment campaigns.

Roadmap. In this paper, we explore a slice of the Gamergate controversy by analyzing 1.6M tweets from 340k unique users, some of whom engaged in it. As a first attempt at quantifying this controversy, we focus on how these users, and the content they post, differ from random (baseline) Twitter users. We discover that Gamergaters are seemingly more engaged than random Twitter users, which is an indication as to how and why this controversy is still ongoing. We also find that, while their tweets appear to be aggressive and hateful, Gamergaters do not exhibit common expressions of online anger, and in fact primarily differ from random users in that their tweets are less joyful. The increased rate of engagement of Gamergate users makes it more difficult for Twitter to deal with all these cases at once, something reflected in the relatively low suspension rates of such users. In the struggle to combat existing aggressive and bullying behaviors, Twitter recently took new actions and is now temporarily limiting users for abusive behavior [25]. Finally, we note that, although our work is focused on Gamergate in particular, our principled methodology to collect and analyze tweets related to aggressive and bullying activities on Twitter can be generalized, and it is thus of independent interest.

Paper Organization. The next section reviews related work, then Section 3 discusses our data collection methodology. In Section 4, we present the results of our analysis and the lessons we learn from them. Finally, the paper concludes in Section 5.

2. RELATED WORK

Previous research has studied and aimed at detecting offensive,

abusive, aggressive, and bullying content on social media, includ-

ing Twitter, YouTube, Instagram, Facebook and Ask.fm. Next, we

cover related work on this type of behavior in general, as well as

work related to the Gamergate case.

Detecting abusive behavior. Chen et al. [6] aim to detect offensive content and potentially offensive users by analyzing YouTube comments. Hosseinmardi et al. [14,15] turn to cyberbullying on Instagram and Ask.fm. Specifically, in [15], besides considering available text information, they also try to associate the topic of an image (e.g., drugs, celebrity, sports, etc.) to possible cyberbullying events, concluding that drugs are highly associated with cyberbullying. To create a suitable dataset for their analysis, the authors first collected a large number of media sessions, i.e., videos and images along with comments, from public Instagram profiles, with a subset selected for labeling. To ensure that an adequate number of cyberbullying instances would be present in the dataset, they selected media sessions with at least one profanity word. Finally, they relied on the CrowdFlower crowdsourcing platform to determine whether or not such sessions are related to cyberbullying or cyberaggression. In [14], the authors leveraged both likes and comments to identify negative behavior in the Ask.fm social network. Here, their dataset was created by exploiting publicly accessible profiles, e.g., questions, answers, and likes.

Other works aim to detect hate/abusive content on Yahoo Fi-

nance. In [8], the authors use a Yahoo Finance dataset labeled over

a 6-month period. Nobata et al. [20] gather a new dataset from

Yahoo Finance and News comments: each comment is initially

characterized as either abusive or clean (by Yahoo’s in-house

trained raters), with further analysis on the abusive comments spec-

ifying whether they contain hate, derogative language, or profanity.

They follow two annotation processes, with labeling performed by:

(i) three trained raters, and (ii) workers recruited from Amazon Me-

chanical Turk, concluding that the former is more effective.

Kayes et al. [16] focus on a Community-based Question-

Answering (CQA) site, Yahoo Answers, ﬁnding that users tend

to ﬂag abusive content posted in an overwhelmingly correct way,

while in [7], the problem of cyberbullying is further decomposed

to sensitive topics related to race and culture, sexuality, and in-

telligence, using YouTube comments extracted from controver-

sial videos. Hee et al. [27] also study speciﬁc types of cyber-

bullying, e.g., threats and insults, on Dutch posts extracted from

Ask.fm social media. They also highlight three main user behav-

iors, harasser, victim, and bystander – either bystander-defender or

bystander-assistant who support the victim or the harasser, respec-

tively. Their dataset was created by crawling a number of seed sites

from Ask.fm, with a limited number of cyberbullying instances.

They complement the data with more cyberbullying related content

by: (i) launching a campaign where people reported personal cases

of cyberbullying taking place in different platforms, i.e., Facebook,

message board posts and chats, and (ii) by designing a role-playing

game involving a cyberbullying simulation on Facebook. Then,

they ask manual annotators to characterize content as being part of

a cyberbullying event, and indicate the author’s role in such event,

i.e., victim, harasser, bystander-defender, or bystander-assistant.

Sanchez et al. [23] use Twitter messages to detect bullying in-

stances and more speciﬁcally cases related to gender bullying.

They use a distant supervision approach [10] to automatically label

a set of tweets by using a set of abusive terms used to character-

ize text as expressing negative or positive sentiment. The dataset is

then used to train a classiﬁer geared to ﬁnding inappropriate words

in Twitter text and detect bullying – the hypothesis being that bul-

lying instances most probably contain negative sentiment. Finally,

in [3] the authors propose an approach suitable for detecting bullying

and aggressive behavior on Twitter. They study the properties

of cyberbullies and aggressors and what distinguishes them

from regular users. To perform their analysis, they build upon the

CrowdFlower crowdsourcing tool to create a dataset where users

are characterized as bullies, aggressors, spammers, or normal.

Even though Twitter is among the most popular social networks,

only a few efforts have focused on detecting abusive content on it.

Here, we propose an approach for building a ground truth dataset,

using Twitter as a source of information, which will contain a

higher density of abusive content (mimicking real life abusive post-

ing activity).

Analysis of Gamergate. In our work, the hashtag #GamerGate

serves as a seed word to build a dataset of abusive behavior, as

Gamergate is one of the best documented large-scale instances of

bullying/aggressive behavior we are aware of [17]. With individ-

uals on both sides of the controversy using it, and extreme cases

of cyberbullying and aggressive behavior associated with it (e.g.,

direct threats of rape and murder), #GamerGate is a relatively un-

ambiguous hashtag associated with tweets that are likely to involve

abusive/aggressive behavior. Prior work has also looked at Gamer-

gate in somewhat related contexts. For instance, Guberman and

Hemphill [11] used #GamerGate to collect a sufﬁcient number of

harassment-related tweets in an effort to study and detect toxic-

ity on Twitter. Also, Mortensen [18] likens the Gamergate phe-

nomenon to hooliganism, i.e., a leisure-centered aggression where

fans are organized in groups to attack another group’s members.

3. METHODOLOGY

In this section, we present our methodology for collecting and

processing a dataset of abusive behavior on Twitter. In this paper,

we focus on the Gamergate case; however, our methodology can be

generalized to other platforms and case studies.

3.1 Data Collection

Seed keyword(s). The first step is to select one or more seed keywords, which are likely related to the occurrence of abusive incidents. Besides #GamerGate, good examples are also #BlackLivesMatter and #PizzaGate. In addition to such seed words, a set of hate- or curse-related words can also be used, e.g., words extracted from the Hatebase database (HB)¹, to start collecting possible abusive texts from social media sources. Therefore, at time $t_1$, the list of words to be used for filtering posted texts includes only the seed word(s), i.e., $L(t_1) = \langle seed(s) \rangle$.

In our case, we focus on Twitter, and more specifically we build upon the Twitter Streaming API², which gives access to 1% of all tweets. This returns a set of correlated information, either user-based, e.g., poster’s username, followers and friends count, profile image, total number of posted/liked/favorite tweets, or text-based, e.g., the text itself, hashtags, URLs, whether it is a retweet or a reply, etc. The data collection process took place from June to August 2016. Initially, we obtained the 1% sample of public tweets and parsed it to select all tweets containing the seed word #GamerGate, which are likely to involve the type of behavior, and the case study, we are interested in.

Dynamic list of keywords. In addition to the seed keyword(s), further filtering keywords are used to select abusive-related content. The list of the additional keywords is updated dynamically in consecutive time intervals based on the texts posted during these intervals. Thus, over $T = \{t_1, t_2, \ldots, t_n\}$ the keywords list $L(t)$ has the following form: $L(t_i) = \langle seed(s), kw_1, kw_2, \ldots, kw_N \rangle$, where $kw_j$ is the $j$-th top keyword in time period $\Delta T = t_i - t_{i-1}$. Depending on the topic under examination, i.e., if it is a popular topic or not, the creation of the dynamic keywords list can be split into different consecutive time intervals. To maintain the dynamic list of keywords for the time period $t_{i-1} \to t_i$, we investigate the texts posted in this time period. We extract $N$ keywords found during that time, compute their frequency, and rank them into a temporary list $LT(t_i)$. We then adjust the dynamic list $L(t_i)$ with entries from the temporary list $LT(t_i)$ to create a new dynamic list that contains the up-to-date top $N$ keywords along with the seed words. This new list is used in the next time period $t_i \to t_{i+1}$ for the filtering of posted text.

¹ https://www.hatebase.org/

² https://dev.twitter.com/streaming/overview
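The per-interval update of the keyword list can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `update_keyword_list`, the choice of hashtags as candidate keywords, and the rule of simply replacing the previous non-seed entries with the current top N are all assumptions made for illustration, since the text does not specify how the list is "adjusted".

```python
from collections import Counter
import re

# Assumption: candidate keywords are hashtags, matched with a simple pattern.
HASHTAG_RE = re.compile(r"#\w+")

def update_keyword_list(seeds, texts, n):
    """Build the list for the next interval: the seeds plus the top-n keywords.

    `texts` are the posts observed during the interval that just ended.
    """
    seed_set = {s.lower() for s in seeds}
    counts = Counter()
    for text in texts:
        # Count each hashtag at most once per post, ignoring the seeds.
        for tag in set(HASHTAG_RE.findall(text.lower())):
            if tag not in seed_set:
                counts[tag] += 1
    top = [kw for kw, _ in counts.most_common(n)]
    return list(seeds) + top
```

Calling this function once per interval (daily, in the Gamergate case) and filtering the next interval's stream with the returned list mirrors the loop described above.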

As mentioned, #GamerGate serves as a seed for a snowball sam-

pling of other hashtags likely associated with cyberbullying and

aggressive behavior. We include tweets with hashtags appearing

in the same tweets as #GamerGate (the keywords list is updated

on a daily basis). Overall, we reach 308 hashtags during the data

collection period. A manual examination of these hashtags reveals

that they do contain a number of hate words, e.g., #InternationalOf-

fendAFeministDay,#IStandWithHateSpeech, and #KillAllNiggers.

Random sample. To complement the dataset with cases that are

less likely to contain abusive content, we also crawl a random sam-

ple of texts over the same time period. In our case, we simply crawl

a random set of tweets, which constitutes our baseline.

Remarks. Overall, we have collected two datasets: (i) a random

sample set of 1M tweets, and (ii) a set of 659k tweets which are

likely to contain abusive behavior.

We argue that our data collection methodology provides several

benefits with respect to performance. First, it allows for regular

updates of the keyword list, hence, the collection of more up-to-

date content and capturing previously unseen behaviors, keywords,

and trends. Second, it lets us adjust the update time of the dynamic

keywords list based on the observed burstiness of the topics under

examination, thus eliminating the possibility of either losing new

information or collecting repeatedly the same information. Finally,

this process can be parallelized for scalability on multiple machines

using a Map-Reduce script for computing top N keywords list. All

machines maintain local top N keyword lists which are aggregated

globally in a central controller, enabling the construction of a global

dynamic top N keyword list that can be distributed back to the com-

puting / crawling machines.
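The map and reduce steps for the global top N list could look roughly like the sketch below. The function names `local_counts` and `global_top_n` are hypothetical, tokens are split on whitespace for simplicity, and each machine is assumed to send its full frequency table to the controller rather than only its local top N list.

```python
from collections import Counter
from functools import reduce

def local_counts(texts):
    """Map step: keyword frequencies observed on one crawling machine."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts

def global_top_n(partials, n):
    """Reduce step: sum the per-machine counts and keep the n most frequent."""
    total = reduce(lambda a, b: a + b, partials, Counter())
    return [kw for kw, _ in total.most_common(n)]
```

Sending full counts is a design choice of this sketch: merging only the local top N lists is cheaper, but it can miss a keyword that is moderately frequent on every machine without reaching any single machine's top N.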

3.2 Data Processing

We performed preprocessing of the data collected to produce a

‘clean’ dataset, free of noisy data.

Cleaning. We remove stop words, URLs, numbers, and punctua-

tion marks. Additionally, we perform normalization, i.e., we elimi-

nate repeated letters and repetitive characters which users often use

to express their feelings more intensely (e.g., the word ‘hellooo’ is

converted to ‘hello’).
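As a rough illustration of these cleaning steps, the following Python sketch removes URLs, numbers, ASCII punctuation, and stop words, and collapses repeated characters. The stop-word list is a tiny placeholder, and the rule of collapsing runs of three or more identical characters into one is an assumption chosen to match the ‘hellooo’ example; the paper does not state the exact rule.

```python
import re
import string

# Placeholder stop-word list (an assumption; a real list would be much longer).
STOP_WORDS = {"a", "an", "and", "the", "is", "to", "of", "in"}

URL_RE = re.compile(r"https?://\S+")
NUMBER_RE = re.compile(r"\d+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # the same character three or more times
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def clean_tweet(text):
    """Strip URLs, numbers, punctuation, and stop words; collapse repeats."""
    text = URL_RE.sub(" ", text)
    text = NUMBER_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)  # also drops '#' and '@'
    text = REPEAT_RE.sub(r"\1", text)
    return " ".join(t for t in text.split() if t.lower() not in STOP_WORDS)
```

Because punctuation removal also strips `#` and `@`, hashtags and mentions would have to be extracted before this step if they are needed later.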

Spam removal. Even though extensive work has been done on

spam detection in social media, e.g., [9,24,28], Twitter is still full

of spam accounts [5], often using vulgar language and exhibiting

behavior (repeated posts with similar content, mentions, or hash-

tags) that could also be considered as aggressive or bullying. So,

to eliminate part of this noise we proceeded with a ﬁrst-level spam

removal process by considering two attributes which have already

been used as ﬁlters (e.g., [28]) to remove spam incidents: (i) the

number of hashtags per tweet (often used for boosting the visibility

of the posted tweets), and (ii) posting of (almost) similar tweets. To

find optimal cutoffs for these heuristics, we study both the distribution

of hashtags and the duplication of tweets.

Figure 1: Similarity distribution.

Hashtags. The hashtags distribution shows that users tend to use

from 0 to about 17 hashtags on average. With such information at

hand, we test different cutoffs to set a proper limit, upon which the

user could be characterized as a spammer. After a manual inspection

on a sample of posts, we set the limit to 5 hashtags, i.e., users with

more than 5 hashtags, on average, are ﬂagged as spammers, and

their tweets are removed from the dataset.
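The hashtag cutoff can be sketched as follows. The function name `flag_hashtag_spammers`, the `(user_id, text)` input format, and the hashtag pattern are assumptions made for illustration; only the threshold of 5 hashtags on average comes from the text.

```python
from collections import defaultdict
import re

HASHTAG_RE = re.compile(r"#\w+")

def flag_hashtag_spammers(tweets, limit=5):
    """Return the users whose average number of hashtags per tweet exceeds `limit`.

    `tweets` is an iterable of (user_id, text) pairs.
    """
    hashtag_totals = defaultdict(int)
    tweet_totals = defaultdict(int)
    for user, text in tweets:
        hashtag_totals[user] += len(HASHTAG_RE.findall(text))
        tweet_totals[user] += 1
    return {u for u in tweet_totals
            if hashtag_totals[u] / tweet_totals[u] > limit}
```

All tweets of the returned users would then be dropped from the dataset.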

Duplications. We also estimate the similarity of users’ posts based on an appropriate similarity measure. In many cases, a user’s tweets are (almost) the same, while only the list of mentioned users is modified. So, in addition to the previously presented cleaning processes, we also remove all existing mentions. We then proceed to compute the Levenshtein distance [19], which counts the minimum number of single-character edits needed to convert one string into another, averaging it out over all pairs of a user’s tweets. Initially, for each user, we calculated the intra-tweet similarity, then we set out to estimate the average intra-tweet similarity. For a user with $x$ tweets, we use a set of $n$ similarity scores, where $n = x(x-1)/2$. In the end, all users with intra-tweet similarity above 0.8 are excluded from the dataset. Figure 1 shows that about 5% of the users have a high percentage of similar posts; these users were removed.
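A minimal sketch of the duplication check is given below. The Levenshtein distance itself is standard, but the normalization into a similarity score in [0, 1] (one minus the distance divided by the length of the longer string) is an assumption: the text gives a 0.8 threshold without stating how distances are turned into similarities. The function names are hypothetical.

```python
from itertools import combinations

def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def average_intra_tweet_similarity(tweets):
    """Mean similarity over all x(x-1)/2 unordered pairs of a user's tweets."""
    pairs = list(combinations(tweets, 2))
    if not pairs:
        return 0.0
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)
```

Under these assumptions, a user would be excluded when `average_intra_tweet_similarity` of their mention-stripped tweets exceeds 0.8. Note that the number of pairs grows quadratically, so a user with 100 tweets already requires 4,950 comparisons.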

4. RESULTS

In this section, we present the results of our measurement-

based characterization, comparing the baseline and the Gamergate

(GG) related datasets across various dimensions, including user at-

tributes, posting activity, content semantics, and Twitter account

status.

4.1 How Active are Gamergaters?

Account age. An underlying question about the Gamergate controversy is which came first: participants tweeting about it or Twitter users participating in Gamergate? In other words, did Gamergate draw people to Twitter, or were Twitter users drawn to Gamergate? In Figure 2a, we plot the distribution of account age for users in the Gamergate dataset and baseline users. For the most part, GG users tend to have older accounts (mean = 982.94 days, median = 788 days, STD = 772.49 days). The mean, median, and STD values for the random users are 834.39, 522, and 652.42 days, respectively. Based on a two-sample Kolmogorov-Smirnov test,³ the two distributions are different with a test statistic D = 0.20142 and p < 0.01. Overall, the oldest account in our dataset belongs to a GG user, while only 26.64% of baseline accounts are older than the mean value of the GG users. Figure 2a indicates that GG users were existing Twitter users drawn to the controversy. In fact, their familiarity with Twitter could be the reason that Gamergate exploded in the first place.

³ A statistical test to compare the probability distributions of different samples.

Figure 2: CDF of (a) Account age, (b) Number of posts, and (c) Hashtags.

Figure 3: CDF of (a) Number of Favorites, (b) Lists, (c) URLs, (d) Mentions.
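The D statistic reported for the two-sample Kolmogorov-Smirnov comparisons is the largest vertical gap between the two empirical CDFs. The sketch below computes only that statistic in plain Python; the function name `ks_statistic` is hypothetical, and the p-value computation is omitted.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The maximum gap between two step functions occurs at an observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

In practice, a library routine such as `scipy.stats.ks_2samp` returns both the statistic and the p-value; whether the authors used it is not stated in the paper.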

Tweets and Hashtags. In Figure 2b, we plot the distribution of

the number of tweets made by GG users and random users. GG

users are significantly more active than random Twitter users (D =

0.352, p < 0.01). The mean, median, and STD values for the GG

(random) users are 135,618 (49,342), 48,587 (9,429), and 185,997

(97,457) posts, respectively. Figure 2c reports the CDF of the number

of hashtags found in users’ tweets for both GG and the random

sample, finding that GG users use significantly (D = 0.25681,

p < 0.01) more hashtags than random Twitter users.

Other characteristics. Figures 3a and 3b show the CDFs of fa-

vorites and lists declared in the users’ proﬁles. We note that in

the median case, GG users are similar to baseline users, but on the

tail end (30% of users), GG users have more favorites and topi-

cal lists declared than random users. Then, Figure 3c reports the

CDF of the number of URLs found in tweets by both baseline and

GG users. The former post fewer URLs (the median indicates a

difference of 1-2 URLs, D = 0.26659, p < 0.01), while the latter

post more in an attempt to disseminate information about their

“cause,” somewhat using Twitter like a news service. Finally, Fig-

ure 3d shows that GG users tend to make more mentions within

their posts, which can be ascribed to the higher number of direct

attacks compared to random users.

Takeaways. Overall, the behavior we observe is indicative of GG

users’ “mastery” of Twitter as a mechanism for broadcasting their

ideals. They make use of more advanced features, e.g., lists, tend

to favorite more tweets, and share more URLs and hashtags than

random users. Using hashtags and mentions can draw attention to

their message, thus GG users likely use them to disseminate their

ideas deeper in the Twitter network, possibly aiming to attack more

users and topical groups.

Figure 4: CDF of (a) Number of Friends, (b) Followers.

4.2 How Social are Gamergaters?

Gamergaters are involved in what we would typically think of

as anti-social behavior. However, this is somewhat at odds with

the fact that their activity takes place primarily on social media.

Aiming to give an idea of how “social” Gamergaters are, in Fig-

ures 4a and 4b, we plot the distribution of friends and followers

for GG users vs. baseline users. We observe that, perhaps surprisingly,

GG users tend to have more friends and followers (D = 0.34

and 0.39, p < 0.01 for both). Although this might be somewhat

counter-intuitive, the reality is that Gamergate was born on social

media, and the controversy appears to be a clear “us vs. them” sit-

uation. This leads to easy identiﬁcation of in-group membership,

thus heightening the likelihood of relationship formation.

Figure 5: CDF of (a) Emoticons, (b) Uppercases, (c) Sentiment, (d) Joy.

The ease of in-group membership identiﬁcation is somewhat dif-

ferent than polarizing issues in the real world where it may be difﬁ-

cult to know a person’s views on a polarizing subject, without actu-

ally engaging them on the subject. In fact, people in real life might

be unwilling to express their viewpoint because of social conse-

quences. On the contrary, on social media platforms like Twitter,

(pseudo-)anonymity often removes much of the inhibition people

feel in the real world, and public timelines can often provide per-

sistent and explicit expression of viewpoints.

4.3 How Different is Gamergaters’ Content?

Emoticons and Uppercase Tweets. Two common ways to express

emotion in social media are emoticons and “shouting” by using all

capital letters. Based on the nature of Gamergate, we would expect

relatively little emoticon usage, but many tweets that

would be shouting in all uppercase letters. However, as we can

see in Figures 5a and 5b, which plot the CDF of emoticon usage

and all uppercase tweets, respectively, this is not the case. GG

and random users tend to use emoticons at about the same rate (we

are unable to reject the null hypothesis with D = 0.028 and p =

0.96). However, GG users tend to use all uppercase less often (D =

0.212, p < 0.01). As mentioned, GG users are savvy Twitter users,

and generally speaking, shouting tends to be ignored. Thus, one

explanation for this behavior is that GG users avoid such a simple

“tell” as posting in all uppercase, to ensure their message is not so

easily dismissed.

Sentiment. In Figure 5c, we plot the CDF of sentiment of tweets.

In both cases (GG and baseline) around 25% of tweets are positive.

However, GG users post tweets with a generally more negative

sentiment (D = 0.101, p < 0.01). In particular, around 25%

of GG tweets are negative compared to only around 15% for base-

line users. This observation aligns with the fact that the GG dataset

contains a large proportion of offensive posts.

We also compare the offensiveness score of tweets according to

Hatebase, a crowdsourced list of hate words. Each word included in

HB is scored on a [0, 100] scale, which indicates how hateful it is.

Though the difference is slight, GG users use more hate words than

random users (D = 0.006, p < 0.01). The mean and standard

deviation values for HB score are 0.06 and 2.16 for the baseline

users, while for the GG users they are 0.25 and 3.55, respectively.

Finally, based on [4], we extract sentiment values for 6 different emotions: anger, disgust, fear, joy, sadness, and surprise. We note that of these, the 2-sample KS test is unable to reject the null hypothesis except for joy, as shown in Figure 5d (D = 0.089, p < 0.01). This is particularly interesting because it contradicts the narrative that Gamergaters are posting virulent content out of anger. Instead, GG users are less joyful, and this is a subtle but important difference: they are not necessarily angry, but they are apparently not happy.

            active   deleted   suspended
Baseline    67%      13%       20%
Gamergate   86%      5%        9%

Table 1: Status distribution.

4.4 Are Gamergaters Suspended More Often?

A Twitter user can be in one of the following three statuses: active,

deleted, or suspended. Typically, Twitter suspends an account

(temporarily or even permanently, in some cases) if it has been hijacked/compromised,

is considered spam/fake, or if it is abusive.⁴

A user account is deleted if the user-owner of the account deacti-

vates their account. In the following, we examine the differences

among these three statuses with respect to GG and baseline users.

To examine these differences, we focus on a sample of 33k users

from both the GG and baseline datasets. From Table 1, we ob-

serve that, in both cases, users tend to be suspended more often

than deleting their accounts by choice. However, baseline users are

more prone to be suspended (20%) or delete their accounts (13%)

than GG users (9% and 5%, respectively). This seems to be in

line with the behavior observed in Figure 2a, which shows that GG

users have been on the platform for a longer period of time, which is somewhat

surprising given their exhibited behavior. Indeed, a small portion

of these users may be spammers who are difficult to detect

and ﬁlter out. Nevertheless, Twitter has made signiﬁcant efforts to

address spam and we suspect there is a higher presence of such ac-

counts in the baseline dataset, since the GG dataset is very much

focused around a somewhat niche topic.

These efforts are less apparent when it comes to the bullying

and aggressive behavior phenomena observed on Twitter in gen-

eral [22,26], and in our study of Gamergate users in particular.

However, recently, Twitter has increased its efforts to combat the

existing harassment cases, for instance, by preventing suspended

users from creating new accounts [21], or temporarily limiting

users for abusive behavior [25]. Such efforts constitute initial steps

to deal with the ongoing war among the abusers, their victims, and

online bystanders.

⁴ https://support.twitter.com/articles/15790

5. CONCLUSION

This paper presented a first-of-its-kind effort to quantitatively analyze

the Gamergate controversy. We collected 1.6M tweets from

340k unique users using a generic methodology (which can also be

used for other platforms and other case studies). Although focused

on a narrow slice of time, we found that, in general, users tweeting

about Gamergate appear to be Twitter savvy and quite engaged with

the platform. They produce more tweets than random users, and

have more friends and followers as well. Surprisingly, we observed

that, while expressing more negative sentiment overall, these users

only differed signiﬁcantly from random users with respect to joy.

Finally, we looked at account suspension, ﬁnding that Gamergate

users are less likely to be suspended due to the inherent difﬁculties

in detecting and combating online harassment activities.

While we believe our work contributes to understanding large-scale online harassment, it is only a start. As part of future work, we plan to perform a more in-depth study of Gamergate, focusing on how it evolved over time. Overall, we argue that a deeper understanding of how online harassment campaigns function can enable our community to better address them and propose detection tools as well as mitigation strategies.

Acknowledgements. This work has been funded by the European Commission as part of the ENCASE project (H2020-MSCA-RISE), under GA number 691025.

6. REFERENCES

[1] Anonymous. "i dated zoe quinn". 4chan. https://archive.is/qrS5Q.

[2] Anonymous. Zoe Quinn, prominent SJW and indie developer is a liar and a slut. 4chan. https://archive.is/QIjm3.

[3] D. Chatzakou, N. Kourtellis, J. Blackburn, E. De Cristofaro, G. Stringhini, and A. Vakali. Mean Birds: Detecting Aggression and Bullying on Twitter. arXiv preprint, arXiv:1702.06877, 2017.

[4] D. Chatzakou, V. A. Koutsonikola, A. Vakali, and K. Kafetsios. Micro-blogging content analysis via emotionally-driven clustering. In ACII, 2013.

[5] C. Chen, J. Zhang, X. Chen, Y. Xiang, and W. Zhou. 6 million spam tweets: A large ground truth for timely Twitter spam detection. In IEEE ICC, 2015.

[6] Y. Chen, Y. Zhou, S. Zhu, and H. Xu. Detecting Offensive Language in Social Media to Protect Adolescent Online Safety. In PASSAT and SocialCom, 2012.

[7] K. Dinakar, R. Reichart, and H. Lieberman. Modeling the detection of textual cyberbullying. The Social Mobile Web, 11, 2011.

[8] N. Djuric, J. Zhou, R. Morris, M. Grbovic, V. Radosavljevic, and N. Bhamidipati. Hate Speech Detection with Comment Embeddings. In WWW, 2015.

[9] M. Giatsoglou, D. Chatzakou, N. Shah, A. Beutel, C. Faloutsos, and A. Vakali. Nd-sync: Detecting synchronized fraud activities. In Advances in Knowledge Discovery and Data Mining, 19th Pacific-Asia Conference, 2015.

[10] A. Go, R. Bhayani, and L. Huang. Twitter sentiment classification using distant supervision. Processing, 2009.

[11] J. Guberman and L. Hemphill. Challenges in modifying existing scales for detecting harassment in individual tweets. In System Sciences, 2017.

[12] A. Hern. Feminist critics of video games facing threats in ‘gamergate’ campaign. The Guardian, Oct 2014. https://www.theguardian.com/technology/2014/oct/23/felicia-days-public-details-online-gamergate.

[13] G. E. Hine, J. Onaolapo, E. De Cristofaro, N. Kourtellis, I. Leontiadis, R. Samaras, G. Stringhini, and J. Blackburn. A longitudinal measurement study of 4chan’s politically incorrect forum and its effect on the web. arXiv preprint arXiv:1610.03452, 2016.

[14] H. Hosseinmardi, R. Han, Q. Lv, S. Mishra, and A. Ghasemianlangroodi. Towards understanding cyberbullying behavior in a semi-anonymous social network. In IEEE/ACM ASONAM, 2014.

[15] H. Hosseinmardi, S. A. Mattson, R. I. Rafiq, R. Han, Q. Lv, and S. Mishra. Analyzing Labeled Cyberbullying Incidents on the Instagram Social Network. In SocInfo, 2015.

[16] I. Kayes, N. Kourtellis, D. Quercia, A. Iamnitchi, and F. Bonchi. The Social World of Content Abusers in Community Question Answering. In WWW, 2015.

[17] A. Massanari. #gamergate and the fappening: How reddit’s algorithm, governance, and culture support toxic technocultures. New Media & Society, 2015.

[18] T. E. Mortensen. Anger, fear, and games. Games and Culture, 2016.

[19] G. Navarro. A Guided Tour to Approximate String Matching. ACM Comput. Surv., 33(1), 2001.

[20] C. Nobata, J. Tetreault, A. Thomas, Y. Mehdad, and Y. Chang. Abusive Language Detection in Online User Content. In WWW, 2016.

[21] S. Pham. Twitter tries new measures in crackdown on harassment. CNNtech, February 2017. http://money.cnn.com/2017/02/07/technology/twitter-combat-harassment-features/.

[22] Twitter trolls are now abusing the company’s bottom line. http://www.salon.com/2016/10/19/twitter-trolls-are-now-abusing-the-companys-bottom-line/, 2016.

[23] H. Sanchez and S. Kumar. Twitter bullying detection. ser. NSDI, 12, 2011.

[24] G. Stringhini, C. Kruegel, and G. Vigna. Detecting spammers on social networks. In ACSAC, 2010.

[25] A. Sulleyman. Twitter temporarily limiting users for abusive behaviour. Independent, February 2017. https://www.theguardian.com/technology/2014/oct/23/felicia-days-public-details-online-gamergate.

[26] The Guardian. Did trolls cost Twitter 3.5bn and its sale? https://www.theguardian.com/technology/2016/oct/18/did-trolls-cost-twitter-35bn, 2016.

[27] C. Van Hee, E. Lefever, B. Verhoeven, J. Mennes, B. Desmet, G. De Pauw, W. Daelemans, and V. Hoste. Automatic detection and prevention of cyberbullying. In Human and Social Analytics, 2015.

[28] A. H. Wang. Don’t follow me: Spam detection in Twitter. In SECRYPT, 2010.

[29] N. Wingfield. Feminist critics of video games facing threats in ‘gamergate’ campaign. New York Times, Oct 2014. https://www.nytimes.com/2014/10/16/technology/gamergate-women-video-game-threats-anita-sarkeesian.html.
