Sociocultural Considerations in Monitoring Anti-LGBTQ+ Content on
Social Media
Sidney G.-J. Wong1,2,3
1University of Canterbury, New Zealand
2Geospatial Research Institute, New Zealand
3New Zealand Institute of Language, Brain and Behaviour, New Zealand
{sidney.wong}@pg.canterbury.ac.nz
Abstract
The purpose of this paper is to ascertain the in-
fluence of sociocultural factors (i.e., social, cul-
tural, and political) in the development of hate
speech detection systems. We set out to investi-
gate the suitability of using open-source train-
ing data to monitor levels of anti-LGBTQ+ con-
tent on social media across different national-
varieties of English. Our findings suggest the
social and cultural alignment of open-source
hate speech data sets influences the predicted
outputs. Furthermore, the keyword-search approach to collecting anti-LGBTQ+ slurs in the develop-
ment of open-source training data encourages
detection models to overfit on slurs; therefore,
anti-LGBTQ+ content may go undetected. We
recommend combining empirical outputs with
qualitative insights to ensure these systems are
fit for purpose.
Content Warning: This paper contains unobfus-
cated examples of slurs, hate speech, and offensive
language with reference to homophobia and trans-
phobia which may cause distress.
1 Introduction
The proliferation of hate speech on social media
platforms continues to negatively impact LGBTQ+
communities (Stefania and Buf,2021). As a conse-
quence of anti-LGBTQ+ hate speech, these already
minoritised and marginalised communities may ex-
perience digital exclusion and barriers to access
in the form of the digital divide (Norris,2001).
There have been considerable developments within
the field of Natural Language Processing (NLP)
in response to this social issue (Sánchez-Sánchez
et al.,2024), with most of the methodological ad-
vancements in this area being made in the last three
decades (Tontodimamma et al.,2021).
While much of hate speech research has focused
on documentation and detection, there has been
little attention on how these approaches can be ap-
plied across different social, political, or linguistic
contexts (Locatelli et al.,2023). Just as the appro-
priateness of swear words is highly contextually
variable depending on language and culture (Jay
and Janschewitz, 2008), anti-LGBTQ+ hate speech is often predicated on social, cultural, and political attitudes towards diverse genders and sexualities. With minimal literature beyond a system development context, we set out to investigate the suitability of implementing open-source anti-LGBTQ+ hate speech systems
on real-world sources of social media data.
This paper makes two contributions: firstly, we
show the predicted outputs from classification mod-
els can be transformed into various time series
data sets to monitor the rate and volume of anti-
LGBTQ+ hate speech on social media. Secondly,
we argue that social, cultural, and linguistic bias
introduced during the data collection phase has an
impact on the suitability of these approaches.
1.1 Related Work
Hate speech detection is often treated as a text clas-
sification task, whereby existing data can be used
to train machine learning models to predict the
attributes of unknown data (Jahan and Oussalah,
2023). The main focus of these systems is racism,
sexism and gender discrimination, and violent rad-
icalism (Sánchez-Sánchez et al.,2024). Both the
production and deployment of hate speech detection systems are methodologically similar and are produced under the following pipeline (Kowsari et al., 2019); a minimal code sketch of such a pipeline appears after the list:
a) Data Set Collection and Preparation: involves collecting either real-world or synthetic instances of hate speech in a language condition (e.g., via keyword search). This phase may involve manual annotation by experts or crowd-sourced annotators.

b) Feature Engineering: involves manipulating and transforming instances of hate speech. This may involve anonymisation or confidentialisation depending on the privacy and data use rules of each social media platform.

c) Model Training: involves developing a hate speech detection system with machine learning algorithms. This may involve statistical language models or transformer-based large language models.

d) Model Evaluation: involves producing model performance metrics to determine the statistical validity of the system. This may involve making predictions on unseen or test data.
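For illustration only, the sketch below instantiates this generic pipeline with scikit-learn; the file name hate_speech.csv, the column names, and the choice of a TF-IDF vectoriser with logistic regression are assumptions for the example and are not drawn from any of the systems discussed in this paper.

```python
# Minimal, generic sketch of the four-step pipeline described above.
# Assumption: a CSV file "hate_speech.csv" with "text" and "label" columns
# stands in for the collected and annotated data set.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# (a) Data Set Collection and Preparation: load annotated instances.
data = pd.read_csv("hate_speech.csv")
train_text, test_text, train_labels, test_labels = train_test_split(
    data["text"], data["label"], test_size=0.2, random_state=42
)

# (b) Feature Engineering and (c) Model Training: vectorise and fit a classifier.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, min_df=2)),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(train_text, train_labels)

# (d) Model Evaluation: predict on held-out data and report metrics.
print(classification_report(test_labels, pipeline.predict(test_text)))
```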
Despite their straightforward workflow, these
systems pose a number of ethical challenges and
risks to vulnerable communities (Vidgen and
Derczynski,2020). Cultural biases and harms can
be introduced at each stage of the data set produc-
tion process (Sap et al.,2019). Some of this can
be attributed to poorly designed systems which are
not fit for purpose (Vidgen and Derczynski,2020).
For example, racial bias was identified in one open-
source hate speech detection system developed by
Davidson et al. (2017) which resulted in samples
of written African American English being mis-
classified as instances of hate speech and offensive
language (Davidson et al.,2019).
The presence of racial bias can be attributed to
the decisions made during the Data Set Collec-
tion and Preparation phase of system development. Davidson et al. (2017) took a keyword
search approach (i.e., slurs and profanities) to iden-
tify instances of hate speech and offensive language.
These samples were then used in the development
of the detection system. Although slurs and pro-
fanities are good evidence of anti-social behaviour,
the same words can also be re-appropriated or re-
claimed by target communities (Popa-Wyatt,2020).
Classification algorithms are unable to account for
implicit world knowledge.
Similarly, simple machine learning algorithms
cannot account for linguistic variation which is
another form of implicit world knowledge. Of in-
terest to our current investigation, Wong (2023a)
applied the same system developed by Davidson
et al. (2017) on samples of tweets/posts originat-
ing in New Zealand. The system erroneously
classified tweets/posts with words such as bugger,
digger, and stagger as instances of hate speech.
An unintended consequence of these misclassified
tweets/posts is that rural areas exhibited higher
rates of hate speech and offensive language when
compared to the national mean.
However, not all forms of biases stem from de-
cisions made during system development. Recent
innovations in transformer-based language models,
such as BERT (Devlin et al.,2019), have introduced
new ethical challenges as the presence of gender,
race, and other forms of bias has been observed
in the word embeddings of large language models
(Tan and Celis,2019). This means there is potential
for bias even in the later stages of system develop-
ment during the Model Training phase.
While we grow increasingly aware of the im-
pacts from these limitations (Alonso Alemany et al.,
2023), the number of hate speech detection data
sets and systems continue to increase (Tontodi-
mamma et al.,2021). A systematic review of hate
speech literature has identified over 69 training
data sets to detect hate speech on online and social
media for 21 different language conditions (Jahan
and Oussalah,2023). Seemingly, the solution to
addressing social, cultural, and political discrep-
ancies within hate speech detection is to develop
more systems in different languages.
There remains little interest from NLP researchers in considering the issue of hate speech de-
tection from a social impact lens (Hovy and Spruit,
2016). The primary concerns in this research area
are largely methodological, for example, improving the performance of detection systems degraded by noisy training data (Arango et al., 2022).
Laaksonen et al. (2020) critiqued the datafication
of hate speech detection which in turn has become
an unnecessary distraction for NLP researchers in
combating this social issue.
In fact, the appetite for applying NLP approaches
for social good has decreased over time (Fortuna
et al.,2021). Some researchers are beginning to
question whether the effort put towards the development and production of hate speech detection systems is the ideal solution for this social
issue (Parker and Ruths,2023). In sidelining these
pressing issues in hate speech detection research,
we may unintentionally perpetuate existing preju-
dices against marginalised and minoritised groups
these systems were meant to support (Buhmann
and Fieseler,2021).
In light of these ethical and methodological chal-
lenges in hate speech detection (Das et al.,2023),
we are starting to see how sociolinguistic infor-
mation can be used to fine-tune and improve the social and cultural performance of hate speech detection (Wong et al., 2023; Wong and Durward, 2024) using well-attested methods such as domain adaptation (Liu et al., 2019). NLP researchers may still play an invaluable role in combating online hate speech by incorporating sociocultural considerations in the development and deployment of hate speech detection systems.

Hostility       Direct   Indirect   Total
Abusive         20       45         65
Disrespectful   5        56         61
Fearful         5        47         52
Hateful         36       106        142
Normal          13       71         84
Offensive       65       308        373
Total           144      633        777

Table 1: The distribution of English posts/tweets and the level of hostility by directness targeting sexual orientation in Ousidhoum et al. (2019). Note that all totals are total responses.
2 Methodology
As discussed in Section 1.1, hate speech detection
research needs to undergo a paradigmatic shift in
order to truly enable positive social impact, social good, and social benefit. The main pur-
pose of this paper is to ascertain the influence of
sociocultural factors (i.e., social, cultural, and po-
litical) in the development of hate speech detection
systems. Our research questions are as follows:
RQ1: Can we use open-source hate speech training data to monitor anti-LGBTQ+ hate speech in real-world instances of social media? and

RQ2: How do the social, cultural, and linguistic contexts of open-source training data impact the suitability of anti-LGBTQ+ hate speech detection?
In order to address RQ1, we compare and con-
trast two anti-LGBTQ+ hate speech detection sys-
tems. We provide an in-depth description of the data sources in Section 2.1 and our system development pipeline in Section 2.2. Once we develop the detection systems, we apply them to real-world samples of social media data to monitor anti-LGBTQ+ hate speech across different geographic dialects.
We opted for a mixed-methods approach to ad-
dress this emergent area of enquiry. This is because
RQ2 can only be addressed qualitatively as we consider the suitability of the detection systems and the sociocultural relevance of the predicted outputs. We will address RQ2 in the discussion (Section 4); however, we have provided relevant sociolinguistic, cultural, and political information in Section 2.3 to contextualise our discussion.

Class    ENG      TAM      TAM-ENG
HOMO     276      723      465
TRANS    13       233      184
NONE     4,657    3,205    5,385
Total    4,946    4,161    6,034

Table 2: The class distribution of YouTube comments based on the three-class classification system (homophobic (HOMO), transphobic (TRANS), and non-anti-LGBTQ+ (NONE) content) by language condition (English (ENG), Tamil (TAM), and Tamil-English (TAM-ENG)) in Chakravarthi et al. (2021).
2.1 Data Sources
As part of our investigation, we use two open-source training data sets to develop our anti-LGBTQ+ hate speech detection systems: Ousidhoum et al. (2019) (Multilingual and Multi-Aspect Hate Speech Data Set; MLMA) and Chakravarthi et al. (2021) (LTEDI).¹
The MLMA and LTEDI were chosen due to the avail-
ability of data and documentation to understand the
data set collection and annotation process.
The MLMA is a multilingual hate speech data
set for posts/tweets from X (Twitter) for English,
French, and Arabic (Ousidhoum et al.,2019). The
authors took a keyword search approach by retriev-
ing posts/tweets which matched a list of common
slurs, controversial topics, and discourse patterns
typically found in hate speech. This approach proved challenging due to the high rates of code-
switching in the English and French conditions
and Arabic diglossia. The posts/tweets were then
posted on the crowd-sourcing platform, Mechani-
cal Turk, for public annotation.
One of the most well-documented anti-LGBTQ+
training data sets is the English, Tamil, and English-
Tamil anti-LGBTQ+ hate speech data set developed
by Chakravarthi et al. (2021). The data set contains
public comments to LGBTQ+ videos on YouTube.
The comments were manually annotated based on a three-class scheme (i.e., homophobic, transphobic, and non-anti-LGBTQ+ hate speech). The training data was tested with three language models: MURIL (Khanuja et al., 2021), MBERT (Pires et al., 2019), and XLM-ROBERTA (Conneau et al., 2020).

¹We refer to it as LTEDI with reference to its central role in the various shared tasks hosted as part of the Language Technology for Equality, Diversity, and Inclusion (LT-EDI) workshops.

Figure 1: Model comparison of anti-LGBTQ+ hate speech on ten randomised samples of 10,000 posts/tweets per month from India between June 2018 and June 2023, including the grouped mean and the upper and lower confidence intervals.
The results show that transformer-based models,
such as BERT, outperformed statistical language
models with minimal fine-tuning. The best per-
forming BERT-based system for English yielded an
averaged F1-score of 0.94 (Maimaitituoheti et al.,
2022). This anti-LGBTQ+ training data set has
since expanded to a suite of additional language
conditions such as Spanish (García-Díaz et al.,
2020), Hindi and Malayalam (Kumaresan et al.,
2023), and Telugu, Kannada, Gujarati, Marathi,
and Tulu (Chakravarthi et al.,2024).
We discuss the similarities and differences be-
tween the two data sets in relation to sociocultural
considerations regarding the data collection strat-
egy in Section 2.1.1, the annotation strategy in Sec-
tion 2.1.2, and the cultural alignment in Section
2.1.3 derived from available documentation.
2.1.1 Data Collection
The developers of the MLMA took a culturally-
agnostic approach with limited information on the
data collection points; however, evidence of code-
switching between English and Hindi, Spanish,
and French posed a challenge to annotators. The
MLMA took a keyword search approach to filter
X (Twitter) for instances of hate speech. The key-
words in relation to anti-LGBTQ+ hate in English
included: dyke, twat, and faggot. This contrasts with the LTEDI, which took a content search approach, collecting comments from users reacting to LGBTQ+ content from India.
The high level of code-switching and script-switching between English and other Indo-Aryan and Dravidian languages provides some level of social, cultural, and linguistic information about the training data. Both training data sets are compara-
ble in size; however, MLMA is 13.2% larger than
LTEDI by number of observations. The proportion
of anti-LGBTQ+ hate speech in the MLMA is 9.1%
while the proportion of anti-LGBTQ+ hate speech
in the LTEDI is 5.8%.
2.1.2 Annotation Process
Bender and Friedman (2018) proposed a data statement framework in the hope of mitigating different forms of social bias by dutifully documenting the NLP production process. Neither data set provided annotator metadata (Bender and Friedman, 2018); therefore, we can only infer some of the annotator information from the available documentation. Where the MLMA took a crowd-sourcing approach, the LTEDI data set was annotated by members of the LGBTQ+ communities. Based on the limited details available, we know the LTEDI annotators were English speakers based at the National University of Ireland Galway. Unsurprisingly, inter-annotator agreement for the MLMA (0.15) is lower than for the LTEDI (0.67) based on Krippendorff's alpha, where 1 suggests perfect reliability and 0 suggests no reliability beyond chance.
2.1.3 Cultural Alignment
With limited documentation of the data set col-
lection and annotation process beyond the system
description papers, we tentatively determine the
LTEDI is largely in alignment with anti-LGBTQ+
discourse from the South Asian cultural sphere
and the MLMA as culturally-undetermined anti-
LGBTQ+ rhetoric. This creates a useful contrast
which not only compares the efficacy of two train-
ing data sets, but also anti-LGBTQ+ behaviour in
different varieties of World Englishes which are
influenced by their own unique social, cultural, and
linguistic contexts (Kachru,1982). We predict the
data set collection and annotation approaches will
have an impact on the outputs of the automatic
detection systems.
2.2 System Development
Figure 2: Comparison of anti-LGBTQ+ hate speech detected in 10,000 samples of posts/tweets from inner- and outer-circle varieties of English between June 2018 and June 2023, including the grouped mean and the upper and lower confidence intervals.

           Macro             Weighted
           Base    Retrain   Base    Retrain
LTEDI      0.78    0.81      0.95    0.96
MLMA       0.83    0.83      0.94    0.94

Table 3: Model evaluation metrics comparing the four candidate models by average macro F1-score and average weighted F1-score.

The first phase of our investigation involves developing multiclass classification models to detect anti-LGBTQ+ hate speech in English. We opted for
a transformer-based language modelling approach.
Even though the focus of LTEDI is YouTube, we
can adapt Pretrained Language Models (PLMs) to
specific domains, or registers of language, through
pretraining with additional samples of text (Guru-
rangan et al.,2020).
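As a hedged illustration of this kind of domain-adaptive pretraining, the sketch below continues masked language modelling on in-domain text with the Hugging Face transformers library; the file tweets.txt and all hyperparameter values are assumptions for the example rather than the settings used in this study.

```python
# Sketch of domain-adaptive pretraining (in the spirit of Gururangan et al., 2020):
# continue masked language modelling on in-domain text before fine-tuning.
# "tweets.txt" (one post per line) and all hyperparameters are assumptions.
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")

with open("tweets.txt", encoding="utf-8") as f:
    lines = [line.strip() for line in f if line.strip()]

# Tokenise the in-domain posts for masked language modelling.
dataset = Dataset.from_dict({"text": lines}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="xlmr-twitter", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
trainer.save_model("xlmr-twitter")  # can later be loaded as the base PLM for fine-tuning
```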
We initially trained two classification models
with minimal feature engineering in order to de-
termine the best approaches to develop our auto-
matic detection systems. We split the training data
into training, development, and test sets with a
train:development:test split of 90:5:5. We used
the Multi-Class Classification model from the Simple Transformers² Python package to fine-tune and train
the multi-class classification model. We trained
each model for 8 iterations. We used AdamW as
the optimiser (Loshchilov and Hutter,2018). Our
baseline PLM is XLM-ROBERTA, which is a cross-
lingual transformer-based language model (Con-
neau et al.,2020).
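A minimal sketch of this fine-tuning setup is given below, assuming pandas data frames derived from the split described above; the toy data frames and argument values are placeholders for illustration rather than the exact configuration used in this study.

```python
# Hedged sketch of the fine-tuning step with Simple Transformers; the toy
# data frames and argument values are placeholders, not the study's settings.
import pandas as pd
from simpletransformers.classification import ClassificationArgs, ClassificationModel

# Stand-ins for the 90:5:5 train:development:test split described above;
# "text" and "labels" are the column names Simple Transformers expects.
train_df = pd.DataFrame({"text": ["an innocuous post", "a hateful post"], "labels": [0, 1]})
dev_df = pd.DataFrame({"text": ["another post"], "labels": [0]})

model_args = ClassificationArgs(
    num_train_epochs=8,            # "8 iterations" in the text
    overwrite_output_dir=True,
)

model = ClassificationModel(
    "xlmroberta",                  # model type
    "xlm-roberta-base",            # or a Twitter-retrained checkpoint (Section 2.2.1)
    num_labels=2,                  # binary classes after collapsing (Section 2.2.1)
    args=model_args,
    use_cuda=False,                # set True when a GPU is available
)

model.train_model(train_df)                     # optimised with AdamW by default
result, model_outputs, wrong_preds = model.eval_model(dev_df)
predictions, raw_outputs = model.predict(["example post to classify"])
```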
2.2.1 Feature Engineering
Class imbalance had an effect on our detection sys-
tem. Therefore, we collapsed the multiple classes
from each training data set into a binary classifi-
cation. We also removed the confidentialised usernames and URLs from Ousidhoum et al. (2019),
as we could not mask these high-frequency to-
kens from the classification model. We used
RandomOverSampler from the Imbalanced Learn³ Python package to upsample the minority classes. To address the register discrepancy in Chakravarthi et al. (2021), we retrained XLM-ROBERTA with 120,000 samples of X (Twitter) language data from the CGLU (Dunn, 2020). The language data comprised 10,000 samples from each language condition.
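The sketch below illustrates the class-collapsing and upsampling steps on a toy data frame; the column and label names are assumptions for the example and do not reflect the actual fields of either data set.

```python
# Hedged sketch of two feature engineering steps: collapse the multi-class
# labels to binary and upsample the minority class. Column and label names
# are illustrative assumptions only.
import pandas as pd
from imblearn.over_sampling import RandomOverSampler

# Toy stand-in for an annotated data set (imbalanced towards the "none" class).
df = pd.DataFrame({
    "text": ["comment one", "comment two", "comment three",
             "comment four", "comment five", "comment six"],
    "label": ["homophobic", "none", "transphobic", "none", "none", "none"],
})

# Collapse the multiple classes into a binary anti-LGBTQ+ (1) vs. other (0) label.
df["labels"] = (df["label"] != "none").astype(int)

# Upsample the minority (anti-LGBTQ+) class so both classes have equal support.
ros = RandomOverSampler(random_state=42)
X_resampled, y_resampled = ros.fit_resample(df[["text"]], df["labels"])

train_df = X_resampled.copy()
train_df["labels"] = list(y_resampled)
print(train_df["labels"].value_counts())
```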
2.2.2 Model Evaluation
We present the model evaluation metrics in Table 3.
In Table 3, we compare the model evaluation results for the four candidate models (LTEDI-B, LTEDI-R, MLMA-B, and MLMA-R, where B denotes the base PLM and R the retrained PLM). The model performance improved in three of the four candidate models based on both the macro average and weighted average F1-scores. Surprisingly, there were no differences between the two approaches for the MLMA models. With a focus on the anti-LGBTQ+ class, domain adaptation improved the F1-score from 0.58 to 0.64 for the LTEDI-R model. The F1-score for the MLMA-R remains unchanged at 0.69. Based on the model performance metrics for the four candidate models, we advanced with the LTEDI-R and MLMA-R classification models with domain adaptation and feature engineering during finetuning. We continued to apply domain adaptation in both systems despite not seeing significant improvements in the MLMA-R model, to maintain consistency between the two classification models.
²https://simpletransformers.ai/
³https://imbalanced-learn.org/
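For reference, the two averages reported in Table 3 can be computed with scikit-learn as shown below; the label arrays are toy values for illustration only.

```python
# How macro and weighted F1-scores (as in Table 3) can be computed;
# y_true and y_pred below are toy values for illustration only.
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]   # 1 = anti-LGBTQ+ class
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]

macro_f1 = f1_score(y_true, y_pred, average="macro")        # classes weighted equally
weighted_f1 = f1_score(y_true, y_pred, average="weighted")  # weighted by class support
print(f"macro F1 = {macro_f1:.2f}, weighted F1 = {weighted_f1:.2f}")
```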
Figure 3: Quarterly growth rate of anti-LGBTQ+ hate speech detected with the LTEDI model with number of
posts/tweets by country between June 2018 and June 2023.
2.3 Communities of Interest
Even though the MLMA is supposedly culturally-
agnostic, we have broadly identified the cultural
alignment within the LTEDI based on the data
set collection and annotation process outlined in
Chakravarthi et al. (2021). More specifically, high levels of code-switching and script-switching be-
tween English, Hindi, and Tamil in the LTEDI sug-
gests the presence of an Indian English substrate in
the training data. Written English is often treated
as a homogeneous language; however, geographic-
dialects represented by national-varieties of En-
glish maintain a constant-level of variation (Dunn
and Wong,2022).
Furthermore, the presence on social media of Indian English, that is, English spoken and written in India as introduced through British colonisation (Hickey, 2005), is uncontested (Rajee, 2024). In the
three concentric circles model of World Englishes,
Indian English is categorised as an outer-circle va-
riety of English (Kachru et al.,1985). Outer-circle
and inner-circle varieties of English are defined as
national-varieties with British colonial ties. The
distinguishing feature of outer-circle varieties is
that English is not the primary language of social
life and the government sector. These outer-circle
varieties of English often co-exist alongside other
indigenous languages.
In order to test for the influence of social, cul-
tural, and linguistic factors, we retrieved samples of
social media language from outer-circle and inner-
circle varieties of English. We treat outer-circle varieties of English as written English originating from Ghana, India, Kenya, Malaysia, the Philippines, and Pakistan and, similarly, inner-circle varieties as written
English originating from Australia, Canada, Ire-
land, New Zealand, the United Kingdom, and the
United States. Our social media language data comes from a subset of the CGLU corpus, which contains georeferenced posts/tweets from X
(Twitter) (Dunn,2020).
For each national-variety of English, we filtered
the data for tweets in English. All posts/tweets
were processed with hyperlinks, emojis, and user
identifying information removed. In addition to the
monthly samples for each country, we re-sampled
monthly tweets from India over ten iterations to
determine the impact of our sampling methodol-
ogy. All posts/tweets were produced between July
2018 to June 2023. Of relevance to our analy-
sis, the countries associated with these national-
varieties all criminalised same-sex sexual activity
as a legacy of the English common law legal system
(with the exception of the Philippines) (Han and
O’Mahoney,2014). All but four of these countries
(Kenya, Ghana, Pakistan, Malaysia) have since
decriminalised same-sex sexual activity. However, LGBTQ+ rights vary significantly between countries, and LGBTQ+ communities continue to face discrimination; for example, increased anti-LGBTQ+ legislation in the United States disproportionately affects transgender people (Canady, 2023).
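A minimal sketch of the post/tweet preprocessing described above is shown below; the regular expressions are approximations for illustration and not the exact rules applied in this study.

```python
# Hedged sketch of the preprocessing step: removing hyperlinks, emojis, and
# user-identifying information from posts/tweets. The patterns are approximations.
import re

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
MENTION_PATTERN = re.compile(r"@\w+")
EMOJI_PATTERN = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]",
    flags=re.UNICODE,
)

def clean_post(text: str) -> str:
    """Strip hyperlinks, @-mentions, and emoji from a post/tweet."""
    text = URL_PATTERN.sub("", text)
    text = MENTION_PATTERN.sub("", text)
    text = EMOJI_PATTERN.sub("", text)
    return " ".join(text.split())

print(clean_post("@user check this out https://example.com 🌈"))
```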
3 Results
We dedicate the current section to describing the re-
sults of the second phase of our investigation. This
phase involved applying the candidate models to
automatically detect anti-LGBTQ+ hate speech on
real-world instances of social media data in English.
Firstly, we applied both anti-LGBTQ+ hate speech
detection models on the ten randomised monthly
samples of social media language data from In-
dia, using the same sampling methodology as for the other
national-varieties of English. The results are shown
in Figure 1. As expected, the LTEDI-R model predicted higher rates of anti-LGBTQ+ hate speech; however, what was unexpected was the low number of predictions from the MLMA-R model. The
narrow confidence intervals suggest little instability
between the different samples and the predictions
remained constant across samples.
After validating our sampling methodology by
visually inspecting the ten randomised monthly
samples from India, we applied both models on
random samples of inner- and outer-circle varieties
of English. We compared the results of the detec-
tion models as visualised in Figure 2. These were
consistent with our initial results. The rate of anti-
LGBTQ+ hate speech remained constant according to the MLMA-R model, while, based on a visual inspection of the LTEDI-R results, anti-LGBTQ+ hate speech has increased over time. Of interest to our investigation, the MLMA-R model identified a higher proportion of anti-LGBTQ+ hate speech in inner-circle varieties of English. We saw an inverse relationship with the LTEDI-R, where we see a higher proportion of anti-LGBTQ+ hate speech in outer-circle varieties of English. The wide confidence intervals of the LTEDI-R suggest greater between-variety instability in outer-circle varieties of English.
Figure 4: MLMA word clouds for (a) the training data and (b) the predicted outputs.

We calculated the quarterly growth rates for each variety of English for the predictions from the LTEDI-R. We included the total number of predicted posts/tweets in our visualisation as shown in Figure 3. The growth rates allowed us to determine the growth rate for each variety of English independently. The results suggest the growth rate of predicted anti-LGBTQ+ hate speech has remained stable over time.
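As an illustration of how predicted outputs can be transformed into such time series, the sketch below derives a monthly rate and a quarterly growth rate with pandas; the data frame is a toy stand-in for the predicted outputs, not the study's actual data.

```python
# Hedged sketch of turning per-post predictions into a monitoring time series
# and quarterly growth rates (as in Figures 2 and 3). The data frame is a toy
# stand-in for the predicted outputs.
import pandas as pd

predictions = pd.DataFrame({
    "date": pd.to_datetime(["2018-07-14", "2018-08-02", "2018-11-23", "2019-02-09"]),
    "country": ["IN", "IN", "IN", "IN"],
    "anti_lgbtq": [0, 1, 1, 0],   # 1 = post predicted as anti-LGBTQ+
})

# Monthly rate of predicted anti-LGBTQ+ posts per country.
predictions["month"] = predictions["date"].dt.to_period("M")
monthly_rate = predictions.groupby(["country", "month"])["anti_lgbtq"].mean()

# Quarterly volume and quarter-on-quarter growth rate per country.
predictions["quarter"] = predictions["date"].dt.to_period("Q")
quarterly_volume = predictions.groupby(["country", "quarter"])["anti_lgbtq"].sum()
growth_rate = quarterly_volume.groupby(level="country").pct_change()
print(growth_rate)
```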
4 Discussion
The results from our study raise some interest-
ing questions on the efficacy of these systems on
real-world instances of social media data. With re-
gards to the first research question, our transformer-
based multiclass classification model enabled us
to detect instances of anti-LGBTQ+ hate speech
from samples of georeferenced posts/tweets from
X (Twitter). We were able to manipulate the pre-
dicted outputs into different forms of time series
as shown in Figures 2 and 3. The level of anti-
LGBTQ+ hate speech has maintained a constant
rate of growth despite decreasing usership on the
social media platform since the acquisition of X
(Twitter) by Elon Musk in 2022. The results sug-
gest that anti-LGBTQ+ hate speech on X (Twitter)
is indeed increasing in both rate and volume over
time (Hattotuwa et al.,2023).
Figure 5: LTEDI word clouds for (a) the training data and (b) the predicted outputs.

When we compare the predicted results between the MLMA-R and the LTEDI-R models, we can see significant differences between the two models. This is particularly obvious when we compare the predicted outputs in Figures 1 and 2, where the LTEDI-R model on average predicted 50 times more instances of anti-LGBTQ+ hate speech than the MLMA-R. This was unexpected as the model evaluation metrics during model development suggested the MLMA-R model performed marginally better than the LTEDI-R. Considering both the sampling methodology and the model development approaches were held constant between the models, we propose that the differences we see in the predicted outputs are a result of the open-source training data.
One challenge of applying multiclass classifica-
tion models on unknown data is that there is no sim-
ple method to validate the results. This is because
we do not have access to labelled training, devel-
opment, and test sets to evaluate the model perfor-
mance. We are therefore reliant on qualitative meth-
ods to validate the performance of our detection
models. Figure 4 is a visual representation of the
word-token frequencies between the open-source
training data (a) and the predicted anti-LGBTQ+
hate speech (b) from the samples of posts/tweets.
The most prominent word-token in the training data
is faggot followed by dyke. This is not unexpected
as these word-tokens (including twat) were used to
identify instances of anti-LGBTQ+ hate speech on
X (Twitter). Counterintuitively, we did not see a
similar distribution in the predicted outputs.
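A minimal sketch of this kind of word-token frequency comparison is given below; the texts are toy examples, and the resulting counts could be passed to a word cloud library to produce visualisations like Figures 4 and 5.

```python
# Hedged sketch of the qualitative check behind Figures 4 and 5: comparing
# word-token frequencies in the training data against the predicted outputs.
# The token lists are toy examples, not the actual data.
from collections import Counter
import re

def token_frequencies(texts):
    """Lowercase, keep word tokens, and count their frequencies."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

training_texts = ["example training post", "another example post"]
predicted_texts = ["example post flagged by the model"]

print(token_frequencies(training_texts).most_common(10))
print(token_frequencies(predicted_texts).most_common(10))
# The same counts can be passed to wordcloud.WordCloud.generate_from_frequencies
# to render word clouds like those in Figures 4 and 5.
```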
With reference to Figure 4, the word-tokens with
the highest frequency in the predicted output were
not faggot or dyke, but sleep and gay. When we filtered for the keyword search terms in the samples, we found few instances across the varieties of English, as shown in Tables 4 and 5. This is unexpected as the keyword search terms are highly prevalent in inner-circle varieties of spoken English (such as the United Kingdom and Ireland) (Love, 2021). This is supported by the higher word-token frequencies in inner-circle varieties of English as shown in Tables 4 and 5. We attribute the infrequent occurrence of LGBTQ+ slurs to X (Twitter) rules which discourage hateful conduct on the platform.

Variety   dyke   faggot   twat   gay
GH        8      2        6      353
IN        5      -        6      226
KE        1      4        7      295
MY        3      4        14     500
PH        8      4        8      701
PK        3      6        6      478

Table 4: Frequency of LGBTQ+ related slurs for outer-circle varieties of English.
Our analysis of the MLMA-R model suggests a relationship between the training data and the resulting detection model. Incidentally, we also observe this bias towards inner-circle varieties of English in Figure 2, where the MLMA-R is more inclined to identify more anti-LGBTQ+ hate speech in inner-circle than outer-circle varieties of English. This leads our discussion to the second research question, where we determine how the social, cultural, and linguistic context impacts the efficacy of anti-LGBTQ+ hate speech detection. Although anti-LGBTQ+ discourse is consistent across languages (Locatelli et al., 2023), slurs and swearwords are not (Jay and Janschewitz, 2008). This form of cultural bias toward inner-circle varieties of English (or oversight of outer-circle varieties) introduced during the data collection process raises questions on the suitability of the MLMA-R model in monitoring anti-LGBTQ+ hate speech.
As we determined the LTEDI-R model to be more culturally aligned with the South Asian context, we initially predicted the LTEDI-R model would be more appropriate for South Asian contexts. However, the results suggest the LTEDI-R model is more fit for purpose in contrast to the MLMA-R model. Not only do we observe high congruency between the LTEDI-R model output and the outer-circle varieties of English as shown in Figure 2, the word-token frequencies between the training data (a) and the predicted outputs (b) in the LTEDI appear to have a similar distribution, as shown in Figure 5.

Variety   dyke   faggot   twat   gay
AU        5      12       48     635
CA        16     13       19     623
IE        15     16       62     659
NZ        6      14       53     627
UK        23     9        148    679
US        19     11       13     875

Table 5: Frequency of LGBTQ+ related slurs for inner-circle varieties of English.
Curiously, both the training data and predicted
output lack slurs. Instead, we see word-tokens asso-
ciated with community (e.g., people) and religion
(e.g., bible, god, and Adam, possibly in reference
to the Abrahamic creation myth of Adam and Eve).
This is unsurprising as anti-LGBTQ+ legislation
is often rooted in puritanical beliefs on morality
(Han and O’Mahoney,2014). With reference to
Figure 3, we observed a possible link between the
increased growth rate and nationwide responses to
the Covid-19 pandemic. Once again this raises a
question on the validity of the predicted outputs
and whether the posts/tweets are anti-LGBTQ+ or
religious/spiritual in nature (or indeed, both).
5 Conclusion
The findings from this current paper raise a number of challenges in applying hate speech detection in
a real-world context. Even within national-varieties
of English, we observed the impacts of social,
cultural, and linguistic factors. For example, the
LTEDI-R model, which was culturally aligned with Indian English, was more sensitive to outer-circle varieties of English, while the MLMA-R model was slightly more sensitive to inner-circle varieties of English.
We conclude that monitoring anti-LGBTQ+ hate
speech with open-source training data is not prob-
lematic in itself; however, we must interpret these
empirical outputs with qualitative insights to ensure
these systems are fit for purpose.
Ethics Statement
The purpose of this paper is to investigate the suit-
ability of using open-source training data to de-
velop a multiclass classification model to monitor
and forecast levels of anti-LGBTQ+ hate speech
on social media across different geographic dialect
contexts in English. This study contributes to the ef-
forts in mitigating harmful hate speech experienced
by LGBTQ+ communities. In our investigation, we
combine methods from NLP, sociolinguistics, and
discourse analysis to evaluate the effectiveness of
anti-LGBTQ+ hate speech detection.
We recognise the importance of advocate- and activist-led research, in particular by members of
under-represented and minoritised communities
(Hale,2008). The lead author acknowledges their
positionality as an active advocate and a member of
the LGBTQ+ community (Wong,2023b). The lead
author is familiar with anti-LGBTQ+ discourse
both in online and offline spaces and its harmful
effects on members of the LGBTQ+ communities.
As discussed in Section 1.1, we support the critique by Parker and Ruths (2023) calling for NLP researchers to reflect on the efficacy and suitability of hate speech detection models. The development of hate speech data sets imposes a ‘diversity tax’ on
already marginalised LGBTQ+ communities. Orig-
inally coined by Padilla (1994), this refers to the un-
intentional burden placed on marginalised peoples
to address inequities, exclusion, and inaccessibility
particularly in a research context. NLP researchers
need to work alongside key-stakeholders (e.g., af-
fected communities, advocates, and activists) as
well as social media platforms, non-profit organi-
sations, and government entities to determine solutions to this social issue.
The inclusion of unobfuscated examples of
slurs, hate speech, and offensive language to-
wards LGBTQ+ communities is a deliberate at-
tempt to initiate the process of reclaiming and re-
appropriating some anti-LGBTQ+ slurs in NLP
research. Currently, there are limited best practice
guidelines on the obfuscation of profanities in NLP
research (Nozza and Hovy,2023). Worthen (2020)
theorised that anti-LGBTQ+ slurs are used to stig-
matise violations of social norms. Re-appropriating
these stigmatising labels can enhance what were
once devalued social identities (Galinsky et al.,
2003). This process of ‘cleaning’ and ‘detoxifying’ slurs is also a process of resistance and a way to reclaim power and control (Popa-Wyatt, 2020).
We argue that, within the context of social media research, giving unwarranted attention to slurs ignores the root of this social issue: hate speech
expresses hate (Marques,2023). Many social me-
dia platforms have already put in place procedures
to censor sensitive word-tokens; however, social
media users continue to adopt innovative linguistic
strategies such as voldemorting (van der Nagel,
2018) and Algospeak (Steen et al.,2023) to con-
travene well-meaning moderation and censorship
algorithms. Our results suggest systems trained on existing hate speech data sets do not identify the full breadth of hateful content on social media.
This paper does not include human or animal
participants. Furthermore, we abide by the data
sharing rules of X (Twitter) and posts/tweets with
identifiable personal details will not be shared pub-
licly. The authors have no conflicts of interests to
declare.
Limitations
In this section, we address some of the known limi-
tations of our approach in addition to limitations of
the open-source training data and the social media
data we have used in the current study.
Invisibility of Q+ identities This paper uses the
LGBTQ+ acronym to signify people of diverse genders and sexualities who continue to experience forms of
discrimination and stigmatisation (namely Lesbian,
Gay, Bisexual, and Transgender people). While the
Q+ refers to those who are not straight or not cis-
gender (Queer+), we acknowledge the invisibility
of other minorities who are often excluded from
NLP research including intersex and indigenous
expressions of gender, sexualities, and sex charac-
teristics at birth.
Sociocultural bias during data collection De-
spite including more training data, the MLMA iden-
tified significantly fewer instances of anti-LGBTQ+
hate speech than the LTEDI across the national-
varieties of English. With reference to the word-
clouds produced from the training data for MLMA
and LTEDI as shown in Figures 4 and 5, there is
a high likelihood the keyword search (on dyke,
twat, and faggot) during the data collection pro-
cess has caused the classification model to over-fit
the training data. Similarly, the religious subtext
in the LTEDI training data reinforces polarising be-
liefs that religion is anti-LGBTQ+. Furthermore,
these detection systems do not account for semantic
bleaching or the reclamation of slurs (Popa-Wyatt,
2020).
Pitfalls of large language models We acknowl-
edge the cultural and linguistic biases introduced
through the PLMs used in our transformer-based
approach. However, we have mitigated some of
these impacts through domain adaptation (Liu et al.,
2019). With reference to Figure 4, we have reason
to believe the transformer-based detection systems
erroneously classified dylan, mike, and like with dyke. A breakdown of the character-trigrams (#DY, DYK, YKE, and KE#) confirms this belief.
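A minimal sketch of this character-trigram breakdown is given below; '#' marks a word boundary, and the tokens are those discussed above.

```python
# Hedged sketch of the character-trigram breakdown used to probe why tokens
# such as "dylan", "mike", and "like" may be confused with "dyke"; "#" marks
# a word boundary, mirroring the trigrams cited above.
def char_trigrams(word: str) -> list[str]:
    padded = f"#{word.upper()}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

for token in ["dyke", "dylan", "mike", "like"]:
    print(token, char_trigrams(token))
# e.g. "dyke" -> ['#DY', 'DYK', 'YKE', 'KE#'], sharing '#DY' with "dylan"
# and 'KE#' with "mike" and "like".
```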
Class imbalance and distribution We were able
to improve the performance of the detection model
during model development by up-sampling the mi-
nority classes. The LTEDI detected a constant pro-
portion of anti-LGBTQ+ hate speech between 5-
10% for all varieties of English which is a sim-
ilar proportion of anti-LGBTQ+ hate speech in
the training data (or 5.8% of the training data).
This raises potential questions on the efficacy of
transformer-based classification models.
Further work We welcome NLP researchers to
address these limitations in their research especially
on increasing the visibility of Q+ communities and
the sociocultural biases shown in open-source train-
ing data sets and large language models.
Acknowledgements
The lead author wants to thank Dr. Benjamin Adams (University of Canterbury | Te Whare Wānanga o Waitaha) and Dr. Jonathan Dunn (University of Illinois Urbana-Champaign) for their feedback on the initial manuscript. The lead author wants to thank the three anonymous peer reviewers and the programme chairs for their constructive feedback. Lastly, the lead author wants to thank Fulbright New Zealand | Te Tūāpapa Mātauranga o Aotearoa me Amerika and their partnership with the Ministry of Business, Innovation, and Employment | Hīkina Whakatutuki for their support through the Fulbright New Zealand Science and Innovation Graduate Award.
References
Laura Alonso Alemany, Luciana Benotti, Hernán
Maina, Lucía Gonzalez, Lautaro Martínez, Beatriz
Busaniche, Alexia Halvorsen, Amanda Rojo, and
Mariela Rajngewerc. 2023. Bias assessment for ex-
perts in discrimination, not in computer science. In
Proceedings of the First Workshop on Cross-Cultural
Considerations in NLP (C3NLP), pages 91–106,
Dubrovnik, Croatia. Association for Computational
Linguistics.
Aymé Arango, Jorge Pérez, and Barbara Poblete. 2022.
Hate speech detection is not as easy as you may think:
A closer look at model validation (extended version).
Information Systems, 105:101584.
Emily M. Bender and Batya Friedman. 2018. Data
Statements for Natural Language Processing: Toward
Mitigating System Bias and Enabling Better Science.
Transactions of the Association for Computational
Linguistics, 6:587–604.
Alexander Buhmann and Christian Fieseler. 2021. To-
wards a deliberative framework for responsible inno-
vation in artificial intelligence.Technology in Society,
64:101475.
Valerie A. Canady. 2023. Mounting anti-LGBTQ+
bills impact mental health of youths.Mental Health
Weekly, 33(15):1–6.
Bharathi Raja Chakravarthi, Prasanna Kumaresan, Ruba
Priyadharshini, Paul Buitelaar, Asha Hegde, Hosa-
halli Shashirekha, Saranya Rajiakodi, Miguel Án-
gel García, Salud María Jiménez-Zafra, José García-
Díaz, Rafael Valencia-García, Kishore Ponnusamy,
Poorvi Shetty, and Daniel García-Baena. 2024.
Overview of Third Shared Task on Homophobia and
Transphobia Detection in Social Media Comments.
In Proceedings of the Fourth Workshop on Language
Technology for Equality, Diversity, Inclusion, pages
124–132, St. Julian’s, Malta. Association for Compu-
tational Linguistics.
Bharathi Raja Chakravarthi, Ruba Priyadharshini,
Rahul Ponnusamy, Prasanna Kumar Kumaresan,
Kayalvizhi Sampath, Durairaj Thenmozhi, Sathi-
yaraj Thangasamy, Rajendran Nallathambi, and
John Phillip McCrae. 2021. Dataset for Identifi-
cation of Homophobia and Transophobia in Mul-
tilingual YouTube Comments.arXiv preprint.
ArXiv:2109.00227 [cs].
Alexis Conneau, Kartikay Khandelwal, Naman Goyal,
Vishrav Chaudhary, Guillaume Wenzek, Francisco
Guzmán, Edouard Grave, Myle Ott, Luke Zettle-
moyer, and Veselin Stoyanov. 2020. Unsupervised
Cross-lingual Representation Learning at Scale. In
Proceedings of the 58th Annual Meeting of the Asso-
ciation for Computational Linguistics, pages 8440–
8451, Online. Association for Computational Lin-
guistics.
Dipto Das, Shion Guha, and Bryan Semaan. 2023. To-
ward Cultural Bias Evaluation Datasets: The Case
of Bengali Gender, Religious, and National Iden-
tity. In Proceedings of the First Workshop on Cross-
Cultural Considerations in NLP (C3NLP), pages 68–
83, Dubrovnik, Croatia. Association for Computa-
tional Linguistics.
Thomas Davidson, Debasmita Bhattacharya, and Ing-
mar Weber. 2019. Racial Bias in Hate Speech and
Abusive Language Detection Datasets. In Proceed-
ings of the Third Workshop on Abusive Language
Online, pages 25–35, Florence, Italy. Association for
Computational Linguistics.
Thomas Davidson, Dana Warmsley, Michael Macy, and
Ingmar Weber. 2017. Automated Hate Speech Detec-
tion and the Problem of Offensive Language.arXiv
preprint. ArXiv:1703.04009 [cs].
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and
Kristina Toutanova. 2019. BERT: Pre-training of
Deep Bidirectional Transformers for Language Un-
derstanding. In Proceedings of NAACL-HLT, pages
4171–4186.
Jonathan Dunn. 2020. Mapping languages: the Corpus
of Global Language Use.Language Resources and
Evaluation, 54(4):999–1018.
Jonathan Dunn and Sidney Wong. 2022. Stability of
Syntactic Dialect Classification over Space and Time.
In Proceedings of the 29th International Confer-
ence on Computational Linguistics, pages 26–36,
Gyeongju, Republic of Korea. International Com-
mittee on Computational Linguistics.
Paula Fortuna, Laura Pérez-Mayos, Ahmed AbuRa’ed,
Juan Soler-Company, and Leo Wanner. 2021. Car-
tography of Natural Language Processing for Social
Good (NLP4SG): Searching for Definitions, Statis-
tics and White Spots. In Proceedings of the 1st Work-
shop on NLP for Positive Impact, pages 19–26, On-
line. Association for Computational Linguistics.
Adam D Galinsky, Kurt Hugenberg, Carla Groom, and
Galen V Bodenhausen. 2003. The reappropriation of
stigmatizing labels: Implications for social identity.
In Jeffrey Polzer, editor, Identity Issues in Groups,
volume 5 of Research on Managing Groups and
Teams, pages 221–256. Emerald Group Publishing
Limited.
José Antonio García-Díaz, Ángela Almela, Gema
Alcaraz-Mármol, and Rafael Valencia-García. 2020.
UMUCorpusClassifier: Compilation and evaluation
of linguistic corpus for Natural Language Process-
ing tasks.Procesamiento del Lenguaje Natural,
65(0):139–142.
Suchin Gururangan, Ana Marasović, Swabha
Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey,
and Noah A. Smith. 2020. Don’t Stop Pretraining:
Adapt Language Models to Domains and Tasks.
arXiv preprint. ArXiv:2004.10964 [cs].
Charles R. Hale. 2008. Engaging Contradictions: The-
ory, Politics, and Methods of Activist Scholarship.
In Engaging Contradictions. University of California
Press.
Enze Han and Joseph O’Mahoney. 2014. British
colonialism and the criminalization of homosexu-
ality.Cambridge Review of International Affairs,
27(2):268–288.
Sanjana Hattotuwa, Kate Hannah, and Kayli Tay-
lor. 2023. Transgressive transitions: Transphobia,
community building, bridging, and bonding within
Aotearoa New Zealand’s disinformation ecologies
march-April 2023. Technical report, The Disinfor-
mation Project, New Zealand.
Raymond Hickey, editor. 2005. Legacies of Colonial
English: Studies in Transported Dialects. Studies
in English Language. Cambridge University Press,
Cambridge.
Dirk Hovy and Shannon L. Spruit. 2016. The Social
Impact of Natural Language Processing. In Proceed-
ings of the 54th Annual Meeting of the Association
for Computational Linguistics (Volume 2: Short Pa-
pers), pages 591–598, Berlin, Germany. Association
for Computational Linguistics.
Md Saroar Jahan and Mourad Oussalah. 2023. A sys-
tematic review of hate speech automatic detection
using natural language processing.Neurocomputing,
546:126232.
Timothy Jay and Kristin Janschewitz. 2008. The prag-
matics of swearing.Journal of Politeness Research
Language Behaviour Culture, 4(2):267–288.
Braj B. Kachru. 1982. The Other tongue: English
across cultures. University of Illinois Press, Urbana-
Champaign.
Braj B. Kachru, R. Quirk, and H. G. Widdowson. 1985.
Standards, codification and sociolinguistic realism.
World Englishes. Critical Concepts in Linguistics,
pages 241–270.
Simran Khanuja, Diksha Bansal, Sarvesh Mehtani,
Savya Khosla, Atreyee Dey, Balaji Gopalan,
Dilip Kumar Margam, Pooja Aggarwal, Rajiv Teja
Nagipogu, Shachi Dave, Shruti Gupta, Subhash
Chandra Bose Gali, Vish Subramanian, and Partha
Talukdar. 2021. MuRIL: Multilingual Represen-
tations for Indian Languages.arXiv preprint.
ArXiv:2103.10730 [cs].
Kamran Kowsari, Kiana Jafari Meimandi, Mojtaba Hei-
darysafa, Sanjana Mendu, Laura Barnes, and Donald
Brown. 2019. Text Classification Algorithms: A
Survey.Information, 10(4):150.
Prasanna Kumar Kumaresan, Rahul Ponnusamy, Ruba
Priyadharshini, Paul Buitelaar, and Bharathi Raja
Chakravarthi. 2023. Homophobia and transphobia
detection for low-resourced languages in social me-
dia comments.Natural Language Processing Jour-
nal, 5:100041.
Salla-Maaria Laaksonen, Jesse Haapoja, Teemu Kin-
nunen, Matti Nelimarkka, and Reeta Pöyhtäri. 2020.
The Datafication of Hate: Expectations and Chal-
lenges in Automated Hate Speech Monitoring.Fron-
tiers in Big Data, 3.
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Man-
dar Joshi, Danqi Chen, Omer Levy, Mike Lewis,
Luke Zettlemoyer, and Veselin Stoyanov. 2019.
RoBERTa: A Robustly Optimized BERT Pretrain-
ing Approach.arXiv preprint. ArXiv:1907.11692
[cs].
Davide Locatelli, Greta Damo, and Debora Nozza. 2023.
A Cross-Lingual Study of Homotransphobia on Twit-
ter. In Proceedings of the First Workshop on Cross-
Cultural Considerations in NLP (C3NLP), pages 16–
24, Dubrovnik, Croatia. Association for Computa-
tional Linguistics.
Ilya Loshchilov and Frank Hutter. 2018. Decoupled
Weight Decay Regularization. In International Con-
ference on Learning Representations.
Robbie Love. 2021. Swearing in informal spoken En-
glish: 1990s–2010s.Text & Talk, 41(5-6):739–762.
Abulimiti Maimaitituoheti, Yong Yang, and Xi-
aochao Fan. 2022. ABLIMET @LT-EDI-
ACL2022: A Roberta based Approach for Homopho-
bia/Transphobia Detection in Social Media. In Pro-
ceedings of the Second Workshop on Language Tech-
nology for Equality, Diversity and Inclusion, pages
155–160, Dublin, Ireland. Association for Computa-
tional Linguistics.
Teresa Marques. 2023. The Expression of Hate in Hate
Speech.Journal of Applied Philosophy, 40(5):769–
787.
Pippa Norris. 2001. Digital Divide: Civic Engagement,
Information Poverty, and the Internet Worldwide.
Communication, Society and Politics. Cambridge
University Press, Cambridge.
Debora Nozza and Dirk Hovy. 2023. The State of Pro-
fanity Obfuscation in Natural Language Processing
Scientific Publications. In Findings of the Associa-
tion for Computational Linguistics: ACL 2023, pages
3897–3909.
Nedjma Ousidhoum, Zizheng Lin, Hongming Zhang,
Yangqiu Song, and Dit-Yan Yeung. 2019. Multilin-
gual and Multi-Aspect Hate Speech Analysis. In
Proceedings of the 2019 Conference on Empirical
Methods in Natural Language Processing and the
9th International Joint Conference on Natural Lan-
guage Processing (EMNLP-IJCNLP), pages 4675–
4684, Hong Kong, China. Association for Computa-
tional Linguistics.
Amado M. Padilla. 1994. Ethnic Minority Scholars,
Research, and Mentoring: Current and Future Issues.
Educational Researcher, 23(4):24–27.
Sara Parker and Derek Ruths. 2023. Is hate speech
detection the solution the world wants? Pro-
ceedings of the National Academy of Sciences,
120(10):e2209384120.
Telmo Pires, Eva Schlinger, and Dan Garrette. 2019.
How Multilingual is Multilingual BERT? In Pro-
ceedings of the 57th Annual Meeting of the Asso-
ciation for Computational Linguistics, pages 4996–
5001, Florence, Italy. Association for Computational
Linguistics.
Mihaela Popa-Wyatt. 2020. Reclamation: Taking Back
Control of Words.Grazer Philosophische Studien,
97(1):159–176.
Clarissa Jane Rajee. 2024. Analyzing Social Values of
Indian English in YouTube Video Comments: A Citi-
zen Sociolinguistic Perspective. Strength for Today and Bright Hope for Tomorrow, Volume 24:3, March 2024, ISSN 1930-2940, page 9.
Maarten Sap, Dallas Card, Saadia Gabriel, Yejin Choi,
and Noah A. Smith. 2019. The Risk of Racial Bias
in Hate Speech Detection. In Proceedings of the
57th Annual Meeting of the Association for Computa-
tional Linguistics, pages 1668–1678, Florence, Italy.
Association for Computational Linguistics.
Ella Steen, Kathryn Yurechko, and Daniel Klug. 2023.
You Can (Not) Say What You Want: Using Algos-
peak to Contest and Evade Algorithmic Content Mod-
eration on TikTok.Social Media + Society, 9(3).
Oana Stefania and Diana-Maria Buf. 2021. Hate Speech
in Social Media and Its Effects on the LGBT Com-
munity: A Review of the Current Research. Roma-
nian Journal of Communication & Public Relations,
23(1):47–55.
Ana M. Sánchez-Sánchez, David Ruiz-Muñoz, and
Francisca J. Sánchez-Sánchez. 2024. Mapping Ho-
mophobia and Transphobia on Social Media.Sexual-
ity Research and Social Policy, 21(1):210–226.
Yi Chern Tan and L. Elisa Celis. 2019. Assessing Social
and Intersectional Biases in Contextualized Word
Representations. In Advances in Neural Information
Processing Systems, volume 32. Curran Associates,
Inc.
Alice Tontodimamma, Eugenia Nissi, Annalina Sarra,
and Lara Fontanella. 2021. Thirty years of research
into hate speech: topics of interest and their evolution.
Scientometrics, 126(1):157–179.
Emily van der Nagel. 2018. ‘Networks that work too
well’: intervening in algorithmic connections.Media
International Australia, 168(1):81–92.
Bertie Vidgen and Leon Derczynski. 2020. Direc-
tions in abusive language training data, a system-
atic review: Garbage in, garbage out.PLOS ONE,
15(12):e0243300.
Sidney Wong and Matthew Durward. 2024.
cantnlp@LT-EDI-2024: Automatic Detection
of Anti-LGBTQ+ Hate Speech in Under-resourced
Languages. In Proceedings of the Fourth Workshop
on Language Technology for Equality, Diversity,
Inclusion, pages 177–183, St. Julian’s, Malta.
Association for Computational Linguistics.
Sidney Wong, Matthew Durward, Benjamin Adams,
and Jonathan Dunn. 2023. cantnlp@LT-EDI-2023:
Homophobia/Transphobia Detection in Social Me-
dia Comments using Spatio-Temporally Retrained
Language Models. In Proceedings of the Third Work-
shop on Language Technology for Equality, Diversity
and Inclusion, pages 103–108, Varna, Bulgaria. IN-
COMA Ltd., Shoumen, Bulgaria.
Sidney Gig-Jan Wong. 2023a. Monitoring Hate Speech
and Offensive Language on Social Media. In Fourth
Spatial Data Science Symposium, University of Can-
terbury.
Sidney Gig-Jan Wong. 2023b. Queer Asian Identities
in Contemporary Aotearoa New Zealand: One Foot
Out of the Closet. Lived Places Publishing.
Meredith Worthen. 2020. Queers, bis, and straight
lies: An intersectional examination of LGBTQ stigma.
Routledge.