Gender-Based Violence in 140 Characters or Fewer: A #BigData Case Study of Twitter

March 2015
First Monday

DOI:10.5210/fm.v21i1.6148

Source
arXiv

License
CC BY 4.0

Authors:

Hemant Purohit

George Mason University

Tanvi Banerjee

Wright State University

Andrew J. Hampton

Christian Brothers University

Valerie L. Shalin

Wright State University

GENDER-BASED VIOLENCE ANALYSIS ON TWITTER. KNO.E.SIS TECHNICAL REPORT 2015

Gender-Based Violence in 140 Characters or Fewer:

A #BigData Case Study of Twitter

Hemant Purohit1,2, Tanvi Banerjee1,2, Andrew Hampton1,3, Valerie L. Shalin1,3,

Nayanesh Bhandutia4 & Amit P. Sheth1,2

1Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Dayton, OH, USA

2Department of Computer Science and Engineering, Wright State University, Dayton, OH, USA

3Department of Psychology, Wright State University, Dayton, OH, USA

4United Nations Population Fund Headquarters, NYC, NY, USA

Corresponding Authors:

Hemant Purohit, Andrew Hampton, Valerie Shalin: {hemant, andrew, valerie}@knoesis.org

ABSTRACT

Humanitarian and public institutions are increasingly relying on data from social media sites to measure

public attitude, and provide timely public engagement. Such engagement supports the exploration of

public views on important social issues such as gender-based violence (GBV). In this study, we examine

Big (Social) Data consisting of nearly fourteen million tweets collected from the Twitter platform over a

period of ten months to analyze public opinion regarding GBV, highlighting the nature of tweeting

practices by geographical location and gender. The exploitation of Big Data requires the techniques of

Computational Social Science to mine insight from the corpus while accounting for the influence of both

transient events and sociocultural factors. We reveal public awareness regarding GBV tolerance and

suggest opportunities for intervention and the measurement of intervention effectiveness assisting both

governmental and non-governmental organizations in policy development.

Keywords: computational social science, gender-based violence, social media, citizen sensing, public

awareness, public attitude, policy, intervention campaign

KEY FINDINGS & IMPLICATIONS (see Table 1 for examples)

Social Media Content:

1. Substantial GBV related content exists in social media.

2. Spikes in GBV content reflect the influence of transient events, particularly involving celebrities.

3. Gender, language, technology penetration, and education influence participation with implications

for the interpretation of quantitative measures.

4. GBV content includes humor and metaphor (e.g., in sports) that reflect both attitude and behavior.

5. Content highlights the role of government, law enforcement and business in the tolerance of GBV.

Relevance to GBV Policy:

6. Social Media provides an alternative for measuring GBV attitude and behavior that is cheaper,

faster, and broader than conventional survey-based methods.

7. Regional socio-cultural context influences both the measurement and the interpretation of data.

8. Computational methods for context-sensitive monitoring, modeling and interpreting GBV social

media content are highly feasible.

9. The location and network of social media participants support targeted regional anti-GBV

campaigns.

10. Policy makers require tools to make social media content accessible in near-real time to monitor the

effectiveness of anti-GBV campaigns.

[1.] INTRODUCTION

Gender-based violence (GBV), primarily against women, is a pervasive, global

phenomenon affecting both developed and developing countries. Over 35% of the world’s

female population has experienced gender-based violence at some point in their lives (World

Health Organization, 2013). According to the United Nations Population Fund (UNFPA)1,

“GBV is a serious public health concern that also impedes the crucial role of women and girls in

development.” However, anti-GBV sentiment is not universal, as is apparent in sexist chants at a

professional sporting event2 and in the public response to convicted offenders3. While a United Nations

campaign acknowledges GBV as a societal problem (United Nations, 2009), prevalence is very

difficult to assess. The European Union’s council report on GBV highlighted a persistent lack of

comparable data across regions and over time (European Union, 2010), hampering both

assessment and mitigation. Both the UNFPA4 and the European Union Agency for Fundamental

Rights5 seek better data sourcing and policy design (European Union Agency for Fundamental

Rights, 2012).

Mining large-scale online data from mobile technology and social media promises to

complement traditional methods and provide greater insight with finer detail. Computational

Social Science (Lazer et al., 2009) has leveraged such data to inform programs in a variety of

domains, for example disaster response coordination (Purohit et al., 2014; Purohit et al., 2013;

Vieweg et al., 2010), health (Culotta, 2014; De Choudhury et al., 2013), and drug abuse

(Cameron et al., 2013). Here we examine the potential of Computational Social Science to

address the problem of monitoring and mitigating GBV on a global scale (see Table 1),

confronting the typical Big Data challenges of large-scale volume, velocity of content generation,

sparsity of data behaviors, variety in language complexity and heterogeneity in participant

demographics. Below, we provide several analyses including the volume, location, source

gender, and a variety of content analyses to help monitor GBV, and both inform the design and

assess the effectiveness of anti-GBV campaigns.

Traditional GBV monitoring methods face many challenges. Statistics on GBV

episodes are time consuming to gather, collected under non-standardized protocols, and published in highly

aggregated form. For example, the non-partner sexual violence prevalence data published by the

World Health Organization6 is dated 2010 and reported by 21 aggregated geographic regions

such as West Africa and South Asia. While the United Nations Office on Drugs and Crime data7

are less aggregated, and somewhat more recent with some data available as recently as 2012 and

separated by country, the data are sparse and incomplete. For instance, sexual violence data are

available for the Philippines and Nigeria for 2012, but the most recent data for India are from

2010 and no data are presented for South Africa. The lag itself prevents monitoring change,

whether unexpected increases in GBV, changes in attitude (possibly due to recent events

and awareness campaigns), or decreases resulting from mitigation efforts. Furthermore, these

statistics reflect legal definitions, making direct comparisons among countries impossible due to

differences in the definition and recording of offenses.

1 UNFPA agency’s description about GBV: http://www.unfpa.org/gender/violence.htm

2 http://www.bbc.com/news/blogs-trending-31628729

3 http://www.telegraph.co.uk/news/worldnews/asia/india/11443462/Delhi-bus-rapist-blames-his-victim-in-prison-interview.html

4 UNFPA agency: http://www.unfpa.org/public/

5 EU FRA agency: http://fra.europa.eu/en/project/2012/fra-survey-gender-based-violence-against-women

6 Data available here: http://apps.who.int/gho/data/view.main.NPSVGBDREGION

7 Data available here: http://www.unodc.org/unodc/en/data-and-analysis/statistics/crime.html

Table 1. Tweet samples from our analysis (see Section 3) with implications to inform and design targeted

GBV intervention campaigns. M1-M3 illustrate existing automatic analysis capability, while M4-M8

result from partially automated analyses, presenting the case for expanding computational methods.

Message M1: RT @USER1: 1 in 3 women are raped/abused in their lifetime. RT if you rise to stop the violence. #1billionrising http://t.co/lXEEmQoLbO
Implications: Volumetric analysis of social media can help measure population engagement and the effective penetration of designed campaigns in the community.

Message M2: #StopRape Rape Crisis says many survivors of sexual abuse and assault still don't feel confident in the criminal justice system. CS
Implications: Location analysis can identify message origin (e.g., South Africa in M2) and route messages to appropriate agencies.

Message M3: @USER2 Takes a MOMENT 2 Sign & ask Others 2,so DV & Rape Laws become Equal. TOGETHER we can Change History: http://t.co/ylWtl5PgCI
Implications: Gender detection analysis of message authors (e.g., female in M3) can adjust apparent GBV attitudes for gender and suggest the content of anti-GBV policy and campaigns.

Message M4: RT @USER3: Rape prevention nail polish sounds like a great idea but I'm not sure how you're going to get men to wear it
Implications: Content analysis should be sensitive to sociocultural considerations, such as the role of humor overall and in acknowledging despair (e.g., in M4 from the Philippines).

Message M5: RT @USER4: 15 yard penalty for "unnecessary rape" http://t.co/yhzxtYzGP0
Implications: Metaphor analysis indicates the acceptance of GBV, echoed for example in sports, and suggests opportunities for specific anti-GBV campaigns.

Message M6: Valentine's day is really helping me sell these date rape drugs
Implications: Entity recognition based on knowledge bases of GBV entities can help identify precipitating events (e.g., Valentine's Day in M6) a priori, in order to design preventive campaigns.

Message M7: @USER5 WB Govt must have ordered Police to protect the family of Rape-Victim. It is shameful for Mamata Banerjee O GOD GIVE WISDOM TO ALL
Implications: Organization detection, including government, law enforcement and commercial entities in relation to GBV, can inform policy; e.g., M7 points to a potential lack of police protection for a victim’s family in West Bengal (WB).

Message M8: RT @USER6: It is not my job to coddle and "educate" young Black men when it comes to violence against women. Y'all wanna "teach"?
Implications: Modeling of stereotypical associations can inform the design of targeted campaigns; e.g., the author of M8 stereotypically associates GBV with Black men.

Note. We anonymized user mentions as per the IRB guidelines.

Apart from the basic problems in the logistics of gathering GBV data, the data

themselves are limited. Reliance on formal reports risks under-reporting by victims and

witnesses, who may believe that domestic violence is a private matter (Nilan et al., 2014). In

fact, such evaluative attitudes (Allport, 1935), left implicit in conventional data collection

methods, may provide a better metric of GBV risk and tolerance. Aggregation/generalization

across localities with different socio-economic properties masks important trends (Yoshihama,

Blazevski, & Bybee, 2014) because sociocultural context including politics, history, religion, and

economy strongly influence attitudes (Nayak et al., 2003). Researchers suggest the need for

different prediction and mitigation models for different sociocultural contexts. For example, an

attitude of liberal sexual conduct combines with egalitarian values and economic development to

predict tolerance for GBV (Martinez & Khalil, 2012). Similarly, Corroon et al. (2014) found that

encouraging the Nigerian population to accept responsibility for their own health affects health

outcomes only in specific cities and urban regions.

Social science survey methods complement the data gathering efforts of governmental

and non-governmental agencies. However, survey data also have several limitations. Surveys

reflect sampling biases and various confounds (Pierotti, 2013). For example, the rates of sexual

violence from a survey of patients being treated for sexually transmitted disease in South Africa

(Kalichman et al., 2005) bear uncertain relationship to the general population. Survey candidates

may refuse to participate, provide socially acceptable but inaccurate responses, or respond to

superficial yes/no questions using local mores and standards with high violence thresholds (Nilan

et al., 2014). For example, perpetrators may not consider beating to be an act of violence if the

victim failed to comply with behavioral norms. We highlight five additional limitations of survey

data of particular relevance to Computational Social Science methods. First, survey items tend to

address attitudes but not behavior, and therefore bear unclear relationship to the rate of GBV

episodes (Bhanot & Senn, 2007). In contrast, inasmuch as verbal abuse constitutes a form of

violence, social media posts can provide actual instances of behavior. Second, the methods fail to

account for transient global events, such as political or celebrity activity that can influence views

and responses, hindering comparison over time. Third, the items themselves presume an

established theory and standard measures of GBV, limiting the opportunity to discover latent

patterns that reflect attitude and behavior. For example, metaphor is a powerful reflection of

public opinion (Lakoff, 2008), but to our knowledge has not been explored in survey measures.

Fourth, survey methods constitute a highly labor intensive data collection method, necessitating

small samples while imposing cost and lag in data availability in a dynamic world. For example,

Pierotti’s survey examined change over an average of five year intervals, ranging from three to

seven years (Pierotti, 2013). Finally, survey methods are by design static, discouraging

questionnaire modification to incorporate newly detected trends once the survey data collection

begins.

Social media provide a faster, cheaper and face-valid means to engage the public,

providing unprecedented large-scale access to public views and behavior (see Table 1). They

provide the ability to monitor attitudes in near real time, supporting timely mitigation efforts.

While social media offer advantages in speed (velocity), in some cases participation,

broad sourcing, and lower cost, studies cannot be tightly controlled with specific

statistical sampling, demographic data may be limited, and language use can skew

coverage; we discuss these limitations in Section 4. Ultimately all three resources (formal reports,

surveys, and social media) require integration in order to assist policy design, prioritize attention

for interventions, and design region-specific programs to curb GBV. A logical first step is to

understand what social media offer for GBV monitoring and the design of mitigation and policy.

Study Design Overview

Based on the suggestions of our UNFPA collaborators to identify a GBV related corpus,

we selected three major themes that encompass gender violence concerns: physical violence,

sexual violence, and harmful socio-cultural practices. Corresponding to these three themes, we

created a seed set of keywords for crawling data from the Twitter social media microblogging

platform. We also selected four countries with suspected elevations in GBV suggested by

UNFPA experts: India, Nigeria, the Philippines, and South Africa, in addition to the U.S. given

its Internet penetration. In this paper we assess the role of social media (data from Twitter) to

identify public views related to GBV in these chosen parts of the world, and tweeting practices by

geography, time, gender, and events to inform concerned parties and assist GBV policy design.

Figure 1. Summary of the education gap between genders, the penetration of Internet, and overall literacy

rates in the diverse set of chosen countries.

The five countries present different contexts both for understanding social media data

pertaining to GBV and for mitigation efforts. Figure 1 summarizes their variability across some

key contextual dimensions: the education gap between genders, the penetration of Internet, and

overall literacy rates. The figure illustrates the clustering of Nigeria and India for lower literacy

rates, a greater education gap and lower Internet penetration. South Africa and the Philippines

cluster with the U.S. regarding overall literacy and the reduced education gap, but reflect a

diverging range of Internet penetration. The graphs suggest the risk of sampling bias that affects

data interpretation: an illiterate female citizen with no access to the Internet (likely in rural areas

with unique GBV issues) may not be providing social media data, biasing the aggregated

measures of attitude. Other differences may also be relevant. For example, India is among the top

20 of over 140 countries regarding female political empowerment while Nigeria is below

average on this dimension. Additional influences on the use of social media not depicted in the

figure include cultural influences on free speech. A sizable percentage of Nigerians, for example,

may avoid public conversation about the Boko Haram atrocities8 due to fear of revenge.

In the next section, we employ quantitative and qualitative analyses to examine Twitter

content related to GBV. Twitter supports the distribution of short messages called tweets that are

a maximum 140 characters in length. The character limit influences message style and constrains

communication practices. Therefore, tweets often contain URL links to web pages or blogs,

sometimes relying on shortened URL versions from external services (e.g.,

http://bit.ly/1C4HnMN). A hashtag convention (e.g., #RapeJoke, #ChildMarriage) supports the

identification of searchable user-defined topics. Other Twitter engagement features include

retweet (or ‘RT’, a forwarding of someone’s tweet). The device used to post a tweet may provide

accessible, precise location indicators in some cases. Alternatively, accessible user profiles

provide more general indicators of location and sometimes gender indicators such as author

name.

A corpus of Big (Social) Data collected over a period of ten months included nearly

fourteen million tweets. In this corpus, we examine volume, location, trends over time, gender

participation, and content such as metaphor and humor. Our analyses present both challenge and

opportunity to study the phenomenon of gender-based violence. Challenges concern the need for

computational methods to discern public perception and attitude from complex contextualized

behavior. Opportunity lies in gaining fine-grained, region-specific insights concerning the

prevailing GBV attitudes, and related policies along with potential approaches to mitigation.

[2.] METHOD

We first describe our data collection for the study, followed by a description of the

analysis approach.

Data Collection

Based on domain expert guidance we collected data from the Twitter Streaming API

(Twitter Developers, 2014), using its ‘filter/track’ method for the given set of keywords. We

leveraged the keyword-set crawlers (data collectors) of our Twitris platform for mining

collective social intelligence as described in Nagarajan et al. (2009) and Sheth et al. (2014). Twitris

also provides real-time and scalable analyses of social data streams for specific topics of interest

by end users (discussed in future work). UNFPA experts assisted in the definitions corresponding

to the three themes of interest for GBV study and associated English terminology. Using the

themes of physical violence, sexual violence, and harmful practices, we created a set of seed

keywords and key multi-word phrases for data crawling (see Table 2).

8 For an overview, see the following: http://en.wikipedia.org/wiki/Boko_Haram

Table 2. Seed set of key phrases for GBV related data crawling.

Physical violence

woman dragged, women dragged, girl dragged, female dragged, woman kicked,

women kicked, girl kicked, female kicked, woman beat up, women beat up, girl

beat up, female beat up, woman beaten, women beaten, girl beaten, female beaten,

woman burn, women burn, girl burn, female burn, woman acid attack, women acid

attack, girl acid attack, female acid attack, woman violence, women violence, girl

violence, female violence

Sexual violence

sexual assault, sexual violence, rape, woman harass, women harass, girl harass,

female harass, woman attacked, women attacked, girl attacked, female attacked,

boyfriend assault, boy-friend assault, stalking woman, stalking women, stalking

girl, stalking female, groping woman, groping women, groping girl, groping female

Harmful Practices

child marriage, children marriage, underage marriage, forced marriage, sex

trafficking, woman trafficking, women trafficking, girl trafficking, female

trafficking, child trafficking, children trafficking
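The seed set in Table 2 has a regular structure: most phrases pair one of four subject terms with an action term. The following is a minimal sketch of how such a list could be generated, shown for the physical violence theme and for the two action-first patterns in the sexual violence theme. The function and variable names are hypothetical, and nothing here is claimed to be how the original seed list was produced.

```python
from itertools import product

# Subject terms and action terms copied from Table 2.
SUBJECTS = ["woman", "women", "girl", "female"]
PHYSICAL_ACTIONS = ["dragged", "kicked", "beat up", "beaten",
                    "burn", "acid attack", "violence"]
ACTION_FIRST = ["stalking", "groping"]


def subject_first(actions, subjects=SUBJECTS):
    """Build phrases such as 'woman kicked', grouped by action as in Table 2."""
    return [f"{subject} {action}" for action, subject in product(actions, subjects)]


def action_first(actions, subjects=SUBJECTS):
    """Build phrases such as 'stalking woman', grouped by action as in Table 2."""
    return [f"{action} {subject}" for action, subject in product(actions, subjects)]


physical_violence = subject_first(PHYSICAL_ACTIONS)
stalking_groping = action_first(ACTION_FIRST)

print(len(physical_violence))   # number of physical violence phrases
print(physical_violence[:4])    # the four 'dragged' phrases
```

Generating the phrases from two short lists makes it easier to add a new subject or action term consistently across a theme.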

For each single keyword K, the Twitter service provided messages containing any form

of the keyword #K, k, K. For the multi-word phrase K, it provided the messages containing all

the terms of K. Each message is associated with metadata, containing various tweet related and

author profile characteristics, such as tweet origin location (latitude and longitude), time of

posting, author profile description and location, number of author followers (users that

subscribed to the author’s updates), and followees (users to whose updates the author

subscribed). Location metadata was crucial to geographical analysis. We first checked if tweet

origin latitude-longitude were available from the device used to send the tweet, else we resolved

the author profile location, if available, using the Google Maps API. We used a latitude-longitude

bounding box for each country of interest to identify a country-specific tweet dataset. Using the

Genderize API (http://genderize.io), we inferred the genders of the tweet authors. We first

fetched the real names of the Twitter authors from the profile metadata of their Twitter handles. We

extracted first names to detect author genders via calls to the Genderize API with first names as

parameters. We do note limitations with the gender detection due to informal social media

language use, and therefore discuss scope for improvement in our future work section.
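The location and gender steps described above can be sketched as follows. This is an illustration under stated assumptions, not the Twitris implementation: the tweet dictionaries follow the common Twitter JSON layout (a GeoJSON `coordinates` point in longitude-latitude order and a `user` object with `location` and `name`), the bounding boxes are rough illustrative values, the `geocode` callable is a hypothetical stand-in for a geocoding service such as the Google Maps API, and the Genderize query URL is only constructed, never called.

```python
from urllib.parse import urlencode

# Rough, illustrative bounding boxes: (min_lat, min_lon, max_lat, max_lon).
# These values are assumptions for the sketch, not the boxes used in the study.
COUNTRY_BOXES = {
    "South Africa": (-35.0, 16.0, -22.0, 33.0),
    "Nigeria": (4.0, 2.5, 14.0, 15.0),
}


def resolve_point(tweet, geocode):
    """Prefer device coordinates; otherwise geocode the profile location."""
    coords = tweet.get("coordinates")
    if coords:
        lon, lat = coords["coordinates"]  # GeoJSON order is (longitude, latitude)
        return (lat, lon)
    profile_location = (tweet.get("user") or {}).get("location")
    if profile_location:
        return geocode(profile_location)  # hypothetical callable; may return None
    return None


def country_of(point, boxes=COUNTRY_BOXES):
    """Return the first country whose bounding box contains the point."""
    if point is None:
        return None
    lat, lon = point
    for country, (min_lat, min_lon, max_lat, max_lon) in boxes.items():
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
            return country
    return None


def first_name(display_name):
    """Extract a lower-cased first name; skip tokens with digits or symbols."""
    tokens = [token for token in display_name.split() if token.isalpha()]
    return tokens[0].lower() if tokens else None


def genderize_url(name):
    """Build a Genderize query URL for a first name (not sent in this sketch)."""
    return "https://api.genderize.io/?" + urlencode({"name": name})
```

A bounding box is a coarse test: it can include parts of neighboring countries, so a production system would likely refine matches with country polygons or the geocoder's own country field.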

Analysis Approach

With UNFPA guidance, we analyzed the data corpus of 13.8 million tweets from the ten

months in non-uniform time slices, starting with a smaller pilot phase and adding two additional

extended data collection periods due to supporting results from the pilot phase data.

● SLICE 1: Jan 1 2014 - Feb 15 2014: 1.5 months; Phase-1 (pilot phase)

● SLICE 2: Feb 15 2014 - June 30 2014: 4.5 months; Phase-2

● SLICE 3: July 1 2014 - Oct 31 2014: 4 months; Phase-3
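As a small illustration of the slicing, the sketch below assigns a posting date to one of the three slices. It assumes half-open intervals, so that the shared boundary date of 15 February falls in Slice 2, and it takes Slice 2 to end on the last day of June. Both choices are assumptions of this sketch, as are the names used.

```python
from datetime import date

# Half-open intervals [start, end): the end date itself belongs to the next slice.
SLICES = [
    ("SLICE 1", date(2014, 1, 1), date(2014, 2, 15)),
    ("SLICE 2", date(2014, 2, 15), date(2014, 7, 1)),
    ("SLICE 3", date(2014, 7, 1), date(2014, 11, 1)),
]


def slice_of(day):
    """Return the slice label for a posting date, or None if out of range."""
    for label, start, end in SLICES:
        if start <= day < end:
            return label
    return None


def slice_lengths_in_days():
    """Length of each slice in days, useful for normalizing per-slice counts."""
    return {label: (end - start).days for label, start, end in SLICES}
```

Because the slices have different lengths, per-slice tweet counts are only comparable after dividing by the slice length.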

To study the diverse set of data from the three phases, we employed mixed methods

(Creswell, 2013) to reveal both patterns across the corpus as well as the content of specific

contributions. The focus of quantitative analysis is to provide data-driven insights of activity

patterns in the social media community, by examining large-scale distributions of GBV content

by geography, time, and gender. The recovery of meaning of a pattern requires more fine-grained

analysis than mere statistical distribution. Therefore, the focus of our qualitative analysis is to

reveal attitudes and behaviors across different countries and between genders. In both cases, our

interpretation relies on context, regarding current events and socio-cultural considerations,

ultimately supporting the need for context-sensitive computation for monitoring GBV content in

social media.

[3.] RESULTS

Quantitative Analysis

We discuss four types of quantitative analyses in this section: volume, theme, sharing, and

gender-based.

Volume analysis. To begin our study, we sampled an initial slice of data for 1.5 months

from all over the world, which contained 2.3 million tweets worldwide related to GBV. We skip

the detailed descriptive statistics report for brevity regarding this sample and instead summarize

our observations. Regarding our interest in location-specific GBV data, we note relatively few

tweets with device-based location information (1.91%). However, author profiles also provide

location-related information; counting it, over half of the data (54.74%), from 1.3

million users, had location information. These results motivated our more extensive data

collection effort, spanning an additional 8.5 months.

Table 3 provides the composition of the full data set by country. More than 10% of the

social media traffic belongs to the five countries examined here. We note roughly four to seven times

the traffic in the U.S. and India relative to each of the other three countries. Moreover, the observed frequency

ranking differs from the population demographic9 information for these countries (in descending

order: India, the U.S., Nigeria, the Philippines, South Africa). We suspect that Internet

penetration is influencing data collection. Worse, a frequency graph for all of the raw data over

time would mask variability in the Philippines, Nigeria, and South Africa with smaller

populations and variable Internet penetration. (We plan to address urban versus rural

comparison in our future work.) Figure 2 therefore scales the raw data by two factors: population

and Internet penetration. This allows us to consider the relative prevalence of GBV topics

between countries and over time. Below we discuss some of the emergent patterns by country.

Although we highlight the need to interpret these patterns with respect to a broad and complex

knowledge base that includes current events, these are precisely the sorts of analyses that are best

accomplished with computational tools.

9 http://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population

Table 3. Volume of GBV related tweets by country.

Country of Origin    Number of Tweets
United States        698,077
India                549,033
Philippines          105,550
Nigeria              134,671
South Africa         104,430

Note. This table represents 1,591,761 (11%) of the total 13,942,592 GBV tweets collected globally.
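The totals in the note can be checked directly from the table. The short sketch below uses only the counts reported in Table 3 and the global total from the note; the variable and function names are illustrative.

```python
# Tweet counts copied from Table 3.
TWEETS_BY_COUNTRY = {
    "United States": 698_077,
    "India": 549_033,
    "Philippines": 105_550,
    "Nigeria": 134_671,
    "South Africa": 104_430,
}
TOTAL_GLOBAL_TWEETS = 13_942_592  # global total from the table note

five_country_total = sum(TWEETS_BY_COUNTRY.values())
share_percent = 100 * five_country_total / TOTAL_GLOBAL_TWEETS


def traffic_ratio(larger, smaller):
    """Ratio of raw tweet counts between two countries in Table 3."""
    return TWEETS_BY_COUNTRY[larger] / TWEETS_BY_COUNTRY[smaller]


print(five_country_total)        # sum of the five country counts
print(round(share_percent, 1))   # share of the global corpus, in percent
```

The same helper gives the raw traffic ratios between countries, which vary by pair and are not adjusted for population or Internet penetration.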

Figure 2. Tweet volume across countries from 1 January to 31 October 2014 related to GBV,

controlling for the population and Internet penetration of each country. Vertical lines represent the time

slices sampled. [Population is expressed as a multiple of the smallest candidate (South Africa): South Africa = 1,

India ~ 25. The adjusted metric is the original tweet count divided by the adjusted

population, divided by Internet penetration.]
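The adjustment described in the caption can be written as a single function. This is a minimal sketch: population is expressed relative to the smallest country in the set, and Internet penetration is taken as a fraction between 0 and 1. The numbers in the example call are hypothetical placeholders chosen for easy arithmetic, not the figures used in the study.

```python
def adjusted_volume(tweets, population, smallest_population, internet_penetration):
    """Tweet count divided by relative population, then by Internet penetration.

    population and smallest_population share the same unit (e.g., persons);
    internet_penetration is a fraction in (0, 1].
    """
    if population <= 0 or smallest_population <= 0:
        raise ValueError("populations must be positive")
    if not 0 < internet_penetration <= 1:
        raise ValueError("internet_penetration must be in (0, 1]")
    adjusted_population = population / smallest_population
    return tweets / adjusted_population / internet_penetration


# Hypothetical placeholder inputs: a country twice the size of the smallest
# one, with half of its population online.
example = adjusted_volume(
    tweets=1_000,
    population=50_000_000,
    smallest_population=25_000_000,
    internet_penetration=0.5,
)
print(example)
```

Dividing by penetration inflates the counts of countries with few Internet users, so small errors in the penetration estimate have a large effect on the adjusted value for those countries.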

The trend in Fig. 2 shows that when controlling for population and internet penetration,

South Africa generates the most GBV related tweets overall, with the U.S. and India trailing

by roughly the same amount, and Nigeria and the Philippines toward the bottom.

Several current events may explain the apparent peaks over time, revealed during a

parallel Google Trends search by country. In the Philippines, the ongoing saga of Vhong Navarro

dominated much of the discussion10. Navarro, a television personality, was assaulted in his home

in January 2014, apparently in retaliation for attempted rape. The incident stirred concerns

over rape culture, as many lashed out at the female accuser. However, the American movie star

Jennifer Lawrence also factored highly in the most searched items in the Philippines regarding

the hacking of her private email accounts.

Between March and July a vehemently anti-LGBT figure named Myles Munroe ranked

among the most-searched topics in Nigeria11. The largest Nigerian story from the liberal

perspective (the kidnapping of the Chibok schoolgirls in mid-April) ranked only ninth in Nigeria.

“Kidnapping” searches initially increased about fourfold in the month and a half following

this incident, but then dropped to previous levels. Given the responsible group’s influence and

tactics, we suspect citizens may have been reticent to discuss any wrongdoing openly via social

media. We make this point to illustrate the need to consider socio-cultural influences on social

media traffic. This suggests that we will require separate models of GBV in social media, by

country.

South Africa’s Google searches reflected considerable violence12. Among the top events

driving traffic was the murder of a prominent soccer player in October during a burglary at his

home. Oscar Pistorius also influenced the search patterns. The trial for the murder of his

girlfriend began on March 3 and temporarily adjourned on May 20. The trial resumed on June 30

through August 8, with a judgment on September 12 and sentencing on October 21. At least two

of the peaks in GBV related Twitter traffic are coincident with these events.

These observations suggest that any quantitative analysis of social media traffic must

control for current events, in order to separate fundamental trends from local variability. Such

control is well within the capability of computational analysis using topic-modeling approaches.

We note apparent correlations between certain countries: India and the U.S., as well as

the Philippines and Nigeria, at least for the bulk of the data. For the month of August, we observed

a 48.1% overlap of the popular U.S. topics with the set for India. To determine overlap, topics

were based on the top 500 key-phrases (sets of words in a message text) extracted using a

modified tf-idf based key-phrase extraction algorithm in Twitris (Nagarajan et al. 2009). Such

content overlaps suggest the presence of mediating influences. To provide a source of

hypotheses, we turn to demographic properties of the countries investigated in this study. We

note a large Indian diaspora living in the U.S., potentially reacting to events originating in India.

The story of a 15-year-old Indian female acid attack victim received considerable attention

following an Al Jazeera report in January (Dixit, 2014). The shared August peak coincides with

an Independence Day speech by the Prime Minister of India (Narendra Modi) in which he urged

suspension of the practice of questioning the families of girls regarding their social habits but

never boys. Manual inspection of the U.S. corpus supports an explanation of the correlation

between the U.S. and India trends based on the Indian diaspora. Tweets with apparent origins in

10 http://www.google.com/trends/topcharts#geo=PH&date=2014

11 http://www.google.com/trends/topcharts#geo=NG&date=2014

12 http://www.google.com/trends/topcharts#geo=ZA&date=2014

GENDER-BASED VIOLENCE ANALYSIS ON TWITTER. KNO.E.SIS TECHNICAL REPORT 2015

the U.S. may actually constitute amplification of tweets with origins in India, as the following

example illustrates.

India: What world calls shining #India is the worst place for women in terms of #Rape

http://t.co/F2oyKps60R #BanBollywood #MediaMafia

US: What world calls shining #India is the worst place for women in terms of #Rape

http://t.co/zZoB17p1L1 #BanBollywood #PakMediaHijacked

The U.S. and Indian patterns illustrate an important challenge in the interpretation of social

media data—the trends do not necessarily reflect local events. Computational tools for

monitoring GBV will need to conduct location analysis from the text to distinguish commentary

about other countries. Model building must accommodate this distinction. However, given the

tweet pedigree, it is not clear to whom we should attribute the attitude.
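The topic-overlap measure reported above (the 48.1% overlap between U.S. and Indian key-phrases) can be illustrated with a minimal sketch. This is not the Twitris implementation: it assumes word bigrams as a stand-in for key-phrases, a plain smoothed tf-idf score, and a simple set-overlap ratio. The function names and the toy tweets are hypothetical.

```python
import math
import re
from collections import Counter


def bigrams(text):
    """Lowercase word bigrams of a message (an assumed stand-in for key-phrases)."""
    words = re.findall(r"[a-z0-9#']+", text.lower())
    return [" ".join(pair) for pair in zip(words, words[1:])]


def top_phrases(tweets, k):
    """Rank phrases by summed tf-idf across tweets and return the top k as a set."""
    docs = [Counter(bigrams(t)) for t in tweets]
    doc_freq = Counter(p for d in docs for p in d)  # number of tweets containing each phrase
    n = len(docs)
    scores = Counter()
    for d in docs:
        total = sum(d.values())
        for phrase, count in d.items():
            idf = math.log((1 + n) / (1 + doc_freq[phrase])) + 1.0  # smoothed idf
            scores[phrase] += (count / total) * idf
    return {p for p, _ in scores.most_common(k)}


def overlap_share(source, target):
    """Share of the phrases in `source` that also occur in `target`."""
    return len(source & target) / len(source) if source else 0.0


# Hypothetical toy corpora; the study used the top 500 key-phrases per country.
us = ["rape culture must end now", "end rape culture on campus"]
india = ["rape culture must end in india", "acid attack victim speaks"]
share = overlap_share(top_phrases(us, 500), top_phrases(india, 500))
```

Note that the ratio is asymmetric: the share of U.S. phrases found in the Indian set generally differs from the share of Indian phrases found in the U.S. set.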

Table 4. Percentages of sexual violence messages discovered by time slice and by country.

Country         SLICE-1   SLICE-2   SLICE-3   ALL
United States   65.10%    64.78%    63.90%    64.44%
India           66.05%    65.41%    65.30%    65.45%
Philippines     77.68%    69.08%    69.09%    72.00%
Nigeria         55.80%    47.13%    49.08%    49.59%
South Africa    71.15%    66.96%    63.69%    66.24%
Total           65.67%    62.52%    61.11%    62.43%

Theme analysis. Our original data collection categorized the tweets into three thematic

groups: physical violence, sexual violence, and harmful practices. Sexual violence dominates the

sample of GBV tweets (72% overall in global corpus), so we provide their distribution over time

in Table 4. We note a general decrease over time with some variability between countries. We

suspect early spikes related to incidents like the Vhong Navarro rape accusation in the Philippines and

the initial publicity surrounding the Boko Haram atrocities in Nigeria.

Sharing behavior analysis. Social media provides the opportunity to distribute

information, potentially reflecting both the senders’ judgment of information importance, and

reliance on the voice of others. Sharing functions to amplify these voices, often those of influential

celebrities. We analyze two types of sharing behavior in the social media community

surrounding GBV events: direct content resharing as retweets (RT), and indirect sharing via

references to external resources using URLs, such as news, blogs, articles, and multimedia.

Table 5. Proportional RT and URL tweets by country.

Country         RT        URL
United States   45.47%    42.25%
India           46.68%    42.63%
Philippines     40.74%    18.79%
Nigeria         26.28%    62.13%
South Africa    44.25%    31.49%

We calculated the percentage of GBV retweets relative to the total count of tweets for

each data sample, as shown in Table 5. More than 40% of the GBV corpus consists of retweets in the U.S.,

India, the Philippines, and South Africa, amplifying information that senders consider to be

important. For comparison, Liu, Kliman-Silver, and Mislove (2014) found that retweets

generally constitute just over 25% of the total volume of tweets. Although we note variability in

retweet behavior between countries, the low retweeting frequency in Nigeria is particularly

remarkable (see Table 5). One might hypothesize that a low literacy country such as Nigeria, in

which senders are less able to compose messages, would have the highest retweet ratio. The

adjacent analysis of URLs suggests a different socio-cultural phenomenon at work, concerning

the identifiability of the responsible party. Nigeria has the highest percentage of GBV tweets

containing URLs among the countries examined (see Table 5). Numerous

explanations should be tested, including literacy, credibility of the public press, and the

possibility that reliance on external resources somehow reduces the threat of being identified as

the responsible party.
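The two sharing proportions discussed above can be sketched as follows. The report does not state how retweets and URLs were detected, so the patterns below (a leading "RT @user" marker and an http(s) link) are assumptions, and the function name is hypothetical.

```python
import re

# Assumed detection rules; the study's actual criteria are not specified.
URL_PATTERN = re.compile(r"https?://\S+")
RT_PATTERN = re.compile(r"^\s*RT\s+@\w+", re.IGNORECASE)


def sharing_proportions(tweets):
    """Return (retweet share, URL share) as percentages of all tweets."""
    if not tweets:
        return 0.0, 0.0
    retweets = sum(1 for t in tweets if RT_PATTERN.search(t))
    with_url = sum(1 for t in tweets if URL_PATTERN.search(t))
    return 100.0 * retweets / len(tweets), 100.0 * with_url / len(tweets)


# Hypothetical sample: one plain retweet, one retweet with a link,
# one original tweet with a link, and one original tweet without.
sample = [
    "RT @USER_A: women are strong",
    "RT @USER_B: please read this http://t.co/example1",
    "this film reinforces rape culture http://t.co/example2",
    "@USER_C is that an acceptable example?",
]
rt_share, url_share = sharing_proportions(sample)
```

A tweet can count toward both proportions, which is why the RT and URL columns of Table 5 need not sum to 100%.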

Author gender analysis. We obtained gender identification for 37% of the overall users

(see Table 6). The reduced percentage is due to names missing in the Genderize API lexicon, as

well as unconstrained natural language features of social media content, such as use of special

characters in names, for instance, ‘@@shish’ instead of ‘Aashish’, which is a male Indian name.

Table 6. Gender-wise distribution of data for the overall global corpus.

Statistic            Users       % of Total Users   Generated Tweets   % of Total Tweets
Total                3,036,576   100%               13,942,592         100%
- Gender Filtered*   1,148,329   37.82%             2,771,686          19.88%
-- Female author     563,016     18.54%             1,324,292          9.50%
-- Male author       585,313     19.28%             1,447,394          10.38%

*Filtered = where an author gender could be determined

Keeping in mind that statistically significant differences are virtually certain with so large a

sample, the distribution of gender appears approximately equal in Table 6. We also note an average

tweet frequency of 2.352 tweets per author for female authors and 2.472 tweets per author for male

authors. However, Figure 3 separates the gender distribution for the examined countries, and

depicts the impression of gender inequality.
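As a worked check, the per-author tweet frequencies and the share of gender-identified users follow directly from the counts in Table 6. The variable and function names below are illustrative only.

```python
# Counts taken from Table 6.
female_users, female_tweets = 563_016, 1_324_292
male_users, male_tweets = 585_313, 1_447_394
total_users = 3_036_576


def tweets_per_author(tweets, users):
    """Average number of tweets generated per author."""
    return tweets / users


female_rate = tweets_per_author(female_tweets, female_users)  # about 2.352
male_rate = tweets_per_author(male_tweets, male_users)        # between 2.472 and 2.473

# Share of all users for whom a gender could be determined (about 37.82%).
identified_share = 100.0 * (female_users + male_users) / total_users
```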

Figure 3. Relative distribution of tweets whose author’s gender we could determine.

The name classification procedure that we employed suggests a gender bias in the

samples across countries. Literacy serves as a partial explanation for the observed ratios, except

for the U.S. Whatever the explanation, opinions collected in the U.S., India, and Nigeria reflect

a male bias, while opinions collected from the Philippines and South Africa are more balanced or

even reflect a slight female bias. This bias has corresponding implications for the assessment of GBV

attitudes, the general reach of anti-GBV campaigns using social media and the ability to target

potential perpetrators and activists for engagement separately.

Qualitative Content Analysis

Thus far we have described the corpus with respect to dimensions that we anticipated at

data collection: country, theme of GBV event, sharing methods, and gender. We have made

suggestions about the surrounding context, including transient events that might explain the

observed patterns. In the remaining analyses, we look more closely at message content for

indications of GBV attitudes and to clarify the requirements for future computational analysis

capabilities.

Language indicators. Using Linguistic Inquiry and Word Count (LIWC) software13

(Tausczik & Pennebaker, 2010), we analyzed the language of all tweets generated by authors of

both genders. We used the predefined LIWC dictionaries that tally word frequencies in

categories such as anger, sexuality, sadness, health, etc. Content corresponding to these

13 Information available here: http://www.liwc.net/

particular categories, and in fact content across the majority of the LIWC categories appeared

more frequently in tweets of male origin relative to tweets of female origin. However, we did

note some trends in tweet content from females. Consistent with the research on gender issues in

communication (Tannen, 1996), female authors here are more collective and socially oriented.

Their tweets call for action and are more likely to express agreement.

Female (South Africa): @USER7 Absolutely. If we follow each other I can DM you my

email address. I applaud your speaking out on the rape epidemic in SA.

Female (Nigeria): I am worried abt our approach to d fight against rape. Permit me to

vent here. @USER8 @USER9 @USER10 @USER11 #CurbingRape

Female authors are also more likely to provide opinion on causality, as in the following example

tweet from India:

Female (India): The major factor behind #Rape in #India is the #Bollywood which incites

feeling 2 cross the moral limits #BanBollywood #GreaterPakistan

Gender-specific analysis can be leveraged to design and promote anti-GBV campaigns.

For example, tweets of female origin in India, although not guaranteed to be benevolent, could

be amplified to extend their reach.

Following a manual review of a random sample of the corpus, we used computational

analyses to describe the prevalence of attitude indicators across the whole corpus. Two of the

attitude indicators examined here are the presence of humor indicators and GBV metaphors in

sports. We also provide specific examples of tweet content from manual analysis of random

samples.

Humor indicators. Humor might indicate a trivialization of an issue or an expression of

underlying helplessness (Romero, 2010; Ancheta, 2007). We assessed the prevalence of humor

references with related permutations of “haha” and “hehe”. The number of humor-flagged tweets

by country appears in Table 7. The Philippines sample provides a greater proportion of humor

indicators by far. This is consistent with Filipino culture14 in general. Moreover, the observed

female bias in the Filipino sample reinforces the interpretation of humor as an expression of

helplessness. Of course, laughter annotations do not provide a complete account of humorous

references but rather a correlated indicator for jokes such as the following from the Filipino

corpus:

Female (Philippines): RT @USER12: Rape prevention nail polish sounds like a great idea but

I'm not sure how you're going to get men to wear it

Due to cultural differences in the role of humor, we cannot advocate comparison between

countries. However, we do suggest that the prevalence of GBV humor may provide a useful

metric for changes in attitude over time within a region.
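The lexicon-based humor flagging described above can be sketched as a simple pattern match. The lexicon is modeled on the note to Table 7, but the exact matching rules used in the study are not specified, so the regular expression and the function name are assumptions.

```python
import re

# Repeated "ha"/"he" (with or without spaces), plus "lol" and "lmao".
# Assumed matching rule, modeled on the lexicon in the note to Table 7.
HUMOR_PATTERN = re.compile(
    r"\b(?:(?:ha\s?){2,}|(?:he\s?){2,}|lol|lmao)\b", re.IGNORECASE
)


def humor_share(tweets):
    """Return (flagged count, percentage of tweets flagged)."""
    flagged = sum(1 for t in tweets if HUMOR_PATTERN.search(t))
    pct = 100.0 * flagged / len(tweets) if tweets else 0.0
    return flagged, pct


# Hypothetical sample; the last item checks that ordinary words are not flagged.
sample = [
    "hahaha that is not funny",
    "He he he",
    "I applaud your speaking out",
    "lol what",
    "the helmet he held",
]
flagged, pct = humor_share(sample)
```

The same ratio applied to the reported counts reproduces Table 7; for the Philippines, 10418 flagged tweets out of 105550 is about 9.870%.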

14 A brief overview of traditional Filipino values can be found here:

http://en.wikipilipinas.org/index.php?title=Philippine_Core_Values

Table 7. Analysis of humor-flagged tweets indicated by permutations of humor indicators.

Country        Flagged tweets   Total tweets   Percentage
Philippines    10418            105550         9.870%
South Africa   1801             104430         1.725%
USA            8067             698077         1.156%
Nigeria        1419             134671         1.054%
India          5195             549033         0.946%

Note. Lexicon set defined as {haha, ha ha, hehe, he he, he he he, hahaha, lol, lmao}.

GBV and sports metaphors. Sports involve competition, and violent metaphor is a

common device in sport discussion (Lewandowski, 2010, 2011, 2012). GBV metaphors may

appear in a sports corpus as an indication of dominance, such as the following example from the

South African tweet corpus:

South Africa: The German Team is on a Steroid-induced anal-rape rage

against Brazil right now.

Comparing the prevalence of GBV metaphors in sports across countries is difficult

because a given sport does not capture national attention equally across countries. We obtain

some insight regarding the GBV metaphor in tweets related to sports from exchanges concerning

the FIFA World Cup contest, held from 12 June through 13 July 2014. Soccer is the most

popular sport in Nigeria and South Africa, one of the top three favorite sports in India, and a top

ten favorite in the Philippines and the U.S.15. While all of the countries we examined have

eligible teams, only the U.S. (ranked 13) and Nigeria (ranked 44) participated in the final

tournament16.

We should not expect any FIFA related content in a corpus designed to capture GBV

issues. Yet, Table 8 illustrates the existence of tweets flagged for containing references to:

football, futbol, soccer, worldcup, world cup, fifa, fifacup, soccercup, as well as pairs of team

names participating in the tournament. Although the percentages are small, every country

examined provided such instances, in a ranking consistent with soccer popularity, with the

exception of the U.S. The finding requires a nuanced interpretation. On the one hand, rape as a

metaphor suggests a trivialization of the primary definition. On the other hand, at least one

dictionary17 indicates an archaic definition meaning plunder or violation, for example regarding

the environment. On the one hand, violent metaphor is a common device in sport discussion. On

the other hand, it is just one of many common metaphors (Lewandowski, 2012). In fact, in

Lewandowski’s (2010-2011) extensive analysis of violent, conflict-oriented football metaphor

in sports journalism, none of the 551 violent metaphors invoked rape.

15 http://mostpopularsports.net/football-soccer-popularity

16 http://en.wikipedia.org/wiki/2014_FIFA_World_Cup

17 http://dictionary.reference.com/browse/rape

A difference in the popularity of sport between countries is only one challenge to direct

comparison of GBV metaphor between countries. The development of a knowledge base to

support analysis poses a further challenge, as different regions follow different teams with

different players. As in the interpretation of humor, we believe the best use of sport metaphor

data is to provide indicators of attitude trends over time by region, while adjusting for seasonal

variations in sporting events or averaged annually.
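The sports-lexicon flagging described above can be sketched as a keyword match restricted to the World Cup window. The rule used here (flag a tweet if it contains a lexicon keyword or the names of at least two participating teams) is one reading of the report's description, and the `TEAMS` list is a hypothetical subset of the participants rather than the full list.

```python
import re
from datetime import date

WORLD_CUP_START, WORLD_CUP_END = date(2014, 6, 12), date(2014, 7, 13)

KEYWORDS = ["football", "futbol", "soccer", "worldcup", "world cup",
            "fifa", "fifacup", "soccercup"]
# Hypothetical subset of participating teams, for illustration only.
TEAMS = ["germany", "brazil", "nigeria", "usa"]

KEYWORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, KEYWORDS)) + r")\b", re.IGNORECASE
)


def mentions_team_pair(text):
    """True if the text names at least two distinct teams from TEAMS."""
    lowered = text.lower()
    present = [t for t in TEAMS if re.search(r"\b" + re.escape(t) + r"\b", lowered)]
    return len(present) >= 2


def is_sports_flagged(text, posted_on):
    """Flag a tweet posted inside the tournament window that matches the lexicon."""
    in_window = WORLD_CUP_START <= posted_on <= WORLD_CUP_END
    return in_window and bool(KEYWORD_PATTERN.search(text) or mentions_team_pair(text))
```

For example, `is_sports_flagged("Germany vs Brazil tonight", date(2014, 7, 8))` is flagged through the team-pair rule, while a keyword match dated outside the window is not.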

Table 8. Analysis of sports-related tweets indicated by sports lexicon.

Country        Flagged tweets   Total tweets   Percentage
U.S.           1053             698077         0.15%
India          657              549033         0.11%
Nigeria        320              134671         0.23%
Philippines    (missing)        105550         0.04%
South Africa   214              104430         0.20%

Note: FIFA World Cup sample for each country in the period of June 12 to July 13 2014.

Lexicon: { football, futbol, soccer, worldcup, world cup, fifa, fifacup, soccercup, [country-names pairs from http://en.wikipedia.org/wiki/2014_FIFA_World_Cup] }

Policy-making and intervention insights via manual analysis. Our goal is to generate

attitude metrics and inform mitigation campaigns automatically. All of the above examples were

informed by computational analysis. For the following examples, we manually examined random

subsets of 200, 500, and 500 tweets drawn from the three slices of our data corpus, respectively. We

end our presentation of GBV data by indicating the kind of content available in the corpus

compatible with somewhat more specialized computational analysis. Developing the necessary

content-specific filtering methods (feasible using domain knowledge of politics, business, and

behavioral theories) is particularly critical for appropriately routing content to specific

policy-making recommendation agencies. Some examples follow:

a.) Behavior pertaining to government/public officials/leaders

● @USER13 NCP leader doesn't know rape happens due to pervert mindset . what

abt unreported rapes happen with minors & toddlers.

b.) Commercial considerations

● RT @USER14: .@USER15 Please don't allow violent hatemongers use your app

to harass and exploit marginalized women. "http://t.co/DoRkbXFuN…

● Spring Breakers isn't just a terrible movie, it reinforces rape culture

http://t.co/pxTAZftM8v

● RJ Police already said that it's not a Rape case!But Media neglected it 4 creating

SPICY news!#KnowTheTruth & Rise4Justice

c.) Persuasive/encouraging message content

● RT @USER16: Pregnancy, periods, breast cancer, being walked on, rape,

harassment, abuse; females go through a lot. WOMEN ARE STRONG.

● @USER17 @USER18 Is it acceptable to use gang rape in an example?

d.) Stereotype association

● RT @USER6: It is not my job to coddle and "educate" young Black men when it

comes to violence against women. Y'all wanna "teach"?

Automated filtering of GBV content according to such dimensions is challenging but feasible

given the appropriate knowledge bases.

[4.] DISCUSSION AND FUTURE WORK

Here we revisit the limitations of conventional, survey-based methods for monitoring

GBV attitudes to discuss the progress we have made, the limitations we face, and the promise of

further advances in Computational Social Science.

Progress

We noted above the limitations of conventional methods in gathering GBV data for

specific regions. Data are either highly aggregated at the country level or missing. We have

provided substantial amounts of GBV data for all five of our target regions. User profiles and

GPS tags provide the capability of identifying content specific to much smaller units of analysis,

e.g., type of socio-economic region or even city (Leetaru et al., 2014). This capability supports

targeted GBV campaigns, where the prevalence and types of violations may vary. Below we note

progress with respect to a number of additional concerns regarding conventional data gathering

methods.

Reducing sampling bias. We noted the presence of bias in survey methods due to

artificial sample selection. We have overcome some of the bias concern simply by the scope of

data collection that is feasible with social media. We have the opportunity to observe male and

female attitudes by specific regions, over time. However, bias does remain in our data collection

methods. Some of the persisting bias is amenable to adjustment. For example, bias exists in the

form of literacy assumptions by country and by gender within country, and in Internet penetration.

Awareness of these sampling issues allows us to amplify the weight of content from

underrepresented participants, e.g., the contributions from females in lower penetration areas.

This is possible because we can assess both gender and location in the available data. Of course,

we cannot amplify content that does not exist. This limitation is fundamental, but it also

identifies the regions where this limitation occurs and where we should focus other methods of

data collection.
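One way to amplify content from underrepresented participants, as suggested above, is post-stratification weighting. The report does not specify a weighting scheme, so the following is a sketch under that assumption; the group labels, counts, and population shares are hypothetical.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight per group = population share / sample share (assumed scheme)."""
    total = sum(sample_counts.values())
    weights = {}
    for group, count in sample_counts.items():
        if count == 0:
            continue  # content that does not exist cannot be amplified
        sample_share = count / total
        weights[group] = population_shares[group] / sample_share
    return weights


# Hypothetical region where female authors are underrepresented in the sample.
sample_counts = {"female": 300, "male": 700}
population_shares = {"female": 0.5, "male": 0.5}
weights = poststratification_weights(sample_counts, population_shares)
```

With these numbers each tweet from a female author counts for about 1.67 tweets and each tweet from a male author for about 0.71, so the weighted sample matches the assumed population shares. A group with no tweets receives no weight, which mirrors the fundamental limitation noted above.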

Reducing content bias. We suggested that survey participation itself constitutes a form

of bias. Apart from the sampling issues noted above, the posted content on social media itself

reflects a form of bias. Participants are still providing what they consider to be socially

acceptable content. This is particularly apparent in the absence of commentary regarding Boko

Haram and GBV in Nigeria. However, this observation alone can inform the course of action for

policy makers, e.g., publicizing sensitive topics. Nevertheless, for the social media data that are

available, participants are not providing guarded responses to external queries about their

attitudes. Instead they express their attitudes directly, making them susceptible to analysis.

Analysis speed. We have over 1.5 million social media postings collected in 2014 from

just five countries to support the claim that GBV issues generate commentary. Moreover, we

have completely eliminated the need for labor-intensive data collection, and as a result have

overcome the cost and lag limitations in data collection. We produced analysis within a year in

contrast to survey methods, despite the absence of complete automated capability. This rapid

turnaround enables the development of dynamic metrics to assess the results of campaigns

designed to curb GBV. A rapid measurement capability can play a role in the promotion of

effective efforts and the abandonment of those that are less effective, with real consequence to

the alleviation of human suffering.

GBV attitude metrics. Survey items tend to address attitudes but not behavior. Social

media data provide both attitudes and behavior, inasmuch as jokes and metaphor are both

behavior and attitude. This provides us with potential measures of tolerance for GBV. Thus, the

editing of socially acceptable content that constitutes a form of bias in data collection is the very

same behavior that tells us what is considered acceptable. This provides a means to measure the

effectiveness of anti-GBV campaigns, both those directly targeted to the potentially offensive

jokes and metaphor, as well as those targeted to specific but apparently unrelated concerns such

as the effect of holidays or weather.

Survey methods are by design static. Standard measures purport to provide evidence that

is comparable across time and regions. However, standardization ignores the effect of context.

Social media trends over time tell us that context cannot be ignored. For example, the publicity

surrounding a celebrity's involvement in a GBV-related event spikes social media commentary.

The availability of such events (Tversky & Kahneman, 1973) may very well influence survey

responses. But standalone surveys have no way to account for this influence. Our computational

social science methods allow us to complement the interpretation of GBV commentary with

adjustments for the influence of events at a short-time scale in order to discern long-term trends.

Finally, and completely outside the typical survey content, the analysis of social media

also promises to assist GBV campaigns, both by targeting the views of specific groups and by

providing content recommendations regarding law enforcement, politics, health services and

commerce.

Limitations & Challenges

All data collection methods suffer from limitations, and ours is no exception. Here we

note several concerns, some of which are amenable to computational solution and some of which

are more fundamental.

Unconstrained natural language text. Our keyword-based crawling limits the

completeness of the resulting corpus. Given the natural language of social media messages, we

cannot guarantee collection of every single relevant message. Moreover, keyword selection

matters. Countries vary in the terminology they employ for different purposes. For example, the

word “rape” in tweets that originate in India generally refers to events in India, but “sexual

assault” in the Indian corpus returns mostly American incidents. We are further constrained at

present by restriction to the English language, missing the interpretation of GBV language

embedded in Tagalog for example (spoken widely across the Philippines). Furthermore, our

dependence on keywords glosses over the different definitions of rape across cultures.

Global event sensitivity. Both a feature and a limitation, we demonstrated the influence

of events on social media content, both within a region and between regions. World events

provoke the articulation of public opinion and provide an unprecedented large-scale opportunity

to gather opinion and attitude. At the same time world events create variations in magnitude that

require adjustments to frequency counts, in order to reflect enduring trends over time.
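As an illustration of such an adjustment, the sketch below caps event-driven spikes in a daily count series using a rolling median baseline. This is a swapped-in, simple technique chosen for clarity, not the method used in the study; the window size, threshold, function name, and counts are all hypothetical.

```python
from statistics import median


def event_adjusted(counts, window=7, threshold=3.0):
    """Replace counts exceeding `threshold` times the local median with that median."""
    half = window // 2
    adjusted = []
    for i, value in enumerate(counts):
        neighborhood = counts[max(0, i - half): i + half + 1]
        baseline = median(neighborhood)
        is_spike = baseline > 0 and value > threshold * baseline
        adjusted.append(baseline if is_spike else value)
    return adjusted


# Hypothetical daily tweet counts with one event-driven spike on the fourth day.
daily_counts = [10, 12, 11, 95, 13, 12, 10]
smoothed = event_adjusted(daily_counts)
```

The spike is pulled back to the local baseline while ordinary day-to-day variation is left untouched, so the remaining series is closer to the enduring trend.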

Inter-regional comparison. We commented earlier that differences in the legal

definition of GBV hindered comparison of GBV rates between countries. While we believe our

measurements within a region can be informative about attitude change over time, the ability to

compare GBV issues between regions still poses a substantial challenge. Our two proposed

measures, sports metaphors and jokes, illustrate the challenge. A given sport is not equally

important across countries (or even regions), so that more frequent violent

sport metaphor in one country relative to another may simply reflect population interest in the

sport. Allowing type of sport to vary between regions confounds sport with region, so that we

might be learning more about issues with the sport than the region. The meaning of jokes

notwithstanding (trivialization versus helplessness), we cannot compare the prevalence of GBV

jokes between countries without factoring in the prevalence of jokes between countries in

general. Thus, we have not escaped the local, sociocultural influences on measures that prevent

between-region comparisons. Our contribution to this issue is to establish that it is not a

limitation peculiar to specific measures such as police reports and surveys. All measures reflect these

sociocultural influences.

Location identification. Location references of the messages are not always consistent

with the location of the author profile or GPS coordinates of source device. U.S. events appear in

the opinions of people in other countries, and vice versa. The technical aspect of this problem

resolves with techniques that discern event location from the message text or ancillary URL

content. Some uncertainty may remain, as people often assume shared

context when identifying locations, e.g., using abbreviated names for familiar landmarks, which results in

referential ambiguity. However, a far greater concern lies in the attribution of attitude when the

content corresponds to a remote event.

Correlational logic. Our argument has a correlational component. For example, we can

identify indicators of humor such as “hehehe” and “hahaha” using computational methods.

However, automatically detecting every instance of a joke embedded in social media content is

not computationally feasible. We assume that the frequency of humor keywords in a GBV

context correlates with the frequency of GBV jokes without such keywords. Although this

assumption is worth confirming using a manual classification of jokes, we note that the presence

of laughter indicators alone in this context is potentially offensive.

Future Work

We have demonstrated the potential of social media to inform policy makers regarding

attitudes, to measure the effect of campaigns, and even to provide campaign concepts.

Computational Social Science holds promise for a range of economic development issues beyond

GBV, covering a different set of themes, countries, and languages. However, much technical,

theoretical, and practical work remains to realize the full potential of the medium for the GBV

application.

Technical advances are required in order to distinguish message origins from message

content and support more comprehensive gender detection. Substantial work remains in the

development of specific knowledge bases to guide the detection and interpretation of jokes, as

well as metaphor and commentary directed to particular entities such as government and

business. The knowledge bases must be dynamic to capture the transient events that mask

fundamental attitude. Theoretical guidance from social science is required to attribute humor and

metaphor to either despair or tolerance or the lack thereof. The interpretation of retweets across

regions raises the problem of attribution in more than a practical sense. The issue is whether and

how to weigh the endorsement: as a property of the original sender, or the endorser who is

amplifying the message. Does the endorsement reflect opinion of the endorser’s location

independent of birth origins? The resulting approaches require validation to address these issues.

Comparison of GBV threats between countries, though raising measurement issues, is not

an exclusively methodological problem. Instead, it is a sociocultural issue, concerning the

boundaries that violate globally established norms and determine the deployment of global

resources. We lack the policy expertise to weigh in on such matters. However, we do endorse the

development of adaptive regional, and even local models for GBV behaviors and attitudes.

Although a large model encompassing all countries holds a certain appeal, it is fallacious to

assume that all regions have the same underlying issues and beliefs. The number of countries

constrains the number of contextual variables that one can test statistically. Local socio-economic

conditions, literacy, Internet penetration, gender, crime rates, religious beliefs, liberty

and many other unexamined, but constant factors influenced the data and their interpretation.

Data accessibility for policy-makers. Near-real-time information benefits an urgent

need to reduce GBV and the associated suffering. Conventional methods of disseminating public

opinion are reflected in written reports covering multiple years and issued with considerable

delay following data collection. Governments, NGOs, International Organizations, Aid Workers,

etc. require far higher bandwidth access to the dynamic data and analysis in order to measure

current public opinion and the effect of anti-GBV campaigns.

The Twitris collective social intelligence platform (Sheth et al. 2014) provides a

foundation for delivering the necessary high bandwidth access. Twitris supports the presentation

of thematic data along spatial and temporal dimensions (Nagarajan et al. 2009), network

relationships among people related to the distribution of content (Purohit et al. 2012), and

sentiment (Chen et al. 2012) as well as real-time trends18 shown in Figure 4.

18 A currently monitored GBV campaign is available at:

http://twitris2.knoesis.org/app/#campaigns/Gender%20Based%20Violence/dashboard Register for full dashboard

view.

Figure 4. A snapshot of the Twitris social analytics platform showing on-going research to monitor and

analyze the GBV data by a variety of dimensions: location (1, 8), time (2), types of content theme such as

sexual violence (3), demographics based map visualization updates (4), top topics by country or region of

interest (5), country specific demographic and top topics and related tweets (6, 7), sentiment based heat

map (8), as well as fine-grained analysis of emotion expressed, who-talks-to-whom network for

influential users about GBV topics, and real-time trends dashboard (1).

Twitris analyzes a topic such as GBV, and provides real-time scalable analyses of social

data streams for greater insights and actionable information to improve intervention. For

example, Twitris can monitor the real-time public sentiment and emotional reaction to criminal

reports and the justice system response by region, permitting side-by-side comparison with other

events. Under user guidance, Twitris automatically identifies the key topics meriting further

attention. Twitris network analysis assists in the measurement of anti-GBV campaign diffusion

to measure campaign effectiveness, as well as the identification of users who will spread targeted

campaigns.

CONCLUSION

Big (Social) Data complement more controlled but slower survey-based data collection

and analysis methods, whose conclusions may become obsolete in a dynamic world that

continuously generates noisy data responding to transient events. The lag in surveys and the noisy

nature of data limit the use of conventional methods for measuring the effects of changing GBV

attitudes and anti-GBV campaigns. Computational Social Science supports the collection and

analysis of a large GBV corpus. We analyzed nearly fourteen million Twitter messages collected

over a ten-month period. The large sample reduced bias relative to conventional data collection.

We examined data by region and gender, identifying content such as humor and metaphor that

has implications both for the measurement of GBV attitudes and for specific targets of

anti-GBV campaigns. Our methods constitute an inexpensive way to engage with citizens at

unprecedented scale, including the collection of public views regarding the behavior of

government and business, with the potential to revolutionize the conduct and measurement of

anti-GBV campaigns.

ACKNOWLEDGEMENT

We acknowledge partial support for our study from the U.S. National Science Foundation

Social-Computational Systems (SoCS) program under grant IIS-1111182, ‘Social Media Enhanced

Organizational Sensemaking in Emergency Response’. We are thankful to our

colleagues at the United Nations Population Fund NYC, especially Upala Devi, Judy Ilag, and

Maria Dolores Martin Villalba, and at the Kno.e.sis Center, especially Lu Chen, and current research

interns Kushal Shah and Garvit Bansal from LNMIIT India, for their invaluable continued

support in discussion and review of our GBV research.

REFERENCES

Ancheta MRG. 2007. "Pig's nest" in an even bigger pen: Pugad Baboy as a case of subversion

and renegotiation in Philippine comedy. Humanities Diliman 1:2.

Aumiller S. 2011. Why is Twitter promoting violence against women? Give Hope A Voice.

WordPress.com. Available at

http://givehopeavoice.wordpress.com/2011/08/01/why-is-twitter-promoting-violence-against-women/

Bhanot S, Senn CY. 2007. Attitudes towards violence against women in men of South Asian

ancestry: Are acculturation and gender role attitudes important factors? Journal of Family Violence

22(1):25-31.

Bowes N, McMurran M. 2013. Cognitions supportive of violence and violent behavior.

Aggression and Violent Behavior 18(6):660-665.

Cameron D, Smith GA, Daniulaityte R, Sheth AP, Dave D, Chen L, Gaurish A, Carlson R,

Watkins KZ, Falck R. 2013. PREDOSE: A semantic web platform for drug abuse

epidemiology using social media. Journal of Biomedical Informatics 46(6):985-997.

Corroon M, Speizer IS, Fotso JC, Akiode A, Saad A, Calhoun L, Irani L. 2014. The Role of

Gender Empowerment on Reproductive Health Outcomes in Urban Nigeria. Maternal and

Child Health Journal 18(1):307-315.

Creswell JW. 2013. Research design: Qualitative, quantitative, and mixed methods approaches.

Sage.

Culotta A. 2014. Estimating county health statistics with Twitter. In Proceedings of the SIGCHI

Conference on Human Factors in Computing Systems (CHI’14):1335-1344.

De Choudhury M, Gamon M, Counts S, Horvitz E. 2013. Predicting depression via social media.

In ICWSM.

Dixit A. 24 January 2014. How love got the better of India acid attack. Al Jazeera India.

European Union. 2012. Council conclusions on the eradication of violence against women in the

European Union. Available at

http://www.consilium.europa.eu/uedocs/cms_Data/docs/pressdata/en/lsa/113226.pdf

European Union Agency for Fundamental Rights. 2012. European council effort. Available at

http://fra.europa.eu/en/project/2012/fra-survey-gender-based-violence-against-women

Factbook, C.I.A. 2013. The world factbook. Central Intelligence Agency, Washington D.C.

Google Trends. 2015. Top charts. Available at

http://www.google.com/trends/topchartsdate=2014

HarassMap tool. Available at http://harassmap.org/en/

Heise L, Ellsberg M, Gottmoeller M. 2002. A global overview of gender-based violence.

International Journal of Gynecology & Obstetrics 78(1):S5-S14, ISSN

0020-7292, http://dx.doi.org/10.1016/S0020-7292(02)00038-3

ITU statistics 2014 on Mobile Tech adoption. Available at

http://www.itu.int/net/pressoffice/press_releases/2014/23.aspx#.VCSELStdU7s

Kalichman S, Simbayi LC, Kaufman M, Cain D, Cherry C, Jooste S, Mathiti V. 2005. Gender

attitudes, sexual violence and HIV/AIDS risks among men and women in Cape Town,

South Africa. Journal of Sex Research 42(4):299-305.

Lakoff G. 2008. Don't think of an elephant!: Know your values and frame the debate. Chelsea

Green Publishing.

Lazer D, Pentland AS, Adamic L, Aral S, Barabasi AL, Brewer D, Christakis N, Contractor N,

Fowler J, Gutmann M, Jebara T, King G, Macy M, Roy D, Van Alstyne M. 2009. Life in

the network: The coming age of computational social science. Science 323(5915):721.

Leetaru K, Wang S, Cao G, Padmanabhan A, Shook E. 2013. Mapping the global Twitter

heartbeat: The geography of Twitter. First Monday 18(5).

Lewandowski M. 2010-2011. The rhetoric of violence in Polish and English soccer. In I.

Koutny and P. Nowak (eds.): Language, communication, information 87-89.

Lewandowski M. 2012. Football is not only war. Non-violent conceptual metaphors in English

and Polish soccer language. In Taborek, Tworek and Zieliński (eds.): Sprache und Fußball

im Blickpunkt linguistischer Forschung, Verlag Dr. Kovač, Hamburg 79-95.

Liu Y, Kliman-Silver C, Mislove A. 2014. The tweets they are a-changin’: Evolution of

Twitter users and behavior. In International AAAI Conference on Weblogs and Social

Media (ICWSM) 13:55.

Martinez PR, Khalil H. 2012. Battery and development: Exploring the link between intimate

partner violence and modernization. Cross-Cultural Research 47(3):231-267.

Morrison A, Ellsberg M, Bott S. 2007. Addressing gender-based violence: A critical

review of interventions. The World Bank Research Observer 22(1):25-51.

Nayak MB, Byrne CA, Martin MK, Abraham AG. 2003. Attitudes toward violence against

women: A cross-nation study. Sex Roles 49(7-8):333-342.

Nilan P, Demartoto A, Broom A, Germov J. 2014. Indonesian men's perceptions of violence

against women. Violence Against Women 20(7):869-888.

Pierotti RS. 2013. Increasing rejection of intimate partner violence: Evidence of global cultural

diffusion. American Sociological Review 78(2):240-265.

Purohit H, Castillo C, Diaz F, Sheth AP, Meier P. 2013. Emergency-relief coordination on social

media: Automatically matching resource requests and offers. First Monday 19(1).

Purohit H, Hampton A, Bhatt S, Shalin VL, Sheth AP, Flach JM. 2014. Identifying seekers and

suppliers in social media communities to support crisis coordination. Computer Supported

Cooperative Work (CSCW) 23(4-6):513-545.

Romero G. 2010. Why the Filipino laughs. Research Lines. Available at

http://www.ovcrd.upd.edu.ph/researchlines/2010/10/22/why-the-filipino-laughs/

SafeCity initiative. Available at http://safecity.in/

Sheth A, Jadhav A, Kapanipathi P, Chen L, Purohit H, Smith GA, Wang W. 2014. Twitris: A

system for collective social intelligence. Encyclopedia of Social Network Analysis and

Mining (ESNAM), Springer, Alhajj, Reda, Rokne, Jon (Eds.) ISBN: 978-1-4614-6169-2.

Available at http://www.springer.com/computer/communication+networks/book/978-1-4614-6169-2

Tannen D. 1996. Gender and Discourse. Oxford University Press, ISBN 0-19-508975-8; ISBN

0-19-510124-3.

Tausczik YR, Pennebaker JW. 2010. The psychological meaning of words: LIWC and

computerized text analysis methods. Journal of Language and Social Psychology

29(1):24-54.

Twitter Developers. 2014. Streaming API. Available at

https://dev.twitter.com/streaming/overview Accessed Jan 11 2014.

United Nations. 2009. UN Secretariat campaign for violence against women. Available at

https://www.un.org/en/events/endviolenceday/pdf/UNiTE_TheSituation_EN.pdf

United Nations Population Fund. 2013. Addressing gender-based violence. Available at

http://www.unfpa.org/webdav/site/global/shared/documents/publications/2013/final%20sexual%20violence%20CSW%20piece.pdf

United Nations Population Fund. Gender-based violence. Available at

http://www.unfpa.org/gender/violence.htm

United States Agency for International Development. 2008. Indicators for gender violence.

Available at http://www.cpc.unc.edu/measure/publications/ms-08-30

Vieweg S, Hughes AL, Starbird K, Palen L. 2010. Microblogging during two natural hazards

events: What Twitter may contribute to situational awareness. CHI ’10: Proceedings of

the SIGCHI Conference on Human Factors in Computing Systems:1079-1088.

doi:http://dx.doi.org/10.1145/1753326.1753486, accessed 26 December 2013.

World Health Organization. 2013a. WHO Fact sheet. Available at

http://www.who.int/mediacentre/factsheets/fs239/en/

World Health Organization. 2013b. Global and regional estimates of violence against women.

Available at

http://apps.who.int/iris/bitstream/10665/85239/1/9789241564625_eng.pdf

Yoshihama M, Blazevski J, Bybee D. 2014. Enculturation and attitudes toward intimate partner

violence and gender roles in an Asian Indian population: Implications for community

based prevention. American Journal of Community Psychology 53:249-260.

Social mining for sustainable cities: thematic study of gender-based violence coverage in news articles and domestic violence in relation to COVID-19

Article

Full-text available

Apr 2022

We argue that social computing and its diverse applications can contribute to the attainment of sustainable development goals (SDGs)—specifically to the SDGs concerning gender equality and empowerment of all women and girls, and to make cities and human settlements inclusive. To achieve the above goals for the sustainable growth of societies, it is crucial to study gender-based violence (GBV) in a smart city context, which is a common component of violence across socio-economic groups globally. This paper analyzes the nature of news articles reported in English newspapers of Pakistan, India, and the UK—accumulating 12,693 gender-based violence-related news articles. For the qualitative textual analysis, we employ Latent Dirichlet allocation for topic modeling and propose a Doc2Vec based word-embeddings model to classify gender-based violence-related content, called GBV2Vec. Further, by leveraging GBV2Vec, we also build an online tool that analyzes the sensitivity of Gender-based violence-related content from the textual data. We run a case study on GBV concerning COVID-19 by feeding the data collected through Google News API. Finally, we show different news reporting trends and the nature of the gender-based violence committed during the testing times of COVID-19. The approach and the toolkit that this paper proposes will be of great value to decision-makers and human rights activists, given the prompt and coordinated performance against gender-based violence in smart city context—and can contribute to the achievement of SDGs for sustainable growth of human societies.

Using Twitter-based data for sexual violence research: a scoping review (Preprint)

Article

Full-text available

Feb 2023
J MED INTERNET RES

Background: Scholars have used data from in-person interviews, administrative systems, and surveys for sexual violence research. Using Twitter as a data source for examining the nature of sexual violence is a relatively new and underexplored area of study. Objective: We aimed to perform a scoping review of the current literature on using Twitter data for researching sexual violence, elaborate on the validity of the methods, and discuss the implications and limitations of existing studies. Methods: We performed a literature search in the following 6 databases: APA PsycInfo (Ovid), Scopus, PubMed, International Bibliography of Social Sciences (ProQuest), Criminal Justice Abstracts (EBSCO), and Communications Abstracts (EBSCO), in April 2022. The initial search identified 3759 articles that were imported into Covidence. Seven independent reviewers screened these articles following 2 steps: (1) title and abstract screening, and (2) full-text screening. The inclusion criteria were as follows: (1) empirical research, (2) focus on sexual violence, (3) analysis of Twitter data (ie, tweets or Twitter metadata), and (4) text in English. Finally, we selected 121 articles that met the inclusion criteria and coded these articles. Results: We coded and presented the 121 articles using Twitter-based data for sexual violence research. About 70% (89/121, 73.6%) of the articles were published in peer-reviewed journals after 2018. The reviewed articles collectively analyzed about 79.6 million tweets. The primary approaches to using Twitter as a data source were content text analysis (112/121, 92.5%) and sentiment analysis (31/121, 25.6%). Hashtags (103/121, 85.1%) were the most prominent metadata feature, followed by tweet time and date, retweets, replies, URLs, and geotags. More than a third of the articles (51/121, 42.1%) used the application programming interface to collect Twitter data. 
Data analyses included qualitative thematic analysis, machine learning (eg, sentiment analysis, supervised machine learning, unsupervised machine learning, and social network analysis), and quantitative analysis. Only 10.7% (13/121) of the studies discussed ethical considerations. Conclusions: We described the current state of using Twitter data for sexual violence research, developed a new taxonomy describing Twitter as a data source, and evaluated the methodologies. Research recommendations include the following: development of methods for data collection and analysis, in-depth discussions about ethical norms, exploration of specific aspects of sexual violence on Twitter, examination of tweets in multiple languages, and decontextualization of Twitter data. This review demonstrates the potential of using Twitter data in sexual violence research.

Harnessing the Potential of Google Searches for Understanding Dynamics of Intimate Partner Violence Before and After the COVID-19 Outbreak

Article

Full-text available

Aug 2022
EUR J POPUL

Most social phenomena are inherently complex and hard to measure, often due to under-reporting, stigma, social desirability bias, and rapidly changing external circumstances. This is for instance the case of Intimate Partner Violence (IPV), a highly-prevalent social phenomenon which has drastically risen in the wake of the COVID-19 pandemic. This paper explores whether big data-an increasingly common tool to track, nowcast, and forecast social phenomena in close-to-real time-might help track and understand IPV dynamics. We leverage online data from Google Trends to explore whether online searches might help reach "hard-to-reach" populations such as victims of IPV using Italy as a case-study. We ask the following questions: Can digital traces help predict instances of IPV-both potential threat and actual violent cases-in Italy? Is their predictive power weaker or stronger in the aftermath of crises such as COVID-19? Our results suggest that online searches using selected keywords measuring different facets of IPV are a powerful tool to track potential threats of IPV before and during global-level crises such as the current COVID-19 pandemic, with stronger predictive power post outbreaks. Conversely, online searches help predict actual violence only in post-outbreak scenarios. Our findings, validated by a Facebook survey, also highlight the important role that socioeconomic status (SES) plays in shaping online search behavior, thus shedding new light on the role played by third-level digital divides in determining the predictive power of digital traces. More specifically, they suggest that forecasting might be more reliable among high-SES population strata. Supplementary information: The online version contains supplementary material available at 10.1007/s10680-022-09619-2.

He said “who’s gonna take care of your children when you are at ACL?”: Reported Sexist Acts are Not Sexist

Conference Paper

Jan 2020

Twitris: A System for Collective Social Intelligence

Chapter

Jun 2018

TOWARDS DESIGNING PERSUASIVE TECHNOLOGIES TO CONTEST GENDER- BASED VIOLENCE (GBV) IN NAMIBIA

Thesis

Apr 2018

Gender-based violence (GBV) is defined as a harmful social act that is perpetrated against a person based on their socially ascribed gender. It occurs throughout the world across different cultures and despite the attention and multiple interventions, it continues unabated. Namibia is no exception and GBV has taken on epidemic proportions in recent years. As previous traditional interventions have not led to the desired decrease of violence, we postulate that socio-cultural issues need to be taken into consideration if a societal change is desired. Efforts to advocate for public education, cultural learning, and campaigns on social issues in Namibia favour passive communication mediums at their disposal. Yet platforms such as public art and interactive installations are becoming essential tools to enliven, inspire, motivate and provoke people into the discussion of topics relating to their esteemed cultures and society. Technology and the Internet, on the other hand, are quickly introducing newer platforms providing opportunities for learning and advocating against GBV. Nonetheless, technology interventions that advocate against GBV in Namibia mainly focus on social media and do not leverage other technological user experiences beyond the social media sphere. The focal objective of the study is to design a persuasive technology that addresses the underlying causes of GBV and evokes changes in values and attitudes towards GBV in Namibia while taking into consideration the complexity of the issue (GBV) and challenges related to designing persuasive technologies. The study employed a research through design methodology. An interactive technology hut installation, inspired by local cultural elements, was designed as a data collection tool and exhibited at various public events and places. The data analysis revealed the values and attitudes of participants towards GBV. 
An evaluation of the interactive installation leads to insights of design tactics around public art, cultural probes, role-playing simulation and embedding discomforting experience in delivering the design implications of a persuasive technology to contest GBV in Namibia through a practice of design.

The mediatization of femicide: a corpus-based study on the representation of gendered violence in italian media

Article

Full-text available

Jan 2021

This methodological study deploys hybrid techniques to investigate how femicide is framed in media.Results are consistent with ISTAT data and with the literature, and also offer novel insights. We find a tendency of not holding offenders accountable; that most femicides are perpetrated by men that victims know well; and that mediatic discourse around such crimes increases in certain circumstances and moments of the year. The analysis of the docu-fiction Amore Criminale reveals that metaphors are frequently used to sketch the participants’ socio-psychological portraits. Iconic speech and gestures are frequently employed by interviewees to report and mime episodes of violence./// Questo studio propone un metodo ibrido per indagare la rappresentazione linguistica del femminicidio nei media italiani. I risultati sono coerenti con i dati ISTAT e con la letteratura, e offrono nuovi spunti di riflessione. Si riscontra: una tendenza a deresponsabilizzare i colpevoli; che la maggior parte dei delitti sono compiuti da uomini vicini a esse; e che su tali delitti i media si concentrano in specifiche circostanze e momenti dell’anno. L’analisi sulla docu-fiction Amore Criminale rivela che per delineare ritratti sociopsicologici di vittime e carnefici si impiegano metafore, mentre per descrivere/mimare episodi di violenza si impiegano strategie iconiche. Keywords: corpus linguistics, multimodal analysis, Structural Topic Model, television language, journalistic language

Social Media for Mental Health: Data, Methods, and Findings

Chapter

Aug 2020

There is an increasing number of virtual communities and forums available on the web. With social media, people can freely communicate and share their thoughts, ask personal questions, and seek peer-support, especially those with conditions that are highly stigmatized, without revealing personal identity. We study the state-of-the-art research methodologies and findings on mental health challenges like depression, anxiety, suicidal thoughts, from the pervasive use of social media data. We also discuss how these novel thinking and approaches can help to raise awareness of mental health issues in an unprecedented way. Specifically, this chapter describes linguistic, visual, and emotional indicators expressed in user disclosures. The main goal of this chapter is to show how this new source of data can be tapped to improve medical practice, provide timely support, and influence government or policymakers. In the context of social media for mental health issues, this chapter categorizes social media data used, introduces different deployed machine learning, feature engineering, natural language processing, and surveys methods and outlines directions for future research.

From Slut Shaming to Cultural Commentary: What Live Tweeting Practices of Viewers of ABC's The Bachelorette Reveal about Gender Policing an Digital Activism on Twitter

Chapter

Full-text available

May 2017

Melissa Ames

This article analyzes live tweets posted by viewers of ABC's The Bachelorette during a network-promoted scandal concerning the star's sexual activity on the reality TV program. This study notes how problematic gender norms were reinforced within the conversation unfolding on Twitter and how a subset of tweets served to critique the sexism found within the program, the Twitter feed, and in society more generally. As these tweets attempting to combat gender norms can be considered a form of digital activism, this study also analyzes the ways in which Twitter's particular communication format might complicate and/or interfere with their desired societal critiques. On June 22, 2015, the Twitterverse erupted when the star of ABC's The Bachelorette had sex with one of her male suitors prior to the show's pre-approved, pre-scripted timeline. Far from being a PG-rated reality TV franchise, the long-running show is well known for broadcasting a slew of make out sessions and an entire episode devoted to speculating on whether the bachelor or bachelorette will sleep with any or all of his or her final three contestants in the fantasy suite. Yet when an episode aired revealing that Kaitlyn Bristowe, the season's bachelorette, and repeat contestant, Nick Viall, had slept together at the close of their one-on-one date, Kaitlyn faced a wave of criticism from viewers through social media. Over 80,000 tweets with the hashtag #TheBachelorette appeared in the 24 hours surrounding this episode, and a vast majority of them were negative posts consisting of judgmental quips and derogatory slurs focusing on Kaitlyn's sexual activity. These tweeters, the majority of whom were female, were quick to affix all the normal labels used to discuss so-called female promiscuity. Among the tamer tweets were chastising posts, such as "Kaitlyn needs to learn how to keep it classy & not so trashy" (@otrat_rowyso). Amid the caustic remarks were also hundreds of tweets defending the star. 
For example, comedian Amy Schumer (@amyschumer) posted: "Oh no someone slept with a guy they're dating and considering marrying! Showing love for @kaitlynbristowe." Tweets that challenged slut shaming began to enter the feed, as did posts that specifically criticized ABC's producers for the ways in which the show participated in and encouraged such shaming. While some important conversations resulted from this sensationalized reality television episode (Gray, 2015; Uffalussy, 2015; Yahr, 2015), the initial social media response it provoked reveals how expectations for single women on the dating market today are entrenched in problematic sexual double standards that have remained unaltered for decades. Consider, for example, this tweet posted during the episode: "you can turn a housewife into a hoe. But you can't turn a hoe into a housewife" (@HeatherGossman). As the negative twitter posts prove, many still believe that certain behaviors determine whether a woman is good girlfriend or wife material, and at the top of the list remains her sexual history. This study notes the pervasiveness of these problematic gender norms within the collected tweets and analyzes a subset of posts that serve to critique these norms and provide broader cultural commentary. It could be argued that these latter tweets combatting gender norms are a form of digital activism. As such, this study analyzes the ways in which Twitter's particular communication format might complicate or interfere with their societal critiques.

Bachelorette Twitter Study - Computers & Writing Proceedings

Chapter

Full-text available

May 2017

Melissa Ames

The Tweets They Are a-Changin’: Evolution of Twitter Users and Behavior

Article

Full-text available

May 2014

The microblogging site Twitter is now one of the most popular Web destinations. Due to the relative ease of data access, there has been significant research based on Twitter data, ranging from measuring the spread of ideas through society to predicting the behavior of real-world phenomena such as the stock market. Unfortunately, relatively little work has studied the changes in the Twitter ecosystem itself; most research that uses Twitter data is typically based on a small time-window of data, generally ranging from a few weeks to a few months. Twitter is known to have evolved significantly since its founding, and it remains unclear whether prior results still hold, and whether the (often implicit) assumptions of proposed systems are still valid. In this paper, we take a first step towards answering these question by focusing on the evolution of Twitter's users and their behavior. Using a set of over 37 billion tweets spanning over seven years, we quantify how the users, their behavior, and the site as a whole have evolved. We observe and quantify a number of trends including the spread of Twitter across the globe, the rise of spam and malicious behavior, the rapid adoption of tweeting conventions, and the shift from desktop to mobile usage. Our results can be used to interpret and calibrate previous Twitter work, as well as to make future projections of the site as a whole.

Life in the network: The coming age of computational social science

Article

Full-text available

Jan 2009

Battery and Development: Exploring the link between Intimate Partner Violence and Modernization

Article

Full-text available

Aug 2013
CROSS-CULT RES

This article focuses on the changes in attitudes about sexuality, gender equality, and intimate partner violence within the context of modernization. Revised modernization theory predicts that increasing development leads to greater levels of egalitarian gender values and liberal sexual mores as part of a larger change in society. Our analysis leads to the conclusion that although both these sets of attitudes are a part of the movement towards postmaterialist values, in the context of intimate partner violence, different dynamics prevail at different levels of development. Using regression analysis and data from the fifth wave of the World Values Survey, we find a significant relationship between attitudes towards intimate partner violence, egalitarian gender values and liberal sexual mores. In general, liberal attitudes towards sexuality do not necessarily mean a lower tolerance for intimate partner violence. Crucially, the relationship between these three sets of values depends on the level of development. We find that in agrarian and industrial societies, higher levels of liberal sexual mores with lower levels of egalitarian gender values lead to a higher level of support for intimate partner violence against women.

Identifying Seekers and Suppliers in Social Media Communities to Support Crisis Coordination

Article

Full-text available

Dec 2014

Effective crisis management has long relied on both the formal and informal response communities. Social media platforms such as Twitter increase the participation of the informal response community in crisis response. Yet, challenges remain in realizing the formal and informal response communities as a cooperative work system. We demonstrate a supportive technology that recognizes the existing capabilities of the informal response community to identify needs (seeker behavior) and provide resources (supplier behavior), using their own terminology. To facilitate awareness and the articulation of work in the formal response community, we present a technology that can bridge the differences in terminology and understanding of the task between the formal and informal response communities. This technology includes our previous work using domain-independent features of conversation to identify indications of coordination within the informal response community. In addition, it includes a domain-dependent analysis of message content (drawing from the ontology of the formal response community and patterns of language usage concerning the transfer of property) to annotate social media messages. The resulting repository of annotated messages is accessible through our social media analysis tool, Twitris. It allows recipients in the formal response community to sort on resource needs and availability along various dimensions including geography and time. Thus, computation indexes the original social media content and enables complex querying to identify contents, players, and locations. Evaluation of the computed annotations for seeker-supplier behavior with human judgment shows fair to moderate agreement. 
In addition to the potential benefits to the formal emergency response community regarding awareness of the observations and activities of the informal response community, the analysis serves as a point of reference for evaluating more computationally intensive efforts and characterizing the patterns of language behavior during a crisis.

Indonesian Men's Perceptions of Violence Against Women

Article

Full-text available

Jul 2014

This article explores male perceptions and attitudes toward violence against women in Indonesia. It analyzes interview data from Indonesian men collected as part of a large multimethod Australian government-funded project on masculinities and violence in two Asian countries. Reluctance to talk about violence against women was evident, and the accounts of those men who did respond referred to three justificatory discourses: denial, blaming the victim, and exonerating the male perpetrator. The findings support continuation of government and nongovernmental organization (NGO) projects aimed at both empowering women and reeducating men.

Research design: qualitative, quantitative, and mixed methods approaches

Book

Jan 2014

John W Creswell

Increasing Rejection of Intimate Partner Violence: Evidence of Global Cultural Diffusion

Article

Mar 2013

Rachael S. Pierotti

This study extends existing world society research on ideational diffusion by going beyond examinations of national policy change to investigate the spread of ideas among nonelite individuals. Specifically, I test whether recent trends in women's attitudes about intimate partner violence are converging toward global cultural scripts. Results suggest that global norms regarding violence against women are reaching citizens worldwide, including in some of the least privileged parts of the globe. During the first decade of the 2000s, women in 23 of the 26 countries studied became more likely to reject intimate partner violence. Structural socioeconomic or demographic changes, such as urbanization, rising educational attainment, increasing media access, and cohort replacement, fail to explain the majority of the observed trend. Rather, women of all ages and social locations became less likely to accept justifications for intimate partner violence. The near uniformity of the trend and speed of the change in attitudes about intimate partner violence suggest that global cultural diffusion has played an important role.

Estimating county health statistics with Twitter

Article

Apr 2014

Aron Culotta

Understanding the relationships among environment, behavior, and health is a core concern of public health researchers. While a number of recent studies have investigated the use of social media to track infectious diseases such as influenza, little work has been done to determine if other health concerns can be inferred. In this paper, we present a large-scale study of 27 health-related statistics, including obesity, health insurance coverage, access to healthy foods, and teen birth rates. We perform a linguistic analysis of the Twitter activity in the top 100 most populous counties in the U.S., and find a significant correlation with 6 of the 27 health statistics. When compared to traditional models based on demographic variables alone, we find that augmenting models with Twitter-derived information improves predictive accuracy for 20 of 27 statistics, suggesting that this new methodology can complement existing approaches.

Attitudes Toward Violence Against Women: A Cross-Nation Study

Article

Oct 2003

An understanding of attitudes toward violence against women is vital for effective prevention strategies. In this study we examined attitudes regarding violence against women in samples of undergraduate women and men students from four countries: India, Japan, Kuwait, and the United States. Attitudes toward sexual assault and spousal physical violence differed between men and women and across the four countries. Variations in gender differences across countries indicated that, for attitudes regarding sexual assault of women in particular, sociocultural factors may be a stronger influence than gender. Findings suggest the importance of examining differences within the larger sociocultural context of political, historical, religious, and economic influences on attitudes toward gender roles and violence against women.

Don't Think of an Elephant!: Know Your Values and Frame the Debate--The Essential Guide for Progressives

Book

Jan 2004
