GENDER-BASED VIOLENCE ANALYSIS ON TWITTER. KNO.E.SIS TECHNICAL REPORT 2015
Gender-Based Violence in 140 Characters or Fewer:
A #BigData Case Study of Twitter
Hemant Purohit1,2, Tanvi Banerjee1,2, Andrew Hampton1,3, Valerie L. Shalin1,3,
Nayanesh Bhandutia4 & Amit P. Sheth1,2
1Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Dayton, OH, USA
2Department of Computer Science and Engineering, Wright State University, Dayton, OH, USA
3Department of Psychology, Wright State University, Dayton, OH, USA
4United Nations Population Fund Headquarters, NYC, NY, USA
Corresponding Authors:
Hemant Purohit, Andrew Hampton, Valerie Shalin: {hemant, andrew, valerie}@knoesis.org
ABSTRACT
Humanitarian and public institutions are increasingly relying on data from social media sites to measure
public attitude, and provide timely public engagement. Such engagement supports the exploration of
public views on important social issues such as gender-based violence (GBV). In this study, we examine
Big (Social) Data consisting of nearly fourteen million tweets collected from the Twitter platform over a
period of ten months to analyze public opinion regarding GBV, highlighting the nature of tweeting
practices by geographical location and gender. The exploitation of Big Data requires the techniques of
Computational Social Science to mine insight from the corpus while accounting for the influence of both
transient events and sociocultural factors. We reveal public awareness regarding GBV tolerance and
suggest opportunities for intervention and the measurement of intervention effectiveness assisting both
governmental and non-governmental organizations in policy development.
Keywords: computational social science, gender-based violence, social media, citizen sensing, public
awareness, public attitude, policy, intervention campaign
KEY FINDINGS & IMPLICATIONS (see Table 1 for examples)
Social Media Content:
1. Substantial GBV related content exists in social media.
2. Spikes in GBV content reflect the influence of transient events, particularly involving celebrities.
3. Gender, language, technology penetration, and education influence participation with implications
for the interpretation of quantitative measures.
4. GBV content includes humor and metaphor (e.g. in Sports) that reflect both attitude and behavior.
5. Content highlights the role of government, law enforcement and business in the tolerance of GBV.
Relevance to GBV Policy:
6. Social Media provides an alternative for measuring GBV attitude and behavior that is cheaper,
faster, and broader than conventional survey-based methods.
7. Regional socio-cultural context influences both the measurement and the interpretation of data.
8. Computational methods for context-sensitive monitoring, modeling and interpreting GBV social
media content are highly feasible.
9. The location and network of social media participants support targeted regional anti-GBV
campaigns.
10. Policy makers require tools to make social media content accessible in near-real time to monitor the
effectiveness of anti-GBV campaigns.
[1.] INTRODUCTION
Gender-based violence (GBV), primarily against women, is a pervasive, global
phenomenon affecting both developed and developing countries. Over 35% of the world’s
female population has experienced gender-based violence at some point in their lives (World
Health Organization, 2013). According to the United Nations Population Fund (UNFPA)1,
“GBV is a serious public health concern that also impedes the crucial role of women and girls in
development.” However, anti-GBV sentiment is not universal, as is apparent in sexist chants at
professional sports events2 and the public response to convicted offenders3. While a United Nations
campaign acknowledges GBV as a societal problem (United Nations, 2009), prevalence is very
difficult to assess. The European Union’s council report on GBV highlighted a persistent lack of
comparable data across regions and over time (European Union, 2010), hampering both
assessment and mitigation. Both the UNFPA4 and the European Union Agency for Fundamental
Rights5 seek better data sourcing and policy design (European Union Agency for Fundamental
Rights, 2012).
Mining large-scale online data from mobile technology and social media promises to
complement traditional methods and provide greater insight with finer detail. Computational
Social Science (Lazer et al., 2009) has leveraged such data to inform programs in a variety of
domains, for example disaster response coordination (Purohit et al., 2014; Purohit et al., 2013;
Vieweg et al., 2010), health (Culotta, 2014; De Choudhury et al., 2013), and drug abuse
(Cameron et al., 2013). Here we examine the potential of Computational Social Science to
address the problem of monitoring and mitigating GBV on a global scale (see Table 1),
confronting the typical Big Data challenges of large-scale volume, velocity of content generation,
sparsity of data behaviors, variety in language complexity and heterogeneity in participant
demographics. Below, we provide several analyses including the volume, location, source
gender, and a variety of content analyses to help monitor GBV, and both inform the design and
assess the effectiveness of anti-GBV campaigns.
Traditional GBV monitoring methods face many challenges. Gathering statistics on GBV
episodes is time consuming, collected under non-standardized protocols, and published in highly
aggregated form. For example, the non-partner sexual violence prevalence data published by the
World Health Organization6 is dated 2010 and reported by 21 aggregated geographic regions
such as West Africa and South Asia. While the United Nations Office on Drugs and Crime data7
are less aggregated, and somewhat more recent with some data available as recently as 2012 and
separated by country, the data are sparse and incomplete. For instance, sexual violence data are
available for the Philippines and Nigeria for 2012, but the most recent data for India are from
1 UNFPA agency’s description about GBV: http://www.unfpa.org/gender/violence.htm
2 http://www.bbc.com/news/blogs-trending-31628729
3 http://www.telegraph.co.uk/news/worldnews/asia/india/11443462/Delhi-bus-rapist-blames-his-victim-in-prison-interview.html
4 UNFPA agency: http://www.unfpa.org/public/
5 EU FRA agency: http://fra.europa.eu/en/project/2012/fra-survey-gender-based-violence-against-women
6 Data available here: http://apps.who.int/gho/data/view.main.NPSVGBDREGION
7 Data available here: http://www.unodc.org/unodc/en/data-and-analysis/statistics/crime.html
2010 and no data are presented for South Africa. The lag itself prevents the monitoring of change,
whether unexpected increases in GBV, changes in attitude (possibly due to recent events
and awareness campaigns), or decreases resulting from mitigation efforts. Furthermore, these
statistics reflect legal definitions, making direct comparisons among countries impossible due to
differences in the definition and recording of offenses.
Table 1. Tweet samples from our analysis (see Section 3) with implications to inform and design targeted
GBV intervention campaigns. M1-M3 illustrate existing automatic analysis capability, while M4-M8
result from partially automated analyses presenting the case for expanding computational methods.
Message: M1: RT @USER1: 1 in 3 women are raped/abused in their lifetime. RT if you rise to stop the violence. #1billionrising http://t.co/lXEEmQoLbO
Implications: Volumetric analysis of social media can help measure population engagement and the effective penetration of designed campaigns in the community.

Message: M2: #StopRape Rape Crisis says many survivors of sexual abuse and assault still don't feel confident in the criminal justice system. CS
Implications: Location analysis to identify message origin (e.g., South Africa in M2) and route to appropriate agencies.

Message: M3: @USER2 Takes a MOMENT 2 Sign & ask Others 2,so DV & Rape Laws become Equal. TOGETHER we can Change History: http://t.co/ylWtl5PgCI
Implications: Gender detection analysis of message authors (e.g., female in M3) to adjust apparent GBV attitudes for gender and suggest the content of anti-GBV policy and campaigns.

Message: M4: RT @USER3: Rape prevention nail polish sounds like a great idea but I'm not sure how you're going to get men to wear it
Implications: Content analysis sensitive to sociocultural considerations, such as the role of humor overall, and for acknowledging despair (e.g., in M4 from the Philippines).

Message: M5: RT @USER4: 15 yard penalty for "unnecessary rape" http://t.co/yhzxtYzGP0
Implications: Metaphor analysis indicates the acceptance of GBV, echoed for example in sports, and suggests opportunities for specific anti-GBV campaigns.

Message: M6: Valentine's day is really helping me sell these date rape drugs
Implications: Entity recognition based on knowledge bases of GBV entities can help identify precipitating events (e.g., Valentine's Day in M6) a priori, to design preventive campaigns.

Message: M7: @USER5 WB Govt must have ordered Police to protect the family of Rape-Victim. It is shameful for Mamata Banerjee O GOD GIVE WISDOM TO ALL
Implications: Organization detection, including government, law enforcement and commercial entities in relation to GBV, can inform policy, e.g., M7 points to a potential lack of police protection for a victim’s family in West Bengal (WB).

Message: M8: RT @USER6: It is not my job to coddle and "educate" young Black men when it comes to violence against women. Y'all wanna "teach"?
Implications: Modeling of stereotypical association can inform the design of targeted campaigns, e.g., the author of M8 stereotypically associates GBV with Black men.
Note. We anonymized user mentions as per the IRB guidelines.
Apart from the basic problems in the logistics of gathering GBV data, the data
themselves are limited. Reliance on formal reports risks under-reporting by victims and
witnesses, who may believe that domestic violence is a private matter (Nilan et al., 2014). In
fact, such evaluative attitudes (Allport, 1935), left implicit in conventional data collection
methods, may provide a better metric of GBV risk and tolerance. Aggregation/generalization
across localities with different socio-economic properties masks important trends (Yoshihama,
Blazevski, & Bybee, 2014) because sociocultural context including politics, history, religion, and
economy strongly influence attitudes (Nayak et al., 2003). Researchers suggest the need for
different prediction and mitigation models for different sociocultural contexts. For example, an
attitude of liberal sexual conduct combines with egalitarian values and economic development to
predict tolerance for GBV (Martinez & Khalil, 2012). Similarly, Corroon et al. (2014) found that
encouraging the Nigerian population to accept responsibility for their own health affects health
outcomes only in specific cities and urban regions.
Social science survey methods complement the data gathering efforts of governmental
and non-governmental agencies. However, survey data also have several limitations. Surveys
reflect sampling biases and various confounds (Pierotti, 2013). For example, the rates of sexual
violence from a survey of patients being treated for sexually transmitted disease in South Africa
(Kalichman et al., 2005) bear an uncertain relationship to the general population. Survey candidates
may refuse to participate, provide socially acceptable but inaccurate responses, or respond to
superficial yes/no questions using local mores and standards with high violence thresholds (Nilan
et al., 2014). For example, perpetrators may not consider beating to be an act of violence if the
victim failed to comply with behavioral norms. We highlight five additional limitations of survey
data of particular relevance to Computational Social Science methods. First, survey items tend to
address attitudes but not behavior, and therefore bear an unclear relationship to the rate of GBV
episodes (Bhanot & Senn, 2007). In contrast, inasmuch as verbal abuse constitutes a form of
violence, social media posts can provide actual instances of behavior. Second, the methods fail to
account for transient global events, such as political or celebrity activity that can influence views
and responses, hindering comparison over time. Third, the items themselves presume an
established theory and standard measures of GBV, limiting the opportunity to discover latent
patterns that reflect attitude and behavior. For example, metaphor is a powerful reflection of
public opinion (Lakoff, 2008), but to our knowledge has not been explored in survey measures.
Fourth, survey methods are highly labor-intensive, necessitating
small samples while imposing cost and lag in data availability in a dynamic world. For example,
Pierotti’s survey examined change over intervals averaging five years and ranging from three to
seven years (Pierotti, 2013). Finally, survey methods are by design static, discouraging
questionnaire modification to incorporate newly detected trends once the survey data collection
begins.
Social media provide a faster, cheaper and face-valid means to engage the public,
providing unprecedented large-scale access to public views and behavior (see Table 1). They
also make it possible to monitor attitudes in near real-time, supporting timely mitigation efforts.
Although social media offer advantages in speed (velocity), in some cases participation, breadth of
sourcing, and cost, studies cannot be tightly controlled with specific statistical sampling, the
availability of demographic data may be limited, and language use can skew coverage; we discuss
these limitations later in Section 4. Ultimately all three resources (formal reports,
surveys, and social media) require integration in order to assist policy design, prioritize attention
for interventions, and design region-specific programs to curb GBV. A logical first step is to
understand what social media offer for GBV monitoring and the design of mitigation and policy.
Study Design Overview
Based on the suggestions of our UNFPA collaborators to identify a GBV related corpus,
we selected three major themes that encompass gender violence concerns: physical violence,
sexual violence, and harmful socio-cultural practices. Corresponding to these three themes, we
created a seed set of keywords for crawling data from the Twitter social media microblogging
platform. We also selected four countries with suspected elevations in GBV suggested by
UNFPA experts: India, Nigeria, the Philippines, and South Africa, in addition to the U.S. given
its Internet penetration. In this paper we assess the role of social media (data from Twitter) to
identify public views related to GBV in these chosen parts of the world, and tweeting practices by
geography, time, gender, and events to inform concerned parties and assist GBV policy design.
Figure 1. Summary of the education gap between genders, the penetration of Internet, and overall literacy
rates in the diverse set of chosen countries.
The five countries present different contexts both for understanding social media data
pertaining to GBV and for mitigation efforts. Figure 1 summarizes their variability across some
key contextual dimensions: the education gap between genders, the penetration of Internet, and
overall literacy rates. The figure illustrates the clustering of Nigeria and India for lower literacy
rates, a greater education gap and lower Internet penetration. South Africa and the Philippines
cluster with the U.S. regarding overall literacy and the reduced education gap, but reflect a
diverging range of Internet penetration. The graphs suggest the risk of sampling bias that affects
data interpretation: an illiterate female citizen with no access to the Internet (likely in rural areas
with unique GBV issues) may not be providing social media data, biasing the aggregated
measures of attitude. Other differences may also be relevant. For example, India is among the top
20 of over 140 countries regarding female political empowerment while Nigeria is below
average on this dimension. Additional influences on the use of social media not depicted in the
figure include cultural influences on free speech. A sizable percentage of Nigerians, for example,
may avoid public conversation about the Boko Haram atrocities8 due to fear of revenge.
In the next section, we employ quantitative and qualitative analyses to examine Twitter
content related to GBV. Twitter supports the distribution of short messages called tweets that are
a maximum of 140 characters in length. The character limit influences message style and constrains
communication practices. Therefore, tweets often contain URL links to web pages or blogs,
sometimes relying on shortened URL versions from external services (e.g.,
http://bit.ly/1C4HnMN). A hashtag convention (e.g., #RapeJoke, #ChildMarriage) supports the
identification of searchable user-defined topics. Other Twitter engagement features include
retweet (or ‘RT’, a forwarding of someone’s tweet). The device used to post a tweet may provide
accessible, precise location indicators in some cases. Alternatively, accessible user profiles
provide more general indicators of location and sometimes gender indicators such as author
name.
A corpus of Big (Social) Data collected over a period of ten months included nearly
fourteen million tweets. In this corpus, we examine volume, location, trends over time, gender
participation, and content such as metaphor and humor. Our analyses present both challenge and
opportunity to study the phenomenon of gender-based violence. Challenges concern the need for
computational methods to discern public perception and attitude from complex contextualized
behavior. Opportunity lies in gaining fine-grained, region-specific insights concerning the
prevailing GBV attitudes, and related policies along with potential approaches to mitigation.
[2.] METHOD
We first describe our data collection for the study, followed by a description of the
analysis approach.
Data Collection
Based on domain expert guidance we collected data from the Twitter Streaming API
(Twitter Developers, 2014), using its ‘filter/track’ method for the given set of keywords. We
leveraged the keyword-set crawlers (data collectors) of our Twitris platform for mining
collective social intelligence as described in (Nagarajan et al., 2009; Sheth et al., 2014). Twitris
also provides real-time and scalable analyses of social data streams for specific topics of interest
by end users (discussed in future work). UNFPA experts assisted in the definitions corresponding
to the three themes of interest for GBV study and associated English terminology. Using the
8 For an overview, see the following: http://en.wikipedia.org/wiki/Boko_Haram
themes of physical violence, sexual violence, and harmful practices, we created a set of seed
keywords and key multi-word phrases for data crawling (see Table 2).
Table 2. Seed set of key phrases for GBV related data crawling.
Physical violence
woman dragged, women dragged, girl dragged, female dragged, woman kicked,
women kicked, girl kicked, female kicked, woman beat up, women beat up, girl
beat up, female beat up, woman beaten, women beaten, girl beaten, female beaten,
woman burn, women burn, girl burn, female burn, woman acid attack, women acid
attack, girl acid attack, female acid attack, woman violence, women violence, girl
violence, female violence
Sexual violence
sexual assault, sexual violence, rape, woman harass, women harass, girl harass,
female harass, woman attacked, women attacked, girl attacked, female attacked,
boyfriend assault, boy-friend assault, stalking woman, stalking women, stalking
girl, stalking female, groping woman, groping women, groping girl, groping female
Harmful Practices
child marriage, children marriage, underage marriage, forced marriage, sex
trafficking, woman trafficking, women trafficking, girl trafficking, female
trafficking, child trafficking, children trafficking
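The seed list in Table 2 follows a regular pattern: most phrases pair one of four subject terms with an action term. As an illustration only, the Python sketch below regenerates the physical-violence row from that pattern and includes a simplified matcher in which a phrase matches when all of its terms occur in a message. The names `expand` and `matches` are hypothetical, and the matcher is an approximation for exposition, not the matching logic of the Twitter service or of Twitris.

```python
import re
from itertools import product

# Subject terms shared by most rows of Table 2.
SUBJECTS = ["woman", "women", "girl", "female"]

# Action terms copied from the physical-violence row of Table 2.
PHYSICAL_ACTIONS = ["dragged", "kicked", "beat up", "beaten",
                    "burn", "acid attack", "violence"]


def expand(subjects, actions):
    """Return every 'subject action' phrase, grouped by action as in Table 2."""
    return [f"{s} {a}" for a, s in product(actions, subjects)]


def matches(phrase, text):
    """Simplified all-terms match (an assumption of this sketch):
    every term of the phrase occurs in the text, ignoring case,
    word order, and a leading '#'."""
    tokens = {t.lstrip("#") for t in re.findall(r"#?\w+", text.lower())}
    return all(term in tokens for term in phrase.lower().split())


physical_phrases = expand(SUBJECTS, PHYSICAL_ACTIONS)
print(len(physical_phrases))   # 28
print(physical_phrases[:4])    # ['woman dragged', 'women dragged', 'girl dragged', 'female dragged']
print(matches("girl beaten", "A GIRL was beaten in #Delhi"))  # True
```

Generating the list from its two components keeps the subject and action vocabularies easy to extend, for example when adding terms in other languages.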
For each single keyword K, the Twitter service provided messages containing any form
of the keyword (#K, k, or K). For each multi-word phrase K, it provided the messages containing all
the terms of K. Each message is associated with metadata, containing various tweet related and
author profile characteristics, such as tweet origin location (latitude and longitude), time of
posting, author profile description and location, number of author followers (users that
subscribed to the author’s updates), and followees (users to whose updates the author
subscribed). Location metadata was crucial to geographical analysis. We first checked whether tweet
origin latitude and longitude were available from the device used to send the tweet; otherwise we resolved
the author profile location, if available, using the Google Maps API. We used a latitude-longitude
bounding box for each country of interest to identify a country-specific tweet dataset. Using
the Genderize API (http://genderize.io), we collected the genders of the tweet authors. We first
fetched the real names of the Twitter authors using the metadata of their Twitter handles. We then
extracted first names to detect author genders via calls to the Genderize API with first names as
parameters. We do note limitations with the gender detection due to informal social media
language use, and therefore discuss scope for improvement in our future work section.
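The location step described above (device coordinates first, profile location as a fallback, then a bounding-box test) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the bounding boxes are rough placeholder values, the tweet field names `coordinates` and `profile_location` are hypothetical, and `geocode` is a caller-supplied function standing in for a geocoding service such as the one mentioned in the text. The U.S. is omitted because a single box represents it poorly.

```python
# Rough, illustrative bounding boxes: (min_lat, max_lat, min_lon, max_lon).
# Assumed values for this sketch only; not the boxes used in the study.
COUNTRY_BOXES = {
    "India": (6.0, 36.0, 68.0, 98.0),
    "Nigeria": (4.0, 14.0, 2.5, 15.0),
    "Philippines": (4.5, 21.5, 116.0, 127.0),
    "South Africa": (-35.0, -22.0, 16.0, 33.0),
}


def country_for(lat, lon):
    """Return the first country whose box contains the point, else None."""
    for name, (min_lat, max_lat, min_lon, max_lon) in COUNTRY_BOXES.items():
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
            return name
    return None


def resolve_country(tweet, geocode):
    """Prefer device coordinates; otherwise geocode the profile location.

    `tweet` is a dict with hypothetical keys 'coordinates' (a (lat, lon)
    tuple or None) and 'profile_location' (free text or missing).
    `geocode` maps free text to (lat, lon) or None.
    """
    coords = tweet.get("coordinates")
    if coords is None and tweet.get("profile_location"):
        coords = geocode(tweet["profile_location"])
    if coords is None:
        return None
    return country_for(*coords)


# Usage with a stub geocoder in place of a real service.
stub_geocoder = {"Lagos": (6.5, 3.4)}.get
print(resolve_country({"coordinates": (28.6, 77.2)}, stub_geocoder))       # India
print(resolve_country({"profile_location": "Lagos"}, stub_geocoder))       # Nigeria
```

Bounding boxes are cheap to evaluate but over-inclusive, since a rectangle around one country also covers parts of its neighbors; a polygon test would be more precise at higher cost.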
Analysis Approach
With UNFPA guidance, we analyzed the data corpus of 13.8 million tweets from the ten
months in non-uniform time slices, starting with a smaller pilot phase and adding two
extended data collection periods once the pilot-phase results supported further collection.
SLICE 1: Jan 1 2014 - Feb 15 2014: 1.5 months; Phase-1 (pilot phase)
SLICE 2: Feb 15 2014 - June 30 2014: 4.5 months; Phase-2
SLICE 3: July 1 2014 - Oct 31 2014: 4 months; Phase-3
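For readers who want to reproduce the slicing, the sketch below assigns a posting date to one of the three slices. It assumes half-open intervals, so that the shared 15 February boundary falls in Slice 2, and it takes Slice 2 to run through the end of June; both are assumptions of this sketch rather than details stated in the report.

```python
from datetime import date

# Half-open intervals [start, end). The treatment of the shared
# Feb 15 boundary is an assumption of this sketch.
SLICES = [
    ("SLICE 1", date(2014, 1, 1), date(2014, 2, 15)),
    ("SLICE 2", date(2014, 2, 15), date(2014, 7, 1)),
    ("SLICE 3", date(2014, 7, 1), date(2014, 11, 1)),
]


def slice_for(day):
    """Return the slice label for a posting date, or None if out of range."""
    for label, start, end in SLICES:
        if start <= day < end:
            return label
    return None


print(slice_for(date(2014, 2, 15)))   # SLICE 2
print(slice_for(date(2014, 10, 31)))  # SLICE 3
```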
To study the diverse set of data from the three phases, we employed mixed methods
(Creswell, 2013) to reveal both patterns across the corpus as well as the content of specific
contributions. The focus of quantitative analysis is to provide data-driven insights of activity
patterns in the social media community, by examining large-scale distributions of GBV content
by geography, time, and gender. The recovery of meaning of a pattern requires more fine-grained
analysis than mere statistical distribution. Therefore, the focus of our qualitative analysis is to
reveal attitudes and behaviors across different countries and between genders. In both cases, our
interpretation relies on context, regarding current events and socio-cultural considerations,
ultimately supporting the need for context-sensitive computation for monitoring GBV content in
social media.
[3.] RESULTS
Quantitative Analysis
We discuss four types of quantitative analyses in this section: volume, theme, sharing, and
gender-based.
Volume analysis. To begin our study, we sampled an initial 1.5-month slice of data
from all over the world, which contained 2.3 million tweets related to GBV. For brevity, we skip
the detailed descriptive statistics for this sample and instead summarize
our observations. Regarding our interest in location-specific GBV data, we note relatively few
tweets with device-based location information (1.91%). However, author profiles provide
location-related information as well: over half of the data (54.74%), covering 1.3
million users, had profile location information. These results motivated our more extensive data
collection effort, spanning an additional 8.5 months.
Table 3 provides the composition of the full data set by country. More than 10% of the
social media traffic belongs to the five countries examined here. We note roughly four to seven times
the traffic in the U.S. and India relative to the other countries. Moreover, the observed frequency
ranking differs from the population demographic9 information for these countries (in descending
order: India, the U.S., Nigeria, the Philippines, South Africa). We suspect that Internet
penetration is influencing data collection. Worse, a frequency graph for all of the raw data over
time would mask variability in the Philippines, Nigeria, and South Africa, which have smaller
populations and variable Internet penetration. (We plan to address urban versus rural
comparison in our future work.) Figure 2 therefore scales the raw data by two factors: population
and Internet penetration. This allows us to consider the relative prevalence of GBV topics
between countries and over time. Below we discuss some of the emergent patterns by country.
Although we highlight the need to interpret these patterns with respect to a broad and complex
knowledge base that includes current events, these are precisely the sorts of analyses that are best
accomplished with computational tools.
9 http://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population
Table 3. Volume of GBV related tweets by country.
Country of Origin    Number of Tweets
United States        698077
India                549033
Philippines          105550
Nigeria              134671
South Africa         104430
Note. This table represents 1,591,761 (11%) of the total 13,942,592 GBV tweets collected globally.
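The totals in the note to Table 3 can be checked directly from the per-country counts. The short Python sketch below recomputes the five-country total and its share of the global corpus, and also reports the smallest and largest ratio between the two high-volume countries (the U.S. and India) and the other three. All figures are taken from Table 3 and its note; only the variable names are introduced here.

```python
# Per-country counts from Table 3 and the global total from its note.
TWEETS = {
    "United States": 698077,
    "India": 549033,
    "Philippines": 105550,
    "Nigeria": 134671,
    "South Africa": 104430,
}
GLOBAL_TOTAL = 13_942_592

five_country_total = sum(TWEETS.values())
share = five_country_total / GLOBAL_TOTAL

# Ratio of each high-volume country to each of the remaining three.
high = ("United States", "India")
ratios = [TWEETS[h] / n for h in high for c, n in TWEETS.items() if c not in high]

print(five_country_total)                            # 1591761
print(f"{share:.1%}")                                # 11.4%
print(round(min(ratios), 1), round(max(ratios), 1))  # 4.1 6.7
```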
Figure 2. Tweet volume across countries from 1 January to 31 October 2014 related to GBV, and
controlling for the population and Internet penetration of each country. Vertical lines represent time
slices sampled. [Population expressed as a multiple of the smallest candidate (South Africa). South Africa = 1,
India ~ 25. The adjusted metric then is expressed as original tweet numbers divided by adjusted
population divided by Internet penetration.]
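The adjustment described in the caption of Figure 2 amounts to two successive divisions: raw tweet count, divided by population relative to the smallest country, divided by Internet penetration. A minimal Python sketch follows. The function name `adjusted_volume` is hypothetical, and the two example countries use made-up round numbers chosen only to show the arithmetic; they are not the inputs behind Figure 2.

```python
def adjusted_volume(tweets, population, internet_penetration, smallest_population):
    """Scale a raw tweet count by relative population and Internet penetration.

    `internet_penetration` is a fraction in (0, 1]; `smallest_population`
    is the population of the least populous country in the comparison,
    which therefore has a relative population of 1.
    """
    relative_population = population / smallest_population
    return tweets / relative_population / internet_penetration


# Made-up inputs for two fictional countries (not data from the study).
country_a = adjusted_volume(tweets=1_000, population=50e6,
                            internet_penetration=0.5, smallest_population=50e6)
country_b = adjusted_volume(tweets=10_000, population=1_000e6,
                            internet_penetration=0.2, smallest_population=50e6)
print(country_a, country_b)
```

In this toy case country B posts ten times as many tweets as country A, yet the adjusted values are of similar size, which is the effect the scaling is meant to expose.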
The trend in Fig. 2 shows that when controlling for population and internet penetration,
South Africa generates the most GBV-related tweets overall, with the U.S. and India lagging
behind by roughly the same amount, and Nigeria and the Philippines toward the bottom.
Several current events may explain the apparent peaks over time, revealed during a
parallel Google Trends search by country. In the Philippines, the ongoing saga of Vhong Navarro
dominated much of the discussion10. Navarro, a television personality, was assaulted in his home
in January 2014, apparently in retaliation for an attempted rape. The incident stirred concerns
over rape culture, as many lashed out at the female accuser. However, the American movie star
Jennifer Lawrence also factored highly in the most searched items in the Philippines regarding
the hacking of her private email accounts.
Between March and July a vehemently anti-LGBT figure named Myles Munroe ranked
among the most-searched topics in Nigeria11. The largest Nigerian story from the liberal
perspective (the kidnap of the Chibok schoolgirls in mid-April) factored only ninth in Nigeria.
“Kidnapping” searches initially increased about fourfold in the month and a half following
this incident, but then dropped to previous levels. Given the responsible group’s influence and
tactics, we suspect citizens may have been reticent to discuss any wrongdoing openly via social
media. We make this point to illustrate the need to consider socio-cultural influences on social
media traffic. This suggests that we will require separate models of GBV in social media, by
country.
South Africa’s Google searches reflected considerable violence12. Among the top events
driving traffic was the murder of a prominent soccer player in October during a burglary at his
home. Oscar Pistorius also influenced the search patterns. The trial for the murder of his
girlfriend began on March 3 and temporarily adjourned on May 20. The trial resumed on June 30
through August 8, with a judgment on September 12 and sentencing on October 21. At least two
of the peaks in GBV related Twitter traffic are coincident with these events.
These observations suggest that any quantitative analysis of social media traffic must
control for current events, in order to separate fundamental trends from local variability. Such
control is well within the capability of computational analysis using topic-modeling approaches.
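Topic modeling is beyond the scope of a short example, but a simpler first step toward separating event-driven peaks from the underlying trend is to flag days whose volume far exceeds a recent baseline, and then inspect the content of those days. The Python sketch below does this with a trailing median. It is a stand-in offered for illustration, not the approach proposed in this report; the function name, the window length, and the threshold factor are all assumptions.

```python
from statistics import median


def flag_spikes(daily_counts, window=7, factor=3.0):
    """Return indices of days whose count exceeds `factor` times the
    median of the preceding `window` days.

    The first `window` days are never flagged because they have no
    complete baseline.
    """
    spikes = []
    for i in range(window, len(daily_counts)):
        baseline = median(daily_counts[i - window:i])
        if baseline > 0 and daily_counts[i] > factor * baseline:
            spikes.append(i)
    return spikes


# Toy series: a steady 10 tweets per day with one burst on day 7.
counts = [10] * 7 + [50] + [10] * 3
print(flag_spikes(counts))  # [7]
```

A median baseline is used rather than a mean so that a single burst does not inflate the baseline for the days that follow it.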
We note apparent correlations between certain countries: India and the U.S., as well as
the Philippines and Nigeria at least for the bulk of the data. We observed a 48.1% overlap in the
popular topics of the U.S. within the set for India for the month of August. We determined
overlap from the top 500 key-phrases (sets of words in a message text) extracted using
a modified tf-idf-based key-phrase extraction algorithm in Twitris (Nagarajan et al., 2009). Such
content overlaps suggest the presence of mediating influences. To provide a source of
hypotheses, we turn to demographic properties of the countries investigated in this study. We
note a large Indian diaspora living in the U.S., potentially reacting to events originating in India.
The story of a 15-year-old Indian female acid attack victim received considerable attention
following an Al Jazeera report in January (Dixit, 2014). The shared August peak coincides with
an Independence Day speech by the Prime Minister of India (Narendra Modi) in which he urged
suspension of the practice of questioning the families of girls regarding their social habits but
never boys. Manual inspection of the U.S. corpus supports an explanation of the correlation
between the U.S. and India trends based on the Indian diaspora. Tweets with apparent origins in
10 http://www.google.com/trends/topcharts#geo=PH&date=2014
11 http://www.google.com/trends/topcharts#geo=NG&date=2014
12 http://www.google.com/trends/topcharts#geo=ZA&date=2014
the U.S. may actually constitute amplification of tweets with origins in India, as the following
example illustrates.
India: What world calls shining #India is the worst place for women in terms of #Rape
http://t.co/F2oyKps60R #BanBollywood #MediaMafia
US: What world calls shining #India is the worst place for women in terms of #Rape
http://t.co/zZoB17p1L1 #BanBollywood #PakMediaHijacked
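The two messages above differ only in their shortened link and one hashtag. One simple way to surface such pairs automatically is to strip links and compare the remaining word sets. The Python sketch below uses Jaccard similarity for this; the function names and the 0.8 threshold are assumptions made for illustration, and the report does not state how such pairs were identified beyond manual inspection.

```python
import re


def tokens(text):
    """Lower-case word set of a tweet with links removed.
    Hashtags and mentions are kept as plain words."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return set(re.findall(r"\w+", text))


def jaccard(a, b):
    """Jaccard similarity of the word sets of two tweets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


india = ("What world calls shining #India is the worst place for women in "
         "terms of #Rape http://t.co/F2oyKps60R #BanBollywood #MediaMafia")
us = ("What world calls shining #India is the worst place for women in "
      "terms of #Rape http://t.co/zZoB17p1L1 #BanBollywood #PakMediaHijacked")

similarity = jaccard(india, us)
print(similarity > 0.8)  # True: 16 shared words out of 18 distinct words
```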
The U.S. and Indian patterns illustrate an important challenge in the interpretation of social
media data—the trends do not necessarily reflect local events. Computational tools for
monitoring GBV will need to conduct location analysis from the text to distinguish commentary
about other countries. Model building must accommodate this distinction. However, given the
tweet pedigree, it is not clear to whom we should attribute the attitude.
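The India and U.S. example above differs only in its shortened URL and one hashtag. As a minimal sketch of how such amplification might be flagged (this is not the procedure used in this study), the Python below strips URLs, lowercases the text, and compares token sets with Jaccard similarity. The function names and the 0.8 threshold are illustrative assumptions.

```python
import re

URL_RE = re.compile(r"https?://\S+")

def normalize(tweet):
    """Lowercase, drop URLs, and return the set of whitespace-separated tokens."""
    text = URL_RE.sub(" ", tweet.lower())
    return set(text.split())

def jaccard(a, b):
    """Jaccard similarity of the two tweets' token sets (0.0 to 1.0)."""
    ta, tb = normalize(a), normalize(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

# The two example tweets quoted in the text.
india = ("What world calls shining #India is the worst place for women in terms of #Rape "
         "http://t.co/F2oyKps60R #BanBollywood #MediaMafia")
us = ("What world calls shining #India is the worst place for women in terms of #Rape "
      "http://t.co/zZoB17p1L1 #BanBollywood #PakMediaHijacked")

similarity = jaccard(india, us)
# Threshold chosen for illustration only; it is not taken from the study.
is_amplification = similarity >= 0.8
```

For this pair, 16 of the 18 distinct tokens are shared, giving a similarity of about 0.89. Such a check identifies probable amplification but does not settle the attribution question raised above.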
Table 4. Percentages of sexual violence messages discovered by time slice and by country.
Country         SLICE-1   SLICE-2   SLICE-3   ALL
United States   65.10%    64.78%    63.90%    64.44%
India           66.05%    65.41%    65.30%    65.45%
Philippines     77.68%    69.08%    69.09%    72.00%
Nigeria         55.80%    47.13%    49.08%    49.59%
South Africa    71.15%    66.96%    63.69%    66.24%
Total           65.67%    62.52%    61.11%    62.43%
Theme analysis. Our original data collection categorized the tweets into three thematic
groups: physical violence, sexual violence, and harmful practices. Sexual violence dominates the
sample of GBV tweets (72% overall in the global corpus), so we provide its distribution over time
in Table 4. We note a general decrease over time with some variability between countries. We
suspect that the early spikes relate to incidents such as the Vhong Navarro rape accusation in the
Philippines and the initial publicity surrounding the Boko Haram atrocities in Nigeria.
Sharing behavior analysis. Social media provides the opportunity to distribute
information, potentially reflecting both the senders’ judgment of information importance, and
reliance on the voice of others. Sharing functions to amplify these voices, often those of influential
celebrities. We analyze two types of sharing behavior in the social media community
surrounding GBV events: direct resharing of content as a retweet (RT), and indirect sharing via
URL references to external resources such as news, blogs, articles, and multimedia.
Table 5. Proportional RT and URL tweets by country.
Country         RT        URL
United States   45.47%    42.25%
India           46.68%    42.63%
Philippines     40.74%    18.79%
Nigeria         26.28%    62.13%
South Africa    44.25%    31.49%
We calculated the percentage of GBV retweets relative to the total count of tweets for
each data sample as shown in Table 5. More than 40% of GBV tweets are retweets in the U.S.,
India, the Philippines, and South Africa, amplifying information that senders consider to be
important. For comparison, Liu, Kliman-Silver, and Mislove (2014) found that retweets
generally constitute just over 25% of the total volume of tweets. Although we note variability in
retweet behavior between countries, the low retweeting frequency in Nigeria is particularly
remarkable (see Table 5). One might hypothesize that a low-literacy country such as Nigeria, in
which senders are less able to compose messages, would have the highest retweet ratio. The
adjacent analysis of URLs suggests a different socio-cultural phenomenon at work, concerning
the identifiability of the responsible party. Nigeria has the highest percentage of GBV tweets
containing URLs among the five countries (62.13%). Numerous
explanations should be tested, including literacy, credibility of the public press, and the
possibility that reliance on external resources somehow reduces the threat of being identified as
the responsible party.
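The proportions in Table 5 amount to two counts over each country sample. The sketch below shows one way such counts might be computed. Treating a leading "RT @user" as the retweet marker and any http(s) link as a URL reference are assumptions of this sketch, and the toy tweets and example.org links are hypothetical.

```python
import re

# Assumed conventions: a retweet starts with "RT @user"; a URL is any http(s) link.
RT_RE = re.compile(r"^\s*RT\s+@\w+", re.IGNORECASE)
URL_RE = re.compile(r"https?://\S+")

def sharing_shares(tweets):
    """Return (retweet %, URL %) relative to the total number of tweets."""
    total = len(tweets)
    if total == 0:
        return 0.0, 0.0
    rt = sum(1 for t in tweets if RT_RE.search(t))
    url = sum(1 for t in tweets if URL_RE.search(t))
    return 100.0 * rt / total, 100.0 * url / total

# Hypothetical toy sample, not drawn from the corpus.
sample = [
    "RT @user_a: We must speak out against GBV",
    "New report on violence against women http://example.org/report",
    "RT @user_b: Read this http://example.org/story",
    "I am worried about our approach to this issue",
]
rt_pct, url_pct = sharing_shares(sample)
```

In the toy sample two of four tweets are retweets and two contain a URL, so both shares are 50%. A tweet can count toward both columns, which is why the RT and URL percentages in Table 5 need not sum to 100%.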
Author gender analysis. We obtained gender identification for about 38% of the overall users
(see Table 6). The reduced percentage is due to names missing in the Genderize API lexicon, as
well as unconstrained natural language features of social media content, such as use of special
characters in names, for instance, ‘@@shish’ instead of ‘Aashish’, which is a male Indian name.
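To illustrate the name problem just described, the following sketch normalizes a display name before a lexicon lookup. It is an assumption-laden illustration rather than the procedure used here: the character map and the tiny in-memory lexicon are hypothetical stand-ins, and no call to the Genderize API is shown.

```python
import re

# Hypothetical substitutions for characters sometimes used in place of letters.
CHAR_MAP = str.maketrans({"@": "a", "0": "o", "1": "i", "3": "e", "$": "s"})

# Toy lexicon standing in for a real name-to-gender resource (hypothetical entries).
TOY_LEXICON = {"aashish": "male", "maria": "female"}

def normalize_first_name(display_name):
    """Lowercase the first token of a display name and undo simple character substitutions."""
    tokens = display_name.strip().split()
    if not tokens:
        return ""
    first = tokens[0].lower().translate(CHAR_MAP)
    # Keep ASCII letters only; names in other scripts would need separate handling.
    return re.sub(r"[^a-z]", "", first)

def guess_gender(display_name, lexicon=TOY_LEXICON):
    """Return 'male', 'female', or None when the normalized name is not in the lexicon."""
    return lexicon.get(normalize_first_name(display_name))
```

With this map, `guess_gender("@@shish")` resolves to the lexicon entry for "aashish", while a name absent from the lexicon returns `None` and stays in the undetermined group.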
Table 6. Gender-wise distribution of data for the overall global corpus.
Statistic            Users       % of Total Users   Generated Tweets   % of Total Tweets
Total                3,036,576   100%               13,942,592         100%
- Gender Filtered*   1,148,329   37.82%             2,771,686          19.88%
-- Female author     563,016     18.54%             1,324,292          9.50%
-- Male author       585,313     19.28%             1,447,394          10.38%
*Filtered = where an author gender could be determined
Keeping in mind that even small differences become statistically significant in a sample this
large, the distribution of gender appears approximately equal in Table 6. The average tweet
frequency is also similar: 2.352 tweets per female author and 2.473 tweets per male author.
However, Figure 3 separates the gender distribution for the examined countries, and
gives an impression of gender inequality.
Figure 3. Relative distribution of tweets whose author’s gender we could determine.
The name classification procedure that we employed suggests a gender bias in the
samples across countries. Literacy serves as a partial explanation for the observed ratios, except
for the U.S. Whatever the explanation, opinions collected in the U.S., India, and Nigeria reflect
a male bias, while opinions collected from the Philippines and South Africa are more balanced or
even reflect a slight female bias. This imbalance has corresponding implications for the assessment of GBV
attitudes, the general reach of anti-GBV campaigns using social media and the ability to target
potential perpetrators and activists for engagement separately.
Qualitative Content Analysis
Thus far we have described the corpus with respect to dimensions that we anticipated at
data collection: country, theme of GBV event, sharing methods, and gender. We have made
suggestions about the surrounding context, including transient events that might explain the
observed patterns. In the remaining analyses, we look more closely at message content for
indications of GBV attitudes and to clarify the requirements for future computational analysis
capabilities.
Language indicators. Using Linguistic Inquiry and Word Count (LIWC) software13
(Tausczik & Pennebaker, 2010), we analyzed the language of all tweets generated by
both genders. We used the predefined LIWC dictionaries that tally word frequencies in
categories such as anger, sexuality, sadness, health, etc. Content corresponding to these
13 Information available here: http://www.liwc.net/
particular categories, and in fact content across the majority of the LIWC categories appeared
more frequently in tweets of male origin relative to tweets of female origin. However, we did
note some trends in tweet content from females. Consistent with the research on gender issues in
communication (Tannen, 1996), female authors here are more collective and socially oriented.
Their tweets call for action and are more likely to express agreement.
Female (South Africa): @USER7 Absolutely. If we follow each other I can DM you my
email address. I applaud your speaking out on the rape epidemic in SA.
Female (Nigeria): I am worried abt our approach to d fight against rape. Permit me to
vent here. @USER8 @USER9 @USER10 @USER11 #CurbingRape
Female authors are also more likely to provide opinion on causality, as in the following example
tweet from India:
Female (India): The major factor behind #Rape in #India is the #Bollywood which incites
feeling 2 cross the moral limits #BanBollywood #GreaterPakistan
Gender-specific analysis can be leveraged to design and promote anti-GBV campaigns.
For example, tweets of female origin in India, although not guaranteed to be benevolent, could
be amplified to extend their reach.
Following a manual review of a random sample of the corpus, we used computational
analyses to describe the prevalence of attitude indicators across the whole corpus. Two of the
attitude indicators examined here are the presence of humor indicators and GBV metaphors in
sports. We also provide specific examples of tweet content from manual analysis of random
samples.
Humor indicators. Humor might indicate a trivialization of an issue or an expression of
underlying helplessness (Romero, 2010; Ancheta, 2007). We assessed the prevalence of humor
references with related permutations of “haha” and “hehe”. The number of humor-flagged tweets
by country appears in Table 7. The Philippines sample provides a greater proportion of humor
indicators by far. This is consistent with Filipino culture14 in general. Moreover, the observed
female bias in the Filipino sample reinforces the interpretation of humor as an expression of
helplessness. Of course, laughter annotations do not provide a complete account of humorous
references but rather a correlated indicator for jokes such as the following from the Filipino
corpus:
Female (Philippines): RT @USER12: Rape prevention nail polish sounds like a great idea but
I'm not sure how you're going to get men to wear it
Due to cultural differences in the role of humor, we cannot advocate comparison between
countries. However, we do suggest that the prevalence of GBV humor may provide a useful
metric for changes in attitude over time within a region.
14 A brief overview of traditional Filipino values can be found here:
http://en.wikipilipinas.org/index.php?title=Philippine_Core_Values
Table 7. Analysis of humor-flagged tweets indicated by permutations of humor indicators.
Country        Flagged tweets   Total tweets   Percentage
Philippines    10418            105550         9.870%
South Africa   1801             104430         1.725%
USA            8067             698077         1.156%
Nigeria        1419             134671         1.054%
India          5195             549033         0.946%
Note. Lexicon set defined as {haha, ha ha, hehe, he he, he he he, hahaha, lol, lmao}.
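The flagging behind Table 7 can be sketched as a whole-word match against the lexicon listed in the note. The matching rules below (case-insensitive, word boundaries, longest entries first) and the toy tweets are assumptions of this sketch. The source does not specify the exact matching procedure.

```python
import re

# Lexicon from the note to Table 7, ordered longest first so longer forms win.
HUMOR_LEXICON = ["he he he", "hahaha", "ha ha", "he he", "haha", "hehe", "lmao", "lol"]
HUMOR_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in HUMOR_LEXICON) + r")\b",
    re.IGNORECASE,
)

def is_humor_flagged(tweet):
    """True when the tweet contains any lexicon entry as a whole word or phrase."""
    return HUMOR_RE.search(tweet) is not None

def flagged_percentage(tweets):
    """Percentage of tweets flagged by the humor lexicon."""
    if not tweets:
        return 0.0
    return 100.0 * sum(is_humor_flagged(t) for t in tweets) / len(tweets)

# Hypothetical toy tweets, not drawn from the corpus.
toy = [
    "hahaha that joke is not funny",
    "This is a serious report",
    "LOL really?",
    "He heard the news",
]
toy_pct = flagged_percentage(toy)
```

Two of the four toy tweets are flagged, and "He heard" is not mistaken for "he he" because of the word boundary. Applied to the Philippines counts in Table 7, the same ratio gives 100 × 10418 / 105550 ≈ 9.870%.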
GBV and sports metaphors. Sports involve competition, and violent metaphor is a
common device in sport discussion (Lewandowski, 2010, 2011, 2012). GBV metaphors may
appear in a sports corpus as an indication of dominance, such as the following example from the
South African tweet corpus:
South Africa: The German Team is on a Steroid-induced anal-rape rage
against Brazil right now.
Comparing the prevalence of GBV metaphors in sports across countries is difficult
because a given sport does not capture national attention equally across countries. We obtain
some insight regarding the GBV metaphor in tweets related to sports from exchanges concerning
the FIFA World Cup contest, held from 12 June through 13 July 2014. Soccer is the most
popular sport in Nigeria and South Africa, one of the top three favorite sports in India, and a top
ten favorite in the Philippines and the U.S.15. While all of the countries we examined have
eligible teams, only the U.S. (ranked 13) and Nigeria (ranked 44) participated in the final
tournament16.
We should not expect any FIFA related content in a corpus designed to capture GBV
issues. Yet, Table 8 illustrates the existence of tweets flagged for containing references to:
football, futbol, soccer, worldcup, world cup, fifa, fifacup, soccercup, as well as pairs of team
names participating in the tournament. Although the percentages are small, every country
examined provided such instances, in a ranking consistent with soccer popularity, with the
exception of the U.S. The finding requires a nuanced interpretation. On the one hand, rape as a
metaphor suggests a trivialization of the primary definition. On the other hand, at least one
dictionary17 indicates an archaic definition meaning plunder or violation, for example regarding
the environment. Similarly, although violent metaphor is a common device in sport discussion,
it is just one of many common metaphors (Lewandowski, 2012). In fact, in
Lewandowski’s (2010, 2011) extensive analysis of violent, conflict-oriented metaphor in football
journalism, none of the 551 violent metaphors invoked rape.
15 http://mostpopularsports.net/football-soccer-popularity
16 http://en.wikipedia.org/wiki/2014_FIFA_World_Cup
17 http://dictionary.reference.com/browse/rape
A difference in the popularity of sport between countries is only one challenge to direct
comparison of GBV metaphor between countries. The development of a knowledge base to
support analysis poses further challenge, as different regions follow different teams with
different players. As in the interpretation of humor, we believe the best use of sport metaphor
data is to provide indicators of attitude trends over time by region, either adjusted for seasonal
variations in sporting events or averaged annually.
Table 8. Analysis of sports-related tweets indicated by sports lexicon.
Country        Flagged tweets   Total tweets   Percentage
U.S.           1053             698077         0.151%
India          657              549033         0.120%
Nigeria        320              134671         0.238%
Philippines    47               105550         0.045%
South Africa   214              104430         0.205%
Note: FIFA World Cup sample for each country in the period of June 12 to July 13 2014.
Lexicon: { football, futbol, soccer, worldcup, world cup, fifa, fifacup, soccercup, [country-names pairs
from http://en.wikipedia.org/wiki/2014_FIFA_World_Cup] }
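As a hedged sketch of the filter described in the note to Table 8, the code below combines the tournament date window with the sports lexicon and a check for a pair of team names. The short team list is a hypothetical subset for illustration only, since the full set of pairs came from the tournament page cited in the note. The function names are likewise assumptions.

```python
import re
from datetime import date

# Sports terms from the note to Table 8.
SPORT_TERMS = ["football", "futbol", "soccer", "worldcup", "world cup",
               "fifa", "fifacup", "soccercup"]
TERM_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in SPORT_TERMS) + r")\b", re.IGNORECASE
)

# Hypothetical subset of participating teams, for illustration only.
TEAMS = ["germany", "brazil", "nigeria", "usa"]
TEAM_RE = re.compile(r"\b(?:" + "|".join(TEAMS) + r")\b", re.IGNORECASE)

# Tournament window stated in the text: 12 June through 13 July 2014.
WINDOW_START, WINDOW_END = date(2014, 6, 12), date(2014, 7, 13)

def mentions_team_pair(text):
    """True when at least two distinct team names appear in the text."""
    return len({m.lower() for m in TEAM_RE.findall(text)}) >= 2

def is_sports_flagged(text, posted):
    """Flag a tweet posted inside the window that matches a term or a team pair."""
    if not (WINDOW_START <= posted <= WINDOW_END):
        return False
    return bool(TERM_RE.search(text)) or mentions_team_pair(text)
```

A tweet naming both Germany and Brazil on 8 July 2014 would be flagged, whereas the same text posted in August, or a tweet naming only one team without a sports term, would not.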
Policy-making and intervention insights via manual analysis. Our goal is to generate
attitude metrics and inform mitigation campaigns automatically. All of the above examples were
informed by computational analysis. For the following examples, we manually examined random
subsets of 200, 500, and 500 tweets drawn from the three respective slices of our data corpus. We
end our presentation of GBV data by indicating the kind of content available in the corpus
compatible with somewhat more specialized computational analysis. Developing the necessary
content-specific filtering methods (feasible using domain knowledge of politics, business, and
behavioral theories) is particularly critical for appropriately routing content to specific
agencies that make policy recommendations. Some examples follow:
a.) Behavior pertaining to government/public officials/leaders
@USER13 NCP leader doesn't know rape happens due to pervert mindset . what
abt unreported rapes happen with minors & toddlers.
b.) Commercial considerations
RT @USER14: .@USER15 Please don't allow violent hatemongers use your app
to harass and exploit marginalized women. "http://t.co/DoRkbXFuN…
Spring Breakers isn't just a terrible movie, it reinforces rape culture
http://t.co/pxTAZftM8v
RJ Police already said that it's not a Rape case!But Media neglected it 4 creating
SPICY news!#KnowTheTruth & Rise4Justice
c.) Persuasive/encouraging message content
RT @USER16: Pregnancy, periods, breast cancer, being walked on, rape,
harassment, abuse; females go through a lot. WOMEN ARE STRONG.
@USER17 @USER18 Is it acceptable to use gang rape in an example?
d.) Stereotype association
RT @USER6: It is not my job to coddle and "educate" young Black men when it
comes to violence against women. Y'all wanna "teach"?
Automated filtering of GBV content according to such dimensions is challenging but feasible
given the appropriate knowledge bases.
DISCUSSION AND FUTURE WORK
Here we revisit the limitations of conventional, survey-based methods for monitoring
GBV attitudes to discuss the progress we have made, the limitations we face, and the promise of
further advances in Computational Social Science.
Progress
We noted above the limitations of conventional methods in gathering GBV data for
specific regions. Data are either highly aggregated at the country level, or missing. We have
provided substantial amounts of GBV data for all five of our target regions. User profiles and
GPS tags provide the capability of identifying content specific to much smaller units of analysis,
e.g., type of socio-economic region or even city (Leetaru et al., 2014). This capability supports
targeted GBV campaigns, where the prevalence and types of violations may vary. Below we note
progress with respect to a number of additional concerns regarding conventional data gathering
methods.
Reducing sampling bias. We noted the presence of bias in survey methods due to
artificial sample selection. We have overcome some of the bias concern simply by the scope of
data collection that is feasible with social media. We have the opportunity to observe male and
female attitudes by specific regions, over time. However, bias does remain in our data collection
methods. Some of the persisting bias is amenable to adjustment. For example, bias arises from
differences in literacy by country and by gender within country, and from differences in Internet
penetration. Awareness of these sampling issues allows us to amplify the weight of content from
underrepresented participants, e.g., the contributions from females in lower penetration areas.
This is possible because we can assess both gender and location in the available data. Of course,
we cannot amplify content that does not exist. This limitation is fundamental, but recognizing it
identifies the regions where we should focus other methods of
data collection.
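One standard way to amplify underrepresented groups is post-stratification weighting, in which each group is weighted by its population share divided by its sample share. The sketch below is an illustration under that assumption, not a method reported in this study. The group labels, counts, and population shares are hypothetical.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight each group by (population share) / (sample share).

    Returns (weights, empty_groups). Groups with no sampled content cannot be
    reweighted and are reported separately.
    """
    total = sum(sample_counts.values())
    weights, empty_groups = {}, []
    for group, pop_share in population_shares.items():
        n = sample_counts.get(group, 0)
        if total == 0 or n == 0:
            empty_groups.append(group)  # nothing to amplify for this group
            continue
        weights[group] = pop_share / (n / total)
    return weights, empty_groups

# Hypothetical counts of tweets per group and hypothetical population shares.
sample_counts = {"urban_female": 300, "urban_male": 700, "rural_female": 0}
population_shares = {"urban_female": 0.4, "urban_male": 0.4, "rural_female": 0.2}

weights, empty_groups = poststratification_weights(sample_counts, population_shares)
```

With these toy numbers, content from the first group is weighted up (about 1.33) and content from the second is weighted down (about 0.57). The third group has no content to weight, which marks it as a candidate for other methods of data collection.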
Reducing content bias. We suggested that survey participation itself constitutes a form
of bias. Apart from the sampling issues noted above, the posted content on social media itself
reflects a form of bias. Participants are still providing what they consider to be socially
acceptable content. This is particularly apparent in the absence of commentary regarding Boko
Haram and GBV in Nigeria. However, this observation alone can inform the course of action for
policy makers, e.g., publicizing sensitive topics. Nevertheless, for the social media data that are
available, participants are not providing guarded responses to external queries about their
attitudes. Instead they express their attitudes directly, making them susceptible to analysis.
Analysis speed. We have over 1.5 million social media postings collected in 2014 from
just five countries to support the claim that GBV issues generate commentary. Moreover, we
have completely eliminated the need for labor-intensive data collection, and as a result have
overcome the cost and lag limitations in data collection. We produced analysis within a year in
contrast to survey methods, despite the absence of complete automated capability. This rapid
turnaround enables the development of dynamic metrics to assess the results of campaigns
designed to curb GBV. A rapid measurement capability can play a role in the promotion of
effective efforts and the abandonment of those that are less effective, with real consequence to
the alleviation of human suffering.
GBV attitude metrics. Survey items tend to address attitudes but not behavior. Social
media data provide both attitudes and behavior, inasmuch as jokes and metaphor are both
behavior and attitude. This provides us with potential measures of tolerance for GBV. Thus, the
editing of socially acceptable content that constitutes a form of bias in data collection is the very
same behavior that tells us what is considered acceptable. This provides a means to measure the
effectiveness of anti-GBV campaigns, both those directly targeted to the potentially offensive
jokes and metaphor, as well as those targeted to specific but apparently unrelated concerns such
as the effect of holidays or weather.
Survey methods are by design static. Standard measures purport to provide evidence that
is comparable across time and regions. However, standardization ignores the effect of context.
Social media trends over time tell us that context cannot be ignored. For example, the publicity
surrounding a celebrity’s involvement in a GBV-related event causes a spike in social media
commentary. The availability of such events (Tversky & Kahneman, 1973) may very well influence survey
responses. But standalone surveys have no way to account for this influence. Our computational
social science methods allow us to complement the interpretation of GBV commentary with
adjustments for the influence of events at a short time scale in order to discern long-term trends.
Finally, and completely outside the typical survey content, the analysis of social media
also promises to assist GBV campaigns, both by targeting the views of specific groups and by
providing content recommendations regarding law enforcement, politics, health services and
commerce.
Limitations & Challenges
All data collection methods suffer from limitations, and ours is no exception. Here we
note several concerns, some of which are amenable to computational solution and some of which
are more fundamental.
Unconstrained natural language text. Our keyword-based crawling limits the
completeness of the resulting corpus. Given the natural language of social media messages, we
cannot guarantee collection of every single relevant message. Moreover, keyword selection
matters. Countries vary in the terminology they employ for different purposes. For example, the
word “rape” in tweets that originate in India generally refers to events in India, but “sexual
assault” in the Indian corpus returns mostly American incidents. We are further constrained at
present by restriction to the English language, missing the interpretation of GBV language
embedded in Tagalog, for example (spoken widely across the Philippines). Furthermore, our
dependence on keywords glosses over the different definitions of rape across cultures.
Global event sensitivity. Event sensitivity is both a feature and a limitation. We demonstrated the
influence of events on social media content, both within a region and between regions. World events
provoke the articulation of public opinion and provide an unprecedented large-scale opportunity
to gather opinion and attitude. At the same time world events create variations in magnitude that
require adjustments to frequency counts, in order to reflect enduring trends over time.
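One simple adjustment of this kind is a rolling median over daily tweet counts, which damps short event-driven spikes while preserving the slower trend. The sketch below uses that technique as an illustrative assumption. The source does not specify an adjustment method, and the daily counts are hypothetical.

```python
from statistics import median

def rolling_median(counts, window=7):
    """Centered rolling median; the window shrinks at the edges of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(counts)):
        lo = max(0, i - half)
        hi = min(len(counts), i + half + 1)
        smoothed.append(median(counts[lo:hi]))
    return smoothed

# Hypothetical daily counts containing one event-driven spike.
daily = [100, 102, 98, 101, 900, 99, 103, 100, 97]
trend = rolling_median(daily, window=5)
```

In this toy series the spike of 900 does not appear in the smoothed output, whose values stay between 99.5 and 101. The window length sets the time scale below which events are treated as transient, so it would need to be chosen with the campaigns and events of interest in mind.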
Inter-regional comparison. We commented earlier that differences in the legal
definition of GBV hindered comparison of GBV rates between countries. While we believe our
measurements within a region can be informative about attitude change over time, the ability to
compare GBV issues between regions still poses substantial challenge. Our two proposed
measures, sports metaphors and jokes, illustrate the challenge. A given sport is not equally
important across countries (or even regions), so that more frequent violent
sport metaphor in one country relative to another may simply reflect population interest in the
sport. Allowing type of sport to vary between regions confounds sport with region, so that we
might be learning more about issues with the sport than the region. The meaning of jokes
notwithstanding (trivialization versus helplessness), we cannot compare the prevalence of GBV
jokes between countries without factoring in the prevalence of jokes between countries in
general. Thus, we have not escaped the local, sociocultural influences on measures that prevent
between-region comparisons. Our contribution to this issue is to establish that it is not a
limitation peculiar to specific measures such as police reports and surveys. All measures reflect these
sociocultural influences.
Location identification. Location references of the messages are not always consistent
with the location of the author profile or GPS coordinates of source device. U.S. events appear in
the opinions of people in other countries, and vice versa. The technical aspect of this problem
can be addressed with techniques that discern event location from the message text or ancillary URL
content. Some uncertainty remains, as people often assume shared
context when identifying locations, e.g., by using abbreviated names for familiar landmarks, which
results in referential ambiguity. However, a far greater concern lies in the attribution of attitude
when the content corresponds to a remote event.
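A minimal sketch of the text-based location check described above compares place names mentioned in a message with the country of the author profile. The tiny gazetteer, the country codes, and the function names are hypothetical. A realistic system would need a far larger knowledge base and disambiguation of abbreviated or shared place names.

```python
import re

# Hypothetical mini gazetteer mapping surface forms to country codes.
GAZETTEER = {
    "india": "IN", "delhi": "IN",
    "nigeria": "NG", "lagos": "NG",
    "philippines": "PH", "manila": "PH",
    "south africa": "ZA",
    "united states": "US", "usa": "US",
}

def mentioned_countries(text):
    """Country codes whose gazetteer entries appear as whole words in the text."""
    lowered = text.lower()
    return {
        code
        for name, code in GAZETTEER.items()
        if re.search(r"\b" + re.escape(name) + r"\b", lowered)
    }

def is_remote_commentary(text, profile_country):
    """True when the text names at least one place and none matches the profile country."""
    mentioned = mentioned_countries(text)
    return bool(mentioned) and profile_country not in mentioned
```

For a tweet containing "#India" posted from a U.S. profile, `is_remote_commentary` returns `True`, while the same tweet from an Indian profile returns `False`. A message that names no place is left unclassified rather than assumed to be local.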
Correlational logic. Our argument has a correlational component. For example, we can
identify indicators of humor such as “hehehe” and “hahaha” using computational methods.
However, automatically detecting every instance of a joke embedded in social media content is
not computationally feasible. We assume that the frequency of humor keywords in a GBV
context correlates with the frequency of GBV jokes without such keywords. Although this
assumption is worth confirming using a manual classification of jokes, we note that the presence
of laughter indicators alone in this context is potentially offensive.
Future Work
We have demonstrated the potential of social media to inform policy makers regarding
attitudes, to measure the effect of campaigns, and even to provide campaign concepts.
Computational Social Science holds promise for a range of economic development issues beyond
GBV, covering a different set of themes, countries, and languages. However, much technical,
theoretical, and practical work remains to realize the full potential of the medium for the GBV
application.
Technical advances are required in order to distinguish message origins from message
content and support more comprehensive gender detection. Substantial work remains in the
development of specific knowledge bases to guide the detection and interpretation of jokes, as
well as metaphor and commentary directed to particular entities such as government and
business. The knowledge bases must be dynamic to capture the transient events that mask
fundamental attitude. Theoretical guidance from social science is required to attribute humor and
metaphor to either despair or tolerance or the lack thereof. The interpretation of retweets across
regions raises the problem of attribution in more than a practical sense. The issue is whether and
how to weigh the endorsement: as a property of the original sender, or the endorser who is
amplifying the message. Does the endorsement reflect opinion of the endorser’s location
independent of birth origins? The resulting approaches require validation to address these issues.
Comparison of GBV threats between countries, though raising measurement issues, is not
an exclusively methodological problem. Instead, it is a sociocultural issue, concerning the
boundaries that violate globally established norms and determine the deployment of global
resources. We lack the policy expertise to weigh in on such matters. However, we do endorse the
development of adaptive regional, and even local models for GBV behaviors and attitudes.
Although a large model encompassing all countries holds a certain appeal, it is fallacious to
assume that all regions have the same underlying issues and beliefs. The number of countries
constrains the number of contextual variables that one can test statistically. Local socio-
economic conditions, literacy, Internet penetration, gender, crime rates, religious beliefs, liberty
and many other unexamined, but constant factors influence the data and their interpretation.
Data accessibility for policy-makers. Near-real-time information benefits an urgent
need to reduce GBV and the associated suffering. Conventional methods disseminate public
opinion in written reports that cover multiple years and are issued with considerable
delay following data collection. Governments, NGOs, International Organizations, Aid Workers,
etc. require far higher bandwidth access to the dynamic data and analysis in order to measure
current public opinion and the effect of anti-GBV campaigns.
The Twitris collective social intelligence platform (Sheth et al. 2014) provides a
foundation for delivering the necessary high bandwidth access. Twitris supports the presentation
of thematic data along spatial and temporal dimensions (Nagarajan et al. 2009), network
relationships among people related to the distribution of content (Purohit et al. 2012), and
sentiment (Chen et al. 2012) as well as real-time trends18 shown in Figure 4.
18 A currently monitored GBV campaign is available at:
http://twitris2.knoesis.org/app/#campaigns/Gender%20Based%20Violence/dashboard Register for full dashboard
view.
Figure 4. A snapshot of the Twitris social analytics platform showing on-going research to monitor and
analyze the GBV data by a variety of dimensions: location (1, 8), time (2), types of content theme such as
sexual violence (3), demographics based map visualization updates (4), top topics by country or region of
interest (5), country specific demographic and top topics and related tweets (6, 7), sentiment based heat
map (8), as well as fine-grained analysis of emotion expressed, who-talks-to-whom network for
influential users about GBV topics, and real-time trends dashboard (1).
Twitris analyzes a topic such as GBV, and provides real-time scalable analyses of social
data streams for greater insights and actionable information to improve intervention. For
example, Twitris can monitor the real-time public sentiment and emotional reaction to criminal
reports and the justice system response by region, permitting side-by-side comparison with other
events. Under user guidance, Twitris automatically identifies the key topics meriting further
attention. Twitris network analysis assists in the measurement of anti-GBV campaign diffusion
to measure campaign effectiveness, as well as the identification of users who will spread targeted
campaigns.
CONCLUSION
Big (Social) Data complement more controlled but slower survey-based data collection
and analysis methods, whose conclusions may become obsolete in a dynamic world that
continuously generates noisy data responding to transient events. The lag in surveys and noisy
nature of data limits the use of conventional methods for measuring the effects of changing GBV
attitudes and anti-GBV campaigns. Computational Social Science supports the collection and
analysis of a large GBV corpus. We analyzed nearly fourteen million Twitter messages collected
over a ten-month period. The large sample reduced bias relative to conventional data collection.
We examined data by region and gender, identifying content such as humor and metaphor that
has implications for both the measurement of GBV attitudes and the specific targets of anti-
GBV campaigns. Our methods constitute an inexpensive way to engage with citizens at
unprecedented scale, including the collection of public views regarding the behavior of
government and business to revolutionize the conduct and measurement of anti-GBV campaigns.
ACKNOWLEDGEMENT
We acknowledge the partial support of the U.S. National Science Foundation Social-
Computational Systems (SoCS) program for grant IIS–1111182 ‘Social Media Enhanced
Organizational Sensemaking in Emergency Response’ for our study. We are thankful to our
colleagues at the United Nations Population Fund NYC, especially Upala Devi, Judy Ilag, and
Maria Dolores Martin Villalba, and at the Kno.e.sis Center, especially Lu Chen and current research
interns Kushal Shah and Garvit Bansal from LNMIIT India, for their invaluable continued
support in discussion and review for our GBV research.
REFERENCES
Ancheta MRG. 2007. "Pig's nest" in an even bigger pen: Pugad Baboy as a case of subversion
and renegotiation in Philippine comedy. Humanities Diliman 1:2.
Aumiller S. 2011. Why is Twitter promoting violence against women? Give Hope A Voice.
WordPress.com. Available at http://givehopeavoice.wordpress.com/2011/08/01/why-is-
twitter-promoting-violence-against-women/
Bhanot S, Senn CY. 2007. Attitudes towards violence against women in men of South Asian
ancestry: Are acculturation and gender role attitudes important factors? Journal of Family
Violence 22(1):25-31.
Bowes N, McMurran M. 2013. Cognitions supportive of violence and violent behavior.
Aggression and Violent Behavior 18(6):660-665.
Cameron D, Smith GA, Daniulaityte R, Sheth AP, Dave D, Chen L, Gaurish A, Carlson R,
Watkins KZ, Falck R. 2013. PREDOSE: A semantic web platform for drug abuse
epidemiology using social media. Journal of Biomedical Informatics 46(6):985-997.
Corroon M, Speizer IS, Fotso JC, Akiode A, Saad A, Calhoun L, Irani L. 2014. The Role of
Gender Empowerment on Reproductive Health Outcomes in Urban Nigeria. Maternal and
Child Health Journal 18(1):307-315.
Creswell JW. 2013. Research design: Qualitative, quantitative, and mixed methods approaches.
Sage.
Culotta A. 2014. Estimating county health statistics with Twitter. In Proceedings of the SIGCHI
GENDER-BASED VIOLENCE ANALYSIS ON TWITTER. KNO.E.SIS TECHNICAL REPORT 2015
23
Conference on Human Factors in Computing Systems (CHI’14):1335-1344.
De Choudhury M, Gamon M, Counts S, Horvitz E. 2013. Predicting depression via social media.
In ICWSM.
Dixit A. 24 January 2014. How love got the better of India acid attack. Al Jazeera India.
European Union. 2012. Council conclusions on the eradication of violence against women in the
European Union. Available at
http://www.consilium.europa.eu/uedocs/cms_Data/docs/pressdata/en/lsa/113226.pdf
European Union Agency for Fundamental Rights. 2012. European council effort. Available at
http://fra.europa.eu/en/project/2012/fra-survey-gender-based-violence-against-women
Factbook, C.I.A. 2013. The world factbook. Central Intelligence Agency, Washington D.C.
Google Trends. 2015. Top charts. Available at
http://www.google.com/trends/topchartsdate=2014
HarassMap tool. Available at http://harassmap.org/en/
Heise L, Ellsberg M, Gottmoeller M. 2002. A global overview of gender-based violence.
International Journal of Gynecology & Obstetrics 78(1):S5-S14, ISSN
0020-7292, http://dx.doi.org/10.1016/S0020-7292(02)00038-3
ITU statistics 2014 on Mobile Tech adoption. Available at
http://www.itu.int/net/pressoffice/press_releases/2014/23.aspx#.VCSELStdU7s
Kalichman S, Simbayi LC, Kaufman M, Cain D, Cherry C, Jooste S, Mathiti V. 2005. Gender
attitudes, sexual violence and HIV/AIDS risks among men and women in Cape Town,
South Africa. Journal of Sex Research 42(4):299-305.
Lakoff G. 2008. Don't think of an elephant!: Know your values and frame the debate. Chelsea
Green Publishing.
Lazer D, Pentland AS, Adamic L, Aral S, Barabasi AL, Brewer D, Christakis N, Contractor N,
Fowler J, Gutmann M, Jebara T, King G, Macy M, Roy D, Van Alstyne M. 2009. Life in
the network: The coming age of computational social science. Science 323(5915):721.
Leetaru K, Wang S, Cao G, Padmanabhan A, Shook E. 2013. Mapping the global Twitter
heartbeat: The geography of Twitter. First Monday 18(5).
GENDER-BASED VIOLENCE ANALYSIS ON TWITTER. KNO.E.SIS TECHNICAL REPORT 2015
24
Lewandowski M. 2010-2011. The rhetoric of violence in Polish and English soccer. In I.
Koutny and P. Nowak (eds.): Language, communication, information 87-89.
Lewandowski M. 2012. Football is not only war. Non-violent conceptual metaphors in English
and Polish soccer language. In Taborek, Tworek and Zieliński (eds.): Sprache und Fußball
im Blickpunkt linguistischer Forschung, Verlag Dr. Kovač, Hamburg 79-95.
Liu Y, Kliman-Silver C, Mislove A. 2014, May. The tweets they are a-changin’: Evolution of
twitter users and behavior. In International AAAI Conference on Weblogs and Social
Media (ICWSM) 13:55.
Martinez PR, Khalil H. 2012. Battery and development: Exploring the link between intimate
partner violence and modernization. Cross-Cultural Research 47(3):231-267.
Morrison A, Ellsberg M, Bott S. 2007. Addressing gender-based violence: A critical
review of interventions. The World Bank Research Observer 22(1):25-51.
Nayak MB, Byrne CA, Martin MK, Abraham AG. 2003. Attitudes toward violence against
women: A cross-nation study. Sex Roles 49(7-8):333-342.
Nilan P, Demartoto A, Broom A, Germov J. 2014. Indonesian men's perception of violence
against women. Violence Against Women 20(7):869-888.
Pierotti RS. 2013. Increasing rejection of intimate partner violence evidence of global cultural
diffusion. American Sociological Review 78(2):240-265.
Purohit H, Castillo C, Diaz F, Sheth AP, Meier P. 2013. Emergency-relief coordination on social
media: Automatically matching resource requests and offers. First Monday 19(1).
Purohit H, Hampton A, Bhatt S, Shalin VL, Sheth AP, Flach JM. 2014. Identifying seekers and
suppliers in social media communities to support crisis coordination. Computer Supported
Cooperative Work (CSCW) 23(4-6):513-545.
Romero G. 2010. Why the Filipino laughs. Research Lines. Available at
http://www.ovcrd.upd.edu.ph/researchlines/2010/10/22/why-the-filipino-laughs/
SafeCity initiative. Available at http://safecity.in/
Sheth A, Jadhav A, Kapanipathi P, Chen L, Purohit H, Smith GA, Wang W. 2014. Twitris- A
system for collective social intelligence. Encyclopedia of Social Network Analysis and
Mining (ESNAM), Springer, Alhajj, Reda, Rokne, Jon (Eds.) ISBN: 978-1-4614-6169-2.
GENDER-BASED VIOLENCE ANALYSIS ON TWITTER. KNO.E.SIS TECHNICAL REPORT 2015
25
Available at http://www.springer.com/computer/communication+networks/book/978-1-
4614-6169-2
Tannen D. 1996. Gender and Discourse. Oxford University Press, ISBN 0-19-508975-8; ISBN
0-19-510124-3.
Tausczik YR, Pennebaker JW. 2010. The psychological meaning of words: LIWC and
computerized text analysis methods. Journal of Language and Social Psychology
29(1):24-54.
Twitter Developers. 2014. Streaming API. Available at
https://dev.twitter.com/streaming/overview Accessed Jan 11 2014.
United Nations. 2009. UN Secretariat campaign for violence against women. Available at
https://www.un.org/en/events/endviolenceday/pdf/UNiTE_TheSituation_EN.pdf
United Nations Population Fund. 2013. Addressing gender-based violence. Available at
http://www.unfpa.org/webdav/site/global/shared/documents/publications/2013/final%20se
xual%20violence%20CSW%20piece.pdf
United Nations Population Fund. Gender-based violence. Available at
http://www.unfpa.org/gender/violence.htm
United States Agency for International Development. 2008. Indicators for gender violence.
Available at http://www.cpc.unc.edu/measure/publications/ms-08-30
Vieweg S, Hughes AL, Starbird K, Palen L. 2010. Microblogging during two natural hazards
events: What Twitter may contribute to situational awareness. CHI ’10: Proceedings of
the SIGCHI Conference on Human Factors in Computing Systems 1,079–1,088.
doi:http://dx.doi.org/10.1145/1753326.1753486, accessed 26 December 2013.
World Health Organization. 2013b. Global and regional estimate of violence against women.
Available at
http://apps.who.int/iris/bitstream/10665/85239/1/9789241564625_eng.pdf
World Health Organization. (2013a). WHO Fact sheet. Available at
http://www.who.int/mediacentre/factsheets/fs239/en/
Yoshihama M, Blazevski J, Bybee D. 2014. Enculturation and attitudes toward intimate partner
violence and gender roles in an Asian Indian population: Implications for community
based prevention. American Journal of Community Psychology 53:249-260.