The Evolution of Political Memes: Detecting and Characterizing
Internet Memes with Multi-modal Deep Learning
David M. Beskow, Sumeet Kumar and Kathleen M. Carley
School of Computer Science
Carnegie Mellon University
5000 Forbes Ave, Pittsburgh, PA 15213, USA
ARTICLE INFO
Keywords:
Deep Learning
Multi-modal learning
Computer vision
Meme-detection
Meme
ABSTRACT
Combining humor with cultural relevance, Internet memes have become a ubiquitous artifact of
the digital age. As Richard Dawkins described in his book The Selfish Gene, memes behave like
cultural genes as they propagate and evolve through a complex process of ‘mutation’ and ‘inher-
itance’. On the Internet, these memes activate inherent biases in a culture or society, sometimes
replacing logical approaches to persuasive argument. Despite their fair share of success on the
Internet, their detection and evolution have remained understudied. In this research, we propose
and evaluate Meme-Hunter, a multi-modal deep learning model to classify images on the Internet
as memes vs non-memes, and compare this to uni-modal approaches. We then use image simi-
larity, meme specific optical character recognition, and face detection to find and study families
of memes shared on Twitter in the 2018 US Mid-term elections. By mapping meme mutation in
an electoral process, this study confirms Richard Dawkins’ concept of meme evolution.
1. Introduction
Richard Dawkins first coined the word meme in his now famous book The Selfish Gene Dawkins (2006). He
developed the word meme by shortening the Greek word mimeme in an effort to create a “...noun that conveys the
idea of a unit of cultural transmission, or a unit of imitation.” Dawkins indicated that memes function like genes for
culture, and can undergo variation, selection, and retention. The meme is further defined as “an idea, behavior, style or
usage that spreads from person to person within a culture” Blackmore, Dugatkin, Boyd, Richerson and Plotkin (2000).
Examples of memes include shaking hands and singing “Happy Birthday”. As such, memes become building blocks
of complex cultures Shifman (2012).
Internet memes include any digital unit that transfers culture. This can be as simple as a phrase or hashtag, such as
the diaosi meme in China Szablewicz (2014) or the #MeToo movement in America. The Internet provides an envi-
ronment for digital memes to quickly move from person to person, often mutating in the process as initially envisioned
by Dawkins. In 1982 the first emoticon (:-)) was used on Carnegie Mellon University’s online bulletin board in order
to flag humor Davison (2012). As a merger of humor, text, and a symbol, the emoticon became one of the first types
of Internet memes.
While Internet memes can exist as words, emoticons, videos, or gifs, a common form is an image with superimposed
text that conveys some type of merged message. In the earlier days of the Internet, images with superimposed text began
to propagate via Usenet, email, and message boards. By the early 2000’s researchers began to study these specific visual
artifacts that were proliferating. Social networks soon emerged, allowing these memes to go viral.
Given the power of memes to appeal to cultures and sub-cultures, political actors increasingly use them to
communicate political messaging and to shift the beliefs and actions of a society. Canning even goes so far
as to claim that memes have replaced nuanced political debate Canning, Reinsborough and Smucker (2017). Memes
become a simple and effective way to package a message for a target culture. In particular, memes are used for political ends,
to magnify echo chambers, and to attack minority groups Peirson, Abel and Tolunay (2018). This has jumped into the public
discourse with various articles, including the New York Times article “The mainstreaming of political memes online”
Bowles (2018). The increasing use of Internet memes for “information operations” has led to our effort to detect and
characterize memes that inhabit and propagate within given world events and the conversations that surround them.
Few research efforts have attempted to capture a comprehensive dataset of political memes and the network they
travel through in a political election event, and then document how the memes evolve, propagate, and impact the
ORCID (s): 0000-0003-2814-8712 (D.M. Beskow)
D.M. Beskow et al.: Preprint submitted to Elsevier Page 1 of 16
Detecting and Characterizing Internet Memes
Figure 1: Memes used in conjunction with the US 2018 Midterm Elections.
network. Our work will develop a deep learning method to detect memes in social media streams and leverage graph
learning to cluster these images into meme “families”. We will then apply these methods to Twitter data streams
associated with the 2018 US Mid-term elections and the 2018 Swedish National Elections. In addition to contributing
a theoretical framework for classifying and clustering meme images, our research indicates that memes are shared
less but move to more places on the Internet when compared to non-meme images. Memes therefore spread through
different mechanisms than other “viral” content.
This paper is organized as follows. In section Related Work, we describe the history of the Internet memes, prior
work exploring data analysis approaches to study memes, and deep neural networks that have been used on similar
problems. Then in section 3, we propose Meme-Hunter, a deep learning model to find images on the Internet and
classify them as meme vs non-memes. We then use the models to study the usage of memes in two elections in section
4. Finally, we conclude the findings of this research and suggest directions for future work.
2. Related Work
2.1. History of Internet Memes
The study of memes has existed ever since Richard Dawkins introduced the concept in his book ‘The Selfish Gene’
in the 1970’s Davis (2017). Many researchers have attempted to study the relationship between memes and culture.
The advancement of Internet technologies and the World Wide Web (WWW) gave meme researchers a laboratory in
which to study the spread and mutation of memes. This led to several books on memes, the most influential and
controversial being Blackmore’s The Meme Machine Blackmore (2000); Shifman (2013).
Linor Shifman has conducted extensive research of digital memes from the perspective of journalism and com-
munication. In 2013 Shifman deviated slightly from Dawkins's original definition, defining the Internet meme as
artifacts that “(a) share common characteristics of content, form, and/or stance; (b) are created with awareness of each
other; and (c) are circulated, imitated, and transformed via the Internet by multiple users” Shifman (2014a,b). She also
differentiates viral content from memetic content. She claims that viral content “is defined here as a clip that spreads
to the masses via digital word-of-mouth mechanisms without significant change.” In contrast, memetic content is “...a
popular clip that lures extensive creative user engagement in the form of parody, pastiche, mash-ups or other derivative
work.”
In 2012 Davison observed that Internet memes typically lack attribution Davison (2012).
Unlike many other creative works, authors of Internet memes typically don’t attach their name to the memes they
create. They remove any traces of attribution from the file and its metadata, and usually introduce memes on sites that
offer anonymity (4chan, Reddit, etc.), where they gain popularity before hopping over to mainstream media (Facebook,
Twitter, etc.) Bauckhage, Kersting and Hadiji (2013). Several theories attempt to explain this behavior, but Davison's
seems the most plausible: anonymity enables a type of freedom. This freedom allows authors to create and
distribute questionable material without concern for retribution from authorities. It is this lack of certain attribution
that encourages malicious and divisive political actors to resort to memes for information operations.
The far-reaching impact of a meme's evolution, combined with its inherent anonymity, makes memes attractive
to various political and propaganda campaigns. The evolutionary nature of memes assists them in ‘hopping’ platforms
to move to additional Internet and social media spaces. The natural anonymity of memes allows various actors to
make it appear that the distribution of the messages is part of a grass roots movement. Donovan and Friedberg discuss
how images can be used as “evidence collages” in a “source hacking” operation Donovan and Friedberg (2019),
thereby providing seemingly legitimate evidence of a false event or biased conclusion. It is to these aspects of political
and propaganda memes that we apply our research.
2.2. Meme Detection
Deep neural networks (DNN) have shown great success in many fields Hinton, Deng, Yu, Dahl, Mohamed, Jaitly,
Senior, Vanhoucke, Nguyen, Sainath et al. (2012). Researchers have used DNNs for various vision tasks like the
Imagenet Challenge Deng, Dong, Socher, Li, Li and Fei-Fei (2009); Krizhevsky, Sutskever and Hinton (2012) and
fashion recommendation Song, Feng, Han, Yang, Liu and Nie (2018). DNNs have also been used for various natural
language processing (NLP) tasks like Part of Speech (POS) tagging and named entity recognition Collobert and Weston
(2008). Ironically, deep learning has more often been used to automatically generate Internet memes than to
detect them. In 2013 Wang et al. Wang and Wen (2015) used copula methods to jointly model text and vision features
with popular votes. In 2018 Peirson et al. Peirson et al. (2018) leveraged deep learning to generate memes in a model
they titled “Dank Learning”.
Xie et al. Xie, Natsev, Kender, Hill and Smith (2011) used YouTube to find short video segments that are frequently
reposted which they call video memes. The authors then created a graph of people and content to model interactions.
Unlike video memes, exploring image memes is more challenging as this requires first classifying an image as meme
or not-meme.
The closest research related to our detection effort is the Memesequencer model developed by Dubey et al. Dubey,
Moro, Cebrian and Rahwan (2018). This research separates the meme image template (underlying image) from the
additional text and image manipulation. After separating the meme template it creates a meme embedding by con-
catenating image features and text features using deep learning, with the best model concatenating ResNet18 with
SkipThought text features. Having created an embedding, the authors construct the evolutionary tree using a phylo-
genetic tree. This research is limited to memes that have identifiable templates found on sites like Memegenerator or
Quickmeme. When used to extract memes for social cybersecurity practitioners, this template-based approach provides high
precision but low recall (see below). Our intent with Meme-Hunter is to increase recall.
2.3. Meme Evolution
The digital footprint that Internet memes leave allows researchers to study the propagation of memes through (and
across) networks. Coscia looked at meme propagation and measurements of success in 2013 Coscia (2013). Bauckhage
et al. Bauckhage et al. (2013) explored the temporal models of fads by looking at Internet memes, approximating
interest in a given meme by using Google Trends. Leskovec et al. Leskovec, Backstrom and Kleinberg (2009) used
memes and phrases extracted from news and blogs to track and study the dynamics of the news cycle. This work was
able to map the evolution of text based memes in the news cycle and blogosphere. Ferrara et al. Ferrara, JafariAsbagh,
Varol, Qazvinian, Menczer and Flammini (2013) focused on clustering text based memes.
The closest research to our study of meme evolution is the study by Zannettou et al. Zannettou, Caulfield, Black-
burn, De Cristofaro, Sirivianos, Stringhini and Suarez-Tangil (2018) that clusters image streams based on pHash and
identifies memetic clusters using meme annotation from sites such as “Know Your Meme”. They apply this process to
multiple sources (Twitter, Reddit, 4chan, Gab) and then use Hawkes process to measure which ecosystem has greater
influence. While focused on meme evolution and influence, this paper does not specifically develop a detection model
that generalizes easily beyond the Know Your Meme annotations, once again rendering low recall in detection appli-
cations. Additionally, this paper clusters only based on the image (via pHash) and does not consider the multi-modal
nature of memes when measuring similarity.
2.4. Meme Optical Character Recognition
The classification process requires learning from a composition of image and text characteristics. Extracting text in
memes requires Optical Character Recognition (OCR). OCR on memes can be challenging since most OCR algorithms
are trained to recognize black font on a white background, whereas many memes use white font on a dark background. For
social media image OCR, the state of the art is arguably the Facebook Rosetta system, a deep learning model that
conducts OCR while taking into consideration the background as well Borisyuk, Gordo and Sivakumar (2018). This is
being deployed on Facebook’s platform in order to censor images for extremist messages, allowing Facebook to comply
with increased regulation, particularly in the European Union. Facebook Rosetta output is standard OCR output (text),
and it is not intended to classify memes vs. not-memes. It is also not open sourced or available for researchers (at the
time of this writing).
Our research combines some of the efforts of Zannettou et al. Zannettou et al. (2018) with that of Dubey et al. Dubey
et al. (2018). In doing so, we go beyond both papers by creating a generalizable multi-modal meme detection model
that is not constrained by annotated entries on a site like Know Your Meme. Additionally, we develop the evolutionary
graph with a radius nearest neighbors approach and apply this specifically within the online debate around a large
election event (2018 Mid-term elections). This provides the research community with a generalizable multi-modal
meme detection model, a new way to build an evolutionary tree, a meme OCR pipeline, and insights into meme impact
and propagation within political conversations. Additionally, this model provides an approximately 8× increase in
recall over the template-based methods that Dubey and Zannettou propose. This increase in detection recall is especially
important for social cybersecurity practitioners.
3. Classifying Images as Memes
Most images shared on platforms like Twitter are not memes (see Table 3 for statistics). Therefore, to explore the usage
of memes, it is essential to first classify if an image is a meme or not. While visual Internet memes come in a wide
variety of formats, we restricted our classification to two types that are commonly found. These two types are found
in Figure 4 and can be described as:
1. A picture with superimposed white text in Impact font. Impact font was developed in the 1960s by Geoff Lee
and is the font of choice for text over image Edwards. This is illustrated in Figure 4a.
2. Text placed in a white space over a picture, as is shown in Figure 4b.
While this seems restrictive, we will show later that, even with this constraint, our approach finds 8× more memes (i.e.
8× higher recall) than template-based methods.
Given enough meme vs non-meme data, it could be possible for a neural-network model to learn to extract text
(using OCR), extract faces and other meme characteristics to classify memes. However, in a limited data setting like
ours, this approach is likely to fail, since OCR is a research domain in its own right. Consequently, we propose to first extract
text and face encodings and use them as supplementary input features. Then, to predict whether an image is a meme or a
non-meme, we explore deep learning based multi-modal (multiple features) models that use extracted features in addition
to the raw images.
Next, we describe our models, our data collection effort to get meme and non-memes data to train the models, the
process of training the models, and the classification performance.
3.1. Meme Classification Models
As mentioned earlier, we first extract text and face encodings, so here we explain the process of extracting text and
face encodings from images.
Text Extraction For Optical Character Recognition (OCR) we combined meme specific image preprocessing with an
open source OCR tool. When images contained white font over dark background, we preprocessed the images by 1)
converting the image to grayscale, 2) binarizing the image, and 3) inverting every bit in the binary image array. These
Figure 2: OCR Pipeline for Meme Images.
image preprocessing steps are illustrated in Figure 2. OCR on preprocessed images was accomplished with Google
Tesseract Smith (2007). If images already had black text on white background, no preprocessing was applied. Our
experiments indicated that preprocessing significantly improved Tesseract’s OCR on meme images. Baseline Tesseract
required an average of 49.8 ± 13.8 character edits (Levenshtein distance), with only 2% of strings readable. Preprocessing
reduced this to an average of 17.5 ± 4.8 character edits, with 72% of strings remaining readable.
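The three preprocessing steps above can be sketched in a few lines of numpy (a minimal version; the BT.601 luma weights and the fixed threshold of 128 are illustrative assumptions, and the cleaned array would then be handed to Tesseract, e.g. via pytesseract.image_to_string):

```python
import numpy as np

def preprocess_for_ocr(rgb, threshold=128):
    """Prepare a white-text-on-dark meme image for OCR.

    Mirrors the pipeline described above: (1) grayscale,
    (2) binarize, (3) invert, yielding dark text on a light
    background, which stock OCR engines handle far better.
    """
    # 1) Grayscale via ITU-R BT.601 luma weights (illustrative choice).
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    # 2) Binarize against a fixed threshold.
    binary = np.where(gray >= threshold, 255, 0).astype(np.uint8)
    # 3) Invert every bit so bright text pixels become dark.
    return 255 - binary

# A 2x2 image: one bright "text" pixel on a dark background.
img = np.zeros((2, 2, 3))
img[0, 0] = [255, 255, 255]
out = preprocess_for_ocr(img)
assert out[0, 0] == 0      # text is now dark
assert out[1, 1] == 255    # background is now light
```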
Human Face Encoding As faces are an important element of memes, we extract facial features using the open source
face detection software package called face_recognition, created by Adam Geitgey and made available at Geitgey
(2019). The library returns a face encoding vector for each face found in the image. We use these vectors as the input
to our classification models.
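For illustration, one way to turn the library's per-face encodings into a fixed-size model input is sketched below (the keep-first-face / fall-back-to-zeros choice is our assumption, not necessarily the paper's; the face_recognition call appears only in a comment since it needs the library and a real image):

```python
import numpy as np

ENCODING_DIM = 128  # face_recognition returns 128-d encodings

def face_feature(encodings):
    """Collapse a variable-length list of face encodings into one
    fixed-size classifier input.

    `encodings` would come from the library, e.g.:
        # encodings = face_recognition.face_encodings(image)
    Illustrative choice (our assumption): keep the first detected
    face, or a zero vector when no face is found.
    """
    if len(encodings) == 0:
        return np.zeros(ENCODING_DIM)
    return np.asarray(encodings[0])

assert face_feature([]).shape == (ENCODING_DIM,)
assert face_feature([np.ones(ENCODING_DIM)]).sum() == ENCODING_DIM
```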
We tried four different groups of classifiers: 1) unimodal classification using only text, 2) unimodal classification
using only machine vision, 3) multimodal classification using text and vision, and 4) multimodal classification using
text, vision, and face encoding.
LSTM based text classifier In this unimodal model, we use only the extracted text from images as the input for meme
classification. Long Short Term Memory (LSTM) Hochreiter and Schmidhuber (1997) networks are very popular for
text classification. An LSTM takes a word embedding and a hidden vector as input and outputs a new hidden vector.
At the end of the text (input), a fully-connected layer followed by a softmax layer is used to predict the label of the
text. We used Glove vectors Pennington, Socher and Manning (2014) as the input word embeddings. In our results, we
provide several other text only models for comparison, including Naïve Bayes, Support Vector Machines, and Logistic
Regression.
CNN based image classifier Given that our work focuses on image based meme detection, and Convolutional Neural
Networks (CNNs) are the most popular models for visual learning, it is natural for us to consider a CNN based model.
For this work, we tried a number of pre-trained models including VGG18 Simonyan and Zisserman (2014), ResNet18
He, Zhang, Ren and Sun (2016), and Inception-V3 Szegedy, Vanhoucke, Ioffe, Shlens and Wojna (2016). For classification,
we removed the last fully connected layer of the pre-trained network, added a new fully connected network followed by
a sigmoid layer. We also explored freezing all layers, freezing some of the layers, and not freezing any of the layers in
the training process. In the end, allowing weight updates on all layers provided the best results. We also include
results that extract descriptors with scale-invariant feature transform (SIFT) and Bag-of-Visual-Words (BOVW) feature
representation and support vector machine classification. The SIFT-BOVW model is provided to demonstrate DNN
improvement over pre-DNN models.
Joint DNN model The joint DNN model first combines the LSTM and CNN discussed above, and then combines
the LSTM/CNN output with the face encoding features in a single model. The model’s
architecture is shown in Fig. 3. As shown in the figure, the outputs of the LSTM, the CNN, and the face encodings are
concatenated into a single vector. The concatenated vector is then used as the input to a dense fully connected layer
followed by a sigmoid activation. All parts of the model are trained jointly.
Figure 3: Joint Model for meme classification.
In the last fully connected layer we use a sigmoid (or logistic) function to generate the probability of the image being a
meme. The sigmoid function is defined as

𝜙(z) = 1 / (1 + exp(−z))
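The fusion arithmetic of the joint model can be illustrated with a toy numpy forward pass (feature dimensions and weights here are our own illustrative assumptions; in the paper the whole stack is trained jointly as a neural network rather than evaluated with fixed weights):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    # Logistic function: maps any real z to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def joint_forward(cnn_feat, lstm_feat, face_feat, w, b):
    """Late-fusion head: concatenate the three modality features,
    apply one dense layer, and squash with a sigmoid to obtain
    P(image is a meme)."""
    fused = np.concatenate([cnn_feat, lstm_feat, face_feat])
    return sigmoid(w @ fused + b)

# Illustrative (not the paper's) feature sizes.
cnn = rng.normal(size=512)    # CNN image features
lstm = rng.normal(size=50)    # LSTM text features
face = rng.normal(size=128)   # face encoding
w = rng.normal(size=512 + 50 + 128) * 0.01
p = joint_forward(cnn, lstm, face, w, 0.0)
assert 0.0 < p < 1.0          # a valid probability
```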
3.2. Data
To label Internet meme images for supervised learning, we searched meme images on Reddit, Twitter, Tumblr,
Google Image Search, Flickr, and Instagram. Collecting images from these platforms, we were able to find 25,109
meme images. The meme data contained varied meme categories, including sports, politics, celebrities, and animals.
While the dominant language is English, other languages include French, Spanish, German, Russian, Japanese, Arabic,
and Chinese. The non-meme images were collected at random from Twitter and Google Image search.
In the training data we filtered out non-meme images that didn’t contain either text or a background photo. This
was done so that the algorithm would learn the unique attributes of meme images as opposed to just learning to identify
the presence or absence of text. In order to filter for text, we needed to conduct text detection but not necessarily text
recognition. We found that the Efficient and Accurate Scene Text (EAST) detection model Zhou, Yao, Wen, Wang,
Zhou, He and Liang (2017) performed better at detecting text than the Tesseract based OCR pipeline discussed earlier.
Note that the EAST model detects the location of text in an image but does not recognize or extract the text. We used the
EAST algorithm to filter out any images that didn’t have at least one text bounding box. Having removed images that
don’t contain text, we discovered that we also needed to remove images that don’t contain a photograph. This decision
was made after finding many black and white document images, particularly in political conversations. To remove
document images, we developed a heuristic that measured the mean Red Green Blue (RGB) score for the image, and
removed it if the mean score was greater than 220. This proved to be fast and easily removed document images without
removing memes of interest. This filter was applied in both the training process as well as the production algorithm.
An image is classified as a document if (Red + Green + Blue) / 3 > 220.
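This heuristic is a one-liner; a sketch with toy images:

```python
import numpy as np

def is_document(rgb, cutoff=220):
    """Heuristic from the text: flag an image as a document (and
    filter it out) when its mean RGB value exceeds 220, i.e. the
    image is overwhelmingly white."""
    return rgb.mean() > cutoff

white_page = np.full((4, 4, 3), 250)   # nearly all-white scan
photo = np.full((4, 4, 3), 120)        # mid-tone photograph
assert is_document(white_page)
assert not is_document(photo)
```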
We summarize the final model training dataset in Table 1. The 50,209 images comprised nearly equal portions of
meme and not-meme images. The data was then randomly split into training data (80%), validation data (10%), and
held out test data (10%).
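The 80/10/10 split can be sketched as follows (the helper name and seed are ours):

```python
import numpy as np

def split_indices(n, seed=0):
    """Shuffle n sample indices and split them 80/10/10 into
    train / validation / held-out test sets, as described above."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

train, val, test = split_indices(50_209)
assert len(train) == 40_167 and len(val) == 5_020
assert len(train) + len(val) + len(test) == 50_209
```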
(a) Type A Meme (b) Type B Meme
(c) Saliency in Type A Meme (d) Saliency in Type B Meme
Figure 4: Two types of memes used for meme classification with their respective saliency maps. Saliency maps are
computed by averaging pooled gradients across channels.
Table 1
Classification Dataset Statistics.
Total Images Memes Non-memes
50,209 25,109 25,100
Collecting images from social media streams often includes some amount of abusive language and adult content
images. Practitioners using our methods who want to minimize the impact of this sensitive content should have an
appropriate filter. In our case we used Yahoo’s open source “Not Safe For Work” (NSFW) filter
(https://github.com/yahoo/open_nsfw).
3.3. Experiments and Results
For the meme classification task, we define the overall objective function using cross-entropy loss, as seen in
Equation 1, where i ∈ {1, …, n} indexes samples, j ∈ {meme, non-meme} indexes classes, y is the (one-hot) true
label, and p is the probability output for each label.

ℒ(y, p) = −(1/n) ∑_{i,j} y_ij log(p_ij)    (1)
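Equation 1 in numpy (the eps guard against log(0) is our addition):

```python
import numpy as np

def cross_entropy(y, p, eps=1e-12):
    """Equation 1: mean negative log-likelihood over n samples,
    with y one-hot true labels and p predicted probabilities,
    both of shape (n, 2) for {meme, non-meme}."""
    n = y.shape[0]
    return -np.sum(y * np.log(p + eps)) / n

y = np.array([[1, 0], [0, 1]])            # meme, non-meme
p = np.array([[0.9, 0.1], [0.2, 0.8]])    # model probabilities
loss = cross_entropy(y, p)
assert np.isclose(loss, -(np.log(0.9) + np.log(0.8)) / 2)
```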
Our primary metric of interest is the F1 score, defined as the harmonic mean of precision and recall. We used
this as our primary metric since it balances the often competing priority of precision vs. recall. In our results we also
provide accuracy, precision, and recall for interpretability.
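For reference, the F1 computation from raw counts (a standard formula, shown here as a worked example):

```python
def f1_score(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall,
    computed from true positives, false positives, and
    false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# 8 true positives, 2 false positives, 4 false negatives:
# precision = 0.8, recall = 2/3, F1 = 8/11
assert abs(f1_score(8, 2, 4) - 8 / 11) < 1e-9
```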
All models are built using the Keras library1 with a TensorFlow backend2. As described earlier, the models use text,
1https://keras.io/
2https://www.tensorflow.org/
Table 2
Classification Results.
Type         Model                 Accuracy  F1     Precision  Recall
Text         Logistic Regression   0.724     0.719  0.735      0.703
Text         Naïve Bayes           0.681     0.607  0.793      0.492
Text         SVM                   0.721     0.714  0.736      0.693
Text         LSTM                  0.799     0.805  0.786      0.824
Vision       SIFT-BOVW             0.798     0.788  0.828      0.752
Vision       Baseline CNN          0.939     0.938  0.946      0.930
Vision       VGG18                 0.915     0.916  0.909      0.923
Vision       ResNet18              0.926     0.927  0.907      0.948
Vision       Inception-V3          0.958     0.958  0.952      0.964
Multi-modal  Vision + Text         0.954     0.954  0.943      0.965
Multi-modal  Vision + Text Length  0.952     0.951  0.947      0.956
Multi-modal  Vision + Text + Face  0.961     0.961  0.959      0.963
face-encoding, and image features as the input and a sigmoid layer for the class label prediction. The models are trained
using stochastic gradient descent with the cross-entropy loss function seen in Equation 1. The learning rate was treated
as a hyper-parameter and varied from 10⁻³ to 10⁻¹. The LSTM hidden layer size was varied from 16 to 256. We found
that a hidden layer size of 50 and a learning rate of 10⁻³ worked well. These hyper-parameters were then fixed during
the training and testing process.
We compare the performance of the models in Table 2 and show the training plots in Fig. 5. We trained the models
for only 10 epochs since performance plateaus after that. As we can observe from the plots, most of the learning
happens in the first epoch, and validation accuracy is high thereafter. From these results we see that the LSTM model is
significantly better than the other text models. Within the vision models, we see that all DNN models show significant
improvement over the SIFT-BOVW model, with the very deep Inception-V3 model providing the best performance
across all metrics. We also see that the multi-modal models provide a slight improvement over unimodal vision models.
Model saliency maps Simonyan, Vedaldi and Zisserman (2013) are provided in Figures 4c and 4d. Saliency maps show
the salient pixels that are important for a given class and are computed by averaging pooled gradients across channels.
From these saliency maps we see that the model is indeed learning to identify images where text is positioned in
locations indicative of meme images. Overall we can summarize the results by claiming that unimodal
machine vision models provide solid performance in meme detection, and can be enhanced (at a computational cost)
with multi-modal text based features.
4. Evaluating Memes in Election Events
4.1. Finding Memes
We used the DNN model to classify images used in the 2018 US Midterm Elections and the 2018 Swedish National
Elections. We will focus on the 2018 US Midterm election data because it provides the largest meme collection, but
the 2018 Swedish election data is provided in Table 3 for comparison purposes. For the US midterm elections, we
collected all tweets that mentioned a member of congress or congressional candidate. For the Swedish elections, we
collected tweets containing hashtags associated with anti-immigrant and nationalistic movements (#svpol, #Val2018,
#feministisktInitiativ, #migpol, #valet2018, #SD2018, #AfS2018, and #MEDval18). Note that the Swedish election
data does not cover the full spectrum of politics in Sweden, but the US Midterm election data does cover the full
spectrum of politics in the United States. We downloaded all images from both data sets in February of 2019. As
indicated below, approximately 9% of the images weren’t available (the account or tweet was suspended by Twitter or
removed by the account owner). The statistics for both data sets are provided in Table 3.
We conducted binary classification with our trained DNN model on all images extracted from both data streams.
A collage of examples that we classified as memes in the US mid-term elections is provided in Figure 1.
Figure 5: Comparing training and test performance of different models.
4.2. Mapping Meme Evolution in Political Conversations
Given the rich vision/text data that we had, we wanted to map the evolution of visual memes using similarity clus-
tering. By clustering these images, we can not only identify the families but also the connections between the families
of memes. We explored several proven methods for measuring image similarity, including Color Histograms Novak
and Shafer (1992), Scale-Invariant Feature Transform (SIFT) Lowe (2004), Perceptual Hashing (pHash) Chamoso, Ri-
vas, Martín-Limorti and Rodríguez (2017), and a method similar to the Deep Ranking Wang, Song, Leung, Rosenberg,
Wang, Philbin, Chen and Wu (2014). Similar methods have been used with K-nearest neighbors for image annota-
tion Su and Xue (2015) and with mapReduce by Google for clustering billions of images Liu, Rosenberg and Rowley
(2007). Our initial experiments revealed that the deep ranking method (using features extracted from the last layer before
softmax and compared with Euclidean distance) performs well. To identify the families of memes, we then use graph
learning with the fixed-radius nearest neighbors algorithm Bentley (1975). Fixed-radius nearest neighbors finds the neigh-
bors within a given radius of a point or points. We chose the fixed-radius method over the K-nearest neighbors method
since the sizes of our meme families vary widely. This technique also allows us to quickly query similar images based
on a fixed distance radius.
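A brute-force fixed-radius neighbour query over deep feature vectors can be sketched as follows (the feature vectors here are toy 2-d points; the paper's features are 25,088-dimensional):

```python
import numpy as np

def radius_neighbors(features, query_idx, radius):
    """Brute-force fixed-radius nearest neighbours: return the
    indices of all images whose feature vectors lie within
    `radius` (Euclidean distance) of the query image."""
    dists = np.linalg.norm(features - features[query_idx], axis=1)
    hits = np.flatnonzero(dists <= radius)
    return hits[hits != query_idx]  # exclude the query itself

feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
# Only the second image is within radius 1.0 of the first.
assert list(radius_neighbors(feats, 0, radius=1.0)) == [1]
```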
Given a meme, we use a brute-force radius-neighbours algorithm to find the mutations of the meme. We
attempted to use the ball tree algorithm Omohundro (1989), which partitions meme features into a nested set of fixed-
dimensional hyper-spheres (balls) such that each hyper-sphere contains a set of memes based on their distance from the
ball’s center. Although the ball tree was designed for high dimensionality, we found that it is still computationally
expensive with more than 120 features. With 25,088 features, we found that the ball tree algorithm was not practical,
and resorted to the brute-force algorithm. Once we have the neighbours of a meme, we can use the time of the posting
associated with each meme to generate a directed graph of meme mutations. We recurse the whole process over the
neighbours to get the next set of neighbours and add them to the graph. We stop the recursion after a fixed number of
steps or when the maximum size of the graph is reached. The algorithm is summarized below (Algorithm 1). The map of all nodes
and links for the 2018 US Midterm elections is provided in Figure 6. In this we clearly see the clusters of similar
images (or “families”), as well as some of the connections between them.
Having mapped the individual “families” of memes, we used this similarity clustering and the date-time information
from the Tweet metadata to map the chronological evolution of specific memes as seen in Figure 7. In these images we
D.M. Beskow et al.: Preprint submitted to Elsevier Page 9 of 16
Detecting and Characterizing Internet Memes
Figure 6: Graph Learning with Fixed Radius Nearest Neighbors showing families of memes in the US 2018 mid-term
elections (89K nodes and 1.87M links). Network visualization is done with Graphistry (https://www.graphistry.com/).
Algorithm 1 Meme Mutation Graph Algorithm
1:  procedure GetMutationGraph(Meme m)
2:      memes_graph ← new dictionary
3:      neighbours ← radius neighbours of m
4:      for b_i in neighbours do
5:          if b_i not in memes_graph then
6:              add b_i to memes_graph
7:      for b_i in neighbours do
8:          if size(memes_graph) ≤ exit_condition then
9:              child_memes_graph ← GetMutationGraph(b_i)
10:             add child_memes_graph to memes_graph
11:     return memes_graph
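A minimal Python sketch of Algorithm 1 follows. It assumes a `neighbor_fn` that supplies the fixed-radius neighbors of a meme; edge direction, which in our pipeline would be set by posting time, is simplified here to parent → neighbor, and the toy exit conditions are illustrative.

```python
def get_mutation_graph(meme, neighbor_fn, max_size=100, depth=0,
                       max_depth=3, graph=None):
    """Recursively expand a meme's fixed-radius neighbors into a
    mutation graph (adjacency dict: meme -> set of neighbor memes).
    Recursion stops at a maximum graph size or recursion depth."""
    if graph is None:
        graph = {}
    if depth > max_depth or len(graph) >= max_size:
        return graph
    neighbors = neighbor_fn(meme)
    for b in neighbors:
        # Directed edge meme -> neighbor; in practice, posting time
        # would determine which endpoint is the "parent".
        graph.setdefault(meme, set()).add(b)
    for b in neighbors:
        if len(graph) < max_size and b not in graph:
            get_mutation_graph(b, neighbor_fn, max_size,
                               depth + 1, max_depth, graph)
    return graph

# Toy neighbor structure standing in for real radius queries
toy = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
g = get_mutation_graph("a", lambda m: toy.get(m, []))
print(g)  # → {'a': {'b', 'c'}, 'b': {'d'}}
```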
see the cultural evolution that Richard Dawkins originally envisioned. We also see Limor Shifman's definition of
memes play out as these meme images “lure extensive creative user engagement.”
4.3. Results and Findings
4.3.1. Meme Usage in Election Events
Having identified memes thriving in the online conversation around these election events, we calculated descriptive
statistics regarding memes and the accounts that share them. These statistics are provided in Table 3 and support
several observations about meme popularity and virality. First, we see that, although images in general are popular
(high retweet and like counts), memes are not. In both events, memes received fewer retweets and likes than other
images, and in the US election memes had a shorter “life-span” on average. We hypothesize that this is because
attributed users do not want to associate their reputation with a controversial political meme and its message. For the
same reasons that meme creators disassociate themselves from the memes they create, social media users, while
influenced by memes, are hesitant to like or retweet them, especially polarizing political memes. If this
Figure 7: Political conversations within and between political left and political right.
is the case, then the virality of memes may not be due to normal social media activity (likes, shares, retweets), but
rather occurs through the selection, retention, and mutation that Dawkins originally described. The memes mutate,
carrying pieces of the original message, and are reintroduced in other corners of the Internet.
We hypothesized that bots could be used to push memes on social media. Using the bot-hunter bot prediction tool
Beskow and Carley (2018) with a probability threshold of 0.6, we estimated the proportion of accounts that exhibit
bot-like characteristics. In the Swedish data we found slightly higher bot involvement with memes, but we did not
find this in the US election data. From this analysis we conclude that bot activity did not play an outsized role in
meme propagation for either of these events.
Additionally, we conducted face detection on the US election memes to find 18 prominent US politicians in the
meme data. To do this we leveraged the open-source face recognition software created by Adam Geitgey and made
available at Geitgey (2019), using a comparison threshold of 0.54. With this software, we found the distribution of
memes by politician provided in Figure 8.
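The comparison rule behind this matching step can be sketched as follows. The face_recognition library represents each face as a 128-dimensional encoding and compares encodings by Euclidean distance against a tolerance; we mimic that rule here with hypothetical low-dimensional vectors rather than real encodings.

```python
import math

def face_distance(enc_a, enc_b):
    """Euclidean distance between two face encodings."""
    return math.dist(enc_a, enc_b)

def is_same_politician(known_enc, candidate_enc, tolerance=0.54):
    """Match rule used above: distance below the 0.54 threshold."""
    return face_distance(known_enc, candidate_enc) < tolerance

# Toy 4-d encodings standing in for real 128-d face encodings
known = [0.1, 0.2, 0.3, 0.4]
print(is_same_politician(known, [0.1, 0.25, 0.3, 0.4]))  # close → True
print(is_same_politician(known, [0.9, 0.9, 0.9, 0.9]))   # far   → False
```

Lowering the tolerance from the library's default makes matches stricter, trading recall for precision, which matters when politicians' faces appear heavily edited inside memes.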
In Figure 9 we have plotted the posting or retweeting of meme images in the 2018 US Election by the political party
of the candidate mentioned. Note that politicians and candidates are mentioned in both positive and negative memes.
Here we see the highest volume of memes mentioning Democrats and Republicans in the period immediately after
the Kavanaugh hearings.
4.3.2. Meme Propagation Across Platforms
Given the evolutionary and anonymous nature of memes, we hypothesized that memes propagate across the Internet
differently than other viral content. Viral content generally spreads through the simple mechanisms of sharing,
retweeting, liking, etc. Memes, as noted above, are not liked or retweeted nearly as much as other media content.
We believe that their propagation occurs more through mutation and evolution, where one meme inspires other
creative works that emerge in other parts of the Internet. This would cause memes to ‘hop’ to more platforms and
domains than normal images. While propagating to new corners of the Internet, however, memes will undoubtedly
morph, and this mutation is outside the control of the original creators.
To assess this hypothesis, we sampled 5,000 meme images and 5,000 non-meme images from images associated
Table 3
Descriptive Statistics about Internet Memes in Online Election Conversations.

                         2018 Sweden Election          US Midterm Election
Total Tweets             661K                          62,034K
Total Users              88K                           2,695K
Suspended/removed        1,616/2,302                   41,901/47,349
Total Images Shared      47K                           4,446K
Total Images Available   43K                           4,037K

                         no image   meme    normal     no image   meme    normal
# Images Available       –          5K      38K        –          497K    3,539K
# of Unique Images       –          1.5K    10K        –          175K    951K
% of bot-like accounts   0.32       0.35    0.31       0.37       0.32    0.28
Life of tweet (hours)    0.51       0.60    0.59       21.80      16.02   22.87
Mean retweets            26         15      33         3,492      237     3,478
Mean Likes               0.84       1.50    2.03       15.96      24.42   65.48
User Median Followers    246        259     224        594        190     258
User Median Friends      348        401     340        857        375     407
Figure 8: Memes by Politician (identified by Face Detection).
with the 2018 US Mid-term elections. All images were political in semantic and visual content. We then conducted a
reverse image lookup (web detection) using the Google Vision API. This service provided links to matching and
partially matching images on the Internet. The 5,000 meme images had 62,475 matching links spanning 9,536 unique
domains. The 5,000 non-meme images had only 13,617 links spanning only 4,731 unique domains. The memes were
therefore connected to roughly 4.6 times the number of links and twice the number of domains when compared to
non-meme images, supporting the hypothesis that memes propagate to more corners of the Internet than other types
of media.
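The link and domain ratios quoted above follow directly from the reported counts:

```python
# Counts reported above from the Google Vision reverse image lookup
meme_links, meme_domains = 62_475, 9_536
other_links, other_domains = 13_617, 4_731

print(round(meme_links / other_links, 1))      # → 4.6
print(round(meme_domains / other_domains, 1))  # → 2.0
```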
Figure 9: Memes (both positive and negative) by Political Party of Candidate mentioned (counts over Sep 01 – Nov 01; Democrat, Republican, Other).
4.4. Comparison to Past Methods
In our related work section, we noted several research efforts that leverage meme templates, including the multi-
modal work of Dubey et al. Dubey et al. (2018) and the meme evolution work of Zannettou et al. Zannettou et al.
(2018). While Dubey et al. use this technique for virality prediction and clustering, we primarily want to compare
their approach to Meme-Hunter for the task of image retrieval (i.e., extracting all meme images in a given social
media stream). The primary limitation of their work is that it can only identify memes found on sites like
Memegenerator or Quickmeme. As we illustrate below, this approach, while achieving high precision, finds very few
of the total memes in election-related social media streams (low recall). The Meme-Hunter approach that we propose,
while limited to only two types of memes, typically finds at least 8 times more memes in election-related social media
streams than approaches constrained by meme templates.
To evaluate both methods, we randomly sampled 1,050 images from the Swedish election stream and 1,050 images
from the 2018 US Midterm election stream. We then manually labeled any image that could be construed as an
Internet meme as defined by Dawkins and Shifman. We then ran our Meme-Hunter approach and compared it to a
template-based approach.
To replicate a template-based approach, we collected 39,112 meme templates from the Meme Generator web
application found at https://imgflip.com/memegenerator. This set included most of the popular and even less
popular meme templates in use, including templates associated with politicians and world leaders. We then used
perceptual hashing (phash) to identify any image in the test set that used one of the meme templates. Images whose
hashes were within a Hamming distance of 10 of a template hash were considered positive matches, and positive
matches were labeled as memes.
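The matching rule can be sketched as follows. We assume hex-encoded 64-bit phash strings such as those produced by a perceptual hashing library; the hash values below are hypothetical, and only the Hamming comparison (not the hashing itself) is shown.

```python
def hamming_distance(hash_a, hash_b):
    """Number of differing bits between two hex-encoded perceptual hashes."""
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")

def is_template_match(image_hash, template_hash, threshold=10):
    """Match rule used above: hashes within 10 bit substitutions."""
    return hamming_distance(image_hash, template_hash) < threshold

# Hypothetical 64-bit phash values
print(is_template_match("c3d4e5f6a7b89091", "c3d4e5f6a7b89095"))  # close → True
print(is_template_match("c3d4e5f6a7b89091", "0f0f0f0f0f0f0f0f"))  # far   → False
```

The threshold trades precision for recall: a looser threshold matches more text-overlaid variants of a template, but begins to match visually unrelated images.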
Meme-Hunter was applied with unimodal machine vision models as well as multi-modal models, as indicated in
Table 4. In this comparison we see that, while template-based approaches offer high accuracy and precision, the
recall in both election streams is only approximately 5%. In these very dynamic political dialogues, many images
that are construed as memes are not yet in the template databases; template-based methods will therefore find only
5% of the memes in these streams. The Meme-Hunter approach, while offering slightly lower accuracy and precision,
is able to find 8× more memes, with the InceptionV3 unimodal model and all multi-modal models providing the
highest performance across all metrics. In terms of accuracy, multi-modal models consistently outperform unimodal
models. The top Meme-Hunter DNN models find approximately 50% of the memes in both streams.
We also note the lower performance of Meme-Hunter in the US Midterm stream compared to the Swedish election
stream. This is the result of more sophisticated memes being used in the US election
Table 4
Comparing Meme-Hunter to meme template based approaches to find memes in social media streams.

                      Sweden                                 US Midterms
Model                 Accuracy  F1     Precision  Recall     Accuracy  F1     Precision  Recall
Template Based        0.872     0.107  0.727      0.058      0.795     0.100  0.667      0.054
VGG18                 0.809     0.437  0.358      0.561      0.771     0.348  0.435      0.290
ResNet18              0.846     0.464  0.429      0.504      0.806     0.430  0.562      0.348
Inception V3          0.820     0.488  0.391      0.647      0.807     0.494  0.550      0.448
Vision + Text         0.865     0.510  0.490      0.532      0.815     0.455  0.600      0.367
Vision + Text + Face  0.858     0.511  0.470      0.561      0.812     0.439  0.592      0.348
stream, some of which involve elaborate photo-editing workflows and contain no text, while others contain vertical
or specially placed text. Meme-Hunter struggles to positively identify these more sophisticated memes.
5. Conclusion
In this paper we present a method for using deep learning to classify memes and graph learning to cluster them into
their evolutionary “families”. Additionally, these models were used to analyze meme usage inside two large democratic
election events. We found that Meme-Hunter provided at least 8 times higher recall than template based methods and
that graph learning is able to capture the overall structure of the evolutionary tree. Having identified memes in large
election events, we found evidence that memes are liked and retweeted less, but families of memes ‘hop’ platforms
and travel to more locations of the Internet than regular images. This indicates that memes do not propagate across
social media and the Internet in the same way as other viral content.
The organic and evolutionary nature of memes has caused some nation states to ban them McDonell (2017), while
encouraging other nations to leverage them as part of elaborate propaganda operations Groll (2018). The countries
that ban them do so largely because memes evolve outside of the control of the state and because image memes can be
difficult to trace Abad-Santos (2013). Those countries that leverage them for information warfare do so for the exact
same reasons. We hope that our proposed methods for studying memes will make it easier to trace memes for
beneficial purposes.
In future work we plan to use the learned graph and dynamic network analysis to analyze the evolution of the meme
families over time.
6. Acknowledgements
This work was supported in part by the Office of Naval Research (ONR) Award N00014182106 Group Polarization
in Social Media and the Center for Computational Analysis of Social and Organization Systems (CASOS). The views
and conclusions contained in this document are those of the authors and should not be interpreted as representing the
official policies, either expressed or implied, of the ONR or the U.S. Government.
References
Abad-Santos, A., 2013. How memes became the best weapon against Chinese internet censorship. The Atlantic. https://www.theatlantic.com/international/archive/2013/06/how-memes-became-best-weapon-against-chinese-internet-censorship/314618/ (Accessed on 04/06/2019).
Bauckhage, C., Kersting, K., Hadiji, F., 2013. Mathematical models of fads explain the temporal dynamics of internet memes., in: ICWSM, pp.
22–30.
Bentley, J.L., 1975. A Survey of Techniques for Fixed Radius near Neighbor Searching. Technical Report.
Beskow, D., Carley, K.M., 2018. Introducing bothunter: A tiered approach to detection and characterizing automated activity on twitter, in: Bisgin,
H., Hyder, A., Dancy, C., Thomson, R. (Eds.), International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and
Behavior Representation in Modeling and Simulation, Springer.
Blackmore, S., 2000. The Meme Machine. volume 25. Oxford Paperbacks.
Blackmore, S., Dugatkin, L.A., Boyd, R., Richerson, P.J., Plotkin, H., 2000. The power of memes. Scientific American 283, 64–73.
Borisyuk, F., Gordo, A., Sivakumar, V., 2018. Rosetta: Large scale system for text detection and recognition in images, in: Proceedings of the 24th
ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM. pp. 71–79.
Bowles, N., 2018. The mainstreaming of political memes online. New York Times. URL: https://www.nytimes.com/interactive/2018/02/09/technology/political-memes-go-mainstream.html.
Canning, D., Reinsborough, P., Smucker, J.M., 2017. Re: Imagining Change: How to Use Story-based Strategy to Win Campaigns, Build Move-
ments, and Change the World. Pm Press.
Chamoso, P., Rivas, A., Martín-Limorti, J.J., Rodríguez, S., 2017. A hash based image matching algorithm for social networks, in: International
Conference on Practical Applications of Agents and Multi-Agent Systems, Springer. pp. 183–190.
Collobert, R., Weston, J., 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning, in: Proceed-
ings of the 25th international conference on Machine learning, ACM. pp. 160–167.
Coscia, M., 2013. Competition and success in the meme pool: A case study on quickmeme.com, in: ICWSM.
Davis, N., 2017. The Selfish Gene. Macat Library.
Davison, P., 2012. The language of internet memes. The social media reader , 120–134.
Dawkins, R., 2006. The selfish gene: With a new introduction by the author. UK: Oxford University Press.(Originally published in 1976) .
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L., 2009. Imagenet: A large-scale hierarchical image database, in: 2009 IEEE conference on computer vision and pattern recognition, IEEE. pp. 248–255.
Donovan, J., Friedberg, B., 2019. Source hacking: Media manipulation in practice. Retrieved from Data & Society website: https://datasociety.net/output/source-hacking-media-manipulation-in-practice.
Dubey, A., Moro, E., Cebrian, M., Rahwan, I., 2018. Memesequencer: Sparse matching for embedding image macros, in: Proceedings of the 2018
World Wide Web Conference, International World Wide Web Conferences Steering Committee. pp. 1225–1235.
Edwards, P., . The reason every meme uses that one font. Vox. https://www.vox.com/2015/7/26/9036993/meme-font-impact (Accessed on 02/20/2019).
Ferrara, E., JafariAsbagh, M., Varol, O., Qazvinian, V., Menczer, F., Flammini, A., 2013. Clustering memes in social media, in: 2013 IEEE/ACM
International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013), IEEE. pp. 548–555.
Geitgey, A., 2019. Face recognition. https://github.com/ageitgey/face_recognition.
Groll, E., 2018. How Russia hacked U.S. politics with Instagram marketing. Foreign Policy. https://foreignpolicy.com/2018/12/17/how-russia-hacked-us-politics-with-instagram-marketing/ (Accessed on 04/06/2019).
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition, in: Proceedings of the IEEE conference on computer vision
and pattern recognition, pp. 770–778.
Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A.r., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., et al., 2012. Deep neural
networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29, 82–97.
Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural computation 9, 1735–1780.
Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. Imagenet classification with deep convolutional neural networks, in: Advances in neural infor-
mation processing systems, pp. 1097–1105.
Leskovec, J., Backstrom, L., Kleinberg, J., 2009. Meme-tracking and the dynamics of the news cycle, in: Proceedings of the 15th ACM SIGKDD
international conference on Knowledge discovery and data mining, ACM. pp. 497–506.
Liu, T., Rosenberg, C., Rowley, H.A., 2007. Clustering billions of images with large scale nearest neighbor search, in: 2007 IEEE Workshop on
Applications of Computer Vision (WACV’07), IEEE. pp. 28–28.
Lowe, D.G., 2004. Distinctive image features from scale-invariant keypoints. International journal of computer vision 60, 91–110.
McDonell, S., 2017. Why China censors banned Winnie the Pooh. BBC News. https://www.bbc.com/news/blogs-china-blog-40627855 (Accessed on 04/06/2019).
Novak, C.L., Shafer, S.A., 1992. Anatomy of a color histogram, in: Proceedings 1992 IEEE Computer Society Conference on Computer Vision
and Pattern Recognition, IEEE. pp. 599–605.
Omohundro, S.M., 1989. Five Balltree Construction Algorithms. International Computer Science Institute Berkeley.
Peirson, V., Abel, L., Tolunay, E.M., 2018. Dank learning: Generating memes using deep neural networks. arXiv preprint arXiv:1806.04510 .
Pennington, J., Socher, R., Manning, C., 2014. Glove: Global vectors for word representation, in: Proceedings of the 2014 conference on empirical
methods in natural language processing (EMNLP), pp. 1532–1543.
Shifman, L., 2012. An anatomy of a YouTube meme. New Media & Society 14, 187–203.
Shifman, L., 2013. Memes in a digital world: Reconciling with a conceptual troublemaker. Journal of Computer-Mediated Communication 18,
362–377.
Shifman, L., 2014a. The cultural logic of photo-based meme genres. Journal of Visual Culture 13, 340–358.
Shifman, L., 2014b. Memes in Digital Culture. MIT press.
Simonyan, K., Vedaldi, A., Zisserman, A., 2013. Deep inside convolutional networks: Visualising image classification models and saliency maps.
arXiv preprint arXiv:1312.6034 .
Simonyan, K., Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 .
Smith, R., 2007. An overview of the tesseract ocr engine, in: Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International
Conference on, IEEE. pp. 629–633.
Song, X., Feng, F., Han, X., Yang, X., Liu, W., Nie, L., 2018. Neural compatibility modeling with attentive knowledge distillation, in: The 41st
International ACM SIGIR Conference on Research & Development in Information Retrieval, ACM. pp. 5–14.
Su, F., Xue, L., 2015. Graph learning on k nearest neighbours for automatic image annotation, in: Proceedings of the 5th ACM on International
Conference on Multimedia Retrieval, ACM. pp. 403–410.
Szablewicz, M., 2014. The ‘losers’ of China’s internet: Memes as ‘structures of feeling’ for disillusioned young netizens. China Information 28, 259–275.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z., 2016. Rethinking the inception architecture for computer vision, in: Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826.
Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., Wu, Y., 2014. Learning fine-grained image similarity with deep
ranking, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1386–1393.
Wang, W.Y., Wen, M., 2015. I can has cheezburger? a nonparanormal approach to combining textual and visual information for predicting and gen-
erating popular meme descriptions, in: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies, pp. 355–365.
Xie, L., Natsev, A., Kender, J.R., Hill, M., Smith, J.R., 2011. Visual memes in social media: Tracking real-world news in YouTube videos, in: Proceedings of the 19th ACM International Conference on Multimedia, ACM, New York, NY, USA. pp. 53–62. URL: http://doi.acm.org/10.1145/2072298.2072307, doi:10.1145/2072298.2072307.
Zannettou, S., Caulfield, T., Blackburn, J., De Cristofaro, E., Sirivianos, M., Stringhini, G., Suarez-Tangil, G., 2018. On the origins of memes by
means of fringe web communities, in: Proceedings of the Internet Measurement Conference 2018, ACM. pp. 188–202.
Zhou, X., Yao, C., Wen, H., Wang, Y., Zhou, S., He, W., Liang, J., 2017. EAST: An efficient and accurate scene text detector, in: 2017 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), IEEE. pp. 2642–2651.