The Evolution of Political Memes: Detecting and Characterizing
Internet Memes with Multi-modal Deep Learning
David M. Beskow, Sumeet Kumar and Kathleen M. Carley
School of Computer Science
Carnegie Mellon University
5000 Forbes Ave, Pittsburgh, PA 15213, USA
ARTICLE INFO
Keywords:
Deep Learning
Multi-modal learning
Computer vision
Meme-detection
Meme
ABSTRACT
Combining humor with cultural relevance, Internet memes have become a ubiquitous artifact of
the digital age. As Richard Dawkins described in his book The Selfish Gene, memes behave like
cultural genes as they propagate and evolve through a complex process of ‘mutation’ and ‘inher-
itance’. On the Internet, these memes activate inherent biases in a culture or society, sometimes
replacing logical approaches to persuasive argument. Despite their fair share of success on the
Internet, their detection and evolution have remained understudied. In this research, we propose
and evaluate Meme-Hunter, a multi-modal deep learning model to classify images on the Internet
as memes vs non-memes, and compare this to uni-modal approaches. We then use image simi-
larity, meme specific optical character recognition, and face detection to find and study families
of memes shared on Twitter in the 2018 US Mid-term elections. By mapping meme mutation in
an electoral process, this study confirms Richard Dawkins’ concept of meme evolution.
1. Introduction
Richard Dawkins first coined the word meme in his now famous book The Selfish Gene Dawkins (2006). He
developed the word meme by shortening the Greek word mimeme in an effort to create a “...noun that conveys the
idea of a unit of cultural transmission, or a unit of imitation.” Dawkins indicated that memes function like genes for
culture, and can undergo variation, selection, and retention. The meme is further defined as “an idea, behavior, style or
usage that spreads from person to person within a culture” Blackmore, Dugatkin, Boyd, Richerson and Plotkin (2000).
Examples of memes include shaking hands and singing “Happy Birthday”. As such, memes become building blocks
of complex cultures Shifman (2012).
Internet memes include any digital unit that transfers culture. This can be as simple as a phrase or hashtag, such as
the diaosi meme in China Szablewicz (2014) or the #MeToo movement in America. The Internet provides an envi-
ronment for digital memes to quickly move from person to person, often mutating in the process as initially envisioned
by Dawkins. In 1982 the first emoticon (:-)) was used on Carnegie Mellon University’s online bulletin board in order
to flag humor Davison (2012). As a merger of humor, text, and a symbol, the emoticon became one of the first types
of Internet memes.
While Internet memes can exist as words, emoticons, videos, or gifs, a common form is an image with superimposed
text that conveys some type of merged message. In the earlier days of the Internet, images with superimposed text began
to propagate via Usenet, email, and message boards. By the early 2000’s researchers began to study these specific visual
artifacts that were proliferating. Social networks soon emerged, allowing these memes to go viral.
Given the power of memes to appeal to cultures and sub-cultures, political actors increasingly use them to
communicate political messaging and to shift the beliefs and actions of a society. Canning even goes so far
as to claim that memes have replaced nuanced political debate Canning, Reinsborough and Smucker (2017). Memes
become a simple and effective way to package a message for a target culture. In particular, memes are used for political ends,
to magnify echo chambers, and to attack minority groups Peirson, Abel and Tolunay (2018). This has jumped into the public
discourse with various articles, including the New York Times article “The mainstreaming of political memes online”
Bowles (2018). The increasing use of Internet memes for “information operations” has led to our effort to detect and
characterize memes that inhabit and propagate within given world events and the conversations that surround them.
Few research efforts have attempted to capture a comprehensive dataset of political memes and the network they
travel through in a political election event, and then document how the memes evolve, propagate, and impact the
ORCID (s): 0000-0003-2814-8712 (D.M. Beskow)
D.M. Beskow et al.: Preprint submitted to Elsevier Page 1 of 16
Detecting and Characterizing Internet Memes
Figure 1: Memes used in conjunction with the US 2018 Midterm Elections.
network. Our work will develop a deep learning method to detect memes in social media streams and leverage graph
learning to cluster these images into meme “families”. We will then apply these methods to Twitter data streams
associated with the 2018 US Mid-term elections and the 2018 Swedish National Elections. In addition to contributing
a theoretical framework for classifying and clustering meme images, our research indicates that memes are shared
less but move to more places on the Internet when compared to non-meme images. Memes therefore spread through
different mechanisms than other “viral” content.
This paper is organized as follows. In section Related Work, we describe the history of the Internet memes, prior
work exploring data analysis approaches to study memes, and deep neural networks that have been used on similar
problems. Then in section 3, we propose Meme-Hunter, a deep learning model to find images on the Internet and
classify them as meme vs non-memes. We then use the models to study the usage of memes in two elections in section
4. Finally, we conclude the findings of this research and suggest directions for future work.
2. Related Work
2.1. History of Internet Memes
The study of memes has existed ever since Richard Dawkins introduced the concept in his book ‘The Selfish Gene’
in the 1970’s Davis (2017). Many researchers have attempted to study the relationship between memes and culture.
The advancement of Internet technologies and the World Wide Web (WWW) gave meme researchers a laboratory in
which to study the spread and mutation of memes. This led to several books on memes, the most influential and
controversial being Blackmore’s The Meme Machine Blackmore (2000); Shifman (2013).
Linor Shifman has conducted extensive research of digital memes from the perspective of journalism and com-
munication. In 2013 Shifman deviated slightly from Dawkins's original definition, defining the Internet meme as
artifacts that “(a) share common characteristics of content, form, and/or stance; (b) are created with awareness of each
other; and (c) are circulated, imitated, and transformed via the Internet by multiple users” Shifman (2014a,b). She also
differentiates viral content from memetic content. She claims that viral content “is defined here as a clip that spreads
to the masses via digital word-of-mouth mechanisms without significant change.” In contrast, memetic content is “...a
popular clip that lures extensive creative user engagement in the form of parody, pastiche, mash-ups or other derivative
work.”
In 2012 Davison observed that Internet memes typically lack attribution Davison (2012).
Unlike many other creative works, authors of Internet memes typically don’t attach their name to the memes they
create. They remove any traces of attribution from the file and its metadata, and usually introduce memes on sites that
offer anonymity (4chan, Reddit, etc.), where they gain popularity before hopping over to mainstream media (Facebook,
Twitter, etc.) Bauckhage, Kersting and Hadiji (2013). Several theories attempt to explain this behavior, but Davison's
seems the most plausible: anonymity enables a type of freedom. This freedom allows authors to create and
distribute questionable material without concern for retribution from authorities. It is this lack of certain attribution
that encourages malicious and divisive political actors to resort to memes for information operations.
The far-reaching impact of a meme's evolution, combined with its inherent anonymity, makes memes attractive
to various political and propaganda campaigns. The evolutionary nature of memes assists them in ‘hopping’ platforms
to move to additional Internet and social media spaces. The natural anonymity of memes allows various actors to
make it appear that the distribution of the messages is part of a grass roots movement. Donovan and Friedberg discuss
how images can be used as “evidence collages” in a “source hacking” operation Donovan and Friedberg (2019),
thereby providing seemingly legitimate evidence of a false event or biased conclusion. It is to these aspects of political
and propaganda memes that we apply our research.
2.2. Meme Detection
Deep neural networks (DNN) have shown great success in many fields Hinton, Deng, Yu, Dahl, Mohamed, Jaitly,
Senior, Vanhoucke, Nguyen, Sainath et al. (2012). Researchers have used DNNs for various vision tasks like the
Imagenet Challenge Deng, Dong, Socher, Li, Li and Fei-Fei (2009); Krizhevsky, Sutskever and Hinton (2012) and
fashion recommendation Song, Feng, Han, Yang, Liu and Nie (2018). DNNs have also been used for various natural
language processing (NLP) tasks like Part of Speech (POS) tagging and named entity recognition Collobert and Weston
(2008). Ironically, deep learning has more often been used to automatically generate Internet memes than to
detect them. In 2013 Wang et al. Wang and Wen (2015) used copula methods to jointly model text and vision features
with popular votes. In 2018 Peirson et al. Peirson et al. (2018) leveraged deep learning to generate memes in a model
they titled “Dank Learning”.
Xie et al. Xie, Natsev, Kender, Hill and Smith (2011) used YouTube to find short video segments that are frequently
reposted which they call video memes. The authors then created a graph of people and content to model interactions.
Unlike video memes, exploring image memes is more challenging as this requires first classifying an image as meme
or not-meme.
The closest research related to our detection effort is the Memesequencer model developed by Dubey et al. Dubey,
Moro, Cebrian and Rahwan (2018). This research separates the meme image template (underlying image) from the
additional text and image manipulation. After separating the meme template it creates a meme embedding by con-
catenating image features and text features using deep learning, with the best model concatenating ResNet18 with
SkipThought text features. Having created an embedding, the authors construct the evolutionary tree using a phylo-
genetic tree. This research is limited to memes that have identifiable templates found on sites like Memegenerator or
Quickmeme. When used to extract memes for social cybersecurity practitioners, this template-based approach provides high
precision but low recall (see below). Our intent with Meme-Hunter is to increase recall.
2.3. Meme Evolution
The digital footprint that Internet memes leave allows researchers to study the propagation of memes through (and
across) networks. Coscia looked at meme propagation and measurements of success in 2013 Coscia (2013). Bauckhage
et al. Bauckhage et al. (2013) explored the temporal models of fads by looking at Internet memes, approximating
interest in a given meme by using Google Trends. Leskovec et al. Leskovec, Backstrom and Kleinberg (2009) used
memes and phrases extracted from news and blogs to track and study the dynamics of the news cycle. This work was
able to map the evolution of text based memes in the news cycle and blogosphere. Ferrara et al. Ferrara, JafariAsbagh,
Varol, Qazvinian, Menczer and Flammini (2013) focused on clustering text based memes.
The closest research to our study of meme evolution is the study by Zannettou et al. Zannettou, Caulfield, Black-
burn, De Cristofaro, Sirivianos, Stringhini and Suarez-Tangil (2018) that clusters image streams based on pHash and
identifies memetic clusters using meme annotation from sites such as “Know Your Meme”. They apply this process to
multiple sources (Twitter, Reddit, 4chan, Gab) and then use Hawkes process to measure which ecosystem has greater
influence. While focused on meme evolution and influence, this paper does not specifically develop a detection model
that generalizes easily beyond the Know Your Meme annotations, once again rendering low recall in detection appli-
cations. Additionally, this paper clusters only based on the image (via pHash) and does not consider the multi-modal
nature of memes when measuring similarity.
2.4. Meme Optical Character Recognition
The classification process requires learning from a composition of image and text characteristics. Extracting text in
memes requires Optical Character Recognition (OCR). OCR on memes can be challenging since most OCR algorithms
are trained to recognize black font on a white background, whereas many memes use white font on a dark background. For
social media image OCR, the state of the art is arguably the Facebook Rosetta system, a deep learning model that
conducts OCR while taking into consideration the background as well Borisyuk, Gordo and Sivakumar (2018). This is
being deployed on Facebook’s platform in order to censor images for extremist messages, allowing Facebook to comply
with increased regulation, particularly in the European Union. Facebook Rosetta output is standard OCR output (text),
and it is not intended to classify memes vs. not-memes. It is also not open sourced or available for researchers (at the
time of this writing).
Our research combines some of the efforts of Zannettou et al. Zannettou et al. (2018) with that of Dubey et al. Dubey
et al. (2018). In doing so, we go beyond both papers by creating a generalizable multi-modal meme detection model
that is not constrained by annotated entries on a site like Know Your Meme. Additionally, we develop the evolutionary
graph with a radius nearest neighbors approach and apply this specifically within the online debate around a large
election event (2018 Mid-term elections). This provides the research community with a generalizable multi-modal
meme detection model, a new way to build an evolutionary tree, a meme OCR pipeline, and insights into meme impact
and propagation within political conversations. Additionally, this model provides an approximately 8× increase in
recall over the template-based methods that Dubey and Zannettou propose. This increase in detection recall is especially
important for social cybersecurity practitioners.
3. Classifying Images as Memes
Most images shared on platforms like Twitter are not memes (see Table 3 for statistics). Therefore, to explore the usage
of memes, it is essential to first classify if an image is a meme or not. While visual Internet memes come in a wide
variety of formats, we restricted our classification to two types that are commonly found. These two types are found
in Figure 4 and can be described as:
1. A picture with superimposed white text in Impact font. Impact font was developed in the 1960s by Geoff Lee
and is the font of choice for text over image Edwards. This is illustrated in Figure 4a.
2. Text placed in a white space over a picture, as is shown in Figure 4b.
While this seems restrictive, we will show later that, even with this constraint, our approach finds 8× more memes (i.e.
8× higher recall) than template-based methods.
Given enough meme vs non-meme data, it could be possible for a neural-network model to learn to extract text
(using OCR), extract faces and other meme characteristics to classify memes. However, in a limited data setting like
ours, this approach is likely to fail, since OCR is a research domain in its own right. Consequently, we propose to first extract
text and face encodings and use them as supplementary input features. Then, to predict whether an image is a meme or a
non-meme, we explore deep learning based multi-modal (multiple features) models that use extracted features in addition
to the raw images.
Next, we describe our models, our data collection effort to get meme and non-memes data to train the models, the
process of training the models, and the classification performance.
3.1. Meme Classification Models
As mentioned earlier, we first extract text and face encodings, so here we explain the process of extracting text and
face encodings from images.
Text Extraction For Optical Character Recognition (OCR) we combined meme specific image preprocessing with an
open source OCR tool. When images contained white font over dark background, we preprocessed the images by 1)
converting the image to grayscale, 2) binarizing the image, and 3) inverting every bit in the binary image array. These
Figure 2: OCR Pipeline for Meme Images.
image preprocessing steps are illustrated in Figure 2. OCR on preprocessed images was accomplished with Google
Tesseract Smith (2007). If images already had black text on white background, no preprocessing was applied. Our
experiments indicated that preprocessing significantly improved Tesseract’s OCR on meme images. Baseline Tesseract
required an average of 49.8 ± 13.8 character edits (Levenshtein distance), with only 2% of strings readable. Preprocessing
reduced this to an average of 17.5 ± 4.8 character edits, with 72% of strings remaining readable.
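The three preprocessing steps above can be sketched in a few lines of numpy (a minimal version; the BT.601 luma weights and the fixed threshold of 128 are illustrative assumptions, and the cleaned array would then be handed to Tesseract, e.g. via pytesseract.image_to_string):

```python
import numpy as np

def preprocess_for_ocr(rgb, threshold=128):
    """Prepare a white-text-on-dark meme image for OCR.

    Mirrors the pipeline described above: (1) grayscale,
    (2) binarize, (3) invert, yielding dark text on a light
    background, which stock OCR engines handle far better.
    """
    # 1) Grayscale via ITU-R BT.601 luma weights (illustrative choice).
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    # 2) Binarize against a fixed threshold.
    binary = np.where(gray >= threshold, 255, 0).astype(np.uint8)
    # 3) Invert every bit so bright text pixels become dark.
    return 255 - binary

# A 2x2 image: one bright "text" pixel on a dark background.
img = np.zeros((2, 2, 3))
img[0, 0] = [255, 255, 255]
out = preprocess_for_ocr(img)
assert out[0, 0] == 0      # text is now dark
assert out[1, 1] == 255    # background is now light
```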
Human Face Encoding As faces are an important element of memes, we extract facial features using the open source
face detection software package called face_recognition, created by Adam Geitgey and made available at Geitgey
(2019). The library returns a face encoding vector for each face found in the image. We use these vectors as the input
to our classification models.
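For illustration, one way to turn the library's per-face encodings into a fixed-size model input is sketched below (the keep-first-face / fall-back-to-zeros choice is our assumption, not necessarily the paper's; the face_recognition call appears only in a comment since it needs the library and a real image):

```python
import numpy as np

ENCODING_DIM = 128  # face_recognition returns 128-d encodings

def face_feature(encodings):
    """Collapse a variable-length list of face encodings into one
    fixed-size classifier input.

    `encodings` would come from the library, e.g.:
        # encodings = face_recognition.face_encodings(image)
    Illustrative choice (our assumption): keep the first detected
    face, or a zero vector when no face is found.
    """
    if len(encodings) == 0:
        return np.zeros(ENCODING_DIM)
    return np.asarray(encodings[0])

assert face_feature([]).shape == (ENCODING_DIM,)
assert face_feature([np.ones(ENCODING_DIM)]).sum() == ENCODING_DIM
```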
We tried four different groups of classifiers: 1) unimodal classification using only text, 2) unimodal classification
using only machine vision, 3) multimodal classification using text and vision, and 4) multimodal classification using
text, vision, and face encoding.
LSTM based text classifier In this unimodal model, we use only the extracted text from images as the input for meme
classification. Long Short Term Memory (LSTM) Hochreiter and Schmidhuber (1997) networks are very popular for
text classification. An LSTM takes a word embedding and a hidden vector as input and outputs a new hidden vector.
At the end of the text (input), a fully-connected layer followed by a softmax layer is used to predict the label of the
text. We used Glove vectors Pennington, Socher and Manning (2014) as the input word embeddings. In our results, we
provide several other text only models for comparison, including Naïve Bayes, Support Vector Machines, and Logistic
Regression.
CNN based image classifier Given that our work focuses on image based meme detection, and Convolutional Neural
Networks (CNNs) are the most popular models for visual learning, it is natural for us to consider a CNN based model.
For this work, we tried a number of pre-trained models including VGG18 Simonyan and Zisserman (2014), ResNet18
He, Zhang, Ren and Sun (2016), and Inception-V3 Szegedy, Vanhoucke, Ioffe, Shlens and Wojna (2016). For classification,
we removed the last fully connected layer of the pre-trained network, added a new fully connected network followed by
a sigmoid layer. We also explored freezing all layers, freezing some of the layers, and not freezing any of the layers in
the training process. In the end, allowing weight updates on all layers provided the best results. We also include
results that extract descriptors with scale-invariant feature transform (SIFT) and Bag-of-Visual-Words (BOVW) feature
representation and support vector machine classification. The SIFT-BOVW model is provided to demonstrate DNN
improvement over pre-DNN models.
Joint DNN model The joint DNN model first combines the LSTM and CNN discussed above, and then combines
the LSTM/CNN output with the face encoding features in a single model. The model’s
architecture is shown in Fig. 3. As shown in the figure, the outputs of the LSTM, the CNN, and the face encodings are
concatenated into a single vector. The concatenated vector is then used as the input to a dense fully connected layer
followed by a sigmoid activation. All parts of the model are trained jointly.
Figure 3: Joint Model for meme classification.
In the last fully connected layer we use a sigmoid (or logistic) function to generate the probability of the image being a
meme. The sigmoid function is defined as

𝜙(z) = 1 / (1 + exp(−z))
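The fusion arithmetic of the joint model can be illustrated with a toy numpy forward pass (feature dimensions and weights here are our own illustrative assumptions; in the paper the whole stack is trained jointly as a neural network rather than evaluated with fixed weights):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    # Logistic function: maps any real z to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def joint_forward(cnn_feat, lstm_feat, face_feat, w, b):
    """Late-fusion head: concatenate the three modality features,
    apply one dense layer, and squash with a sigmoid to obtain
    P(image is a meme)."""
    fused = np.concatenate([cnn_feat, lstm_feat, face_feat])
    return sigmoid(w @ fused + b)

# Illustrative (not the paper's) feature sizes.
cnn = rng.normal(size=512)    # CNN image features
lstm = rng.normal(size=50)    # LSTM text features
face = rng.normal(size=128)   # face encoding
w = rng.normal(size=512 + 50 + 128) * 0.01
p = joint_forward(cnn, lstm, face, w, 0.0)
assert 0.0 < p < 1.0          # a valid probability
```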
3.2. Data
To label Internet meme images for supervised learning, we searched meme images on Reddit, Twitter, Tumblr,
Google Image Search, Flickr, and Instagram. Collecting images from these platforms, we were able to find 25,109
meme images. The meme data contained varied meme categories, including sports, politics, celebrities, and animals.
While the dominant language is English, other languages include French, Spanish, German, Russian, Japanese, Arabic,
and Chinese. The non-meme images were collected at random from Twitter and Google Image search.
In the training data we filtered out non-meme images that didn’t contain either text or a background photo. This
was done so that the algorithm would learn the unique attributes of meme images as opposed to just learning to identify
the presence or absence of text. In order to filter for text, we needed to conduct text detection but not necessarily text
recognition. We found that the Efficient and Accurate Scene Text (EAST) detection model Zhou, Yao, Wen, Wang,
Zhou, He and Liang (2017) performed better at detecting text than the Tesseract based OCR pipeline discussed earlier.
Note that the EAST model detects the location of text in an image but does not recognize or extract the text. We used the
EAST algorithm to filter out any images that didn’t have at least one text bounding box. Having removed images that
don’t contain text, we discovered that we also needed to remove images that don’t contain a photograph. This decision
was made after finding many black and white document images, particularly in political conversations. To remove
document images, we developed a heuristic that measured the mean Red Green Blue (RGB) score for the image, and
removed it if the mean score was greater than 220. This proved to be fast and easily removed document images without
removing memes of interest. This filter was applied in both the training process as well as the production algorithm.
An image is classified as a document if (Red + Green + Blue) / 3 > 220.
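This heuristic is a one-liner; a sketch with toy images:

```python
import numpy as np

def is_document(rgb, cutoff=220):
    """Heuristic from the text: flag an image as a document (and
    filter it out) when its mean RGB value exceeds 220, i.e. the
    image is overwhelmingly white."""
    return rgb.mean() > cutoff

white_page = np.full((4, 4, 3), 250)   # nearly all-white scan
photo = np.full((4, 4, 3), 120)        # mid-tone photograph
assert is_document(white_page)
assert not is_document(photo)
```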
We summarize the final model training dataset in Table 1. The 50,209 images comprised nearly equal portions of
meme and not-meme images. The data was then randomly split into training data (80%), validation data (10%), and
held out test data (10%).
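The 80/10/10 split can be sketched as follows (the helper name and seed are ours):

```python
import numpy as np

def split_indices(n, seed=0):
    """Shuffle n sample indices and split them 80/10/10 into
    train / validation / held-out test sets, as described above."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

train, val, test = split_indices(50_209)
assert len(train) == 40_167 and len(val) == 5_020
assert len(train) + len(val) + len(test) == 50_209
```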
(a) Type A Meme (b) Type B Meme
(c) Saliency in Type A Meme (d) Saliency in Type B Meme
Figure 4: Two types of memes used for meme classification with their respective saliency maps. Saliency maps are
computed by averaging pooled gradients across channels.
Table 1
Classification Dataset Statistics.
Total Images Memes Non-memes
50,209 25,109 25,100
Collecting images from social media streams often includes some amount of abusive language and adult content
images. Practitioners using our methods who want to minimize the impact of this sensitive content should have an
appropriate filter. In our case we used Yahoo’s open source “Not Safe For Work” (NSFW) filter
(https://github.com/yahoo/open_nsfw).
3.3. Experiments and Results
For the meme classification task, we define the overall objective function using cross-entropy loss, as seen in
Equation 1, where i ∈ {1, …, n} indexes samples, j ∈ {meme, non-meme} indexes classes, y is the (one-hot) true
label, and p is the probability output for each label.

ℒ(y, p) = −(1/n) ∑_{i,j} y_ij log(p_ij)    (1)
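Equation 1 in numpy (the eps guard against log(0) is our addition):

```python
import numpy as np

def cross_entropy(y, p, eps=1e-12):
    """Equation 1: mean negative log-likelihood over n samples,
    with y one-hot true labels and p predicted probabilities,
    both of shape (n, 2) for {meme, non-meme}."""
    n = y.shape[0]
    return -np.sum(y * np.log(p + eps)) / n

y = np.array([[1, 0], [0, 1]])            # meme, non-meme
p = np.array([[0.9, 0.1], [0.2, 0.8]])    # model probabilities
loss = cross_entropy(y, p)
assert np.isclose(loss, -(np.log(0.9) + np.log(0.8)) / 2)
```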
Our primary metric of interest is the F1 score, defined as the harmonic mean of precision and recall. We used
this as our primary metric since it balances the often competing priority of precision vs. recall. In our results we also
provide accuracy, precision, and recall for interpretability.
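For reference, the F1 computation from raw counts (a standard formula, shown here as a worked example):

```python
def f1_score(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall,
    computed from true positives, false positives, and
    false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# 8 true positives, 2 false positives, 4 false negatives:
# precision = 0.8, recall = 2/3, F1 = 8/11
assert abs(f1_score(8, 2, 4) - 8 / 11) < 1e-9
```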
All models are built using the Keras library1 with a TensorFlow backend2. As described earlier, the models use text,
1https://keras.io/
2https://www.tensorflow.org/
Table 2
Classification Results.
Type         Model                 Accuracy  F1     Precision  Recall
Text         Logistic Regression   0.724     0.719  0.735      0.703
Text         Naïve Bayes           0.681     0.607  0.793      0.492
Text         SVM                   0.721     0.714  0.736      0.693
Text         LSTM                  0.799     0.805  0.786      0.824
Vision       SIFT-BOVW             0.798     0.788  0.828      0.752
Vision       Baseline CNN          0.939     0.938  0.946      0.930
Vision       VGG18                 0.915     0.916  0.909      0.923
Vision       ResNet18              0.926     0.927  0.907      0.948
Vision       Inception-V3          0.958     0.958  0.952      0.964
Multi-modal  Vision + Text         0.954     0.954  0.943      0.965
Multi-modal  Vision + Text Length  0.952     0.951  0.947      0.956
Multi-modal  Vision + Text + Face  0.961     0.961  0.959      0.963
face-encoding, and image features as the input and a sigmoid layer for the class label prediction. The models are trained
using stochastic gradient descent with the cross-entropy loss function seen in Equation 1. The learning rate was treated
as a hyper-parameter and varied from 10⁻³ to 10⁻¹. The LSTM hidden layer size was varied from 16 to 256. We found
that a hidden layer size of 50 and a learning rate of 10⁻³ worked well. These hyper-parameters were then fixed during
the training and testing process.
We compare the performance of the models in Table 2 and show the training plots in Fig. 5. We trained the models
for only 10 epochs since performance plateaus after that. As we can observe from the plots, most of the learning
happens in the first epoch, and validation accuracy is high thereafter. From these results we see that the LSTM model is
significantly better than the other text models. Within the vision models, we see that all DNN models show significant
improvement over the SIFT-BOVW model, with the very deep Inception-V3 model providing the best performance
across all metrics. We also see that the multi-modal models provide a slight improvement over unimodal vision models.
Model saliency maps Simonyan, Vedaldi and Zisserman (2013) are provided in Figures 4c and 4d. Saliency maps show
the salient pixels that are important for a given class and are computed by averaging pooled gradients across channels.
From these saliency maps we see that the model is indeed learning to identify images where text is positioned in
locations indicative of meme images. Overall we can summarize the results by claiming that unimodal
machine vision models provide solid performance in meme detection, and can be enhanced (at a computational cost)
with multi-modal text based features.
4. Evaluating Memes in Election Events
4.1. Finding Memes
We used the DNN model to classify images used in the 2018 US Midterm Elections and the 2018 Swedish National
Elections. We will focus on the 2018 US Midterm election data because it provides the largest meme collection, but
the 2018 Swedish election data is provided in Table 3 for comparison purposes. For the US midterm elections, we
collected all tweets that mentioned a member of congress or congressional candidate. For the Swedish elections, we
collected tweets containing hashtags associated with anti-immigrant and nationalistic movements (#svpol, #Val2018,
#feministisktInitiativ, #migpol, #valet2018, #SD2018, #AfS2018, and #MEDval18). Note that the Swedish election
data does not cover the full spectrum of politics in Sweden, but the US Midterm election data does cover the full
spectrum of politics in the United States. We downloaded all images from both data sets in February of 2019. As
indicated below, approximately 9% of the images weren’t available (the account or tweet was suspended by Twitter or
removed by the account owner). The statistics for both data sets are provided in Table 3.
We conducted binary classification with our trained DNN model on all images extracted from both data streams.
A collage of examples that we classified as memes in the US mid-term elections is provided in Figure 1.
Figure 5: Comparing training and test performance of different models.
4.2. Mapping Meme Evolution in Political Conversations
Given the rich vision/text data that we had, we wanted to map the evolution of visual memes using similarity clus-
tering. By clustering these images, we can not only identify the families but also the connections between the families
of memes. We explored several proven methods for measuring image similarity, including Color Histograms Novak
and Shafer (1992), Scale-Invariant Feature Transform (SIFT) Lowe (2004), Perceptual Hashing (pHash) Chamoso, Ri-
vas, Martín-Limorti and Rodríguez (2017), and a method similar to the Deep Ranking Wang, Song, Leung, Rosenberg,
Wang, Philbin, Chen and Wu (2014). Similar methods have been used with K-nearest neighbors for image annota-
tion Su and Xue (2015) and with mapReduce by Google for clustering billions of images Liu, Rosenberg and Rowley
(2007). Our initial experiments revealed that the deep ranking method (using features extracted from the last layer before
softmax and compared with Euclidean distance) performs well. To identify the families of memes, we then use graph
learning with the fixed-radius nearest neighbors algorithm Bentley (1975). Fixed-radius nearest neighbors finds the neigh-
bors within a given radius of a point or points. We chose the fixed-radius method over the K-nearest neighbors method
since the sizes of our meme families vary widely. This technique also allows us to quickly query similar images based
on a fixed distance radius.
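A brute-force fixed-radius neighbour query over deep feature vectors can be sketched as follows (the feature vectors here are toy 2-d points; the paper's features are 25,088-dimensional):

```python
import numpy as np

def radius_neighbors(features, query_idx, radius):
    """Brute-force fixed-radius nearest neighbours: return the
    indices of all images whose feature vectors lie within
    `radius` (Euclidean distance) of the query image."""
    dists = np.linalg.norm(features - features[query_idx], axis=1)
    hits = np.flatnonzero(dists <= radius)
    return hits[hits != query_idx]  # exclude the query itself

feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
# Only the second image is within radius 1.0 of the first.
assert list(radius_neighbors(feats, 0, radius=1.0)) == [1]
```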
Given a meme, we use a brute-force radius-neighbours algorithm to find the mutations of the meme. We
attempted to use the ball tree algorithm Omohundro (1989), which partitions meme features into a nested set of fixed-
dimensional hyper-spheres (balls) such that each hyper-sphere contains a set of memes based on their distance from the
ball’s center. Although the ball tree was designed for high dimensionality, we found that it is still computationally
expensive with more than 120 features. With 25,088 features, we found that the ball tree algorithm was not practical,
and resorted to the brute-force algorithm. Once we have the neighbours of a meme, we can use the time of the posting
associated with each meme to generate a directed graph of meme mutations. We recurse the whole process over the
neighbours to get the next set of neighbours and add them to the graph. We stop the recursion after a fixed number of
steps or when the maximum size of the graph is reached. The algorithm is summarized below (Algorithm 1). The map of all nodes
and links for the 2018 US Midterm elections is provided in Figure 6. In this we clearly see the clusters of similar
images (or “families”), as well as some of the connections between them.
Having mapped the individual “families” of memes, we used this similarity clustering and the date-time information
from the Tweet metadata to map the chronological evolution of specific memes as seen in Figure 7. In these images we
D.M. Beskow et al.: Preprint submitted to Elsevier Page 9 of 16
Detecting and Characterizing Internet Memes
Figure 6: Graph Learning with Fixed Radius Nearest Neighbors showing families of memes in the US 2018 mid-term
elections (89K nodes and 1.87M links). Network visualization is done with Graphistry (https://www.graphistry.com/).
Algorithm 1 Meme Mutation Graph Algorithm
1:  procedure GetMutationGraph(Meme m)
2:      memes_graph ← new dictionary
3:      neighbours ← radius neighbours of m
4:      for b_i in neighbours do
5:          if b_i not in memes_graph then
6:              add b_i to memes_graph
7:      for b_i in neighbours do
8:          if size(memes_graph) ≤ exit_condition then
9:              child_memes_graph ← GetMutationGraph(b_i)
10:             add child_memes_graph to memes_graph
11:     return memes_graph
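A minimal Python sketch of Algorithm 1 follows. It assumes a `neighbor_fn` that supplies the fixed-radius neighbors of a meme; edge direction, which in our pipeline would be set by posting time, is simplified here to parent → neighbor, and the toy exit conditions are illustrative.

```python
def get_mutation_graph(meme, neighbor_fn, max_size=100, depth=0,
                       max_depth=3, graph=None):
    """Recursively expand a meme's fixed-radius neighbors into a
    mutation graph (adjacency dict: meme -> set of neighbor memes).
    Recursion stops at a maximum graph size or recursion depth."""
    if graph is None:
        graph = {}
    if depth > max_depth or len(graph) >= max_size:
        return graph
    neighbors = neighbor_fn(meme)
    for b in neighbors:
        # Directed edge meme -> neighbor; in practice, posting time
        # would determine which endpoint is the "parent".
        graph.setdefault(meme, set()).add(b)
    for b in neighbors:
        if len(graph) < max_size and b not in graph:
            get_mutation_graph(b, neighbor_fn, max_size,
                               depth + 1, max_depth, graph)
    return graph

# Toy neighbor structure standing in for real radius queries
toy = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
g = get_mutation_graph("a", lambda m: toy.get(m, []))
print(g)  # → {'a': {'b', 'c'}, 'b': {'d'}}
```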
see the cultural evolution that Richard Dawkins originally envisioned. We also see Limor Shifman's definition of
memes play out as these meme images “lure extensive creative user engagement.”
4.3. Results and Findings
4.3.1. Meme Usage in Election Events
Having identified memes thriving in the online conversation around these election events, we calculated descriptive
statistics regarding memes and the accounts that share them. These statistics are provided in Table 3 and support
several observations about meme popularity and virality. First, we see that, although images in general are popular
(high retweet and like counts), memes are not. In both events, memes received fewer retweets and likes than other
images, and in the US election memes had a shorter “life-span” on average. We hypothesize that this is because
attributed users do not want to associate their reputation with a controversial political meme and its message. For the
same reasons that meme creators disassociate themselves from the memes they create, social media users, while
influenced by memes, are hesitant to like or retweet them, especially polarizing political memes. If this
Figure 7: Political conversations within and between political left and political right.
is the case, then the virality of memes may not be due to normal social media activity (likes, shares, retweets), but
rather occurs through the selection, retention, and mutation that Dawkins originally described. The memes mutate,
carrying pieces of the original message, and are reintroduced in other corners of the Internet.
We hypothesized that bots could be used to push memes on social media. Using the bot-hunter bot prediction tool
Beskow and Carley (2018) with a probability threshold of 0.6, we estimated the proportion of accounts that exhibit
bot-like characteristics. In the Swedish data we found slightly higher bot involvement with memes, but we did not
find this in the US election data. From this analysis we conclude that bot activity did not play an outsized role in
meme propagation for either of these events.
Additionally, we conducted face detection on the US election memes to find 18 prominent US politicians in the
meme data. To do this we leveraged the open-source face recognition software created by Adam Geitgey and made
available at Geitgey (2019), using a comparison threshold of 0.54. With this software, we found the distribution of
memes by politician provided in Figure 8.
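The comparison rule behind this matching step can be sketched as follows. The face_recognition library represents each face as a 128-dimensional encoding and compares encodings by Euclidean distance against a tolerance; we mimic that rule here with hypothetical low-dimensional vectors rather than real encodings.

```python
import math

def face_distance(enc_a, enc_b):
    """Euclidean distance between two face encodings."""
    return math.dist(enc_a, enc_b)

def is_same_politician(known_enc, candidate_enc, tolerance=0.54):
    """Match rule used above: distance below the 0.54 threshold."""
    return face_distance(known_enc, candidate_enc) < tolerance

# Toy 4-d encodings standing in for real 128-d face encodings
known = [0.1, 0.2, 0.3, 0.4]
print(is_same_politician(known, [0.1, 0.25, 0.3, 0.4]))  # close → True
print(is_same_politician(known, [0.9, 0.9, 0.9, 0.9]))   # far   → False
```

Lowering the tolerance from the library's default makes matches stricter, trading recall for precision, which matters when politicians' faces appear heavily edited inside memes.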
In Figure 9 we have plotted the posting or retweeting of meme images in the 2018 US Election by the political party
of the candidate mentioned. Note that politicians and candidates are mentioned in both positive and negative memes.
Here we see the highest volume of memes mentioning Democrats and Republicans in the period immediately after
the Kavanaugh hearings.
4.3.2. Meme Propagation Across Platforms
Given the evolutionary and anonymous nature of memes, we hypothesized that memes propagate across the Internet
differently than other viral content. Viral content generally spreads through the simple mechanisms of sharing,
retweeting, liking, etc. Memes, as noted above, are not liked or retweeted nearly as much as other media content.
We believe that their propagation occurs more through mutation and evolution, where one meme inspires other
creative works that emerge in other parts of the Internet. This would cause memes to ‘hop’ to more platforms and
domains than normal images. While propagating to new corners of the Internet, however, memes will undoubtedly
morph, and this mutation is outside the control of the original creators.
To assess this hypothesis, we sampled 5,000 meme images and 5,000 non-meme images from images associated
Table 3
Descriptive Statistics about Internet Memes in Online Election Conversations.

                         2018 Sweden Election          US Midterm Election
Total Tweets             661K                          62,034K
Total Users              88K                           2,695K
Suspended/removed        1,616/2,302                   41,901/47,349
Total Images Shared      47K                           4,446K
Total Images Available   43K                           4,037K

                         no image   meme    normal     no image   meme    normal
# Images Available       –          5K      38K        –          497K    3,539K
# of Unique Images       –          1.5K    10K        –          175K    951K
% of bot-like accounts   0.32       0.35    0.31       0.37       0.32    0.28
Life of tweet (hours)    0.51       0.60    0.59       21.80      16.02   22.87
Mean retweets            26         15      33         3,492      237     3,478
Mean Likes               0.84       1.50    2.03       15.96      24.42   65.48
User Median Followers    246        259     224        594        190     258
User Median Friends      348        401     340        857        375     407
Figure 8: Memes by Politician (identified by Face Detection).
with the 2018 US Mid-term elections. All images were political in semantic and visual content. We then conducted a
reverse image lookup (web detection) using the Google Vision API. This service provided links to matching and
partially matching images on the Internet. The 5,000 meme images had 62,475 matching links spanning 9,536 unique
domains. The 5,000 non-meme images had only 13,617 links spanning only 4,731 unique domains. The memes were
therefore connected to roughly 4.6 times the number of links and twice the number of domains when compared to
non-meme images, supporting the hypothesis that memes propagate to more corners of the Internet than other types
of media.
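The link and domain ratios quoted above follow directly from the reported counts:

```python
# Counts reported above from the Google Vision reverse image lookup
meme_links, meme_domains = 62_475, 9_536
other_links, other_domains = 13_617, 4_731

print(round(meme_links / other_links, 1))      # → 4.6
print(round(meme_domains / other_domains, 1))  # → 2.0
```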
Figure 9: Memes (both positive and negative) by Political Party of Candidate mentioned (counts over Sep 01 – Nov 01; Democrat, Republican, Other).
4.4. Comparison to Past Methods
In our related work section, we noted several research efforts that leverage meme templates, including the multi-
modal work of Dubey et al. Dubey et al. (2018) and the meme evolution work of Zannettou et al. Zannettou et al.
(2018). While Dubey et al. use this technique for virality prediction and clustering, we primarily want to compare
their approach to Meme-Hunter for the task of image retrieval (i.e., extracting all meme images in a given social
media stream). The primary limitation of their work is that it can only identify memes found on sites like
Memegenerator or Quickmeme. As we illustrate below, this approach, while achieving high precision, finds very few
of the total memes in election-related social media streams (low recall). The Meme-Hunter approach that we propose,
while limited to only two types of memes, typically finds at least 8 times more memes in election-related social media
streams than approaches constrained by meme templates.
To evaluate both methods, we randomly sampled 1,050 images from the Swedish election stream and 1,050 images
from the 2018 US Midterm election stream. We then manually labeled any image that could be construed as an
Internet meme as defined by Dawkins and Shifman. We then ran our Meme-Hunter approach and compared it to a
template-based approach.
To replicate a template-based approach, we collected 39,112 meme templates from the Meme Generator web
application found at https://imgflip.com/memegenerator. This set included most of the popular and even less
popular meme templates in use, including templates associated with politicians and world leaders. We then used
perceptual hashing (phash) to identify any image in the test set that used one of the meme templates. Images whose
hashes were within a Hamming distance of 10 of a template hash were considered positive matches, and positive
matches were labeled as memes.
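The matching rule can be sketched as follows. We assume hex-encoded 64-bit phash strings such as those produced by a perceptual hashing library; the hash values below are hypothetical, and only the Hamming comparison (not the hashing itself) is shown.

```python
def hamming_distance(hash_a, hash_b):
    """Number of differing bits between two hex-encoded perceptual hashes."""
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")

def is_template_match(image_hash, template_hash, threshold=10):
    """Match rule used above: hashes within 10 bit substitutions."""
    return hamming_distance(image_hash, template_hash) < threshold

# Hypothetical 64-bit phash values
print(is_template_match("c3d4e5f6a7b89091", "c3d4e5f6a7b89095"))  # close → True
print(is_template_match("c3d4e5f6a7b89091", "0f0f0f0f0f0f0f0f"))  # far   → False
```

The threshold trades precision for recall: a looser threshold matches more text-overlaid variants of a template, but begins to match visually unrelated images.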
Meme-Hunter was applied with unimodal machine vision models as well as multi-modal models, as indicated in
Table 4. In this comparison we see that, while template-based approaches offer high accuracy and precision, the
recall in both election streams is only approximately 5%. In these very dynamic political dialogues, many images
that are construed as memes are not yet in the template databases; template-based methods will therefore find only
5% of the memes in these streams. The Meme-Hunter approach, while offering slightly lower accuracy and precision,
is able to find 8× more memes, with the InceptionV3 unimodal model and all multi-modal models providing the
highest performance across all metrics. In terms of accuracy, multi-modal models consistently outperform unimodal
models. The top Meme-Hunter DNN models find approximately 50% of the memes in both streams.
We also note the lower performance of Meme-Hunter in the US Midterm stream compared to the Swedish election
stream. This is the result of more sophisticated memes being used in the US election
Table 4
Comparing Meme-Hunter to meme template based approaches to find memes in social media streams.

                      Sweden                                 US Midterms
Model                 Accuracy  F1     Precision  Recall     Accuracy  F1     Precision  Recall
Template Based        0.872     0.107  0.727      0.058      0.795     0.100  0.667      0.054
VGG18                 0.809     0.437  0.358      0.561      0.771     0.348  0.435      0.290
ResNet18              0.846     0.464  0.429      0.504      0.806     0.430  0.562      0.348
Inception V3          0.820     0.488  0.391      0.647      0.807     0.494  0.550      0.448
Vision + Text         0.865     0.510  0.490      0.532      0.815     0.455  0.600      0.367
Vision + Text + Face  0.858     0.511  0.470      0.561      0.812     0.439  0.592      0.348
stream, some of which involve elaborate photo-editing workflows and contain no text, while others contain vertical
or specially placed text. Meme-Hunter struggles to positively identify these more sophisticated memes.
5. Conclusion
In this paper we present a method for using deep learning to classify memes and graph learning to cluster them into
their evolutionary “families”. Additionally, these models were used to analyze meme usage inside two large democratic
election events. We found that Meme-Hunter provided at least 8 times higher recall than template based methods and
that graph learning is able to capture the overall structure of the evolutionary tree. Having identified memes in large
election events, we found evidence that memes are liked and retweeted less, but families of memes ‘hop’ platforms
and travel to more locations of the Internet than regular images. This indicates that memes do not propagate across
social media and the Internet in the same way as other viral content.
The organic and evolutionary nature of memes has caused some nation states to ban them McDonell (2017), while
encouraging other nations to leverage them as part of elaborate propaganda operations Groll (2018). The countries
that ban them do so largely because memes evolve outside of the control of the state and because image memes can be
difficult to trace Abad-Santos (2013). Those countries that leverage them for information warfare do so for the exact
same reasons. We hope that our proposed methods for studying memes will make it easier to trace memes for
beneficial purposes.
In future work we plan to use the learned graph and dynamic network analysis to analyze the evolution of the meme
families over time.
6. Acknowledgements
This work was supported in part by the Office of Naval Research (ONR) Award N00014182106 Group Polarization
in Social Media and the Center for Computational Analysis of Social and Organization Systems (CASOS). The views
and conclusions contained in this document are those of the authors and should not be interpreted as representing the
official policies, either expressed or implied, of the ONR or the U.S. Government.
References
Abad-Santos, A., 2013. How memes became the best weapon against Chinese internet censorship. The Atlantic. https://www.theatlantic.com/international/archive/2013/06/how-memes-became-best-weapon-against-chinese-internet-censorship/314618/ (Accessed on 04/06/2019).
Bauckhage, C., Kersting, K., Hadiji, F., 2013. Mathematical models of fads explain the temporal dynamics of internet memes., in: ICWSM, pp.
22–30.
Bentley, J.L., 1975. A Survey of Techniques for Fixed Radius near Neighbor Searching. Technical Report.
Beskow, D., Carley, K.M., 2018. Introducing bothunter: A tiered approach to detection and characterizing automated activity on twitter, in: Bisgin,
H., Hyder, A., Dancy, C., Thomson, R. (Eds.), International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and
Behavior Representation in Modeling and Simulation, Springer.
Blackmore, S., 2000. The Meme Machine. volume 25. Oxford Paperbacks.
Blackmore, S., Dugatkin, L.A., Boyd, R., Richerson, P.J., Plotkin, H., 2000. The power of memes. Scientific American 283, 64–73.
Borisyuk, F., Gordo, A., Sivakumar, V., 2018. Rosetta: Large scale system for text detection and recognition in images, in: Proceedings of the 24th
ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM. pp. 71–79.
Bowles, N., 2018. The mainstreaming of political memes online. New York Times. URL: https://www.nytimes.com/interactive/2018/02/09/technology/political-memes-go-mainstream.html.
Canning, D., Reinsborough, P., Smucker, J.M., 2017. Re: Imagining Change: How to Use Story-based Strategy to Win Campaigns, Build Move-
ments, and Change the World. Pm Press.
Chamoso, P., Rivas, A., Martín-Limorti, J.J., Rodríguez, S., 2017. A hash based image matching algorithm for social networks, in: International
Conference on Practical Applications of Agents and Multi-Agent Systems, Springer. pp. 183–190.
Collobert, R., Weston, J., 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning, in: Proceed-
ings of the 25th international conference on Machine learning, ACM. pp. 160–167.
Coscia, M., 2013. Competition and success in the meme pool: A case study on quickmeme.com, in: ICWSM.
Davis, N., 2017. The Selfish Gene. Macat Library.
Davison, P., 2012. The language of internet memes. The social media reader , 120–134.
Dawkins, R., 2006. The selfish gene: With a new introduction by the author. UK: Oxford University Press.(Originally published in 1976) .
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L., 2009. Imagenet: A large-scale hierarchical image database, in: 2009 IEEE conference on computer vision and pattern recognition, IEEE. pp. 248–255.
Donovan, J., Friedberg, B., 2019. Source hacking: Media manipulation in practice. Retrieved from Data & Society website: https://datasociety.net/output/source-hacking-media-manipulation-in-practice.
Dubey, A., Moro, E., Cebrian, M., Rahwan, I., 2018. Memesequencer: Sparse matching for embedding image macros, in: Proceedings of the 2018
World Wide Web Conference, International World Wide Web Conferences Steering Committee. pp. 1225–1235.
Edwards, P., . The reason every meme uses that one font. Vox. https://www.vox.com/2015/7/26/9036993/meme-font-impact (Accessed on 02/20/2019).
Ferrara, E., JafariAsbagh, M., Varol, O., Qazvinian, V., Menczer, F., Flammini, A., 2013. Clustering memes in social media, in: 2013 IEEE/ACM
International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013), IEEE. pp. 548–555.
Geitgey, A., 2019. Face recognition. https://github.com/ageitgey/face_recognition.
Groll, E., 2018. How Russia hacked U.S. politics with Instagram marketing. Foreign Policy. https://foreignpolicy.com/2018/12/17/how-russia-hacked-us-politics-with-instagram-marketing/ (Accessed on 04/06/2019).
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition, in: Proceedings of the IEEE conference on computer vision
and pattern recognition, pp. 770–778.
Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A.r., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., et al., 2012. Deep neural
networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29, 82–97.
Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural computation 9, 1735–1780.
Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. Imagenet classification with deep convolutional neural networks, in: Advances in neural infor-
mation processing systems, pp. 1097–1105.
Leskovec, J., Backstrom, L., Kleinberg, J., 2009. Meme-tracking and the dynamics of the news cycle, in: Proceedings of the 15th ACM SIGKDD
international conference on Knowledge discovery and data mining, ACM. pp. 497–506.
Liu, T., Rosenberg, C., Rowley, H.A., 2007. Clustering billions of images with large scale nearest neighbor search, in: 2007 IEEE Workshop on
Applications of Computer Vision (WACV’07), IEEE. pp. 28–28.
Lowe, D.G., 2004. Distinctive image features from scale-invariant keypoints. International journal of computer vision 60, 91–110.
McDonell, S., 2017. Why China censors banned Winnie the Pooh. BBC News. https://www.bbc.com/news/blogs-china-blog-40627855 (Accessed on 04/06/2019).
Novak, C.L., Shafer, S.A., 1992. Anatomy of a color histogram, in: Proceedings 1992 IEEE Computer Society Conference on Computer Vision
and Pattern Recognition, IEEE. pp. 599–605.
Omohundro, S.M., 1989. Five Balltree Construction Algorithms. International Computer Science Institute Berkeley.
Peirson, V., Abel, L., Tolunay, E.M., 2018. Dank learning: Generating memes using deep neural networks. arXiv preprint arXiv:1806.04510 .
Pennington, J., Socher, R., Manning, C., 2014. Glove: Global vectors for word representation, in: Proceedings of the 2014 conference on empirical
methods in natural language processing (EMNLP), pp. 1532–1543.
Shifman, L., 2012. An anatomy of a YouTube meme. New Media & Society 14, 187–203.
Shifman, L., 2013. Memes in a digital world: Reconciling with a conceptual troublemaker. Journal of Computer-Mediated Communication 18,
362–377.
Shifman, L., 2014a. The cultural logic of photo-based meme genres. Journal of Visual Culture 13, 340–358.
Shifman, L., 2014b. Memes in Digital Culture. MIT press.
Simonyan, K., Vedaldi, A., Zisserman, A., 2013. Deep inside convolutional networks: Visualising image classification models and saliency maps.
arXiv preprint arXiv:1312.6034 .
Simonyan, K., Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 .
Smith, R., 2007. An overview of the tesseract ocr engine, in: Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International
Conference on, IEEE. pp. 629–633.
Song, X., Feng, F., Han, X., Yang, X., Liu, W., Nie, L., 2018. Neural compatibility modeling with attentive knowledge distillation, in: The 41st
International ACM SIGIR Conference on Research & Development in Information Retrieval, ACM. pp. 5–14.
Su, F., Xue, L., 2015. Graph learning on k nearest neighbours for automatic image annotation, in: Proceedings of the 5th ACM on International
Conference on Multimedia Retrieval, ACM. pp. 403–410.
Szablewicz, M., 2014. The ‘losers’ of China’s internet: Memes as ‘structures of feeling’ for disillusioned young netizens. China Information 28, 259–275.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z., 2016. Rethinking the inception architecture for computer vision, in: Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826.
Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., Wu, Y., 2014. Learning fine-grained image similarity with deep
ranking, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1386–1393.
Wang, W.Y., Wen, M., 2015. I can has cheezburger? a nonparanormal approach to combining textual and visual information for predicting and gen-
erating popular meme descriptions, in: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies, pp. 355–365.
Xie, L., Natsev, A., Kender, J.R., Hill, M., Smith, J.R., 2011. Visual memes in social media: Tracking real-world news in YouTube videos, in: Proceedings of the 19th ACM International Conference on Multimedia, ACM, New York, NY, USA. pp. 53–62. URL: http://doi.acm.org/10.1145/2072298.2072307, doi:10.1145/2072298.2072307.
Zannettou, S., Caulfield, T., Blackburn, J., De Cristofaro, E., Sirivianos, M., Stringhini, G., Suarez-Tangil, G., 2018. On the origins of memes by
means of fringe web communities, in: Proceedings of the Internet Measurement Conference 2018, ACM. pp. 188–202.
Zhou, X., Yao, C., Wen, H., Wang, Y., Zhou, S., He, W., Liang, J., 2017. EAST: An efficient and accurate scene text detector, in: 2017 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), IEEE. pp. 2642–2651.