The Evolution of Political Memes: Detecting and Characterizing
Internet Memes with Multi-modal Deep Learning
David M. Beskow, Sumeet Kumar and Kathleen M. Carley
School of Computer Science
Carnegie Mellon University
5000 Forbes Ave, Pittsburgh, PA 15213, USA
ARTICLE INFO
Keywords:
Deep Learning
Multi-modal learning
Computer vision
Meme-detection
Meme
ABSTRACT
Combining humor with cultural relevance, Internet memes have become a ubiquitous artifact of
the digital age. As Richard Dawkins described in his book The Selfish Gene, memes behave like
cultural genes as they propagate and evolve through a complex process of ‘mutation’ and ‘inher-
itance’. On the Internet, these memes activate inherent biases in a culture or society, sometimes
replacing logical approaches to persuasive argument. Despite their fair share of success on the
Internet, their detection and evolution have remained understudied. In this research, we propose
and evaluate Meme-Hunter, a multi-modal deep learning model to classify images on the Internet
as memes vs non-memes, and compare this to uni-modal approaches. We then use image simi-
larity, meme specific optical character recognition, and face detection to find and study families
of memes shared on Twitter in the 2018 US Mid-term elections. By mapping meme mutation in
an electoral process, this study confirms Richard Dawkins’ concept of meme evolution.
1. Introduction
Richard Dawkins first coined the word meme in his now famous book The Selfish Gene Dawkins (2006). He
developed the word meme by shortening the Greek word mimeme in an effort to create a “...noun that conveys the
idea of a unit of cultural transmission, or a unit of imitation.” Dawkins indicated that memes function like genes for
culture, and can undergo variation, selection, and retention. The meme is further defined as “an idea, behavior, style or
usage that spreads from person to person within a culture” Blackmore, Dugatkin, Boyd, Richerson and Plotkin (2000).
Examples of memes include shaking hands and singing “Happy Birthday”. As such, memes become building blocks
of complex cultures Shifman (2012).
Internet memes include any digital unit that transfers culture. This can be as simple as a phrase or hashtag, such as
the diaosi meme in China Szablewicz (2014) or the #MeToo movement in America. The Internet provides an environment for digital memes to quickly move from person to person, often mutating in the process as initially envisioned
by Dawkins. In 1982 the first emoticon (:-)) was used on Carnegie Mellon University’s online bulletin board in order
to flag humor Davison (2012). As a merger of humor, text, and a symbol, the emoticon became one of the first types
of Internet memes.
While Internet memes can exist as words, emoticons, videos, or gifs, a common form is an image with superimposed
text that conveys some type of merged message. In the earlier days of the Internet, images with superimposed text began
to propagate via Usenet, email, and message boards. By the early 2000’s researchers began to study these specific visual
artifacts that were proliferating. Social networks soon emerged, allowing these memes to go viral.
Given the power of memes to appeal to cultures and sub-cultures, various political actors increasingly use them to
communicate political messaging and shape the beliefs and actions of a society. Canning even goes so far
as to claim that memes have replaced nuanced political debate Canning, Reinsborough and Smucker (2017). Memes
become a simple and effective way to package a message for a target culture. In particular, memes are used to conduct politics, magnify echo chambers, and attack minority groups Peirson, Abel and Tolunay (2018). This has entered the public
discourse with various articles, including the New York Times article “The mainstreaming of political memes online”
Bowles (2018). The increasing use of Internet memes for “information operations” has led to our effort to detect and
characterize memes that inhabit and propagate within given world events and the conversations that surround them.
Few research efforts have attempted to capture a comprehensive dataset of political memes and the network they
travel through in a political election event, and then document how the memes evolve, propagate, and impact the
ORCID (s): 0000-0003-2814-8712 (D.M. Beskow)
D.M. Beskow et al.: Preprint submitted to Elsevier Page 1 of 16
Detecting and Characterizing Internet Memes
Figure 1: Memes used in conjunction with the US 2018 Midterm Elections.
network. Our work will develop a deep learning method to detect memes in social media streams and leverage graph
learning to cluster these images into meme “families”. We will then apply these methods to Twitter data streams
associated with the 2018 US Mid-term elections and the 2018 Swedish National Elections. In addition to contributing
a theoretical framework for classifying and clustering meme images, our research indicates that memes are shared
less but move to more places on the Internet when compared to non-meme images. Memes therefore spread through
different mechanisms than other “viral” content.
This paper is organized as follows. In section Related Work, we describe the history of the Internet memes, prior
work exploring data analysis approaches to study memes, and deep neural networks that have been used on similar
problems. Then in section 3, we propose Meme-Hunter, a deep learning model to find images on the Internet and
classify them as meme vs non-memes. We then use the models to study the usage of memes in two elections in section
4. Finally, we conclude the findings of this research and suggest directions for future work.
2. Related Work
2.1. History of Internet Memes
The study of memes has existed ever since Richard Dawkins introduced the concept in his book ‘The Selfish Gene’
in the 1970’s Davis (2017). Many researchers have attempted to study the relationship between memes and culture.
The advancement in Internet technologies and the world-wide-web (www) gave meme researchers a laboratory with
which to study the spread and mutation of memes. This led to several books on memes, the most influential and
controversial being Blackmore’s The Meme Machine Blackmore (2000); Shifman (2013).
Linor Shifman has conducted extensive research on digital memes from the perspective of journalism and communication. In 2013 Shifman deviated slightly from Dawkins’ original definition and defined the Internet meme as
artifacts that “(a) share common characteristics of content, form, and/or stance; (b) are created with awareness of each
other; and (c) are circulated, imitated, and transformed via the Internet by multiple users” Shifman (2014a,b). She also
differentiates viral content from memetic content. She claims that viral content “is defined here as a clip that spreads
to the masses via digital word-of-mouth mechanisms without significant change.” In contrast, memetic content is “...a
popular clip that lures extensive creative user engagement in the form of parody, pastiche, mash-ups or other derivative
work.”
In 2012 Davison observed that Internet memes typically lack attribution Davison (2012).
Unlike many other creative works, authors of Internet memes typically don’t attach their name to the memes they
create. They remove any traces of attribution from the file and its metadata, and usually introduce memes on sites that
offer anonymity (4chan, Reddit, etc.), where they gain popularity before hopping over to mainstream media (Facebook,
Twitter, etc.) Bauckhage, Kersting and Hadiji (2013). Several theories exist to explain this behavior, but Davison seems to offer the most plausible: anonymity enables a type of freedom. This freedom allows authors to create and
distribute questionable material without concern for retribution from authorities. It is this lack of certain attribution
that encourages malicious and divisive political actors to resort to memes for information operations.
The far-reaching impact of a meme's evolution, combined with its often inherent anonymity, makes memes attractive to various political and propaganda campaigns. The evolutionary nature of memes assists them in ‘hopping’ platforms to move to additional Internet and social media spaces. The natural anonymity of memes allows various actors to make it appear that the distribution of the messages is part of a grassroots movement. Donovan and Friedberg discuss how images can be used as “evidence collages” in a “source hacking” operation Donovan and Friedberg (2019), thereby providing seemingly legitimate evidence of a false event or biased conclusion. It is to these aspects of political and propaganda memes that we apply our research.
2.2. Meme Detection
Deep neural networks (DNN) have shown great success in many fields Hinton, Deng, Yu, Dahl, Mohamed, Jaitly,
Senior, Vanhoucke, Nguyen, Sainath et al. (2012). Researchers have used DNNs for various vision tasks like the
Imagenet Challenge Deng, Dong, Socher, Li, Li and Fei-Fei (2009); Krizhevsky, Sutskever and Hinton (2012) and
fashion recommendation Song, Feng, Han, Yang, Liu and Nie (2018). DNN’s have also been used for various natural
language processing (NLP) tasks like Part of Speech (POS) tagging and named entity recognition Collobert and Weston
Ironically, deep learning has more often been used to automatically generate Internet memes than to detect them. In 2013 Wang et al. Wang and Wen (2015) used copula methods to jointly model text and vision features
with popular votes. In 2018 Peirson et al. Peirson et al. (2018) leveraged deep learning to generate memes in a model
they titled “Dank Learning”.
Xie et al. Xie, Natsev, Kender, Hill and Smith (2011) used YouTube to find short video segments that are frequently
reposted which they call video memes. The authors then created a graph of people and content to model interactions.
Unlike video memes, exploring image memes is more challenging as this requires first classifying an image as meme
or not-meme.
The closest research related to our detection effort is the Memesequencer model developed by Dubey et al. Dubey,
Moro, Cebrian and Rahwan (2018). This research separates the meme image template (underlying image) from the
additional text and image manipulation. After separating the meme template it creates a meme embedding by con-
catenating image features and text features using deep learning, with the best model concatenating ResNet18 with
SkipThought text features. Having created an embedding, the authors construct the evolutionary tree using a phylo-
genetic tree. This research is limited to memes that have identifiable templates found on sites like Memegenerator or
Quickmeme. When used to extract memes for social cybersecurity practitioners, this template-based approach provides high precision but low recall (see below). Our intent with Meme-Hunter is to increase recall.
2.3. Meme Evolution
The digital footprint that Internet memes leave allows researchers to study the propagation of memes through (and
across) networks. Coscia looked at meme propagation and measurements of success in 2013 Coscia (2013). Bauckhage
et al. Bauckhage et al. (2013) explored the temporal models of fads by looking at Internet memes, approximating
interest in a given meme by using Google Trends. Leskovec et al. Leskovec, Backstrom and Kleinberg (2009) used
memes and phrases extracted from news and blogs to track and study the dynamics of the news cycle. This work was
able to map the evolution of text based memes in the news cycle and blogosphere. Ferrara et al. Ferrara, JafariAsbagh,
Varol, Qazvinian, Menczer and Flammini (2013) focused on clustering text based memes.
The closest research to our study of meme evolution is the study by Zannettou et al. Zannettou, Caulfield, Black-
burn, De Cristofaro, Sirivianos, Stringhini and Suarez-Tangil (2018) that clusters image streams based on pHash and
identifies memetic clusters using meme annotation from sites such as “Know Your Meme”. They apply this process to
multiple sources (Twitter, Reddit, 4chan, Gab) and then use Hawkes process to measure which ecosystem has greater
influence. While focused on meme evolution and influence, this paper does not specifically develop a detection model
that generalizes easily beyond the Know Your Meme annotations, once again rendering low recall in detection appli-
cations. Additionally, this paper clusters only based on the image (via pHash) and does not consider the multi-modal
nature of memes when measuring similarity.
2.4. Meme Optical Character Recognition
The classification process requires learning from a composition of image and text characteristics. Extracting text in
memes requires Optical Character Recognition (OCR). OCR on memes can be challenging since most OCR algorithms
are trained to recognize black font on a white background, whereas many memes use white font on a dark background. For
social media image OCR, the state of the art is arguably the Facebook Rosetta system, a deep learning model that
conducts OCR while taking into consideration the background as well Borisyuk, Gordo and Sivakumar (2018). This is
being deployed on Facebook’s platform in order to censor images for extremist messages, allowing Facebook to comply
with increased regulation, particularly in the European Union. Facebook Rosetta output is standard OCR output (text),
and it is not intended to classify memes vs. not-memes. It is also not open sourced or available for researchers (at the
time of this writing).
Our research combines some of the efforts of Zannettou et al. Zannettou et al. (2018) with that of Dubey et al. Dubey
et al. (2018). In doing so, we go beyond both papers by creating a generalizable multi-modal meme detection model
that is not constrained by annotated entries on a site like Know Your Meme. Additionally, we develop the evolutionary
graph with a radius nearest neighbors approach and apply this specifically within the online debate around a large
election event (2018 Mid-term elections). This provides the research community with a generalizable multi-modal meme detection model, a new way to build an evolutionary tree, a meme OCR pipeline, and insights into meme impact and propagation within political conversations. Additionally, this model provides an approximately eight-fold increase in recall over the template-based methods that Dubey and Zannettou propose. This increase in detection recall is especially important for social cybersecurity practitioners.
3. Classifying Images as Memes
Most images shared on platforms like Twitter are not memes (see Table 3 for statistics). Therefore, to explore the usage
of memes, it is essential to first classify if an image is a meme or not. While visual Internet memes come in a wide
variety of formats, we restricted our classification to two types that are commonly found. These two types are found
in Figure 4 and can be described as:
1. A picture with superimposed white text in impact font. Impact font was developed in the 1960’s by Geoff Lee
and is the font of choice for text over image Edwards. This is illustrated in Figure 4a.
2. Text placed in a white space over a picture, as is shown in Figure 4b.
While this seems restrictive, we will show later that, even with this constraint, our approach finds more memes (i.e.
higher recall) than template-based methods.
Given enough meme vs non-meme data, it could be possible for a neural-network model to learn to extract text
(using OCR), extract faces and other meme characteristics to classify memes. However, in a limited data setting like
ours, this approach is likely to fail, as OCR is a research domain in itself. Consequently, we propose to first extract text and face encodings and use them as supplementary input features. Then, to predict whether an image is a meme or non-meme, we explore deep learning based multi-modal (multiple-feature) models that use the extracted features in addition to the raw images.
Next, we describe our models, our data collection effort to get meme and non-memes data to train the models, the
process of training the models, and the classification performance.
3.1. Meme Classification Models
As mentioned earlier, we first extract text and face encodings, so here we explain the process of extracting text and
face encodings from images.
Text Extraction For Optical Character Recognition (OCR) we combined meme specific image preprocessing with an
open source OCR tool. When images contained white font over dark background, we preprocessed the images by 1)
converting the image to grayscale, 2) binarizing the image, and 3) inverting every bit in the binary image array. These
Figure 2: OCR Pipeline for Meme Images.
image preprocessing steps are illustrated in Figure 2. OCR on preprocessed images was accomplished with Google
Tesseract Smith (2007). If images already had black text on white background, no preprocessing was applied. Our
experiments indicated that preprocessing significantly improved Tesseract’s OCR on meme images. Baseline Tesseract required an average of 49.8 ± 13.8 character edits (Levenshtein distance) with only 2% of strings readable. Preprocessing reduced this to an average of 17.5 ± 4.8 character edits, with 72% of strings remaining readable.
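The three preprocessing steps can be sketched in a few lines of NumPy. The function name and the fixed threshold of 128 are our own illustrative choices, not taken from the paper's implementation:

```python
import numpy as np

def preprocess_for_ocr(rgb, threshold=128):
    """Prepare a white-text-on-dark meme image for OCR by
    1) converting to grayscale, 2) binarizing, and 3) inverting,
    which yields dark text on a light background."""
    # 1) grayscale using the standard luminance weights
    gray = rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114
    # 2) binarize: bright (text) pixels -> 1, dark background -> 0
    binary = (gray >= threshold).astype(np.uint8)
    # 3) invert every bit so the text becomes dark on light
    return 1 - binary
```

The resulting array, scaled back to the 0–255 range, could then be handed to an OCR engine such as Tesseract (e.g. via the pytesseract bindings).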
Human Face Encoding As faces are an important element of memes, we extract facial features using the open source
face detection software package called face_recognition, created by Adam Geitgey and made available at Geitgey
(2019). The library returns a face encoding vector for each face found in the image. We use these vectors as the input
to our classification models.
We tried four different groups of classifiers: 1) unimodal classification using only text; 2) unimodal classification using only machine vision; 3) multimodal classification using text and vision; and 4) multimodal classification using text, vision, and face encoding.
LSTM based text classifier In this unimodal model, we use only the extracted text from images as the input for meme
classification. Long Short Term Memory (LSTM) Hochreiter and Schmidhuber (1997) networks are very popular for
text classification. An LSTM takes a word embedding and a hidden vector as input and outputs a new hidden vector.
At the end of the text (input), a fully-connected layer followed by a softmax layer is used to predict the label of the
text. We used Glove vectors Pennington, Socher and Manning (2014) as the input word embeddings. In our results, we
provide several other text only models for comparison, including Naïve Bayes, Support Vector Machines, and Logistic
Regression.
CNN based image classifier Given that our work focuses on image based meme detection, and Convolutional Neural
Networks (CNNs) are the most popular models for visual learning, it is natural for us to consider a CNN based model.
For this work, we tried a number of pre-trained models including VGG18 Simonyan and Zisserman (2014), ResNet18 He, Zhang, Ren and Sun (2016), and Inception-V3 Szegedy, Vanhoucke, Ioffe, Shlens and Wojna (2016). For classification,
we removed the last fully connected layer of the pre-trained network, added a new fully connected network followed by
a sigmoid layer. We also explored freezing all layers, freezing some of the layers, and freezing none of the layers during training. In the end, allowing the weights of all layers to update provided the best results. We also include
results that extract descriptors with scale-invariant feature transform (SIFT) and Bag-of-Visual-Words (BOVW) feature
representation and support vector machine classification. The SIFT-BOVW model is provided to demonstrate DNN
improvement over pre-DNN models.
Joint DNN model The joint DNN model first combines the LSTM and CNN discussed above, and then combines the LSTM/CNN with face encoding features in a single model. The model’s architecture is shown in Fig. 3. As shown in the figure, the outputs of the LSTM, the CNN, and the face encodings are concatenated into a single vector. The concatenated vector is then used as the input to a dense fully connected layer followed by a sigmoid activation. All parts of the model are trained jointly.
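A minimal NumPy forward pass illustrates the fusion head. The feature dimensions, weight initialization, and function names below are illustrative assumptions; in the actual model the dense weights are learned jointly with the LSTM and CNN:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fusion_head(vision_feat, text_feat, face_feat, W, b):
    """Concatenate the three modality vectors into one, then apply a
    dense layer and a sigmoid to produce P(image is a meme)."""
    fused = np.concatenate([vision_feat, text_feat, face_feat])
    return sigmoid(W @ fused + b)

rng = np.random.default_rng(0)
vision = rng.normal(size=512)   # e.g. CNN penultimate-layer features
text = rng.normal(size=50)      # e.g. LSTM hidden state (size 50)
face = rng.normal(size=128)     # face_recognition yields 128-d encodings
W = rng.normal(size=512 + 50 + 128) * 0.01  # untrained dense weights
p = fusion_head(vision, text, face, W, 0.0)
```

The dense layer must accept the sum of the three feature widths, which is why late fusion of this kind is easy to extend with further modalities.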
[Figure 3 diagram: a sample meme (“Give that meme-maker a life imprisonment”) passes through text extraction into an LSTM and through face extraction into a face encoding, alongside ResNet18 image features; the concatenated outputs feed a softmax that predicts meme/non-meme.]
Figure 3: Joint Model for meme classification.
In the last connected layer we use a sigmoid (or logistic) function to generate the probability of the image being a meme. The sigmoid function is defined as

𝜙(z) = 1 / (1 + exp(−z))
3.2. Data
To label Internet meme images for supervised learning, we searched meme images on Reddit, Twitter, Tumblr,
Google Image Search, Flickr, and Instagram. Collecting images from these platforms, we were able to find 25,109
meme images. The meme data contained varied meme categories, including sports, politics, celebrities, and animals.
While the dominant language is English, other languages include French, Spanish, German, Russian, Japanese, Arabic,
and Chinese. The non-meme images were collected at random from Twitter and Google Image search.
In the training data we filtered out non-meme images that didn’t contain either text or a background photo. This
was done so that the algorithm would learn the unique attributes of meme images as opposed to just learning to identify
the presence or absence of text. In order to filter for text, we needed to conduct text detection but not necessarily text
recognition. We found that the Efficient and Accurate Scene Text (EAST) detection model Zhou, Yao, Wen, Wang,
Zhou, He and Liang (2017) performed better at detecting text than the Tesseract based OCR pipeline discussed earlier.
Note that the EAST model detects the location of text in an image but does not recognize or extract the text. We used the
EAST algorithm to filter out any images that didn’t have at least one text bounding box. Having removed images that
don’t contain text, we discovered that we also needed to remove images that don’t contain a photograph. This decision
was made after finding many black and white document images, particularly in political conversations. To remove
document images, we developed a heuristic that measured the mean Red Green Blue (RGB) score for the image, and
removed it if the mean score was greater than 220. This proved to be fast and easily removed document images without
removing memes of interest. This filter was applied in both the training process as well as the production algorithm.
Image is document if (Red + Green + Blue) / 3 > 220
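The heuristic reduces to a single mean over the pixel array, since averaging (Red + Green + Blue) / 3 per pixel and then over all pixels equals the mean over all channel values. The function name and toy images below are our own sketch:

```python
import numpy as np

def is_document_image(rgb, threshold=220):
    """Flag near-white 'document' images: mean per-pixel
    (Red + Green + Blue) / 3 above 220 on a 0-255 scale."""
    return float(rgb.astype(float).mean()) > threshold

white_page = np.full((4, 4, 3), 250, dtype=np.uint8)  # scanned-document-like
photo = np.full((4, 4, 3), 120, dtype=np.uint8)       # ordinary photograph
```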
We summarize the final model training dataset in Table 1. The 50,209 images were mixed with equal portions of
meme and not-meme images. The data was then randomly split into training data (80%), validation data (10%), and
held out test data (10%).
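The 80/10/10 split can be reproduced with a shuffled index partition; the seed and helper name are illustrative:

```python
import numpy as np

def split_indices(n_images, seed=0):
    """Shuffle image indices into 80% train, 10% validation,
    and 10% held-out test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_images)
    n_train = int(0.8 * n_images)
    n_val = int(0.1 * n_images)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = split_indices(50209)  # dataset size from Table 1
```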
(a) Type A Meme (b) Type B Meme
(c) Saliency in Type A Meme (d) Saliency in Type B Meme
Figure 4: Two types of memes used for meme classification with their respective saliency maps. Saliency maps are
computed by averaging pooled gradients across channels.
Table 1
Classification Dataset Statistics.
Total Images Memes Non-memes
50,209 25,109 25,100
Collecting images from social media streams often includes some amount of abusive language and adult content
images. Practitioners using our methods who want to minimize the impact of this sensitive content should apply an appropriate filter. In our case we used Yahoo’s open source “Not Safe For Work” (NSFW) filter (https://github.com/yahoo/open_nsfw).
3.3. Experiments and Results
For the meme classification task, we define the overall objective function using cross-entropy loss, as seen in Equation 1, where i indexes the n samples, j ∈ {meme, non-meme} indexes the classes, y is the (one-hot) true label, and p is the probability output for each label.

ℒ(y, p) = −(1/n) Σᵢ,ⱼ yᵢⱼ log(pᵢⱼ)    (1)
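Equation 1 can be written directly in NumPy; the function name, sample probabilities, and the small eps guard against log(0) are our own illustrative choices:

```python
import numpy as np

def cross_entropy(y, p, eps=1e-12):
    """Mean cross-entropy over n samples (Equation 1): y holds the
    one-hot true labels, p the predicted class probabilities."""
    n = y.shape[0]
    return -np.sum(y * np.log(p + eps)) / n

# two samples, classes ordered (meme, non-meme)
y = np.array([[1.0, 0.0], [0.0, 1.0]])
p = np.array([[0.9, 0.1], [0.2, 0.8]])
loss = cross_entropy(y, p)  # only the true-class log-probabilities count
```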
Our primary metric of interest is the F1 score, defined as the harmonic mean of precision and recall. We used
this as our primary metric since it balances the often competing priorities of precision and recall. In our results we also
provide accuracy, precision, and recall for interpretability.
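As a worked check, applying the harmonic mean to the best model's precision and recall from Table 2 recovers its reported F1:

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Vision + Text + Face row of Table 2: precision 0.959, recall 0.963
f1 = f1_score(0.959, 0.963)  # rounds to the reported 0.961
```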
All models are built using the Keras library1 with a Tensorflow backend2. As described earlier, the models use text,
1https://keras.io/
2https://www.tensorflow.org/
Table 2
Classification Results.

Type         Model                 Accuracy  F1     Precision  Recall
Text         Logistic Regression   0.724     0.719  0.735      0.703
Text         Naïve Bayes           0.681     0.607  0.793      0.492
Text         SVM                   0.721     0.714  0.736      0.693
Text         LSTM                  0.799     0.805  0.786      0.824
Vision       SIFT-BOVW             0.798     0.788  0.828      0.752
Vision       Baseline CNN          0.939     0.938  0.946      0.930
Vision       VGG18                 0.915     0.916  0.909      0.923
Vision       ResNet18              0.926     0.927  0.907      0.948
Vision       Inception-V3          0.958     0.958  0.952      0.964
Multi-modal  Vision + Text         0.954     0.954  0.943      0.965
Multi-modal  Vision + Text Length  0.952     0.951  0.947      0.956
Multi-modal  Vision + Text + Face  0.961     0.961  0.959      0.963
face-encoding, and image features as the input and a sigmoid layer for the class label prediction. The models are trained
using stochastic gradient descent with a cross-entropy loss function as seen in Equation 1. The learning rate was used as a hyper-parameter and varied from 10⁻³ to 10⁻¹. The LSTM hidden layer size was varied from 16 to 256. We found that a hidden layer size of 50 and a learning rate of 10⁻³ worked well. These hyper-parameters were then fixed during the training and testing process.
We compare the performance of the models in Table 2 and show the training plots in Fig. 5. We train the models for only 10 epochs since performance plateaus after that. As we can observe from the plots, most of the learning is done in the first epoch and validation accuracy is high thereafter. From these results we see that the LSTM model is significantly better than the other text models. Within the vision models, we see that all DNN models show significant improvement over the SIFT-BOVW model, with the very deep Inception-V3 model providing the best performance across all metrics. We also see that the multi-modal models provide a slight improvement over the unimodal vision models.
Model saliency maps Simonyan, Vedaldi and Zisserman (2013) are provided in Figures 4c and 4d. Saliency maps show
the salient pixels that are important for a given class and are computed by averaging pooled gradients across channels.
From these saliency maps we see that the model is indeed learning to identify images where text is positioned in pixel locations indicative of meme images. Overall, we can summarize the results by noting that unimodal machine vision models provide solid performance in meme detection, and can be enhanced (at a computational cost) with multi-modal text-based features.
4. Evaluating Memes in Election Events
4.1. Finding Memes
We used the DNN model to classify images used in the 2018 US Midterm Elections and the 2018 Swedish National
Elections. We will focus on the 2018 US Midterm election data because it provides the largest meme collection, but
the 2018 Swedish election data is provided in Table 3 for comparison purposes. For the US midterm elections, we
collected all tweets that mentioned a member of congress or congressional candidate. For the Swedish elections, we
collected tweets containing hashtags associated with anti-immigrant and nationalistic movements (#svpol, #Val2018,
#feministisktInitiativ, #migpol, #valet2018, #SD2018, #AfS2018, and #MEDval18). Note that the Swedish election
data does not cover the full spectrum of politics in Sweden, but the US Midterm election data does cover the full
spectrum of politics in the United States. We downloaded all images from both data sets in February of 2019. As
indicated below, approximately 9% of the images weren’t available (the account or tweet was suspended by Twitter or
removed by the account owner). The statistics for both data sets are provided in Table 3.
We conducted binary classification with our trained DNN model on all images extracted from both data streams.
A collage of examples that we classified as memes in the US mid-term elections is provided in Figure 1.
[Figure 5 plot: training and validation accuracy versus epoch for the Vision, Vision + Text, and Vision + Text + Face models.]
Figure 5: Comparing training and test performance of different models.
4.2. Mapping Meme Evolution in Political Conversations
Given the rich vision/text data that we had, we wanted to map the evolution of visual memes using similarity clus-
tering. By clustering these images, we can not only identify the families but also the connections between the families
of memes. We explored several proven methods for measuring image similarity, including Color Histograms Novak
and Shafer (1992), Scale-Invariant Feature Transform (SIFT) Lowe (2004), Perceptual Hashing (pHash) Chamoso, Ri-
vas, Martín-Limorti and Rodríguez (2017), and a method similar to the Deep Ranking Wang, Song, Leung, Rosenberg,
Wang, Philbin, Chen and Wu (2014). Similar methods have been used with K-nearest neighbors for image annota-
tion Su and Xue (2015) and with mapReduce by Google for clustering billions of images Liu, Rosenberg and Rowley
(2007). Our initial experiments reveal that the deep ranking method (using features extracted from the last layer before
softmax and evaluated with Euclidean distance) performs well. To identify the families of memes, we then use graph learning with the fixed-radius nearest neighbors algorithm Bentley (1975). Fixed-radius nearest neighbors finds the neighbors within a given radius of a point or points. We chose the fixed-radius method over the K-nearest neighbors method since the sizes of our meme families vary widely. This technique also allows us to quickly query similar images based
on a fixed distance radius.
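A brute-force fixed-radius query over deep-ranking feature vectors can be sketched as follows; the toy 2-d feature space and function name are illustrative:

```python
import numpy as np

def radius_neighbors(features, query_idx, radius):
    """Brute-force fixed-radius nearest neighbours: indices of all
    images whose feature vectors lie within `radius` (Euclidean
    distance) of the query image, excluding the query itself."""
    dists = np.linalg.norm(features - features[query_idx], axis=1)
    hits = np.flatnonzero(dists <= radius)
    return hits[hits != query_idx]

# three images embedded in a toy 2-d feature space
feats = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
close = radius_neighbors(feats, 0, 1.5)
```

Unlike K-nearest neighbors, the result size is not fixed: a meme in a large family returns many neighbours, an isolated image returns none.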
Given a meme, we use a brute-force radius neighbour search to find the mutations of the meme. We first attempted the ball tree algorithm Omohundro (1989), which partitions meme features into a nested set of fixed-dimensional hyper-spheres (balls) such that each hyper-sphere contains a set of memes based on their distance from the ball's center. Although the ball tree was designed for high dimensionality, we found that it is still computationally expensive with more than 120 features. With 25,088 features, the ball tree algorithm was not practical, and we resorted to the brute-force algorithm. Once we have the neighbours of a meme, we can use the posting time associated with each meme to generate a directed graph of meme mutations. We recurse the whole process over the neighbours to get the next set of neighbours and add them to the graph. We stop the recursion after a fixed number of steps or once the maximum size of the graph is attained. The algorithm is summarized below (Algorithm 1). The map of all nodes
and links for the 2018 US Midterm elections is provided in Figure 6. In this we clearly see the clusters of similar
images (or “families”), as well as some of the connections between them.
Having mapped the individual “families” of memes, we used this similarity clustering and the date-time information
from the Tweet metadata to map the chronological evolution of specific memes as seen in Figure 7. In these images we
D.M. Beskow et al.: Preprint submitted to Elsevier Page 9 of 16
Detecting and Characterizing Internet Memes
Figure 6: Graph Learning with Fixed Radius Nearest Neighbors showing families of memes in the US 2018 mid-term
elections (89K nodes and 1.87M links). Network visualization is done with Graphistry (https://www.graphistry.com/).
Algorithm 1 Meme Mutation Graph Algorithm
1: procedure GETMUTATIONGRAPH(Meme m)
2:     memes_graph ← new dictionary
3:     neighbours ← radius neighbours of m
4:     for b_i in neighbours do
5:         if b_i not in memes_graph then
6:             Add b_i to memes_graph
7:     for b_i in neighbours do
8:         if size(memes_graph) < exit_condition then
9:             child_memes_graph ← getMutationGraph(b_i)
10:            Add child_memes_graph to memes_graph
11: return memes_graph
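The procedure above can be sketched in Python as a breadth-first expansion (the recursion unrolled into a queue). The helper names `neighbors_of` and `post_time` are illustrative stand-ins for the radius-neighbor query and the Tweet timestamp lookup; the toy neighborhood is made up.

```python
def mutation_graph(seed, neighbors_of, post_time, max_size=1000):
    """Build a directed meme-mutation graph from a seed image.

    `neighbors_of(m)` returns memes within the fixed feature-space radius
    of m; `post_time(m)` gives its posting timestamp. Edges point from
    the earlier image to the later one, i.e. parent -> mutation.
    """
    edges, seen, frontier = set(), {seed}, [seed]
    while frontier and len(seen) < max_size:
        current = frontier.pop(0)
        for nb in neighbors_of(current):
            if nb not in seen:                  # recurse over the neighbours
                seen.add(nb)
                frontier.append(nb)
            earlier, later = sorted((current, nb), key=post_time)
            edges.add((earlier, later))         # orient edge chronologically
    return seen, edges

# Toy neighbourhood: A ~ B ~ C, with D unconnected
nbrs = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}
times = {"A": 0, "B": 1, "C": 2, "D": 5}
nodes, edges = mutation_graph("A", nbrs.__getitem__, times.__getitem__)
print(sorted(nodes), sorted(edges))  # -> ['A', 'B', 'C'] [('A', 'B'), ('B', 'C')]
```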
see the cultural evolution that was originally envisioned by Richard Dawkins. We also see Limor Shifman's definition of memes play out as these meme images “lure extensive creative user engagement.”
4.3. Results and Findings
4.3.1. Meme Usage in Election Events
Having identified memes thriving in the online conversation around these election events, we calculated descriptive
statistics regarding memes and the accounts that share them. These descriptive statistics are provided in Table 3. In this
table we make several observations that help us understand meme popularity and virality. First, we see that, although
images are generally popular (high retweet/likes), memes are not. In both events, memes had fewer retweets and likes
than other images, and in the US election memes had a shorter “life-span” on average. We hypothesize that the reason
behind this is that attributed users do not want to associate their reputation with a controversial political meme and its
message. For the same reasons that meme creators disassociate themselves from the memes they create, social media
users, while influenced by memes, are hesitant to like or retweet them, especially polarizing political memes. If this
Figure 7: Political conversations within and between political left and political right.
is the case, then the virality of memes may not be due to normal social media activity (like, share, retweet), but rather occurs through the selection, retention, and mutation that Dawkins originally described. The memes mutate, carrying pieces of the original message, and are reintroduced in other corners of the Internet.
We hypothesized that bots could be used to push memes on social media. Using the bot-hunter bot prediction tool Beskow and Carley (2018) with a probability threshold of 0.6, we estimated the proportion of accounts that have bot-like characteristics. In the Swedish data we found slightly higher bot involvement with memes, but did not find this in the US election data. From this analysis we conclude that bot activity did not play an outsized role in meme propagation for either of these events.
Additionally, we conducted face detection on the US election memes to find 18 prominent US politicians in the
meme data. To do this we leveraged the open source face detection software created by Adam Geitgey and made
available at Geitgey (2019), using a comparison threshold of 0.54. Using this face detection software, we found the
distribution of memes by politician provided in Figure 8.
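The attribution step reduces to a nearest-encoding lookup: the face-recognition library represents each face as a 128-dimensional encoding, and a detected face is attributed to a politician only if the Euclidean distance to a reference encoding falls under the 0.54 threshold. A minimal sketch, in which the tiny 2-D vectors stand in for real encodings:

```python
import numpy as np

def match_politician(face_encoding, known_encodings, names, threshold=0.54):
    """Return the closest known face, or None if no reference encoding
    is within the comparison threshold (0.54, as used in our analysis)."""
    dists = np.linalg.norm(known_encodings - face_encoding, axis=1)
    best = int(np.argmin(dists))
    return names[best] if dists[best] < threshold else None

# Stand-in 2-D 'encodings'; real ones are 128-dimensional
known = np.array([[0.0, 0.0], [1.0, 1.0]])
names = ["Politician A", "Politician B"]
print(match_politician(np.array([0.1, 0.0]), known, names))  # -> Politician A
print(match_politician(np.array([5.0, 5.0]), known, names))  # -> None
```

Returning None for faces outside the threshold keeps unrecognized faces from being forced onto the nearest politician.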
In Figure 9 we plot the posting or retweeting of meme images in the 2018 US Election by the political party of the candidate mentioned. Note that politicians and candidates are mentioned in both positive and negative memes. We see the highest volume of memes mentioning Democrats and Republicans in the time immediately after the Kavanaugh hearings.
4.3.2. Meme Propagation Across Platforms
Given the evolutionary and anonymous nature of memes, we hypothesized that memes propagate across the Internet differently than other viral content. Viral content generally spreads through the simple mechanisms of sharing, retweeting, liking, etc. Memes, as noted above, are not liked or retweeted nearly as much as other media content. We believe that their propagation occurs more through mutation and evolution, where one meme generates other creative works that emerge in other parts of the Internet. This would cause memes to ‘hop’ to more platforms and domains than normal images. While propagating to new corners of the Internet, however, the memes will undoubtedly morph, and this mutation is beyond the control of the original creators.
To assess this hypothesis, we sampled 5,000 meme images and 5,000 non-meme images from images associated
Table 3
Descriptive Statistics about Internet Memes in Online Election Conversations.

                         2018 Sweden Election          US Midterm Election
Total Tweets             661K                          62,034K
Total Users              88K                           2,695K
Suspended/removed        1,616/2,302                   41,901/47,349
Total Images Shared      47K                           4,446K
Total Images Available   43K                           4,037K

                         no image  meme   normal       no image  meme   normal
                                          image                         image
# Images Available       –         5K     38K          –         497K   3,539K
# of Unique Images       –         1.5K   10K          –         175K   951K
% of bot-like accounts   0.32      0.35   0.31         0.37      0.32   0.28
Life of tweet (hours)    0.51      0.60   0.59         21.80     16.02  22.87
Mean retweets            26        15     33           3,492     237    3,478
Mean Likes               0.84      1.50   2.03         15.96     24.42  65.48
User Median Followers    246       259    224          594       190    258
User Median Friends      348       401    340          857       375    407
Figure 8: Memes by Politician (identified by Face Detection).
with the 2018 US Mid-term elections. All images were political in semantic and visual content. We then conducted a
reverse image lookup or web-detection using the Google Vision API. This service provided us with links to matching
and partially matching images on the Internet. The 5,000 meme images had 62,475 matching links associated with
9,536 unique domains. The 5,000 non-meme images had only 13,617 total links associated with only 4,731 unique
domain names. The memes therefore were connected to roughly 4 times the number of links and twice the number
of domains when compared to non-meme images, supporting the hypothesis that memes propagate to more corners of
the Internet than other types of media.
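The spread measurement reduces to counting matching links and the unique domains behind them. A minimal sketch, assuming the reverse-image lookup has already returned a list of matching URLs (the URLs below are made up):

```python
from urllib.parse import urlparse

def domain_spread(matching_links):
    """Summarize how widely an image has 'hopped' across the web:
    total matching links and the number of unique domains hosting them."""
    domains = {urlparse(url).netloc for url in matching_links}
    return len(matching_links), len(domains)

links = ["https://example.com/a", "https://example.com/b",
         "https://forum.example.org/thread/42"]
print(domain_spread(links))  # -> (3, 2)
```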
Figure 9: Memes (both positive and negative) by Political Party of Candidate mentioned (counts from Sep 01 to Nov 01; parties: Democrat, Other, Republican).
4.4. Comparison to Past Methods
In our review of related work, we noted several research efforts that leverage meme templates. These include the multi-modal work of Dubey et al. Dubey et al. (2018) and the meme evolution work of Zannettou et al. Zannettou et al. (2018). While Dubey uses this technique for virality prediction and clustering, we primarily want to compare their approach to Meme-Hunter on the task of image retrieval (i.e., extracting all meme images in a given social media stream). The primary limitation of their work is that it is constrained to identifying memes found on sites like Memegenerator or Quickmeme. As we illustrate below, this approach, while achieving high precision, finds very few of the total memes in election-related social media streams (low recall). The Meme-Hunter approach that we propose, while limited to only two types of memes, typically finds at least 8 times more memes in election-related social media streams than approaches constrained by meme templates.
To evaluate both methods, we randomly sampled 1,050 images from the Swedish election stream and 1,050 images from the 2018 Midterm election stream. We then manually labeled any image that could be construed as an Internet meme as defined by Dawkins and Shifman, ran our Meme-Hunter approach, and compared it to a template-based approach.
To replicate a template-based approach, we collected 39,112 meme templates from the Meme Generator web application found at https://imgflip.com/memegenerator. This set included most popular and even less popular meme templates, including templates associated with politicians and world leaders. We then used perceptual hashing (pHash) to identify any image in the test set that used one of the meme templates. Positive matches were those hashes that required fewer than 10 substitutions in a Hamming distance comparison; positive matches were then considered memes.
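The comparison can be sketched directly: treat each pHash as a 64-bit integer and count differing bits. The hash values below are toy integers; in practice they would come from a pHash implementation run over the template and test images.

```python
def hamming(h1, h2):
    """Number of bit substitutions between two perceptual hashes."""
    return bin(h1 ^ h2).count("1")

def matches_template(image_hash, template_hashes, max_dist=10):
    """Flag an image as a template match if any template pHash is within
    `max_dist` bit substitutions (fewer than 10, as described above)."""
    return any(hamming(image_hash, t) < max_dist for t in template_hashes)

print(hamming(0b1010, 0b0011))                 # -> 2
print(matches_template(0xFFFF00, [0xFFFF01]))  # -> True (distance 1)
```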
Meme-Hunter was applied with unimodal machine vision models as well as multi-modal models, as shown in Table 4. In this comparison we see that, while template-based approaches offer high accuracy and precision, the recall in both election streams is only approximately 5%. In these very dynamic political dialogues, many images that are construed as memes are not yet in the template databases, meaning that template-based methods will find only 5% of the memes in these streams. The Meme-Hunter approach, while offering slightly lower accuracy and precision, is able to find far more memes, with the InceptionV3 unimodal model and the multi-modal models providing the highest performance across all metrics. With regard to accuracy, multi-modal models consistently outperform unimodal models. The top models using the Meme-Hunter DNN approach find approximately 50% of the memes in both streams.
In this comparison we also want to comment on the lower performance of Meme-Hunter in the US Midterm stream
compared to the Swedish election stream. This is the result of more sophisticated memes being used in the US election
Table 4
Comparing Meme-Hunter to meme-template-based approaches for finding memes in social media streams.

                       Sweden                               US Midterms
Model                  Accuracy  F1     Precision  Recall   Accuracy  F1     Precision  Recall
Template Based         0.872     0.107  0.727      0.058    0.795     0.100  0.667      0.054
VGG18                  0.809     0.437  0.358      0.561    0.771     0.348  0.435      0.290
ResNet18               0.846     0.464  0.429      0.504    0.806     0.430  0.562      0.348
Inception V3           0.820     0.488  0.391      0.647    0.807     0.494  0.550      0.448
Vision + Text          0.865     0.510  0.490      0.532    0.815     0.455  0.600      0.367
Vision + Text + Face   0.858     0.511  0.470      0.561    0.812     0.439  0.592      0.348
stream, some of which are the products of elaborate photo-editing workflows and contain no text. Others contain vertical or specially placed text. Meme-Hunter struggles to positively identify these more sophisticated memes.
5. Conclusion
In this paper we present a method for using deep learning to classify memes and graph learning to cluster them into
their evolutionary “families”. Additionally, these models were used to analyze meme usage inside two large democratic
election events. We found that Meme-Hunter provided at least 8 times higher recall than template-based methods and
that graph learning is able to capture the overall structure of the evolutionary tree. Having identified memes in large
election events, we found evidence that memes are liked and retweeted less, but families of memes ‘hop’ platforms
and travel to more locations of the Internet than regular images. This indicates that memes do not propagate across
social media and the Internet in the same way as other viral content.
The organic and evolutionary nature of memes has caused some nation states to ban them McDonell (2017), while
encouraging other nations to leverage them as part of elaborate propaganda operations Groll (2018). The countries
that ban them do so largely because memes evolve outside of the control of the state and because image memes can be
difficult to trace Abad-Santos (2013). Those countries that leverage them for information warfare do so for the exact
same reasons. We hope that our proposed methods for studying memes will make it easier to trace them for beneficial purposes.
In future work we plan to use the learned graph and dynamic network analysis to analyze the evolution of the meme
families over time.
6. Acknowledgements
This work was supported in part by the Office of Naval Research (ONR) Award N00014182106 Group Polarization
in Social Media and the Center for Computational Analysis of Social and Organization Systems (CASOS). The views
and conclusions contained in this document are those of the authors and should not be interpreted as representing the
official policies, either expressed or implied, of the ONR or the U.S. Government.
References
Abad-Santos, A., 2013. How memes became the best weapon against Chinese internet censorship. The Atlantic. https://www.theatlantic.com/international/archive/2013/06/how-memes-became-best-weapon-against-chinese-internet-censorship/314618/. (Accessed on 04/06/2019).
Bauckhage, C., Kersting, K., Hadiji, F., 2013. Mathematical models of fads explain the temporal dynamics of internet memes., in: ICWSM, pp.
22–30.
Bentley, J.L., 1975. A Survey of Techniques for Fixed Radius near Neighbor Searching. Technical Report.
Beskow, D., Carley, K.M., 2018. Introducing bothunter: A tiered approach to detection and characterizing automated activity on twitter, in: Bisgin,
H., Hyder, A., Dancy, C., Thomson, R. (Eds.), International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and
Behavior Representation in Modeling and Simulation, Springer.
Blackmore, S., 2000. The Meme Machine. volume 25. Oxford Paperbacks.
Blackmore, S., Dugatkin, L.A., Boyd, R., Richerson, P.J., Plotkin, H., 2000. The power of memes. Scientific American 283, 64–73.
Borisyuk, F., Gordo, A., Sivakumar, V., 2018. Rosetta: Large scale system for text detection and recognition in images, in: Proceedings of the 24th
ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM. pp. 71–79.
Bowles, N., 2018. The mainstreaming of political memes online. New York Times. URL: https://www.nytimes.com/interactive/2018/02/09/technology/political-memes-go-mainstream.html.
Canning, D., Reinsborough, P., Smucker, J.M., 2017. Re: Imagining Change: How to Use Story-based Strategy to Win Campaigns, Build Move-
ments, and Change the World. Pm Press.
Chamoso, P., Rivas, A., Martín-Limorti, J.J., Rodríguez, S., 2017. A hash based image matching algorithm for social networks, in: International
Conference on Practical Applications of Agents and Multi-Agent Systems, Springer. pp. 183–190.
Collobert, R., Weston, J., 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning, in: Proceed-
ings of the 25th international conference on Machine learning, ACM. pp. 160–167.
Coscia, M., 2013. Competition and success in the meme pool: A case study on quickmeme. com., in: ICWSM.
Davis, N., 2017. The Selfish Gene. Macat Library.
Davison, P., 2012. The language of internet memes. The social media reader , 120–134.
Dawkins, R., 2006. The selfish gene: With a new introduction by the author. UK: Oxford University Press.(Originally published in 1976) .
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L., 2009. Imagenet: A large-scale hierarchical image database, in: 2009 IEEE conference
on computer vision and pattern recognition, Ieee. pp. 248–255.
Donovan, J., Friedberg, B., 2019. Source hacking: Media manipulation in practice. Retrieved from Data & Society website: https://datasociety.net/output/source-hacking-media-manipulation-in-practice.
Dubey, A., Moro, E., Cebrian, M., Rahwan, I., 2018. Memesequencer: Sparse matching for embedding image macros, in: Proceedings of the 2018
World Wide Web Conference, International World Wide Web Conferences Steering Committee. pp. 1225–1235.
Edwards, P., . The reason every meme uses that one font. Vox. https://www.vox.com/2015/7/26/9036993/meme-font-impact. (Accessed on 02/20/2019).
Ferrara, E., JafariAsbagh, M., Varol, O., Qazvinian, V., Menczer, F., Flammini, A., 2013. Clustering memes in social media, in: 2013 IEEE/ACM
International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013), IEEE. pp. 548–555.
Geitgey, A., 2019. Face recognition. https://github.com/ageitgey/face_recognition.
Groll, E., 2018. How Russia hacked U.S. politics with Instagram marketing. Foreign Policy. https://foreignpolicy.com/2018/12/17/how-russia-hacked-us-politics-with-instagram-marketing/. (Accessed on 04/06/2019).
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition, in: Proceedings of the IEEE conference on computer vision
and pattern recognition, pp. 770–778.
Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A.r., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., et al., 2012. Deep neural
networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29, 82–97.
Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural computation 9, 1735–1780.
Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. Imagenet classification with deep convolutional neural networks, in: Advances in neural infor-
mation processing systems, pp. 1097–1105.
Leskovec, J., Backstrom, L., Kleinberg, J., 2009. Meme-tracking and the dynamics of the news cycle, in: Proceedings of the 15th ACM SIGKDD
international conference on Knowledge discovery and data mining, ACM. pp. 497–506.
Liu, T., Rosenberg, C., Rowley, H.A., 2007. Clustering billions of images with large scale nearest neighbor search, in: 2007 IEEE Workshop on
Applications of Computer Vision (WACV’07), IEEE. pp. 28–28.
Lowe, D.G., 2004. Distinctive image features from scale-invariant keypoints. International journal of computer vision 60, 91–110.
McDonell, S., 2017. Why China censors banned Winnie the Pooh. BBC News. https://www.bbc.com/news/blogs-china-blog-40627855.
(Accessed on 04/06/2019).
Novak, C.L., Shafer, S.A., 1992. Anatomy of a color histogram, in: Proceedings 1992 IEEE Computer Society Conference on Computer Vision
and Pattern Recognition, IEEE. pp. 599–605.
Omohundro, S.M., 1989. Five Balltree Construction Algorithms. International Computer Science Institute Berkeley.
Peirson, V., Abel, L., Tolunay, E.M., 2018. Dank learning: Generating memes using deep neural networks. arXiv preprint arXiv:1806.04510 .
Pennington, J., Socher, R., Manning, C., 2014. Glove: Global vectors for word representation, in: Proceedings of the 2014 conference on empirical
methods in natural language processing (EMNLP), pp. 1532–1543.
Shifman, L., 2012. An anatomy of a youtube meme. new media & society 14, 187–203.
Shifman, L., 2013. Memes in a digital world: Reconciling with a conceptual troublemaker. Journal of Computer-Mediated Communication 18,
362–377.
Shifman, L., 2014a. The cultural logic of photo-based meme genres. Journal of Visual Culture 13, 340–358.
Shifman, L., 2014b. Memes in Digital Culture. MIT press.
Simonyan, K., Vedaldi, A., Zisserman, A., 2013. Deep inside convolutional networks: Visualising image classification models and saliency maps.
arXiv preprint arXiv:1312.6034 .
Simonyan, K., Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 .
Smith, R., 2007. An overview of the tesseract ocr engine, in: Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International
Conference on, IEEE. pp. 629–633.
Song, X., Feng, F., Han, X., Yang, X., Liu, W., Nie, L., 2018. Neural compatibility modeling with attentive knowledge distillation, in: The 41st
International ACM SIGIR Conference on Research & Development in Information Retrieval, ACM. pp. 5–14.
Su, F., Xue, L., 2015. Graph learning on k nearest neighbours for automatic image annotation, in: Proceedings of the 5th ACM on International
Conference on Multimedia Retrieval, ACM. pp. 403–410.
Szablewicz, M., 2014. The ‘losers’ of china’s internet: Memes as ‘structures of feeling’for disillusioned young netizens. China Information 28,
259–275.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z., 2016. Rethinking the inception architecture for computer vision, in: Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826.
Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., Wu, Y., 2014. Learning fine-grained image similarity with deep
ranking, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1386–1393.
Wang, W.Y., Wen, M., 2015. I can has cheezburger? a nonparanormal approach to combining textual and visual information for predicting and gen-
erating popular meme descriptions, in: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies, pp. 355–365.
Xie, L., Natsev, A., Kender, J.R., Hill, M., Smith, J.R., 2011. Visual memes in social media: Tracking real-world news in youtube videos, in:
Proceedings of the 19th ACM International Conference on Multimedia, ACM, New York, NY, USA. pp. 53–62. URL: http://doi.acm.org/10.1145/2072298.2072307, doi:10.1145/2072298.2072307.
Zannettou, S., Caulfield, T., Blackburn, J., De Cristofaro, E., Sirivianos, M., Stringhini, G., Suarez-Tangil, G., 2018. On the origins of memes by
means of fringe web communities, in: Proceedings of the Internet Measurement Conference 2018, ACM. pp. 188–202.
Zhou, X., Yao, C., Wen, H., Wang, Y., Zhou, S., He, W., Liang, J., 2017. EAST: An efficient and accurate scene text detector, in: 2017 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), IEEE. pp. 2642–2651.
... Vision is widely regarded as one of the primary senses for humans, playing a crucial role in understanding complex systems. Beskow et al. in [41] investigated the use of deep learning networks to enable the perception and classification of Internet culture (i.e., memes) for social media tweets. The authors analyzed the problem from a modal perspective: tweets contain memes information that may be embedded in text and images, including text in images. ...
... A three-layer fully connected network forms the DQN, which is used to obtain the optimal Q-value, i.e., the policy, by means of gradient descent. Base on the Qfunction represents by (41) , DQN has neural network weights given the state and action . Using the Mean Squared Error (MSE) as the loss function, the target Q-Value is computed as: ...
... Word embedding, e.g., Word2Vec and GloVe, is a useful approach when it is necessary to get a good semantic representation for each lexical learning instead of classifying a whole passage of text. This is evidenced by the work in [39], [41]. ...
Article
Full-text available
Situation Awareness (SA) is a process of sensing, understanding and predicting the environment and is an important component in complex systems. The reception of information from the environment tends to be continuous and of a multimodal nature. AI technologies provide a more efficient and robust support by subdividing the different stages of SA objectives into tasks such as data fusion, representation, classification, and prediction. This paper provides an overview of AI and multimodal methods used to build, enhance and evaluate SA in a variety of environments and applications. Emphasis is placed on enhancing perceptual integrity and persistence. Research indicates that the integration of artificial intelligence and multimodal approaches has significantly enhanced perception and comprehension in complex systems. However, there remains a research gap in projecting future situations and effectively fusing multimodal information. This paper summarizes some of the use cases and lessons learned where AI and multimodal techniques have been used to deliver SA. Future perspectives and challenges are proposed, including more comprehensive predictions, greater interpretability, and more advanced visual information.
... Furthermore, memes have been popularised on many social media networks. Some examples include Facebook, TikTok, Instagram, and X (Beskow et al. 2020;Mahasneh & Bashayreh 2021). Memes are transmitted across social media at high speed; indeed, they spread like viruses (Wiggins 2019). ...
... Lestari (2018) asserts that because culture, time, and personal experience all impact a meme's meaning, a meme has an implicit (non-literal) meaning. Beskow et al. (2020) believe that memes are culture specific. Given that one way to understand a meme is by looking at the image and comparing it with the current situation. ...
... Given that one way to understand a meme is by looking at the image and comparing it with the current situation. In addition, memes usually transfer different meanings and convey various social, political, and economic messages (Beskow et al., 2020;Laineste & Voolaid 2017). Sometimes they express universal emotions; at other times, they only intend to be humorous and to entertain social media users. ...
Article
Full-text available
This qualitative descriptive study attempts to investigate verbal humour in nineteen Saudi Arabian national soccer team's memes on the event of their participation in 2022 FIFA World Cup in Qatar. Since this study uses Attardo and Raskin's (1991) General Theory of Verbal Humour (GTVH) as its methodology, it attempts to check whether GTVH six Knowledge Resources (KRs) apply to the different methods and strategies Saudi fans used in the creation of these humorous memes. Data has been randomly selected from various users' accounts on X (formerly known as Twitter) under different hashtags related to the Saudi Arabian national soccer team's participation. The results reveal that all the analysed nineteen memes were structured around the six KRs proposed by GTVH. These are: Script Opposition (SO), Logical mechanism (LM), Situation (SI), Target (TA), Narrative strategy (NS), and Language (LA). This ultimately supports Attardo's (2001) view that all kinds of humorous texts, memes in this study, are within the scope of GTVH KRs. The types of verbal humor used were: irony, wordplay, exaggeration, coincidence, juxtaposition, sarcasm, humor, and satire. Their selection depended on the surrounding context and the user's intention supporting Dynel's view (2009). The creation of these memes relied on images related to famous national and international media figures such as singers, actors, players, coaches, and soccer events forming a framework that shaped the soccer humorous discourse of memes in social media. Humor in these memes has been found to originate from the interplay between the different script oppositions used in the image of the meme and the text in the caption. 
In addition, memes were found to be mainly used for humorous effects in three occasions: (1) celebrating their team's victory by mocking international players who participated in Saudi Arabia-Argentina match, (2) relieving different emotions such as stress and concern on the occasion of their team's defeat by Poland, and (3) satirizing Saudi team players after their loss to Mexico. These memes also have been found to have an implicit (non-literal) meaning that depends on culture, time, and the recipient's personal experience supporting the views of
... Additionally, there is a need to explore potential common relationships between meme toxicity types and the tactics used to convey them. Furthermore, further investigation into additional dimensions, such as the context of posting (user, forum, platform) and propagation features [252], is warranted, as these factors can provide a more comprehensive understanding of meme toxicity dynamics. ...
Preprint
Full-text available
Internet memes, channels for humor, social commentary, and cultural expression, are increasingly used to spread toxic messages. Studies on the computational analyses of toxic memes have significantly grown over the past five years, and the only three surveys on computational toxic meme analysis cover only work published until 2022, leading to inconsistent terminology and unexplored trends. Our work fills this gap by surveying content-based computational perspectives on toxic memes, and reviewing key developments until early 2024. Employing the PRISMA methodology, we systematically extend the previously considered papers, achieving a threefold result. First, we survey 119 new papers, analyzing 158 computational works focused on content-based toxic meme analysis. We identify over 30 datasets used in toxic meme analysis and examine their labeling systems. Second, after observing the existence of unclear definitions of meme toxicity in computational works, we introduce a new taxonomy for categorizing meme toxicity types. We also note an expansion in computational tasks beyond the simple binary classification of memes as toxic or non-toxic, indicating a shift towards achieving a nuanced comprehension of toxicity. Third, we identify three content-based dimensions of meme toxicity under automatic study: target, intent, and conveyance tactics. We develop a framework illustrating the relationships between these dimensions and meme toxicities. The survey analyzes key challenges and recent trends, such as enhanced cross-modal reasoning, integrating expert and cultural knowledge, the demand for automatic toxicity explanations, and handling meme toxicity in low-resource languages. Also, it notes the rising use of Large Language Models (LLMs) and generative AI for detecting and generating toxic memes. Finally, it proposes pathways for advancing toxic meme detection and interpretation.
... There are also limitations to applying the biological analogy to the study of disinformation versus misinformation in terms of meme spread. It is well known that memes evolve and propagate as they spread through networks (Beskow et al. 2020;Schlaile et al. 2018), and that propagation of a meme depends on its relative fitness compared to other memes (Spitzberg 2014). A critical difference between biological and informational agents is that information agents-especially disinformative ones-are very commonly accompanied by intent, whereas biological agents arise due to human intention exceedingly rarely. ...
Article
Full-text available
Previously, it has been shown that transmissible and harmful misinformation can be viewed as pathogenic, potentially contributing to collective social epidemics. In this study, a biological analogy is developed to allow investigative methods that are applied to biological epidemics to be considered for adaptation to digital and social ones including those associated with misinformation. The model’s components include infopathogens, tropes, cognition, memes, and phenotypes. The model can be used for diagnostic, pathologic, and synoptic/taxonomic study of the spread of misinformation. A thought experiment based on a hypothetical riot is used to understand how disinformation spreads.
... Du et al., 2020; Kuipers, 2002). But memes are also often constructed by adding new text to existing "meme templates," featuring recognizable visuals (Beskow et al., 2020) or a cast of recurring characters, including typical "meme characters" (Denisova, 2019; Shifman, 2014). ...
Article
Full-text available
The Covid-19 pandemic brought about an unprecedented cycle of digitally spread humor. This article analyzes a corpus of 12,337 humor items from 80+ countries, mainly in visual format and mostly memes, collected during the first half of 2020, to understand the features and intended audiences of this “pandemic humor”. Employing visual machine-learning techniques and additional qualitative analysis, we ask which actors and which templates were most prominent in the pandemic humor, and how these actors and templates vary on the following dimensions: local vs. global, Covid-specific vs. general, and, specifically for the actors, political vs. not political. Our analysis shows that most pandemic memes from the first wave are not political. The vast majority of the memes are global: they are based on well-recognized meme templates, and almost all identified actors are part of a cast of set “meme faces”, mostly from the US and the UK but recognized around the world. The most popular templates were found in several countries and languages, including non-European languages. Most memes were based on non-Covid-specific templates, but we found new Covid-specific memes, which sheds new light on the process by which memes emerge, spread, and potentially become new meme templates. Our analysis supplements existing studies of (Covid) memes that mostly focus on small national samples using qualitative methods. This cross-national analysis is enabled by a global dataset with unique data on the geographical origin of humor. We show the usefulness of visual machine learning for identifying the emergence, spread, and prevalence of transnational (humorous) cultural forms. By combining large-scale computational analysis with in-depth analysis, we bridge a gap in meme studies between (mostly quantitative) data sciences and (mostly qualitative) communication and media studies.
... By charting meme transformation within this pivotal electoral setting, the study offers compelling evidence in support of Richard Dawkins' theory of meme evolution (Beskow et al., 2020). The web, a cultural melting pot flooded with viral trends, has reignited the debate about "memes," those infectious units of meaning passed digitally from hand to hand. In "Memes in a Digital World: ...
Article
Full-text available
This research article investigates the impact of memetic content on the political behaviours of university students in Punjab, Pakistan. With the rapid growth of social media and the increasing popularity of memetic content, understanding its influence on political behaviours becomes crucial, especially among the young and educated population. A survey-based research method has been employed, using a questionnaire as the data gathering tool, targeting university students in Punjab. The study aims to explore the relationship between exposure to memetic content and political behaviours, including political engagement, political knowledge, and political participation. The sample population consisted of university students from diverse disciplines, allowing for a comprehensive analysis of the research topic. The findings from this research will contribute to our understanding of how memetic content shapes political behaviours and may inform strategies to enhance political awareness and engagement among university students in Punjab, Pakistan.
Article
This paper outlines a multidisciplinary framework (Digital Rhetorical Ecosystem, or DRE3) for scaling up qualitative analyses of image memes. First, we make a case for applying rhetorical theory to examine image memes as quasi‐arguments that promote claims on a variety of political and social issues. Next, we argue for integrating rhetorical analysis of image memes into an ecological framework to trace interaction and evolution of memetic claims as they coalesce into evidence ecosystems that inform public narratives. Finally, we apply a computational framework to address the particular problem of claim identification in memes at large scales. Our integrated framework answers the recent call in information studies to highlight the social, political, and cultural attributes of information phenomena, and bridges the divide between small‐scale qualitative analyses and large‐scale computational analyses of image memes. We present this theoretical framework to guide the development of research questions, processes, and computational architecture to study the widespread and powerful influence of image memes in shaping consequential public beliefs and sentiments.
Conference Paper
Full-text available
Internet memes are increasingly used to sway and manipulate public opinion. This prompts the need to study their propagation, evolution, and influence across the Web. In this paper, we detect and measure the propagation of memes across multiple Web communities, using a processing pipeline based on perceptual hashing and clustering techniques, and a dataset of 160M images from 2.6B posts gathered from Twitter, Reddit, 4chan's Politically Incorrect board (/pol/), and Gab, over the course of 13 months. We group the images posted on fringe Web communities (/pol/, Gab, and The_Donald subreddit) into clusters, annotate them using meme metadata obtained from Know Your Meme, and also map images from mainstream communities (Twitter and Reddit) to the clusters. Our analysis provides an assessment of the popularity and diversity of memes in the context of each community, showing, e.g., that racist memes are extremely common in fringe Web communities. We also find a substantial number of politics-related memes on both mainstream and fringe Web communities, supporting media reports that memes might be used to enhance or harm politicians. Finally, we use Hawkes processes to model the interplay between Web communities and quantify their reciprocal influence, finding that /pol/ substantially influences the meme ecosystem with the number of memes it produces, while The_Donald has a higher success rate in pushing them to other communities.
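The core of the pipeline described above, grouping visually similar images by perceptual hash, can be sketched in a few lines. This toy version uses an average hash and greedy Hamming-distance clustering; the actual pipeline uses pHash (which adds a DCT step) and a more robust clustering algorithm, so all names and thresholds here are illustrative:

```python
# Toy sketch of perceptual-hash clustering (illustrative only).

def average_hash(gray):
    """gray: 2D list of 0-255 intensities, assumed already resized (e.g. 8x8)."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)  # 1 bit per pixel

def hamming(h1, h2):
    return sum(a != b for a, b in zip(h1, h2))

def cluster(hashes, threshold=4):
    """Greedily assign each image to the first cluster whose seed hash is close."""
    clusters = []  # list of (seed_hash, [member indices])
    for i, h in enumerate(hashes):
        for seed, members in clusters:
            if hamming(h, seed) <= threshold:
                members.append(i)
                break
        else:
            clusters.append((h, [i]))
    return [members for _, members in clusters]

# Two near-identical 4x4 "images" (one slightly brightened) and one different one.
img_a = [[10, 200, 10, 200]] * 4
img_b = [[12, 205, 12, 205]] * 4   # same meme, re-encoded/brightened
img_c = [[200, 10, 200, 10]] * 4   # different image
groups = cluster([average_hash(x) for x in (img_a, img_b, img_c)])
print(groups)  # img_a and img_b share a cluster; img_c gets its own
```

Because the hash depends only on each pixel's relation to the image mean, uniform re-encoding artifacts leave it nearly unchanged, which is why clustering on hash distance recovers meme families.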
Conference Paper
Full-text available
As malicious automated agents, or bots, are increasingly used to manipulate the global marketplace of information and beliefs, their detection, characterization, and at times neutralization is an important aspect of national security operations. Unhindered, these information campaigns, assisted by automated agents, can slowly begin changing a society and its norms. Within this context, we seek to lay the groundwork for bot-hunter, a tiered approach to bot detection and characterization, while simultaneously presenting an event-based method for annotating data.
Conference Paper
Full-text available
The analysis of the creation, mutation, and propagation of social media content on the Internet is an essential problem in computational social science, affecting areas ranging from marketing to political mobilization. A first step towards understanding the evolution of images online is the analysis of rapidly modifying and propagating memetic imagery or "memes". However, a pitfall in proceeding with such an investigation is the current incapability to produce a robust semantic space for such imagery, capable of understanding differences in Image Macros. In this study, we provide a first step in the systematic study of image evolution on the Internet, by proposing an algorithm based on sparse representations and deep learning to decouple various types of content in such images and produce a rich semantic embedding. We demonstrate the benefits of our approach on a variety of tasks pertaining to memes and Image Macros, such as image clustering, image retrieval, topic prediction and virality prediction, surpassing the existing methods on each. In addition to its utility on quantitative tasks, our method opens up the possibility of obtaining the first large-scale understanding of the evolution and propagation of memetic imagery.
Conference Paper
In this paper we present a deployed, scalable optical character recognition (OCR) system, which we call Rosetta, designed to process images uploaded daily at Facebook scale. Sharing of image content has become one of the primary ways to communicate information among internet users within social networks such as Facebook, and the understanding of such media, including its textual information, is of paramount importance to facilitate search and recommendation applications. We present modeling techniques for efficient detection and recognition of text in images and describe Rosetta's system architecture. We perform extensive evaluation of presented technologies, explain useful practical approaches to build an OCR system at scale, and provide insightful intuitions as to why and how certain components work based on the lessons learnt during the development and deployment of the system.
Conference Paper
Recently, the booming fashion sector and its huge potential benefits have attracted tremendous attention from many research communities. In particular, increasing research effort has been dedicated to complementary clothing matching, as matching clothes to make a suitable outfit has become a daily headache for many people, especially those who lack a sense of aesthetics. Thanks to the remarkable success of neural networks in applications such as image classification and speech recognition, researchers can adopt data-driven learning methods to analyze fashion items. Nevertheless, existing studies overlook the rich, valuable knowledge (rules) accumulated in the fashion domain, especially rules regarding clothing matching. Towards this end, in this work, we shed light on complementary clothing matching by integrating advanced deep neural networks with rich fashion domain knowledge. Considering that rules can be fuzzy and different rules may have different confidence levels for different samples, we present a neural compatibility modeling scheme with attentive knowledge distillation based on the teacher-student network scheme. Extensive experiments on a real-world dataset show the superiority of our model over several state-of-the-art methods. Based on these comparisons, we observe certain fashion insights that can add value to the study of fashion matching. As a byproduct, we have released the code and parameters to benefit other researchers.
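The teacher-student scheme underlying this model can be sketched with a generic soft-target distillation loss. This is the standard Hinton-style formulation, not the paper's attentive rule-based variant; the temperature value and all names are illustrative:

```python
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student's soft predictions against the teacher's
    soft targets. A higher temperature softens both distributions, exposing
    the teacher's knowledge about the relative ranking of non-top choices."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))

# A student that roughly agrees with the teacher incurs a lower loss
# than one that inverts the teacher's preference ordering.
teacher = [3.0, 1.0, 0.2]
loss_close = distillation_loss([2.8, 1.1, 0.3], teacher)
loss_far = distillation_loss([0.2, 1.0, 3.0], teacher)
print(loss_close < loss_far)  # True
```

In the paper's setting the "teacher" signal comes from fashion-domain rules rather than a larger network, but the mechanism of matching soft distributions is the same.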
Conference Paper
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
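The "60 million parameters" figure quoted above can be checked by summing weights and biases layer by layer. The sizes below follow the published AlexNet configuration, including the grouped convolutions in layers 2, 4, and 5, where each filter sees only half of the input channels:

```python
# (name, kernel_h, kernel_w, in_channels_per_filter, out_channels) per layer.
layers = [
    ("conv1", 11, 11, 3,    96),
    ("conv2", 5,  5,  48,   256),   # 2 groups: each filter spans 48 of 96 channels
    ("conv3", 3,  3,  256,  384),
    ("conv4", 3,  3,  192,  384),   # grouped
    ("conv5", 3,  3,  192,  256),   # grouped
    ("fc6",   6,  6,  256,  4096),  # flattened 6x6x256 feature map -> 4096 units
    ("fc7",   1,  1,  4096, 4096),
    ("fc8",   1,  1,  4096, 1000),  # final 1000-way softmax layer
]

total = 0
for name, kh, kw, cin, cout in layers:
    params = kh * kw * cin * cout + cout  # weights + one bias per output
    total += params
    print(f"{name}: {params:,}")
print(f"total: {total:,}")  # ~61 million, matching the quoted 60 million
```

The tally also makes the abstract's dropout motivation concrete: the three fully-connected layers hold roughly 58 of the 61 million parameters, so they dominate the overfitting risk.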
Conference Paper
One of the main research trends in recent years has focused on knowledge extraction from social network users. A key difficulty of this analysis is the lack of structure in the information and the multiple formats in which it can appear. The present article focuses on the analysis of information provided by different users in image form. The problem addressed is the detection of identical images (even when they carry minimal transformations, such as a watermark), which allows links to be established between users who publish the same images. The solution proposed in the article is based on the comparison of hashes, which, from a computational point of view, tolerates certain transformations applied to an image.
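The hash-based comparison described above can be illustrated with a difference hash (dHash), which encodes horizontal intensity gradients and therefore survives uniform edits such as a global brightness shift. This is a minimal sketch on raw pixel grids; a real system would first convert to grayscale and resize with an image library such as Pillow, and the grid sizes here are toy values:

```python
def dhash(gray):
    """Difference hash: one bit per adjacent-pixel comparison, row-wise.
    gray is a 2D list of intensities, assumed already resized (e.g. 9x8)."""
    bits = []
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits.append(1 if left > right else 0)
    return tuple(bits)

def hamming(h1, h2):
    return sum(a != b for a, b in zip(h1, h2))

original = [[10, 40, 90, 160], [20, 60, 110, 180]]
brightened = [[p + 30 for p in row] for row in original]  # global edit
different = [[160, 90, 40, 10], [180, 110, 60, 20]]       # gradients inverted

print(hamming(dhash(original), dhash(brightened)))  # 0: same gradient pattern
print(hamming(dhash(original), dhash(different)))   # 6: every gradient flipped
```

Because only the sign of each left-to-right difference is stored, a small watermark perturbs at most the few bits whose pixels it covers, so a low Hamming-distance threshold still links the two copies.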