Abstract— The social media world is growing day by day;
people are using social media platforms to express their
feelings. Leading companies are using those platforms to
measure customers' satisfaction with their products and
services. The huge amount of data produced by such platforms
can be analyzed to help those companies improve their
businesses. This paper introduces a new contribution to the
field of emotion classification for tweets that uses fuzzy
logic to classify each tweet into an emotion with different
degrees of intensity. We achieved this by developing two fuzzy
classification systems: the first inspects the text and is
referred to as TCFL, while the second inspects the emojis
associated with the tweet's text and is referred to as ECFL. Our
approaches classify tweets into eight emotion
categories (Joy, Sadness, Anger, Disgust, Trust, Fear, Surprise,
and Anticipation) with seven emotion degrees (Extremely
High, Very High, High, Medium, Low, Very Low, and
Extremely Low). When the two systems were compared
with a human-based classification, TCFL outperformed ECFL
with a 48.96% match, compared to a 32.54% match for ECFL.
Keywords- Sentiment analysis, emotion detection, fuzzy logic,
multi-label classification, text classification, emoji classification,
tweets.
I. INTRODUCTION
Over 3.2 billion people use social media
platforms, 73% of marketers use social media for
marketing purposes, and 54% of social media users
use these platforms to search for products and services
and to express their opinions about them [1]. This heavy use of
social media platforms produces a tremendous amount of
unstructured data on the web, and in the last few years,
analyzing such data to extract useful information
and using it to make crucial business decisions has become an
important issue. Big companies are investing money to
manage and analyze the data they get about their business
from their social media accounts. The reason is that
people's posts, comments, tweets, and other types of online
interactions can be used to explore opinions about a
specific brand, what people think about a new product, what
they want, and the general trend in a specific topic. By
exploring such information, business owners can adjust their
business to increase customer satisfaction and achieve
higher profits.
Sentiment analysis is a branch of natural language processing
(NLP) that builds data mining systems to extract
the opinion expressed in a text and to determine whether the
text holds a positive or negative opinion or feeling [2][3].
Polarity classification is concerned with classifying a
document, sentence, or word as expressing a positive, negative,
or neutral opinion or feeling. The other type of sentiment
analysis, called beyond-polarity classification,
aims not only at classifying the document as positive or
negative but also at assigning it an emotion such as sad, happy,
angry, or surprised [4][5].
Emotion recognition is a type of beyond-polarity
classification that aims at uncovering the emotional
sentiment in a text and the feeling of the user who
wrote a review, a comment, or an opinion about something.
To detect the emotion, the text itself needs to be
analyzed in a certain way. Usually, the process starts with
breaking the text down into words and then applying an
approach that processes those words based on their
sentiment in order to discover the final emotion of the
document or the sentence [6]. Most emotion detection
systems consider the text and make use of lexicons that
pair each word with a sentiment and a weight that represents
the intensity of that emotion; this is called text
classification [7][8][9].
Text classification is a data mining task that maps a
data instance (text) to one label from a set of possible labels
called classes. The classification model analyzes the
text features to decide the label of the data instance. Many
variations are possible here. The hard version of text
classification maps the data instance to an explicit label,
while in the soft version the instance is assigned a
probability of belonging to a label. In another variation, the
same data instance can be assigned to more than one class at
the same time; this is called multi-label classification [10]. The most
common techniques used to build the classification model
are decision trees (DT), support vector machines (SVM),
naïve Bayes (NB), and Bayesian classifiers [11][12].
Although most applications process the text only,
emojis have received increased attention given
their importance in the sentiment analysis task, leading to
more reliable classification results [13][14].
Tweets Emotion Prediction by Using Fuzzy Logic System
Yahya M. Tashtoush, Dana Abed Al Aziz Orabi
Department of Computer Science
Jordan University for Science and Technology
Irbid, Jordan
Email: yahya-t@just.edu.jo, daorabi16@cit.just.edu.jo
2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS)
978-1-7281-2946-4/19/$31.00 ©2019 IEEE
In multi-label classification, the same data instance
can be assigned two or more labels from the set of
classes. The motivation for this concept came from
medical diagnosis and text classification problems: the same
text can cover more than one subject, such as social and
religious topics, and the same patient's medical profile can be
classified under more than one disease at once, such as blood
pressure and diabetes [15][16]. Boosting, k-nearest
neighbor, decision trees, neural networks, and fuzzy logic
are classification methodologies that can be adjusted and
optimized to solve the multi-label classification problem
[17][18][19]. In our contribution, we used the fuzzy
logic concept to develop two multi-label classification
systems (TCFL and ECFL).
The rest of this paper is organized as follows: Section 2
explains the fuzzy logic concept in detail, Section 3
presents the problem statement and the study's goal, Section
4 reviews related work, Section 5 presents our
classification methodology, Section 6 presents the
experimental results, Section 7 concludes our contribution,
and Section 8 points to future research directions.
II. FUZZY LOGIC
The word fuzzy refers to something that is unclear or
vague. In the real world, there are many situations in which we
cannot judge whether a specific thing is true or false,
absolutely positive or absolutely negative, because it is not
absolute for us. This motivated Lotfi Zadeh to introduce the
fuzzy logic concept in the 1960s. He
stated that modern computers, with their language of 0s
and 1s, cannot understand the natural language of humans
because it cannot be converted to absolute 0s and 1s.
His idea introduced fuzzy logic as a way of
computing that is based on degrees of truth instead of the
absolute 0s and 1s of Boolean logic [20][21].
A. Fuzzy Logic Methods
There are three types of fuzzy inference methods:
Tsukamoto, Mamdani, and Sugeno. They all share
the three stages of the fuzzy logic framework, namely
fuzzification, rule generation, and defuzzification, which are
described in detail in the following section. The core
difference between the methods is the
methodology used to create the crisp output from the
fuzzy input [22]. The Mamdani fuzzy inference system uses
the center of gravity algorithm in the defuzzification stage,
Sugeno uses the weighted average methodology, and
Tsukamoto uses the height method. Researchers choose the
inference system that suits the available input
values and the output desired from the
developed application [23][24][25].
B. Fuzzy Logic Architecture
The fuzzy logic system consists of four parts:
the fuzzification module, the knowledge base, the
inference engine, and the defuzzification module [26][27]. In
the fuzzification module, the numerical input values are
mapped to linguistic values based on a membership
function. The fuzzy controller, which consists of the knowledge
base and the inference engine, performs the core steps of the
fuzzy logic framework [28]. The knowledge base consists of
a set of well-defined if-then fuzzy rules, and the inference
engine uses those rules to assign the output
values. The inference engine is responsible for determining
which rules apply based on the input values. This is
followed by a rule-combining step that produces the control
actions. There are many methods to combine the rules, such as
the maximum method and the bounded sum. In our study, we
use the maximum method. The last step
in the fuzzy system is defuzzification. In the
defuzzification module, the fuzzy sets generated by the
previous steps are converted to a crisp output. Several
defuzzification methods are available, including the center
of gravity, the weighted average, and the leftmost and
rightmost maximum [29][30][31].
C. Fuzzy Knowledge Base
The fuzzy knowledge base consists of a set of fuzzy
if-then rules that are used in the classification process.
Those rules should be generated and studied carefully
because the efficiency of the classification system depends
on the quality of the rules that reside in the fuzzy
knowledge base. There are several methodologies for
fuzzy rule generation. The first is the histogram
methodology, where the input feature values are mapped to a
separate histogram for each class, which is then used as the
membership function for the classification task. The second
is rule generation based on the mean and standard
deviation of the input values. In this approach, a rule is
generated for each class based on the mean and standard
deviation of each attribute's values in the dataset [32][33].
The last rule generation methodology is the simple
fuzzy grid, where each attribute is partitioned into fuzzy sets.
The fuzzy grid is then built from fuzzy
subspaces, whose number is determined by
the number of attributes and their fuzzy sets. Each
attribute is represented by a dimension in the grid, and the
dimension is partitioned into the attribute's fuzzy sets. Each
subspace represents a meeting point between one fuzzy set from
each attribute. Finally, a fuzzy rule is generated for each of
the fuzzy subspaces [34].
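The simple fuzzy grid can be illustrated with a minimal Python sketch. The attribute names and the three-set partition below are hypothetical and chosen only for illustration; the consequent of each rule is left open because it depends on the application.

```python
from itertools import product

# Hypothetical attributes and fuzzy partitions (not taken from the paper).
partitions = {
    "first_weight": ["Low", "Medium", "High"],
    "second_weight": ["Low", "Medium", "High"],
}

def build_fuzzy_grid(partitions):
    """Return one rule antecedent per fuzzy subspace of the grid."""
    names = list(partitions)
    # Each subspace picks exactly one fuzzy set from every attribute.
    return [dict(zip(names, combo)) for combo in product(*partitions.values())]

grid = build_fuzzy_grid(partitions)
for cell in grid:
    antecedent = " AND ".join(f"{attr} is {fs}" for attr, fs in cell.items())
    print(f"IF {antecedent} THEN <consequent to be assigned>")
```

With two attributes of three fuzzy sets each, the grid has 3 x 3 = 9 subspaces, so nine rule antecedents are produced.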
D. Fuzzy Membership Functions
In the fuzzification and defuzzification stages of fuzzy logic,
the crisp input values are fuzzified
and converted to linguistic values, and then defuzzified and
converted back to a crisp output value. This is done using
membership functions, which express the degree of truth
and are usually presented in graphical
form. Each input value is mapped to one or two values
between 0 and 1. If the same input value gets two linguistic
values, it belongs to two fuzzy sets with
different degrees of membership, and the sum of the
degrees should equal 1. There are different types of
membership functions, such as the trapezoidal, triangular,
and Gaussian. For fuzzification we used the triangular
form, while for defuzzification we used the trapezoidal
form [35][36].
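The two membership shapes used in this work can be sketched in Python as follows. The breakpoints are hypothetical and serve only to show how one crisp value can belong to two neighboring fuzzy sets with degrees that sum to 1; they are not the breakpoints of our systems.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a and c, 1 at the peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def trapezoidal(x, a, b, c, d):
    """Trapezoidal membership: rises on a..b, equals 1 on b..c, falls on c..d."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

# Two overlapping triangular sets with hypothetical breakpoints.
x = 3.0
medium = triangular(x, 0.0, 2.5, 5.0)  # x lies on the falling side of "Medium"
high = triangular(x, 2.5, 5.0, 7.5)    # x lies on the rising side of "High"
print(medium, high, medium + high)
```

The degrees sum to 1 here because each triangle peaks exactly where its neighbor reaches zero, which is the usual way to obtain that property.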
III. PROBLEM STATEMENT AND STUDY AIM
Usually, emotion classification systems classify the text
into an absolute emotion class such as sadness, anger, or
happiness. Most of them focus only on the words in the text
without paying attention to the emojis that appear in it, which
can add very important sentiment information.
The aim of this study is to develop a classification
framework using the fuzzy logic approach that
classifies a tweet into an emotion and also
detects the degree of intensity of that emotion in the
tweet, with two variations of the system. The first
system considers the text of a tweet to perform the
classification task; throughout this paper, we refer to this
system as the textual classification by using fuzzy logic (TCFL)
system. The second considers only the emojis contained in a
tweet to perform the
classification task; we refer to this system as emojis
classification by using fuzzy logic (ECFL). In the end, we
answer the question: which of the two, the text or the emojis,
is more representative of the true emotion in a tweet?
IV. LITERATURE REVIEW
This section presents recent contributions
related to tweet sentiment analysis, emoji-based sentiment
analysis, and fuzzy logic classification.
A. Tweets Sentiment Analysis
Sharma et al. [37] developed a web application that
performs live sentiment analysis of tweets at a regional level.
The authors used the Python Flask framework to render the
live web pages based on the user's search request. After the
Twitter API gathers the relevant tweets based on the search
string, data features including text, user name, friends list,
and followers are extracted from the metadata. The location
of the user is also extracted from the metadata if the
user provided it; otherwise, the location is
inferred from location cues mentioned in the text,
such as country and city names. Next, the Python TextBlob
library is used to perform the sentiment analysis task; this
library assigns the text a sentiment score that ranges
from -1 (most negative) to 1 (most positive).
The last step is to create a .csv file that contains all the data
of the tweets along with their sentiment scores. This file is
used by the Plotly library, which generates a file and sends it
to the front end in a form that allows the data to be plotted
on the world map using JavaScript. The final result is
a world map shaded according to the sentiment
of the tweets the user searched for. The degree of
shading in a specific location represents the sentiment of the
tweets in that area. For example, a dark shade could mean
positive sentiment and a light shade could represent
negative sentiment.
Jabreel et al. [38] proposed a new methodology for the
problem of multi-label classification. Each multi-label
classification task goes through a problem
transformation stage that converts the multi-label
classification problem into multiple single-label problems.
There are many methods to perform this transformation,
such as the binary relevance method. The authors proposed
a new method called the XY pair set. This algorithm
takes a multi-labeled data set and the set of all possible
labels and transforms them into a binary data set that pairs
each data instance with the possible labels, together with a
value of 0 or 1 that indicates whether the instance is labeled
with that class. They then built a deep learning model
with three parts. The first part is the embedding module, which
assigns each word-label pair a weight; the attention model uses
those weights to define the strength of the relationship
between each word and the label. The second part is the
encoding module, which uses a recurrent neural network to
transform the previous vector into a vector of hidden states.
The third part is the classification module, which performs the
classification task using a neural network. The proposed
methodology achieved the highest accuracy, 54%, compared with
the baseline that uses TCS and SVM.
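One possible reading of the XY pair set transformation described above is sketched below in Python. The function name `xy_pair_set`, the toy tweets, and the label list are illustrative assumptions and are not taken from [38].

```python
def xy_pair_set(dataset, labels):
    """Pair every instance with every possible label and flag the true labels.

    dataset: list of (text, set_of_gold_labels); labels: list of all labels.
    Returns rows of the form ((text, label), 0 or 1).
    """
    return [
        ((text, label), int(label in gold))
        for text, gold in dataset
        for label in labels
    ]

# Toy multi-label data, invented for illustration.
labels = ["joy", "anger", "sadness"]
dataset = [
    ("great service, thanks", {"joy"}),
    ("late again, so annoying and sad", {"anger", "sadness"}),
]

rows = xy_pair_set(dataset, labels)
for row in rows:
    print(row)
```

Each of the two instances is paired with all three labels, so the binary data set has six rows, three of which carry the value 1.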
Sharifirad et al. [39] worked on a project that detects
the emotion and its degree of intensity in sexist tweets.
Their work is based on a previous study that grouped
sexist content into three categories: indirect
harassment, sexual harassment, and physical harassment.
They worked on four emotions (anger, joy, sadness, and fear),
each with no, low, medium, or high intensity. For
multi-label classification, they used one-vs.-rest
(OVR), support vector machine (SVM), naïve Bayes
(NB), k-nearest neighbor (KNN), multi-layer perceptron
(MLP), long short-term memory (LSTM), and
convolutional neural network (CNN) models with the W2V,
GloVe, and FastText word embedding representations;
FastText achieved the highest classification accuracy
for all emotion classes. The next step was to study the
intensity of each emotion under the three categories of sexist
speech. To do so, the data set
already classified into the three harassment categories was
further classified into 16 classes, each representing an
emotion with a degree of intensity, to discover which emotions
are held under each
harassment category. The results showed that indirect
harassment is strongly related to a high intensity of
joy, sexual harassment is strongly associated with anger,
joy, and sadness at a high degree of intensity,
while physical harassment carries a high degree of anger,
joy, and sadness.
B. Emojis Based Sentiment Analysis
Tashtoush et al. [40] built an automatic labeling system
that uses emojis to classify an Arabic Twitter data set into one
of four emotions: anger, joy, sadness, and disgust.
They manually labeled the most used emojis on Twitter with
one of the four emotions. The weights of those emojis
are then assigned based on the AFINN lexicon; if an emoji's
weight is not available, it is assigned the weight of its
representative word from the same lexicon. The tweet is labeled
with the emotion of the emoji that has the highest absolute
sentiment score in the tweet, and the sentiment of an emoji is
doubled if it appears more than once. Another portion
of the data is labeled manually by the authors. The entire
data set is preprocessed to build classification models based
on the two labelings (manual and automatic) and
compare them to see the effect of using emojis in
the classification process. They used three classifiers:
SVM, MNB, and RF. The automatic classification
that uses emojis outperformed the manual one with all of the
classifiers. In addition, MNB achieved the highest
precision score, 0.757.
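The emoji-based labeling rule described above (highest absolute score wins, score doubled for a repeated emoji) can be sketched in Python as follows. The function name and the lexicon entries are invented for illustration; the weights are not actual AFINN values.

```python
from collections import Counter

def label_by_emoji(emojis, emoji_lexicon):
    """emoji_lexicon maps an emoji to (emotion, sentiment weight)."""
    best_emotion, best_score = None, -1.0
    for emoji, count in Counter(emojis).items():
        if emoji not in emoji_lexicon:
            continue  # emojis without a manual label are ignored
        emotion, weight = emoji_lexicon[emoji]
        # Absolute weight, doubled when the emoji occurs more than once.
        score = abs(weight) * (2 if count > 1 else 1)
        if score > best_score:
            best_emotion, best_score = emotion, score
    return best_emotion

# Hypothetical lexicon entries (weights are illustrative only).
lexicon = {"😂": ("joy", 3), "😡": ("anger", -4), "😢": ("sadness", -2)}

repeated = label_by_emoji(["😂", "😂", "😡"], lexicon)  # 3 * 2 = 6 beats |-4| = 4
single = label_by_emoji(["😂", "😡"], lexicon)          # |-4| = 4 beats 3
print(repeated, single)
```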
Chen et al. [41] studied the effect of including emojis
in the sentiment classification process. The main idea was to
measure the performance of the classifiers when the input
data set enters the classifier with or without emojis. The
authors considered seven emotion classes:
sad, angry, happy, scared, thankful, surprised, and love.
They represented each emotion with a set of
representative keywords for the purpose of tweet
collection. If a tweet contains one of the keywords as a
hashtag, it is included in the data set. To label the
data set, the hashtags in the tweet are examined and the tweet
is labeled with an emotion based on the last keyword
hashtag. For example, if the last hashtag is #joy, the tweet is
labeled with the happy emotion. For
evaluation, they used the multinomial NB (MNB) and SVM
classifiers. In general, MNB outperformed SVM in
accuracy. With MNB, the classification accuracy when
using emojis is higher than when eliminating
them. With the SVM classifier, on the other hand, the
classification accuracy is slightly higher when eliminating
emojis than when keeping them.
C. Fuzzy Logic Classification
Bahreini et al. [42] introduced a fuzzy logic
system that detects emotions based on live facial
gestures. This system detects the emotion from an image,
a video, or even a live camera recording. The methodology
uses a database of facial expressions classified into emotions.
The proposed system studies the images in the data set
to extract numerical features related to facial
dimensions and builds the fuzzy rules based on them. The
fuzzy logic system studies the features in the facial image and
transforms the crisp values to fuzzy values; the emotion is
then detected using the fuzzy rules. The proposed system
achieved an accuracy of 83%, and the authors state that it
performs better when working on the exhaustive emotions.
Liu et al. [43] introduced a fuzzy logic classification
system that classifies social media posts and comments into one
of the hate speech categories: sexual, religion,
color, and disability. The text is transformed into a
numerical vector using the bag-of-words methodology. After
that, the vector values are transformed to fuzzy values based
on the fuzzy membership function, and those values are then
defuzzified by applying the fuzzy rules to get the final class
with its degree. The other contribution of the authors is a
methodology to resolve ambiguous instances
that get the same membership degree, by retraining on those
instances and using k-nearest neighbor to classify them
based on their similarity to the surrounding instances.
V. METHODOLOGY
This section presents our research methodology
including the dataset type, fuzzy logic classification and
defuzzification process.
A. Dataset
The dataset we obtained for our research is called
Customer Support on Twitter. It consists of 3 million
tweets and replies that reflect customer opinions and
problems regarding the products and
services of a group of famous brands. It is freely available on
Kaggle [44]. For our research purposes, we filtered the 3
million tweets and kept the ones that contain multiple emojis,
ending up with 30,233 tweets. We selected 120 emojis for the
filtering process: we chose emojis that represent a feeling or
emotion and excluded the ones that carry no sentiment meaning.
We handled the 30,233 tweets as follows. We separated
the text of each tweet from its emojis and generated two
separate files for the two systems (TCFL and ECFL), one
for the text only and the other for the emojis of
each tweet. Each file was handled in a different way, as
explained later in this paper.
Preprocessing steps included removing Twitter handles,
hashtags, links, punctuation, numbers, special characters,
words shorter than two letters, and English stop words. We
handled repeated letters manually and substituted
abbreviations and curse words with a representative word.
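The automatic part of the preprocessing steps listed above can be approximated with the following Python sketch. The regular expressions, the tiny stop-word list, and the decision to drop a hashtag together with its word are assumptions made for illustration; the manual steps (repeated letters, abbreviations, curse words) are not covered.

```python
import re

# Tiny illustrative stop-word list; a real run would use a full English list.
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "to", "my", "with", "i"}

def preprocess(text, stop_words=STOP_WORDS):
    """Return the cleaned, lower-cased tokens of a tweet."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # links
    text = re.sub(r"[@#]\w+", " ", text)                # handles and hashtags
    text = re.sub(r"[^A-Za-z\s]", " ", text)            # punctuation, numbers, special characters
    tokens = text.lower().split()
    # Drop words shorter than two letters and stop words.
    return [t for t in tokens if len(t) >= 2 and t not in stop_words]

tokens = preprocess("@Uber_Support I need help with my Account!! https://t.co/abc #help 24/7")
print(tokens)  # ['need', 'help', 'account']
```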
B. Sentiment Lexicons
A sentiment lexicon is a dictionary of words in which each
word has a label that specifies whether it carries a positive or
negative sentiment, or a weight that specifies the strength of
that sentiment. For the text-based classification system (TCFL),
we used 6 NRC sentiment lexicons [45][46][47][48], which
attach to each word a sentiment label (positive or
negative) and a score. For the emoji-based
classification system (ECFL), we used the emoji lexicon in
[40] and followed its methodology to assign sentiment
labels and scores to the emojis not included in that study.
C. Base Class Identification
Each tweet is first classified into a base emotion (Anger,
Sadness, Joy, Fear, Disgust, Surprise, Anticipation, or
Trust), and then the tweet's words in TCFL and the tweet's
emojis in ECFL enter the fuzzy logic framework to determine
the degree of the classification. In TCFL, the word with the
highest sentiment score decides the base class; if it is not
found in the emotion lexicon, we move to the word with the
next-highest weight, and so on. The NRC emotion lexicons
[49][50] are used for this step. For ECFL, we manually
classified the 120 emojis into one of the 8 main emotions, and
the first emoji in the tweet decides the base class based on
the constructed emotion-emoji lexicon.
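The base class selection can be sketched in Python as follows. The function names and the toy lexicon entries are hypothetical, and ranking words by the absolute value of their weight is an assumption made here so that strongly negative words are also considered.

```python
def base_class_tcfl(tokens, sentiment_scores, emotion_lexicon):
    """Return the emotion of the highest-weighted word found in the emotion lexicon."""
    # Rank words from the strongest to the weakest sentiment weight (assumed: absolute value).
    ranked = sorted(tokens, key=lambda w: abs(sentiment_scores.get(w, 0.0)), reverse=True)
    for word in ranked:
        if word in emotion_lexicon:
            return emotion_lexicon[word]
    return None  # no word of the tweet carries an emotion label

def base_class_ecfl(emojis, emoji_emotions):
    """The first emoji of the tweet decides the base class."""
    return emoji_emotions.get(emojis[0]) if emojis else None

# Toy lexicons, invented for illustration.
sentiment_scores = {"frustrating": -2.1, "disabled": -1.2, "help": 0.9}
emotion_lexicon = {"disabled": "Sadness", "help": "Trust"}
emoji_emotions = {"😡": "Anger", "😢": "Sadness"}

tokens = ["need", "help", "account", "disabled", "frustrating"]
text_base = base_class_tcfl(tokens, sentiment_scores, emotion_lexicon)
emoji_base = base_class_ecfl(["😡", "😢"], emoji_emotions)
print(text_base, emoji_base)
```

In this toy run, "frustrating" has the highest weight but no entry in the emotion lexicon, so the next word, "disabled", decides the base class.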
D. Fuzzification
The crisp values (weights) of the words and emojis are
converted to linguistic values using two fuzzification
membership functions that we built for our systems using the
triangular form. The TCFL fuzzification membership
function ranges from 0 to ∞, and the ECFL fuzzification
membership function ranges from 0 to 5; in both, the crisp
values are spread over the Low, Medium, and High fuzzy sets.
The same ranges are used for negative sentiment but
with a negative sign, as the membership function is
mirrored for the two polarities. In this way, the sentiment
weights (crisp values) of the word series in TCFL and the emoji
series in ECFL are fuzzified to linguistic values.
E. Fuzzy Rule Generation and Combining
We used the simple fuzzy grid to generate simple if-then
fuzzy rules. The outputs of the rules are then
combined using the maximum method. We
generated 84 fuzzy rules as in [51]. After the fuzzification
step, we take the first two words/emojis
with their linguistic assignments, construct all possible
combinations of the assigned pairs, and apply the
matching fuzzy rule to each combination to get its output. If
more than one pair yields the same fuzzy rule output, we keep
the maximum degree to combine them. The output of this step is
the fuzzy output set with a numerical membership degree.
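The pairing and combining step can be sketched in Python as follows. The rule table is a hypothetical fragment and not the 84 rules of [51], and taking the minimum of the two input degrees as the strength of a fired rule is an assumption borrowed from common Mamdani practice.

```python
from itertools import product

# Hypothetical rule fragment: (set of item 1, set of item 2) -> output set.
RULES = {
    ("Low", "Low"): "Very Low",
    ("Low", "Medium"): "Low",
    ("Medium", "Low"): "Low",
    ("Low", "High"): "Medium",
    ("High", "Low"): "Medium",
    ("Medium", "Medium"): "Medium",
    ("Medium", "High"): "High",
    ("High", "Medium"): "High",
    ("High", "High"): "Very High",
}

def infer(first, second, rules=RULES):
    """first/second map a fuzzy set to the membership degree of one word or emoji."""
    output = {}
    for (set1, deg1), (set2, deg2) in product(first.items(), second.items()):
        label = rules[(set1, set2)]
        strength = min(deg1, deg2)  # assumed firing strength (AND as minimum)
        # Maximum method: keep the largest degree for each output set.
        output[label] = max(output.get(label, 0.0), strength)
    return output

first = {"Medium": 0.8, "High": 0.2}
second = {"Low": 0.4, "Medium": 0.6}
result = infer(first, second)
print(result)  # {'Low': 0.4, 'Medium': 0.6, 'High': 0.2}
```

Two combinations produce "Medium" here (with strengths 0.6 and 0.2), and the maximum method keeps 0.6.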
F. Defuzzification
The last step in the fuzzy logic classification is the
defuzzification process. In this step, a final crisp output is
computed from the result of the previous steps.
A wide range of methods can be used to
calculate the crisp output; in our study, we used the center of
gravity. After we calculate the defuzzified value, we
use the defuzzification membership function to get the final
class. The center of gravity is calculated as
follows [52]:
$$X^{*} = \frac{\sum_{i=1}^{N} A_i X_i}{\sum_{i=1}^{N} A_i} \qquad (1)$$
where X* represents the defuzzified crisp value. The total area
of the fuzzy set is divided into sub-areas; N is the
number of sub-areas covered by the result of the previous
step, Xi is the centroid of sub-area i, and Ai
is the area of sub-area i, which is a fixed and
already specified value. The area of each covered
sub-area is multiplied by its centroid, and the products are
summed and divided by the total area covered to get
the defuzzified crisp value. The crisp output is then mapped
to one or more of the following classes: Extremely Low,
Very Low, Low, Medium, High, Very High, and Extremely
High, with different degrees of truth based on the
defuzzification membership function we built for this step.
Membership functions can take many shapes;
for this step, we used the trapezoidal form with a range from
0 to 15.5. In other words, we take the first two
words/emojis and apply the fuzzification, rule
generation and combining, and defuzzification steps; the
result is then taken with the third word/emoji and
the same steps are repeated, and so on until no
words/emojis remain in the tweet, which means we have reached
the final classification of the tweet using fuzzy logic under
the two systems, TCFL and ECFL.
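Equation (1) can be written as a short Python function. The sub-area values below are hypothetical and serve only as a worked example; in our systems the areas and centroids come from the defuzzification membership function.

```python
def center_of_gravity(sub_areas):
    """sub_areas: list of (area A_i, centroid X_i) pairs covered by the fuzzy output."""
    total_area = sum(area for area, _ in sub_areas)
    if total_area == 0:
        raise ValueError("no area is covered by the fuzzy output")
    return sum(area * centroid for area, centroid in sub_areas) / total_area

# Hypothetical covered sub-areas as (A_i, X_i) pairs.
covered = [(1.0, 6.0), (2.0, 8.0), (1.0, 10.0)]
crisp = center_of_gravity(covered)
print(crisp)  # (1*6 + 2*8 + 1*10) / (1 + 2 + 1) = 8.0
```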
For example, the tweet “@Uber_Support I need help
with my Account, it gets disabled every time I try to login
and it's getting really frustrating! ” is classified to
medium sadness with degree 0.174 and high sadness with
degree 0.826 under the TCFL system, and to high anger with
degree 0.625 and very high anger with degree 0.375 under
the ECFL classification system. Finally, we take the class
with the higher degree as the final classification result.
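The final selection step for this example can be expressed in a few lines of Python. The function name is illustrative; the degrees are the ones reported above.

```python
def final_class(memberships):
    """Return the class label with the highest degree of truth."""
    return max(memberships, key=memberships.get)

tcfl_output = {"Medium Sadness": 0.174, "High Sadness": 0.826}
ecfl_output = {"High Anger": 0.625, "Very High Anger": 0.375}

print(final_class(tcfl_output))  # High Sadness
print(final_class(ecfl_output))  # High Anger
```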
VI. EXPERIMENTAL RESULTS
To evaluate the reliability of the classification
results obtained from our two systems, they
need to be compared with a human-based classification of
the same dataset by measuring the percentage of
agreement between the human-based
classification and our systems' classification. For this
purpose, we used RapidMiner Studio to obtain a
human-based classification for our dataset using the clustering
data mining task.
B. Human-Based Classification
Clustering is a data mining task that splits the data set
under study into a set of groups based on the similarity
between the instances. The instances inside one cluster are
similar to each other, and the main goal of the clustering task
is to minimize the distance between instances of the same
cluster and maximize it between instances that belong
to different clusters [53]. There are various clustering
algorithms; in our study, we used k-means clustering.
K-means clustering is an algorithm that splits the data
objects in the data set into a set of groups based on their
attributes. The k refers to the number of groups (clusters)
into which the data is split. The main idea of k-means
is to group the objects by
minimizing the distance between each cluster centroid and
its objects [54].
We used k-means in RapidMiner with k = 40, where k
is the number of fuzzy emotion classes
on which we built our systems, such as high joy, low
sadness, high disgust, low disgust, and so on; that is,
each cluster represents a class of emotion. The number of
iterations we chose is 10. By the end of this step, our data
instances (tweets) are grouped into 40 different clusters based
on the similarity between them. The next step is to label
each of the 40 clusters based on the instances that belong to
it. We did this step manually: we took the clusters one by one,
read most of the tweets that belong to each, and labeled it
based on human judgment with the most suitable emotion class
from the 40 classes. As a result, all the tweets that belong to
a cluster are labeled with the class we chose for that
cluster. In this way, the k-means clustering serves
the classification goal. By the end of this step, we
obtained a human-based classification for our data
set that serves as a baseline against which to compare the
performance of our fuzzy classification systems.
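The cluster-then-label procedure can be sketched with a plain Python k-means. This is not the RapidMiner operator we used: the two-dimensional points, the seeding with the first k points, and the cluster labels are assumptions made for a small, deterministic illustration.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain k-means; the first k points seed the centroids (assumed for determinism)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids

# Hypothetical feature vectors for six tweets.
points = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.3), (4.8, 5.0)]
assignment, centroids = kmeans(points, k=2, iterations=10)

# Manual step: a human reads each cluster and names it (labels invented here).
cluster_labels = {0: "Low Joy", 1: "High Anger"}
tweet_labels = [cluster_labels[a] for a in assignment]
print(tweet_labels)
```

Every tweet inherits the label given to its cluster, which mirrors how the human-based classification was propagated from the 40 clusters to all tweets.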
C. Evaluation Methodology
We evaluated the performance of our systems, TCFL
and ECFL, by comparing their classification results with the
human-based classification in three different ways:
1. Exact Class Match
In this type of evaluation, the classification result of our
systems must match the human-based classification exactly
(base class and degree) to be considered a hit
(match); otherwise, it is considered a miss.
2. Base Class Match
This is the simplest type of evaluation we applied. If the
base class from the fuzzy system's classification result
matches the base class of the human-based classification, we
consider it a match (hit) regardless of the degree of
membership. If the base class does not match, we
consider it a miss.
3. Partial Class Match
The main idea here is that if the base classes of the two
classifications being compared match but the degrees do
not, it is somewhat unfair to consider the result an absolute
miss. It is fairer to consider it a partial match
and to give the match a percentage of
matching. For example, the degrees very high and very
high have a matching percentage of 100%, very high and
high have 80%, very high and medium have 60%, very high and
low have 40%, and very high and very low have 20%,
and so on for the rest of the degree pairs.
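The three evaluation modes can be sketched together in Python. The five-degree scale and the rule of losing 20 points per step between degrees are inferred from the example percentages above and from the 40 classes (8 emotions with 5 degrees each) used for the human-based classification; the sample predictions are invented.

```python
DEGREES = ["Very Low", "Low", "Medium", "High", "Very High"]

def partial_match(pred, gold):
    """pred/gold are (base emotion, degree) pairs; returns a percentage from 0 to 100."""
    if pred[0] != gold[0]:
        return 0  # different base emotion: no credit
    gap = abs(DEGREES.index(pred[1]) - DEGREES.index(gold[1]))
    return 100 - 20 * gap  # assumed: 20 points lost per step between degrees

def evaluate(preds, golds):
    """Return (exact, base, partial) match percentages."""
    n = len(golds)
    exact = 100 * sum(p == g for p, g in zip(preds, golds)) / n
    base = 100 * sum(p[0] == g[0] for p, g in zip(preds, golds)) / n
    partial = sum(partial_match(p, g) for p, g in zip(preds, golds)) / n
    return exact, base, partial

# Invented system output and human labels for four tweets.
preds = [("Joy", "High"), ("Anger", "Very High"), ("Sadness", "Low"), ("Fear", "Medium")]
golds = [("Joy", "High"), ("Anger", "Medium"), ("Joy", "Low"), ("Fear", "Low")]

scores = evaluate(preds, golds)
print(scores)  # (25.0, 75.0, 60.0)
```

By construction, the partial score always lies between the exact score and the base class score.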
D. Final Results
Under the exact class match evaluation method, which
requires a full match of base class and degree, the
percentage of match between TCFL classification results
and the human-based classification is 48.96%, and between
ECFL and the human-based classification it is 32.54%. The
text-based system therefore outperforms the emoji-based
system in accuracy.
Under the base class match evaluation method, which
requires only the base class to match regardless of the
degree, the percentage of match between TCFL classification
results and the human-based classification is 56.84%, and
between ECFL and the human-based classification it is
43.49%. Again, the accuracy of TCFL is higher than that of
ECFL.
Under the partial class match evaluation method, which
credits a partial match between classes according to the
matching percentage between their degrees, the percentage of
match between TCFL classification results and the
human-based classification is 53.73%, and between ECFL and
the human-based classification it is 39.05%. For the third
time, TCFL outperforms ECFL. The results from all the
evaluation methodologies are shown in Fig. 1.
TCFL outperforms ECFL under every type of evaluation.
Looking at TCFL on its own, the highest accuracy (percentage
of match) is achieved with the base class match, followed by
the partial match and then the exact match. The same
ordering holds for ECFL. This is expected, because the exact
match is the strictest type of evaluation: it requires
agreement on both the base class and the degree to count as
a hit, so the number of matching classification pairs
between the fuzzy system and the human-based classification
is lower than under the other two types of evaluation. On
the other hand, it is the most reliable type, because it
gives the most realistic view of the performance with no
room for doubt. For a more flexible and inclusive
assessment, the other two types of evaluation, the base
class match and the partial class match, are good choices
and yield higher accuracy percentages. In conclusion, both
of our fuzzy systems (TCFL and ECFL) give good
classification results; however, looking at
the performance of both, we can state that the text
classification using the fuzzy logic (TCFL) system wins
over the emoji classification using the fuzzy logic (ECFL)
system in giving a better and more accurate indication of the
emotion that a specific tweet holds.
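The ordering of the three accuracy figures also follows directly from the definitions. The sketch below rests on assumptions not stated in the paper: N is the number of evaluated tweets, b_i in {0, 1} indicates whether the base classes agree for tweet i, s_i in [0, 1] is the degree matching percentage written as a fraction (equal to 1 only when the degrees are identical), and the partial score is taken to be the average of b_i s_i.

```latex
\begin{align*}
A_{\text{exact}}   &= \frac{1}{N}\sum_{i=1}^{N} b_i\,\mathbf{1}[s_i = 1], \\
A_{\text{partial}} &= \frac{1}{N}\sum_{i=1}^{N} b_i\,s_i, \\
A_{\text{base}}    &= \frac{1}{N}\sum_{i=1}^{N} b_i .
\end{align*}
% Since \mathbf{1}[s_i = 1] \le s_i \le 1 for every i, summing term by term gives:
\[
A_{\text{exact}} \;\le\; A_{\text{partial}} \;\le\; A_{\text{base}} .
\]
```

The reported figures follow this pattern: 48.96% ≤ 53.73% ≤ 56.84% for TCFL and 32.54% ≤ 39.05% ≤ 43.49% for ECFL.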
Fig. 1. TCFL vs. ECFL under all types of evaluation
VII. CONCLUSION AND FUTURE WORK
The main objective of our study was to establish
sentiment analysis systems that extrapolate the emotion a
specific text holds in its words and emojis, using a
multi-label fuzzy logic approach for the classification
task. We also sought to answer the question: which gives a
better indication of the emotion a text holds, the words or
the emojis? We developed two fuzzy emotion classification
systems, the text-based system (TCFL) and the emoji-based
system (ECFL). Our evaluation results led us to conclude
that the text of a tweet is more reliable than the emojis it
contains for detecting the tweet's real emotion.
In the future, we plan to build a unified fuzzy logic
emotion classification system based on both the text and
the emojis. In addition, we plan to expand the number of
main emotions that the systems cover and to increase the
size of the data set in order to increase the reliability of
the final results.
REFERENCES
[1] M. Mohsin, "10 Social Media Statistics You Need to
Know in 2019 [Infographic]," 7 March 2019.
[2] "Sentiment Analysis: Nearly Everything You Need to
Know | MonkeyLearn", MonkeyLearn, 2019.
[3] T. Young, D. Hazarika, S. Poria, and E. Cambria,
“Recent trends in deep learning based natural language
processing [Review Article],” IEEE Comput. Intell.
Mag., vol. 13, no. 3, pp. 55–75, 2018.
[4] A. Collomb, C. Costea, D. Joyeux, O. Hasan, and L.
Brunie, “A Study and Comparison of Sentiment
Analysis Methods for Reputation Evaluation,” Res.
Rep. RR-LIRIS-2014-002, 2013.
[5] L. Luo et al., “Beyond polarity: Interpretable financial
Sentiment analysis with hierarchical query-driven
attention,” IJCAI Int. Jt. Conf. Artif. Intell., vol. 2018-
July, pp. 4244–4250, 2018.
[6] B. Kratzwald, S. Ilić, M. Kraus, S. Feuerriegel, and H.
Prendinger, “Deep learning for affective computing:
Text-based emotion recognition in decision support,”
Decis. Support Syst., vol. 115, pp. 24–35, 2018.
[7] G. Haralabopoulos and E. Simperl, “Crowdsourcing
for Beyond Polarity Sentiment Analysis A Pure
Emotion Lexicon,” 2017.
[8] Ayyoub, Essa, and Alsmadi, “Lexicon-based sentiment
analysis of Arabic tweets,” Int. J. Soc. Netw. Min. 2(2)
101, vol. X, 2015.
[9] R. T. Nakatsu and E. B. Grossman, “A Task-Fit Model
of Crowdsourcing : Finding the Right Crowdsourcing
Approach to Fit the Task,” J. Inf. Sci., pp. 1–11, 2014.
[10] C. C. Aggarwal and C. X. Zhai, Mining text data, vol.
9781461432234. 2013.
[11] C. N. Mahender and V. Korde, “Text Classification
and Classifiers: A Survey,” Int. J. Artif.
Intell. Appl., vol. 3, no. 2, pp. 85–99, 2012.
[12] H. Zhang and Z. Pan, “Cross-voting SVM method for
multiple vehicle classification in wireless sensor
networks,” Sensors (Switzerland), vol. 18, no. 9, 2018.
[13] "Emoji Science Facts, Statistics, And What Your
Emojis Say About You! — Steemit", Steemit.com,
2019.
[14] B. Felbo, A. Mislove, A. Søgaard, I. Rahwan, and S.
Lehmann, “Using millions of emoji occurrences to
learn any-domain representations for detecting
sentiment, emotion and sarcasm,” 2017.
[15] G. Tsoumakas and I. Katakis, “Multi-Label
Classification: An Overview,” 2007.
[16] S. Kanj, F. Abdallah, T. Denœux, and K. Tout,
“Editing training data for multi-label classification
with the k-nearest neighbor rule,” Pattern Anal. Appl.,
vol. 19, no. 1, pp. 145–161, 2016.
[17] A. Joly, L. Wehenkel, and P. Geurts, “Gradient tree
boosting with random output projections for multi-
label classification and multi-output regression,” pp.
1–40, 2019.
[18] E. A. Tanaka, S. R. Nozawa, A. A. Macedo, and J. A.
Baranauskas, “A multi-label approach using binary
relevance and decision trees applied to functional
genomics,” J. Biomed. Inform., 2015.
[19] S. Kanj, F. Abdallah, T. Denœux, and K. Tout,
“Editing training data for multi-label classification
with the k-nearest neighbor rule,” Pattern Anal. Appl.,
vol. 19, no. 1, pp. 145–161, 2016.
[20] “What is fuzzy logic? - Definition from
WhatIs.com”, SearchEnterpriseAI.
[21] R. Kay, "Sidebar: The history of fuzzy
logic", Computerworld, 2019.
[22] W. E. Sari, O. Wahyunggoro, and S. Fauziati, “A
comparative study on fuzzy Mamdani-Sugeno-
Tsukamoto for the childhood tuberculosis diagnosis,”
AIP Conf. Proc., vol. 1755, no. July 2016, 2016.
[23] A. Saepullah and S. W. Romi, “Comparative Analysis
of Mamdani , Sugeno And Tsukamoto Method of
Fuzzy Inference System for Air Conditioner Energy
Saving,” J. Intell. Syst., vol. 1, no. 2, pp. 143–147,
2015.
[24] O. Cordón, “A historical review of evolutionary
learning methods for Mamdani-type fuzzy rule-based
systems: Designing interpretable genetic fuzzy
systems,” Int. J. Approx. Reason., vol. 52, no. 6, pp.
894–913, 2011.
[25] K. Ishibashi et al., “Simultaneous Bidirectional
Transceiver Logic,” IEEE Micro, vol. 19, no. 1, pp.
14–19, 1999.
[26] D. Team, "What is Fuzzy Logic Systems in AI -
Architecture, Application - DataFlair", DataFlair.
[27] "Fuzzy Logic Tutorial: What is, Application &
Example", Guru99.com.
[28] "What is Fuzzy Logic System - Operation, Examples,
Advantages & Applications", ELECTRICAL
TECHNOLOGY.
[29] "Fuzzy Logic - How Does Fuzzy Logic Work:
Architecture and Applications", ElProCus - Electronic
Projects for Engineering Students.
[30] P. Fortemps and M. Roubens, “Ranking and
defuzzification methods based on area compensation,”
Fuzzy Sets Syst., vol. 82, no. 3, pp. 319–330, 1996.
[31] T. A. Runkler, "Selection of appropriate
defuzzification methods using application specific
properties," in IEEE Transactions on Fuzzy Systems,
vol. 5, no. 1, pp. 72-79, Feb. 1997.
[32] H. Ishibuchi and T. Nakashima, “A study on
generating fuzzy classification rules using histograms,”
no. April, pp. 132–140, 2002.
[33] O. Cordón, F. Herrera, and P. Villar, “Generating the
knowledge base of a fuzzy rule-based system by the
genetic learning of the data base,” IEEE Trans. Fuzzy
Syst., 2001.
[34] Y. H. Joo, “Fuzzy Systems Modeling : An
Introduction,” vol. 1, pp. 734–736, 2009.
[35] U. Kose, “Fundamentals of Fuzzy Logic with an Easy-
to-use , Interactive Fuzzy Control Application,” Int. J.
Mod. Eng. Res., vol. 2, no. 3, pp. 1198–1203, 2012.
[36] "Fuzzy Logic Membership
Function", www.tutorialspoint.com.
[37] N. Sharma, R. Pabreja, U. Yaqub, V. Atluri, S. A.
Chun, and J. Vaidya, “Web-based application for
sentiment analysis of live tweets,” pp. 1–2, 2018.
[38] M. Jabreel and A. Moreno, “A Deep Learning-Based
Approach for Multi-Label Emotion Classification in
Tweets,” Appl. Sci., vol. 9, no. 6, p. 1123, 2019.
[39] S. Sharifirad, B. Jafarpour, and S. Matwin, “How is
Your Mood When Writing Sexist tweets? Detecting
the Emotion Type and Intensity of Emotion Using
Natural Language Processing Techniques,” 2019.
[40] W. Hussien, M. Al-Ayyoub, Y. Tashtoush, and M. Al-
Kabi, “On the Use of Emojis to Train Emotion
Classifiers”, 2019.
[41] T. Lecompte and J. Chen, “Sentiment Analysis of
Tweets Including Emoji Data,” Proc. - 2017 Int. Conf.
Comput. Sci. Comput. Intell. CSCI 2017, pp. 793–798,
2018.
[42] K. Bahreini, W. van der Vegt, and W. Westera, “A
fuzzy logic approach to reliable real-time recognition
of facial emotions,” Multimed. Tools Appl., 2019.
[43] H. Liu, P. Burnap, W. Alorainy, and M. L. Williams,
“A Fuzzy Approach to Text Classification with Two-
Stage Training for Ambiguous Instances,” IEEE Trans.
Comput. Soc. Syst., vol. 6, no. 2, pp. 227–240, 2019.
[44] "Customer Support on Twitter", Kaggle.com.
[45] S. Kiritchenko, X. Zhu, and S. M. Mohammad,
“Sentiment analysis of short informal texts,” J. Artif.
Intell. Res., vol. 50, pp. 723–762, 2014.
[46] S. M. Mohammad, S. Kiritchenko, and X. Zhu, “NRC-
Canada: Building the State-of-the-Art in Sentiment
Analysis of Tweets,” 2013.
[47] X. Zhu, S. Kiritchenko, and S. Mohammad, “NRC-
Canada-2014: Recent Improvements in the Sentiment
Analysis of Tweets,” pp. 443–447, 2015.
[48] S. Kiritchenko, X. Zhu, C. Cherry, and S. Mohammad,
“NRC-Canada-2014: Detecting Aspects and Sentiment
in Customer Reviews,” pp. 437–442, 2015.
[49] S. M. Mohammad and P. D. Turney, “Crowdsourcing a
word-emotion association lexicon,” Comput. Intell.,
vol. 29, no. 3, pp. 436–465, 2013.
[50] S. M. Mohammad and S. Kiritchenko, “Using
Hashtags to Capture Fine Emotion Categories from
Tweets,” Comput. Intell., vol. 31, no. 2, 2015.
[51] Tasneem Mallah, Advisor: Yahya Tashtoush, “Fuzzy
Logic System for Emotion Identification of Tweets”,
Thesis, The CS department, Jordan University of
Science and Technology, 2018.
[52] E. Van Broekhoven and B. De Baets, “Fast and
accurate center of gravity defuzzification of fuzzy
system outputs defined on trapezoidal fuzzy
partitions,” Fuzzy Sets Syst., 2006.
[53] P. Berkhin, “A survey of clustering data mining
techniques,” in Grouping Multidimensional Data:
Recent Advances in Clustering, 2006.
[54] K. Teknomo, “K-Means Clustering Tutorial,”
Medicine (Baltimore), pp. 1–12, 2007.