
Sentiment Analysis Using Naive Bayes Algorithm Of The Data Crawler: Twitter

Authors:
  • De La Salle Catholic University, Indonesia

Meylan Wongkar¹, Apriandy Angdresey²
Department of Informatics Engineering
De La Salle Catholic University
Manado 95000, Indonesia
Email: 14013016@unikadelasalle.ac.id¹, aangdresey@unikadelasalle.ac.id²
Abstract— Sentiment analysis measures public sentiment or opinion
about goods, services, or public figures such as politicians and
celebrities. In this study, we built a sentiment analysis
application in the Python programming language and applied it to
tweets about the candidates in the 2019 presidential election of
the Republic of Indonesia. The analysis proceeds in several steps:
collecting data with Python libraries, text processing, testing the
training data, and classifying text with the Naïve Bayes method,
which assigns each tweet to a sentiment class. The Jokowi-Ma'ruf
Amin pair received 45.45% positive and 54.55% negative sentiment,
while the Prabowo-Sandiaga pair received 44.32% positive and 55.68%
negative sentiment. The combined training data for both candidate
pairs was then tested and reached an accuracy of 80.90% ≈ 80.1%.
We also compared the Naïve Bayes, SVM, and K-Nearest Neighbor
(K-NN) methods in RapidMiner, which gave accuracies of 75.58% for
Naïve Bayes, 63.99% for SVM, and 73.34% for K-NN.
Keywords: Sentiment analysis, Naïve Bayes algorithm, text
mining, Twitter.
I. INTRODUCTION
Sentiment is a term used to describe a topic that may be
subjective or objective, factual or non-factual, and that goes
beyond the simple distinction between positive and negative [1].
Sentiment analysis is an analysis carried out based on rumors or
gossip in circulation [2]. It is an analytical approach applied to
text, and its purpose is to determine the subjectivity of an
opinion, a review, or a tweet. Based on sentiment analysis, a
person's opinions can be classified into various categories
according to data size and document type [3].
Nowadays, the community often responds to and criticizes
leaders, both political and public figures, through social media
such as Twitter. Twitter has a retweet feature that lets every user
re-post information or tweets, which speeds up the spread of
information on the platform [2]. Twitter can also be used for
sentiment analysis on tweet data obtained with a data crawler, a
method for collecting data. In this study, we aim to analyze the
level of public sentiment towards the 2019 presidential candidates
of the Republic of Indonesia, using tweets collected from Twitter
with a data crawler. Furthermore, we compare the accuracy of the
Naïve Bayes method with other classification methods, namely SVM
and KNN. The Naïve Bayes method [3] groups data into categories
that already exist.
The paper is organized as follows. Section II reviews related
work on sentiment analysis on Twitter. Section III describes the
methods and presents the formulas used for classification. Section
IV describes the performance evaluation and its results, and
Section V concludes the paper.
II. RELATED WORK
In [4], sentiment analysis was carried out on factors related
to customer satisfaction with e-commerce. These factors can help
e-commerce companies focus on improving service and company
quality, which is associated with increased traffic, sales, and
profits. The study consisted of data collection, data cleaning, and
lexicon classification, all three processed in R Studio, a software
application based on the R programming language. From the sentiment
analysis of three of the largest e-commerce companies in the world,
namely Amazon, eBay, and Rakuten, the authors concluded that the
factors influencing customer satisfaction that receive the most
customer attention are usefulness, service quality, information
quality, and system quality [4]. In [10], sentiment analysis was
performed by comparing the SVM and KNN methods. The data was tested
in sets of increasing size to see how accuracy differs across
datasets: 100, 500, 1000, 1500, 2000, 2500, and finally 3000
tweets. In that study, the KNN method achieved higher accuracy than
the SVM method.
Studies [5] and [9] performed Twitter sentiment analysis: the
first examined sentiment among Twitter users with the K-Nearest
Neighbor (KNN) method, and the second analyzed the level of community
Authorized licensed use limited to: National Taiwan Univ of Science and Technology. Downloaded on May 27,2020 at 07:38:13 UTC from IEEE Xplore. Restrictions apply.
sentiment towards the performance of Malang City (SAMSAT) using the
Naïve Bayes classifier method. In [5], data was collected by
crawling Twitter and then passed through text processing so that it
was ready to be analyzed with the K-NN method. In the tests, the
highest accuracy was 67.2% and the precision 56.94%, both with
k = 5, while the highest recall was 78.24% with k = 15. In [9],
data was collected from Twitter with web scraping techniques and
saved to a database. During data processing, the authors applied
duplicate tweet filtering to delete repeated tweets, case folding
to convert all letters to lowercase, cleaning to remove unneeded
characters or words, tokenizing to separate the words, and
filtering to keep the important words among the tokens. In the
first test, the positive, negative, and neutral categories scored
81%, 89%, and 80%; in the second test they scored 82%, 92%, and
80% [5] [9].
Studies [7] and [8] analyzed Indonesian-language tweets with
the Naïve Bayes classifier to determine the level of public
sentiment towards a film [7] and towards public figures on Twitter
ahead of the 2014 general election in Indonesia [8]. In the film
study [7], non-standard words in the tweets were corrected during
text processing, and the classifier was then tested on the
non-standard sentences. The training data consisted of 140 opinions
(70 positive and 70 negative) and the test data of 60 opinions (30
positive and 30 negative). The test was carried out three times,
and the third run gave the highest accuracy, 86.67%. In [8], the
tweets were collected using the cron job facility on the Windows
operating system. During data processing, the authors cleaned the
data by deleting special characters and URLs and by removing word
affixes. On 1329 test tweets, classification reached an accuracy of
79.91% with the term frequency feature and 79.68% with the TF-IDF
feature. Classification in RapidMiner with Naive Bayes reached
73.81% with the term frequency feature and 71.11% with the TF-IDF
feature [7] [8].
III. METHODS
Text processing in this study consists of several steps. First,
we collect tweet data from Twitter using a crawler. Next, we parse
the collected tweets word by word. We then tokenize them, which
cleans each tweet and selects the meaningful words. Finally, we
perform text mining using the Naïve Bayes method. The overall
process is shown in Figure 1.
Figure 1. Text Processing
a. Collect data
For data collection, we use tweets obtained with a data crawler
from Twitter between January and May 2019. Table 1 presents sample
data taken from Twitter.
Table 1. Sample Data
No.  Sentiment  Text
1    Positive   the president has worked well
2    Negative   the president cannot keep his promises
3    Positive   president helps flood victims
4    Negative   the president raised the price of fuel oil
5    Negative   the president raised the electricity tariffs
6    Negative   the president was unsuccessful
7    Positive   president built infrastructure
8    Positive   president spent his holiday with his family
…    …          …
443  Negative   president of imaging
b. Text parsing and Tokenization
This step splits each sentence into individual words, strips
unwanted characters, and keeps the words that carry value. Table 2
shows the result of text parsing and tokenization for the sample
data in Table 1.
Table 2. Tokenization
No.  Sentiment  Text
1    Positive   [the, president, has, worked, well]
2    Negative   [the, president, can, not, keep, his, promises]
3    Positive   [president, helps, flood, victims]
4    Negative   [the, president, raised, the, price, of, fuel, oil]
5    Negative   [the, president, raised, the, electricity, tariffs]
6    Negative   [the, president, was, unsuccessful]
7    Positive   [president, built, infrastructure]
8    Positive   [president, spent, his, holiday, with, his, family]
…    …          …
443  Negative   [president, of, imaging]
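The parsing and tokenization step can be sketched in a few lines of Python. This is a minimal illustration, not the tokenizer used in the study: the function name `tokenize`, the character filter, and the special handling of "cannot" (inferred from row 2 of Table 2) are all assumptions.

```python
import re


def tokenize(text):
    """Lowercase a tweet, drop unwanted characters, and split it into words."""
    # Assumption: "cannot" is split into "can" and "not", as row 2 of Table 2 suggests.
    text = text.lower().replace("cannot", "can not")
    # Keep letters and apostrophes only; every other run of characters becomes a space.
    text = re.sub(r"[^a-z']+", " ", text)
    return text.split()


tokens = tokenize("the president cannot keep his promises")
```

A real pipeline for Indonesian tweets would probably also remove URLs, mentions, and stop words, which this sketch leaves out.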
c. Text mining
The following explains the text processing with the Naïve Bayes
method. First, the prior probability of each class is calculated by
dividing the number of documents in that class by the total number
of documents N:
$P(\text{positive}) = \frac{N_{\text{positive}}}{N} = 0.53$
$P(\text{negative}) = \frac{N_{\text{negative}}}{N} = 0.46$
The test data, listed in Table 3, is classified using training
data that has already been cleaned. For each test sentence, a
per-term probability value is determined under each class. Table 4
shows this step for test data No. 1 under the positive class, and
Table 5 shows it under the negative class.
Table 3. Test Data
No.  Text                  Sentiment
1    a good president      ?
2    not a good president  ?
Table 4. Probability Calculation Table for Test Data No. 1 (Positive)
Term       n   nc  p     m
a          1   0   0.15  2
good       1   1   0.15  2
president  13  7   1.07  14
Table 5. Probability Calculation Table for Test Data No. 1 (Negative)
Term       n   nc  p     m
a          1   1   0.15  2
good       1   0   0.15  2
president  13  6   1.07  14
After that, the per-term probabilities of each class are
multiplied to obtain the classification result for the test
sentence x:
$P(x \mid \text{positive}) = 0.1 \times 0.43 \times 0.81 = 0.03483$
$P(x \mid \text{negative}) = 0.43 \times 0.1 \times 0.77 = 0.03311$
Because the positive value is larger, test data No. 1 is assigned
to the positive class. Table 6 shows the per-term probability
values for test data No. 2 under the positive class, and Table 7
shows them under the negative class.
Table 6. Probability Calculation Table for Test Data No. 2 (Positive)
Term       n   nc  p     m
not        1   0   0.15  2
a          1   0   0.15  2
good       1   1   0.15  2
president  13  7   1.07  14
Table 7. Probability Calculation Table for Test Data No. 2 (Negative)
Term       n   nc  p     m
not        1   1   0.15  2
a          1   1   0.15  2
good       1   0   0.15  2
president  13  6   1.07  14
The per-term probabilities of each class are again multiplied to
obtain the classification result for the test sentence x:
$P(x \mid \text{positive}) = 0.1 \times 0.1 \times 0.43 \times 0.81 = 0.0034$
$P(x \mid \text{negative}) = 0.43 \times 0.43 \times 0.1 \times 0.77 = 0.0142$
Because the negative value is larger, test data No. 2 is assigned
to the negative class.
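The factors used in the two products can be reproduced from the n, nc, p, and m columns of Tables 4 to 7. The sketch below assumes the m-estimate (nc + m·p) / (n + m); the text does not state this formula, so it is an assumption, chosen because it yields the printed factors 0.1, 0.43, 0.81, and 0.77 once each is cut to two decimals. The names `m_estimate`, `class_score`, and `classify` are illustrative.

```python
from math import prod


def m_estimate(n, n_c, p, m):
    """Assumed per-term probability: (n_c + m * p) / (n + m)."""
    return (n_c + m * p) / (n + m)


# (n, n_c, p, m) per term, copied from Tables 4-7.
POSITIVE = {
    "not": (1, 0, 0.15, 2),
    "a": (1, 0, 0.15, 2),
    "good": (1, 1, 0.15, 2),
    "president": (13, 7, 1.07, 14),
}
NEGATIVE = {
    "not": (1, 1, 0.15, 2),
    "a": (1, 1, 0.15, 2),
    "good": (1, 0, 0.15, 2),
    "president": (13, 6, 1.07, 14),
}


def class_score(tokens, table):
    """Multiply the per-term probabilities of one class."""
    return prod(m_estimate(*table[t]) for t in tokens)


def classify(tokens):
    """Return the class whose product of per-term probabilities is larger."""
    pos = class_score(tokens, POSITIVE)
    neg = class_score(tokens, NEGATIVE)
    return "positive" if pos > neg else "negative"


result_1 = classify(["a", "good", "president"])
result_2 = classify(["not", "a", "good", "president"])
```

The unrounded products differ slightly from the printed values because the paper shortens every factor to two decimals before multiplying, but the resulting classes are the same.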
Table 8. Test Data Results
No.  Sentiment  Text                  Sentiment Result
1    Positive   a good president      Positive
2    Negative   not a good president  Negative
Next, accuracy, precision, recall, and F1-score are calculated
from the analysis results. This requires the values of TP, TN, FP,
and FN. TP and TN are taken from the initial assumptions about the
test data; in this study TP is 1 and TN is 1. FP and FN are taken
from the classification results of the test data; here FP is 1 and
FN is 1. After the text mining process, the data is classified
using the Naïve Bayes method. Naïve Bayes can be trained on
small-scale data and can provide predictions in real time. The
method can also classify in a way that can be run in parallel as
the dataset grows, which is useful in large-scale case studies
[10].
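The four evaluation measures named above follow directly from the TP, TN, FP, and FN counts. The sketch below uses the standard definitions; the function name `binary_metrics` is an assumption for illustration and is not taken from the study's code.

```python
def binary_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Counts as stated in the text: TP = TN = FP = FN = 1.
metrics = binary_metrics(tp=1, tn=1, fp=1, fn=1)
```

With the counts stated in the text, all four measures evaluate to 0.5.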
In this research, every sentiment expressed by the community on
social media is classified. The Naïve Bayes method is based on the
following equation [14]:
$P(c \mid x) = \dfrac{P(x \mid c)\, P(c)}{P(x)}$   (2.1)
x = Data with an unknown class.
c = The data hypothesis is a specific class.
P(c|x) = Probability of hypothesis based on condition (posteriori
probability).
P(c) = Probability of hypothesis (prior probability).
P(x|c) = Probability based on conditions in the hypothesis.
P(x) = Probability of the data x.
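Equation (2.1) can be turned into a small classifier by treating each tweet as a bag of tokens and assuming the tokens are independent given the class. The sketch below is an illustration only: it uses add-one (Laplace) smoothing rather than the probability tables shown earlier, it trains on the eight sample rows of Table 2 rather than the full dataset, and the class name `NaiveBayes` is hypothetical.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.term_counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.term_counts[label].update(tokens)
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def log_posterior(self, tokens, c):
        # log P(c) + sum of log P(term | c). P(x) is identical for every
        # class, so it is dropped when only the best class is needed.
        score = math.log(self.doc_counts[c] / sum(self.doc_counts.values()))
        total = sum(self.term_counts[c].values())
        for t in tokens:
            if t in self.vocab:  # terms never seen in training are ignored
                score += math.log(
                    (self.term_counts[c][t] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, tokens):
        return max(self.classes, key=lambda c: self.log_posterior(tokens, c))


# Training rows copied from the tokenized sample in Table 2.
TRAIN = [
    ("Positive", ["the", "president", "has", "worked", "well"]),
    ("Negative", ["the", "president", "can", "not", "keep", "his", "promises"]),
    ("Positive", ["president", "helps", "flood", "victims"]),
    ("Negative", ["the", "president", "raised", "the", "price", "of", "fuel", "oil"]),
    ("Negative", ["the", "president", "raised", "the", "electricity", "tariffs"]),
    ("Negative", ["the", "president", "was", "unsuccessful"]),
    ("Positive", ["president", "built", "infrastructure"]),
    ("Positive", ["president", "spent", "his", "holiday", "with", "his", "family"]),
]

model = NaiveBayes().fit([tokens for _, tokens in TRAIN],
                         [label for label, _ in TRAIN])
prediction = model.predict(["president", "raised", "fuel", "price"])
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, which matters more for long texts than for short tweets.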
IV. PERFORMANCE EVALUATION
A. Experimental Setup
In this study, we use data collected from Twitter, consisting
of tweets related to the presidential candidate pairs of the
Republic of Indonesia for the 2019-2024 period. Collection began
during the 2019 presidential election campaign and continued until
after the election period. The resulting dataset contains 443
tweets, each with a text attribute and a sentiment label that is
either positive or negative.
Table 9. Sample Data
No.  Text                                            Sentiment
1    jokowi ma'ruf winner successful perfect full    Positive
2    second term free as a bird or a lame-duck       Positive
     president nice analysis
3    it seems that 17 million votes have been        Positive
     confirmed in central and east java all in
     favor of jokowi
4    prabowo subianto argues jokowi campaign         Negative
     stole votes during election
5    let’s be honest both two parties, jokowi and    Negative
     prabowo, do inappropriate ways of
     campaign
6    i hate jokowi                                   Negative
7    do you see what's the difference between        Positive
     prabowo and jokowi's speech? jokowi
     didn't forget to say thanks to all participant
8    love my president                               Positive
…    …                                               …
443  media today not doing journalistic but be a     Negative
     tool of a ruler they degrade themselves
B. Experimental Result and Discussion
Figure 3 compares the accuracy of the Naïve Bayes, SVM, and KNN
methods. Naïve Bayes performs best with an accuracy of
80.90% ≈ 80.1%, compared with 75.58% for KNN and 63.99%, the lowest
value, for SVM.
Figure 2 shows the results on the training data with the Naïve
Bayes method, which gives an accuracy of 80.90% ≈ 80.1%. The
precision is 0.89 for the positive class (labeled 0) and 0.71 for
the negative class (labeled 1), so the positive class scores better
than the negative class; the positive class also has a support
value above 50. Figure 3 compares the accuracy of the Naïve Bayes,
KNN, and SVM methods. Naïve Bayes, at 80.9%, is more accurate than
the other two methods: KNN reaches only 75.58%, and SVM has the
lowest accuracy at 63.99%.
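Per-class precision, recall, and support values like those read from Figure 2 can be computed from the true and predicted labels. The sketch below uses the paper's encoding (0 = positive, 1 = negative), but the label lists are hypothetical and chosen only for illustration; they are not the study's data, and the function names are assumptions.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Precision, recall and support for every label in y_true or y_pred."""
    pairs = Counter(zip(y_true, y_pred))
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = pairs[(label, label)]
        predicted = sum(n for (_, p), n in pairs.items() if p == label)
        support = sum(n for (t, _), n in pairs.items() if t == label)
        report[label] = {
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / support if support else 0.0,
            "support": support,
        }
    return report


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels for illustration only (0 = positive, 1 = negative).
y_true = [0, 0, 0, 1, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0]
report = per_class_report(y_true, y_pred)
overall = accuracy(y_true, y_pred)
```

Support is the number of true examples per class, so a class with low support gives a less stable precision and recall estimate.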
Figure 2. Data Training Result of Naïve Bayes
Figure 3. The Comparison of Level Accuracy
Figure 4 compares the precision and recall values of each
presidential candidate pair under Naive Bayes. For the positive
class, the precision for Jokowi-Ma'ruf is lower than that for
Prabowo-Sandiaga, while the recall shows the opposite. For the
negative class, the precision for Jokowi-Ma'ruf is slightly higher
than that for Prabowo-Sandiaga, whereas the reverse holds for the
recall.
Figure 5 compares the precision and recall values of each
method for the positive and negative classes. For positive-class
precision, Naive Bayes is still better. For positive-class recall,
SVM is better, though not by a significant margin. For the negative
class, Naive Bayes and SVM have almost the same precision, while
Naive Bayes has the better recall.
Figure 4. The Result of President Candidate using Naïve Bayes
Figure 5. The Testing Performance of Data Training
V. CONCLUSIONS
In this paper, we discussed the analysis of public sentiment
towards the Republic of Indonesia's presidential candidates for the
2019-2024 period, using tweet data collected from Twitter with a
crawler. We processed the text and used the Naive Bayes method to
predict the class of each tweet, classifying into two classes,
positive and negative, and then compared the results with other
methods, namely SVM and KNN. Our experiments show that the Naïve
Bayes method has better accuracy (80.90%) than the other methods:
KNN reaches an accuracy of only 75.58% and SVM 63.99%. For future
work, we plan to analyze public satisfaction with the performance
of the elected president of the Republic of Indonesia, using data
from other social media such as Facebook and Instagram.
VI. REFERENCES
[1] F. A. Pozzi, E. Fersini, E. Messina and B. Liu, in Sentiment
Analysis In Social Network, United States, Todd Green, 2017, p. 228.
[2] S. Widoatmodjo, in Cara Cepat Memulai Investasi Saham Panduan
Bagi Pemula, Jakarta, Kompas Media, 2012, p. 139.
[3] Rajput, D. Singh, Thakur, R. Singh, Basha and S. Muzamil,
Sentiment Analysis and Knowledge Discovery in Contemporary
Business, United States of America: IGI Global, 2018.
[4] C. A. Haryani, H. Tihari, Marhamah and Y. A. Nurrahman,
Sentimen Analisis Kepuasan Pelanggan E-commerce Menggunakan Lexicon
Classification dengan R, in Konferensi Nasional Sistem Informasi,
Pangkalpinang, 2018.
[5] A. Deviyanto and M. D. Wahyudi, Penerapan Analisis Sentimen
Pada Pengguna Twitter Menggunakan Metode K-Nearest Neighbor, JISKa
(Jurnal Informatika Sunan Kalijaga), vol. 3, no. 1, ISSN:
2527-5836, pp. 1–13, 2018.
[6] I. F. Rozi, E. N. Hamdana and M. B. I. Alfahmi, Pengembangan
Aplikasi Analisis Sentimen Twitter Menggunakan Metode Naive Bayes
Classifier (Studi Kasus SAMSAT Kota Malang), Jurnal Informatika
Polinema, vol. 4, no. 2, ISSN: 2407-070X, 2018.
[7] P. Antinasari, R. S. Perdana and M. A. Fauzi, Analisis
Sentimen Tentang Opini Film Pada Dokumen Twitter Berbahasa
Indonesia Menggunakan Naive Bayes Dengan Perbaikan Kata Tidak Baku,
Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer, vol. 1,
no. 12, e-ISSN: 2548-964X, pp. 1-9, December 2017.
[8] A. F. Hidayatullah and A. SN, Analisis Sentimen dan
Klasifikasi Kategori Terhadap Tokoh Publik Pada Twitter, Seminar
Nasional Informatika 2014 (SemnasIF 2014), ISSN: 1979-2328, 2014.
[9] M. R. Huq, A. Ali and A. Rahman, Sentiment Analysis on Twitter
Data using KNN and SVM, (IJACSA) International Journal of Advanced
Computer Science and Applications, vol. 8, 2017.
[10] G. Chakraborty, M. Pagolu and S. Garla, Text Mining And
Analysis Practical Methods, Examples, And Case Studies Using SAS,
North Carolina, USA: SAS Institute Inc., 2013.
Authorized licensed use limited to: National Taiwan Univ of Science and Technology. Downloaded on May 27,2020 at 07:38:13 UTC from IEEE Xplore. Restrictions apply.
... A significant number of approaches have been proposed employing machine learning (ML), deep learning (DL), Recurrent Neural Networks (RNNs), and Transformer [11]-based large language models (LLMs) for comment analysis. In a study [12], three ML models-Naïve Bayes (NB), Support Vector Machine (SVM), and K-Nearest Neighbor (KNN)-were applied to Twitter data to comprehend people's sentiments. The NB model achieved an accuracy of 75.58% and outperformed the other models. ...
... The study [12] contributes to understanding public sentiment towards 2019 Republic of Indonesia presidential candidates by conducting sentiment analysis on Twitter data. Three ML algorithms (e.g., NB, SVM, and KNN) are used to classify sentiments, providing insights into public opinion dynamics during the election period. ...
Preprint
Full-text available
Effectively analyzing the comments to uncover latent intentions holds immense value in making strategic decisions across various domains. However, several challenges hinder the process of sentiment analysis including the lexical diversity exhibited in comments, the presence of long dependencies within the text, encountering unknown symbols and words, and dealing with imbalanced datasets. Moreover, existing sentiment analysis tasks mostly leveraged sequential models to encode the long dependent texts and it requires longer execution time as it processes the text sequentially. In contrast, the Transformer requires less execution time due to its parallel processing nature. In this work, we introduce a novel hybrid deep learning model, RoBERTa-BiLSTM, which combines the Robustly Optimized BERT Pretraining Approach (RoBERTa) with Bidirectional Long Short-Term Memory (BiLSTM) networks. RoBERTa is utilized to generate meaningful word embedding vectors, while BiLSTM effectively captures the contextual semantics of long-dependent texts. The RoBERTa-BiLSTM hybrid model leverages the strengths of both sequential and Transformer models to enhance performance in sentiment analysis. We conducted experiments using datasets from IMDb, Twitter US Airline, and Sentiment140 to evaluate the proposed model against existing state-of-the-art methods. Our experimental findings demonstrate that the RoBERTa-BiLSTM model surpasses baseline models (e.g., BERT, RoBERTa-base, RoBERTa-GRU, and RoBERTa-LSTM), achieving accuracies of 80.74%, 92.36%, and 82.25% on the Twitter US Airline, IMDb, and Sentiment140 datasets, respectively. Additionally, the model achieves F1-scores of 80.73%, 92.35%, and 82.25% on the same datasets, respectively.
... Supervised machine learning employs classification algorithms, such as Naive Bayes [19], Support Vector Machines [20], or Neural Networks [18], to train models with large labelled data sets to classify texts into emotional categories. ...
... ted-ultimate-dataset, reviewed in January 2024. 19 Available at https://huggingface.co/datasets/hita/social-behavior-emotions, reviewed in January 2024. 20 The model hyperparameters were set to temperature=0.0, top_p=1.0 ...
... Compared to conventional machine learning methods such as Naïve Bayes and SVM, BERT provides a more advanced understanding of a word's context within sentences. The Naïve Bayes technique is appreciated for its efficiency [108] but is less capable of understanding the contexts and assumes feature independence. Conversely, SVM is favorable in high-dimension data [109] but adjusting SVM's parameters can be complicated and might not intuitively perceive textual contexts. ...
Article
Full-text available
The tourism sector plays a crucial role in the global economy, encompassing both physical infrastructure and cultural engagement. Indonesia has a wide range of attractions and has experienced remarkable growth, with Bali as a notable example of this. With the rapid advancements in technology, travelers now have the freedom to explore independently, while online travel agencies (OTAs) serve as important resources. Reviews from tourists significantly impact the service quality and perception of destinations, and text mining is a valuable tool for extracting insights from unstructured review data. This research integrates multiclass text classification and a network analysis to uncover tourists' behavioral patterns through their perceptions and movement. This study innovates beyond conventional sentiment and cognitive image analysis to the tourists' perceptions of cognitive dimensions and explores the sentiment correlation between different cognitive dimensions. We find that destinations generally receive positive feedback, with 80.36% positive reviews, with natural attractions being the most positive aspect while infrastructure is the least positive aspect. We highlight that qualitative experiences do not always align with quantitative cost-effectiveness evaluations. Through a network analysis, we identify patterns in tourist mobility, highlighting three clusters of attractions that cater to diverse preferences. This research underscores the need for tourism destinations to strategically adapt to tourists' varied expectations, enhancing their appeal and aligning their services with preferences to elevate destination competitiveness and increase tourist satisfaction.
... Based on literature studies, most of the previous studies were only related to one SDGs point. SDGs 1 [14], [15]; SDGs 2 [16], [17]; SDGs 3 [4], [18], [19], [20], [21], [22]; SDGs 4 [6], [23], [24], [25], [26]; SDGs 5 [27], [28]; SDGs 7 [29], [30]; SDGs 8 [3], [8], [31], [32]; SDGs 9 [33]; SDGs 10 [34], [35]; SDGs 11. [36]; SDGs 12 [37], [38]; SDGs 13 [39], [40]; SDGs 14 [41], [42]; SDGs 16 [43]; and SDGs 17 [44]. ...
Article
Full-text available
As part of the Sustainable Development Goals (SDGs), governments worldwide have committed to improving people's lives to improve the quality of life for all, including the 17 such goals that were agreed upon in 2015 to benefit the human race as a whole. It would be interesting to see how society responds to the SDGs after approximately half of them have been achieved. This public response was analyzed in terms of sentiment. Within the total number of internet users in Indonesia, there are 18.45 million Twitter users. The platform enables anyone to write about anything they are experiencing in their lives, such as what is happening in their environment, what is happening in their education system, what is happening in the food industry, how people feel, and many more. The platform enables anyone to write about anything they are experiencing in their lives, such as what is happening in their environment, what is happening in their education system, what is happening in the food industry, how people feel, and many more. To model the data collected, the researchers used Ensemble Machine Learning Classifiers (EMLC) to model the data by using a machine learning classifier that uses machine learning techniques. The best model in this study is EMLC-Stacking with a data splitting of 80:20 and using SMOTE, which obtains an accuracy of 91%. This accuracy results from a 5% increase compared to when not using SMOTE. From 15,698 tweets, this research found that 47% were positive sentiments, 28% were negative sentiments, and 25% were neutral sentiments. The results that we measured offer hope that there will be a positive trend in the journey of the SDGs until 2030 if these findings are true.
... In this article, the K-nearest neighbor (KNN) method will be used to analyze the sentiment of Twitter users [6]. The aim of this analysis is to determine the level of satisfaction and perspectives of Twitter users towards Indonesian tourism [7]. ...
Article
Full-text available
This research analyzes the sentiment of Twitter users regarding tourism in Indonesia using the keyword "wonderful Indonesia" as the tourism promotion identity. This study aims to gain a deeper understanding of the public sentiment towards "wonderful Indonesia" through social media data analysis. The novelty obtained provides new insights into valuable information about Indonesian tourism for the government and relevant stakeholders in promoting Indonesian tourism and enhancing tourist experiences. The method used is tweet analysis and classification using the K-nearest neighbor (KNN) algorithm to determine the positive, neutral, or negative sentiment of the tweets. The classification results show that the majority of tweets (65.1% out of a total of 14,189 tweets) have a neutral sentiment, indicating that most tweets with the "wonderful Indonesia" tagline are related to advertising or promoting Indonesian tourism. However, the percentage of tweets with positive sentiment (33.8%) is higher than those with negative sentiment (1.1%). This study also achieved training results with an accuracy rate of 98.5%, precision of 97.6%, recall of 98.5%, and F1-score of 98.1%. However, reassessment is needed in the future as Twitter users' sentiments can change along with the development of Indonesian tourism itself.
... In this article, the K-nearest neighbor (KNN) method will be used to analyze the sentiment of Twitter users [6]. The aim of this analysis is to determine the level of satisfaction and perspectives of Twitter users towards Indonesian tourism [7]. ...
Article
Instagram is one of the social media platforms used for self-representation, interaction, and information seeking. Information from it can be collected into a dataset for further processing. This study takes the Kampus Merdeka program as its object of analysis, since it is a government program currently being run by Kemendikbud. The dataset was collected from Instagram comments using the PhantomBuster tool. The sentiment models were built in Python on Google Colab using the Naïve Bayes Classifier and Decision Tree algorithms. Scraping yielded 1764 records, reduced to 1694 after pre-processing. Before SMOTE over-sampling, with a positive-to-negative data ratio of 35.06% to 64.95%, the Decision Tree model reached an accuracy of 84% with a 90:10 data split and the Complement Naïve Bayes model 81% with an 80:20 data split. After balancing the data with SMOTE over-sampling, the abstract reports that the accuracy of the Decision Tree model rose by 1% (stated as from 86% to 85%) with the 90:10 data split, and that the Complement Naïve Bayes model rose by 2% (stated as from 82% to 83%) with the 80:20 data split.
Article
For businesses and manufacturers, opinion information is essential. They frequently want to know in detail what customers and the public think of their goods and services. It is nonetheless unrealistic to read every article on a site manually and extract valuable views from it, because there is too much information to process by hand. Sentiment analysis allows efficient and cost-effective processing of data at large scale. To learn more about sentiment analysis, the authors examine how businesses use it to identify their strengths and limitations. This paper summarizes sentiment analysis on Amazon reviews, along with its applications and classification levels. It reviews sentiment classification techniques and studies feature selection in sentiment analysis. In addition, it introduces natural language processing, its techniques, and its limitations. Finally, it describes text mining.
Article
This research implements the KNN (K-Nearest Neighbor) algorithm for sentiment analysis of Twitter users on the topic of the 2017 Jakarta gubernatorial election (Pilkada DKI 2017). The data are 2000 Indonesian-language tweets collected from Twitter during January 2017 using a Python package called Twitterscraper. The sentiment analysis system uses KNN with TF-IDF term weighting and the cosine similarity measure to classify sentiment into two classes: positive and negative. In testing, the highest accuracy was 67.2% at k=5, the highest precision was 56.94% at k=5, and the highest recall was 78.24% at k=15. Keywords: K-Nearest Neighbor, Twitterscraper, TF-IDF, Cosine Similarity
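The combination described in this abstract (TF-IDF term weighting, cosine similarity, and a k-nearest-neighbor vote) can be sketched in plain Python. This is only an illustration under stated assumptions, not the study's code: the smoothed IDF formula is the one scikit-learn uses by default rather than one given in the abstract, and the function names and toy English tokens are hypothetical.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Compute a smoothed IDF per term from tokenized training documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed formula: log((1 + n) / (1 + df)) + 1, not taken from the abstract.
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def vectorize(tokens, idf):
    """Map a token list to a sparse {term: tf * idf} vector; unseen terms are dropped."""
    return {t: tf * idf[t] for t, tf in Counter(tokens).items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train_vecs, train_labels, query_vec, k=5):
    """Return the majority label among the k most similar training vectors."""
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query_vec, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration only.
train_docs = [["good", "leader"], ["great", "program"],
              ["bad", "traffic"], ["poor", "service"]]
train_labels = ["positive", "positive", "negative", "negative"]

idf = fit_idf(train_docs)
train_vecs = [vectorize(doc, idf) for doc in train_docs]
query = vectorize(["good", "program"], idf)
print(knn_predict(train_vecs, train_labels, query, k=3))
```

With only two classes, an odd k avoids tied votes, which may be one reason odd values such as k=5 and k=15 appear in the reported results.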
Article
Twitter is a social media platform where users can search for particular topics and discuss current issues. Short messages, or tweets, can contain opinions about products and services experienced by the public. These data can serve as a source for research. This study aims to build a sentiment analysis application that applies the Naïve Bayes Classifier approach to classify words, focusing on tweets in Indonesian. The data were obtained by web scraping, and the topic of the source text is the Sistem Administrasi Manunggal Satu Atap (SAMSAT) of Malang City. Classification proceeds through a series of stages: preprocessing (case folding, cleaning, tokenizing, and stopword removal), followed by classification with the Naïve Bayes Classifier algorithm itself to assign each tweet to the positive, negative, or neutral category. The results indicate that the Naïve Bayes Classifier performs well in sentiment analysis. Accuracy testing of the application's classification gave highest accuracy values of 82%, 92%, and 80% for the positive, negative, and neutral categories respectively, with training data of 200 negative, 200 positive, and 200 neutral tweets.
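The stages named in this abstract (case folding, cleaning, tokenizing, stopword removal, then Naïve Bayes classification) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the stopword list, the cleaning rules, the example tweets, and the use of add-one (Laplace) smoothing in a multinomial model are all assumptions.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list; the study's actual list is not given.
STOPWORDS = {"yang", "di", "dan", "ini"}


def preprocess(text, stopwords=STOPWORDS):
    text = text.lower()                                  # case folding
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text)    # cleaning: URLs, mentions, hashtags
    text = re.sub(r"[^a-z\s]", " ", text)                # cleaning: digits and punctuation
    tokens = text.split()                                # tokenizing
    return [t for t in tokens if t not in stopwords]     # stopword removal


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (an assumed variant)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for counts in self.word_counts.values() for t in counts}
        return self

    def predict(self, tokens):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)                  # log prior
            for t in tokens:
                if t in self.vocab:                      # skip words never seen in training
                    score += math.log((self.word_counts[label][t] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical training tweets, invented for illustration only.
tweets = [
    ("Pelayanan SAMSAT cepat dan ramah", "positive"),
    ("Antrian lama, pelayanan lambat!", "negative"),
    ("Petugas ramah, proses cepat", "positive"),
    ("Sistem lambat dan antrian panjang", "negative"),
]
model = NaiveBayes().fit([preprocess(t) for t, _ in tweets], [y for _, y in tweets])
print(model.predict(preprocess("Pelayanan cepat, ramah")))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the smoothing keeps a word that is absent from one class from zeroing out that class entirely.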
Conference Paper
Customer satisfaction is the top priority for every company in the e-commerce business. It is therefore very important for every e-commerce company, especially those already serving cross-border transactions such as Amazon, Ebay, and Rakuten, to know their customers' impressions or sentiment regarding the quality of the products or services provided, so that they can maintain or improve that quality. With the rapid development of technology, this sentiment has become increasingly easy to find out, for example by using comments on social media such as Twitter. By analyzing Twitter users' comments related to the determinants of customer satisfaction with e-commerce using the Lexicon classification method, the study found that the most dominant factors in determining customer satisfaction are usefulness, system quality, and information quality. E-commerce companies that want to increase customer satisfaction are advised to focus on these three factors, because they are the customers' main focus when shopping on an e-commerce site.
Article
The rapid growth of social media has not led users to abandon Twitter. Twitter is a social media platform that allows users to interact, share information, and express feelings and opinions, including opinions about films. Comments or tweets about films on Twitter can be used as an evaluation for watching films and for improving film production. To that end, sentiment analysis can be used to classify tweets into negative or positive sentiment. Tweets contain many varieties of language, including non-standard forms such as slang, abbreviations, and misspellings, so they need special handling. This research uses a dictionary of non-standard words and Levenshtein Distance normalization to convert non-standard words into standard words, with classification by Naive Bayes. In testing, the highest accuracy, precision, recall, and f-measure values obtained were 98.33%, 96.77%, 100%, and 98.36%.
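The normalization step described in this abstract (a dictionary of non-standard words combined with Levenshtein distance) can be sketched as follows. This is a hedged illustration rather than the study's procedure: the function names, the `max_distance` threshold, the order in which the dictionary and the distance search are applied, and the sample slang and vocabulary entries are all assumptions.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if the characters match
            ))
        prev = cur
    return prev[-1]


def normalize(token, slang_map, vocabulary, max_distance=1):
    """Map a non-standard token to a standard word where possible."""
    if token in slang_map:                 # dictionary lookup comes first
        return slang_map[token]
    if token in vocabulary or not vocabulary:
        return token
    # Otherwise take the closest standard word, breaking ties alphabetically.
    best = min(vocabulary, key=lambda w: (levenshtein(token, w), w))
    return best if levenshtein(token, best) <= max_distance else token


# Hypothetical entries, invented for illustration only.
slang_map = {"gk": "tidak", "bgt": "banget"}
vocabulary = {"film", "bagus", "tidak", "banget"}

print([normalize(t, slang_map, vocabulary) for t in ["filmm", "bagu", "bgt", "gk"]])
```

The threshold keeps unrelated tokens unchanged: a word that is far from every vocabulary entry is returned as-is rather than forced onto the nearest match.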
Conference Paper
... built and in the RapidMiner tool show that accuracy with term frequency is better than accuracy with the TF-IDF feature. The Support Vector Machine method yields better accuracy than the Naive Bayes method in both sentiment classification and category classification. Overall, however, both Support Vector Machine and Naive Bayes perform reasonably well for classifying tweets.
1. INTRODUCTION
Social networking media such as Twitter, Facebook, and Youtube are among the most popular communication media in society today (Aliandu, 2012; Kumar and Sebastian, 2012). Ahead of elections, politicians and public figures often use social media to campaign and raise their popularity. One social networking medium that has been used in elections is Twitter. Twitter has been used in elections in several countries, such as Singapore, Germany, and the United States (Sang and Bos, 2012; Choy et al., 2012; Choy et al., 2011). This study tries to make use of Twitter by analyzing Indonesian-language tweets that discuss public figures ahead of the 2014 general election in Indonesia. The public figures analyzed are those with the highest popularity in surveys conducted by several survey institutes, such as Lembaga Survei Indonesia (LSI), Lembaga Survei Nasional (LSN), Sogeng Sarjadi Syndicate (SSS), Centre for Strategic and International Studies (CSIS), and Saiful Mujani Research and Consulting (SMRC). The analysis classifies tweets using the Naive Bayes Classifier, combined with features for detecting negation and with weighting by term frequency and TF-IDF. Tweet classification in this study is based on a combination of sentiment class and category class.
The sentiment class is divided into two polarities: positive and negative sentiment. The category classes were chosen based on indicators used by LSI (Lembaga Survei Indonesia) to assess figures considered fit to run in the 2014 presidential election. The three dimensions are capability, integrity, and acceptability. Capability covers intelligence, insight, vision, leadership, firmness, and courage in decision-making. Integrity covers moral aspects, honesty, consistency between word and deed, and freedom from moral, ethical, and legal flaws. Acceptability is the public's attitude of acceptance toward a figure (Mujani et al., 2012).
G. Chakraborty, M. Pagolu, S. Garla, Text Mining And Analysis Practical Methods, Examples, And Case Studies Using SAS, North Carolina, USA: SAS Institute Inc., 2013.
F. A. Pozzi, E. Fersini, E. Messina, B. Liu, Sentiment Analysis In Social Network, United States, Todd Green.
D. Singh Rajput, R. Singh Thakur, S. Muzamil Basha, Sentiment Analysis and Knowledge Discovery in Contemporary Business.
I. F. Rozi, E. N. Hamdana, M. B. I. Alfahmi, Pengembangan Aplikasi Analisis Sentimen Twitter Menggunakan Metode Naive Bayes Classifier (Studi Kasus SAMSAT Kota Malang), Jurnal Informatika Polinema, Volume 04, Edisi 02, ISSN: 2407-070X, 2018.