
Sentiment Analysis Using Naive Bayes Algorithm Of The Data Crawler: Twitter

October 2019


DOI:10.1109/ICIC47613.2019.8985884

Conference: 2019 Fourth International Conference on Informatics and Computing (ICIC)

Authors:

Apriandy Angdresey

De La Salle Catholic University, Indonesia




Meylan Wongkar, Apriandy Angdresey
Department of Informatics Engineering
De La Salle Catholic University
Manado 95000, Indonesia
Email: 14013016@unikadelasalle.ac.id, aangdresey@unikadelasalle.ac.id

Abstract: Sentiment analysis is used to gauge the level of public sentiment or public opinion about goods or services, or about a figure, whether political or a celebrity. In this study, a Twitter sentiment analysis application was built in the Python programming language and applied to the 2019 presidential candidates of the Republic of Indonesia. The analysis consists of several steps: collecting data with Python libraries, text processing, testing on training data, and text classification with the Naïve Bayes method, which assigns each text to a sentiment class. The results show that the Jokowi-Ma'ruf Amin pair received 45.45% positive and 54.55% negative sentiment, while the Prabowo-Sandiaga pair received 44.32% positive and 55.68% negative sentiment. Testing on the combined training data of both candidate pairs yielded an accuracy of 80.90%. The Naïve Bayes, SVM, and K-Nearest Neighbor (K-NN) methods were also compared in RapidMiner, yielding accuracies of 75.58% for Naïve Bayes, 63.99% for SVM, and 73.34% for K-NN.

Keywords: Sentiment analysis, Naïve Bayes algorithm, text mining, Twitter.

I. INTRODUCTION

Sentiment is a term that describes a topic as subjective or objective and as factual or non-factual, going beyond a simple distinction between positive and negative [1]. Sentiment analysis can also be carried out on circulating rumors or gossip [2]. It is an analytical approach to text whose purpose is to determine the subjectivity of an opinion expressed in a review or a tweet. Through sentiment analysis, a person's opinions can be classified into various categories depending on the data size and document type [3].

Nowadays, people often voice responses to and criticism of leaders, both political and public figures, through social media such as Twitter. Twitter's retweet feature lets every user re-post information or tweets, which makes information spread faster on the platform [2]. Twitter is also suitable for sentiment analysis, using tweets gathered with a data crawler, that is, a program that collects data automatically. In this study, the authors analyze the level of public sentiment toward the 2019 presidential candidates of the Republic of Indonesia, using data crawled from Twitter. Furthermore, the authors compare the accuracy of the Naïve Bayes method with other classification methods, namely SVM and KNN. The Naïve Bayes method [3] assigns data to predefined categories.

The paper is organized as follows. Section II reviews related work on sentiment analysis on Twitter. Section III describes the methods, including the formulas used for classification. Section IV presents the performance evaluation and the experimental results, and Section V concludes the paper.

II. RELATED WORK

In [4], sentiment analysis was carried out on factors related to customer satisfaction with e-commerce. These factors can help e-commerce companies focus on improving their service and overall quality, which is associated with increased traffic, sales, and profits. The study comprised three stages, namely data collection, data cleaning, and lexicon-based classification, all performed in RStudio, a software application built on the R programming language. Based on a sentiment analysis of three of the world's largest e-commerce companies, Amazon, eBay, and Rakuten, the authors concluded that the customer-satisfaction factors receiving the most customer attention are usefulness, service quality, information quality, and system quality [4]. In [9], sentiment analysis was performed by comparing the SVM and KNN methods. Datasets of different sizes were tested to see how accuracy changes with the amount of data: 100, 500, 1000, 1500, 2000, 2500, and 3000 tweets. The results showed that KNN achieved a higher accuracy than SVM.

Authorized licensed use limited to: National Taiwan Univ of Science and Technology. Downloaded on May 27,2020 at 07:38:13 UTC from IEEE Xplore. Restrictions apply.

In [5] and [6], Twitter sentiment analysis was also conducted: [5] measured the sentiment of Twitter users with the K-Nearest Neighbor (KNN) method, and [6] analyzed public sentiment toward the performance of SAMSAT in Malang City with the Naïve Bayes classifier. In [5], the data were collected by crawling Twitter and then preprocessed so that they were ready to be analyzed with K-NN. The tests produced a highest accuracy of 67.2% and a precision of 56.94% with k = 5, and a highest recall of 78.24% with k = 15. In [6], the data were collected from Twitter with web scraping and stored in a database. Preprocessing consisted of duplicate-tweet filtering to delete repeated tweets, case folding to convert all letters to lowercase, cleaning to remove unneeded characters or words, tokenizing to split the text into words, and filtering to keep the important words among the tokens. In the first test, the results for the positive, negative, and neutral categories were 81%, 89%, and 80%; in the second test, they were 82%, 92%, and 80% [6].
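The five preprocessing steps described for [6] can be sketched as follows. This is a minimal illustration, not the code of [6]: the function name `preprocess`, the regular expressions used for cleaning, and the small `STOPWORDS` set are assumptions chosen for the example.

```python
import re

# Hypothetical stopword list for illustration only; [6] does not publish its list.
STOPWORDS = {"the", "a", "of", "is", "to", "and"}

def preprocess(tweets):
    """Apply duplicate filtering, case folding, cleaning, tokenizing and filtering."""
    seen = set()
    result = []
    for tweet in tweets:
        # 1. Duplicate-tweet filtering: skip exact repeats of a tweet already seen.
        if tweet in seen:
            continue
        seen.add(tweet)
        # 2. Case folding: convert all letters to lowercase.
        text = tweet.lower()
        # 3. Cleaning: drop URLs and @mentions, then any character that is not a letter.
        text = re.sub(r"https?://\S+|@\w+", " ", text)
        text = re.sub(r"[^a-z\s]", " ", text)
        # 4. Tokenizing: split the cleaned text into words.
        tokens = text.split()
        # 5. Filtering: keep only the words that are not in the stopword list.
        result.append([t for t in tokens if t not in STOPWORDS])
    return result
```

Filtering duplicates before case folding keeps the steps in the order [6] lists them; deduplicating after case folding would also merge tweets that differ only in capitalization.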

In [7] and [8], Indonesian-language tweets were analyzed with the Naïve Bayes classifier to determine the level of public sentiment toward a film [7] and toward public figures on Twitter ahead of the 2014 general election in Indonesia [8]. In the film study [7], non-standard words in tweets were corrected during text processing, and classification was then tested on the non-standard sentences. The training data contained 140 opinions (70 positive and 70 negative), and the test data contained 60 opinions (30 positive and 30 negative). The test was carried out three times, and the highest accuracy, 86.67%, was obtained in the third test. In [8], the tweets were collected with a cron job facility on the Windows operating system. In the preprocessing stage, the authors cleaned the data by deleting special characters and URLs and removing word affixes. On 1,329 tested tweets, the classification achieved an accuracy of 79.91% with term-frequency features and 79.68% with TF-IDF features. Using RapidMiner with Naive Bayes, the accuracy was 73.81% with term-frequency features and 71.11% with TF-IDF features [8].
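To make the difference between the two feature types compared in [8] concrete, the sketch below computes raw term-frequency counts and TF-IDF weights for tokenized tweets. It assumes the common weighting tf × log(N / df), where N is the number of documents and df the number of documents containing the term; [8] does not state which TF-IDF variant it uses, and the function names are hypothetical.

```python
import math
from collections import Counter

def term_frequency(tokens):
    """Raw term-frequency (TF) features: how often each word occurs in one document."""
    return dict(Counter(tokens))

def tf_idf(docs):
    """TF-IDF features for a list of tokenized documents, using idf = log(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    features = []
    for doc in docs:
        tf = Counter(doc)
        features.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return features
```

A word that occurs in every document, such as "president" in a corpus about presidential candidates, gets a TF-IDF weight of zero, whereas plain term frequency still counts it; this is one reason the two feature types can give different accuracies.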

III. METHODS

The text processing in this study consists of several steps. First, we collect tweets from Twitter with a crawler. Next, we parse the collected tweets word by word. We then tokenize each tweet, cleaning it and keeping the meaningful words. Finally, we apply text mining with the Naïve Bayes method. The text-processing flow is shown in Figure 1.

Figure 1. Text Processing

a. Collect data

In the data collection step, we use tweets obtained from Twitter with a data crawler between January and May 2019. Table 1 presents sample data taken from Twitter.

Table 1. Sample Data
No. | Sentiment | Text
1 | Positive | the president has worked well
2 | Negative | the president cannot keep his promises
3 | Positive | president helps flood victims
4 | Negative | the president raised the price of fuel oil
5 | Negative | the president raised the electricity tariffs
6 | Negative | the president was unsuccessful
7 | Positive | president built infrastructure
8 | Positive | president spent his holiday with his family
… | … | …
443 | Negative | president of imaging

b. Text parsing and Tokenization

This step splits each sentence into separate words, strips non-word characters, and keeps the words that carry value. Table 2 shows the result of text parsing and tokenization on the sample data in Table 1.

Table 2. Tokenization
No. | Sentiment | Tokens
1 | Positive | [the, president, has, worked, well]
2 | Negative | [the, president, can, not, keep, his, promises]
3 | Positive | [president, helps, flood, victims]
4 | Negative | [the, president, raised, the, price, of, fuel, oil]
5 | Negative | [the, president, raised, the, electricity, tariffs]
6 | Negative | [the, president, was, unsuccessful]
7 | Positive | [president, built, infrastructure]
8 | Positive | [president, spent, his, holiday, with, his, family]
… | … | …
443 | Negative | [president, of, imaging]
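A minimal tokenizer that reproduces the rows of Table 2 from Table 1 might look like the sketch below. It lowercases the text, keeps alphabetic runs, and applies a hypothetical `EXPANSIONS` table so that "cannot" becomes "can" and "not", as in row 2; no stopwords are removed, matching Table 2. It is a sketch under these assumptions, not the authors' implementation.

```python
import re

# Hypothetical expansion rule: row 2 of Table 2 splits "cannot" into "can" and "not".
EXPANSIONS = {"cannot": "can not"}

def tokenize(sentence):
    """Lowercase a sentence, keep alphabetic words, and expand known compounds."""
    words = re.findall(r"[a-z]+", sentence.lower())
    tokens = []
    for w in words:
        # Replace a word by its expansion (if any), then split it into tokens.
        tokens.extend(EXPANSIONS.get(w, w).split())
    return tokens
```

Because negation words such as "not" decide the class of test data No. 2 later in this section, keeping them as separate tokens matters more here than removing stopwords.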

c. Text mining

The following explains text mining with the Naïve Bayes method. First, the prior probability of each class is calculated by dividing the number of documents in that class by the total number of documents N:

$$P(\text{positive}) = \frac{N_{\text{positive}}}{N} = 0.53, \qquad P(\text{negative}) = \frac{N_{\text{negative}}}{N} = 0.46$$


The test data in Table 3 are classified with the previously cleaned training data. For each test sentence, the conditional probability of every term given the class is determined. Table 4 shows this step for test data No. 1 in the positive class, and Table 5 shows it for the negative class.

Table 3. Test Data
No. | Text | Sentiment
1 | a good president | ?
2 | not a good president | ?

Table 4. Probability Table for Test Data No. 1 (Positive)
Term | n | nc | p | m
a | 1 | 0 | 0.15 | 2
good | 1 | 1 | 0.15 | 2
president | 13 | 7 | 1.07 | 14

Table 5. Probability Calculation Table on No. 1 (Negative) Test Data
Term | n | nc | p | m
a | 1 | 1 | 0.15 | 2
good | 1 | 0 | 0.15 | 2
president | 13 | 6 | 1.07 | 14

Next, the conditional probabilities of the terms are multiplied for each class, and the class with the larger product is taken as the classification result of the test data. For test data No. 1, denoted x_1:

$$P(x_1 \mid \text{positive}) = 0.10 \times 0.43 \times 0.81 = 0.03483$$

$$P(x_1 \mid \text{negative}) = 0.43 \times 0.10 \times 0.77 = 0.03311$$
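The paper does not state how the factors in these products are computed from Tables 4 and 5. They are consistent with the m-estimate of a conditional probability, reading n as the number of occurrences of the term in the training data, nc as its occurrences in class c, p as a prior estimate, and m as the equivalent sample size; this reading is our assumption, checked below against the tabulated values. The 0.77 used for the negative class appears to be 0.777 truncated to two decimals.

$$
\begin{aligned}
P(w \mid c) &= \frac{n_c + m\,p}{n + m} \\
P(\text{a} \mid \text{positive}) &= \frac{0 + 2(0.15)}{1 + 2} = 0.10 \\
P(\text{good} \mid \text{positive}) &= \frac{1 + 2(0.15)}{1 + 2} \approx 0.43 \\
P(\text{president} \mid \text{positive}) &= \frac{7 + 14(1.07)}{13 + 14} = \frac{21.98}{27} \approx 0.81 \\
P(\text{president} \mid \text{negative}) &= \frac{6 + 14(1.07)}{13 + 14} = \frac{20.98}{27} \approx 0.777
\end{aligned}
$$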

Based on these calculations, test data No. 1 is assigned to the positive class. Table 6 shows the conditional probabilities of the terms of test data No. 2 in the positive class, and Table 7 shows them for the negative class.

Table 6. Probability Calculation Table on Test Data No. 2 (Positive)
Term | n | nc | p | m
not | 1 | 0 | 0.15 | 2
a | 1 | 0 | 0.15 | 2
good | 1 | 1 | 0.15 | 2
president | 13 | 7 | 1.07 | 14

Table 7. Probability Calculation Table on No. 2 (Negative) Test Data
Term | n | nc | p | m
not | 1 | 1 | 0.15 | 2
a | 1 | 1 | 0.15 | 2
good | 1 | 0 | 0.15 | 2
president | 13 | 6 | 1.07 | 14

Again, the conditional probabilities are multiplied for each class to obtain the classification result. For test data No. 2, denoted x_2:

$$P(x_2 \mid \text{positive}) = 0.10 \times 0.10 \times 0.43 \times 0.81 \approx 0.00348$$

$$P(x_2 \mid \text{negative}) = 0.43 \times 0.43 \times 0.10 \times 0.77 \approx 0.01424$$

Based on these calculations, test data No. 2 is assigned to the negative class.
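The worked example for both test sentences can be reproduced with the short sketch below. `TERM_STATS` copies the n, nc, p, and m columns of Tables 4 to 7 (the negative-class nc equals n minus the positive-class nc); the m-estimate formula is an assumption, inferred because it reproduces the tabulated factors, and the helper names are hypothetical. Because intermediate values are not rounded, the products differ slightly from the text (for example about 0.0353 instead of 0.03483), but the predicted classes are the same. As in the worked example, class priors are not multiplied in; multiplying by P(positive) = 0.53 and P(negative) = 0.46 would not change either decision.

```python
# Columns of Tables 4-7: term -> (n, nc for positive, nc for negative, p, m).
TERM_STATS = {
    "not": (1, 0, 1, 0.15, 2),
    "a": (1, 0, 1, 0.15, 2),
    "good": (1, 1, 0, 0.15, 2),
    "president": (13, 7, 6, 1.07, 14),
}

def m_estimate(n_c, n, p, m):
    """Assumed m-estimate of P(term | class): (n_c + m * p) / (n + m)."""
    return (n_c + m * p) / (n + m)

def class_scores(tokens):
    """Multiply the per-term m-estimates for each class, as in the worked example."""
    scores = {"positive": 1.0, "negative": 1.0}
    for tok in tokens:
        n, nc_pos, nc_neg, p, m = TERM_STATS[tok]
        scores["positive"] *= m_estimate(nc_pos, n, p, m)
        scores["negative"] *= m_estimate(nc_neg, n, p, m)
    return scores

def classify(sentence):
    """Return the class with the larger product, together with both scores."""
    scores = class_scores(sentence.lower().split())
    return max(scores, key=scores.get), scores
```

For example, `classify("a good president")` returns the label "positive" and `classify("not a good president")` returns "negative", matching Table 8.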

Table 8. Test Data Test Results Table
No. | Actual Sentiment | Text | Predicted Sentiment
1 | Positive | a good president | Positive
2 | Negative | not a good president | Negative

Next, the accuracy, precision, recall, and F1-score of the results are calculated. For this purpose, the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) are determined. TP and TN are taken from the initial labels of the test data; in this study, TP is 1 and TN is 1. FP and FN are taken from the classification results of the test data; here, FP is 1 and FN is 1. After the text-mining process, the data are classified with the Naïve Bayes method. Naïve Bayes can be trained on small-scale data and can provide predictions in real time. Its computation can also be parallelized as the dataset grows, which makes it suitable for large-scale data as well [10].
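The four evaluation measures follow directly from the TP, TN, FP, and FN counts. The sketch below (helper names hypothetical) counts them from lists of actual and predicted labels, treating "Positive" as the positive class, and then computes accuracy = (TP + TN) / (TP + TN + FP + FN), precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 × precision × recall / (precision + recall).

```python
def confusion_counts(actual, predicted, positive="Positive"):
    """Count TP, TN, FP and FN, treating `positive` as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1; a ratio with a zero denominator is reported as 0.0."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

As a usage example, applying `confusion_counts` to the actual and predicted columns of Table 8 gives the counts (1, 1, 0, 0).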

In this research, each sentiment expressed by the community on social media is classified. The Naïve Bayes equation is as follows [14]:

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)} \tag{2.1}$$

where
x = data with an unknown class.
c = the hypothesis that the data belong to a specific class.
P(c|x) = probability of hypothesis c given condition x (posterior probability).
P(c) = probability of hypothesis c (prior probability).
P(x|c) = probability of x given hypothesis c (likelihood).
P(x) = probability of x (evidence).
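Equation (2.1) is typically applied by choosing the class c that maximizes P(x|c)P(c); P(x) is the same for every class and can be dropped. The sketch below is a generic multinomial Naïve Bayes under explicit assumptions: it uses Laplace (add-one) smoothing rather than the m-estimate of the worked example, works with log-probabilities to avoid numerical underflow on long tweets, ignores words not seen in training, and uses hypothetical function names. It illustrates the technique and is not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (tokens, label). Returns log priors, per-class word counts, vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for tokens, _ in docs for w in tokens}
    # log P(c) = log(number of documents in c / total number of documents)
    log_prior = {c: math.log(k / len(docs)) for c, k in class_counts.items()}
    return log_prior, word_counts, vocab

def predict(model, tokens, alpha=1.0):
    """Return the class maximizing log P(c) + sum of log P(w|c), with add-alpha smoothing."""
    log_prior, word_counts, vocab = model
    best, best_score = None, -math.inf
    for c, lp in log_prior.items():
        total = sum(word_counts[c].values())
        score = lp
        for w in tokens:
            if w not in vocab:
                continue  # words never seen in training carry no evidence
            score += math.log((word_counts[c][w] + alpha) / (total + alpha * len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best
```

Trained on the eight tokenized rows of Table 2, for instance, this sketch assigns [the, president, raised, the, tariffs] to the negative class.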

IV. PERFORMANCE EVALUATION

A. Experimental Setup

In this study, the data were collected from Twitter by gathering the tweets related to the candidate pairs for the 2019-2024 presidency of the Republic of Indonesia. Collection began during the 2019 presidential election campaign and continued until after the election. In total, 443 tweets were obtained, each with a sentiment attribute whose label is either positive or negative.

Table 9. Sample Data
No. | Text | Sentiment
1 | jokowi ma'ruf winner successful perfect full | Positive
2 | second term free as a bird or a lame-duck president nice analysis | Positive
3 | it seems that 17 million votes have been confirmed in central and east java all in favor of jokowi | Positive
4 | prabowo subianto argues jokowi campaign stole votes during election | Negative
5 | let’s be honest both two parties, jokowi and prabowo, do inappropriate ways of campaign | Negative
6 | i hate jokowi | Negative
7 | do you see what's the difference between prabowo and jokowi's speech? jokowi didn't forget to say thanks to all participant | Positive
8 | love my president | Positive
… | … | …
443 | media today not doing journalistic but be a tool of a ruler they degrade themselves | Negative

B. Experimental Result and Discussion

Figure 3 compares the accuracy of the Naïve Bayes, SVM, and KNN methods. Naïve Bayes performs best, with an accuracy of 80.90%, followed by KNN with 75.58%, while SVM has the lowest accuracy at 63.99%.

Figure 2 shows the training result of the Naïve Bayes method, which reaches an accuracy of 80.90%. The precision is 0.89 for the positive class (encoded as label 0) and 0.71 for the negative class (encoded as label 1), so the model performs better on the positive class; the positive class also has a support (number of evaluated samples) above 50.

Figure 2. Data Training Result of Naïve Bayes

Figure 3. The Comparison of Level Accuracy

Figure 4 compares the precision and recall obtained with Naive Bayes for each presidential candidate pair. For the positive class, the precision for Jokowi-Ma'ruf is lower than that for Prabowo-Sandiaga, whereas the recall is higher. For the negative class, the precision for Jokowi-Ma'ruf is slightly higher than that for Prabowo-Sandiaga, while the recall shows the opposite pattern.

Figure 5 compares the precision and recall of the positive and negative classes for each method. For the positive class, Naive Bayes still has the best precision, while SVM has a slightly better recall, although the difference is not significant. For the negative class, Naive Bayes and SVM have almost the same precision, whereas Naive Bayes has the better recall.

Figure 4. The Result of President Candidate using Naïve Bayes


Figure 5. The Testing Performance of Data Training

V. CONCLUSIONS

In this paper, we analyzed public sentiment toward the Republic of Indonesia's presidential candidates for the 2019-2024 period, using tweets collected from Twitter with a crawler. We processed the collected text, used the Naive Bayes method to predict the class of each tweet, and compared it with SVM and KNN. Tweets were classified into two classes, positive and negative. Our experiments show that Naïve Bayes achieves a higher accuracy (80.90%) than the other methods: KNN reaches only 75.58% and SVM 63.99%. In future work, we plan to analyze public satisfaction with the performance of the elected president of the Republic of Indonesia using data from other social media, such as Facebook and Instagram.

VI. REFERENCES

[1] F. A. Pozzi, E. Fersini, E. Messina, and B. Liu, Sentiment Analysis in Social Network. United States: Todd Green, 2017, p. 228.
[2] S. Widoatmodjo, Cara Cepat Memulai Investasi Saham: Panduan Bagi Pemula. Jakarta: Kompas Media, 2012, p. 139.
[3] D. S. Rajput, R. S. Thakur, and S. M. Basha, Sentiment Analysis and Knowledge Discovery in Contemporary Business. United States of America: IGI Global, 2018.
[4] C. A. Haryani, H. Tihari, Marhamah, and Y. A. Nurrahman, "Sentimen Analisis Kepuasan Pelanggan E-commerce Menggunakan Lexicon Classification dengan R," in Konferensi Nasional Sistem Informasi, Pangkalpinang, 2018.
[5] A. Deviyanto and M. D. Wahyudi, "Penerapan Analisis Sentimen Pada Pengguna Twitter Menggunakan Metode K-Nearest Neighbor," JISKa (Jurnal Informatika Sunan Kalijaga), vol. 3, no. 1, pp. 1-13, 2018, ISSN 2527-5836.
[6] I. F. Rozi, E. N. Hamdana, and M. B. I. Alfahmi, "Pengembangan Aplikasi Analisis Sentimen Twitter Menggunakan Metode Naive Bayes Classifier (Studi Kasus SAMSAT Kota Malang)," Jurnal Informatika Polinema, vol. 4, no. 2, 2018, ISSN 2407-070X.
[7] P. Antinasari, R. S. Perdana, and M. A. Fauzi, "Analisis Sentimen Tentang Opini Film Pada Dokumen Twitter Berbahasa Indonesia Menggunakan Naive Bayes Dengan Perbaikan Kata Tidak Baku," Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer, vol. 1, no. 12, pp. 1-9, Dec. 2017, e-ISSN 2548-964X.
[8] A. F. Hidayatullah and A. SN, "Analisis Sentimen dan Klasifikasi Kategori Terhadap Tokoh Publik Pada Twitter," in Seminar Nasional Informatika 2014 (SemnasIF 2014), 2014, ISSN 1979-2328.
[9] M. R. Huq, A. Ali, and A. Rahman, "Sentiment Analysis on Twitter Data using KNN and SVM," International Journal of Advanced Computer Science and Applications (IJACSA), vol. 8, 2017.
[10] G. Chakraborty, M. Pagolu, and S. Garla, Text Mining and Analysis: Practical Methods, Examples, and Case Studies Using SAS. North Carolina, USA: SAS Institute Inc., 2013.

Authorized licensed use limited to: National Taiwan Univ of Science and Technology. Downloaded on May 27,2020 at 07:38:13 UTC from IEEE Xplore. Restrictions apply.

RoBERTa-BiLSTM: A Context-Aware Hybrid Model for Sentiment Analysis

Preprint

Full-text available

Jun 2024

Effectively analyzing the comments to uncover latent intentions holds immense value in making strategic decisions across various domains. However, several challenges hinder the process of sentiment analysis including the lexical diversity exhibited in comments, the presence of long dependencies within the text, encountering unknown symbols and words, and dealing with imbalanced datasets. Moreover, existing sentiment analysis tasks mostly leveraged sequential models to encode the long dependent texts and it requires longer execution time as it processes the text sequentially. In contrast, the Transformer requires less execution time due to its parallel processing nature. In this work, we introduce a novel hybrid deep learning model, RoBERTa-BiLSTM, which combines the Robustly Optimized BERT Pretraining Approach (RoBERTa) with Bidirectional Long Short-Term Memory (BiLSTM) networks. RoBERTa is utilized to generate meaningful word embedding vectors, while BiLSTM effectively captures the contextual semantics of long-dependent texts. The RoBERTa-BiLSTM hybrid model leverages the strengths of both sequential and Transformer models to enhance performance in sentiment analysis. We conducted experiments using datasets from IMDb, Twitter US Airline, and Sentiment140 to evaluate the proposed model against existing state-of-the-art methods. Our experimental findings demonstrate that the RoBERTa-BiLSTM model surpasses baseline models (e.g., BERT, RoBERTa-base, RoBERTa-GRU, and RoBERTa-LSTM), achieving accuracies of 80.74%, 92.36%, and 82.25% on the Twitter US Airline, IMDb, and Sentiment140 datasets, respectively. Additionally, the model achieves F1-scores of 80.73%, 92.35%, and 82.25% on the same datasets, respectively.

Emotional Evaluation of Open-Ended Responses with Transformer Models

Chapter

May 2024

Exploring Tourists' Behavioral Patterns in Bali's Top-Rated Destinations: Perception and Mobility

Article

Full-text available

Apr 2024

The tourism sector plays a crucial role in the global economy, encompassing both physical infrastructure and cultural engagement. Indonesia has a wide range of attractions and has experienced remarkable growth, with Bali as a notable example of this. With the rapid advancements in technology, travelers now have the freedom to explore independently, while online travel agencies (OTAs) serve as important resources. Reviews from tourists significantly impact the service quality and perception of destinations, and text mining is a valuable tool for extracting insights from unstructured review data. This research integrates multiclass text classification and a network analysis to uncover tourists' behavioral patterns through their perceptions and movement. This study innovates beyond conventional sentiment and cognitive image analysis to the tourists' perceptions of cognitive dimensions and explores the sentiment correlation between different cognitive dimensions. We find that destinations generally receive positive feedback, with 80.36% positive reviews, with natural attractions being the most positive aspect while infrastructure is the least positive aspect. We highlight that qualitative experiences do not always align with quantitative cost-effectiveness evaluations. Through a network analysis, we identify patterns in tourist mobility, highlighting three clusters of attractions that cater to diverse preferences. This research underscores the need for tourism destinations to strategically adapt to tourists' varied expectations, enhancing their appeal and aligning their services with preferences to elevate destination competitiveness and increase tourist satisfaction.

Implementation of Ensemble Machine Learning Classifier and Synthetic Minority Oversampling Technique for Sentiment Analysis of Sustainable Development Goals in Indonesia

Article

Full-text available

May 2024

As part of the Sustainable Development Goals (SDGs), governments worldwide have committed to improving people's lives to improve the quality of life for all, including the 17 such goals that were agreed upon in 2015 to benefit the human race as a whole. It would be interesting to see how society responds to the SDGs after approximately half of them have been achieved. This public response was analyzed in terms of sentiment. Within the total number of internet users in Indonesia, there are 18.45 million Twitter users. The platform enables anyone to write about anything they are experiencing in their lives, such as what is happening in their environment, what is happening in their education system, what is happening in the food industry, how people feel, and many more. The platform enables anyone to write about anything they are experiencing in their lives, such as what is happening in their environment, what is happening in their education system, what is happening in the food industry, how people feel, and many more. To model the data collected, the researchers used Ensemble Machine Learning Classifiers (EMLC) to model the data by using a machine learning classifier that uses machine learning techniques. The best model in this study is EMLC-Stacking with a data splitting of 80:20 and using SMOTE, which obtains an accuracy of 91%. This accuracy results from a 5% increase compared to when not using SMOTE. From 15,698 tweets, this research found that 47% were positive sentiments, 28% were negative sentiments, and 25% were neutral sentiments. The results that we measured offer hope that there will be a positive trend in the journey of the SDGs until 2030 if these findings are true.

Trends in sentiment of Twitter users towards Indonesian tourism: analysis with the k-nearest neighbor method

Article

Full-text available

Mar 2024

This research analyzes the sentiment of Twitter users regarding tourism in Indonesia using the keyword "wonderful Indonesia" as the tourism promotion identity. This study aims to gain a deeper understanding of the public sentiment towards "wonderful Indonesia" through social media data analysis. The novelty obtained provides new insights into valuable information about Indonesian tourism for the government and relevant stakeholders in promoting Indonesian tourism and enhancing tourist experiences. The method used is tweet analysis and classification using the K-nearest neighbor (KNN) algorithm to determine the positive, neutral, or negative sentiment of the tweets. The classification results show that the majority of tweets (65.1% out of a total of 14,189 tweets) have a neutral sentiment, indicating that most tweets with the "wonderful Indonesia" tagline are related to advertising or promoting Indonesian tourism. However, the percentage of tweets with positive sentiment (33.8%) is higher than those with negative sentiment (1.1%). This study also achieved training results with an accuracy rate of 98.5%, precision of 97.6%, recall of 98.5%, and F1-score of 98.1%. However, reassessment is needed in the future as Twitter users' sentiments can change along with the development of Indonesian tourism itself.

Trends in sentiment of Twitter users towards Indonesian tourism: analysis with the k-nearest neighbor method

Article

Full-text available

Mar 2024

This research analyzes the sentiment of Twitter users regarding tourism in Indonesia using the keyword "wonderful Indonesia" as the tourism promotion identity. The aim of this study is to gain a deeper understanding of the public sentiment towards "wonderful Indonesia" through social media data analysis. The novelty obtained provides new insights into valuable information about Indonesian tourism for the government and relevant stakeholders in promoting Indonesian tourism and enhancing tourist experiences. The method used is tweet analysis and classification using the K-nearest neighbor (KNN) algorithm to determine the positive, neutral, or negative sentiment of the tweets. The classification results show that the majority of tweets (65.1% out of a total of 14,189 tweets) have a neutral sentiment, indicating that most tweets with the "wonderful Indonesia" tagline are related to advertising or promoting Indonesian tourism. However, the percentage of tweets with positive sentiment (33.8%) is higher than those with negative sentiment (1.1%). This study also achieved training results with an accuracy rate of 98.5%, precision of 97.6%, recall of 98.5%, and F1-score of 98.1%. However, reassessment is needed in the future as Twitter users' sentiment can change along with the development of Indonesian tourism itself.

SENTIMENT ANALYSIS OF INSTAGRAM COMMENTS ON THE KAMPUS MERDEKA PROGRAM USING THE NAIVE BAYES AND DECISION TREE ALGORITHMS

Article

Apr 2024

Instagram is a social media platform used for self-representation, interaction, and information seeking. Such information can be collected into a dataset for further processing. In this context, the Kampus Merdeka Program is taken as the object of analysis, since it is a government program currently being run by the Ministry of Education and Culture (Kemendikbud). The dataset was collected from Instagram comments using the PhantomBuster tool. Sentiment models were built in Python with Google Colab using the Naïve Bayes Classifier and Decision Tree algorithms. Scraping yielded 1,764 records, which were reduced to 1,694 after preprocessing. Before SMOTE over-sampling, with a positive-to-negative ratio of 35.06% to 64.95%, the Complement Naïve Bayes and Decision Tree models reached an accuracy of 84% for the Decision Tree with a 90:10 data split and 81% for Complement Naïve Bayes with an 80:20 split. After balancing the data with SMOTE over-sampling, Decision Tree accuracy rose by 1 percentage point, from 84% to 85%, with the 90:10 split, and Complement Naïve Bayes accuracy rose by 2 percentage points, from 81% to 83%, with the 80:20 split.
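SMOTE, mentioned above, balances classes by creating synthetic minority-class samples on the line segment between a minority sample and one of its nearest minority-class neighbours. The minimal sketch below implements that interpolation idea on plain numeric feature vectors such as TF-IDF rows. The function name `smote`, the squared-Euclidean neighbour search and the parameter defaults are assumptions. The study may well have used a library implementation instead.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a random minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base), key=lambda p: dist2(base, p)
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical 2-D feature vectors for the minority class.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for point in smote(minority, n_new=3, k=2):
    print(point)
```

Because every new point lies between two existing minority samples, over-sampling adds no labels outside the region the minority class already occupies. Only the training split should be over-sampled, so that the test accuracy is not inflated.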

A Comprehensive Survey on Sentimental Analysis using Classification Techniques

Article

Apr 2024

For businesses and manufacturers, opinion information is extremely important. They often want to know in detail what customers and the public think of their goods and services. However, it is unrealistic to read every article on a site manually and extract valuable views from it: there is far too much information to process by hand. Sentiment analysis enables efficient, cost-effective, large-scale processing of such data. To learn more about sentiment analysis, the authors examine how businesses use it to identify their strengths and limitations. This paper summarizes sentiment analysis of Amazon reviews, including its applications and classification levels. It reviews sentiment classification techniques and studies feature selection in sentiment analysis. In addition, it introduces natural language processing, its techniques, and its limitations. Finally, it describes text mining.

Sentiment Analysis Using Large Language Models: A Case Study of GPT-3.5

Chapter

Apr 2024

Application of Sentiment Analysis to Twitter Users Using the K-Nearest Neighbor Method

Article

Dec 2018

This research implements the KNN (K-Nearest Neighbor) algorithm for sentiment analysis of Twitter users on the 2017 Jakarta gubernatorial election. The data consist of 2,000 Indonesian-language tweets collected from Twitter during January 2017 using a Python package called Twitterscraper. The sentiment analysis system uses KNN with TF-IDF term weighting and the cosine similarity measure to classify sentiment into two classes: positive and negative. In testing, the highest accuracy was 67.2% at k=5, the highest precision was 56.94% at k=5, and the highest recall was 78.24% at k=15. Keywords: K-Nearest Neighbor, Twitterscraper, TF-IDF, Cosine Similarity
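A minimal sketch of the KNN setup described above follows. It combines TF-IDF term weights, cosine similarity and a majority vote among the k most similar training tweets. The sketch assumes the tweets are already tokenized and uses the plain idf = log(N/df) variant. The English toy tokens and the function names are hypothetical. Twitterscraper and the study's own preprocessing are not reproduced.

```python
import math
from collections import Counter


def fit_tfidf(docs):
    """Compute idf = log(N / df) and a TF-IDF dict for each tokenized doc."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf


def vectorize(doc, idf):
    """TF-IDF vector for a new doc; terms unseen in training are dropped."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query, vectors, labels, idf, k=5):
    """Majority label among the k training docs most cosine-similar to query."""
    q = vectorize(query, idf)
    ranked = sorted(zip((cosine(q, v) for v in vectors), labels), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, already-tokenized toy tweets.
train = [
    ["good", "leader", "honest"],
    ["great", "program", "good"],
    ["bad", "corrupt", "leader"],
    ["terrible", "bad", "program"],
]
labels = ["positive", "positive", "negative", "negative"]
vectors, idf = fit_tfidf(train)
print(knn_predict(["good", "honest"], vectors, labels, idf, k=3))
```

With k=3, the query `["good", "honest"]` is most similar to the two positive training tweets, so the vote returns `"positive"`. The value of k trades off noise sensitivity (small k) against blurring class boundaries (large k). This matches the study's observation that different k values maximize different metrics.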

DEVELOPMENT OF A TWITTER SENTIMENT ANALYSIS APPLICATION USING THE NAÏVE BAYES CLASSIFIER METHOD (Case Study: SAMSAT Malang City)

Article

Feb 2018

Twitter is a social media platform where users can search for specific topics and discuss current issues. Some short messages, or tweets, contain opinions on products and services experienced by the public, and such data can serve as a research source. This study aims to build a sentiment analysis application that applies the Naïve Bayes Classifier approach to classify words, focusing on Indonesian-language tweets. The data were obtained by web scraping, and the topic of the source texts is the Sistem Administrasi Manunggal Satu Atap (SAMSAT, an integrated one-roof administration office) of Malang City. Classification proceeds through a series of stages: preprocessing (case folding, cleaning, tokenizing, and stopword removal), followed by classification with the Naïve Bayes Classifier algorithm itself, yielding positive, negative, or neutral categories. The results show that the Naïve Bayes Classifier performs well in sentiment analysis. The application's classification accuracy tests produced highest accuracies of 82%, 92%, and 80% for the positive, negative, and neutral categories respectively, using training data of 200 negative, 200 positive, and 200 neutral tweets.
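The pipeline described above can be sketched as follows: preprocessing (case folding, cleaning, tokenizing, stopword removal), then a three-class Naïve Bayes classifier. This is a minimal sketch assuming a multinomial Naïve Bayes with add-one (Laplace) smoothing. The four-word stopword list, the cleaning regular expression and the SAMSAT-style toy tweets are illustrative assumptions, not the study's implementation.

```python
import math
import re
from collections import Counter, defaultdict

STOPWORDS = {"dan", "yang", "di", "ini"}  # tiny illustrative stopword list


def preprocess(text):
    text = text.lower()  # case folding
    # cleaning: drop URLs, @mentions, the '#' sign and any non-letter character
    text = re.sub(r"https?://\S+|@\w+|#|[^a-z\s]", " ", text)
    tokens = text.split()  # tokenizing
    return [t for t in tokens if t not in STOPWORDS]  # stopword removal


def train_nb(docs, labels):
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {t for counts in word_counts.values() for t in counts}
    return class_counts, word_counts, vocab


def predict_nb(tokens, model):
    """Pick the class maximizing log P(c) + sum log P(w|c), add-one smoothed."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(count / total)
        for t in tokens:
            if t in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy tweets about a SAMSAT office.
texts = [
    "Pelayanan cepat dan ramah!",
    "Petugas ramah sekali @samsat",
    "Antrian lama dan lambat",
    "Sistem lambat, antrian panjang http://t.co/x",
    "Jadwal buka SAMSAT hari ini",
    "Info jadwal #samsat",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
model = train_nb([preprocess(t) for t in texts], labels)
print(predict_nb(preprocess("pelayanan ramah"), model))
```

Scores are summed in log space to avoid floating-point underflow when many word probabilities are multiplied. Smoothing keeps a word that is unseen in one class from zeroing out that class entirely.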

Sentiment Analysis of E-commerce Customer Satisfaction Using Lexicon Classification with R

Conference Paper

Mar 2018

Calandra Alencia

Customer satisfaction is a top priority for every e-commerce company. It is therefore important for every e-commerce business, especially those already serving cross-border transactions such as Amazon, eBay, and Rakuten, to know their customers' impressions or sentiment about the quality of their products and services, so that they can maintain or improve that quality. With the rapid development of technology, such sentiment has become easier to discover, for example by using comments on social media such as Twitter. The study analyzed Twitter users' comments related to the determinants of e-commerce customer satisfaction using the lexicon classification method. It found that the most dominant factors in customer satisfaction are usefulness, system quality, and information quality. E-commerce companies that want to increase customer satisfaction are advised to focus on these three factors, since they are the main concern of customers when shopping on an e-commerce site.
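Lexicon classification, as described above, scores text by counting words from positive and negative word lists instead of training a model. The study used R. The minimal sketch below shows the same idea in Python and tallies sentiment per satisfaction factor. The word lists, factor keywords and sample tweets are hypothetical placeholders, not the lexicon used in the study.

```python
from collections import Counter, defaultdict

# Hypothetical word lists; a real study would use a full sentiment lexicon.
POSITIVE = {"fast", "easy", "helpful", "accurate", "good"}
NEGATIVE = {"slow", "error", "crash", "wrong", "bad"}
FACTORS = {  # hypothetical keywords signalling each satisfaction factor
    "usefulness": {"helpful", "useful"},
    "system quality": {"app", "crash", "slow", "fast"},
    "information quality": {"description", "info", "accurate", "wrong"},
}


def lexicon_label(tokens):
    """Positive minus negative word count decides the label."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def tally_by_factor(tweets):
    """Count sentiment labels per factor mentioned in each tweet."""
    tally = defaultdict(Counter)
    for text in tweets:
        tokens = text.lower().split()
        label = lexicon_label(tokens)
        for factor, keywords in FACTORS.items():
            if keywords & set(tokens):
                tally[factor][label] += 1
    return tally


tweets = [
    "The app is fast and easy",
    "App keeps crash and slow",
    "Product description was wrong",
    "Very helpful support",
]
for factor, counts in tally_by_factor(tweets).items():
    print(factor, dict(counts))
```

Because no training data is needed, a lexicon approach is quick to apply. However, its quality depends entirely on how well the word lists cover the slang and domain vocabulary of the tweets.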

Sentiment Analysis of Film Opinions in Indonesian-Language Twitter Documents Using Naive Bayes with Non-Standard Word Correction

Article

Dec 2017

Despite the rapid growth of social media, Twitter has not been abandoned by its users. Twitter allows users to interact with each other, share information, and express feelings and opinions, including opinions about films. Tweets about films can be used to evaluate films before watching them and to improve film production. To discover these opinions, sentiment analysis can classify tweets as negative or positive. Tweets contain many non-standard language forms, such as slang, abbreviations, and misspellings, so they require special handling. This research uses a dictionary of non-standard words and Levenshtein distance normalization to convert non-standard words into standard words before Naive Bayes classification. In testing, the highest accuracy, precision, recall, and F-measure were 98.33%, 96.77%, 100%, and 98.36%, respectively.
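The normalization step described above uses a dictionary of non-standard words followed by Levenshtein distance matching. The minimal sketch below tries the dictionary lookup first and then picks the closest standard word within an edit distance of 2. The tiny `SLANG` and `STANDARD` word lists and the threshold of 2 are assumptions for illustration.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


SLANG = {"gk": "tidak", "bgt": "sangat", "yg": "yang"}  # hypothetical dictionary
STANDARD = {"film", "ini", "bagus", "keren", "tidak", "yang", "cerita", "sangat"}


def normalize(token, max_dist=2):
    if token in SLANG:  # 1) known non-standard word: use the dictionary
        return SLANG[token]
    if token in STANDARD:
        return token
    # 2) otherwise take the closest standard word (ties broken alphabetically)
    best = min(STANDARD, key=lambda w: (levenshtein(token, w), w))
    return best if levenshtein(token, best) <= max_dist else token


def normalize_tweet(text):
    return " ".join(normalize(t) for t in text.lower().split())


print(normalize_tweet("Film ini bgus bgt"))
```

The distance threshold matters: without it, every unknown word, including names and real words missing from the small vocabulary, would be forced onto some standard word.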

Sentiment Analysis on Twitter Data using KNN and SVM

Article

Jan 2017

A-1 SENTIMENT ANALYSIS AND CATEGORY CLASSIFICATION OF PUBLIC FIGURES ON TWITTER

Conference Paper

Aug 2013

… built and in the RapidMiner tool show that term-frequency features give better accuracy than TF-IDF features. The Support Vector Machine method performs better than the Naive Bayes method in both sentiment classification and category classification. Overall, however, both Support Vector Machine and Naive Bayes perform reasonably well at classifying tweets.

1. INTRODUCTION

Social networking media such as Twitter, Facebook, and YouTube are among the most popular communication media in society today (Aliandu, 2012; Kumar and Sebastian, 2012). Ahead of general elections, politicians and public figures often use social media to campaign and increase their popularity. One social network that has been used in general elections is Twitter, which has played this role in several countries such as Singapore, Germany, and the United States (Sang and Bos, 2012; Choy et al., 2012; Choy et al., 2011). This research uses Twitter by analyzing Indonesian-language tweets that discuss public figures ahead of the 2014 general election in Indonesia. The public figures analyzed are those with the highest popularity in surveys by several survey institutes, such as Lembaga Survei Indonesia (LSI), Lembaga Survei Nasional (LSN), Soegeng Sarjadi Syndicate (SSS), the Centre for Strategic and International Studies (CSIS), and Saiful Mujani Research and Consulting (SMRC). Tweets are classified with a Naive Bayes Classifier, combined with features for detecting negation and with term weighting using term frequency and TF-IDF. Each tweet class in this research is a combination of a sentiment class and a category class.
Sentiment classes have two polarities: positive and negative. Category classes follow the indicators used by LSI (Lembaga Survei Indonesia) to assess which figures are considered fit to run in the 2014 presidential election. These three dimensions are capability, integrity, and acceptability. Capability covers intelligence, insight, vision, leadership, decisiveness, and courage in decision-making. Integrity covers morality, honesty, consistency between words and deeds, and freedom from moral, ethical, and legal blemishes. Acceptability is the public's acceptance of a figure (Mujani et al., 2012).
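The negation feature and term-frequency weighting described above can be sketched by merging a negation cue into the word that follows it. "tidak jujur" (not honest) then becomes a feature distinct from "jujur". The cue list, the one-token negation scope and the `NOT_` prefix are assumptions, since the exact negation rules are not given here. TF-IDF weighting would rescale these same counts by inverse document frequency.

```python
from collections import Counter

NEGATIONS = {"tidak", "bukan", "belum"}  # assumed Indonesian negation cues


def mark_negation(tokens):
    """Merge a negation cue into the next token, e.g. ['tidak', 'jujur'] -> ['NOT_jujur'].
    A cue at the very end of a tweet has nothing to negate and is dropped."""
    out, negate = [], False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
            continue
        out.append("NOT_" + tok if negate else tok)
        negate = False
    return out


def term_frequency(tokens):
    """Raw term-frequency feature vector for one tokenized tweet."""
    return Counter(mark_negation(tokens))


print(mark_negation(["calon", "tidak", "jujur"]))
print(term_frequency(["jujur", "tidak", "jujur"]))
```

Without the merge, a Naive Bayes model would count "jujur" as evidence for the positive class even in a negative tweet such as "tidak jujur".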

Text Mining and Analysis: Practical Methods, Examples, and Case Studies Using SAS

Jan 2013

G Chakraborty
M Pagolu
S Garla

G. Chakraborty, M. Pagolu, and S. Garla, Text Mining and Analysis: Practical Methods, Examples, and Case Studies Using SAS, North Carolina, USA: SAS Institute Inc., 2013.

Sentiment Analysis in Social Networks, United States: Todd Green

F A Pozzi
E Fersini
E Messina
B Liu

Sentiment Analysis and Knowledge Discovery in Contemporary Business

D Singh Rajput
R Singh Thakur
S Muzamil Basha

Jan 2018

I F Rozi
E N Hamdana
M B I Alfahmi

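I. F. Rozi, E. N. Hamdana, and M. B. I. Alfahmi, Pengembangan Aplikasi Analisis Sentimen Twitter Menggunakan Metode Naive Bayes Classifier (Studi Kasus SAMSAT Kota Malang) [Development of a Twitter Sentiment Analysis Application Using the Naive Bayes Classifier Method (Case Study: SAMSAT Malang City)], Jurnal Informatika Polinema, vol. 04, issue 02, ISSN: 2407-070X, 2018.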
