
Using Text Mining to Analyze Mobile Phone Provider Service Quality (Case Study: Social Media Twitter)

Authors: Calvin and Johan Setiawan

Abstract: Competition between telephone providers to attract new customers is visible in the near-constant advertising wars on television, posters, and radio. The question arises of how to measure the quality of these providers in order to choose the best one for oneself.
This paper addresses that question by measuring customer satisfaction using text mining. A sample model is extracted from the social media platform Twitter, and the sentiment polarity is measured using the Naïve Bayes classifier method. The model shows promising results in ranking providers by customer satisfaction and therefore in identifying the best provider to use.
Index Terms: Naïve Bayes, sentiment analysis, telephone provider, text mining.
I. INTRODUCTION
Nowadays people use telephones to send messages regardless of the distance between them. There are many providers and plans to choose from, which creates competition between these companies and confusion for users.
Along with the rising popularity of the internet, the social media platform Twitter has become one of the most accessed and most used websites among Indonesians. Millions of tweets containing thoughts, questions, comments, and critiques are posted daily. Telephone provider companies even use this medium to get closer to their customers. This huge volume of posts [1] can readily become a source of information, although it has to be cleaned first.
This paper suggests a method to extract information by applying text mining [2] and the Naïve Bayes method to a model built from extracted Twitter posts. Sentiment analysis is also used to identify the opinions expressed and to determine how positive the posts are [3]. Several earlier works, such as [4]-[6], have shown how effective sentiment analysis can be.
II. RESEARCH OBJECT AND ASSUMPTION
The research objects are three well-known telephone providers in Indonesia: PT XL Axiata Tbk, PT Telkomsel Tbk, and PT Indosat Tbk.
PT XL Axiata is one of the biggest telephone provider companies in Indonesia, with a broad network and high-quality service across the country. It has operated since October 8, 1996. PT XL Axiata was proclaimed the first private company in Indonesia to provide telephone services, especially for mobile phones.
Manuscript received July 9, 2013; revised December 10, 2013. Calvin and Johan Setiawan are with the Information System Department, Faculty of Information and Communication Technology, Multimedia Nusantara University, Scientia Boulevard Street, Gading Serpong, Tangerang, Banten-15811, Indonesia (e-mail: calv.axl@gmail.com; johansetiawanumn@gmail.com).
PT Telkomsel Tbk was established in 1995 as one of the innovators developing Indonesia’s communication technology. To that end, Telkomsel continues to expand its network rapidly across the country while empowering the community. PT Telkomsel Tbk became the pioneer of mobile telecommunication technologies in Indonesia.
PT Indosat Tbk was established in 1967 as a foreign investment company and began operating in 1969. In 1980, PT Indosat Tbk became a state-owned enterprise wholly owned by the Government of Indonesia. Today the company provides cellular services, international communications, and satellite services. Its best-known services are Indosat Mentari and IM3.
To carry out this research, a few assumptions and criteria had to be set to make it feasible.
Assumption 1:
The real data consists of tweets from the Twitter timeline, retrieved via LingPipe4Twitter, that mention one of the three telephone provider companies in this research, within 15 timeline pages of 100 tweets per page. 1500 tweets are considered a sufficient sample for this research.
Assumption 2:
Every tweet may contain zero, one, or more positive and/or negative sentiment words. A tweet with no sentiment word does not affect the result of the research.
Assumption 3:
A repetition of a sentiment word within a tweet is not counted, because the earlier occurrence of the word has already been scored.
Assumption 4:
The calculation of the result is based on the individual words in every tweet, not on the tweet as a whole.
Assumption 5:
A score of 0 (zero) points is the quality wanted by users. Negative points indicate bad quality and positive points indicate good quality.
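Assumptions 2 to 4 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the word sets are only a small excerpt of the Words Bank / Dictionary, and the tokenizer and the function names `tokenize` and `score_tweet` are assumptions introduced for illustration.

```python
import re

# Small excerpt of the Words Bank / Dictionary (Table I); the full lists are longer.
POSITIVE = {"untung", "oke", "bagus", "murah", "cepat", "stabil"}
NEGATIVE = {"lemot", "lambat", "pending", "rugi", "mahal", "error"}


def tokenize(tweet: str) -> list[str]:
    """Lowercase the tweet and keep alphabetic runs (an assumed tokenizer)."""
    return re.findall(r"[a-z]+", tweet.lower())


def score_tweet(tweet: str) -> dict[str, int]:
    """Return {sentiment word: point} for one tweet.

    Each distinct sentiment word is scored once (Assumption 3), and the unit
    of scoring is the word rather than the tweet (Assumption 4).
    """
    points: dict[str, int] = {}
    for token in tokenize(tweet):
        if token in points:
            continue  # a repeated sentiment word is not counted again
        if token in POSITIVE:
            points[token] = 1
        elif token in NEGATIVE:
            points[token] = -1
    return points


print(score_tweet("oke juga nih @XL123 bagus sinyalnya, bagus bagus"))
print(score_tweet("halo @XL123"))  # no sentiment word, so no effect (Assumption 2)
```

Returning one entry per word, rather than one number per tweet, keeps the output close to the per-keyword rows that the paper reports for its training data.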
III. GENERAL METHOD DESCRIPTION AND SENTIMENT ANALYSIS
The research started with Understanding the Literature, especially the literature on text mining, Naïve Bayes, and sentiment analysis. Once sufficient material had been gathered, the research continued with Determining Positive and Negative Words, along with Data Collection from Twitter using LingPipe4Twitter. LingPipe4Twitter was used for this research because it is free to use and is a combination of two open-source libraries. The determined words are inserted into the Words Bank / Dictionary. The result of Data Collection is a .csv file. The .csv file then goes through Data Processing using the Words Bank / Dictionary, so that the data can be considered feasible for use in the next phase of the research.
International Journal of Machine Learning and Computing, Vol. 4, No. 1, February 2014. DOI: 10.7763/IJMLC.2014.V4.395
After the data are processed, the results are scored for each company. The scores are then analyzed to reach the conclusions (see Fig. 1).
Fig. 1. General method flowchart.
Before modeling the data with Naïve Bayes, we need to create the word dictionary that serves as the basis of the sentiment analysis.
Sentiment analysis refers to the study of opinions, concerns, feelings, and emotions expressed in writing [7]. Its main task is to classify the polarity of a text at the document, sentence, or feature level, that is, to decide whether the opinion expressed is positive, negative, or neutral. In this way, sentiment analysis can determine the emotional state of the writer [8]. Sentiment analysis classifies the opinions within a text into categories such as "positive" or "negative", with the remaining category considered "neutral". A sentiment analysis application can also track what someone is saying about a brand or message [9].
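The paper names the Naïve Bayes classifier but does not spell out its formulation. As an illustration only, the sketch below shows a textbook multinomial Naïve Bayes classifier with add-one (Laplace) smoothing in Python. The toy training tweets are adapted from Table II, and the smoothing, the whitespace tokenizer, and the names `train` and `predict` are all assumptions rather than the authors' method.

```python
import math
from collections import Counter

# Toy labelled tweets adapted from Table II (mentions removed).
TRAIN = [
    ("wah pake untung juga yah ternyata", "positive"),
    ("oke juga nih bagus sinyalnya", "positive"),
    ("ada kasih promo murah tuh cobain deh", "positive"),
    ("koq tiba2 pulsa gw ilang ya jgn bikin org rugi donk", "negative"),
    ("udah ga ada sinyal lemot pending pula", "negative"),
    ("browsing apapun lambat", "negative"),
]


def train(samples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        counter = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counter[word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(text, model):
    """Return the class with the highest log posterior."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n_docs in class_counts.items():
        logp = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing keeps unseen (class, word) pairs above zero.
            logp += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


model = train(TRAIN)
print(predict("sinyalnya bagus", model))
print(predict("internet lemot", model))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together.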
Using sentiment analysis, the Determining Positive and Negative Words step produced the Words Bank / Dictionary shown in Table I.
TABLE I: WORDS BANK / DICTIONARY
Positive     Negative
untung       hilang
oke          menyesal
bagus        lemot
terpercaya   lambat
bersahabat   pending
murah        bohong
cinta        jelek
kreatif      mati
lancar       ganggu
hebat        rusak
puas         error
kuat         parah
cepat        menyusahkan
beres        menurun
kencang      berkurang
irit         putus
mantap       tipu
asik         menipu
betah        gagal
aktif        payah
stabil       mengesalkan
jelas        kesal
terimakasih  penipu
jernih       lelet
bebas        berhenti
percaya      masalah
ramah        gangguan
konsisten    mengecewakan
terjangkau   kecewa
senang       menyebalkan
top          susah
hemat        nyangkut
             rugi
             mahal
IV. TRAINING SETS
To ensure that this method is applicable and reliable, it is tested against a training data model. The training data model is based on the Words Bank / Dictionary, to confirm that the original data will be processed correctly and yield correct results.
The training data contain 200 tweets, of which 100 are filled with positive sentiment words and 100 with negative sentiment words. Each positive sentiment word scores 1 point and each negative sentiment word scores -1 point. A negation word before a sentiment word reverses the point.
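The negation rule can be sketched in Python as follows. The paper does not list its negation words, so the set `NEGATIONS` is an assumption ("ga" and "gak" appear in the Table II tweets, while "tidak" and "bukan" are added as common Indonesian negations). Checking only the immediately preceding token is also an assumed reading of "before a sentiment word".

```python
import re

# Excerpt of the Words Bank / Dictionary (Table I).
POSITIVE = {"untung", "stabil", "jelas", "oke", "bagus"}
NEGATIVE = {"lemot", "pending", "lambat", "rugi"}
# Assumed negation words; the paper does not enumerate them.
NEGATIONS = {"ga", "gak", "tidak", "bukan"}


def score_words(tweet: str) -> dict[str, int]:
    """Score each distinct sentiment word, flipping the sign after a negation."""
    tokens = re.findall(r"[a-z]+", tweet.lower())
    points: dict[str, int] = {}
    for i, token in enumerate(tokens):
        if token in points:
            continue  # repeated sentiment words are scored once
        if token in POSITIVE:
            value = 1
        elif token in NEGATIVE:
            value = -1
        else:
            continue
        if i > 0 and tokens[i - 1] in NEGATIONS:
            value = -value  # a negation word directly before reverses the point
        points[token] = value
    return points


print(score_words("internet speednya gak stabil ? buffer youtube mpe ngulang 5x @XL123."))
print(score_words("oke juga nih @XL123 bagus sinyalnya."))
```

Under these assumptions, the first tweet gives "stabil" a value of -1 and the second gives "oke" and "bagus" a value of 1 each, in line with the corresponding rows of Table II.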
Some of the results on the training data are shown in Table II.
In the column “articleid”, the value 1 (one) marks the positive training data and the value 2 (two) marks the negative training data.
The column “who” shows who made the tweet, that is, a member of the public with a positive or a negative opinion. The column “whom” shows whom the tweet mentions, in this case the telephone provider company.
The column “value” shows the point for each word that carries a positive or negative sentiment. The columns “keyword” and “sentenceprocessed” give the sentiment word and the sentence that contains it.
TABLE II: SOME RESULTS ON DATA TRAINING
articleid  who      sentence index  whom         value  keyword      sentenceprocessed
1          Public+  14              Tes Positif  -1     untung       ini sih ga untung ganti @XL123 dikit2 pending browsing apapun lambat.
1          Public+  28              Tes Positif  -1     stabil       internet speednya gak stabil ? buffer youtube mpe ngulang 5x @XL123.
1          Public+  38              Tes Positif  -1     jelas        nipunipu nih. @XL123 bnyk potonganpotongan gak jelas pulsa dipotong mulu. Apaapaan nih.
1          Public-  1               Tes Negatif  -1     rugi         koq tiba2 pulsa gw ilang ya Mana nih @XL123 jgn bikin org rugi donk.
1          Public-  2               Tes Negatif  -1     lemot        alah katanya @XL123 oke yg ada malah nyesel udah ga ada sinyal lemot pending pula.
1          Public-  2               Tes Negatif  -1     pending      alah katanya @XL123 oke yg ada malah nyesel udah ga ada sinyal lemot pending pula.
2          Public+  1               Tes Positif  1      untung       wah pake @XL123 untung juga yah ternyata.
2          Public+  2               Tes Positif  1      bagus        oke juga nih @XL123 bagus sinyalnya.
2          Public+  2               Tes Positif  1      oke          oke juga nih @XL123 bagus sinyalnya.
2          Public+  5               Tes Positif  1      bersahabat   terima kasih @XL123 udah ngasih paket2 yang bersahabat ini.
2          Public+  6               Tes Positif  1      murah        oi @tina si @XL123 ada kasih promo murah tuh cobain deh.
2          Public+  7               Tes Positif  1      oke          maju terus @XL123 oke banget!.
V. ACTUAL DATA MODEL ANALYSIS AND RESULT
The actual data model is obtained from LingPipe4Twitter. It consists of up to 1500 tweets for each account, which LingPipe4Twitter retrieves by searching for the Twitter accounts of the research objects. PT XL Axiata’s Twitter accounts are @XL123, @XLCare, and @XLandMe. For PT Telkomsel, the Twitter accounts are @telkomsel, @Kartu_As, and @simPATI. PT Indosat’s Twitter accounts are @IndosatMania, @indosat, and @indosatcare.
The search results, in terms of the number of tweets, are shown in Table III.
TABLE III: SEARCH RESULT
Company       Account        Search Result  Subtotal
PT XL Axiata  @XL123         1494           4490
              @XLCare        1498
              @XLandMe       1498
PT Telkomsel  @telkomsel     1453           4445
              @Kartu_As      1498
              @simPATI       1494
PT Indosat    @IndosatMania  1496           4265
              @indosat       1274
              @indosatcare   1495
Total Tweets                                13200
The search results are used again in the next step, Data Processing, in which the scores are calculated using a modified Java application specialized in text mining [10].
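The paper performs this step with a modified Java application whose details are not given. Purely as an illustration, the Python sketch below shows how word-level points could be aggregated into positive and negative totals per account. The CSV layout (columns `account` and `text`), the sample rows, and the function name `aggregate` are hypothetical, since the format of the LingPipe4Twitter export is not described in the paper.

```python
import csv
import io
import re
from collections import defaultdict

# Excerpt of the Words Bank / Dictionary (Table I).
POSITIVE = {"oke", "bagus", "murah"}
NEGATIVE = {"lemot", "pending", "rugi"}

# Hypothetical CSV layout; the real export columns are not given in the paper.
CSV_TEXT = """account,text
@XL123,oke juga nih @XL123 bagus sinyalnya
@XL123,jgn bikin org rugi donk @XL123
@XLCare,sinyal lemot pending pula @XLCare
"""


def aggregate(csv_file) -> dict[str, dict[str, int]]:
    """Sum positive and negative word points per account."""
    totals = defaultdict(lambda: {"positive": 0, "negative": 0})
    for row in csv.DictReader(csv_file):
        seen = set()
        for token in re.findall(r"[a-z]+", row["text"].lower()):
            if token in seen:
                continue  # each distinct word counts once per tweet
            seen.add(token)
            if token in POSITIVE:
                totals[row["account"]]["positive"] += 1
            elif token in NEGATIVE:
                totals[row["account"]]["negative"] -= 1
    return {account: dict(t) for account, t in totals.items()}


result = aggregate(io.StringIO(CSV_TEXT))
for account, t in result.items():
    # Columns: account, positive points, negative points, subtotal
    print(account, t["positive"], t["negative"], t["positive"] + t["negative"])
```

The per-account output mirrors the Positive, Negative, and Sub Total columns used in the scoring tables.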
The results of this process are shown in the tables below for each company: Table IV for PT XL Axiata, Table V for PT Telkomsel, and Table VI for PT Indosat.
Based on the total score obtained for each company, PT XL Axiata Tbk has better service quality than the other two companies. PT XL Axiata Tbk leads with a total of 29 points, followed by PT Telkomsel Tbk with -70 points and PT Indosat Tbk in last position with -100 points.
TABLE IV: SCORING RESULT FOR PT XL AXIATA
Account    Tweet     Score  Sub Total
@XL123     Positive  106    34
           Negative  -72
@XLCare    Positive  65     -29
           Negative  -94
@XLandMe   Positive  32     24
           Negative  -8
Total Scoring PT XL Axiata  29
TABLE V: SCORING RESULT FOR PT TELKOMSEL
Account     Tweet     Score  Sub Total
@telkomsel  Positive  95     -59
            Negative  -154
@Kartu_As   Positive  26     14
            Negative  -12
@simPATI    Positive  31     -25
            Negative  -56
Total Scoring PT Telkomsel   -70
TABLE VI: SCORING RESULT FOR PT INDOSAT
Account        Tweet     Score  Sub Total
@IndosatMania  Positive  12     -37
               Negative  -49
@indosat       Positive  92     -18
               Negative  -110
@indosatcare   Positive  76     -45
               Negative  -120
Total Scoring PT Indosat        -100
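As a cross-check, the totals can be recomputed from the positive and negative points printed in Tables IV to VI. The short Python sketch below does only that arithmetic; the name `company_totals` is introduced for illustration. Recomputing from the printed points gives -44 for @indosatcare (76 - 120) and -99 for PT Indosat, one point away from the printed -45 and -100. The source of that difference cannot be determined from the paper, and the ranking of the three companies is the same either way.

```python
# Positive and negative points per account as printed in Tables IV-VI.
REPORTED = {
    "PT XL Axiata": {"@XL123": (106, -72), "@XLCare": (65, -94), "@XLandMe": (32, -8)},
    "PT Telkomsel": {"@telkomsel": (95, -154), "@Kartu_As": (26, -12), "@simPATI": (31, -56)},
    "PT Indosat": {"@IndosatMania": (12, -49), "@indosat": (92, -110), "@indosatcare": (76, -120)},
}


def company_totals(reported):
    """Recompute each company's total as the sum of (positive + negative) per account."""
    return {
        company: sum(pos + neg for pos, neg in accounts.values())
        for company, accounts in reported.items()
    }


totals = company_totals(REPORTED)
# Higher totals indicate better perceived service quality (Assumption 5).
ranking = sorted(totals, key=totals.get, reverse=True)
print(totals)
print(ranking)
```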
VI. CONCLUSION
The results of this research show that, although telephone service providers have large numbers of users, they may still not know the quality of service they deliver to their customers. Opinions submitted by users are usually ignored by the companies. By processing the submitted opinions with text mining, this paper has shown the service quality of each company.
REFERENCES
[1] S. Iiritano and M. Ruffolo, "Managing the knowledge contained in electronic documents: A clustering method for text mining," in Proc. the 12th International Workshop on Database and Expert Systems Applications, 2001, pp. 454-458.
[2] C. Bridge, Unstructured Data and the 80 Percent Rule, 2011.
[3] F. Neri, C. Aliprandi, F. Capeci, M. Cuadros, and Tomas, "Sentiment analysis on social media," in Proc. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2012, pp. 919-926.
[4] B. Pang and L. Lee. (April 2, 2013). Department of Computer Science, Cornell University. [Online]. Available: http://www.cs.cornell.edu/home/llee/papers/pang-lee-stars.pdf
[5] A. Adrifina, J. U. Putri, and I. W. Simri, "Pemilahan artikel berita dengan text mining," in Proc. Seminar Ilmiah Nasional Komputer dan Sistem Intelijen, Universitas Gunadarma, Depok, 2008, pp. 176-181.
[6] A. Nurani, B. Susanto, and U. Proboyekti, "Implementasi naïve bayes classifier pada program bantu penentuan buku referensi matakuliah," Jurnal Informatika, Universitas Kristen Duta Wacana, vol. 3, no. 2, pp. 32-36, 2007.
[7] B. Liu, Handbook of Natural Language Processing, CRC Press, 2010.
[8] M. D. Haff. (March 12, 2010). Customer think. [Online]. Available: http://www.customerthink.com/blog/sentiment_analysis_hard_but_worth_it
[9] Lingpipe: Sentiment Analysis Tutorial. [Online]. Available: http://alias-i.com/lingpipe/demos/tutorial/sentiment/read-me.html
[10] Y. E. Soelistio and M. R. S. Surendra, "Simple text mining for sentiment analysis of political figure using naïve bayes classifier method," in Proc. the 7th International Conference on Information and Communication Technology and Systems, Bali, 2013.
Johan Setiawan was born in Jakarta on October 27, 1964. He graduated from Bina Nusantara University, Jakarta, DKI Jakarta, Indonesia, majoring in Information System. He also received a Master of Business Administration (MBA) degree from Monash-Mt Eliza University, Melbourne, Australia, and a Magister Management (MM) degree from IPMI, Jakarta, DKI Jakarta, majoring in General Management. His primary interests are data warehousing, data mining, and system analysis and design.
Calvin was born in Jakarta on January 19, 1992. He graduated from Multimedia Nusantara University, Tangerang, Banten, Indonesia, majoring in Information System. His interests are games and custom application development.