Artificial Intelligence Review
https://doi.org/10.1007/s10462-020-09828-3
Empirical evaluation andstudy oftext stemming algorithms
AbdulJabbar1· SajidIqbal2 · ManzoorIlahiTamimy1· ShaqHussain3·
AdnanAkhunzada1
© Springer Nature B.V. 2020
Abstract
Text stemming is one of the basic preprocessing steps for Natural Language Processing (NLP) applications; it transforms different word forms into a standard root form. For Arabic-script-based languages, adequate analysis of text by stemmers is a challenging task due to the large number of ambiguous structures in these languages. In the literature, multiple performance evaluation metrics exist for stemmers, each describing performance from a particular aspect. In this work, we review and analyze text stemming evaluation methods in order to devise criteria for better measurement of stemmer performance. Different aspects of stemmer performance measurement, such as main features, merits and shortcomings, are discussed using a resource-scarce language, Urdu. Through our experiments we conclude that the current evaluation metrics can only measure an average conflation of words, regardless of the correctness of the stem. Moreover, some evaluation metrics favor only certain types of languages. None of the existing evaluation metrics can perfectly measure stemmer performance for all kinds of languages. This study will help researchers to evaluate their stemmers using the right methods.
Keywords Natural language processing · Information retrieval · Text mining · Stemming algorithms · Stemmer evaluation methods · Urdu stemming
* Sajid Iqbal
sajidiqbal.pk@gmail.com
Abdul Jabbar
a.jabbar73@gmail.com
Shafiq Hussain
shafiqhussain@bzu.edu.pk
Adnan Akhunzada
akhunzadaadnan@gmail.com
1 Department of Computer Science, COMSATS University Islamabad (CUI), Main Campus, Park Road, Tarlai Kalan, Islamabad 45550, Pakistan
2 Department of Computer Science, Bahauddin Zakariya University Multan, Multan, Punjab, Pakistan
3 Bahauddin Zakariya University Multan (Sahiwal Sub-Campus), Multan, Punjab, Pakistan
A.Jabbar et al.
1 3
1 Introduction
Performance evaluation is the primary method to determine the effectiveness of algorithms and methods developed to solve different scientific problems. Efficient evaluation methods can boost the applicability of a solution; they describe and determine the extent to which the solution achieves its intended goals.
It is an open challenge for computational linguistics researchers to assess the performance of NLP applications (Cambria and White 2014). Existing evaluation methods can be divided into two main categories: intrinsic and extrinsic methods (Gaidhane et al. 2015). In intrinsic evaluation, the performance measures of NLP applications and methods are compared with gold standard results produced using manual methods; for instance, a stem produced by a stemmer is compared with the relevant dictionary stem developed by human experts. In extrinsic evaluation, the performance is measured directly in a realistic scenario; for example, two stemmers are tested on the same dataset and the achieved accuracy is compared in a relative way.
The evaluation of text stemming systems has long been a subject of debate (Brychcín and Konopík 2015). In past studies, various text stemming evaluation (TSE) methods have been used by researchers. However, the manual assessment of stemmers requires human effort, which makes it a challenging task (Suryani et al. 2018). State-of-the-art TSE methods can be direct or indirect (Singh and Gupta 2016). Direct evaluation refers to text stemming error analysis, conflation ratio, and compression factors, as mentioned by Abainia et al. (2017). Indirect evaluation refers to stemmer evaluation with respect to a specific NLP application. Such methods usually use machine learning (ML) techniques like K-Nearest Neighbor (KNN), Naïve Bayes (NB) and Support Vector Machines (SVM) for text classification (Saeed et al. 2018a, b). Presently, neural network based methods are showing better performance than traditional ML methods. Performance evaluation methods used in the Information Retrieval (IR) domain are also considered indirect methods (Mustafa and Rashid 2018).
To perform stemming in different languages, various stemmers have been proposed (Jabbar et al. 2018a, b). To evaluate a stemmer from multiple aspects, it must be assessed through different methods, i.e., in both direct and indirect ways. Direct evaluation methods only consider the conflation ratio for performance measurement (Brychcín and Konopík 2015; Sirsat et al. 2013); they measure performance based on the correct stemming ratio and do not account for false positives, false negatives and untouched words. This shortcoming leads to limited applicability of the designed stemmers. In this work, we review state-of-the-art TSE methods and experimentally show that current evaluation methods provide only a partial picture of stemmer performance.
The contributions of this work are threefold. Firstly, we present an extensive comparative study of existing stemmer evaluation methods to highlight their merits and demerits. Secondly, we perform various experiments to measure the performance of different stemmers designed for the Urdu language. Thirdly, through our experiments, we show that current evaluation methods provide a partial view and that some of the evaluation methods are language specific.
The remaining part of this paper is organized as follows. Section 2 provides the background of this study. In Sect. 3, different text stemming applications are discussed. Different stemming evaluation methods are described in Sect. 4. A deep analysis of the evaluation methods is given in Sect. 5. In Sect. 6, we discuss challenges associated with text stemming evaluation
methods and future research directions in this context. Finally, the findings and conclusions are provided in Sect. 7.
2 Background
Stemming is a computational procedure in which affixes are truncated from a word to extract its root. The process may differ from application to application, depending on its role in that application. In text stemming, stemmers minimize the document index size to improve the efficiency of computation: in an index of a document containing English words such as "accepts", "accepted" and "acceptance", these forms can be mapped to one common root, i.e., "accept". Text stemming is an integral part of many NLP applications, and the evaluation of these systems is a crucial and tedious task (Dahab et al. 2015). This section presents the definitions and descriptions required to understand the stemming evaluation process.
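As a minimal illustration of the conflation described above, the following Python sketch (our own example, not one of the cited stemmers) strips a few common English suffixes so that variants such as "accepts", "accepted" and "acceptance" collapse to a single index entry:

```python
# Minimal suffix-stripping sketch: conflate inflected English forms to a
# common index key. This is an illustrative toy, not a full Porter-style
# stemmer; real stemmers use ordered rules with context conditions.
SUFFIXES = ["ances", "ance", "ations", "ation", "ings", "ing", "ed", "es", "s"]

def toy_stem(word: str) -> str:
    w = word.lower()
    for suffix in SUFFIXES:                      # try longer suffixes first
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w

if __name__ == "__main__":
    for w in ["accepts", "accepted", "acceptance", "accept"]:
        print(w, "->", toy_stem(w))              # all map to "accept"
```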
Morpheme A morpheme is the smallest grammatical unit of a language that cannot be further divided into smaller meaningful parts; morphemes are combined to form meaningful words through inflection, derivation, and composition (Aronoff and Fudeman 2011). For example, the word "acceptance" contains "accept", a meaningful piece of the word, whereas "ance" is a morpheme with no independent meaning that cannot be further divided into smaller meaningful parts.
Affix An affix is a morpheme, a word part or group of letters, that is attached to a root or stem at some position: at the end, at the start, on both sides, or in the middle of the word (Aronoff and Fudeman 2011). Affixes are used to produce inflected and derived forms of a word. For instance, in "accepts", "accepting" and "accepted", the affixes are "s", "ing" and "ed", and the stem is "accept".
Inflectional and derivational affixes Inflectional morphemes modify words to mark grammatical categories such as tense, number (singular/plural) and gender (masculine, feminine and neuter). Derivation, in contrast, constructs new words by adding affixes to a root word (Qureshi et al. 2018).
Stem A stem is the base morpheme to which other morphemes, such as affixes, are attached (Aronoff and Fudeman 2011).
Root A root is like a stem, but it consists of a single, morphologically simple unit. For instance, 'disagree' is the stem of 'disagreement' because it is the base to which the 'ment' affix is attached, but 'agree' is the root (Aronoff and Fudeman 2011). With this definition in mind, we will use "stem" and "root" interchangeably; where required, the context will make the intended meaning clear.
Lemma and lexeme Lexemes are the variant forms of a word that share a common morpheme. A lemma is the definite form chosen from this collection to represent the lexeme, and it is a valid dictionary word (Singh and Gupta 2016). For instance, the words 'write', 'writing', 'wrote', 'writes' and 'written' are lexemes, and 'write' is the lemma.
2.1 Overview of stemming algorithms
In the literature, researchers have categorized stemming algorithms in different ways. Al-Sughaiyer and Al-Kharashi (2004) studied stemmers for the English and Arabic languages and
A.Jabbar et al.
1 3
have classified the English stemmers into three main categories, i.e., table lookup, linguistic, and computational. They classified Arabic stemmers into four groups, i.e., table lookup, linguistic, computational and pattern based. According to Paik et al. (2011), stemmers can be categorized as rule-based and statistical stemmers. Jivani (2011) examined the English stemmers and classified them into three subcategories, i.e., truncating, statistical and mixed. Zhou et al. (2012) divided the stemmers into two main categories (rule-based stemmers and statistical stemmers). Moghadam and MohammadReza (2015) described the Persian stemmers and classified them into three classes: structural stemmers, table lookup stemmers and statistical stemmers. Singh and Gupta (2017) divided the stemming approaches into linguistic rule-based methods and language-independent/statistical methods. Jabbar et al. (2018a, b) classified the stemmers into three classes: (a) linguistic-based stemmers, (b) corpus-based stemmers, and (c) hybrid stemmers. In general, stemming algorithms can be classified as linguistic-based and computational-based stemmers, as shown in Fig. 1. In linguistic-based stemmers, handcrafted grammatical rules are designed to derive the stem; for instance, Saeed et al. (2018a, b) presented a rule-based stemmer for the Kurdish language, and Suryani et al. (2018) developed a Sundanese rule-based stemmer. A computational stemmer, on the other hand, performs some statistical or non-statistical computations. Corpus-based stemming measures the co-occurrence of variant word forms [e.g., Alotaibi and Gupta (2018)]. Similarly, in statistical stemmers, researchers have applied statistical and machine learning based procedures to extract the stem (Bölücü and Can 2019; Pande et al. 2018).
A good number of rule-based stemmers for the English language have been proposed in the literature (Lovins 1968; Porter 1980). Many researchers have attempted to improve the performance of Martin Porter's stemmer (Bimba et al. 2016) due to its effectiveness for many NLP applications (Chintala and Reddy 2013; Patil and Patil 2013). Examples of rule and resource development for rule-based stemmers are given by Jabbar et al. (2016), where the authors developed resources for Urdu language processing. In the table lookup/reference lookup approach (also called the brute-force approach), each word and its corresponding stem are stored in a table (Hussain et al. 2017). This approach is suitable for languages that have very complex linguistic structures. In statistical stemmers, several statistical features are extracted from a given dataset; using these features, the stem of the query word is obtained with statistical classifiers. Statistical methods are also known as language-independent methods. Hybrid stemmers are constructed by combining two or more stemming methods. For instance, the stemmer of Bimba et al. (2016) combined the
Fig. 1 Classification of stemming algorithms in terms of types: linguistic-based stemmers (rule-based/affix-stripping, template-based, table lookup) and computational-based stemmers (corpus-based, statistical)
rule-based and table lookup approaches for the Hausa language, and Jabbar et al. (2018a, b) also constructed an Urdu stemmer using a hybrid approach.
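To make the hybrid idea concrete, the following Python sketch (our own illustration; the table entries and suffix rules are hypothetical placeholders, not those of the cited stemmers) first consults a lookup table and falls back to affix-stripping rules only when the word is not listed:

```python
# Hybrid stemming sketch: table lookup first, rule-based fallback.
# The lookup table and suffix rules below are illustrative placeholders.
LOOKUP_TABLE = {
    "wrote": "write",      # irregular forms are easiest to handle by lookup
    "written": "write",
    "went": "go",
}
SUFFIX_RULES = ["ing", "ed", "es", "s"]

def hybrid_stem(word: str) -> str:
    w = word.lower()
    if w in LOOKUP_TABLE:                 # (1) table lookup
        return LOOKUP_TABLE[w]
    for suffix in SUFFIX_RULES:           # (2) rule-based affix stripping
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w                              # (3) leave unknown words untouched

print([hybrid_stem(w) for w in ["wrote", "writes", "writing"]])
# -> ['write', 'writ', 'writ']  (the rule fallback is imperfect by design)
```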
2.2 Text stemming errors
Recognizing the types of errors a stemmer may produce is the first step in measuring the effectiveness of a given stemmer. These error types help answer questions such as when and why the errors occur and what their effect on stemmer performance is. Figure 2 gives the categorization of stemming errors.
Under stemming errors (USE) Under stemming occurs when a stemmer strips fewer letters than required. In this type of error, the stemmer either returns the word unchanged (no stem) or the affix removal produces a word with changed meaning, as shown in Table 1.
Over stemming errors (OSE) An OSE is an error in which a stemmer truncates more characters than required. An OSE leads to an invalid stem or an out-of-vocabulary (OOV) word (Table 2).
Mis-stemming errors (MSE) The term "mis-stemming errors" refers to those errors in which the stripped characters do not form a proper affix (Table 3).
Fig. 2 Overview of stemming errors: under stemming (no stem, under stem), over stemming (invalid stem, over stem) and mis-stemming (invalid word, changed word)
Table 1 Examples of under stemming errors
Input word | Actual stem | Produced stem | Type of error
Acceptance | Accept | Acceptance | No stem
Acceptances | Accept | Acceptance | Under stem

Table 2 Examples of over stemming errors
Input word | Actual stem | Produced stem | Type of error
receiving | receive | receiv | Invalid stem
consistently | consist | consistent | Over stem

Table 3 Examples of mis-stemming errors
Input word | Actual stem | Produced stem | Type of error
Red | Red | r | Invalid word
kneel | Kneel | knee | Changed word
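The error categories above can be checked automatically once a gold-standard stem is available. The following Python sketch (our own illustration, using simplified rules rather than the paper's exact definitions; detecting mis-stemming would additionally require an inventory of valid affixes) labels a stemmer's output for two of the examples in Tables 1–2:

```python
# Classify a stemmer's output against a gold-standard stem, using a
# simplified version of the error categories in Tables 1-3.
def classify(word: str, gold_stem: str, produced: str) -> str:
    word, gold_stem, produced = word.lower(), gold_stem.lower(), produced.lower()
    if produced == gold_stem:
        return "correct"
    if produced == word:
        return "no stem (under stemming)"
    if produced.startswith(gold_stem):
        return "under stem (under stemming)"
    if gold_stem.startswith(produced):
        return "over/invalid stem (over stemming)"
    return "other error"

examples = [
    ("Acceptances", "Accept", "Acceptance"),   # under stem
    ("receiving", "receive", "receiv"),        # over stemming (invalid stem)
]
for w, g, p in examples:
    print(w, "->", p, ":", classify(w, g, p))
```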
A.Jabbar et al.
1 3
3 Applications of stemming algorithms
Stemming is a morphological analysis and a necessary preprocessing step for NLP applications (Hassani and Lee 2016). To model a language, researchers extract different types of features (manually or automatically) from given data (Brychcín and Konopík 2015). There is a variety of NLP applications, and each type of application may require different types of features. For example, a language expert may utilize a stemmer for vocabulary learning and development (Mochizuki and Aizawa 2000). Sarma and Purkayastha (2013) and Dang et al. (2013) used stemming for word classification and WordNet development, respectively. Domain-specific word extraction was performed by Rehman et al. (2013) and Nguyen and Leveling (2013). The vocabulary mismatch problem can also be addressed with the help of stemming (Singh and Gupta 2016). Applications such as Information Extraction (IE), Information Retrieval (IR), Text Classification (TC), Text Clustering (TClu), Question Answering (QA), Text Summarization (TS), Machine Translation (MT), Text Segmentation, Indexing (Ind), Automatic Speech Recognition (ASR) (Dahab et al. 2015; Singh and Gupta 2016) and language generation (Mishra and Prakash 2012) require stemming as a preprocessing step. In short, stemming improves performance by reducing the time and space complexity of several NLP applications (Boudchiche and Mazroui 2015). A summarized view of the applications of text stemming systems is provided in Table 4.
4 Stemming evaluation methods
In this section, we review representative stemming evaluation methods. To the best of our knowledge, we have included all available stemming evaluation methods; they are summarized in Table 5.
Text stemming evaluation methods can be categorized into direct, indirect and gold standard evaluation metrics, as shown in Fig. 3.
4.1 Direct evaluation methods
These evaluation methods use training datasets to extract the required statistics, known as features, to assess stemming. These statistics may include the over stemming index (OI), under stemming index (UI), index compression factor (ICF), average words conflation factor (AWCF), and a few more statistical significance metrics. In this subsection, we analyze these direct evaluation methods and discuss our analysis.
4.1.1 Paice’s (1994, 1996) evaluation methodology
Paice (1994, 1996) proposed the first stemmer evaluation methodology based on error counting and predefined groups of words that are related to each other either morphologically or semantically. A good stemmer conflates as many words as possible within a predefined group and avoids conflating words from different classes or words that are semantically distinct. Using this method, the performance of a stemmer is measured with the under-stemming index (UI), the over-stemming index (OI), their ratio, and the stemming weight (SW).
Table 4 State-of-the-art applications of text stemming algorithms

No. | Cited | Application | Description
1 | Schofield and Mimno (2016), Hassani and Lee (2016) | Language modeling | Stemming may be viewed as a form of smoothing and as a way of obtaining better statistical estimates
2 | Schofield and Mimno (2016) | Topic modeling | Stemmers reduce the vocabulary size, and topic modeling depends upon a sparse vocabulary, so stemming is also leveraged as a preprocessing step in topic modeling
3 | Boukhalfa et al. (2018) | Plagiarism detection | Stemming enhances the performance of similarity detection systems
4 | McCormick (2016) | Word embedding | Two words with a similar context should have the same word vector; "ant" and "ants" share a context, which is possible when a stemmer maps "ants" to "ant"
5 | Sarma and Purkayastha (2013) | Word classification | Stemming also improves the efficiency of word classification applications
6 | Dang et al. (2013) | WordNet development | Stemming deals with variant forms of words, and each form belongs to a specific part of speech, which is helpful in WordNet development
7 | Nguyen and Leveling (2013) | Domain-specific word extraction from text | Domain-specific words have a specific form or affix, and stemming facilitates identifying these word forms or affixes
8 | Saeed et al. (2018a, b), Ismailov et al. (2016), Karimi et al. (2015) | Text mining | The goal of text mining is to extract meaningful information from text data
9 | Dey et al. (2014) | Named entity recognition (NER) | A NER system seeks and extracts predefined proper and common noun entities from natural language text
10 | Rehman et al. (2013) | Word segmentation | Stemming can be viewed as word tokenization from continuous text
11 | Dahab et al. (2015), Singh and Gupta (2016) | Information extraction (IE) | Stemming also improves the efficiency of an information extraction system
12 | Flores and Moreira (2016), de Oliveira and Junior (2018) | Information retrieval (IR) | An IR system handles variant forms of words via a stemmer
13 | Rani et al. (2015), Ali et al. (2018), Saeed et al. (2018a, b) | Text classification (TC) | Stemming can be viewed as a text classification mechanism because it groups words that share the same morphological root
14 | Khalid et al. (2016), Dahab et al. (2015) | Text clustering (TClu) | A bag of words can be grouped by the stemming system
15 | Giachanou and Crestani (2016), Yadollahi et al. (2017) | Sentiment analysis | Pre-processing includes recognition and deletion of stop words, slang and abbreviations, stemming, and correction
16 | Khalid et al. (2016) | Text compression | Text stemming compresses the vocabulary by reducing conflated words to their common root form
17 | Dahab et al. (2015), Singh and Gupta (2016) | Question answering (QA) | Variant forms of question words are stemmed, which enhances the performance of the QA system
18 | Dahab et al. (2015), Singh and Gupta (2016) | Text summarization (TS) | Variant forms of words can be reduced to their common root, which makes it easier for a TS system to perform well
19 | Fattah et al. (2006) | Machine translation (MT) | Variant forms of words may not be present in the target language tagset; in such cases, the stemmer provides the stem, which is helpful for translation
20 | Rashid and Mohamad (2016) | Detecting wicked websites | In filtering wicked information and detecting wicked websites, text stemming is used to extract features that ultimately improve system performance
Table 5 Summary of state-of-the-art stemmer evaluation metrics

No. | Work | Stemming method | Languages | Dataset size | Evaluation methods
1 | Mishra and Prakash (2012) | Rule-based | Hindi | 2,265 words | Manual
2 | Ababneh et al. (2012) | Rule-based | Arabic | Sample terms list | Manual
3 | Karaa (2013) | Rule-based | English | 30,000 words | Paice (1994)'s evaluation method
4 | Thangarasu and Manavalan (2013) | Statistical (cluster analysis) | Tamil | 7,000 words | Manual
5 | Husain et al. (2013) | Statistical (N-gram) | Urdu and Marathi | 1,200 Urdu words; 1,200 Marathi words | Manual
6 | Sulaiman et al. (2014) | Rule-based | Malay | 1,200 words | Paice (1994)'s evaluation method
7 | Abu-Errub et al. (2014) | Rule-based | Arabic | 1,100 words | Manual
8 | Al-Omari and Abuata (2014) | Rule-based (linguistic and mathematical rules) | Arabic | 6,225 words | Manual
9 | Rashidi and Lighvan (2014) | Hybrid | Persian | Small dataset from the Hamshahri collection | Manual
10 | Dianati et al. (2014) | Corpus-based approach | Persian | 1,250 Persian words | Manual
11 | Al-Kabi et al. (2015) | Rule-based and pattern-based | Arabic | 6,081 words | Manual
12 | Khan et al. (2015) | Rule-based and template matching | Urdu | 66,200 words | Precision, recall and F-measure
13 | Brychcín and Konopík (2015) | Statistical | Czech, Slovak, Polish, Hungarian, Spanish and English | Large dataset for each language | Precision, recall and F-measure
14 | Bimba et al. (2016) | Rule-based | Hausa | 1,723 words | Paice (1994)'s evaluation method; Sirsat's evaluation method (Sirsat et al. 2013)
15 | Momenipour and Keyvanpour (2016) | Statistical | Persian | PER-Tree-Bank words; Bijankhan distinct words; Hamshahri test collection | Manual; precision, recall
16 | El-Defrawy et al. (2016) | Rule-based | Arabic | International Corpus of Arabic (ICA) | Precision, recall and F-measure
17 | Abainia et al. (2017) | Rule-based | Arabic | ARASTEM dataset (https://abainia.net) | Paice (1994)'s evaluation methodology
18 | Taghi-Zadeh et al. (2015) | Hybrid | Persian | 4,689 words; 26,913 words | Manual
19 | Singh and Gupta (2017) | Statistical | English, Marathi, Hungarian, Bengali | 173,252 WSJ documents (English); 99,275 documents (Marathi); Magyar Hirlap collection of 49,530 documents (Hungarian); FIRE 2010 collection containing 123,047 documents (Bengali) | Precision, recall and F-measure
20 | Mateen et al. (2017) | Hybrid | Punjabi | 85,152 words | Manual
21 | Jaafar et al. (2017) | Rule-based | Arabic | Quranic Arabic Corpus | Frakes and Fox (2003) evaluation mechanism
22 | Jabbar et al. (2018a, b) | Hybrid | Urdu | 76,074 words | Precision, recall and F-measure; Frakes and Fox (2003)
23 | Alotaibi and Gupta (2018) | Statistical | English, Marathi, Hungarian, Bengali | WSJ documents (English); Sakal and Maharashtra Times (Marathi); Magyar Hirlap corpus (Hungarian); FIRE 2010 collection (Bengali) | Precision, recall and F-measure
24 | Suryani et al. (2018) | Rule-based | Sundanese | 4,453 words | Paice (1994)'s evaluation methodology
25 | Saeed et al. (2018a, b) | Rule-based | Arabic | 4,007 documents | Precision, recall and F-measure
26 | Ali et al. (2019) | Rule-based | Urdu | 32,000 words | Manual
27 | Bölücü and Can (2019) | Statistical | Turkish, Hungarian, Finnish, Basque, and English | Turkish: 5,620 sentences and 53,798 tokens; Hungarian: 24K words; Finnish: 19,000 sentences and 162,000 words; Basque: 24K words; English: 24K words | Frakes and Fox (2003) evaluation mechanism
A.Jabbar et al.
1 3
To determine these values, a list of groups of semantically and morphologically related words is formed and then submitted to the stemmer. A stemmer commits an under-stemming error if it produces more than one unique stem for the same group or class of words. On the other hand, if the stem produced for one group also occurs in another group (the same stem is produced for two groups of words), the stemmer has committed an over-stemming error. An ideal stemmer conflates each group to the same stem and has low UI and OI indexes. UI and OI can be calculated using Eqs. (1) and (2). The following four parameters are used to calculate the over-stemming and under-stemming indexes.
Global Desired Merge Total (GDMT)
Global Desired Non-Merge Total (GDNT)
Global Unachieved Merge Total (GUMT)
Global Wrongly Merged Total (GWMT)
These parameters are defined in the following paragraphs.
Under stemming index (UI) The Desired Merge Total (DMT) represents the total number of desired merges (pairs of word forms) in a group, and it is calculated by Eq. (3), where n_g is the number of morphological forms in the particular group sharing the same stem. GDMT is the sum of the DMT values over all word groups, as in Eq. (4). The Unachieved Merge Total (UMT) represents the failure of a stemmer to merge all query words of a specific group to the same root; it is computed from the distinct stems the stemmer produces for the group, as given in Eq. (5).
$$\text{Under Stemming Index (UI)} = \frac{\text{GUMT}}{\text{GDMT}} \quad (1)$$

$$\text{Over Stemming Index (OI)} = \frac{\text{GWMT}}{\text{GDNT}} \quad (2)$$

$$\text{DMT}_g = \frac{1}{2}\, n_g\,(n_g - 1) \quad (3)$$

$$\text{GDMT} = \sum_{g=1}^{G} \text{DMT}_g \quad (4)$$

$$\text{UMT}_g = \frac{1}{2} \sum_{i=1}^{s} u_i\,(n_g - u_i) \quad (5)$$
Fig. 3 Classification of text stemming evaluation methods: direct methods (e.g., Paice's evaluation method, Sirsat's evaluation method), indirect methods (e.g., evaluation of a stemmer within an IR system using precision, recall and F-measure), and gold standard (manual) methods
where s is the number of distinct stems produced by the stemmer for the group (a stemmer may produce multiple stems for a particular group of words), n_g is the number of morphological forms in the particular group sharing the same stem, and u_i is the number of words conflated to the i-th distinct stem.
GUMT is calculated as the sum of the UMTs of all groups using Eq. (6), where G is the total number of groups in the given corpus. Finally, the under-stemming index (UI) is defined as given in Eq. (1).
Over stemming index (OI) The Wrongly Merged Total (WMT) represents over stemming, when words from two different groups are stemmed to one root. It can be calculated using Eq. (7), where N is the total number of groups involved in correct and wrong stemming, n_i is the number of words in the i-th group, and v_ij is the number of stems that should actually belong to group i but are produced in group j.
In Eq. (7), v_ij is the number of stems that belong to group i but that the stemmer has produced in group j. If i = j, the stemmer has performed the job correctly, whereas if i ≠ j, wrong stemming has been performed. In short, Eq. (7) indicates whether the stemming process for a particular group of morphological variants is interfering with other groups.
The Global Wrongly Merged Total (GWMT) is obtained by summing the WMT over all groups, as in Eq. (8).
The Desired Non-Merge Total (DNT) refers to the number of words in a certain group that could be conflated after stemming with words from some other semantic group. DNT can be calculated by Eq. (9), where w is the total number of words in the test dataset and n_g is the number of words in the particular group.
The Global Desired Non-Merge Total (GDNT) is equal to the sum of the DNT over all groups and is calculated by Eq. (10). Hence, the over-stemming index (OI) can be calculated by Eq. (2).
Stemming weight (SW) The SW parameter measures the strength of a stemmer: a lower value of SW indicates weak stemming, whereas a higher value indicates strong stemming. It can be calculated using Eq. (11).
$$\text{GUMT} = \sum_{g=1}^{G} \text{UMT}_g \quad (6)$$

$$\text{WMT}_g = \frac{1}{2} \sum_{i,j=1}^{N} v_{ij}\,(n_i - v_{ij}) \quad (7)$$

(for three groups, the entries v_{ij} are v_{11}, v_{12}, v_{13}, v_{21}, v_{22}, v_{23}, v_{31}, v_{32}, v_{33})

$$\text{GWMT} = \sum_{g=1}^{G} \text{WMT}_g \quad (8)$$

$$\text{DNT}_g = \frac{1}{2}\, n_g\,(w - n_g) \quad (9)$$

$$\text{GDNT} = \sum_{g=1}^{G} \text{DNT}_g \quad (10)$$

$$\text{SW} = \frac{\text{OI}}{\text{UI}} \quad (11)$$
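To make the bookkeeping above concrete, the following Python sketch (our own illustration; the word groups and the two toy stemmers are hypothetical) computes UI, OI and SW for predefined groups. GWMT is accumulated per produced stem, following the worked example in Sect. 5:

```python
# Minimal sketch of Paice's (1994) error-counting evaluation: given
# predefined groups of related words and a stemmer, compute UI, OI and SW.
from collections import Counter

def paice_scores(groups, stem):
    W = sum(len(g) for g in groups)                      # total words
    gdmt = gumt = gdnt = 0.0
    per_group_stems = []
    for g in groups:
        n_g = len(g)
        stems = [stem(w) for w in g]
        per_group_stems.append(stems)
        counts = Counter(stems)                          # u_i per distinct stem
        gdmt += 0.5 * n_g * (n_g - 1)                    # Eqs. (3)/(4)
        gumt += 0.5 * sum(u * (n_g - u) for u in counts.values())  # Eqs. (5)/(6)
        gdnt += 0.5 * n_g * (W - n_g)                    # Eqs. (9)/(10)
    gwmt = 0.0                                           # per-stem accumulation
    for s in {s for stems in per_group_stems for s in stems}:
        v = [stems.count(s) for stems in per_group_stems]
        n_s = sum(v)
        gwmt += 0.5 * sum(vi * (n_s - vi) for vi in v)
    ui = gumt / gdmt if gdmt else 0.0
    oi = gwmt / gdnt if gdnt else 0.0
    sw = oi / ui if ui else 0.0                          # Eq. (11); 0 if UI = 0
    return ui, oi, sw

groups = [["accept", "accepted", "acceptance"], ["access", "accessed"]]
for name, stem in [("no stemming", lambda w: w),
                   ("truncate to 4 chars", lambda w: w[:4])]:
    print(name, paice_scores(groups, stem))
# "no stemming" gives pure under-stemming (UI = 1, OI = 0);
# aggressive truncation gives pure over-stemming (UI = 0, OI = 1).
```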
A.Jabbar et al.
1 3
Paice (1994) utilized the CISI source (CISI Collection, University of Glasgow), which contains 184,659 words, from which two smaller word samples of 1,527 distinct words were extracted. The author experimented with the Lovins (1968), Porter (1980) and Paice/Husk (1990) stemmers and concluded that the Paice/Husk (Paice 1990) stemmer has the highest value of the OI index, whereas the other two stemmers show lower scores. In the case of the UI index, Porter's stemmer makes more under-stemming errors than the others: Paice/Husk (Paice 1990) has the lowest UI score [1.21 × 10⁻¹] and Porter the highest UI [3.74 × 10⁻¹], whereas Porter has the lowest OI [2.18 × 10⁻⁵] and Paice/Husk (Paice 1990) the highest OI [1.18 × 10⁻⁴], as mentioned in Table 6. According to the SW score [9.78 × 10⁻⁴], Paice/Husk (Paice 1990) is the strongest stemmer; Lovins' (1968) stemmer, with an SW of [1.93 × 10⁻⁴], is in second place, and Porter's (1980) stemmer, with an SW of [7.4 × 10⁻⁵], stands last, as shown in Table 6.

Table 6 Results of Paice's evaluation method
Stemmer | UI | OI | SW
Lovins | 3.26 × 10⁻¹ | 16.3 × 10⁻⁵ | 1.93 × 10⁻⁴
Paice/Husk | 1.21 × 10⁻¹ | 1.18 × 10⁻⁴ | 9.78 × 10⁻⁴
Porter | 3.74 × 10⁻¹ | 2.18 × 10⁻⁵ | 7.4 × 10⁻⁵
Discussion Paice's evaluation methodology (PEM) has some problems. First, it is not trivial to create groups of morphologically related words, and if a group contains only one word, its DMT value is zero. Moreover, the method is time-consuming because a manual check is needed to determine whether a resultant word suffers from under-stemming or over-stemming, so it is not suitable for checking a large volume of data. It deals with only two types of stemming errors; however, a stemmer may commit other errors, such as the generation of invalid words, i.e., a generated stem that does not lie in any group. In some cases, the stemmer produces a stem that is linguistically correct but incorrect in reality: for the Arabic word meaning [two boys], the Khoja (1999) stemmer derives the root meaning [soft], which is linguistically correct but not the valid stem; the word meaning [give birth] is the correct stem (Nwesri and Alyagoubi 2015). Finally, this method is suitable only for the English language (AlSerhan and Alqrainy 2008).
4.1.2 Hull's evaluation method
When the performance of two stemmers differs only slightly, it is very hard to say whether the variation is meaningful or whether it happened by chance. For such cases, Hull (1996) proposed the Analysis of Variance (ANOVA) model for large, continuous and normally distributed samples. It is observed that for the English language, inflectional word forms are few, and the observed differences are limited. Hull (1996) performed experiments over five stemmers [the Remove-s stemmer built into the SMART system, an extensively modified version of the Lovins (1968) stemmer included in SMART, Porter (1980), and the Inflec and Deriv stemmers (Xer 1994)] using the SMART text retrieval system originated at Cornell University (Buckley 1985). No stemming is used
in order to index the queries and compare them with their standard form. Hull (1996) concluded that the average absolute improvement in the IR system due to stemming alone is small (up to 1–3%).
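A minimal sketch of this kind of significance test, assuming per-query retrieval scores for each stemmer are already available (the numbers below are made-up placeholders, not real experimental data, and the test is a plain one-way ANOVA in the spirit of Hull's analysis rather than his exact design), could use SciPy:

```python
# Hedged sketch: compare per-query retrieval scores of three stemming
# configurations with a one-way ANOVA. Scores are fabricated placeholders.
from scipy.stats import f_oneway

ap_no_stem = [0.31, 0.42, 0.28, 0.35, 0.40]   # average precision per query
ap_porter  = [0.33, 0.44, 0.29, 0.36, 0.41]
ap_lovins  = [0.32, 0.45, 0.27, 0.37, 0.42]

f_stat, p_value = f_oneway(ap_no_stem, ap_porter, ap_lovins)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
# A large p-value suggests the observed differences could be due to chance.
```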
4.1.3 Frakes's evaluation
The strength of a stemmer indicates the degree of variation of the derived stem. Weak/light stemmers conflate only highly related words such as "consist", "consisted", and "consisting". In contrast, strong/heavy stemmers can handle more variation in morphological forms, such as "consistency", "consistent", "consistently". Frakes and Fox (2003) proposed criteria for determining stemmer strength and similarity, as described below.
The mean number of words per conflation class (MWC) MWC refers to the mean number of words conflated per class. For example, "consist", "consisted" and "consisting" are conflated to "consist", which gives a value of three words for this class. A higher value of MWC signifies better performance of a stemmer. It is calculated using Eq. (12), where N refers to the total number of unique words in a class and S refers to the number of unique stems obtained.
Index compression factor (ICF) A higher ICF value signifies a stronger stemmer (Frakes and Fox 2003), and many experiments have shown that strong stemmers yield higher ICF values. Lennon et al. (1981) achieved an ICF of 30.9–45.8% with Lovins' (1968) stemmer and 26.2–38.8% with Porter's (1980) stemmer. Frakes and Fox (2003) reported an ICF of 29% for Lovins' stemmer and 17% for Porter's (1980) stemmer. Paice (1994) and Harman (1991) found that Lovins' (1968) stemmer has a higher ICF (44.60% and 38.38%, respectively) than Porter's (1980) stemmer, as shown in the last two columns of Table 7. The index compression factor is defined in Eq. (13), where n is the number of words in the corpus and s is the number of stems. For example, a corpus with 50,000 words (n) and 40,000 stems (s) would have an index compression factor of 20%.
The word change factor (WCF) WCF indicates the proportion of words that are changed by the stemmer. For example, a stemmer might not alter the word "consist" because it is already a stem; strong stemmers change such words more often than weak stemmers. Normally, a higher value of WCF indicates a stronger stemmer. WCF can be calculated by Eq. (14).
$$\text{MWC} = \frac{N}{S} \quad (12)$$

$$\text{ICF} = \frac{n - s}{n} \quad (13)$$
Table 7 ICF values of the Lovins and Porter stemmers
Stemmer | Lennon et al. (1988) | Frakes and Fox (2003) | Paice (1994) | Harman (1991)
Lovins (1968) | 30.9–45.8% | 29% | 44.60% | 38.23%
Porter (1980) | 26.2–38.8% | 17% | 38.90% | 28.74%
A.Jabbar et al.
1 3
In Eq. (14), N is the number of unique words and C is the number of unchanged words.
The mean number of characters removed (MCR) MCR represents the average number of characters (in a group) removed to derive the stem. A strong stemmer truncates more letters than a weak stemmer to obtain the stem. As an example, for the word "helps" one character ('s') is removed, for "helper" two letters ('er') are removed, for "helpful" three letters ('ful') are removed, and for "helpless" four letters ('less') are stripped to extract the stem "help". Eq. (15) computes the MCR score.
Frakes and Fox (2003) used the Moby Common Dictionary wordlist (https://antiflux.org/dictionary?dict=moby-thesaurus) to evaluate four stemmers [the "S" stemmer (Harman 1991), Lovins (1968), Porter (1980), and Paice/Husk (1990)] and claimed that the Paice and Lovins stemming algorithms are the most similar, while the Paice and "S" stemmers are the most dissimilar.
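As a compact illustration, the following Python sketch (our own example; the word sample is hypothetical) computes the four strength measures above from a mapping of words to their produced stems:

```python
# Compute Frakes and Fox (2003)-style strength measures from a sample of
# (word -> produced stem) pairs. The sample below is an illustrative toy.
def frakes_metrics(word_to_stem):
    words = list(word_to_stem)
    stems = list(word_to_stem.values())
    n, s = len(set(words)), len(set(stems))
    unchanged = sum(1 for w, st in word_to_stem.items() if w == st)
    removed = sum(len(w) - len(st) for w, st in word_to_stem.items())
    return {
        "MWC": n / s,                 # Eq. (12): words per conflation class
        "ICF": (n - s) / n,           # Eq. (13): index compression factor
        "WCF": (n - unchanged) / n,   # Eq. (14): proportion of changed words
        "MCR": removed / len(words),  # Eq. (15): mean characters removed
    }

sample = {"helps": "help", "helper": "help", "helpful": "help",
          "helpless": "help", "help": "help"}
print(frakes_metrics(sample))
```

Note that none of these values says anything about whether "help" is actually the correct stem, which is precisely the limitation discussed next.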
MCR only measures the strength of a stemmer, and the similarity metric only checks how alike two stemmers are; neither deals with the accuracy of the extracted stem. These measures do not count correctly stemmed words and provide no information about the production of invalid or modified stems.
MCR does not measure the transformation of the stem, and the compression index (CI) only checks the compression ratio of the vocabulary, not the correctness of the stems. Moreover, it is difficult to identify all the conflation classes, and manually checking the corresponding stem words for every conflation class is a tedious task, because some languages with rich inflectional and derivational morphology, such as Hungarian or Hebrew, have thousands of variant forms of a single word (Krovetz 2000); consequently, their vocabulary compression index will be high. For instance, the famous English stemmer of Porter (1980) is claimed to reduce the initial vocabulary by one third, and the Urdu stemmer proposed by Jabbar et al. (2018a, b) reduced the vocabulary size by 55%.
4.1.4 Sirsat’s evaluation method
Sirsat etal. (2013) criterion is very compelling for assessing the strength and accuracy of a
stemming algorithm. The following parameters are used to evaluate the strength and accu-
racy of the stemmer.
Word stemmed factor (WSF) It refers to the average number of words stemmed from the
stemmer. The threshold value is the minimum (50%). The larger value of WSF signifies the
strength of the stemmer. It can be calculated by Eq.(16)
$$\text{WCF} = \frac{N - C}{N} \quad (14)$$

$$\text{MCR} = \frac{\text{Total no. of letters removed}}{\text{Total no. of words}} \quad (15)$$

For the "help" example above: MCR = (1 + 2 + 3 + 4)/4 = 2.5

$$\text{WSF} = \frac{WS}{TW} \times 100 \quad (16)$$
In Eq. (16), WS is the number of stemmed words and TW is the total number of words in the sample.
Correctly stemmed words factor (CSWF) CSWF indicates the proportion of words correctly stemmed by the stemmer. A higher percentage of CSWF indicates higher strength and accuracy of the stemmer. The minimum threshold value of CSWF is 50%, and it can be calculated by Eq. (17), where CSW is the number of correctly stemmed words and WS is the total number of stemmed words.
Average words conflation factor (AWCF) AWCF refers to the mean number of variant words from different conflation groups that are correctly stemmed. To calculate AWCF, we must first compute the number of distinct words after conflation (NWC), given by Eq. (18), where S is the number of distinct stems after stemming and CW is the number of correct words which are not stemmed. Finally, AWCF is obtained by Eq. (19).
A higher value of AWCF indicates higher strength and accuracy of the stemmer. Sirsat et al. (2013) carried out experiments over four stemmers [Lovins (1968), Porter 1 (1980), Porter 2 (2006), Paice/Husk (1990)] and concluded that the Paice/Husk stemmer is slightly better than the other stemmers in terms of ICF [64.63] and AWCF [19.26]. Lovins' (1968) stemmer has the highest WSF (73.35), and Porter 2 does better in terms of CSWF (34.76), as shown in Table 8.
Note that the value of AWCF may be zero or negative when the number of incorrect stems is larger than the number of correctly stemmed words.
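A small Python sketch of these counts (our own illustration; the sample words, gold stems and stemmer output are hypothetical, and "stemmed words" is interpreted here as words the stemmer changed) shows how WSF, CSWF and AWCF follow from Eqs. (16)–(19):

```python
# Sketch of Sirsat et al. (2013)-style counts for a toy sample.
# Each row: (input word, gold stem, produced stem). Illustrative data only.
rows = [
    ("accepts",    "accept", "accept"),     # stemmed, correct
    ("accepted",   "accept", "accept"),     # stemmed, correct
    ("acceptance", "accept", "acceptanc"),  # stemmed, incorrect
    ("accept",     "accept", "accept"),     # already a stem, left unchanged
]

TW  = len(rows)                                        # total words in the sample
WS  = sum(1 for w, _, p in rows if p != w)             # words actually stemmed
CSW = sum(1 for w, g, p in rows if p != w and p == g)  # correctly stemmed words
S   = len({p for _, _, p in rows})                     # distinct stems after stemming
CW  = sum(1 for w, g, p in rows if p == w and w == g)  # correct words left unstemmed
NWC = S - CW                                           # Eq. (18)

WSF  = WS / TW * 100                                   # Eq. (16)
CSWF = CSW / WS * 100                                  # Eq. (17)
AWCF = (CSW - NWC) / CSW * 100                         # Eq. (19)
print(f"WSF={WSF:.1f}%  CSWF={CSWF:.1f}%  AWCF={AWCF:.1f}%")
```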
4.1.5 Jaafar's evaluation mechanism
Jaafar et al. (2017) used the execution time and the accuracy of a stemmer to determine its performance, using the formula given in Eq. (20), where GS_score is the global stemming score, T_w is the execution time needed to obtain a stem for a word, Acc_w is the correctness of the stem of word w, and α and β are variables giving the weights of the time taken by the stemmer to extract the stem and of the accuracy.
$$\text{CSWF} = \frac{CSW}{WS} \times 100 \quad (17)$$

$$\text{NWC} = S - CW \quad (18)$$

$$\text{AWCF} = \frac{CSW - NWC}{CSW} \times 100 \quad (19)$$

$$GS_{score} = \frac{\alpha \cdot T_w}{\beta \cdot Acc_w} \quad (20)$$
Table 8 Results of Sirsat's evaluation method
Stemmer | ICF | WSF | CSWF | AWCF
Paice/Husk | 64.63 | 70.99 | 28.73 | 19.26
Lovins | 56.52 | 73.35 | 27.80 | −24.8
Porter 1 | 51.88 | 67.17 | 31.97 | −8.52
Porter 2 | 53.72 | 66.58 | 34.76 | 8.6
A.Jabbar et al.
1 3
These weights reflect which is more important, accuracy or execution time: if accuracy matters more, then the value of α is set higher than β. T_w and Acc_w are two different kinds of measurement: T_w is a time measurement, whereas Acc_w ranges from 1 to 100. Accuracy and execution time are generally inversely related, so misleading results may be obtained if the weights are not properly assigned.
4.2 Gold standard assessments
In this evaluation approach, the correctness of a system is manually checked by experts: input is given, and the corresponding output is checked manually. Many statistical and rule-based stemmers have been evaluated manually, such as the Urdu stemmer of Ali et al. (2019), the Arabic stemmer of Al-Kabi et al. (2015) and the Persian stemmer of Taghi-Zadeh et al. (2015). The accuracy of a stemmer under gold standard assessment can be calculated as given in Eq. (21):

$$\text{Accuracy} = \frac{\text{Total no. of correct stems obtained}}{\text{Total no. of words given to the stemmer}} \times 100 \quad (21)$$

This method works for small datasets but is not suitable for large-scale evaluation. It reflects the ratio of correct stems produced by a stemmer but is silent about words that are already stems when given to the stemmer; both true positives (TP) and true negatives (TN) are important to determine the performance of a stemmer.
4.3 Indirect evaluation
Stemming reduces the dimensionality of text data and improves the performance of information retrieval (IR) systems (de Oliveira and Junior 2018). The performance of a stemmer can also be evaluated in the context of a specific NLP application: Alotaibi and Gupta (2018) evaluated their proposed stemmer in an IR system, Ali et al. (2018) tested their stemmer for text classification, and Boukhalfa et al. (2018) proposed an Arabic stemmer to improve the performance of an Arabic plagiarism detection system. Many researchers have compared the performance of various stemming algorithms in IR systems; for example, Flores and Moreira (2016) evaluated Portuguese, Spanish, French, and English stemmers in IR experiments and reported improvements in AP (average precision) for 70% of the query topics. Karaa (2013) modified the Porter (1980) stemmer and claimed that the new Porter stemmer improves the IR system to 0.852 precision and 0.884 recall. In contrast, without stemming the precision is 0.661 and the recall is
0.671, and with the original Porter (1980) stemmer the precision is 0.732 and the recall is 0.775, as shown in Table 9.

Table 9 Stemmer performance in the IR system (Karaa 2013)
Stemmer used in IR | Precision | Recall
Without stemming | 0.661 | 0.671
Original Porter stemmer | 0.732 | 0.775
New Porter stemmer | 0.852 | 0.884
Recall, precision, and the F1-measure (Jabbar et al. 2018a, b) are standard measures for assessing the performance of a stemmer in an IR system.
Recall Recall is the ratio between the correct stems extracted by the stemmer and the total possible correct stems, as given in Eq. (22).
Precision Precision is the ratio of the total correct stems to the total stems produced. It is calculated by Eq. (23).
Weighted F1-measure A variant of the F1-measure allows weighting the emphasis on precision over recall. It is calculated by Eq. (24), where β is the weighting between precision and recall; typically β = 1.
A weighted combination of recall and precision In addition to the standard precision/recall measures, several other methods have been adopted by researchers. For example, Lennon et al. (1998) used a weighted combination of recall and precision, as given in Eq. (25), where E is an effectiveness function (a lower value of E indicates better performance), P is precision, R is recall, and b measures the relative importance attached to precision and recall.
AP and Mean Average Precision (MAP) are also used to evaluate the impact of stemming in an IR system. AP refers to the average precision of a single query, whereas MAP represents the mean of the AP values when more than one query is used (Flores and Moreira 2016).
TERRIER (Ounis et al. 2006) is an open source IR system that provides a highly flexible, efficient and comprehensive platform for carrying out stemming experiments. It is developed in the Java programming language by the School of Computing Science, University of Glasgow, and is available at https://terrier.org/. It supports UTF (Unicode Transformation Format) text, hence corpora of many languages can be used, and it uses the Porter (1980) stemmer by default. Many other IR systems are also available, such as Lemur/Indri ("Lemur" 2016), Lucene/Solr ("Lucene" 2018) and Xapian ("Xapian" 2018). All of these IR systems can be used to perform basic IR tasks. However, TERRIER has some deficiencies, which include:
- It is difficult to check how many words are relevant in the corpus.
- It is hard to choose a stemmer for a search engine because every search engine has a different database.
- The results are not reliable because every stemmer is evaluated on a different dataset.
$$\text{Recall} = \frac{\text{Total correct stems produced by the system}}{\text{Total possible correct stems}} \quad (22)$$

$$\text{Precision} = \frac{\text{Total correct stems obtained by the system}}{\text{Total stems produced by the system}} \quad (23)$$

$$F1_{weighted} = \frac{(\beta^{2} + 1)(\text{precision} \times \text{recall})}{\beta^{2}(\text{recall}) + \text{precision}} \quad (24)$$

$$E = 1 - \frac{(1 + b^{2})\,P\,R}{b^{2}\,P + R} \quad (25)$$
A.Jabbar et al.
1 3
There is no mechanism to determine the number of produced stem words versus actual stem words, because the platform only indicates whether a word is stemmed or not; it does not indicate the degree of correctness of the stem.
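The measures in Eqs. (22)–(24) can, however, be computed directly when a gold-standard stem list is available. The following Python sketch (our own illustration; the word/stem pairs are hypothetical) evaluates a stemmer's output in this way:

```python
# Precision, recall and weighted F-measure for a stemmer's output against
# a gold standard (Eqs. 22-24). Illustrative toy data only.
def evaluate(gold, produced, beta=1.0):
    correct = sum(1 for w, g in gold.items() if produced.get(w) == g)
    recall = correct / len(gold)                    # Eq. (22)
    precision = correct / len(produced)             # Eq. (23)
    f_weighted = ((beta**2 + 1) * precision * recall /
                  (beta**2 * recall + precision))   # Eq. (24)
    return precision, recall, f_weighted

gold     = {"accepts": "accept", "accepted": "accept", "writing": "write"}
produced = {"accepts": "accept", "accepted": "accept", "writing": "writ"}
print(evaluate(gold, produced))
```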
5 Analysis of evaluation methods
Every evaluation method measures specific features of a stemmer and ignores the rest. For example, Al-Shammari and Lin (2008) assessed the performance of their proposed Arabic stemmer, named the Educated Text Stemmer (ETS), using the Paice (1994) evaluation method and claimed that the ETS stemmer (Al-Shammari and Lin 2008) performed better than the stemmer of Khoja and Garside (1999). However, they obtained a stemming weight (SW) value of 0 for both of the above-mentioned stemmers, which indicates that both stemmers are equal in strength and performance. According to the gold standard evaluation method, on the other hand, the ETS stemmer's accuracy is 100%, which is better than the 70% accuracy achieved by Khoja and Garside (1999), as shown in Table 10. The authors randomly chose two samples of Arabic documents: the first sample consists of 47 medical documents containing 9,435 Arabic words, and the second sample comprises 10 long Arabic sports articles from CNN.com with a total of 7,071 words (Al-Shammari and Lin 2008).
In some experiments, the Paice (1994) evaluation method thus shows results contrary to the gold standard evaluation method, as shown in Table 10. This controversy in results can also be seen for the Brazilian Portuguese language (Alvares et al. 2005), in
Table 10 Comparison of the Paice method with the manual method
Stemmer | UI (Paice) | OI (Paice) | SW (Paice) | Accuracy (%, gold standard)
ETS stemmer (Al-Shammari and Lin 2008) | 0 | 0 | 0 | 100
Khoja stemmer (Khoja and Garside 1999) | 0 | 0.0755 | 0 | 70

Table 11 Results of the test using sample I
Stemmer | Reference | Accuracy (%) | SW
STEMBERS | Alvares et al. (2005) | 62.20 | 1.60 × 10⁻³
STEMP | Orengo and Huyck (2001) | 55.30 | 1.44 × 10⁻³
PORTER | Porter (1980) | 43.80 | 0.67 × 10⁻³

Table 12 Results of the test using sample II
Stemmer | Reference | Accuracy (%) | SW
STEMBERS | Alvares et al. (2005) | 69.02 | 3.50 × 10⁻⁴
STEMP | Orengo and Huyck (2001) | 67.60 | 3.30 × 10⁻⁴
PORTER | Porter (1980) | 57.86 | 1.25 × 10⁻⁴
which the results produced by the Paice (1994) method are totally opposite to the accuracy measured by the gold standard/manual method. The STEMBERS stemmer's accuracy is 62.20%, better than its counterpart stemmers STEMP [55.30%] and PORTER [43.80%], but the SW value of STEMBERS, 1.60 × 10⁻³, shows the lowest performance among the evaluated stemmers, as mentioned in Table 11. Table 12 reflects a similar tendency for the sample II experimental data, in which the STEMBERS stemmer achieved an accuracy of 69.02%, higher than its counterparts STEMP and PORTER; however, its SW of 3.50 × 10⁻⁴ is nearly equal to that of the STEMP stemmer and higher than that of the PORTER stemmer [1.25 × 10⁻⁴]. Samples I and II comprise 102 and 2,696 manually constructed semantic groups, respectively.
This tendency of the Paice evaluation method has also been demonstrated by AlSerhan and Alqrainy (2008), who compared the results obtained through the manual method and the Paice evaluation method for the Arabic language using the results of two virtual stemmers (AlSerhan and Alqrainy 2008). The obtained results show that the Paice evaluation method contradicts the gold standard results. As stated before, a 0 value of SW indicates the strongest stemmer, and a value of 0 is obtained when OI, UI or both are zero, as mentioned in Table 13. When the gold standard accuracy is 100% or 0%, as shown in the second column of Table 13, the value of SW is zero in both cases, which is far from reality.
Table 13 Experimental results of the Paice evaluation method
Author | Stemmer | Accuracy (%) | OI | UI | SW
AlSerhan et al. (2008) | Stem1 | 100 | 0 | 0 | 0
AlSerhan et al. (2008) | Stem2 | 100 | 0 | 0 | 0
AlSerhan et al. (2008) | Stem1 | 0 | 0 | 0 | 0
AlSerhan et al. (2008) | Stem2 | 0 | 0 | 0 | 0
AlSerhan et al. (2008) | Stem1 | 28.57 | 1 | 0 | 0
AlSerhan et al. (2008) | Stem2 | 85.71 | 0.2 | 0.36 | 0.56

Table 14 Comparison of the manual method and Sirsat's method
Stemmer | WSF (Sirsat) | CSWF (Sirsat) | AWCF (Sirsat) | Accuracy (Manual)
HStemV1 | 69.47 | 81.12 | 52.83 | 56.35
HStemV2 | 65.7 | 87.37 | 59.45 | 57.39

Table 15 Comparison of gold standard, Frakes and Sirsat's evaluation parameters
Stemmer | ICF | MWC | WCF | MCR | WSF | CSWF | Accuracy (%)
(ICF, MWC, WCF and MCR are Frakes's metrics; WSF and CSWF are Sirsat's; accuracy is from the gold standard method)
Light10 | 51.08 | 2.04 | 80.86 | 1.57 | 80.84 | 32.2 | 14.96
Motaz | 36.04 | 1.56 | 54.31 | 0.64 | 54.31 | 57.75 | 18.59
Tashaphyne | 69.94 | 3.32 | 87.23 | 2.02 | 87.23 | 22.63 | 10.95
SAFAR-stemmer | 48.22 | 1.93 | 62.55 | 1.11 | 62.55 | 80.61 | 33.70
A.Jabbar et al.
1 3
But when the accuracy obtained by the gold standard method is 85.71%, the corresponding SW value is 0.56, whereas its competitor's accuracy of 28.57% is lower yet its SW value is 0, which would mean that Stem2 is a stronger stemmer than Stem1, as reflected in Table 13.
Some of Sirsat's parameter values (Sirsat et al. 2013) follow the same tendency as the gold standard results, as shown in Table 14, where HStemV2 shows better performance in terms of accuracy. On the other hand, with respect to Sirsat's WSF parameter (Sirsat et al. 2013), HStemV1 performs better, as shown in Table 14.
The Frakes evaluation method also only tells how many words are changed by a stemmer while ignoring correctness. Jabbar et al. (2016) used the Quranic Arabic Corpus (Dukes and Habash 2010), which contains 18,350 unique Arabic words, to compare results with the counterpart stemmers Light10 (Larkey et al. 2007), Motaz (Saad and Ashour 2010), and Tashaphyne (Zerrouki 2016). The SAFAR-Stemmer (Jaafar et al. 2016) provides result statistics about its stemming. From these statistics, we compute the Frakes evaluation parameters (Frakes and Fox 2003), as shown in Table 15; Sirsat's parameters (Sirsat et al. 2013) and the manual accuracy are also calculated and given in Table 15. The SAFAR-Stemmer (Jaafar et al. 2017) achieved an accuracy of 33.70, which is higher than its competitors Light10 (Larkey et al. 2007) [14.96], Motaz (Saad and Ashour 2010) [18.59] and Tashaphyne (Zerrouki 2016) [10.95]. Its Sirsat parameters are a WSF of 62.55 and a CSWF of 80.61, the latter being the highest among the stemmers. Sirsat's evaluation and the gold standard method therefore show that the SAFAR-Stemmer is better than the Light10 (Larkey et al. 2007), Motaz (Saad and Ashour 2010), and Tashaphyne (Zerrouki 2016) stemmers. However, the Frakes evaluation metrics (Frakes and Fox 2003) deny this result, as Tashaphyne (Zerrouki 2016) performs better with respect to the higher values of ICF [69.94], MWC [3.32], WCF [87.23], and MCR [2.02], as shown in Table 15.
Considering the observations in Table 15, the performance ranking of the mentioned stemmers varies with respect to the evaluation method. So, to be more precise, we developed a virtual scenario for stemming in the resource-scarce Urdu language (see Table 16) and experimented with two virtual stemmers, VS1 and VS2.
Paice evaluation of the virtual stemmers The Unachieved Merge Total (UMT) is derived using Eq. (5).
Table 16 Experimental data for the Paice evaluation (Urdu words shown as transliteration / English gloss)
Group | Input word | Actual stem | VS1 output | VS2 output
G1 | [Bdnsaz / bodybuilder] | [Bdan / body] | [Bdan / body] | [Bd / bad]
G1 | [Bdnsazi / bodybuilding] | [Bdan / body] | [Bdan / body] | [Bd / bad]
G1 | [Bdni / bodily] | [Bdan / body] | [Bdan / body] | [Bd / bad]
G1 | [Abdan / bodies] | [Bdan / body] | [Bdan / body] | [Bd / bad]
G1 | [Abdano / bodies] | [Bdan / body] | [Bd / bad] | [Bd / bad]
G2 | [Bdpan / badness] | [Bd / bad] | [Bdan / body] | [Bd / bad]
G2 | [Bdniah / bad luck] | [Bd / bad] | [Bd / bad] | [Bd / bad]
G3 | [Bad roh / evil spirit] | [Roh / soul] | [Roh / soul] | [Bd / bad]
G3 | [Bad rohain / evil spirits] | [Roh / soul] | [Roh / soul] | [Bd / bad]
For group G1 and stemmer VS1, s is the number of distinct stems produced by the stemmer for the group (a stemmer may produce multiple stems for a particular group of words); here the stems produced are [Bdan / body] and [Bd / bad]. n_g1 is the number of morphological forms in the group sharing the same actual stem, which is 5. u_i is the number of words conflated to the i-th distinct stem: u_1 = 4 words receive the stem [Bdan / body] and u_2 = 1 word receives the stem [Bd / bad] in group 1.
We calculate the UMT [using Eq. (5)] for both stemmers and for each group, as shown in Table 17. The UMT only validates the conflation, regardless of correctness: the value of UMT is 0 if all words in the group are conflated to a single stem (correctly or incorrectly). This causes results that are the inverse of the gold standard evaluation results. Using Eq. (3), we can likewise calculate the DMT of each group; the UMT and DMT values of both stemmers for each group are given in Table 17. For group G1 of VS1:
$$\text{UMT}_g = \frac{1}{2} \sum_{i=1}^{s} u_i\,(n_g - u_i) \quad (5)$$

$$\text{UMT}_{G1} = \frac{1}{2}\,[\,4 \times (5 - 4) + 1 \times (5 - 1)\,] = 4$$

$$\text{DMT}_g = \frac{1}{2}\, n_g\,(n_g - 1) \quad (3)$$

$$\text{DMT}_{G1} = \frac{1}{2}\,[\,5 \times (5 - 1)\,] = 10$$

For the over-stemming counts, the wrongly merged total of a stem group is computed as

$$\text{WMT}_G = \frac{1}{2} \sum_{i=1}^{t} v_i\,(n_s - v_i)$$
Table 17 Calculation of UMT, DMT, WMT and DNT per group
Stemmer | Group | UMT | DMT | WMT | DNT
VS1 | G1 | 4 | 10 | 4 | 10
VS1 | G2 | 1 | 1 | 1 | 7
VS1 | G3 | 0 | 1 | 0 | 7
VS2 | G1 | 0 | 10 | 0 | 10
VS2 | G2 | 0 | 1 | 24 | 7
VS2 | G3 | 0 | 1 | 0 | 7

Table 18 Calculation of GUMT, GDMT, GWMT and GDNT
Stemmer | GUMT | GDMT | GWMT | GDNT
VS1 | 5 | 12 | 5 | 24
VS2 | 0 | 12 | 24 | 24
A.Jabbar et al.
1 3
Table 19 Experimental results to check the viability of various stemming evaluation methods
Stemmer | UI | OI | SW | MWC | ICF | WCF | MCR | WSF | CSWF | AWCF | Accuracy (%)
(UI, OI, SW: Paice's (1994, 1996) evaluation method; MWC, ICF, WCF, MCR: Frakes and Fox (2003); WSF, CSWF, AWCF: Sirsat et al. (2013); accuracy: gold standard evaluation)
VS1 | 0.42 | 0.21 | 0.5 | 3 | 66.67 | 1 | 2.3 | 100 | 77.8 | 57 | 77.8
VS2 | 0 | 1 | 0 | 9 | 88.89 | 1 | 3.3 | 100 | 22.2 | 50 | 22.2
Here n_s = 5 is the total number of occurrences of the stem [Bdan / body] across groups 1 and 2; v_1 = 4 is the number of occurrences of the stem [Bdan / body] in group 1, and v_2 = 1 is the number of occurrences of the stem [Bdan / body] in group 2. The WMT values (Eq. (7)) for each stemmer and each group are given in Table 17. For the first group of VS1:

$$\text{WMT}_{G1} = \frac{1}{2}\,[\,4 \times (5 - 4) + 1 \times (5 - 1)\,] = 4$$

The DNT values follow from Eq. (9) with w = 9 and n_g1 = 5:

$$\text{DNT}_{g1} = \frac{1}{2}\,[\,5 \times (9 - 5)\,] = 10$$

The remaining values of DNT_g are calculated analogously and provided in Table 17. The values of GUMT (Eq. (6)), GDMT (Eq. (4)), GWMT (Eq. (8)), and GDNT (Eq. (10)) are computed from Table 17 and are given in Table 18.
From Table19, it is observed that the accuracy obtained by VS1 is comparatively higher
[77.8] than VS2 stemmer [22.2].
It is clear from Table18 that the stemming weight SW shows that the VS1 [0.5] is a
light stemmer and VS2 [0] is a heavy stemmer. These results indicate that the stemmer SV2
is a perfect stemmer that is quite opposite of manual evaluation results.
The MWC value is 3 for VS1 and VS2 is 9 that is higher. It is observed from Table17
that index ICF obtained by VS2 [88.89] is higher than the VS1 [66.67]. The WCF is 1 for
both stemmers, but, MCR is higher for VS2 [3.3] than VS1 [2.3]. Frakes proposed param-
eters to assess the performance of the stemmer which gives conflicting results as that of the
gold standard method.
The WSF obtained by both stemmers is 100% that is above a threshold value [50%].
This indicates that the strength of both the stemmers is better and aggressive in nature.
The AWCF value of stemmer VS1 is 57 and 50 for VS2 that shows VS1 is stronger and
more aggressive than the stemmer VS2. However, there is a comparatively large difference
between both the stemmers with respect to CSWF, VS1 is 77.8 and VS2 is 22.28 that show
the same tendency as a gold standard evaluation method reflects. But two other parameters
give contrast results from the manual assessment.
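The Paice indexes in Table 19 follow directly from the global totals in Table 18; a few lines of Python (shown only as an arithmetic check, with SW set to 0 when UI is 0, following the convention used above) reproduce them:

```python
# Arithmetic check: UI, OI and SW for the two virtual stemmers from the
# global totals in Table 18 (SW is set to 0 when UI = 0, as in the text).
totals = {"VS1": dict(GUMT=5, GDMT=12, GWMT=5, GDNT=24),
          "VS2": dict(GUMT=0, GDMT=12, GWMT=24, GDNT=24)}

for name, t in totals.items():
    ui = t["GUMT"] / t["GDMT"]          # Eq. (1)
    oi = t["GWMT"] / t["GDNT"]          # Eq. (2)
    sw = oi / ui if ui else 0.0         # Eq. (11)
    print(f"{name}: UI={ui:.2f} OI={oi:.2f} SW={sw:.2f}")
# VS1: UI=0.42 OI=0.21 SW=0.50   VS2: UI=0.00 OI=1.00 SW=0.00
```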
6 Challenges and future directions
Text stemming is a well-studied yet still open area of work, especially for Arabic-script-based languages. Text stemming has been recognized as an excellent pre-processing tool in many NLP applications, and several approaches have been proposed by researchers to address its various issues and challenges. However, how to evaluate a stemmer remains an open question. In this
A.Jabbar et al.
1 3
section, we present different issues and challenges to evaluate a stemmer that must be con-
sidered by the researchers in this domain.
Language type Concatenative and non-concatenative are the two categories of languages with respect to the morphological process. In a concatenative language, the affix and the stem are arranged in a linear fashion, whereas in a non-concatenative language the stem and the affix are interleaved (Kastner 2019). As stemming approaches are language dependent, the stemmer evaluation methods must change accordingly. For instance, in the case of Semitic languages, the Paice (1994, 1996) evaluation method is not suitable (AlSerhan and Alqrainy 2008).
Types of affixes A variety of affixes is used in different languages; the Indonesian language, for example, possesses prefixes, suffixes, confixes, and infixes (Setiawan et al. 2016). State-of-the-art stemming evaluation methods do not tell us which types of affixes are handled and which types are not considered.
Text stemming approaches Most of the stemmers presented in the literature are linguistic knowledge-based and handle more than one affix type, whereas statistical stemmers typically remove only suffixes. With the development of high-end AI methods such as neural networks, stemmers built on these methods may produce superior results.
Domain-specific Some researchers evaluate a stemmer within a particular NLP application, and each application considers only specific features of the language. For instance, for English IR systems, Porter (1980) claimed that suffix removal alone is sufficient; however, this is not suitable for Semitic languages such as Arabic.
Besides all of the above, issues such as computational complexity, space complexity, linguistic correctness, and the types of affixes considered also help determine the performance of a stemmer. Hence, the metrics used to evaluate a stemmer must address all of the issues mentioned above.
7 Conclusions
The purpose of stemming algorithms is simple: to extract the same stem from the various word forms that should be conflated. Researchers have proposed different parameters to measure the performance of text stemming algorithms; however, each criterion measures only specific aspects of stemmer performance, and different text stemming evaluation (TSE) methods have proved useful only for specific NLP applications.
For NLP systems that use stemming, there is no standard TSE method that can provide a benchmark for measuring the performance of stemming algorithms. A stemmer's measured performance may increase or decrease when different evaluation methods are used; the reason lies in the type of experimental data, the training data, the size of the data, and the construction of the stemming rules (if a rule-based approach is used). Therefore, there is a clear need to develop a standard evaluation method.
A variety of features, such as affix types, language types, and data set types and sizes, can be used to develop a robust stemmer evaluation mechanism that considers the conflation ratio as well as the linguistic correctness of the stem. We conclude that this article provides a comprehensive review of state-of-the-art text stemming evaluation methods, their challenges, and the avenues for future work.
Funding Funding was provided by Bahauddin Zakariya University (PK) (Grant No: 2019-05).
Empirical evaluation andstudy oftext stemming algorithms
1 3
References
Ababneh M, Al-Shalabi R, Kanaan G, Al-Nobani A (2012) Building an effective rule-based light stemmer for Arabic language to improve search effectiveness. Int Arab J Inf Technol 9(4):368–372
Abainia K, Ouamour S, Sayoud H (2017) A novel robust Arabic light stemmer. J Exp Theor Artif Intell
29(3):557–573
Abu-Errub A, Odeh A, Shambour Q, Hassan OAH (2014) Arabic roots extraction using morphological
analysis. Int J Comput Sci Issues (IJCSI) 11(2):128
Ali M, Khalid S, Aslam MH (2018) Pattern-based comprehensive Urdu stemmer and short text classifica-
tion. IEEE Access 6:7374–7389
Ali M, Khalid S, Saleemi M (2019) Comprehensive stemmer for morphologically rich urdu language. Int
Arab J Inf Technol 16(1):138–147
Alotaibi FS, Gupta V (2018) A cognitive inspired unsupervised language-independent text stemmer for
Information retrieval. Cognit Syst Res 52:291–300
Al-Kabi MN, Kazakzeh SA, Ata BMA, Al-Rababah SA, Alsmadi IM (2015) A novel root based Arabic
stemmer. J King Saud Univ-Comput Inf Sci 27(2):94–103
Al-Omari A, Abuata B (2014) Arabic light stemmer (ARS). J Eng Sci Technol 9(6):702–717
AlSerhan HM, Alqrainy S, Ayesh A (2008, November) Is Paice method suitable for evaluating Arabic stemming algorithms? In: International conference on computer engineering & systems, 2008 (ICCES 2008). IEEE, pp 131–135
Al-Shammari ET, Lin J. (2008, October). Towards an error-free Arabic stemming. In Proceedings of the
2nd ACM workshop on Improving non English web searching. ACM, pp 9–16
Al-Sughaiyer IA, Al-Kharashi IA (2004) Arabic morphological analysis techniques: A comprehensive sur-
vey. J American Soc Inf Sci Tech 55(3):189–213
Alvares RV, Garcia AC, Ferraz I (2005, December) STEMBR: a stemming algorithm for the Brazilian Portuguese language. In: Portuguese conference on artificial intelligence. Springer, Berlin, pp 693–701
Aronoff M, Fudeman K (2011) What is morphology? vol. 8. Wiley, pp 2–3
Bimba A, Idris N, Khamis N, Noor NF (2016) Stemming Hausa text: using affix-stripping rules and ref-
erence look-up. Lang Resour Eval 50(3):687–703
Bölücü N, Can B (2019) Unsupervised joint PoS tagging and stemming for agglutinative languages. ACM Trans Asian Low-Resour Lang Inf Process 18(3), Article 25, 21 pages. https://doi.org/10.1145/3292398
Boudchiche M, Mazroui A (2015, December). Evaluation of the ambiguity caused by the absence of dia-
critical marks in Arabic texts: statistical study. In: 2015 5th international conference on informa-
tion and communication technology and accessibility (ICTA). IEEE, pp 1–6
Boukhalfa I, Mostefai S, Chekkai N (2018, March) A study of graph based stemmer in Arabic extrinsic
plagiarism detection. In: Proceedings of the 2nd mediterranean conference on pattern recognition
and artificial intelligence. ACM, pp 27–32
Brychcín T, Konopík M (2015) HPS: high precision stemmer. Inf Process Manag 51(1):68–91
Buckley C (1985) Implementation of the smart information retrieval system. Technical report 85–686,
Cornell University.
Cambria E, White B (2014) Jumping NLP curves: a review of natural language processing research.
IEEE Comput Intell Mag 9(2):48–57
Chintala DR, Reddy EM (2013) An approach to enhance the CPI using Porter stemming algorithm. Int J
Adv Res Comput Sci Softw Eng 3(7):1148–1156
CISI Collection. https://ir.dcs.gla.ac.uk/resources/test_collections/cisi/. Accessed 30 Dec 2019. Developed by University of Glasgow
Dahab MY, Ibrahim A, Al-Mutawa R (2015) A comparative study on Arabic stemmers. Int J Comput
Appl 125(8):38–47
Dang Q, Zhang J, Lu Y, Zhang K (2013) WordNet-based suffix tree clustering algorithm. In: Interna-
tional conference on information science and computer applications (ISCA 2013)
Dey A, Paul A, Purkayastha BS (2014) Named entity recognition for Nepali language: a semi hybrid
approach. Int J Eng Innov Technol (IJEIT) 3:21–25
Dianati MH, Sadreddini MH, Hossein RA, Fakhrahmad SM, Taghi-Zadeh H (2014) Words stemming based
on structural and semantic similarity. Comp Eng Appl J 3(2):89–99
de Oliveira RAN, Junior MC (2018) Experimental analysis of stemming on jurisprudential documents
retrieval. Information 9(2):28
Dukes K, Habash N (2010) Morphological annotation of Quranic Arabic. In Lrec, pp 2530–2536
El-Defrawy M, El-Sonbaty Y, Belal NA (2016) A rule-based subject-correlated Arabic stemmer. Arab J
Sci Eng 41(8):2883–2891
A.Jabbar et al.
1 3
Fattah MA, Ren F, Kuroiwa S (2006) Stemming to improve translation lexicon creation form bitexts. Inf
Process Manag 42(4):1003–1016
Flores FN, Moreira VP (2016) Assessing the impact of stemming accuracy on information retrieval–a
multilingual perspective. Inf Process Manag 52(5):840–854
Frakes WB, Fox CJ (2003) Strength and similarity of affix removal stemming algorithms. In ACM
SIGIR forum, vol 37, no 1. ACM, pp 26–30.
Gaidhane MS, Gondhale MD, Talole MP (2015) A comparative study of stemming algorithms for natu-
ral language processing. J Eng Educ Technol (ARDIJEET) 3(2):1–6
Giachanou A, Crestani F (2016) Like it or not: a survey of twitter sentiment analysis methods. ACM
Comput Surv (CSUR) 49(2):28
Harman D (1991) How effective is suffixing. J Am Soc Inf Sci 42(1):7–15
Hassani K, Lee WS (2016) Visualizing natural language descriptions: a survey. ACM Comput Surv
(CSUR) 49(1):17
Husain MS, Ahamad F, Khalid S (2013) A language independent approach to develop Urdu stemmer.
Advances in computing and information technology. Springer, Berlin, pp 45–53
Hull DA (1996) Stemming algorithms—a case study for detailed evaluation. J Am Soc Inf Sci 47:70–84
Hussain Z, Iqbal S, Saba T, Almazyad AS, Rehman A (2017) Design and development of dictionary-
based stemmer for the urdu language. J Theor Appl Inf Technol 95(15):3560–3569
Islam Md, Uddin Md, Khan M (2007) A light weight stemmer for Bengali and its use in spelling checker. Retrieved 24 March, 2019, from http://hdl.handle.net/10361/328
Ismailov A, Jalil MA, Abdullah Z, Rahim NA (2016) A comparative study of stemming algorithms for
use with the Uzbek language. In: 3rd international conference on computer and information sci-
ences (ICCOINS), 2016. IEEE, pp 7–12
Jaafar Y, Namly D, Bouzoubaa K, Yousfi A (2017) Enhancing Arabic stemming process using resources
and benchmarking tools. J King Saud Univ-Comput Inf Sci 29(2):164–170
Jabbar A, Iqbal S, Khan MUG (2016a) Analysis and development of resources for Urdu text stemming.
In: Proceedings of the 6th annual international conference on language and technology, KICS-
CLE, UET Lahore
Jabbar A, Iqbal S, Akhunzada A, Abbas Q (2018a) An improved Urdu stemming algorithm for text mining based on multi-step hybrid approach. J Exp Theor Artif Intell. https://doi.org/10.1080/0952813X.2018.1467495
Jabbar A, Iqbal S, Khan MUG, Hussain S (2018b) A survey on Urdu and Urdu like language stemmers and
stemming techniques. Artif Intell Rev 49(3):339–373
Jivani AG (2011) A comparative study of stemming algorithms. Int J Comp Tech Appl 2(6):1930–1938
Karaa WBA (2013) A new stemmer to improve information retrieval. Int J Netw Secur Appl 5(4):143
Karimi S, Wang C, Metke-Jimenez A, Gaire R, Paris C (2015) Text and data mining techniques in adverse
drug reaction detection. ACM Comput Surv (CSUR) 47(4):56
Kastner I (2019) Templatic morphology as an emergent property. Nat Lang Linguist Theory 37(2):571–619
Khalid A, Hussain Z, Baig MA (2016) Arabic stemmer for search engines information retrieval. Int J Adv
Comput Sci Appl 1(7):407–411
Khan S, Waqas A, Usama B, Xuan W (2015) Template based affix stemmer for a morphologically rich lan-
guage. Int Arab J Inf Tech 12(2):146–154
Khoja S, Garside R (1999) Stemming Arabic text. Lancaster University, Lancaster, UK, Computing Department
Krovetz R (2000) Viewing morphology as an inference process. Artif intel 118(1–2):277–294
Larkey LS, Ballesteros L, Connell ME (2007) Light stemming for Arabic information retrieval. Arabic com-
putational morphology. Springer, Dordrecht, pp 221–243
Lemur (2016) https://www.lemurproject.org. Accessed 14 Aug 2018
Lennon M, Peirce DS, Tarry BD, Willett P (1981) An evaluation of some conflation algorithms for informa-
tion retrieval. Inf Sci 3(4):177–183
Lovins JB (1968) Development of a stemming algorithm. Mech Transl Comput Linguist 11(1–2):22–31
Lucene (2018) https://lucene.apache.org. Accessed 12 Aug 2018
Mateen A, Malik MK, Nawaz Z, Danish HM, Siddiqui MH, Abbas Q (2017) A hybrid stemmer of punjabi
shahmukhi script. Int J Comput Sci Netw Secur 17(8):90–97
McCormick C (2016) Word2Vec tutorial—the skip-gram model. https://www.mccormickml.com
Mishra U, Prakash C (2012) MAULIK: an effective stemmer for Hindi language. Int J Comput Sci Eng
4(5):711–717
Empirical evaluation andstudy oftext stemming algorithms
1 3
Mochizuki M, Aizawa K (2000) An affix acquisition order for EFL learners: an exploratory study. System
28(2):291–304
Moghadam FM, MohammadReza K (2015) Comparative study of various Persian stemmers in the field of
information retrieval. J Inf Proc Syst 11(3):450–464
Momenipour F, Keyvanpour MR (2016) PHMM: stemming on Persian texts using statistical stemmer based
on hidden Markov Model. Int J Inf Sci Manag 14(2):107–117
Mustafa AM, Rashid TA (2018) Kurdish stemmer pre-processing steps for improving information retrieval.
J Inf Sci 44(1):15–27
Nguyen DT, Leveling J (2013) Exploring domain-sensitive features for extractive summarization in the medical domain. In: International conference on application of natural language to information systems. Springer, Berlin, pp 90–101
Nwesri AFA, Alyagoubi HAH (2015) Applying Arabic stemming using query expansion. In: 2015 26th international workshop on database and expert systems applications (DEXA). IEEE, pp 299–303
Orengo VM, Huyck C (2001) A stemming algorithm for the Portuguese language. In: SPIRE '01: Proceedings of the eighth symposium on string processing and information retrieval, pp 186–193
Paice CD (1990) Another stemmer. SIGIR Forum 24(3):56–61
Paice CD (1996) Method for evaluation of stemming algorithms based on error counting. J Am Soc Inf Sci
47(8):632–649
Paice CD (1994) An evaluation method for stemming algorithms. In: Proceedings of the 17th annual inter-
national ACM SIGIR conference on research and development in information retrieval. Springer,
New York, pp 42–50
Pande BP, Tamta P, Dhami HS (2018) Generation, implementation and appraisal of an N-gram based stem-
ming algorithm. Digit Scholarsh Humanit. https://doi.org/10.1093/llc/fqy053
Paik JH, Pal D, Parui SK (2011) A novel corpus-based stemming algorithm using co-occurrence statistics.
In: Proceedings of the 34th annual international ACM SIGIR conference on research and develop-
ment in information retrieval (SIGIR’11). ACM, New York, pp 863–872
Patil CG, Patil SS (2013) Use of Porter stemming algorithm and SVM for emotion extraction from news
headlines. Int J Electron Commun Soft Comput Sci Eng 2(7):9–13
Porter MF (2006) https://snowball.tartarus.org/algorithms/english/stemmer.html
Porter MF (1980) An algorithm for suffix stripping. Program 14(3):130–137
Qureshi AH, Hassan MU, Akhter S (2018) Towards description of derivation in Urdu: morphological per-
spective. Al-Qalam 23(2):96–100
Rani SPR, Ramesh B, Anusha M, Rani SJGR (2015) Evaluation of stemming techniques for text classifica-
tion. Int J Comput Sci Mobile Comput 4(3):165–171
Rashid TA, Mohamad SO (2016) Enhancement of detecting wicked website through intelligent methods.
International symposium on security in computing and communication. Springer, Singapore, pp
358–368
Rashidi A, Lighvan MZ (2014) HPS: a hierarchical Persian stemming method. arXiv preprint
arXiv:1403.2837.
Rehman Z, Anwar W, Bajwa UI, Xuan W, Chaoying Z (2013) Morpheme matching based text tokenization
for a scarce resourced language. PLoS ONE 8(8):e68178
Saad MK, Ashour W (2010) Arabic morphological tools for text mining. Corpora 18:19
Saeed AM, Rashid TA, Mustafa AM, Al-Rashid Agha RA, Shamsaldin AS, Al-Salihi NK (2018a) An evalu-
ation of Reber stemmer with longest match stemmer technique in Kurdish Sorani text classification.
Iran J Comput Sci 1(2):99–107
Saeed AM, Rashid TA, Mustafa AM, Fattah P, Ismael B (2018b) Improving Kurdish web mining through
tree data structure and Porter’s Stemmer algorithms. UKH J Sci Eng 2(1):48–54
Sarma B, Purkayastha BS (2013) An affix based word classification method of assamese text. Int J Adv Res
Comput Sci 4(9):213–216
Schofield A, Mimno D (2016) Comparing apples to apple: the effects of stemmers on topic models. Trans
Assoc Comput Linguist 4:287–300
Setiawan R, Kurniawan A, Budiharto W, Kartowisastro IH, Prabowo H (2016) Flexible affix classification
for stemming Indonesian Language. In: 2016 13th international conference on electrical engineering/
electronics, computer, telecommunications and information technology (ECTI-CON). IEEE, pp 1–6
Singh J, Gupta V (2016) Text stemming: approaches, applications, and challenges. ACM Comput Surv
(CSUR) 49(3):45
Singh J, Gupta V (2017) An efficient corpus-based stemmer. Cognit Comput 9(5):671–688
Sirsat SR, Chavan V, Mahalle HS (2013) Strength and accuracy analysis of affix removal stemming algo-
rithms. Int J Comput Sci Inf Technol 4(2):265–269
A.Jabbar et al.
1 3
Sulaiman S, Omar K, Omar N, Murah MZ, Abdul Rahman HD (2014) The effectiveness of a Jawi stem-
mer for retrieving relevant Malay documents in Jawi characters. ACM Trans Asian Lang Inf Process
(TALIP) 13(2):6
Suryani AA, Widyantoro DW, Purwarianti A, Sudaryat Y (2018) The rule-based sundanese stemmer. ACM
Trans Asian Low-Resour Lang Inf Process (TALLIP) 17(4):27
Taghi-Zadeh H, Sadreddini MH, Diyanati MH, Rasekh AH (2015) A new hybrid stemming method for Persian language. Digit Scholarsh Humanit 32(1):209–221
Thangarasu M, Manavalan R (2013) Design and development of stemmer for Tamil language: cluster analy-
sis. Int J Adv Res Comput Sci Softw Eng 3(7):812–818
The free dictionary (2018) https://www.thefreedictionary.com/. Accessed 03 Aug 2018
Ounis I, Amati G, Plachouras V, He B, Macdonald C, Lioma C (2006) A high performance and scalable information retrieval platform. In: SIGIR workshop on open source information retrieval
Urdu L (2006) https://182.180.102.251:8081/oud/help_3.htm. Accessed 04 Aug 2018
Xapian (2018) https://xapian.org. Accessed 07 Aug 2018
Xerox (1994) Xerox linguistic database reference, English version 1.1.4 ed.
Yadollahi A, Shahraki AG, Zaiane OR (2017) Current state of text sentiment analysis from opinion to emo-
tion mining. ACM Comput Surv (CSUR) 50(2):25
Zerrouki T (2016) Tashaphyne 0.2 (Online). https://pypi.python.org/pypi/Tashaphyne. Accessed 14 Apr 2016
Zhou D, Mark T, Brailsford T, Wade V, Ashman H (2012) Translation techniques in cross-language infor-
mation retrieval. ACM Comput Surv (CSUR) 45(1):1