Expert Systems With Applications 227 (2023) 120303
Available online 2 May 2023
0957-4174/© 2023 Elsevier Ltd. All rights reserved.
Comparing cost sensitive classifiers by the false-positive to false-negative ratio in diagnostic studies
A. Kumaravel (a), T. Vijayan (b)
(a) Department of Information Technology, Bharath Institute of Higher Education and Research, India
(b) Department of Electronics and Communication Engineering, Bharath Institute of Higher Education and Research, India
ARTICLE INFO
Keywords:
Cost ratio
Confusion matrix
Cost matrix
Total cost
False positive
False negative
Cost sensitive learning
In vitro fertilization
ABSTRACT
Nowadays researchers need to be cautious about the cost of building models that can generate false positives and false negatives in unexpected ways, and they keep searching for measures that control such behavior depending on the underlying dataset. Cost sensitive classifiers are models that account for the total cost due to misclassifications. In this article, cost sensitive classifiers are tried for the first time together with a new measure, the 'cost ratio', to monitor such misclassifications in sensitive diagnostic studies. A scheme for varying this ratio is introduced and its influence on the loss is investigated. The cost ratio ρ is a rational number formed from integers: the numerator is the cost of a false positive, weighted by its frequency of occurrence, and the denominator is the corresponding cost of a false negative. We apply this novel cost monitoring measure to learning from a sample dataset of a sensitive nature, an in vitro fertilization (IVF) dataset indicating the success or failure of fertilization from attributes such as Age, Anti-Müllerian hormone (AMH), Right ovary (RO), Left Ovary (LO), Number of eggs, No of Inseminations, No of fertilized and Egg quality. This article focuses mainly on variations of the cost ratio ρ over different ranges and establishes the possibility of reducing errors in the predictions made.
1. Introduction
Some sensitive decisions naturally behave significantly differently from others and influence the outputs. IVF decisions by clinicians are prone to error if the frequency of false positives is not taken care of. Hence, in this article we propose a new measure, a cost ratio based on false-positive to false-negative occurrences.
In vitro fertilization (IVF), one of the types of assisted reproductive technology (ART), comprises the procedures for achieving pregnancy through fertilization, embryo development, and implantation. Combining prescribed medicines and surgery, IVF supports patients through these procedures. The main process of IVF uses medication to make several eggs mature and ready for fertilization. In the second step, the eggs are removed from the body and mixed with sperm for fertilization. Embryos selected from these fertilized eggs are then implanted in the uterus. The annual reports of the Indian forum for fertility clinics are less comprehensive than those produced across the United States (CDC, 2018; Sadecki et al., 2022): they do not include information on the women population, treatment and clinic locations. In contrast, the Danish project highlights not only these missing details but also possible influencing factors (Baldur-Felskov et al., 2012; Bungum et al., 2019; Thorsted et al., 2019) on infertility and its future consequences.
Cautions about the long-term health consequences of infertility have also been presented (Murugappan et al., 2019; Pisarska, 2017), while the tools for proper evaluation of results are not sufficient. Genetic causes that affect conception also contribute to infertility, in addition to multiple factors on both the male and female sides, usually through ER stress and cell death. Moreover, the associated obstetrical outcomes are influenced by the treatment of infertility (Vander Borght & Wyns, 2018).
The problem of finding effective and efficient classifiers is a central topic in the field of knowledge discovery. Many techniques, methods and principles are applied in data mining research to find more effective, efficient and accurate classifiers. It is also important to evaluate and choose thoroughly the preprocessing procedures applied to a given dataset in order to construct the best learning model. There are contexts in which cost sensitivity plays a major role. In most cases, cost sensitive models are accepted because of their potential to produce accurate or minimal-error results, but the rates of false positives or false negatives are not controlled. Many applications demand separate cost sensitive measures, which may be the most fitting. To check this hypothesis, in this article we propose a measure in terms of the ratio of false positives to false negatives and try to obtain results through the total cost for training and testing. Researchers working on similar themes have either used only one type of classifier to measure the total cost or considered many types of datasets, as found in the related works. Hence, one often sees the occurrence of false negatives being controlled as if this had no effect on false positives, and vice versa. The notion of a cost ratio defined as a metric for measuring its influence on evaluation measures such as accuracy, precision, recall, total cost and likelihood ratios of classifiers is rarely found in the literature. Hence we present a novel framework that addresses this issue, the first of its kind.
https://doi.org/10.1016/j.eswa.2023.120303
Received 31 January 2023; Received in revised form 11 April 2023; Accepted 27 April 2023
The main objective of this paper is to produce a mapping between cost matrices (the input to the cost sensitive classifiers) and confusion matrices (the output from which the occurrences of false positives and false negatives are extracted). The rest of the paper is organized as follows. Sections 2 and 3 present related works and methods and materials. Sections 4 and 5 describe the dataset and the proposed algorithm. Section 6 presents the experimental results, followed by the conclusion in Section 7.
2. Related works
In (Thakkar et al., 2022), while predicting the customer churn rate, the authors applied the AdaBoost ensemble together with cost sensitive classifiers to reduce false-negative errors and the misclassification cost more significantly within an error-based framework. (Mienye & Sun, 2021) investigated the strength of cost-sensitive learning approaches in the context of imbalanced datasets using only conventional machine learning methods. (Telikani et al., 2022) investigated cost sensitive classification by deep learning, partitioning the dataset and defining the cost matrices of the components through a separate cost function layer. (Thai-Nghe et al., 2010) presented two methods of cost sensitive learning for imbalanced data, using sampling techniques and optimizing the cost ratio locally. However, in many contexts of imbalanced datasets the misclassification costs cannot be determined completely. The cost-sensitive learning technique takes misclassification costs into account during model construction and does not modify the imbalanced data distribution directly. Assigning distinct costs to the training examples seems to be the most effective approach to the problem of class-imbalanced data. (Weiss et al., 2007) showed the dependence of total cost on the cost ratio by varying the ratio uniformly. (Domingos, 1999) showed that the MetaCost procedure helps to reduce cost. Here we present a framework that differs from earlier work by allowing multiple classifiers instead of a single one: an SVM classifier is used in (Thai-Nghe et al., 2010), and cost-sensitive decisions are analysed in (Peter, 2001). The cost-oriented classifiers built so far help to reduce the risk associated with the distribution of false positives in the predictions. In general, methods for measuring the loss due to wrong predictions are of interest, and this loss varies significantly from application to application.
The effect of false positives varies from one context to another. In catalogue mailing for business promotion, mailing a non-respondent may incur a small cost, whereas missing a potential respondent may cost relatively more. Many researchers deal with a variety of algorithms (Kubat & Matwin, 1997; Pes & Lai, 2021; Peter, 2001) to increase the accuracy of the evaluated models or to reduce the probability of making wrong predictions. Sampling of the training data directly influences the distribution of classes, and learning models obtained from highly biased datasets are incapable of producing fair predictions. Hence either oversampling or undersampling can be used to alter the class distribution (Weiss et al., 2007) of the training data, as found in (Abe et al., 2004; Breiman et al., 2017; Chan & Stolfo, 1998).
In (Weiss et al., 2013) the authors applied a heuristic technique to the relationship between the cost of a false positive and the cost of a false negative, but only for attribute selection. This heuristic method requires cooperation from domain experts; in this case the domain expert is a physician for coronary artery disease. They also generated results based on the cost of subsets of features and even the cost of individual features (Khan et al., 2018). In our proposed work, we consider the cost ratio of false positives to false negatives to establish the selection of appropriate cost sensitive classifiers.
The relationship (equivalence in the nature of distributions) between class frequencies and misclassification costs based on the cost ratio has been established in different ways (Ioannidis et al., 2011; Peter, 2001). The frequencies of positive and negative examples may be monitored to make learning algorithms cost sensitive. The authors of (Kubat & Matwin, 1997) suggested that approximate equality of the different classes must be adopted for better performance. The epidemiological study in (Ioannidis et al., 2011) considers the ratios of false negatives to false positives when identifying risk factors contributing to causes and effects in preventive health care. The 'Blackstone' ratio emphasizes the weight of false negatives and false positives in the criminal justice system, to strike an acceptable tradeoff between their costs in terms of reward and punishment. 'Sentimental exaggerations', as made in the criminal justice system or in medical diagnosis, are reflected as costs of false positives and false negatives in many forms.
One approach addresses the challenge of handling class-imbalanced data, in which the minority class is more significant than the majority class, a problem that standard machine learning classifiers typically struggle with. To tackle it, correlation-based feature selection is used as a preprocessing technique to eliminate noisy features and extract the most relevant ones, which gives superior geometric performance (Elkarami et al., 2016).
3. Methods and materials
The following subsections describe cost-sensitive learning, the construction of such classifiers and their parameters.
3.1. Cost sensitive classifier
A cost-sensitive classifier is a machine learning mechanism that takes into account the costs associated with different kinds of classification error. Whereas conventional classification treats all types of error (false positives and false negatives) uniformly, the idea of cost-sensitive classification is to assign a distinct cost to each type of error. There are several approaches to introducing cost sensitivity into machine learning models. One is to adjust the weights of training instances according to the cost assigned to each class. Another is to predict the class that minimizes the expected misclassification cost, instead of the most likely class. Using a bagged classifier can improve the accuracy of the base classifier's probability estimates, leading to better performance. When the base classifier cannot handle instance weights and the weights are non-uniform, the data can be resampled with replacement according to the weights before being fed to the base classifier.
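The two strategies just described, reweighting training instances and predicting the class with minimum expected cost, can be illustrated with a small Python sketch. It is not Weka's implementation; the function names and cost values are hypothetical, and it assumes `cost[actual][predicted]` follows the row/column layout of a confusion matrix.

```python
# Illustrative sketch of the two strategies described above (not Weka's code).
# Assumption: cost[actual][predicted], index 0 = negative, 1 = positive,
# i.e. the same row/column layout as a confusion matrix.


def min_expected_cost_class(probs, cost):
    """Return the class whose expected misclassification cost is lowest.

    probs[a] is the estimated probability that the true class is a.
    """
    n = len(cost)
    expected = [
        sum(probs[actual] * cost[actual][pred] for actual in range(n))
        for pred in range(n)
    ]
    return min(range(n), key=lambda pred: expected[pred])


def cost_weights(labels, cost):
    """Reweighting: weight each instance by the cost of misclassifying its class."""
    n = len(cost)
    class_cost = [sum(cost[c][p] for p in range(n) if p != c) for c in range(n)]
    return [class_cost[y] for y in labels]


if __name__ == "__main__":
    # Hypothetical costs: a false negative costs 5, a false positive costs 1.
    cost = [[0, 1],
            [5, 0]]
    # Even at 70% confidence in "negative", predicting "positive" is cheaper:
    # expected cost 0.3 * 5 = 1.5 for "negative" versus 0.7 * 1 = 0.7 for "positive".
    print(min_expected_cost_class([0.7, 0.3], cost))  # 1
    print(cost_weights([0, 1, 1], cost))              # [1, 5, 5]
```

The minimum-expected-cost rule can overturn the most likely class when one error type is expensive, which is exactly the behaviour a cost ratio is meant to expose.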
3.2. Cost-sensitive learning (CSL)
Most classifiers assume that all misclassification costs are the same, but in reality this assumption does not always hold. For example, in cancer diagnosis a missed diagnosis is much more serious than a false alarm, because the patient could lose his or her life due to delayed diagnosis and late treatment (Ioannidis et al., 2011). In our processes, we have revised the equations and parameters of cost calculation as shown below.
The cost values are tabulated in a cost matrix, as in Table 1, which has the same structure as the confusion matrix: the main diagonal holds the correctly classified entries (actual negative predicted negative, and actual positive predicted positive) during training and testing, and the misclassified entries occupy the off-diagonal positions. Table 1 shows the template of the cost matrix according to the confusion matrix structure.
Table 1
Template for Cost Matrix based on confusion matrix.
                            Predicted Class
                            Negative    Positive
Actual Class    Negative    C11         C12
                Positive    C21         C22
The cost matrix above helps us to calculate the total cost, and its off-diagonal elements can be varied to study the nature of misclassification. If the misclassification cost in terms of false positives or false negatives is known, total cost is chosen in this proposed model as the best metric to evaluate classifier performance, as defined below. We use only the total cost as the performance metric for the four cost sensitive learning methods, all of which use 'tree'-type base classifiers.
The cost ratio $\rho$ is defined as

$$\rho = \frac{\text{Number of false positives}}{\text{Number of false negatives}} \qquad (1)$$
The total cost is computed with the following formula:

$$\text{Total Cost} = (FN \times C_{FN}) + (FP \times C_{FP}) \qquad (2)$$
where $FN$ = number of false negatives and $FP$ = number of false positives, obtained from the confusion matrix generated as output when training and testing are carried out;
$C_{FP}$ is the cost of a false positive, denoted by C12 in Table 1;
$C_{FN}$ is the cost of a false negative, denoted by C21 in Table 1.
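Eqs. (1) and (2) can be computed directly from a 2 × 2 confusion matrix laid out like Table 1. The sketch below is a minimal Python illustration; the function names and the confusion matrix values are hypothetical (the matrix only reuses the dataset's 118/209 class split) and are not results from this study.

```python
# Sketch of Eq. (1) and Eq. (2). Assumed layout, as in Table 1:
# confusion[actual][predicted] with index 0 = negative, 1 = positive,
# so confusion[0][1] counts false positives and confusion[1][0] false negatives.
from fractions import Fraction


def fp_fn(confusion):
    """Extract (FP, FN) counts from a 2x2 confusion matrix."""
    return confusion[0][1], confusion[1][0]


def count_ratio(confusion):
    """Eq. (1): number of false positives over number of false negatives."""
    fp, fn = fp_fn(confusion)
    if fn == 0:
        raise ValueError("rho is undefined when there are no false negatives")
    return Fraction(fp, fn)


def total_cost(confusion, c_fp, c_fn):
    """Eq. (2): Total Cost = FN * C_FN + FP * C_FP."""
    fp, fn = fp_fn(confusion)
    return fn * c_fn + fp * c_fp


if __name__ == "__main__":
    # Hypothetical confusion matrix: 118 actual negatives, 209 actual positives.
    cm = [[100, 18],
          [30, 179]]
    print(count_ratio(cm))                 # 3/5
    print(total_cost(cm, c_fp=1, c_fn=2))  # 78
```

Keeping ρ as a `Fraction` preserves it as an exact rational number, matching its description as a ratio of integers.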
3.3. Cost-Sensitive classifiers construction
This section explains the assumptions made to construct an algorithm that keeps false positives (FP) and false negatives (FN) at a minimum level using cost sensitive classifiers. Four different cases are explained below, showing how the generated values are used in the subsequent processes. The ratios of false positives to false negatives are used as inputs to the main algorithm, and the output is extracted from the confusion matrix. The challenge in training the model through the cost sensitive classifier is to obtain the misclassification cost of the expected classification results.
Here C(i, j), where i and j take the values 1 or 2, denotes the cost of predicting class j for an instance whose ground-truth class is i, consistent with the row (actual) and column (predicted) layout of Table 1. The main algorithm revolves around the ratio C(1,2)/C(2,1) and its inverse, varied either by uniform increments or by relatively prime steps, giving four different styles. The main objective of this process is to find an acceptable ratio as it varies across different values in these four styles.
The objective is to construct a mapping $\zeta: \mathbb{Q} \to \mathbb{R}$, whose domain is the set of rational numbers and whose range is the set of real numbers indicating the associated cost. $\zeta(x) = c$ indicates that a ratio x = FP : FN in $\mathbb{Q}$, where FP and FN are two integers, takes the cost c (McCrimmon, 1960; Sagher, 1989; Yu-Ting, 1980). We adopt the following variations of x in $\mathbb{Q}$, distinguished by four cases:
1. Case 1. x is of the form 1/y, where y is a positive integer.
2. Case 2. x is of the form p/q, where p and q are positive integers and gcd(p, q) = 1.
3. Case 3. The reciprocal of x in Case 1.
4. Case 4. The reciprocal of x in Case 2.
These four cases cover all possible positive values of x in $\mathbb{Q}$ (the FP:FN ratio), based on the number theory results discussed in (Domingos, 1999; Thai-Nghe et al., 2010; Weiss et al., 2007). Using these four cases, the main algorithm is formed from the four components listed in Table 2.
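The four cases can be enumerated mechanically once the integer costs are bounded, here by the range 1 to 10 used in the experiments. The following Python sketch is one possible enumeration, assuming each ratio is stored as an integer pair (C_FP, C_FN) and that Case 2 keeps p ≤ q; the function name is hypothetical, and the grid it produces is larger than the subset of ratios reported in Tables 5 and 6.

```python
# Sketch: enumerate the four families of cost ratios C_FP : C_FN within 1..n.
# Assumptions: ratios are kept as (c_fp, c_fn) integer pairs, Case 2 keeps p <= q,
# and Table 2's component names are used only as dictionary keys.
from math import gcd


def ratio_grid(n=10):
    uniform = [(1, k) for k in range(1, n + 1)]             # Case 1: x = 1/y
    coprime = [(p, q) for q in range(1, n + 1)
               for p in range(1, q + 1) if gcd(p, q) == 1]  # Case 2: x = p/q
    return {
        "CSC-U": uniform,
        "CSC-UI": [(q, p) for p, q in uniform],             # Case 3: reciprocal of Case 1
        "CSC-NU": coprime,
        "CSC-NUI": [(q, p) for p, q in coprime],            # Case 4: reciprocal of Case 2
    }


if __name__ == "__main__":
    for name, pairs in ratio_grid(10).items():
        print(name, len(pairs), pairs[:4])
```

The relative-primality test `gcd(p, q) == 1` guarantees that each rational value appears only once (2:4 is not generated separately from 1:2), so no classifier is trained twice for the same ratio.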
4. Data collection and preprocessing
The data collection for the underlying dataset is based on IVF procedures carried out at the Prasanth Fertility Centre, Chennai, India, and was approved by the Centre's Review Board. Data were collected between 2016 and 2018, amounting to a sample of 327 patient records with a class distribution of 118 negative and 209 positive instances (Hari Priya, 2021). Hence the class ratio of negative to positive is approximately 1:2.
The exclusion criteria were as follows:
Table 2
Algorithm components based on Cost Ratio.
Ratio Pattern (CFP:CFN)            Normal    Inverse
Uniform                            CSC-U     CSC-UI
Non-Uniform (Relatively Prime)     CSC-NU    CSC-NUI
Fig. 1. Attribute distribution chart.
a) Women who had undergone ovarian surgery or had any endocrine disorder were not included in the study.
b) Women whose ovarian stimulation could not be controlled within 150 IU/day, with pituitary suppression and FSH (Recagon) at 100 IU/day, as monitored by TVS scan together with serum estradiol measurement using standard methods (Bas-Lando et al., 2017; Muttukrishna et al., 2005; Uyar et al., 2014), were also excluded.
This dataset contains 8 attributes as columns and 327 real patient records. The distribution of each attribute is shown in Fig. 1.
The attributes Age, Egg Quality, Anti-Müllerian hormone (AMH), Right Ovary (RO), Left Ovary (LO), No of Insemination, No of fertilized and No of eggs are discretized as follows.
Age: the range is divided into five sectors s1, s2, s3, s4 and s5 as s1: 20–25, s2: 26–30, s3: 31–35, s4: 36–40 and s5: 41–46.
Egg Quality: the range is divided into five sectors s1, s2, s3, s4 and s5 as s1: 0.01, s2: 0.25, s3: 0.5, s4: 0.75 and s5: 1.
Anti-Müllerian hormone (AMH): the range is divided into five sectors s1, s2, s3, s4 and s5 as s1: 0–2.0, s2: 2.1–4.0, s3: 4.1–6.0, s4: 6.1–8.0 and s5: 8.1–10.6.
Right ovary (RO), Left Ovary (LO), No of Insemination and No of fertilized: the range is divided into five sectors s1, s2, s3, s4 and s5 as s1: 0–5, s2: 6–10, s3: 11–15, s4: 16–20 and s5: 21–38.
No of eggs: the range is divided into five sectors s1, s2, s3, s4 and s5 as s1: 1–5, s2: 6–10, s3: 11–15, s4: 16–20 and s5: 21–43.
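The sector ranges above amount to a lookup on upper bounds. A minimal Python sketch follows; the attribute keys and function name are hypothetical, and it assumes each sector is identified by its inclusive upper bound, so a value falling in a small gap between printed ranges (for example AMH = 2.05) is placed in the next sector.

```python
# Sketch of the sector assignment listed above (s1..s5). Assumptions: each
# sector is identified by its inclusive upper bound, so a value falling in a gap
# between printed ranges (e.g. AMH = 2.05) goes to the next sector, and values
# below the first range are not rejected. Names are hypothetical.
from bisect import bisect_left

UPPER_BOUNDS = {
    "Age": [25, 30, 35, 40, 46],
    "EggQuality": [0.01, 0.25, 0.5, 0.75, 1.0],
    "AMH": [2.0, 4.0, 6.0, 8.0, 10.6],
    # Shared by RO, LO, No of Insemination and No of fertilized:
    "Count": [5, 10, 15, 20, 38],
    "Eggs": [5, 10, 15, 20, 43],
}


def to_sector(attribute, value):
    """Map a raw attribute value to one of the sectors 's1'..'s5'."""
    bounds = UPPER_BOUNDS[attribute]
    idx = bisect_left(bounds, value)      # index of the first bound >= value
    if idx == len(bounds):
        raise ValueError(f"{attribute}={value} exceeds the documented range")
    return f"s{idx + 1}"


if __name__ == "__main__":
    print([to_sector("Age", a) for a in (22, 30, 31, 44)])  # ['s1', 's2', 's3', 's5']
```

`bisect_left` returns the first bound that is greater than or equal to the value, which gives the inclusive upper-bound behaviour without a chain of `if` statements.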
5. Proposed main algorithm
In this article, cost sensitive classifiers are constructed with the algorithm in Fig. 3, which implements the algorithm components described above. The cost sensitive classifiers are built with the Weka tool (Weka, 2021) by tuning the ratio ρ as planned in Section 3.2. Table 2 describes the four interrelated components of the main algorithm. The prefix CSC stands for Cost Sensitive Classifier, and the suffixes U, UI, NU and NUI denote the uniform, uniform inverse, non-uniform and non-uniform inverse components respectively, as stated in Table 2. The main process and the steps involved in the algorithm are shown in Fig. 2.
Fig. 2. Cost Sensitive classifier based on cost ratio.
Fig. 3. Proposed Algorithm for Cost Ratio based Cost Sensitive Learning.
5.1. Pseudo code for cost sensitive classifier by cost ratio
In Fig. 3, the steps specific to each algorithm component are annotated in '[…]', and the remaining steps are common to all four cases.
Dataset D contains the underlying data instances, here the instances of patient attributes as shown in 3, each with the class value positive or negative. This dataset contains 327 records with 8 features in each record. Because the context is treated as cost sensitive, preprocessing such as normalization and attribute selection is not applied, as all attributes carry equal significance. The data collection does not contain any missing or blank values.
In line 2, $b_i$ denotes a tree classifier, $b_i \in \{\text{J48}, \text{LMT}, \text{ADT}, \text{Decision Stump}\}$.
Of the two loops, the outer loop iterates over the four tree classifiers and the inner loop iterates over the index i that varies the cost ratio extracted from the cost matrix. After the loop variables are fixed for the current iteration, the "TC" procedure call takes care of training, testing and classifying, and finally computes the total cost involved. The final output is the optimal (lowest) value among the set of total costs generated by these iterations.
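The nested loops can be sketched in Python as below. This is an illustration, not the pseudo code of Fig. 3: `tc` is a hypothetical stand-in for the "TC" procedure (in the study, training, testing and classification are done in Weka, which the sketch does not call), and the toy classifier counts in the usage block are invented for illustration. The returned dictionary plays the role of the mapping ζ from cost ratios to total costs.

```python
# Sketch of the nested loops in the pseudo code. Assumption: `tc` is a
# hypothetical stand-in for the "TC" procedure (train, test, classify and return
# the total cost for one base classifier and one cost pair); this sketch does not
# call Weka.
from typing import Callable, Dict, Iterable, Tuple

Ratio = Tuple[int, int]  # (C_FP, C_FN)


def main_algorithm(classifiers: Iterable[str],
                   ratios: Iterable[Ratio],
                   tc: Callable[[str, int, int], int]):
    """Return the table zeta[(classifier, ratio)] -> total cost and its minimum."""
    ratios = list(ratios)                 # reused by every outer iteration
    zeta: Dict[Tuple[str, Ratio], int] = {}
    for b in classifiers:                 # outer loop: base tree classifiers
        for c_fp, c_fn in ratios:         # inner loop: cost-ratio index
            zeta[(b, (c_fp, c_fn))] = tc(b, c_fp, c_fn)
    best = min(zeta, key=zeta.get)        # optimal = lowest total cost
    return zeta, best, zeta[best]


if __name__ == "__main__":
    # Hypothetical, fixed (FP, FN) counts per classifier, for illustration only.
    counts = {"J48": (20, 30), "Decision Stump": (5, 60)}

    def toy_tc(b, c_fp, c_fn):
        fp, fn = counts[b]
        return fp * c_fp + fn * c_fn      # Eq. (2)

    table, best, cost = main_algorithm(counts, [(1, k) for k in range(1, 4)], toy_tc)
    print(best, cost)                     # ('J48', (1, 1)) 50
```

Passing the total-cost procedure in as a function keeps the loop structure identical across the four components; only the `ratios` argument changes between CSC-U, CSC-UI, CSC-NU and CSC-NUI.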
5.1.1. Component for CSC-U:
To obtain the cost value as discussed earlier, the numerator of the ratio ρ is incremented uniformly from 1 to 10 while the denominator is fixed.
5.1.2. Component for CSC-UI:
This component is similar to CSC-U, with the ratio of the CSC-U component simply inverted: CFN = c21 = i and CFP = c12 = 1, i.e. i ∈ {1, 2, ..., 10}.
5.1.3. Component for CSC-NU:
The above algorithm is applied with the ratio ρ as m:n, where m and n are relatively prime and their values lie within the principal range 1 to 10. The only extra complexity of the non-uniform cost algorithm is testing the relative primality condition with the greatest common divisor, that is, gcd(m, n) = 1.
5.1.4. Component for CSC-NUI:
From the ratio ρ calculated as CFP/CFN, we take its inverse and repeat the classification process to obtain new results for both the uniform and the non-uniform cases, with their indices in non-decreasing order. Hence we obtain the non-uniform inverse cost algorithm, namely algorithm NUI, simply by inverting the loop indices in the algorithm for NU.
Fig. 4. Uniform Variation of Total Cost.
Fig. 5. Uniform Inverted Variation of Total Cost.
5.2. Interpretation of main algorithm
To obtain the output list of the classifiers' performance features, such as accuracy, precision, recall, total cost and likelihood ratio, we apply the main algorithm, and each component works according to the range type of the cost ratio. The innermost part is common to all components and consists of the steps for classifying and generating the confusion matrix. The index of the outermost loop varies over the base classifiers (for these experiments we consider only decision trees). The index type of the next inner loop varies for each component. In the first component, CSC-U, the index (cost ratio) varies uniformly by unit increments, whereas in CSC-UI the index is inverted. The two components in the non-uniform case, CSC-NU and CSC-NUI, are similar except that the index is determined by a cost ratio whose numerator and denominator are selected as coprimes. The inputs of the main algorithm are the dataset and the cost matrix. The entries of the cost matrix are assumed to be integers for the sake of simplicity. The values of the off-diagonal elements c21 and c12 are restricted to the principal range 1 to 10, and the four types of ratios are constructed from these values. The reasons for selecting values in this range are, first, that most classifiers behave stably in this range and, second, that even if this is not the case, larger values would require an extremely large learning time.
The comparisons of the four algorithm components mentioned above are shown in the tables and graphs below.
6. Experimental results
We applied the tree classifiers of the cost sensitive learners on the Weka platform (Weka, 2021) to the clinical IVF records from the database. Four tree classifiers, namely J48, ADT, LMT and Decision Stump, were adopted for this test.
Comparing the graphs of CSC-U and CSC-UI in Figs. 4 and 5 shows that the total cost grows as the cost ratio moves away from 1:1 in the main algorithm, and that this increase is gradual. It is also noted that choosing the right tree classifier is important: in CSC-U, Decision Stump produces the highest total cost from 1:2 onwards and J48 the lowest from 1:4 onwards, whereas in CSC-UI the reverse is seen, with Decision Stump producing the lowest total cost from 2:1 onwards, well below J48 (see Tables 3 and 4).
Comparing the graphs of CSC-NU and CSC-NUI in Figs. 6 and 7 likewise shows that the total cost increases gradually as the cost ratio grows. Choosing the right tree classifier is again important: in CSC-NU, LMT produces the highest total cost for most ratios while Decision Stump stays comparatively low, whereas in CSC-NUI the pattern reverses and Decision Stump produces the highest total cost for most ratios (see Tables 5 and 6).
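As a small arithmetic check on Eq. (2), assume (this is an assumption, not something the text states) that a classifier's confusion matrix stays the same across cost ratios. Then two rows of Table 3 are enough to recover its FP and FN counts and to reproduce other rows of Tables 3 and 4. The Python sketch below, with hypothetical function names, does this for J48.

```python
# Check, under the stated assumption of a fixed confusion matrix, that a column
# of Tables 3 and 4 follows Eq. (2): total = FP * C_FP + FN * C_FN.


def solve_fp_fn(cost_1_1, cost_1_2):
    """Recover (FP, FN) from total costs at C_FP:C_FN = 1:1 and 1:2."""
    fn = cost_1_2 - cost_1_1      # raising C_FN by one adds exactly FN
    fp = cost_1_1 - fn
    return fp, fn


def predicted_total(fp, fn, c_fp, c_fn):
    return fp * c_fp + fn * c_fn


if __name__ == "__main__":
    fp, fn = solve_fp_fn(121, 191)          # J48, Table 3 rows 1:1 and 1:2
    print(fp, fn)                           # 51 70
    print(predicted_total(fp, fn, 1, 10))   # 751, as in Table 3 (1:10)
    print(predicted_total(fp, fn, 10, 1))   # 580, as in Table 4 (10:1)
```

The same check reproduces the Decision Stump columns as well, while a few cells elsewhere (for example ADT at 5:1) deviate slightly from this linear pattern, so the fixed-confusion-matrix assumption is only approximate.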
7. Conclusion
The cost sensitive model for the IVF dataset is processed for four different ranges of the cost ratio. The results show that the influence of the false-positive to false-negative cost ratio varies across the types of tree classifier used here, namely J48, ADT, LMT and Decision Stump, for the IVF dataset. Moreover, the magnitude of the total cost increases gradually with changes in the ratio, and based on the results obtained, the best classifier can be chosen in future work from a better understanding of the classifiers. The limitations are primarily visible in the number of iterations, since the index varies only over the initial segment of integers 1–10. Although this suffices for establishing the existence of the mapping on the cost ratio and demonstrating it, the restriction can be relaxed to a higher number of iterations, given computing facilities with more time and space for generating the cost models. Future work can extend this study to other types of cost sensitive meta classifiers to measure the error cost as discussed in this work.
Table 3
Performance by Total cost based on the ratio (false positive: false negative) in Cost Sensitive Classifiers applying CSC-U.
Total Cost for IVF Cost sensitive Classifiers CSC-U
Cost Ratio J48 LMT ADT Decision Stump
1:1 121 131 107 123
1:2 191 214 184 237
1:3 261 297 261 351
1:4 331 380 338 465
1:5 401 463 415 579
1:6 471 546 492 693
1:7 541 629 569 807
1:8 611 712 646 921
1:9 681 795 723 1035
1:10 751 876 800 1149
Table 4
Performance by Total cost based on the ratio (false positive: false negative) in Cost Sensitive Classifiers applying CSC-UI.
Total Cost for IVF Cost sensitive Classifiers CSC-UI
Cost Ratio J48 LMT ADT Decision Stump
1:1 121 131 107 123
2:1 172 179 137 132
3:1 223 227 167 141
4:1 274 275 197 150
5:1 325 323 228 159
6:1 376 371 257 168
7:1 427 419 287 177
8:1 478 467 317 186
9:1 529 515 347 195
10:1 580 563 377 204
Fig. 6. Non-Uniform Variation of Total Cost.
CRediT authorship contribution statement
A. Kumaravel: Conceptualization, Methodology, Writing – original
draft, Visualization, Supervision, Writing – review & editing. T.
Vijayan: Software, Data curation, Investigation, Validation, Project
administration.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data availability
Data will be made available on request.
Acknowledgements
We would like to thank the authorities of Prasanth Fertility Hospital, Chennai, India, for allowing the use of the real-time data in this research work.
Fig. 7. Non-Uniform Inverted Variation of Total Cost.
Table 5
Performance by Total cost based on the ratio (false positive: false negative) in
Cost Sensitive Classifiers applying CSC-NU.
Total Cost for IVF Cost sensitive Classifiers CSC-NU
Cost Ratio J48 LMT ADT Decision Stump
1:1 121 131 107 123
1:3 223 227 167 141
1:4 274 275 197 150
2:3 293 310 244 255
2:5 244 406 304 273
2:7 497 502 364 291
2:9 599 327 424 309
3:4 414 441 351 378
3:7 567 585 441 405
4:5 535 572 458 501
4:7 637 668 518 519
4:9 739 764 578 537
5:6 657 703 565 624
5:7 707 751 595 633
5:8 758 799 625 642
5:9 809 847 655 651
6:7 777 834 672 747
7:8 898 965 779 870
7:9 949 1013 809 879
8:9 1019 1096 886 993
Table 6
Performance by Total cost based on the ratio (false positive: false negative) in
Cost Sensitive Classifiers applying CSC-NUI.
Total Cost for IVF Cost sensitive Classifiers CSC-NUI
Cost Ratio J48 LMT ADT Decision Stump
1: 1 121 131 107 123
3: 1 261 297 261 351
3: 2 312 345 291 360
4: 1 331 380 338 465
4: 3 433 476 398 483
5: 2 452 511 445 588
5: 4 554 607 505 606
6: 5 675 738 612 729
7: 2 592 677 599 816
7: 3 643 725 629 825
7: 4 694 773 659 834
7: 5 745 821 689 843
7: 6 796 869 719 852
8: 5 815 904 766 957
8: 7 917 1000 826 975
9: 2 732 843 753 1044
9: 4 834 939 813 1062
9: 5 885 987 843 1071
9: 7 987 1083 903 1089
9: 8 1038 1131 933 1098
References
Abe, N., Zadrozny, B., & Langford, J. (2004). An iterative method for multi-class cost-
sensitive learning. Proceedings of the Tenth ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining. https://doi.org/10.1145/1014052.1014056.
Baldur-Felskov, B., Kjaer, S. K., Albieri, V., Steding-Jessen, M., Kjaer, T., Johansen, C.,
Dalton, S. O., & Jensen, A. (2012). Psychiatric disorders in women with fertility
problems: results from a large Danish register-based cohort study. Human
Reproduction, 28(3), 683–690. https://doi.org/10.1093/humrep/des422
Bas-Lando, M., Rabinowitz, R., Farkash, R., Algur, N., Rubinstein, E., Schonberger, O., &
Eldar-Geva, T. (2017). Prediction value of anti-Mullerian hormone (AMH) serum
levels and antral follicle count (AFC) in hormonal contraceptive (HC) users and non-
HC users undergoing IVF-PGD treatment. Gynecological Endocrinology, 33(10),
797–800. https://doi.org/10.1080/09513590.2017.1320376
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (2017, October 19). Classification And Regression Trees. https://doi.org/10.1201/9781315139470.
Bungum, A. B., Glazer, C. H., Arendt, L. H., Schmidt, L., Pinborg, A., Bonde, J. P., &
Tøttenborg, S. S. (2019). Risk of hospitalization for early onset of cardiovascular
disease among infertile women: a register-based cohort study. Human Reproduction,
34(11), 2274–2281. https://doi.org/10.1093/humrep/dez154
Chan, P. K., & Stolfo, S. (1998). Toward scalable learning with non-uniform class and
cost distributions: A case study in credit card fraud detection. Knowledge Discovery
and Data Mining.
CDC. (2018). 2017 Fertility Clinic Success Rates | Assisted Reproductive Technology (ART) Report | Reproductive Health | CDC. https://www.cdc.gov/art/reports/2017/fertility-clinic.html.
Domingos, P. (1999). MetaCost. Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. https://doi.org/10.1145/312129.312220.
Elkarami, B., Alkhateeb, A., & Rueda, L. (2016, May). Cost-sensitive classification on class-balanced ensembles for imbalanced non-coding RNA data. 2016 IEEE EMBS International Student Conference (ISC). https://doi.org/10.1109/embsisc.2016.7508607.
Hari Priya, G., et al. (2021). Classifiers with synthetic oversampling pre-process for In Vitro Fertilization predictions. Indian Journal of Computer Science and Engineering, 12(6), 1532–1541. https://doi.org/10.21817/indjcse/2021/v12i6/211206061.
Ioannidis, J. P. A., Tarone, R., & McLaughlin, J. K. (2011). The False-positive to False-negative Ratio in Epidemiologic Studies. Epidemiology, 22(4), 450–456. https://doi.org/10.1097/ede.0b013e31821b506e
McCrimmon, K. (1960). Enumeration of the positive rationals. The American
Mathematical Monthly, 67(9), 868.
Khan, S. H., Hayat, M., Bennamoun, M., Sohel, F. A., & Togneri, R. (2018). Cost-Sensitive Learning of Deep Feature Representations From Imbalanced Data. IEEE Transactions on Neural Networks and Learning Systems, 29(8), 3573–3587. https://doi.org/10.1109/tnnls.2017.2732482
Kubat, M., & Matwin, S. (1997). Addressing the curse of imbalanced training sets: One-
sided selection. Proceedings of the 14th International Conference on Machine
Learning, Nashville, 179–186.
Mienye, I. D., & Sun, Y. (2021). Performance analysis of cost-sensitive learning methods
with application to imbalanced medical data. Informatics in Medicine Unlocked, 25,
Article 100690. https://doi.org/10.1016/j.imu.2021.100690
Murugappan, G., Li, S., Lathi, R. B., Baker, V. L., & Eisenberg, M. L. (2019). Increased risk
of incident chronic medical conditions in infertile women: analysis of US claims data.
American Journal of Obstetrics and Gynecology, 220(5), 473.e1–473.e14.
https://doi.org/10.1016/j.ajog.2019.01.214
Muttukrishna, S., McGarrigle, H., Wakim, R., Khadum, I., Ranieri, D., & Serhal, P. (2005).
Antral follicle count, anti-mullerian hormone and inhibin B: predictors of ovarian
response in assisted reproductive technology? BJOG: An International Journal of
Obstetrics & Gynaecology, 112(10), 1384–1390. https://doi.org/10.1111/j.1471-0528.2005.00670.x
Pes, B., & Lai, G. (2021). Cost-sensitive learning strategies for high-dimensional and
imbalanced data: a comparative study. PeerJ Computer Science, 7.
https://doi.org/10.7717/peerj-cs.832
Peter. (2001, August). The foundations of cost-sensitive learning. IJCAI’01: Proceedings of
the 17th International Joint Conference on Artificial Intelligence, 2, 973–978.
https://doi.org/10.5555/1642194.1642224
Pisarska, M. D. (2017, June 28). Fertility Status and Overall Health. PubMed Central
(PMC). https://doi.org/10.1055/s-0037-1603728.
Sadecki, E., Weaver, A., Zhao, Y., Stewart, E. A., & Ainsworth, A. J. (2022). Fertility
trends and comparisons in a historical cohort of US women with primary infertility.
Reproductive Health, 19(1). https://doi.org/10.1186/s12978-021-01313-6
Telikani, A., Gandomi, A. H., Choo, K. K. R., & Shen, J. (2022). A cost-sensitive deep
learning-based approach for network traffic classification. IEEE Transactions on
Network and Service Management, 19(1), 661–670. https://doi.org/10.1109/tnsm.2021.3112283
Thai-Nghe, N., Gantner, Z., & Schmidt-Thieme, L. (2010, July). Cost-sensitive learning
methods for imbalanced data. The 2010 International Joint Conference on Neural
Networks (IJCNN). https://doi.org/10.1109/ijcnn.2010.5596486.
Thakkar, H. K., Desai, A., Ghosh, S., Singh, P., & Sharma, G. (2022, January 22).
Clairvoyant: AdaBoost with Cost-Enabled Cost-Sensitive Classifier for Customer
Churn Prediction. Computational Intelligence and Neuroscience, 2022, 1–11.
https://doi.org/10.1155/2022/9028580
Thorsted, A., Lauridsen, J., Høyer, B., Arendt, L. H., Bech, B., Toft, G., Hougaard, K.,
Olsen, J., Bonde, J. P., & Ramlau-Hansen, C. (2019). Birth weight for gestational age
and the risk of infertility: a Danish cohort study. Human Reproduction, 35(1),
195–202. https://doi.org/10.1093/humrep/dez232
Uyar, A., Bener, A., & Ciray, H. N. (2014). Predictive modeling of implantation outcome
in an in vitro fertilization setting. Medical Decision Making, 35(6), 714–725.
https://doi.org/10.1177/0272989x14535984
Vander Borght, M., & Wyns, C. (2018). Fertility and infertility: Definition and
epidemiology. Clinical Biochemistry, 62, 2–10. https://doi.org/10.1016/j.clinbiochem.2018.03.012
Weiss, G. M., McCarthy, K., & Zabar, B. (2007). Cost-sensitive learning vs. sampling:
Which is best for handling unbalanced classes with unequal error costs? DMIN, 7,
35–41.
Weiss, Y., Elovici, Y., & Rokach, L. (2013, February). The CASH algorithm: Cost-sensitive
attribute selection using histograms. Information Sciences, 222, 247–268.
https://doi.org/10.1016/j.ins.2011.01.035
Weka. (2021). Department of Computer Science, University of Waikato.
http://www.cs.waikato.ac.nz
Sagher, Y. (1989). Counting the rationals. The American Mathematical Monthly, 96(9), 823.
Yu-Ting, S. (1980). A “Natural” enumeration of non-negative rational numbers – an
informal discussion. The American Mathematical Monthly, 87(1), 25.
https://doi.org/10.2307/2320374