
Classification of Cardiotocogram Data using Neural Network based Machine Learning Technique

June 2012


Authors:

Arulmigu palaniniandavar college of Arts and Culture


International Journal of Computer Applications (0975 – 888)

Volume 47– No.14, June 2012

Classification of Cardiotocogram Data using Neural

Network based Machine Learning Technique

Sundar.C

Christian College of Engineering and Technology,

Oddanchatram – 624619

M.Chitradevi

PRIST University

Trichy Campus – Tamilnadu

Tiruchirappalli – 620 009

Dr.G.Geetharamani

Anna University of Technology

Tiruchirappalli - 620024

ABSTRACT

Cardiotocography (CTG) is a simultaneous recording of fetal heart rate (FHR) and uterine contractions (UC). It is one of the most common diagnostic techniques for evaluating maternal and fetal well-being during pregnancy and before delivery. By observing the cardiotocography trace patterns, doctors can assess the state of the fetus. Several techniques based on signal processing and computer programming exist for interpreting typical cardiotocography data. Even a few decades after the introduction of cardiotocography into clinical practice, the predictive capacity of these methods remains controversial and still inaccurate. In this paper, we implement a model-based CTG data classification system using a supervised artificial neural network (ANN), which classifies CTG data based on its training data. We used Precision, Recall, F-Score and Rand Index as the metrics to evaluate the performance. According to the results obtained, the supervised machine learning based classification approach performed well. It was found that the ANN based classifier was capable of identifying the Normal and Pathologic conditions from the CTG data with very good accuracy, although its performance on the Suspicious class was weaker (average F-Score 0.4514).

Keywords

Multidimensional Data Classification, Medical Data Classification, Cardiotocography, CTG, fetal heart rate, FHR, uterine contractions, UC, ANN.

1. INTRODUCTION

Data Mining (DM) and the technology of Knowledge Discovery from Data (KDD) have brought many new developments, methods, and technologies in the recent decade. The improved integration of techniques and the application of data mining techniques have also contributed to the handling of new kinds of data types and applications. However, the field of data mining and its application in the medical domain is still young, so the possibilities for its application remain wide open [20].

One of the major challenges in the medical domain is the extraction of comprehensible knowledge from medical diagnosis data such as CTG data. In this information era, the use of machine learning tools in medical diagnosis is increasing gradually. This is mainly because the effectiveness of classification and recognition systems has improved a great deal, helping medical experts to diagnose diseases [21].

1.1 Cardiotocography (CTG)

Cardiotocography (CTG) is a technical means of recording the fetal heart rate (FHR) and the uterine contractions (UC) during pregnancy, typically in the third trimester, to evaluate maternal and fetal well-being. FHR patterns are observed manually by obstetricians during CTG analysis. In the recent past, the fetal heart rate baseline and its frequency analysis have been studied from many aspects [2],[6]. Fetal heart rate (FHR) monitoring is mainly used to assess the amount of oxygen a fetus is receiving during labor [7]. Even so, death and long-term disablement still occur due to hypoxia during delivery. More than 50% of these deaths were caused by failure to recognize the abnormal FHR pattern, failure to communicate it with due urgency once it was recognized, and delay in taking appropriate action [7]. The computational and data mining techniques currently proposed for FHR can be used to analyze and classify CTG data, helping to avoid human mistakes and to support doctors in taking decisions.

2. PROBLEM DEFINITION

Cardiotocography (CTG), consisting of fetal heart rate (FHR) and tocographic (TOCO) measurements, is used to evaluate fetal well-being during delivery. Since 1970, many researchers have employed methods from the fields of signal processing and computer programming to help doctors interpret the CTG trace pattern [2]. These methods have supported doctors with interpretations, with the aim of reaching a level of reliability sufficient to act as a decision support system in obstetrics. Up to now, none of them has been adopted worldwide for everyday practice (Van Geijnt, 1996). There is currently no consensus on the best methodology for baseline estimation in computer analysis of cardiotocographs [2]. More than 30 years after the introduction of antepartum cardiotocography into clinical practice, the predictive capacity of the method remains controversial. A review of the articles published on this subject found that its reported sensitivity varies between 2 and 100%, and its specificity between 37 and 100% [5]. So, in this work, we evaluate machine learning and data mining techniques for the classification of CTG data.

Classification can be viewed as a supervised learning scenario, in which a training data set of records is accompanied by class labels. New data can be classified based on the training set by generating descriptions of the classes. In addition to the training set, there is also a test data set, which is used to determine the effectiveness of the classification. In principle, a popular neural network can be trained to recognize the data directly. However, such a network can be very complex and difficult to train. Further, if the dimension of the input data is high, the training process will consume a lot of time, and the accuracy of classification also varies as the dimension of the training data increases. Generally, the techniques used in neural network systems depend on the application of the system.


As means of data collection have become more capable, the

need for non-linear modeling techniques has become more

and more apparent. Traditional statistical methods rely on an

assumption of linearity. However, since most of the data

collected concerns, or is the result of, human behavior, and

humans rarely behave linearly, methods that assume linear

separability are ultimately doomed to failure. Furthermore,

data collection streams are broadening. The number of

variables of concern to modelers has increased by at least an

order of magnitude. Traditional methods simply were not

designed to work with one hundred or more variables.

In answer to this, the last decade has seen the emergence of

neural networks as a means of non-linear modeling. These

devices resulted from the efforts of a number of cognitive

scientists to mimic learning and memory in the human brain.

The back-propagation neural network in particular has proven

successful in creating useful models from large masses of

complex data. The algorithm has been successfully applied in

a variety of settings including direct marketing, intelligence

and process control. Because of its pattern recognition nature

it has proven robust with respect to missing data and other

data irregularities.

2.1 The Medical Background of Cardiotocography (CTG)

Cardiotocography is a medical test conducted during pregnancy that records fetal heart rate (FHR) and uterine contractions. The test may be conducted by either internal or external methods. In internal testing, a catheter is placed in the uterus after a specific amount of dilation has taken place. In external testing, a pair of sensors is affixed to the mother's abdomen. The CTG trace generally shows two lines. The upper line is a record of the fetal heart rate in beats per minute. The lower line is a recording of uterine contractions from the TOCO [4].

2.2 Baseline Heart Rate

The baseline heart rate helps to evaluate the healthy functioning of the cardiovascular system. The baseline fetal heart rate is determined by approximating the mean FHR, rounded to increments of 5 beats per minute (bpm), during a 10-minute window, excluding accelerations and decelerations and periods of marked FHR variability (greater than 25 bpm). An abnormal baseline is termed bradycardia or tachycardia. The fluctuations are visually quantitated as the peak-to-trough amplitude in bpm. Using this definition, the baseline FHR variability is categorized by the quantitated amplitude as:

Absent: undetectable

Minimal: greater than undetectable, but less than or equal to 5 bpm

Moderate: 6 bpm to 25 bpm

Marked: greater than 25 bpm
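The four variability categories above amount to a simple threshold rule on the peak-to-trough amplitude. The following is only a minimal sketch of that rule: the function name `classify_variability` is hypothetical, "undetectable" is modelled as an amplitude of exactly 0 bpm, and amplitudes are assumed to be reported in whole bpm (so a value between 5 and 6 would fall into Moderate here).

```python
def classify_variability(amplitude_bpm):
    """Map a peak-to-trough FHR amplitude (in bpm) to a variability category.

    Assumption: an undetectable amplitude is represented as 0.
    """
    if amplitude_bpm < 0:
        raise ValueError("amplitude must be non-negative")
    if amplitude_bpm == 0:
        return "Absent"    # undetectable
    if amplitude_bpm <= 5:
        return "Minimal"   # detectable, but at most 5 bpm
    if amplitude_bpm <= 25:
        return "Moderate"  # 6 bpm to 25 bpm
    return "Marked"        # greater than 25 bpm


categories = [classify_variability(a) for a in (0, 3, 12, 30)]
```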

Bradycardia: In general (adult) usage, it is a resting heart rate of under 60 beats per minute, though it is seldom symptomatic until the rate drops below 50 beats/min. It may cause cardiac arrest in some patients. For the fetus, bradycardia is commonly defined as a baseline FHR below 110 bpm.

Tachycardia: It typically refers to a heart rate that exceeds the normal range for a resting heart rate (the heart rate of an inactive or sleeping individual). It can be dangerous depending on the speed and type of rhythm. For the fetus, tachycardia is commonly defined as a baseline FHR above 160 bpm.

Type 1 (early)

This deceleration occurs during the peak of the uterine contraction. It is a uniform, repetitive, periodic slowing of the FHR, with onset early in the contraction and return to baseline at the end of the contraction. Possible causes are fetal head compression, cord compression or early hypoxia. It occurs in first and second stage labor with descent of the head [4]. It is synchronous with the uterine contraction.

Type 2 (late)

This deceleration occurs after the peak of the uterine contraction. It is also a uniform, repetitive slowing of the FHR, with onset in the middle to end of the contraction, a nadir more than 20 seconds after the peak of the contraction, and an end after the contraction. The longer the lag time, the more serious the condition. It is also synchronous with the uterine contraction. Management (Mx): a fetal pH measurement is mandatory [4].

Type 3 (variable)

This is a variable, repetitive, periodic slowing of the FHR with rapid onset and recovery. Its time relationship with the contraction cycle is variable, and it may occur in isolation. In some cases, these decelerations resemble other types of deceleration patterns in timing and shape. If they occur consistently, there is a chance of fetal hypoxia. This type is unrelated to uterine contractions. Management (Mx): check the fetal pH if the pattern persists after turning the patient on her side (or if other adverse features are present) [4].

3. CLASSIFICATION USING ARTIFICIAL NEURAL NETWORK

3.1 ANN Based Classification

In this classification, we use supervised learning with a set of training data accompanied by class labels. When new data arrive, they are classified based on the training set by generating descriptions of the classes. In addition to the training set, we also have a test data set that is used to determine the effectiveness of the classification. In general, commonly used neural networks can be trained to recognize the data directly, although such a network may become complex and difficult to train. The time taken and the accuracy of classification depend on the dimension of the input data: for input data of high dimension, the training process takes longer.

3.2 Structuring the Network

The number of layers and the number of processing elements

per layer are important decisions. These parameters to a feed

forward, back-propagation topology are also the most ethereal

- they are the "art" of the network designer. There is no

quantifiable, best answer to the layout of the network for any

particular application. There are only general rules picked up

over time and followed by most researchers and engineers

applying this architecture to their problems.

Rule One: As the complexity in the relationship between

the input data and the desired output increases, the number of

the processing elements in the hidden layer should also

increase.

Rule Two: If the process being modeled is separable into

multiple stages, then additional hidden layer(s) may be


required. If the process is not separable into stages, then

additional layers may simply enable memorization of the

training set, and not a true general solution effective with

other data.

Fig 1: Feed forward Network

Rule Three: The amount of training data available sets an

upper bound for the number of processing elements in the

hidden layer(s). To calculate this upper bound, use the number

of cases in the training data set and divide that number by the

sum of the number of nodes in the input and output layers in

the network. Then divide that result again by a scaling factor

between five and ten. Larger scaling factors are used for

relatively noisy data. If you use too many artificial

neurons the training set will be memorized. If that happens,

generalization of the data will not occur, making the network

useless on new data sets.
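Rule Three is a short piece of arithmetic, sketched below. The function name `hidden_node_upper_bound` is hypothetical. The 21 input nodes correspond to the 21 attributes of the CTG data set; the 3 output nodes (one per fetal state class) and the 1200 training cases are assumptions made only for illustration, since the paper does not state the output encoding or the size of the training split.

```python
def hidden_node_upper_bound(n_training_cases, n_inputs, n_outputs, scaling_factor):
    """Upper bound on hidden-layer size following Rule Three.

    bound = cases / (input nodes + output nodes) / scaling factor,
    where the scaling factor is typically chosen between five and ten.
    """
    if n_training_cases <= 0 or n_inputs <= 0 or n_outputs <= 0:
        raise ValueError("counts must be positive")
    if scaling_factor <= 0:
        raise ValueError("scaling factor must be positive")
    return n_training_cases / (n_inputs + n_outputs) / scaling_factor


N_TRAIN = 1200   # hypothetical number of training cases
N_INPUTS = 21    # attributes in the CTG data set
N_OUTPUTS = 3    # assumed: one output node per class (N, S, P)

bound_scale_5 = hidden_node_upper_bound(N_TRAIN, N_INPUTS, N_OUTPUTS, 5)
bound_scale_10 = hidden_node_upper_bound(N_TRAIN, N_INPUTS, N_OUTPUTS, 10)
```

Under these assumed numbers, the bound halves when the scaling factor is doubled, which is why the choice of factor matters for small training sets.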

A single-layer network of S logsig neurons having R inputs is

shown below in full detail on the left and with a layer diagram

on the right [16].

Feed forward networks often have one or more hidden layers

of sigmoid neurons followed by an output layer of linear

neurons [11]. Multiple layers of neurons with nonlinear

transfer functions allow the network to learn nonlinear and

linear relationships between input and output vectors. The

linear output layer lets the network produce values outside the

range -1 to +1. On the other hand, if you want to constrain the

outputs of a network (such as between 0 and 1), then the

output layer should use a sigmoid transfer function.
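The effect of the output transfer function can be illustrated with a forward pass through a tiny network. This is a minimal sketch in plain Python, not the toolbox implementation used in the paper; the helper names `layer` and `forward` and all weights are hypothetical toy values.

```python
import math


def logsig(x):
    """Logistic sigmoid: squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases, transfer):
    """One fully connected layer: transfer(w . inputs + b) for each neuron."""
    return [transfer(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, hidden_w, hidden_b, out_w, out_b, out_transfer):
    """Sigmoid hidden layer followed by an output layer."""
    hidden = layer(inputs, hidden_w, hidden_b, logsig)
    return layer(hidden, out_w, out_b, out_transfer)


# Toy network: 2 inputs, 2 hidden neurons, 1 output neuron.
x = [1.0, -1.0]
hidden_w = [[1.0, 1.0], [2.0, 2.0]]
hidden_b = [0.0, 0.0]
out_w = [[4.0, 4.0]]
out_b = [1.0]

linear_out = forward(x, hidden_w, hidden_b, out_w, out_b, lambda v: v)
sigmoid_out = forward(x, hidden_w, hidden_b, out_w, out_b, logsig)
```

With a linear output layer the result can leave the range -1 to +1, while the sigmoid output layer keeps it strictly between 0 and 1.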

3.3 The ANN based CTG Data Classification System

Fig. 2 shows the ANN based CTG data classification system.
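The system in Fig. 2 normalizes both the training and the testing data before the network is trained and applied. The paper does not state which normalization was used, so the following is only a minimal sketch assuming min-max scaling to [0, 1], with the scaling parameters estimated from the training data and reused on the testing data. The helper names `fit_min_max` and `apply_min_max` and the toy values are hypothetical.

```python
def fit_min_max(rows):
    """Return per-attribute minima and maxima of the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, mins, maxs):
    """Scale each attribute to [0, 1] using the given minima and maxima.

    A constant attribute (max == min) is mapped to 0.0 to avoid division by zero.
    """
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


# Toy data with two attributes per record (hypothetical values).
train = [[120.0, 0.0], [140.0, 10.0], [160.0, 5.0]]
test = [[130.0, 2.5]]

mins, maxs = fit_min_max(train)              # statistics from training data only
train_n = apply_min_max(train, mins, maxs)
test_n = apply_min_max(test, mins, maxs)     # same scaling reused on test data
```

Reusing the training statistics keeps the two data sets on the same scale; test values outside the training range would fall outside [0, 1] under this scheme.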

The Metrics Used for the Evaluation

Precision, recall and F-Score are computed for every (class, cluster) pair, whereas the Rand index is a metric that considers all the classes and clusters as a whole.

Rand Index

The Rand index, or Rand measure, is a commonly used measure of the similarity between two data clusterings.

Given a set of n objects S = {O1, ..., On} and two clusterings (partitions) of S which we want to compare, X = {x1, ..., xR} and Y = {y1, ..., yS}, where the subsets within X and within Y are disjoint and their union is equal to S, we can compute the following values:

a is the number of pairs of elements in S that are in the same subset in X and in the same subset in Y,

b is the number of pairs of elements in S that are in different subsets in X and in different subsets in Y,

c is the number of pairs of elements in S that are in the same subset in X and in different subsets in Y,

d is the number of pairs of elements in S that are in different subsets in X but in the same subset in Y.

Intuitively, one can think of a + b as the number of agreements between X and Y and c + d as the number of disagreements between X and Y. The Rand index, RI, then becomes,

$$RI = \frac{a + b}{a + b + c + d}$$

Fig 2: The ANN based CTG Data Classifier

The Rand index has a value between 0 and 1, with 0 indicating that the two clusterings do not agree on any pair of points and 1 indicating that the two clusterings are exactly the same.
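The Rand index can be computed directly by counting, over all pairs of objects, how often the two labelings agree on whether the pair belongs together. The following is a minimal sketch of that pair-counting definition; the function name `rand_index` and the toy label lists are hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
from itertools import combinations


def rand_index(labels_x, labels_y):
    """Rand index between two labelings of the same objects.

    Agreements are pairs placed together in both labelings (a) or
    apart in both labelings (b); the index is (a + b) / number of pairs.
    """
    if len(labels_x) != len(labels_y):
        raise ValueError("labelings must have the same length")
    agreements = 0
    pairs = 0
    for i, j in combinations(range(len(labels_x)), 2):
        same_x = labels_x[i] == labels_x[j]
        same_y = labels_y[i] == labels_y[j]
        if same_x == same_y:
            agreements += 1
        pairs += 1
    if pairs == 0:
        raise ValueError("at least two objects are required")
    return agreements / pairs


true_labels = [1, 1, 2, 2, 3]   # toy class labels
predicted = [1, 1, 2, 3, 3]     # toy classifier output
ri = rand_index(true_labels, predicted)
```

In this toy example 8 of the 10 pairs agree; the two disagreements both involve the fourth object, which the predicted labeling moves from class 2 to class 3.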

Precision

Precision is calculated as the fraction of correct objects among those that the algorithm believes belong to the relevant class. It can be loosely equated to accuracy, and it roughly answers the question: “How many of the points in this cluster belong there, that is, are correctly classified?”

The Precision is calculated as:

$$P(L_r, S_i) = \frac{n_{ri}}{n_i}$$

for

class $L_r$ of size $n_r$,

cluster $S_i$ of size $n_i$,

$n_{ri}$ data points in $S_i$ from class $L_r$.

Recall

Recall roughly answers the question: "Did all of the points that belong in this cluster make it in?" In other words, recall is the fraction of the actual objects of a class that were identified.

The recall is calculated as:

$$R(L_r, S_i) = \frac{n_{ri}}{n_r}$$

F-Score

F-Score is the harmonic mean of Precision and Recall, and it tries to give a good combination of the two. It is calculated with the equation:

$$F(L_r, S_i) = \frac{2 \, R(L_r, S_i) \, P(L_r, S_i)}{R(L_r, S_i) + P(L_r, S_i)}$$
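The three per-class metrics can be computed from the true and predicted labels of the test records. The sketch below assumes that the cluster Si is the set of records the classifier assigns to class Lr, and it returns 0.0 wherever a denominator is zero; both choices, the function name `precision_recall_f` and the toy labels are assumptions made for illustration, not details taken from the paper.

```python
def precision_recall_f(true_labels, predicted_labels, class_label):
    """Precision, recall and F-Score of one class.

    n_r  : number of records that truly belong to the class
    n_i  : number of records predicted as the class
    n_ri : number of records predicted as the class that truly belong to it
    """
    n_r = sum(1 for t in true_labels if t == class_label)
    n_i = sum(1 for p in predicted_labels if p == class_label)
    n_ri = sum(1 for t, p in zip(true_labels, predicted_labels)
               if t == p == class_label)
    precision = n_ri / n_i if n_i else 0.0
    recall = n_ri / n_r if n_r else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * recall * precision / (recall + precision)
    return precision, recall, f_score


# Toy labels: 1 = Normal, 2 = Suspect, 3 = Pathologic.
true_labels = [1, 1, 1, 2, 2, 3]
predicted = [1, 1, 2, 2, 1, 3]
suspect_metrics = precision_recall_f(true_labels, predicted, 2)
```

Under the zero-denominator convention used here, a classifier that never predicts a class receives 0.0 for all three metrics of that class.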

4. RESULTS AND DISCUSSION

4.1 Data Set Information

For evaluating the algorithm under consideration, we used the cardiotocogram data from the UCI Machine Learning Repository [13].

This data set contains 2126 fetal cardiotocograms belonging to different classes. The data contain 21 attributes and two class labels. The CTGs were classified by three expert obstetricians, and a consensus classification label was assigned to each of them. Classification was both with respect to a morphologic pattern (A, B, C, ...) and to a fetal state (N, S, P). Therefore the data set can be used either for 10-class or 3-class experiments.

Attribute Information

1) LB - FHR baseline (beats per minute)

2) AC - # of accelerations per second

3) FM - # of fetal movements per second

4) UC - # of uterine contractions per second

5) DL - # of light decelerations per second

6) DS - # of severe decelerations per second

7) DP - # of prolonged decelerations per second

8) ASTV - percentage of time with abnormal short term variability

9) MSTV - mean value of short term variability

10) ALTV - percentage of time with abnormal long term variability

11) MLTV - mean value of long term variability

12) Width - width of FHR histogram

13) Min - minimum of FHR histogram

14) Max - Maximum of FHR histogram

15) Nmax - # of histogram peaks

16) Nzeros - # of histogram zeros

17) Mode - histogram mode

18) Mean - histogram mean

19) Median - histogram median

20) Variance - histogram variance

21) Tendency - histogram tendency

22) CLASS - FHR pattern class code (1 to 10)

23) NSP - fetal state class code (Normal=1; Suspect=2; Pathologic=3)

[Fig 2 flowchart, box labels: The CTG Data; The Training Data with Class Labels; The Testing Data; Normalize the Training Data; Normalize the Testing Data; Train ANN using Training Data and Class Labels; Classify the Test Data using the Trained Network; New Class Labels; Class Labels of the Test Data; Measure Performance Using Rand Index, Precision, Recall and F-Score]

Class Information

We used the data for a three class classification problem. The

descriptions for the three classes are

Normal

A CTG where all four features fall into the reassuring category.

Suspicious

A CTG whose features fall into one of the non-reassuring categories, while the remainder of the features are reassuring.

Pathological

A CTG whose features fall into two or more of the non-reassuring categories, or two or more abnormal categories.

4.2 The Visualization of Data Space

The following image shows the projection of this 21-attribute (21-dimensional) data into a virtual three-dimensional data space. We used three principal components of the data for this projection. In this plot, the Normal CTG data points are shown as black dots, the Suspicious data points as blue dots, and the Pathologic data points as red ‘x’ marks. This figure roughly shows the distribution of the data in the virtual space.

Fig 3: The 3D projection of CTG data

The Numerical Results

The following tables show the performance of the ANN based classifier. Here we tabulate the results of ten trials and their averages. (The detailed results of all the trials can be found in the table presented in the annexure section.)

Table 1. The Performance in terms of Rand Index and CPU time

SlNo   Rand Index   Time (s)
1      0.9146       3.6719
2      0.9428       2.4844
3      0.9317       2.3750
4      0.9266       2.5469
5      0.9396       2.3750
6      0.9481       2.4688
7      0.9395       2.3750
8      0.9325       2.4688
9      0.9348       2.4531
10     0.9178       2.6719
Avg    0.9328       2.5891

Table 2. The Average Performance of ANN Based Classifier

Metric      Normal   Suspicious   Pathological
Precision   0.9663   0.5897       0.9706
Recall      0.9910   0.3688       0.9745
F-Score     0.9784   0.4514       0.9724

The Analysis of Results

The performance of the algorithm in terms of Rand Index was good and always greater than 0.9. The proposed model consumed around 2.6 seconds on average for training and testing (2.5891 s over the ten trials in Table 1). This is small enough that it will not be an obstacle to practical use of the method in real-world applications.

[Fig 4 chart: "Analysis of Performance of ANN"; x-axis: Metric (Precision, Recall, F-Score); y-axis: Performance Index; series: Normal, Suspicious, Pathological]

Fig 4. Performance of ANN

The above chart (Fig. 4) shows the performance of the ANN based classifier. It gives good precision, recall and F-Score for normal as well as pathological records, but poor performance in the case of suspicious records.

The results obtained show that supervised machine learning based methods can be used for the classification of CTG data. We recognize that there are some training glitches in the case of suspicious records, which caused unexpectedly poor results when classifying the CTG data class “suspicious”.

5. CONCLUSION

The performance of the neural network based classification model has been analyzed with the CTG data set. According to the results obtained, the supervised machine learning based classification approach performed well. It was found that the ANN based classifier was capable of identifying the Normal and Pathologic conditions from the CTG data with very good and almost equal accuracy, and it provided excellent performance in terms of Rand Index and, for these two classes, in terms of Precision, Recall and F-Score. However, if we look carefully at the comparative chart of the ANN (Fig. 4), we can see that its performance in identifying the Suspicious CTG pattern is poorer than for the other two classes. So future work may address ways to improve the system so that it recognizes the Suspicious CTG patterns with the same accuracy.

Even though the ANN based classifier provided excellent average performance, a careful look at the results of the ten trials with the ANN (Table 3 in the annexure) reveals another weakness of this system. Some cells of the columns P2, R2 and F2 contain bad results (the zero values in the first and last trials). In those trials, the system was unable to identify a single suspicious record. This means that, even though we train the system with samples of all the classes, there is a chance that the trained system will be incapable of identifying suspicious records. That is why we get comparatively poor average performance when classifying suspicious records. It is a major weakness of the system, which should be overcome in future designs. One may address ways to improve the system so that it is properly trained on the different classes of CTG patterns. Future work may also address hybrid models using statistical and machine learning techniques for improved classification accuracy.

ANNEXURE - 1

In the following table, P1 is the precision for normal records, P2 is the precision for suspicious records, and P3 is the precision for pathological records. R1 is the recall for normal records, R2 is the recall for suspicious records, and R3 is the recall for pathological records. F1 is the F-score for normal records, F2 is the F-score for suspicious records, and F3 is the F-score for pathological records.

Table 3. Results with ANN (10 Trials)

         Precision                 Recall                    F-score
SlNo     P1      P2      P3       R1      R2      R3       F1      F2      F3
1        0.9485  0.0000  0.9787   0.9978  0.0000  0.9787   0.9725  0.0000  0.9787
2        0.9744  0.7714  0.9785   0.9892  0.5625  0.9681   0.9817  0.6506  0.9733
3        0.9652  0.7037  1.0000   0.9913  0.3958  0.9574   0.9781  0.5067  0.9783
4        0.9650  0.6522  0.9485   0.9881  0.3125  0.9787   0.9764  0.4225  0.9634
5        0.9733  0.7429  0.9785   0.9881  0.5417  0.9681   0.9806  0.6265  0.9733
6        0.9785  0.7500  0.9691   0.9881  0.5625  1.0000   0.9833  0.6429  0.9843
7        0.9713  0.7857  0.9583   0.9902  0.4583  0.9787   0.9807  0.5789  0.9684
8        0.9682  0.7407  0.9474   0.9892  0.4167  0.9574   0.9785  0.5333  0.9524
9        0.9682  0.7500  0.9785   0.9902  0.4375  0.9681   0.9791  0.5526  0.9733
10       0.9504  0.0000  0.9688   0.9978  0.0000  0.9894   0.9735  0.0000  0.9789
Avg      0.9663  0.5897  0.9706   0.9910  0.3688  0.9745   0.9784  0.4514  0.9724
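The averages in the last row of Table 3 can be reproduced from the per-trial values. The following minimal sketch does this for the suspicious-class columns (P2, R2, F2), with the values copied from the ten trials of Table 3, and lists the trials in which no suspicious record was identified. The variable and function names are hypothetical.

```python
# Suspicious-class columns of Table 3, trials 1 to 10.
p2 = [0.0000, 0.7714, 0.7037, 0.6522, 0.7429, 0.7500, 0.7857, 0.7407, 0.7500, 0.0000]
r2 = [0.0000, 0.5625, 0.3958, 0.3125, 0.5417, 0.5625, 0.4583, 0.4167, 0.4375, 0.0000]
f2 = [0.0000, 0.6506, 0.5067, 0.4225, 0.6265, 0.6429, 0.5789, 0.5333, 0.5526, 0.0000]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


avg_p2, avg_r2, avg_f2 = mean(p2), mean(r2), mean(f2)

# Trials (1-based) in which the recall for suspicious records was zero.
failed_trials = [i + 1 for i, r in enumerate(r2) if r == 0.0]
```

The computed means agree with the Avg row to within rounding, and the two zero-recall trials are the ones that pull the suspicious-class averages down.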


6. REFERENCES

[1] Xiaojun Chen, Yunming Ye, Xiaofei Xu, Joshua Zhexue Huang, "A feature group weighting method for subspace clustering of high-dimensional data", Pattern Recognition 45 (2012) 434-446, Elsevier.

[2] Shahad Nidhal, M. A. Mohd. Ali and Hind Najah, "A novel cardiotocography fetal heart rate baseline estimation algorithm", Scientific Research and Essays, Vol. 5(24), pp. 4002-4010, 18 December 2010.

[3] ANA. KLIMEŠOVÁ, EVA OCELÍKOVÁ, "Multidimensional Data Classification", Proceedings of the 10th WSEAS International Conference on AUTOMATION & INFORMATION, ISSN: 1790-5117, ISBN: 978-960-474-064-2.

[4] Stirrat, Mills and Draycott, "Notes on Obstetrics and Gynaecology for the MRCOG, 5th Edition", 04 Aug 2003, ISBN: 9780443072239.

[5] Diogo Ayres-de-Campos, Cristina Costa-Santos, João Bernardes, "Prediction of neonatal state by computer analysis of fetal heart rate tracings: the antepartum arm of the SisPorto multicentre validation study", European Journal of Obstetrics & Gynecology and Reproductive Biology 118 (2005) 52-60.

[6] http://www.academicjournals.org/SRE, ISSN 1992 –

[7] Antonia Costa, MD; Diogo Ayres-de-Campos, PhD; Fernada Costa, MD; Cristina Santos, MS; Joao Bernardes, PhD, "Prediction of neonatal acidemia by computer analysis of fetal heart rate and ST event signals", American Journal of Obstetrics and Gynecology (AJOG), 2009.

[8] Ben Kao, Sau Dan Lee, Foris K.F. Lee, David W. Cheung, Wai-Shing Ho, "Clustering Uncertain Data using Voronoi Diagrams and R-Tree Index", IEEE Transactions on Knowledge and Data Engineering, Vol. 22(9), pp. 1219-1233, Sep 2010.

[9] E. Ocelikova, D. Klimesova, "Bayes Classifier in multidimensional data classification", 15th Int. Conference Process Control 2005, pp. 188-1 – 188-5, Strbske Pleso, Slovakia.

[10] E. Ocelíková, J. Krištof, "Classification of multispectral data", Zbornik radova, Volume 25, Number 1 (2001).

[11] http://www-h.eng.cam.ac.uk/help/tpl/programs/matlab.html

[12] S. Anto, Dr. S. Chandramathi, "Supervised Machine Learning Approaches for Medical Data Set Classification – A Review", IJCST Vol. 2, Issue 4, pp. 234-240, Oct-Dec 2011, ISSN: 2229-4333.

[13] A. Frank, A. Asuncion, UCI Machine Learning Repository, http://archive.ics.uci.edu/ml, 2010.

[14] Zhaohong Deng, Kup-Sze Choi, Fu-Lai Chung, Shitong Wang, "Enhanced soft subspace clustering integrating within-cluster and between-cluster information", Pattern Recognition, Vol. 43, No. 3, pp. 767-781, March 2010, doi:10.1016/j.patcog.2009.09.010.

[15] Hans-Peter Kriegel, Peer Kröger, Arthur Zimek, "Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering", ACM Transactions on Knowledge Discovery from Data (TKDD), Vol. 3, No. 1, pp. 1-58, March 2009, doi:10.1145/1497577.1497578.

[16] http://www.mathworks.in/help/toolbox/nnet/ug/bss33y1-1.html

[17] S. Angel Latha Mary, K.R. Shankar Kumar, "Evaluation of Clustering Algorithm with Cluster Validation Metrics", European Journal of Scientific Research, ISSN 1450-216X, Vol. 69, No. 1 (2012), pp. 61-72.

[18] https://sites.google.com/site/dataclusteringalgorithms/fuzzy-c-means-clustering-algorithm

[19] http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/cmeans.html

[20] Yi Peng, Gang Kou, "A descriptive framework for the field of data mining and knowledge discovery", International Journal of Information Technology & Decision Making, Vol. 7, No. 4 (2008), pp. 639-682.

[21] Michael Lloyd-Williams, "Discovering the hidden secrets in your data - the data mining approach to information", Information Research, Vol. 3, No. 2, September 1997, http://informationr.net/ir/3-2/paper36.html

ResearchGate has not been able to resolve any citations for this publication.
