Classification of Soft Tissue Tumors
by Machine Learning Algorithms
Jaber Juntu1, Arthur M. De Schepper2, Pieter Van Dyck2, Dirk Van Dyck1,
Jan Gielen2, Paul M. Parizel2 and Jan Sijbers1
1University of Antwerp, Physics Department, Vision Lab
2Dept. of Radiology, Antwerp University Hospital, University of Antwerp
Belgium
1. Introduction
MR imaging is currently regarded as the standard diagnostic tool for detection and grading
of soft tissue tumors (STT) (De Schepper et al. (2005)). Soft tissue is a term describing all
the supporting, connecting, or surrounding tissues of other structures and organs of the body,
such as fat, muscle, blood vessels, deep skin tissues, nerves and the tissues around joints
(synovial tissues). Soft tissue tumors can grow almost anywhere in the human body. Soft
tissue sarcomas, which are the malignant type of STT, are grouped together because they
share certain microscopic characteristics, have similar symptoms, and are generally treated in
similar ways. Radiologists often look for certain features in the MR image to differentiate
benign from malignant STT (Juan et al. (2004); Mutlu et al. (2006)). Although the
signal characteristics of both benign and malignant tumors frequently overlap, some MR
image features are more highly correlated with the benign or the malignant types of STT; see
De Schepper et al. (2000) and De Schepper & Bloem (2007). For example, the most commonly
used individual parameters for predicting malignancy are the inhomogeneity (texture) and
the intensity (gray level) of the MRI signal with different pulse sequences (De Schepper et al.
(2005); Hermann et al. (1992)). Inhomogeneity of the tumor region on T1-weighted MR images
is a very good indicator of the malignancy of the tumor because 90% of malignant tumors
are inhomogeneous and show a disorganized textured pattern of the MRI signal intensity
(Weatherall (1995)). This pattern results from the loss of tissue structure and the changes
in the extracellular matrix (ECM) caused by cancer. The study by Hermann et al.
(1992) reported a sensitivity of 72% and a specificity of 87% in predicting malignancy based
on visual comparison of texture in the tumor regions in T1-MR images. The reason for the
large difference between the sensitivity and the specificity in this study is the difficulty of
perceiving texture in some of the malignant tumors. The limited ability of humans to perceive
and discriminate between textures has been known for quite some time (Julesz (1975); Julesz
et al. (1973)). Computer-aided diagnostic systems can improve the radiologists' performance
in identifying the pathological type (i.e. benign or malignant) of a soft tissue tumor from
MR images (Meinel et al. (2007)). Even though visually comparing the textures of a benign
tumor and a malignant tumor sometimes shows no difference, the numerical values extracted
by texture analysis are quite different. Figure 1 shows subimages of a benign and a malignant
tumor and the values of some of the extracted texture features. Such an example shows that
texture analysis can be used for obtaining information that is not visible to the human eye.
The reader can refer to Materka & Strzelectky (1998), Tuceryan & Jain (1998) and Wagner (1999)
for excellent overviews of texture analysis.
In the last few years there has been growing interest in the use of machine learning classifiers
for analyzing MRI data. The main aim of this chapter is to train and test several machine
learning classifiers with texture analysis features extracted from MR images of soft tissue
tumors. The present chapter will also serve as an introductory tutorial by providing a
systematic procedure to build and evaluate a machine learning classifier that can be used
for practical applications. The typical steps to build a machine learning classifier consist of
feature extraction, feature selection, classifier training and evaluation of the results. Several
studies have tackled the problem of texture analysis for discriminating between benign and
malignant tumors for specific types of malignancy, for example, the brain (Mahmoud-Ghoneim
et al. (2003)), the liver (Jirák et al. (2002)) and the breast (Huang et al. (2006)). However, most
papers did not follow the recommended approach for building machine learning systems (for
an example, see Salzberg (1997)) and left some questions unanswered. This research aims
at answering some questions related to the problem of texture analysis of STT, such as the
classifier's complexity, the effect of the training data set on the classifier's behaviour and the
appropriate size of the training data needed to train a machine learning classifier and
obtain good generalization performance. In the following sections, we will go through the
process of building and testing several machine learning classifiers as shown in Fig. 2.
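To make the workflow concrete, the following minimal sketch wires the four steps of Fig. 2 together in Python with scikit-learn. It is only an illustration under assumed names (a feature matrix X of texture features and a label vector y); it is not the MaZda/Matlab/PRTools tool chain actually used in this study.

```python
# Illustrative pipeline sketch (not the authors' MaZda/PRTools code): scaling,
# feature selection, classifier training and crossvalidated evaluation.
# X is assumed to be an (n_subimages x n_features) array of texture features
# and y the benign/malignant labels -- both are placeholders here.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def build_pipeline(n_features=12):
    """Feature scaling -> feature selection -> SVM classifier."""
    return Pipeline([
        ("scale", StandardScaler()),
        ("select", SelectKBest(mutual_info_classif, k=n_features)),
        ("clf", SVC(kernel="rbf")),
    ])

def estimated_error_rate(X, y, folds=10):
    """Crossvalidated error rate: the quantity estimated throughout the chapter."""
    accuracy = cross_val_score(build_pipeline(), X, y, cv=folds, scoring="accuracy")
    return 1.0 - accuracy.mean()
```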
We warn the reader that the training dataset is not meant to train the classifier per se, as
the name implies, but should be considered as a representative statistical sample from the
population of STT. We assume that the training and testing data samples are randomly,
identically and independently sampled from the population of STT (i.e., it is an i.i.d. sample).
The process of training and testing the classifier is a kind of statistical parameter estimation
problem in which the parameter of interest is the error rate of the classifier on unseen data.
As such, all the experiments in the following sections in fact study how the classifier
performs on other unseen data from the same STT population. To
put a classifier into real practice, the classifier should be trained and tested with several datasets
sampled from the same population with the same procedure as outlined in the following
sections. Once the classifier evaluation is finished, all the available data can be used to train
the final classifier. The classifier should then be comprehensively tested in a prospective
study before being put to clinical use. A shorter preliminary version of this chapter was published
in Juntu et al. (2010).
2. Patients data set and the MR images
A large database of multicenter, multimachine MR images was collected by the University
Hospital Antwerp (UZA) from different radiology centers for the purpose of conducting
scientific research. At the start of this study, there was a real concern that texture features
could be more sensitive to image variation due to imaging with different MRI systems or
changes in MRI acquisition parameters than variation due to changes in texture as a result of
pathological changes. However, a recent study by Mayerhoefer et al. (2005), clearly showed
that the difference in texture features extracted from MR images obtained with different
machine units seems to have only a small impact on the results of tissue discrimination. In this
retrospective study, a database of T1-MR images of 86 patients with benign soft tissue tumors
and 49 patients with malignant tumors was used. All malignant and
benign masses were histologically confirmed. We discarded all MR images that showed severe
imaging artifacts or that were corrupted by a high level of bias field inhomogeneity signal.
Fig. 1. An example of benign and malignant tumor texture
From the tumor regions in the MR images, we cut square subimages of size 50 × 50 pixels for
texture feature computation. The physical size of that area is not fixed but depends on the
image acquisition parameters. However, the actual size of that area will not affect the values
of the extracted features. To increase the size of the training dataset, we selected several tumor
regions from the MR images for every patient. Hence, the total size of the dataset available
for training consisted of 253 benign and 428 malignant subimages of size 50 × 50 pixels each.
In order to preserve texture information, we avoided preprocessing the subimages, with the
exception of histogram equalization, which was applied to all the tumor subimages because
some texture features, such as the first-order texture features, are sensitive to gray-level variation.
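The subimage preparation described above can be sketched as follows; the tumor-region center coordinates are hypothetical inputs (in the study they were selected manually), and the histogram equalization shown is the standard cumulative-histogram mapping, not necessarily the exact implementation that was used.

```python
# Sketch of subimage preparation: cut a 50 x 50 patch around a (hypothetical)
# tumor-region center and histogram-equalize it.
import numpy as np

PATCH = 50  # patch size in pixels

def cut_patch(image, row, col, size=PATCH):
    """Return a size x size subimage centered on (row, col).
    Assumes the center lies at least size // 2 pixels from the image border."""
    half = size // 2
    return image[row - half:row + half, col - half:col + half].copy()

def equalize_histogram(patch, levels=256):
    """Map gray levels through the normalized cumulative histogram."""
    hist, _ = np.histogram(patch, bins=levels, range=(0, levels))
    cdf = hist.cumsum().astype(float)
    cdf = (levels - 1) * cdf / cdf[-1]           # scale CDF to [0, levels - 1]
    return cdf[patch.astype(int)].astype(np.uint8)

# Usage with a synthetic slice standing in for a T1-weighted MR image:
slice_ = (np.random.default_rng(1).random((256, 256)) * 255).astype(np.uint8)
patch = equalize_histogram(cut_patch(slice_, row=120, col=140))
assert patch.shape == (PATCH, PATCH)
```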
3. Texture computation
Texture can be characterized and described in different ways using various sets and
combinations of parameters. Most of the texture feature computation was done using the software
package MaZda 3.20 which allows the computation of texture features based on statistical,
wavelet filtering, and model-based methods of analyzing texture (Castellano et al. (2004)). We
also wrote Matlab programs to calculate some texture features, such as the Haralick texture
features, to have finer control over the parameters that affect the
extracted features. To ensure the consistency of the calculated texture features across all the
tumor subimages, we wrote a MaZda macro script that reads the tumor subimages and
calculates tumor texture with the same texture analysis parameters setting. The extracted
texture features were saved in a text file for feature selection and classification. The following
is a short description of the texture features that were computed from the tumor subimages,
which are also summarized in Table 1 for easy reference:
First order statistics: extract texture statistics based on a function of a single pixel. The
simplest approach is to construct a histogram for the image of interest. The histogram is
converted into a probability function by dividing the values in the histogram by the total
number of pixels in the image. A set of statistical parameters, such as the mean, the variance,
the skewness, and the kurtosis, is calculated from the resulting probability density function.
Fig. 2. Block diagram of the chapter
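As a small illustration, the first-order features listed above can be computed with numpy and scipy as sketched below; the feature names follow the text rather than MaZda's output labels.

```python
# Sketch of the first-order (histogram-based) texture features.
import numpy as np
from scipy import stats

def first_order_features(patch):
    x = patch.astype(float).ravel()
    p01, p10, p50, p90, p99 = np.percentile(x, [1, 10, 50, 90, 99])
    return {
        "mean": x.mean(),
        "minimum": x.min(),
        "variance": x.var(),
        "skewness": stats.skew(x),
        "kurtosis": stats.kurtosis(x),
        "perc01": p01, "perc10": p10, "perc50": p50, "perc90": p90, "perc99": p99,
    }
```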
Second order statistics: the Haralick’s texture features and the absolute gradient distribution
are used in this study. In this method of texture analysis the correlation between two
or more neighborhood pixels is taken into account. Since complex texture patterns are
formed by the interaction between more than one pixel, second order statistics might
provide extra texture information that cannot be extracted from the first order statistics
of the texture. Haralick's texture analysis (Haralick et al. (1973)) is probably the most
famous of the second order texture analysis techniques. It is based on the calculation
of statistics from a function of two variables that measures the probability of occurrence
of a pair of pixels separated by a distance d at an angle θ. We calculated 11
different Haralick features from the co-occurrence matrix. The co-occurrence matrix is
calculated for every pair of pixels inclined at an angle θ and separated by a distance d. To
take the scaling and rotation of texture into account, we calculated the Haralick features
from the co-occurrence matrices computed with angles {0°, 45°, 90°, 135°} and distances of
{1, 2, 3, 4, 5} pixels. The absolute gradient texture features are also included to incorporate
texture features that are invariant to gray-level scaling caused by bias field inhomogeneity.
Every pixel in the image was replaced by its absolute gradient, which was calculated from
a 3 × 3 window around the pixel as the square root of the summed squared differences
between the pixels above and below the center pixel and between the two pixels to its right
and left. Doing this for all pixels results in a gradient image from which several statistical
parameters can be obtained: the mean, the variance, the skewness, and the kurtosis.
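Before turning to the higher-order features, the sketch below shows a gray-level co-occurrence matrix for a single (distance, angle) offset together with a few of the Haralick statistics named above. It is a simplified illustration rather than the MaZda implementation; a full run would loop over the four angles and five distances exactly as described in the text.

```python
# Sketch of a gray-level co-occurrence matrix (GLCM) and a few Haralick statistics.
import numpy as np

def glcm(patch, dr, dc, levels=32):
    """Normalized co-occurrence matrix for pixel pairs offset by (dr, dc)."""
    # quantize the patch to `levels` gray levels (assumes a non-constant patch)
    img = (patch.astype(float) / (patch.max() + 1e-12) * (levels - 1)).astype(int)
    P = np.zeros((levels, levels))
    rows, cols = img.shape
    for r in range(max(0, -dr), min(rows, rows - dr)):
        for c in range(max(0, -dc), min(cols, cols - dc)):
            P[img[r, c], img[r + dr, c + dc]] += 1
    return P / P.sum()

def haralick_subset(P):
    i, j = np.indices(P.shape)
    return {
        "angular_second_moment": (P ** 2).sum(),
        "contrast": ((i - j) ** 2 * P).sum(),
        "inverse_difference_moment": (P / (1.0 + (i - j) ** 2)).sum(),
        "entropy": -(P[P > 0] * np.log(P[P > 0])).sum(),
    }

# Example: distance d = 1 at angle 0 degrees corresponds to the offset (0, 1).
# features = haralick_subset(glcm(patch, dr=0, dc=1))
```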
Higher order statistics: used to capture texture information that depends on the interaction
between several neighborhood pixels. We selected two different approaches. The first is the
run-length gray-level matrix approach, where consecutive runs of pixels with the same
gray-level value are counted and the result is stored in a 2D matrix indexed by the gray-level
value and the length of the gray-level run; several statistics are calculated from this 2D matrix.
The second is the model-based approach, in which a mathematical function or model that
describes the texture is written down, for example the autoregressive texture model. The basic
idea of autoregressive models for texture is to express the gray level of a pixel as a function of
the gray levels of its neighborhood pixels (Mao & Jain (1992)). The related model parameters
for one image are calculated using a least squares technique and are used as texture features.
This approach is similar to Markov random fields.
Filtering method: the image is split into subbands with bandpass filters such as the wavelet
transform. The energies of the subbands are used as texture features.
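The wavelet-energy features can be sketched as follows, assuming the PyWavelets package is available; the Haar wavelet and the two decomposition levels are illustrative choices, not necessarily the settings used by MaZda.

```python
# Sketch of the filtering-based features: subband energies of a 2D wavelet transform.
import numpy as np
import pywt

def wavelet_energies(patch, wavelet="haar", level=2):
    coeffs = pywt.wavedec2(patch.astype(float), wavelet=wavelet, level=level)
    energies = {"approximation": float((coeffs[0] ** 2).sum())}
    for lvl, (cH, cV, cD) in enumerate(coeffs[1:], start=1):
        for name, band in zip(("H", "V", "D"), (cH, cV, cD)):
            energies[f"level{lvl}_{name}"] = float((band ** 2).sum())
    return energies
```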
After the texture analysis step, each tumor subimage is encoded by a feature vector as shown
in Fig. 3. The texture features are labeled {f1, f2, ..., f290} (see Table 1).
Fig. 3. Texture analysis features
4. Feature selection
Feature selection was used to remove redundant features. This step is very important because
it improves the performance of the learning models and reduces the effect of the curse of
dimensionality. Feature selection also speeds up the learning process and improves model
interpretability. Deciding which features to keep, because they are relevant, and which to
discard is largely dependent on the context. To perform an unbiased feature selection, we
tested several feature selection techniques. We experimented with the feature selection
methods described below.
Methods                              Calculated parameters
First order: {f1, ..., f10}
  histogram                          mean, minimum, variance, skewness, kurtosis,
                                     1%, 10%, 50%, 90% and 99% percentiles.
Second order: {f11, ..., f250} & {f271, ..., f277}
  co-occurrence matrix               angular second moment, contrast, sum of squares,
  (angles θ = 0°, 45°, 90°, 135°     inverse difference moment, sum average, correlation,
   and distances d = 1, 2, 3, 4, 5)  entropy, difference variance, difference entropy.
  absolute gradient distribution     mean of absolute gradient, variance of absolute gradient,
                                     skewness of absolute gradient, kurtosis of absolute gradient.
Higher order: {f251, ..., f270} & {f278, ..., f282}
  run-length gray-level matrix       short run emphasis moment, long run emphasis moment,
                                     run length nonuniformity, fraction of image in runs.
  autoregressive texture model       θ1, θ2, θ3, θ4, σ.
Filtering technique: {f283, ..., f290}
  wavelet                            energies of wavelet coefficients of subbands at successive scales.
Table 1. Texture analysis methods used in this study and the corresponding texture features
Unsupervised feature selection techniques: these methods do not use the class labels, and the
selected features are strongly dependent on the sample distribution of the pixel gray-level
values. We selected texture feature subsets by forward, backward, bidirectional, and
greedy stepwise search methods and by two feature ranking methods, namely, the chi-squared
statistic and the information gain criterion.
Supervised selection techniques: these techniques use the class labels to guide the feature
selection process; thus, the selected features are the ones that improve the discrimination
between benign and malignant tumors. We used the C4.5 decision tree algorithm and the
support vector machine as wrappers.
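The sketch below approximates these two families of methods with scikit-learn (assuming a version recent enough to provide SequentialFeatureSelector); mutual information stands in for the information gain criterion, so it is not the exact tool chain used in the study.

```python
# Sketch of wrapper-style search and filter-style ranking for feature selection.
from sklearn.feature_selection import (SequentialFeatureSelector, SelectKBest,
                                       chi2, mutual_info_classif)
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import MinMaxScaler

def search_subset(X, y, direction="forward", k=12):
    """Greedy forward/backward search wrapped around a simple Bayes classifier."""
    sfs = SequentialFeatureSelector(GaussianNB(), n_features_to_select=k,
                                    direction=direction, cv=5)
    return sfs.fit(X, y).get_support(indices=True)

def rank_subset(X, y, score="chi2", k=12):
    """Filter-style ranking; chi2 requires non-negative inputs, hence the scaling."""
    if score == "chi2":
        Xs = MinMaxScaler().fit_transform(X)
        return SelectKBest(chi2, k=k).fit(Xs, y).get_support(indices=True)
    return SelectKBest(mutual_info_classif, k=k).fit(X, y).get_support(indices=True)
```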
Table 2 lists all the feature selection techniques that were tested in this study and their selected
feature subsets. It is not surprising that the 8 feature selection methods selected different
feature subsets, because each one has a different measure of feature relevance. However,
feature selection methods that belong to the same group generally selected similar
features. The selected feature subsets were used as input to a simple Bayes classifier
to evaluate the efficacy of the texture feature subsets. The results of the classification are
listed in Table 2. We also listed the classification accuracy (Acc%), the true positive rate (TP),
the true negative rate (TN) and the Area Under the Curve (AUC) of the ROC. The measure that
is generally recommended to use is the AUC, since it is a global measure and insensitive to
the data distribution. In the last row of Table 2, we included the performance of the Bayes
classifier using the full texture feature set for comparison. Looking at Table 2, one can notice
that the classification results with the feature subsets selected by the feature ranking methods
are worse than classification using the full texture feature set, since their AUC values are 0.72 and
0.75, respectively, while classification with the full feature set has an AUC value of 0.78. The
best texture feature subset was taken to be the one with the highest AUC value; this was the
subset selected by the forward selection method, and it was used for training and
testing the classifiers.
5. The trained classifiers
The main purpose of the training data is to infer a mathematical decision function or an
algorithm for making predictions. To this end, a given training data set is used to optimize the
parameters of a machine learning classifier, which then results in a simple mathematical
function or expression that can be used for making predictions. If the same classifier is trained
Method                               Best selected features                                               Acc%   TP    TN    AUC
Forward selection                    f4, f6, f7, f8, f66, f169, f255, f263, f274, f279, f282, f286        76.80  0.80  0.74  0.87
Backward selection                   f4, f6, f7, f8, f114, f253, f263, f274, f279, f281, f282, f286       77.70  0.80  0.74  0.85
Bidirectional search                 f4, f6, f7, f8, f66, f169, f255, f263, f274, f279, f282, f286        77.10  0.79  0.73  0.86
Greedy stepwise search               f4, f6, f7, f8, f66, f253, f263, f274, f279, f282, f286              78.00  0.83  0.69  0.83
Ranking with chi-squared statistic   f7, f16, f37, f45, f46, f52, f251, f253, f255, f263, f265, f268      67.99  0.65  0.73  0.72
Ranking with information gain        f7, f16, f37, f45, f46, f52, f251, f253, f254, f255, f268, f282, f286  65.34  0.56  0.81  0.75
C4.5 decision tree wrapper           f6, f21, f38, f49, f56, f64, f118, f164, f253                        70.77  0.70  0.73  0.78
Best features with SVM wrapper       f5, f6, f13, f98, f172, f178, f216, f217, f256                       78.00  0.86  0.64  0.84
Full texture feature set             f1, f2, ..., f290                                                    73.71  0.74  0.73  0.78
Table 2. Bayes classifier results for the best selected texture feature subsets
on different training data drawn independently and identically from the same problem
domain, we expect to obtain a decision function with a similar performance. If the classifier
performance remains the same regardless of the specific training dataset used, then the
classifier has learned how to differentiate benign from malignant tumors from the training
data. However, if the classifier performance changes considerably when the training
dataset is changed, then that classifier cannot be used for prediction. In principle, the decision
function (i.e. the classifier) cannot be made completely independent of the structure of
the training data and the complexity of the learning algorithm. To isolate all contributing
factors that might interfere with training the classifier and to minimize the bias in the stated
results, we systematically applied several machine learning evaluation strategies. First, we
trained several classifiers that belong to different machine learning algorithms on the same
texture features data. The selected classifiers were trained with a crossvalidation procedure to
make better use of the training data. The crossvalidation procedure also helps to minimize the
effect of the probability distribution of a specific training dataset on the classifier performance.
Second, we studied the effect of changing the size of the training data set on the classifiers'
performance by plotting learning curves that show the error rate of the trained classifiers
as a function of the size of the training data set. Third, we used some statistical tests
to compare the classifiers' performance. We also plotted the ROC (Receiver
Operating Characteristic) curves and the Cost curves to analyze the classifiers' performance. Finally, we
applied the McNemar’s statistical test to compare the performance of the best classifier against
the radiologists’ performance.
From several machine learning algorithm groups, we selected the following classifiers:
Linear classifier: This classifier assumes that the benign and the malignant classes have the
same covariance matrix but different means. It estimates the covariance matrix from the
full training data and assigns a new case to the class with the highest probability. Such a
classifier is able to separate benign and malignant tumors by a simple linear decision
surface. The probability distribution of the full training dataset is assumed to be normally
distributed.
Quadratic classifier: This classifier is more complex than the linear classifier since it estimates
separate mean vectors and covariance matrices for the benign and the malignant classes.
Such a classifier is able to separate the benign and the malignant tumors by a quadratic
nonlinear decision surface. The probability distributions of the benign and the malignant
classes are assumed to be normal, but not necessarily with the same covariance
matrices.
Nonparametric density estimation classifiers: the Parzen classifier and the k-nearest neighbor
(k-NN) classifier. Both classifiers estimate the empirical probability density functions of the benign
and the malignant classes from the training data instead of assuming a particular probability
distribution function, as the linear and quadratic classifiers do.
Decision tree classifier: such a classifier uses logical rules to separate the benign from the
malignant tumors regardless of the probability distribution of the training data.
Back-propagation neural network: the NN classifier separates the tumors by a highly nonlinear
decision surface. The neural network uses an iterative optimization algorithm to find the
weights of the neural network from the training data.
Support vector machine classifier: The SVM classifier simplifies the classification problem by
transforming the input space into a high-dimensional space in which the classification
problem becomes linear and easier to solve. The SVM classifier does not depend on
the probability distribution of the training dataset and has the ability to generalize quite
well for classification problems of varied degrees of complexity. During the training
process, a quadratic optimization algorithm is used to iteratively adjust the complexity of
the decision function to adapt to the problem domain.
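For readers who want to reproduce a comparable experiment, scikit-learn counterparts of these classifier families could be set up as sketched below; the original study used the PRTools Matlab toolbox, and the hyperparameters shown here are illustrative assumptions (a Parzen-window classifier has no direct scikit-learn class but can be built from per-class kernel density estimates).

```python
# Illustrative scikit-learn counterparts of the classifier families compared here.
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

classifiers = {
    "linear": LinearDiscriminantAnalysis(),          # shared covariance matrix
    "quadratic": QuadraticDiscriminantAnalysis(),    # per-class covariance matrices
    "k-NN": KNeighborsClassifier(n_neighbors=5),     # nonparametric
    "decision tree": DecisionTreeClassifier(),       # logical rules
    "neural network": MLPClassifier(hidden_layer_sizes=(12, 12), max_iter=2000),
    "SVM (RBF)": SVC(kernel="rbf", C=1.0, gamma="scale"),
}
```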
In the following sections, we describe several tests that were performed to study the effect
of the size of the training data set on the classifier performance. Additionally, we tested
the complexity of the decision function, analyzed the classifier performance and statistically
compared the performance of two classifiers. Finally, we tested the classifier performance
against the radiologists’ performance.
6. The size of the training data and the classifiers' performance
The classifier learns the classification function from the training data. The training data
represents a small sample from the population of soft tissue tumors and hence the size of
the training data has an impact on the trained classifier. We ran the learning curve test
to study the effect of the size of the training data set on the classifier performance. Using
a small subset of the training data, we tuned the parameters for each classifier as follows.
The back-propagation neural network has two hidden layers, an input layer of 12 nodes (i.e., the
number of texture features selected by the forward selection method) and an output layer
with two nodes corresponding to the benign and the malignant classes. The SVM classifier
was trained with an RBF kernel tuned with a grid search algorithm, which resulted in a kernel
bandwidth σ = 10000 and a cost coefficient C = 1.0. We used the PRTools 4.0 Matlab toolbox to run
this experiment. We left the parameters of the decision tree and the Parzen classifier at their
default values, which forces the PRTools toolbox to tune them automatically to their best
values. We trained the 7 classifiers with different sizes of the training data set. At each specific
size of the training data set, we measured the error rate of all the classifiers. For each specific
size of the training data, we repeated the experiment 10 times and the average error rate
was calculated. Figure 4 shows the learning curves of the 7 trained classifiers. The learning
curves show some interesting facts about the problem domain. First, the learning curves are
smooth, which is a good indicator of the classifiers' stability against changes in the training data
distribution. The smoothness of the learning curves is also a necessary condition for carrying out
some of the statistical tests that we used to compare the classifiers' performance (Dietterich (1998)).
Second, the 7 classifiers learned very well with few training samples. Most classifiers achieved
error rates between 0.198 and 0.251 after training with as few as 50 training samples.
As we increase the size of the training data set beyond 50 samples, the error rate decreases
very slowly. This observation indicates that a small training data set is sufficient
to obtain good generalization performance. Increasing the size of the training set beyond a certain
limit seems to have little impact on improving the classifiers' performance any further. The
third observation is related to the complexity of the classifiers. Simple classifiers such as the
k-nearest neighbor (k-NN) classifier and the SVM with an RBF kernel with a large bandwidth
achieved lower error rates than the neural network classifier. This observation is an
indication that the decision surface separating the benign from the malignant tumors based
on texture features is a very simple mathematical function, which we investigate further in the
following section. Classification problems that produce linear or simple decision functions
are less likely to overfit the training data and often generalize and predict very well on unseen
data.
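A learning-curve experiment of this kind can be sketched with scikit-learn as follows; `classifiers` refers to the dictionary sketched in Section 5, and the split sizes and number of repetitions are illustrative assumptions rather than the exact PRTools settings.

```python
# Sketch of a learning curve: mean error rate versus number of training samples.
import numpy as np
from sklearn.model_selection import learning_curve, ShuffleSplit

def error_learning_curve(clf, X, y, sizes=np.linspace(0.1, 1.0, 8)):
    cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)  # 10 repetitions
    n_train, _, test_scores = learning_curve(clf, X, y, train_sizes=sizes,
                                             cv=cv, scoring="accuracy")
    return n_train, 1.0 - test_scores.mean(axis=1)   # mean error rate per size

# for name, clf in classifiers.items():
#     sizes, errors = error_learning_curve(clf, X, y)
```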
Fig. 4. The learning curves of the 7 trained classifiers
7. The complexity of the decision function
The learning curves from the last section showed that classifiers which produce simple
decision functions generalize better since they have the smallest error rate on the testing
samples. To check this conclusion, we ran a test using an SVM classifier with a polynomial
kernel that produces a polynomial decision function with a varied degree of complexity. We
varied the degree of the polynomial kernel gradually from 1 to 20, and at each degree of the
polynomial, we ran the experiment 10 times using a crossvalidation procedure. Each point
in the learning curves is the average of the error rates of ten different experiments. Figure 5
shows the error rate of the polynomial classifier versus the degree of the polynomial kernel
function. The plot clearly shows that the error rate is at a minimum for a polynomial decision
function of the 4th degree. The error rates of the linear classifier (a 1st degree polynomial) and
the quadratic classifier (a 2nd degree polynomial) are large since they under-fit the training
data. Polynomial classifiers of degree higher than 4 also have high error rates since they
overfit the training data. This explains why, in Fig. 4, the simple linear classifier and the
neural network classifier both have high error rates compared to the other classifiers: the
linear classifier is too simple and the neural network classifier is too complex for the problem
domain. It also explains why the SVM classifier has a good classification performance: it is
very flexible and can adapt to classification problems of varied complexity.
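The complexity experiment can be sketched as below: the degree of a polynomial-kernel SVM is swept from 1 to 20 and the crossvalidated error rate is recorded at each degree; kernel parameters other than the degree are illustrative assumptions.

```python
# Sketch of the model-complexity sweep over the polynomial kernel degree.
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def degree_sweep(X, y, degrees=range(1, 21)):
    errors = []
    for d in degrees:
        clf = SVC(kernel="poly", degree=d, coef0=1.0)  # inhomogeneous polynomial kernel
        accuracy = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
        errors.append(1.0 - accuracy.mean())
    return list(degrees), errors    # on the STT data the minimum was near degree 4
```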
Fig. 5. The error rate versus the complexity of a polynomial classifier
8. Analyzing the classifiers' performance
To gain more insight into the classifiers’ performance, we trained the 7 classifiers using the
full data set with a 10-fold crossvalidation procedure. In Fig. 6 and Fig. 7, we plotted the
ROC curves and the Cost curves of the 7 classifiers. In the ROC curves plot, the best curves
are at the top of the plot. In the ROC curves, we see that the classifiers are ranked, in order
of increasing performance, as follows: the decision tree, the neural network, the linear
classifier, the quadratic classifier and the k-NN classifier. However, there is an ambiguity
about the ranking of the Parzen and SVM classifiers because their ROC curves intersect. In
the Cost-curve plot, the classifiers are ranked in the same order as the ROC curves. However,
this time the curves of the best classifiers are at the bottom of the plot. The Cost-curves of the
Parzen classifier and the SVM classifier have the same normalized expected cost value for a
probability cost function (PCF) between 0.45 and 0.75, where both curves intersect. For values of
PCF < 0.45 the SVM classifier performs better than the Parzen classifier, while for values of
PCF > 0.75 the Parzen classifier performs better. In other words, both classifiers
perform equally well if the cost of misclassifying benign and malignant tumors is kept the same.
However, if we were to change the misclassification costs, for example, by assigning a higher
cost to missing malignant tumors than to missing benign tumors, then the two classifiers
would perform differently (see Holte & Drummond (2011)). The latter observation explains
the overlapping performance of the SVM and Parzen classifiers, which the ROC curves alone
could not resolve.
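An ROC analysis of this kind can be sketched with scikit-learn as follows; cost curves need dedicated code, such as the software acknowledged at the end of the chapter. The 0/1 label coding and the use of decision_function are assumptions of this sketch.

```python
# Sketch of ROC computation from crossvalidated decision scores.
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_curve, roc_auc_score

def roc_for_classifier(clf, X, y):
    """y is assumed coded 0 = benign, 1 = malignant; clf must expose
    decision_function (use method='predict_proba' otherwise)."""
    scores = cross_val_predict(clf, X, y, cv=10, method="decision_function")
    fpr, tpr, _ = roc_curve(y, scores)
    return fpr, tpr, roc_auc_score(y, scores)
```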
Fig. 6. ROC curves of the trained classifiers
Fig. 7. Cost curves of the trained classifiers
9. Statistical comparison between two classifiers
Classifier performance is a function of several factors including the statistical distribution
of the training and testing data, the internal structure of the classifier and the inherent
randomness in the training process. Even if we train two different classifiers with the same
dataset, their classification error rates will not necessarily be the same. That is because the
classifiers are trained with different algorithms, different optimization criteria and different
parameter settings. The most effective way to compare classifiers is to empirically train
and test the classifiers using multiple training and testing data. This procedure is repeated
several times and then some statistical tests should be applied to assess their performance.
Dietterich (1998) described a 5 × 2 cv algorithm that can be used to statistically compare the
performance of two machine learning classifiers on the same classification problem. The name
of the test is an abbreviation for "5 iterations 2-fold crossvalidation paired t-Test". The same test
can be used to check if one classifier outperforms another classifier on a specific classification
task. Let D be a dataset that is divided into five folds F1, F2, ..., F5, and let A and B be the two
classifiers whose performance is to be compared. Let p_i^{(j)} stand for the difference in errors
between the two classifiers in fold j of replication i. Then, the steps of the algorithm are
as follows:
- divide the first fold F1 into two equal-sized parts t1 and t2. Train both classifiers A and B
  using t1 and test them using t2 to obtain two error estimates e_A^{(1)} and e_B^{(1)}. Calculate the
  difference in errors p^{(1)} = e_A^{(1)} - e_B^{(1)};
- swap t1 and t2 such that the classifiers are trained with t2 and tested with t1. Re-train both
  classifiers and calculate the new errors and the new difference in errors p^{(2)} = e_A^{(2)} - e_B^{(2)};
- for this crossvalidation run, calculate the mean \bar{p} = (p^{(1)} + p^{(2)})/2 and the variance
  s^2 = (p^{(1)} - \bar{p})^2 + (p^{(2)} - \bar{p})^2;
- repeat the same procedure for the remaining folds {F2, ..., F5}.
Let p_1^{(1)} denote the difference p^{(1)} from the first run, and let s_i^2 denote the estimated variance
for run i, i = 1, ..., 5. Calculate the \tilde{t}-statistic using:

\tilde{t} = \frac{p_1^{(1)}}{\sqrt{\frac{1}{5}\sum_{i=1}^{5} s_i^2}}    (1)
Note that only one of the ten differences is used in the above expression. Dietterich (1998)
has shown that, under the null hypothesis, \tilde{t} is approximately t-distributed with 5 degrees
of freedom. The test can be used to check whether two constructed classifiers have a similar
error rate on new examples. The null hypothesis states that the two classifiers have the same
error rate and the alternative hypothesis states that they have different error rates. We reject
the null hypothesis with 95 percent confidence if \tilde{t} is larger than the tabulated t-statistic.
Note that there are 10 different values that can be placed in the numerator of Eq. (1), leading to
10 possible statistics. Selecting different values for the numerator of Eq. (1) should not affect the
results of the test. In practice, this is not always the case, as shown by Alpaydin (1999), who
proposed a modified test called the combined 5 × 2 cv test. The modified Dietterich test combines
the results of the 10 possible statistics and uses more degrees of freedom, which promises to
be more robust and to have better statistical power than the original Dietterich test. The new test
calculates:

\tilde{f} = \frac{\sum_{i=1}^{5} \sum_{j=1}^{2} \left(p_i^{(j)}\right)^2}{2 \sum_{i=1}^{5} s_i^2} \sim F_{10,5}    (2)

and tests the estimated \tilde{f} against the F-statistic with 10 and 5 degrees of freedom. Reject the
null hypothesis if \tilde{f} is larger than the tabulated F-statistic value (i.e., F = 4.74); otherwise,
accept the null hypothesis.
Exp#   e_A^{(1)}   e_B^{(1)}   p^{(1)}    e_A^{(2)}   e_B^{(2)}   p^{(2)}    s^2
1      0.3853      0.1618      0.2235     0.3588      0.2029      0.1559     0.0023
2      0.3382      0.1735      0.1647     0.1353      0.1706     -0.0353     0.0200
3      0.4265      0.1794      0.2471     0.3176      0.2000      0.1176     0.0084
4      0.3824      0.1735      0.2088     0.3618      0.1529      0.2088     0.0
5      0.3912      0.1794      0.2118     0.3529      0.1647      0.1882     0.0003
Table 3. Error rates, differences and variances s^2 of the SVM classifier (A) and the Parzen
classifier (B) using 5 × 2-fold crossvalidation on the tumors' texture features.
We selected two classifiers from Fig. 7, namely, the SVM and the neural network classifiers.
We ran the test to check whether both classifiers have similar performance or different
performance. The results of running the 5-iteration 2-fold crossvalidation algorithm are
summarized in Table 3. Using Eq. (2), we calculated \tilde{f} = 5.58, which is larger than the
theoretical F-statistic value. Hence, the null hypothesis that both classifiers have similar
error rates was rejected. Therefore, according to the combined 5 × 2 cv test, the SVM classifier
had better performance than the neural network classifier with 95% statistical confidence.
In conclusion, the test shows that some classifiers can have better performance than other
classifiers when trained with the same training dataset.
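The statistics of this section can be reproduced directly from the values reported in Table 3, as sketched below; the code recovers a combined statistic of about 5.6 (the chapter reports 5.58, the small difference coming from the rounding of the tabulated values) and the F(10, 5) critical value of about 4.74.

```python
# Recomputing the 5 x 2 cv statistics of Eqs. (1) and (2) from Table 3.
import numpy as np
from scipy import stats

p = np.array([[0.2235,  0.1559],
              [0.1647, -0.0353],
              [0.2471,  0.1176],
              [0.2088,  0.2088],
              [0.2118,  0.1882]])       # differences p_i^(1), p_i^(2) from Table 3
s2 = 2 * p.var(axis=1)                  # (p1 - pbar)^2 + (p2 - pbar)^2 per replication

t_stat = p[0, 0] / np.sqrt(s2.mean())          # Eq. (1)
f_stat = (p ** 2).sum() / (2 * s2.sum())       # Eq. (2)
f_crit = stats.f.ppf(0.95, 10, 5)              # ~4.74, the tabulated value

print(f"t = {t_stat:.2f}, combined f = {f_stat:.2f}, F(10,5) critical = {f_crit:.2f}")
```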
10. Machine learning versus radiologists' performance
An important question is how machine learning classifiers perform compared to radiologists.
In the previous section, we used the modified 5 × 2 cv Dietterich test to compare two
classifiers. However, we cannot use the same test to compare a classifier's performance against
the radiologists' diagnosis, since the radiologists' results cannot be repeated. Instead, we applied
McNemar's test (Alpaydin (2001)). To apply McNemar's test, we first have to express
the results of the radiologists and the SVM classifier as depicted in Table 4.

N00: number of examples misclassified by both the classifier and the radiologists
N01: number of examples misclassified by the classifier but not by the radiologists
N10: number of examples misclassified by the radiologists but not by the classifier
N11: number of examples correctly classified by both
Table 4. The contingency table used to perform McNemar's test

Second, we construct two hypotheses: the null hypothesis H0 is that there is no difference between
the error rates (or accuracies) of the radiologists and the classifier, and the alternative hypothesis
H1 is that the radiologists and the classifier have different performance. If the null hypothesis
is correct, then the expected count for each of the off-diagonal entries in Table 4 is (N01 + N10)/2.
The discrepancy between the expected and the observed counts is measured by the following
statistic:

\tilde{\chi}^2 = \frac{\left(|N_{01} - N_{10}| - 1\right)^2}{N_{01} + N_{10}},    (3)

which is approximately \chi^2-distributed with 1 degree of freedom. First, we ran several
experiments to find an optimal classifier. The best classifier so far was the SVM classifier.
The results of the SVM classifier against the radiologists are summarized in Table 5. Using
Eq. (3), we obtained \tilde{\chi}^2 = 12.85, which is larger than the tabulated value \chi^2 = 3.84.
Hence, we rejected
the null hypothesis that both the radiologists and the SVM classifier have similar error rates.

N00 = 39            N01 = 16             N00 + N01 = 55
N10 = 45            N11 = 581            N10 + N11 = 626
N00 + N10 = 84      N01 + N11 = 597      N = 681
Table 5. The contingency table constructed for McNemar's test

Fig. 8. The SVM and the radiologists' confusion matrices
Therefore, the SVM seems to perform slightly better than the radiologists. This last conclusion
should, however, be taken with a grain of salt because it is based on statistical analysis of the
SVM classifier with a limited training data set that does not represent the full distribution of
the soft tissue tumors.
McNemar's test does not tell us about the strength of the agreement or disagreement between
the radiologists and the SVM classifier. To validate the previous test, we therefore evaluated
the kappa statistic (κ = 0.5), which is larger than 0 and thus supports the result of the
McNemar's test. Finally, the confusion matrices of the SVM classifier and of the radiologists
are both shown in Fig. 8.
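McNemar's test of this section can be reproduced from the counts in Table 5 with a few lines of Python, as sketched below; the 95% critical value of the chi-squared distribution with one degree of freedom is taken from scipy.

```python
# Recomputing the McNemar statistic of Eq. (3) from Table 5.
from scipy.stats import chi2

N01, N10 = 16, 45                                     # off-diagonal counts from Table 5
statistic = (abs(N01 - N10) - 1) ** 2 / (N01 + N10)   # Eq. (3), about 12.85
critical = chi2.ppf(0.95, 1)                          # about 3.84

print(f"McNemar statistic = {statistic:.2f}, critical value = {critical:.2f}")
print("reject H0" if statistic > critical else "fail to reject H0")
```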
11. Conclusions
We demonstrated that texture analysis of soft tissue tumors combined with machine learning
algorithms can be used as a tool for objective evaluation of MR images, and the results correlate
well with the laboratory results. We ran several tests and came up with some interesting
observations related to the problem of texture analysis of soft tissue tumors. First, texture
features combined with machine learning algorithms seem to perform as well as radiologists,
since a computer can extract more information related to signal homogeneity in T1-MRI than
humans can based only on visual perception. Second, we do not need a large training data set
to train a machine learning classifier and obtain a good classification performance, since texture
features correlate very well with the pathology of the tumor. Moreover, simple classifiers such as
a Parzen classifier or an SVM classifier can effectively separate benign from malignant tumors.
12. Acknowledgments
Thanks to the University Hospital Antwerp (UZA), Dept. of Radiology for providing the MR
images. The authors would like to thank Prof. Robert Holte for providing the Cost Curve
software.
13. References
Alpaydin, E. (1999). Combined 5 × 2 cv F test for comparing supervised classification learning
algorithms, Neural Computation 11(8): 1885–1892.
Alpaydin, E. (2001). Assessing and comparing classification algorithms.
Castellano, G., Bonilha, L., Li, L. & Cendes, F. (2004). Texture analysis of medical images,
Clinical Radiology 59: 1061–1069.
De Schepper, A. M. & Bloem, J. L. (2007). Soft tissue tumors : grading, staging, and
tissue-specific diagnosis, Topics in Magnetic Resonance Imaging 18(6): 431–444.
De Schepper, A. M., De Beuckeleer, L., Vandevenne, J. & Somville, J. (2000). Magnetic
resonance imaging of soft tissue tumors, European Radiology 10(2): 213–223.
De Schepper, A., Vanhoenacker, F., Parizel, P. & Gielen, J. (eds) (2005). Imaging of Soft Tissue
Tumors, 3rd edn, Springer.
Dietterich, T. G. (1998). Approximate statistical tests for comparing supervised classification
learning algorithms, Neural Computation 10(7): 1895–1923.
Haralick, R.M., Shanmugan, K. & Dinstein, I. (1973). Textural features for image classification,
IEEE Transactions on Systems, Man and Cybernetics 3(6): 610–621.
Hermann, G., Abdelwahab, I., Miller, T., Kelin, M. & Lewis, M. (1992). Tumor and tumor-like
conditions of the soft tissue: Magnetic resonance imaging features differentiating
benign from malignant masses, Br J Radiol 65: 14–20.
Holte, R. C. & Drummond, C. (2011). Cost-sensitive classifier evaluation using cost
curves, Proceedings of The 24th Florida Artificial Intelligence Research Society Conference
(FLAIRS-24).
Huang, Y., Wang, K. & Chen, D. (2006). Diagnosis of breast tumors with ultrasonic
texture analysis using support vector machines, Neural Computing & Applications
15(2): 164–169.
Jirák, D., Dezortová, M., Taimr, P. & Hájek, M. (2002). Texture analysis of human liver, Journal
of Magnetic Resonance Imaging 15(1): 68–74.
Juan, M., García-Gómez, Vidal, C., Luis Martí-Bonmat, Joaquín, G. & et al. (2004).
Benign/malignant classifier of soft tissue tumors using MR imaging, Magnetic
Resonance Materials in Physics, Biology and Medicine 16: 194–201.
Julesz, B. (1975). Experiments in visual perception of texture, Sci Am 232: 34–43.
Julesz, B., Gilbert, E., Shepp, L. & Frisch, H. (1973). Inability of humans to discriminate
between visual textures that agree in second-order statistics, Perception 2: 391–405.
Juntu, J., Sijbers, J., De Backer, S., Rajan, J. & Van Dyck, D. (2010). Machine learning
study of several classifiers trained with texture analysis features to differentiate
benign from malignant soft-tissue tumors in T1-MRI images, J. Magn. Reson. Imaging
31(3): 680–689.
Mahmoud-Ghoneim, D., Toussaint, G. & Jean-Marc, C. (2003). Three dimensional texture
analysis in MRI: a preliminary evaluation in gliomas, Magnetic Resonance Imaging
21(9): 983–987.
Mao, J. & Jain, A. K. (1992). Texture classification and segmentation using multiresolution
simultaneous autoregressive models, Pattern Recognition 25(2): 173 – 188.
Materka, A. & Strzelectky, M. (1998). Texture analysis methods – a review, Technical University
of Lodz, COST B11 technical report 11: 873–887.
Mayerhoefer, M. E., Breitenseher, M. J., Kramer, J., Aigner, N., Hofmann, S. & Materka, A.
(2005). Texture analysis for tissue discrimination on T1-weighted MR images of knee
joint in a multicenter study: Transferability of texture features and comparison of
feature selection methods and classifiers, J Mag Reson Imaging 22: 674–680.
Meinel, L. A., Stolpen, A. H., Berbaum, K. S., Fajardo, L. L. & Reinhardt, J. M. (2007).
Breast MRI lesion classification: Improved performance of human readers with a
backpropagation neural network computer-aided diagnosis (CAD) system, Journal of
Magnetic Resonance Imaging 25(1): 89 –95.
Mutlu, H., Silit, E., Pekkafali, Z., Basekim, C., Ozturk, E., Sildiroglu, O., Kizilkaya, E. & Karsli,
A. (2006). Soft-tissue masses: Use of a scoring system in differentiation of benign and
malignant lesions, Clinical Imaging 30(1): 37–42.
Salzberg, S. L. (1997). On comparing classifiers: Pitfalls to avoid and a recommended
approach, Data Mining and Knowledge Discovery 1: 317–327.
Tuceryan, M. & Jain, A. K. (1998). Texture analysis, in C. H. Chen and L. F. Pau and P. S.
P. Wang (ed.), The Handbook of Pattern Recognition and Computer Vision (2nd Edition),
World Scientific Publishing Co., pp. 207–248.
Wagner, T. (1999). Texture analysis, in B. Jane, H. Haubecker & P. Geibler (eds), Handbook
of Computer Vision and Applications, Vol.2, Signal Processing and Pattern Recognition,
Academic Press, chapter 12, pp. 275–308.
Weatherall, P. (1995). Benign and malignant masses, MR imaging differentiation, Mag Reson
Clin N Am 3: 669–694.