Classification of Soft Tissue Tumors by Machine Learning Algorithms

November 2011

DOI:10.5772/27757

In book: Soft Tissue Tumors

Authors:

Jaber Juntu

University of Antwerp

Pieter Van Dyck

Universitair Ziekenhuis Antwerpen

Dirk Van Dyck

University of Antwerp

Classification of Soft Tissue Tumors by Machine Learning Algorithms

Jaber Juntu¹, Arthur M. De Schepper², Pieter Van Dyck², Dirk Van Dyck¹, Jan Gielen², Paul M. Parizel² and Jan Sijbers¹

¹University of Antwerp, Physics Department, Vision Lab.

²Dept. of Radiology, Antwerp University Hospital, University of Antwerp

Belgium

1. Introduction

MR imaging is currently regarded as the standard diagnostic tool for the detection and grading of soft tissue tumors (STT) (De Schepper et al. (2005)). Soft tissue is a term describing all the supporting, connecting, or surrounding tissues of other structures and organs of the body, such as fat, muscle, blood vessels, deep skin tissues, nerves and the tissues around joints (synovial tissues). Soft tissue tumors can grow almost anywhere in the human body. Soft tissue sarcomas, which are the malignant type of STT, are grouped together because they share certain microscopic characteristics, have similar symptoms, and are generally treated in similar ways. Radiologists often look for certain features in the MR image to differentiate benign from malignant STT (Juan et al. (2004); Mutlu et al. (2006)). Although the signal characteristics of benign and malignant tumors frequently overlap, some MR image features are more highly correlated with either the benign or the malignant type of STT; see De Schepper et al. (2000) and De Schepper & Bloem (2007). For example, the most commonly used individual parameters for predicting malignancy are the inhomogeneity (texture) and the intensity (gray level) of the MRI signal with different pulse sequences (De Schepper et al. (2005); Hermann et al. (1992)). Inhomogeneity of the tumor region on T1-weighted MR images is a very good indicator of malignancy because 90% of malignant tumors are inhomogeneous and show a disorganized textured pattern of the MRI signal intensity (Weatherall (1995)). This pattern results from the loss of tissue structure and the changes that cancer causes in the extracellular matrix (ECM). The study by Hermann et al. (1992) reported a sensitivity of 72% and a specificity of 87% in predicting malignancy based on visual comparison of texture in the tumor regions in T1-MR images. The reason for the large difference between the sensitivity and the specificity in that study is the difficulty of perceiving texture in some of the malignant tumors. The limited ability of humans to perceive and discriminate between textures has been well known for quite some time (Julesz (1975); Julesz et al. (1973)). Computer-aided diagnostic systems can improve the radiologists' performance in identifying the pathological type (i.e. benign or malignant) of a soft tissue tumor from MR images (Meinel et al. (2007)). Even though a visual comparison of the textures of a benign and a malignant tumor sometimes shows no difference, the numerical values extracted by texture analysis are quite different. Figure 1 shows subimages of a benign and a malignant tumor and the values of some of the extracted texture features. Such an example shows that texture analysis can be used for obtaining information that is not visible to the human eye. The reader can refer to Materka & Strzelectky (1998), Tuceryan & Jain (1998) and Wagner (1999) as excellent references on texture analysis.

In the last few years there has been growing interest in the use of machine learning classifiers for analyzing MRI data. The main aim of this chapter is to train and test several machine learning classifiers with texture analysis features extracted from MR images of soft tissue tumors. The chapter also serves as an introductory tutorial by providing a systematic procedure to build and evaluate a machine learning classifier that can be used for practical applications. The typical steps to build a machine learning classifier are feature extraction, feature selection, classifier training and evaluation of the results. Several studies have tackled the problem of texture analysis for discriminating between benign and malignant tumors for specific types of malignancy, for example, in the brain (Mahmoud-Ghoneim et al. (2003)), the liver (Jirák et al. (2002)) and the breast (Huang et al. (2006)). However, most papers did not follow the recommended approach for building machine learning systems (for an example see Salzberg (1997)) and left some questions unanswered. This research aims at answering some questions related to texture analysis of STT, such as the complexity of the classifiers, the effect of the training data set on the classifier behaviour, and the size of the training data needed to train a machine learning classifier that generalizes well. In the following sections, we go through the process of building and testing several machine learning classifiers, as shown in Fig. 2.

We warn the reader that the training dataset is not meant to train the classifier per se, as the name implies, but should be considered as a representative statistical sample from the population of STT. We assume that the training and testing data samples are randomly, independently and identically sampled from the population of STT (i.e., it is an i.i.d. sample). The process of training and testing the classifier is a statistical parameter estimation problem in which the parameter of interest is the error rate of the classifier on unseen data. As such, all the experiments in the following sections are in fact meant to study how the classifier performs on other unseen data from the same STT population. To put a classifier into real practice, it should be trained and tested with several datasets sampled from the same population with the same procedure as outlined in the following sections. Once the classifier evaluation is finished, all the available data can be used to train the final classifier. The classifier should then be comprehensively tested in a prospective study before it is used. A shorter preliminary version of this chapter was published in Juntu et al. (2010).

2. Patients data set and the MR images

Fig. 1. An example of benign and malignant tumors texture

A large database of multicenter, multimachine MR images was collected by the University Hospital Antwerp (UZA) from different radiology centers for the purpose of conducting scientific research. At the start of this study, there was a real concern that texture features could be more sensitive to image variation due to imaging with different MRI systems or changes in MRI acquisition parameters than to variation in texture caused by pathological changes. However, a recent study by Mayerhoefer et al. (2005) clearly showed that the difference in texture features extracted from MR images obtained with different machine units seems to have only a small impact on the results of tissue discrimination. In this retrospective study, we used a database of T1-MR images of 86 patients with benign soft tissue tumors and 49 patients with malignant tumors. All malignant and benign masses were histologically confirmed. We discarded all MR images that showed severe imaging artifacts or that were corrupted by a high level of bias field inhomogeneity. From the tumor regions in the MR images, we cut square subimages of size 50 × 50 pixels for texture feature computation. The physical size of that area is not fixed; it depends on the image acquisition parameters. However, the actual size of that area will not affect the values of the extracted features. To increase the size of the training dataset, we selected several tumor regions from the MR images of every patient. Hence, the dataset available for training consisted of 253 benign and 428 malignant subimages of size 50 × 50 pixels each. In order to preserve texture information, we avoided preprocessing the subimages, with one exception: histogram equalization was applied to all the tumor subimages, since some texture features, such as the first order texture features, are sensitive to gray-level variation.
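The histogram equalization step can be illustrated with a short Python sketch. It is not the procedure used in the study, whose implementation is not described here; the function name `equalize_histogram`, the 8-bit gray-level range and the synthetic low-contrast subimage are assumptions made for illustration.

```python
import numpy as np

def equalize_histogram(subimage, levels=256):
    """Histogram-equalize an integer image with gray levels in [0, levels).

    Assumes 8-bit data (levels <= 256), since the lookup table is stored as uint8.
    """
    img = np.asarray(subimage)
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = np.cumsum(hist) / img.size              # empirical cumulative distribution
    lut = np.round(cdf * (levels - 1)).astype(np.uint8)  # old level -> new level
    return lut[img]

rng = np.random.default_rng(0)
# Hypothetical 50 x 50 subimage whose gray levels only span [100, 140).
sub = rng.integers(100, 140, size=(50, 50))
eq = equalize_histogram(sub)
print(sub.min(), sub.max(), "->", eq.min(), eq.max())
```

After equalization the gray levels of the patch are spread over a much wider part of the available range, which is what makes histogram-based statistics of different subimages more comparable.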

3. Texture computation

Texture can be characterized and described in different ways using various sets and combinations of parameters. Most texture features were computed with the software package MaZda 3.20, which allows the computation of texture features based on statistical, wavelet filtering, and model-based methods of texture analysis (Castellano et al. (2004)). We also wrote Matlab programs to calculate some texture features, such as Haralick's texture features, to have finer control over the parameters that affect the extracted features. To ensure the consistency of the calculated texture features across all the tumor subimages, we wrote a MaZda macro script that reads the tumor subimages and calculates tumor texture with the same texture analysis parameter settings. The extracted texture features were saved in a text file for feature selection and classification. The following is a short description of the texture features that were computed from the tumor subimages, which are also summarized in Table 1 for easy reference:

• First order statistics: extract texture statistics based on a function of a single pixel. The simplest approach is to construct a histogram for the image of interest. The histogram is converted into a probability density function by dividing the values in the histogram by the total number of pixels in the image. A set of statistical parameters is then calculated from the probability density function, such as the mean, the variance, the skewness, and the kurtosis.

Fig. 2. Block diagram of the chapter

• Second order statistics: Haralick's texture features and the absolute gradient distribution are used in this study. In this method of texture analysis, the correlation between two or more neighboring pixels is taken into account. Since complex texture patterns are formed by the interaction between more than one pixel, second order statistics might provide extra texture information that cannot be extracted from first order statistics of the texture. Haralick's texture analysis (Haralick et al. (1973)) is probably the most famous second order texture analysis technique. It is based on the calculation of statistics from a function of two variables that measures the probability of occurrence of a pair of pixels that are separated by d pixels at an angle θ. We calculated 11 different Haralick features from the co-occurrence matrix. The co-occurrence matrix is calculated for every two pixels inclined by an angle θ and separated by a distance d. To take the scaling and rotation of texture into account, we calculated the Haralick features from the co-occurrence matrices calculated with angles {0°, 45°, 90°, 135°} and distances of {1, 2, 3, 4, 5} pixels. The absolute gradient texture features are also included to incorporate texture features that are invariant to gray-level scaling caused by bias field inhomogeneity. Every pixel in the image was replaced by the absolute gradient, which was calculated from a window of size 3 × 3 around the pixel by calculating the absolute of the squared summation of the difference between the two pixels above and below the center pixel and the difference between the two pixels to its right and left. Doing that for all pixels resulted in a gradient image from which several statistical parameters could be obtained: the mean, the variance, the skewness, and the kurtosis.

• Higher order statistics: used to capture texture information that depends on the interaction between several neighboring pixels. We selected two different approaches:

– the run-length gray-level matrix approach, where consecutive sets of pixels with the same gray-level value are counted and the result is stored in a 2D matrix indexed by the gray-level value and the length of the gray-level run. Several statistics are calculated from the 2D matrix.

– a mathematical function or model that describes the texture, for example the autoregressive texture model. The basic idea of autoregressive models for texture is to express the gray level of a pixel as a function of the gray levels of its neighboring pixels (Mao & Jain (1992)). The model parameters for one image are calculated using a least squares technique and are used as texture features. This approach is similar to Markov random fields.

• Filtering method: the image is split into subbands with bandpass filters such as the wavelet transform. The energies of the subbands are used as texture features.
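As an illustration of the second order statistics in the list above, the following Python sketch builds a symmetric, normalized co-occurrence matrix for one distance d and one angle θ and derives three Haralick-type features from it. It is a minimal sketch, not the MaZda or Matlab code used in the study; the function names, the pixel-offset convention for the four angles and the small 4 × 4 test image with four gray levels are assumptions.

```python
import numpy as np

# Assumed (row, column) unit offsets for the four angles; rows grow downward.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def cooccurrence_matrix(img, d=1, angle=0, levels=8):
    """Symmetric, normalized gray-level co-occurrence matrix for one (d, angle)."""
    img = np.asarray(img)
    dr, dc = OFFSETS[angle]
    dr, dc = dr * d, dc * d
    rows, cols = img.shape
    glcm = np.zeros((levels, levels))
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                glcm[img[r, c], img[r2, c2]] += 1
    glcm = glcm + glcm.T                 # count every pair in both directions
    total = glcm.sum()
    return glcm / total if total > 0 else glcm

def haralick_subset(p):
    """Angular second moment, contrast and entropy of a normalized matrix p."""
    i, j = np.indices(p.shape)
    nonzero = p[p > 0]
    return {
        "asm": float(np.sum(p ** 2)),
        "contrast": float(np.sum((i - j) ** 2 * p)),
        "entropy": float(-np.sum(nonzero * np.log2(nonzero))),
    }

img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
p = cooccurrence_matrix(img, d=1, angle=0, levels=4)
features = haralick_subset(p)
print(features)
```

Repeating the call for the four angles and the five distances gives 20 matrices per subimage, each contributing its own set of features to the feature vector.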

After the texture analysis step, each tumor subimage is encoded by a feature vector as shown in Fig. 3. The texture features are labeled as {f1, f2, ..., f290} (see Table 1).
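To make the encoding concrete, the sketch below computes a ten-element first order block of such a feature vector from the pixel values of one subimage. The ordering of the entries follows the first row of Table 1, but the mapping to f1, ..., f10, the use of excess kurtosis and the function name `first_order_vector` are assumptions; the study itself used MaZda for this step.

```python
import numpy as np

def first_order_vector(subimage):
    """Mean, minimum, variance, skewness, kurtosis and five percentiles."""
    g = np.asarray(subimage, dtype=float).ravel()
    mean, var = g.mean(), g.var()
    std = np.sqrt(var)                       # a constant image (std == 0) is not handled
    skew = np.mean((g - mean) ** 3) / std ** 3
    kurt = np.mean((g - mean) ** 4) / std ** 4 - 3.0   # excess kurtosis (assumed convention)
    percentiles = np.percentile(g, [1, 10, 50, 90, 99])
    return np.concatenate(([mean, g.min(), var, skew, kurt], percentiles))

toy = np.array([[0, 0],
                [2, 2]])
v = first_order_vector(toy)
print(v)
```

The remaining blocks of the vector (co-occurrence, gradient, run-length, autoregressive and wavelet features) would be appended in the same way.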

Fig. 3. Texture analysis features

4. Feature selection

Feature selection was used to remove redundant features. This step is very important because it improves the performance of the learning models and reduces the effect of the curse of dimensionality. Feature selection also speeds up the learning process and improves the interpretability of the model. Deciding which features to keep because they are relevant, and which to discard, is largely dependent on the context. To perform an unbiased feature selection, we tested several feature selection techniques and experimented with the following methods:

| Group and feature indices | Method | Calculated parameters |
|---|---|---|
| First order: {f1, ..., f10} | histogram | mean, minimum, variance, skewness, kurtosis; 1%, 10%, 50%, 90% and 99% percentiles |
| Second order: {f11, ..., f250} & {f271, ..., f277} | co-occurrence matrix (angles θ = 0°, 45°, 90°, 135° and distances = 1, 2, 3, 4, 5) | angular second moment, contrast, sum of squares, inverse difference moment, sum average, correlation, entropy, difference variance, difference entropy |
| | absolute gradient distribution | mean, variance, skewness and kurtosis of absolute gradient |
| Higher order: {f251, ..., f270} & {f278, ..., f282} | run-length gray-level matrix | short run emphasis moment, long run emphasis moment, run length nonuniformity, fraction of image in run |
| | autoregressive texture model | θ1, θ2, θ3, θ4, σ |
| Filtering technique: {f283, ..., f290} | wavelet | energies of wavelet coefficients of subbands at successive scales |

Table 1. Texture analysis methods used in this study and the corresponding texture features

• Unsupervised feature selection techniques: these methods do not use the class labels, and the selected features are strongly dependent on the sample distribution of the pixel gray-level values. We selected texture feature subsets by forward, backward, bidirectional, and greedy stepwise search methods and by two feature ranking methods, namely ranking by the chi-square statistic and by the information gain criterion.

• Supervised selection techniques: these techniques use class labels to guide the feature selection process; thus, the selected features are the ones that improve the discrimination between benign and malignant tumors. We used the C4.5 decision tree algorithm and the support vector machine as wrappers.

Table 2 lists all the feature selection techniques that were tested in this study and the feature subsets they selected. It is not surprising that the 8 feature selection methods selected different feature subsets, because each one has a different measure of feature relevance. However, feature selection methods that belong to the same group generally selected similar features. The selected feature subsets were used as input to a simple Bayes classifier to evaluate the efficacy of the texture feature subsets. The results of the classification are listed in Table 2, which reports the classification accuracy (Acc%), the true positive rate (TP), the true negative rate (TN) and the area under the ROC curve (AUC). The measure that is generally recommended is the AUC, since it is a global measure and insensitive to the data distribution. In the last row of Table 2, we included the performance of the Bayes classifier using the full texture feature set for comparison. Looking at Table 2, one can notice that classification with the feature subsets selected by the two feature ranking methods is worse than classification with the full texture feature set: their AUC values are 0.72 and 0.75, respectively, while the full feature set has an AUC value of 0.78. We took the best texture feature subset to be the one with the highest AUC value. This is the subset selected by the forward selection method (AUC = 0.87), which was used for training and testing the classifiers.
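The idea of scoring candidate feature subsets by their AUC can be sketched in Python as follows. This is an illustrative greedy forward search, not the tool or the exact protocol used in the study: the Gaussian naive Bayes scorer, the single train/validation split, the synthetic data (only feature 2 is informative) and all function names are assumptions.

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison; ties count one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def naive_bayes_scores(X_train, y_train, X_test):
    """Log-likelihood ratio (class 1 vs. class 0) of a Gaussian naive Bayes model."""
    def log_lik(X, cls):
        mu = X_train[y_train == cls].mean(axis=0)
        var = X_train[y_train == cls].var(axis=0) + 1e-9
        return -0.5 * np.sum(np.log(var) + (X - mu) ** 2 / var, axis=1)
    return log_lik(X_test, 1) - log_lik(X_test, 0)

def forward_selection(X_train, y_train, X_val, y_val):
    """Add one feature at a time while the validation AUC keeps improving."""
    selected, best = [], 0.0
    remaining = list(range(X_train.shape[1]))
    while remaining:
        trial = {}
        for f in remaining:
            cols = selected + [f]
            s = naive_bayes_scores(X_train[:, cols], y_train, X_val[:, cols])
            trial[f] = auc(s, y_val)
        f_best = max(trial, key=trial.get)
        if trial[f_best] <= best:
            break
        selected.append(f_best)
        remaining.remove(f_best)
        best = trial[f_best]
    return selected, best

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=400)          # hypothetical labels: 1 = malignant
X = rng.normal(size=(400, 6))             # six synthetic "texture features"
X[:, 2] += 2.0 * y                        # only feature 2 carries class information
selected, best_auc = forward_selection(X[:200], y[:200], X[200:], y[200:])
print(selected, round(float(best_auc), 3))
```

Backward and bidirectional searches differ only in how the candidate subsets are generated; the subset score stays the same.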

5. The trained classiﬁers

| Method | The best selected features | Acc% | TP | TN | AUC |
|---|---|---|---|---|---|
| Forward selection | f4, f6, f7, f8, f66, f169, f255, f263, f274, f279, f282, f286 | 76.80 | 0.80 | 0.74 | 0.87 |
| Backward selection | f4, f6, f7, f8, f114, f253, f263, f274, f279, f281, f282, f286 | 77.70 | 0.80 | 0.74 | 0.85 |
| Bidirectional search | f4, f6, f7, f8, f66, f169, f255, f263, f274, f279, f282, f286 | 77.10 | 0.79 | 0.73 | 0.86 |
| Greedy stepwise search | f4, f6, f7, f8, f66, f253, f263, f274, f279, f282, f286 | 78.00 | 0.83 | 0.69 | 0.83 |
| Ranking with chi-squares statistics | f7, f16, f37, f45, f46, f52, f251, f253, f255, f263, f265, f268 | 67.99 | 0.65 | 0.73 | 0.72 |
| Ranking with information gain | f7, f16, f37, f45, f46, f52, f251, f253, f254, f255, f268, f282, f286 | 65.34 | 0.56 | 0.81 | 0.75 |
| C4.5 decision tree wrapper | f6, f21, f38, f49, f56, f64, f118, f164, f253 | 70.77 | 0.70 | 0.73 | 0.78 |
| Best features with SVM wrapper | f5, f6, f13, f98, f172, f178, f216, f217, f256 | 78.00 | 0.86 | 0.64 | 0.84 |
| Full texture features set | f1, f2, ..., f290 | 73.71 | 0.74 | 0.73 | 0.78 |

Table 2. Bayes classifier results for the best selected texture features subsets

The main purpose of the training data is to infer a mathematical decision function or an algorithm for making predictions. A given training data set is used to optimize the parameters of a machine learning classifier, which results in a simple mathematical function or expression that can be used for making predictions. If the same classifier is trained on a different training data set drawn independently and identically from the same problem domain, we expect to obtain a decision function with a similar performance. If the classifier performance stays the same regardless of the specific training dataset, the classifier has learned from the training data how to differentiate benign from malignant tumors. If, however, the classifier performance changes considerably when the training dataset is changed, then that classifier cannot be used for prediction. In principle, the decision function (i.e. the classifier) cannot be made completely independent of the structure of the training data and the complexity of the learning algorithm. To isolate the contributing factors that might interfere with training the classifier and to minimize the bias in the stated results, we systematically applied several machine learning evaluation strategies. First, we trained several classifiers that belong to different families of machine learning algorithms on the same texture feature data. The selected classifiers were trained with a crossvalidation procedure to make better use of the training data. The crossvalidation procedure also tries to minimize the effect of the probability distribution of a specific training dataset on the classifier performance. Second, we studied the effect of changing the size of the training data set on the classifiers' performance by plotting learning curves that show the error rate of the trained classifiers as a function of the size of the training data set. Third, we used statistical tests to compare the classifiers' performance. We also plotted the ROC (Receiver Operating Characteristic) curves and the Cost curves to analyze the classifiers' performance. Finally, we applied McNemar's statistical test to compare the performance of the best classifier against the radiologists' performance.

From several groups of machine learning algorithms, we selected the following classifiers:

Linear classifier: This classifier assumes that the benign and the malignant classes have the same covariance matrix but different means. It estimates the covariance matrix from the full training data and assigns a new case to the class with the highest probability. Such a classifier separates benign and malignant tumors by a simple linear decision surface. The full training dataset is assumed to be normally distributed.

Quadratic classifier: This classifier is more complex than the linear classifier since it estimates a separate mean vector and covariance matrix for the benign and the malignant classes. Such a classifier separates the benign and the malignant tumors by a quadratic (nonlinear) decision surface. The benign and the malignant classes are assumed to be normally distributed, but not necessarily with the same covariance matrix.

Nonparametric density estimation classifiers: the Parzen classifier and the k-nearest neighbor (k-NN) classifier. Both classifiers estimate the empirical probability density function of the benign and the malignant classes from the training data instead of assuming a certain probability distribution, as the linear and quadratic classifiers do.

Decision tree classifier: Such a classifier uses logical rules to separate the benign from the malignant tumors regardless of the probability distribution of the training data.

Back-propagation neural network: The NN classifier separates the tumors by a highly nonlinear decision surface. The neural network uses an iterative optimization algorithm to find its weights from the training data.

Support vector machine classifier: The SVM classifier simplifies the classification problem by transforming the input space into a high dimensional space in which the classification problem becomes linear and easier to solve. The SVM classifier does not depend on the probability distribution of the training dataset and generalizes quite well for classification problems of varying complexity. During the training process, a quadratic optimization algorithm is used to iteratively adjust the complexity of the decision function to adapt to the problem domain.
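The difference between the linear and the quadratic classifier described above comes down to how the covariance is estimated. The following Python sketch shows both variants of a Gaussian classifier on synthetic two-dimensional data; it is not the PRTOOLS implementation used in the study, and the function names and the data are assumptions.

```python
import numpy as np

def fit_gaussian_classifier(X, y, shared_covariance=True):
    """Estimate class means, priors and either one pooled or per-class covariances."""
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    priors = {c: np.mean(y == c) for c in classes}
    if shared_covariance:
        # Linear classifier: one covariance matrix pooled over all centred samples.
        centred = np.vstack([X[y == c] - means[c] for c in classes])
        pooled = centred.T @ centred / (len(X) - len(classes))
        covs = {c: pooled for c in classes}
    else:
        # Quadratic classifier: a separate covariance matrix for every class.
        covs = {c: np.cov(X[y == c], rowvar=False) for c in classes}
    return classes, means, covs, priors

def predict(model, X):
    """Assign each row of X to the class with the highest Gaussian log-posterior."""
    classes, means, covs, priors = model
    scores = []
    for c in classes:
        diff = X - means[c]
        inv = np.linalg.inv(covs[c])
        maha = np.einsum("ij,jk,ik->i", diff, inv, diff)   # squared Mahalanobis distance
        _, logdet = np.linalg.slogdet(covs[c])
        scores.append(np.log(priors[c]) - 0.5 * (maha + logdet))
    return classes[np.argmax(np.array(scores), axis=0)]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),     # hypothetical "benign" cloud
               rng.normal(3.0, 1.0, size=(100, 2))])    # hypothetical "malignant" cloud
y = np.array([0] * 100 + [1] * 100)
linear = fit_gaussian_classifier(X, y, shared_covariance=True)
quadratic = fit_gaussian_classifier(X, y, shared_covariance=False)
acc_linear = np.mean(predict(linear, X) == y)
acc_quadratic = np.mean(predict(quadratic, X) == y)
print(acc_linear, acc_quadratic)
```

With a shared covariance the log-determinant and the quadratic term in x cancel between classes, which is why the resulting decision surface is linear.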

In the following sections, we describe several tests that were performed to study the effect of the size of the training data set on the classifier performance. Additionally, we tested the complexity of the decision function, analyzed the classifier performance and statistically compared the performance of two classifiers. Finally, we tested the classifier performance against the radiologists' performance.

6. The size of the training data and the classiﬁers performance

The classifier learns the classification function from the training data. The training data represents a small sample from the population of soft tissue tumors, and hence the size of the training data has an impact on the trained classifier. We ran a learning curve test to study the effect of the size of the training data set on the classifier performance. Using a small subset of the training data, we tuned the parameters of each classifier as follows. The back-propagation neural network has two hidden layers, an input layer of 12 nodes (i.e., the number of texture features selected by the forward selection method) and an output layer with two nodes corresponding to the benign and the malignant classes. The SVM classifier is trained with an RBF kernel, which was tuned with a grid search that resulted in a bandwidth σ = 10000 and a cost coefficient C = 1.0. We used the PRTOOLS 4.0 Matlab toolbox to run this experiment. We left the parameters of the decision trees and the Parzen classifier at their default values, which makes the PRTOOLS toolbox tune them automatically. We trained the 7 classifiers with different sizes of the training data set and, at each size, measured the error rate of all the classifiers. For each size of the training data, we repeated the experiment 10 times and calculated the average error rate. Figure 4 shows the learning curves of the 7 trained classifiers. The learning curves show some interesting facts about the problem domain. First, the learning curves are smooth, which is a good indicator of the classifiers' stability against changes in the training data distribution. The smoothness of the learning curves is also a necessary condition for carrying out some of the statistical tests that we used to compare the classifiers' performance (Dietterich (1998)). Second, the 7 classifiers learned very well from few training samples. Most classifiers achieved error rates between 0.198 and 0.251 after training with as few as 50 training samples. As we increase the size of the training data set beyond 50 samples, the error rate decreases very slowly. This observation indicates that a small training data set is sufficient to get good generalization performance; increasing the size of the training set beyond a certain limit seems to have little impact on the classifiers' performance. The third observation is related to the complexity of the classifiers. Simple classifiers such as the k-nearest neighbor (k-NN) classifier and the SVM with an RBF kernel with a large bandwidth achieved lower error rates than the neural network classifier. This observation is an indication that the decision surface that separates the benign from the malignant tumors based on texture features is a very simple mathematical function, which we investigate further in the following section. Classification problems that produce linear or simple decision functions are less likely to overfit the training data and often generalize and predict very well on unseen data.

Fig. 4. The learning curves of the 7 trained classiﬁers
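The learning curve experiment (train at several training set sizes, repeat each size 10 times, average the error rate) can be sketched in Python as below. The sketch uses a nearest-mean classifier as a simple stand-in for the seven classifiers of the study, draws class-balanced training subsets, and runs on synthetic 12-dimensional data; all of these choices and the function names are assumptions.

```python
import numpy as np

def nearest_mean_error(X_train, y_train, X_test, y_test):
    """Test error of a nearest-class-mean classifier (stand-in classifier)."""
    mu0 = X_train[y_train == 0].mean(axis=0)
    mu1 = X_train[y_train == 1].mean(axis=0)
    d0 = np.linalg.norm(X_test - mu0, axis=1)
    d1 = np.linalg.norm(X_test - mu1, axis=1)
    pred = (d1 < d0).astype(int)
    return np.mean(pred != y_test)

def learning_curve(X, y, sizes, repeats=10, seed=0):
    """Average test error for each training set size over `repeats` random draws."""
    rng = np.random.default_rng(seed)
    curve = []
    for n in sizes:
        errors = []
        for _ in range(repeats):
            # Draw n // 2 training samples per class; test on everything else.
            idx0 = rng.permutation(np.flatnonzero(y == 0))
            idx1 = rng.permutation(np.flatnonzero(y == 1))
            half = n // 2
            train = np.concatenate([idx0[:half], idx1[:half]])
            test = np.concatenate([idx0[half:], idx1[half:]])
            errors.append(nearest_mean_error(X[train], y[train], X[test], y[test]))
        curve.append(np.mean(errors))
    return np.array(curve)

rng = np.random.default_rng(1)
y = np.repeat([0, 1], 300)                              # hypothetical labels
X = rng.normal(size=(600, 12)) + 0.5 * y[:, None]       # 12 synthetic features
sizes = [10, 20, 50, 100, 200]
curve = learning_curve(X, y, sizes)
for n, e in zip(sizes, curve):
    print(n, round(float(e), 3))
```

Plotting `curve` against `sizes` for each classifier gives a figure of the same kind as Fig. 4.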

7. The complexity of the decision function

The learning curves from the last section showed that classifiers which produce simple decision functions generalize better, since they have the smallest error rates on the testing samples. To check that conclusion, we ran a test using an SVM classifier with a polynomial kernel, which produces a polynomial decision function of varying complexity. We varied the degree of the polynomial kernel gradually from 1 to 20 and, at each degree, ran the experiment 10 times using a crossvalidation procedure. Each point in the resulting curve is the average of the error rates of the ten experiments. Figure 5 shows the error rate of the polynomial classifier versus the degree of the polynomial kernel function. The plot clearly shows that the error rate is minimal for a polynomial decision function of the 4th degree. The error rates for the linear classifier (a 1st degree polynomial) and the quadratic classifier (a 2nd degree polynomial) are large since they underfit the training data. A polynomial classifier of degree higher than 4 also has a high error rate since it overfits the training data. This explains why, in Fig. 4, the simple linear classifier and the neural network classifier both have high error rates compared to the other classifiers: the linear classifier is too simple and the neural network classifier is too complex for the problem domain. It also explains why the SVM classifier has a good classification performance: it is very flexible and can adapt to classification problems of varied complexity.

Fig. 5. The error rate versus the complexity of a polynomial classiﬁer
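The complexity sweep can be imitated with a much simpler model than the polynomial-kernel SVM of the study. The Python sketch below fits a least-squares polynomial of a given degree to ±1 labels of a single synthetic feature, classifies by the sign of the polynomial, and estimates the error by 10-fold crossvalidation for degrees 1 to 8. The least-squares classifier, the one-dimensional data with a cubic class boundary, the 10% label noise and the function name are all assumptions made to keep the example self-contained.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def cv_error_for_degree(x, y, degree, folds=10, seed=0):
    """Mean k-fold crossvalidation error of a sign(polynomial) classifier."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(x)), folds)
    errors = []
    for k in range(folds):
        test = parts[k]
        train = np.concatenate([parts[j] for j in range(folds) if j != k])
        coeffs = P.polyfit(x[train], y[train], degree)   # least-squares fit to +/-1 labels
        pred = np.sign(P.polyval(x[test], coeffs))
        errors.append(np.mean(pred != y[test]))
    return float(np.mean(errors))

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=300)          # one synthetic feature
y = np.sign(x ** 3 - 0.3 * x)                 # true boundary is a cubic polynomial
flip = rng.random(300) < 0.1                  # 10% label noise
y[flip] = -y[flip]
errors = {d: cv_error_for_degree(x, y, d) for d in range(1, 9)}
for d, e in errors.items():
    print(d, round(e, 3))
```

On this toy problem a first degree polynomial underfits, so its crossvalidated error is clearly higher than that of a third degree polynomial, which mirrors the left part of Fig. 5.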

8. Analyzing the classiﬁers performance

To gain more insight into the classifiers' performance, we trained the 7 classifiers using the full data set with a 10-fold crossvalidation procedure. In Fig. 6 and Fig. 7, we plotted the ROC curves and the Cost curves of the 7 classifiers. In the ROC plot, the best curves are at the top of the plot. From the ROC curves, the classifiers are ranked in order of increasing performance as follows: the decision trees, the neural network, the linear classifier, the quadratic classifier and the k-NN classifier. However, the ranking of the Parzen and SVM classifiers is ambiguous because their ROC curves intersect. In the Cost-curve plot, the classifiers are ranked in the same order as in the ROC plot. However, this time the curves of the best classifiers are at the bottom of the plot. The Cost curves of the Parzen classifier and the SVM classifier have the same normalized expected cost for values of the probability cost function (PCF) between 0.45 and 0.75, where both curves overlap. For PCF < 0.45, the SVM classifier performs better than the Parzen classifier, while for PCF > 0.75 the Parzen classifier performs better. In other words, both classifiers perform equally well if the costs of misclassifying benign and malignant tumors are kept the same. However, if we change these costs, for example by assigning a higher cost to missing malignant tumors than to missing benign tumors, then the two classifiers perform differently (see Holte & Drummond (2011)). The latter observation explains why the SVM and Parzen classifiers have an overlapping performance which is easy to explain from the ROC curves.

Fig. 6. ROC curves of the trained classiﬁers

Fig. 7. Cost curves of the trained classiﬁers
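The relation between the two plots can be sketched in a few lines of Python. Each ROC point (FPR, TPR) corresponds to a straight line in cost space, FNR · PCF + FPR · (1 − PCF) with FNR = 1 − TPR, and the cost curve of a classifier is the lower envelope of these lines. The function names and the four-sample toy scores are assumptions; the curves in Fig. 6 and Fig. 7 were not produced with this code.

```python
import numpy as np

def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by thresholding at every observed score."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    thresholds = np.concatenate(([np.inf], np.sort(np.unique(scores))[::-1]))
    pos, neg = labels == 1, labels == 0
    fpr = np.array([np.mean(scores[neg] >= t) for t in thresholds])
    tpr = np.array([np.mean(scores[pos] >= t) for t in thresholds])
    return fpr, tpr

def cost_curve(fpr, tpr, pcf):
    """Lower envelope of the normalized expected cost over all ROC points."""
    fnr = 1.0 - tpr
    # One line per ROC point, evaluated on the whole PCF grid.
    lines = fnr[:, None] * pcf[None, :] + fpr[:, None] * (1.0 - pcf[None, :])
    return lines.min(axis=0)

scores = [0.1, 0.4, 0.35, 0.8]        # hypothetical classifier outputs
labels = [0, 0, 1, 1]                 # 1 = malignant, 0 = benign
fpr, tpr = roc_points(scores, labels)
pcf = np.linspace(0.0, 1.0, 11)
cost = cost_curve(fpr, tpr, pcf)
print(list(zip(fpr, tpr)))
print(np.round(cost, 3))
```

A classifier whose cost curve lies below another's over some PCF interval is the better choice for the class priors and misclassification costs that map to that interval.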

9. Statistical comparison between two classiﬁers

Classifier performance is a function of several factors, including the statistical distribution of the training and testing data, the internal structure of the classifier and the inherent randomness in the training process. Even if we train two different classifiers with the same dataset, their classification error rates will not necessarily be the same. That is because the classifiers are trained with different algorithms, different optimization criteria and different parameter settings. The most effective way to compare classifiers is to empirically train and test them using multiple training and testing data sets. This procedure is repeated several times, and then statistical tests are applied to assess their performance. Dietterich (1998) described a 5×2cv algorithm that can be used to statistically compare the performance of two machine learning classifiers on the same classification problem. The name of the test is an abbreviation for "5 iterations of 2-fold crossvalidation paired t-test". The test can be used to check whether one classifier outperforms another classifier on a specific classification task. Let $D$ be a dataset which is divided into five folds $F_1, F_2, \ldots, F_5$ and let $A$ and $B$ be the two classifiers whose performance will be compared. Let $p_j^{(i)}$ stand for the difference in errors between the two classifiers in fold $i$ of iteration $j$. Then, the steps of the algorithm are as follows:

• divide the first fold $F_1$ into two equal-sized parts $t_1$ and $t_2$. Train both classifiers $A$ and $B$ using $t_1$ and test them using $t_2$ to obtain two error estimates $e_A^{(1)}$ and $e_B^{(1)}$. Calculate the difference in errors $p^{(1)} = e_A^{(1)} - e_B^{(1)}$

• swap $t_1$ and $t_2$ such that the classifiers are trained with $t_2$ and tested with $t_1$. Re-train both classifiers and calculate the new errors and the new difference in errors $p^{(2)} = e_A^{(2)} - e_B^{(2)}$

• for this crossvalidation run, calculate the mean $\bar{p} = \frac{p^{(1)} + p^{(2)}}{2}$ and the variance $s^2 = (p^{(1)} - \bar{p})^2 + (p^{(2)} - \bar{p})^2$

• repeat the same procedure for the remaining folds $\{F_2, \ldots, F_5\}$

Let $p_1^{(1)}$ denote the difference $p^{(1)}$ from the first run, and let $s_i^2$ denote the estimated variance for run $i$, $i = 1, \ldots, 5$. Calculate the $\tilde{t}$-statistic using:

\[ \tilde{t} = \frac{p_1^{(1)}}{\sqrt{\frac{1}{5}\sum_{i=1}^{5} s_i^2}} \tag{1} \]

Note that only one of the ten differences is used in the above expression. Dietterich (1998) has shown that under the null hypothesis, $\tilde{t}$ is approximately t-distributed with 5 degrees of freedom. The test can be used to check if two constructed classifiers have a similar error rate on new examples. The null hypothesis states that the two classifiers have the same error rate and the alternative hypothesis states that their error rates differ. We reject the null hypothesis with 95 percent confidence if $|\tilde{t}|$ is larger than the tabulated t-statistic.
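The steps and Eq. (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `five_by_two_cv_t` and the error rates are invented for the example and do not come from the chapter, and the critical value 2.571 is the standard two-sided 5% value of Student's t with 5 degrees of freedom.

```python
import math


def five_by_two_cv_t(errors_a, errors_b):
    """Return (t, differences, variances) for the 5x2cv paired t-test.

    errors_a, errors_b: five (e1, e2) pairs per classifier, one pair per run,
    where e1 and e2 are the error rates measured on the two halves of that run.
    """
    diffs = [(a1 - b1, a2 - b2) for (a1, a2), (b1, b2) in zip(errors_a, errors_b)]
    variances = []
    for p1, p2 in diffs:
        p_bar = (p1 + p2) / 2.0
        variances.append((p1 - p_bar) ** 2 + (p2 - p_bar) ** 2)
    # Eq. (1): only the first difference of the first run goes in the numerator.
    t_stat = diffs[0][0] / math.sqrt(sum(variances) / len(variances))
    return t_stat, diffs, variances


# Made-up error rates for illustration only (not results from the chapter).
errors_a = [(0.30, 0.28), (0.32, 0.26), (0.29, 0.31), (0.33, 0.27), (0.30, 0.30)]
errors_b = [(0.20, 0.22), (0.21, 0.19), (0.18, 0.23), (0.22, 0.20), (0.19, 0.21)]

t_stat, diffs, variances = five_by_two_cv_t(errors_a, errors_b)
T_CRITICAL = 2.571  # two-sided 5% critical value, Student's t with 5 d.o.f.
print(f"t = {t_stat:.3f}, reject H0: {abs(t_stat) > T_CRITICAL}")
```

In a real experiment the error pairs would come from actually training and testing both classifiers on each half, as described in the steps above.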

Note that there are 10 different values that can be placed in the numerator of Eq. (1), leading to 10 possible statistics. Selecting a different value for the numerator of Eq. (1) should not affect the result of the test. In practice, this is not always the case, as shown by Alpaydin (1999), who proposed a modified test called the combined 5×2cv F test. The modified Dietterich test combines the results of the 10 possible statistics and uses more degrees of freedom, which promises to be more robust and to have better statistical power than the original Dietterich test. The new test calculates:

\[ \tilde{f} = \frac{\sum_{i=1}^{5}\sum_{j=1}^{2}\left(p_i^{(j)}\right)^2}{2\sum_{i=1}^{5} s_i^2} \sim F_{n,m} \tag{2} \]

and tests the estimated $\tilde{f}$ against an F-distribution with $n = 10$ and $m = 5$ degrees of freedom. Reject the null hypothesis if $\tilde{f}$ is larger than the tabulated F-statistic value (i.e., $F = 4.74$); otherwise, accept the null hypothesis.


| Exp# | e_A^(1) | e_B^(1) | p^(1) | e_A^(2) | e_B^(2) | p^(2) | s^2 |
|------|---------|---------|--------|---------|---------|---------|--------|
| 1 | 0.3853 | 0.1618 | 0.2235 | 0.3588 | 0.2029 | 0.1559 | 0.0023 |
| 2 | 0.3382 | 0.1735 | 0.1647 | 0.1353 | 0.1706 | -0.0353 | 0.0200 |
| 3 | 0.4265 | 0.1794 | 0.2471 | 0.3176 | 0.2000 | 0.1176 | 0.0084 |
| 4 | 0.3824 | 0.1735 | 0.2088 | 0.3618 | 0.1529 | 0.2088 | 0.0 |
| 5 | 0.3912 | 0.1794 | 0.2118 | 0.3529 | 0.1647 | 0.1882 | 0.0003 |

Table 3. Error rates, differences and variances $s^2$ of the SVM classifier (A) and the Parzen classifier (B) using 5×2-fold crossvalidation on tumors' texture.

We selected two classifiers from Fig. 7, namely, the SVM and the neural network classifiers, and ran the test to check whether they have similar or different performance. The results of running the 5-iterations 2-fold crossvalidation algorithm are summarized in Table 3. Using Eq. (2), we calculated $\tilde{f} = 5.58$, which is larger than the theoretical F-statistic value. Hence, the null hypothesis that both classifiers have similar error rates was rejected. Therefore, according to the combined 5×2cv test, the SVM classifier had better performance than the neural network classifier with 95% statistical confidence. In conclusion, the test shows that some classifiers can perform better than others when trained with the same training dataset.
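As a sanity check, the combined F statistic of Eq. (2) can be recomputed from the differences listed in Table 3. The short Python sketch below does this; the function name `combined_f` is ours, and because the table entries are rounded to four decimals the result only approximates the reported value.

```python
# Differences p(1), p(2) for each of the five runs, copied from Table 3.
table3_diffs = [
    (0.2235, 0.1559),
    (0.1647, -0.0353),
    (0.2471, 0.1176),
    (0.2088, 0.2088),
    (0.2118, 0.1882),
]


def combined_f(diffs):
    """Combined 5x2cv F statistic of Eq. (2) from five (p1, p2) pairs."""
    numerator = sum(p1 ** 2 + p2 ** 2 for p1, p2 in diffs)
    variances = []
    for p1, p2 in diffs:
        p_bar = (p1 + p2) / 2.0
        variances.append((p1 - p_bar) ** 2 + (p2 - p_bar) ** 2)
    return numerator / (2.0 * sum(variances))


f_stat = combined_f(table3_diffs)
F_CRITICAL = 4.74  # tabulated value quoted in the text for (10, 5) d.o.f.
print(f"f = {f_stat:.2f}, reject H0: {f_stat > F_CRITICAL}")
```

The recomputed statistic lands close to the reported 5.58 and above the tabulated 4.74, so the rejection of the null hypothesis follows directly from the table.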

10. Machine learning versus radiologists performance

An important question is how machine learning classifiers perform compared to radiologists. In the previous section, we used the modified 5×2cv Dietterich test to compare two classifiers. However, we cannot use the same test to compare a classifier's performance against the radiologists' diagnosis, since the radiologists' results cannot be repeated. Instead, we applied McNemar's test (Alpaydin (2001)). To apply McNemar's test, we first have to express the results of the radiologists and the SVM classifier as depicted in Table 4.

| N00: Number of examples misclassified by both | N01: Number of examples misclassified by the classifier but not the radiologists |
| N10: Number of examples misclassified by radiologists but not the classifier | N11: Number of examples correctly classified by both |

Table 4. A table used to perform McNemar's test.

Second, we construct two hypotheses: the null hypothesis $H_0$ is that there is no difference between the error rates or accuracies of the radiologists and the classifier, and the alternative hypothesis $H_1$ is that the radiologists and the classifier have different performance. If the null hypothesis is correct, then the expected counts for both off-diagonal entries in Table 4 are $\frac{1}{2}(N_{01} + N_{10})$. The discrepancy between the expected and the observed counts is measured by the following statistic:

\[ \tilde{\chi}^2 = \frac{(|N_{01} - N_{10}| - 1)^2}{N_{01} + N_{10}}, \tag{3} \]

which is approximately distributed as $\chi^2$ with 1 degree of freedom. First, we ran several experiments to find an optimal classifier. The best classifier so far was the SVM classifier. The results of the SVM classifier against the radiologists are summarized in Table 5. Using Eq. (3), we obtained $\tilde{\chi}^2 = 12.85$, which is larger than the tabulated $\chi^2 = 3.84$. Hence, we rejected the null hypothesis that both the radiologists and the SVM classifier have similar error rates. Therefore, the SVM seems to perform slightly better than the radiologists. This last conclusion should, however, be taken with a grain of salt because it is based on a statistical analysis of the SVM classifier with a limited training data set that does not represent the full distribution of soft tissue tumors.

Fig. 8. The SVM and the radiologists confusion matrices

| N00 = 39 | N01 = 16 | N00 + N01 = 55 |
| N10 = 45 | N11 = 581 | N10 + N11 = 626 |
| N00 + N10 = 84 | N01 + N11 = 597 | N = 681 |

Table 5. A table constructed for McNemar's test
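The McNemar statistic of Eq. (3) is easy to verify from the off-diagonal counts in Table 5. The following Python sketch is a direct transcription of the formula; the function name is ours, and the critical value is the standard 5% point of the $\chi^2$ distribution with 1 degree of freedom (about 3.84).

```python
def mcnemar_statistic(n01, n10):
    """Continuity-corrected McNemar statistic of Eq. (3)."""
    return (abs(n01 - n10) - 1) ** 2 / (n01 + n10)


# Off-diagonal counts from Table 5.
N01 = 16  # misclassified by the classifier but not by the radiologists
N10 = 45  # misclassified by the radiologists but not by the classifier

chi2_stat = mcnemar_statistic(N01, N10)
CHI2_CRITICAL = 3.84  # 5% critical value of chi-square with 1 d.o.f.
print(f"chi2 = {chi2_stat:.2f}, reject H0: {chi2_stat > CHI2_CRITICAL}")
```

Only the cases on which the two raters disagree enter the statistic; the 39 cases both got wrong and the 581 both got right do not affect it.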

McNemar's test does not tell us about the strength of the agreement or disagreement between the radiologists and the SVM classifier. To validate the previous test, we therefore evaluated the kappa statistic ($\kappa = 0.5$), which is larger than 0 and thus supports the result of McNemar's test. Finally, the confusion matrices of the SVM classifier and of the radiologists are shown in Fig. 8.
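The chapter does not say how the kappa statistic was obtained. Assuming it is Cohen's kappa computed on the correct/incorrect agreement counts of Table 5, a minimal Python sketch gives a value of roughly 0.5, in line with the figure quoted above. The function name and this interpretation are assumptions made for illustration.

```python
def cohen_kappa(n00, n01, n10, n11):
    """Cohen's kappa for a 2x2 agreement table.

    n00: both wrong, n01: only the classifier wrong,
    n10: only the radiologists wrong, n11: both correct.
    """
    n = n00 + n01 + n10 + n11
    observed = (n00 + n11) / n      # fraction of cases where both agree
    clf_wrong = (n00 + n01) / n     # classifier error rate
    rad_wrong = (n00 + n10) / n     # radiologist error rate
    # Agreement expected by chance from the two marginal error rates.
    expected = clf_wrong * rad_wrong + (1 - clf_wrong) * (1 - rad_wrong)
    return (observed - expected) / (1 - expected)


# Counts from Table 5.
kappa = cohen_kappa(n00=39, n01=16, n10=45, n11=581)
print(f"kappa = {kappa:.2f}")
```

A kappa of 0 would mean agreement no better than chance and 1 would mean perfect agreement, so a value near 0.5 indicates moderate agreement between the SVM classifier and the radiologists.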

11. Conclusions

We demonstrated that texture analysis of soft tissue tumors combined with machine learning algorithms can be used as a tool for objective evaluation of MR images, and that the results correlate well with the laboratory results. We ran several tests and came up with some interesting observations related to the problem of texture analysis of soft tissue tumors. First, texture features combined with machine learning algorithms seem to perform as well as radiologists, since a computer can extract more information related to signal homogeneity in T1-MRI than a human can based only on visual perception. Second, we do not need a large training data set to train a machine learning classifier and obtain good classification performance, since texture features correlate very well with the pathology of the tumor. Moreover, simple classifiers such as a Parzen classifier or an SVM classifier can effectively separate benign from malignant tumors.

12. Acknowledgments

Thanks to the University Hospital Antwerp (UZA), Dept. of Radiology for providing the MR

images. The authors would like to thank Prof. Robert Holte for providing the Cost Curve

software.

13. References

Alpaydin, E. (1999). Combined 5x2cv F test for comparing supervised classification learning algorithms, Neural Computation 11(8): 1885–1892.

Alpaydin, E. (2001). Assessing and comparing classification algorithms.

Castellano, G., Bonilha, L., Li, L. & Cendes, F. (2004). Texture analysis of medical images, Clinical Radiology 59: 1061–1069.

De Schepper, A. M. & Bloem, J. L. (2007). Soft tissue tumors: grading, staging, and tissue-specific diagnosis, Topics in Magnetic Resonance Imaging 18(6): 431–444.

De Schepper, A. M., De Beuckeleer, L., Vandevenne, J. & Somville, J. (2000). Magnetic resonance imaging of soft tissue tumors, European Radiology 10(2): 213–223.

De Schepper, A., Vanhoenacker, F., Parizel, P. & Gielen, J. (eds) (2005). Imaging of Soft Tissue Tumors, 3rd edn, Springer.

Dietterich, T. G. (1998). Approximate statistical tests for comparing supervised classification learning algorithms, Neural Computation 10(7): 1895–1923.

Haralick, R. M., Shanmugam, K. & Dinstein, I. (1973). Textural features for image classification, IEEE Transactions on Systems, Man and Cybernetics 3(6): 610–621.

Hermann, G., Abdelwahab, I., Miller, T., Kelin, M. & Lewis, M. (1992). Tumor and tumor-like conditions of the soft tissue: Magnetic resonance imaging features differentiating benign from malignant masses, Br J Radiol 65: 14–20.

Holte, R. C. & Drummond, C. (2011). Cost-sensitive classifier evaluation using cost curves, Proceedings of The 24th Florida Artificial Intelligence Research Society Conference (FLAIRS-24).

Huang, Y., Wang, K. & Chen, D. (2006). Diagnosis of breast tumors with ultrasonic texture analysis using support vector machines, Neural Computing & Applications 15(2): 164–169.

Jirák, D., Dezortová, M., Taimr, P. & Hájek, M. (2002). Texture analysis of human liver, Journal of Magnetic Resonance Imaging 15(1): 68–74.

Juan, M., García-Gómez, Vidal, C., Luis Martí-Bonmatí, Joaquín, G. et al. (2004). Benign/malignant classifier of soft tissue tumors using MR imaging, Magnetic Resonance Materials in Physics, Biology and Medicine 16: 194–201.

Julesz, B. (1975). Experiments in visual perception of texture, Sci Am 232: 34–43.

Julesz, B., Gilbert, E., Shepp, L. & Frisch, H. (1973). Inability of humans to discriminate between visual textures that agree in second-order statistics, Perception 2: 391–405.

Juntu, J., Sijbers, J., De Backer, S., Rajan, J. & Van Dyck, D. (2010). Machine learning study of several classifiers trained with texture analysis features to differentiate benign from malignant soft-tissue tumors in T1-MRI images, J. Magn. Reson. Imaging 31(3): 680–689.

Mahmoud-Ghoneim, D., Toussaint, G. & Jean-Marc, C. (2003). Three dimensional texture analysis in MRI: a preliminary evaluation in gliomas, Magnetic Resonance Imaging 21(9): 983–987.

Mao, J. & Jain, A. K. (1992). Texture classification and segmentation using multiresolution simultaneous autoregressive models, Pattern Recognition 25(2): 173–188.

Materka, A. & Strzelecki, M. (1998). Texture analysis methods - a review, Technical University of Lodz 1998, COST B11 technical report 11: 873–887.

Mayerhoefer, M. E., Breitenseher, M. J., Kramer, J., Aigner, N., Hofmann, S. & Materka, A. (2005). Texture analysis for tissue discrimination on T1-weighted MR images of knee joint in a multicenter study: Transferability of texture features and comparison of feature selection methods and classifiers, J Mag Reson Imaging 22: 674–680.

Meinel, L. A., Stolpen, A. H., Berbaum, K. S., Fajardo, L. L. & Reinhardt, J. M. (2007). Breast MRI lesion classification: Improved performance of human readers with a backpropagation neural network computer-aided diagnosis (CAD) system, Journal of Magnetic Resonance Imaging 25(1): 89–95.

Mutlu, H., Silit, E., Pekkafali, Z., Basekim, C., Ozturk, E., Sildiroglu, O., Kizilkaya, E. & Karsli, A. (2006). Soft-tissue masses: Use of a scoring system in differentiation of benign and malignant lesions, Clinical Imaging 30(1): 37–42.

Salzberg, S. L. (1997). On comparing classifiers: Pitfalls to avoid and a recommended approach, Data Mining and Knowledge Discovery 1: 317–327.

Tuceryan, M. & Jain, A. K. (1998). Texture analysis, in C. H. Chen, L. F. Pau & P. S. P. Wang (eds), The Handbook of Pattern Recognition and Computer Vision (2nd Edition), World Scientific Publishing Co., pp. 207–248.

Wagner, T. (1999). Texture analysis, in B. Jähne, H. Haußecker & P. Geißler (eds), Handbook of Computer Vision and Applications, Vol. 2, Signal Processing and Pattern Recognition, Academic Press, chapter 12, pp. 275–308.

Weatherall, P. (1995). Benign and malignant masses, MR imaging differentiation, Mag Reson Clin N Am 3: 669–694.
