Speeding Up Statistical Tests to Detect
Recurring Concept Drifts
Paulo Mauricio Gonçalves Júnior and Roberto Souto Maior de Barros
Abstract. RCD is a framework for dealing with recurring concept drifts. It reuses previously stored classifiers that were trained on examples similar to the current data, through the use of multivariate non-parametric statistical tests. The original proposal performed the statistical tests sequentially. This paper improves RCD to perform the statistical tests in parallel through the use of a thread pool and shows how parallelism impacts performance. Results show that parallel execution can considerably improve the evaluation time, compared to the corresponding sequential execution, in environments where many concept drifts occur.
Keywords: Data streams, recurring concept drifts, multivariate non-parametric statistical tests, parallelism.
1 Introduction
Concept drift is a common situation when dealing with data streams. Several authors
have defined it in different terms. One of these definitions was stated by Wang et
al. [23]: “the term concept refers to the quantity that a learning model is trying
to predict, i.e., the variable. Concept drift is the situation in which the statistical
properties of the target concept change over time.” Kolter and Maloof offered a more
informal definition: “concept drift occurs when a set of examples has legitimate
class labels at one time and has different legitimate labels at another time” [17].
Paulo Mauricio Gonçalves Júnior
Instituto Federal de Educação, Ciência e Tecnologia de Pernambuco, Cidade Universitária,
50.740-540, Recife, Brasil
e-mail: paulogoncalves@recife.ifpe.edu.br
Roberto Souto Maior de Barros
Centro de Informática, Universidade Federal de Pernambuco, Cidade Universitária,
50.740-560, Recife, Brasil
e-mail: roberto@cin.ufpe.br
R. Lee (Ed.): Computer and Information Science, SCI 493, pp. 129–142.
DOI: 10.1007/978-3-319-00804-2_10
© Springer International Publishing Switzerland 2013
Concept drifts may occur in several different situations, in applications such as
spam filtering [6], credit card fraud detection [22], and intrusion detection [18].
In recent years, many proposals have been made to deal with concept drifts, such as concept drift detectors and ensemble classifiers. One existing solution to deal with recurring concept drifts, named RCD, was previously proposed: it performs non-parametric multivariate statistical tests to identify whether a concept is recurring and, if so, reuses the classifier built on similar data.
In this paper, we present the results of executing the statistical tests in parallel: how much faster it is when compared to sequential execution, in which situations it reports better results, the influence of abrupt and gradual concept drifts on the test results, and how RCD performs in environments with different numbers of processing cores.
The rest of this paper is organized as follows: Sect. 2 presents some common techniques used to deal with concept drifts; Sect. 3 summarizes the RCD framework and how the parallelism was implemented; Sect. 4 describes the data sets used and their parameters, the evaluation methodology, the RCD configuration, and other information about the experiments; Sect. 5 presents the results of the experiments; and, finally, Sect. 6 presents our conclusions.
2 Background
There are many approaches used to deal with concept drifts. One approach is to create a single classifier that adapts its internal structure as new data arrive. A commonly used single classifier is based on a Hoeffding tree [7], also named VFDT (Very Fast Decision Tree). It is a decision tree that uses the Hoeffding bound to calculate how much data it needs to process in order to select the value of a decision node. Its accuracy is similar to that of a batch decision tree, but it uses much less memory. In its original form, it was not designed to handle concept drifts; many extensions have since been proposed to adapt Hoeffding trees to deal with them.
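To make the split criterion concrete, the Hoeffding bound can be sketched as follows (a minimal illustration, not MOA's implementation; the range R = 1, confidence δ, and instance counts are illustrative values):

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """With probability 1 - delta, the true mean of a random variable with
    range R differs from its sample mean over n observations by at most
    this epsilon."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

# The tree splits on the current best attribute once its information-gain
# advantage over the runner-up exceeds epsilon, which shrinks as more
# instances are observed.
print(hoeffding_bound(1.0, 1e-7, 5000))   # ~ 0.04
```

Because epsilon decreases with n, the tree postpones a split exactly until enough data has been seen to make the choice statistically safe.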
One of these proposals is named CVFDT (Concept-adapting Very Fast Decision Tree) [16]. Its authors state that CVFDT "is an extension to VFDT which maintains VFDT's speed and accuracy advantages but adds the ability to detect and respond to changes in the example-generating process". It uses a sliding window of examples to try to keep its model up-to-date. For each new arriving instance, statistics are recomputed, reducing the influence of older instances. When the concept begins to change, alternative attributes increase their information gain, causing the Hoeffding test on the split to fail. An alternative tree begins to grow with the new best attribute at its root; if this subtree becomes more accurate than the old one on new data, it replaces it. VFDTc [13], on the other hand, extends VFDT with the ability to deal with numeric attributes and uses naive Bayes classifiers at the tree leaves. Proposals with decision rules were also made [9].
Another common approach to deal with a concept drift is to identify when it
occurs and create a new classifier. Therefore, only classifiers trained on a current
concept are maintained. Algorithms that follow this approach work in the following
way: each arriving training instance is first evaluated by the base classifier. Internal
statistics are updated with the results and two thresholds are computed: a warning
level and an error level. As the base classifier makes mistakes, the warning level
is reached and instances are stored. If the behavior continues, the error level will
be reached, indicating that a concept drift has occurred. At this moment, the base
classifier is destroyed and a new base classifier is created and initially trained on
the stored instances. On the other hand, if the classifier starts to correctly evaluate instances again, the situation is considered a false alarm and the stored instances are discarded.
Algorithms that follow this approach can work with any type of classifier as they
only analyze how the classifier evaluates instances.
One example of this approach is DDM (Drift Detection Method) [10]. It works by monitoring the algorithm's error rate. For each point i in the sequence of arriving instances, the error rate is the probability of misclassification p_i, with standard deviation given by s_i = sqrt(p_i (1 − p_i) / i). Statistical theory guarantees that, when the distribution changes, the error will increase. The values of p_i and s_i are stored when p_i + s_i reaches its minimum value during the process (yielding p_min and s_min). The warning level is reached when p_i + s_i ≥ p_min + 2 × s_min, and the error level is set at p_i + s_i ≥ p_min + 3 × s_min.
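A minimal sketch of this mechanism follows (illustrative Python, not the authors' implementation; the 30-instance warm-up before signalling is an assumption borrowed from typical DDM implementations):

```python
import math

class DDM:
    """Drift Detection Method sketch: track the error rate p and its
    standard deviation s = sqrt(p * (1 - p) / i), remember the minimum of
    p + s, and signal warning at p_min + 2*s_min and drift at
    p_min + 3*s_min."""

    def __init__(self):
        self.i = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, misclassified: bool) -> str:
        self.i += 1
        self.errors += int(misclassified)
        p = self.errors / self.i
        s = math.sqrt(p * (1.0 - p) / self.i)
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if self.i < 30:
            return "in-control"   # assumed warm-up before any signal
        if p + s >= self.p_min + 3.0 * self.s_min:
            return "drift"
        if p + s >= self.p_min + 2.0 * self.s_min:
            return "warning"
        return "in-control"
```

On a stream whose error rate suddenly rises, the detector first crosses the warning level, starts buffering instances, and then crosses the drift level, at which point the base classifier would be rebuilt.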
Another similar method is the Early Drift Detection Method (EDDM) [1]. It works similarly to DDM but, instead of monitoring solely the amount of error of the classifier, it uses the distance between two consecutive errors to identify concept drifts. It computes the average distance between two errors, p_i, and the standard deviation of p_i, s_i. These values are stored when p_i + 2 × s_i reaches its maximum value (yielding p_max and s_max). Thus, the value of p_max + 2 × s_max corresponds to the point where the distribution of distances between errors is at its maximum. EDDM was shown to be more adequate for detecting gradual concept drifts, while DDM was better suited for abrupt concept drifts [1].
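The distance-based rule can be sketched as follows (an illustrative simplification: the drift threshold β = 0.9 follows the original paper, the 30-error minimum is an assumption, and the running variance uses Welford's algorithm):

```python
import math

class EDDM:
    """EDDM sketch: track the mean distance between consecutive errors and
    its standard deviation, remember the maximum of mean + 2*s, and signal
    drift when the current level falls below 90% of that maximum."""

    def __init__(self, beta=0.9, min_errors=30):
        self.beta = beta
        self.min_errors = min_errors
        self.n = 0                # instances seen
        self.num_errors = 0
        self.last_error_at = 0
        self.mean = 0.0           # running mean of error distances
        self.m2 = 0.0             # running sum of squared deviations
        self.max_level = 0.0      # maximum of mean + 2*s seen so far

    def update(self, misclassified: bool) -> bool:
        self.n += 1
        if not misclassified:
            return False
        distance = self.n - self.last_error_at
        self.last_error_at = self.n
        self.num_errors += 1
        delta = distance - self.mean
        self.mean += delta / self.num_errors
        self.m2 += delta * (distance - self.mean)
        level = self.mean + 2.0 * math.sqrt(self.m2 / self.num_errors)
        self.max_level = max(self.max_level, level)
        return (self.num_errors >= self.min_errors
                and level < self.beta * self.max_level)
```

Shrinking distances between errors pull the current level down relative to its recorded maximum, which is why this detector responds well to gradual drifts.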
Exponentially weighted moving average (EWMA) charts [19] were originally proposed for detecting an increase in the mean of a sequence of random variables, assuming that the mean and standard deviation of the stream are known. Yeh et al. [25] proposed an EWMA change detector for a sequence of random variables that follow a Bernoulli distribution. ECDD (EWMA for Concept Drift Detection) [20] extends EWMA to monitor the misclassification rate of a streaming classifier, allowing the rate of false positive detections to be controlled and kept constant over time.
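The idea can be sketched for a Bernoulli error stream with known in-control rate p0 (illustrative constants: the smoothing factor lam and limit multiplier L are assumptions; ECDD itself uses dynamic, time-varying limits to keep the false positive rate constant):

```python
import math

class EwmaChart:
    """EWMA chart sketch: smooth the 0/1 error stream with
    Z_t = (1 - lam) * Z_{t-1} + lam * X_t and signal when Z_t exceeds a
    control limit placed L asymptotic standard deviations above p0."""

    def __init__(self, p0: float, lam: float = 0.2, L: float = 3.0):
        self.lam = lam
        self.z = p0
        sigma = math.sqrt(p0 * (1.0 - p0))
        # asymptotic standard deviation of the EWMA statistic
        self.limit = p0 + L * sigma * math.sqrt(lam / (2.0 - lam))

    def update(self, error: int) -> bool:
        self.z = (1.0 - self.lam) * self.z + self.lam * error
        return self.z > self.limit   # misclassification rate has increased
```

Because recent observations dominate the smoothed statistic, a sustained rise in the error rate is detected after only a few misclassifications.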
Several proposals try to deal with concept drifts through the use of ensemble classifiers. This approach maintains a collection of learners and combines their decisions to make an overall decision. To deal with concept drifts, ensemble classifiers must take into account the temporal nature of the data stream.
Learn++.NSE is a recent proposal of an ensemble classifier. The original algorithm [8] works as follows: a single classifier is created for each data set that becomes available. The algorithm first evaluates the classification accuracy of the current ensemble on the newly available data, obtained by the weighted majority voting of all classifiers in the ensemble. Its error is computed as the ratio of instances of the new data set misclassified by the ensemble, normalized to the interval [0,1]. Then, the weights of the instances are updated: the weights of the instances misclassified by the ensemble are reduced by a factor of the normalized error. The weights are then normalized, a new classifier is created, and all the classifiers generated so far are evaluated on the current data set by computing their weighted errors. If the error of the most recent classifier is greater than 0.5, it is discarded and a new one is created. For each of the other classifiers, if its error is greater than 0.5, its voting power is removed during the weighted majority voting.
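The voting-weight rule can be illustrated as follows (a simplified sketch: the full algorithm also time-averages each classifier's errors with a sigmoid weighting, which is omitted here):

```python
import math

def voting_weights(errors):
    """Learn++.NSE-style voting weights: a classifier with weighted error
    e <= 0.5 votes with weight log((1 - e) / e); one with e > 0.5 has its
    voting power removed (weight 0)."""
    weights = []
    for e in errors:
        if e > 0.5:
            weights.append(0.0)          # too inaccurate: silenced
        else:
            e = max(e, 1e-12)            # guard against log(0)
            weights.append(math.log((1.0 - e) / e))
    return weights
```

A classifier with 10% weighted error votes with weight log(9) ≈ 2.2, while one with 60% error contributes nothing to the ensemble decision.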
Another ensemble classifier proposal is DWAA (Dynamic Weight Assignment and Adjustment) [24]. It creates classifiers based on data chunks, using the next chunk to evaluate the classifier previously built. If the ensemble is not full, the classifier is added; otherwise, the classifier that performed worst on the last data chunk is replaced. To set the weights, it uses a formula that considers how many of the ensemble classifiers have actually made correct predictions. If more than half of the classifiers' predictions are correct, each one receives a normal reward; otherwise, each one receives a higher reward, giving them more influence on the global decision of the ensemble, as they are better suited to represent the concept.
3 Parallel RCD
RCD [14, 15] is a framework developed to deal with recurring concept drifts. It keeps a collection of pairs of classifiers and the samples used to train them, as presented in Fig. 1. In the training phase, a concept drift detector is used. If it identifies a concept drift, a multivariate non-parametric statistical test is performed to compare the current data to the stored samples. If the statistical test indicates that both come from the same distribution, the classifier associated with the stored sample is reused, meaning that this classifier is adequate to deal with the current data.
On the other hand, if the test indicates that the samples are not similar, the next stored data sample is tested, and so on. If no stored classifier is apt for the current data, a new classifier is created and stored in the set. If the set is full, the oldest classifier is replaced. In the testing phase, statistical tests are performed every t instances (a user-parameterized value) to select, from the stored classifiers, the best one for the current data. Thus, RCD dynamically adapts to the current data distribution even in the testing phase.
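The training-phase logic just described can be sketched as follows (illustrative names, not the framework's actual API; `similarity` stands in for the multivariate statistical test, returning a p-value compared against the 0.05 significance level):

```python
def on_drift(current_sample, stored, train_new, similarity,
             alpha=0.05, max_size=15):
    """On a detected drift, reuse the first stored classifier whose
    training buffer is statistically similar to the current sample;
    otherwise train a new classifier and store it, evicting the oldest
    pair when the set is full."""
    for classifier, buffer in stored:
        if similarity(current_sample, buffer) >= alpha:
            return classifier                  # recurring concept: reuse
    classifier = train_new(current_sample)
    if len(stored) >= max_size:
        stored.pop(0)                          # replace the oldest pair
    stored.append((classifier, list(current_sample)))
    return classifier
```

Storing the training sample alongside each classifier is what makes the later similarity tests possible: the buffer is a proxy for the concept the classifier represents.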
Originally, RCD performed the statistical tests sequentially. Thus, a statistical test would first be performed comparing the current data to the data stored in the buffer of classifier 1, to verify whether both represented the same data distribution. If positive, this classifier
Fig. 1 RCD classifiers set (a collection of classifier/buffer pairs: Classifier 1/Buffer 1 through Classifier n/Buffer n)
was considered the new current classifier; otherwise, a statistical test would be performed on the buffer data of classifier 2, and so on.
The improvement proposed here is to perform several tests simultaneously through a thread pool of configurable fixed size, allowing the user to fine-tune its value based on the hardware in use. Fig. 2 presents an example illustrating how the thread pool works, considering a thread pool with two active cores and a classifiers set of size six.
When a concept drift occurs, it means the current classifier does not correctly represent the current context, so it is necessary to check whether any stored classifier represents it better. The remaining five classifiers stored in the set must be tested, comparing a sample of the current data to the data stored in the buffer associated with each classifier, which represents the data that classifier was trained on. Five threads are built to perform the statistical tests and are sent to the thread pool using a FIFO scheme, which associates each test with a position in the pool; only the first two are active, i.e., actually performing a statistical test. In Fig. 2 they are represented by bolder lines, and inactive threads by thinner lines. At this point (t = 0), two statistical tests are active and the remaining three are waiting to execute.
When the first statistical test finishes (say, statistical test 1), if the result indicates that the current data and the sample data from classifier 1 do not represent the same data distribution, the next inactive statistical test (in this case, statistical test 3) executes in the corresponding slot (t = 1). At t = 2, the same occurs: classifier 2, represented by statistical test 2, also does not better represent the current data, and the next statistical test (number 4) takes its place.
Now, let us consider that statistical test 3 has finished and identified that the current data and the data stored in the buffer of classifier 3 represent the same distribution.
Fig. 2 Example of a thread pool execution (two active slots and an inactive queue; statistical tests 1 to 5 are assigned in FIFO order, with snapshots at t = 0, t = 1, and t = 2)
In this situation, this classifier substitutes the current classifier, all other active statistical tests are stopped, and the inactive ones are cancelled.
This scheme is interesting because, if a test is negative, the next test is already executing, speeding up the algorithm; and, if a test is positive, all other executing tests are stopped and the tests yet to be executed never enter the active thread pool.
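A minimal sketch of this early-exit scheme using Python's standard thread pool (illustrative only; the actual RCD implementation is a Java/MOA extension, and note that with this API queued tests can be cancelled but already-running ones simply run to completion):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def find_recurring(candidates, run_test, pool_size=2):
    """Submit one statistical test per candidate classifier to a
    fixed-size pool in FIFO order; as soon as a test reports a match,
    cancel the tests still queued and return the matching candidate
    (None if no test matches)."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = {pool.submit(run_test, c): c for c in candidates}
        for future in as_completed(futures):
            if future.result():            # positive test: stop early
                for other in futures:
                    other.cancel()         # queued tests never start
                return futures[future]
    return None
```

Because all tests are submitted up front, a negative result never leaves a slot idle: the next queued test starts immediately, which is exactly the benefit described above.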
Notice that this scheme is general and allows the execution of any statistical test in parallel. Source code and instructions on how to use RCD are available as a MOA extension at http://sites.google.com/site/moaextensions/.
4 Experiments Configuration
We used several data sets to perform the experiments: Hyperplane [16], LED [11],
SEA [21], Forest Covertype [4], Poker Hand [3], and Electricity [10, 12]. The first
three are artificial data sets: the first one presents gradual concept drifts while the
following two present abrupt concept drifts. The last three are real-world data sets.
These data sets and their configurations are the same as used by Bifet et al. [3]. Hyperplane was tested on ten million instances, while LED and SEA were tested on one million. All tests in the artificial data sets were repeated ten times and a 95% confidence interval was computed. The parameters of these streams are the following:
• HYP(x,v) represents a Hyperplane data stream with x attributes changing at speed v;
• LED(v) appends four concepts (1, 3, 5, 7), each one representing a different number of drifting attributes, with length of change v;
• SEA(v) uses the same four concepts, in the same order, as defined in the original paper [21], with length of change v.
The RCD configuration used in the experiments includes naive Bayes as the base learner, a classifiers collection size of 15, KNN as the statistical test (with k = 3), and the minimum amount of similarity between data samples set to 0.05. Two buffer sizes, two test frequencies (only in the testing phase), and three thread pool sizes have been used.
The evaluation methodology used was Interleaved Chunks, also known as the data block evaluation method [5], over ten runs. It initially reads a block of d instances. When the block is complete, the instances are first used to test the existing classifier, and then the classifier is trained on them. This methodology was used because it is better suited to computing training and testing times. In the following experiments, d was set to 100,000 instances in the Hyperplane, Covertype, and Poker Hand data sets, and to 10,000 instances in the LED, SEA, and Electricity data sets.
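The evaluation loop can be sketched as follows (a minimal illustration with a hypothetical model interface exposing `predict` and `train`; a trailing partial block is simply ignored):

```python
def interleaved_chunks(stream, model, block_size):
    """Data block evaluation: read d instances, test the current
    classifier on the block, then train the classifier on that same
    block. Returns the per-block accuracies."""
    accuracies = []
    block = []
    for x, y in stream:
        block.append((x, y))
        if len(block) == block_size:
            hits = sum(1 for xi, yi in block if model.predict(xi) == yi)
            accuracies.append(hits / block_size)   # test first ...
            for xi, yi in block:
                model.train(xi, yi)                # ... then train
            block.clear()
    return accuracies
```

Testing before training guarantees that every instance is evaluated by a classifier that has never seen it, which keeps the accuracy estimate honest while still using all data for learning.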
All the experiments were performed using the Massive Online Analysis (MOA) framework [2] on a Core i3 330M processor with 4 GB of main memory running Windows 7 Professional. This processor has four cores, two physical and two virtual, each running at 2.13 GHz.
We used a modified version of the Interleaved Chunks evaluator presented in the MOA framework, because the version available in the tool computes only the processor time used by the thread executing the classifier. With that solution, it is possible to execute other applications at the same time without affecting the results. However, because we use a thread pool to perform the statistical tests, their times would not be counted, as the original thread may not be active on the processor. Here, we measure the real (wall-clock) time taken by RCD, which makes it impossible to run other applications at the same time.
5 Results
Table 1 presents the average number of detected concept drifts, classifiers set size, and number of reused classifiers, as well as the evaluation, training, and testing times (in seconds) for RCD over the ten runs, using a buffer size of 100 instances and a test frequency of 500, considering thread pools with one, two, and four active cores.
It is worth pointing out that the results were quite similar in the artificial data sets, regardless of the thread pool size. This behavior did not occur in the real-world data sets and is probably related to the number of detected concept drifts. In the artificial data sets, the average number of detected concept drifts is considerably low, as can be seen in the first column (CD), and a small number of concept drifts requires few statistical tests to be performed.
However, the number of detected concept drifts is not the only influence on performance; reusing classifiers also matters. If the first tests identify similarity between distributions, several other tests will not be executed, reducing the benefits of using a thread pool. On the other hand, if only the last tests, or none at all, identify similarity, more tests need to be executed. This is the situation expected to benefit the most from the parallelization of the statistical tests.
For example, the average classifiers collection size (CS) in the artificial data sets is below three, not even close to filling the set (15 classifiers). Having few stored classifiers indicates that few statistical tests need to be performed. The difference between the number of detected concept drifts and the number of stored classifiers is due to the reuse of classifiers. Analyzing the column with the number of reused classifiers (RC), we can see that the values are very close to the ones presented in the first column. This means that in the majority of the concept drifts a classifier was reused.
In this RCD configuration, analyzing the artificial data sets, using one core had statistically better results than using two cores in both configurations of Hyperplane and in LED, but worse results in both versions of SEA. Using one core performed statistically better than using four cores in the HYP(10, 0.0001) and LED data sets, but worse in SEA, similarly to using two cores; in the HYP(10, 0.001) data set, both had statistically similar results. Using four cores had statistically better results than using two cores in the Hyperplane and SEA data sets, and similar ones in LED.
Table 1 Results for a buffer with 100 instances and test frequency of 500 instances (in seconds)
Data sets CD CS RC
1 core 2 cores 4 cores
eval train test eval train test eval train test
HYP(10,0.001) 7.7 2.6 6.0 99.64 44.45 36.30 101.49 45.40 39.98 100.13 46.13 38.20
HYP(10,0.0001) 8.2 2.4 6.7 98.99 44.06 36.15 101.49 45.35 40.29 100.10 46.18 38.18
SEA(50) 0.1 1.1 0.0 4.70 2.11 1.09 4.32 1.53 1.31 4.30 1.56 1.27
SEA(50000) 0.4 1.1 0.3 4.63 1.97 1.25 4.26 1.54 1.31 4.23 1.56 1.27
LED(50000) 0.3 1.2 0.1 32.56 14.23 12.78 33.54 14.72 13.25 33.54 14.70 13.24
Covertype 2980.0 15.0 844.0 98.12 80.68 8.30 84.24 66.99 8.25 73.99 56.77 8.32
Poker Hand 1871.0 15.0 109.0 45.72 37.86 3.78 38.31 30.45 3.81 31.67 23.82 3.82
Electricity 212.0 15.0 30.0 2.89 2.48 0.09 2.40 1.98 0.09 2.00 1.58 0.09
Table 2 Thread pool management for a buffer with 100 instances and test frequency of 500
instances (in milliseconds)
Creation time Execution time Destruction time
1 core 2 cores 4 cores 1 core 2 cores 4 cores 1 core 2 cores 4 cores
HYP(10,0.001) 2.49 1.03 1.03 1.87 2.22 2.63 0.00 0.00 0.00
HYP(10,0.0001) 2.49 1.16 0.40 1.73 2.90 3.68 0.00 0.00 0.00
SEA(50) 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
SEA(50000) 4.00 0.00 0.00 0.00 0.00 4.00 0.00 0.00 0.00
LED(50000) 0.00 5.33 0.00 5.00 0.00 10.33 0.00 0.00 0.00
Covertype 2.12 2.25 2.97 25.76 19.84 15.04 0.03 0.02 0.04
Poker Hand 2.01 2.21 2.96 14.81 10.89 6.10 0.01 0.01 0.06
Electricity 2.16 1.95 2.92 8.82 6.58 3.42 0.00 0.00 0.00
Table 3 Results for a buffer and test frequency of 100 instances (in seconds)
Data sets CD CS RC
1 core 2 cores 4 cores
eval train test eval train test eval train test
HYP(10,0.001) 8.4 2.8 6.5 422.74 45.61 360.38 487.43 46.86 424.62 499.94 46.87 437.12
HYP(10,0.0001) 10.2 2.3 8.2 421.24 45.37 359.88 459.86 46.66 397.05 454.82 46.63 392.47
SEA(50) 0.1 1.1 0.0 6.18 1.51 3.15 6.25 1.55 3.19 6.22 1.54 3.16
SEA(50000) 0.2 1.1 0.1 6.27 1.53 3.23 6.35 1.56 3.29 6.34 1.58 3.25
LED(50000) 2.3 1.2 2.1 37.22 14.25 17.38 38.47 14.89 17.98 38.51 14.90 18.06
Covertype 3376.0 15.0 944.0 534.32 86.47 438.71 354.76 72.31 273.38 315.25 60.87 245.36
Poker Hand 1871.0 15.0 109.0 305.93 37.83 264.08 229.43 30.78 194.59 166.22 23.65 138.53
Electricity 212.0 15.0 30.0 12.40 2.43 9.62 9.50 2.07 7.08 6.80 1.50 4.96
Table 4 Thread pool management for a buffer and test frequency of 100 instances (in
milliseconds)
Creation time Execution time Destruction time
1 core 2 cores 4 cores 1 core 2 cores 4 cores 1 core 2 cores 4 cores
HYP(10,0.001) 0.39 0.49 0.51 2.85 3.36 3.47 0.01 0.01 0.01
HYP(10,0.0001) 0.32 0.37 0.36 2.91 3.21 3.17 0.01 0.01 0.01
SEA(50) 0.03 0.03 0.02 0.15 0.15 0.15 0.00 0.00 0.00
SEA(50000) 0.02 0.03 0.03 0.16 0.16 0.16 0.00 0.00 0.00
LED(50000) 0.03 0.03 0.04 0.40 0.41 0.41 0.00 0.00 0.00
Covertype 2.05 2.35 3.14 73.84 46.14 39.73 0.01 0.02 0.07
Poker Hand 2.14 2.39 3.08 30.97 21.95 14.00 0.01 0.02 0.08
Electricity 2.08 2.75 2.85 21.79 15.07 9.69 0.08 0.00 0.03
On the other hand, in the real-world data sets, more cores resulted in lower evaluation times: two cores were faster than one, and four cores were faster than the other two thread pool sizes. In the artificial data sets, using one core was, on average, 1.90% faster than using two cores, but 14.84% slower in the real-world data sets. Comparing one and four cores, similar results apply: one core was faster by 0.74% in the artificial data sets, while four cores were faster by 26.63% in the real-world ones. Using four cores had practically the same performance as using two cores in the artificial data sets, being 1.14% faster, but was 13.85% faster in the real-world data sets. The real-world data sets presented a huge number of concept drifts, the classifiers set became full, and the number of reused classifiers was also much larger than in the artificial data sets.
To better analyze how the thread pool influences performance, we computed the average amount of time (in milliseconds) needed to create the thread pool and assign the statistical tests to their respective slots, to execute the thread pool, and to finalize it. In the assignment stage, if a statistical test is assigned to an active slot, it starts executing immediately, while the other tests are still being assigned; thus it is not necessary to wait for all tests to be assigned a slot before execution starts, saving time. This information is presented in Table 2.
Observing the real-world data sets, it is possible to notice that, in general, the greater the number of cores, the longer the time spent creating the thread pool, although the differences are usually very small. For the execution time, using more cores meant faster execution, with no exception. The destruction times were usually negligible, taking less than 0.1 milliseconds.
The results in Table 3 are similar to those presented in Table 1, but the test frequency in these experiments was increased to every 100 instances. Again, parallelism outperforms the sequential solution in the real-world data sets, the ones with more detected concept drifts.
In these tests, we can also notice that the evaluation time is mostly spent in the testing phase, differently from the results of Table 1. As the tests are more frequent, the evaluation time and the time spent in the testing phase increased considerably. Running the tests more frequently also changed the number of detected concept drifts in 50% of the data sets; in SEA(50000), it dropped from 0.4 to 0.2, while in Poker Hand, Electricity, and SEA(50), the number of detected concept drifts stayed the same.
Using one core was faster than using two cores by an average of 11.72% in the artificial data sets, but was 30.37% slower in the real-world data sets. Comparing one core to four cores, similar results apply: one core was faster by 12.55% in the artificial data sets, and four cores were faster by 42.74% in the real-world data sets. Using two cores offered better average results than using four cores by 0.75% in the artificial data sets, but worse performance by 17.76% in the real-world ones.
Table 4 presents information similar to Table 2, with similar results: the creation times are usually slightly shorter using fewer cores, the execution time is considerably smaller when using more cores, and the destruction times are usually less than 0.1 milliseconds.
Instead of increasing the test frequency, Table 5 presents information similar to Tables 1 and 3, but with the buffer size increased to 500 instances. Here, the tests took
Table 5 Results for a buffer and test frequency of 500 instances (in seconds)
Data sets CD CS RC
1 core 2 cores 4 cores
eval train test eval train test eval train test
HYP(10,0.001) 11.9 2.6 9.9 1339.33 46.07 1275.16 1453.37 46.65 1388.60 1508.29 48.98 1441.27
HYP(10,0.0001) 9.5 2.4 7.5 1356.57 45.77 1293.32 1404.87 46.01 1341.41 1413.77 47.41 1348.88
SEA(50) 1.1 1.1 1.0 11.17 1.59 7.30 11.17 1.57 7.31 11.25 1.62 7.30
SEA(50000) 0.2 1.1 0.1 11.39 1.56 7.57 11.40 1.58 7.55 11.49 1.60 7.60
LED(50000) 0.2 1.2 0.0 57.99 14.28 37.33 53.98 14.25 33.33 55.14 14.92 33.85
Covertype 3063.0 15.0 798.0 1781.17 358.50 1413.39 1107.10 244.95 853.07 1167.37 268.35 889.92
Poker Hand 410.0 15.0 18.0 588.42 47.24 537.02 359.69 34.10 321.47 377.63 36.19 336.09
Electricity 183.0 15.0 20.0 33.38 8.85 24.15 20.87 6.02 14.46 22.92 7.19 15.26
Table 6 Thread pool management for a buffer and test frequency of 500 instances (in mil-
liseconds)
Creation time Execution time Destruction time
1 core 2 cores 4 cores 1 core 2 cores 4 cores 1 core 2 cores 4 cores
HYP(10,0.001) 0.54 0.60 0.60 61.90 67.54 70.14 0.02 0.02 0.02
HYP(10,0.0001) 0.50 0.55 0.51 62.94 65.31 65.66 0.02 0.01 0.02
SEA(50) 0.04 0.03 0.04 2.99 2.99 2.98 0.00 0.00 0.00
SEA(50000) 0.04 0.05 0.04 3.10 3.09 3.09 0.00 0.00 0.01
LED(50000) 0.06 0.04 0.05 12.32 10.27 10.26 0.00 0.00 0.00
Covertype 2.01 2.34 13.66 560.81 342.49 354.36 0.02 0.01 0.06
Poker Hand 2.27 2.65 6.36 314.13 186.57 187.92 0.01 0.06 0.03
Electricity 2.04 2.28 5.23 140.45 91.20 91.78 0.00 0.06 0.06
longer to complete: on average, 62 milliseconds, compared to 3 milliseconds when using a buffer with 100 instances. Nevertheless, the results were similar to the ones presented in Table 3. Using parallelism was much faster in the real-world data sets and slightly slower in the artificial ones. Using one core was 5.70% faster than using two cores in the artificial data sets, but 38.09% slower in the real-world ones.
However, it was interesting to observe that the evaluation time was lower using two cores than using four. Comparing one and four cores, similar results apply: one core was 8.05% faster in the artificial data sets and 34.75% slower in the real-world ones. Using two cores was slightly better than using four: by 2.22% in the artificial and 5.39% in the real-world data sets. This probably occurs because there are many more statistical tests to perform and they take longer to complete than in the other configurations, putting a higher load on the whole system and negatively affecting performance.
Comparing Tables 1, 3, and 5, it is possible to notice that the increase in the buffer size had a higher influence on the evaluation time than the increase in the test frequency. Increasing the test frequency five-fold increased the evaluation time between 4.26 and 4.50 times, while increasing the buffer size five-fold increased the evaluation time between 11.95 and 13.37 times. The training time practically did not change across the three configurations; the increase in the evaluation time was due to the testing time.
Table 6 presents the times taken for the thread pool management, as previously described for Tables 2 and 4. In the artificial data sets, the creation times are very close for the three numbers of active cores used. In the real-world data sets, one and two cores take very similar times, while four cores take more time than the other two. This probably occurs because, during creation, the statistical tests associated with active cores begin executing while other tests are still being assigned; thus, the creation time tends to be longer when there are more active cores and the tests take longer to complete. We can see this by comparing the three tables concerning thread pool management: in Tables 2 and 4, the differences in creation time between one, two, and four cores are almost negligible, and the average time taken to perform a statistical test is three milliseconds; in Table 6, using four cores takes more time than using one or two cores, and the average time taken to perform a statistical test is 62 milliseconds. The execution times are very similar in the artificial data sets. In the real-world data sets, using two or four cores is considerably faster than using one active core. Using two cores was faster than using four in the Covertype data set, while in the other two real-world data sets the performances were quite similar.
6 Conclusion
This paper studied the influence of executing statistical tests in parallel in the RCD framework using six data sets (eight configurations): with and without concept drifts, with abrupt and gradual concept drifts, and considering artificial and real-world data sets. Tests were performed with sequential execution and with parallel execution of two and four simultaneous statistical tests.
Analysis of the experimental results led to the conclusion that executing statistical
tests in parallel was most beneficial when there was a high number of detected
concept drifts, leading to more statistical tests being performed. Tests were also
performed to analyze performance under the following conditions:
1. the buffer size was increased, making the statistical tests take longer to complete;
and
2. the test frequency was increased, making more statistical tests be performed.
In the data sets with a small number of detected concept drifts (the artificial ones),
the performances were quite similar, but sequential execution yielded evaluation
times that were lower by 0.74% to 12.55%. On the other hand, in the data sets with
a high number of detected concept drifts (the real-world ones), using parallelism
improved performance by 13.85% to 42.74%.
Multiplying the test frequency by five increased the evaluation time more than
four times, while multiplying the buffer size by five increased the evaluation time
more than 11 times, indicating that the buffer size has a higher impact on
performance than the test frequency.
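One way to interpret these numbers is through a simple cost model (an assumption of this note, not stated in the experiments): if the stream has $n$ examples, tests are performed every $f$ examples, and each test costs $c(b)$ on a buffer of $b$ examples, then

```latex
T_{\text{eval}} \approx \frac{n}{f}\, c(b)
```

Multiplying the test frequency by five divides $f$ by five and thus multiplies the number of tests $n/f$ by five while leaving $c(b)$ unchanged, so the observed 4.26 to 4.50 times growth is close to this linear prediction. Multiplying $b$ by five produced an 11.95 to 13.37 times growth, suggesting that $c(b)$ grows faster than linearly in $b$, as one would expect of multivariate statistical tests over larger samples.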
The analysis of the thread pool creation, execution, and destruction times was
also performed, showing that, as expected, the major improvement occurs in the
execution phase. The creation times using different numbers of cores are close to
one another, and the destruction times are commonly negligible, taking less than
0.1 milliseconds.
6.1 Future Work
Some other experiments could be performed to better understand how the execution
of parallel statistical tests can improve the performance of the RCD framework.
One such experiment is testing the influence of the number of cores available in
the processor on performance. Another possible experiment is to analyze the
influence of other buffer sizes and test frequencies.
References
1. Baena-García, M., Del Campo-Ávila, J., Fidalgo, R., Bifet, A., Gavaldà, R., Morales-
Bueno, R.: Early drift detection method. In: International Workshop on Knowledge Dis-
covery from Data Streams, IWKDDS 2006, pp. 77–86 (2006),
http://eprints.pascal-network.org/archive/00002509/
2. Bifet, A., Holmes, G., Kirkby, R., Pfahringer, B.: MOA: Massive online analysis. J. of
Mach. Learn. Res. 11, 1601–1604 (2010),
http://portal.acm.org/citation.cfm?id=1859890.1859903
3. Bifet, A., Holmes, G., Pfahringer, B., Frank, E.: Fast perceptron decision tree learn-
ing from evolving data streams. In: Zaki, M.J., Yu, J.X., Ravindran, B., Pudi, V. (eds.)
PAKDD 2010, Part II. LNCS (LNAI), vol. 6119, pp. 299–310. Springer, Heidelberg
(2010), http://dx.doi.org/10.1007/978-3-642-13672-6_30
4. Blackard, J.A., Dean, D.J.: Comparative accuracies of artificial neural networks and dis-
criminant analysis in predicting forest cover types from cartographic variables. Comput.
and Electron. in Agric. 24(3), 131–151 (1999),
http://dx.doi.org/10.1016/S0168-1699(99)00046-0
5. Brzeziński, D., Stefanowski, J.: Accuracy updated ensemble for data streams with con-
cept drift. In: Corchado, E., Kurzyński, M., Woźniak, M. (eds.) HAIS 2011, Part II.
LNCS, vol. 6679, pp. 155–163. Springer, Heidelberg (2011),
http://dx.doi.org/10.1007/978-3-642-21222-2_19
6. Delany, S.J., Cunningham, P., Tsymbal, A., Coyle, L.: A case-based technique
for tracking concept drift in spam filtering. Knowl.-Based Syst. 18(4-5), 187–195
(2005), http://dx.doi.org/10.1016/j.knosys.2004.10.002; AI-2004,
Cambridge, England, December 13-15 (2004)
7. Domingos, P., Hulten, G.: Mining high-speed data streams. In: Proceedings of the Sixth
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,
KDD 2000, New York, NY, USA, pp. 71–80 (2000),
http://dx.doi.org/10.1145/347090.347107
8. Elwell, R., Polikar, R.: Incremental learning of concept drift in nonstationary environ-
ments. IEEE Trans. on Neural Netw. 22(10), 1517–1531 (2011),
http://dx.doi.org/10.1109/TNN.2011.2160459
9. Ferrer-Troyano, F., Aguilar-Ruiz, J.S., Riquelme, J.C.: Discovering decision rules from
numerical data streams. In: Proceedings of the 2004 ACM Symposium on Applied Com-
puting, SAC 2004, New York, NY, USA, pp. 649–653 (2004),
http://dx.doi.org/10.1145/967900.968036
10. Gama, J., Medas, P., Castillo, G., Rodrigues, P.: Learning with drift detection. In: Bazzan,
A.L.C., Labidi, S. (eds.) SBIA 2004. LNCS (LNAI), vol. 3171, pp. 286–295. Springer,
Heidelberg (2004),
http://dx.doi.org/10.1007/978-3-540-28645-5_29
11. Gama, J., Medas, P., Rocha, R.: Forest trees for on-line data. In: Proceedings of the 2004
ACM Symposium on Applied Computing, SAC 2004, New York, NY, USA, pp. 632–
636 (2004),
http://dx.doi.org/10.1145/967900.968033
12. Gama, J., Medas, P., Rodrigues, P.: Learning decision trees from dynamic data streams.
In: Proceedings of the 2005 ACM Symposium on Applied Computing, SAC 2005, New
York, NY, USA, pp. 573–577 (2005),
http://dx.doi.org/10.1145/1066677.1066809
13. Gama, J., Rocha, R., Medas, P.: Accurate decision trees for mining high-speed data
streams. In: Proceedings of the Ninth ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, KDD 2003, New York, NY, USA, pp. 523–528
(2003), http://dx.doi.org/10.1145/956750.956813
14. Gonçalves Jr., P.M., Barros, R.S.M.: A comparison on how statistical tests deal with
concept drifts. In: Arabnia, H.R., et al. (eds.) Proceedings of the 2012 International Con-
ference on Artificial Intelligence, ICAI 2012, vol. 2, pp. 832–838. CSREA Press, Las
Vegas (2012)
15. Gonçalves Jr., P.M., Barros, R.S.M.: RCD: A recurring concept drift framework. Pattern
Recognit. Lett. (to appear, 2013),
http://dx.doi.org/10.1016/j.patrec.2013.02.005
16. Hulten, G., Spencer, L., Domingos, P.: Mining time-changing data streams. In: Pro-
ceedings of the Seventh ACM SIGKDD International Conference on Knowledge Dis-
covery and Data Mining, KDD 2001, New York, NY, USA, pp. 97–106 (2001),
http://dx.doi.org/10.1145/502512.502529
17. Kolter, J.Z., Maloof, M.A.: Dynamic weighted majority: An ensemble method for drift-
ing concepts. J. of Mach. Learn. Res. 8, 2755–2790 (2007),
http://dl.acm.org/citation.cfm?id=1314498.1390333
18. Lane, T., Brodley, C.E.: Approaches to online learning and concept drift for user identifi-
cation in computer security. In: Agrawal, R., Stolorz, P. (eds.) Proceedings of the Fourth
International Conference on Knowledge Discovery and Data Mining, KDD 1998, pp.
259–263. AAAI Press, Menlo Park (1998),
http://www.aaai.org/Papers/KDD/1998/KDD98-045.pdf
19. Roberts, S.W.: Control chart tests based on geometric moving averages. Technomet-
rics 1(3), 239–250 (1959), http://www.jstor.org/stable/1266443
20. Ross, G.J., Adams, N.M., Tasoulis, D.K., Hand, D.J.: Exponentially weighted moving
average charts for detecting concept drift. Pattern Recognit. Lett. 33(2), 191–198 (2012),
http://dx.doi.org/10.1016/j.patrec.2011.08.019
21. Street, W.N., Kim, Y.: A streaming ensemble algorithm (SEA) for large-scale classifica-
tion. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowl-
edge Discovery and Data Mining, KDD 2001, New York, NY, USA, pp. 377–382 (2001),
http://dx.doi.org/10.1145/502512.502568
22. Wang, H., Fan, W., Yu, P.S., Han, J.: Mining concept-drifting data streams using ensem-
ble classifiers. In: Proceedings of the Ninth ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, KDD 2003, New York, NY, USA, pp. 226–235
(2003), http://dx.doi.org/10.1145/956750.956778
23. Wang, S., Schlobach, S., Klein, M.: Concept drift and how to identify it. Web Semant.:
Sci., Serv. and Agents on the World Wide Web 9(3), 247–265 (2011),
http://dx.doi.org/10.1016/j.websem.2011.05.003
24. Wu, D., Wang, K., He, T., Ren, J.: A dynamic weighted ensemble to cope with concept
drifting classification. In: The 9th International Conference for Young Computer Scien-
tists, ICYCS 2008, pp. 1854–1859 (2008),
http://dx.doi.org/10.1109/ICYCS.2008.491
25. Yeh, A.B., McGrath, R.N., Sembower, M.A., Shen, Q.: EWMA control charts for monitor-
ing high-yield processes based on non-transformed observations. International Journal
of Production Research 46(20), 5679–5699 (2008),
http://dx.doi.org/10.1080/00207540601182252