IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 1, Ver. II (Jan Feb. 2015), PP 20-26
www.iosrjournals.org
DOI: 10.9790/0661-17122026 www.iosrjournals.org 20 | Page
A Review on Concept Drift
Yamini Kadwe1, Vaishali Suryawanshi2
1,2 (Department of IT, M.I.T. College Of Engineering, Pune, India)
Abstract: Concept changes in continuously evolving data streams are termed concept drifts. The problems caused by concept drift need to be addressed, and models must adapt to the concept changes. This can be achieved by designing supervised or unsupervised techniques in such a way that concept changes are considered and useful knowledge is extracted. This paper discusses various techniques to manage concept drifts, along with synthetic and real datasets containing different concept drifts and the relevant applications.
Keywords: Drift detectors, ensemble classifiers, data stream mining, and bagging.
I. Introduction
Today's world runs on advanced technologies, and nearly every field is automated. Due to these advances, vast amounts of data are generated every second. Examples of such applications include network monitoring, web mining, sensor networks, telecommunications data management, and financial applications [14]. This data needs to be gathered and processed to extract unknown, useful and interesting knowledge, but the volume and speed of the data make manual extraction impossible.
Concept drift occurs when the concept about which data is being collected shifts from time to time after a minimum stability period. This problem needs to be considered in order to mine data at an acceptable accuracy level. Examples of concept drift arise in spam detection, financial fraud detection, climate change prediction, and customer preferences in online shopping.
This paper is organized as follows. Section II gives an overview of concept drift, covering the problem, the need to adapt to concept drift, and the types of drift. Section III explains various methods of detecting concept drift. Section IV discusses statistical tests for concept drift. Section V discusses classifiers for dealing with concept drift. Section VI summarizes possible datasets based on the types of drift present, and Section VII summarizes the role of concept drift in various real-world applications.
II. Overview
Problem of Concept Drift:
Concept drift has gained increasing importance in machine learning as well as data mining tasks. Today, data is organized in the form of data streams rather than static databases, and the underlying concepts and data distributions may change over time.
Need for Concept Drift Adaptation:
In dynamically changing or non-stationary environments, the data distribution can change over time, yielding the phenomenon of concept drift [4]. Concept drifts can be adapted to quickly by storing concept descriptions, so that they can be re-examined and reused later. Hence, adaptive learning is required to deal with data in non-stationary environments: when concept drift is detected, the current model needs to be updated to maintain accuracy.
Types of Concept Drift:
Depending on the relation between the input data and the target variable, concept change can take different forms. Concept drift between time point t0 and time point t1 can be defined as
∃X : pt0(X, y) ≠ pt1(X, y) (1)
where pt0 denotes the joint distribution at time t0 between the set of input variables X and the target variable y.
Kelly et al. presented three ways in which concept drift may occur [3]:
prior probabilities of classes, p(y), may change over time
class-conditional probability distributions, p(X|y), might change
posterior probabilities, p(y|X), might change.
Concept drift may be classified in terms of the speed of change and the reason of change [4], as shown in figure 1. When 'a set of examples has legitimate class labels at one time and has different legitimate labels at another time', it is real drift [20], which refers to changes in p(y|X).
Fig 1: Types of drift: circles represent instances; different colors represent different classes[4]
Fig 2: Patterns of concept change [4]
When 'the target concepts remain the same but the data distribution changes' [6], it is virtual drift, which refers to changes in p(X).
With respect to the speed of change, a drift is sudden or abrupt when one concept is switched for another (refer to figure 2) [4]. The change can be incremental, passing through many intermediate concepts. Drift may be gradual, where the change is not abrupt and the stream returns to the previous pattern for some time. Concept drift handling algorithms should not confuse true drift with an outlier (blip) or noise, which refers to an anomaly. A recurring drift occurs when previously seen concepts reappear after some time.
Detecting Concept changes:
The ways to monitor concept drift are given below:
Concept drift can be monitored through the data's probability distribution, since it changes with time.
One can judge whether concept drift has happened by monitoring and tracking the relevance between various sample characteristics or attributes.
Concept drift leads to changes in the features of classification models.
Classification accuracy can be taken into account while detecting concept drift on a given data stream. Recall, precision and F-measure are some of the accuracy indicators of classification.
The arrival timestamp of a single sample or a block of samples can be taken as an additional input attribute to determine the occurrence of concept drift. It keeps a check on whether the classification rule has become outdated.
III. Concept Drift Detectors
This section discusses algorithms that detect concept drift, known as concept drift detectors. They signal the base learner that the model should be rebuilt or updated.
DDM: The Drift Detection Method (DDM), proposed by Gama et al., models the errors of the classifier with a Binomial distribution [14]. For each point i in the sequence being sampled, the error rate is the probability of misclassification (pi), with standard deviation (si) given by eq. 2:
si = √(pi(1 − pi)/i) (2)
DDM stores the values of pi and si observed when pi + si reaches its minimum value during the process, i.e. pmin and smin. These values are used to calculate a warning level condition, presented in eq. 3, and an alarm level condition, presented in eq. 4:
pi + si ≥ pmin + α · smin (warning level) (3)
pi + si ≥ pmin + β · smin (alarm level) (4)
Beyond the warning level, examples are stored in anticipation of a possible change of context. Beyond the alarm level, the concept drift is taken to be real: the model induced by the learning method is reset, as are pmin and smin, and a new model is learnt using the examples stored since the warning level was triggered. DDM works best on data streams with sudden drift, as gradually changing concepts can pass without triggering the alarm level.
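A minimal Python sketch of this warning/alarm logic. The 30-example minimum, the multipliers α = 2 and β = 3, and the streaming interface are illustrative choices, not the exact implementation of Gama et al.:

```python
import math

class DDM:
    """Minimal sketch of the Drift Detection Method (warning/alarm levels)."""
    def __init__(self, alpha=2.0, beta=3.0):
        self.alpha, self.beta = alpha, beta
        self.reset()

    def reset(self):
        self.i = 0                     # number of examples seen
        self.errors = 0                # number of misclassifications
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, misclassified):
        """Feed one prediction outcome; return 'ok', 'warning' or 'alarm'."""
        self.i += 1
        self.errors += int(misclassified)
        p = self.errors / self.i                    # error rate p_i
        s = math.sqrt(p * (1 - p) / self.i)         # std. dev. s_i (eq. 2)
        if p + s < self.p_min + self.s_min:         # track the minimum of p_i + s_i
            self.p_min, self.s_min = p, s
        if self.i < 30:                             # too few examples to decide
            return "ok"
        if p + s >= self.p_min + self.beta * self.s_min:    # eq. 4
            self.reset()
            return "alarm"
        if p + s >= self.p_min + self.alpha * self.s_min:   # eq. 3
            return "warning"
        return "ok"
```

Feeding a stream whose error rate jumps (say, from 20% to 100%) drives pi + si well past pmin + 3·smin and raises the alarm within a few dozen examples.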
EDDM:
Baena-García et al. proposed a modification of DDM called EDDM [16]. The same warning-alarm mechanism is used, but instead of the classifier's error rate, a distance-error-rate is monitored. They denote by p'i the average distance between two consecutive errors and by s'i its standard deviation. Using these values, the new warning and alarm conditions are given by eq. 5 and eq. 6:
(p'i + 2·s'i) / (p'max + 2·s'max) < α (warning level) (5)
(p'i + 3·s'i) / (p'max + 3·s'max) < β (alarm level) (6)
The values of p'i and s'i are stored when p'i + 2·s'i reaches its maximum value (obtaining p'max and s'max). EDDM works better than DDM for slow gradual drift, but is more sensitive to noise. Another drawback is that it only applies the thresholds and searches for concept drift after a minimum of 30 errors have occurred.
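The distance-error-rate idea can be sketched as follows. The running mean and variance of error distances use Welford's update; the thresholds α = 0.95 and β = 0.90 and the 30-error minimum follow the description above, but the interface and parameter values are illustrative:

```python
import math

class EDDM:
    """Sketch of EDDM: monitors the average distance between consecutive errors."""
    def __init__(self, alpha=0.95, beta=0.90, min_errors=30):
        self.alpha, self.beta, self.min_errors = alpha, beta, min_errors
        self.i = 0; self.last_error = 0; self.n_err = 0
        self.mean = 0.0; self.var = 0.0     # running mean/variance of distances
        self.max_level = 0.0                # maximum of p'_i + 2 s'_i seen so far

    def update(self, misclassified):
        self.i += 1
        if not misclassified:
            return "ok"
        dist = self.i - self.last_error     # distance since the previous error
        self.last_error = self.i
        self.n_err += 1
        delta = dist - self.mean            # Welford running update
        self.mean += delta / self.n_err
        self.var += delta * (dist - self.mean)
        s = math.sqrt(self.var / self.n_err)
        level = self.mean + 2 * s           # p'_i + 2 s'_i
        self.max_level = max(self.max_level, level)
        if self.n_err < self.min_errors:
            return "ok"
        ratio = level / self.max_level      # ratio from eq. 5 / eq. 6
        if ratio < self.beta:
            return "alarm"
        if ratio < self.alpha:
            return "warning"
        return "ok"
```

When errors start arriving closer together, p'i + 2·s'i falls relative to its recorded maximum and the ratio drops through the warning and alarm thresholds.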
ADWIN:
Bifet et al. proposed this method, which uses sliding windows of variable size that are recomputed online according to the rate of change observed in the data within these windows [13]. The window W is dynamically enlarged when there is no clear change in the context, and shrunk when a change is detected. Additionally, ADWIN provides rigorous performance guarantees in the form of bounds on the rates of false positives and false negatives. ADWIN works only on one-dimensional data; for n-dimensional raw data, a separate window must be maintained for each dimension, which results in handling more than one window.
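The exact ADWIN algorithm maintains an exponential bucket structure and statistically rigorous cut thresholds; the sketch below keeps only the core idea of a variable-size window that is cut when two subwindows differ significantly in mean. The Hoeffding-style bound used here is a simplification, not ADWIN's exact guarantee:

```python
import math

class SimpleAdwin:
    """Simplified ADWIN-style detector: grow a window, cut it when two
    subwindows have significantly different means."""
    def __init__(self, delta=0.01):
        self.delta = delta
        self.window = []

    def update(self, x):
        """Add one value in [0, 1] (e.g. an error indicator); return True on drift."""
        self.window.append(x)
        n = len(self.window)
        # Try every split point; in real ADWIN the bucket structure makes this cheap.
        for split in range(5, n - 5):
            w0, w1 = self.window[:split], self.window[split:]
            m0 = sum(w0) / len(w0)
            m1 = sum(w1) / len(w1)
            # Hoeffding-style cut threshold (harmonic mean of subwindow sizes)
            m = 1 / (1 / len(w0) + 1 / len(w1))
            eps = math.sqrt((1 / (2 * m)) * math.log(4 / self.delta))
            if abs(m0 - m1) > eps:
                self.window = w1        # drop the stale subwindow, keep recent data
                return True
        return False
```

On a stream of error indicators that jumps from 0 to 1, the window is cut within a handful of post-change examples, and the retained subwindow then reflects the new concept.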
Paired Learners:
The Paired Learners method, proposed by Bach and Maloof [17], uses two learners: a stable one and a reactive one. The stable learner predicts based on all of its experience, while the reactive one predicts based on a window of recent examples. The method uses the interplay between these two learners and their accuracy differences to cope with concept drift. The reactive learner can be implemented in two different ways: by rebuilding the learner with the last w (window size) examples, or by using a retractable learner that can unlearn examples.
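A sketch of the stable/reactive interplay, here with the rebuild-from-window variant. The trivial majority-class learner and the switch threshold theta are illustrative stand-ins, not the configuration of Bach and Maloof:

```python
from collections import deque

class MajorityLearner:
    """Toy base learner: always predicts the most frequent class seen so far."""
    def __init__(self):
        self.counts = {}
    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1
    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None

class PairedLearner:
    """Stable learner trained on all data, reactive learner rebuilt from a
    recent window; when the reactive one beats the stable one often enough,
    the stable learner is replaced."""
    def __init__(self, make_learner, w=50, theta=10):
        self.make_learner = make_learner
        self.stable = make_learner()
        self.window = deque(maxlen=w)   # recent (x, y) examples
        self.theta = theta              # score that triggers a switch
        self.score = 0                  # times reactive was right and stable wrong

    def _reactive(self):
        r = self.make_learner()
        for x, y in self.window:
            r.learn(x, y)
        return r

    def update(self, x, y):
        reactive = self._reactive()
        if reactive.predict(x) == y and self.stable.predict(x) != y:
            self.score += 1
        switched = False
        if self.score >= self.theta:    # drift: adopt the reactive model
            self.stable = reactive
            self.score = 0
            switched = True
        self.window.append((x, y))
        self.stable.learn(x, y)
        return switched
```

When the class distribution flips, the window-based reactive learner adapts within w examples while the stable learner lags, the disagreement score accumulates, and the switch fires.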
Exponentially Weighted Moving Average for Concept Drift Detection (ECDD):
Ross et al. proposed a drift detection method based on the Exponentially Weighted Moving Average (EWMA) chart [15], which is used for identifying an increase in the mean of a sequence of random variables. In a standard EWMA chart, the probability of incorrectly classifying an instance before the change point and the standard deviation of the stream are assumed known. In ECDD, the success and failure probabilities (for outcomes 1 and 0) are computed online from the classification accuracy of the base learner on the current instance, together with an estimator of the expected time between false positive detections.
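An illustrative EWMA chart over a 0/1 error stream, in the spirit of ECDD. The control limit L, the smoothing factor λ, and the asymptotic variance formula for the EWMA of Bernoulli variables are textbook EWMA-chart choices; the real ECDD additionally tunes the limit to achieve a desired expected time between false positives:

```python
class EwmaChart:
    """EWMA chart sketch: compare a fast EWMA of the error indicators against
    the running error mean plus L standard deviations of the EWMA statistic."""
    def __init__(self, lam=0.2, L=3.0):
        self.lam, self.L = lam, L
        self.n = 0
        self.p = 0.0    # running estimate of the pre-change error rate
        self.z = 0.0    # EWMA of the error indicators

    def update(self, error):
        """Feed one 0/1 error indicator; return True when drift is signalled."""
        self.n += 1
        x = float(error)
        self.p += (x - self.p) / self.n       # online mean of the errors
        self.z += self.lam * (x - self.z)     # EWMA update
        # asymptotic std. dev. of the EWMA of Bernoulli(p) variables
        var = self.p * (1 - self.p) * self.lam / (2 - self.lam)
        return self.n > 30 and self.z > self.p + self.L * var ** 0.5
```

With λ = 0.2, the EWMA tracks the last few dozen outcomes, so a jump in the error rate pushes z across the control limit within a handful of examples.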
Statistical Test of Equal Proportions (STEPD):
STEPD, proposed by Nishida et al., assumes that 'the accuracy of a classifier for recent W examples will be equal to the overall accuracy from the beginning of the learning if the target concept is stationary; and a significant decrease of recent accuracy suggests that the concept is changing' [18]. A chi-square test is performed by computing a statistic and comparing its value to the percentile of the standard normal distribution to obtain the observed significance level. If this value is less than the chosen significance level, the null hypothesis is rejected and a concept drift is assumed to have occurred. Warning and drift thresholds are also used, similar to the ones presented by DDM, EDDM, PHT, and ECDD.
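The test of equal proportions can be sketched as follows. STEPD's actual statistic also applies a continuity correction; this simplified version compares the pooled proportions with a one-sided normal p-value:

```python
import math

def stepd_p_value(r_overall, n_overall, r_recent, n_recent):
    """Two-proportion test in the spirit of STEPD: compares the number of
    correct predictions overall (r_overall of n_overall) with the recent
    window (r_recent of n_recent); returns the one-sided p-value for the
    hypothesis that recent accuracy has decreased."""
    p_hat = (r_overall + r_recent) / (n_overall + n_recent)   # pooled accuracy
    num = (r_overall / n_overall) - (r_recent / n_recent)
    den = math.sqrt(p_hat * (1 - p_hat) * (1 / n_overall + 1 / n_recent))
    z = num / den
    # one-sided p-value via the standard normal survival function P(Z > z)
    return 0.5 * math.erfc(z / math.sqrt(2))
```

For example, 90% overall accuracy against 40% accuracy on the last 50 examples yields a p-value far below 0.05, so the null hypothesis of a stationary concept is rejected.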
DOF:
The method proposed by Sobhani et al. [10] detects drifts by processing data chunk by chunk: for each instance in the current batch, its nearest neighbour in the previous batch is computed and the corresponding labels are compared. A distance map is created, associating the index of each instance in the previous batch with the label assigned by its nearest neighbour, and the degree of drift is computed from this distance map. The average and standard deviation of all degrees of drift are computed and, if the current value is more than s standard deviations away from the average, a concept drift is raised, where s is a parameter of the algorithm. This algorithm is most effective for problems with well separated and balanced classes.
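A sketch of the nearest-neighbour label comparison and the s-standard-deviations rule. The brute-force neighbour search and the drift-history interface are illustrative simplifications of the published method:

```python
import math

def degree_of_drift(prev_batch, curr_batch):
    """For each instance in the current batch, find its nearest neighbour in
    the previous batch and compare labels; the fraction of label
    disagreements serves as the degree of drift."""
    disagreements = 0
    for x, y in curr_batch:
        nn_x, nn_y = min(prev_batch,
                         key=lambda e: math.dist(x, e[0]))   # nearest neighbour
        if nn_y != y:
            disagreements += 1
    return disagreements / len(curr_batch)

def detect_drift(history, dof, s=2.0):
    """Raise drift when the current degree of drift lies more than s standard
    deviations from the mean of past values."""
    if len(history) < 2:
        return False
    mean = sum(history) / len(history)
    var = sum((d - mean) ** 2 for d in history) / len(history)
    return abs(dof - mean) > s * math.sqrt(var)
```

If the labelling of a region flips between batches, the degree of drift jumps from near 0 to near 1 and is flagged as an outlier relative to the history of past values.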
IV. Statistical Tests For Concept Drift:
The design of a change detector is a compromise between detecting true changes and avoiding false alarms. This is accomplished by carrying out statistical tests that verify whether the running error or class distribution remains constant over time.
CUSUM test:
The cumulative sum algorithm [24] is a change detection algorithm that raises an alarm when the mean of the input data is significantly different from zero. The CUSUM input ϵt can be any filter residual, for example the prediction error from a Kalman filter. The CUSUM test is as follows:
g0 = 0
gt = max(0, gt−1 + ϵt − υ)
if gt > h then alarm and gt = 0 (7)
The CUSUM test is memoryless, and its accuracy depends on the choice of the parameters υ and h.
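Equation (7) translates directly into code; the residual stream and the parameter values υ and h below are illustrative:

```python
def cusum(residuals, upsilon=0.05, h=5.0):
    """One-sided CUSUM test (eq. 7): accumulate residuals minus a drift
    allowance upsilon, clamp at zero, and alarm when the sum exceeds h."""
    g = 0.0
    alarms = []
    for t, eps in enumerate(residuals):
        g = max(0.0, g + eps - upsilon)
        if g > h:
            alarms.append(t)   # change detected at time t
            g = 0.0            # restart after the alarm
    return alarms
```

On 200 zero residuals followed by unit residuals, g stays clamped at 0 and then grows by 0.95 per step, crossing h = 5 on the sixth post-change step (t = 205).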
Page-Hinkley test: This is a sequential analysis technique that computes the observed values and their mean up to the current moment. The Page-Hinkley test [5] is given as:
g0 = 0, gt = gt−1 + ϵt − υ
Gt = min(gt)
if gt − Gt > h then alarm and gt = 0 (8)
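A direct transcription of equation (8), again with illustrative parameter values:

```python
def page_hinkley(values, upsilon=0.05, h=5.0):
    """Page-Hinkley test (eq. 8): track the cumulative deviation g_t and its
    running minimum G_t; alarm when g_t - G_t exceeds h."""
    g, g_min = 0.0, 0.0
    alarms = []
    for t, eps in enumerate(values):
        g += eps - upsilon
        g_min = min(g_min, g)      # G_t = min over the sequence so far
        if g - g_min > h:
            alarms.append(t)
            g, g_min = 0.0, 0.0    # restart after the alarm
    return alarms
```

Unlike CUSUM's clamp at zero, Page-Hinkley lets g drift downward and measures the rise above its historical minimum; on the same step stream, both fire at the same point.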
Geometric moving average test:
The Geometric Moving Average (GMA) test [25] is as follows:
g0 = 0
gt = λgt−1 + (1 − λ)ϵt
if gt > h then alarm and gt = 0 (9)
The forgetting factor λ is used to give more or less weight to recently arrived data. The threshold h is used to tune the sensitivity and false alarm rate of the detector.
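Equation (9) in the same style; λ and h below are illustrative:

```python
def gma(values, lam=0.9, h=0.5):
    """Geometric Moving Average test (eq. 9): exponentially forget old
    residuals and alarm when the smoothed value exceeds the threshold h."""
    g = 0.0
    alarms = []
    for t, eps in enumerate(values):
        g = lam * g + (1 - lam) * eps
        if g > h:
            alarms.append(t)
            g = 0.0
    return alarms
```

After a jump from 0 to 1, g follows 1 − λ^k and crosses h = 0.5 on the seventh post-change step, so a larger λ reacts more slowly but resists noise.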
Statistical test:
CUSUM and GMA are methods that deal with numeric sequences. A statistical test is a procedure for deciding whether a hypothesis about a quantitative feature of a population is true or false. We test a hypothesis by drawing a random sample from the population in question and calculating an appropriate statistic on its items.
To detect change, we need to compare two sources of data and decide whether the hypothesis H0 that they come from the same distribution is true. Otherwise, a hypothesis test will reject H0 and a change is detected. The simplest approach is to study the difference of the estimated means x̄0 and x̄1 of the two sources, from which a standard hypothesis test can be formulated:
x̄0 − x̄1 ∈ N(0, σ0² + σ1²), under H0
or, as a χ² test: (x̄0 − x̄1)² / (σ0² + σ1²) ∈ χ²(1), under H0
The Kolmogorov-Smirnov test is another, non-parametric, statistical test for comparing two populations. The KS test has the advantage of making no assumption about the distribution of the data.
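The two-sample KS statistic is simple to compute directly: it is the maximum absolute distance between the two empirical CDFs. This is a sketch; production code would also compute the associated p-value:

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute distance
    between the two empirical CDFs (no distributional assumption needed)."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    d = 0.0
    for x in points:
        cdf_a = sum(v <= x for v in a) / len(a)
        cdf_b = sum(v <= x for v in b) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

Identical samples give a statistic of 0, while two samples whose supports overlap only partially give a large statistic, which a drift detector can compare against a critical value.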
V. Concept Drift Handling
Kuncheva proposes to group ensemble strategies for changing environments [9] as follows:
Dynamic combiners (horse racing): component classifiers are fixed and their combination rule is changed using a forgetting process.
Updated training data: component classifiers in the ensemble are created incrementally from incoming examples.
Updating the ensemble members: ensemble members are updated online or retrained with blocks of data.
Structural changes of the ensemble: ensemble members are re-evaluated and, on any concept change, the worst classifiers are updated or replaced with a classifier trained on the most recent examples.
Adding new features: the attributes used are changed as an attribute becomes significant, without redesigning the ensemble structure.
The approaches to handling concept drift include single-classifier and ensemble-classifier approaches. Single classifiers are traditional learners that were designed for stationary data mining, equipped with the qualities of an online learner and a forgetting mechanism. Ensemble classifiers, in essence, are sets of single classifiers whose individual decisions are aggregated by a voting rule. Ensemble classifiers provide better classification accuracy than single classifiers due to the combined decision, and their modularity gives them a natural way of adapting to concept changes.
Streaming Ensemble Algorithm (SEA): SEA [8], proposed by Street and Kim, changes its structure based on concept change. It uses a heuristic replacement strategy that replaces the weakest base classifier based on accuracy and diversity. The combined decision is based on simple majority voting, with unpruned base classifiers. This algorithm works best with at most 25 ensemble components.
Accuracy Weighted Ensemble (AWE): In SEA, it is crucial to properly define the data chunk size, as it determines the ensemble's flexibility. The AWE algorithm, proposed by Wang et al., trains a new classifier C' on each incoming data chunk and uses that chunk to evaluate all the existing ensemble members in order to select the best component classifiers. AWE is best suited for large data streams and works well for recurring and other drifts.
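A block-based, accuracy-weighted ensemble in the spirit of AWE can be sketched as follows. AWE's actual weights are derived from the mean squared error relative to a random classifier; this simplified version weights members by plain accuracy on the newest chunk, and the majority-class learner is a stand-in for real base classifiers:

```python
class MajorityLearner:
    """Toy base learner: always predicts the most frequent class seen so far."""
    def __init__(self):
        self.counts = {}
    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1
    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None

class AccuracyWeightedEnsemble:
    """Sketch of an AWE-style block ensemble: train one classifier per data
    chunk, weight every member by its accuracy on the newest chunk, and keep
    only the k best."""
    def __init__(self, make_learner, k=5):
        self.make_learner = make_learner
        self.k = k
        self.members = []                   # list of (weight, classifier)

    def process_chunk(self, chunk):
        new = self.make_learner()
        for x, y in chunk:                  # train a candidate on the chunk
            new.learn(x, y)
        scored = []
        for _, clf in self.members + [(0.0, new)]:
            acc = sum(clf.predict(x) == y for x, y in chunk) / len(chunk)
            scored.append((acc, clf))
        scored.sort(key=lambda wc: wc[0], reverse=True)
        self.members = scored[:self.k]      # keep the k most accurate members

    def predict(self, x):
        votes = {}
        for w, clf in self.members:
            votes[clf.predict(x)] = votes.get(clf.predict(x), 0) + w
        return max(votes, key=votes.get)
```

Members trained on an outdated concept score near zero on the newest chunk, so their votes are effectively silenced as soon as the drift appears in a chunk.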
Adaptive Classifier Ensemble (ACE): To overcome AWE's slow drift reactions, Nishida proposed a hybrid approach in which a data chunk ensemble is aided by a drift detector. The Adaptive Classifier Ensemble (ACE) aims to react to sudden drifts by tracking the classifier's error rate with each incoming example, while slowly reconstructing a classifier ensemble with large chunks of examples.
Hoeffding Option Trees (HOT) and ASHT Bagging: Hoeffding Option Trees (HOT) provide a compact structure that works like a set of weighted classifiers and is built in an incremental fashion. The algorithm [27] allows each training example to update a set of option nodes rather than just a single leaf. Adaptive-Size Hoeffding Tree Bagging (ASHT Bagging) diversifies ensemble members by using trees of different sizes and employs a forgetting mechanism. Compared to HOT, ASHT Bagging proves to be more accurate on most data sets, but both are more time- and memory-expensive than option trees or single classifiers.
Accuracy Diversified Ensemble (ADE): The Accuracy Diversified Ensemble (ADE) [22] not only selects but also updates its components according to the current distribution. ADE differs from AWE in its weight definition, the use of online base classifiers, bagging, and the updating of components with incoming examples. Unlike ASHT Bagging and HOT, ADE does not limit base classifier size, does not use any windows, and updates members only if they are accurate enough according to the current distribution.
Accuracy Updated Ensemble (AUE): Compared to AWE, AUE1 [7] conditionally updates its component classifiers. It maintains a weighted pool of components and predicts the class of each incoming example by a weighted voting rule. With each data chunk, it substitutes the weakest-performing ensemble member with a newly created classifier and adjusts the members' weights. It uses Hoeffding trees as component classifiers. Compared to AUE1, AUE2 introduces a new weighting function [22], does not require cross-validation of the candidate classifier, does not keep a classifier buffer, prunes its base learners, and always updates its components; it does not limit base classifier size or use any windows. The Online Accuracy Updated Ensemble (OAUE) [23] tries to combine block-based ensembles and online processing.
VI. Datasets With Concept Drift
Artificial datasets provide the ground truth of the data; however, real datasets are more interesting, as they correspond to real-world applications in which the algorithms' usability is tested [22].
6.1 Real datasets:
Forest Covertype, obtained from US Forest Service (USFS) Region 2 Resource Information System (RIS) data, contains 581,012 instances and 54 attributes.
Poker-Hand consists of 1,000,000 instances and 11 attributes.
The Electricity dataset, collected from the Australian New South Wales Electricity Market, contains 45,312 instances.
The Airlines dataset contains 539,383 examples described by seven attributes.
The Ozone level detection dataset consists of 2,534 entries and is highly unbalanced (2% or 5% positives, depending on the criteria for 'ozone days').
6.2 Synthetic datasets:
Synthetic datasets allow us to analyze how the methods deal with the types of drift included in them, since it is known in advance when the drifts begin and end. For abrupt or sudden drifts, Stagger, Gauss and Mixed2 can be used. The Waveform, LED generator and Circles datasets are best suited for gradual drifts. The Hyperplane dataset works well for both gradual and incremental drift. Radial basis function (RBF) generators can also be used for incremental drift, and blips can also be incorporated.
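As an illustration of how such generators work, a SEA-style stream with sudden drift can be produced in a few lines. The thresholds, block layout, and noise rate here are illustrative, not the original SEA specification:

```python
import random

def sea_stream(n, concepts=(8.0, 9.0, 7.0), noise=0.1, seed=42):
    """SEA-style generator with sudden drift: three attributes in [0, 10], of
    which two are relevant; the label is 1 when x1 + x2 <= theta, and theta
    switches between concepts at equally spaced change points."""
    rng = random.Random(seed)
    block = n // len(concepts)
    for i in range(n):
        theta = concepts[min(i // block, len(concepts) - 1)]
        x1, x2, x3 = (rng.uniform(0, 10) for _ in range(3))  # x3 is irrelevant
        label = int(x1 + x2 <= theta)
        if rng.random() < noise:                             # label noise
            label = 1 - label
        yield (x1, x2, x3), label
```

Because the change points are chosen by the generator, a detector's alarms can be scored exactly against the known drift positions, which is precisely what makes synthetic data useful for evaluation.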
Applications
This section describes various real-life problems [11,12] in different domains that are affected by concept drift in the data they generate.
Fig 3: Applications of Real-domain concept drift
Monitoring and control often employs unsupervised learning, which detects abnormal behavior. In monitoring and control applications, the data volumes are large and must be processed in real time.
Personal assistance and information applications mainly organize and/or personalize the flow of information. The class labels are mostly 'soft' and the costs of mistakes are relatively low.
Decision support includes diagnostics and the evaluation of creditworthiness. Decision support and diagnostics applications usually involve a limited amount of data. Decisions are not required in real time, but high accuracy is essential and the costs of mistakes are large.
Artificial intelligence applications include a wide spectrum of moving and stationary systems that interact with a changing environment. These systems learn how to interact with the environment, and since the environment changes, the learners need to be adaptive.
VII. Conclusion
This paper describes the problem of concept drift. It summarizes the need for adaptation and the types of and reasons for concept change. The concept drift detection methods DDM, EDDM, Paired Learners, ECDD, ADWIN, STEPD and DOF are discussed, along with the mechanisms they adopt to detect concept change. To identify whether concept drift has occurred, statistical tests such as the CUSUM, Page-Hinkley and GMA tests are explained. Among the various classifier approaches, ensemble classifiers in particular provide better accuracy in the case of concept change. The ensemble classifiers SEA, AWE, ACE, ADE, HOT, ASHT Bagging and AUE adapt to the drift that occurs, yielding good classification accuracy. Finally, the applications and the real and synthetic datasets suited to various concept drifts can be used to check the adaptability of any algorithm handling concept drift.
In future work, the classification performance of the ensemble algorithms discussed above can be enhanced by adapting them to various drifts and to diversity.
References
[1]. P. M. Goncalves, Silas G.T. de Carvalho Santos, Roberto S.M. Barros, Davi C.L. Vieira (2014), "Review: A comparative study on concept drift detectors", Expert Systems with Applications, pp. 8144-8156.
[2]. L. L. Minku and Xin Yao (2011), "DDD: A New Ensemble Approach For Dealing With Concept Drift", IEEE TKDE, Vol. 24, pp. 619-633.
[3]. M. G. Kelly, D. J. Hand, and N. M. Adams (1999), "The Impact of Changing Populations on Classifier Performance", in Proc. of the 5th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD), ACM, pp. 367-371.
[4]. J. Gama, I. Zliobaite, A. Bifet, M. Pechenizkiy, A. Bouchachia (2014), "A Survey on Concept Drift Adaptation", ACM Computing Surveys, Vol. 46, No. 4, Article 44.
[5]. Mouss, H., Mouss, D., Mouss, N., Sefouhi, L. (2004), "Test of Page-Hinkley, an Approach for Fault Detection in an Agro-Alimentary Production System", 5th Asian Control Conference, IEEE Computer Society, Vol. 2, pp. 815-818.
[6]. S. Delany, P. Cunningham, A. Tsymbal, and L. Coyle (2005), "A Case-based Technique for Tracking Concept Drift in Spam Filtering", Knowledge-Based Systems, Vol. 18, pp. 187-195.
[7]. D. Brzezinski and J. Stefanowski (2011), "Accuracy updated ensemble for data streams with concept drift", Proc. 6th HAIS Int. Conf. on Hybrid Artificial Intelligence Systems, Part II, pp. 155-163.
[8]. W. N. Street and Y. Kim (2001), "A streaming ensemble algorithm (SEA) for large-scale classification", in Proc. 7th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pp. 377-382.
[9]. Ludmila I. Kuncheva (2004), "Classifier ensembles for changing environments", Multiple Classifier Systems, Lecture Notes in Computer Science, Springer, Vol. 3077, pp. 1-15.
[10]. Sobhani P. and Beigy H. (2011), "New drift detection method for data streams", Adaptive and Intelligent Systems, Lecture Notes in Computer Science, Vol. 6943, pp. 88-97.
[11]. D. Brzezinski, J. Stefanowski (2011), "Mining data streams with concept drift", Poznan University of Technology, Faculty of Computing Science and Management, Institute of Computing Science.
[12]. I. Žliobaite (2010), "Adaptive Training Set Formation", Doctoral dissertation, Physical sciences, informatics (09P), Vilnius University.
[13]. A. Bifet (2009), "Adaptive Learning and Mining for Data Streams and Frequent Patterns", Doctoral Thesis.
[14]. J. Gama, P. Medas, G. Castillo and Pedro Rodrigues (2004), "Learning with Drift Detection", Lecture Notes in Computer Science, Vol. 3171, pp. 286-295.
[15]. G. J. Ross, N. M. Adams, D. Tasoulis, D. Hand (2012), "Exponentially weighted moving average charts for detecting concept drift", Pattern Recognition Letters, pp. 191-198.
[16]. M. Baena-García, J. Campo-Avila, R. Fidalgo, A. Bifet, R. Gavaldà and R. Morales-Bueno (2006), "Early Drift Detection Method", IWKDDS, pp. 77-86.
[17]. S. H. Bach and M. A. Maloof (2008), "Paired Learners for Concept Drift", Eighth IEEE International Conference on Data Mining, pp. 23-32.
[18]. K. Nishida (2008), "Learning and Detecting Concept Drift", Doctoral dissertation, Graduate School of Information Science and Technology, Hokkaido University.
[19]. D. Brzezinski, J. Stefanowski (2012), "From Block-based Ensembles to Online Learners In Changing Data Streams: If- and How-To", ECML PKDD Workshop on Instant Interactive Data Mining.
[20]. J. Kolter and M. A. Maloof (2007), "Dynamic Weighted Majority: An Ensemble Method for Drifting Concepts", Journal of Machine Learning Research 8, pp. 2755-2790.
[21]. P. B. Dongre, L. G. Malik (2014), "A Review on Real Time Data Stream Classification and Adapting To Various Concept Drift Scenarios", IEEE International Advance Computing Conference (IACC), pp. 533-537.
[22]. D. Brzezinski, J. Stefanowski (2014), "Reacting to Different Types of Concept Drift: The Accuracy Updated Ensemble Algorithm", IEEE Transactions on Neural Networks and Learning Systems, Vol. 25, pp. 81-94.
[23]. D. Brzezinski, J. Stefanowski (2014), "Combining block-based and online methods in learning ensembles from concept drifting data streams", Information Sciences 265, pp. 50-67.
[24]. E. S. Page (1954), "Continuous inspection schemes", Biometrika, 41(1/2), pp. 100-115.
[25]. S. W. Roberts (2000), "Control chart tests based on geometric moving averages", Technometrics, 42(1), pp. 97-101.
[26]. R. Elwell and R. Polikar (2011), "Incremental learning of concept drift in nonstationary environments", IEEE Trans. Neural Netw., Vol. 22, No. 10, pp. 1517-1531.
[27]. A. Bifet, G. Holmes, B. Pfahringer, R. Kirkby, and R. Gavaldà (2009), "New ensemble methods for evolving data streams", in Proc. 15th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pp. 139-148.
... Adaptive Windowing. Adaptive Windowing (ADWIN) by Bifet and Gavaldà [8] is a popular CD detection algorithm that uses sliding windows of variable size [25,30,32,7]. The variable size is recalculated according to the rate of change observed from the data in each window [25,30,32]. ...
... Adaptive Windowing (ADWIN) by Bifet and Gavaldà [8] is a popular CD detection algorithm that uses sliding windows of variable size [25,30,32,7]. The variable size is recalculated according to the rate of change observed from the data in each window [25,30,32]. ADWIN dynamically enlarges the window W when there is no obvious change in context, and shrinks it when a change is identified [25,62,32]. ...
... The variable size is recalculated according to the rate of change observed from the data in each window [25,30,32]. ADWIN dynamically enlarges the window W when there is no obvious change in context, and shrinks it when a change is identified [25,62,32]. The algorithm divides the window W into two subwindows w 1 and w 2 , representing older and newer data, respectively, causing those to have different average values and looking for distances from each other [25,62,7]. ...
Conference Paper
Full-text available
In a dynamic world, data streams are continuously generated, which poses immense challenges for machine learning (ML) algo- rithms to adapt to changing statistical properties that are subject to a non-stationary context. The underlying scenario is defined as concept drift (CD), where changes in the relationship between response and prediction variables (real CD) or a change in input data (virtual CD) are accompanied by a significant degradation in the predictive performance of the models, causing ML models to reach unacceptable levels of system accuracy. In this paper, the state of the art for CD algorithms is analyzed and compared. For this purpose, a systematic literature review was performed. Then, the 10 most popular CD algorithms were extracted from the literature using a newly-developed metric. Subsequently, the algorithms were analyzed and compared with respect to their functionality and limitations. Based on these, the optimization potentials were systematically derived. This work presents a summarized overview of CD algorithms and provides the basis for algorithm optimization in this domain.
... It revolves around online labeled data streams where the relationship between the input target variable alters with time. Kadwe and Suryawanshi (2015) described a number of methods to deal with concept drifts. Applications of synthetic and real datasets with various concept drifts are also presented. ...
Article
Full-text available
Last decade demonstrate the massive growth in organizational data which keeps on increasing multi‐fold as millions of records get updated every second. Handling such vast and continuous data is challenging which further opens up many research areas. The continuously flowing data from various sources and in real‐time is termed as streaming data. While deriving valuable statistics from data streams, the variation that occurs in data distribution is called concept drift. These drifts play a significant role in a variety of disciplines, including data mining, machine learning, ubiquitous knowledge discovery, quantitative decision theory, and so forth. As a result, a substantial amount of research is carried out for studying methodologies and approaches for dealing with drifts. However, the available material is scattered and lacks guidelines for selecting an effective technique for a particular application. The primary novel objective of this survey is to present an understanding of concept drift challenges and allied studies. Further, it assists researchers from diverse domains to accommodate detection and adaptation algorithms for concept drifts in their applications. Overall, this study aims to contribute to deeper insights into the classification of various types of drifts and methods for detection and adaptation along with their key features and limitations. Furthermore, this study also highlights performance metrics used to evaluate the concept drift detection methods for streaming data. This paper presents the future research scope by highlighting gaps in the existing literature for the development of techniques to handle concept drifts. This article is categorized under: Algorithmic Development > Ensemble Methods Application Areas > Data Mining Software Tools Fundamental Concepts of Data and Knowledge > Big Data Mining
... They argue that both intrusion and concept drift detection should learn from the changes over time, but current ML-based NIDSs are built on the assumption of a stationary traffic data distribution. Their framework detects concept drift by the Page-Hinkley test (PHT) [89] and adopts incremental learning to update the training data and detection model. ...
Article
Full-text available
Utilizing machine learning (ML)-based approaches for network intrusion detection systems (NIDSs) raises valid concerns due to the inherent susceptibility of current ML models to various threats. Of particular concern are two significant threats associated with ML: adversarial attacks and distribution shifts. Although there has been a growing emphasis on researching the robustness of ML, current studies primarily concentrate on addressing specific challenges individually. These studies tend to target a particular aspect of robustness and propose innovative techniques to enhance that specific aspect. However, as a capability to respond to unexpected situations, the robustness of ML should be comprehensively built and maintained in every stage. In this paper, we aim to link the varying efforts throughout the whole ML workflow to guide the design of ML-based NIDSs with systematic robustness. Toward this goal, we conduct a methodical evaluation of the progress made thus far in enhancing the robustness of the targeted NIDS application task. Specifically, we delve into the robustness aspects of ML-based NIDSs against adversarial attacks and distribution shift scenarios. For each perspective, we organize the literature into robustness-related challenges and technical solutions based on the ML workflow. For instance, we introduce some advanced potential solutions that can improve robustness, such as data augmentation, contrastive learning, and robustness certification. According to our survey, we identify and discuss the ML robustness research gaps and future directions in the field of NIDS. Finally, we highlight that building and patching robustness throughout the life cycle of an ML-based NIDS is critical.
... Several effective drift detection algorithms have been proposed, such as ADWIN, DDM, DWM, STEPD, DMDDM, and many others (Agrahari and Singh, 2021). For more information on concept drift detectors, the reader can refer to (Agrahari and Singh, 2021; Lu et al., 2018; Zliobaitė, 2010; Gama et al., 2014; Kadwe and Suryawanshi, 2015). ...
Preprint
Full-text available
Data streams are sequences of fast-growing and high-speed data points that typically suffer from infinite length, large volume, and, in particular, unstable data distribution. These potential issues of data streams underscore the necessity of data stream mining tasks. Ensemble learning, as a prevalent classification approach, is widely used in data stream mining studies. Besides the impressive performance of ensemble learning algorithms in providing a collection of diverse and accurate classifiers, they are especially efficient in handling non-stationary data streams. Due to the component-based nature and the chance of dynamic updates for the components of the ensemble, this category is appropriate for dynamically learning the changing concepts of the data. This paper aims to provide a thorough review of the most significant ensemble-based data stream classification approaches, along with a discussion of the potential issues of non-stationary data streams. Furthermore, a comprehensive experimental analysis is performed to compare the classification performance of well-known state-of-the-art ensemble-based data stream classification approaches on 24 synthetic non-stationary data streams. The superiority of the approaches is confirmed by various statistical tests.
... In order to mitigate the problem of dataset shift, different strategies have been proposed (Gama et al. 2014; Kadwe and Suryawanshi 2015; Yu et al. 2019): (i) first, to detect the presence of dataset shift and categorise it into different types and (ii) to choose the most suitable classifier from a pool of calibrated classifiers according to the shift detected. ...
Article
Full-text available
Spam emails have been traditionally seen as just annoying and unsolicited emails containing advertisements, but they increasingly include scams, malware or phishing. In order to ensure the security and integrity for the users, organisations and researchers aim to develop robust filters for spam email detection. Recently, most spam filters based on machine learning algorithms published in academic journals report very high performance, but users are still reporting a rising number of frauds and attacks via spam emails. Two main challenges can be found in this field: (a) it is a very dynamic environment prone to the dataset shift problem and (b) it suffers from the presence of an adversarial figure, i.e. the spammer. Unlike classical spam email reviews, this one is particularly focused on the problems that this constantly changing environment poses. Moreover, we analyse the different spammer strategies used for contaminating the emails, and we review the state-of-the-art techniques to develop filters based on machine learning. Finally, we empirically evaluate and present the consequences of ignoring the matter of dataset shift in this practical field. Experimental results show that this shift may lead to severe degradation in the estimated generalisation performance, with error rates reaching values up to 48.81%.
... The test window represents the new data batch, and the algorithm monitors its error rate to detect concept drift. A threshold value, based on the average error rate of the reference sub-windows and other parameters, is declared to identify the existence of drift [2][5][11][22]. ...
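The windowing scheme described in this excerpt can be illustrated with a minimal sketch: the mean error of a sliding reference window is compared against that of the most recent test window, and drift is flagged when the gap exceeds a threshold. The function name, window sizes, and threshold below are illustrative assumptions, not values from the cited work.

```python
from collections import deque

def window_drift(errors, ref_size=100, test_size=30, threshold=0.2):
    """Sketch of reference-vs-test window drift detection over a
    stream of 0/1 classification errors. Returns the index at which
    drift is first flagged, or None if no drift is detected."""
    ref, test = deque(), deque()
    for i, e in enumerate(errors):
        test.append(e)
        if len(test) > test_size:
            # oldest test item graduates into the reference window
            ref.append(test.popleft())
            if len(ref) > ref_size:
                ref.popleft()
        if len(ref) == ref_size and len(test) == test_size:
            # flag drift when the recent error rate exceeds the
            # reference error rate by more than the threshold
            if sum(test) / test_size - sum(ref) / ref_size > threshold:
                return i
    return None
```

On a stream whose error rate jumps from 0 to 1, the function flags drift a few examples after the change; on a stable stream it returns None.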
Chapter
Data mining techniques are currently of great importance in companies and organisations worldwide for building predictive models. These models are particularly useful for classifying new data and supporting decision-making processes by helping to make the most appropriate decisions. However, over time, the predictive models created can become outdated as the patterns found in the data change due to natural evolution. This aspect can affect the quality of the models and lead to results that do not match reality. In this paper, we present a general approach for creating a self-updating system of predictive models that can be adapted to specific contexts. This system periodically generates and selects the most appropriate predictive model for ensuring the validity of its predictions. It integrates data processing and data mining model generation, and allows for the detection of changes in existing patterns as new data is added. This is suitable for supervised data mining tasks that may be affected by data evolution. The implementation of the system has demonstrated that it is possible to pre-process the data and select the best predictive model. In addition, since the execution is triggered automatically, the need for system maintenance is reduced.
Article
The challenge of deploying neural network learning workloads on ultra-low power tiny devices has recently attracted several machine learning researchers of the TinyML community. A typical on-device learning session processes real-time streams of data acquired by heterogeneous sensors. In such a context, this paper proposes TinyRCE, a forward-only learning approach based on a hyperspherical classifier, which can be deployed on microcontrollers and potentially integrated into the sensor package. TinyRCE is fed with compact features extracted by a convolutional neural network, which can be trained with BP or can be an extreme learning machine with randomly initialized weights. A forget mechanism has been introduced to discard useless neurons from the hidden layer, since they can become redundant over time. TinyRCE has been evaluated with a new interleaved learning and testing data protocol to mimic a typical forward on-tiny-device workload. It has been tested with the standard MLCommons Tiny datasets used for KeyWord Spotting and Image Classification, and against the respective neural benchmarks. 95.25% average accuracy was achieved over the former classes (vs. 91.49%) and 87.17% over the latter classes (vs. 100%, caused by overfitting). In terms of complexity, TinyRCE requires 22× less MACC than SoftMax (with 36 epochs) on the former, while it requires 5× more MACC than SoftMax (with 500 epochs) for the latter. Classifier complexity and memory footprint are marginal w.r.t. the Feature Extractor, for training and inference workloads.
Article
Data stream management (DSM) for cyber-physical systems (CPSs) provides good quality care services in the medical domain. This is a very prominent field of research that includes sensing, processing, and networking of various medical devices. DSM for CPS combines computation in the cyber world (computers or WBANs), including wearable sensors and smart meters, with communication between the processes through networks. Data analytics and mobile computing include the usage of wireless sensors, which play a very significant role in handling the uncertainties of data streams in the healthcare domain. This paper presents a comprehensive review of DSM techniques, including the problem of concept drift in the healthcare domain using CPS, and the challenges associated with the domain. The complete taxonomy characterizes and classifies all the components and methods required for data management in healthcare. This paper provides a glimpse of futuristic techniques used for DSM in view of concept drift while handling real-time data management in healthcare, and identifies fields for future research. The prime objective of this review is to provide a solution to aggregate health data streams generated from different sources using DSM, clean and normalize them, and prepare them for analysis, diagnosis, and pattern identification in data analytics and complex event processing. The techniques discussed in this paper are expected to be relevant and useful for further research in the area of DSM for CPS in healthcare.
Article
Full-text available
Concept drift primarily refers to an online supervised learning scenario when the relation between the input data and the target variable changes over time. Assuming a general knowledge of supervised learning in this article, we characterize adaptive learning processes; categorize existing strategies for handling concept drift; overview the most representative, distinct, and popular techniques and algorithms; discuss evaluation methodology of adaptive algorithms; and present a set of illustrative applications. The survey covers the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art. Thus, it aims at providing a comprehensive introduction to the concept drift adaptation for researchers, industry analysts, and practitioners.
Article
Full-text available
Data stream mining has been receiving increased attention due to its presence in a wide range of applications, such as sensor networks, banking, and telecommunication. One of the most important challenges in learning from data streams is reacting to concept drift, i.e., unforeseen changes of the stream's underlying data distribution. Several classification algorithms that cope with concept drift have been put forward; however, most of them specialize in one type of change. In this paper, we propose a new data stream classifier, called the Accuracy Updated Ensemble (AUE2), which aims at reacting equally well to different types of drift. AUE2 combines accuracy-based weighting mechanisms known from block-based ensembles with the incremental nature of Hoeffding Trees. The proposed algorithm is experimentally compared with 11 state-of-the-art stream methods, including single classifiers, block-based and online ensembles, and hybrid approaches in different drift scenarios. Out of all the compared algorithms, AUE2 provided the best average classification accuracy while proving to be less memory consuming than other ensemble approaches. Experimental results show that AUE2 can be considered suitable for scenarios involving many types of drift as well as static environments.
Article
Full-text available
Most stream classifiers are designed to process data incrementally, run in resource-aware environments, and react to concept drifts, i.e., unforeseen changes of the stream's underlying data distribution. Ensemble classifiers have become an established research line in this field, mainly due to their modularity, which offers a natural way of adapting to changes. However, in environments where class labels are available after each example, ensembles which process instances in blocks do not react to sudden changes sufficiently quickly. On the other hand, ensembles which process streams incrementally do not take advantage of the periodic adaptation mechanisms known from block-based ensembles, which offer accurate reactions to gradual and incremental changes. In this paper, we analyze if and how the characteristics of block and incremental processing can be combined to produce new types of ensemble classifiers. We consider and experimentally evaluate three general strategies for transforming a block ensemble into an incremental learner: online component evaluation, the introduction of an incremental learner, and the use of a drift detector. Based on the results of this analysis, we put forward a new incremental ensemble classifier, called Online Accuracy Updated Ensemble, which weights component classifiers based on their error in constant time and memory. The proposed algorithm was experimentally compared with four state-of-the-art online ensembles and provided the best average classification accuracy on real and synthetic datasets simulating different drift scenarios.
Article
Full-text available
An emerging problem in data streams is the detection of concept drift. This problem is aggravated when the drift is gradual over time. In this work we define a method for detecting concept drift, even in the case of slow gradual change. It is based on the estimated distribution of the distances between classification errors. The proposed method can be used with any learning algorithm in two ways: as a wrapper around a batch learning algorithm, or implemented inside an incremental and online algorithm. The experimental results compare our method (EDDM) with a similar one (DDM). The latter uses the error rate instead of the distance-error rate.
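The idea behind EDDM sketched in this abstract can be illustrated as follows: track the distance (in examples) between consecutive misclassifications; when the running mean plus two standard deviations of this distance falls well below its observed maximum, errors are arriving closer together, which suggests (possibly gradual) drift. The class name, beta threshold, and warm-up count below are illustrative assumptions, not the published values.

```python
import math

class EDDMSketch:
    """Simplified sketch of distance-between-errors drift detection
    in the spirit of EDDM. Feed one outcome per example via update();
    it returns True when drift is signalled."""

    def __init__(self, beta=0.9, min_errors=30):
        self.beta = beta              # illustrative drift threshold
        self.min_errors = min_errors  # warm-up before detection
        self.idx = 0
        self.last_error_idx = None
        self.n = 0                    # number of error-to-error distances
        self.mean = 0.0               # running mean distance
        self.m2 = 0.0                 # running sum of squared deviations (Welford)
        self.max_score = 0.0          # maximum of mean + 2*std seen so far

    def update(self, is_error):
        self.idx += 1
        if not is_error:
            return False
        drift = False
        if self.last_error_idx is not None:
            d = self.idx - self.last_error_idx
            self.n += 1
            delta = d - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (d - self.mean)
            score = self.mean + 2 * math.sqrt(self.m2 / self.n)
            self.max_score = max(self.max_score, score)
            # drift: errors now much denser than at the best point seen
            if self.n > self.min_errors and score < self.beta * self.max_score:
                drift = True
        self.last_error_idx = self.idx
        return drift
```

On a stream where an error occurs every 20 examples and then every 2 examples, the detector fires some time after the errors become denser.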
Article
Full-text available
We introduce an ensemble of classifiers-based approach for incremental learning of concept drift, characterized by nonstationary environments (NSEs), where the underlying data distributions change over time. The proposed algorithm, named Learn<sup>++</sup>.NSE, learns from consecutive batches of data without making any assumptions on the nature or rate of drift; it can learn from such environments that experience constant or variable rate of drift, addition or deletion of concept classes, as well as cyclical drift. The algorithm learns incrementally, as other members of the Learn<sup>++</sup> family of algorithms, that is, without requiring access to previously seen data. Learn<sup>++</sup>.NSE trains one new classifier for each batch of data it receives, and combines these classifiers using a dynamically weighted majority voting. The novelty of the approach is in determining the voting weights, based on each classifier's time-adjusted accuracy on current and past environments. This approach allows the algorithm to recognize, and act accordingly, to the changes in underlying data distributions, as well as to a possible reoccurrence of an earlier distribution. We evaluate the algorithm on several synthetic datasets designed to simulate a variety of nonstationary environments, as well as a real-world weather prediction dataset. Comparisons with several other approaches are also included. Results indicate that Learn<sup>++</sup>.NSE can track the changing environments very closely, regardless of the type of concept drift. To allow future use, comparison and benchmarking by interested researchers, we also release our data used in this paper.
Conference Paper
Data streams are viewed as sequences of relational tuples (e.g., sensor readings, call records, web page visits) that arrive continuously in time-varying and possibly unbounded streams. These data streams are potentially huge in size, and thus many data mining techniques and approaches cannot be applied directly. Classification techniques fail to successfully process data streams because of two factors: their overwhelming volume and their distinctive feature known as concept drift. Concept drift is a term used to describe changes in the learned structure that occur over time. The occurrence of concept drift leads to a drastic drop in classification accuracy. The recognition of concept drift in data streams has led to sliding-window approaches; other approaches to mining data streams with concept drift include instance selection methods, drift detection, ensemble classifiers, option trees, and the use of Hoeffding bounds to estimate classifier performance. This paper describes the various types of concept drift that affect data examples and discusses various approaches for handling concept drift scenarios. The aim of this paper is to review and compare single-classifier and ensemble approaches to data stream mining.
Article
A geometric moving average gives the most recent observation the greatest weight, with the weights of all previous observations decreasing in geometric progression from the most recent back to the first. A graphical procedure for generating geometric moving averages is described in which the most recent observation is assigned a weight r. The properties of control chart tests based on geometric moving averages are compared to tests based on ordinary moving averages.
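The weighting scheme described above reduces to a compact recurrence: z_t = r*x_t + (1-r)*z_{t-1}, so the newest observation gets weight r and older observations get geometrically decreasing weights r(1-r), r(1-r)^2, and so on. A minimal sketch, with illustrative parameter defaults:

```python
def geometric_moving_average(xs, r=0.3, z0=0.0):
    """Geometric (exponentially weighted) moving average: the most
    recent observation gets weight r, older ones geometrically less.
    Returns the sequence of smoothed values z_t."""
    z = z0
    out = []
    for x in xs:
        z = r * x + (1 - r) * z  # newest value weighted by r
        out.append(z)
    return out
```

For a constant input of 1.0 with r = 0.5 and z0 = 0, the smoothed values climb as 0.5, 0.75, 0.875, ..., converging toward the input level.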
Article
Classifying streaming data requires the development of methods which are computationally efficient and able to cope with changes in the underlying distribution of the stream, a phenomenon known in the literature as concept drift. We propose a new method for detecting concept drift which uses an Exponentially Weighted Moving Average (EWMA) chart to monitor the misclassification rate of a streaming classifier. Our approach is modular and can hence be run in parallel with any underlying classifier to provide an additional layer of concept drift detection. Moreover, our method is computationally efficient with overhead O(1) and works in a fully online manner with no need to store data points in memory. Unlike many existing approaches to concept drift detection, our method allows the rate of false positive detections to be controlled and kept constant over time.
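A minimal sketch of the EWMA-chart idea described here, assuming a 0/1 misclassification stream: an exponentially weighted average z of the errors is compared against a control limit derived from the overall error rate p. The lambda value, the limit multiplier L, and the warm-up length below are illustrative choices, not the paper's tuned values.

```python
import math

class EWMADrift:
    """Sketch of an EWMA control chart over a Bernoulli error stream:
    signal drift when the EWMA of the errors exceeds the overall error
    rate by L standard deviations of the EWMA statistic."""

    def __init__(self, lam=0.2, L=3.0, warmup=30):
        self.lam, self.L, self.warmup = lam, L, warmup
        self.n = 0
        self.p = 0.0  # running overall error rate
        self.z = 0.0  # EWMA of the 0/1 error indicator

    def update(self, error):
        """error is 0 (correct) or 1 (misclassified); True on drift."""
        self.n += 1
        self.p += (error - self.p) / self.n
        self.z = self.lam * error + (1 - self.lam) * self.z
        # asymptotic std of the EWMA under a stable Bernoulli(p) stream
        sigma = math.sqrt(self.lam / (2 - self.lam) * self.p * (1 - self.p))
        return self.n > self.warmup and self.z > self.p + self.L * sigma
```

On a stream with a 10% error rate that suddenly jumps to 100%, the EWMA crosses the control limit within a handful of examples after the change.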