Article

Tracking recurring contexts using ensemble classifiers: An application to email filtering

Abstract

Concept drift constitutes a challenging problem for the machine learning and data mining community that frequently appears in real-world stream classification problems. It is usually defined as the unforeseeable change of the target concept in a prediction task. In this paper, we focus on the problem of recurring contexts, a special sub-type of concept drift that has not yet received proper attention from the research community. In the case of recurring contexts, concepts may re-appear in the future, and thus older classification models might be beneficial for future classifications. We propose a general framework for classifying data streams that exploits stream clustering in order to dynamically build and update an ensemble of incremental classifiers. To achieve this, a transformation function that maps batches of examples into a new conceptual representation model is proposed. A clustering algorithm is then applied in order to group batches of examples into concepts and identify recurring contexts. The ensemble is produced by creating and maintaining an incremental classifier for every concept discovered in the data stream. An experimental study is performed using (a) two new real-world concept drifting datasets from the email domain, (b) an instantiation of the proposed framework, and (c) five methods for dealing with drifting concepts. Results indicate the effectiveness of the proposed representation and the suitability of the concept-specific classifiers for problems with recurring contexts.
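The pipeline described in the abstract (batches of examples → conceptual vectors → clustering → one incremental classifier per concept) can be sketched in a few lines. The sketch below is illustrative only: the transformation function, the fixed distance threshold, and the majority-class "classifier" are simplified stand-ins for the paper's actual components, and all names are hypothetical.

```python
import math
from collections import defaultdict

def concept_vector(batch):
    """Map a batch of (features, label) pairs to a conceptual vector:
    here, each feature's empirical probability of co-occurring with the
    positive class (a simplified stand-in for the paper's transformation)."""
    counts = defaultdict(lambda: [0, 0])  # feature -> [positive count, total count]
    for features, label in batch:
        for f in features:
            counts[f][1] += 1
            if label == 1:
                counts[f][0] += 1
    return {f: pos / tot for f, (pos, tot) in counts.items()}

def distance(v1, v2):
    """Euclidean distance over the union of the two sparse vectors' keys."""
    keys = set(v1) | set(v2)
    return math.sqrt(sum((v1.get(k, 0.0) - v2.get(k, 0.0)) ** 2 for k in keys))

class ConceptEnsemble:
    """One incremental (here: majority-class) classifier per discovered concept."""
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.centroids = []    # conceptual vector of each concept
        self.classifiers = []  # per-concept label counts [negatives, positives]
        self.active = None

    def update(self, batch):
        v = concept_vector(batch)
        # Assign the batch to the nearest known concept, or create a new one.
        best, best_d = None, float("inf")
        for i, c in enumerate(self.centroids):
            d = distance(v, c)
            if d < best_d:
                best, best_d = i, d
        if best is None or best_d > self.threshold:
            self.centroids.append(v)
            self.classifiers.append([0, 0])
            best = len(self.centroids) - 1
        for _, label in batch:
            self.classifiers[best][label] += 1
        self.active = best

    def predict(self):
        """Predict with the classifier of the currently active concept."""
        neg, pos = self.classifiers[self.active]
        return 1 if pos >= neg else 0
```

A batch whose conceptual vector lies far from all stored centroids opens a new concept, so a re-appearing concept is routed back to the classifier that was trained on it earlier.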
... We study the on-the-fly classification of evolving text streams. For example, detecting spam among incoming emails, where the definition of spam evolves over time, illustrates the challenges of our problem setting [30]. Due to the volume and velocity of incoming data, it is impractical to train the model with multiple passes over the data, as is the practice in offline settings. ...
... For other learning paradigms on text streams, Katakis et al. [30] introduce Conceptual Clustering and Prediction (CCP), which uses clustering to identify concepts and trains cluster-specific classifiers in an ensemble. Kumar et al. [39] propose OSDM, an online model for short text stream clustering that embeds word-occurrence semantic information in the clustering task. ...
... Then, the output layers' weighted average learning rate and loss values are used to optimize the weights of the hidden layers through backpropagation (lines 26–33). The weighted average learning rate of the output layers is calculated as: ...
Article
We study on-the-fly classification of evolving text streams in which the relation between the input data and target labels changes over time, i.e. “concept drift”. These variations decrease the model’s performance, as predictions become less accurate over time, and they necessitate a more adaptable system. While most studies focus on concept drift detection and handling with ensemble approaches, the application of neural models in this area is relatively less studied. We introduce Adaptive Neural Ensemble Network (AdaNEN), a novel ensemble-based neural approach capable of handling concept drift in data streams. With our novel architecture, we address some of the problems neural models face when exploited in online adaptive learning environments. Most current studies address concept drift detection and handling in numerical streams, and evolving text stream classification remains relatively unexplored. We hypothesize that the lack of public, large-scale experimental data could be one reason. To this end, we propose a method, based on an existing approach, for generating evolving text streams by introducing various types of concept drift to real-world text datasets. We provide an extensive evaluation of our proposed approach using 12 state-of-the-art baselines and 13 datasets. We first evaluate the concept drift handling capability of AdaNEN and the baseline models on evolving numerical streams; this aims to demonstrate the concept drift handling capabilities of our method on a general spectrum and motivate its use on evolving text streams. The models are then evaluated on evolving text stream classification. Our experimental results show that AdaNEN consistently outperforms the existing approaches in terms of predictive performance with conservative efficiency.
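As a rough illustration of performance-weighted ensembling of this general kind (not AdaNEN's actual architecture, which combines neural output layers and updates them via backpropagation), a minimal sketch might look like the following; all names are hypothetical.

```python
class WeightedVoteEnsemble:
    """Toy dynamically weighted ensemble: each member's vote is weighted by
    its exponentially decayed accuracy on recent examples, so members that
    adapt poorly after a drift lose influence."""
    def __init__(self, members, decay=0.9):
        self.members = members          # callables: x -> predicted label
        self.decay = decay              # how quickly old accuracy is forgotten
        self.weights = [1.0] * len(members)

    def predict(self, x):
        votes = {}
        for m, w in zip(self.members, self.weights):
            y = m(x)
            votes[y] = votes.get(y, 0.0) + w
        return max(votes, key=votes.get)

    def update(self, x, y_true):
        # After the true label arrives, shift weight toward accurate members.
        for i, m in enumerate(self.members):
            correct = 1.0 if m(x) == y_true else 0.0
            self.weights[i] = self.decay * self.weights[i] + (1 - self.decay) * correct
```

After a drift, a member that keeps predicting the old concept sees its weight decay geometrically, so the ensemble's prediction tracks the currently accurate members.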
... The spam [25] dataset, which is mainly used for progressive drift detection, has 9324 samples, 499 feature dimensions, and two classes. ...
Article
Full-text available
Concept drift describes unforeseeable changes in the underlying distribution of streaming data over time: the statistical properties of a target domain change in an arbitrary way. These changes might be caused by changes in hidden variables that cannot be measured directly. With the onset of the big data era, domains such as social networks, meteorology, and finance are generating copious amounts of streaming data. Embedded within these data, concept drift can affect the attributes of streaming data in various ways, leading to a decline in the accuracy and performance of models, so there is a pressing need for models that re-adapt to the changes in streaming data. Traditional concept drift detection algorithms struggle to effectively capture and utilize the key feature points of concept drift within complex time series, and therefore fail to maintain model accuracy and efficiency. In light of these challenges, this study introduces a novel concept drift detection method that incorporates a temporal attention mechanism within a prototypical network. By integrating a temporal attention mechanism during feature extraction, our approach enhances the capability to process complex time series data, preserves temporal locality, strengthens the learning of key features, and reduces the amount of labeled data required. This method significantly improves detection accuracy and efficiency on small-sample streaming data by better capturing the local features of the data. Experiments conducted across multiple datasets demonstrate that this method exhibits comprehensively leading performance in terms of accuracy and F1-score, with excellent recall and precision, validating its effectiveness in enhancing concept drift detection in streaming data.
... Conventional ensemble classifiers, such as Random Forests [37,38], Gradient-Boosted Trees [39], and Ensemble SVMs [40,41], consist of multiple weak classifiers, which increases the diversity of the classifiers and improves classification performance [42,43]; our primary objective, however, is to enhance the robustness of classifiers. Therefore, the first motivation of our approach is to build the ensemble from strong classifiers. ...
Article
Full-text available
Learning-based classifiers are found to be vulnerable to attacks by adversarial samples. Some works suggest that ensemble classifiers tend to be more robust than single classifiers against evasion attacks. However, recent studies have shown that this is not necessarily the case under the more realistic setting of black-box attacks. In this paper, we propose a novel ensemble approach to improve the robustness of classifiers against evasion attacks, using diversified feature selection and a stochastic aggregation strategy. Our proposed scheme has three stages. First, an adversarial feature selection algorithm is used to repeatedly select a feature that trades off classification accuracy against robustness, and add it to the feature vector bank. Second, each feature vector in the bank is used to train a base classifier, which is added to the base classifier bank. Finally, m classifiers from the classifier bank are randomly selected for decision making. In this way, each classifier in the base classifier bank performs well in terms of both classification accuracy and robustness, and it also becomes difficult to accurately estimate the gradients of the ensemble. Thus, the robustness of classifiers can be improved without reducing classification accuracy. Experiments performed using both linear and kernel SVMs on genuine datasets for spam filtering, malware detection, and handwritten digit recognition demonstrate that our proposed approach significantly improves the classifiers’ robustness against evasion attacks.
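The final stage, stochastic aggregation, is easy to sketch: at prediction time, m classifiers are drawn at random from the bank and their votes are combined. The snippet below is an illustrative simplification (binary labels, plain majority vote); the bank construction via adversarial feature selection is not reproduced, and the names are hypothetical.

```python
import random

def stochastic_ensemble_predict(bank, x, m, rng=None):
    """Randomly pick m classifiers from the bank and majority-vote their
    binary predictions. Because the subset changes per query, an attacker
    cannot reliably estimate the gradients of the aggregated decision."""
    rng = rng or random.Random()
    chosen = rng.sample(bank, m)            # random subset of the classifier bank
    votes = sum(clf(x) for clf in chosen)   # labels assumed to be in {0, 1}
    return 1 if votes * 2 >= len(chosen) else 0
```

For example, with a bank of six classifiers of which five predict 1, any random subset of three contains at most one dissenter, so the majority vote is stable while the attacked surface still varies per query.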
... 6. https://archive.ics.uci.edu/ml/datasets.php 7. https://moa.cms.waikato.ac.nz/ Simultaneously, we conduct experiments on various real-world datasets, consisting of seven UCI datasets 6 (i.e., Vowel, Image Segmentation, Satimage, Mushroom, Pendigits, Letter Recognition, and Shuttle), five real-world streaming datasets (namely Usenet-2 [56], Spam [57], Weather [49], Covertype 6, and Poker-Hand 6), three image datasets (that is, ORL 8, MNIST 9, and Fashion-MNIST 10), and two traffic data streams. Brief information on these datasets is summarized in Table 1. ...
Preprint
Full-text available
People can often acquire knowledge dynamically and rapidly from different types of data, yet existing incremental learning algorithms are still computationally time-consuming, and most stream learning methods are designed mainly for streaming data while ignoring other types of data. Hence, this paper proposes a novel dynamic concept learning (CL) algorithm that imitates human cognitive learning processes from the perspective of brain logical cognition, named the stream concept-cognitive computing system (streamC3S). streamC3S mainly consists of three aspects: the concept space, the CL process, and the model update process. Moreover, considering that concept drift frequently occurs in streaming data over time, an extended version of streamC3S (namely, streamC3S_E) is also proposed in this work. Specifically, we first present the related theories for streamC3S and streamC3S_E on the basis of the concept space. Then an overall framework and its corresponding algorithm are presented. Finally, experimental results on various types of datasets, including standard machine learning datasets, streaming datasets, image datasets, and two traffic data streams, validate the effectiveness of our streamC3S and streamC3S_E compared to state-of-the-art incremental learning and stream learning algorithms.
... Weather [96] contains daily weather measurement data for a certain area from 2006 to 2016, including temperature, humidity, wind direction, wind speed, visibility, atmospheric pressure, etc., for predicting rainfall. Spam [97] is mainly used to identify spam. CoverType [94] is derived from the forest cover of a certain area in the U.S. Forest Service system. ...
Article
Full-text available
With the advent of the fourth industrial revolution, data-driven approaches have become an integral part of decision making, and deep learning, one of the revolution's core technologies, has become vital to it. However, in the era of epidemics and big data, the volume of data has increased dramatically while its sources have become progressively more complex, making data distributions highly susceptible to change. These situations can easily lead to concept drift, which directly affects the effectiveness of prediction models. How to cope with such complex situations and make timely and accurate decisions from multiple perspectives is a challenging research issue. To address this challenge, we summarize concept drift adaptation methods under the deep learning framework, which helps decision makers make better decisions and analyze the causes of concept drift. First, we provide an overall introduction to concept drift, including the definition, causes, and types of concept drift and the process of adaptation methods under the deep learning framework. Second, we summarize concept drift adaptation methods in terms of discriminative learning, generative learning, hybrid learning, and others; for each aspect, we elaborate on the update modes, detection modes, and the drift types the methods adapt to. In addition, we briefly describe the characteristics and application fields of deep learning algorithms using concept drift adaptation methods. Finally, we summarize common datasets and evaluation metrics and present future directions.
Article
Incremental data drifting is a common problem when employing a machine-learning model in industrial applications. The underlying data distribution evolves gradually, e.g., users change their buying preferences on an E-commerce website over time. The problem needs to be addressed to obtain high performance. Currently, studies of incremental data drifting suffer from several issues. For one thing, there is a lack of clearly defined incremental drift datasets for examination. Existing efforts use either collected real datasets or synthetic datasets, which show two obvious limitations: one is that exactly when, and with which type of drift, the distribution changes is unknown; the other is that a simple synthesized dataset cannot reflect the complex representations we would normally face in the real world. For another, there is no well-defined protocol to evaluate a learner’s knowledge transfer capability on an incremental drift dataset. To provide a holistic discussion of these issues, we create approaches to generate datasets with specific drift types and define a novel protocol for evaluation. Besides, we investigate recent advances in the transfer learning field, including Domain Adaptation and Lifelong Learning, and examine how they perform in the presence of incremental data drifting. The results unfold the relationships among drift types, knowledge preservation, and learning approaches.
Article
Continuous machine learning pipelines are common in industrial settings where models are periodically trained on data streams. Unfortunately, concept drifts may occur in data streams when the joint distribution of the data X and label y, P(X, y), changes over time, possibly degrading model accuracy. Existing concept drift adaptation approaches mostly focus on updating the model to the new data, possibly using ensembles of previous models, and tend to discard the drifted historical data. However, we contend that explicitly utilizing the drifted data leads to much better model accuracy, and we propose Quilt, a data-centric framework for identifying and selecting data segments that maximize model accuracy. To address the potential downside in efficiency, Quilt extends existing data subset selection techniques, which can reduce the training data without compromising model accuracy. These techniques cannot be used as is, because they assume only virtual drifts, where the posterior probabilities P(y|X) are assumed not to change. In contrast, a key challenge in our setup is to also discard undesirable data segments with concept drifts. Quilt thus discards drifted data segments and selects data segment subsets holistically for accurate and efficient model training. Both operations use gradient-based scores, which have little computation overhead. In our experiments, we show that Quilt outperforms state-of-the-art drift adaptation and data selection baselines on synthetic and real datasets.
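The idea of scoring historical segments by gradient alignment can be illustrated with a deliberately tiny stand-in: a 1-D linear model where a segment is kept only if its average gradient points the same way as the gradient on the current data. This is a hedged sketch of the general technique, not Quilt's actual scores or model.

```python
def avg_gradient(segment, w):
    """Average gradient of squared error for a 1-D linear model y ≈ w * x."""
    g = 0.0
    for x, y in segment:
        g += 2 * (w * x - y) * x
    return g / len(segment)

def select_segments(history, current, w, threshold=0.0):
    """Keep historical segments whose average gradient aligns (positive
    product) with the gradient computed on current data; segments that
    drifted to a different concept pull the model the opposite way and
    are discarded. Illustrative only."""
    g_cur = avg_gradient(current, w)
    return [seg for seg in history
            if avg_gradient(seg, w) * g_cur > threshold]
```

With current data drawn from y = 2x and w = 1, a historical segment from the same concept has a gradient of the same sign and is kept, while a segment from y = -2x is rejected.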
Article
Full-text available
In a data stream environment, classification models must effectively and efficiently handle concept drift. Ensemble methods are widely used for this purpose; however, the ones available in the literature either use a large data chunk to update the model or learn the data one by one. In the former, the model may miss the changes in the data distribution, while in the latter, the model may suffer from inefficiency and instability. To address these issues, we introduce a novel ensemble approach based on the Broad Learning System (BLS), where mini chunks are used at each update. BLS is an effective lightweight neural architecture recently developed for incremental learning. Although it is fast, it requires huge data chunks for effective updates and is unable to handle dynamic changes observed in data streams. Our proposed approach, named Broad Ensemble Learning System (BELS), uses a novel updating method that significantly improves best-in-class model accuracy. It employs an ensemble of output layers to address the limitations of BLS and handle drifts. Our model tracks the changes in the accuracy of the ensemble components and reacts to these changes. We present the mathematical derivation of BELS, perform comprehensive experiments with 35 datasets that demonstrate the adaptability of our model to various drift types, and provide its hyperparameter, ablation, and imbalanced dataset performance analysis. The experimental results show that the proposed approach outperforms 10 state-of-the-art baselines, and supplies an overall improvement of 18.59% in terms of average prequential accuracy.
Article
Full-text available
Real-world text classification applications are of special interest for the machine learning and data mining community, mainly because they introduce and combine a number of special difficulties. They deal with high-dimensional, streaming, unstructured, and, on many occasions, concept drifting data. Another important peculiarity of streaming text, not adequately discussed in the relevant literature, is the fact that the feature space is initially unavailable. In this paper, we discuss this aspect of textual data streams. We underline the necessity for a dynamic feature space and the utility of incremental feature selection in streaming text classification tasks. In addition, we describe a computationally undemanding incremental learning framework that could serve as a baseline in the field. Finally, we introduce a new concept drifting dataset which could assist other researchers in the evaluation of new methodologies.
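One simple way a dynamic feature space with incremental feature selection could be realized is to let the vocabulary grow as documents arrive and re-rank features by a class-discrimination score on demand. The sketch below uses a crude score, |P(f|spam) − P(f|ham)|, as a stand-in; the paper's own baseline framework and scoring may differ, and all names are hypothetical.

```python
from collections import defaultdict

class IncrementalFeatureSelector:
    """Maintain a growing feature space over a text stream and rank
    features by how differently they occur in the two classes."""
    def __init__(self):
        self.doc_counts = defaultdict(lambda: [0, 0])  # word -> [ham docs, spam docs]
        self.class_totals = [0, 0]                     # [ham docs, spam docs]

    def update(self, words, label):
        """Incorporate one labeled document; unseen words extend the
        feature space on the fly."""
        self.class_totals[label] += 1
        for w in set(words):
            self.doc_counts[w][label] += 1

    def top_features(self, k):
        """Return the k currently most discriminative features."""
        def score(w):
            ham, spam = self.doc_counts[w]
            p_ham = ham / max(self.class_totals[0], 1)
            p_spam = spam / max(self.class_totals[1], 1)
            return abs(p_spam - p_ham)
        return sorted(self.doc_counts, key=score, reverse=True)[:k]
```

Because the ranking is recomputed from running counts, the selected feature set can change as the stream (and the concept) evolves, without any pass over past documents.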
Article
Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters, or densely populated regions, in a multi-dimensional dataset. Prior work does not adequately address the problem of large datasets and minimization of I/O costs. This paper presents a data clustering method named BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) and demonstrates that it is especially suitable for very large databases. BIRCH incrementally and dynamically clusters incoming multi-dimensional metric data points to try to produce the best quality clustering with the available resources (i.e., available memory and time constraints). BIRCH can typically find a good clustering with a single scan of the data and improve the quality further with a few additional scans. BIRCH is also the first clustering algorithm proposed in the database area to handle "noise" (data points that are not part of the underlying pattern) effectively. We evaluate BIRCH's time/space efficiency, data input order sensitivity, and clustering quality through several experiments. We also present a performance comparison of BIRCH versus CLARANS, a clustering method proposed recently for large datasets, and show that BIRCH is consistently superior.
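The core of BIRCH is the clustering feature (CF): a subcluster is summarized by the triple (N, LS, SS), its point count, per-dimension linear sum, and per-dimension square sum. Centroids (and radii) can be computed from the summary alone, and two subclusters merge by simple addition, which is what makes single-scan, memory-bounded clustering possible. A minimal sketch of this summary structure (the CF-tree itself is omitted):

```python
class ClusteringFeature:
    """BIRCH clustering feature: (N, LS, SS) = point count, linear sum,
    and square sum per dimension. Points never need to be kept in memory."""
    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim   # linear sum per dimension
        self.ss = [0.0] * dim   # square sum per dimension

    def add(self, point):
        """Absorb one data point into the summary."""
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x

    def centroid(self):
        """Centroid recovered from the summary alone."""
        return [s / self.n for s in self.ls]

    def merge(self, other):
        """CF additivity: two subclusters merge by component-wise addition."""
        self.n += other.n
        for i in range(len(self.ls)):
            self.ls[i] += other.ls[i]
            self.ss[i] += other.ss[i]
```

The square sums are carried so that a subcluster's radius/diameter can also be derived from the triple, which BIRCH uses to decide whether an incoming point fits an existing leaf entry.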