Improvement of Data Stream Decision Trees

Sarah Nait Bahloul; Oussama Abderrahim; Aya Ichrak Benhadj Amar; Mohammed Yacine Bouhedadja

doi:10.4018/ijdwm.290889

International Journal of Data Warehousing and Mining ◽

2022 ◽

Vol 18 (1) ◽

pp. 1-17

Keyword(s):

Decision Trees ◽

Data Streams ◽

Data Stream ◽

High Speed ◽

Computational Cost ◽

Research Area ◽

Stream Classification ◽

Data Stream Classification ◽

Hoeffding Tree ◽

Benchmark Datasets

The classification of data streams has become a significant and active research area. The principal characteristics of data streams are the large volume of arriving data, its high speed and rate of arrival, and changes in its nature and distribution over time. The Hoeffding Tree is a method for building decision trees incrementally. Since its introduction in the literature, it has become one of the most popular tools for data stream classification, and several improvements have since emerged. The Hoeffding Anytime Tree was recently introduced and is considered one of the most promising of these algorithms. It offers higher accuracy than the Hoeffding Tree in most scenarios, at a small additional computational cost. In this work, the authors propose three improvements to the Hoeffding Anytime Tree and test them on known benchmark datasets. The experimental results show that two of the proposed variants make better use of the Hoeffding Anytime Tree’s properties: they learn faster while providing the same desired accuracy.
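
For readers unfamiliar with the split rule that separates these two learners, the following is a minimal Python sketch, not the authors' code. It assumes information gain as the split criterion (whose range R is log2 of the number of classes) and the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)). The function names and the tie threshold tau are illustrative choices. The classic Hoeffding Tree splits only when the best attribute beats the runner-up by more than ε (or ε drops below tau); the Hoeffding Anytime Tree, as we understand it, only requires the best attribute to beat the option of not splitting, and it also revisits existing splits at internal nodes, which is not shown here.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range lies within epsilon of the mean of n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def ht_should_split(g_best: float, g_second: float, value_range: float,
                    delta: float, n: int, tau: float = 0.05) -> bool:
    # Classic Hoeffding Tree test: the best attribute must beat the
    # runner-up by more than epsilon, or epsilon must fall below tau (tie).
    eps = hoeffding_bound(value_range, delta, n)
    return (g_best - g_second) > eps or eps < tau


def hatt_should_split(g_best: float, g_null: float, value_range: float,
                      delta: float, n: int) -> bool:
    # Simplified anytime test (assumption): the best attribute only has to
    # beat the merit of not splitting at all (g_null, 0 for information gain).
    eps = hoeffding_bound(value_range, delta, n)
    return (g_best - g_null) > eps
```

With two classes (R = 1), δ = 10^-7 and n = 1000, ε is about 0.09: a gain gap of 0.05 between the two best attributes is not enough for the Hoeffding Tree test, while the anytime test splits because the best gain of 0.30 exceeds the no-split merit of 0 by more than ε.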

Adapted One-versus-All Decision Trees for Data Stream Classification

IEEE Transactions on Knowledge and Data Engineering ◽

10.1109/tkde.2008.181 ◽

2009 ◽

Vol 21 (5) ◽

pp. 624-637 ◽

Cited By ~ 47

Author(s):

S. Hashemi ◽

Ying Yang ◽

Z. Mirzamomen ◽

M. Kangavari

Keyword(s):

Decision Trees ◽

Data Stream ◽

Stream Classification ◽

Data Stream Classification

Cost-Sensitive Classification for Evolving Data Streams with Concept Drift and Class Imbalance

Computational Intelligence and Neuroscience ◽

10.1155/2021/8813806 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Yange Sun ◽

Meng Li ◽

Lei Li ◽

Han Shao ◽

Yi Sun

Keyword(s):

Data Streams ◽

Data Stream ◽

Learning Strategy ◽

Concept Drift ◽

Class Imbalance ◽

Data Preprocessing ◽

Cost Information ◽

Detection Mechanism ◽

Stream Classification ◽

Data Stream Classification

Class imbalance and concept drift are two primary challenges that coexist in data stream classification. Although each issue has received considerable attention separately, their joint treatment remains largely unexplored. Moreover, the class imbalance issue is further complicated when the data stream also exhibits concept drift. A novel Cost-Sensitive based Data Stream (CSDS) classification method is introduced to address both issues simultaneously. CSDS uses cost information during both data preprocessing and classification. During data preprocessing, a cost-sensitive learning strategy is introduced into the ReliefF algorithm to alleviate class imbalance at the data level. During classification, a cost-sensitive weighting scheme is devised to enhance the overall performance of the ensemble. In addition, a change detection mechanism is embedded in our algorithm, which ensures that the ensemble can capture and react to drift promptly. Experimental results show that our method obtains better classification results across different imbalanced, concept-drifting data stream scenarios.
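
The abstract does not specify the CSDS weighting scheme, so the following Python sketch only illustrates the general idea of cost-sensitive ensemble weighting under an assumed rule: each member is weighted by the inverse of its average misclassification cost on the latest labelled chunk, and predictions are combined by weighted vote. The cost matrix, the helper names and the weighting formula are hypothetical, not taken from the paper.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Instance = Sequence[float]
Classifier = Callable[[Instance], str]
# Hypothetical cost matrix: (true_label, predicted_label) -> cost.
CostMatrix = Dict[Tuple[str, str], float]


def misclassification_cost(true: str, pred: str, cost: CostMatrix) -> float:
    # Correct predictions cost nothing; errors not listed default to cost 1.
    if true == pred:
        return 0.0
    return cost.get((true, pred), 1.0)


def member_weight(clf: Classifier, chunk: List[Tuple[Instance, str]],
                  cost: CostMatrix) -> float:
    # Assumed rule: weight = 1 / (average cost on the latest labelled chunk).
    avg = sum(misclassification_cost(y, clf(x), cost) for x, y in chunk) / len(chunk)
    return 1.0 / (avg + 1e-6)  # small constant avoids division by zero


def weighted_vote(ensemble: List[Classifier], weights: List[float],
                  x: Instance) -> str:
    # Each member adds its weight to the class it predicts; highest total wins.
    scores: Dict[str, float] = defaultdict(float)
    for clf, w in zip(ensemble, weights):
        scores[clf(x)] += w
    return max(scores, key=scores.get)
```

In a toy case where missing the minority class costs 10 and a false alarm costs 1, a member that always predicts the majority class is more accurate but incurs a higher average cost, so this weighting favours the member that catches the costly minority class.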

Microcluster-Based Incremental Ensemble Learning for Noisy, Nonstationary Data Streams

Complexity ◽

10.1155/2020/6147378 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Sanmin Liu ◽

Shan Xue ◽

Fanzhen Liu ◽

Jieren Cheng ◽

Xiulai Li ◽

...

Keyword(s):

Ensemble Learning ◽

Data Streams ◽

Data Stream ◽

Concept Drift ◽

Majority Vote ◽

Stream Classification ◽

Model Stability ◽

Data Stream Classification ◽

Nonstationary Data ◽

Synthetic Datasets

Data stream classification has become a promising prediction task relevant to many practical environments. However, in the presence of concept drift and noise, data stream classification faces many challenges. Hence, a new incremental ensemble model is presented for classifying nonstationary data streams with noise. Our approach integrates three strategies: incremental learning to monitor and adapt to concept drift; ensemble learning to improve model stability; and a microclustering procedure that distinguishes drift from noise and predicts the labels of incoming instances via majority vote. Experiments with two synthetic datasets designed to test for both gradual and abrupt drift show that our method provides more accurate classification of noisy nonstationary data streams than the two popular baselines.
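
As a rough illustration of prediction by majority vote over microclusters, the following Python sketch summarises each microcluster by a centroid, a radius and a class label, and labels a new instance by majority vote among its k nearest microclusters. It also flags instances that fall outside every microcluster, the kind of point a drift-versus-noise procedure would then need to examine. The data structure, k and the outlier flag are assumptions for illustration; the paper's actual procedure for separating drift from noise is not described in the abstract and is not reproduced here.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class MicroCluster:
    # Hypothetical summary of a microcluster: centre, boundary and class label.
    centroid: Tuple[float, ...]
    radius: float
    label: str


def predict(clusters: List[MicroCluster], x: Sequence[float],
            k: int = 3) -> Tuple[str, bool]:
    # k nearest microclusters by Euclidean distance to their centroids.
    nearest = sorted(clusters, key=lambda mc: math.dist(mc.centroid, x))[:k]
    # Majority vote over the labels of those microclusters.
    label = Counter(mc.label for mc in nearest).most_common(1)[0][0]
    # True when x lies outside every microcluster boundary
    # (a candidate for noise or for the start of a new concept).
    outside_all = all(math.dist(mc.centroid, x) > mc.radius for mc in clusters)
    return label, outside_all
```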

The Research of Data Stream Classification Based on Rough Set Theory-Neural Network Integration

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.441.717 ◽

2013 ◽

Vol 441 ◽

pp. 717-720

Author(s):

Zhi Bo Ren ◽

Chun Miao Yan ◽

Yu Zhou Wei ◽

Lei Sun

Keyword(s):

Neural Network ◽

Set Theory ◽

Rough Set ◽

Data Stream ◽

High Speed ◽

Rough Set Theory ◽

Classification Model ◽

Stream Classification ◽

Data Stream Classification ◽

Voting Rule

To address the high arrival speed, large volume and concept drift of data in the stream model, we combine rough set theory, neural networks and a voting rule to put forward a new data stream classification model: a multi-classifier ensemble based on rough set theory and neural networks. First, it reduces the attribute set using rough set theory; second, it constructs base classifiers on the attribute-reduced data chunks using an improved BP (backpropagation) neural network; finally, it fuses the base classifiers into an ensemble by a voting rule. Experiments applying the model to data stream classification show that the ensemble method is feasible and effective.

Evolving Fuzzy Min–Max Neural Network Based Decision Trees for Data Stream Classification

Neural Processing Letters ◽

10.1007/s11063-016-9528-8 ◽

2016 ◽

Vol 45 (1) ◽

pp. 341-363 ◽

Cited By ~ 5

Author(s):

Zahra Mirzamomen ◽

Mohammad Reza Kangavari

Keyword(s):

Neural Network ◽

Decision Trees ◽

Data Stream ◽

Stream Classification ◽

Data Stream Classification

Data Stream Classification by Dynamic Incremental Semi-Supervised Fuzzy Clustering

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213019600091 ◽

2019 ◽

Vol 28 (08) ◽

pp. 1960009 ◽

Cited By ~ 9

Author(s):

Gabriella Casalino ◽

Giovanna Castellano ◽

Corrado Mencar

Keyword(s):

Fuzzy Clustering ◽

Data Streams ◽

Data Stream ◽

Clustering Algorithm ◽

Classification Model ◽

Real World Data ◽

Stream Classification ◽

Data Stream Classification ◽

Partially Labeled Data ◽

Classification Quality

A data stream classification method called DISSFCM (Dynamic Incremental Semi-Supervised FCM) is presented, based on an incremental semi-supervised fuzzy clustering algorithm. The method assumes that partially labeled data belonging to different classes are continuously available over time in the form of chunks. Each chunk is processed by semi-supervised fuzzy clustering, yielding a cluster-based classification model. DISSFCM dynamically adapts the number of clusters to the data stream by splitting low-quality clusters so as to improve classification quality. Experimental results on both synthetic and real-world data show the effectiveness of the proposed method for data stream classification.
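
DISSFCM builds on fuzzy c-means (FCM). As background, the following Python sketch shows only the standard FCM membership formula, u_i = 1 / Σ_j (d_i / d_j)^(2/(m-1)), where d_i is the distance from an instance to cluster centre i and m > 1 is the fuzzifier, and then labels the instance with the class attached to its highest-membership cluster. The semi-supervised term, the chunk-by-chunk updates and the cluster-splitting step of DISSFCM are not shown; the function names and the labelling rule are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def fcm_memberships(x: Sequence[float], centers: List[Sequence[float]],
                    m: float = 2.0) -> List[float]:
    """Standard fuzzy c-means membership of x in each cluster (requires m > 1)."""
    d = [math.dist(x, c) for c in centers]
    # If x coincides with a centre, it belongs fully to that cluster.
    for i, di in enumerate(d):
        if di == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(centers))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** p for dj in d) for di in d]


def classify(x: Sequence[float], centers: List[Sequence[float]],
             cluster_labels: List[str], m: float = 2.0) -> Tuple[str, List[float]]:
    # Illustrative rule: return the class label of the highest-membership cluster.
    u = fcm_memberships(x, centers, m)
    best = max(range(len(u)), key=u.__getitem__)
    return cluster_labels[best], u
```

For x = (1, 0) with centres (0, 0) and (4, 0) and m = 2, the distances are 1 and 3, so the memberships are 0.9 and 0.1 and sum to one, as FCM requires.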

Data Stream Classification Based on the Gamma Classifier

Mathematical Problems in Engineering ◽

10.1155/2015/939175 ◽

2015 ◽

Vol 2015 ◽

pp. 1-17 ◽

Cited By ~ 7

Author(s):

Abril Valeria Uriarte-Arcia ◽

Itzamá López-Yáñez ◽

Cornelio Yáñez-Márquez ◽

João Gama ◽

Oscar Camacho-Nieto

Keyword(s):

Data Streams ◽

Time Management ◽

Data Stream ◽

High Speed ◽

Concept Drift ◽

Synthetic Data ◽

Continuous Data ◽

Data Generation ◽

Underlying Distribution ◽

Data Stream Classification

The ever-increasing generation of data confronts us with the problem of handling massive amounts of information online. One of the biggest challenges is how to extract valuable information from these massive, continuous data streams in a single scan. In a data stream context, data arrive continuously at high speed; therefore, algorithms developed for this context must be efficient in memory and time management and capable of detecting changes over time in the underlying distribution that generated the data. This work describes a novel method for pattern classification over a continuous data stream based on an associative model. The proposed method is based on the Gamma classifier, which is inspired by the Alpha-Beta associative memories; both are supervised pattern recognition models. The proposed method handles the space and time constraints inherent to data stream scenarios. The Data Streaming Gamma classifier (DS-Gamma classifier) implements a sliding window approach to provide concept drift detection and a forgetting mechanism. To test the classifier, several experiments were performed using different data stream scenarios with real and synthetic data streams. The experimental results show that the method exhibits competitive performance compared to other state-of-the-art algorithms.
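
To make the idea of a sliding window with a forgetting mechanism concrete, the following Python sketch keeps the last W labelled instances in a bounded deque (so the oldest ones are forgotten automatically), compares the error rate of the newer half of the window with the older half, and discards the older half when the error rate rises by more than a threshold. The class name, the threshold and this particular drift rule are hypothetical; the abstract does not state how DS-Gamma detects drift.

```python
from collections import deque
from typing import Any, Deque, Tuple


class SlidingWindowMemory:
    """Hypothetical windowed memory with a simple error-rate drift check."""

    def __init__(self, size: int, drift_threshold: float = 0.2) -> None:
        if size < 2:
            raise ValueError("window size must be at least 2")
        self.size = size
        self.window: Deque[Tuple[Any, str]] = deque(maxlen=size)  # oldest items drop out
        self.errors: Deque[int] = deque(maxlen=size)  # 1 = misclassified, 0 = correct
        self.drift_threshold = drift_threshold

    def update(self, x: Any, y: str, predicted: str) -> None:
        # Store the labelled instance and whether the current model got it wrong.
        self.window.append((x, y))
        self.errors.append(int(predicted != y))

    def drift_detected(self) -> bool:
        # Only judge once the error window is full.
        n = len(self.errors)
        if n < self.size:
            return False
        half = n // 2
        errs = list(self.errors)
        older = sum(errs[:half]) / half
        newer = sum(errs[half:]) / (n - half)
        # Assumed rule: drift when the recent error rate rises by more than the threshold.
        return newer - older > self.drift_threshold

    def forget_older_half(self) -> None:
        # Drop the older half of the stored instances (rounded up) and reset error tracking.
        for _ in range(len(self.window) - len(self.window) // 2):
            self.window.popleft()
        self.errors.clear()
```

Bounding the deque with maxlen provides gradual forgetting at no extra cost; the explicit purge after a detected drift is a second, faster form of forgetting.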

Handling Concept Drift in Data Stream Classification

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j8857.0881019 ◽

2019 ◽

Vol 8 (10) ◽

pp. 548-550

Keyword(s):

Data Streams ◽

Data Stream ◽

Concept Drift ◽

Research Work ◽

Class Imbalance ◽

Class Imbalance Problem ◽

Stream Classification ◽

Data Stream Classification ◽

Imbalance Problem ◽

Major Factors

Data streams have huge volumes and cannot be stored permanently in memory for processing. In this paper we focus mainly on issues in data streams, in particular the major factors affecting classifier accuracy, such as class imbalance and concept drift. Drift in data stream mining refers to a change in the data, while the class imbalance problem refers to the classes not containing equal numbers of samples. In our research work we try to identify the change (drift) in the data and to detect class imbalance and noise in the changed data. According to the type of drift, we apply algorithms that make the stream more balanced and noise-free in order to improve the classifier’s accuracy.
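
As a small, concrete example of detecting class imbalance within a chunk of a stream, the following Python sketch counts the labels in a chunk and computes the imbalance ratio (largest class count divided by smallest class count), treating a known class that is absent from the chunk as infinite imbalance. The function names and the threshold of 1.5 are illustrative assumptions, not values from the paper.

```python
import math
from collections import Counter
from typing import Iterable, Sequence


def imbalance_ratio(labels: Iterable[str], classes: Sequence[str]) -> float:
    # Count every known class, including classes absent from this chunk.
    counts = Counter(labels)
    sizes = [counts.get(c, 0) for c in classes]
    if min(sizes) == 0:
        return math.inf  # a known class is missing from the chunk entirely
    return max(sizes) / min(sizes)


def is_imbalanced(labels: Iterable[str], classes: Sequence[str],
                  threshold: float = 1.5) -> bool:
    # Hypothetical rule: flag the chunk when the ratio exceeds the threshold.
    return imbalance_ratio(labels, classes) > threshold
```

A chunk of 90 samples of class "a" and 10 of class "b" has a ratio of 9.0 and would be flagged, while a perfectly balanced chunk has a ratio of 1.0.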

Deterministic Concept Drift Detection in Ensemble Classifier Based Data Stream Classification Process

International Journal of Grid and High Performance Computing ◽

10.4018/ijghpc.2019010103 ◽

2019 ◽

Vol 11 (1) ◽

pp. 29-48 ◽

Cited By ~ 2

Author(s):

Mohammed Ahmed Ali Abdualrhman ◽

M C Padma

Keyword(s):

Data Streams ◽

Data Stream ◽

Concept Drift ◽

Ensemble Classifier ◽

Experimental Result ◽

Process Time ◽

Stream Classification ◽

Data Stream Classification ◽

Proposed Model ◽

Concept Drift Detection

Data in a streaming environment tend to be non-stationary. Hence, frequent and irregular changes occur in the data, which are usually referred to as concept drift in the context of data stream classification. Detecting concept drift in traditional data stream mining requires labelled samples; however, attaching labels to streaming transactions is infeasible in terms of processing time and resource utilization. In this article, deterministic concept drift detection (DCDD) in an ensemble classifier-based data stream classification process is proposed, which can detect concept drift regardless of the labels assigned to samples. The proposed DCDD model is evaluated experimentally on the poker-hand dataset. The experimental results show that the proposed model detects concept drift accurately and scalably, with a high drift detection rate and minimal false alarm and miss rates compared to other contemporary models.

Data Stream Classification Detection System Using Genetic Algorithm

i-manager’s Journal on Software Engineering ◽

10.26634/jse.6.1.1537 ◽

2011 ◽

Vol 6 (1) ◽

pp. 36-44

Author(s):

S. Jeya ◽

◽

S. Muthu Perumal Pillai ◽

Keyword(s):

Genetic Algorithm ◽

Data Stream ◽

Detection System ◽

Stream Classification ◽

Data Stream Classification
