Random Tree Data Stream Classifier With Sliding Window Estimator And Concept Drift

Ebtesam Almalki; Manal Abdullah

doi:10.21786/bbrc/12.1/25

Analyzing and repairing concept drift adaptation in data stream classification

Machine Learning, doi:10.1007/s10994-021-05993-w, 2021

Author(s): Ben Halstead, Yun Sing Koh, Patricia Riddle, Russel Pears, Mykola Pechenizkiy, ...

Keyword(s): Data Stream, Concept Drift, Stream Classification, Data Stream Classification


Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm

Entropy, doi:10.3390/e23030380, 2021, Vol 23 (3), pp. 380

Author(s): Emanuele Cavenaghi, Gabriele Sottocornola, Fabio Stella, Markus Zanker

Keyword(s): Real World, Concept Drift, Empirical Evaluation, Sliding Window, Discount Factor, Data Streaming, Sources Of Information, Sequential Decision, Time Step, Thompson Sampling

The Multi-Armed Bandit (MAB) problem has been extensively studied in order to address real-world challenges related to sequential decision making. In this setting, an agent selects the best action to perform at time step t, based on the past rewards received from the environment. This formulation implicitly assumes that the expected payoff of each action is kept stationary by the environment over time. Nevertheless, in many real-world applications this assumption does not hold, and the agent has to face a non-stationary environment, that is, one with a changing reward distribution. Thus, we present a new MAB algorithm, named f-Discounted-Sliding-Window Thompson Sampling (f-dsw TS), for non-stationary environments, that is, when the data stream is affected by concept drift. The f-dsw TS algorithm is based on Thompson Sampling (TS) and exploits a discount factor on the reward history and an arm-related sliding window to counter concept drift in non-stationary environments. We investigate how to combine these two sources of information, namely the discount factor and the sliding window, by means of an aggregation function f(.). In particular, we propose a pessimistic (f=min), an optimistic (f=max), and an averaged (f=mean) version of the f-dsw TS algorithm. A rich set of numerical experiments is performed to evaluate the f-dsw TS algorithm against both stationary and non-stationary state-of-the-art TS baselines. We use synthetic environments (both randomly generated and controlled) to test the MAB algorithms under different types of drift, that is, sudden/abrupt, incremental, gradual and increasing/decreasing drift. Furthermore, we adapt four real-world active learning tasks to our framework: a prediction task on crimes in the city of Baltimore, a classification task on insect species, a recommendation task on local web news, and a time-series analysis of microbial organisms in the tropical air ecosystem. The f-dsw TS approach emerges as the best-performing MAB algorithm. At least one of the versions of f-dsw TS performs better than the baselines in synthetic environments, demonstrating the robustness of f-dsw TS under different concept drift types. Moreover, the pessimistic version (f=min) proves the most effective in all real-world tasks.
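The combination described in this abstract can be illustrated with a minimal Python sketch for Bernoulli rewards. It is not the authors' implementation: the class name `FDswTS`, the parameter defaults, the Beta(1, 1) prior, and the exact update rule (discount every arm at each step, keep a per-arm window of that arm's most recent rewards) are assumptions made for illustration only.

```python
import random
from collections import deque
from statistics import mean


class FDswTS:
    """Illustrative sketch (assumed design): Thompson Sampling that combines
    a discounted posterior and a per-arm sliding-window posterior via f."""

    def __init__(self, n_arms, gamma=0.95, window=50, f=min, seed=None):
        self.gamma = gamma            # discount factor on the reward history
        self.f = f                    # aggregation function: min, max or mean
        self.rng = random.Random(seed)
        self.disc_s = [0.0] * n_arms  # discounted success counts
        self.disc_f = [0.0] * n_arms  # discounted failure counts
        # One window per arm, holding that arm's most recent 0/1 rewards.
        self.windows = [deque(maxlen=window) for _ in range(n_arms)]

    def select(self):
        scores = []
        for s, fl, w in zip(self.disc_s, self.disc_f, self.windows):
            # Sample from the discounted posterior (Beta(1, 1) prior assumed).
            theta_disc = self.rng.betavariate(s + 1.0, fl + 1.0)
            # Sample from the sliding-window posterior.
            wins = sum(w)
            theta_win = self.rng.betavariate(wins + 1.0, len(w) - wins + 1.0)
            scores.append(self.f([theta_disc, theta_win]))
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        # Discount the history of every arm, then credit the pulled arm.
        for a in range(len(self.disc_s)):
            self.disc_s[a] *= self.gamma
            self.disc_f[a] *= self.gamma
        self.disc_s[arm] += reward
        self.disc_f[arm] += 1 - reward
        self.windows[arm].append(reward)
```

Passing `f=min`, `f=max`, or `f=mean` (from `statistics`) selects the pessimistic, optimistic, or averaged variant named in the abstract. A typical loop calls `arm = agent.select()`, observes a 0/1 reward, and then calls `agent.update(arm, reward)`.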


Bhattacharyya Distance based Concept Drift Detection Method For evolving data stream

Expert Systems with Applications, doi:10.1016/j.eswa.2021.115303, 2021, pp. 115303

Author(s): Ishwar Baidari, Nagaraj Honnikoll

Keyword(s): Data Stream, Detection Method, Concept Drift, Bhattacharyya Distance, Concept Drift Detection, Evolving Data


A Structure for Sliding Window Equijoins in Data Stream Processing

2013 IEEE 16th International Conference on Computational Science and Engineering, doi:10.1109/cse.2013.25, 2013, Cited By ~ 8

Author(s): Hyeon Gyu Kim

Keyword(s): Data Stream, Stream Processing, Sliding Window, Data Stream Processing


Learning from Ontology Streams with Semantic Concept Drift

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, doi:10.24963/ijcai.2017/133, 2017, Cited By ~ 7

Author(s): Jiaoyan Chen, Freddy Lecue, Jeff Z. Pan, Huajun Chen

Keyword(s): Semantic Web, Data Stream, Concept Drift, Data Distribution, Accurate Prediction, Knowledge Structures, Semantic Concept, Web Data, Semantic Inference

Data stream learning has been widely studied for extracting knowledge structures from continuous and rapid data records. In the semantic Web, data is interpreted in ontologies and its ordered sequence is represented as an ontology stream. Our work exploits the semantics of such streams to tackle the problem of concept drift, i.e., unexpected changes in data distribution that cause most models to become less accurate as time passes. To this end we revisited (i) semantic inference in the context of supervised stream learning, and (ii) models with semantic embeddings. The experiments show accurate prediction with data from Dublin and Beijing.


An Improved Differential Evolution Algorithm for Data Stream Clustering

International Journal of Electrical and Computer Engineering (IJECE), doi:10.11591/ijece.v9i4.pp2659-2667, 2019, Vol 9 (4), pp. 2659

Author(s): Bhaskar Adepu, Jayadev Gyani, G. Narsimha

Keyword(s): Differential Evolution, Data Stream, Concept Drift, Differential Evolution Algorithm, Optimization Approach, Stream Clustering, Data Stream Clustering, Evolution Algorithm, Improved Differential Evolution Algorithm, Measure Estimate

Several algorithms have been implemented by researchers for clustering data streams. Most of these algorithms require the number of clusters (K) to be fixed by the user based on the input data and kept fixed throughout the clustering process, and stream clustering has faced difficulties in choosing K. In this paper, we propose an efficient approach for data stream clustering that adopts an Improved Differential Evolution (IDE) algorithm. The IDE algorithm is a fast, powerful and efficient global optimization approach for automatic clustering. In our proposed approach, we additionally apply an entropy-based method for detecting concept drift in the data stream and thereby updating the clustering procedure online. We compare our proposed method with a Genetic Algorithm and find it to be the more proficient optimization algorithm. The proposed technique achieves an accuracy of 92.29%, a precision of 86.96%, a recall of 90.30% and an F-measure of 88.60%.
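The abstract does not specify how the entropy-based drift check works. As a rough illustration only, one common way to build such a check is to compare the Shannon entropy of the label (or cluster-assignment) distribution in a reference window against that of the current window. The function names and the threshold value below are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_drift(reference, current, threshold=0.3):
    """Flag drift when the entropy gap between two windows exceeds a
    threshold. The threshold is an assumed tuning parameter."""
    gap = abs(shannon_entropy(current) - shannon_entropy(reference))
    return gap > threshold
```

A check of this kind reacts only to changes in how evenly points spread over clusters. A drift that merely swaps cluster labels leaves the entropy unchanged, so a practical detector would likely need an additional signal.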


Mining top-k high-utility itemsets from a data stream under sliding window model

Applied Intelligence, doi:10.1007/s10489-017-0939-7, 2017, Vol 47 (4), pp. 1240-1255, Cited By ~ 12

Author(s): Siddharth Dawar, Veronica Sharma, Vikram Goyal

Keyword(s): Data Stream, Sliding Window, High Utility, High Utility Itemsets


Heuristic ensemble for unsupervised detection of multiple types of concept drift in data stream classification

Intelligent Decision Technologies, doi:10.3233/idt-210115, 2021, pp. 1-14

Author(s): Hanqing Hu, Mehmed Kantardzic

Keyword(s): Data Stream, Concept Drift, False Alarms, Detection Accuracy, Real World Data, Traditional Concept, Stream Classification, Data Stream Classification, Detection Algorithms, Concept Drift Detection

Real-world data stream classification often deals with multiple types of concept drift, categorized by change characteristics such as speed, distribution, and severity. When labels are unavailable, the traditional concept drift detection algorithms used in stream classification frameworks often focus on only one type of concept drift. To overcome this limitation, this study proposes a Heuristic Ensemble Framework for Drift Detection (HEFDD). HEFDD aims to detect all types of concept drift by employing an ensemble of selected concept drift detection algorithms, each capable of detecting at least one type of concept drift. Experimental results show that HEFDD provides a significant improvement, based on the z-score test, when its detection accuracy is compared with that of state-of-the-art individual algorithms. At the same time, HEFDD reduces the false alarms generated by individual concept drift detection algorithms.
