Autonomous Sensor Data Cleaning in Stream Mining Setting

2018 ◽  
Vol 9 (2) ◽  
pp. 69-79 ◽  
Author(s):  
Klemen Kenda ◽  
Dunja Mladenić

Abstract Background: The Internet of Things (IoT), earth observation, and big scientific experiments are sources of extensive amounts of sensor data today. We are faced with large amounts of data obtained at low measurement cost. A standard approach in such cases is stream mining, implying that we look at a particular measurement only once during real-time processing. This requires the methods to be completely autonomous. In the past, very little attention was given to the most time-consuming part of the data mining process, i.e. data pre-processing. Objectives: In this paper we propose a data-cleaning algorithm that can be applied to real-world streaming big data. Methods/Approach: We use a short-term prediction method based on the Kalman filter to derive admissible intervals for future measurements. The model can adapt to concept drift and is useful for detecting random additive outliers in a sensor data stream. Results: For datasets with low noise, our method performs better than the method commonly used in batch processing scenarios; on noisier datasets our results are comparable. Conclusions: We demonstrate a successful application of the proposed method in real-world scenarios including groundwater level, server load, and smart-grid data.
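The Kalman-filter approach described in the abstract can be illustrated with a minimal one-dimensional sketch: predict the next value, accept measurements that fall inside an admissible interval around the prediction, and flag the rest as additive outliers. The class name, noise parameters, and the ±k·σ interval rule below are illustrative assumptions, not the authors' implementation.

```python
import math

class KalmanOutlierFilter:
    """Minimal 1-D Kalman filter that flags measurements falling
    outside an admissible prediction interval (illustrative sketch,
    not the paper's exact method)."""

    def __init__(self, process_var=1e-3, measurement_var=1.0, k=3.0):
        self.q = process_var      # process noise variance (assumed)
        self.r = measurement_var  # measurement noise variance (assumed)
        self.k = k                # interval half-width in std. deviations
        self.x = None             # current state estimate
        self.p = 1.0              # estimate variance

    def update(self, z):
        """Return (is_outlier, estimate) for a new measurement z."""
        if self.x is None:        # initialise on the first sample
            self.x = z
            return False, self.x
        # Predict step: state unchanged, uncertainty grows.
        p_pred = self.p + self.q
        # Admissible interval around the predicted measurement.
        bound = self.k * math.sqrt(p_pred + self.r)
        is_outlier = abs(z - self.x) > bound
        if not is_outlier:
            # Update step with the Kalman gain.
            gain = p_pred / (p_pred + self.r)
            self.x = self.x + gain * (z - self.x)
            self.p = (1.0 - gain) * p_pred
        else:
            # Skip the update: keep the prediction, let uncertainty grow.
            self.p = p_pred
        return is_outlier, self.x
```

Because a flagged measurement never updates the state, a single spike cannot corrupt the model, while the growing variance lets the filter re-adapt if the level genuinely shifts (concept drift).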

Author(s):  
Sylva Girtelschmid ◽  
Matthias Steinbauer ◽  
Vikash Kumar ◽  
Anna Fensel ◽  
Gabriele Kotsis

Purpose – The purpose of this article is to propose and evaluate a novel system architecture for Smart City applications that combines ontology reasoning with a distributed stream-processing framework on the cloud. In the Smart City domain, methodologies of semantic modelling and automated inference are often applied; however, semantic models frequently face performance problems when applied at large scale. Design/methodology/approach – The problem domain is addressed by combining methods from Big Data processing with semantic models. The architecture is designed so that traditional semantic models and rule engines can still be used for the Smart City model, while sensor data arriving from such Smart Cities are pre-processed by a Big Data streaming platform to lower the workload passed to the rule engine. Findings – By creating a real-world implementation of the proposed architecture and running simulations of Smart Cities of different sizes on top of it, the authors found that combining Big Data streaming platforms with semantic reasoning is a valid approach to the problem. Research limitations/implications – In this article, real-world sensor data from only two buildings were extrapolated for the simulations. Real-world scenarios will have a more complex set of sensor input values, which needs to be addressed in future work. Originality/value – The simulations show that merely using a streaming platform as a buffer for sensor input values already increases sensor data throughput, and that by applying intelligent filtering in the streaming platform, the actual number of rule executions can be kept to a minimum.
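The pre-filtering idea, buffering sensor readings in a streaming platform and forwarding only significant changes to the expensive rule engine, can be sketched roughly as follows. The `ChangeFilter` class, the `min_delta` threshold, and the callable rule engine are hypothetical names for illustration, not the authors' system.

```python
class ChangeFilter:
    """Toy sketch of intelligent filtering in front of a semantic rule
    engine: forward a sensor reading only when it differs enough from
    the last forwarded value for that sensor. Names and thresholds are
    illustrative assumptions, not from the article."""

    def __init__(self, rule_engine, min_delta=0.5):
        self.rule_engine = rule_engine  # callable invoked per forwarded event
        self.min_delta = min_delta      # minimum change worth a rule execution
        self.last = {}                  # sensor_id -> last forwarded value

    def on_reading(self, sensor_id, value):
        prev = self.last.get(sensor_id)
        if prev is None or abs(value - prev) >= self.min_delta:
            self.last[sensor_id] = value
            self.rule_engine(sensor_id, value)
            return True   # rule engine invoked
        return False      # filtered out, no rule execution

# Counting invocations shows how filtering caps rule executions:
calls = []
flt = ChangeFilter(lambda s, v: calls.append((s, v)), min_delta=0.5)
for v in [20.0, 20.1, 20.2, 21.0, 21.1]:
    flt.on_reading("temp-1", v)
```

Here five readings trigger only two rule executions, which mirrors the article's finding that filtering in the streaming layer limits the number of rule executions.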


2019 ◽  
Vol 8 (S3) ◽  
pp. 45-49
Author(s):  
V. Bhagyasree ◽  
K. Rohitha ◽  
K. Kusuma ◽  
S. Kokila

The Internet of Things anticipates the connection of physical devices to the Internet and access to their wireless sensor data, which makes it possible to monitor and control the physical world. Big Data convergence has many aspects and opens new opportunities for business ventures to enter new markets or enhance their operations in current ones. Given existing techniques and technologies, it is probably safe to say that the best solution is to use big data tools to provide analytical solutions for the Internet of Things. Based on current technology deployment and adoption trends, it is envisioned that the Internet of Things is the technology of the future; today's real-world devices can already provide valuable analytics, and people use many IoT devices in everyday life. In spite of all the advertisements that companies offer in connection with the Internet of Things, you, as a responsible consumer, have the right to be skeptical about IoT advertisements. This paper examines the reality of the Internet of Things and its prospects for the future.


Entity resolution refers to the process of identifying records that describe the same real-world object across multiple data sets. It is an important step in data cleaning and data integration applications. When data is large, the task of entity resolution becomes complex and time-consuming. An end-to-end entity resolution pipeline involves stages such as blocking (efficiently identifying candidate duplicates), detailed comparison (refining the blocking output), and clustering (identifying the sets of records that may refer to the same entity). In this paper, an approach for feedback-based optimization of the complete entity resolution pipeline is proposed, in which supervised meta-blocking is used for the blocking stage. The proposed technique optimizes each phase of entity resolution and exploits the benefits of supervised meta-blocking to improve entity resolution performance on big data.
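The blocking → detailed comparison → clustering pipeline described above can be sketched minimally in Python. The helper names (`block_by_key`, `resolve`), the single-key blocking, and the union-find clustering are illustrative choices; the paper's supervised meta-blocking and feedback loop are not reproduced here.

```python
from collections import defaultdict
from itertools import combinations

def block_by_key(records, key):
    """Blocking: group records sharing a cheap key so that only
    within-block pairs need a detailed comparison (illustrative
    stand-in for the paper's supervised meta-blocking)."""
    blocks = defaultdict(list)
    for rid, rec in records.items():
        blocks[key(rec)].append(rid)
    return blocks

def resolve(records, key, similar):
    """End-to-end sketch: blocking -> detailed pairwise comparison ->
    clustering as connected components via union-find."""
    parent = {rid: rid for rid in records}

    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for ids in block_by_key(records, key).values():
        for a, b in combinations(ids, 2):       # detailed comparison
            if similar(records[a], records[b]):
                parent[find(a)] = find(b)       # merge entity clusters

    clusters = defaultdict(set)
    for rid in records:
        clusters[find(rid)].add(rid)
    return list(clusters.values())
```

For example, blocking three name records on their first letter and comparing with a crude string similarity groups "Jon Smith" and "John Smith" into one entity cluster while never comparing them against "Ana Diaz", which is the efficiency win blocking provides.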


Author(s):  
Anuradha Rajkumar ◽  
Bruce Wallace ◽  
Laura Ault ◽  
Julien Lariviere-Chartier ◽  
Frank Knoefel ◽  
...  

2021 ◽  
pp. 100489
Author(s):  
Paul La Plante ◽  
P.K.G. Williams ◽  
M. Kolopanis ◽  
J.S. Dillon ◽  
A.P. Beardsley ◽  
...  

Entropy ◽  
2021 ◽  
Vol 23 (3) ◽  
pp. 380
Author(s):  
Emanuele Cavenaghi ◽  
Gabriele Sottocornola ◽  
Fabio Stella ◽  
Markus Zanker

The Multi-Armed Bandit (MAB) problem has been extensively studied in order to address real-world challenges related to sequential decision making. In this setting, an agent selects the best action to be performed at time-step t, based on the past rewards received from the environment. This formulation implicitly assumes that the expected payoff of each action remains stationary over time. Nevertheless, in many real-world applications this assumption does not hold, and the agent has to face a non-stationary environment, that is, one with a changing reward distribution. Thus, we present a new MAB algorithm, named f-Discounted-Sliding-Window Thompson Sampling (f-dsw TS), for non-stationary environments, that is, when the data stream is affected by concept drift. The f-dsw TS algorithm is based on Thompson Sampling (TS) and exploits a discount factor on the reward history and an arm-related sliding window to counteract concept drift in non-stationary environments. We investigate how to combine these two sources of information, namely the discount factor and the sliding window, by means of an aggregation function f(.). In particular, we propose a pessimistic (f=min), an optimistic (f=max), as well as an averaged (f=mean) version of the f-dsw TS algorithm. A rich set of numerical experiments is performed to evaluate f-dsw TS against both stationary and non-stationary state-of-the-art TS baselines. We exploit synthetic environments (both randomly generated and controlled) to test the MAB algorithms under different types of drift, that is, sudden/abrupt, incremental, gradual, and increasing/decreasing drift. Furthermore, we adapt four real-world active learning tasks to our framework: a prediction task on crimes in the city of Baltimore, a classification task on insect species, a recommendation task on local web news, and a time-series analysis of microbial organisms in the tropical air ecosystem.
The f-dsw TS approach emerges as the best-performing MAB algorithm. At least one version of f-dsw TS performs better than the baselines in synthetic environments, proving the robustness of f-dsw TS under different concept drift types. Moreover, the pessimistic version (f=min) proves the most effective in all real-world tasks.
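A simplified Bernoulli-reward sketch of the f-dsw TS idea follows: one Beta posterior is built from the discounted full history, a second from a per-arm sliding window, and the two sampled scores are aggregated with f (min, max, or mean). Class and parameter names, defaults, and the exact discounting scheme are illustrative assumptions, not the authors' precise formulation.

```python
import random

class FDswTS:
    """Illustrative Bernoulli sketch of f-dsw Thompson Sampling:
    aggregate a discounted-history posterior sample with a
    sliding-window posterior sample via f."""

    def __init__(self, n_arms, gamma=0.99, window=50, f=min):
        self.gamma = gamma            # discount factor on reward history
        self.window = window          # sliding-window length per arm
        self.f = f                    # aggregation function (min/max/mean)
        self.alpha = [1.0] * n_arms   # discounted successes + Beta(1,1) prior
        self.beta = [1.0] * n_arms    # discounted failures + prior
        self.recent = [[] for _ in range(n_arms)]  # per-arm reward window

    def select(self):
        """Sample both posteriors per arm, aggregate, play the argmax."""
        scores = []
        for a in range(len(self.alpha)):
            s_hist = random.betavariate(self.alpha[a], self.beta[a])
            wins = sum(self.recent[a])
            n = len(self.recent[a])
            s_win = random.betavariate(1.0 + wins, 1.0 + n - wins)
            scores.append(self.f(s_hist, s_win))
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        # Decay all arms' pseudo-counts toward the prior, then
        # credit the played arm with the observed 0/1 reward.
        for a in range(len(self.alpha)):
            self.alpha[a] = 1.0 + self.gamma * (self.alpha[a] - 1.0)
            self.beta[a] = 1.0 + self.gamma * (self.beta[a] - 1.0)
        self.alpha[arm] += reward
        self.beta[arm] += 1.0 - reward
        self.recent[arm].append(reward)
        if len(self.recent[arm]) > self.window:
            self.recent[arm].pop(0)
```

The discount keeps old evidence from dominating after a drift, while the window reacts quickly to recent rewards; choosing f = min corresponds to the pessimistic variant that the abstract reports as the strongest on the real-world tasks.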

