Clustering of Time Series Data

Author(s):  
Anne Denton

Time series data is of interest to most science and engineering disciplines, and analysis techniques have been developed for hundreds of years. In recent years, however, there have been new developments in data mining techniques, such as frequent pattern mining, which take a different perspective on data. Traditional techniques were not meant for such pattern-oriented approaches. As a result, there is a significant need for research that extends traditional time-series analysis, in particular clustering, to the requirements of the new data mining algorithms.


2022 ◽  
pp. 116435
Author(s):  
Meserret Karaca ◽  
Michelle M. Alvarado ◽  
Mostafa Reisi Gahrooei ◽  
Azra Bihorac ◽  
Panos M. Pardalos

2017 ◽  
Author(s):  
Michael Phinney

Frequent pattern mining is a classic data mining technique, generally applicable to a wide range of application domains, and a mature area of research. The fundamental challenge arises from the combinatorial nature of frequent itemsets, scaling exponentially with respect to the number of unique items. Apriori-based and FPTree-based algorithms have dominated the space thus far. Initial phases of this research relied on the Apriori algorithm and utilized a distributed computing environment; we proposed the Cartesian Scheduler to manage Apriori's candidate generation process. To address the limitation of bottom-up frequent pattern mining algorithms such as Apriori and FPGrowth, we propose the Frequent Hierarchical Pattern Tree (FHPTree): a tree structure and new frequent pattern mining paradigm. The classic problem is redefined as frequent hierarchical pattern mining where the goal is to detect frequent maximal pattern covers. Under the proposed paradigm, compressed representations of maximal patterns are mined using a top-down FHPTree traversal, FHPGrowth, which detects large patterns before their subsets, thus yielding significant reductions in computation time. The FHPTree memory footprint is small; the number of nodes in the structure scales linearly with respect to the number of unique items. Additionally, the FHPTree serves as a persistent, dynamic data structure to index frequent patterns and enable efficient searches. When the search space is exponential, efficient targeted mining capabilities are paramount; this is one of the key contributions of the FHPTree. This dissertation will demonstrate the performance of FHPGrowth, achieving a 300x speedup over state-of-the-art maximal pattern mining algorithms and approximately a 2400x speedup when utilizing FHPGrowth in a distributed computing environment. In addition, we allude to future research opportunities, and suggest various modifications to further optimize the FHPTree and FHPGrowth. Moreover, the methods we offer will have an impact on other data mining research areas including contrast set mining as well as spatial and temporal mining.
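
For readers unfamiliar with the bottom-up baseline this abstract contrasts itself with, the following is a minimal sketch of textbook Apriori, not the FHPTree or FHPGrowth method, whose details the abstract does not give. The function names `apriori` and `maximal`, and the use of an absolute support count as the threshold, are assumptions made for illustration. It shows why bottom-up mining visits every frequent subset before reaching a maximal pattern.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Textbook bottom-up Apriori (illustrative sketch).

    Returns {itemset: support_count} for every itemset that occurs in
    at least `min_support` transactions.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count support of the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


def maximal(itemsets):
    """Keep only itemsets that have no frequent proper superset."""
    return [s for s in itemsets if not any(s < other for other in itemsets)]
```

Because each level is built from the previous one, a maximal pattern of size k is found only after its smaller subsets have been enumerated, which is the cost a top-down traversal aims to avoid.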


2011 ◽  
Vol 403-408 ◽  
pp. 1022-1027 ◽  
Author(s):  
Gauravjeet Singh ◽  
Sandeep Bal ◽  
Poonamjeet Kaur ◽  
Kanwaljit Kaur

Frequent pattern mining has been a focused theme in data mining research. Many techniques have been proposed to improve the performance of frequent pattern mining algorithms. This paper presents a review of different frequent pattern mining techniques. For each technique, we provide a brief description. At the end, we compare the different frequent pattern mining techniques.


2008 ◽  
pp. 1280-1299
Author(s):  
Moonjung Cho ◽  
Jian Pei ◽  
Haixun Wang ◽  
Wei Wang

Frequent pattern mining is an important data-mining problem with broad applications. Although there are many in-depth studies on efficient frequent pattern mining algorithms and constraint pushing techniques, the effectiveness of frequent pattern mining remains a serious concern: it is non-trivial and often tricky to specify appropriate support thresholds and proper constraints. In this paper, we propose a novel theme of preference-based frequent pattern mining. A user can simply specify a preference instead of setting detailed parameters in constraints. We identify the problem of preference-based frequent pattern mining and formulate the preferences for mining. We develop an efficient framework to mine frequent patterns with preferences. Interestingly, many preferences can be pushed deep into the mining by properly employing the existing efficient frequent pattern mining techniques. We conduct an extensive performance study to examine our method. The results indicate that preference-based frequent pattern mining is effective and efficient. Furthermore, we extend our discussion from pattern-based frequent pattern mining to preference-based data mining in principle and draw a general framework.


2021 ◽  
Vol 11 (1) ◽  
pp. 36-53
Author(s):  
Ramesh Dhanaseelan F. ◽  
Jeyasutha M.

Breast cancer, a type of malignant tumor, affects women more than men. About one-third of women with breast cancer die of this disease. Hence, it is imperative to find a tool for the proper identification and early treatment of breast cancer. Unlike conventional data mining algorithms, fuzzy logic-based approaches help in the mining of association rules from quantitative transactions. In this study, a novel fuzzy methodology, IFFP (improved fuzzy frequent pattern mining), based on fuzzy association rule mining for biological knowledge extraction, is introduced to analyze the dataset in order to find the core factors that cause breast cancer. It is determined that the factor mitoses has a low range of values for both malignant and benign cases, and hence it does not contribute to the detection of breast cancer. On the other hand, a high range of bare nuclei indicates a greater chance of the presence of breast cancer. Experimental evaluations on real datasets show that the proposed method outperforms recently proposed state-of-the-art algorithms in terms of runtime and memory usage.


Author(s):  
TARUN DHAR DIWAN ◽  
PRADEEP CHOUKSEY ◽  
R. S. THAKUR ◽  
BHARAT LODHI

Research in data mining has attracted considerable attention because of the importance of its applications. This paper addresses some theoretical and practical aspects of exploiting data mining techniques for improving the efficiency of time series data using SPSS Clementine. It can help an organization or individual choose suitable software to meet their mining needs. We propose using the well-known data mining software SPSS Clementine to mine the factors that affect information from various vantage points and to analyse that information. The purpose of this paper is to review the selected data mining software for improving the efficiency of time series data. Data mining is the exploration and analysis of data in order to discover useful information from huge databases, so it can be used to analyse large audit data efficiently. SPSS Clementine has an object-oriented, extensible module interface, which allows users to add their own algorithms and utilities to Clementine’s visual programming environment. The overall objective of this research is to develop high-performance data mining algorithms and tools that provide the support required to analyse the massive data sets generated by various processes and used for predicting time series data with SPSS Clementine. The aim of this paper is to determine the feasibility and effectiveness of data mining techniques for time series data and to produce solutions for this purpose.


Author(s):  
Sudhir Tirumalasetty ◽  
A. Divya ◽  
D. Rahitya Lakshmi ◽  
Ch. Durga Bhavani ◽  
D. Anusha

Frequent pattern mining is an essential data-mining task, with a goal of discovering knowledge in the form of repeated patterns. Many efficient pattern-mining algorithms have been discovered in the last two decades, yet most do not scale to the type of data we are presented with today, the so-called “Big Data”. Scalable parallel algorithms hold the key to solving the problem in this context. This paper reviews recent advances in parallel frequent pattern mining, analysing them through the Big Data lens. Load balancing and work partitioning are the major challenges to be conquered. These challenges continually call for innovative methods, as Big Data evolves without limit. A bigger challenge than before is handling unstructured data when finding frequent patterns. To accomplish this, a Semi Structured Doc-Model and ranking of patterns are used.


Because time-series data sets are often very large, discovering knowledge from these massive data is a challenging issue. The similarity measure plays a primary role in time series data mining, which improves the accuracy of the data mining task. Time series data mining is used to mine useful knowledge from the profile of the data. Obviously, we have a potential to perform these works, but it leads to a vague crisis. This paper surveys time series techniques and related issues such as challenges, preprocessing methods, pattern mining, and rule discovery using data mining. Streaming data is one of the difficult tasks that must be managed over time. Thus, this paper provides basic knowledge about time series in the data mining research field.
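
The abstract does not say which similarity measures the survey covers. As one commonly used example, the sketch below implements dynamic time warping (DTW), chosen here as an assumption and not taken from the paper. The function name `dtw_distance` and the absolute-difference local cost are likewise illustrative choices.

```python
import math


def dtw_distance(x, y):
    """Dynamic time warping distance between two numeric sequences.

    Illustrative sketch: uses |a - b| as the local cost and no
    warping-window constraint, so it runs in O(len(x) * len(y)).
    """
    n, m = len(x), len(y)
    # cost[i][j] = best cumulative cost aligning x[:i] with y[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # x[i-1] also matches the previous y
                cost[i][j - 1],      # y[j-1] also matches the previous x
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]
```

Unlike a point-by-point Euclidean distance, this measure can compare sequences of different lengths and tolerates local stretching along the time axis, which is one reason the choice of similarity measure affects downstream mining accuracy.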


2018 ◽  
Vol 232 ◽  
pp. 02049
Author(s):  
Dalin Xu ◽  
Yingmei Wei

Sequential pattern mining is an important branch of time series data mining. Pattern mining with visual means can be used to extract knowledge from time series data more intuitively. Based on the research content, this paper analyzes sequential pattern mining methods from different aspects and their combination with visualization technology. We further discuss and summarize the advantages of different visualization methods in discovering potential patterns in time series data. Different systems and models each emphasize different information. By comparing the characteristics of the models, the development and evolution of visualization technology for discovering potential patterns in time series data can be summarized. Finally, this paper discusses the development trend of this technology and how it can play a greater role in the era of big data.

