Novel techniques to reduce search space in multiple minimum supports-based frequent pattern mining algorithms

Research on mining interesting patterns from transactions or scientific datasets has matured over the last two decades. Numerous algorithms now exist to mine patterns of varying complexity, such as sets, sequences, trees, and graphs. Collectively, they are referred to as Frequent Pattern Mining (FPM) algorithms. FPM is useful in most prominent knowledge discovery tasks, such as classification, clustering, and outlier detection. FPM algorithms can also be used in database tasks, such as indexing and hashing, when storing a large collection of patterns. However, the use of FPM in real-life knowledge discovery systems is considerably low compared to its potential. The prime reason is the lack of interpretability caused by the enormous size of the output set. For instance, a moderate-size graph dataset with merely a thousand graphs can produce millions of frequent graph patterns at a reasonable support value. This is expected, given the combinatorial search space of pattern mining. However, classification, clustering, and similar knowledge discovery tasks should not use that many patterns as their knowledge nuggets (features), as doing so would increase the time and memory complexity of the system. Moreover, it can degrade task quality because of the well-known “curse of dimensionality” effect. So, in recent years, researchers have felt the need to summarize the output set of FPM algorithms so that the summary set is small, non-redundant, and discriminative. There are different summarization techniques: lossless, profile-based, cluster-based, statistical, etc. In this article, we give an overview of the main concepts of these summarization techniques, with a comparative discussion of their strengths, weaknesses, applicability, and computation cost.
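As a rough illustration of the lossless family mentioned in the abstract above, the sketch below uses closed itemsets, a standard lossless summary: an itemset is closed when no proper superset has the same support, and the support of any frequent itemset can be recovered from the closed ones. The abstract does not say which lossless technique the article covers, so this choice is an assumption, as are the function names, the toy transactions, and the brute-force enumeration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration: map each frequent itemset to its support count."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            count = sum(1 for t in transactions if cset <= t)
            if count >= min_support:
                result[cset] = count
                found = True
        if not found:
            break  # anti-monotonicity: no larger itemset can be frequent
    return result


def closed_itemsets(frequent):
    """Keep only itemsets that have no proper superset with equal support."""
    return {s: c for s, c in frequent.items()
            if not any(s < t and c == frequent[t] for t in frequent)}


def support_from_closed(itemset, closed):
    """Recover a frequent itemset's support: the max over its closed supersets."""
    return max(c for s, c in closed.items() if itemset <= s)


# Hypothetical toy data, chosen only for illustration.
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a", "d"}]
frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)
print(len(frequent), "frequent itemsets summarized by", len(closed), "closed ones")
```

On this toy input the summary is smaller than the full output set, yet `support_from_closed` returns the original support of every frequent itemset, which is what makes the summary lossless.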

Clustering of Time Series Data

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch042 ◽

2011 ◽

pp. 258-263

Author(s):

Anne Denton

Keyword(s):

Data Mining ◽

Time Series ◽

Pattern Mining ◽

Time Series Data ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Series Data ◽

Science And Engineering ◽

Data Mining Algorithms ◽

Mining Algorithms

Time series data is of interest to most science and engineering disciplines, and analysis techniques have been developed for hundreds of years. In recent years, however, there have been new developments in data mining techniques, such as frequent pattern mining, that take a different perspective on data. Traditional techniques were not meant for such pattern-oriented approaches. As a result, there is a significant need for research that extends traditional time-series analysis, in particular clustering, to the requirements of the new data mining algorithms.

Data-Performance Characterization of Frequent Pattern Mining Algorithms

International Journal of Data Mining & Knowledge Management Process ◽

10.5121/ijdkp.2015.5105 ◽

2015 ◽

Vol 5 (1) ◽

pp. 51-70

Author(s):

Sayaka Akioka

Keyword(s):

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Performance Characterization ◽

Mining Algorithms

Frequent Pattern Mining Algorithms for Data Clustering

Frequent Pattern Mining ◽

10.1007/978-3-319-07821-2_16 ◽

2014 ◽

pp. 403-423 ◽

Cited By ~ 3

Author(s):

Arthur Zimek ◽

Ira Assent ◽

Jilles Vreeken

Keyword(s):

Data Clustering ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Mining Algorithms

Frequent Pattern Mining Algorithms for Finding Associated Frequent Patterns for Data Streams: A Survey

Procedia Computer Science ◽

10.1016/j.procs.2014.08.019 ◽

2014 ◽

Vol 37 ◽

pp. 109-116 ◽

Cited By ~ 21

Author(s):

Shamila Nasreen ◽

Muhammad Awais Azam ◽

Khurram Shehzad ◽

Usman Naeem ◽

Mustansar Ali Ghazanfar

Keyword(s):

Data Streams ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Frequent Patterns ◽

Mining Algorithms

Distributed frequent hierarchical pattern mining for robust and efficient large-scale association discovery

10.32469/10355/63867 ◽

2017 ◽

Author(s):

Michael Phinney

Keyword(s):

Data Mining ◽

Distributed Computing ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Generation Process ◽

Computing Environment ◽

Wide Range ◽

Mining Algorithms ◽

Hierarchical Pattern

Frequent pattern mining is a classic data mining technique, generally applicable to a wide range of application domains, and a mature area of research. The fundamental challenge arises from the combinatorial nature of frequent itemsets, whose number scales exponentially with respect to the number of unique items. Apriori-based and FPTree-based algorithms have dominated the space thus far. Initial phases of this research relied on the Apriori algorithm and utilized a distributed computing environment; we proposed the Cartesian Scheduler to manage Apriori's candidate generation process. To address the limitation of bottom-up frequent pattern mining algorithms such as Apriori and FPGrowth, we propose the Frequent Hierarchical Pattern Tree (FHPTree): a tree structure and new frequent pattern mining paradigm. The classic problem is redefined as frequent hierarchical pattern mining, where the goal is to detect frequent maximal pattern covers. Under the proposed paradigm, compressed representations of maximal patterns are mined using a top-down FHPTree traversal, FHPGrowth, which detects large patterns before their subsets, thus yielding significant reductions in computation time. The FHPTree memory footprint is small; the number of nodes in the structure scales linearly with respect to the number of unique items. Additionally, the FHPTree serves as a persistent, dynamic data structure to index frequent patterns and enable efficient searches. When the search space is exponential, efficient targeted mining capabilities are paramount; this is one of the key contributions of the FHPTree. This dissertation will demonstrate the performance of FHPGrowth, achieving a 300x speedup over state-of-the-art maximal pattern mining algorithms and approximately a 2400x speedup when utilizing FHPGrowth in a distributed computing environment. In addition, we allude to future research opportunities and suggest various modifications to further optimize the FHPTree and FHPGrowth. Moreover, the methods we offer will have an impact on other data mining research areas, including contrast set mining as well as spatial and temporal mining.
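To make the bottom-up limitation described above concrete, here is a minimal sketch of textbook level-wise Apriori followed by a filter that keeps only maximal itemsets. It is not the Cartesian Scheduler, the FHPTree, or FHPGrowth, none of which are specified in the abstract; the function names and toy transactions are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise (bottom-up) mining: returns {frozenset: support count}."""
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(level)
    k = 2
    while level:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
        level = {}
        for c in candidates:
            count = sum(1 for t in transactions if c <= t)
            if count >= min_support:
                level[c] = count
        frequent.update(level)
        k += 1
    return frequent


def maximal_itemsets(frequent):
    """Keep only frequent itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < t for t in frequent)}


# Hypothetical toy data, chosen only for illustration.
transactions = [{"a", "b", "c", "d"}, {"a", "b", "c", "d"},
                {"a", "b", "c"}, {"e", "f"}, {"e", "f"}]
frequent = apriori(transactions, min_support=2)
maximal = maximal_itemsets(frequent)
print(len(frequent), "frequent itemsets,", len(maximal), "maximal")
```

In this toy run the bottom-up search counts every subset of the four-item pattern before reaching the pattern itself, although only two maximal itemsets summarize the result. That gap is the kind of overhead a top-down approach that detects large patterns before their subsets aims to avoid.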

USING FREQUENT PATTERN MINING ALGORITHMS IN TEXT ANALYSIS

Information System in Management ◽

10.22630/isim.2017.6.3.19 ◽

2017 ◽

Vol 6 (3) ◽

pp. 213-222

Author(s):

Piotr Ożdżyński

Keyword(s):

Text Analysis ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Mining Algorithms

Performance Analysis of Frequent Pattern Mining with Multiple Minimum Supports

Journal of Internet Computing and services ◽

10.7472/jksii.2013.14.6.01 ◽

2013 ◽

Vol 14 (6) ◽

pp. 1-8 ◽

Cited By ~ 2

Author(s):

Heungmo Ryang ◽

Unil Yun

Keyword(s):

Performance Analysis ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Multiple Minimum Supports
