Scalable Mining of High-Utility Sequential Patterns With Three-Tier MapReduce Model

Jerry Chun-Wei Lin; Youcef Djenouri; Gautam Srivastava; Yuanfa Li; Philip S. Yu

doi:10.1145/3487046

ACM Transactions on Knowledge Discovery from Data, 2022, Vol 16 (3), pp. 1-26

Keyword(s): Large Scale; Pattern Mining; Sequential Pattern Mining; Main Memory; Frequent Itemset; Sequential Pattern; Sequential Patterns; Speed Up; MapReduce Model; High Utility

High-utility sequential pattern mining (HUSPM) has been an active research topic in recent decades because it combines sequential and utility properties, revealing more information and knowledge than traditional frequent itemset mining or sequential pattern mining. Several HUSPM algorithms have been presented, but most of them keep the data in main memory to speed up mining. This assumption is unrealistic in large-scale environments: in real industry, the collected data are so large that they cannot fit into the main memory of a single machine. In this article, we first develop a parallel and distributed three-stage MapReduce model for mining high-utility sequential patterns from large-scale databases. Two properties are then developed to guarantee the correctness and completeness of the patterns discovered by the framework. In addition, two data structures, called sidset and utility-linked list, are used in the framework to accelerate the computation of the required patterns. The results show that, compared with the serial HUSP-Span approach, the designed model performs well on large-scale datasets in terms of runtime, memory usage, efficiency with respect to the number of distributed nodes, and scalability.
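
As a rough illustration of how a first MapReduce stage might shrink the search space, the sketch below simulates one job in plain Python: mappers emit each item together with the utility of the sequence containing it, and reducers sum these values into the item's sequence-weighted utility (SWU). The data layout, the function names, and the choice of SWU as the first-stage measure are assumptions made for illustration; this is not the authors' implementation, and it does not model the sidset or utility-linked list structures.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout: a sequence is a list of itemsets, and each itemset maps
# an item to its utility (quantity x unit profit, precomputed for brevity).
Sequence = List[Dict[str, int]]


def sequence_utility(seq: Sequence) -> int:
    """SU: total utility of every item occurrence in the sequence."""
    return sum(u for itemset in seq for u in itemset.values())


def map_phase(seq: Sequence) -> Iterable[Tuple[str, int]]:
    """Emit (item, SU) once for each distinct item that appears in the sequence."""
    su = sequence_utility(seq)
    for item in {i for itemset in seq for i in itemset}:
        yield item, su


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the MapReduce runtime would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(item: str, values: List[int]) -> Tuple[str, int]:
    """SWU(item) = sum of SU over all sequences containing the item."""
    return item, sum(values)


def swu_job(partitions: List[List[Sequence]], min_util: int) -> Dict[str, int]:
    """Run map/shuffle/reduce over data partitions and keep promising items."""
    # Each partition stands in for the data block processed by one mapper node.
    pairs = (p for part in partitions for seq in part for p in map_phase(seq))
    swu = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
    # Items whose SWU is below min_util cannot appear in any high-utility pattern.
    return {item: v for item, v in swu.items() if v >= min_util}
```

On a real cluster, the shuffle step would be carried out by the MapReduce runtime, and the surviving items would be handed to the later stages that grow the patterns.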

Mining Time-Interval Sequential Patterns with High Utility from Transaction Databases

Journal of Advanced Computational Intelligence and Intelligent Informatics, 2016, Vol 20 (6), pp. 1018-1026. doi:10.20965/jaciii.2016.p1018. Cited by ~1

Author(s): Wen-Yen Wang; ...; Anna Y.-Q. Huang

Keyword(s): Pattern Mining; Sequential Pattern Mining; Business Practice; Sequential Pattern; Sequential Patterns; Time Interval; Business Managers; Time Intervals; High Utility; Product Sales

The purpose of time-interval sequential pattern mining is to help superstore business managers promote product sales. Such mining discovers the time intervals between purchased items: for example, if most customers purchase item A and then buy items B and C after r to s and t to u days, respectively, the intervals of r to s days and t to u days can be given to business managers to support informed marketing decisions. We treat these time intervals as patterns to be mined in order to predict the purchasing intervals between A and B and between B and C. However, little work considers the significance of product items when mining time-interval sequential patterns. This work extends previous work by retaining high-utility time-interval patterns during mining, which more closely reflects actual business practice. Experimental results show the differences among three mining approaches that jointly consider item utility and the time intervals between purchased items. Besides yielding more accurate patterns than the other two methods, the proposed UTMining_A method shortens execution time by delaying join processing and removing unnecessary records.
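
A minimal sketch of the idea, assuming that the candidate time intervals are predefined day ranges and that only the first purchase of A and the first later purchase of B count in each customer sequence, could compute the support and total utility of a pattern <A, interval, B> for each interval and keep the high-utility ones. The names and data layout are hypothetical; this is not the UTMining_A algorithm.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: each customer sequence is a list of (day, item, utility).
Purchase = Tuple[int, str, int]
Interval = Tuple[int, int]


def interval_of(gap: int, intervals: List[Interval]) -> Optional[Interval]:
    """Return the predefined interval [lo, hi] (in days) that contains gap, if any."""
    for lo, hi in intervals:
        if lo <= gap <= hi:
            return (lo, hi)
    return None


def high_utility_intervals(db: List[List[Purchase]], first: str, second: str,
                           intervals: List[Interval],
                           min_util: int) -> Dict[Interval, Tuple[int, int]]:
    """Map each interval to (support, utility) of <first, interval, second>,
    keeping only intervals whose total utility reaches min_util."""
    stats: Dict[Interval, Tuple[int, int]] = {}
    for seq in db:
        seq = sorted(seq)  # chronological order
        # Simplification: first purchase of `first`, then the first `second` after it.
        a = next(((d, u) for d, i, u in seq if i == first), None)
        if a is None:
            continue
        b = next(((d, u) for d, i, u in seq if i == second and d > a[0]), None)
        if b is None:
            continue
        bucket = interval_of(b[0] - a[0], intervals)
        if bucket is None:
            continue
        support, utility = stats.get(bucket, (0, 0))
        stats[bucket] = (support + 1, utility + a[1] + b[1])
    return {k: v for k, v in stats.items() if v[1] >= min_util}
```

Counting support alongside utility keeps both measures visible, so a manager could see whether a high-utility interval is driven by many customers or by a few large purchases.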

High Utility Item Interval Sequential Pattern Mining Algorithm

Journal of Computer Science and Cybernetics, 2020, Vol 36 (1), pp. 1-15. doi:10.15625/1813-9663/36/1/14398

Author(s): Tran Huy Duong; Nguyen Truong Thang; Vu Duc Thi; Tran The Anh

Keyword(s): Data Mining; Pattern Mining; Sequential Pattern Mining; Sequential Pattern; Sequential Patterns; Sequence Database; Mining Algorithm; Pattern Growth; High Utility; Growth Approach

High utility sequential pattern mining is a popular topic in data mining whose main purpose is to extract sequential patterns with high utility from a sequence database. Many recent works have proposed methods to solve this problem. However, most of them do not consider the item intervals of sequential patterns, which can lead to the extraction of patterns whose items lie too far apart to be meaningful. In this paper, we propose the High Utility Item Interval Sequential Pattern (HUISP) algorithm to solve this problem. The algorithm uses a pattern-growth approach together with several techniques that improve its performance.
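
The item-interval problem can be illustrated with a small check, assuming that each pattern element is a single item and that the item interval is the distance, in itemset positions, between consecutively matched items. The sketch returns the best utility of an occurrence whose item intervals all stay within a hypothetical `max_gap` limit; it is an exhaustive search for illustration, not the HUISP algorithm.

```python
from typing import Dict, List, Optional

Sequence = List[Dict[str, int]]  # hypothetical: itemsets mapping item -> utility


def max_utility_with_gap(seq: Sequence, pattern: List[str], max_gap: int) -> Optional[int]:
    """Highest utility of an occurrence of `pattern` in `seq` in which consecutive
    matched itemsets are at most `max_gap` positions apart; None if none exists."""
    best: Optional[int] = None

    def extend(k: int, prev_pos: int, acc: int) -> None:
        nonlocal best
        if k == len(pattern):
            best = acc if best is None else max(best, acc)
            return
        # The first element may match anywhere; later ones must respect max_gap.
        start = 0 if k == 0 else prev_pos + 1
        end = len(seq) if k == 0 else min(len(seq), prev_pos + max_gap + 1)
        for pos in range(start, end):
            u = seq[pos].get(pattern[k])
            if u is not None:
                extend(k + 1, pos, acc + u)

    extend(0, -1, 0)
    return best
```

With `seq = [{"a": 5}, {"c": 1}, {"b": 2}, {"c": 1}, {"c": 1}, {"b": 10}]`, the pattern `["a", "b"]` reaches utility 15 only through the distant `b`; with `max_gap=2`, the best admissible occurrence has utility 7, which shows how an interval constraint filters out patterns that would otherwise look most valuable.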

Dramatically Reducing Search for High Utility Sequential Patterns by Maintaining Candidate Lists

Information, 2020, Vol 11 (1), article 44. doi:10.3390/info11010044

Author(s): Scott Buffett

Keyword(s): Upper Bound; Pattern Mining; Computational Cost; Search Space; Sequential Pattern Mining; Sequential Pattern; Sequential Patterns; Frequent Patterns; Run Time; High Utility

A ubiquitous challenge throughout data mining, particularly in mining frequent patterns from large databases, is reducing the time and space required to perform the search; the greater this reduction, the easier it becomes to identify patterns of interest. High utility sequential pattern mining (HUSPM) seeks patterns that are (1) sequential in nature and (2) of significant utility in a sequence database, taking into account the value or importance of each item. While traditional sequential pattern mining relies on the downward closure property to greatly reduce the search space, this property does not hold for HUSPM. To address this drawback, an approach is proposed that establishes a tight upper bound on the utility of future candidate sequential patterns by maintaining a list of items that are potential candidates for concatenation. These candidates are provably the only items ever needed to extend a given sequential pattern or its descendants in the search tree. The list is then exploited to further tighten the upper bound on the utilities of descendant patterns. An extension of this work is then proposed that significantly reduces the computational cost of updating database utilities each time a candidate item is removed from the list, leading to a large reduction in the number of candidate sequential patterns that must be generated during the search. Sequential pattern mining methods implementing these techniques for bound reduction and further candidate list reduction are demonstrated through the CRUSP and CRUSPPivot algorithms, respectively. The techniques were validated on six public datasets.
Tests show that the CRUSP algorithm significantly reduces the overall number of candidate sequential patterns that must be considered, and consequently the run time, compared with the current state of the art in bounding techniques. With the CRUSPPivot algorithm, the further reduction in search-space size was dramatic, and the run-time reduction ranged from dramatic to moderate depending on the dataset. Demonstrating the practical significance of the work, the time required for one particularly complex dataset was reduced from many hours to less than one minute.
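
Why the missing downward closure property calls for upper bounds can be checked on a toy database. The sketch computes a pattern's utility and its sequence-weighted utility (SWU), an older and looser upper bound than the candidate-list bounds of CRUSP; the data and function names are made up, and each pattern element is assumed to be a single item.

```python
from typing import Dict, List, Optional

Sequence = List[Dict[str, int]]  # hypothetical: itemsets mapping item -> utility


def occurrence_utility(seq: Sequence, pattern: List[str]) -> Optional[int]:
    """Maximum utility over all occurrences of `pattern` (one item per element)
    in `seq`, or None if the pattern does not occur."""
    # best[k]: highest utility of a match of pattern[:k] in the itemsets scanned so far
    best: List[Optional[int]] = [0] + [None] * len(pattern)
    for itemset in seq:
        # Go backwards so one itemset extends a match by at most one element.
        for k in range(len(pattern), 0, -1):
            u = itemset.get(pattern[k - 1])
            if u is not None and best[k - 1] is not None:
                cand = best[k - 1] + u
                if best[k] is None or cand > best[k]:
                    best[k] = cand
    return best[-1]


def utility(db: List[Sequence], pattern: List[str]) -> int:
    """Utility of a pattern: sum of its best occurrence utility in each sequence."""
    values = (occurrence_utility(s, pattern) for s in db)
    return sum(u for u in values if u is not None)


def swu(db: List[Sequence], pattern: List[str]) -> int:
    """Sequence-weighted utility: total SU of the sequences that contain the pattern."""
    return sum(sum(u for itemset in s for u in itemset.values())
               for s in db if occurrence_utility(s, pattern) is not None)
```

For `db = [[{"a": 1}, {"b": 8}], [{"a": 2}, {"c": 4}], [{"b": 3}, {"a": 1}]]`, `utility(db, ["a"])` is 4 while `utility(db, ["a", "b"])` is 9, so a low-utility pattern can have a high-utility descendant. SWU never increases as a pattern grows and never falls below the pattern's utility, which makes it safe for pruning but loose (here `swu(db, ["a"])` is 19); tighter bounds of this kind are what the abstract attributes to CRUSP.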

An Efficient Algorithm for Extracting High-Utility Hierarchical Sequential Patterns

Wireless Communications and Mobile Computing, 2020, Vol 2020, pp. 1-12. doi:10.1155/2020/8816228

Author(s): Chunkai Zhang; Zilin Du; Yiwen Zu

Keyword(s): Pattern Mining; Search Space; Sequential Pattern Mining; Sequential Pattern; Sequential Patterns; Second Phase; Two Phase; High Utility; Synthetic Datasets; Hierarchical Relation

High-utility sequential pattern mining (HUSPM) is an emerging topic in data mining, where utility is used to measure the importance or weight of a sequence. However, HUSPM ignores the informative knowledge carried by hierarchical relations between items, which prevents it from extracting more interesting patterns. In this paper, we incorporate the hierarchical relation of items into HUSPM and propose MHUH, a two-phase algorithm and the first algorithm for high-utility hierarchical sequential pattern mining (HUHSPM). In the first phase, named Extension, we use our previously proposed algorithm FHUSpan to efficiently mine general high-utility sequences (g-sequences); in the second phase, named Replacement, we mine special high-utility sequences with hierarchical relations (s-sequences) from the g-sequences as high-utility hierarchical sequential patterns. To further improve efficiency, MHUH adopts several strategies, namely Reduction, FGS, and PBS, and a novel upper bound, TSWU, which together greatly reduce the search space. Extensive experiments on both real and synthetic datasets assess the performance of MHUH in terms of runtime, number of patterns, and scalability. The experiments show that MHUH efficiently extracts more interesting patterns with underlying informative knowledge in HUHSPM.
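
The Replacement phase can be pictured as specializing the items of a g-sequence with their descendants in an item taxonomy and keeping the specializations whose utility meets a threshold. The sketch below assumes that a general item matches any specific item beneath it and that the matched specific item contributes its own utility; the taxonomy format, the names, and these semantics are illustrative assumptions rather than MHUH's definitions, and none of its strategies (Reduction, FGS, PBS, TSWU) are modeled.

```python
from itertools import product
from typing import Dict, List, Optional, Set, Tuple

Sequence = List[Dict[str, int]]  # itemsets of specific (leaf) items -> utility
Taxonomy = Dict[str, List[str]]  # hypothetical: general item -> direct children


def leaves(item: str, tax: Taxonomy) -> Set[str]:
    """Specific items covered by `item` (the item itself if it has no children)."""
    children = tax.get(item, [])
    if not children:
        return {item}
    return set().union(*(leaves(c, tax) for c in children))


def descendants(item: str, tax: Taxonomy) -> List[str]:
    """All items below `item` in the taxonomy, in depth-first order."""
    out: List[str] = []
    for c in tax.get(item, []):
        out.append(c)
        out.extend(descendants(c, tax))
    return out


def pattern_utility(db: List[Sequence], pattern: List[str], tax: Taxonomy) -> int:
    """Sum over sequences of the best occurrence utility; an element matches any
    specific item it covers (the highest-utility one if several share an itemset)."""
    covers = [leaves(p, tax) for p in pattern]
    total = 0
    for seq in db:
        best: List[Optional[int]] = [0] + [None] * len(pattern)
        for itemset in seq:
            for k in range(len(pattern), 0, -1):  # backwards: one element per itemset
                gains = [u for i, u in itemset.items() if i in covers[k - 1]]
                if gains and best[k - 1] is not None:
                    cand = best[k - 1] + max(gains)
                    if best[k] is None or cand > best[k]:
                        best[k] = cand
        if best[-1] is not None:
            total += best[-1]
    return total


def replacement(db: List[Sequence], g_seq: List[str], tax: Taxonomy,
                min_util: int) -> Dict[Tuple[str, ...], int]:
    """Try every specialization of g_seq and keep those with utility >= min_util."""
    options = [[item] + descendants(item, tax) for item in g_seq]
    found: Dict[Tuple[str, ...], int] = {}
    for cand in product(*options):
        if list(cand) == list(g_seq):
            continue  # skip the g-sequence itself
        u = pattern_utility(db, list(cand), tax)
        if u >= min_util:
            found[cand] = u
    return found
```

For example, with `tax = {"drink": ["coffee", "tea"]}` and `db = [[{"bread": 2}, {"coffee": 6}], [{"bread": 1}, {"tea": 2}], [{"bread": 3}, {"coffee": 5}]]`, the g-sequence `["bread", "drink"]` has utility 19, and `replacement(db, ["bread", "drink"], tax, 10)` keeps only `("bread", "coffee")` with utility 16. In this sketch a specialization never exceeds the utility of its g-sequence, so a g-sequence below the threshold need not be specialized at all.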

Highly Scalable Sequential Pattern Mining Based on MapReduce Model on the Cloud

2013 IEEE International Congress on Big Data, 2013. doi:10.1109/bigdata.congress.2013.48. Cited by ~18

Author(s): Chun-Chieh Chen; Chi-Yao Tseng; Ming-Syan Chen

Keyword(s): Pattern Mining; Sequential Pattern Mining; Sequential Pattern; MapReduce Model

An Efficient Algorithm for High Utility Sequential Pattern Mining

Lecture Notes in Electrical Engineering - Frontier and Innovation in Future Computing and Communications, 2014, pp. 49-56. doi:10.1007/978-94-017-8798-7_7. Cited by ~4

Author(s): Jun-Zhe Wang; Zong-Hua Yang; Jiun-Long Huang

Keyword(s): Efficient Algorithm; Pattern Mining; Sequential Pattern Mining; Sequential Pattern; High Utility

Efficient Chain Structure for High-Utility Sequential Pattern Mining

IEEE Access, 2020, Vol 8, pp. 40714-40722. doi:10.1109/access.2020.2976662. Cited by ~3

Author(s): Jerry Chun-Wei Lin; Yuanfa Li; Philippe Fournier-Viger; Youcef Djenouri; Ji Zhang

Keyword(s): Pattern Mining; Sequential Pattern Mining; Chain Structure; Sequential Pattern; High Utility

Sequential Pattern Mining Algorithm Based on Text Data: Taking the Fault Text Records as an Example

Sustainability, 2018, Vol 10 (11), article 4330. doi:10.3390/su10114330. Cited by ~2

Author(s): Xinglong Yuan; Wenbing Chang; Shenghan Zhou; Yang Cheng

Keyword(s): Time Series; Pattern Mining; Sequential Pattern Mining; Sequential Pattern; Fault Classification; Sequential Patterns; Series Data; Similarity Measurement; Text Similarity; Text Data

Sequential pattern mining (SPM) is an effective and important method for analyzing time series. This paper proposes an SPM algorithm for mining fault sequential patterns from text data. Because text data are poorly structured and the same concept can be expressed in many different ways, traditional SPM algorithms cannot be applied to text data directly; the proposed algorithm is designed to solve this problem. First, the study measures the similarity of fault text records and groups similar faults into one class. For this purpose, the paper proposes a new text similarity measurement model based on word embedding distance, which achieves good results in short-text classification compared with the classic text similarity measurement method. Then, on the basis of the fault classification, the paper proposes an SPM algorithm with an event window, a soft time constraint used to obtain the number of sequential patterns required. Finally, the study uses the fault text records of an aircraft as experimental data for mining fault sequential patterns. Experiments show that the algorithm can effectively mine sequential patterns from text data, and it can be widely applied to textual time series in fields such as industry, business, and finance.
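
Once fault texts have been mapped to fault classes, the event-window constraint can be sketched as counting ordered pairs of classes in which the second event occurs within `window` time units after the first. The restriction to two-event patterns, the counting rule (each start event counts a given follower class at most once), and the names are assumptions for illustration rather than the paper's algorithm; the text-similarity step is not modeled.

```python
from collections import Counter
from typing import Dict, List, Tuple

Event = Tuple[float, str]  # hypothetical: (timestamp, fault class)


def windowed_pairs(events: List[Event], window: float,
                   min_count: int) -> Dict[Tuple[str, str], int]:
    """Count ordered class pairs (a, b) where b occurs within `window` time units
    after a, and keep the pairs seen at least `min_count` times."""
    events = sorted(events)  # chronological order (ties broken by class name)
    counts: Counter = Counter()
    for i, (t0, a) in enumerate(events):
        followers = set()
        for t1, b in events[i + 1:]:
            if t1 - t0 > window:
                break  # later events are even further away
            if t1 > t0:  # ignore events with the same timestamp
                followers.add(b)
        for b in followers:
            counts[(a, b)] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}
```

For events `[(0, "E1"), (2, "E2"), (10, "E1"), (11, "E2"), (30, "E3"), (31, "E1")]`, a window of 3 and `min_count=2` keep only `("E1", "E2")`, seen twice; widening or narrowing the window is one way to tune how many patterns are returned.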

Detecting Implicit Security Exceptions Using an Improved Variable-Length Sequential Pattern Mining Method

International Journal of Software Engineering and Knowledge Engineering, 2017, Vol 27 (08), pp. 1235-1268. doi:10.1142/s0218194017500462

Author(s): Jinfu Chen; Saihua Cai; Dave Towey; Lili Zhu; Rubing Huang; ...

Keyword(s): Visual Inspection; Pattern Mining; Sequential Pattern Mining; Variable Length; Sequential Pattern; Sequential Patterns; Mining Method; Security Testing; String Searching; Correct Execution

Component security testing can produce massive amounts of monitor logs. Current approaches to detecting implicit security exceptions (those that cannot be identified by visual inspection alone) compare correct execution sequences with fixed patterns mined from execution sequences in the monitor logs. However, this is inefficient and unsuitable for mining large monitor logs. To enable effective detection of implicit security exceptions in large monitor logs, this paper proposes a method based on improved variable-length sequential pattern mining. The method first mines variable-length sequential patterns from correct execution sequences and from actual execution sequences, which reduces the number of patterns. The sequential patterns are then detected using the Sunday string-searching algorithm. An experimental study of this method shows that it can efficiently detect the implicit security exceptions of components.
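
The Sunday string-searching algorithm named in the abstract works on any sequence of symbols, so it can scan a monitor log of execution events directly. The sketch below is a generic Python version: after each comparison it inspects the symbol just past the current window and shifts the window so that the rightmost occurrence of that symbol in the pattern lines up with it, or jumps past it entirely if the symbol does not occur in the pattern. Treating log entries as hashable event names is an assumption; this is not the paper's code.

```python
from typing import Dict, Hashable, List, Sequence


def sunday_search(text: Sequence[Hashable], pattern: Sequence[Hashable]) -> List[int]:
    """Return every start index at which `pattern` occurs in `text` (Sunday algorithm)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Shift for each symbol: m minus the index of its rightmost occurrence in the pattern.
    shift: Dict[Hashable, int] = {sym: m - idx for idx, sym in enumerate(pattern)}
    hits: List[int] = []
    i = 0
    while i <= n - m:
        if all(text[i + j] == pattern[j] for j in range(m)):
            hits.append(i)
        if i + m >= n:
            break  # no symbol after the window to inspect
        # Symbols absent from the pattern let the window jump past them entirely.
        i += shift.get(text[i + m], m + 1)
    return hits
```

For example, `sunday_search(["open", "read", "write", "close", "open", "read", "close"], ["open", "read"])` returns `[0, 4]`; the same function also works on plain strings.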
