XML-Enabled Association Analysis

Author(s):  
Ling Feng

The discovery of association rules from large amounts of structured or semi-structured data is an important data mining problem [Agrawal et al. 1993, Agrawal and Srikant 1994, Miyahara et al. 2001, Termier et al. 2002, Braga et al. 2002, Cong et al. 2002, Braga et al. 2003, Xiao et al. 2003, Maruyama and Uehara 2000, Wang and Liu 2000]. It has crucial applications in decision support and marketing strategy. The most prototypical application of association rules is market basket analysis using transaction databases from supermarkets. These databases contain sales transaction records, each of which details items bought by a customer in the transaction. Mining association rules is the process of discovering knowledge such as “80% of customers who bought diapers also bought beer, and 35% of customers bought both diapers and beer”, which can be expressed as “diaper ⇒ beer” (35%, 80%), where 80% is the confidence level of the rule, and 35% is the support level of the rule indicating how frequently the customers bought both diapers and beer. In general, an association rule takes the form X ⇒ Y (s, c), where X and Y are sets of items, and s and c are support and confidence, respectively. In the XML era, mining association rules is confronted with more challenges than in the traditional well-structured world due to the inherent flexibilities of XML in both structure and semantics [Feng and Dillon 2005]. First, XML data has a more complex hierarchical structure than a database record. Second, elements in XML data have contextual positions, which thus carry the order notion. Third, XML data appears to be much bigger than traditional data. To address these challenges, the classic association rule mining framework originating with transactional databases needs to be re-examined.
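The support and confidence measures described above can be illustrated with a minimal sketch. The basket data and the function names `support` and `confidence` are hypothetical and chosen only for illustration; they do not come from the chapter.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(transactions: List[Transaction], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def confidence(transactions: List[Transaction],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """support(X and Y together) / support(X); 0.0 when X never occurs."""
    x = frozenset(antecedent)
    xy = x | frozenset(consequent)
    sx = support(transactions, x)
    return support(transactions, xy) / sx if sx > 0 else 0.0


# Hypothetical toy transaction database (not data from the chapter).
baskets: List[Transaction] = [
    frozenset({"diaper", "beer"}),
    frozenset({"diaper", "beer", "milk"}),
    frozenset({"diaper", "milk"}),
    frozenset({"bread"}),
    frozenset({"beer", "bread"}),
]

s = support(baskets, {"diaper", "beer"})       # 2 of 5 baskets
c = confidence(baskets, {"diaper"}, {"beer"})  # 2 of the 3 diaper baskets
print(f"diaper => beer (s={s:.2f}, c={c:.2f})")
```

On this toy data the rule "diaper ⇒ beer" has support 2/5 and confidence 2/3, because two of the five baskets contain both items and three baskets contain diapers.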

Author(s):  
Ling Zhou ◽  
Stephen Yau

Association rule mining among frequent items has been extensively studied in data mining research. In recent years, however, there has been an increasing demand for mining infrequent items (such as rare but expensive items). Since interesting relationships among infrequent items have received little attention in the literature, in this chapter the authors propose two simple, practical and effective schemes to mine association rules among rare items. Their algorithms can also be applied to frequent items with bounded length. Experiments are performed on the well-known IBM synthetic database. The authors’ schemes compare favorably to Apriori and FP-growth under the conditions evaluated. In addition, they explore quantitative association rule mining among infrequent items in transactional databases by associating quantities of items; some interesting examples are drawn to illustrate the significance of such mining.


2010 ◽  
Vol 108-111 ◽  
pp. 50-56 ◽  
Author(s):  
Liang Zhong Shen

Due to the popularity of knowledge discovery and data mining, in practice as well as among academic and corporate professionals, association rule mining is receiving increasing attention. Data mining technology is applied to analyze the data held in databases. This paper puts forward a new method that is suited to the design of distributed databases.


Author(s):  
Ling Feng ◽  
Tharam Dillon

The discovery of association rules from large amounts of structured or semi-structured data is an important data-mining problem (Agrawal et al., 1993; Agrawal & Srikant, 1994; Braga et al., 2002, 2003; Cong et al., 2002; Miyahara et al., 2001; Termier et al., 2002; Xiao et al., 2003). It has crucial applications in decision support and marketing strategy. The most prototypical application of association rules is market-basket analysis using transaction databases from supermarkets. These databases contain sales transaction records, each of which details items bought by a customer in the transaction. Mining association rules is the process of discovering knowledge such as “80% of customers who bought diapers also bought beer, and 35% of customers bought both diapers and beer”, which can be expressed as “diaper ⇒ beer” (35%, 80%), where 80% is the confidence level of the rule, and 35% is the support level of the rule indicating how frequently the customers bought both diapers and beer. In general, an association rule takes the form X ⇒ Y (s, c), where X and Y are sets of items, and s and c are support and confidence, respectively.
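The rule form X ⇒ Y (s, c) can be written out using the standard definitions of support and confidence. In the sketch below, the symbols D (the transaction database) and T (a single transaction) are notation introduced here for illustration and are not taken from the chapter.

```latex
\[
s = \mathrm{supp}(X \Rightarrow Y)
  = \frac{\bigl|\{\, T \in D : X \cup Y \subseteq T \,\}\bigr|}{|D|},
\qquad
c = \mathrm{conf}(X \Rightarrow Y)
  = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)} .
\]
% Worked check against the diaper/beer figures quoted in the abstract:
\[
\mathrm{supp}(\{\text{diaper}\}) = \frac{s}{c} = \frac{0.35}{0.80} = 0.4375 .
\]
```

Under these definitions, the quoted figures (35%, 80%) imply that 43.75% of all customers bought diapers.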


Author(s):  
Carson Kai-Sang Leung

The problem of association rule mining was introduced in 1993 (Agrawal et al., 1993). Since then, it has been the subject of numerous studies. Most of these studies focused on either performance issues or functionality issues. The former considered how to compute association rules efficiently, whereas the latter considered what kinds of rules to compute. Examples of the former include the Apriori-based mining framework (Agrawal & Srikant, 1994), its performance enhancements (Park et al., 1997; Leung et al., 2002), and the tree-based mining framework (Han et al., 2000); examples of the latter include extensions of the initial notion of association rules to other rules such as dependence rules (Silverstein et al., 1998) and ratio rules (Korn et al., 1998). In general, most of these studies basically considered the data mining exercise in isolation. They did not explore how data mining can interact with the human user, which is a key component in the broader picture of knowledge discovery in databases. Hence, they provided little or no support for user focus. Consequently, the user usually needs to wait for a long period of time to get numerous association rules, out of which only a small fraction may be interesting to the user. In other words, the user often incurs a high computational cost that is disproportionate to what he wants to get. This calls for constraint-based association rule mining.
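The idea of letting a user constraint focus the mining can be sketched by pushing an anti-monotone constraint into an Apriori-style level-wise search, so that violating candidates are discarded before their support is counted. This is a simplified illustration and not the method of any cited paper; the function `constrained_apriori`, the price table and the budget constraint are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List

Itemset = FrozenSet[str]


def constrained_apriori(transactions: List[Itemset],
                        min_support: float,
                        constraint: Callable[[Itemset], bool]) -> Dict[Itemset, float]:
    """Frequent itemsets that satisfy `constraint`.

    Pruning on the constraint is only sound when it is anti-monotone:
    if an itemset violates it, every superset must violate it as well.
    """
    n = len(transactions)
    if n == 0:
        return {}
    items = sorted({item for t in transactions for item in t})
    level: List[Itemset] = [frozenset({item}) for item in items]
    result: Dict[Itemset, float] = {}
    k = 1
    while level:
        survivors: Dict[Itemset, float] = {}
        for cand in level:
            if not constraint(cand):
                continue  # pruned by the user constraint before counting
            s = sum(1 for t in transactions if cand <= t) / n
            if s >= min_support:
                survivors[cand] = s
        result.update(survivors)
        # Join surviving k-itemsets into (k+1)-candidates and keep only
        # those whose every k-subset also survived (Apriori pruning).
        next_level = set()
        for a, b in combinations(list(survivors), 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in survivors for sub in combinations(union, k)
            ):
                next_level.add(union)
        level = sorted(next_level, key=sorted)
        k += 1
    return result


# Hypothetical prices and baskets, for illustration only.
PRICE = {"diaper": 12, "beer": 8, "milk": 3, "bread": 2}
baskets = [
    frozenset({"diaper", "beer"}),
    frozenset({"diaper", "beer", "milk"}),
    frozenset({"diaper", "milk"}),
    frozenset({"bread"}),
    frozenset({"beer", "bread"}),
]


def within_budget(itemset: Itemset) -> bool:
    """Anti-monotone constraint: total price of the itemset is at most 15."""
    return sum(PRICE[i] for i in itemset) <= 15


frequent = constrained_apriori(baskets, min_support=0.4, constraint=within_budget)
for itemset, s in sorted(frequent.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), s)
```

With these inputs, {diaper, beer} is frequent but costs 20, so it is pruned before counting and none of its supersets is ever generated. This is the kind of saving that motivates constraint-based mining.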


Author(s):  
Carson K.-S. Leung ◽  
Fan Jiang ◽  
Edson M. Dela Cruz ◽  
Vijay Sekar Elango

Collaborative filtering uses data mining and analysis to develop a system that helps users make appropriate decisions in real-life applications by removing redundant information and providing valuable information to users. Data mining aims to extract from data implicit, previously unknown and potentially useful information, such as association rules that reveal relationships between frequently co-occurring patterns in the antecedent and consequent parts of the rules. This chapter presents an algorithm called CF-Miner for collaborative filtering with an association rule miner. The CF-Miner algorithm first constructs bitwise data structures to capture important contents in the data. It then finds frequent patterns from the bitwise structures. Based on the mined frequent patterns, the algorithm forms association rules. Finally, the algorithm ranks the mined association rules to recommend appropriate merchandise products, goods or services to users. Evaluation results show the effectiveness of CF-Miner in using association rule mining in collaborative filtering.
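The four stages named in the abstract (bitwise structures, frequent patterns, rule formation, ranking) can be sketched as follows. The abstract does not specify CF-Miner's actual data structures, so this is only an assumed illustration: each item is represented by a Python integer used as a bit vector over transactions, only single-item rules are formed, and the names `build_bitmaps`, `count` and `rank_rules` as well as the sample histories are hypothetical.

```python
from itertools import permutations
from typing import Dict, Iterable, List, Set, Tuple

Rule = Tuple[str, str, float, float]  # (antecedent, consequent, support, confidence)


def build_bitmaps(transactions: List[Set[str]]) -> Dict[str, int]:
    """Map each item to an int whose bit i is set when transaction i contains it."""
    bitmaps: Dict[str, int] = {}
    for i, t in enumerate(transactions):
        for item in t:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << i)
    return bitmaps


def count(bitmaps: Dict[str, int], itemset: Iterable[str], n: int) -> int:
    """Number of transactions containing all items: AND the bitmaps, count set bits."""
    mask = (1 << n) - 1
    for item in itemset:
        mask &= bitmaps.get(item, 0)
    return bin(mask).count("1")


def rank_rules(transactions: List[Set[str]], min_count: int) -> List[Rule]:
    """Form single-item rules a => b from frequent pairs, ranked by confidence."""
    n = len(transactions)
    bitmaps = build_bitmaps(transactions)
    rules: List[Rule] = []
    for a, b in permutations(sorted(bitmaps), 2):
        both = count(bitmaps, (a, b), n)
        if both >= min_count:
            # Every item in `bitmaps` occurs at least once, so the divisor is nonzero.
            conf = both / count(bitmaps, (a,), n)
            rules.append((a, b, both / n, conf))
    # Highest confidence first, then highest support, then alphabetical order.
    rules.sort(key=lambda r: (-r[3], -r[2], r[0], r[1]))
    return rules


# Hypothetical purchase histories, for illustration only.
histories = [
    {"tent", "stove"},
    {"tent", "stove", "lamp"},
    {"tent", "lamp"},
    {"stove"},
]

for a, b, s, c in rank_rules(histories, min_count=2):
    print(f"{a} => {b} (s={s:.2f}, c={c:.2f})")
```

A recommender built on this sketch would look up the items a user already has among the rule antecedents and suggest the consequents of the top-ranked matching rules.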


2014 ◽  
Vol 23 (05) ◽  
pp. 1450004 ◽  
Author(s):  
Ibrahim S. Alwatban ◽  
Ahmed Z. Emam

In recent years, a new research area known as privacy preserving data mining (PPDM) has emerged and captured the attention of many researchers interested in preventing the privacy violations that may occur during data mining. In this paper, we provide a review of studies on PPDM in the context of association rules (PPARM). This paper systematically defines the scope of this survey and determines the PPARM models. The problems of each model are formally described, and we discuss the relevant approaches, techniques and algorithms that have been proposed in the literature. A profile of each model and the accompanying algorithms are provided with a comparison of the PPARM models.


2013 ◽  
Vol 765-767 ◽  
pp. 282-285
Author(s):  
Zhi Guo Dai ◽  
Yang Yang Han

This paper studies the application of association rule mining to traditional Chinese medicine (TCM) knowledge and experience. Association rules between disease symptoms and syndrome differentiation, between syndrome differentiation and prescription, and between disease symptoms and prescription are mined by analyzing the cases of patients with chronic gastritis. The mined association rules are then interpreted, providing a useful reference for the use of data mining technology in TCM.


2018 ◽  
Vol 36 (3) ◽  
pp. 443-457 ◽  
Author(s):  
Kaigang Yi ◽  
Tinggui Chen ◽  
Guodong Cong

Purpose: Database management systems are now widely applied in library management, and libraries have accumulated a great deal of data on readers’ access to resources. Much important information is concealed in such data. The purpose of this paper is to use a typical data mining (DM) technique, association rule mining, to discover readers’ borrowing rules from their borrowing records and to recommend further booklists to them in a personalized way, so as to increase the utilization rate of the library’s data resources.
Design/methodology/approach: An association rule mining algorithm is applied to readers’ borrowing records to discover borrowing rules and to recommend further booklists in a personalized way.
Findings: By analyzing readers’ book-borrowing records, a library manager can recommend books that may interest a reader based on that reader’s historical or current borrowing records.
Research limitations/implications: If many different categories of book-borrowing problems are involved, the encoding becomes long and the search space very large. Future research may therefore introduce a clustering method, and apply association rule mining to the procurement of book resources and the layout of books.
Practical implications: The paper provides helpful ideas for Big Data mining and software development, which will improve their efficiency and their insight into users’ behavior and psychology.
Social implications: The paper proposes a framework to help users understand others’ behavior, which will help them take part in groups and communities with greater contribution and satisfaction.
Originality/value: DM technology has been used to discover information concealed in the library’s Big Data; the library personalized recommendation problem has been analyzed and formulated in depth; and an improved association rule method combined with the artificial bee colony algorithm has been presented.


2013 ◽  
Vol 7 (3) ◽  
pp. 620-625 ◽  
Author(s):  
Varsha Mashoria ◽  
Dr. Anju Singh

Association rule mining is used to find rules that relate the items present in a database and that satisfy user-specified support and confidence thresholds. There are many algorithms for mining association rules. To improve the efficiency and effectiveness of the mining task, constraint-based mining enables users to concentrate on the association rules of interest instead of the complete set of association rules. A constraint can be defined as “the condition that a pattern has to satisfy”. This paper reviews the major advances in approaches to association rule mining using different constraints.


Author(s):  
Suma B. ◽  
Shobha G.

Association rule mining is a well-known data mining technique used for extracting hidden correlations between data items in large databases. In the majority of the situations, data mining results contain sensitive information about individuals and publishing such data will violate individual secrecy. The challenge of association rule mining is to preserve the confidentiality of sensitive rules when releasing the database to external parties. The association rule hiding technique conceals the knowledge extracted by the sensitive association rules by modifying the database. In this paper, we introduce a border-based algorithm for hiding sensitive association rules. The main purpose of this approach is to conceal the sensitive rule set while maintaining the utility of the database and association rule mining results at the highest level. The performance of the algorithm in terms of the side effects is demonstrated using experiments conducted on two real datasets. The results show that the information loss is minimized without sacrificing the accuracy.
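The general idea of hiding a rule by modifying the database can be illustrated with a deliberately naive heuristic: delete one consequent item from supporting transactions until the rule's confidence drops below a threshold. This is not the border-based algorithm of the paper, which is not detailed in the abstract; the function `hide_rule`, its parameters and the sample data are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Tuple


def hide_rule(transactions: List[FrozenSet[str]],
              antecedent: Iterable[str],
              consequent: Iterable[str],
              min_conf: float) -> Tuple[List[FrozenSet[str]], int]:
    """Return (sanitized copy, number of modified transactions).

    Naive heuristic, assumed for illustration: remove one consequent item
    from supporting transactions, one transaction at a time, until
    conf(X => Y) < min_conf. Requires min_conf > 0 and X, Y disjoint.
    """
    x, y = frozenset(antecedent), frozenset(consequent)
    if not y or x & y:
        raise ValueError("consequent must be non-empty and disjoint from antecedent")
    if min_conf <= 0:
        raise ValueError("min_conf must be positive")
    victim = sorted(y)[0]                  # the item that will be deleted
    db = [set(t) for t in transactions]    # work on a copy, not the input

    def conf() -> float:
        nx = sum(1 for t in db if x <= t)
        nxy = sum(1 for t in db if (x | y) <= t)
        return nxy / nx if nx else 0.0

    changed = 0
    for t in db:
        if conf() < min_conf:
            break                          # the rule is already hidden
        if (x | y) <= t:
            t.discard(victim)              # this transaction no longer supports the rule
            changed += 1
    return [frozenset(t) for t in db], changed


# Hypothetical data, for illustration only.
baskets = [
    frozenset({"diaper", "beer"}),
    frozenset({"diaper", "beer", "milk"}),
    frozenset({"diaper", "milk"}),
    frozenset({"bread"}),
    frozenset({"beer", "bread"}),
]

sanitized, changed = hide_rule(baskets, {"diaper"}, {"beer"}, min_conf=0.5)
print(changed, sanitized)
```

Each deletion is a side effect on the released data, which is why the paper's stated goal is to conceal the sensitive rules while keeping such information loss to a minimum. This sketch makes no attempt to choose deletions that protect non-sensitive rules.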

