Improved approaches to mine rare association rules in transactional databases

Author(s):  
R. Uday Kiran ◽  
P. Krishna Reddy
2017 ◽  
Vol 26 (1) ◽  
pp. 69-85
Author(s):  
Mohammed M. Fouad ◽  
Mostafa G.M. Mostafa ◽  
Abdulfattah S. Mashat ◽  
Tarek F. Gharib

AbstractAssociation rules provide important knowledge that can be extracted from transactional databases. Owing to the massive exchange of information nowadays, databases become dynamic and change rapidly and periodically: new transactions are added to the database and/or old transactions are updated or removed from the database. Incremental mining was introduced to overcome the problem of maintaining previously generated association rules in dynamic databases. In this paper, we propose an efficient algorithm (IMIDB) for incremental itemset mining in large databases. The algorithm utilizes the trie data structure for indexing dynamic database transactions. Performance comparison of the proposed algorithm to recently cited algorithms shows that a significant improvement of about two orders of magnitude is achieved by our algorithm. Also, the proposed algorithm exhibits linear scalability with respect to database size.


2021 ◽  
Vol 30 (04) ◽  
pp. 2150018
Author(s):  
Anindita Borah ◽  
Bhabesh Nath

Most pattern mining techniques almost singularly focus on identifying frequent patterns and very less attention has been paid to the generation of rare patterns. However, in several domains, recognizing less frequent but strongly related patterns have greater advantage over the former ones. Identification of compelling and meaningful rare associations among such patterns may proved to be significant for air quality management that has become an indispensable task in today’s world. The rare correlations between air pollutants and other parameters may aid in restricting the air pollution to a manageable level. To this end, efficient and competent rare pattern mining techniques are needed that can generate the complete set of rare patterns, further identifying significant rare association rules among them. Moreover, a notable issue with databases is their continuous update over time due to the addition of new records. The users requirement or behavior may change with the incremental update of databases that makes it difficult to determine a suitable support threshold for the extraction of interesting rare association rules. This paper, presents an efficient rare pattern mining technique to capture the complete set of rare patterns from a real environmental dataset. The proposed approach does not restart the entire mining process upon threshold update and generates the complete set of rare association rules in a single database scan. It can effectively perform incremental mining and also provides flexibility to the user to regulate the value of support threshold for generating the rare patterns. Significant rare association rules representing correlations between air pollutants and other environmental parameters are further extracted from the generated rare patterns to identify the substantial causes of air pollution. Performance analysis shows that the proposed method is more efficient than existing rare pattern mining approaches in providing significant directions to the domain experts for air pollution monitoring.


2012 ◽  
Vol 24 (06) ◽  
pp. 513-524
Author(s):  
Mohsen Alavash Shooshtari ◽  
Keivan Maghooli ◽  
Kambiz Badie

One of the main objectives of data mining as a promising multidisciplinary field in computer science is to provide a classification model to be used for decision support purposes. In the medical imaging domain, mammograms classification is a difficult diagnostic task which calls for development of automated classification systems. Associative classification, as a special case of association rules mining, has been adopted in classification problems for years. In this paper, an associative classification framework based on parallel mining of image blocks is proposed to be used for mammograms discrimination. Indeed, association rules mining is applied to a commonly used mammography image database to classify digital mammograms into three categories, namely normal, benign and malign. In order to do so, first images are preprocessed and then features are extracted from non-overlapping image blocks and discretized for rule discovery. Association rules are then discovered through parallel mining of transactional databases which correspond to the image blocks, and finally are used within a unique decision-making scheme to predict the class of unknown samples. Finally, experiments are conducted to assess the effectiveness of the proposed framework. Results show that the proposed framework proved successful in terms of accuracy, precision, and recall, and suggest that the framework could be used as the core of any future associative classifier to support mammograms discrimination.


Author(s):  
Syam Menon ◽  
Abhijeet Ghoshal ◽  
Sumit Sarkar

Although firms recognize the value in sharing data with supply chain partners, many remain reluctant to share for fear of sensitive information potentially making its way to competitors. Approaches that can help hide sensitive information could alleviate such concerns and increase the number of firms that are willing to share. Sensitive information in transactional databases often manifests itself in the form of association rules. The sensitive association rules can be concealed by altering transactions so that they remain hidden when the data are mined by the partner. The problem of hiding these rules in the data are computationally difficult (NP-hard), and extant approaches are all heuristic in nature. To our knowledge, this is the first paper that introduces the problem as a nonlinear integer formulation to hide the sensitive association rule while minimizing the alterations needed in the data set. We apply transformations that linearize the constraints and derive various results that help reduce the size of the problem to be solved. Our results show that although the nonlinear integer formulations are not practical, the linearizations and problem-reduction steps make a significant impact on solvability and solution time. This approach mitigates potential risks associated with sharing and should increase data sharing among supply chain partners.


Author(s):  
Ling Feng

The discovery of association rules from large amounts of structured or semi-structured data is an important data mining problem [Agrawal et al. 1993, Agrawal and Srikant 1994, Miyahara et al. 2001, Termier et al. 2002, Braga et al. 2002, Cong et al. 2002, Braga et al. 2003, Xiao et al. 2003, Maruyama and Uehara 2000, Wang and Liu 2000]. It has crucial applications in decision support and marketing strategy. The most prototypical application of association rules is market basket analysis using transaction databases from supermarkets. These databases contain sales transaction records, each of which details items bought by a customer in the transaction. Mining association rules is the process of discovering knowledge such as “80% of customers who bought diapers also bought beer, and 35% of customers bought both diapers and beer”, which can be expressed as “diaper ? beer” (35%, 80%), where 80% is the confidence level of the rule, and 35% is the support level of the rule indicating how frequently the customers bought both diapers and beer. In general, an association rule takes the form X ? Y (s, c), where X and Y are sets of items, and s and c are support and confidence, respectively. In the XML Era, mining association rules is confronted with more challenges than in the traditional well-structured world due to the inherent flexibilities of XML in both structure and semantics [Feng and Dillon 2005]. First, XML data has a more complex hierarchical structure than a database record. Second, elements in XML data have contextual positions, which thus carry the order notion. Third, XML data appears to be much bigger than traditional data. To address these challenges, the classic association rule mining framework originating with transactional databases needs to be re-examined.


Author(s):  
Ling Zhou ◽  
Stephen Yau

Association rule mining among frequent items has been extensively studied in data mining research. However, in recent years, there is an increasing demand for mining infrequent items (such as rare but expensive items). Since exploring interesting relationships among infrequent items has not been discussed much in the literature, in this chapter, the authors propose two simple, practical and effective schemes to mine association rules among rare items. Their algorithms can also be applied to frequent items with bounded length. Experiments are performed on the well-known IBM synthetic database. The authors’ schemes compare favorably to Apriori and FP-growth under the situation being evaluated. In addition, they explore quantitative association rule mining in transactional databases among infrequent items by associating quantities of items: some interesting examples are drawn to illustrate the significance of such mining.


Sign in / Sign up

Export Citation Format

Share Document