Deep active reinforcement learning for privacy preserve data mining in 5G environments

Frequent pattern mining identifies the most important patterns in data sets. Because transactional data is huge and high-dimensional, classical pattern mining techniques are limited by dimensionality and by the need for data annotations. Privacy-preserving data mining has become an important research area in recent decades. Information privacy is a tradeoff that must be considered when using data. For many years, privacy-preserving data mining (PPDM) has relied on methods that are mostly based on heuristics, and the deletion operation has been used to hide sensitive information. In this study, we use deep active learning to hide sensitive operations and protect private information. This paper combines entropy-based active learning with an attention-based approach to effectively detect sensitive patterns. The constructed models are then validated using high-dimensional transactional data with attention-based and active learning methods in a reinforcement environment. The results show that the proposed model can support and improve the decision boundaries by increasing the number of training instances through the use of a pooling technique and an entropy uncertainty measure. The proposed paradigm can sanitize the data by hiding sensitive items while avoiding changes to non-sensitive items. The model outperforms greedy, genetic, and particle swarm optimization approaches.
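
The entropy uncertainty measure and pool-based selection mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' model: the probability values stand in for the output of some classifier, and the names `entropy`, `select_most_uncertain`, and `pool_probs` are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(pool_probs, k):
    """Return the indices of the k pool items whose predicted class
    distribution has the highest entropy (most uncertain first)."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:k]

# Assumed classifier outputs [P(sensitive), P(non-sensitive)] for four
# unlabeled patterns in the pool; the numbers are made up for illustration.
pool_probs = [
    [0.99, 0.01],
    [0.55, 0.45],
    [0.80, 0.20],
    [0.50, 0.50],
]

# The two patterns the model is least sure about would be sent for labeling.
queried = select_most_uncertain(pool_probs, 2)
```

In an active learning loop, the queried items would be labeled and added to the training set before the model is retrained, which is how the number of training instances grows.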

Data Privacy Preservation and Security Approaches for Sensitive Data in Big Data

10.3233/apc210221 ◽

2021 ◽

Author(s):

Rohit Ravindra Nikam ◽

Rekha Shahapurkar

Keyword(s):

Data Mining ◽

Data Analytics ◽

Data Privacy ◽

Privacy Preservation ◽

Large Data ◽

Research Area ◽

Data Sets ◽

Sensitive Information ◽

Sensitive Data ◽

Data Mining Techniques

Data mining is a technique by which necessary data is extracted from large data sets. Privacy protection in data mining is about hiding sensitive information or identities without breaching security or losing data usability. Sensitive data contains confidential information about individuals, businesses, and governments, who must agree before their private data is shared or published. Preserving privacy in data mining has become a critical research area. Various evaluation metrics, such as time efficiency, data utility, and degree of complexity or resistance to data mining techniques, are used to assess privacy-preserving data mining techniques. Social media and smartphones produce tons of data every minute. The voluminous data produced by these different sources can be processed and analyzed to support decision making, but data analytics is vulnerable to breaches of privacy. One data analytics framework is the recommendation system, commonly used by e-commerce sites such as Amazon and Flipkart to recommend items to customers based on their purchasing habits, which leads to customers being characterized. This paper presents various privacy preservation techniques that existing researchers use, such as data anonymization, data randomization, generalization, and data permutation. We also analyze the gaps between various processes and privacy preservation methods and illustrate how to overcome such issues with new, innovative methods. Finally, we summarize the outcomes of the entire literature.
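
Three of the techniques named in this abstract (anonymization, generalization, and randomization) can be sketched in a few lines of Python. This is only an illustration under assumed parameters: the record fields, the 10-year age bands, and the noise range of ±5 are hypothetical and do not come from the paper.

```python
import random

def generalize_age(age, width=10):
    """Generalization: replace an exact age with a range of the given width."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def randomize(value, scale, rng):
    """Randomization: add zero-mean uniform noise drawn from [-scale, scale]."""
    return value + rng.uniform(-scale, scale)

def anonymize(record, rng):
    """Drop the direct identifier, generalize the age, perturb the spend."""
    return {
        "age": generalize_age(record["age"]),
        "spend": round(randomize(record["spend"], 5.0, rng), 2),
    }

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Made-up purchase records; "name" is the direct identifier.
records = [
    {"name": "A", "age": 34, "spend": 120.0},
    {"name": "B", "age": 47, "spend": 80.0},
]
released = [anonymize(r, rng) for r in records]
```

Wider age bands and larger noise give more privacy but less data utility, which is the tradeoff the evaluation metrics above are meant to capture.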

Mining Environmental Data in the ADMIRE Project Using New Advanced Methods and Tools

Technology Integration Advancements in Distributed Systems and Computing ◽

10.4018/978-1-4666-0906-8.ch018 ◽

2012 ◽

pp. 296-308

Author(s):

Ondrej Habala ◽

Martin Šeleng ◽

Viet Tran ◽

Branislav Šimo ◽

Ladislav Hluchý

Keyword(s):

Data Mining ◽

Environmental Data ◽

Environmental Applications ◽

Data Sets ◽

Distributed Data ◽

New Methods ◽

Prospective Application ◽

Using Data ◽

Computer Power

The project Advanced Data Mining and Integration Research for Europe (ADMIRE) is designing new methods and tools for comfortable mining and integration of large, distributed data sets. One of the prospective application domains for such methods and tools is the environmental applications domain, which often uses various data sets from different vendors and where data mining is becoming increasingly popular as more computer power becomes available. The authors present a set of experimental environmental scenarios and the application of ADMIRE technology in these scenarios. The scenarios try to predict meteorological and hydrological phenomena that currently cannot be or are not predicted, by applying data mining to distributed data sets from several providers in Slovakia. The scenarios have been designed by environmental experts, and apart from serving as testing grounds for the ADMIRE technology, their results are of particular interest to the experts who designed them.

Video Data Mining

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch223 ◽

2011 ◽

pp. 1185-1189 ◽

Cited By ~ 2

Author(s):

Jung Hwan Oh ◽

Jeong Kyu Lee ◽

Sae Hwang

Keyword(s):

Data Mining ◽

Research Area ◽

Multimedia Databases ◽

Video Data ◽

Multimedia Data ◽

Data Sets ◽

Data Set ◽

Useful Knowledge ◽

Active Research ◽

Diverse Data

Data mining, which is defined as the process of extracting previously unknown knowledge and detecting interesting patterns from a massive set of data, has been an active research area. As a result, several commercial products and research prototypes are available nowadays. However, most of these studies have focused on corporate data, typically in an alphanumeric database, and relatively less work has been pursued for the mining of multimedia data (Zaïane, Han, & Zhu, 2000). Digital multimedia differs from previous forms of combined media in that the bits representing text, images, audio, and video can be treated as data by computer programs (Simoff, Djeraba, & Zaïane, 2002). One facet of these diverse data in terms of underlying models and formats is that they are synchronized and integrated, and hence can be treated as integrated data records. The collection of such integral data records constitutes a multimedia data set. The challenge of extracting meaningful patterns from such data sets has led to research and development in the area of multimedia data mining. This is a challenging field due to the non-structured nature of multimedia data. Such ubiquitous data is required in many applications, such as financial, medical, advertising, and Command, Control, Communications and Intelligence (C3I) (Thuraisingham, Clifton, Maurer, & Ceruti, 2001). Multimedia databases are widespread and multimedia data sets are extremely large. There are tools for managing and searching within such collections, but the need for tools to extract hidden and useful knowledge embedded within multimedia data is becoming critical for many decision-making applications.

Interactive Learning of Pattern Rankings

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213014600264 ◽

2014 ◽

Vol 23 (06) ◽

pp. 1460026 ◽

Cited By ~ 10

Author(s):

Vladimir Dzyuba ◽

Matthijs van Leeuwen ◽

Siegfried Nijssen ◽

Luc De Raedt

Keyword(s):

Data Mining ◽

Active Learning ◽

Pattern Mining ◽

Interactive Learning ◽

Building Blocks ◽

Frequent Itemset ◽

Preference Learning ◽

Ranking Functions ◽

Learning Techniques ◽

Learning Heuristics

Pattern mining provides useful tools for exploratory data analysis. Numerous efficient algorithms exist that are able to discover various types of patterns in large datasets. Unfortunately, the problem of identifying patterns that are genuinely interesting to a particular user remains challenging. Current approaches generally require considerable data mining expertise or effort from the data analyst, and hence cannot be used by typical domain experts. To address this, we introduce a generic framework for interactive learning of user-specific pattern ranking functions. The user is only asked to rank small sets of patterns, while a ranking function is inferred from this feedback by preference learning techniques. Moreover, we propose a number of active learning heuristics to minimize the effort required from the user, while ensuring that accurate rankings are obtained. We show how the learned ranking functions can be used to mine new, more interesting patterns. We demonstrate two concrete instances of our framework for two different pattern mining tasks, frequent itemset mining and subgroup discovery. We empirically evaluate the capacity of the algorithm to learn pattern rankings by emulating users. Experiments demonstrate that the system is able to learn accurate rankings, and that the active learning heuristics help reduce the required user effort. Furthermore, using the learned ranking functions as search heuristics allows discovering patterns of higher quality than those in the initial set. This shows that machine learning techniques in general, and active preference learning in particular, are promising building blocks for interactive data mining systems.
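
The idea of inferring a ranking function from a user's pairwise feedback can be sketched with a simple perceptron-style update on feature differences. This is a stand-in chosen for brevity, not the preference learning technique used in the paper, and the two pattern features (support, length) and the feedback pairs are assumed for illustration.

```python
def score(w, features):
    """Linear ranking function: higher score means more interesting."""
    return sum(wi * fi for wi, fi in zip(w, features))

def learn_ranking(pairs, n_features, epochs=20, lr=0.1):
    """Learn weights from (preferred, other) feature pairs.

    Whenever the current weights fail to rank the preferred pattern
    strictly higher, move the weights toward the feature difference.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            if score(w, diff) <= 0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w

# Hypothetical feedback: each pattern is described by [support, length],
# and this emulated user prefers longer patterns over frequent ones.
feedback = [
    ([0.2, 3], [0.8, 1]),
    ([0.3, 2], [0.9, 1]),
]
weights = learn_ranking(feedback, n_features=2)

# The learned function can then rank patterns the user has never seen.
long_rare = score(weights, [0.1, 4])
short_frequent = score(weights, [0.9, 1])
```

An active learning heuristic would choose which pairs to show the user next, for example those whose scores are closest under the current weights.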

Student Performance Predictions Using Knowledge Discovery Database and Data Mining, DPU Students Records as Sample

Academic Journal of Nawroz University ◽

10.25007/ajnu.v10n3a875 ◽

2021 ◽

Vol 10 (3) ◽

pp. 121-127

Author(s):

Bareen Haval ◽

Karwan Jameel Abdulrahman ◽

Araz Rajab

Keyword(s):

Data Mining ◽

Decision Tree ◽

Student Performance ◽

Educational Data Mining ◽

Data Sets ◽

Decision Tree Classifier ◽

Data Mining Techniques ◽

Academic History ◽

Tree Classifier ◽

Using Data

This article presents the results of applying educational data mining techniques to the academic performance of students. Three classification models (Decision Tree, Random Forest, and Deep Learning) have been developed to analyze data sets and predict the performance of students. The predictive performance of the three classifiers was calculated and compared. The academic history and data of the students from the Office of the Registrar were used to train the models. Our analysis aims to evaluate the results of students using various variables, such as the student's grade. Data from 221 students with 9 different attributes were used. The results of this study are important: they provide a better understanding of student success assessments and stress the importance of data mining in education. The main purpose of this study is to show how student success can be forecast using data mining techniques in order to improve academic programs. The results of this research indicate that the Decision Tree classifier outperforms the other two classifiers, achieving a total prediction accuracy of 97%.

AN OPTIMIZED ARM SCHEME FOR DISTINCT NETWORK DATA SET

International Journal of Computer and Communication Technology ◽

10.47893/ijcct.2015.1302 ◽

2015 ◽

pp. 191-195

Author(s):

K.GANESH KUMAR ◽

H.VIGNESH RAMAMOORTHY ◽

M.PREM KUMAR ◽

S. SUDHA

Keyword(s):

Data Mining ◽

Association Rule ◽

Association Rule Mining ◽

Distributed Databases ◽

Research Area ◽

Sequential Algorithm ◽

Data Sets ◽

Rule Mining ◽

Data Set ◽

Communication Costs

Association rule mining (ARM) discovers correlations between different item sets in a transaction database. It provides important knowledge for business decision makers. Association rule mining is an active data mining research area, and most ARM algorithms cater to a centralized environment. Centralized data mining to discover useful patterns in distributed databases isn't always feasible, because merging data sets from different sites incurs huge network communication costs. In this paper, an improved algorithm with good performance for data mining is proposed. At each local site, it runs an application based on the improved LMatrix algorithm, which is used to calculate local support counts. Each local site also finds a center site to manage every message exchanged to obtain all globally frequent item sets. Using LMatrix also reduces the time needed to scan the partitioned database, which increases the performance of the algorithm. The aim of the research is therefore to develop a distributed algorithm for geographically distributed data sets that offers lower communication costs, superior running efficiency, and stronger scalability than the direct application of a sequential algorithm in distributed databases.
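
The general scheme described here, in which each site computes local support counts and a center site combines them into globally frequent item sets, can be sketched as follows. The sketch uses plain counting and does not implement the LMatrix algorithm; the site data, the function names, and the 0.6 support threshold are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def local_support_counts(transactions, max_size=2):
    """Count every item set of up to max_size items at one site."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return counts

def globally_frequent(site_counts, total_transactions, min_support):
    """At the center site, merge the per-site counts and keep item sets
    whose global relative support reaches min_support."""
    merged = Counter()
    for counts in site_counts:
        merged.update(counts)
    return {s: c for s, c in merged.items()
            if c / total_transactions >= min_support}

# Two hypothetical sites; only their counts travel over the network,
# not the transactions themselves.
site_a = [["bread", "milk"], ["bread", "eggs"], ["milk"]]
site_b = [["bread", "milk"], ["bread", "milk", "eggs"]]

counts = [local_support_counts(site_a), local_support_counts(site_b)]
frequent = globally_frequent(counts, len(site_a) + len(site_b), 0.6)
```

Sending counts instead of raw transactions is what keeps the communication cost low compared with merging the data sets at one site.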

Review on Malware and Malware Detection Using Data Mining Techniques

Journal of University of Babylon for Pure and Applied Sciences ◽

10.29196/jub.v25i5.104 ◽

2017 ◽

Vol 25 (5) ◽

pp. 1585-1601

Author(s):

Wesam S Bhaya ◽

Mustafa A Ali

Keyword(s):

Data Mining ◽

Computer Security ◽

Computer System ◽

Private Information ◽

Detection System ◽

Denial Of Service ◽

Malware Detection ◽

Malicious Software ◽

Attack Data ◽

Using Data

Malicious software (malware for short) is any type of software or code that captures private information or data from a computer system, interferes with computer operations, or simply carries out the malicious goals of its author on the computer system, without the permission of the computer's users. The detection of malware has become one of the biggest issues in the computer security field because current communication infrastructures are vulnerable to penetration by many types of malware infection strategies and attacks. Moreover, malware is varied and diverse in volume and type, which severely undermines the effectiveness of traditional defense methods such as the signature approach, which is unable to detect new malware. This vulnerability can lead to successful computer system penetration (and attack) as well as to the success of more advanced attacks such as distributed denial of service (DDoS) attacks. Data mining methods can be used to overcome the limitations of signature-based techniques and detect zero-day malware. This paper provides an overview of malware and of malware detection systems that use modern techniques, such as data mining, to detect known and unknown malware samples.

A Weighted Frequent Item-Set Mining using WD-FIM Algorithm

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l3683.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 4792-4796

Keyword(s):

Data Mining ◽

Decision Making ◽

Research Area ◽

Data Sets ◽

Weight Factor ◽

Smart Systems ◽

Frequent Item ◽

Significant Research ◽

Downward Closure ◽

The One

Smart systems are one of the most significant inventions of our times. These systems rely on powerful data mining techniques to achieve intelligence in decision making. Frequent item set mining (FIM) has become one of the most significant research areas of data mining. The information present in databases is in general ambiguous and uncertain. In such databases, one should consider weighted FIM to discover item sets that are significant from the end user's perspective. However, once a weight factor is introduced into FIM, the weighted frequent item sets may no longer satisfy the downward closure property. Consequently, the search space of frequent item sets cannot be narrowed by the downward closure property, which leads to poor time efficiency. In this paper, we introduce two properties for FIM: the first is the weight judgment downward closure property for weighted FIM (WD-FIM), and the second is the existence property of its subsets. Based on these two properties, the WD-FIM algorithm is proposed to narrow the search space of the weighted frequent item sets and improve time efficiency. In addition, the completeness and time efficiency of the WD-FIM algorithm are analyzed theoretically. Finally, the performance of the proposed WD-FIM algorithm is verified on both synthetic and real data sets.
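
Why introducing weights can break the downward closure property is easy to see on a toy example. The sketch below assumes one common definition of weighted support (plain support multiplied by the mean weight of the items); the abstract does not state the paper's exact definition, and the transactions, weights, and 0.3 threshold are made up.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the item set."""
    s = set(itemset)
    return sum(1 for t in transactions if s <= set(t)) / len(transactions)

def weighted_support(itemset, transactions, weights):
    """Support scaled by the mean item weight (an assumed definition)."""
    mean_weight = sum(weights[i] for i in itemset) / len(itemset)
    return mean_weight * support(itemset, transactions)

transactions = [["a", "b"], ["a", "b"], ["a"], ["a", "b"]]
weights = {"a": 0.2, "b": 0.9}  # hypothetical item weights
threshold = 0.3                 # hypothetical minimum weighted support

ws_a = weighted_support(["a"], transactions, weights)        # 0.2 * 1.0
ws_ab = weighted_support(["a", "b"], transactions, weights)  # 0.55 * 0.75

# Plain support never grows when items are added to an item set,
# but weighted support can.
a_is_frequent = ws_a >= threshold
ab_is_frequent = ws_ab >= threshold
```

Here the superset {a, b} passes the threshold while its subset {a} does not, so pruning every superset of an infrequent item set would wrongly discard {a, b}. That is the gap that replacement pruning properties such as the ones proposed in the paper need to close.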

Using Data Mining In Learning Management Systems Amidst Covid-19

Aksara: Jurnal Ilmu Pendidikan Nonformal ◽

10.37905/aksara.6.3.213-216.2020 ◽

2020 ◽

Vol 6 (3) ◽

pp. 213

Author(s):

Froilan D Mobo

Keyword(s):

Data Mining ◽

Pattern Mining ◽

Learning Management Systems ◽

The Philippines ◽

Management Systems ◽

Official Gazette ◽

Learning Management ◽

The Republic ◽

Using Data ◽

The One

The second semester of Academic Year 2019-2020 was temporarily suspended because of the widespread COVID-19 outbreak on March 16, 2020, when the President of the Republic of the Philippines, Hon. Rodrigo Roa Duterte, imposed an Enhanced Community Quarantine in Luzon, known as a lockdown, closing all the border points of each town and province. One of the major problems encountered during the lockdown was the suspension of classes, because the IATF guidelines required people to stay home; the Memorandum Order was posted in the Official Gazette (Medialdea, 2020). Among the features of the Moodle-based Learning Management System is that professors set the topics, quizzes, and exercises for their classes, as well as the assessment methods on the system. To prevent the network from slowing down, the Seaversity team, the developer of the learning management system, headed by C/E Ephrem Dela Cernan, conducted Zoom training for all faculty to familiarize them with the Learning Management System of the Philippine Merchant Marine Academy. The Moodle Learning Management System is a user-friendly environment because of its features, and users can easily adjust from traditional face-to-face teaching to an e-learning approach because of its data mining capabilities, such as statistics, association rule mining, pattern mining, visualization, categorization, clustering, and text mining (AlAjmi & Shakir, 2013).

Detecting Outliers in High Dimensional Data Sets using Z-Score Methodology

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a3910.119119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 48-53

Keyword(s):

Outlier Detection ◽

Credit Card ◽

High Dimensional Data ◽

Research Area ◽

High Dimensional ◽

Data Sets ◽

Z Score ◽

Wide Range ◽

Efficiency And Effectiveness ◽

Projected Methods

Outlier detection is an interesting research area in machine learning. With recently emerging tools and varied applications, interest in outlier detection is growing significantly. Recently, a significant number of outlier detection approaches have been proposed and effectively applied in a wide range of fields, including medical health, credit card fraud, and intrusion detection. They can be utilized for conservative data analysis. Outlier detection aims to discover patterns in data that do not conform to expected behavior. In this paper, we present a statistical approach, the Z-score method, for outlier detection in high-dimensional data. The Z-score is a novel method for identifying distant data points based on their positions on charts. The proposed method is computationally fast and robust for outlier detection. A comparative analysis with existing methods is carried out on high-dimensional data sets. Experimental results show the improved performance, efficiency, and effectiveness of the proposed method.
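
The standard Z-score test, z = (x - mean) / standard deviation, applied column by column, can be sketched as below. This is the textbook form and not necessarily the exact procedure of the paper; the sample rows, the function name, and the 2.5 threshold are assumptions for illustration.

```python
import statistics

def zscore_outlier_rows(rows, threshold=3.0):
    """Return the indices of rows in which at least one feature lies
    more than `threshold` standard deviations from its column mean."""
    flagged = set()
    for col in zip(*rows):  # iterate over features (columns)
        mean = statistics.fmean(col)
        std = statistics.pstdev(col)
        if std == 0:
            continue  # a constant column cannot contain outliers
        for i, value in enumerate(col):
            if abs(value - mean) / std > threshold:
                flagged.add(i)
    return sorted(flagged)

# Made-up data with two features per row; the last row has an extreme
# value in the first feature.
rows = [[10, 1.0], [11, 1.1], [9, 0.9], [10, 1.0], [12, 1.2],
        [10, 1.0], [11, 1.1], [9, 0.9], [10, 1.0], [50, 1.0]]

# With only n = 10 points a single value can reach a z-score of at most
# (n - 1) / sqrt(n), about 2.85, so a threshold below 3 is used here.
outliers = zscore_outlier_rows(rows, threshold=2.5)
```

Because the extreme value itself inflates the mean and standard deviation, small samples call for a lower threshold, as the comment in the sketch notes.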
