mapreduce model
Recently Published Documents


Total documents: 117 (five years: 32)

H-index: 6 (five years: 2)

2022 ◽  
Vol 16 (3) ◽  
pp. 1-26
Author(s):  
Jerry Chun-Wei Lin ◽  
Youcef Djenouri ◽  
Gautam Srivastava ◽  
Yuanfa Li ◽  
Philip S. Yu

High-utility sequential pattern mining (HUSPM) has been a hot research topic in recent decades because it combines sequential and utility properties to reveal more information and knowledge than traditional frequent itemset mining or sequential pattern mining. Several HUSPM approaches have been presented, but most of them rely on main memory to speed up mining. This assumption is unrealistic in large-scale environments: in real industrial settings the collected data is very large and cannot fit into the main memory of a single machine. In this article, we first develop a parallel and distributed three-stage MapReduce model for mining high-utility sequential patterns from large-scale databases. Two properties are then developed to guarantee the correctness and completeness of the patterns discovered by the framework. In addition, two data structures, the sidset and the utility-linked list, are used in the framework to accelerate the computation for mining the required patterns. The results show that the designed model performs well on large-scale datasets in terms of runtime, memory, efficiency with respect to the number of distributed nodes, and scalability compared to the serial HUSP-Span approach.


Author(s):  
Uttama Garg

The amount of data in today’s world is increasing exponentially, and effectively analyzing Big Data is a very complex task. The MapReduce programming model, created by Google in 2004, revolutionized the big-data computing market. Today the model is used by many for scientific and research analysis as well as for commercial purposes. The MapReduce model, however, is a fairly low-level programming model and has many limitations. Active research is under way to build models that overcome these limitations. In this paper we study some popular data analytics models that redress some of the limitations of MapReduce, namely ASTERIX and Pregel (Giraph). We discuss these models briefly and, through the discussion, highlight how they overcome MapReduce’s limitations.
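
To make the "low-level" character of the model concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps, using word counting as the example. It is an illustration only, not code from the paper above or from any MapReduce framework; the names `map_phase`, `reduce_phase`, and `run_mapreduce` are hypothetical, and a real system would distribute the work across machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    """Map: emit one (word, 1) pair per word in the input record."""
    for word in text.lower().split():
        yield word, 1


def reduce_phase(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce: combine all values that share the same key."""
    return word, sum(counts)


def run_mapreduce(records, mapper, reducer) -> dict:
    """Hypothetical in-memory driver: map, group by key (shuffle), reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # shuffle: group values by key
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))


docs = [("d1", "big data big models"), ("d2", "data models")]
word_counts = run_mapreduce(docs, map_phase, reduce_phase)
print(word_counts)
```

Even this small task requires the programmer to express the computation as explicit key-value emission and aggregation, which is the kind of limitation that higher-level models aim to hide.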


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Jerry Chun-Wei Lin ◽  
Youcef Djenouri ◽  
Gautam Srivastava ◽  
Philippe Fournier-Viger

In recent years, high-utility itemset mining (HUIM) has been investigated extensively and studied in many applications, especially market-basket analysis and related applications. Since current market-basket scenarios also involve IoT equipment, i.e., sensors or smart devices, to collect information, it is necessary to consider the mining of high-utility itemsets (HUIs) in large-scale databases, especially in IoT settings. First, a GA-based MapReduce model known as GMR-Miner is presented in this work for mining closed patterns with high utilization in large-scale databases. The k-means model is initially adopted to group transactions according to their correlation based on the frequency factor. A genetic algorithm (GA) is utilized in the developed MapReduce framework to explore the potential candidates in a limited time. Also, the developed 3-tier MapReduce model can be easily deployed in Spark to handle any large-scale database for knowledge discovery of closed patterns with high utilization. We created sets of extensive experimental environments to evaluate the developed GMR-Miner against the well-known, state-of-the-art CLS-Miner. We present in-depth results showing that GMR-Miner outperforms CLS-Miner on several criteria, i.e., memory usage, scalability, and runtime.


2021 ◽  
Vol 11 (3) ◽  
pp. 190-198
Author(s):  
Sharafadeen Muhammad ◽  
Ibrahim Kabiru Dahiru ◽  
Ahmad Abubakar ◽  
Muhammad Sanusi Ibrahim

The emergence of large amounts of data requires efficient processing and storage facilities. Cloud computing provides an effective solution; the MapReduce programming paradigm, as implemented in Hadoop, can handle such data, but it raises conflicting challenges in terms of the Service Level Agreement (SLA) between major stakeholders. This paper develops a MapReduce model through system identification in order to address the service-time requirement, so that the SLA is met within a defined threshold in the presence of uncertainties in the system. A second-order nonlinear model was obtained, which gives a good representation of the real system and could be used to develop control laws for the real system.


Electronics ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 532
Author(s):  
Lan Huang ◽  
Teng Gao ◽  
Dalin Li ◽  
Zihao Wang ◽  
Kangping Wang

FPGA has recently played an increasingly important role in heterogeneous computing, but Register Transfer Level design flows are not only inefficient in design, but also require designers to be familiar with the circuit architecture. High-level synthesis (HLS) allows developers to design FPGA circuits more efficiently with a more familiar programming language, a higher level of abstraction, and automatic adaptation of timing constraints. When using HLS tools, such as Xilinx Vivado HLS, specific design patterns and techniques are required in order to create high-performance circuits. Moreover, designing efficient concurrency and data flow structures requires a deep understanding of the hardware, imposing more learning costs on programmers. In this paper, we propose a set of functional patterns libraries based on the MapReduce model, implemented by C++ templates, which can quickly implement high-performance parallel pipelined computing models on FPGA with specified simple parameters. The usage of this pattern library allows flexible adaptation of parallel and flow structures in algorithms, which greatly improves the coding efficiency. The contributions of this paper are as follows. (1) Four standard functional operators suitable for hardware parallel computing are defined. (2) Functional concurrent programming patterns are described based on C++ templates and Xilinx HLS. (3) The efficiency of this programming paradigm is verified with two algorithms with different complexity.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
C. Lakshmi ◽  
K. UshaRani

Purpose: The paper proposes a resilient distributed processing technique (RDPT), in which the mapper and reducer are simplified with Spark contexts and support distributed parallel query processing.
Design/methodology/approach: The proposed work is implemented in Pig Latin with Spark contexts to develop query processing in a distributed environment.
Findings: Query processing in Hadoop influences the distributed processing with the MapReduce model. MapReduce distributes the work across different nodes through the implementation of complex mappers and reducers. Its results are valid only up to a certain data size.
Originality/value: Pig supports the required parallel processing framework with the following constructs during the processing of queries: FOREACH; FLATTEN; COGROUP.
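
The Pig constructs named in the abstract above can be approximated in plain Python to show what they compute. The sketch below is an assumption-laden illustration, not the authors' implementation and not Pig's API: `cogroup` and `foreach_flatten` are hypothetical helper names, and the sample relations are invented for the example. COGROUP collects, for each key, one bag of matching rows per input relation; FOREACH with FLATTEN on both bags then un-nests them, dropping keys where either bag is empty.

```python
from collections import defaultdict


def cogroup(left, right, key=lambda row: row[0]):
    """Like Pig's COGROUP: for each key, one bag of rows per input relation."""
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[key(row)][0].append(row)
    for row in right:
        groups[key(row)][1].append(row)
    return [(k, l_bag, r_bag) for k, (l_bag, r_bag) in sorted(groups.items())]


def foreach_flatten(grouped):
    """Like FOREACH ... GENERATE FLATTEN(left), FLATTEN(right):
    the cross product of both bags per key; an empty bag yields no rows."""
    for _, l_bag, r_bag in grouped:
        for l_row in l_bag:
            for r_row in r_bag:
                yield l_row + r_row


# Invented sample relations, keyed on the first field.
users = [(1, "ann"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (3, "ink")]

grouped = cogroup(users, orders)
joined = list(foreach_flatten(grouped))
print(grouped)
print(joined)
```

In this reading, COGROUP followed by FLATTEN on both bags behaves like an inner join, which is why these constructs are enough to express many query plans without hand-written mappers and reducers.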


2021 ◽  
Vol 348 ◽  
pp. 01003
Author(s):  
Abdullayev Vugar Hacimahmud ◽  
Ragimova Nazila Ali ◽  
Khalilov Matlab Etibar

The volume of information in the 21st century is growing at a rapid pace, and big data technologies are used to process it. This article discusses the use of big data technologies to monitor social processes. Big data has its own characteristics and principles, which are described here, along with big data applications in some areas. Particular attention is paid to the interaction of big data and sociology, considering digital sociology and computational social science. One of the main objects of study in sociology is social processes. The article presents the types of social processes and their monitoring. As an example, monitoring of social processes at a university is implemented. The following technologies can be used to monitor social processes: 1010data products (1010edge, 1010connect, 1010reveal, 1010equities), Apache Software Foundation products (Apache Hive, Apache Chukwa, Apache Hadoop, Apache Pig), the MapReduce framework, the R language, the Pandas library, NoSQL, etc. Of these, this article examines the use of the MapReduce model for monitoring social processes at the university.


IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Asmaa G. Seliem ◽  
Hesham F. A. Hamed ◽  
Wael Abouelwafa

2021 ◽  
Vol 107 ◽  
pp. 05002
Author(s):  
Sergey Ivanov ◽  
Mykola Ivanov

This paper discusses the use of big data as a tool to increase data transfer speed while providing access to multidimensional data in the process of forecasting product sales in the market. It discusses modern big data tools that use the MapReduce model. The big data presented in this article is a single, centralized source of information across the entire domain. The paper also proposes the structure of a marketing analytics system that includes many databases in which transactions are processed in real time. For marketing forecasting of multidimensional data, a neural network is considered and built in Matlab. To train and build the network, it is proposed to construct a matrix of input data for presentation to the neural network and a matrix of target data that determines the output statistical information. Input and output data in the neural network are presented in the form of a 5x10 matrix, which represents static information about 10 products for five days of the week. The application of the Levenberg-Marquardt algorithm for training a neural network is considered. The results of the neural network training process in Matlab are also presented. The forecasting results obtained are given and support conclusions about the advantages of a neural network in multivariate forecasting in real time.

