Experimental Characteristics Study of Data Storage Formats for Data Marts Development within Data Lakes

2021 ◽  
Vol 11 (18) ◽  
pp. 8651
Author(s):  
Vladimir Belov ◽  
Alexander N. Kosenkov ◽  
Evgeny Nikulchev

One of the most popular methods for building analytical platforms is based on the concept of data lakes. A data lake is a storage system in which the data are kept in their original format, which makes it difficult to conduct analytics or present aggregated data. To solve this issue, data marts are used: stores of highly specialized data focused on the requests of the employees of a particular department or line of an organization’s work. This article presents a study of big data storage formats in the Apache Hadoop platform when used to build data marts.
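
The relationship between a data lake and a data mart described in this abstract can be illustrated with a minimal sketch. It uses plain Python rather than Apache Hadoop, and the record fields (`department`, `amount`), the sample data, and the function name `build_data_mart` are hypothetical, chosen only for illustration and not taken from the article.

```python
# Hypothetical raw "data lake" records, kept in their original form.
RAW_RECORDS = [
    {"department": "sales", "amount": 120.0},
    {"department": "sales", "amount": 80.0},
    {"department": "support", "amount": 45.5},
]


def build_data_mart(records, department):
    """Return a small aggregated view of the raw records for one department."""
    # Keep only the rows that the given department asks about.
    rows = [r for r in records if r["department"] == department]
    # Pre-compute the aggregates so readers do not have to scan raw data.
    return {
        "department": department,
        "row_count": len(rows),
        "total_amount": sum(r["amount"] for r in rows),
    }


if __name__ == "__main__":
    print(build_data_mart(RAW_RECORDS, "sales"))
```

In a real platform the aggregated view would be written out in one of the storage formats under study instead of being returned as a dictionary.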

2021 ◽  
Author(s):  
Chunxiao Wang ◽  
Zhigang Zhao ◽  
Jian Zhang ◽  
Jidong Huo

2020 ◽  
Vol 12 (5) ◽  
pp. 1-9
Author(s):  
Telesphore Tiendrebeogo ◽  
Mamadou Diarra

Big Data is unavoidable now that digital is the predominant form of communication in the daily life of the consumer. Controlling what is at stake and the quality of the data must be a priority, so that the strategies derived from processing the data for profit are not distorted. To this end, companies have carried out a great deal of research and several platforms have been created. MapReduce, one of the enabling technologies, has proven applicable to a wide range of fields. However, despite its importance, recent work has shown its limitations. To remedy this, Distributed Hash Tables (DHTs) have been used. This document therefore not only analyses MapReduce implementations and Top-Level Domains (TLDs) in general, but also describes a DHT model and offers some guidelines for planning future research.


2017 ◽  
Vol 10 (3) ◽  
pp. 597-602
Author(s):  
Jyotindra Tiwari ◽  
Dr. Mahesh Pawar ◽  
Dr. Anjajana Pandey

Big Data is defined by the 3Vs: variety, volume and velocity. The volume of data is huge, the data exist in a variety of file types, and the data grow very rapidly. Big data storage and processing have always been a major issue, and big data has become even more challenging to handle. To handle big data, high-performance techniques have been introduced. Several frameworks, such as Apache Hadoop, have been introduced to process big data. Apache Hadoop provides map/reduce to process big data, but map/reduce can be accelerated further. This paper surveys map/reduce acceleration and fast, energy-efficient computation.
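
For readers unfamiliar with the map/reduce model mentioned in this abstract, the following is a minimal single-process sketch of its three stages (map, shuffle, reduce) applied to word counting. It is plain Python, not the Apache Hadoop API, and the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three stages in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big deal"]))
```

In a distributed framework the map and reduce calls run in parallel on different nodes, which is where acceleration techniques of the kind surveyed here would apply.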


Symmetry ◽  
2016 ◽  
Vol 8 (7) ◽  
pp. 69
Author(s):  
Longxiang Wang ◽  
Xiaoshe Dong ◽  
Xingjun Zhang ◽  
Fuliang Guo ◽  
Yinfeng Wang ◽  
...  

Author(s):  
Santosh Jankatti ◽  
Raghavendra B. K. ◽  
Raghavendra S. ◽  
Meenakshi Meenakshi

Big data is one of the biggest challenges, as it requires systems with huge processing power and good algorithms to make decisions. This calls for a Hadoop environment with Pig, Hive, machine learning and other Hadoop ecosystem components. The data come from industry, from the many devices and sensors around us, and from social media sites. According to McKinsey, there will be a shortage of 15,000,000 big data professionals by the end of 2020. Many technologies address the problem of big data storage and processing, among them Apache Hadoop, Apache Spark and Apache Kafka. Here we analyse the processing speed for 4 GB of data on cloudx lab using Hadoop MapReduce with varying numbers of mappers and reducers, Pig scripts, Hive queries and a Spark environment, along with machine learning technology. From the results we can say that machine learning with Hadoop enhances processing performance along with Spark, that Spark is better than Hadoop MapReduce, Pig and Hive, and that Spark with Hive and machine learning gives the best performance compared with Pig, Hive and the Hadoop MapReduce jar.


2018 ◽  
Vol 33 (3) ◽  
pp. 203-217
Author(s):  
Myat Nandar Oo ◽  
Sazia Parvin ◽  
Thandar Thein

2018 ◽  
Vol 22 (23) ◽  
pp. 7763-7772
Author(s):  
Yinghui Zhang ◽  
Menglei Yang ◽  
Dong Zheng ◽  
Pengzhen Lang ◽  
Axin Wu ◽  
...  
