Experimental Characteristics Study of Data Storage Formats for Data Marts Development within Data Lakes

Vladimir Belov; Alexander N. Kosenkov; Evgeny Nikulchev

doi:10.3390/app11188651

Experimental Characteristics Study of Data Storage Formats for Data Marts Development within Data Lakes

Applied Sciences ◽

10.3390/app11188651 ◽

2021 ◽

Vol 11 (18) ◽

pp. 8651

Author(s):

Vladimir Belov ◽

Alexander N. Kosenkov ◽

Evgeny Nikulchev

Keyword(s):

Big Data ◽

Data Storage ◽

Storage System ◽

Apache Hadoop ◽

Aggregated Data ◽

Data Marts ◽

Hadoop Platform ◽

Analytical Platforms ◽

Big Data Storage

One of the most popular methods for building analytical platforms is based on the concept of a data lake. A data lake is a storage system in which data are kept in their original format, which makes it difficult to run analytics or present aggregated data. To address this, data marts are used: stores of highly specialized information focused on the requests of employees of a particular department or area of an organization's work. This article presents a study of big data storage formats in the Apache Hadoop platform when used to build data marts.

Analysis of Big Data Storage Tools for Data Lakes based on Apache Hadoop Platform

International Journal of Advanced Computer Science and Applications ◽

10.14569/ijacsa.2021.0120864 ◽

2021 ◽

Vol 12 (8) ◽

Author(s):

Vladimir Belov ◽

Evgeny Nikulchev

Keyword(s):

Big Data ◽

Data Storage ◽

Apache Hadoop ◽

Hadoop Platform ◽

Big Data Storage

Research of Big Data Storage System Based on Underground Space Information

10.1145/3491396.3506516 ◽

2021 ◽

Author(s):

Chunxiao Wang ◽

Zhigang Zhao ◽

Jian Zhang ◽

Jidong Huo

Keyword(s):

Big Data ◽

Data Storage ◽

Storage System ◽

Underground Space ◽

Data Storage System ◽

Big Data Storage

Application and research of massive big data storage system based on HBase

2018 IEEE 3rd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA) ◽

10.1109/icccbda.2018.8386515 ◽

2018 ◽

Cited By ~ 1

Author(s):

Pan Zhengjun ◽

Zhao Lianfen

Keyword(s):

Big Data ◽

Data Storage ◽

Storage System ◽

Data Storage System ◽

Big Data Storage

Big Data Storage System Based on a Distributed Hash Tables System

International Journal of Database Management Systems ◽

10.5121/ijdms.2020.12501 ◽

2020 ◽

Vol 12 (5) ◽

pp. 1-9

Author(s):

Telesphore Tiendrebeogo ◽

Mamadou Diarra

Keyword(s):

Big Data ◽

Data Storage ◽

Storage System ◽

Research Work ◽

Future Research ◽

Distributed Hash Tables ◽

Hash Tables ◽

Wide Range ◽

Data Storage System ◽

Big Data Storage

Big Data is unavoidable given that digital is the predominant form of communication in the consumer's daily life. Controlling what is at stake and the quality of the data must be a priority, so that the strategies derived from processing the data for profit are not distorted. To this end, a lot of research work has been carried out by companies, and several platforms have been created. MapReduce, one of the enabling technologies, has proven to be applicable to a wide range of fields. However, despite its importance, recent work has shown its limitations. To remedy this, Distributed Hash Tables (DHT) have been used. Thus, this document not only analyses MapReduce implementations and Top-Level Domains (TLDs) in general, but also provides a description of a DHT model as well as some guidelines for planning future research.

A Survey on Accelerated Mapreduce for Hadoop

Oriental journal of computer science and technology ◽

10.13005/ojcst/10.03.07 ◽

2017 ◽

Vol 10 (3) ◽

pp. 597-602

Author(s):

Jyotindra Tiwari ◽

Dr. Mahesh Pawar ◽

Dr. Anjajana Pandey

Keyword(s):

Big Data ◽

Data Storage ◽

Energy Efficient ◽

High Performance ◽

Map Reduce ◽

Efficient Computation ◽

Apache Hadoop ◽

Huge Data ◽

Performance Techniques ◽

Big Data Storage

Big Data is defined by the 3Vs, which stand for variety, volume and velocity. The volume of data is huge, data exist in a variety of file types, and data grow very rapidly. Big data storage and processing have always been a major issue, and big data has become even more challenging to handle these days. High-performance techniques have been introduced to handle big data. Several frameworks, such as Apache Hadoop, have been introduced to process big data. Apache Hadoop provides map/reduce to process big data, but this map/reduce can be further accelerated. This paper presents a survey of map/reduce acceleration and energy-efficient computation in quick time.

A Logistic Based Mathematical Model to Optimize Duplicate Elimination Ratio in Content Defined Chunking Based Big Data Storage System

Symmetry ◽

10.3390/sym8070069 ◽

2016 ◽

Vol 8 (7) ◽

pp. 69 ◽

Cited By ~ 2

Author(s):

Longxiang Wang ◽

Xiaoshe Dong ◽

Xingjun Zhang ◽

Fuliang Guo ◽

Yinfeng Wang ◽

...

Keyword(s):

Mathematical Model ◽

Big Data ◽

Data Storage ◽

Storage System ◽

Data Storage System ◽

Big Data Storage

Bucket based data deduplication technique for big data storage system

2016 5th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO) ◽

10.1109/icrito.2016.7784963 ◽

2016 ◽

Cited By ~ 3

Author(s):

Naresh Kumar ◽

Rahul Rawat ◽

S. C. Jain

Keyword(s):

Big Data ◽

Data Storage ◽

Storage System ◽

Data Deduplication ◽

Data Storage System ◽

Big Data Storage

Performance evaluation of Map-reduce jar pig hive and spark with machine learning using big data

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v10i4.pp3811-3818 ◽

2020 ◽

Vol 10 (4) ◽

pp. 3811

Author(s):

Santosh Jankatti ◽

Raghavendra B. K. ◽

Raghavendra S. ◽

Meenakshi Meenakshi

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Storage ◽

Processing Speed ◽

Learning Technology ◽

Apache Hadoop ◽

Hadoop Mapreduce ◽

Processing Power ◽

Big Data Storage ◽

Better Than

Big data is one of the biggest challenges, as we need systems with huge processing power and good algorithms to make decisions. We need a Hadoop environment with Pig, Hive, machine learning and Hadoop ecosystem components. The data come from industries, from the many devices and sensors around us, and from social media sites. According to McKinsey, there will be a shortage of 15000000 big data professionals by the end of 2020. There are many technologies for solving the problem of big data storage and processing, such as Apache Hadoop, Apache Spark, Apache Kafka, and many more. Here we analyse the processing speed for 4 GB of data on cloudx lab using Hadoop MapReduce with varying numbers of mappers and reducers, Pig scripts, Hive queries, and a Spark environment, along with machine learning technology. From the results we can say that machine learning with Hadoop enhances processing performance, as does Spark; that Spark is better than Hadoop MapReduce, Pig and Hive; and that Spark with Hive and machine learning gives the best performance compared with Pig, Hive and the Hadoop MapReduce jar.

Forensic Investigation Through Data Remnants on Hadoop Big Data Storage System

Computer Systems Science and Engineering ◽

10.32604/csse.2018.33.203 ◽

2018 ◽

Vol 33 (3) ◽

pp. 203-217

Author(s):

Myat Nandar Oo ◽

Sazia Parvin ◽

Thandar Thein

Keyword(s):

Big Data ◽

Data Storage ◽

Storage System ◽

Forensic Investigation ◽

Data Storage System ◽

Big Data Storage

Efficient and secure big data storage system with leakage resilience in cloud computing

Soft Computing ◽

10.1007/s00500-018-3435-z ◽

2018 ◽

Vol 22 (23) ◽

pp. 7763-7772 ◽

Cited By ~ 14

Author(s):

Yinghui Zhang ◽

Menglei Yang ◽

Dong Zheng ◽

Pengzhen Lang ◽

Axin Wu ◽

...

Keyword(s):

Cloud Computing ◽

Big Data ◽

Data Storage ◽

Storage System ◽

Leakage Resilience ◽

Data Storage System ◽

Big Data Storage
