Data Redundancy
Recently Published Documents

TOTAL DOCUMENTS: 199 (five years: 62)
H-INDEX: 15 (five years: 2)

2022 ◽  
Vol 2022 ◽  
pp. 1-10
Author(s):  
Tingting Yu

In order to meet users' requirements for speed, capacity, storage efficiency, and security, and with the goal of reducing data redundancy and data storage space, a cloud storage method for unbalanced big data based on redundancy-elimination technology is proposed. A new big data acquisition platform is designed based on Hadoop and NoSQL technologies, through which efficient acquisition of unbalanced data is realized. The collected data are classified and processed by a classifier. The classified unbalanced big data are compressed with the Huffman algorithm, and data security is improved through encryption. Based on the processing results, redundant data are removed using a data deduplication algorithm, and a cloud platform is designed to store the processed data. The results show that the method achieves a high deduplication rate and deduplication speed, requires little storage space, and effectively reduces the data storage burden.
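A minimal Python sketch of hash-based, chunk-level deduplication, assuming fixed-size chunks and a SHA-256 fingerprint index; the abstract does not specify the paper's actual deduplication algorithm, chunk size, or index structure.

import hashlib

CHUNK_SIZE = 4096  # hypothetical chunk size

def deduplicate(data: bytes, index: dict[str, bytes]) -> list[str]:
    """Split data into fixed-size chunks, store only unseen chunks,
    and return the list of fingerprints that reconstructs the data."""
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in index:          # new chunk: store it once
            index[fp] = chunk
        recipe.append(fp)            # duplicates only add a reference
    return recipe

def restore(recipe: list[str], index: dict[str, bytes]) -> bytes:
    return b"".join(index[fp] for fp in recipe)

if __name__ == "__main__":
    store: dict[str, bytes] = {}
    payload = b"redundant block " * 1024          # highly repetitive input
    recipe = deduplicate(payload, store)
    assert restore(recipe, store) == payload
    unique = sum(len(c) for c in store.values())
    print(f"{len(payload)} bytes reduced to {unique} unique bytes")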


2021 ◽  
Vol 17 (4) ◽  
pp. 1-38
Author(s):  
Takayuki Fukatani ◽  
Hieu Hanh Le ◽  
Haruo Yokota

With the recent performance improvements in commodity hardware, low-cost commodity-server-based storage has become a practical alternative to dedicated storage appliances. Because of the high failure rate of commodity servers, data redundancy across multiple servers is required in a server-based storage system. However, the extra storage capacity for this redundancy significantly increases the system cost. Although erasure coding (EC) is a promising method to reduce the amount of redundant data, it requires distributing and encoding data among servers, and there remains a need to reduce the performance impact of these processes, which involve considerable network traffic and processing overhead. The impact is especially significant for random-access-intensive applications. In this article, we propose a new lightweight redundancy control for server-based storage. Our method uses a local-filesystem-based approach that avoids distributing data by adding redundancy to locally stored user data, and it switches the redundancy method of user data between replication and EC according to the workload to improve capacity efficiency while achieving higher performance. Our experiments show up to 230% better online-transaction-processing performance for our method compared with CephFS, a widely used alternative system. We also confirmed that our method prevents unexpected performance degradation while achieving better capacity efficiency.
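A minimal Python sketch of the workload-adaptive idea described above: write-hot data stays replicated, cold data is demoted to erasure coding to reclaim capacity. The replication factor, RS(4+2) layout, and hotness threshold are illustrative assumptions, not the authors' parameters.

from dataclasses import dataclass

REPLICATION_FACTOR = 3        # 3x copies: fast random writes, 200% overhead
EC_DATA, EC_PARITY = 4, 2     # RS(4+2): same fault tolerance, 50% overhead
HOT_WRITE_THRESHOLD = 100     # writes per interval above which data stays replicated

@dataclass
class Extent:
    writes_last_interval: int
    redundancy: str = "replication"

def choose_redundancy(extent: Extent) -> str:
    """Keep write-hot extents replicated (cheap in-place updates);
    demote cold extents to EC to reclaim capacity."""
    if extent.writes_last_interval >= HOT_WRITE_THRESHOLD:
        return "replication"
    return f"ec({EC_DATA}+{EC_PARITY})"

def storage_overhead(redundancy: str) -> float:
    """Extra capacity consumed relative to the user data size."""
    if redundancy == "replication":
        return REPLICATION_FACTOR - 1.0
    return EC_PARITY / EC_DATA

if __name__ == "__main__":
    hot, cold = Extent(writes_last_interval=500), Extent(writes_last_interval=3)
    for e in (hot, cold):
        e.redundancy = choose_redundancy(e)
        print(e.redundancy, f"overhead={storage_overhead(e.redundancy):.0%}")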


Author(s):  
Shubh Goyal

Abstract: By utilizing the Hadoop environment, data may be loaded from and searched in local data nodes. Because datasets can be very large, loading and finding data with a query is often difficult. We propose a method for handling data in local nodes that avoids overlap with data already acquired by the script. The main purpose of the query is to store information in a distributed environment and retrieve it quickly. We define a script that eliminates duplicate data when searching and loading data dynamically; in addition, the Hadoop file system is available in a distributed environment. Keywords: HDFS; Hadoop distributed file system; replica; local; distributed; capacity; SQL; redundancy
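A minimal Python sketch of duplicate-free loading, assuming line-oriented records, an MD5 fingerprint per record, and the standard hdfs dfs -put upload; the paper's actual script, schema, and query path are not given in the abstract.

import hashlib
import subprocess
import tempfile

def dedup_file(src: str, dst: str) -> int:
    """Write only the first occurrence of each record in src to dst; return the count kept."""
    seen: set[str] = set()
    kept = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            digest = hashlib.md5(line.encode("utf-8")).hexdigest()
            if digest not in seen:
                seen.add(digest)
                fout.write(line)
                kept += 1
    return kept

def put_to_hdfs(local_path: str, hdfs_dir: str) -> None:
    # Standard Hadoop CLI upload; requires a configured Hadoop client.
    subprocess.run(["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir], check=True)

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as raw:
        raw.write("id,value\n1,a\n1,a\n2,b\n")          # the second record is a duplicate
    clean = raw.name + ".dedup"
    print(f"kept {dedup_file(raw.name, clean)} unique records")
    # put_to_hdfs(clean, "/data/incoming")              # uncomment on a Hadoop cluster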


2021 ◽  
Author(s):  
Baoling Qin

Targeting the current issues of communication delay, data congestion, and data redundancy in cloud computing for medical big data, a fog computing optimization model is designed, namely an intelligent front-end architecture for fog computing. It uses the network structure of fog computing and a “decentralized and local” mindset to tackle the medical IoT network’s narrow bandwidth, information congestion, heavy computing burden on cloud services, insufficient storage space, and poor data security and confidentiality. The model combines fog computing, deep learning, and big data technology. By making full use of WiFi and user mobile devices in the medical area, and with the help of classification methods based on big data mining and deep learning algorithms, it can optimize the internal technology of the model and automatically process case diagnosis, multi-source heterogeneous data mining, and medical records. It also improves the accuracy of medical diagnosis and the efficiency of multi-source heterogeneous data processing while reducing network delay and power consumption, ensuring patient data privacy and safety, reducing data redundancy, and relieving cloud overload. The response speed and network bandwidth of the system are greatly optimized in the process, improving the quality of medical information services.
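One way to picture the fog layer's redundancy reduction is a node that forwards a sensor reading to the cloud only when it differs meaningfully from the last forwarded value. The Python sketch below is illustrative only; the threshold, device names, and uplink are hypothetical, not components of the proposed architecture.

DELTA = 0.5  # hypothetical minimum change (e.g., bpm) worth forwarding

class FogNode:
    """Keeps the last forwarded value per (device, metric) and suppresses
    readings that differ from it by less than DELTA."""
    def __init__(self):
        self.last_sent = {}
        self.uplink = []          # stands in for the cloud connection

    def ingest(self, device: str, metric: str, value: float) -> bool:
        key = (device, metric)
        prev = self.last_sent.get(key)
        if prev is not None and abs(value - prev) < DELTA:
            return False          # redundant reading: handled locally, not uplinked
        self.last_sent[key] = value
        self.uplink.append((device, metric, value))
        return True

if __name__ == "__main__":
    node = FogNode()
    readings = [("monitor-1", "heart_rate", v) for v in (72.0, 72.2, 72.3, 75.1, 75.2)]
    sent = sum(node.ingest(*r) for r in readings)
    print(f"forwarded {sent} of {len(readings)} readings to the cloud")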


2021 ◽  
Vol 13 (6) ◽  
pp. 1-13
Author(s):  
Guangxuan Chen ◽  
Guangxiao Chen ◽  
Lei Zhang ◽  
Qiang Liu

In order to solve the problems of repeated acquisition, data redundancy, and low efficiency in website forensics, this paper proposes an incremental acquisition method oriented to dynamic websites. The method realizes incremental collection from dynamically updated websites through web page acquisition and parsing, URL deduplication, web page denoising, content extraction, and hashing. Experiments show that the algorithm achieves relatively high acquisition precision and recall and can be combined with other data to perform effective digital forensics on dynamically updated, real-time websites.
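A minimal Python sketch of the incremental-collection idea: skip pages whose extracted content hash is unchanged since the last acquisition. Fetching, parsing, and denoising are stubbed out here; the paper's extraction and forensic-logging logic is not reproduced.

import hashlib

class IncrementalCollector:
    def __init__(self):
        self.content_hashes: dict[str, str] = {}   # url -> hash of extracted content

    def collect(self, url: str, extracted_content: str) -> bool:
        """Store the page only if it is new or its content changed since the
        last acquisition; return True when the page was (re)collected."""
        digest = hashlib.sha1(extracted_content.encode("utf-8")).hexdigest()
        if self.content_hashes.get(url) == digest:
            return False                            # duplicate: already acquired
        self.content_hashes[url] = digest
        # ... persist the page and its hash for the forensic record ...
        return True

if __name__ == "__main__":
    c = IncrementalCollector()
    print(c.collect("https://example.org/news/1", "first version"))    # True
    print(c.collect("https://example.org/news/1", "first version"))    # False, unchanged
    print(c.collect("https://example.org/news/1", "updated version"))  # True, re-collected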


2021 ◽  
Author(s):  
Katherine James ◽  
Aoesha Alsobhe ◽  
Simon Joseph Cockell ◽  
Anil Wipat ◽  
Matthew Pocock

Background: Probabilistic functional integrated networks (PFINs) are designed to aid our understanding of cellular biology and can be used to generate testable hypotheses about protein function. PFINs are generally created by scoring the quality of interaction datasets against a Gold Standard dataset, usually chosen from a separate high-quality data source, prior to their integration. Use of an external Gold Standard has several drawbacks, including data redundancy, data loss, and the need for identifier mapping, which can complicate the network build and impact PFIN performance. Results: We describe the development of an integration technique, ssNet, that scores and integrates both high-throughput and low-throughput data from a single source database in a consistent manner without the need for an external Gold Standard dataset. Using data from Saccharomyces cerevisiae, we show that ssNet is easier and faster, overcoming the challenges of data redundancy, Gold Standard bias, and ID mapping, while producing comparable performance. In addition, ssNet results in less data loss and produces a more complete network. Conclusions: The ssNet method allows PFINs to be built successfully from a single database while producing network performance comparable to networks scored using an external Gold Standard source. Keywords: Network integration; Bioinformatics; Gold Standards; Probabilistic functional integrated networks; Protein function prediction; Interactome.
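For context, a generic Python illustration of probabilistic network integration: each dataset carries a confidence score, and an edge supported by several datasets is combined with a noisy-OR. This is a common PFIN integration scheme shown only for orientation; it is not the ssNet scoring procedure, and the dataset names and scores are invented.

from collections import defaultdict

def noisy_or(scores: list[float]) -> float:
    """Combine independent evidence scores into one edge confidence."""
    p_none = 1.0
    for s in scores:
        p_none *= (1.0 - s)
    return 1.0 - p_none

def integrate(datasets: dict[str, tuple[float, set[frozenset[str]]]]) -> dict[frozenset[str], float]:
    """datasets maps name -> (dataset confidence, set of protein pairs)."""
    evidence: dict[frozenset[str], list[float]] = defaultdict(list)
    for confidence, pairs in datasets.values():
        for pair in pairs:
            evidence[pair].append(confidence)
    return {pair: noisy_or(scores) for pair, scores in evidence.items()}

if __name__ == "__main__":
    edge = frozenset({"YFL039C", "YBR118W"})      # an illustrative protein pair
    datasets = {
        "two_hybrid":   (0.4, {edge}),
        "coexpression": (0.6, {edge}),
    }
    print(integrate(datasets)[edge])              # 1 - 0.6*0.4 = 0.76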


2021 ◽  
Author(s):  
Vincent van der Bent ◽  
Amin Amin ◽  
Timothy Jadot

Abstract: With the advent of increased measurement and instrumentation in oil and gas upstream production infrastructure (in the wellbore, in subsea facilities, and in surface processing facilities), data from all sources can be integrated more effectively to produce consistent and robust production profiles. The proposed data integration methodology aims at identifying the sources of measurement and process errors and removing them from the system. This ensures quasi-error-free data for driving critical applications such as well rate determination from virtual and multiphase meters and production allocation schemes, to name a few. Confidence in the data is further enhanced by quantifying the uncertainty of each measured and unmeasured variable. Advanced Data Validation and Reconciliation (DVR) uses data redundancy to correct measurements: as more data are ingested into a modeling system, the statistics attached to each measurement become an important source of information for further improving its precision. DVR is an equation-based calculation process that combines data redundancy and conservation laws to correct measurements and convert them into accurate and reliable information. The methodology is used in upstream oil and gas, refineries, gas plants, petrochemical plants, and power plants, including nuclear. DVR detects faulty sensors and identifies degradation of equipment performance, and as such provides more robust inputs to operations, simulation, and automation processes. The DVR methodology is presented using field data from a producing offshore field. The discussion details the design and implementation of a DVR system that integrates all available field data from the wellbore and surface facilities. The integrated data in this end-to-end evaluation include reservoir productivity parameters, downhole and wellhead measurements, tuned vertical lift models, artificial lift devices, fluid sample analyses and thermodynamic models, and topside facility process measurements. The automated DVR iterative runs solve all conservation equations simultaneously when determining the “true values” of the production flow rates and their uncertainties. The DVR field application is successfully used in real time to ensure data consistency across a number of production tasks, including continual surveillance of the critical components of the production facility, evaluation and validation of well tests using multiphase flow metering, virtual flow metering of each well, modeling of fluid phase behavior in the well and in the multistage separation facility, and back allocation from sales meters to individual wells.
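A minimal data-reconciliation sketch in Python: three flow meters around a splitter are corrected by weighted least squares so that the mass balance total = out1 + out2 holds exactly. This is only the textbook DVR step; the field system described above solves many more conservation equations (wells, separators, thermodynamic models), and the numbers here are illustrative.

import numpy as np

measured = np.array([100.0, 58.0, 45.0])     # total, out1, out2 (e.g., t/h)
sigma    = np.array([1.0, 2.0, 2.0])         # meter standard uncertainties
A = np.array([[1.0, -1.0, -1.0]])            # conservation law: total - out1 - out2 = 0

# Reconciled values: x = m - S A^T (A S A^T)^-1 A m, with S = diag(sigma^2)
S = np.diag(sigma**2)
residual = A @ measured                       # imbalance of the raw measurements
correction = S @ A.T @ np.linalg.solve(A @ S @ A.T, residual)
reconciled = measured - correction.ravel()

print("raw imbalance:", residual[0])          # -3.0
print("reconciled   :", reconciled)           # meters with larger uncertainty move more
print("check balance:", A @ reconciled)       # ~0: conservation law now satisfied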


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Min Li ◽  
Jiashu Wu ◽  
Junbiao Dai ◽  
Qingshan Jiang ◽  
Qiang Qu ◽  
...  

Abstract: Current research on DNA storage usually focuses on improving storage density by developing effective encoding and decoding schemes, while lacking consideration of the uncertainty inherent in ultra-long-term data storage and retention. Consequently, current DNA storage systems are often not self-contained, meaning that they have to resort to external tools to restore the stored DNA data. This carries a high risk of data loss, since the required tools might not be available in the far future. To address this issue, we propose in this paper a self-contained DNA storage system that makes its stored data self-explanatory without relying on any external tool. To this end, we design a specific DNA file format in which a separate storage scheme reduces data redundancy while an effective index supports random read operations on the stored data file. We verified through experimental data that the proposed self-contained and self-explanatory method not only removes the reliance on external tools for data restoration but also minimises the data redundancy introduced when the amount of data to be stored reaches a certain scale.
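A minimal Python sketch of the self-describing, random-access idea: a plain-text header explains how to decode the payload, and a fixed-width offset table lets a single record be read without decoding the whole archive. The byte layout is hypothetical and is not the DNA file format proposed in the paper.

import struct

MAGIC = b"SELFDESC1"
HEADER = b"records are UTF-8 strings; index is big-endian uint32 offsets\n"

def pack(records: list[bytes]) -> bytes:
    """Concatenate records behind a human-readable header and an offset index."""
    body = b"".join(records)
    offsets, pos = [], 0
    for r in records:
        offsets.append(pos)
        pos += len(r)
    index = struct.pack(f">I{len(records)}I", len(records), *offsets)
    return MAGIC + struct.pack(">I", len(HEADER)) + HEADER + index + body

def read_record(blob: bytes, i: int) -> bytes:
    """Random read of record i using only the header and the offset table."""
    assert blob.startswith(MAGIC)
    hdr_len = struct.unpack_from(">I", blob, len(MAGIC))[0]
    idx_start = len(MAGIC) + 4 + hdr_len
    n = struct.unpack_from(">I", blob, idx_start)[0]
    offsets = struct.unpack_from(f">{n}I", blob, idx_start + 4)
    body_start = idx_start + 4 + 4 * n
    start = body_start + offsets[i]
    end = body_start + offsets[i + 1] if i + 1 < n else len(blob)
    return blob[start:end]

if __name__ == "__main__":
    blob = pack([b"chunk-A", b"chunk-B", b"chunk-C"])
    print(read_record(blob, 1))   # b'chunk-B', without scanning the whole file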

