data deduplication Latest Research Papers

To meet user requirements for speed, capacity, storage efficiency, and security, and with the goal of reducing data redundancy and storage space, an unbalanced big data compatible cloud storage method based on redundancy elimination technology is proposed. A new big data acquisition platform is designed based on Hadoop and NoSQL technologies, and efficient acquisition of unbalanced data is realized through this platform. The collected data are classified and processed by a classifier. The classified unbalanced big data are compressed with the Huffman algorithm, and data security is improved through encryption. Based on the processing results, redundancy in the big data is handled with a data deduplication algorithm, and a cloud platform is designed to store the redundancy-processed data in the cloud. The results show that the proposed method achieves a high deduplication rate and deduplication speed, requires little storage space, and effectively reduces the burden of data storage.
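The abstract names Huffman coding as the compression step but gives no detail. As a rough illustration only, the following is a minimal textbook Huffman coder in Python; it is not the authors' implementation, and the function names (`huffman_codes`, `encode`, `decode`) and the sample input are assumptions made for this sketch.

```python
import heapq
from collections import Counter


def huffman_codes(data: bytes) -> dict:
    """Build a prefix-free bit-string code for each byte value in data."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap items: (weight, unique tie breaker, {symbol: code so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(data: bytes, codes: dict) -> str:
    """Concatenate the code of every byte into one bit string."""
    return "".join(codes[b] for b in data)


def decode(bits: str, codes: dict) -> bytes:
    """Invert encode() by matching prefixes against the code table."""
    inverse = {c: s for s, c in codes.items()}
    out, current = bytearray(), ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return bytes(out)


# Hypothetical sample input with a skewed symbol distribution.
sample = b"aaaaaaaabbbbccd"
codes = huffman_codes(sample)
bits = encode(sample, codes)
print(len(bits), "bits instead of", 8 * len(sample))
```

Frequent symbols receive shorter codes, so skewed inputs shrink the most. A real pipeline would also need to store the code table alongside the compressed bits.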

The Analysis and Implication of Data Deduplication in Digital Forensics

Cyberspace Safety and Security - Lecture Notes in Computer Science ◽

10.1007/978-3-030-94029-4_14 ◽

2022 ◽

pp. 198-215

Author(s):

Izabela Savić ◽

Xiaodong Lin

Keyword(s):

Digital Forensics ◽

Data Deduplication

Comparison of Ciphertext Features for Data Deduplication

2021 International Conference on Big Data Analytics for Cyber-Physical System in Smart City - Lecture Notes on Data Engineering and Communications Technologies ◽

10.1007/978-981-16-7469-3_106 ◽

2022 ◽

pp. 941-948

Author(s):

Kejie Zhao ◽

Dongfang Jia ◽

Jun Ye

Keyword(s):

Data Deduplication

Privacy-Enhanced Data Deduplication Computational Intelligence Technique for Secure Healthcare Applications

Computers Materials & Continua ◽

10.32604/cmc.2022.019277 ◽

2022 ◽

Vol 70 (2) ◽

pp. 4169-4184

Author(s):

Jinsu Kim ◽

Sungwook Ryu ◽

Namje Park

Keyword(s):

Computational Intelligence ◽

Data Deduplication ◽

Healthcare Applications ◽

Computational Intelligence Technique ◽

Intelligence Technique

VeriDedup: A Verifiable Cloud Data Deduplication Scheme with Integrity and Duplication Proof

IEEE Transactions on Dependable and Secure Computing ◽

10.1109/tdsc.2022.3141521 ◽

2022 ◽

pp. 1-1

Author(s):

Xixun Yu ◽

Hui Bai ◽

Zheng Yan ◽

Rui Zhang

Keyword(s):

Data Deduplication ◽

Cloud Data

Improving the Performance of Deduplication-Based Backup Systems via Container Utilization Based Hot Fingerprint Entry Distilling

ACM Transactions on Storage ◽

10.1145/3459626 ◽

2021 ◽

Vol 17 (4) ◽

pp. 1-23

Author(s):

Datong Zhang ◽

Yuhui Deng ◽

Yi Zhou ◽

Yifeng Zhu ◽

Xiao Qin

Keyword(s):

High Efficiency ◽

False Positive Rate ◽

Bloom Filter ◽

Data Locality ◽

Data Deduplication ◽

Backup System ◽

Memory Overhead ◽

Data Fragmentation ◽

Positive Rate ◽

Salient Features

Data deduplication techniques construct an index consisting of fingerprint entries to identify and eliminate duplicated copies of repeating data. The bottleneck of disk-based index lookup and data fragmentation caused by eliminating duplicated chunks are two challenging issues in data deduplication. Deduplication-based backup systems generally employ containers storing contiguous chunks together with their fingerprints to preserve data locality for alleviating the two issues, which is still inadequate. To address these two issues, we propose a container utilization based hot fingerprint entry distilling strategy to improve the performance of deduplication-based backup systems. We divide the index into three parts: hot fingerprint entries, fragmented fingerprint entries, and useless fingerprint entries. A container with utilization smaller than a given threshold is called a sparse container. Fingerprint entries that point to non-sparse containers are hot fingerprint entries. For the remaining fingerprint entries, if a fingerprint entry matches any fingerprint of forthcoming backup chunks, it is classified as a fragmented fingerprint entry. Otherwise, it is classified as a useless fingerprint entry. We observe that hot fingerprint entries account for a small part of the index, whereas the remaining fingerprint entries account for the majority of the index. This intriguing observation inspires us to develop a hot fingerprint entry distilling approach named HID. HID segregates useless fingerprint entries from the index to improve memory utilization and bypass disk accesses. In addition, HID separates fragmented fingerprint entries to make a deduplication-based backup system directly rewrite fragmented chunks, thereby alleviating adverse fragmentation. Moreover, HID introduces a feature to treat fragmented chunks as unique chunks. This feature compensates for the shortcoming that a Bloom filter cannot directly identify certain duplicated chunks (i.e., the fragmented chunks). To take full advantage of the preceding feature, we propose an evolved HID strategy called EHID. EHID incorporates a Bloom filter, to which only hot fingerprints are mapped. In doing so, EHID exhibits two salient features: (i) EHID avoids disk accesses to identify unique chunks and the fragmented chunks; (ii) EHID slashes the false positive rate of the integrated Bloom filter. These salient features push EHID into the high-efficiency mode. Our experimental results show our approach reduces the average memory overhead of the index by 34.11% and 25.13% when using the Linux dataset and the FSL dataset, respectively. Furthermore, compared with the state-of-the-art method HAR, EHID boosts the average backup throughput by up to a factor of 2.25 with the Linux dataset, and EHID reduces the average disk I/O traffic by up to 66.21% when it comes to the FSL dataset. EHID also marginally improves the system's restore performance.
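The three-way split of the index described in this abstract can be sketched in a few lines of Python. This is only an illustration of the classification rule as the abstract states it, not the HID/EHID code: `FingerprintEntry`, `classify_entries`, the toy `BloomFilter`, the 0.5 threshold, and the sample data are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FingerprintEntry:
    fingerprint: str   # chunk hash (hypothetical string form)
    container_id: int  # container that holds the chunk


def classify_entries(entries, utilization, upcoming_fingerprints, threshold):
    """Split index entries into hot, fragmented, and useless lists."""
    upcoming = set(upcoming_fingerprints)
    hot, fragmented, useless = [], [], []
    for entry in entries:
        if utilization[entry.container_id] >= threshold:
            hot.append(entry)         # points to a non-sparse container
        elif entry.fingerprint in upcoming:
            fragmented.append(entry)  # sparse container, chunk reappears
        else:
            useless.append(entry)     # sparse container, never matched
    return hot, fragmented, useless


class BloomFilter:
    """Toy Bloom filter; the sizes are arbitrary assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Hypothetical index: container 1 is well utilized, container 2 is sparse.
entries = [
    FingerprintEntry("fp-a", 1),
    FingerprintEntry("fp-b", 2),
    FingerprintEntry("fp-c", 2),
]
utilization = {1: 0.9, 2: 0.2}
hot, fragmented, useless = classify_entries(
    entries, utilization, upcoming_fingerprints={"fp-b"}, threshold=0.5
)

# As in the EHID description, only hot fingerprints go into the Bloom filter.
hot_filter = BloomFilter()
for entry in hot:
    hot_filter.add(entry.fingerprint)
```

Inserting fewer fingerprints into a Bloom filter of fixed size sets fewer bits, which is why restricting it to hot entries lowers the false positive rate.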

Study on Deduplication on Distributed Cloud Environment

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-2111 ◽

2021 ◽

pp. 248-252

Author(s):

Pradeep Nayak ◽

Poornachandra S ◽

Pawan J Acharya ◽

Shravya ◽

Shravani

Keyword(s):

Data Storage ◽

Data Privacy ◽

Data Deduplication ◽

Storage Unit ◽

Work Related ◽

Detection Analysis ◽

Information Event ◽

The Common ◽

Distributed Cloud ◽

Big Data Storage

Data deduplication techniques were designed to eliminate duplicate data so that only a single copy of each piece of data is stored. Deduplication reduces the disk space needed to store backups by tracking and eliminating additional copies of the same data inside the storage unit. Only one instance of the data is stored initially; subsequent instances are given a reference pointer to the original stored data. In a big data storage environment, a huge amount of data needs to be kept secure, so proper management, fraud detection, and analysis of data privacy are important topics to consider. This paper examines and evaluates the common deduplication techniques, which are presented in tabular form. The review finds that the confidentiality and security of data are compromised at many levels in common deduplication techniques. Although much research is being carried out in different areas of cloud computing, work on this topic remains scarce.
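The single-instance idea in this abstract (store the data once, give later occurrences a reference pointer) can be illustrated with a small in-memory store. This is a minimal sketch under assumptions: the `DedupStore` class, its method names, and the use of SHA-256 as the fingerprint are choices made for the example and are not taken from the paper.

```python
import hashlib


class DedupStore:
    """Keeps one copy per distinct content; names map to fingerprints."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> data, stored once
        self.files = {}   # file name -> fingerprint (the reference pointer)

    def put(self, name: str, data: bytes) -> bool:
        """Store data under name; return True if a new copy was written."""
        fingerprint = hashlib.sha256(data).hexdigest()
        is_new = fingerprint not in self.blocks
        if is_new:
            self.blocks[fingerprint] = data
        self.files[name] = fingerprint
        return is_new

    def get(self, name: str) -> bytes:
        """Follow the reference pointer back to the stored copy."""
        return self.blocks[self.files[name]]


store = DedupStore()
first = store.put("report-v1.txt", b"quarterly numbers")
second = store.put("report-copy.txt", b"quarterly numbers")
print(first, second, len(store.blocks))
```

Because the fingerprint is derived from the plaintext, anyone who can query the store learns whether a given piece of content already exists, which is one way the privacy concerns raised in the abstract can arise.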

Hekate a tool for gauging Data Deduplication Performance

10.1109/smartcloud52277.2021.00019 ◽

2021 ◽

Author(s):

Lars Nielsen ◽

Daniel E. Lucani

Keyword(s):

Data Deduplication

An identity-based proxy re-encryption for data deduplication in cloud

Journal of Systems Architecture ◽

10.1016/j.sysarc.2021.102332 ◽

2021 ◽

pp. 102332

Author(s):

Ge Kan ◽

Chunhua Jin ◽

Huihui Zhu ◽

Yongliang Xu ◽

Nian Liu

Keyword(s):

Data Deduplication ◽

Identity Based

data deduplication
Recently Published Documents

A HYBRID ENCRYPTION FOR SECURE DATA DEDUPLICATION IN CLOUD

Unbalanced Big Data-Compatible Cloud Storage Method Based on Redundancy Elimination Technology

The Analysis and Implication of Data Deduplication in Digital Forensics

Comparison of Ciphertext Features for Data Deduplication

Privacy-Enhanced Data Deduplication Computational Intelligence Technique for Secure Healthcare Applications

VeriDedup: A Verifiable Cloud Data Deduplication Scheme with Integrity and Duplication Proof

Improving the Performance of Deduplication-Based Backup Systems via Container Utilization Based Hot Fingerprint Entry Distilling

Study on Deduplication on Distributed Cloud Environment

Hekate a tool for gauging Data Deduplication Performance

An identity-based proxy re-encryption for data deduplication in cloud
