Study on Data Compression Algorithm Based on Semantic Analysis

As a lossless data compression code, Huffman coding is widely used in text compression. However, the traditional approach has shortcomings: applying the same compression to all characters overlooks the special role of keywords and special statements, as well as the regularity of some statements. To address this, this paper proposes a new data compression algorithm based on semantic analysis. The method takes C language keywords as the basic element and targets text compression of C source files. Experimental results show that the compression ratio improves by roughly 150 percent. The method can be extended to text compression for other constrained languages.
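The idea of treating keywords as basic elements can be illustrated with a minimal sketch. It is not the paper's algorithm: the keyword subset, the helper names (`tokenize`, `huffman_code_lengths`, `encoded_bits`) and the choice to ignore the cost of storing the code table are all assumptions made for illustration.

```python
import heapq
import re
from collections import Counter

# Illustrative subset of C keywords (assumption: the paper's list is not given).
C_KEYWORDS = ("int", "return", "while", "if", "else", "for", "void", "char")
# A whole keyword becomes one token; anything else is split into characters.
TOKEN_RE = re.compile(r"\b(?:" + "|".join(C_KEYWORDS) + r")\b|.", re.DOTALL)


def tokenize(source):
    """Split C source into keyword tokens and single characters."""
    return TOKEN_RE.findall(source)


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap items: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


def encoded_bits(symbols):
    """Total payload size in bits (code table overhead is ignored)."""
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols)
```

Comparing `encoded_bits(tokenize(src))` with `encoded_bits(list(src))` for a C file `src` gives a rough feel for the effect of keyword tokens; the sketch makes no claim about reproducing the reported improvement.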

USE OF NEW EFFICIENT LOSSLESS DATA COMPRESSION METHOD IN TRANSMITTING ENCRYPTED BAPTISTA SYMMETRIC CHAOTIC CRYPTOSYSTEM DATA

Jurnal Teknologi ◽

10.11113/jt.v78.8976 ◽

2016 ◽

Vol 78 (6-4) ◽

Author(s):

Muhamad Azlan Daud ◽

Muhammad Rezal Kamel Ariffin ◽

S. Kularajasingam ◽

Che Haziqah Che Hussin ◽

Nurliyana Juhan ◽

...

Keyword(s):

Data Compression ◽

Data Transmission ◽

Compression Algorithm ◽

Huffman Coding ◽

Compression Method ◽

Compression Technique ◽

Chaotic Dynamical System ◽

Fast Encoding ◽

Compression Mechanism ◽

Lossless Data Compression

A new compression algorithm is proposed to make a modified Baptista symmetric cryptosystem, which is based on a chaotic dynamical system, practical. The Baptista symmetric cryptosystem is able to produce various ciphers in response to the same message input. This modified Baptista-type cryptosystem suffers from message expansion, which goes against the conventional methodology of a symmetric cryptosystem. A new lossless data compression algorithm based on ideas from Huffman coding is proposed for data transmission. This new compression mechanism does not face the problem of mapping elements from a domain that is much larger than its range. Our new algorithm circumvents this problem via a pre-defined codeword list. The proposed algorithm has fast encoding and decoding mechanisms and is proven analytically to be a lossless data compression technique.

A Codec Architecture for the Compression of Short Data Blocks

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126618500196 ◽

2017 ◽

Vol 27 (02) ◽

pp. 1850019 ◽

Cited By ~ 6

Author(s):

Jürgen Freudenberger ◽

Mohammed Rajab ◽

Daniel Rohweder ◽

Malek Safieh

Keyword(s):

Data Compression ◽

Storage Systems ◽

Compression Algorithm ◽

Huffman Coding ◽

Compression Scheme ◽

Data Encoding ◽

Block Level ◽

Lossless Data Compression ◽

Block Sizes

This work proposes a lossless data compression algorithm for short data blocks. The proposed compression scheme combines a modified move-to-front algorithm with Huffman coding. This algorithm is applicable in storage systems where data compression is performed at block level with short block sizes, in particular in non-volatile memories. For block sizes in the range of 1 kB, it provides a compression gain comparable to the Lempel–Ziv–Welch algorithm. Moreover, encoder and decoder architectures are proposed that have low memory requirements and provide fast data encoding and decoding.
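For orientation, the textbook move-to-front transform can be sketched as follows. This is the standard variant, not the modified one the paper proposes, and the function names `mtf_encode` and `mtf_decode` are hypothetical. Recently seen bytes get small indices, which skews the symbol distribution in a way a subsequent Huffman coder can exploit.

```python
def mtf_encode(data):
    """Standard move-to-front: emit each byte's current position in the
    table, then move that byte to the front."""
    table = list(range(256))
    out = []
    for b in data:
        idx = table.index(b)
        out.append(idx)
        table.pop(idx)
        table.insert(0, b)
    return out


def mtf_decode(indices):
    """Inverse transform: replay the same table updates from the indices."""
    table = list(range(256))
    out = bytearray()
    for idx in indices:
        b = table.pop(idx)
        out.append(b)
        table.insert(0, b)
    return bytes(out)
```

For example, `mtf_encode(b"abab")` yields `[97, 98, 1, 1]`: once both bytes have been seen, the alternation is expressed with the small index 1.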

Lossless text compression using GPT-2 language model and Huffman coding

SHS Web of Conferences ◽

10.1051/shsconf/202110204013 ◽

2021 ◽

Vol 102 ◽

pp. 04013

Author(s):

Md. Atiqur Rahman ◽

Mohamed Hamada

Keyword(s):

Data Compression ◽

State Of The Art ◽

Language Model ◽

Huffman Coding ◽

Original Text ◽

Text Compression ◽

Compression Technique ◽

Daily Life Activities ◽

Burrows Wheeler Transform ◽

Compressed Data

Modern daily life activities produce large amounts of information with the advancement of telecommunication. Storing this information on a digital device or transmitting it over the Internet is challenging, which creates the need for data compression. Research on data compression has therefore become a topic of great interest. Because compressed data is generally smaller than the original, data compression saves storage and increases transmission speed. In this article, we propose a text compression technique using the GPT-2 language model and Huffman coding. In the proposed method, the Burrows-Wheeler transform and a list of keys are used to reduce the original text file’s length. We then apply the GPT-2 language model followed by Huffman coding for encoding. The proposed method is compared with state-of-the-art text compression techniques and shows a gain in compression ratio over them.
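The Burrows-Wheeler transform mentioned in the abstract can be illustrated with a naive sketch. It covers only the transform itself, not the key list or the GPT-2 stage; the function names and the end-marker convention are assumptions, and sorting all rotations is far too slow for real files.

```python
def bwt(text, end="\0"):
    """Naive Burrows-Wheeler transform: sort all rotations of text + end
    and read off the last column. `end` must not occur in `text`."""
    assert end not in text
    s = text + end
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def inverse_bwt(last, end="\0"):
    """Naive inverse: repeatedly prepend the last column and re-sort,
    which rebuilds the sorted rotation table one column at a time."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    # The original text is the rotation that ends with the end marker.
    row = next(r for r in table if r.endswith(end))
    return row[:-1]
```

With `$` as the end marker, `bwt("banana", "$")` gives `"annb$aa"`; the transform groups equal characters together, which tends to help later coding stages.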

Constructing Binary Huffman Tree

Formalized Mathematics ◽

10.2478/forma-2013-0015 ◽

2013 ◽

Vol 21 (2) ◽

pp. 133-143

Author(s):

Hiroyuki Okazaki ◽

Yuichi Futa ◽

Yasunari Shidama

Keyword(s):

Data Compression ◽

Binary Code ◽

Lossless Compression ◽

Huffman Coding ◽

Compression Algorithms ◽

Huffman Encoding ◽

Lossless Data Compression ◽

Entropy Encoding

Summary. Huffman coding is one of the most famous entropy encoding methods for lossless data compression [16]. JPEG and ZIP formats employ variants of Huffman encoding as lossless compression algorithms. Huffman coding is a bijective map from source letters into leaves of the Huffman tree constructed by the algorithm. In this article we formalize an algorithm constructing a binary code tree, the Huffman tree.
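As an informal companion to the formalization, the classical greedy construction of a binary Huffman tree can be sketched as below. The representation (a leaf is a one-character string, an internal node is a `(left, right)` tuple) and the function names are assumptions for illustration; the input is assumed to be non-empty.

```python
import heapq
from collections import Counter


def build_huffman_tree(text):
    """Repeatedly merge the two lightest subtrees until one tree remains."""
    heap = [(count, i, sym) for i, (sym, count) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    tie = len(heap)  # unique tie-breaker so trees are never compared
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (c1 + c2, tie, (left, right)))
        tie += 1
    return heap[0][2]


def code_table(tree, prefix=""):
    """Map each source letter to the bit string of the path to its leaf."""
    if isinstance(tree, str):
        return {tree: prefix or "0"}  # single-letter alphabet gets code "0"
    left, right = tree
    table = code_table(left, prefix + "0")
    table.update(code_table(right, prefix + "1"))
    return table


def decode(bits, tree):
    """Walk the tree bit by bit, emitting a letter at every leaf."""
    if isinstance(tree, str):
        return tree * len(bits)
    out, node = [], tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):
            out.append(node)
            node = tree
    return "".join(out)
```

Each distinct letter ends up at exactly one leaf, so `code_table` is one-to-one between letters and leaf paths, and concatenating the codes of a message can be decoded unambiguously with `decode`.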

Hybrid indexes for repetitive datasets

Philosophical Transactions of The Royal Society A Mathematical Physical and Engineering Sciences ◽

10.1098/rsta.2013.0137 ◽

2014 ◽

Vol 372 (2016) ◽

pp. 20130137 ◽

Cited By ~ 16

Author(s):

H. Ferrada ◽

T. Gagie ◽

T. Hirvola ◽

S. J. Puglisi

Keyword(s):

DNA Sequencing ◽

Data Compression ◽

Simple Technique ◽

Upper Bounds ◽

Compression Algorithm ◽

Original Text ◽

Human Genomes ◽

Lossless Data Compression

Advances in DNA sequencing mean that databases of thousands of human genomes will soon be commonplace. In this paper, we introduce a simple technique for reducing the size of conventional indexes on such highly repetitive texts. Given upper bounds on pattern lengths and edit distances, we pre-process the text with the lossless data compression algorithm LZ77 to obtain a filtered text, for which we store a conventional index. Later, given a query, we find all matches in the filtered text, then use their positions and the structure of the LZ77 parse to find all matches in the original text. Our experiments show that this also significantly reduces query times.
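The LZ77 parse that the technique builds on can be sketched with a naive quadratic-time factorization. This shows only the parse, not the filtered text or the hybrid index; the `(offset, length, next_char)` phrase format and the function names are assumptions for illustration.

```python
def lz77_parse(text):
    """Greedy LZ77 factorization into (offset, length, next_char) phrases.
    Each phrase copies `length` characters starting `offset` positions back
    (the copy may overlap itself), then appends one literal character."""
    i, phrases = 0, []
    while i < len(text):
        best_len, best_off = 0, 0
        for j in range(i):
            k = 0
            # Always leave one character for the literal that ends the phrase.
            while i + k < len(text) - 1 and text[j + k] == text[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        phrases.append((best_off, best_len, text[i + best_len]))
        i += best_len + 1
    return phrases


def lz77_decode(phrases):
    """Rebuild the text by replaying the copies character by character."""
    out = []
    for off, length, ch in phrases:
        start = len(out) - off
        for k in range(length):
            out.append(out[start + k])
        out.append(ch)
    return "".join(out)
```

On a highly repetitive input such as `"abababab"` the parse has only three phrases, `[(0, 0, 'a'), (0, 0, 'b'), (2, 5, 'b')]`, which is the kind of structure that makes repetitive collections compressible.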

Information technology. Adaptive lossless data compression algorithm (ALDC)

10.3403/01064935 ◽

1997 ◽

Keyword(s):

Information Technology ◽

Data Compression ◽

Compression Algorithm ◽

Lossless Data Compression

Information technology. Adaptive lossless data compression algorithm (ALDC)

10.3403/01064935u ◽

2015 ◽

Keyword(s):

Information Technology ◽

Data Compression ◽

Compression Algorithm ◽

Lossless Data Compression

A Lossless Data Compression Algorithm for Wireless Sensor Networks Based on Linear Regression Model

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.403-408.2441 ◽

2011 ◽

Vol 403-408 ◽

pp. 2441-2444

Author(s):

Hong Zhi Lu ◽

Xue Jun Ren

Keyword(s):

Linear Regression ◽

Regression Model ◽

Data Compression ◽

Linear Regression Model ◽

Compression Algorithm ◽

Sensor Data ◽

Simple Linear Regression ◽

Compression Algorithms ◽

One Dimensional ◽

Lossless Data Compression

Based on the theory of the simple linear regression model, this paper designs a lossless sensor data compression algorithm using a one-dimensional linear regression model. The algorithm computes linear fitting values for the differences of the sensor data, together with the fitting residuals, which are fed to a normal-distribution entropy encoder for compression. Compared with two typical lossless compression algorithms, the proposed algorithm achieves better compression ratios.
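The prediction step can be illustrated with a rough sketch, under several assumptions not stated in the abstract: readings are integers, there are at least two of them, a single least-squares line is fitted to the first differences, and the entropy encoder is omitted. The function names are hypothetical.

```python
def fit_line(values):
    """Ordinary least-squares line through the points (0, v0), (1, v1), ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def compress(readings):
    """Return the first reading, the line parameters and integer residuals
    of the first differences around the rounded fitted line."""
    diffs = [b - a for a, b in zip(readings, readings[1:])]
    slope, intercept = fit_line(diffs)
    residuals = [d - round(slope * x + intercept) for x, d in enumerate(diffs)]
    return readings[0], slope, intercept, residuals


def decompress(first, slope, intercept, residuals):
    """Recompute the same rounded predictions and add the residuals back."""
    out = [first]
    for x, r in enumerate(residuals):
        out.append(out[-1] + r + round(slope * x + intercept))
    return out
```

The scheme is lossless because the decoder repeats exactly the same rounded prediction as the encoder. The residuals are typically small and clustered around zero, which is what an entropy coder would then exploit.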
