ABSTRACT: This article describes an optimization method for entropy encoding applicable to a source of independent and identically distributed random variables. The algorithm can be explained with the following example: take an i.i.d. source X with uniform probability distribution and cardinality 10, and use it to generate messages of length 1000 that will be encoded in base 10. We call XG the set containing all messages that the source can generate. According to Shannon's first theorem, if the average entropy of X, calculated over the set XG, is H(X)≈0.9980, the average length of the encoded messages will be 1000*H(X)=998. Next, we increase the message length by one and calculate the average entropy of the 10% of the sequences of length 1001 having the lowest entropy; we call this set XG10. The average entropy of X10, calculated over the set XG10, is H(X10)≈0.9964, so the average length of the encoded messages will be 1001*H(X10)≈997.4. Taking the difference between the average encoded lengths for the two sets (XG and XG10) gives 998.0-997.4=0.6. Therefore, if we use the set XG10, we reduce the average length of the encoded message by 0.6 base-ten digits. Consequently, the average information per symbol becomes 997.4/1000=0.9974, which is less than the average entropy of X, H(X)≈0.998. We can use the set XG10 in place of the set XG because we can create a one-to-one correspondence between all the possible sequences generated by our source and the ten percent of the length-1001 sequences with the lowest entropy. In this article, we will show that this transformation can be performed by applying random variations to the sequences generated by the source.
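The arithmetic above can be checked numerically. The following is a minimal Monte Carlo sketch (not part of the article): it assumes that the "average entropy calculated on a set" means the mean empirical (plug-in) per-symbol entropy, in base 10, of the sequences in that set, and it approximates the sets XG and XG10 by random sampling rather than by enumeration, so its figures only approximate those quoted above. The names ALPHABET, N_SAMPLES, and empirical_entropy_base10 are illustrative choices, not notation from the article.

```python
# Illustrative sketch: estimate the average per-symbol plug-in entropy (base 10)
# of (a) random messages of length 1000 and (b) the 10% lowest-entropy messages
# of length 1001, then compare the resulting average encoded lengths.
import numpy as np

rng = np.random.default_rng(0)

ALPHABET = 10        # cardinality of the i.i.d. source X
N_SAMPLES = 20_000   # sampled messages per experiment (assumption: sampling
                     # stands in for enumerating the sets XG and XG10)

def empirical_entropy_base10(msg: np.ndarray) -> float:
    """Plug-in entropy per symbol of one message, in base-10 units."""
    counts = np.bincount(msg, minlength=ALPHABET)
    p = counts[counts > 0] / msg.size
    return float(-(p * np.log10(p)).sum())

def sample_entropies(length: int) -> np.ndarray:
    """Empirical entropies of N_SAMPLES random messages of the given length."""
    return np.array([
        empirical_entropy_base10(rng.integers(0, ALPHABET, size=length))
        for _ in range(N_SAMPLES)
    ])

# Set XG: messages of length 1000 drawn from the source.
h_1000 = sample_entropies(1000)
H_X = h_1000.mean()                       # comes out near 0.998
len_XG = 1000 * H_X                       # near 998 base-ten digits

# Set XG10: the 10% of length-1001 messages with the lowest entropy.
h_1001 = sample_entropies(1001)
cutoff = np.quantile(h_1001, 0.10)
H_X10 = h_1001[h_1001 <= cutoff].mean()   # comes out near 0.996
len_XG10 = 1001 * H_X10                   # a few tenths of a digit below len_XG

print(f"H(X)   ~ {H_X:.4f}   average encoded length ~ {len_XG:.1f}")
print(f"H(X10) ~ {H_X10:.4f}   average encoded length ~ {len_XG10:.1f}")
print(f"saving ~ {len_XG - len_XG10:.2f} digits, "
      f"information per source symbol ~ {len_XG10 / 1000:.4f}")
```

Sampling is used here because enumerating all 10^1000 messages is infeasible; since every sequence from the uniform source is equiprobable, the sampled means converge to the set averages described in the abstract, although the exact saving depends on how those averages are defined in the article.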