Interval Coding

Modified Cumulative Distribution Function

Given a discrete probability distribution , we define the modified cumulative distribution function as: where is the usual cumulative distribution function.

Shannon-Fano-Elias Coding

For each outcome ,

  1. Calculate the value of modified cumulative distribution function .
  2. Calculate the length of the required binary string .
  3. Convert the value to binary .
  4. Trim the binary string to the required length , assign the trimmed binary string to the outcome as its codeword.

e.g. Suppose has outcomes and probabilities . Then we have the following table:

4
5
3
3

Therefore we have the code .

Lossless Property

Proposition

The Shannon-Fano-Elias code is lossless.

Proof Denote the Shannon-Fano-Elias code as for the th outcome . Then we have That is the codeword lies entirely in the interval between and , and these intervals are disjoint. Therefore, the code is lossless.

Prefix Property

Proposition

Shannon-Fano-Elias code is prefix free.

Proof According to the previous argument, we know that . Additionally, Hence The intervals for each codeword are thus trivially disjoint, since we know each of the interval is disjoint, thus the code is prefix free.

Shannon-Fano-Elias Decoding

Shannon-Fano-Elias Decoding

To decode a given bitstring,

  1. start with the first bit, compute the corresponding binary interval;
  2. if the interval is strictly contained within that of a codeword:
  3. output the corresponding outcome
  4. skip over any redundant bits for this codeword
  5. repeat the process with the remaining bits
  6. otherwise include next bit and compute the corresponding interval.

e.g. Suppose we have with and we want to decode : shannon_fano_elias_decoding.jpeg|450 We could actually stop the process once we see since .

Expected Length

Theorem

The expected length of a Shannon-Fano-Elias code on ensemble satisfies:

Proof The lower bound is similarly obtained.