Algorithm: Huffman Coding

Huffman coding is a procedure for generating optimal prefix codes. It assigns the longest codewords to least probable symbols:

  1. Take the two least probable symbols in the alphabet.
  2. Prepend bits and to current codewords of symbols.
  3. Combine these two symbols into a new symbol with probability equal to the sum of their probabilities.
  4. Repeat until all symbols are combined.

Example Python code can be found here. e.g. Suppose and . Then the whole procedure is as follows: |500 Therefore we have the output Huffman code .

Proposition

The Huffman coding algorithm generates an optimal prefix code.

Remark

Though Huffman coding is simple and efficient, as it assumes a fixed distribution of symbols, Huffman codes are the best possible symbol code, but symbol coding is not always the best type of code.