Algorithm: Huffman Coding
Huffman coding is a procedure for generating optimal prefix codes. It assigns the longest codewords to least probable symbols:
- Take the two least probable symbols in the alphabet.
- Prepend bits
and to current codewords of symbols. - Combine these two symbols into a new symbol with probability equal to the sum of their probabilities.
- Repeat until all symbols are combined.
Example Python code can be found here. e.g. Suppose
and . Then the whole procedure is as follows: Therefore we have the output Huffman code
.
Proposition
The Huffman coding algorithm generates an optimal prefix code.
Remark
Though Huffman coding is simple and efficient, as it assumes a fixed distribution of symbols, Huffman codes are the best possible symbol code, but symbol coding is not always the best type of code.