Compression

Definition

Def Data Compression Data compression is the process if replacing a message with a smaller message which can be reliably converted back to the origin.

Formalising Coding

Definition

Def Source Code We denote the set of all finite binary strings by . Given an ensemble , a function is a source code for . The number of symbols in is the length of the codeword for .

Definition

Def Extension of Source Codes The extension of a source code is defined by where is the set of all finite sequences of symbols from . e.g. The code names outcomes from by , , . The length of the codeword for each outcome is . The extension of gives .

Definition

Def Uniform Code Let be an ensemble, and be a source code. We say is a uniform code if is the same for all . We say is a variable-length code otherwise.

Lossless Code

Let be an ensemble, and be a source code. We say is a lossless code if is injective. Otherwise, we say is a lossy code. e.g. Let . Then , , , is uniform lossless; while , , , is uniform lossy.

Formalising Compression

Definition

Def Raw Bit Content If is an ensemble with outcome set then its raw bit content is where is the cardinality of .

Comment

There is an inherent trade off between the number of bits required in a uniform lossy code and the probability of being able to code an outcome.

Definition

Def Smallest -sufficient Subset Let be an ensemble and for , define the smallest -sufficient subset to be the smallest subset of such that e.g.

Algorithm

Algorithm Find The Smallest -sufficient Subset Intuitively, construct by removing elements of in ascending order of probability, till we have reached the threshold.

Definition

Def Essential Bit Content Let be an ensemble the for the essential bit content of is where is the smallest -sufficient subset.

Typical Sets and AEP

Extended Ensemble

The extended ensemble of blocks of size is denoted . Outcomes from are denoted . The probability of is defined to be . e.g. Let be an ensemble with outcomes with and . Then (3 flips of the coin) has outcomes with and .

Definition

Def Typical Set For closeness the typical set for an ensemble is

Remark

The name “typical” is used since � will have roughly occurrences of symbol , occurrences of symbol , and so on, where are the probabilities of the symbols in .

Thrm Source Coding Theorem Let be an ensemble with entropy bits. Given and , there exists a positive integer such that for all we have Proof Recall that for the typical set , we have By AEP,

Remark

The source coding theorem shows that given a tiny probability of error, the average bits per outcome can be made as close to as required. Even if we allow a large probability of error, we cannot compress more than bits per outcome for large sequences.