Entropy
Ensemble
An ensemble $X$
is a triple $(x, \mathcal{A}_X, \mathcal{P}_X)$, where the outcome $x$ is a random variable taking values in $\mathcal{A}_X = \{a_1, a_2, \dots, a_I\}$ with probabilities $\mathcal{P}_X = \{p_1, p_2, \dots, p_I\}$.
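For example, a fair six-sided die is the ensemble with $\mathcal{A}_X = \{1, 2, 3, 4, 5, 6\}$ and $p_i = 1/6$ for each outcome.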
Information Content
The information content of an outcome $x$
with probability $P(x)$ is defined as
$$h(x) = \log \frac{1}{P(x)} = -\log P(x).$$
Warning
The choice of the logarithm base is arbitrary. Normally we use base $2$
and measure information in bits. All of the following content is restricted to base $2$.
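As a quick numerical illustration (a minimal sketch; the probabilities are made up for the example), an outcome with probability $1/8$ carries $-\log_2(1/8) = 3$ bits of information:

```python
import math

def information_content(p: float) -> float:
    """Information content h(x) = -log2 P(x), in bits."""
    return -math.log2(p)

print(information_content(1 / 8))  # 3.0 bits: a 1-in-8 outcome
print(information_content(1 / 2))  # 1.0 bit: a fair coin flip
```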
Shannon Entropy
The entropy of a random variable $X$ is defined as the average information content:
$$H(X) = \sum_{x \in \mathcal{A}_X} P(x) \log_2 \frac{1}{P(x)} = -\sum_{x \in \mathcal{A}_X} P(x) \log_2 P(x).$$
Remark
Entropy describes the uncertainty of a random variable.
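A minimal computational sketch of the definition (the coin biases are arbitrary): a fair coin has entropy $1$ bit, while a heavily biased coin is far less uncertain.

```python
import math

def entropy(p: list[float]) -> float:
    """Shannon entropy H(X) = -sum_i p_i log2 p_i, in bits (terms with p_i = 0 contribute 0)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

print(entropy([0.5, 0.5]))  # 1.0 bit   (fair coin)
print(entropy([0.9, 0.1]))  # ~0.47 bits (biased coin: less uncertainty)
```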
Proposition
Entropy has the following properties:
- $H(X) \ge 0$.
- The more sharply peaked the distribution, the lower the entropy; the more evenly spread the distribution, the higher the entropy.
- $H(X) \le \log_2 |\mathcal{A}_X|$, with equality if and only if $X$ is uniformly distributed.
- Entropy is a lower bound on the average number of bits needed to transmit the state of a random variable.
- The number of binary questions needed to identify the outcome lies between $H(X)$ and $H(X) + 1$.
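A small numerical check of these properties (the three-outcome distributions are arbitrary examples): a sharply peaked distribution has entropy near $0$, and the uniform distribution attains the maximum $\log_2 3 \approx 1.585$ bits.

```python
import numpy as np

def H(p):
    """Shannon entropy in bits of a probability vector p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # convention: 0 * log 0 = 0
    return float(-(p * np.log2(p)).sum())

print(H([0.98, 0.01, 0.01]))   # ~0.16 bits: sharply peaked -> low entropy
print(H([0.5, 0.3, 0.2]))      # ~1.49 bits
print(H([1/3, 1/3, 1/3]))      # ~1.585 bits: uniform attains the maximum
print(np.log2(3))              # the upper bound log2 |A_X|
```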
Theorem
Consider a discrete random variable $X$
taking on values from the finite set $\mathcal{A}_X = \{a_1, \dots, a_n\}$. Let $p_i = P(X = a_i)$ be the probability of each state, with $\sum_{i=1}^n p_i = 1$. Denote the vector of probabilities by $\mathbf{p} = (p_1, \dots, p_n)$. Then the entropy is maximized iff $\mathbf{p}$ is uniform. That is,
$$H(\mathbf{p}) \le \log_2 n,$$
with equality iff $p_i = 1/n$ for all $i$.
Proof. The objective function is the Lagrangian
$$J(\mathbf{p}, \lambda) = -\sum_{i=1}^n p_i \log_2 p_i + \lambda \left( 1 - \sum_{i=1}^n p_i \right).$$
Setting $\partial J / \partial p_i = -\log_2 p_i - \tfrac{1}{\ln 2} - \lambda = 0$ gives the same value of $p_i$ for every $i$; together with the constraint $\sum_i p_i = 1$ this forces $p_i = 1/n$, and since the objective is concave in $\mathbf{p}$ this stationary point is the global maximum, with $H(\mathbf{p}) = \log_2 n$.
Proposition
Entropy is a lower bound on the average number of bits needed to transmit the state of a random variable. That is, the number of binary questions needed to describe the information lies between
$H(X)$ and $H(X) + 1$.
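As a worked example (the distributions are chosen only for illustration): with $8$ equally likely outcomes, $H(X) = \log_2 8 = 3$ bits, and exactly $3$ yes/no questions (a binary search over the outcomes) identify the state, matching the lower bound. With probabilities $(\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{8}, \tfrac{1}{8})$, the entropy is $\tfrac{1}{2} \cdot 1 + \tfrac{1}{4} \cdot 2 + \tfrac{1}{8} \cdot 3 + \tfrac{1}{8} \cdot 3 = 1.75$ bits, and asking about the outcomes in order of decreasing probability uses $1.75$ questions on average.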
Joint and Conditional Entropy
Conditional Entropy
The conditional entropy $H(Y \mid X)$ describes the average uncertainty that remains about $Y$ once $X$ is known:
$$H(Y \mid X) = \sum_{x} P(x) \, H(Y \mid X = x) = -\sum_{x, y} P(x, y) \log_2 P(y \mid x).$$
Joint Entropy
The joint entropy $H(X, Y)$ measures the joint uncertainty of $X$ and $Y$:
$$H(X, Y) = -\sum_{x, y} P(x, y) \log_2 P(x, y).$$
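A minimal sketch with a made-up $2 \times 2$ joint distribution, computing the joint, marginal, and conditional entropies and verifying the standard chain rule $H(X, Y) = H(X) + H(Y \mid X)$:

```python
import numpy as np

# Hypothetical joint distribution P(x, y); rows index x, columns index y.
P = np.array([[0.4, 0.1],
              [0.2, 0.3]])

def H(p):
    """Shannon entropy in bits of an array of probabilities."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

H_joint = H(P)                    # H(X, Y)
H_X = H(P.sum(axis=1))            # marginal entropy H(X)
# H(Y | X) = sum_x P(x) H(Y | X = x), with P(y | x) = P(x, y) / P(x)
H_Y_given_X = sum(px * H(row / px) for px, row in zip(P.sum(axis=1), P))
print(H_joint)                    # ~1.846 bits
print(H_X + H_Y_given_X)          # same value: chain rule H(X,Y) = H(X) + H(Y|X)
```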
Relative Entropy and Mutual Information
Relative Entropy (KL Divergence)
The relative entropy or KL divergence between probability distributions $p$
and $q$ (over the same alphabet) is defined as
$$D_{\mathrm{KL}}(p \,\|\, q) = \sum_{x} p(x) \log_2 \frac{p(x)}{q(x)}.$$
Proposition
The KL divergence has the following properties:
- $D_{\mathrm{KL}}(p \,\|\, q) \ge 0$, with equality if and only if $p = q$.
- Not symmetric: in general $D_{\mathrm{KL}}(p \,\|\, q) \ne D_{\mathrm{KL}}(q \,\|\, p)$.
- Does not satisfy the triangle inequality.
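A minimal numerical sketch (the two distributions below are arbitrary) of the definition, the non-negativity, and the asymmetry:

```python
import numpy as np

def kl(p, q):
    """KL divergence D(p || q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float((p[mask] * np.log2(p[mask] / q[mask])).sum())

p = [0.5, 0.4, 0.1]
q = [0.3, 0.3, 0.4]
print(kl(p, q))  # ~0.33 bits (non-negative)
print(kl(q, p))  # ~0.45 bits (a different value: KL is not symmetric)
print(kl(p, p))  # 0.0        (equals zero iff p = q)
```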
Proposition
Relative entropy is jointly convex. That is, for any $\lambda \in [0, 1]$ and any two pairs of distributions $(p_1, q_1)$ and $(p_2, q_2)$, we have:
$$D_{\mathrm{KL}}\big(\lambda p_1 + (1-\lambda) p_2 \,\big\|\, \lambda q_1 + (1-\lambda) q_2\big) \le \lambda \, D_{\mathrm{KL}}(p_1 \,\|\, q_1) + (1-\lambda) \, D_{\mathrm{KL}}(p_2 \,\|\, q_2).$$
Proof. It suffices to show that for any non-negative numbers $a_1, a_2, b_1, b_2$ the log-sum inequality
$$a_1 \log_2 \frac{a_1}{b_1} + a_2 \log_2 \frac{a_2}{b_2} \ge (a_1 + a_2) \log_2 \frac{a_1 + a_2}{b_1 + b_2}$$
holds; applying it pointwise with $a_i = \lambda_i \, p_i(x)$ and $b_i = \lambda_i \, q_i(x)$, where $\lambda_1 = \lambda$ and $\lambda_2 = 1 - \lambda$, and summing over $x$ yields the claim.
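A quick numerical sanity check of the inequality (the distributions and $\lambda$ are arbitrary choices with strictly positive entries):

```python
import numpy as np

# D(p || q) in bits, assuming strictly positive entries.
kl = lambda p, q: float((p * np.log2(p / q)).sum())

p1, q1 = np.array([0.7, 0.2, 0.1]), np.array([0.5, 0.3, 0.2])
p2, q2 = np.array([0.2, 0.5, 0.3]), np.array([0.4, 0.4, 0.2])
lam = 0.3

lhs = kl(lam * p1 + (1 - lam) * p2, lam * q1 + (1 - lam) * q2)
rhs = lam * kl(p1, q1) + (1 - lam) * kl(p2, q2)
print(lhs <= rhs)  # True: divergence of the mixtures never exceeds the mixture of divergences
```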
Mutual Information
The mutual information of two random variables $X$
and $Y$ is defined as:
$$I(X; Y) = D_{\mathrm{KL}}\big(P(X, Y) \,\big\|\, P(X) P(Y)\big) = \sum_{x, y} P(x, y) \log_2 \frac{P(x, y)}{P(x) P(y)}.$$
Proposition
The mutual information has the following properties:
- $I(X; Y) = I(Y; X)$ (symmetry).
- $I(X; Y) \ge 0$, with equality if and only if $X$ and $Y$ are independent.
- $I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)$.
- $I(X; Y) = H(X) + H(Y) - H(X, Y)$.
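A minimal sketch computing $I(X; Y)$ for a made-up $2 \times 2$ joint distribution and checking the identity $I(X; Y) = H(X) + H(Y) - H(X, Y)$:

```python
import numpy as np

P = np.array([[0.4, 0.1],   # hypothetical joint distribution P(x, y)
              [0.2, 0.3]])

def H(p):
    """Shannon entropy in bits."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

px, py = P.sum(axis=1), P.sum(axis=0)
mi = float((P * np.log2(P / np.outer(px, py))).sum())  # D_KL(P(X,Y) || P(X)P(Y))
print(mi)                        # ~0.125 bits of shared information
print(H(px) + H(py) - H(P))      # same value via I(X;Y) = H(X) + H(Y) - H(X,Y)
```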
