Entropy

Ensemble

An ensemble $X$ is a triple $(x, \mathcal{A}_X, \mathcal{P}_X)$, where the outcome $x$ is a random variable taking values in $\mathcal{A}_X = \{a_1, a_2, \dots, a_I\}$ with probabilities $\mathcal{P}_X = \{p_1, p_2, \dots, p_I\}$.

Information Content

The information content of an outcome $x$ with probability $p(x)$ is defined as
$$h(x) = \log \frac{1}{p(x)} = -\log p(x).$$

Warning

The choice of logarithm base is arbitrary. Normally we use base $2$ and measure information in bits. Everything that follows uses base $2$, i.e. $\log = \log_2$.
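For concreteness, here is a minimal sketch (not from the original notes) that computes the information content in bits with numpy; the probabilities are made-up examples.

```python
import numpy as np

def information_content(p: float) -> float:
    """Information content h(x) = -log2 p(x), in bits."""
    return -np.log2(p)

# A fair coin flip carries exactly 1 bit of information.
print(information_content(0.5))      # 1.0
# A rare outcome (p = 1/1024) is far more surprising: 10 bits.
print(information_content(1 / 1024)) # 10.0
```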

Shannon Entropy

The entropy of a random variable $X$ is defined as its average information content:
$$H(X) = \sum_{x} p(x) \log \frac{1}{p(x)} = -\sum_{x} p(x) \log p(x).$$
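A minimal numerical sketch of this definition, using an arbitrary example distribution:

```python
import numpy as np

def entropy(p) -> float:
    """Shannon entropy H(X) = -sum_x p(x) log2 p(x), in bits (0 log 0 := 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # drop zero-probability outcomes
    return float(-np.sum(p * np.log2(p)))

print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits
```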

Remark

Entropy describes the uncertainty of a random variable.

Proposition

Entropy has the following properties:

  • $H(X) \ge 0$.
  • For the distribution function: the more sharply peaked it is, the lower the entropy; the more evenly spread, the higher the entropy (see the numerical sketch after this list).
  • $H(X) \le \log |\mathcal{A}_X|$, with equality if and only if $X$ is uniformly distributed.
  • Entropy is the lower bound on the average number of bits needed to transmit the state of a random variable.
  • The number of binary questions needed to determine the state lies between $H(X)$ and $H(X) + 1$.
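To illustrate the second and third properties, a small sketch comparing a sharply peaked and a uniform distribution on four outcomes (both distributions are arbitrary examples):

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

peaked  = [0.97, 0.01, 0.01, 0.01]   # almost deterministic
uniform = [0.25, 0.25, 0.25, 0.25]   # maximally spread out

print(entropy(peaked))    # ~0.24 bits: sharp distribution, low entropy
print(entropy(uniform))   # 2.0 bits = log2(4), the upper bound
```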

Theorem

Consider a discrete variable $X$ taking on values from the finite set $\mathcal{X} = \{x_1, \dots, x_n\}$. Let $p_i$ be the probability of each state, with $\sum_{i=1}^{n} p_i = 1$. Denote the vector of probabilities with $\mathbf{p} = (p_1, \dots, p_n)$. Then the entropy is maximized if $\mathbf{p}$ is uniform. That is,
$$H(X) \le \log n,$$
with equality iff $p_i = \frac{1}{n}$ for all $i$.

Proof The objective function is $H(\mathbf{p}) = -\sum_{i=1}^{n} p_i \log p_i$, subject to the constraint $\sum_{i=1}^{n} p_i = 1$. By strong duality, we optimize its Lagrange dual problem:
$$L(\mathbf{p}, \lambda) = -\sum_{i=1}^{n} p_i \log p_i + \lambda \left( \sum_{i=1}^{n} p_i - 1 \right).$$
Setting $\partial L / \partial p_i = 0$, it follows that
$$-\log p_i - \frac{1}{\ln 2} + \lambda = 0 \quad \Longrightarrow \quad p_i = 2^{\lambda - 1/\ln 2},$$
which is the same constant for every $i$. Sum over all $n$ of the $p_i$, obtaining:
$$\sum_{i=1}^{n} p_i = n \, 2^{\lambda - 1/\ln 2} = 1.$$
Therefore $2^{\lambda - 1/\ln 2} = \frac{1}{n}$. Hence $p_i = \frac{1}{n}$ for all $i$, i.e. $\mathbf{p}$ is uniform.
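As an optional numerical sanity check of this result (not part of the original proof), one can maximize the entropy under the simplex constraint with a generic solver such as scipy's SLSQP; the choice $n = 4$ and the starting point are arbitrary:

```python
import numpy as np
from scipy.optimize import minimize

n = 4

def neg_entropy(p):
    # minimize -H(p); the clip avoids log(0) at the boundary
    p = np.clip(p, 1e-12, 1.0)
    return np.sum(p * np.log2(p))

result = minimize(
    neg_entropy,
    x0=np.array([0.7, 0.1, 0.1, 0.1]),  # arbitrary starting point on the simplex
    method="SLSQP",
    bounds=[(0.0, 1.0)] * n,
    constraints=[{"type": "eq", "fun": lambda p: np.sum(p) - 1.0}],
)

print(result.x)     # ~[0.25, 0.25, 0.25, 0.25], the uniform distribution
print(-result.fun)  # ~2.0 = log2(4), matching H(X) <= log2(n)
```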

Proposition

Entropy is a lower bound on the average number of bits needed to transmit the state of a random variable. That is, the number of binary questions needed to describe the information lies between $H(X)$ and $H(X) + 1$.
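A quick illustration of this bound with a made-up distribution: assigning each outcome a codeword of length $\lceil \log_2(1/p_i) \rceil$ (Shannon's construction) yields an average length that falls between $H(X)$ and $H(X) + 1$.

```python
import numpy as np

p = np.array([0.4, 0.3, 0.2, 0.1])   # example distribution
H = -np.sum(p * np.log2(p))          # entropy in bits

lengths = np.ceil(np.log2(1.0 / p))  # Shannon code lengths ceil(log2 1/p_i)
L = np.sum(p * lengths)              # average number of bits / binary questions

print(H)   # ~1.85 bits
print(L)   # 2.4 bits, and H <= L < H + 1 holds
```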

Joint and Conditional Entropy

Conditional Entropy

The conditional entropy $H(Y \mid X)$ describes the average uncertainty that remains about $Y$ when $X$ is known:
$$H(Y \mid X) = \sum_{x} p(x) \, H(Y \mid X = x) = -\sum_{x, y} p(x, y) \log p(y \mid x).$$

Joint Entropy

The joint uncertainty of $X$ and $Y$ is the uncertainty of $X$ plus the uncertainty of $Y$ given $X$:
$$H(X, Y) = -\sum_{x, y} p(x, y) \log p(x, y) = H(X) + H(Y \mid X).$$
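A minimal numerical sketch of these two definitions and of the chain rule, using an arbitrary example joint distribution:

```python
import numpy as np

# Example joint distribution p(x, y): rows index x, columns index y.
pxy = np.array([[0.25, 0.25],
                [0.40, 0.10]])

px = pxy.sum(axis=1)                                       # marginal p(x)
H_XY = -np.sum(pxy * np.log2(pxy))                         # joint entropy H(X, Y)
H_X = -np.sum(px * np.log2(px))                            # H(X)
H_Y_given_X = -np.sum(pxy * np.log2(pxy / px[:, None]))    # H(Y | X)

print(H_XY)               # ~1.86 bits
print(H_X + H_Y_given_X)  # same value: H(X, Y) = H(X) + H(Y | X)
```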

Relative Entropy and Mutual Information

Relative Entropy (KL Divergence)

The relative entropy or KL divergence between probability distributions $p$ and $q$ is defined as
$$D_{\mathrm{KL}}(p \,\|\, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}.$$
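A small sketch of this definition with arbitrary example distributions, which also previews the asymmetry noted below:

```python
import numpy as np

def kl_divergence(p, q) -> float:
    """D_KL(p || q) in bits, assuming q(x) > 0 wherever p(x) > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / q[nz])))

p = [0.5, 0.4, 0.1]
q = [1/3, 1/3, 1/3]

print(kl_divergence(p, q))  # ~0.224 bits
print(kl_divergence(q, p))  # ~0.296 bits: D_KL(p||q) != D_KL(q||p)
print(kl_divergence(p, p))  # 0.0: equality iff p = q
```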

Proposition

The KL divergence has the following properties:

  • $D_{\mathrm{KL}}(p \,\|\, q) \ge 0$, with equality if and only if $p = q$.
  • Not symmetric: in general $D_{\mathrm{KL}}(p \,\|\, q) \ne D_{\mathrm{KL}}(q \,\|\, p)$.
  • Does not satisfy the triangle inequality.

Proposition

Relative entropy is jointly convex. That is, for any $\lambda \in [0, 1]$ and pairs of distributions $(p_1, q_1)$, $(p_2, q_2)$, we have:
$$D_{\mathrm{KL}}\big(\lambda p_1 + (1 - \lambda) p_2 \,\big\|\, \lambda q_1 + (1 - \lambda) q_2\big) \le \lambda \, D_{\mathrm{KL}}(p_1 \,\|\, q_1) + (1 - \lambda) \, D_{\mathrm{KL}}(p_2 \,\|\, q_2).$$

Proof It suffices to show that, for each fixed $x$, the summand is jointly convex in $(p, q) = (p(x), q(x))$, since a sum of convex functions is convex. We can see this by checking convexity of $f(p, q) = p \log \frac{p}{q}$ on $\mathbb{R}_{>0}^2$. The Hessian of $f$ is given by
$$\nabla^2 f(p, q) = \frac{1}{\ln 2} \begin{pmatrix} \frac{1}{p} & -\frac{1}{q} \\ -\frac{1}{q} & \frac{p}{q^2} \end{pmatrix},$$
which is positive semidefinite (its determinant is $0$ and its diagonal entries are positive), hence $f$ is convex.
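As a complementary spot-check (not part of the proof), the inequality can be verified numerically on random distributions:

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(p, q):
    return np.sum(p * np.log2(p / q))

def random_dist(n=5):
    p = rng.random(n)
    return p / p.sum()

for _ in range(1000):
    lam = rng.random()
    p1, q1, p2, q2 = (random_dist() for _ in range(4))
    lhs = kl(lam * p1 + (1 - lam) * p2, lam * q1 + (1 - lam) * q2)
    rhs = lam * kl(p1, q1) + (1 - lam) * kl(p2, q2)
    assert lhs <= rhs + 1e-9      # joint convexity: mixture divergence is bounded above

print("joint convexity held in all 1000 random trials")
```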

Mutual Information

The mutual information of two random variables $X$ and $Y$ is defined as:
$$I(X; Y) = D_{\mathrm{KL}}\big(p(x, y) \,\|\, p(x)\,p(y)\big) = \sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}.$$

Proposition

The mutual information has the following properties:

  • Symmetry: $I(X; Y) = I(Y; X)$.
  • $I(X; Y) \ge 0$, with equality if and only if $X$ and $Y$ are independent.
  • $I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)$.
  • $I(X; Y) = H(X) + H(Y) - H(X, Y)$ (see the numerical sketch below).
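To make these identities concrete, a small sketch with an arbitrary example joint distribution:

```python
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Example joint distribution p(x, y): rows index x, columns index y.
pxy = np.array([[0.25, 0.25],
                [0.40, 0.10]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

# Definition: I(X; Y) = D_KL(p(x, y) || p(x) p(y))
I_def = float(np.sum(pxy * np.log2(pxy / np.outer(px, py))))

# Identity: I(X; Y) = H(X) + H(Y) - H(X, Y)
I_id = H(px) + H(py) - H(pxy)

print(I_def)  # ~0.073 bits
print(I_id)   # same value; it is nonnegative, and would be 0 iff X and Y were independent
```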
