Score

The score or informant is defined as the gradient of the log-likelihood with respect to the parameter:$$s(\theta) = \nabla_{\theta} \log p(x\mid\theta)$$
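As a concrete illustration (not from the original text), the following Python sketch evaluates the score of a univariate Gaussian $\mathcal{N}(\mu,\sigma^2)$ analytically and checks one component against a finite difference; the model, the parameter names `mu` and `sigma`, and the helper `score_gaussian` are assumptions made for this example.

```python
import numpy as np

# Hypothetical example: score of a univariate Gaussian N(mu, sigma^2), i.e. the
# gradient of
#   log p(x | mu, sigma) = -0.5*log(2*pi) - log(sigma) - (x - mu)**2 / (2*sigma**2)
# with respect to the parameters (mu, sigma).

def score_gaussian(x, mu, sigma):
    """Gradient of the Gaussian log-likelihood with respect to (mu, sigma)."""
    d_mu = (x - mu) / sigma**2
    d_sigma = ((x - mu)**2 - sigma**2) / sigma**3
    return np.array([d_mu, d_sigma])

# Finite-difference check of the mu-component at one sample point.
x, mu, sigma, eps = 1.3, 0.5, 2.0, 1e-6
logp = lambda m: -0.5 * np.log(2 * np.pi) - np.log(sigma) - (x - m)**2 / (2 * sigma**2)
fd = (logp(mu + eps) - logp(mu - eps)) / (2 * eps)
print(score_gaussian(x, mu, sigma)[0], fd)  # the two values should agree closely
```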

Remark

The score can be characterized as a set of vectors $\{\partial_i \ell(\theta)\}_i$ that is a basis for the tangent space of the statistical manifold at a point:$$\partial_i \ell(\theta) = \frac{\partial \log p(x\mid\theta)}{\partial \theta^i},$$where $\ell(\theta) = \log p(x\mid\theta)$ is the log-likelihood and $\theta^i$ are the parameters indexed by $i$.

Proposition

The expectation value of the score vanishes:$$\mathbb{E}_{x\mid\theta}\left[\nabla_{\theta} \log p(x\mid\theta)\right] = 0$$

Proof By the definition of the score and the identity $\nabla_{\theta} \log p(x\mid\theta)\, p(x\mid\theta) = \nabla_{\theta} p(x\mid\theta)$, we have:$$\mathbb{E}_{x\mid\theta}\left[\nabla_{\theta} \log p(x\mid \theta)\right]=\int \nabla_{\theta} \log p(x\mid \theta)\, p(x\mid\theta)\dd x=\int \nabla_{\theta} p(x\mid\theta)\dd x=\nabla_{\theta}\int p(x\mid\theta)\dd x=\nabla_{\theta}1=0,$$where exchanging the gradient and the integral is justified by the regularity conditions below. $\square$
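The proposition can also be checked numerically. Below is a minimal Monte Carlo sketch, assuming the same Gaussian example as above; it is illustrative only and not part of the proof.

```python
import numpy as np

# Monte Carlo check of the proposition for the Gaussian example: the sample mean
# of the score over x ~ p(x | theta) should be close to zero.
rng = np.random.default_rng(0)
mu, sigma, n = 0.5, 2.0, 200_000

x = rng.normal(mu, sigma, size=n)
score = np.stack([(x - mu) / sigma**2,
                  ((x - mu)**2 - sigma**2) / sigma**3], axis=1)
print(score.mean(axis=0))  # both components should be near 0, up to ~1/sqrt(n) error
```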

Fisher Information

The Fisher information matrix of a parameter $\theta$ is defined as follows:$$\mathcal{I}(\theta) = \mathbb{E}_{x\mid\theta}\left[\nabla_{\theta} \log p(x\mid\theta)\, \nabla_{\theta} \log p(x\mid\theta)^{\top}\right],$$where $\log p(x\mid\theta)$ denotes the log-likelihood.
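As a hedged numerical illustration, the sketch below estimates the Fisher information of the Gaussian example above by averaging outer products of the score over samples and compares the result with the known analytic matrix $\operatorname{diag}(1/\sigma^2,\, 2/\sigma^2)$; the model and the analytic values are assumptions specific to this example.

```python
import numpy as np

# Monte Carlo estimate of the Fisher information of the Gaussian model, using the
# definition I(theta) = E[ grad log p * (grad log p)^T ].
rng = np.random.default_rng(1)
mu, sigma, n = 0.5, 2.0, 500_000

x = rng.normal(mu, sigma, size=n)
scores = np.stack([(x - mu) / sigma**2,
                   ((x - mu)**2 - sigma**2) / sigma**3], axis=1)  # shape (n, 2)
I_mc = scores.T @ scores / n                      # average of outer products of the score
I_exact = np.diag([1 / sigma**2, 2 / sigma**2])   # known analytic value for (mu, sigma)
print(I_mc)
print(I_exact)  # the estimate should be close to the analytic matrix
```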

Regularity Conditions

The Fisher information can also be written as$$\mathcal{I}(\theta) = -\mathbb{E}_{x\mid\theta}\left[\nabla_{\theta}^{2} \log p(x\mid\theta)\right]$$if $\log p(x\mid\theta)$ is twice differentiable with respect to $\theta$, and the following regularity conditions hold (a numerical comparison of the two expressions is sketched after the list):

  1. The partial derivative of $p(x\mid\theta)$ with respect to $\theta$ exists almost everywhere. (It can fail to exist on a null set, as long as this set does not depend on $\theta$.)
  2. The integral of $p(x\mid\theta)$ can be differentiated under the integral sign with respect to $\theta$.
  3. The support of $p(x\mid\theta)$ does not depend on $\theta$.
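The promised comparison: a minimal sketch, assuming the Gaussian example used above (which satisfies these regularity conditions), that estimates the negative expected Hessian of the log-likelihood and confirms it approaches the same matrix as the outer-product definition.

```python
import numpy as np

# Compare the two expressions for the Fisher information on the Gaussian example:
# the negative expected Hessian of log p(x | mu, sigma) should also approach
# diag(1/sigma^2, 2/sigma^2).
rng = np.random.default_rng(2)
mu, sigma, n = 0.5, 2.0, 500_000
x = rng.normal(mu, sigma, size=n)

# Second derivatives of log p with respect to (mu, sigma).
h_mumu = -np.ones_like(x) / sigma**2
h_musig = -2 * (x - mu) / sigma**3
h_sigsig = 1 / sigma**2 - 3 * (x - mu)**2 / sigma**4

neg_exp_hessian = -np.array([[h_mumu.mean(), h_musig.mean()],
                             [h_musig.mean(), h_sigsig.mean()]])
print(neg_exp_hessian)  # should approach diag(1/sigma^2, 2/sigma^2)
```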

Cramér–Rao Lower Bound

The inverse of the Fisher information matrix is a lower bound on the covariance of any unbiased estimator $\hat{\theta}$ of $\theta$:$$\operatorname{Cov}_{x\mid\theta}(\hat{\theta}) \succeq \mathcal{I}(\theta)^{-1}$$More generally, for any biased estimator $\hat{\theta}$ with bias $b(\theta) = \mathbb{E}_{x\mid\theta}[\hat{\theta}] - \theta$, we have:$$\operatorname{Cov}_{x\mid\theta}(\hat{\theta}) \succeq \left(I + \nabla_{\theta} b(\theta)\right) \mathcal{I}(\theta)^{-1} \left(I + \nabla_{\theta} b(\theta)\right)^{\top},$$where $\nabla_{\theta} b(\theta)$ denotes the Jacobian of the bias and $\succeq$ is the Loewner (positive semidefinite) order.

Proof A proof of this theorem using information geometry can be found here.
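For a hedged numerical illustration of the bound, the sketch below (an assumption-laden example, not part of the proof) estimates $\mu$ of a Gaussian with known $\sigma$ by the sample mean, which is unbiased; its variance $\sigma^2/n$ equals the inverse Fisher information of $n$ i.i.d. samples, so it attains the bound. The sample sizes and seed are arbitrary choices.

```python
import numpy as np

# Check the Cramér–Rao bound for estimating mu of a Gaussian with known sigma:
# the sample mean is unbiased and its variance sigma^2/n equals the inverse
# Fisher information (n / sigma^2)^(-1) of the n-sample model.
rng = np.random.default_rng(3)
mu, sigma, n, trials = 0.5, 2.0, 50, 100_000

samples = rng.normal(mu, sigma, size=(trials, n))
estimates = samples.mean(axis=1)   # unbiased estimator of mu, one per trial
empirical_var = estimates.var()
crlb = sigma**2 / n                # inverse Fisher information of the n-sample model
print(empirical_var, crlb)         # the two values should nearly coincide
```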