Score
The score or informant is defined as the gradient of the log-likelihood with respect to the parameter:
$$s(\theta)=\nabla_{\theta}\log p(x\mid\theta)$$
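For example, for a univariate Gaussian with known variance $\sigma^{2}$, $p(x\mid\mu)=\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)$, the score with respect to $\mu$ is
$$\nabla_{\mu}\log p(x\mid\mu)=\frac{x-\mu}{\sigma^{2}}.$$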
Remark
The score can be characterized as a set of vectors that forms a basis for the tangent space of the statistical manifold at a point:
$$\left\{\partial_{i}\,\ell(x;\theta)\right\}_{i}$$
where $\ell(x;\theta)=\log p(x\mid\theta)$ is the log-likelihood, $\theta^{i}$ are the parameters indexed by $i$, and $\partial_{i}=\partial/\partial\theta^{i}$.
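For instance, for the two-parameter Gaussian family $\{\mathcal{N}(\mu,\sigma^{2})\}$, the two score components
$$\partial_{\mu}\ell=\frac{x-\mu}{\sigma^{2}},\qquad \partial_{\sigma}\ell=\frac{(x-\mu)^{2}}{\sigma^{3}}-\frac{1}{\sigma}$$
play the role of the coordinate basis vectors attached to the point $(\mu,\sigma)$ of the manifold.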
Proposition
The expectation value of the score vanishes:
$$\mathbb{E}_{x\mid\theta}\left[\nabla_{\theta}\log p(x\mid\theta)\right]=0$$
Proof By the definition of the score, we have:
$$\mathbb{E}_{x\mid\theta}\left[\nabla_{\theta} \log p(x\mid \theta)\right]=\int \nabla_{\theta} \log p(x\mid \theta)\, p(x\mid\theta)\,\dd x=\int \nabla_{\theta} p(x\mid\theta)\,\dd x=\nabla_{\theta}\int p(x\mid\theta)\,\dd x=\nabla_{\theta}1=0$$
where the second equality uses $\nabla_{\theta}\log p(x\mid\theta)\,p(x\mid\theta)=\nabla_{\theta}p(x\mid\theta)$ and the third differentiates under the integral sign (see the regularity conditions below). $\square$
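A minimal numerical sketch of this proposition (an added check, assuming the Gaussian example above with known variance; the parameter values, sample size, and seed are arbitrary): draw samples from $p(x\mid\mu)$ and average the score over them.

```python
# Monte Carlo check that the score has zero mean under the model
# (Gaussian example, known variance): E_{x|mu}[(x - mu) / sigma^2] ~ 0.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 2.0          # assumed illustrative values

x = rng.normal(loc=mu, scale=sigma, size=1_000_000)  # samples x ~ p(x | mu)
score = (x - mu) / sigma**2                          # score of the Gaussian w.r.t. mu

print(score.mean())  # close to 0, up to Monte Carlo error
```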
Fisher Information
The Fisher information matrix of a parameter $\theta$ is defined as follows:
$$\mathcal{I}_{ij}(\theta)=\mathbb{E}_{x\mid\theta}\left[\partial_{i}\ell(x;\theta)\,\partial_{j}\ell(x;\theta)\right]$$
where $\ell(x;\theta)=\log p(x\mid\theta)$ denotes the log-likelihood.
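Continuing the Gaussian example with known variance $\sigma^{2}$: since $\partial_{\mu}\log p(x\mid\mu)=\frac{x-\mu}{\sigma^{2}}$,
$$\mathcal{I}(\mu)=\mathbb{E}_{x\mid\mu}\left[\left(\frac{x-\mu}{\sigma^{2}}\right)^{2}\right]=\frac{\mathbb{E}_{x\mid\mu}\left[(x-\mu)^{2}\right]}{\sigma^{4}}=\frac{1}{\sigma^{2}}.$$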
Regularity Conditions
The Fisher information can also be written as
$$\mathcal{I}_{ij}(\theta)=-\mathbb{E}_{x\mid\theta}\left[\partial_{i}\partial_{j}\log p(x\mid\theta)\right]$$
if $\log p(x\mid\theta)$ is twice differentiable with respect to $\theta$, and the following regularity conditions hold (a worked check for the Gaussian example follows the list):
- The partial derivative of $p(x\mid\theta)$ with respect to $\theta$ exists almost everywhere. (It can fail to exist on a null set, as long as this set does not depend on $\theta$.)
- The integral of $p(x\mid\theta)$ can be differentiated under the integral sign with respect to $\theta$.
- The support of $p(x\mid\theta)$ does not depend on $\theta$.
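For the Gaussian example, all three conditions hold (the support is all of $\mathbb{R}$ and does not depend on $\mu$), and the second-derivative form agrees with the first definition:
$$-\mathbb{E}_{x\mid\mu}\left[\partial_{\mu}^{2}\log p(x\mid\mu)\right]=-\mathbb{E}_{x\mid\mu}\left[-\frac{1}{\sigma^{2}}\right]=\frac{1}{\sigma^{2}}=\mathcal{I}(\mu).$$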
Cramér–Rao Lower Bound
The inverse of the Fisher information matrix is a lower bound on the variance of any unbiased estimator $\hat{\theta}(x)$ of $\theta$:
$$\operatorname{Var}\left[\hat{\theta}(x)\right]\succeq\mathcal{I}(\theta)^{-1}$$
More generally, for any biased estimator $\hat{\theta}(x)$ with bias $b(\theta)=\mathbb{E}_{x\mid\theta}\left[\hat{\theta}(x)\right]-\theta$, we have:
$$\operatorname{Var}\left[\hat{\theta}(x)\right]\succeq\left(I+\nabla_{\theta}b(\theta)\right)\mathcal{I}(\theta)^{-1}\left(I+\nabla_{\theta}b(\theta)\right)^{\top}$$
where $\succeq$ denotes the ordering of positive semidefinite matrices and $\nabla_{\theta}b(\theta)$ is the Jacobian of the bias.
Proof A proof of this theorem using information geometry can be found here.
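In the Gaussian example, the sample mean $\hat{\mu}(x)=\frac{1}{n}\sum_{k=1}^{n}x_{k}$ of $n$ i.i.d. observations is unbiased with $\operatorname{Var}[\hat{\mu}]=\frac{\sigma^{2}}{n}$; since the Fisher information of the $n$-sample model is $n\,\mathcal{I}(\mu)=\frac{n}{\sigma^{2}}$, the bound is attained with equality, so the sample mean is an efficient estimator of $\mu$.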