Tensorial Representation of Random Variables
Tensor Product of Vector-Valued Random Variables
Suppose $\Omega$ is a finite sample space. The tensor product of two random variables $X : \Omega \to V$ and $Y : \Omega \to W$, with $V$ and $W$ being vector spaces, is defined as $(X \otimes Y)(\omega) = X(\omega) \otimes Y(\omega)$.
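As a quick numerical illustration, here is a minimal numpy sketch of the pointwise tensor product; the 3-point sample space, the dimensions, and the names `X` and `Y` are illustrative choices, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n_omega, dim_V, dim_W = 3, 2, 4

X = rng.normal(size=(n_omega, dim_V))   # X : Omega -> V, one row per outcome
Y = rng.normal(size=(n_omega, dim_W))   # Y : Omega -> W

# (X tensor Y)(w) = X(w) tensor Y(w): a V tensor W valued random variable
X_tensor_Y = np.einsum('oi,oj->oij', X, Y)
print(X_tensor_Y.shape)                 # (3, 2, 4)
```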
General Expected Value
Suppose $\Omega$ is a finite sample space and $V$ is a vector space over $\mathbb{R}$. The expected value of a random variable $X : \Omega \to V$ with respect to a probability measure $p$ is defined as $\mathbb{E}_p[X] = \sum_{\omega \in \Omega} p(\omega)\, X(\omega)$. Naturally, fixing some $X$, the assignment $p \mapsto \mathbb{E}_p[X]$ forms a map from the space of probability measures to $V$.
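A minimal sketch of the expectation as a $p$-weighted sum of values; the measure `p` and the variable `X` below are assumed for illustration.

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])                 # probability measure on Omega
X = np.array([[1., 0.], [0., 1.], [2., 2.]])  # X : Omega -> R^2, one row per outcome

E_X = p @ X   # E_p[X] = sum_w p(w) X(w), an element of V = R^2
print(E_X)    # [0.8 1.1]
# Fixing X, the map p -> E_p[X] is linear in p: a map from measures to V.
```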
General Covariance
Suppose
and are two random variables, with and being vector spaces with standard -algebra of Borel sets. Then the covariance of and with respect to some probability measure is defined as where satisfying . Correspondingly, the variance of is defined as . This aligns with the definition of covariance in , where will degenerate to the vector outer product.
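The covariance tensor can be computed directly with `einsum`; in this sketch $V = \mathbb{R}^2$ and $W = \mathbb{R}^4$ are assumed for concreteness, so the result degenerates to the familiar cross-covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.2, 0.5, 0.3])
X = rng.normal(size=(3, 2))   # X : Omega -> R^2
Y = rng.normal(size=(3, 4))   # Y : Omega -> R^4

mu_X, mu_Y = p @ X, p @ Y
# Cov_p(X, Y) = E_p[(X - mu_X) tensor (Y - mu_Y)], an element of V tensor W
cov = np.einsum('o,oi,oj->ij', p, X - mu_X, Y - mu_Y)
var = np.einsum('o,oi,oj->ij', p, X - mu_X, X - mu_X)  # Var_p(X) = Cov_p(X, X)
```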
Remark
Now according to the above definition, the covariance of two random variables $X : \Omega \to V$ and $Y : \Omega \to W$ is tensorial, since the expected value of a map $\Omega \to V \otimes W$ will be in $V \otimes W$.
Covariance Identity
For all random variables $X : \Omega \to V$ and $Y : \Omega \to W$, and all probability measures $p$, the covariance tensor satisfies the following identity: $$\operatorname{Cov}_p(X, Y) = \mathbb{E}_p[X \otimes Y] - \mathbb{E}_p[X] \otimes \mathbb{E}_p[Y].$$ Alternatively, $$\operatorname{Cov}_p(X, Y) = \mathbb{E}_p[(X - \mathbb{E}_p[X]) \otimes Y] = \mathbb{E}_p[X \otimes (Y - \mathbb{E}_p[Y])].$$
Proof. For any probability measure $p$, expanding by bilinearity of $\otimes$ and linearity of $\mathbb{E}_p$, $$\operatorname{Cov}_p(X, Y) = \mathbb{E}_p[X \otimes Y] - \mathbb{E}_p[X] \otimes \mu_Y - \mu_X \otimes \mathbb{E}_p[Y] + \mu_X \otimes \mu_Y = \mathbb{E}_p[X \otimes Y] - \mathbb{E}_p[X] \otimes \mathbb{E}_p[Y].$$ The alternative forms follow by the same expansion. $\square$
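A numerical sanity check of the identity on a randomly chosen measure; all sizes here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
p = rng.dirichlet(np.ones(5))   # a random measure on a 5-point sample space
X = rng.normal(size=(5, 2))
Y = rng.normal(size=(5, 3))

lhs = np.einsum('o,oi,oj->ij', p, X - p @ X, Y - p @ Y)          # Cov_p(X, Y)
rhs = np.einsum('o,oi,oj->ij', p, X, Y) - np.outer(p @ X, p @ Y)
assert np.allclose(lhs, rhs)
```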
Fisher Information Metric
Now we will focus on a parameterized subspace of the space of probability measures, say a statistical manifold $\{p_\theta\}_{\theta \in \Theta}$.
Fisher-Rao (Information) Metric
Suppose $M$ is the manifold of probability measures on the finite $n$-point sample space $\Omega$. Given any tangent vector $v$ at $p \in M$, we can take the derivative of the surprise $-\log p$ in the direction of $v$, yielding a measurable score function $\ell_v : \Omega \to \mathbb{R}$ such that $\ell_v(\omega) = D_v \log p(\omega) = v(\omega)/p(\omega)$. The Fisher information metric is then defined as the expectation of the tensor product of the score functions: $$g_p(u, v) = \mathbb{E}_p[\ell_u \otimes \ell_v] = \sum_{\omega \in \Omega} \frac{u(\omega)\, v(\omega)}{p(\omega)}.$$ Notice that this definition agrees with the definition of the Fisher information matrix $I_{ij}(\theta) = \mathbb{E}_{p_\theta}[\partial_i \log p_\theta \, \partial_j \log p_\theta]$ under a choice of coordinates $\theta$.
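A sketch of this on the simplex: tangent vectors are signed measures with total mass zero, and since the scores are scalar-valued their tensor product degenerates to an ordinary product. The specific vectors below are illustrative.

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])
u = np.array([0.1, -0.1, 0.0])   # tangent vectors: signed measures whose
v = np.array([0.0, 0.2, -0.2])   # coordinates sum to zero

score_u, score_v = u / p, v / p           # score functions l_u, l_v : Omega -> R
g_uv = np.sum(p * score_u * score_v)      # g_p(u, v) = E_p[l_u l_v]
print(g_uv)                               # equals sum_w u(w) v(w) / p(w)
```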
Lemma
The expected value of the score function is zero.
Proof. For all tangent vectors $v$ at $p$, $$\mathbb{E}_p[\ell_v] = \sum_{\omega \in \Omega} p(\omega)\, \frac{v(\omega)}{p(\omega)} = \sum_{\omega \in \Omega} v(\omega) = 0,$$ since a tangent vector to the space of probability measures is a signed measure of total mass zero. $\square$
Corollary
The Fisher information metric is the covariance tensor of the score functions: $g_p(u, v) = \operatorname{Cov}_p(\ell_u, \ell_v)$.
Proof. For all tangent vectors $u, v$ at $p$, the covariance identity and the lemma give $$\operatorname{Cov}_p(\ell_u, \ell_v) = \mathbb{E}_p[\ell_u \ell_v] - \mathbb{E}_p[\ell_u]\, \mathbb{E}_p[\ell_v] = \mathbb{E}_p[\ell_u \ell_v] = g_p(u, v). \qquad \square$$
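Both the lemma and the corollary can be checked numerically; the random tangent vectors below are mean-subtracted so that they have total mass zero.

```python
import numpy as np

rng = np.random.default_rng(2)
p = rng.dirichlet(np.ones(4))
u = rng.normal(size=4); u -= u.mean()   # tangent vectors have total mass zero
v = rng.normal(size=4); v -= v.mean()

s_u, s_v = u / p, v / p
assert np.isclose(p @ s_u, 0) and np.isclose(p @ s_v, 0)  # lemma: E_p[l_v] = 0
g = np.sum(p * s_u * s_v)                                 # Fisher metric
cov = np.sum(p * (s_u - p @ s_u) * (s_v - p @ s_v))       # Cov_p(l_u, l_v)
assert np.isclose(g, cov)                                 # corollary
```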
Proposition
The statistical manifold $M$ with the Fisher information metric $g$ and the totally symmetric cubic tensor $T$ such that $T_p(u, v, w) = \mathbb{E}_p[\ell_u\, \ell_v\, \ell_w]$ is equivalent to the Bregman manifold using the negative Shannon entropy as potential function.
Proof. First we show that the two metric tensors coincide. In the coordinates $p = (p(\omega))_{\omega \in \Omega}$, the Hessian of the potential $\varphi(p) = \sum_{\omega} p(\omega) \log p(\omega)$ is $\partial^2 \varphi / \partial p(\omega)\, \partial p(\omega') = \delta_{\omega \omega'} / p(\omega)$, so the Bregman (Hessian) metric pairs tangent vectors as $\sum_{\omega} u(\omega)\, v(\omega) / p(\omega)$, which is exactly the Fisher information metric above.
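The metric computation can be confirmed numerically: the Hessian of $\varphi(p) = \sum_\omega p(\omega)\log p(\omega)$ in the coordinates $p(\omega)$ is $\operatorname{diag}(1/p)$, and pairing tangent vectors through it reproduces the Fisher metric. The vectors below are illustrative.

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])
u = np.array([0.1, -0.1, 0.0])
v = np.array([0.0, 0.2, -0.2])

hessian = np.diag(1.0 / p)           # Hessian of phi(p) = sum p log p
bregman = u @ hessian @ v            # Bregman (Hessian) metric pairing
fisher = np.sum(u * v / p)           # Fisher metric from the definition
assert np.isclose(bregman, fisher)
```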
Cramér–Rao Lower Bound
The Cramér–Rao lower bound holds for the Fisher information metric, based on the general covariance definition; i.e., the inverse of the Fisher information metric tensor is a general lower bound on the variance of any unbiased estimator $\hat{\theta}$ of $\theta$: $$\operatorname{Var}_p(\hat{\theta}) \succeq g_p^{-1},$$ where $\succeq$ is the generalized inequality for positive semi-definite tensors.
Proof. Fix some $p \in M$ and a tangent vector $v$ at $p$. Unbiasedness means $\mathbb{E}_p[\hat{\theta}] = \theta(p)$ for every $p$; differentiating in the direction of $v$ and using the lemma gives $\operatorname{Cov}_p(\hat{\theta}, \ell_v) = \mathbb{E}_p[\hat{\theta}\, \ell_v] = D_v \theta$. Applying the Cauchy–Schwarz inequality for the covariance tensor to $\hat{\theta}$ and the scores then yields $\operatorname{Var}_p(\hat{\theta}) \succeq g_p^{-1}$. $\square$
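A sanity check of the statement on the simplex: with $G = \operatorname{diag}(1/p)$ the Fisher metric and $\Sigma = \operatorname{diag}(p) - p \otimes p$, the tensor $\Sigma$ inverts $G$ on the tangent space $\{u : \sum_\omega u(\omega) = 0\}$, so $\Sigma$ is the $g_p^{-1}$ appearing in the bound.

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])
G = np.diag(1.0 / p)                  # Fisher metric in simplex coordinates
Sigma = np.diag(p) - np.outer(p, p)   # candidate inverse metric g^{-1}

u = np.array([0.1, -0.4, 0.3])        # arbitrary tangent vector, sums to zero
assert np.allclose(Sigma @ G @ u, u)  # Sigma G = id on the tangent space
```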
Maximum Likelihood Estimation
We now show that the maximum likelihood estimate achieves the Cramér–Rao lower bound on the space of all probability measures on a finite sample space.
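A Monte Carlo sketch of this claim, assuming $N$ i.i.d. categorical draws: the MLE is the vector of empirical frequencies, which is unbiased, and its covariance matches $(\operatorname{diag}(p) - p \otimes p)/N$, the inverse Fisher metric for $N$ samples, so the bound is attained exactly. The sample sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
p = np.array([0.2, 0.5, 0.3])
N, trials = 50, 200_000

counts = rng.multinomial(N, p, size=trials)
mle = counts / N                            # empirical distribution = MLE
emp_cov = np.cov(mle.T)                     # estimated Var_p of the MLE
crlb = (np.diag(p) - np.outer(p, p)) / N    # inverse Fisher metric, N draws
print(np.abs(emp_cov - crlb).max())         # ~0 up to Monte Carlo error
```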