Tensorial Representation of Random Variables

Tensor Product of Vector Valued Random Variables

Suppose $\Omega$ is a finite sample space. The tensor product of two random variables $X \colon \Omega \to U$ and $Y \colon \Omega \to V$, with $U$ and $V$ being vector spaces, is defined pointwise as $$(X \otimes Y)(\omega) = X(\omega) \otimes Y(\omega),$$ so that $X \otimes Y$ is a random variable valued in $U \otimes V$.
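
As an illustration (not part of the original text), here is a minimal NumPy sketch of this definition on a hypothetical 3-point sample space: each random variable is stored as one value per outcome, and the tensor product is the outcome-wise outer product.

```python
import numpy as np

# Hypothetical 3-point sample space; all values are made up for illustration.
X = np.array([[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0]])      # X: Ω -> U = R^2, one row per outcome
Y = np.random.default_rng(0).normal(size=(3, 4))          # Y: Ω -> V = R^4

# (X ⊗ Y)(ω) = X(ω) ⊗ Y(ω): the outcome-wise outer product, valued in U ⊗ V.
X_tensor_Y = np.einsum('wi,wj->wij', X, Y)                # shape (3, 2, 4)
```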

General Expected Value

Suppose $\Omega$ is a finite sample space and $V$ is a vector space over $\mathbb{R}$. The expected value of a random variable $X \colon \Omega \to V$ with respect to a probability measure $p$ is defined as $$\mathbb{E}_p[X] = \sum_{\omega \in \Omega} p(\omega)\, X(\omega).$$ Naturally, fixing some $X$, the assignment $p \mapsto \mathbb{E}_p[X]$ forms a map from the space of probability measures to $V$.
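
A minimal sketch (my own illustration, with made-up numbers) of the expected value on a finite sample space: the probability vector `p` weights the outcome-wise values of a vector-valued random variable, and fixing `X` gives a map from probability measures to $V$.

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])                         # a probability measure on a 3-point space
X = np.array([[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0]])   # X: Ω -> V = R^2

# E_p[X] = Σ_ω p(ω) X(ω); fixing X, p ↦ E_p[X] maps probability measures to V.
EX = p @ X                                            # array([0.15, 1.3])
```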

General Covariance

Suppose $X \colon \Omega \to U$ and $Y \colon \Omega \to V$ are two random variables, with $U$ and $V$ being vector spaces equipped with the standard $\sigma$-algebra of Borel sets. Then the covariance of $X$ and $Y$ with respect to some probability measure $p$ is defined as $$\cov_p[X, Y] = \mathbb{E}_p\!\left[\bar{X} \otimes \bar{Y}\right],$$ where $\bar{X} = X - \mathbb{E}_p[X]$ and $\bar{Y} = Y - \mathbb{E}_p[Y]$ are the centered random variables. Correspondingly, the variance of $X$ is defined as $\var_p[X] = \cov_p[X, X]$. This aligns with the definition of covariance in $\mathbb{R}^n$, where $\otimes$ degenerates to the vector outer product.
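
The following sketch (not from the original) computes the covariance tensor on a hypothetical 3-point sample space and checks it against the equivalent form $\mathbb{E}_p[X \otimes Y] - \mathbb{E}_p[X] \otimes \mathbb{E}_p[Y]$; in $\mathbb{R}^n$ the tensor product is just the outer product.

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])                                        # probability measure
X = np.array([[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0]])                  # X: Ω -> U = R^2
Y = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, -1.0, 3.0]])   # Y: Ω -> V = R^3

Xc, Yc = X - p @ X, Y - p @ Y                                        # centered random variables

# cov_p[X, Y] = E_p[(X - E_p[X]) ⊗ (Y - E_p[Y])], an element of U ⊗ V (a 2x3 array here).
cov = np.einsum('w,wi,wj->ij', p, Xc, Yc)

# Equivalent form E_p[X ⊗ Y] - E_p[X] ⊗ E_p[Y].
cov_alt = np.einsum('w,wi,wj->ij', p, X, Y) - np.outer(p @ X, p @ Y)
assert np.allclose(cov, cov_alt)
```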

Remark

Now according to the above definition, the covariance of two random variables $X \colon \Omega \to U$ and $Y \colon \Omega \to V$ is tensorial, since the expected value of a map $\Omega \to U \otimes V$ will be in $U \otimes V$.

In particular, for a fixed probability measure $p$, $\cov_p[\cdot, \cdot]$ forms an inner product on the space of centered real-valued random variables $X \colon \Omega \to \mathbb{R}$, $$\langle X, Y \rangle_p = \cov_p[X, Y] = \mathbb{E}_p[X\, Y].$$

Covariance Inequality

For all random variables $X \colon \Omega \to U$ and $Y \colon \Omega \to V$ with $\var[Y]$ invertible, the covariance tensor satisfies the following inequality: $$\var[X] \succeq \cov[X, Y]\, \var[Y]^{-1}\, \cov[Y, X].$$ Alternatively, $$\var[X] - \cov[X, Y]\, \var[Y]^{-1}\, \cov[Y, X] \succeq 0.$$

Proof For any probability measure $p$, let $A = \cov_p[X, Y]\, \var_p[Y]^{-1}$ and consider the centered random variable $Z = \bar{X} - A\bar{Y}$. Expanding the variance of $Z$, we have $$0 \preceq \var_p[Z] = \var_p[X] - \cov_p[X, Y]\, \var_p[Y]^{-1}\, \cov_p[Y, X],$$ which is exactly the claimed inequality. $\square$
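
A quick numerical sanity check of the inequality (my own sketch, with randomly generated data): the difference $\var[X] - \cov[X, Y]\, \var[Y]^{-1}\, \cov[Y, X]$ should have only non-negative eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
p = rng.dirichlet(np.ones(6))              # a random probability measure on a 6-point space
X = rng.normal(size=(6, 3))                # X: Ω -> R^3
Y = rng.normal(size=(6, 2))                # Y: Ω -> R^2

def cov(p, A, B):
    # cov_p[A, B] = E_p[(A - E_p[A]) ⊗ (B - E_p[B])]
    Ac, Bc = A - p @ A, B - p @ B
    return np.einsum('w,wi,wj->ij', p, Ac, Bc)

gap = cov(p, X, X) - cov(p, X, Y) @ np.linalg.inv(cov(p, Y, Y)) @ cov(p, Y, X)
print(np.linalg.eigvalsh(gap))             # all eigenvalues ≥ 0 (up to rounding)
```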

Fisher Information Metric

Now we will focus on a parameterized subspace of the space of probability measures, say $\mathcal{P}_\Theta = \{p_\theta : \theta \in \Theta\} \subseteq \mathcal{P}$. For example, $\mathcal{P}_\Theta$ could be the set of all normal distributions, parameterized by mean and standard deviation, which are both in $\mathbb{R}$. Notice that $\mathcal{P}_\Theta$ is homeomorphic to some Euclidean space $\mathbb{R}^d$, and this is allowed as $\mathcal{P}_\Theta$ is a manifold.

Fisher-Rao (Information) Metric

Suppose $\mathcal{P}$ is the manifold of probability measures for a finite $n$-point sample space $\Omega$. Given any tangent vector $v \in T_p\mathcal{P}$ at $p \in \mathcal{P}$, we can take the derivative of the surprise $-\log p$ in the direction of $v$, yielding (up to sign) a measurable score function $\ell_v \colon \Omega \to \mathbb{R}$ such that $\ell_v(\omega) = \partial_v \log p(\omega)$. And the Fisher information metric is defined as the expectation of the tensor product of the score functions: $$I_p(u, v) = \mathbb{E}_p[\ell_u \otimes \ell_v] = \mathbb{E}_p[\ell_u\, \ell_v].$$ Notice that this definition agrees with the definition of the Fisher information matrix.
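
As a concrete sketch (not from the original), the following computes the score functions and the Fisher information metric for the simplex over a hypothetical 4-point sample space, in the coordinates $(p_1, \dots, p_{n-1})$, and compares the result with the familiar Fisher information matrix of the categorical family.

```python
import numpy as np

p = np.array([0.1, 0.2, 0.3, 0.4])                 # a point on the simplex (made-up numbers)
n = len(p)

# ∂p(ω)/∂p_i in the coordinates (p_1, ..., p_{n-1}), where p_n = 1 - p_1 - ... - p_{n-1}.
dp = np.vstack([np.eye(n - 1), -np.ones(n - 1)])   # shape (n, n-1)
scores = dp / p[:, None]                           # ℓ_i(ω) = ∂ log p(ω) / ∂p_i

# Fisher information metric: I_ij = E_p[ℓ_i ℓ_j].
I = np.einsum('w,wi,wj->ij', p, scores, scores)

# Agrees with the usual categorical Fisher information matrix diag(1/p_i) + 1/p_n.
assert np.allclose(I, np.diag(1.0 / p[:-1]) + 1.0 / p[-1])
```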

Lemma

The expected value of the score function is zero.

Proof For all $p \in \mathcal{P}$ and $v \in T_p\mathcal{P}$, we have $$\mathbb{E}_p[\ell_v] = \sum_{\omega \in \Omega} p(\omega)\, \partial_v \log p(\omega) = \sum_{\omega \in \Omega} \partial_v\, p(\omega) = \partial_v \sum_{\omega \in \Omega} p(\omega) = \partial_v 1 = 0. \qquad \square$$
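
A two-line numerical check of the lemma (same hypothetical categorical setup as above; my own sketch, not part of the proof):

```python
import numpy as np

p = np.array([0.1, 0.2, 0.3, 0.4])
scores = np.vstack([np.eye(3), -np.ones(3)]) / p[:, None]   # ℓ_i(ω) = ∂ log p(ω)/∂p_i

# E_p[ℓ_i] = Σ_ω p(ω) ∂ log p(ω)/∂p_i = Σ_ω ∂p(ω)/∂p_i = ∂(1)/∂p_i = 0.
assert np.allclose(p @ scores, 0.0)
```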

Corollary

The Fisher information metric is the covariance tensor of the score functions: $$I_p(u, v) = \cov_p[\ell_u, \ell_v].$$

Proof For all $u, v \in T_p\mathcal{P}$, since the expected value of the score function is zero, we have the following identity: $$\cov_p[\ell_u, \ell_v] = \mathbb{E}_p[\ell_u\, \ell_v] - \mathbb{E}_p[\ell_u]\, \mathbb{E}_p[\ell_v] = \mathbb{E}_p[\ell_u\, \ell_v] = I_p(u, v). \qquad \square$$
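
Numerically (continuing the same hypothetical setup, as a sketch rather than part of the proof), the expectation of the tensor product of the scores and their covariance coincide:

```python
import numpy as np

p = np.array([0.1, 0.2, 0.3, 0.4])
scores = np.vstack([np.eye(3), -np.ones(3)]) / p[:, None]

I = np.einsum('w,wi,wj->ij', p, scores, scores)               # E_p[ℓ ⊗ ℓ]
centered = scores - p @ scores                                # ℓ - E_p[ℓ] (≈ ℓ, since E_p[ℓ] = 0)
cov_scores = np.einsum('w,wi,wj->ij', p, centered, centered)  # cov_p[ℓ, ℓ]
assert np.allclose(I, cov_scores)
```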

Proposition

The statistical manifold $\mathcal{P}$ with Fisher information metric $I$ and totally symmetric cubic (Amari–Chentsov) tensor $T$ such that $T_p(u, v, w) = \mathbb{E}_p[\ell_u\, \ell_v\, \ell_w]$ is equivalent to the Bregman manifold using the negative Shannon entropy as the potential function.

Proof First, we show that the two metric tensors coincide. Write $p_n = 1 - \sum_{i < n} p_i$ and use $(p_1, \dots, p_{n-1})$ as coordinates on $\mathcal{P}$. A direct computation of the Fisher information metric gives $$I_{ij} = \sum_{\omega \in \Omega} p(\omega)\, \partial_i \log p(\omega)\, \partial_j \log p(\omega) = \frac{\delta_{ij}}{p_i} + \frac{1}{p_n}.$$ On the other hand, the Bregman metric is the Hessian of the potential $\phi(p) = \sum_{k=1}^{n} p_k \log p_k$ and satisfies: $$\partial_i \partial_j \phi = \frac{\delta_{ij}}{p_i} + \frac{1}{p_n}.$$ Hence, the two metric tensors are the same. Now consider the cubic tensor: its components $T_{ijk} = \mathbb{E}_p[\ell_i\, \ell_j\, \ell_k]$ can be computed directly in the same coordinates. Similarly, the cubic tensor on the Bregman manifold is determined by the third derivatives of the potential $\phi$. Therefore, the cubic tensors are also the same, and the two manifolds are equivalent. $\square$
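
The metric part of the proposition can also be checked numerically; the sketch below (my own, with made-up coordinates) compares the Fisher information metric of the categorical family with a finite-difference Hessian of the negative Shannon entropy in the coordinates $(p_1, \dots, p_{n-1})$.

```python
import numpy as np

def neg_entropy(theta):
    # Negative Shannon entropy as a potential function of theta = (p_1, ..., p_{n-1}).
    p = np.append(theta, 1.0 - theta.sum())
    return np.sum(p * np.log(p))

def fisher(theta):
    # Fisher information metric of the categorical family in the same coordinates.
    p = np.append(theta, 1.0 - theta.sum())
    dp = np.vstack([np.eye(len(theta)), -np.ones(len(theta))])
    scores = dp / p[:, None]
    return np.einsum('w,wi,wj->ij', p, scores, scores)

def hessian(f, theta, h=1e-4):
    # Central finite-difference Hessian (numerical approximation only).
    d = len(theta)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei, ej = np.eye(d)[i] * h, np.eye(d)[j] * h
            H[i, j] = (f(theta + ei + ej) - f(theta + ei - ej)
                       - f(theta - ei + ej) + f(theta - ei - ej)) / (4 * h * h)
    return H

theta = np.array([0.1, 0.2, 0.3])   # hypothetical interior point of the simplex (p_4 = 0.4)
print(np.max(np.abs(fisher(theta) - hessian(neg_entropy, theta))))   # small finite-difference error
```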

Cramér–Rao Lower Bound

The Cramér–Rao lower bound holds for the Fisher information metric, based on the general covariance definition; i.e., the inverse of the Fisher information metric tensor is a general lower bound on the variance of any unbiased estimator $t$ of the parameter $\theta$: $$\var[t] \succeq I^{-1},$$ where $\succeq$ is the generalized inequality for positive semi-definite tensors.

Proof Fix some $p = p_\theta \in \mathcal{P}_\Theta$. Suppose $t$ is an unbiased estimator for $\theta$, i.e. $\mathbb{E}_{p_\theta}[t] = \theta$ for all $\theta \in \Theta$. For any arbitrary tangent vector $v$, we have $$\partial_v\, \mathbb{E}_p[t] = \sum_{\omega \in \Omega} \partial_v\, p(\omega)\, t(\omega) = \mathbb{E}_p[t\, \ell_v] = \partial_v \theta = v.$$ Furthermore, the covariance of $t$ and $\ell_v$ at $p$ is: $$\cov[t, \ell_v] = \mathbb{E}_p[t\, \ell_v] - \mathbb{E}_p[t]\, \mathbb{E}_p[\ell_v] = \mathbb{E}_p[t\, \ell_v] = v.$$ Now replacing the directional derivative with the derivative with respect to $\theta$, we have: $$\cov[t, l] = \mathrm{id},$$ where $l$ is canonically defined as $l(\omega) = \nabla_\theta \log p_\theta(\omega)$, so that $\ell_v = \langle l, v \rangle$. Hence, by the covariance inequality, we can write: $$\var[t]=\cov[t,t] \succeq \cov[t,l] \cov[l,l]^{-1} \cov[l,t]=\cov[l,l]^{-1}=I^{-1}.$$ $\square$
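
A numerical illustration of the key steps (my own sketch, with a made-up categorical distribution): for the one-hot estimator $t$ of the coordinates $\theta = (p_1, \dots, p_{n-1})$ from a single draw, $\cov[t, l]$ is the identity and $\var[t] - I^{-1}$ is positive semi-definite (here it is essentially zero, since this estimator happens to attain the bound).

```python
import numpy as np

p = np.array([0.1, 0.2, 0.3, 0.4])                 # hypothetical categorical distribution
n = len(p)
theta = p[:-1]                                     # coordinates being estimated

# Score l(ω) ∈ R^{n-1} and Fisher information I = cov[l, l].
l = np.vstack([np.eye(n - 1), -np.ones(n - 1)]) / p[:, None]
I = np.einsum('w,wi,wj->ij', p, l, l)

# Unbiased estimator from one draw: t(ω) = one-hot(ω) restricted to the first n-1 coordinates.
t = np.vstack([np.eye(n - 1), np.zeros(n - 1)])
assert np.allclose(p @ t, theta)                   # E_p[t] = theta

def cov(p, A, B):
    Ac, Bc = A - p @ A, B - p @ B
    return np.einsum('w,wi,wj->ij', p, Ac, Bc)

assert np.allclose(cov(p, t, l), np.eye(n - 1))    # cov[t, l] = id
gap = cov(p, t, t) - np.linalg.inv(I)              # var[t] - I^{-1}
print(np.linalg.eigvalsh(gap))                     # ≈ 0: the bound holds with equality here
```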

Maximum Likelihood Estimation

If the maximum likelihood estimate achieves the Cramér–Rao lower bound on and , then it also achieves the Cramér–Rao lower bound on . Proof

Then, show that the MLE achieves the Cramér–Rao lower bound on the space of all probability measures on a finite sample space.
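
As a numerical sketch of this claim (my own illustration, not a proof): for $N$ i.i.d. draws from a made-up categorical distribution, the MLE is the vector of empirical frequencies, and its Monte Carlo covariance matches $I^{-1}/N$, i.e. it attains the Cramér–Rao lower bound.

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.1, 0.2, 0.3, 0.4])                 # hypothetical true distribution
n, N, trials = len(p), 200, 20000

# Fisher information of the categorical family at p, in coordinates (p_1, ..., p_{n-1}).
l = np.vstack([np.eye(n - 1), -np.ones(n - 1)]) / p[:, None]
I = np.einsum('w,wi,wj->ij', p, l, l)

# MLE from N i.i.d. draws: the empirical frequencies of the first n-1 outcomes.
counts = rng.multinomial(N, p, size=trials)        # shape (trials, n)
mle = counts[:, :-1] / N

# The Fisher information of N i.i.d. draws is N·I, so the bound is I^{-1}/N;
# the sample covariance of the MLE across trials matches it up to Monte Carlo error.
print(np.max(np.abs(np.cov(mle, rowvar=False) - np.linalg.inv(I) / N)))
```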