Thursday, December 22, 2016

Rao-Blackwell Theorem

In my past posts on sufficient statistics, I've indicated my preference for the Bayesian formulation, so I'm sticking with that here, even though the frequentist version is more likely to appear on the Q. The two are, for all intents and purposes, equivalent except in some weird infinite-dimensional cases that I'm not too worried about. I'm also going to be somewhat informal with my definitions.

As previously noted, a statistic is any function T of a random sample X. Preferably, the dimension of T(X) is less than that of X, but that's not a requirement; it just makes the statistic more useful, since the whole point is data reduction.

A sufficient statistic for parameter θ is one such that P(θ|X) = P(θ|T(X)). And, from the Factorization Theorem (which, incidentally, is also referred to as the Fisher-Neyman Factorization Theorem, because all statistical results sound more legit if you attach Fisher's name to them), this is equivalent to saying the pmf/pdf f can be factored as f(x|θ) = g(x)h(T(x)|θ), so that θ enters only through the factor involving T(x).
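To make this concrete, here's a minimal sketch in Python (the Bernoulli model, the grid prior, and all variable names are my own choices for illustration). For an i.i.d. Bernoulli(θ) sample, the likelihood depends on the data only through T(x) = Σxᵢ, so two samples with the same sum produce identical posteriors:

```python
import numpy as np

def posterior(x, theta_grid):
    """Posterior P(theta | x) on a grid, under a uniform prior over the grid.

    For i.i.d. Bernoulli data the likelihood is
    theta^sum(x) * (1 - theta)^(n - sum(x)),
    which depends on x only through T(x) = sum(x).
    """
    t, n = x.sum(), len(x)
    lik = theta_grid**t * (1 - theta_grid)**(n - t)
    return lik / lik.sum()

theta_grid = np.linspace(0.01, 0.99, 99)
x1 = np.array([1, 0, 1, 1, 0])  # sum = 3
x2 = np.array([0, 1, 1, 0, 1])  # different ordering, same sum = 3

# Same T(x), so identical posteriors: P(theta|X) = P(theta|T(X)) here.
print(np.allclose(posterior(x1, theta_grid), posterior(x2, theta_grid)))  # True
```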

A minimal sufficient statistic for parameter θ is a sufficient statistic that can be expressed as a function of any other sufficient statistic. In other words, any sufficient statistic can be further "boiled down" via another function to the minimal sufficient statistic. Since functions never add information, the minimal sufficient statistic is the one from which removing any more information makes it insufficient. Due to some nutty edge cases, you can't simply say that a minimal sufficient statistic has the minimum dimension among all sufficient statistics, but that is the gist of it.
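Sticking with the Bernoulli example (again a sketch of my own, not anything canonical), here's the "boiling down" chain: the full sample is trivially sufficient, the sorted sample is still sufficient, and the sum, which is minimal sufficient, is a function of both:

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0])

# Each statistic below is a function of the one above it, discarding
# a bit more information at every step, yet all three are sufficient
# for theta in an i.i.d. Bernoulli(theta) model.
full_sample = x              # the identity function: trivially sufficient
order_stats = np.sort(x)     # forgets the ordering, still sufficient
total = order_stats.sum()    # the minimal sufficient statistic

print(full_sample, order_stats, total)
```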

Since there are infinitely many sufficient statistics, it's impossible to verify directly that every one of them can be transformed into a candidate minimal sufficient statistic. Fortunately, the Factorization Theorem is again useful. If S and T are both sufficient statistics, then f(x|θ) = g(x)h(T(x)|θ) = q(x)r(S(x)|θ). If S(x) = S'(T(x)), then the only part of f that depends on θ is r(S'(T(x))|θ). Since this must hold for any sufficient T(x) and corresponding S'(T(x)), we get the following equivalence: S(X) is a minimal sufficient statistic if and only if, for every pair of sample points x and y,

f(x|θ) / f(y|θ) does not depend on θ  ⟺  S(x) = S(y).

This is much easier to verify.
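As a quick sanity check of this criterion (a symbolic sketch of my own, using sympy; the N(θ, 1) model and the sample size are just illustrative choices), take an i.i.d. N(θ, 1) sample. The log-ratio of the joint densities at x and y reduces to θ·(Σxᵢ − Σyᵢ) plus θ-free terms, so the ratio is free of θ exactly when the sums match, which shows S(x) = Σxᵢ is minimal sufficient:

```python
import sympy as sp

n = 3  # small fixed sample size; the pattern is the same for any n
theta = sp.symbols('theta')
x = sp.symbols(f'x0:{n}')
y = sp.symbols(f'y0:{n}')

def log_density(sample):
    # Log of the joint N(theta, 1) density, dropping additive constants.
    return -sp.Rational(1, 2) * sum((s - theta)**2 for s in sample)

# Log of the ratio f(x|theta)/f(y|theta); the theta**2 terms cancel.
log_ratio = sp.expand(log_density(x) - log_density(y))

# All remaining theta-dependence is theta*(sum(x) - sum(y)), so the
# ratio is constant in theta exactly when sum(x) == sum(y).
print(sp.collect(log_ratio, theta))
```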

Having defined our terms, it's time to get to the result, which is actually very useful: the Rao-Blackwell Theorem.
If g(X) is an estimator of θ and S(X) is a sufficient statistic for θ, then E(g(X)|S(X)) is generally a better, and never a worse, estimator of θ. (By "better", we mean the estimator has a mean squared error that is no larger, and is typically strictly smaller.) Note that sufficiency is what makes this work: because S(X) is sufficient, E(g(X)|S(X)) doesn't depend on θ, so it is a legitimate estimator.
Thus, you can start with pretty much any estimator and compute the conditional expectation to get an estimator that's at least as good, as the simulation below shows. If you condition on a minimal sufficient statistic, you get the added bonus of knowing the dimensionality of your inputs is minimized.
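Here's a small simulation (a sketch under my own choice of model and starting estimator; nothing here is forced by the theorem). For an i.i.d. Bernoulli(θ) sample, start with the crude estimator g(X) = X₁ and condition on the sufficient statistic S = ΣXᵢ. By symmetry, E(X₁|S) = S/n, the sample mean, and the MSE drops by roughly a factor of n:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, trials = 0.3, 10, 100_000

samples = rng.binomial(1, theta, size=(trials, n))

# Crude but unbiased estimator: just the first observation.
g = samples[:, 0].astype(float)

# Rao-Blackwellized estimator: E(X_1 | sum(X)) = sum(X)/n by symmetry
# of the i.i.d. sample, i.e., conditioning on the sufficient statistic
# turns the crude estimator into the sample mean.
rb = samples.sum(axis=1) / n

print("MSE of g(X)   :", np.mean((g - theta)**2))   # ~ theta*(1-theta)   = 0.21
print("MSE of E(g|S) :", np.mean((rb - theta)**2))  # ~ theta*(1-theta)/n = 0.021
```

Both estimators are unbiased, so the entire improvement is variance reduction, which is exactly what the theorem promises.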
