Tuesday, December 20, 2016

Factorization Theorem

Before getting to the named result, a word on sufficient statistics. I've already weighed in on my own biases, but I'm somewhat intrigued by the fact that the whole notion of sufficient statistics breaks down so quickly once one leaves the comfy realm of the exponential family. It's no wonder frequentists love them so much. Of course, we Bayesians are often pinged for using distributions that form convenient conjugate priors rather than what best models the underlying data. It's really not a question of right or wrong; it's using the tool that works given your point of view. Just as a Baptist would quote the Bible while a nihilist would quote Nietzsche, you go with what fits.

Anyway, supposing we do care about sufficient statistics and aren't working with a distribution from the exponential family, how do we find one? The Factorization Theorem helps.
Let f(x|θ) denote the joint pdf or pmf of a sample X. A statistic T(X) is a sufficient statistic for θ if and only if there exist functions g(t|θ) and h(x) such that, for all sample points x and all parameter points θ, f(x|θ) = g(T(x)|θ)h(x).
As is so often the case with these characterization theorems, the proof is a less than enlightening exercise in symbol manipulation. So, we'll skip that. The result, on the other hand, does make it a whole lot easier to find a sufficient statistic, particularly when one steps outside the family of exponential distributions.
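Just to make the shape of the theorem concrete before stepping outside the exponential family, here's a quick Python sanity check on a case where the answer is well known: for an iid Poisson(θ) sample, the joint pmf factors as g(T(x)|θ)h(x) with T(x) = Σ x_i. The sample values and θ grid below are made up purely for illustration.

```python
# Numerical sanity check of the factorization f(x|theta) = g(T(x)|theta) * h(x)
# for an iid Poisson(theta) sample, where T(x) = sum(x).
from math import exp, factorial, isclose
from functools import reduce

x = [2, 0, 3, 1, 4]    # a made-up sample
n = len(x)
T = sum(x)             # candidate sufficient statistic: the sample total

def joint_pmf(xs, theta):
    """Product of Poisson(theta) pmfs -- the joint pmf f(x|theta)."""
    return reduce(lambda p, xi: p * exp(-theta) * theta**xi / factorial(xi), xs, 1.0)

def g(t, theta):
    """Part that depends on the data only through T(x)."""
    return exp(-n * theta) * theta**t

def h(xs):
    """Theta-free part."""
    return 1.0 / reduce(lambda p, xi: p * factorial(xi), xs, 1)

for theta in (0.5, 1.0, 2.0, 3.7):
    assert isclose(joint_pmf(x, theta), g(T, theta) * h(x), rel_tol=1e-9)
print("f(x|theta) = g(T(x)|theta) * h(x) for every theta tried")
```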

Consider, for example, the uniform distribution. Suppose you know that values are evenly distributed, but you have no idea what the upper bound is. It seems pretty obvious that the maximum value in your sample is the best summary of that bound, and one could prove it directly from the definition of a sufficient statistic. However, it's super easy if you use the Factorization Theorem:

f(x|θ) = 1/θ for 0 < x < θ, so the joint density is f(x|θ) = θ^(-n) when 0 < x_i < θ for every i, and 0 otherwise. Note that this density depends on the data only through the minimum being greater than zero and the maximum being less than θ.

So, the only part of the distribution that depends on both x and θ can be re-written as a function of T(x) = max{x_i} and θ. At this point, it should be fairly clear that you're done, but if you absolutely must grind it out, it goes like this:

T(x) = max{x_i}
g(t|θ) = θ^(-n) if t < θ, and 0 otherwise
h(x) = 1 if min{x_i} > 0, and 0 otherwise
and
f(x|θ) = g(max{x_i}|θ)h(x) = g(T(x)|θ)h(x)
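If code is more convincing than symbols, here's a small Python sketch of the same thing; the samples are invented for illustration. It checks the factorization pointwise and also shows the practical upshot: two samples sharing the same maximum have identical likelihoods in θ.

```python
# Illustration of the uniform(0, theta) factorization worked out above.

def joint_density(xs, theta):
    """Joint density of an iid uniform(0, theta) sample: theta^(-n) on (0, theta)^n."""
    n = len(xs)
    return theta**(-n) if all(0 < xi < theta for xi in xs) else 0.0

def g(t, theta, n):
    """Factor carrying all the theta-dependence, through T(x) = max{x_i}."""
    return theta**(-n) if t < theta else 0.0

def h(xs):
    """Theta-free factor: just the requirement that the sample is positive."""
    return 1.0 if min(xs) > 0 else 0.0

x1 = [0.3, 1.8, 0.9, 2.4, 1.1]   # two made-up samples of size 5...
x2 = [2.4, 0.1, 0.1, 0.1, 0.1]   # ...sharing the same maximum, 2.4

for theta in (1.0, 2.0, 2.5, 5.0, 10.0):
    # the factorization f(x|theta) = g(max{x_i}|theta) * h(x) holds pointwise...
    assert joint_density(x1, theta) == g(max(x1), theta, len(x1)) * h(x1)
    # ...and samples with the same maximum have identical likelihoods in theta
    assert joint_density(x1, theta) == joint_density(x2, theta)
print("the likelihood depends on the data only through max{x_i}")
```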
