Anyway, assuming we do care about sufficient statistics and aren't working with a distribution from the exponential family, how do we find one? The Factorization Theorem helps.
Let f(x|θ) denote the joint pdf or pmf of a sample X. A statistic T(X) is a sufficient statistic for θ if and only if there exist functions g(t|θ) and h(x) such that, for all sample points x and all parameter points θ, f(x|θ) = g(T(x)|θ)h(x).

As is so often the case with these characterization theorems, the proof is a less-than-enlightening exercise in symbol manipulation, so we'll skip it. The result, on the other hand, makes it a whole lot easier to find a sufficient statistic, particularly when one steps outside the exponential family.
Consider, for example, the uniform distribution. Suppose you know that values are uniformly distributed, but you have no idea what the upper bound is. It seems pretty obvious that the maximum value in your sample captures everything the data have to say about that bound, and one could prove that directly from the definition of a sufficient statistic. However, it's super easy if you use the Factorization Theorem:
f(x|θ) = 1/θ for 0 < x < θ, so the joint density is f(x|θ) = θ^(-n) whenever 0 < x_i < θ for all i (and 0 otherwise). Note that, as a function of the data, this density depends only on whether the minimum is greater than zero and the maximum is less than θ.
So, the only part of the distribution that depends on both x and θ can be rewritten as a function of T(x) = max{x_i} and θ. At this point, it should be fairly clear that you're done, but if you absolutely must grind it out, it goes like this:
T(x) = max{x_i}
g(t|θ) = θ^(-n) if t < θ, and 0 otherwise
h(x) = 1 if all the x_i are positive (this soaks up the minimum-greater-than-zero condition), and 0 otherwise
and
f(x|θ) = g(max{x_i}|θ)h(x) = g(T(x)|θ)h(x)
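If you'd rather see the sufficiency claim in action than in symbols, here's a minimal numerical sketch (assuming Python with NumPy; the function name uniform_likelihood is just for illustration, not anything standard). It builds two different samples that happen to share the same maximum and checks that they produce identical likelihood functions for θ, which is exactly what the factorization says should happen.

    import numpy as np

    def uniform_likelihood(x, theta):
        # Joint density of an iid Uniform(0, theta) sample, viewed as a function of theta.
        x = np.asarray(x, dtype=float)
        if x.min() <= 0 or x.max() >= theta:
            return 0.0
        return theta ** (-x.size)

    rng = np.random.default_rng(42)
    theta_true = 5.0
    sample_a = rng.uniform(0.0, theta_true, size=10)

    # Construct a second sample of the same size with the same maximum,
    # but otherwise completely different values.
    sample_b = rng.uniform(0.0, sample_a.max(), size=10)
    sample_b[0] = sample_a.max()

    # The two likelihood columns coincide at every theta, even though the
    # samples differ everywhere except at the maximum: the data enter the
    # likelihood only through T(x) = max{x_i}.
    for theta in (4.0, 4.5, 5.0, 5.5, 6.0):
        print(theta, uniform_likelihood(sample_a, theta),
              uniform_likelihood(sample_b, theta))

Any two samples of the same size with the same maximum give the same likelihood at every θ, so once you know max{x_i}, the rest of the data tells you nothing more about θ.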