Friday, November 25, 2016

Central Limit Theorem and Slutsky's Theorem

Today I'll look at two results stemming from stochastic convergence. The first is quite possibly the most significant result in all of statistics. Odd that nobody seems to know who first proved it. There are many variants extending it to special cases, but the original proof of the Central Limit Theorem is not easy to track down. It might be one of those things that evolved; first conjectured by example, then proven in a bunch of special cases and then incrementally extended to its current robust form:
Let {Xi} be a sequence of iid random variables with EXi = μ and finite variance σ². Then √n(X̄n − μ) converges in distribution to a normal with mean 0 and variance σ²; equivalently, for large n the mean of a sample of size n is approximately normal with mean μ and variance σ²/n.
It should be pretty obvious that this is an important result. It basically says that as long as our sample sizes are large and variances are finite, we wind up with approximately normally distributed sample means. Stated another way, it's a bridge between a finite sample and the strong law of large numbers: yes, the sample mean converges, and here's how it converges. The problem, of course, is that variances aren't always finite and "large" is a pretty subjective term. Many researchers are far too quick to assume normality without actually verifying it.
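To make that concrete, here's a quick simulation. This is a minimal sketch in Python with NumPy; the exponential distribution, the sample size of 100, and the 10,000 trials are arbitrary choices of mine, nothing canonical. The exponential is heavily skewed, yet the standardized sample means come out looking standard normal:

```python
import numpy as np

np.random.seed(0)

# Draw many samples from a decidedly non-normal distribution:
# exponential with mean 1, so mu = 1 and sigma^2 = 1.
n, trials = 100, 10000
samples = np.random.exponential(scale=1.0, size=(trials, n))
means = samples.mean(axis=1)

# If the CLT is doing its job, the standardized sample means
# should look like draws from a standard normal.
z = np.sqrt(n) * (means - 1.0)  # sigma = 1, so no denominator needed
print(f"mean of z:   {z.mean():+.3f} (expect ~0)")
print(f"std of z:    {z.std():.3f} (expect ~1)")
print(f"P(z < 1.96): {np.mean(z < 1.96):.3f} (expect ~0.975)")
```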

Slutsky's Theorem is a lesser-known but, in many ways, more practically useful result:
If Xn converges in distribution to X and Yn converges in probability to a constant a, then
  • YnXn converges in distribution to aX.
  • Xn + Yn converges in distribution to X + a.
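Both claims are easy to check numerically. Here's a minimal sketch along the same lines as the one above; again, the particular distributions and sample sizes are just illustrative choices of mine:

```python
import numpy as np

np.random.seed(1)

n, trials = 500, 10000

# X_n: the standardized mean of an exponential(1) sample, which
# converges in distribution to N(0, 1) by the CLT (mu = sigma = 1).
x = np.random.exponential(scale=1.0, size=(trials, n))
X_n = np.sqrt(n) * (x.mean(axis=1) - 1.0)

# Y_n: the mean of a Uniform(0, 4) sample, which converges in
# probability to the constant a = 2.
y = np.random.uniform(0.0, 4.0, size=(trials, n))
Y_n = y.mean(axis=1)

# Slutsky: Y_n * X_n ->d 2 * N(0, 1) = N(0, 4),
#          X_n + Y_n ->d N(0, 1) + 2 = N(2, 1).
prod, total = Y_n * X_n, X_n + Y_n
print(f"product: mean {prod.mean():+.3f} (~0), std {prod.std():.3f} (~2)")
print(f"sum:     mean {total.mean():+.3f} (~2), std {total.std():.3f} (~1)")
```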
You're still stuck with the vagueness of how big n needs to be, but combining sequences this way, through sums and products, is a really common operation. It's nice to know that you're not completely invalidating your results by doing it. In particular, we can use this to plug the sample variance back into the Central Limit Theorem:
$$\sqrt{n}\,\frac{\bar{X}_n - \mu}{S_n} = \frac{\sigma}{S_n} \cdot \sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma}$$
The second factor on the right hand side is the typical way to state the CLT: √n(X̄n − μ)/σ converges in distribution to N(0, 1). We know that the sample variance, Sn², converges in probability to σ², which means σ/Sn converges in probability to 1. So, by Slutsky's product rule, we can swap Sn in for σ and get:
$$\sqrt{n}\,\frac{\bar{X}_n - \mu}{S_n} \xrightarrow{d} N(0,\, 1)$$
This is a much more useful form of the result because it allows us to make inferences about the mean without knowing the true variance. Again, what counts as an adequate sample size is very dependent on the underlying distribution. But, assuming you perform proper normality checks, you're good to go.
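As a final illustration, here's what that inference looks like in practice: an approximate 95% confidence interval for the mean built from the studentized statistic. The gamma distribution and the sample size of 200 are placeholder choices on my part; the point is that only the data, and not the true σ, enters the interval:

```python
import numpy as np

np.random.seed(2)

# One observed sample of size n from a skewed distribution whose
# variance we pretend not to know. Gamma(shape=3, scale=2) has
# true mean 6; both parameters are arbitrary choices here.
n = 200
x = np.random.gamma(shape=3.0, scale=2.0, size=n)

xbar = x.mean()
s = x.std(ddof=1)  # sample standard deviation S_n

# Studentized CLT: sqrt(n) * (xbar - mu) / S_n is approximately
# N(0, 1) for large n, so an approximate 95% interval for mu is:
half = 1.96 * s / np.sqrt(n)
print(f"xbar = {xbar:.3f}, 95% CI = ({xbar - half:.3f}, {xbar + half:.3f})")
```

Using the normal quantile 1.96 rather than a t quantile is justified exactly by the Slutsky argument above; for small n you'd reach for the t distribution instead.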
