I've suspected this all along and have had it in the paper for some time, but only recently bothered to prove it. The funky shape of the sample variance distribution isn't the real reason too many confidence intervals miss the true value. It's the correlation between the sample sum and the sample variance.
As you can see in the plots below, when the data is uncorrelated, the sample sum and sample variance are basically independent. When the data is correlated, there's a bigger chance of drawing "zero" blocks, which understates both the sum and the variance. This combination is, simply put, bad.
While there's a lot more variation in the correlated case, the average values for both the sample sum and sample variance are fine. But since the joint distribution pushes both away from their means together, a sample that understates the sum also understates the variance. The result is narrow confidence intervals exactly when the error is greatest.
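To make the mechanism concrete, here's a rough simulation sketch. It is not the code behind the plots. The two populations are made up for illustration: 1000 blocks, where the "uncorrelated" case gives every block a similar sum with symmetric noise, and the "correlated" case makes roughly 90% of the blocks zero with the rest holding large sums. The block values, the sample size of 25, and the 2000 trials are all assumptions. It repeatedly samples blocks without replacement and measures the Pearson correlation between the sample sum and the sample variance.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def sample_stats(block_sums, n_sampled, rng):
    """Draw blocks without replacement; return (sample sum, sample variance)."""
    sample = rng.sample(block_sums, n_sampled)
    total = sum(sample)
    mean = total / n_sampled
    variance = sum((b - mean) ** 2 for b in sample) / (n_sampled - 1)
    return total, variance


def sum_variance_correlation(block_sums, n_sampled, trials, rng):
    """Correlation between sample sum and sample variance over many draws."""
    totals, variances = [], []
    for _ in range(trials):
        total, variance = sample_stats(block_sums, n_sampled, rng)
        totals.append(total)
        variances.append(variance)
    return pearson(totals, variances)


rng = random.Random(0)
n_blocks = 1000

# Hypothetical uncorrelated data: every block sum is similar, symmetric noise.
uncorrelated = [rng.gauss(100.0, 10.0) for _ in range(n_blocks)]

# Hypothetical correlated data: most blocks are "zero" blocks, a few are large.
correlated = [
    rng.gauss(1000.0, 100.0) if rng.random() < 0.1 else 0.0
    for _ in range(n_blocks)
]

r_unc = sum_variance_correlation(uncorrelated, 25, 2000, rng)
r_cor = sum_variance_correlation(correlated, 25, 2000, rng)
print(f"uncorrelated blocks: r = {r_unc:.2f}")
print(f"correlated blocks:   r = {r_cor:.2f}")
```

Under these assumptions the first correlation should land near zero and the second should be strongly positive, because a draw that happens to contain few non-zero blocks lowers the sum and the variance at the same time. Changing the `25` to a larger sample size lets you check how the relationship behaves as more blocks are sampled.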
And the problem never really goes away. Even with 250 of the 1000 blocks sampled, the correlation is obvious from the plots:
It's tight enough that the rejection rate is about right, but there's no denying that these statistics are correlated.