Wednesday, October 4, 2017

Empirical stratification

I put together a little table showing how chopping the tails off three different distributions affects the non-tail variance. Each of these has an inter-quartile range (the difference between the 25th and 75th percentiles) of 1.35.

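                  N(0,1)   Laplace(0,2.25)   Cauchy(0.675)
σ²                1        3.18              inf
75th percentile   0.67     0.67              0.67
95th percentile   1.64     2.25              4.26
97.5th percentile 1.96     2.93              8.58
σ², 50% trimmed   0.15     0.13              0.13
σ², 10% trimmed   0.60     0.81              1.41
σ², 5% trimmed    0.76     1.15              3.37

(σ² is the variance of the full distribution. The percentile rows give the upper 75th, 95th and 97.5th percentiles. The "trimmed" rows give the variance of what remains after removing 50%, 10% or 5% of the probability mass, split evenly between the two tails.)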
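The trimmed-variance rows can also be computed directly from each distribution's quantile function. This is a sketch, not necessarily how the table was produced. It assumes all three distributions are centred at zero, uses a Laplace scale of 0.675/ln 2 (about 0.97) and a Cauchy scale of 0.675 so the quartiles sit at about ±0.675, and averages the squared quantile function over the kept range. The function and constant names are hypothetical. Small differences from the table are possible if its values came from sampled data.

```python
import math
from statistics import NormalDist

# All three distributions are centred at 0 with quartiles at about +/-0.675,
# so each has an interquartile range of about 1.35.
HALF_IQR = 0.675
LAPLACE_SCALE = HALF_IQR / math.log(2)  # Laplace(0, b) quartiles are +/- b*ln 2
CAUCHY_SCALE = HALF_IQR                 # Cauchy(0, g) quartiles are +/- g
_STD_NORMAL = NormalDist(0.0, 1.0)


def normal_q(p):
    """Quantile function of N(0, 1)."""
    return _STD_NORMAL.inv_cdf(p)


def laplace_q(p):
    """Quantile function of Laplace(0, LAPLACE_SCALE)."""
    if p < 0.5:
        return LAPLACE_SCALE * math.log(2 * p)
    return -LAPLACE_SCALE * math.log(2 * (1 - p))


def cauchy_q(p):
    """Quantile function of Cauchy(0, CAUCHY_SCALE)."""
    return CAUCHY_SCALE * math.tan(math.pi * (p - 0.5))


def trimmed_variance(quantile, trim, steps=100_000):
    """Variance left after cutting trim/2 of the probability mass off each
    tail of a distribution that is symmetric about 0.

    The kept values are Q(p) for p in [trim/2, 1 - trim/2], so the trimmed
    variance is the average of Q(p)**2 over that range (the mean is 0 by
    symmetry). The average is estimated with the midpoint rule.
    """
    lo, hi = trim / 2, 1 - trim / 2
    width = (hi - lo) / steps
    total = sum(quantile(lo + (i + 0.5) * width) ** 2 for i in range(steps))
    return total / steps


if __name__ == "__main__":
    print("trim   N(0,1)  Laplace   Cauchy")
    for trim in (0.50, 0.10, 0.05):
        row = [trimmed_variance(q, trim) for q in (normal_q, laplace_q, cauchy_q)]
        print(f"{trim:>4.0%}  " + "  ".join(f"{v:7.2f}" for v in row))
```

Working from the quantile function keeps one routine for all three distributions, including the Cauchy, whose untrimmed variance does not exist.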

Clearly (and predictably), the heavy-tailed distributions benefit a lot more from chopping the tails off and sampling them completely. However, even the Normal distribution gets roughly a 25% boost in power (its variance drops from 1 to 0.76) just by trimming the extreme 5% of the distribution. That is, if you can stop sampling with n-r blocks unsampled when the tails are included, you can stop with (n-r)/0.76 blocks unsampled if you trim the tails. With the Laplace (a.k.a. double exponential) distribution, you get to stop at (n-r)·3.18/1.15. That's nearly three times as many unsampled blocks.
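The arithmetic behind those stopping points can be sketched as follows. This rests on assumptions not spelled out above: n is the total number of blocks, r the number sampled, the estimate is the usual expansion estimator of a total under simple random sampling without replacement, and the stopping rule holds the variance of that estimate at some fixed target V. Once the tails are sampled completely, only the unsampled interior blocks contribute variance, so σ² below means the variance of whatever is still being sampled.

```latex
\operatorname{Var}(\hat{T})
  = n^2\left(1 - \frac{r}{n}\right)\frac{\sigma^2}{r}
  = \frac{n}{r}\,(n-r)\,\sigma^2
  \approx (n-r)\,\sigma^2 \qquad \text{when } r \approx n

(n-r)\,\sigma^2 \approx V
  \;\Longrightarrow\;
  (n-r)_{\text{trim}} \approx (n-r)\,\frac{\sigma^2}{\sigma^2_{\text{trim}}}
```

Plugging in the table gives (n-r)/0.76 ≈ 1.32(n-r) for the Normal and (n-r)·3.18/1.15 ≈ 2.77(n-r) for the Laplace.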

The law of diminishing returns is also clear. While the Cauchy gets some real benefit from trimming a full 10% off, even the Laplace distribution shows only marginal improvement beyond the 5% trim. Trimming off everything beyond the quartiles doesn't make any sense, even for the Cauchy. You'd be sampling half the distribution on every query, plus whatever you needed to get the interior estimate to converge.

A better approach is to go with multiple strata. While I'm sure the best way to tune this is to just run a bunch of tests on your actual data set, I'm also pretty sure that optimal stratification breaks the data up so that each stratum's variance is inversely proportional to the number of blocks in that stratum. That may not be true, and it might be really hard to prove, but it's worth a shot.
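If stratum variance is inversely proportional to block count, then block count times variance should come out roughly equal across strata, which is easy to check on a candidate split. A minimal sketch under stated assumptions: `stratum_balance` is a hypothetical helper, the per-block values are synthetic Laplace draws standing in for real data, and the cut points are arbitrary examples rather than tuned ones.

```python
import random
import statistics


def stratum_balance(values, cuts):
    """Split values at the given cut points and report, for each stratum,
    (number of blocks, population variance, blocks * variance).

    If the inverse-proportionality conjecture holds for a split, the last
    column should be roughly equal across strata.
    """
    edges = [float("-inf"), *sorted(cuts), float("inf")]
    report = []
    for lo, hi in zip(edges, edges[1:]):
        stratum = [v for v in values if lo < v <= hi]
        var = statistics.pvariance(stratum) if len(stratum) > 1 else 0.0
        report.append((len(stratum), var, len(stratum) * var))
    return report


if __name__ == "__main__":
    # Hypothetical per-block values: Laplace(0, 0.97) draws, generated as the
    # difference of two exponentials. Substitute real block values here.
    rng = random.Random(42)
    blocks = [rng.expovariate(1 / 0.97) - rng.expovariate(1 / 0.97)
              for _ in range(10_000)]
    for cuts in ([-2.25, 2.25], [-4.0, -2.25, 2.25, 4.0]):
        print(cuts)
        for count, var, product in stratum_balance(blocks, cuts):
            print(f"  blocks={count:6d}  var={var:8.3f}  blocks*var={product:9.1f}")
```

Comparing the last column across a few candidate splits is a cheap way to see how far a given stratification is from the conjectured balance before running full tests.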
