Thursday, April 26, 2018

Doing the comparos justice

With my new outline, I'm having to change my treatment of BLB and BMH a bit. I had introduced them only in the context of the uncorrelated case. They do great there. But, then again, so does simple sampling, so why go to the trouble? Given that they fail miserably when the data is correlated, one could question why I even bring them up.

Well, they don't fail quite as miserably if you put just a little work into it.

First, there's the fact that BMH specifically says you should resample blocks with each iteration rather than using the same block over and over again. If you actually sample random blocks from the entire population, this will work great, but that's because you will have sampled the entire population by the time you're done. That rather defeats the purpose of sampling.

But, suppose we just sampled from the blocks that we had already selected. So, say we've sampled ten blocks and we want to check three blocks at each step of the chain. No problem, just pick any three from the ten you've got. Since we've already computed summary statistics for each block, this doesn't involve any extra computation; it's just a matter of picking the right values out of an array at each step. This will spread the distribution to at least cover all of the blocks sampled.
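Here is a minimal sketch of that lookup step, not the actual harness code. The names (`block_sums`, `draw_block_subsets`) and the numbers are made up for illustration, and I'm assuming the cached summary statistic is a per-block sum. It only shows the "pick three of the ten" part, not the accept/reject logic of the chain itself.

```python
import random

def draw_block_subsets(block_sums, blocks_per_step, n_steps, seed=0):
    """For each chain step, pick a random subset of the already-sampled
    blocks and return the mean of their cached sums.

    block_sums is a hypothetical list of per-block summary statistics
    computed once, up front; no block is re-read here.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_steps):
        # Choose distinct block indices for this step (without replacement).
        chosen = rng.sample(range(len(block_sums)), blocks_per_step)
        estimates.append(sum(block_sums[i] for i in chosen) / blocks_per_step)
    return estimates

# Illustrative values only: ten sampled blocks, three checked per step.
block_sums = [12.0, 15.5, 9.8, 30.1, 11.2, 14.7, 10.3, 28.9, 13.4, 12.6]
step_estimates = draw_block_subsets(block_sums, blocks_per_step=3, n_steps=1000)
```

Because each step draws a different subset, the per-step values vary across the whole range of sampled blocks instead of sitting on whatever a single fixed block happens to say.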

Now, let's look at BLB. This one is a bit trickier because the method explicitly relies on each "bag" being a truly independent random sample, which a block is definitely not. But, the partitions within our sampled blocks are, more or less, independent random samples. So, if instead of looking at individual observations, we select a group of partitions and then construct the bootstrap from them, we should get similar variation in the total sum without having to randomly sample at the row level.
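A rough sketch of the partition-level resampling, under some loud assumptions: each partition has already been reduced to a sum, the partitions are treated as exchangeable, and the total is estimated by scaling up from the sampled partitions to a known partition count. `partition_sums`, `total_partitions`, and `partition_bootstrap` are hypothetical names, and this is only the resampling step rather than a full BLB implementation.

```python
import random

def partition_bootstrap(partition_sums, total_partitions, n_boot=1000, seed=0):
    """Bootstrap the estimated total by resampling whole partitions.

    partition_sums: hypothetical per-partition sums from the sampled blocks.
    total_partitions: assumed known number of partitions in the full data set.
    """
    rng = random.Random(seed)
    k = len(partition_sums)
    scale = total_partitions / k  # scale the sampled sum up to the population
    totals = []
    for _ in range(n_boot):
        # Resample k partitions with replacement; rows are never touched.
        resample = rng.choices(partition_sums, k=k)
        totals.append(scale * sum(resample))
    return totals

# Illustrative values only.
partition_sums = [4.0, 6.5, 3.2, 9.1, 5.0, 4.4, 7.3, 2.9]
boot_totals = partition_bootstrap(partition_sums, total_partitions=400)
```

The spread of `boot_totals` stands in for the variation in the total sum. Whether it matches what row-level sampling would give is exactly the thing that needs testing.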

I'll need to test out both these ideas, of course. Fortunately, the sampler harness is pretty generic, so it's probably only 1-2 days' work, plus another day writing up the results (unless they come out wacky, in which case it could be several days trying to figure that out). Either way, I think it will be a presentation where I'm less likely to be accused of portraying competing methods in their worst possible light.
