Monday, November 13, 2017

Bootstrap Metropolis Hastings

I'm coding up the second algorithm for the comparo. This one is a bit less intuitive. First, a quick overview of the Metropolis-Hastings algorithm:

The idea is to approximate the posterior distribution of some parameter given a set of data. Presumably, this distribution is difficult (or impossible) to derive, so we settle for generating an empirical distribution. It goes like this:

1) Pick a starting value for your parameter. Let's call it Y since Greek letters are a pain to produce in a normal text editor. This first value is Y0.

2) Pick another value for Y by applying some distribution to Y0. The choice of distribution is pretty arbitrary, but things converge faster if it's somewhat similar to the distribution of Y. Since we're looking at sums, a normal distribution should be close (if not spot on). So, we let Y' = Y0 + N(0,s) where the standard deviation, s, is a reasonable guess at the standard deviation of our sum.

3) Compute the relative likelihood of the two values, L = P(X|Y')/P(X|Y0), where X is our observed data set.

4) If L >= 1, then Y1 = Y', that is, we "accept" Y' as our next value in the empirical distribution.

5) If L < 1, then we accept Y1 = Y' with probability L (by comparing L to a pseudo-random Uniform(0,1) draw). Otherwise, we keep the original value, Y1 = Y0.

We keep doing this for a long time until the empirical distribution of {Yi} converges. That limiting distribution is our estimate of the true distribution of Y.
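
To make the recipe concrete, here is a minimal sketch of that loop in Python. Everything in it is my own illustration rather than the actual comparison code: the function names, the normal likelihood in the toy example at the bottom, and all parameter values are assumptions. It works in log-likelihoods to avoid numerical underflow, but the acceptance rule is exactly steps 3 through 5 above.

    import numpy as np

    def metropolis_hastings(log_likelihood, x, y0, proposal_sd, n_iter=10000, rng=None):
        # log_likelihood(x, y) returns log P(X | Y=y); logs avoid underflow
        rng = np.random.default_rng() if rng is None else rng
        samples = np.empty(n_iter)
        y, ll = y0, log_likelihood(x, y0)
        for i in range(n_iter):
            # step 2: propose Y' = Y + N(0, s)
            y_prop = y + rng.normal(0.0, proposal_sd)
            ll_prop = log_likelihood(x, y_prop)
            # steps 3-5: accept Y' with probability min(1, L), L = P(X|Y')/P(X|Y0)
            if np.log(rng.uniform()) < ll_prop - ll:
                y, ll = y_prop, ll_prop
            samples[i] = y
        return samples

    # toy usage: posterior for the mean of normal data with known sd = 2 and a flat prior
    x = np.random.default_rng(0).normal(5.0, 2.0, size=200)
    loglik = lambda data, y: -0.5 * np.sum((data - y) ** 2) / 2.0 ** 2
    draws = metropolis_hastings(loglik, x, y0=0.0, proposal_sd=0.5, n_iter=20000)
    print(draws[5000:].mean(), draws[5000:].std())  # drop burn-in, then summarize

The post-burn-in draws are the empirical distribution that stands in for the true distribution of Y.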

The Bootstrap Metropolis-Hastings algorithm works pretty much the same way except that, rather than computing the likelihood of the entire data set, we first select a subset and run the chain on that. We repeat this for multiple subsets and then average the results.
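
A naive sketch of that subset-and-average idea, reusing the metropolis_hastings function above, might look like the following. The subset sizes, the simple pooling of draws, and all names here are my assumptions; the actual BMH procedure is more careful about how the subset results are combined than this.

    def bootstrap_mh(log_likelihood, x, y0, proposal_sd, n_subsets=10,
                     subset_frac=0.1, n_iter=5000, burn_in=1000, rng=None):
        # run plain MH on several random subsets of the data, then combine
        rng = np.random.default_rng() if rng is None else rng
        m = max(1, int(subset_frac * len(x)))
        chains = []
        for _ in range(n_subsets):
            subset = rng.choice(x, size=m, replace=False)
            draws = metropolis_hastings(log_likelihood, subset, y0, proposal_sd,
                                        n_iter=n_iter, rng=rng)
            chains.append(draws[burn_in:])
        # "average the results": here I simply pool the post-burn-in draws; a real
        # implementation may instead rescale the subset likelihood (e.g., by n/m)
        return np.concatenate(chains)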

If this sounds like a lot of work for something as simple as a sample sum, you're right. We're not expecting the result to be any better than the BLB algorithm, which requires significantly less computation. I'll hopefully have confirmation of that in a couple days.

However, because the BMH algorithm returns an arbitrarily complex distribution and not just a point estimate and variance, it opens the possibility of leveraging the hierarchical model that we're using in CISS or even expanding that to something more complex. Once I'm done implementing the naive method, I'm going to try it with the hierarchical model and see if that keeps the variance from collapsing the way it did in the BLB tests on filtered queries.
