I had intended to use a non-informative prior on the upper bound of the Uniform distribution that the block sum is drawn from. A few problems immediately present themselves.
The best non-informative prior for Uniform is the Jeffreys prior (or reference prior; they're the same in the univariate case):
p(U) = 1/U, where U is the upper bound of the distribution.
The first obvious issue is that this is an improper prior. Not only does it not integrate to 1, it doesn't integrate to anything. That means that, until I have some data, I can't estimate a mean, which means I can't estimate a variance, which means I can't determine whether I should be sampling this stratum.
That's not terribly difficult to work around. Just set U = nbk until we have some data, or cap it at nbk (since it can't possibly be larger than that), which makes the integral finite as long as U is also bounded away from zero.
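To make the capped workaround concrete, here is a minimal Python sketch of the mean and variance of a block sum before any data arrives. It rests on assumptions that are mine, not part of the original setup: the block sum is Uniform(0, U), the prior 1/U is truncated to [lo, cap], `cap` plays the role of nbk, and `lo` is a hypothetical small positive lower bound (the integral of 1/U still diverges at zero without one). The function names are illustrative only.

```python
import math


def capped_prior_moments(lo, cap):
    """Mean and second moment of U under p(U) proportional to 1/U on [lo, cap]."""
    if not 0 < lo < cap:
        raise ValueError("need 0 < lo < cap")
    z = math.log(cap / lo)                  # normalizing constant of 1/U on [lo, cap]
    mean_u = (cap - lo) / z                 # integral of U * (1/U) dU, divided by z
    second_u = (cap**2 - lo**2) / (2 * z)   # integral of U^2 * (1/U) dU, divided by z
    return mean_u, second_u


def block_sum_mean_var(lo, cap):
    """Prior-predictive mean and variance of a block sum S, with S | U ~ Uniform(0, U)."""
    mean_u, second_u = capped_prior_moments(lo, cap)
    mean_s = mean_u / 2        # E[S | U] = U / 2
    second_s = second_u / 3    # E[S^2 | U] = U^2 / 3
    return mean_s, second_s - mean_s**2


if __name__ == "__main__":
    # lo and cap are made-up numbers purely for illustration.
    mean_s, var_s = block_sum_mean_var(lo=1.0, cap=1000.0)
    print(f"prior mean of block sum: {mean_s:.2f}, variance: {var_s:.2f}")
```

Both moments depend heavily on where `lo` and `cap` are placed, so this only buys a starting estimate, not a principled one.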
The bigger problem is what happens after the data arrives. Given a block sum of X, the posterior is p(U|X) = X / U^2, U > X. That's a perfectly good density function, but it has rather atrocious consequences. Namely, if that first block sum is small, it's going to drive the estimate for all remaining block sums way down and crush the estimate of the variance in the process. As such, we won't return to the stratum to sample more blocks and find that the sums are generally much higher.
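A rough way to see how hard a single small observation pulls on U is to summarize that posterior numerically. The sketch below assumes the posterior is truncated at a cap (standing in for nbk), so the density is proportional to 1/U^2 on [x, cap]; the function name and the numbers are hypothetical.

```python
import math


def posterior_summary(x, cap):
    """Median and mean of p(U | X = x) proportional to 1/U^2 on [x, cap]."""
    if not 0 < x < cap:
        raise ValueError("need 0 < x < cap")
    z = 1 / x - 1 / cap              # integral of U^-2 over [x, cap]
    # CDF(u) = (1/x - 1/u) / z; setting it to 0.5 gives the harmonic mean of x and cap.
    median = 2 / (1 / x + 1 / cap)
    # Mean = integral of U * U^-2 over [x, cap], divided by z.
    mean = math.log(cap / x) / z
    return median, mean


if __name__ == "__main__":
    # Made-up numbers: a first block sum of 10 against a cap of 1000.
    median, mean = posterior_summary(x=10.0, cap=1000.0)
    print(f"posterior median of U: {median:.2f}, posterior mean of U: {mean:.2f}")
```

Under these assumptions the median of U never exceeds 2x no matter how large the cap is, which is the "one small sum drags everything down" behavior in numbers. The mean is driven by the tail and grows with the cap.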
So, while there is no way to know what the distribution of U is when starting a query, the non-informative approach is going to kill the algorithm. Therefore, I have to inject a fake belief that the sums are higher and bake that into the prior. This is essentially setting the prior consistent with an "assume the worst" attitude.
I think that's defensible in principle, but it leaves me without any mathematical precedent on which to pick a prior. So, I guess I'll just run a bunch of empirical tests and try to find some sort of consistent shape. Or, at the very least, some starting point that results in posteriors that can represent the family of shapes observed.