Monday, May 16, 2016

"nu" prior

Sorry, that's not even a particularly good pun, but I couldn't resist. I'm going to use nu (ν) rather than "U" as the upper bound of the distribution of block sums. It just seems more consistent to use a Greek letter for a distribution parameter.

As mentioned on Friday, the standard non-informative priors don't work well in this case. While we don't actually have any information, we want to assert the idea that ν is to be treated as being high until the data proves otherwise. The simplest prior that accomplishes that is $p(\nu) = \nu$. That's an improper prior, but we'll get to normalizing constants in a moment; it doesn't matter for now.

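Given a series of block sums $X = (x_1, \ldots, x_m)$, $P(X|\nu) = \prod_i P(x_i|\nu) = 1/\nu^m$ for $\max\{x_i\} \le \nu$. Thus, the un-normalized posterior is $g(\nu|X) = \nu \cdot \nu^{-m} = \nu^{1-m}$ for $\nu \in (\max\{x_i\}, nbk)$. Integrating this to get the normalizing constant gives (for $m \ne 2$):

$$\int_{\max\{x_i\}}^{nbk} \nu^{1-m}\,d\nu = \frac{(nbk)^{2-m} - \max\{x_i\}^{2-m}}{2-m}$$

so

$$p(\nu|X) = \frac{(2-m)\,\nu^{1-m}}{(nbk)^{2-m} - \max\{x_i\}^{2-m}}, \qquad \nu \in (\max\{x_i\}, nbk)$$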
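As a sanity check on the normalization, here's a minimal Python sketch of the posterior density under the p(ν) = ν prior. The function names, the midpoint integrator, and the example numbers (a largest observed block sum of 40 and a maximum possible block sum of 100) are all made up for illustration. The m = 2 branch uses the log form of the integral, since the general power-rule formula divides by zero there.

```python
import math


def posterior_pdf(nu, x_max, upper, m):
    """Posterior density of nu given m block sums whose largest is x_max.

    Assumes the prior p(nu) = nu and a uniform likelihood, so the
    un-normalized posterior is nu**(1 - m) on (x_max, upper).
    """
    if not (x_max < nu < upper):
        return 0.0
    if m == 2:
        # The integral of 1/nu is a log, not a power.
        norm = math.log(upper / x_max)
    else:
        norm = (upper ** (2 - m) - x_max ** (2 - m)) / (2 - m)
    return nu ** (1 - m) / norm


def integrate(f, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of f over (lo, hi)."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    x_max, upper = 40.0, 100.0  # illustrative values only
    for m in (1, 2, 5):
        area = integrate(lambda nu: posterior_pdf(nu, x_max, upper, m), x_max, upper)
        print(m, round(area, 6))
```

If the normalizing constant is right, each printed area should come out very close to 1, and with m = 1 the density is the same constant everywhere between the two bounds.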


That looks messier than it is. Plug in m = 1 and you see that the linearly increasing prior turns into a flat posterior running from the block sum to the maximum possible block sum. That seems reasonable. If we've only sampled one block, all we really know is that the distribution of block sums goes at least as high as what we just saw. Of course, if we sample lots of block sums and the distribution really is uniform, then the posterior on ν should converge to the maximum observed value as the number of observations gets large. Let's check on that using the posterior mean (for m > 3):

$$E[\nu|X] = \int_{\max\{x_i\}}^{nbk} \nu\, p(\nu|X)\,d\nu = \frac{2-m}{3-m}\left(\frac{nbk}{1 - \left(\frac{\max\{x_i\}}{nbk}\right)^{2-m}} - \frac{\max\{x_i\}}{\left(\frac{nbk}{\max\{x_i\}}\right)^{2-m} - 1}\right)$$

The ratio on the left clearly goes to 1 as m grows. The first term in the parens goes to zero because nbk > max{xi}, so the negative exponent sends the denominator to -infinity. The denominator of the rightmost term goes to -1, so, with the minus sign in front of it, the entire thing converges to max{xi}. Yay for that.
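To see the convergence numerically, here's a rough Python sketch of the posterior mean under the p(ν) = ν prior. The function name and the example bounds (40 and 100) are invented for illustration. The code uses r = (max/nbk)^(m-2), an algebraically equivalent rearrangement that stays finite for large m, and it handles m = 2 and m = 3 separately because the power-rule integrals turn into logs there.

```python
import math


def posterior_mean(x_max, upper, m):
    """Posterior mean of nu after m block sums whose largest is x_max.

    Assumes the prior p(nu) = nu, so the posterior is proportional to
    nu**(1 - m) on (x_max, upper).
    """
    a, b = x_max, upper
    if m == 2:
        # Integral of 1 over integral of 1/nu.
        return (b - a) / math.log(b / a)
    if m == 3:
        # Integral of 1/nu over integral of 1/nu**2.
        return math.log(b / a) / (1 / a - 1 / b)
    # r shrinks toward 0 as m grows, which avoids overflow.
    r = (a / b) ** (m - 2)
    return (2 - m) / (3 - m) * (a - b * r) / (1 - r)


if __name__ == "__main__":
    x_max, upper = 40.0, 100.0  # illustrative values only
    for m in (1, 2, 5, 20, 200):
        print(m, round(posterior_mean(x_max, upper, m), 3))
```

With one observation the mean sits at the midpoint of the two bounds, and it slides down toward the largest observed block sum as m grows.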

Here's the rub: suppose the first couple of observations are particularly low. That's not unusual; with at least 16 strata, we'd expect at least one to have the first two in the bottom quartile. With two observations, the posterior is already biased toward max{xi}, but that's going to chop off a lot of our distribution (and variance) and cheat this stratum. So, we need slower convergence.
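The arithmetic behind the 16-strata remark can be sketched in a few lines of Python. This assumes the strata are independent and that each observation has a one-in-four chance of landing in the bottom quartile; the function name and its parameters are hypothetical.

```python
def low_start_odds(strata=16, quantile=0.25, draws=2):
    """Return (expected count, probability of at least one) for strata
    whose first `draws` observations all land below `quantile`.

    Assumes independent strata and independent observations.
    """
    p_low_start = quantile ** draws
    expected_count = strata * p_low_start
    p_at_least_one = 1 - (1 - p_low_start) ** strata
    return expected_count, p_at_least_one


if __name__ == "__main__":
    expected_count, p_at_least_one = low_start_odds()
    print(expected_count, round(p_at_least_one, 3))
```

Under those assumptions the expected number of such strata is exactly one, and the chance of seeing at least one is a bit under two in three, so this is a case worth planning for rather than a corner case.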

Suppose we were to change our prior to $p(\nu) = \nu^c$ where c is some real number > 1. $g(\nu|X)$ becomes $\nu^{c-m}$ and the c just propagates through everything (simply replace 1-m with c-m). Now, you can dial c up as high as needed to keep the posterior from collapsing too quickly, but still get the same asymptotic convergence of the posterior mean to max{xi}. As c really is arbitrary, I'll need to run a bunch of tests and derive a heuristic for picking a good value, but I'm pretty optimistic that this is going to work.
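Here's a small Python sketch of how c slows the collapse, assuming the posterior is proportional to ν^(c-m) between the largest observed block sum and the maximum possible block sum. The function name, the example bounds (40 and 100), and the c values tried are all placeholders, not a recommendation for how to pick c.

```python
import math


def posterior_mean_c(x_max, upper, m, c=1.0):
    """Posterior mean of nu under the prior p(nu) = nu**c.

    Assumes a uniform likelihood, so the posterior is proportional to
    nu**(c - m) on (x_max, upper).
    """
    a, b = x_max, upper
    e = c - m  # exponent of the un-normalized posterior
    if math.isclose(e, -1.0):
        # Integral of 1 over integral of 1/nu.
        return (b - a) / math.log(b / a)
    if math.isclose(e, -2.0):
        # Integral of 1/nu over integral of 1/nu**2.
        return math.log(b / a) / (1 / a - 1 / b)
    # Ratio of the two power-rule integrals, scaled by a**(e + 1).
    s = (b / a) ** (e + 1)
    return (e + 1) / (e + 2) * (b * s - a) / (s - 1)


if __name__ == "__main__":
    x_max, upper = 40.0, 100.0  # illustrative values only
    for c in (1, 2, 5, 10):
        row = [round(posterior_mean_c(x_max, upper, m, c), 2) for m in (2, 10, 50, 300)]
        print(c, row)
```

For a fixed number of observations, a larger c keeps the posterior mean further above the observed maximum, while for a fixed c the mean still drifts down to the observed maximum as m grows.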
