Friday, October 21, 2016

Variance 3.0

Shifting gears again back to thesis work.

Did I mention that, despite the fact that it works OK, I'm not really happy with modeling the blocksums as U(0,ν) as I did last spring? It's not that the formula is ugly; it's that it's bogus. There's always a chance that a block will come back with every single row relevant. A small chance, for sure, but not zero. The uniform sets that probability to zero. So, I've been toying with modeling the blocksums as exponential. Here's what that does to the variance:

First, as with the moving upper limit uniform, the portion of the variance attributable to a block hit, q, is now a function of the parameter on the blocksum distribution. As before, that parameter is a single scalar. The exponential is usually parameterized with a "rate" parameter, denoted λ, that indicates how frequently things occur; it is the inverse of the mean. q(λ) turns out to be pretty trivial. Here, X' denotes the blocksum given that we have a hit; that is, X'~exp(λ).

$$q(\lambda) = E(|X'-\mu|^2) = E((X')^2 - 2X'\mu + \mu^2) = 2\lambda^{-2} - 2\mu/\lambda + \mu^2$$

We can pause here to note that we didn't even have to derive that much. We know the exponential distribution has a variance of $\lambda^{-2}$. As the mean minimizes the squared distance to the rest of the distribution, shifting that point will cause an increase. As this is a decomposition of squares, the Pythagorean theorem tells us that the new sum of squares must be the original sum of squares plus the squared distance that the center point was moved. Thus:

$$E(|X'-\mu|^2) = Var(X') + (E(X')-\mu)^2 = \lambda^{-2} + (1/\lambda-\mu)^2$$

Which is just the first result, somewhat rearranged. That tidbit aside, the first form is easier to integrate with respect to λ, so we'll stick with that.
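As a quick sanity check on the algebra, here's a minimal Python sketch that evaluates both forms of q(λ) and compares them against a brute-force simulation of exponential blocksums. The function names and the test values for λ and μ are made up for illustration; none of this comes from the thesis code.

```python
import random


def q_expanded(lam, mu):
    """First form: 2/lam^2 - 2*mu/lam + mu^2."""
    return 2.0 / lam**2 - 2.0 * mu / lam + mu**2


def q_decomposed(lam, mu):
    """Second form: Var(X') plus the squared shift of the center point."""
    return 1.0 / lam**2 + (1.0 / lam - mu) ** 2


def q_monte_carlo(lam, mu, n=200_000, seed=0):
    """Estimate E(|X' - mu|^2) by sampling X' ~ exp(lam)."""
    rng = random.Random(seed)
    total = sum((rng.expovariate(lam) - mu) ** 2 for _ in range(n))
    return total / n


if __name__ == "__main__":
    lam, mu = 0.5, 1.2  # arbitrary illustrative values
    print("expanded:   ", q_expanded(lam, mu))
    print("decomposed: ", q_decomposed(lam, mu))
    print("monte carlo:", q_monte_carlo(lam, mu))
```

With λ = 0.5 and μ = 1.2, both closed forms work out to 4.64 (up to floating-point rounding), and the simulated value should land close to that, with the gap shrinking as n grows.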

Now, to calculate the actual variance, we need a prior on λ. The gamma distribution is the natural conjugate prior, and it should work fine in this case. We've already used the usual parameter letters, so we'll switch to h and s to indicate the number of hits and the sum of those hits, which is the normal interpretation of the gamma parameters when used as an exponential prior (note that the sum gets inverted to create a rate for the Gamma). Thus:



So that just leaves us to calculate:



Oh, heck, look at the time...
