Tuesday, January 16, 2018

What do we mean by mean?

Another point raised by my adviser is that it is intuitively easier to bound a mean than a sum. Unless the mean is zero, the sum diverges as observations accumulate, which makes it rather hard to bound. The sample mean, on the other hand, settles down as long as the underlying mean is finite.
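
A toy illustration of that point, not anything from my actual data: the values below are made up (alternating 1 and 3, so the mean is 2), and `running_totals` is just a helper name I picked for the sketch. The running sum keeps growing with the number of observations while the running mean stays put.

```python
def running_totals(values):
    """Return (n, running sum, running mean) after each observation."""
    total = 0.0
    out = []
    for n, v in enumerate(values, start=1):
        total += v
        out.append((n, total, total / n))
    return out

# Made-up observations with a nonzero mean of 2.
values = [1.0, 3.0] * 500
stats = running_totals(values)

# The sum grows in proportion to n; the mean does not move.
for n in (10, 100, 1000):
    _, total, mean = stats[n - 1]
    print(n, total, mean)
```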

The problem with that is determining what we even mean by a mean. Obviously, it's the sum divided by the number of observations. But what do we consider observations?

The total size of the data set? OK, that's the obvious answer, but what that means is that if somebody loads a bunch of completely irrelevant data, the mean of my query just changed while the sum didn't. Seems bogus.
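
Here's a small sketch of why that bothers me. Everything in it is hypothetical: the rows are invented (key, value) pairs, and `query_sum` and `mean_over_dataset` are names made up for the example. Loading rows the query never touches leaves the sum alone but drags the mean down.

```python
def query_sum(rows, predicate):
    """Sum the values of the rows the query actually selects."""
    return sum(value for key, value in rows if predicate(key))

def mean_over_dataset(rows, predicate):
    """Query sum divided by the TOTAL number of rows, relevant or not."""
    return query_sum(rows, predicate) / len(rows)

def is_a(key):
    return key == "a"

# Invented data: the query only cares about rows keyed "a".
rows = [("a", 10.0), ("a", 20.0), ("b", 5.0), ("b", 7.0)]
sum_before = query_sum(rows, is_a)
mean_before = mean_over_dataset(rows, is_a)

# Somebody loads six rows that have nothing to do with the query.
padded = rows + [("z", 100.0)] * 6
sum_after = query_sum(padded, is_a)
mean_after = mean_over_dataset(padded, is_a)

print(sum_before, sum_after)    # the sum is unchanged
print(mean_before, mean_after)  # the mean is not
```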

The number of rows actually included by the query? Sure, except we don't know what that number is. It would be yet another estimated parameter which would contribute more uncertainty to the convergence.

How about the number of blocks? Then, we're getting the mean of the blocksums. This is still just a tad bogus for the same reason as the first, but it seems less so. Since I already have the whole theory section laid out around blocksums and I don't expect the Markov/martingale treatment to change that, I think that's where I'm headed.
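
A minimal sketch of the blocksum version, assuming fixed-size blocks and invented rows (the `block_sums` helper and the block size of 2 are only for illustration, not how the real data is laid out). The mean of the blocksums times the number of blocks gives back the query sum, so a bound on one translates to the other as long as the block count is known.

```python
def block_sums(rows, predicate, block_size):
    """Split rows into consecutive blocks and sum the selected values in each."""
    sums = []
    for start in range(0, len(rows), block_size):
        block = rows[start:start + block_size]
        sums.append(sum(value for key, value in block if predicate(key)))
    return sums

def is_a(key):
    return key == "a"

# Invented data, stored in three blocks of two rows each.
rows = [("a", 10.0), ("b", 5.0), ("a", 20.0), ("b", 7.0), ("a", 1.0), ("b", 2.0)]
sums = block_sums(rows, is_a, block_size=2)

mean_blocksum = sum(sums) / len(sums)
recovered_sum = mean_blocksum * len(sums)  # scale the mean back up by the block count

print(sums, mean_blocksum, recovered_sum)
```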

But, at the end of the day it's the sum I'm after.
