Half the battle in mathematics is the invention of a good notation. - Pierre-Simon Laplace.
Hopefully, I'll do better in whatever the other half is.
I spent the bulk of last night filling the whiteboards in one of our larger conference rooms with the terms of the block variance. I was so excited that the really problematic terms factored out and could be pre-computed that I didn't take much time to note what they really were. Instead, I just dropped them into a function and noted that the value could be obtained by a simple lookup at query time. That's true, but not super helpful to the reader who's trying to understand what we're doing.
This morning, I looked again at my work and was rather dismayed that I hadn't expressed the formula in terms that everybody understands.
The original term was this:
and I just went with the rather dubious
(The "1" in the subscript indicates that we're looking at the first moment of b. There was a similar term for the second moment.) That's fine, you can define a function to mean whatever you want, but these terms aren't really that mysterious. We're basically just taking the probability of an event m and multiplying it by the expectation of the partition row count given m. These sorts of terms come up all the time in distributions, which is why we have standard notation for them:
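Going only by the description above, the relationship can be sketched as follows. The function names g_1 and g_2 are placeholders assumed for illustration, not the names used on the whiteboard, and writing the second-moment term with b^2 is likewise an assumption based on the remark that it was "similar":

```latex
% g_1, g_2: hypothetical names for the precomputed lookup functions
% m: the event; b: the partition row count
\begin{align*}
  g_1(m) &= P(m)\, E(b \mid m) \\
  g_2(m) &= P(m)\, E(b^2 \mid m)
\end{align*}
```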
Anybody with even a passing knowledge of statistics will know what the right-hand side of those equations means. The left-hand side gives no indication whatsoever as to what we're doing. Granted, we'll store the product as a single value that gets looked up but, for expository purposes, the right-hand side is a much better representation.
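To make the "store the product, look it up later" idea concrete, here is a minimal sketch. It assumes we have a sample of (m, b) pairs to estimate from; the function name `build_moment_lookup` and the input format are hypothetical, not the actual implementation. It estimates P(m), E(b | m), and E(b^2 | m) empirically and stores only the two products per event.

```python
from collections import defaultdict


def build_moment_lookup(observations):
    """Precompute P(m)*E[b|m] and P(m)*E[b^2|m] for each event m.

    observations: iterable of (m, b) pairs, where m is a hashable event
    label and b is the partition row count seen with it (assumed format).
    Returns a dict mapping m -> (first_moment_term, second_moment_term).
    """
    count = defaultdict(int)
    sum_b = defaultdict(float)
    sum_b2 = defaultdict(float)
    total = 0

    for m, b in observations:
        count[m] += 1
        sum_b[m] += b
        sum_b2[m] += b * b
        total += 1

    lookup = {}
    for m, n in count.items():
        p_m = n / total          # empirical P(m)
        e_b = sum_b[m] / n       # empirical E[b | m]
        e_b2 = sum_b2[m] / n     # empirical E[b^2 | m]
        lookup[m] = (p_m * e_b, p_m * e_b2)
    return lookup


# At query time the stored product is a single dictionary lookup.
table = build_moment_lookup([("a", 2), ("a", 4), ("b", 10), ("b", 10)])
first_term, second_term = table["a"]
```

The table stores only the products, which is all the query needs; the factored form P(m) E(b | m) is what tells the reader where those numbers come from.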