Wednesday, October 18, 2017

Docking the tail

Why do you dock a boxer's tail? Because it destroys everything.

The same is true of heavy-tailed distributions. Here's a little graph showing how sampling the tail in full lets us take a much smaller sample from the interior of the distribution. The optimal cut point depends on the distribution: as you might expect, the heavier the tails, the more you want to cut off. The normal distribution hardly benefits at all, while the Cauchy works best with a fifth of the distribution relegated to the tail.
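A minimal sketch of the idea in Python. The assumptions are mine, not the post's. Each of the N blocks contributes one value, modeled as a quantile of the distribution so the result is deterministic. We already know which blocks hold the extreme values (say, from per-block statistics) and read those in full. The interior is a simple random sample without replacement. The goal is to reach a target standard error for the estimated mean (0.05 in units of the distribution's scale, an arbitrary choice) while reading as few blocks as possible. The function names are hypothetical. The code scans every tail size k using the textbook variance of a without-replacement estimate of a total. The cut it finds depends on the assumed target, so it won't necessarily match the graph.

```python
import math
from statistics import NormalDist


def quantile_population(inv_cdf, n_blocks):
    """Hypothetical deterministic population: one quantile value per block."""
    return [inv_cdf((i + 0.5) / n_blocks) for i in range(n_blocks)]


def cauchy_inv_cdf(p):
    # Standard Cauchy quantile function (location 0, scale 1).
    return math.tan(math.pi * (p - 0.5))


def total_blocks_needed(values, target_se_mean):
    """For each tail size k, return k + the interior sample size needed so the
    estimated mean has standard error <= target_se_mean.

    Tail: the k blocks with the largest |value|, read in full (zero variance).
    Interior: the remaining M blocks, sampled without replacement.
    """
    n = len(values)
    order = sorted(values, key=abs, reverse=True)  # tail candidates first
    v = (target_se_mean * n) ** 2  # allowed variance of the estimated *total*

    # Suffix sums of x and x^2 over order[k:], accumulated from the small end.
    suf_sum = [0.0] * (n + 1)
    suf_sq = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suf_sum[i] = suf_sum[i + 1] + order[i]
        suf_sq[i] = suf_sq[i + 1] + order[i] ** 2

    totals = []
    for k in range(n - 1):  # keep at least 2 interior blocks
        m_pop = n - k
        # Sample variance S^2 of the interior blocks.
        s2 = (suf_sq[k] - suf_sum[k] ** 2 / m_pop) / (m_pop - 1)
        # Var(total_hat) = M^2 (1 - m/M) S^2 / m <= v  =>  m >= M^2 S^2 / (v + M S^2)
        m = math.ceil(m_pop ** 2 * s2 / (v + m_pop * s2)) if s2 > 0 else 1
        totals.append(k + min(m, m_pop))
    return totals


def best_cut(values, target_se_mean):
    """Return (tail fraction, blocks read at the best cut, blocks read with no tail)."""
    totals = total_blocks_needed(values, target_se_mean)
    k = min(range(len(totals)), key=totals.__getitem__)
    return k / len(values), totals[k], totals[0]


if __name__ == "__main__":
    N_BLOCKS = 10_000
    for name, inv in [("normal", NormalDist().inv_cdf), ("cauchy", cauchy_inv_cdf)]:
        frac, best, no_tail = best_cut(quantile_population(inv, N_BLOCKS), 0.05)
        print(f"{name}: tail {frac:.1%}, {best} blocks read (vs {no_tail} with no tail)")
```

Sorting by absolute value cuts both tails of a symmetric distribution such as the Cauchy. For a nonnegative distribution it cuts only the upper tail.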



The optimal cut point also depends on the total number of blocks. Here the total is 10,000, and we're assuming a "full" query; that is, every row is relevant. Most queries filter rows, which raises the variance of the interior.
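One way to see why filtering hurts uses a simplifying model that is an assumption, not the post's. Suppose a filter leaves a block's contribution x intact with probability q and zeroes it otherwise, independently of x. Let μ ≠ 0 and σ² be the mean and variance of x, and let y be the filtered contribution. Then:

```latex
\begin{aligned}
\mathbb{E}[y] &= q\mu, \qquad \mathbb{E}[y^2] = q(\sigma^2 + \mu^2),\\
\operatorname{Var}(y) &= q(\sigma^2 + \mu^2) - q^2\mu^2,\\
\frac{\operatorname{Var}(y)}{\mathbb{E}[y]^2} &= \frac{1 + \sigma^2/\mu^2}{q} - 1 \;\ge\; \frac{\sigma^2}{\mu^2} \quad (0 < q \le 1).
\end{aligned}
```

The interior sample needed for a given relative error grows roughly with this squared coefficient of variation. A more selective filter (smaller q) therefore calls for more interior blocks.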
