This post is mainly for my adviser, as I told him I'd have the implementation details section done tonight and may not get to prettying up this graph. Under the heading of "How can we tell if this is just plain wrong?", my thought was that a simple heuristic might be to require not a minimum sample size but a minimum non-zero sample size. That is, where we get in trouble is when we don't have enough non-zero blocks. So, I modified the simple block variance sampler to keep going until it had the specified number of non-zero blocks rather than stopping at a fixed block count. The results are below:
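For concreteness, here's a minimal sketch of the modified stopping rule, not the actual sampler; `blocks` is assumed to be a list of blocks of values, and the estimate and interval are the plain normal-approximation ones built from the block-to-block variance:

```python
import math
import random

def sample_until_nonzero(blocks, target_nonzero, z=1.96):
    """Sample blocks in random order until `target_nonzero` of them have a
    non-zero sum, then return an estimate of the grand total plus a normal
    confidence interval built from the block-to-block variance."""
    assert target_nonzero >= 2, "need at least two blocks to estimate a variance"
    order = list(range(len(blocks)))
    random.shuffle(order)

    sums = []            # block sums observed so far (zero and non-zero alike)
    nonzero_seen = 0
    for i in order:
        s = sum(blocks[i])          # stand-in for reading one block
        sums.append(s)
        if s != 0:
            nonzero_seen += 1
        if nonzero_seen >= target_nonzero:
            break

    n, N = len(sums), len(blocks)
    mean = sum(sums) / n
    var = sum((s - mean) ** 2 for s in sums) / (n - 1)
    total_hat = N * mean
    half_width = z * N * math.sqrt(var / n)   # ignores the finite-population correction
    return total_hat, (total_hat - half_width, total_hat + half_width)
```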
As you can see, at really small sample sizes this technique helps a lot (the horizontal scale is logarithmic; the points are 5, 10, 20, 50, 100, 250). That is, waiting for 5 non-zero blocks yields much better confidence intervals than simply stopping at 5 blocks no matter what. However, by 10 that advantage has gone away. Somewhere between 5 and 10 non-zero blocks is enough for the method to work.
So, this is a really simple heuristic: run a few queries using whichever method you want and plot the performance when the stopping rule is some number of total blocks versus the same number of non-zero blocks. When the two converge, that's your minimum non-zero count. Don't even test the width of the confidence interval if you don't have that many non-zero blocks; it can't be trusted.
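A sketch of that check, reusing the hypothetical sampler above plus a fixed-count variant: run both stopping rules many times against data where the true total is known and see where the coverage curves meet.

```python
import math
import random

def sample_fixed_count(blocks, n_blocks, z=1.96):
    """Same estimator as above, but stop after `n_blocks` blocks no matter what."""
    picked = random.sample(range(len(blocks)), n_blocks)
    sums = [sum(blocks[i]) for i in picked]
    n, N = len(sums), len(blocks)
    mean = sum(sums) / n
    var = sum((s - mean) ** 2 for s in sums) / (n - 1)
    total_hat = N * mean
    half = z * N * math.sqrt(var / n)
    return total_hat, (total_hat - half, total_hat + half)

def coverage(sampler, blocks, true_total, stop_at, reps=1000):
    """Fraction of replications whose interval covers the known true total."""
    hits = 0
    for _ in range(reps):
        _, (lo, hi) = sampler(blocks, stop_at)
        if lo <= true_total <= hi:
            hits += 1
    return hits / reps

# `blocks` comes from whatever test query you run; where the two
# coverage curves converge is the minimum non-zero block count.
true_total = sum(sum(b) for b in blocks)
for stop_at in (5, 10, 20, 50, 100, 250):
    print(stop_at,
          coverage(sample_fixed_count, blocks, true_total, stop_at),
          coverage(sample_until_nonzero, blocks, true_total, stop_at))
```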
Wednesday, June 13, 2018
Something to show for it
Going dark for a week was intentional. It's the only way I could make any progress in light of current work demands. The good news is that I think I have made progress. Not on the Metropolis-Hastings stuff; I think that line of inquiry is dead for the moment. But I found a better way to use the bootstrap that does almost, though not quite, as well as the full kernel.
That's a big win on several fronts in my eyes. First, it provides a nice segue from the bootstrap to the kernel. Second, the kernel, which is the most Mathy solution, is still the winner on accuracy. Finally, the full bootstrap, which is the more Data Sciency solution, is the winner once you consider performance (it's about 50% faster than the full kernel). That sets up the next phase of research really well, since D-BESt isn't very Mathy at all but will be a great addition to the bootstrap algorithm.
FWIW, here's the rejection graph (as always, 50 is the target).
Saturday, June 2, 2018
One percent
Trying to finish this paper as we get into the busiest part of the year at work has pushed blogging a ways down the list of things I get done in a day. Most of what I've been doing on the paper has been the usual editing stuff that isn't that interesting to write about.
In less mundane news: I hit a few snags with the Metropolis-Hastings stuff and may just jettison that whole line of work if I can't get it fixed. It's really just a transitional step to the full kernel sampler, anyway. But, I haven't given up quite yet.
Looking ahead, I have formed a goal in my head as to what would constitute a useful result (from an applied standpoint, as opposed to just something that may or may not be mathematically interesting). I already know that D-BESt is pretty good at cutting 90% of the data out of a query. I also know that CISS did a pretty good job of returning results having read just 10% of the data. So, now that we have the theoretical framework to justify those results, it seems reasonable to expect that we could produce good results reading just 1% of the data. Two orders of magnitude is no small thing in the real world. It would make our current $1.2-million cluster operate like one that cost $120 million. That's a number that shows up on the bottom line even for a Fortune-500 company like mine.
Granted, we already have other tuning methods that give us one order of magnitude, so it's really more like a $10 million difference. Still, I don't know any VPs who wouldn't take that if you offered it to them. (Though my VP still grabs his chest and hyperventilates every time I suggest we actually write a production version of this stuff - I guess he's seen big ideas go down in flames before.)