Trying to finish this paper as we get into the busiest part of the year at work has pushed blogging a ways down the list of things I get done in a day. Most of what I've been doing on the paper has been the usual editing stuff that isn't that interesting to write about.
In less mundane news: I hit a few snags with the Metropolis-Hastings stuff and may just jettison that whole line of work if I can't get it fixed. It's really just a transitional step to the full kernel sampler, anyway. But I haven't given up quite yet.
Looking ahead, I've settled on what I think would constitute a useful result (from an applied standpoint, as opposed to merely something that may or may not be mathematically interesting). I already know D-Best is pretty good at cutting 90% of the data out of a query. I also know that CISS did a pretty good job of returning results after reading just 10% of the data. Stack the two and you're reading 10% of 10%, so, now that we have the theoretical framework to justify those results, it seems reasonable to expect good results from reading just 1% of the data. Two orders of magnitude is no small thing in the real world. It would make our current $1.2-million cluster perform like one that cost $120 million. That's a number that shows up on the bottom line, even for a Fortune 500 company like mine.
Granted, we already have other tuning methods that give us one order of magnitude, so it's really more like a $10 million difference. Still, I don't know any VPs who wouldn't take that if you offered it to them. (Though my VP still grabs his chest and hyperventilates every time I suggest we actually write a production version of this stuff - I guess he's seen big ideas go down in flames before.)
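If you want to check my math, here's the back-of-envelope version as a little Python sketch. The percentages and the cluster cost are the numbers from above; the assumption that the two stages' savings multiply cleanly is mine, not anything we've measured from D-Best or CISS.

```python
# Back-of-envelope arithmetic for the claim above. The 90%/10% figures and
# the $1.2M cluster cost are from the post; multiplicative composition of
# the two stages' savings is an assumption, not a measured result.

cluster_cost = 1.2e6      # current cluster cost, dollars

dbest_fraction = 0.10     # D-Best leaves ~10% of the data in a query
ciss_fraction = 0.10      # CISS answers after reading ~10% of what it's given

combined = dbest_fraction * ciss_fraction    # 0.01 -- two orders of magnitude
effective = cluster_cost / combined          # what the cluster "acts like"

print(f"data read: {combined:.0%}")                   # 1%
print(f"effective cluster: ${effective / 1e6:.0f}M")  # $120M

# Netting out the order of magnitude we already get from existing tuning,
# the incremental gain is one factor of ten: a ~$12M-equivalent cluster
# versus the $1.2M we paid -- the "$10 million difference" above.
incremental = cluster_cost / 0.10 - cluster_cost
print(f"incremental value: ~${incremental / 1e6:.1f}M")   # ~$10.8M
```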