Monday, April 23, 2018

Still stuff to do

Definitely getting warm on this last revision (I'll find out if my adviser agrees tomorrow). The flow of the paper makes much more sense now, as does the inclusion of the BLB and BMH algorithms. There are still a few things left to do:

  • Display consistent results for each of the methods. Right now, the graphs are a little scattered. They all tell good stories, but I need to either come up with a consistent graph that allows comparisons across methods, or come up with some other type of table that puts them on equal footing.
  • There are a few sections of text that my adviser thinks are less than clear, and I haven't been able to come up with significant improvements. We may just have to work through them together word by word. Fortunately, it's a short list.
  • The conclusion is, well, missing. I need to write up a conclusion of some sort.
  • Run all the samplers on the empirical data set. This is the biggest lift, but at least I have that data set on the HDFS cluster at work now. It's 17 billion fact rows and it's no problem to construct queries against it that take several minutes to complete (they used to take hours before we rehosted to the cluster this year). So, it's a big enough data set to make the case that sampling makes sense. The problem is that I haven't properly parallelized the samplers, so they will probably take just as long as the full query engine (which we have spent the last year making very fast). I guess I don't need to worry about that since we're just demonstrating that things work and not making a bunch of performance claims.
That's probably a solid week's worth of work and I'm sure there's another round of edits, but this is looking like something that really will happen quite soon.
