Going into last week, I noted that I was not finished with any of my major items. I'm still not, but I've certainly made a lot of progress.
I turned in a reasonable stab at HW3 for Set Theory. It will need to be resubmitted after I get the comments back, but I wasn't embarrassed by what I turned in (though I might be when I get the comments).
I've finished the query generator for Blocker and have a good direction for the evolutionary algorithm. I'm going to drop the Bayesian stuff for now and just use a heuristic. I give it 50/50 that I'll have something I can run by the time I leave for Boston.
At work we got through quite a bit of stuff prepping for production. Go-live is still tentatively scheduled for next Wednesday, though I'm not seeing how we get final signoff in time for that. That's kind of a bummer since it means I'll be in Boston when they turn it on. At any rate, now that we've got our Vertica instance running on a real cluster, we're getting billion-row queries back in 3-4 seconds, which is a big improvement over the old system. Load times are also faster by an order of magnitude. The old system loaded at just under 100 million rows per hour. The new one loads around 1.2 billion rows per hour, and most of that time goes to extracting the data from Oracle; it will be even faster once the source system is writing directly to Hadoop. Given that the requirement is to load 8 billion rows overnight, that increase is pretty crucial.
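For what it's worth, here's the back-of-the-envelope arithmetic behind that last point. It's just a sketch: the rates are the round figures quoted above, and treating them as constant over the whole load is my simplifying assumption. The names (`hours_to_load` and the constants) are made up for illustration.

```python
def hours_to_load(total_rows: float, rows_per_hour: float) -> float:
    """Hours needed to load total_rows at a constant rows_per_hour rate."""
    return total_rows / rows_per_hour


# Round figures from the post; a constant load rate is an assumption.
NIGHTLY_ROWS = 8e9   # requirement: 8 billion rows overnight
OLD_RATE = 100e6     # old system: just under 100 million rows/hour
NEW_RATE = 1.2e9     # new system: around 1.2 billion rows/hour

old_hours = hours_to_load(NIGHTLY_ROWS, OLD_RATE)
new_hours = hours_to_load(NIGHTLY_ROWS, NEW_RATE)
speedup = NEW_RATE / OLD_RATE

print(f"old system: {old_hours:.1f} hours")
print(f"new system: {new_hours:.1f} hours")
print(f"speedup:    {speedup:.0f}x")
```

At the old rate the nightly load would take more than three days, so it could never fit in one night. At the new rate it comes in under seven hours.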