Sunday, April 10, 2016

Next big thing

Tons to do before the current paper is ready for publication, but it's not too early to think about what's next. My main priority this summer will be to make sure I'm ready for the Q. I'm not worried about Analysis; that's always been a really strong subject for me. I've already gone through Linear Algebra, so that's "just" practice (I use quotes because I think it will be quite a bit of practice to get that really sharp). That leaves the two stats courses. I don't know the exact content of those courses, but it appears to be fairly standard upper-undergraduate-level stuff. As with Algebra, it's just a matter of doing enough problems to bring it all back to the surface.

I've got about 20 weeks to work with and I'd think 100 hours of working problems should be enough, so that's 5 hours a week. That leaves at least 10 hours a week for getting real research done. If I can tie it in with our re-platforming efforts at work, I could probably double that to 20. Subtract out this coming month which needs to be dedicated to finishing off this semester (and my CISS paper) and I've got somewhere between 200 and 300 hours. A lot can be done in that amount of time.

There are two possible paths with respect to dissertation work. One would be to continue on the implementation road by adding dynamic partitioning to the stratification. The other would be to dig into understanding the mathematical properties of heavy-tailed distributions and the implications for processing financial data. The latter is a fairly active line of research right now, which means there's probably a lot less low-hanging fruit for publication. However, it also means that cracking a tough nut could lead to a dissertation with some real significance.

My gut is telling me to take the first route since that will be much easier to fit in with activities at work. We're currently spinning up scalable environments for our larger analytics databases, and dynamic partitioning would play right into that.

To that end, I spoke with my Data Mining prof about the presentation that's due this Tuesday. The assignment is to present the BIRCH algorithm. Well, with all six grad students presenting the same paper, the undergrads are going to be pretty bored by the end of class (this is a mixed grad/undergrad course). However, my thoughts on dynamic partitioning are pretty close to how BIRCH works. Optimal partitioning is really just a clustering problem where the distance metric is whether or not two rows show up in the same query. Furthermore, I was already going down the B+ tree path for managing the splits, which is how BIRCH does it as well. So, I convinced him that I should present my forward-looking view, that is, what ideas I'm going to use from BIRCH rather than the algorithm itself. It was actually a rather easy sell; I think he was somewhat dreading hearing six more-or-less identical presentations on a topic he already understands.
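To make the "partitioning as clustering" idea a bit more concrete, here is a minimal sketch, not the actual design. It assumes the query log is available as a list of sets of row ids (a hypothetical format), uses Jaccard distance over the queries touching each row as a stand-in for "show up in the same query", and does a single BIRCH-flavored pass where each row joins the nearest partition within a threshold or starts a new one. The names `queries_by_row`, `co_access_distance`, and `assign_partitions` are made up for illustration, and there is no tree structure here at all.

```python
from collections import defaultdict


def queries_by_row(query_log):
    """Map each row id to the set of query indices that touched it."""
    touched = defaultdict(set)
    for qid, rows in enumerate(query_log):
        for row in rows:
            touched[row].add(qid)
    return dict(touched)


def co_access_distance(a, b):
    """Jaccard distance between two sets of query ids.

    0.0 means the two sides are always queried together,
    1.0 means they never share a query.
    """
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def assign_partitions(query_log, threshold=0.6):
    """Single incremental pass: put each row in the closest partition
    within `threshold`, otherwise open a new partition.

    Each partition is summarized by the union of its members' query
    sets (an assumed summary, chosen only to keep the sketch short).
    """
    touched = queries_by_row(query_log)
    partitions = []  # each entry: {"rows": [...], "queries": set()}
    for row in sorted(touched):
        best, best_d = None, threshold
        for part in partitions:
            d = co_access_distance(touched[row], part["queries"])
            if d <= best_d:
                best, best_d = part, d
        if best is None:
            partitions.append({"rows": [row], "queries": set(touched[row])})
        else:
            best["rows"].append(row)
            best["queries"] |= touched[row]
    return [p["rows"] for p in partitions]


if __name__ == "__main__":
    # Hypothetical log: each set is the rows one query read.
    log = [{1, 2, 3}, {1, 2}, {4, 5}, {4, 5, 6}, {1, 2, 3}]
    print(assign_partitions(log))
```

Like any single-pass scheme, the result depends on the order rows arrive in and on the threshold, which is part of why a tree that can split and rebalance is attractive.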

Now there's just the trivial matter of getting all my thoughts together and trimming it to a 10-minute presentation in two days.

