I had a brief email conversation with my thesis advisor from Cornell, Bruce Turnbull, about the topic of my research. It basically centers on the idea of an optimal stopping rule for database queries. In a nutshell, the question is: at what point are you beating your head against a wall by bringing back more rows?
At work, I deal with databases that are measured in terabytes. Lots of terabytes. Single queries often bring back several hundred billion rows. Really? Do you really need that many? Of course not. But how many do you need? Are you sure? With what level of confidence? And, most importantly, what is the best way to select just the rows you do need? That's the problem I'm interested in working on.
To do it, I'm going to need to get myself onto the cutting edge of large database design (that's the easy part, as what we're doing at work is pretty close already) and also take a fresh look at predictive statistics. So, my lesson plan between now and classes starting in the fall is to dust off all my statistics work from 25 years ago.