Thursday, September 14, 2017

Signs of life

Well, we're a month into the semester (not that semesters matter much at this point). I have actually been putting in a fair bit of work on my survey, which will be the basis for my Admission to Candidacy (AKA the A exam). I haven't been posting about it or logging time because, well, I've been busy.

Anyway, I think my adviser may have tipped me off to a much better way to frame the solution we're after. When I was asking for guidance about reference materials on sub-sampling, he brought up the two most common techniques, the Jackknife and the Bootstrap (from the names they use, you'd think these early statisticians were all frontiersmen, not dons at English and East Coast universities). I dutifully wrote a section on them, despite thinking it was sort of the opposite of what we were after, when it suddenly dawned on me: it's the EXACT OPPOSITE of what we are after. That opens up a whole new line of inquiry: how to undo a process.

To understand why it's the opposite, look at the premise of both methods: data is expensive. You have to get subjects, take measurements, hope nothing gets screwed up that invalidates your measurements, remove observation bias, etc. The whole idea of classical estimator theory is getting as much information out of a small data set as possible.
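For concreteness, here is a minimal sketch of both methods in Python, estimating the standard error of a mean from a handful of observations. The function names and the eight sample values are made up for illustration; this is not code from my survey, just the textbook recipes under the assumption that the statistic of interest is the mean.

```python
import random
import statistics


def bootstrap_se(sample, stat=statistics.mean, n_resamples=2000, seed=0):
    """Bootstrap: resample WITH replacement, same size as the original sample."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = [stat(rng.choices(sample, k=n)) for _ in range(n_resamples)]
    # The spread of the resampled statistics approximates the standard error.
    return statistics.stdev(estimates)


def jackknife_se(sample, stat=statistics.mean):
    """Jackknife: recompute the statistic leaving out one observation at a time."""
    n = len(sample)
    leave_one_out = [stat(sample[:i] + sample[i + 1:]) for i in range(n)]
    center = statistics.mean(leave_one_out)
    # Standard jackknife variance formula: (n - 1) / n * sum of squared deviations.
    return ((n - 1) / n * sum((e - center) ** 2 for e in leave_one_out)) ** 0.5


# Hypothetical "expensive" data set: only eight measurements.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
print("bootstrap SE:", bootstrap_se(sample))
print("jackknife SE:", jackknife_se(sample))
```

Both routines squeeze an uncertainty estimate out of eight numbers by reusing them over and over, which is exactly the small-data mindset described above.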

In our world, the opposite is true. Data is essentially free. We're overrun by data. We have way too much of it. We're trying to subset it in a way that we can actually do something useful with it without compromising the population we're subsetting.

None of this actually solves the problem, but sometimes framing the question is half the battle. (Probably more like 10% of the battle here, but that's still not a small thing.)
