Wednesday, January 10, 2018

Competing goals

I'm pretty much stalled on my research again. Some of this has been due to a year-end deadline at work, but mostly it's that I've been having a really hard time modifying the thrust of the paper. Simply put, my adviser and I have competing goals. He wants something with academic merit; I want something that works. The intersection of those two is turning out to be a lot smaller than I had realized.

The truth is that, once you've stratified the data, you're done. The sampler works, period. Sure, we can refine the variance estimate a bit, but that's really splitting hairs. Nobody would use this or any other sampler for financial planning and reporting; there are no generally accepted regulatory practices that allow that. The use of this algorithm is in exploratory analysis where "good enough" is just that.

Simply stratifying the data and then performing a method of moments estimation of the variance is not the stuff of academic journals. It's more like something you'd publish in an industry white paper. Or, perhaps, as a chapter in a dissertation. So, while CISS is a huge win on the practical front, it appears it's just not worthy of a stand-alone publication in a peer-reviewed journal. Trying to make it such is just complicating the algorithm which, in turn, makes it less useful.
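To make "stratify, then estimate the variance from sample moments" concrete, here is a minimal sketch of the textbook stratified estimator of a total. It is not the CISS code, and the function name, the fixed per-stratum sample size, and the list-of-lists input are all assumptions made for illustration.

```python
import random
import statistics


def stratified_total_estimate(strata, n_per_stratum, seed=0):
    """Estimate the sum of all values from a simple random sample in each stratum.

    strata: list of lists of floats, one inner list per stratum (hypothetical layout).
    Returns (estimated total, estimated variance of that estimate).
    """
    if n_per_stratum < 2:
        raise ValueError("need at least 2 draws per stratum to estimate a variance")
    rng = random.Random(seed)
    total_hat = 0.0
    var_hat = 0.0
    for values in strata:
        N = len(values)
        n = min(n_per_stratum, N)
        sample = rng.sample(values, n)  # sampling without replacement
        total_hat += N * statistics.fmean(sample)
        if n < N:
            # Moment-based estimate: sample variance scaled up to the stratum
            # total, with the finite population correction (1 - n/N).
            s2 = statistics.variance(sample)
            var_hat += N * N * (1 - n / N) * s2 / n
        # If n == N the stratum is fully observed and contributes no variance.
    return total_hat, var_hat
```

Once the strata are reasonably homogeneous, the per-stratum sample variances are small and there isn't much left to refine, which is the point made above.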

All that said, we have turned up some interesting stuff along this road. The original line of inquiry was how to deal with the correlation of the individual observations. Stratifying and blocking the data obviated that discussion for the sampler, but that doesn't mean the discussion is pointless. The BMH and BLB algorithms both have real challenges with correlated data, even when it's not heavy-tailed. Shifting the focus back to that problem and showing how we can sample by block rather than by row, even in correlated data, has both academic and practical merit.
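A rough illustration of why the unit of sampling matters when rows are correlated: the sketch below draws whole blocks, then computes the variance of the estimated total two ways, once from block sums and once by (wrongly) treating the sampled rows as independent draws. This is the generic cluster-sampling calculation, not the method from the paper; the function name and the data layout are assumptions.

```python
import random
import statistics


def compare_variance_estimates(blocks, m, seed=0):
    """Sample m whole blocks and estimate the total of all rows.

    blocks: list of lists of floats, one inner list per block (hypothetical layout).
    Returns (estimated total, block-level variance, naive row-level variance).
    """
    M = len(blocks)
    if m < 2 or m >= M:
        raise ValueError("need 2 <= m < number of blocks")
    rng = random.Random(seed)
    N = sum(len(b) for b in blocks)
    chosen = [blocks[i] for i in rng.sample(range(M), m)]

    # Block-level estimate: each block sum is one observation, so any
    # correlation among rows inside a block is already absorbed into it.
    block_sums = [sum(b) for b in chosen]
    total_hat = M * statistics.fmean(block_sums)
    block_var = M * M * (1 - m / M) * statistics.variance(block_sums) / m

    # Naive estimate: pretend the sampled rows were drawn independently.
    rows = [x for b in chosen for x in b]
    n = len(rows)
    naive_var = N * N * (1 - n / N) * statistics.variance(rows) / n

    return total_hat, block_var, naive_var
```

When rows in the same block move together, the naive figure comes out smaller than the block-level one, which is the sense in which row-level methods struggle with correlated data.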

So, I think I'm going to punt on the practical for now and go where my adviser has been trying to push me for the last few months. We've got some interesting leads in the Markov direction, and most of the theory section from the CISS paper is still applicable. A pivot will require a rewrite of a few sections, but it doesn't look like more than a few days of work to me. Meanwhile, the code is written in such a way that you can set strata = 1 and it all runs fine, so I don't have to redo the implementation right now (at some point, I am going to re-host it as a fully parallel algorithm on our HDFS cluster, but I think that can wait for the moment).
