never2old4school: For lack of a better term

Wednesday, March 16, 2016

For lack of a better term

Or post, for that matter. Realized that I didn't post anything yesterday. There's no law that says I should, but I like to get one out every day. So, I'm double posting today. And, since I haven't been able to come up with anything better, I am now naming my algorithm "Confidence-based Incremental Stratified Sampler" (CISS).

I may change that at some subsequent date, but it is a somewhat catchy acronym and a faithful description of what it actually does, so it's probably a keeper. Now I just need to get it working and then get to writing. Time's running out on this semester.

Things that still need to be done:

Compute the prior. For now, I'm just going to start with the overall distribution of the data as the prior. I'll need that as a fallback anyway, since not all data sets will have easy-to-compute priors for subsets of the data. The prior will actually sit on each stratum; that is, I'll have a parameter S_i which is the absolute sum of the values for stratum i. So, for the first step, this is just the row count for the stratum times 2ⁱ (the average magnitude of values in the stratum).
Compute the posterior. Again, we'll start with the simple one. We'll just weight the prior against the actual data we've collected so far. An important caveat is that when we have sampled the entire stratum, we crank the confidence interval down to a point.
Compute the next stratum to sample. This is easy to do; simply assign a probability to each of the strata. The only question is whether it will converge faster if we assign the probability proportional to the variability (width of the confidence interval) or to the mass of the distribution. I'm hoping for the first because that presents fewer edge cases (a fully sampled stratum will get zero probability of further sampling). But, I suppose I really should check both techniques.
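To make the three steps above concrete, here is a rough Python sketch. It is not the actual CISS code; the names (Stratum, record, choose_next) are made up for illustration, and two pieces are outright assumptions: the posterior is a simple blend of prior and data weighted by the fraction of rows sampled, and the interval width is a placeholder that shrinks linearly to zero as the stratum gets sampled. The real versions of both may well look different.

```python
import random
from dataclasses import dataclass


@dataclass
class Stratum:
    """Hypothetical container for one stratum; field names are illustrative."""
    index: int                    # i, so values have magnitude around 2**i
    row_count: int                # total rows in the stratum
    sampled_rows: int = 0         # rows read so far
    sampled_abs_sum: float = 0.0  # absolute sum of the values read so far

    def prior_sum(self) -> float:
        # Step 1: prior for S_i is row count times 2**i.
        return self.row_count * 2.0 ** self.index

    def fraction_sampled(self) -> float:
        return self.sampled_rows / self.row_count

    def posterior_sum(self) -> float:
        # Step 2 (assumed weighting): blend the prior with the estimate
        # extrapolated from the sample, weighted by the fraction sampled.
        if self.sampled_rows == 0:
            return self.prior_sum()
        frac = self.fraction_sampled()
        from_data = self.sampled_abs_sum / self.sampled_rows * self.row_count
        return frac * from_data + (1.0 - frac) * self.prior_sum()

    def interval_width(self) -> float:
        # Placeholder width (an assumption, not a real confidence interval):
        # it collapses to a point once the whole stratum has been sampled.
        return (1.0 - self.fraction_sampled()) * self.prior_sum()

    def record(self, rows: int, abs_sum: float) -> None:
        # Fold a newly sampled batch into the running totals.
        self.sampled_rows = min(self.row_count, self.sampled_rows + rows)
        self.sampled_abs_sum += abs_sum


def choose_next(strata, by="width", rng=random):
    """Step 3: pick a stratum with probability proportional to its weight.

    by="width" weights by interval width, by="mass" by posterior sum.
    Returns None when nothing is left to sample.
    """
    weights = []
    for s in strata:
        if s.sampled_rows >= s.row_count:
            # Under "width" this is already zero; under "mass" it is the
            # edge case that has to be handled explicitly.
            weights.append(0.0)
        elif by == "width":
            weights.append(s.interval_width())
        else:
            weights.append(s.posterior_sum())
    if sum(weights) <= 0.0:
        return None
    return rng.choices(strata, weights=weights, k=1)[0]
```

With width weighting, a fully sampled stratum drops out on its own because its width is zero. With mass weighting the explicit check inside choose_next is what keeps it from being picked again, which is the extra edge case mentioned above.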

None of this is particularly hard, so I'm still hoping to get it ready to present to my adviser tomorrow afternoon. Might be a late night, though.
