I may change that at some subsequent date, but it is a somewhat catchy acronym and a faithful description of what it actually does, so it's probably a keeper. Now I just need to get it working and then get to writing. Time's running out on this semester.
Things that still need to be done:
- Compute prior. For now, I'm just going to start with the overall distribution of the data as the prior. I'll need that as a fallback anyway, since not all data sets will have easy-to-compute priors for subsets of the data. The prior will actually sit on each stratum; that is, I'll have a parameter Si which is the absolute sum of the values for stratum i. So, for the first step, this is just the row count for the stratum times 2i (the average magnitude of values in the stratum).
- Compute the posterior. Again, we'll start with the simple one. We'll just weight the prior against the actual data we've collected so far. An important caveat is that when we have sampled the entire stratum, we crank the confidence interval down to a point.
- Compute the next stratum to sample. This is easy to do; simply assign a probability to each of the strata. The only question is whether it will converge faster if we assign the probability proportional to the variability (width of the confidence interval) or to the mass of the distribution. I'm hoping for the first because that presents fewer edge cases (a fully sampled stratum will get zero probability of further sampling). But I suppose I really should check both techniques.
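To make the three steps above concrete, here is a rough Python sketch. It is only an illustration under assumptions, not the actual implementation: the `Stratum` class, the function names, the blend weight (fraction of the stratum sampled so far), the normal-style interval with `z=1.96`, and the use of an overall standard deviation `overall_sd_abs` are all hypothetical choices made for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class Stratum:
    rows: int             # total number of rows in the stratum
    sampled: int = 0      # rows sampled so far
    abs_sum: float = 0.0  # sum of |value| over the sampled rows


def prior_total(stratum, overall_mean_abs):
    """Prior guess for S_i: row count times the overall average magnitude."""
    return stratum.rows * overall_mean_abs


def posterior_interval(stratum, overall_mean_abs, overall_sd_abs, z=1.96):
    """Return (low, high) for S_i, blending the prior with the sampled data.

    Assumption: the blend weight is the fraction of the stratum sampled,
    and only the unsampled rows contribute uncertainty.
    """
    unsampled = stratum.rows - stratum.sampled
    if unsampled <= 0:
        # Fully sampled: the total is known exactly, so the interval is a point.
        return (stratum.abs_sum, stratum.abs_sum)
    w = stratum.sampled / stratum.rows
    if stratum.sampled > 0:
        sample_mean = stratum.abs_sum / stratum.sampled
    else:
        sample_mean = overall_mean_abs  # no data yet: fall back to the prior
    blended_mean = w * sample_mean + (1.0 - w) * overall_mean_abs
    estimate = stratum.abs_sum + unsampled * blended_mean
    half_width = z * overall_sd_abs * unsampled ** 0.5
    return (estimate - half_width, estimate + half_width)


def choose_stratum(strata, overall_mean_abs, overall_sd_abs, rng=random):
    """Pick the index of the next stratum with probability proportional to
    the width of its interval; return None when every width is zero."""
    widths = []
    for s in strata:
        low, high = posterior_interval(s, overall_mean_abs, overall_sd_abs)
        widths.append(high - low)
    if sum(widths) == 0:
        return None  # everything is fully sampled
    return rng.choices(range(len(strata)), weights=widths, k=1)[0]
```

Weighting by interval width means a fully sampled stratum drops out on its own, since its width is zero. Weighting by mass instead (say, the midpoint of each interval) would need an explicit check to skip fully sampled strata.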
None of this is particularly hard, so I'm still hoping to get it ready to present to my adviser tomorrow afternoon. Might be a late night, though.