The first is, again, iid. There's nothing remotely interesting about sampling iid observations, but it does allow us to establish the framework and methods without all the complexities that follow.
The next is sequential correlation, typically caused by the fact that things that are loaded together tend to be related and also tend to get stuck next to each other in the storage system. This was also part of the plan.
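To make that storage picture concrete, here's a minimal sketch (in Python, with made-up batch counts and sizes, not the actual loader): rows that arrive in the same load batch share a batch-level effect and get written contiguously, so a contiguous block read is correlated while a scattered random sample behaves essentially like iid draws.

```python
import random

# A minimal sketch, not the real loader: rows loaded in the same batch share a
# batch-level effect, and batches are written one after another, so physically
# adjacent rows end up correlated. Batch counts and sizes are made up.
random.seed(1)
rows = []
for batch in range(100):                 # 100 load batches
    batch_effect = random.gauss(0, 10)   # shared by every row in the batch
    for _ in range(50):                  # 50 rows per batch, stored contiguously
        rows.append(batch_effect + random.gauss(0, 1))

# Reading a contiguous block of storage yields correlated observations,
# while a scattered random sample is effectively independent.
sequential_block = rows[200:250]         # one physical block: mostly one batch
iid_style_sample = random.sample(rows, 50)
```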
The third case flows pretty naturally from the second. Suppose things aren't strictly sequential, but there's still a lot of correlation. This happens on massively parallel systems, where writes get sent to multiple nodes according to some hash algorithm. That breaks up the batches, but the little chunks are still sequential and highly correlated. This case is actually a little cleaner mathematically, because you don't have the annoying boundary conditions for the first or last batch in a block (which typically spill into adjacent blocks).
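And a similar sketch for the hash-distributed case (again just an illustration; the node count and the hash function are placeholders, not the real system): the same batch-structured rows get routed to nodes by a key hash, so each node ends up holding short, still-correlated fragments of many batches, and no fragment spills across a node boundary.

```python
import random
import zlib

NUM_NODES = 4                                   # assumed cluster size, purely illustrative

def node_for(key: str) -> int:
    # Stand-in for whatever hash the real system uses to route writes.
    return zlib.crc32(key.encode()) % NUM_NODES

random.seed(2)
node_storage = {n: [] for n in range(NUM_NODES)}
for batch in range(100):
    batch_effect = random.gauss(0, 10)          # shared by every row in the batch
    for i in range(50):
        key = f"batch{batch}-row{i}"
        node_storage[node_for(key)].append(batch_effect + random.gauss(0, 1))

# A block read on any single node now sees many short, highly correlated chunks
# instead of whole batches, and none of those chunks crosses into another node's
# storage, so there are no batch-boundary spillover terms to deal with.
one_node_block = node_storage[0][:50]
```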
The actual methods work the same in all three cases (though you'd be a fool to use the complicated methods if you knew the data was iid). The results vary quite a bit, though. Plus, in the context of a full dissertation, the third case sets me up nicely to follow with D-BESt.
My meeting with my adviser got pushed back to tomorrow, so I should know by tomorrow's post if that's the way we're going.