Wednesday, September 20, 2017

CISS vs. U-statistic

Leaving aside the problem of second moments, there are some other important differences between the CISS estimator and a true U-statistic.

Just to be clear, I'll digress to define terms. Given a sample with n observations, we can find functions of the observations that deliver unbiased estimates. That is, h(X1, X2, X3, ..., Xn) such that E(h()) = theta, where theta is the parameter we're after. If we can restrict h to the first r observations while preserving the unbiased expectation, we get an estimator that's unbiased but a bit unnatural: it works, but it's not using all the information available. To use all the information, we average h over all size-r subsets of the sample. That average is called a U-statistic.
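As a concrete (if standard) example, here's a minimal Python sketch of that construction. The data, names, and kernel are just placeholders I picked for illustration; the kernel h(x, y) = (x - y)^2 / 2 is the classic choice whose U-statistic works out to the usual unbiased sample variance.

```python
from itertools import combinations
from statistics import mean

def u_statistic(sample, h, r):
    """Average the kernel h over all size-r subsets of the sample."""
    return mean(h(*subset) for subset in combinations(sample, r))

# Classic example: h(x, y) = (x - y)**2 / 2 is unbiased for the variance,
# and the resulting U-statistic is the familiar unbiased sample variance.
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(u_statistic(xs, lambda x, y: (x - y) ** 2 / 2, r=2))
```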

U-statistics have lots of nice properties.

CISS isn't a U-statistic.

First and foremost, the estimator CISS produces is biased. That bias goes away the longer you sample. In fact, the CISS estimator converges almost surely to the sample mean, which is the strongest form of stochastic convergence you can get. (It's actually even stronger than that: CISS is exactly the sample mean if you let r=n.) However, the first items are much more likely to be pulled from one of the strata with large values. Unless the distribution is perfectly symmetric around the mean (which is definitely not a safe thing to assume), the first few observations are likely to come from one of the two tails, and the statistic will be biased towards whichever tail is fatter.
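To see the effect, here's a toy simulation. This is emphatically not CISS itself, just a caricature of "large-magnitude items tend to get drawn first": draw the first r observations with probability proportional to |value| from a made-up skewed population and average them naively. The fat right tail is overrepresented, so the early estimate sits well above the true mean.

```python
import random

# Not CISS itself -- just a caricature of "large-magnitude items get drawn
# first": take the first r draws with probability proportional to |value|
# and average them naively. Hypothetical skewed (lognormal) population.
random.seed(1)
population = [random.lognormvariate(0, 1) for _ in range(10_000)]
weights = [abs(x) for x in population]
true_mean = sum(population) / len(population)

r = 5
estimates = [
    sum(random.choices(population, weights=weights, k=r)) / r
    for _ in range(2_000)
]
print(f"true mean                : {true_mean:.3f}")
print(f"average early-r estimate : {sum(estimates) / len(estimates):.3f}")
```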

Secondly, the samples are anything but iid. Sampling a stratum reduces the chance that it will be sampled again. This is particularly true when r is small and we're looking at the blocks from the smaller strata. If one observation from a low-magnitude stratum makes it into the sample, there's just about zero chance that a second will until many higher-magnitude strata have been sampled. Even as r approaches n this doesn't really go away.
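For a feel of how little it takes to break independence (again, not CISS's actual stratum bookkeeping, just the simplest possible version of the point): even plain sampling without replacement from a finite population makes the draws negatively correlated, with Cov(X1, X2) = -sigma^2 / (N - 1).

```python
import random

# Simplest illustration of dependence from drawing without replacement:
# for a finite population, Cov(X1, X2) = -sigma^2 / (N - 1) < 0.
random.seed(2)
population = list(range(10))              # N = 10, population variance = 8.25
pairs = [random.sample(population, 2) for _ in range(200_000)]

m1 = sum(a for a, _ in pairs) / len(pairs)
m2 = sum(b for _, b in pairs) / len(pairs)
cov = sum((a - m1) * (b - m2) for a, b in pairs) / len(pairs)
print(f"empirical Cov(X1, X2) = {cov:+.3f}   (theory: {-8.25 / 9:+.3f})")
```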

So the CISS estimator does not qualify as h(), and even if it did, we're not using it in a way that would make the result a U-statistic.

However...

As noted above, it does converge almost surely to a U-statistic (the sample mean, which is simply the r=1 U-statistic with h(X) = X as the kernel). Seems like there should be something there. I'm not sure how to turn that intuition into a theorem.
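Just to make the r=1 remark concrete, here's the trivial check in the same spirit as the earlier sketch: averaging the identity kernel over all size-1 subsets recovers the ordinary sample mean.

```python
from itertools import combinations
from statistics import mean

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# r = 1 with the identity kernel: average h(x) = x over all size-1 subsets.
u1 = mean(x for (x,) in combinations(xs, 1))
assert u1 == mean(xs)   # the r = 1 U-statistic is just the sample mean
```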
