Friday, March 2, 2018

The big miss

The reason the actuaries are leery of using sampling techniques in their analysis is one backed up by fact: you might miss something really big. (Most of the habits of actuaries are backed up by facts; you'd be hard-pressed to find a more rational group of folks. I guess spending 2,000 hours a year putting a price tag on death has that effect.)

I can't say my research has done anything to assuage that fear. For example, here's the kernel density estimate of the hit rate from a sample that represents the overall population:

[Figure: kernel density of the hit rate for a representative ten-block sample, with a bump near 0.65]

Everything lines up nicely. The sample average is very close to the true average, and the estimate of the block variance is spot on. Notice the bump at 0.65. It comes from two of the ten sampled blocks. Because the hit rate for those blocks is much higher than for most of the blocks, it's really important that they be represented: not only do they pull up the average, they also widen the variance. Say that, by dumb luck, instead of sampling one of those blocks, we had drawn a different one with a lower hit rate:

[Figure: the same kernel density for the unlucky sample, missing one of the high hit-rate blocks]

To the naked eye, this distribution doesn't look that much different, but it is. The estimates of both the sum and the variance are significantly low, and that coupling is particularly problematic: the samples that produce bad estimates are also the ones that return the tightest confidence intervals. In other words, the worse the sample is, the better it thinks it is. Not a great combination.
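
Here's a toy illustration of that coupling. The hit rates are invented, not my actual blocks, but the shape of the problem is the same: the "unlucky" sample, which missed one of the two high hit-rate blocks, reports both a lower mean and a narrower confidence interval.

    import numpy as np

    # Ten blocks sampled from a population where two blocks have much
    # higher hit rates than the rest (all numbers invented).
    representative = np.array([0.12, 0.15, 0.11, 0.14, 0.13,
                               0.16, 0.12, 0.10, 0.65, 0.68])

    # The unlucky draw: one high hit-rate block swapped for a low one.
    unlucky = np.array([0.12, 0.15, 0.11, 0.14, 0.13,
                        0.16, 0.12, 0.10, 0.65, 0.13])

    def summarize(sample):
        mean = sample.mean()
        se = sample.std(ddof=1) / np.sqrt(len(sample))
        # Normal-approximation 95% confidence interval.
        return mean, (mean - 1.96 * se, mean + 1.96 * se)

    for name, s in [("representative", representative), ("unlucky", unlucky)]:
        mean, (lo, hi) = summarize(s)
        print(f"{name:>14}: mean={mean:.3f}  95% CI=({lo:.3f}, {hi:.3f})")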

It's not easy to adjust for this because, in the absence of the extreme values, there's nothing in the sample to indicate that the variability is as high as it is. Bootstrapping won't help because we'd still be looking at just the elements in the sample.
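
A quick sketch of why, again with made-up numbers: every bootstrap resample draws only from values we've already seen, so the missing block can never reappear and the bootstrap standard error stays reassuringly (and wrongly) small.

    import numpy as np

    rng = np.random.default_rng(0)

    # The unlucky sample again: it never saw the second high hit-rate block.
    unlucky = np.array([0.12, 0.15, 0.11, 0.14, 0.13,
                        0.16, 0.12, 0.10, 0.65, 0.13])

    def bootstrap_se(sample, n_boot=10_000):
        # Each resample draws, with replacement, only from values already
        # in the sample -- a block we never drew can't show up here.
        means = [rng.choice(sample, size=len(sample), replace=True).mean()
                 for _ in range(n_boot)]
        return float(np.std(means))

    print(bootstrap_se(unlucky))  # stays small; nothing flags what was missed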

So, what if we reversed the process? Instead of resampling the data we've already seen, we invent different data and ask how much things would change if it were introduced. The actuaries do this all the time; it's called sensitivity analysis. It's basically measuring how badly a missed assumption (or, in our case, a missed sample) hurts you. If it doesn't make much difference, you can be pretty confident that your estimates are good. If introducing a single new data point really shakes things up, that's a clue you might want to keep sampling. I'm going to code this up real quick and see how it performs.
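
Roughly, the idea is something like this. The hit rates below are made up and the function is mine, not the finished version, but it shows the mechanics: add one invented data point at a time and watch how far the estimated mean moves.

    import numpy as np

    # Made-up sample of block hit rates (the lone 0.65 is the one high
    # block that did make it into the draw).
    sample = np.array([0.12, 0.15, 0.11, 0.14, 0.13,
                       0.16, 0.12, 0.10, 0.13, 0.65])

    def sensitivity(sample, candidates):
        # For each invented data point, measure how far the estimated
        # mean would move if that point were added to the sample.
        base = sample.mean()
        return {v: np.append(sample, v).mean() - base for v in candidates}

    # Probe with hit rates spanning the plausible range.
    for v, shift in sensitivity(sample, [0.0, 0.25, 0.50, 0.75, 1.0]).items():
        print(f"a new block at {v:.2f} moves the mean by {shift:+.4f}")

    # Big shifts from plausible points suggest the sample can't be
    # trusted yet; tiny shifts suggest the estimate is stable.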

Well, not super real quick. I've got a six-hour race tomorrow and I also promised Yaya I'd give her a driving lesson. But, I will get it done this weekend.
