Saturday, October 15, 2016

Fat tails

Getting out of Q material and into the stuff I'm supposed to be researching this fall. Given my genetics, this is a rather fitting line of inquiry for me. My family isn't particularly portly overall, but we all have, shall we say, low centers of gravity. Especially when seated. I've passed this on to my daughter and she is not amused.

As mentioned yesterday, when higher moments don't exist (the variance in particular), the suitably normalized mean of a series of independent, identically distributed (iid) random variables may not converge to a normal distribution. If it converges to anything, though, that something is what's called a stable distribution.

As these distributions generally don't have higher moments, the term stable may seem less than apt. It refers not to the means being predictable, but to the fact that the mean (or any linear combination) of independent copies of such a random variable is a random variable of the same family, differing only in location and scale.

For example, the normal distribution is a stable distribution because any linear combination of independent normal random variables is a normal random variable. The location and scale parameters may change, but the family remains the same.
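Stability is easy to check by simulation. Here is a minimal sketch, assuming NumPy is available; the helper name `iqr_of_means` is hypothetical. It compares the spread of sample means for normal draws and for draws from the Cauchy distribution, another member of the stable family. Spread is measured by the interquartile range rather than the standard deviation, since the Cauchy has no variance to estimate.

```python
import numpy as np

def iqr_of_means(sampler, n, reps=20000, seed=0):
    """Interquartile range of the mean of n iid draws, estimated from `reps` repetitions."""
    rng = np.random.default_rng(seed)
    means = sampler(rng, (reps, n)).mean(axis=1)   # one sample mean per row
    q25, q75 = np.percentile(means, [25, 75])
    return q75 - q25

def normal(rng, size):
    return rng.standard_normal(size)

def cauchy(rng, size):
    return rng.standard_cauchy(size)

for n in (1, 10, 100):
    print(n, round(iqr_of_means(normal, n), 3), round(iqr_of_means(cauchy, n), 3))
```

If theory holds, the normal column should shrink roughly like 1/sqrt(n), while the Cauchy column should hover around 2 no matter how large n gets, because the mean of n standard Cauchy variables is itself standard Cauchy.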

The normal distribution is the only stable distribution to have all its moments. All the other stable distributions have fatter tails than the normal and have infinite variance. By the time the tails are as bloated as the Cauchy distribution, you've lost your mean as well. It actually gets worse than that; Cauchy is essentially the midpoint of the fat-tail spectrum. However, in real applications, the distribution is usually somewhere between Cauchy and normal.
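The spectrum can be made concrete with the stability index, usually written alpha: 2 is the normal, 1 is the Cauchy, and smaller values mean fatter tails. Below is a rough sketch, assuming NumPy, that draws symmetric stable variates with the Chambers-Mallows-Stuck transform. The function name `symmetric_stable`, the threshold of 10, and the sample size are arbitrary choices for illustration.

```python
import numpy as np

def symmetric_stable(alpha, size, rng):
    """Draw symmetric alpha-stable variates (zero skew, unit scale), 0 < alpha <= 2,
    using the Chambers-Mallows-Stuck transform."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform angle
    w = rng.exponential(1.0, size)                 # unit exponential
    if alpha == 1.0:
        return np.tan(u)                           # standard Cauchy
    return (np.sin(alpha * u) / np.cos(u) ** (1 / alpha)
            * (np.cos((1 - alpha) * u) / w) ** ((1 - alpha) / alpha))

rng = np.random.default_rng(42)
n = 200_000
for alpha in (2.0, 1.5, 1.0):
    x = symmetric_stable(alpha, n, rng)
    tail = np.mean(np.abs(x) > 10)                 # empirical tail mass beyond 10
    top_share = np.max(x ** 2) / np.sum(x ** 2)    # largest observation's share of the sum of squares
    print(f"alpha={alpha}: P(|X|>10) ~ {tail:.4f}, largest x^2 share = {top_share:.3f}")
```

In this parameterization alpha = 2 gives a normal with variance 2, not 1. For alpha below 2, expect a single observation to account for a noticeable share of the sum of squares, which is what infinite variance looks like in a finite sample.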

The sampling method I developed last spring (no, the paper is not done yet) adjusted for this by brute force. We simply stratified the data and made sure we completely sampled the highest strata. At that point, the remaining data is bounded, and a bounded sample by definition has all its moments. Insofar as any database is really just a sample of a much larger set of observations that may or may not ever be recorded, this is obviously cheating (though it works!).
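To make the idea concrete, here is a hypothetical sketch of stratifying and taking the top stratum whole. It is not the method from the paper, whose details aren't given here. It assumes NumPy, synthetic Pareto-type data with infinite variance, cut points at the 50th, 90th, and 99th percentiles, and a 10% sampling rate in the lower strata. All of those choices, and the function name `stratified_total`, are made up for illustration.

```python
import numpy as np

def stratified_total(values, edges, rate, rng):
    """Estimate sum(values): sample each lower stratum at `rate`, take the top stratum whole."""
    values = np.asarray(values, dtype=float)
    strata = np.digitize(values, edges)     # stratum index; len(edges) marks the top stratum
    top = len(edges)
    estimate = 0.0
    for s in range(top + 1):
        members = values[strata == s]
        if members.size == 0:
            continue
        if s == top:
            estimate += members.sum()       # complete sample: no sampling error here
        else:
            k = max(1, int(round(rate * members.size)))
            sample = rng.choice(members, size=k, replace=False)
            estimate += sample.mean() * members.size   # scale the sample mean up to the stratum
    return estimate

rng = np.random.default_rng(7)
data = rng.pareto(1.5, 100_000)              # synthetic heavy-tailed data (assumption)
edges = np.quantile(data, [0.5, 0.9, 0.99])  # assumed cut points; the top 1% is fully sampled
print("true total:        ", data.sum())
print("stratified estimate:", stratified_total(data, edges, 0.1, rng))
```

Because everything below the top cut point is bounded, the sampling error in the lower strata is well behaved, and the wild values contribute no sampling error at all since they are all counted. The cheat is that the cut points come from the data already in hand.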

My next line of inquiry is: how do we really know when what we are looking at is a representative sample of this larger, theoretical population? To answer that question, I'm going to have to get a lot more familiar with the behavior of heavy-tailed distributions in general, and stable distributions in particular. Real math! Yay for that.
