never2old4school: Pareto

I haven't done much (hardly any) work with Pareto random variables. That's actually a little odd since they show up so often in financial analysis. The Pareto distribution is the mathematical basis of the 80/20 rule. That is, 80% of the wealth (or income, or effort, or whatever you're measuring) is held by 20% of the population. Furthermore, since it's a power-law distribution, you can apply this rule as often as you like. So, if you just look at that top 20%, you find the same 80/20 rule applying within it. This pattern is definitely present in my data, as I noted way back when I started down this road two years ago.

It doesn't have to be 80/20; that's just the classic statement of the "Pareto Principle". Pareto found that 80/20 was the most common ratio with respect to wealth distribution both in his home country of Italy and pretty much every other country for which he could get data. That seems to have held up pretty well over the subsequent 125 years, not only for wealth but for a lot of other things as well. The actual distribution can be tuned to put the split somewhere else (for example 90/10), but 80/20 works for a surprisingly large number of cases.
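The tuning comes down to a one-line formula. For a Pareto distribution with shape α > 1 (so the mean exists), the top fraction p of the population holds p^(1 - 1/α) of the total. Here's a minimal Python sketch assuming that standard result; the function names are made up for illustration.

```python
import math

def top_share(p, alpha):
    """Share of the total held by the top fraction p, for Pareto shape alpha > 1."""
    return p ** (1.0 - 1.0 / alpha)

def alpha_for_split(share, frac):
    """Shape alpha for which the top `frac` of the population holds `share` of the total."""
    return 1.0 / (1.0 - math.log(share) / math.log(frac))

alpha_80_20 = alpha_for_split(0.80, 0.20)
alpha_90_10 = alpha_for_split(0.90, 0.10)

# Applying the rule twice: the top 20% of the top 20% (4% overall)
# should hold 80% of 80% (64%) of the total.
share_top_4_percent = top_share(0.04, alpha_80_20)
```

A more lopsided split such as 90/10 corresponds to a smaller α, which means a heavier tail.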

Believers in math religion (that is, that the world is somehow governed by math) will quickly note that the shape parameter, α, that yields the 80/20 rule is a very pretty (log 5)/(log 4). Yes, the Ancient Greeks would have loved it (except for the bit about it being an irrational number; they weren't down with that idea). However, I find that view dangerously naive. Math is very useful for describing reality, but it certainly doesn't dictate it. 80/20 is just a nice approximation, nothing more.

But, we're drifting into Philosophy and right at the moment I need to be focused on Statistics. The reason Pareto is useful to me is that it's a common distribution with an infinite variance (for any α ≤ 2, which includes the 80/20 case). As such, it's a better example than the Laplace distribution I have been using. So, I think I'm going to swap it into the analysis.

There is a small problem when dealing with positive and negative numbers. Laplace is a simple symmetrical version of the exponential distribution. You can do that with Pareto as well, but you get a hole in the domain. Pareto random variables are bounded away from zero. So, if your minimum value is x_m, then simply folding the distribution over on itself means that you can have observations anywhere on the real line except the interval (-x_m, x_m). This is not a particularly hard problem to get around: just shift each half of the distribution toward zero by x_m.
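Here's a rough sketch of the two variants, using only the standard library and inverse-CDF sampling. The function names, seed, and parameter values are illustrative assumptions, not what's in the actual analysis.

```python
import random

def pareto_magnitude(rng, x_m, alpha):
    """Draw one Pareto(x_m, alpha) value by inverting the CDF."""
    u = 1.0 - rng.random()          # uniform on (0, 1], avoids dividing by zero
    return x_m * u ** (-1.0 / alpha)

def folded_pareto(rng, x_m, alpha):
    """Symmetric Pareto: random sign, magnitude never below x_m."""
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return sign * pareto_magnitude(rng, x_m, alpha)

def shifted_pareto(rng, x_m, alpha):
    """Same, but each magnitude is pulled toward zero by x_m to close the hole."""
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return sign * (pareto_magnitude(rng, x_m, alpha) - x_m)

rng = random.Random(2017)           # arbitrary seed, only for repeatability
x_m, alpha = 1.0, 1.5               # illustrative values, not from the analysis
folded = [folded_pareto(rng, x_m, alpha) for _ in range(10_000)]
shifted = [shifted_pareto(rng, x_m, alpha) for _ in range(10_000)]
```

Every value in `folded` has magnitude of at least x_m, while `shifted` fills in the interval around zero.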

The downside of that is that it screws up the 80/20 rule. Another option is just to not worry about values with magnitude less than x_m. My model already assumes a mixture of a real-valued random variable and one that is constant zero, so I don't really need to do anything to adjust for that. The drawback here is that I'm really giving up a lot of the center of my distribution. To match the interquartile range of the standard normal with a symmetric 80/20 Pareto, the minimum value has to be approximately 0.37. So, the "hole" in the distribution reaches more than a quarter of the interquartile range out on each side of zero, or over half of it in total. By contrast, the standard normal puts nearly 30% of its distribution in that same interval, and a Cauchy with the same interquartile range jams in a bit more.
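A quick way to check those center-mass figures, as a sketch using Python's standard library. The Cauchy scale here is my assumption: it's chosen so the Cauchy quartiles line up with the standard normal's.

```python
import math
from statistics import NormalDist

x_m = 0.3713                                # hole half-width quoted in the text
std_normal = NormalDist()
q75 = std_normal.inv_cdf(0.75)              # upper quartile of the standard normal
iqr = 2.0 * q75

# Fraction of the interquartile range taken up by the hole (-x_m, x_m).
hole_fraction = 2.0 * x_m / iqr

# Mass the standard normal places inside the hole.
normal_mass = std_normal.cdf(x_m) - std_normal.cdf(-x_m)

# Mass for a Cauchy centered at zero whose scale equals q75 (an assumption),
# so that its quartiles sit at the same points as the normal's.
cauchy_mass = (2.0 / math.pi) * math.atan(x_m / q75)
```

With those choices, `normal_mass` lands a little under 0.3 and `cauchy_mass` somewhat above it.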

However, it's the tails I'm really interested in, so I'm not going to stress much over the interior. I don't really want to devote a whole paragraph in the paper to how I transformed the pseudo-random data just to do a comparison. I'll just state it as symmetrical Pareto and let the results speak for themselves (unless, of course, my adviser vetoes that approach).

Just for the record (so I don't have to derive it again), the Pareto parameters that give an 80/20 distribution with an interquartile range equal to that of the standard normal are x_m=0.3713 (I'm using 4 significant digits in my parameters) and α=1.161.
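The derivation is short enough to sketch in a few lines of Python. It assumes the symmetric (folded) Pareto, where the upper quartile is the point at which half the magnitudes fall below it, that is, the median of the underlying Pareto, x_m · 2^(1/α).

```python
import math
from statistics import NormalDist

# Shape that gives the 80/20 split.
alpha = math.log(5) / math.log(4)

# Upper quartile of the standard normal (its IQR is twice this).
q75 = NormalDist().inv_cdf(0.75)

# Set the median of the Pareto magnitude, x_m * 2**(1/alpha), equal to q75
# and solve for x_m.
x_m = q75 / 2.0 ** (1.0 / alpha)

print(f"alpha = {alpha:.4g}, x_m = {x_m:.4g}")
```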

Sunday, December 3, 2017
