Tuesday, November 29, 2016

Box-Muller random normal generation

This topic might be considered a bit archaic since any decent stats package comes with functions that will generate random numbers from any of the major distributions. That said, somebody has to actually write those routines, and knowing how it's done is helpful for knowing the limitations. More to the point in my case: it might come up on the test.

We start with the assumption that we can generate a reasonably random sequence of U(0,1) random variables. This is, of course, a whole field of research in and of itself. However, for now, we'll just assume that we can do that. There are several ways to proceed from there.

If you want uniform random variables, well, you're pretty much done. Just scale and shift them to the interval you want.
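For example, a minimal sketch in Python (the function name is mine):

```python
import random

def uniform_ab(a, b):
    """Scale and shift a U(0,1) draw onto the interval (a, b)."""
    return a + (b - a) * random.random()
```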

When the CDF is closed form and invertible, plugging the uniform random variable U into the inverse gets the job done. So, for example, if you want exponential random variables, just invert the exponential CDF: Y = F⁻¹(U) = -λ log(1 - U), where λ is the mean, and you're done.
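Here's a rough sketch of that inverse-CDF trick in Python, parameterized by the mean as in the formula above (the helper name is mine):

```python
import math
import random

def exponential(mean):
    """Draw an exponential variate by inverting its CDF.

    With F(y) = 1 - exp(-y / mean), the inverse is F^-1(u) = -mean * log(1 - u).
    """
    u = random.random()               # U(0,1)
    return -mean * math.log(1.0 - u)
```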

Of course, a lot of distributions don't have nice, closed-form, invertible CDFs. In that case, you may be able to use a transform of one that does. "May" being the operative word. Some transforms are simpler than others.

Of particular note is the normal distribution. Since it comes up so frequently, one would obviously want to handle this case. However, the normal CDF has no closed-form inverse, and there's no good transformation from a single variable that has one. In an odd and fortuitous quirk, there is a transformation from two random variables, one uniform and one exponential (the exponential, as shown above, would itself come from a uniform). This method goes by both the Box-Muller algorithm (after its creators) and the polar method (which describes how you actually pull it off).

Without going into the formal proof, here's how it works. Start with the observations that, if Y ~ exp(1/2) (rate 1/2, mean 2), then Y ~ χ²₂, and if X ~ N(0,1), then X² ~ χ²₁. Therefore, the sum of two squared standard normals is exponential with mean 2. There's not much one can immediately do with that, but it should at least suggest a relationship. In particular, what if we were to consider the normals as the legs of a right triangle and the exponential as the square of the hypotenuse? How many ways are there for that to happen? Well, you basically have the whole unit circle of angles times the radius determined by the exponential. So, let's randomly generate polar coordinates as uniform and exponential and then take the Cartesian coordinates as normal.
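As a quick sanity check of that chi-squared relationship, one could run something like the following and watch the sample mean land near 2:

```python
import random

# For independent standard normals X1, X2, the sum X1^2 + X2^2 should
# behave like an exponential with mean 2 (a chi-squared with 2 df).
n = 100_000
draws = [random.gauss(0, 1) ** 2 + random.gauss(0, 1) ** 2 for _ in range(n)]
print(sum(draws) / n)  # should land near 2
```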

Crunching the actual proof is not as daunting as it may seem. Once you see that switching to polar coordinates is your answer, it falls out pretty quickly. So (without proof), if we have two Uniform(0,1) random variables U₁, U₂ and set R = (-2 log U₁)^(1/2) and θ = 2πU₂, then X₁ = R cos θ and X₂ = R sin θ are independent N(0,1) random variables. (Note that R² = -2 log U₁ is exactly the exponential with mean 2 from above, playing the role of the squared hypotenuse.)
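Translated into code, a straightforward sketch of that recipe (the function name is mine) looks like:

```python
import math
import random

def box_muller():
    """Turn two U(0,1) draws into a pair of independent N(0,1) draws."""
    u1 = 1.0 - random.random()             # in (0, 1]; avoids log(0)
    u2 = random.random()
    r = math.sqrt(-2.0 * math.log(u1))     # R^2 = -2 log U1 is exponential, mean 2
    theta = 2.0 * math.pi * u2             # angle uniform on [0, 2π)
    return r * math.cos(theta), r * math.sin(theta)
```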

Why stress over generating one when you can get two for cheap?
