Saturday, November 19, 2016

Student's t-distribution

Let's get right to the obvious: "Student" is, in fact, a real guy named William Sealy Gosset. He had some issues. Not only did he not like his name (he regularly went by W.S. Gosset rather than William, or Bill), he felt even that lacked sufficient anonymity, so he published under the name of "Student". Dude, what's your deal? What's even nuttier is that he worked for Guinness. Yes, the brewery in Dublin. You'd think a guy who got free pints of stout would be a bit more chill.

Anyway, the t distribution is sufficiently important that if ole W.S. had been a bit less bashful, we'd all be writing Gosset distribution all over the place. Basically, it addresses the fact that we typically don't know the variance of the population we're sampling from. We can estimate it from the sample, but that's not the same as knowing it.

If $X_1, \dots, X_n$ are iid $N(\mu, \sigma^2)$ random variables, then

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

If we knew σ, then we could make all sorts of statements about μ based on the sample mean. But we don't. So what we need is to know the distribution of

$$\frac{\bar{X} - \mu}{S/\sqrt{n}}$$

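Here $S$ is the sample standard deviation (the square root of the sample variance $S^2$). While somewhat more complicated on the surface, this statistic is actually just the ratio of a N(0,1) and the square root of a chi-squared divided by its degrees of freedom. That is:

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{\sqrt{S^2/\sigma^2}} = \frac{U}{\sqrt{V/p}}, \qquad U \sim N(0, 1), \quad V = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_p, \quad p = n - 1$$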

Importantly, U and V are independent. The distribution of this ratio is defined as t with p degrees of freedom (here p = n - 1, so it is determined by the sample size). Grinding out the pdf is left as an exercise to the reader (or you could just look it up). In reality, everybody just uses the tables or a stats package. Tying this all back to the original question, if $X_1, \dots, X_n$ are iid $N(\mu, \sigma^2)$ random variables, then

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$

which we call a t distribution with n-1 degrees of freedom.
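
If you'd rather see it than take it on faith, here's a rough sketch in plain Python (standard library only). It simulates the statistic for small normal samples and compares the tail probability with what the t pdf gives. The function names, the sample size of 5, the cutoff of 1.96, the values of μ and σ, and the integration bound are all my own choices for illustration, and the pdf is the standard closed form you'd find if you looked it up.

```python
import math
import random


def t_pdf(t, p):
    """Density of Student's t with p degrees of freedom (standard closed form)."""
    coef = math.gamma((p + 1) / 2) / (math.sqrt(p * math.pi) * math.gamma(p / 2))
    return coef * (1 + t * t / p) ** (-(p + 1) / 2)


def t_two_sided_tail(cutoff, p, upper=200.0, steps=200_000):
    """Approximate P(|T| > cutoff) by trapezoid integration of the pdf.

    The integral is truncated at `upper`, an arbitrary bound (an assumption)
    that leaves a negligible tail for the small example below.
    """
    h = (upper - cutoff) / steps
    total = 0.5 * (t_pdf(cutoff, p) + t_pdf(upper, p))
    for i in range(1, steps):
        total += t_pdf(cutoff + i * h, p)
    return 2 * total * h


def t_statistic(sample, mu):
    """(sample mean - mu) / (S / sqrt(n)), with S^2 using the n-1 divisor."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    return (xbar - mu) / math.sqrt(s2 / n)


def simulated_tail(n, cutoff, trials=20_000, mu=10.0, sigma=3.0, seed=0):
    """Fraction of simulated normal samples of size n with |T| > cutoff."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if abs(t_statistic(sample, mu)) > cutoff:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    n, cutoff = 5, 1.96
    print("simulated      P(|T| > 1.96):", simulated_tail(n, cutoff))
    print("t with n-1 df  P(|T| > 1.96):", round(t_two_sided_tail(cutoff, n - 1), 4))
    print("N(0,1)         P(|Z| > 1.96):", round(math.erfc(cutoff / math.sqrt(2)), 4))
```

With n = 5 the simulated tail should land near the t value (about 0.12) rather than the 0.05 you'd get by pretending the statistic is normal. That gap is the price of estimating σ instead of knowing it, and it shrinks as n grows.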
