Anyway, the t distribution is sufficiently important that if ole W.S. had been a bit less bashful, we'd all be writing Gosset distribution all over the place. Basically, it addresses the fact that we typically don't know the variance of the population we're sampling from. We can estimate it from the sample, but that's not the same as knowing it.
If Xi, i = 1, ..., n, are iid N(μ, σ²) random variables with sample mean X̄, then
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$
If we knew σ, then we could make all sorts of statements about μ based on the sample mean. But we don't. So what we need is to know the distribution of
$$\frac{\bar{X} - \mu}{S/\sqrt{n}}$$
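where S is the sample standard deviation (so S² is the sample variance). While somewhat more complicated on the surface, this is actually just the ratio of a N(0,1) and a mildly transformed chi-squared. That is:
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{\sqrt{S^2/\sigma^2}} = \frac{U}{\sqrt{V/p}}$$
where U = (X̄ - μ)/(σ/√n) is N(0,1) and V = (n-1)S²/σ² is chi-squared with p = n-1 degrees of freedom.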
Importantly, U and V are independent. The distribution of this ratio is defined as t with p degrees of freedom (which is determined by the sample size). Grinding out the pdf is left as an exercise to the reader (or you could just look it up). In reality, everybody just uses the tables or a stats package.

Tying this all back to the original question, if X1, ..., Xn are iid N(μ, σ²) random variables, then
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$
which we call a t distribution with n-1 degrees of freedom.
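If you'd rather see this than take it on faith, here's a rough simulation sketch using only the Python standard library. It draws lots of small normal samples, computes (X̄ - μ)/(S/√n) for each, and compares how often that lands beyond ±1.96 against what the t pdf says it should. The function names, the choice of n = 5, and the values of μ and σ are all just assumptions for the illustration; the pdf is the standard textbook formula, integrated with a plain midpoint rule rather than anything clever.

```python
import math
import random


def t_pdf(x, p):
    """Density of the t distribution with p degrees of freedom."""
    coef = math.gamma((p + 1) / 2) / (math.sqrt(p * math.pi) * math.gamma(p / 2))
    return coef * (1 + x * x / p) ** (-(p + 1) / 2)


def t_two_sided_tail(c, p, steps=20_000):
    """P(|T| > c) for T ~ t_p, via the midpoint rule on [-c, c]."""
    h = 2 * c / steps
    inside = sum(t_pdf(-c + (i + 0.5) * h, p) for i in range(steps)) * h
    return 1 - inside


def t_statistic(sample, mu):
    """(xbar - mu) / (s / sqrt(n)), with s using the n-1 denominator."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    return (xbar - mu) / math.sqrt(s2 / n)


def simulate_tail(n, c, mu=10.0, sigma=3.0, reps=50_000, seed=0):
    """Fraction of simulated t statistics with absolute value above c."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if abs(t_statistic(sample, mu)) > c:
            hits += 1
    return hits / reps


n, c = 5, 1.96
empirical = simulate_tail(n, c)
theoretical = t_two_sided_tail(c, n - 1)      # t with n-1 = 4 degrees of freedom
normal_tail = math.erfc(c / math.sqrt(2))     # what N(0,1) would predict

print(f"simulated   P(|T| > {c}) = {empirical:.4f}")
print(f"t_{n - 1} pdf   P(|T| > {c}) = {theoretical:.4f}")
print(f"N(0,1)      P(|Z| > {c}) = {normal_tail:.4f}")
```

With n = 5 the simulated tail should sit close to the t value with 4 degrees of freedom (roughly 0.12) and well above the 0.05 you'd expect if you pretended S were σ. That gap is the whole reason the t distribution exists, and it shrinks as n grows.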