Thursday, January 5, 2017

Birnbaum's Theorem

This one relies on some tediously technical definitions, so I'll state it informally.

Sufficiency Principle: If S(X) is a sufficient statistic for X, and you have an experiment that produces S(X), then the evidence from the experiment is the same whether you report X or S(X). This is a slight extension of the idea of a sufficient statistic, because it now brings in the notion of how the data were collected.
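Plain sufficiency itself is easy to demonstrate. Here's a small sketch (the sequences and the helper function are my own made-up illustration): for a run of Bernoulli coin flips, the likelihood depends on the sequence only through the number of successes, so any two sequences with the same count carry identical evidence about p.

```python
from math import prod

def likelihood(p, flips):
    """Likelihood of observing this exact 0/1 Bernoulli(p) sequence."""
    return prod(p if f else 1 - p for f in flips)

# Two different sequences with the same sufficient statistic: 4 successes in 10.
a = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]
b = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

for p in (0.3, 0.5, 0.7):
    # Same p**4 * (1-p)**6 either way: the order of the flips adds nothing.
    assert abs(likelihood(p, a) - likelihood(p, b)) < 1e-12
```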

Conditionality Principle: If you have several possible experiments that could test something and you pick one at random, then the evidence from the experiment you actually ran is the same as if you had chosen it deliberately. That is, experiments you could have run, but didn't, are irrelevant.

Likelihood Principle: The evidential meaning of an experiment is determined entirely by the likelihood function. That is, any conclusions drawn from the experiment depend only on how likely the observed data are under each candidate hypothesis.

Note that all three of these are axioms; you can accept them or not. Birnbaum's Theorem shows that the first two together imply the third, and that the third implies the first two. So, if you accept Sufficiency and Conditionality, you have to buy into the Likelihood Principle, and vice versa.

Sufficiency is pretty well accepted. The other two are subjects of considerable debate.

To see why, consider this fairly simple experiment: I want to know how often a baseball player gets on base for every time they come to bat. I can watch 20 at bats and count the times they reach base. Or, I can see how many tries it takes to get on base seven times. Suppose I choose the latter and it takes them 20 tries. That's a plausible result from the first experiment as well. Either way, I get a maximum likelihood estimate of 0.35, which is a pretty good on base percentage.
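The Likelihood Principle is doing real work in that last sentence: the two designs produce likelihood functions that differ only by a constant factor, so they have the same shape and the same maximizer. A quick sketch in Python (the function names are mine):

```python
from math import comb

def binom_lik(p):
    """Binomial design: P(exactly 7 times on base in 20 at-bats)."""
    return comb(20, 7) * p**7 * (1 - p)**13

def nbinom_lik(p):
    """Negative binomial design: P(the 7th time on base arrives on at-bat 20)."""
    return comb(19, 6) * p**7 * (1 - p)**13

# The ratio is the constant comb(19, 6) / comb(20, 7) = 0.35 whatever p is,
# so the two likelihoods peak at the same place.
for p in (0.2, 0.35, 0.5):
    print(p, nbinom_lik(p) / binom_lik(p))

grid = [i / 1000 for i in range(1, 1000)]
print(max(grid, key=binom_lik), max(grid, key=nbinom_lik))  # both MLEs: 0.35
```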

But I'd like to know how certain of that estimate I should be. Using standard Null Hypothesis Significance Testing (NHST) techniques, I would say the 90% confidence interval was 0.15 to 0.60. That's because I was waiting for the seventh success, so I use a Negative Binomial distribution. But if I had run the first experiment, my data would follow a Binomial distribution, and the exact same 7 out of 20 gives a confidence interval of 0.19 to 0.56.
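Here's a sketch of where two such intervals come from, using exact tail-inversion (Clopper-Pearson-style) intervals built from scratch. The exact endpoints depend on which interval construction you use, so they won't necessarily match the figures quoted above, but the point survives: the same 7-out-of-20 yields different intervals under the two designs. All the function names here are mine.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(K <= k) for K ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def invert(tail, alpha):
    """Bisect for the p in (0, 1) where a monotone tail probability equals alpha."""
    lo, hi = 1e-9, 1 - 1e-9
    for _ in range(100):
        mid = (lo + hi) / 2
        if (tail(mid) < alpha) == (tail(lo) < alpha):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n, r, alpha = 20, 7, 0.10  # 20 at-bats, 7 times on base, 90% interval

# Binomial design (watch 20 at-bats, count successes): Clopper-Pearson.
b_lo = invert(lambda p: 1 - binom_cdf(r - 1, n, p), alpha / 2)  # P(K >= 7)
b_hi = invert(lambda p: binom_cdf(r, n, p), alpha / 2)          # P(K <= 7)

# Negative binomial design (bat until the 7th success, which took 20 tries).
# "The 7th success on trial 20 or later" = "at most 6 successes in 19 trials".
nb_lo = invert(lambda p: 1 - binom_cdf(r - 1, n, p), alpha / 2)  # P(N <= 20)
nb_hi = invert(lambda p: binom_cdf(r - 1, n - 1, p), alpha / 2)  # P(N >= 20)

print(f"binomial 90% CI:          ({b_lo:.3f}, {b_hi:.3f})")
print(f"negative binomial 90% CI: ({nb_lo:.3f}, {nb_hi:.3f})")
```

With this particular construction the lower endpoints happen to coincide, because "the 7th success came by trial 20" and "at least 7 successes in 20 trials" are the same event; the upper endpoints, though, differ by a couple of points, purely because of the stopping rule.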

Why should I get two different confidence intervals for the exact same data? Why does it matter what my intentions were? I'm trying to measure the batter, not myself. Worse, suppose intention didn't even come into it: suppose I flipped a coin to decide which experiment to run. My confidence interval for a player's ability is now affected by the result of an independent coin toss!

For this reason, most Bayesians accept the Likelihood Principle and handle uncertainty with prior beliefs rather than sampling distributions. Of course, that means my result still depends on the beliefs I brought to the experiment, not just the data, but at least I've quantified them up front.

Quantum physicists had a really hard time with a similar idea a hundred years ago. It just seems plain wrong that an outcome should differ simply because we chose to look at it differently. At least they were honest enough to freak out about it. Sadly, much statistical research is done in complete ignorance of this fundamental issue.
