never2old4school: Overcomplicating things (Bayes' Rule)

Monday, September 19, 2016


At the outset of this "Identify all named results" adventure, I mentioned that I already knew Bayes' Rule. I should hope so. It generally pops up in chapter 1 of any introductory probability text. I am surprised, however, by how often authors make it more complicated than it has to be.

First, the rule: P(A|B) = P(B|A)P(A) / P(B), provided P(B) > 0.

The easy way to remember it is to include the identity used to derive it:

P(A|B)P(B) = P(A ∩ B) = P(B|A)P(A)

Then just eliminate the middle term and divide by P(B). Done. Easy.
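For anyone who prefers to see numbers, here is a minimal sketch that checks the identity and the rule by brute-force counting. The sample space (two fair dice), the events `A` and `B`, and the helper names `prob` and `cond` are all assumptions made up for this illustration; exact fractions are used so the comparisons are not muddied by rounding.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: ordered outcomes of two fair six-sided dice.
S = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a subset of S) under the uniform measure."""
    return Fraction(len(event), len(S))

def cond(a, b):
    """P(a | b), defined only when P(b) > 0."""
    return prob(a & b) / prob(b)

A = {s for s in S if s[0] == 6}          # first die shows 6
B = {s for s in S if s[0] + s[1] >= 10}  # total is at least 10

# The three-way identity: P(A|B)P(B) = P(A ∩ B) = P(B|A)P(A)
assert cond(A, B) * prob(B) == prob(A & B) == cond(B, A) * prob(A)

# Bayes' Rule: drop the middle term and divide by P(B).
bayes = cond(B, A) * prob(A) / prob(B)
assert bayes == cond(A, B)
```

Both assertions pass silently, which is all the identity promises.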

So why complicate it? For some reason, a lot of texts want to list what they call a "more general" result based on partitioning the sample space S into {A_i} (meaning the subsets A_i are mutually exclusive and their union equals S). Then the rule is rewritten:

$P(A_i|B) = \frac{P(B|A_i)P(A_i)}{\sum_j P(B|A_j)P(A_j)}$

OK, that's certainly true, but how is it more "general"? There's nothing about that equation that requires A_i to be in the partition. It could be any set and the equation still holds. Adding constraints makes a formula less general, not more, so let's drop the subscript and just call the event of interest A:

$P(A|B) = \frac{P(B|A)P(A)}{\sum_j P(B|A_j)P(A_j)}$

Now, let's look at the denominator. Remember the original identity: P(B|A)P(A) = P(A ∩ B). Also recall that, if the sets are mutually exclusive, the sum of the probabilities is the probability of the union (and the sets A_j ∩ B are mutually exclusive because the A_j are). Combine that with the fact that intersection distributes over countable unions to get:

$\begin{aligned} \sum_j P(B|A_j)P(A_j) &= \sum_j P(A_j \cap B)\\ &= P\left(\bigcup_j (A_j \cap B)\right) \\ &= P\left(\left(\bigcup_j A_j\right) \cap B\right) \\ &= P(S \cap B) = P(B) \end{aligned}$

Which brings us back to the denominator of the original equation. Sure, if you have a partitioned space, feel free to substitute it in, but that makes it a special case of the theorem. The simple version is the general case.
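The collapse of the denominator can also be checked by counting. This is only a sketch under made-up assumptions: the sample space is two fair dice, the partition splits it by the value of the first die, and `A` (doubles) is deliberately chosen so that it is not one of the partition's sets. The names `prob`, `cond`, and `partition` are hypothetical helpers for this example.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: ordered outcomes of two fair six-sided dice.
S = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a subset of S) under the uniform measure."""
    return Fraction(len(event), len(S))

def cond(a, b):
    """P(a | b), defined only when P(b) > 0."""
    return prob(a & b) / prob(b)

# Partition S by the value of the first die: A_1, ..., A_6.
partition = [{s for s in S if s[0] == j} for j in range(1, 7)]

B = {s for s in S if s[0] + s[1] >= 10}  # total is at least 10
A = {s for s in S if s[0] == s[1]}       # doubles; not a set in the partition

# The "general" denominator: sum_j P(B|A_j) P(A_j)
denominator = sum(cond(B, A_j) * prob(A_j) for A_j in partition)
assert denominator == prob(B)

# The partitioned formula still gives P(A|B) for an A outside the partition.
posterior = cond(B, A) * prob(A) / denominator
assert posterior == cond(A, B)
```

The partition only ever touches the denominator; swapping in a different `A` changes the numerator and nothing else.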

One could, of course, argue that the simple version is just the partitioned version where the partition is simply {A, A^c}. That's still a bit lame, as the denominator is messier than it needs to be and, again, that partitioning is buying you absolutely nothing up top. If computing the denominator is easier when done by partitions, go for it, but that is simply a computational substitution, not a generalization of the result. The numerator is the same no matter how you label it.
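To make the "computational substitution" point concrete, here is a small sketch with invented numbers: a prior P(A) = 1/100, and likelihoods P(B|A) = 9/10 and P(B|A^c) = 1/20. The function names and values are hypothetical. The two functions differ only in where P(B) comes from.

```python
from fractions import Fraction

def posterior_simple(prior, like_a, p_b):
    """P(A|B) from the simple form: P(B|A) P(A) / P(B)."""
    return like_a * prior / p_b

def posterior_two_set(prior, like_a, like_not_a):
    """P(A|B) with P(B) expanded over the partition {A, A^c}."""
    p_b = like_a * prior + like_not_a * (1 - prior)
    return like_a * prior / p_b

# Hypothetical inputs, chosen only for illustration.
prior = Fraction(1, 100)      # P(A)
like_a = Fraction(9, 10)      # P(B|A)
like_not_a = Fraction(1, 20)  # P(B|A^c)

# P(B) computed once via the two-set partition, then fed to the simple form.
p_b = like_a * prior + like_not_a * (1 - prior)

assert posterior_two_set(prior, like_a, like_not_a) == posterior_simple(prior, like_a, p_b)
```

When P(B) is not known directly, the partitioned version is a handy way to compute it, but the numerator is identical in both functions.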
