Why Excel?
Solving the Cookie Problem
As always, the row labels (CilantroFree and CilantroAdd) are your hypotheses. You can’t know them directly, but the whole goal of this exercise is to use data (that’s the columns) to shift probability toward one hypothesis until you feel comfortable that it’s likely enough to act on.
p(H|D) = p(D|H) p(H) / p(D)
p(D) = ∑ p(D|Hi) p(Hi), i.e. the sum of p(D & Hi) over all i, assuming the hypotheses are MECE
∑: sum over all. In this case, we’re saying that the probability of the data is the total of p(D & H) across all possible hypotheses, assuming they are MECE.
MECE: Mutually Exclusive and Collectively Exhaustive. This is a great term for a concept that we all intuit but don’t have a good word for. Generally, we should formulate our set of hypotheses so that there’s no chance the data could have come any other way. So I can’t say that the cookie could have come from CilantroFree, CilantroAdd, or “an unknown variety of other sources”. If there are other sources, I need to make each one a definite hypothesis, with a prior and everything else that my first two have.
In this simple case, ∑ p(D|Hi) p(Hi) just means p(V & CF) plus p(V & CA): 75/200 plus 50/200, using the counts below.

               Vanilla   Chocolate   Total
CilantroFree      75        25        100
CilantroAdd       50        50        100
Total            125        75        200
So we have all of this information, and BAM – we draw a Vanilla cookie. That means we have a piece of data. Whereas our previous best guess at where a given cookie came from was 100/200 for each container (from the ‘Total’ column, since we had no more specific information), we can leave those totals behind and focus on the ‘Vanilla’ column, where each jar’s probability is its share of the vanilla cookies.

               Vanilla
CilantroFree     0.6
CilantroAdd      0.4
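If it helps to see this as code, here is a minimal Python sketch of the same counts-based reasoning (the dictionary layout and the function name `posterior_given_flavor` are my own, not from the spreadsheet):

```python
# Cookie counts from the table above: rows are jars, columns are flavors.
counts = {
    "CilantroFree": {"Vanilla": 75, "Chocolate": 25},
    "CilantroAdd":  {"Vanilla": 50, "Chocolate": 50},
}

def posterior_given_flavor(flavor):
    """p(jar | flavor): each jar's share of all cookies of that flavor."""
    total = sum(jar[flavor] for jar in counts.values())  # e.g. 125 vanilla
    return {name: jar[flavor] / total for name, jar in counts.items()}

print(posterior_given_flavor("Vanilla"))
# {'CilantroFree': 0.6, 'CilantroAdd': 0.4}
```

Note that the division by the column total is doing all the Bayesian work here: 75 / (75 + 50) is exactly the 0.6 in the table.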
I did all of this with a count of cookies, to keep it intuitive. Let’s replay the exact same thing with probabilities, which will allow us to take the critical leap in our next section:

               Vanilla   Chocolate    Total
CilantroFree   37.50%    12.50%       50.00%
CilantroAdd    25.00%    25.00%       50.00%
Total          62.50%    37.50%      100.00%
This is exactly what we saw above, except that I’ve divided everything by 200 (the total number of cookies). Look at the lower right corner first: the probability that you draw a cookie of either type from either jar is 100%. Awesome. Look at the Total column: the probability that the cookie came from the CF jar is 50%, and the same for the CA jar, because they had an equal number of cookies. That’s p(H), by the way. Similarly, p(D) is the probability that you drew a Vanilla at all. It’s 62.5%, because there are more Vanilla cookies overall. So, returning to the central equation:
p(H|D) = p(D|H) p(H) / p(D)
p(H|D) = (37.5% / 50%) × 50% / 62.5%
p(H|D) = 60%
What we really just did was say “Multiply the prior of each hypothesis by how likely that hypothesis was to give the data, then divide everything by the sum so the results add up to one again”. In this case, you’re saying “Now that we know we have a vanilla cookie, the chance of the cookie coming from CF is 75 (the number of vanilla cookies in CF) over 75 + 50 (the number of vanilla cookies anywhere)”. That’s pretty intuitive when visualizing cookies, but it feels a little weirder when talking about probabilities; those probabilities are just the same counts, divided by the total number of cookies.
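That sentence is the whole algorithm. Here is a hedged Python sketch of it (the function name `bayes_update` and the dictionary layout are my own, not from the text):

```python
def bayes_update(priors, likelihoods):
    """Multiply each prior by its likelihood of the data, then divide
    everything by the sum so the posteriors add up to 1 again."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    p_data = sum(unnormalized.values())  # this is p(D)
    return {h: v / p_data for h, v in unnormalized.items()}

priors = {"CilantroFree": 0.5, "CilantroAdd": 0.5}       # p(H)
p_vanilla = {"CilantroFree": 0.75, "CilantroAdd": 0.50}  # p(D|H)
print(bayes_update(priors, p_vanilla))
# {'CilantroFree': 0.6, 'CilantroAdd': 0.4}
```

The sum in the middle, `p_data`, is the ∑ p(D|Hi) p(Hi) from earlier; dividing by it is the “make them sum to one again” step.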
Where it gets really interesting is if you did this cookie draw again. Imagine that the vanilla cookie was put back in its original jar, and then another cookie was randomly drawn (I can’t think of a story that would justify this, but work with me here). Here’s the deal: you can actually think of those two draws as one piece of data: VanillaVanilla, VanillaChocolate, ChocolateVanilla, or ChocolateChocolate. You simply multiply the original probabilities together (so p(Vanilla, Chocolate & CF) = 37.5% × 12.5%). Here’s how that plays out:
First Draw:           Vanilla                Chocolate
Second Draw:     Vanilla   Chocolate    Vanilla   Chocolate    Total
CilantroFree     14.06%     4.69%        4.69%     1.56%       25.00%
CilantroAdd       6.25%     6.25%        6.25%     6.25%       25.00%
Total            20.31%    10.94%       10.94%     7.81%       50.00%
Note that this table sums to only 50%, not 100%. That’s because each cell multiplies together two single-draw cells, and each of those already includes the 50% chance of picking that jar. In effect, the table re-picks a jar for each draw and only counts the cases where the same jar comes up both times, which happens 50% of the time.
Or, dividing everything by the Total-Total cell again:
First Draw:           Vanilla                Chocolate
Second Draw:     Vanilla   Chocolate    Vanilla   Chocolate    Total
CilantroFree     28.13%     9.38%        9.38%     3.13%       50.00%
CilantroAdd      12.50%    12.50%       12.50%    12.50%       50.00%
Total            40.63%    21.88%       21.88%    15.63%      100.00%
So if you drew VanillaVanilla from the same jar, your probability of it coming from CF is 28.13% / 40.63% = 69.23%. Note that this is higher than the 60% certainty you had with one draw, and any further straight vanilla draws would increase the chance even further.
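Chaining the same update twice reproduces that 69.23% figure. A quick Python sketch (the names are my own; I’m assuming, as in the story, that the jar stays fixed across both draws):

```python
def bayes_update(priors, likelihoods):
    """One Bayes step: prior times likelihood, renormalized to sum to 1."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    p_data = sum(unnormalized.values())
    return {h: v / p_data for h, v in unnormalized.items()}

priors = {"CilantroFree": 0.5, "CilantroAdd": 0.5}
p_vanilla = {"CilantroFree": 0.75, "CilantroAdd": 0.50}  # p(Vanilla | jar)

post = bayes_update(priors, p_vanilla)  # after the first Vanilla: 60% / 40%
post = bayes_update(post, p_vanilla)    # after the second Vanilla
print(round(post["CilantroFree"], 4))
# 0.6923
```

Feeding the posterior back in as the new prior is exactly the “two draws as one piece of data” move, just taken one draw at a time.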
But we’ve already tortured this analogy as far as we can. Cookies in a jar are easy to visualize for one draw, but let’s move on to something that lends itself more easily to repeated experimentation.
Solving the Coin Problem (Simplified)
One confusing part about going from the cookie example to this coin problem is that the hypotheses are now numeric, which is convenient for calculation, but a little confusing. Let’s say that we encounter a weird-shaped coin, and we have literally no idea what the odds are of ‘heads’ versus ‘tails’ (however that’s defined on this coin). Instead of the hypotheses being that you’re drawing from one jar instead of another (both of which exist and have a definite number of cookies of each type), it’s that you’re flipping one of several possible coins (only one of which exists, and which has a single attribute: “probability of heads”). To make this somewhat more concrete, here’s a table like the one we saw for cookies, but for a set of “probability of heads” possibilities:
0 Heads, 0 Tails

Probability of Heads   Heads   Tails   Total
0%                     0       0.2     0.2
25%                    0.05    0.15    0.2
50%                    0.1     0.1     0.2
75%                    0.15    0.05    0.2
100%                   0.2     0       0.2
Total                  0.5     0.5     1
It’s critical that we keep in mind that the leftmost column is for hypotheses, not results. So “25%” is just like “CilantroFree” or “CilantroAdd”, in that it’s the thing we want to figure out, and which we’re going to build confidence in based on data.
Speaking of data, we still have two possible outcomes, now Heads and Tails instead of Vanilla and Chocolate.
We start by looking at our chances when we have 0 flips of the coin. We’re assuming that all hypotheses are equally likely, including the possibility that the coin can never turn up Heads, as well as the possibility that it will always turn up Heads.
Let’s look down the ‘Heads’ column (remember, each of these numbers represents the probability that both the hypothesis and the data are true). We shouldn’t be surprised to see that there’s no chance that the hypothesis that Heads is impossible is true and our flip still ends up Heads. To look at it the other way: if our flip ends up Heads, we have eliminated this hypothesis forever, even if we get straight Tails after that.
What’s interesting is that the opposite is not true. Toward the bottom of the column, we see that the assumption that the coin will always flip Heads does not necessarily go to 100% if we get a Heads. That would be true even if we got 1,000 Heads in a row (or a million, or a billion). As long as there are other hypotheses that could also explain that string of Heads (even the hypothesis that Heads is 50% likely, if you’ve just gotten exceedingly lucky), you can’t say for sure that the 100% Heads hypothesis is true. This is the mathematical reasoning behind the statement “No amount of data can prove a theory, but one data point can refute it”, or the story that people assumed all swans were white, until someone saw a single black swan and *poof*, obliterated a theory that had seemed clear for hundreds of years.
Let’s say we got Heads on that first flip. That’s like filtering out the Tails column, leaving just the Heads (note that there’s now a probability of 1 at the bottom of the Heads column, since that’s what actually happened):
1 Heads, 0 Tails (end of first flip)

Probability of Heads   Heads   Total
0%                     0       0
25%                    0.1     0.1
50%                    0.2     0.2
75%                    0.3     0.3
100%                   0.4     0.4
Total                  1       1
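The same update machinery works for the coin; the only twist is that each hypothesis h is itself the likelihood of a Heads. A Python sketch (the list layout and names are mine; the values match the table above):

```python
hypotheses = [0.0, 0.25, 0.5, 0.75, 1.0]  # candidate values of p(Heads)
priors = [0.2] * 5                        # uniform: no idea which is right

def update(priors, likelihoods):
    """Prior times likelihood for each hypothesis, renormalized."""
    unnorm = [p * lk for p, lk in zip(priors, likelihoods)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# The likelihood of a Heads under hypothesis h is just h itself.
after_one_head = update(priors, hypotheses)
print([round(p, 3) for p in after_one_head])
# [0.0, 0.1, 0.2, 0.3, 0.4]
```

Note how the 0% hypothesis drops to zero permanently, just as the text describes.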
So what if we got a Heads on our first flip, and we wanted to flip again? This is analogous to the end of the Cookie section, where we drew a second cookie from the same jar. Here, we’re getting a second result from the same coin – another output of the same unknown system.
We constructed a somewhat awkward table for the cookies, with a two-layered column header, which showed the probability of every pair of outcomes. For simplicity’s sake, I’m going to ignore the columns where Tails came up first, since we’ve already gotten one Heads:
1 Heads, 0 Tails

Probability of Heads   Heads    Tails    Total
0%                     0        0        0
25%                    0.0125   0.0375   0.05
50%                    0.05     0.05     0.1
75%                    0.1125   0.0375   0.15
100%                   0.2      0        0.2
Total                  0.375    0.125    0.5
What’s interesting here is that the equations are exactly the same, but the prior was copy-pasted from the Heads column of the last table. That’s because, when we flipped that first Heads, we learned more about the coin; for example, we learned that it was impossible for the coin to have a 0% chance of Heads. Now I take that prior, multiply it by the probability of getting another Heads under each hypothesis, and get the probability of HeadsHeads for each of them. Note that the Total-Total is just .5, instead of 1. That’s because we’re looking at a sub-universe in which we got Heads on that first flip (which had a probability of .5, given our priors). I didn’t do the rebalancing that we saw in the one-column table before, so each box here is the probability of HeadsHeads, as judged before we’ve flipped anything.
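Since each flip’s likelihood just multiplies in, the unnormalized HeadsHeads column is simply prior × h × h for each hypothesis h. A sketch (variable names are mine, not from the text):

```python
hypotheses = [0.0, 0.25, 0.5, 0.75, 1.0]
prior = [0.2] * 5  # the original, 0-flip prior

# p(hypothesis AND Heads-Heads), judged before we've flipped anything:
joint_hh = [p * h * h for p, h in zip(prior, hypotheses)]
print([round(j, 4) for j in joint_hh])  # [0.0, 0.0125, 0.05, 0.1125, 0.2]
print(round(sum(joint_hh), 4))          # 0.375
```

The 0.375 is the Heads-column total from the table: the chance, before flipping at all, of getting two Heads in a row.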
Here’s where we start to see the future – we can just as easily do this a third time:
2 Heads, 0 Tails

Probability of Heads   Heads      Tails      Total
0%                     0          0          0
25%                    0.003125   0.009375   0.0125
50%                    0.025      0.025      0.05
75%                    0.084375   0.028125   0.1125
100%                   0.2        0          0.2
Total                  0.3125     0.0625     0.375
Again, the Heads column here represents HeadsHeadsHeads, as judged from the standpoint of not having flipped the coin at all. We can basically say, before flipping anything, “There is a .025 chance that the coin is 50% likely to be heads, and we get HeadsHeadsHeads”.
Let’s say that we did get HeadsHeadsHeads. Then the Total at the bottom becomes 1, and everything else in the column scales up by the same factor. So it would look like this:
2 Heads, 0 Tails

Probability of Heads   Heads
0%                     0
25%                    0.01
50%                    0.08
75%                    0.27
100%                   0.64
Total                  1
So, if we got HeadsHeadsHeads, we’d say there was a .64 chance that the coin was 100% likely to come up heads.
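The whole three-flip chain, normalized at the end, fits in a few lines of Python (a sketch; the list names are my own):

```python
hypotheses = [0.0, 0.25, 0.5, 0.75, 1.0]
prior = [0.2] * 5

# p(hypothesis AND Heads-Heads-Heads), judged before any flips:
joint_hhh = [p * h**3 for p, h in zip(prior, hypotheses)]
p_hhh = sum(joint_hhh)  # about 0.3125: the prior chance of three Heads

# Normalize, now that HeadsHeadsHeads actually happened:
posterior = [j / p_hhh for j in joint_hhh]
print([round(p, 2) for p in posterior])
# [0.0, 0.01, 0.08, 0.27, 0.64]
```

It makes no difference whether you renormalize after every flip or once at the end; the division by the total cancels out either way.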
Solving the Coin Problem
This ‘copy-paste the old posterior as the new prior’ trick makes some intuitive sense (each flip is like a new test, starting from the results of the last), but it gets awkward, particularly if you end up with a mixture of Heads and Tails. But if you remember the fundamental equation that we’ve got going here:
p(H|D) = p(D|H) p(H) / p(D)
We can connect it with the extremely large set of tools for calculating p(D|H). In this case, that’s the binomial distribution.
Probability of the first Heads, as calculated using =BINOMDIST(1,1,HYPOTHESIS,FALSE)*PRIOR:

Probability of Heads   Heads   Tails   Total
0                      0       0.2     0.2
0.25                   0.05    0.15    0.2
0.5                    0.1     0.1     0.2
0.75                   0.15    0.05    0.2
1                      0.2     0       0.2
This matches the other method. Doing the same for three straight Heads, =BINOMDIST(3,3,HYPOTHESIS,FALSE)*PRIOR gives:

Probability of Heads   HHH
0                      0
0.25                   0.003125
0.5                    0.025
0.75                   0.084375
1                      0.2
Total                  0.3125
Then, normalizing:

Probability of Heads   HHH
0%                     0
25%                    0.01
50%                    0.08
75%                    0.27
100%                   0.64
Total                  1
BAM – exactly what we got the other way.
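The same pipeline works in Python, with Excel’s BINOMDIST swapped for a hand-rolled binomial PMF (`binom_pmf` is my own helper, equivalent to BINOMDIST(k, n, p, FALSE)):

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k Heads in n flips when p(Heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

hypotheses = [0.0, 0.25, 0.5, 0.75, 1.0]
prior = [0.2] * 5

# Three flips, three Heads:
joint = [pr * binom_pmf(3, 3, h) for pr, h in zip(prior, hypotheses)]
posterior = [j / sum(joint) for j in joint]
print([round(p, 2) for p in posterior])
# [0.0, 0.01, 0.08, 0.27, 0.64]
```

The payoff of the binomial form is that mixed results stop being awkward: any combination of Heads and Tails is just one call with the right k and n.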
Let’s extend it to 15 Heads and 1 Tails. Because we can!

Probability of Heads   Heads
0%                     0
25%                    0.0000002081245439
50%                    0.004546550037
75%                    0.9954532418
100%                   0
100%, which had been in the lead, is thrown out by that single Tails. 75% is now the clear favorite.
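For completeness, here is the same Python sketch run on the 15-Heads, 1-Tails data behind the table above (again, `binom_pmf` is my own stand-in for BINOMDIST):

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k Heads in n flips when p(Heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

hypotheses = [0.0, 0.25, 0.5, 0.75, 1.0]
prior = [0.2] * 5

# 16 flips: 15 Heads and a single Tails.
joint = [pr * binom_pmf(15, 16, h) for pr, h in zip(prior, hypotheses)]
posterior = [j / sum(joint) for j in joint]
for h, p in zip(hypotheses, posterior):
    print(f"{h:.0%}  {p:.10f}")
```

Both the 0% and the 100% hypotheses come out at exactly zero: each has now been refuted by at least one observation, while 75% soaks up nearly all the remaining probability.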