I was partway through Leonard Mlodinow’s entertaining The Drunkard’s Walk (How Randomness Rules Our Lives) when the author’s solution to an intriguing problem didn’t sit quite right with me.
Part one goes something like this:
Let us suppose that a distant cousin has two children. You know one or both are girls, and you are trying to remember which it is – one or both? In a family with two children, what are the chances, if one of the children is a girl, that both children are girls?
The intuitive, and incorrect, notion is to figure that we know one of the children is a girl, and we know the chance of the other being a girl is 50 percent, so the probability that both are girls is 50 percent. Mlodinow dispenses with this reasoning:
Although the statement of the problem says that one child is a girl, it doesn’t say which one, and that changes things…The new information – one of the children is a girl – means that we are eliminating from consideration the possibility that both children are boys….That leaves only 3 outcomes in the sample space: (girl, boy), (boy, girl), and (girl, girl).
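The pruned sample space is small enough to check by brute force. Here is a minimal sketch in Python, assuming boys and girls are equally likely and the two births are independent (the function names are mine, purely for illustration):

```python
from fractions import Fraction
from itertools import product

def families_with_a_girl():
    """All ordered (older, younger) families that contain at least one girl."""
    # BB, BG, GB, GG are assumed equally likely; drop BB.
    return [f for f in product("BG", repeat=2) if "G" in f]

def two_girl_probability():
    """P(both girls | at least one girl), counting equally likely outcomes."""
    remaining = families_with_a_girl()
    both = [f for f in remaining if f == ("G", "G")]
    return Fraction(len(both), len(remaining))

print(families_with_a_girl())   # [('B', 'G'), ('G', 'B'), ('G', 'G')]
print(two_girl_probability())   # 1/3
```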
Since all three of these outcomes are equally likely, the chance of two girls (girl, girl) is 1 in 3, or 33 percent. So far, so good. A few chapters later, he offers a twist.
The variant is this: in a family with two children, what are the chances, if one of the children is a girl named Florida, that both children are girls?…I picked (Florida) rather carefully, because part of the riddle is the question, what, if anything, about the name Florida affects the odds?…are the chances of two girls still 1 in 3?…The fact that one of the girls is named Florida changes the chances to 1 in 2.
What. The. Heck. Why would it matter if we knew her name? Every girl has a name! What if we were looking for Allisons or Bridgets? Even the New York Times reviewer had trouble believing it.
The author proceeds with his explanation, offering the statistic that only about 1 in 1 million girls are named Florida.
Since our original sample space should be a list of all the possibilities, in this case it is a list of both gender and name. Denoting “girl-named-Florida” by girl-F and “girl-not-named-Florida” by girl-NF, we write the sample space (boy, girl-F), (girl-F, boy), (girl-NF, girl-F), and (girl-F, girl-NF) [I am skipping the steps in which he prunes (boy, boy) because we know there is at least one girl, and (girl-F, girl-F) as being irrelevantly unlikely], which are, to a very good approximation, equally likely. Since 2 of the 4, or half, of the elements in the sample space are families with two girls, the answer is not 1 in 3 – as it was in the two-daughter problem – but 1 in 2.
Imagine that we gather into a very large room 75 million families that have two children, at least one of whom is a girl. As the two-daughter problem taught us, there will be about 25 million two-girl families in that room and 50 million one-girl families (25 million in which the girl is the older child and an equal number in which she is the younger). Next comes the pruning: we ask that only the families that include a girl named Florida remain. Since Florida is a 1 in 1 million name, about 50 of the 50 million one-girl families will remain. And of the 25 million two-girl families, 50 of them will also get to stay, 25 because their firstborn is named Florida and another 25 because their younger girl has that name. It’s as if the girls are lottery tickets and the girls named Florida are the winning tickets. Although there are twice as many one-girl families as two-girl families, the two-girl families each have two tickets, so the one-girl families and the two-girl families will be about equally represented among the winners.
I reread that paragraph ten times before I started to understand, but I couldn’t shake the fact that the “1 in a million” chance wasn’t factored in anywhere – the rate of two-girl families had simply changed from 1/3 to 1/2. What if the chance of a girl being named Florida was 1 in 1,000? 1 in 10? 1 in 2?
The “1 in 2” thinking set me on the right path. Let’s plug the assumption that 50 percent of girls are named Florida into his lottery numbers above. Of the 25 million two-girl families, 25 million will get to stay, 12.5 million because their firstborn is named Florida and 12.5 million because their younger girl has that name.
Notice anything wrong? Of course you do. Not all 25 million two-girl families will have a girl named Florida. It’s like saying that if there’s a 50% chance of rain on Friday and a 50% chance of rain on Saturday then we’re 100% in for a rainy weekend. You can’t simply add probabilities; you have to approach it another way: the chance of neither girl being named Florida is 25% (.5 x .5), therefore the chance of at least one Florida is 75%, not 100%.
Mlodinow does mention that we should assume parents won’t give their kids the same name, which breaks the common “independent” clause of simple statistics. For instance, if the oldest is named Florida, the chance that their second child will be named Florida is now zero. Think of it as sampling without replacement. After thinking on it, I realized that this doesn’t affect the calculations at all, since we don’t care about the total number of Floridas, just the number of families with a Florida. If a family names their oldest child Florida, we don’t care what they name their second child. Florida or not, we’re counting that family anyway. When I Googled this problem, this was a huge source of controversy, but my stance is that it can be safely ignored.
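Here is a quick numerical check of the rain-style correction, treating the two girls’ names as independent draws (the simplification argued for above). This is only a sketch; `naive_sum` and `at_least_one` are names I made up for illustration:

```python
def naive_sum(p, girls=2):
    """The mistaken approach: add the per-girl chances together."""
    return p * girls

def at_least_one(p, girls=2):
    """Chance that at least one of `girls` independent girls is named Florida."""
    return 1 - (1 - p) ** girls

p = 0.5
print(naive_sum(p))      # 1.0, i.e. "every two-girl family stays"
print(at_least_one(p))   # 0.75

two_girl_families = 25_000_000
print(round(two_girl_families * at_least_one(p)))  # 18750000
```

With a 1 in 2 name, about 18.75 million of the 25 million two-girl families stay in the room, not all 25 million.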
The logic is more subtle but no less important when dealing with a 1 in 1 million chance. The chance that both children are girls, given that one of the children is a girl named Florida, ranges from 1/2 at near-zero Florida rates to 1/3 at a 100% Florida rate. (If all girls are named Florida, all 25 million two-girl and 50 million one-girl families will remain.)
|Odds a girl is named Florida||# One-girl families with Florida||# Two-girl families with Florida||% Two-girl families|
|1 in 1,000,000||50||<50||49.9999875%|
|1 in 1,000||50,000||49,975||49.987%|
|1 in 10||5,000,000||4,750,000||48.7179%|
|1 in 2||25,000,000||18,750,000||42.85%|
|9 in 10||45,000,000||24,750,000||35.48%|
|10 in 10||50,000,000||25,000,000||33.3%|
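The table can be rebuilt from two short formulas. This is a sketch under the assumptions of the room example (75 million families with at least one girl, each girl independently named Florida with probability `p`); the constants and function names are illustrative, not from the book:

```python
from fractions import Fraction

ONE_GIRL = 50_000_000   # families with exactly one girl
TWO_GIRL = 25_000_000   # families with two girls

def remaining_families(p):
    """Expected (one-girl, two-girl) families that include a Florida."""
    one = ONE_GIRL * p
    two = TWO_GIRL * (1 - (1 - p) ** 2)   # at least one of two girls
    return one, two

def two_girl_share(p):
    """Share of the remaining families that have two girls."""
    one, two = remaining_families(p)
    return two / (one + two)

for p in [Fraction(1, 1_000_000), Fraction(1, 1000), Fraction(1, 10),
          Fraction(1, 2), Fraction(9, 10), Fraction(1)]:
    one, two = remaining_families(p)
    print(f"{str(p):>11}  {float(one):>12,.0f}  {float(two):>14,.4f}  "
          f"{float(two_girl_share(p)):.7%}")
```

Under these assumptions the share simplifies to (2 - p) / (4 - p), which slides smoothly from just under 1/2 (tiny p) down to 1/3 (p = 1).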
The answer isn’t 50%, it’s 49.9999875%
This is important not because of the extremely small difference in value, but because the author doesn’t adequately explain that the solution exists on a continuum based on the rate of girls-named-Florida, and instead makes it sound as though the rate jumps from exactly 1/3 to exactly 1/2, which is not the case. The percent of two-girl families asymptotically approaches 1/2 at near-zero Florida rates but will never hit it. I think the author understands this, but from searching for further discussions online it’s clear that not everyone does.
Jolly Blogger took a stab at it in one comment thread.
Holy crap, I’ve just skimmed most of the comments, but I think you guys are over thinking this. Work through it just like the first problem.
Here are all of the possibilities for two children if we allow three “types” (B-boy, F-girl named Florida, G-girl not named Florida):
BB, BF, BG
GB, GF, GG
FB, FF, FG
The problem says we need at least one F, which leaves us with: BF, GF, FB, FG, FF
We need to make one assumption: that Florida is a rare name, and the probability of two Floridas in one family is nearly zero, so now we have four equally likely possibilities: BF, GF, FB, FG
Two of these are 2 girls, so the probability is 2/4 or 1/2.
It may be more accurate to say the probability is ever so slightly greater than 1/2… and assuming nothing about the probability of the name Florida, we can say the probability of two girls lies somewhere between 1/2 and 3/5.
So close and yet so far. (It’s slightly less than 1/2, not slightly more.) His error lies here: “we have four equally likely possibilities: BF, GF, FB, FG”. At near-zero Florida rates, those options are close to (but not exactly) equally likely. However, as you raise the rate of Floridas you will see “BF” and “FB” becoming more likely than “GF” and “FG”. I ran 100 million computer simulations of two-child families and threw out families without a girl named Florida:
|Odds a girl is named Florida||BF||GF||FB||FG|
|1 in 1,000,000||~25%||~25%||~25%||~25%|
|1 in 1,000||24.9%||25.2%||25.0%||24.9%|
|1 in 10||25.15%||24.84%||25.14%||24.87%|
|1 in 2||33.33%||16.66%||33.34%||16.66%|
|9 in 10||45.46%||4.54%||45.45%||4.55%|
|10 in 10||50%||0%||50%||0%|
As the Florida rate rises, the chances of GF and FG drop. At 100%, there’s no such thing as GF or FG – all girls are named Florida.
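For anyone who wants to try this at home, here is a rough sketch of such a simulation. It is not the code behind the table above. It assumes each child is a boy or girl with equal chance, each girl is independently named Florida with probability `p`, and two-Florida families (FF) are set aside so that only the four listed outcomes are compared. The function name, default family count, and seed are all my own choices:

```python
import random

def simulate(p, n_families=200_000, seed=0):
    """Share of BF, GF, FB, FG among simulated families with exactly one Florida."""
    rng = random.Random(seed)

    def child():
        # 'B' = boy, 'F' = girl named Florida, 'G' = any other girl.
        if rng.random() < 0.5:
            return "B"
        return "F" if rng.random() < p else "G"

    counts = {"BF": 0, "GF": 0, "FB": 0, "FG": 0}
    for _ in range(n_families):
        family = child() + child()    # older child first
        if family in counts:          # drops BB, BG, GB, GG and also FF
            counts[family] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no qualifying families; increase n_families")
    return {k: v / total for k, v in counts.items()}

for p in (0.5, 0.9):
    shares = simulate(p)
    print(p, {k: f"{v:.1%}" for k, v in shares.items()})
```

Expect the output to wobble from run to run with different seeds. Rare names need far more families than this default before any Floridas show up at all, which is why a 1 in 1 million name calls for a run in the hundred-million range.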
Are we out of the woods yet?
My second beef with the author is that, after presenting us the solution to this mind-bender, he doesn’t explain why the seemingly irrelevant rarity of a girl’s name affects the probabilities above.
It took me a while to wrap my head around it, but let me present a scenario that will help. Florida State is playing Miami in a basketball game and it’s tied with one second left. Florida State is on the free throw line. What are the chances that FSU will win the game assuming they have one, or two, free throws left? First of all, we’ll need to know the free throw percentage of the player about to shoot. If it’s 98%, the surprising revelation is you don’t really care whether he has 1 or 2 shots. FSU is almost certain to win. With 1 shot, the team’s chance of winning is 98%, and with 2 shots it nudges up to 99.96% – just a 2% rise.
What if he’s a 60% shooter? Then the chances of a win are 60% with one free throw and 84% with 2 shots – a 40% rise! See where I’m going with this? As the shooting percent falls, it’s more and more important to get that second shot. Indeed, a 10% shooter will benefit 90% by taking two shots instead of one. Remember, we’re not concerned with the number of shots made, just that he makes at least one.
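The free throw numbers come from the same “at least one” calculation. A small sketch, assuming the two shots are independent and that making either one wins the game (`win_chance` is just an illustrative name):

```python
def win_chance(p, shots):
    """Chance of making at least one of `shots` independent free throws."""
    return 1 - (1 - p) ** shots

for p in (0.98, 0.60, 0.10):
    one, two = win_chance(p, 1), win_chance(p, 2)
    gain = two / one - 1          # relative improvement from the second shot
    print(f"{p:.0%} shooter: {one:.2%} -> {two:.2%} (+{gain:.0%})")
```

The relative gain works out to 1 - p, so the worse the shooter, the closer the second shot comes to doubling his chances.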
Back to Mlodinow’s “lottery” theory. As the rate of girls named Florida falls, families with two girls benefit disproportionately more than families with one girl when looking for a girl named Florida (or any independent rare condition: it could be a name, disease, or IQ), just like a poor basketball shooter benefits from having two free throws more than a great shooter does. The rate of benefit of the extra lotto ticket will approach 100% as the Florida rate approaches zero, which is why the share of two-girl families climbs if you know that one of their girls has the 1 in 1 million name.
So there you have it.