# False Positives and False Negatives

## Test Says "Yes" ... or does it?

When you have a test that can say "Yes" or "No" (such as a medical test), you have to think:

- It could be **wrong** when it says "Yes".
- It could be **wrong** when it says "No".

### Wrong?

It is like being told you **did** something when you **didn't**!

Or being told you **didn't** do it when you really **did**.

They each have a special name: **"False Positive"** and **"False Negative"**:

| | They say you did | They say you didn't |
|---|---|---|
| You really did | They are right! | "False Negative" |
| You really didn't | "False Positive" | They are right! |
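The four cells of the table can be written as a tiny function. This is only an illustrative sketch: the name `classify` and its two boolean parameters are made up for this example and are not part of any library.

```python
def classify(actual: bool, predicted: bool) -> str:
    """Name the outcome of a single yes/no test result.

    actual    -- what is really true ("you really did")
    predicted -- what the test says ("they say you did")
    """
    if predicted and actual:
        return "true positive"    # test says "Yes" and it is right
    if predicted and not actual:
        return "false positive"   # test says "Yes" but it is wrong
    if not predicted and actual:
        return "false negative"   # test says "No" but it is wrong
    return "true negative"        # test says "No" and it is right


print(classify(actual=False, predicted=True))   # false positive
print(classify(actual=True, predicted=False))   # false negative
```

Note that "positive" and "negative" describe what the test says, while "true" and "false" describe whether the test was right.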

Here are some examples of "false positives" and "false negatives":

- **Airport Security**: a "false positive" is when ordinary items such as keys or coins get mistaken for weapons (machine goes "beep")
- **Quality Control**: a "false positive" is when a good quality item gets rejected, and a "false negative" is when a poor quality item gets accepted. (A "positive" result means there IS a defect.)
- **Antivirus software**: a "false positive" is when a normal file is thought to be a virus
- **Medical screening**: low-cost tests given to a large group can give many false positives (saying you have a disease when you don't), and you are then asked to take a more accurate test.

But many people don't understand the true numbers behind "Yes" or "No", like in this example:

## Example: Allergy or Not?

Hunter says she is itchy. There is a test for Allergy to Cats, but this test is not always right:

- For people that **really do** have the allergy, the test says "Yes" **80%** of the time
- For people that **do not** have the allergy, the test says "Yes" **10%** of the time ("false positive")

Here it is in a table:

| | Test says "Yes" | Test says "No" |
|---|---|---|
| Have allergy | 80% | 20% "False Negative" |
| Don't have it | 10% "False Positive" | 90% |

Question: If 1% of the population have the allergy, and **Hunter's test says "Yes"**, what are the chances that Hunter really has the allergy?

Do you think 75%? Or maybe 50%?

A similar test was given to Doctors and most guessed around 75% ...

... but they were very wrong!

(Source: "Probabilistic reasoning in clinical medicine: Problems and opportunities" by David M. Eddy 1982, which this example is based on)

There are three different ways to solve this:

- "Imagine a 1000"
- "Tree Diagrams"
- "Bayes' Theorem"

Use whichever you prefer. Let's look at them now:

### Try Imagining A Thousand People

When trying to understand questions like this, just imagine a large group (say 1000) and play with the numbers:

- Of 1000 people, only **10** really have the allergy (1% of 1000 is 10)
- The test is 80% right for people who **have** the allergy, so it will get **8 of those 10 right**.
- But 990 **do not** have the allergy, and the test will say "Yes" to 10% of them, which is **99 people** it says "Yes" to **wrongly** (false positive)
- So out of 1000 people the test says "**Yes**" to (8+99) = **107 people**

As a table:

| | 1% have it | Test says "Yes" | Test says "No" |
|---|---|---|---|
| Have allergy | 10 | 8 | 2 |
| Don't have it | 990 | 99 | 891 |
| Total | 1000 | 107 | 893 |

So 107 people get a "Yes" but only 8 of those really have the allergy:

8 / 107 = about 7%

So, even though Hunter's test said "Yes", it is still only **7% likely** that Hunter has a Cat Allergy.
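The same head count can be reproduced in a few lines of Python. This is a minimal sketch; the variable names are chosen for this example, and the numbers are the ones given for Hunter's test.

```python
population = 1000
prevalence = 0.01            # 1% of people really have the allergy
sensitivity = 0.80           # test says "Yes" for 80% of people who have it
false_positive_rate = 0.10   # test says "Yes" for 10% of people who don't

have = round(population * prevalence)                 # 10 people
dont_have = population - have                         # 990 people
true_yes = round(have * sensitivity)                  # 8 correct "Yes" results
false_yes = round(dont_have * false_positive_rate)    # 99 wrong "Yes" results
all_yes = true_yes + false_yes                        # 107 "Yes" results in total

chance = true_yes / all_yes
print(f"{true_yes} of {all_yes} 'Yes' results are real: {chance:.1%}")
```

The printed chance is 7.5% to one decimal place, which the text rounds to "about 7%".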

Why so small? Well, the allergy is so rare that those who actually have it are greatly **outnumbered** by those with a false positive.
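To see how much the rarity matters, the count can be repeated for a more common allergy while keeping the same test. The function below is a hypothetical helper written for this illustration, and the figures of 100 and 500 people are made-up comparison values, not from the example.

```python
def real_share_of_yes(have, population=1000,
                      sensitivity=0.80, false_positive_rate=0.10):
    """Fraction of "Yes" results that belong to people who really have it."""
    true_yes = have * sensitivity                            # correct "Yes" results
    false_yes = (population - have) * false_positive_rate    # false positives
    return true_yes / (true_yes + false_yes)


for have in (10, 100, 500):
    share = real_share_of_yes(have)
    print(f"{have} of 1000 have it: {share:.0%} of 'Yes' results are real")
```

With 10 in 1000 affected the share is about 7%, with 100 it is about 47%, and with 500 it is about 89%. The test is identical each time; only the number of unaffected people available to produce false positives changes.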

### As A Tree

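Drawing a tree diagram can really help. The first pair of branches splits people into "have the allergy" (1%) and "don't have it" (99%), and each of those splits again into test says "Yes" or "No". Multiplying along each path gives the four end results:

- Have allergy, test says "Yes": 1% × 80% = 0.8%
- Have allergy, test says "No": 1% × 20% = 0.2%
- Don't have it, test says "Yes": 99% × 10% = 9.9%
- Don't have it, test says "No": 99% × 90% = 89.1%

First of all, let's check that all the percentages add up: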

0.8% + 0.2% + 9.9% + 89.1% = **100%** (good!)

And the two "Yes" answers add up to 0.8% + 9.9% = **10.7%**, but only 0.8% are correct.

0.8/10.7 = about **7%** (same answer as above)
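The tree can also be checked in code by multiplying along each path. This is a sketch under the numbers of the allergy example; the dictionary layout and names are just one convenient way to hold the four branch ends.

```python
p_has = 0.01          # first branch: really has the allergy
p_yes_if_has = 0.80   # second branch, given the allergy
p_yes_if_not = 0.10   # second branch, given no allergy

# Each leaf is the product of the probabilities along its path.
leaves = {
    ("has", "yes"): p_has * p_yes_if_has,
    ("has", "no"): p_has * (1 - p_yes_if_has),
    ("not", "yes"): (1 - p_has) * p_yes_if_not,
    ("not", "no"): (1 - p_has) * (1 - p_yes_if_not),
}

for (allergy, result), p in leaves.items():
    print(f"{allergy:>3} / {result:<3}: {p:.1%}")

total = sum(leaves.values())                            # should be 100%
p_yes = leaves[("has", "yes")] + leaves[("not", "yes")]   # both "Yes" leaves
answer = leaves[("has", "yes")] / p_yes                 # correct "Yes" share

print(f"total = {total:.1%}, all Yes = {p_yes:.1%}, answer = {answer:.1%}")
```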

### Bayes' Theorem

Bayes' Theorem has a special formula for this kind of thing:

$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(A)\,P(B \mid A) + P(\text{not } A)\,P(B \mid \text{not } A)}$$

where:

- P means "Probability of"
- | means "given that"
- A in this case is "actually has the allergy"
- B in this case is "test says Yes"

So:

**P(A|B)** means "The probability that Hunter actually has the allergy given that the test says Yes"

**P(B|A)** means "The probability that the test says Yes given that Hunter actually has the allergy"

To be clearer, let's change A to **has** (actually has allergy) and B to **Yes** (test says yes):

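$$P(\text{has} \mid \text{Yes}) = \frac{P(\text{has})\,P(\text{Yes} \mid \text{has})}{P(\text{has})\,P(\text{Yes} \mid \text{has}) + P(\text{not has})\,P(\text{Yes} \mid \text{not has})}$$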

And put in the numbers:

$$P(\text{has} \mid \text{Yes}) = \frac{0.01 \times 0.8}{0.01 \times 0.8 + 0.99 \times 0.1} = \frac{0.008}{0.107} = 0.0748\ldots$$

Which is about **7%**
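As a quick check, the formula can be written as a small function. This is a minimal sketch: the function name `bayes` and its parameter names are chosen for this example. It uses Python's standard `fractions.Fraction` so the result comes out as an exact fraction.

```python
from fractions import Fraction


def bayes(p_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) from P(A), P(B|A) and P(B|not A)."""
    numerator = p_a * p_b_given_a
    denominator = numerator + (1 - p_a) * p_b_given_not_a
    return numerator / denominator


# A = "actually has the allergy", B = "test says Yes"
p_has_given_yes = bayes(
    p_a=Fraction(1, 100),             # P(has) = 1%
    p_b_given_a=Fraction(8, 10),      # P(Yes|has) = 80%
    p_b_given_not_a=Fraction(1, 10),  # P(Yes|not has) = 10%
)

print(p_has_given_yes)                   # 8/107
print(f"{float(p_has_given_yes):.4f}")   # 0.0748
```

The exact answer 8/107 is the same "8 out of 107" found by imagining a thousand people.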

Learn more about this at Bayes' Theorem.

## One Last Example

### Extreme Example: Computer Virus

A computer virus spreads around the world, with every infected computer reporting to a master computer.

The good guys capture the master computer and find that a million computers have been infected (but don't know which ones).

Governments decide to take action!

No one can use the internet until their computer passes the "virus-free" test. The test is 99% accurate (pretty good, right?), but 1% of the time it says you have the virus when you don't (a "false positive").

Now let's say there are **1000 million** internet users.

- Of 1 million **with** the virus 99% of them get correctly banned = about **1 million**
- But false positives are 999 million x 1% = about **10 million**

So a total of about **11 million** get banned, but only about 1 in every 11 of those actually has the virus.

**So if you get banned there is only a 9% chance you actually have the virus!**
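Without the rounding, the numbers work out almost the same. The sketch below uses whole-number arithmetic and assumes, as the example does, that the test catches 99% of infected computers and wrongly flags 1% of clean ones; the variable names are made up for this illustration.

```python
users = 1_000_000_000    # 1000 million internet users
infected = 1_000_000     # computers that really have the virus
clean = users - infected # 999 million without the virus

true_bans = infected * 99 // 100   # 99% of infected are correctly banned
false_bans = clean * 1 // 100      # 1% of clean are banned by mistake
all_bans = true_bans + false_bans

chance = true_bans / all_bans
print(f"correctly banned: {true_bans:,}")    # 990,000
print(f"wrongly banned:   {false_bans:,}")   # 9,990,000
print(f"chance a banned user is infected: {chance:.1%}")
```

That is 990,000 real infections among 10,980,000 bans, which is where the figure of about 9% comes from.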

## Conclusion

When dealing with false positives and false negatives (or other tricky probability questions) we can use these methods:

- Imagine you have 1000 (of whatever),
- Make a tree diagram, or
- Use Bayes' Theorem