# Sampling

- Population: The whole group we are interested in
- Census: A collection of data from the whole population
- Sample: A collection of data from **part** of the population

But how do we choose what members of the population to sample?

There are 4 main methods:

- **Random Sample**: pick randomly from a list
- **Systematic Sample**: pick by a fixed rule, such as every 4th
- **Stratified Sample**: pick randomly, but in ratio to group size
- **Cluster Sample**: choose whole groups randomly

## Random Sampling

The **best** way to choose is **randomly**.

Imagine slips of paper, each with a person's name. Put all the slips into a barrel, mix them up, then reach in and pull out some slips.

But this means you need a **full list of the population** to choose from.

Computer databases can be a big help here!

### Example: You want to know the favorite colors for people at your school, but don't have the time to ask everyone.

Get a full list of students printed out, then:

- place all the pages on the ground, drop a pencil, and note down the name it lands on.
- repeat until you have 50 names.
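If the list is on a computer, the pencil can be replaced by a random number generator. Here is a minimal sketch, assuming a made-up list of 600 students (the names, the list size and the `simple_random_sample` helper are all hypothetical):

```python
import random

# Hypothetical full list of students; a real list would come from school records.
students = [f"Student {i}" for i in range(1, 601)]

def simple_random_sample(population, size, seed=None):
    """Pick `size` distinct members, each equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, size)

chosen = simple_random_sample(students, 50, seed=1)
print(len(chosen))       # 50
print(len(set(chosen)))  # 50, because nobody is picked twice
```

Unlike the pencil, `sample` never picks the same name twice, so there is no need to re-drop for repeats.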

Your results will *hopefully* be nearly as good as if you had asked everyone.

Random surveys are the best way to avoid bias.

And your results are better when you ask more people.

Example: nationwide opinion polls survey around 2,000 people, and for a truly random sample the results are nearly as good (usually within about 2 percentage points) as asking everyone.
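To see how sample size affects accuracy, here is a rough sketch using the standard textbook approximation for a yes/no question. It assumes a simple random sample from a large population and the worst case of a 50/50 split. The function name `margin_of_error` and the 1.96 multiplier (for 95% confidence) are choices made for this illustration.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion (simple random sample)."""
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    return z * standard_error

for n in (100, 500, 2000):
    # Show the margin in percentage points, rounded to one decimal place.
    print(n, round(100 * margin_of_error(n), 1))
# 100 9.8
# 500 4.4
# 2000 2.2
```

With 2,000 people the typical error (the standard error) is about 1.1 percentage points, and the 95% margin is roughly twice that. Asking four times as many people only halves the error.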

## Systematic Sampling

This is where we follow some system of selection, like "every 10th person".

### Example: You want to know the favorite colors for people at your school, but don't have the time to ask everyone.

Solution: stand at the gate and choose "every 4th person to arrive"

Not perfect, as you will miss out on people who are away.

You could improve this by selecting from a full list of people, then going to find them.
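Choosing "every 4th" from a list can be sketched like this. The list of arrivals is made up, and starting at a random position within the first four is an extra assumption added here so that the first person is not always picked:

```python
import random

def systematic_sample(population, step, seed=None):
    """Take every `step`-th member, starting at a random spot in the first `step`."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # random start: 0, 1, ..., step - 1
    return population[start::step]

# Hypothetical arrival order at the gate.
arrivals = [f"Person {i}" for i in range(1, 41)]
picked = systematic_sample(arrivals, step=4, seed=0)
print(len(picked))  # 10, since every 4th of 40 arrivals is 10 people
```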

## Stratified Sampling

This is where we divide the population into groups by some characteristic, such as age, occupation or gender.

Then we make sure our survey includes people from each group in proportion to how many there are in the whole population.

### Example: Survey 100 People in Our Town

We know that teachers make up 7% of our town's population, so we should include:

100 × 7% = 7 teachers
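Here is one way the teacher example might look in code. It is only a sketch: the town of 1,000 people, the group names and the `stratified_sample` helper are invented for illustration, and the rounding is deliberately naive.

```python
import random

def stratified_sample(groups, total_size, seed=None):
    """Draw from each group in proportion to its share of the population.

    `groups` maps a group name to the list of its members.
    Rounding is naive, so sizes may not always add up to `total_size`.
    """
    rng = random.Random(seed)
    population_size = sum(len(members) for members in groups.values())
    sample = {}
    for name, members in groups.items():
        size = round(total_size * len(members) / population_size)
        sample[name] = rng.sample(members, size)  # random within each group
    return sample

# Hypothetical town of 1,000 people in which 7% are teachers.
town = {
    "teachers": [f"Teacher {i}" for i in range(70)],
    "others": [f"Resident {i}" for i in range(930)],
}
sample = stratified_sample(town, total_size=100, seed=2)
print({name: len(members) for name, members in sample.items()})
# {'teachers': 7, 'others': 93}
```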

### Example: We want to survey 300 people in the USA

This is the population breakdown for the USA in 2010:

Age Range | Percent |
---|---|
0-4 | 6.5% |
5-17 | 17.5% |
18-23 | 9.9% |
24-44 | 26.7% |
45-64 | 26.4% |
65+ | 13.0% |
Total | 100% |

We want to survey 300 people, so we choose:

Age Range | Percent | People |
---|---|---|
0-4 | 6.5% | 20 |
5-17 | 17.5% | 52 |
18-23 | 9.9% | 30 |
24-44 | 26.7% | 80 |
45-64 | 26.4% | 79 |
65+ | 13.0% | 39 |
Total | 100% | 300 |
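Simple rounding does not always add up to exactly 300 (both 19.5 and 52.5 sit right on the edge). One way to reproduce the People column above is to round everything down, then hand out the leftover places to the groups with the largest remainders. This is a sketch of that idea, not necessarily how the table was made. The `allocate` helper is hypothetical, and shares are written in tenths of a percent (6.5% becomes 65) to keep the arithmetic in whole numbers.

```python
def allocate(total_size, shares_in_tenths_of_percent):
    """Split `total_size` across groups in proportion to their shares.

    Shares are whole numbers of tenths of a percent and must add up to 1000.
    """
    names = list(shares_in_tenths_of_percent)
    counts, remainders = {}, {}
    for name in names:
        # Whole people, plus the fraction (in thousandths) lost by rounding down.
        counts[name], remainders[name] = divmod(
            total_size * shares_in_tenths_of_percent[name], 1000
        )
    # Hand out the places lost to rounding down, largest remainder first.
    leftover = total_size - sum(counts.values())
    for name in sorted(names, key=lambda n: -remainders[n])[:leftover]:
        counts[name] += 1
    return counts

shares = {"0-4": 65, "5-17": 175, "18-23": 99, "24-44": 267, "45-64": 264, "65+": 130}
print(allocate(300, shares))
# {'0-4': 20, '5-17': 52, '18-23': 30, '24-44': 80, '45-64': 79, '65+': 39}
```

The 0-4 and 5-17 groups tie with a remainder of one half, and this sketch gives the extra place to whichever is listed first, which is an arbitrary choice.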

I am not sure how to ask the 0-4 range, but we will think of something.

## Cluster Sampling

We break the population into many groups, then randomly choose whole groups.

Example: we divide the town into many different zones, then randomly choose 5 zones and survey everyone in those zones.
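The zone example can be sketched as follows, assuming an invented town of 20 zones with 30 residents each (the zone names, sizes and `cluster_sample` helper are all hypothetical):

```python
import random

def cluster_sample(clusters, number_of_clusters, seed=None):
    """Randomly choose whole clusters and return everyone in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), number_of_clusters)
    people = [person for zone in chosen for person in clusters[zone]]
    return chosen, people

# Hypothetical town: 20 zones with 30 residents each.
town_zones = {
    f"Zone {z:02d}": [f"Resident {z}-{i}" for i in range(30)]
    for z in range(1, 21)
}
zones, surveyed = cluster_sample(town_zones, 5, seed=3)
print(len(zones), len(surveyed))  # 5 150
```

Notice that the random choice happens only once, at the level of zones. Everyone inside a chosen zone is surveyed.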

Cluster sampling works best when the clusters are similar in character to each other.

Example: if the town has rich and poor zones, try to find a new way of dividing the town into fairer regions. (Also a good idea in general!)