Sampling

Sampling is a technique used in statistics to select a subset (sample) of individuals from a statistical population in order to make inferences about the characteristics of the population as a whole.

Sampling is useful because it is often impossible or impractical to study an entire population; doing so may be too difficult or too expensive. For example, if a university has 40,000 students, it would not be practical to compile height data on every single student in order to draw conclusions about the height characteristics of the student population. Instead, a researcher may collect data from a random sample of students to estimate characteristics of the student body.

For any inferences or estimates made about the population to be valid, the sample must be representative of the population. Many sampling methods exist, and some produce samples that represent a population more effectively than others. Sampling methods are generally categorized as either probability sampling or non-probability sampling.

Probability sampling

Probability sampling is a sampling method in which each element of the population has a determinable, non-zero chance of being selected in a random sample. Because these probabilities are known, characteristics of the selected elements can be weighted according to each element's probability of selection. There are several types of probability sampling, all of which rely on random selection.
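
One standard way to use known selection probabilities is the Horvitz-Thompson estimator, which weights each sampled value by the inverse of its selection probability. The sketch below assumes Poisson sampling (each element is included independently with its own known probability); the function names, the population of integers 1 to 100, and the probabilities are hypothetical choices for illustration, not a prescribed method.

```python
import random

def poisson_sample(population, probs, rng):
    """Include each element independently with its own known, non-zero probability.

    Returns (value, probability) pairs so the probabilities can be used for weighting.
    """
    return [(x, p) for x, p in zip(population, probs) if rng.random() < p]

def horvitz_thompson_total(sample):
    """Estimate the population total by weighting each sampled value by 1 / p."""
    return sum(x / p for x, p in sample)

# Hypothetical setup: values above 50 are four times as likely to be selected.
rng = random.Random(1)
population = list(range(1, 101))  # true total is 5050
probs = [0.05 if x <= 50 else 0.2 for x in population]

estimates = [horvitz_thompson_total(poisson_sample(population, probs, rng)) for _ in range(2000)]
print(sum(estimates) / len(estimates))  # averages close to 5050 across repetitions
```

An unweighted estimate, such as the sample mean multiplied by 100, would overstate the total in this setup, because values above 50 are over-represented in the sample; dividing by each element's probability corrects for that.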

Simple random sampling

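Simple random sampling (SRS) is a sampling method in which each element of a population has an equal probability of being selected (more precisely, every possible sample of a given size is equally likely). Because selection does not favor any element, SRS is an unbiased sampling method, and it is also used as a building block within other, more complex sampling methods.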
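
A minimal sketch of SRS in Python uses the standard library's random.sample, which draws elements without replacement so that every subset of the requested size is equally likely. The helper name and the student ID list are hypothetical, standing in for the 40,000-student example above.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements without replacement; each element is equally likely to be chosen."""
    rng = random.Random(seed)  # seeded generator for reproducible draws
    return rng.sample(population, n)

# Hypothetical IDs for a university with 40,000 students.
student_ids = list(range(1, 40_001))
sample = simple_random_sample(student_ids, 500, seed=42)
print(len(sample), len(set(sample)))  # 500 500 (no repeats)
```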

Advantages of SRS:

Disadvantages of SRS:

Even an unbiased method such as SRS can produce an unrepresentative sample by chance. Consider a bag of fruit that contains 25 apples and 25 oranges. A random sample from the bag should, on average, yield an equal number of apples and oranges. However, any given sample may include many more apples than oranges, or vice versa, skewing the representation of the population.
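
The fruit-bag example can be simulated to show both halves of the claim: across many samples the average is balanced, yet individual samples are often lopsided. The sample size of 10 and the threshold for "lopsided" (a 7-3 split or worse) are assumptions chosen for illustration.

```python
import random
from collections import Counter

def draw_fruit(n, rng):
    """Draw n pieces without replacement from a bag of 25 apples and 25 oranges."""
    bag = ["apple"] * 25 + ["orange"] * 25
    return Counter(rng.sample(bag, n))

rng = random.Random(7)
samples = [draw_fruit(10, rng) for _ in range(10_000)]

avg_apples = sum(s["apple"] for s in samples) / len(samples)
# A difference of 4 or more between the counts means a 7-3 split or worse.
lopsided = sum(1 for s in samples if abs(s["apple"] - s["orange"]) >= 4) / len(samples)

print(f"average apples per sample: {avg_apples:.2f}")
print(f"share of samples with a 7-3 split or worse: {lopsided:.1%}")
```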

Stratified sampling

Stratified sampling is a sampling method in which a population is divided into distinct categories, or "strata." Each stratum is then sampled as a subpopulation (for example, using SRS) in proportion to its representation within the population as a whole. For example, if a movie theater found that 40% of its moviegoers on a given day were male and 60% were female, a sample should preserve those proportions. If 250 people went to the movies, 100 were male and 150 were female, so a stratified sample of 30 moviegoers would randomly select 12 males and 18 females.
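
The moviegoer example can be sketched as proportional allocation followed by SRS within each stratum. The function names and attendee IDs are hypothetical, and the sketch assumes simple rounding for the allocation, which in general may not sum exactly to the requested sample size.

```python
import random

def proportional_allocation(strata_sizes, n):
    """Split a total sample size n across strata in proportion to each stratum's size.

    Uses simple rounding, so in general the allocations may not sum exactly to n.
    """
    total = sum(strata_sizes.values())
    return {name: round(n * size / total) for name, size in strata_sizes.items()}

def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample within each stratum using proportional allocation."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    allocation = proportional_allocation(sizes, n)
    return {name: rng.sample(strata[name], k) for name, k in allocation.items()}

# Hypothetical attendee IDs matching the example: 100 males and 150 females.
strata = {
    "male": [f"M{i}" for i in range(100)],
    "female": [f"F{i}" for i in range(150)],
}
sample = stratified_sample(strata, 30, seed=1)
print({name: len(members) for name, members in sample.items()})  # {'male': 12, 'female': 18}
```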

Advantages of stratified sampling:

Disadvantages of stratified sampling:

Systematic sampling

Systematic sampling (also referred to as interval sampling) involves creating an ordered list of every element in the population, randomly selecting a starting element, and then selecting every nth element after it. This fixed gap is referred to as the sampling interval. For example, if the study population is the set of integers from 1 to 100, the randomly selected starting element is 22, and the sampling interval is 5, systematic sampling yields the following sample:

{22, 27, 32, 37, 42, 47, ...}

If the end of the list is reached before the desired sample size is met, the count wraps around to the beginning of the list. For example, the element selected after 97 in the example above is 2 (97 + 5 = 102, which wraps around to 2).
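
The procedure above, including the wrap-around, can be sketched with modular indexing. The function name is hypothetical, and the start is passed as a list index (index 21 holds the element 22) so the example can be reproduced exactly.

```python
import random

def systematic_sample(population, start_index, interval, n):
    """Select every `interval`-th element from start_index, wrapping to the beginning of the list."""
    size = len(population)
    return [population[(start_index + i * interval) % size] for i in range(n)]

population = list(range(1, 101))

# In practice the starting position is chosen at random.
random_start = random.Random(0).randrange(len(population))
print(systematic_sample(population, random_start, 5, 6))

# Index 21 (the element 22) reproduces the example above.
print(systematic_sample(population, 21, 5, 6))        # [22, 27, 32, 37, 42, 47]
print(systematic_sample(population, 21, 5, 17)[-2:])  # [97, 2]
```

Wrapping around has a limit: with N elements and an interval of k, only N / gcd(N, k) distinct elements can be reached before the selection cycles back to the start, which is 20 elements for N = 100 and k = 5.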

Advantages of systematic sampling:

Disadvantages of systematic sampling:

Cluster sampling

Cluster sampling is a sampling method used when a population can be divided into groups (referred to as clusters) that are relatively homogeneous when compared with one another, but each of which is internally heterogeneous, so that each cluster resembles a small version of the population. Cluster sampling is often cheaper or more practical than other sampling methods.

For example, if a researcher wanted to study the percentage of 15- to 18-year-olds in Chicago who participate in sports, collecting data from every 15- to 18-year-old in the city would be prohibitively expensive and impractical, if it were possible at all. Instead, researchers could treat high schools in Chicago as clusters of 15- to 18-year-olds, randomly select a sample of these schools (clusters), and then collect data from the selected schools.
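
A minimal sketch of this single-stage design: schools are chosen at random, and every student in a chosen school is kept. The function name and the school and student data are hypothetical.

```python
import random

def single_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then keep every element of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sorted() gives a reproducible order
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical data: 20 schools with 40 students each.
schools = {f"school_{i}": [f"student_{i}_{j}" for j in range(40)] for i in range(20)}
sample = single_stage_cluster_sample(schools, 3, seed=0)
print(len(sample), sum(len(s) for s in sample.values()))  # 3 120
```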

The above is an example of single-stage cluster sampling, because data is collected from every element within the selected clusters. Multi-stage cluster sampling samples again within the selected clusters, further reducing the size of the sampling units: rather than collecting data from every 15- to 18-year-old in each selected school, researchers might randomly select a few classes in each school as smaller clusters and collect data only from those classes. This further reduces the amount of data that needs to be collected, but each additional stage of sampling increases the risk that the data will be less representative of the overall population.
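
The two-stage version of the school example can be sketched by sampling schools first and then sampling classes within each chosen school. The function name, the nested school-to-class-to-student data, and the counts are hypothetical.

```python
import random

def two_stage_cluster_sample(schools, n_schools, n_classes, seed=None):
    """Stage 1: randomly choose schools. Stage 2: randomly choose classes within each
    chosen school, keeping every student in the chosen classes."""
    rng = random.Random(seed)
    sample = {}
    for school in rng.sample(sorted(schools), n_schools):
        classes = schools[school]
        for cls in rng.sample(sorted(classes), n_classes):
            sample[(school, cls)] = list(classes[cls])
    return sample

# Hypothetical data: 20 schools, each with 8 classes of 25 students.
schools = {
    f"school_{i}": {f"class_{c}": [f"student_{i}_{c}_{k}" for k in range(25)] for c in range(8)}
    for i in range(20)
}
sample = two_stage_cluster_sample(schools, 3, 2, seed=0)
print(len(sample), sum(len(s) for s in sample.values()))  # 6 150
```

With the same three schools, single-stage sampling would collect all 600 of their students; the second stage cuts this to 150, at the cost of relying on just two classes to represent each school.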

Advantages of cluster sampling:

Disadvantages of cluster sampling:

Cluster sampling vs. stratified sampling

Cluster sampling and stratified sampling are similar in that they both divide populations into groups, but they are otherwise quite different.

Cluster sampling:

Stratified sampling:

Nonprobability sampling

Nonprobability sampling is a type of sampling commonly used in qualitative research. Unlike in probability sampling, the probability of obtaining a given sample cannot be determined. As a result, nonprobability samples cannot support valid statistical inferences about the population, and they are typically of limited use in quantitative research. Nonprobability sampling methods include convenience sampling, judgmental sampling, quota sampling, and snowball sampling.

Convenience sampling

Convenience sampling (also called grab sampling, accidental sampling, or opportunity sampling) draws a sample from the part of the population that is convenient for the researchers to reach. For example, convenient participants may be located close to the researchers, may be people the researchers already know, or may simply have ample time. The only criterion for inclusion in a convenience sample is that an individual is able and willing to participate in the study. As a result, convenience sampling can introduce severe bias.

Judgmental sampling

Judgmental sampling, also referred to as selective sampling, is a type of sampling in which the researcher selects particular members of the population based on criteria that the researcher believes will produce a sample appropriate for the study. Because selection depends on the researcher's judgment or goals, judgmental sampling can be heavily biased.

Quota sampling

Quota sampling involves first dividing the population into mutually exclusive groups, then selecting a specified number or proportion of individuals from each group. For example, a researcher may decide to select a sample of 50 men and 75 women between the ages of 25 and 60 from a population of gym goers. Because this selection is not random, researchers may choose specific men and women based on whether they seem approachable, or based on some other characteristic. This introduces bias, because people who do not meet the researcher's implicit criteria may have no chance of being selected.
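
The gym example can be sketched as filling quotas in the order people are encountered. The eligibility rule is a hypothetical stand-in for the researcher's implicit judgment, and the arrival data is invented for illustration; the point is that nothing in the procedure is random.

```python
def fill_quotas(arrivals, quotas, eligible):
    """Accept people in the order they are encountered until every group's quota is met.

    `eligible` is a hypothetical stand-in for the researcher's own judgment (for example,
    who seems approachable); nothing here is random.
    """
    counts = {group: 0 for group in quotas}
    selected = []
    for person, group in arrivals:
        if group in quotas and counts[group] < quotas[group] and eligible(person):
            selected.append(person)
            counts[group] += 1
        if counts == quotas:
            break  # everyone encountered later has no chance of selection
    return selected

# Hypothetical gym goers, encountered in order: even numbers are women, odd numbers are men.
arrivals = [(f"p{i}", "men" if i % 2 else "women") for i in range(300)]
quotas = {"men": 50, "women": 75}

def seems_approachable(person):
    # Hypothetical implicit criterion: the researcher skips every third person.
    return int(person[1:]) % 3 != 0

selected = fill_quotas(arrivals, quotas, seems_approachable)
print(len(selected))  # 125
```

Both sources of bias described above appear here: people who fail the implicit criterion are never chosen, and people who arrive after the quotas are filled are never considered.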

Snowball sampling

Snowball sampling is a sampling method in which existing participants are asked to recruit new participants to the study. This method is useful when members of a population are difficult for researchers to find or access, such as underage smokers or drug users. When individuals who meet the study criteria are found, researchers rely on them to recruit others who also meet the criteria.
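
Recruitment in waves behaves like a breadth-first traversal of a referral network. The sketch below assumes a hypothetical network in which each person refers the people listed for them; it also shows that people outside the seed participants' networks can never be reached.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size):
    """Recruit in waves: start from seed participants, then add the people each participant refers.

    `referrals` maps each person to the people they would refer (hypothetical data).
    """
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:  # do not recruit the same person twice
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referral network; G and H know each other but no one else.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["F"], "G": ["H"]}
print(snowball_sample(referrals, ["A"], max_size=10))  # ['A', 'B', 'C', 'D', 'E', 'F']
```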