In statistics, a population is the total group from which a sample is taken. A statistical population does not have to be some group of people, and can be more abstract. It can be any complete set of observations or objects. For instance, the ACT scores of all students in a specific high school for a given year can be described as a population.
Populations are an important concept in statistics and are often discussed alongside samples. Generally, the goal of inferential statistics is to predict or estimate some population parameter based on statistics that are obtained from experiments on a subset, or sample, of the population of interest.
Population vs sample
A population consists of a complete collection of observations. In contrast, a sample is any smaller collection of observations taken from a population. The computation of many statistical measures, such as variance and standard deviation, depends on whether the collected data represents a population or a sample.
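The difference can be seen in a minimal sketch using Python's standard `statistics` module. The scores are made up for illustration; the only point is that the same five numbers give different results depending on whether they are treated as a whole population (divide by N) or as a sample (divide by n - 1).

```python
import statistics

# Hypothetical ACT scores, invented for this illustration.
scores = [18, 21, 24, 27, 30]

# Treat the data as the entire population: squared deviations are divided by N.
population_variance = statistics.pvariance(scores)
population_sd = statistics.pstdev(scores)

# Treat the data as a sample from a larger population: divide by n - 1 instead.
sample_variance = statistics.variance(scores)
sample_sd = statistics.stdev(scores)

print("population variance:", population_variance, "sd:", population_sd)
print("sample variance:", sample_variance, "sd:", sample_sd)
```

The sample versions come out larger because dividing by n - 1 compensates for the fact that a sample tends to understate the spread of the population it came from.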
Samples are commonly used in statistical experiments because it is often too difficult or expensive to collect data for an entire population. For example, the United States Census, which involves collecting data from every household in the United States, is conducted only once every 10 years. It requires collecting data from a country of over 300 million people at a cost of billions of dollars: the 2000 Census cost $4.5 billion, while the 2020 Census cost around $15.6 billion.
Rather than having to collect data from an entire population to learn more about it, inferential statistics makes use of statistical tools to design experiments using samples of the population of interest. As long as proper statistical procedures are used to ensure that the collected sample is representative of the population of interest, sample statistics can be used to estimate or predict population parameters.
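As a rough illustration of estimating a population parameter from a sample statistic, the sketch below simulates a population and draws a simple random sample from it. Everything here is an assumption made for the example: the population is simulated rather than real, and the mean of 21, spread of 5, population size, and sample size are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 simulated test scores (assumed values).
population = [rng.gauss(21, 5) for _ in range(10_000)]

# Simple random sample without replacement: every member is equally likely
# to be chosen, which is one way to keep the sample representative.
sample = rng.sample(population, 200)

# The population mean is the parameter; the sample mean is the statistic
# used to estimate it.
population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

In practice the population mean is unknown, which is the reason for sampling in the first place; it is computed here only so the two values can be compared.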
Real vs hypothetical populations
Statistical populations can be further classified as real or hypothetical. A real population is one in which all of the potential observations are accessible at the time the sample is taken. The ACT scores and the United States Census described above are both examples of real populations.
Unlike real populations, not all potential observations are available for a hypothetical population. This is the case in many experiments. Usually, subjects for an experiment are chosen from very small real populations, such as students in a psychology class. These subjects are viewed as being part of a larger hypothetical population that is similar in many ways to the group of subjects.
Strictly speaking, generalizations in inferential statistics apply to real populations that have actually been sampled. However, this does not mean that generalizations cannot be made about hypothetical populations. Instead, such generalizations should be viewed as provisional conclusions whose merit can typically be measured only through further experimentation.