Sample variance

Sample variance is the variance of a sample rather than of a population. Variance is a statistical measure of variability that indicates how far the values in a data set spread out from their mean. A higher variance indicates that the values are more spread out around the mean, while a lower variance indicates that they are clustered more tightly around it. Variance is often used alongside measures of central tendency such as the mean, median, and mode, which on their own can give an incomplete picture of the data. For example, two data sets may have the same mean but very different shapes because their variances differ:

[Figure: two distributions with the same mean; the distribution outlined in blue has a much higher variance than the distribution in green.]

In the figure above, both data sets have the same mean but very different distributions. The distribution outlined in blue has a much higher variance than the distribution in green. Given only the means, one might conclude that the two data sets are the same or very similar; the variances show that they are actually quite different. This is one reason it is important to consider several statistical measures: each provides different information, and no single measure captures everything a data set can tell us.
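The same effect can be seen numerically. The Python sketch below uses two small, made-up data sets (not the data from the figure) that share a mean of 5 but differ in spread, and computes their sample variances with the standard library's `statistics.variance`, which divides by $n - 1$.

```python
import statistics

# Two small, made-up data sets with the same mean (5) but different spread.
narrow = [4, 5, 6]
wide = [1, 5, 9]

for name, data in (("narrow", narrow), ("wide", wide)):
    # statistics.mean gives the average; statistics.variance gives the sample variance.
    print(name, "mean:", statistics.mean(data), "variance:", statistics.variance(data))
```

Both means are 5, but the variances are 1 and 16, so knowing only the mean would hide the difference in spread.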

Sample vs. population

In the context of statistics, a population is an entire group of objects or observations. A statistical population does not have to be a group of people; it can consist of heights, weights, test scores, temperatures, and so on.

While a population is an entire group of objects or observations, a sample is a smaller subset of those objects or observations drawn from the population. Sampling is common in statistical experiments because it is often impractical, or even impossible, to collect data for an entire population. For example, it may not be practical to collect weight data for every student at a large university. Instead, data can be collected from a sample of the students, and statistical measures (including variance) computed from that sample can be used to make inferences about the whole population.
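As a rough illustration of inferring a population's variance from a sample, the Python sketch below builds a hypothetical population of 10,000 synthetic student weights (generated with `random.gauss`, not real data), draws a simple random sample of 100 with `random.sample`, and compares the two variances. It assumes Python's standard `statistics` module, where `statistics.pvariance` treats the data as a whole population (dividing by its size $N$) and `statistics.variance` treats it as a sample (dividing by $n - 1$).

```python
import random
import statistics

# Hypothetical population: 10,000 synthetic student weights in kg (not real data).
random.seed(42)
population = [random.gauss(70, 12) for _ in range(10_000)]

# Draw a simple random sample of 100 students, without replacement.
sample = random.sample(population, k=100)

# Variance of the full population versus the estimate computed from the sample alone.
print("population variance:", round(statistics.pvariance(population), 1))
print("sample variance:   ", round(statistics.variance(sample), 1))
```

The sample estimate will not match the population value exactly, and it changes with the seed, but it is typically in the same neighborhood, which is what makes sampling useful.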

Sample variance formula

The sample variance, $s^2$, can be computed using the formula

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where $x_i$ is the $i$th element of the sample, $\bar{x}$ is the sample mean, and $n$ is the sample size.
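The formula translates directly into code. The function below is a minimal from-scratch sketch; the name `sample_variance` is our own, chosen for illustration. It computes the mean, sums the squared deviations, and divides by $n - 1$.

```python
def sample_variance(xs):
    """Return the sample variance s^2 = sum((x_i - mean)^2) / (n - 1)."""
    n = len(xs)
    if n < 2:
        # With fewer than two values, n - 1 would be zero.
        raise ValueError("sample variance needs at least two values")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


print(sample_variance([4, 5, 6]))  # 1.0
```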

The value of the expression

$$\sum_{i=1}^{n} (x_i - \bar{x})^2$$

is referred to as the sum of squares (SS). It is worth noting because it is used in a number of other statistical measures besides variance. For samples of the same size, a higher sum of squares indicates greater variability, while a lower value indicates that the data varies less around the mean. Because SS adds one term per observation, it also tends to grow as more observations are included, so samples of different sizes are better compared using the variance.
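Here is a minimal Python sketch of the sum of squares on its own; `sum_of_squares` is a hypothetical helper name. The two made-up samples have the same size and the same mean, so their SS values can be compared directly.

```python
def sum_of_squares(xs):
    """Return SS, the sum of squared deviations from the mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


# Same size, same mean (5); the more spread-out sample has the larger SS.
print(sum_of_squares([4, 5, 6]))  # 2.0
print(sum_of_squares([1, 5, 9]))  # 32.0
```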

Because data sets in experiments are typically large, statistical measures such as variance are usually computed with a calculator or computer. To show how the formula is applied by hand, a worked example is provided below.

Find the sample variance of the following sample, which records the number of hours of sleep each student in a group got the night before an exam:

7, 6.5, 6, 5, 5, 4, 4, 3, 3, 2.5

The sample mean is:

$$\bar{x} = \frac{7 + 6.5 + 6 + 5 + 5 + 4 + 4 + 3 + 3 + 2.5}{10} = \frac{46}{10} = 4.6$$

The sum of squares is:

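$$
\begin{aligned}
SS &= \sum_{i=1}^{10} (x_i - \bar{x})^2 \\
&= (7 - 4.6)^2 + (6.5 - 4.6)^2 + (6 - 4.6)^2 \\
&\quad + (5 - 4.6)^2 + (5 - 4.6)^2 + (4 - 4.6)^2 \\
&\quad + (4 - 4.6)^2 + (3 - 4.6)^2 + (3 - 4.6)^2 \\
&\quad + (2.5 - 4.6)^2 \\
&= 21.9
\end{aligned}
$$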
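The term-by-term arithmetic can be checked with a short Python sketch that prints each squared deviation and the running total. Floating-point results are formatted to two decimals, since values such as $(7 - 4.6)^2$ are not represented exactly in binary.

```python
# Hours of sleep from the worked example.
hours = [7, 6.5, 6, 5, 5, 4, 4, 3, 3, 2.5]
mean = sum(hours) / len(hours)  # 4.6

ss = 0.0
for x in hours:
    term = (x - mean) ** 2  # squared deviation of one observation
    ss += term
    print(f"({x} - {mean:g})^2 = {term:.2f}")

print(f"SS = {ss:.2f}")  # SS = 21.90
```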

The variance is:

$$s^2 = \frac{SS}{n - 1} = \frac{21.9}{10 - 1} = \frac{21.9}{9} \approx 2.43$$

Thus, the sample variance is approximately 2.43 hours².
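The final step, dividing the sum of squares by $n - 1$, can be sketched in Python and cross-checked against the standard library's `statistics.variance`, which also computes the sample (n - 1) variance.

```python
import statistics

hours = [7, 6.5, 6, 5, 5, 4, 4, 3, 3, 2.5]
n = len(hours)
mean = sum(hours) / n
ss = sum((x - mean) ** 2 for x in hours)  # sum of squares, about 21.9

s2 = ss / (n - 1)  # divide by n - 1 = 9
print(round(s2, 2))  # 2.43

# Cross-check with the standard library's sample variance.
print(round(statistics.variance(hours), 2))  # 2.43
```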

Notice that the variance in the example above is expressed in hours². One drawback of variance is that, because its units are the square of the data's units, its value is difficult to interpret at face value. Standard deviation, another measure of variability, addresses this: it is the square root of the variance, so it is expressed in the same units as the data. For the sleep data, the standard deviation is $\sqrt{2.43} \approx 1.56$ hours.
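Converting the variance back to the data's units is a single square root. The Python sketch below takes `math.sqrt` of the sample variance and compares it with `statistics.stdev`, the standard library's sample standard deviation.

```python
import math
import statistics

hours = [7, 6.5, 6, 5, 5, 4, 4, 3, 3, 2.5]

s2 = statistics.variance(hours)  # sample variance, in hours^2
s = math.sqrt(s2)                # standard deviation, back in hours

print(round(s, 2))                        # 1.56
print(round(statistics.stdev(hours), 2))  # 1.56
```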