Standard deviation

Standard deviation is a statistical measure of variability that indicates roughly how far, on average, the values in a set of numbers lie from their mean. The higher the standard deviation, the more spread out the values are, while a lower standard deviation indicates that the values tend to be close to the mean.

Standard deviation is used throughout statistics, and in many cases is a preferable measure of variability over variance because it is expressed in the same units as the collected data while the variance (the square of the standard deviation) has squared units.

In science, standard deviation is commonly reported alongside the standard error of the estimate. Together, they are used to determine whether the effects or results of an experiment are statistically significant. About 95% of values in a normal distribution fall within two standard deviations of the mean, or expectation, so only the remaining 5% or so, those that lie more than two standard deviations from the mean, are typically considered statistically significant.
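As a rough numerical check of the 95% figure, the following minimal sketch uses `statistics.NormalDist` from Python's standard library to compute the share of a normal distribution that lies within two standard deviations of the mean. The variable names are illustrative only.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability of a value falling within two standard deviations of the mean.
within_two_sd = standard_normal.cdf(2.0) - standard_normal.cdf(-2.0)

# Probability of a value falling more than two standard deviations away.
beyond_two_sd = 1.0 - within_two_sd

print(f"within 2 SD: {within_two_sd:.4f}")
print(f"beyond 2 SD: {beyond_two_sd:.4f}")
```

The share within two standard deviations comes out slightly above 95%, which is why the figure is usually quoted as an approximation.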

Standard deviation formulas

Like variance and many other statistical measures, standard deviation calculations vary depending on whether the collected data represents a population or a sample. Briefly, a sample is a subset of a population that is used to make generalizations or inferences about the population as a whole using statistical measures. Below are the formulas for standard deviation for both a population (σ) and a sample (s). In most experiments, the standard deviation for a sample is more likely to be used since it is often difficult or too expensive to collect data from an entire population.

For a population:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

where $N$ is the population size, $\mu$ is the population mean, and $x_i$ is the $i$th value.

For a sample:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

where $n$ is the sample size, $\bar{x}$ is the sample mean, and $x_i$ is the $i$th value.

Note that both formulas for standard deviation contain what is referred to as the sum of squares (SS), which is the sum of the squared deviation scores. The calculation of SS is necessary in order to determine variance, which in turn is necessary for calculating standard deviation. SS is worth noting because in addition to variance and standard deviation, it is also a component of a number of other statistical measures.

In the standard deviation formula for a population, $SS = \sum_{i=1}^{N} (x_i - \mu)^2$. Similarly, in the standard deviation formula for a sample, $SS = \sum_{i=1}^{n} (x_i - \bar{x})^2$. The only thing that changes is which mean is used in the formula.
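The relationship between SS and the two standard deviations can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `sum_of_squares`, `population_sd`, and `sample_sd` are hypothetical, and the data values are made up for the demonstration.

```python
import math


def sum_of_squares(values, mean):
    """Sum of squared deviations of each value from the given mean."""
    return sum((x - mean) ** 2 for x in values)


def population_sd(values):
    """Population standard deviation: SS is divided by N."""
    n = len(values)
    if n < 1:
        raise ValueError("population standard deviation needs at least one value")
    mu = sum(values) / n
    return math.sqrt(sum_of_squares(values, mu) / n)


def sample_sd(values):
    """Sample standard deviation: SS is divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample standard deviation needs at least two values")
    x_bar = sum(values) / n
    return math.sqrt(sum_of_squares(values, x_bar) / (n - 1))


# Made-up illustrative values (not taken from any real data set).
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_sd(data))
print(sample_sd(data))
```

For the same data, the sample version always gives a slightly larger result than the population version because it divides SS by n - 1 rather than N.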

Example

Determine the standard deviation of the following height measurements assuming that the data was obtained from a sample of the population.
Height (cm)
154
161
172
173
181

1. Find the sample mean:

$$\bar{x} = \frac{154 + 161 + 172 + 173 + 181}{5} = \frac{841}{5} = 168.2$$

2. Find the sum of squares (SS):

$$
\begin{aligned}
SS &= \sum_{i=1}^{n} (x_i - \bar{x})^2 \\
   &= (154 - 168.2)^2 + (161 - 168.2)^2 + (172 - 168.2)^2 + (173 - 168.2)^2 + (181 - 168.2)^2 \\
   &= 454.8
\end{aligned}
$$

3. Divide SS by 1 less than the total number of scores in the sample, then take the square root of the result:

$$s = \sqrt{\frac{SS}{n - 1}} = \sqrt{\frac{454.8}{4}} = \sqrt{113.7} \approx 10.663$$

Thus the standard deviation of the sampled height measurements is approximately 10.663 cm. As a general rule of thumb, s should be less than half the size of the range, and in most cases will be even smaller. This can be used as a cursory check for sizable computation errors. The range of the above scores is:

181 - 154 = 27

Half of the range is 13.5, and 10.663 is below that, so while there may be other potential sources of error, the result is reasonable enough that we do not suspect a computational error.
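The worked example can also be cross-checked programmatically. The sketch below assumes Python's standard-library `statistics` module, whose `stdev` function computes the sample standard deviation; the variable names are illustrative.

```python
import statistics

heights_cm = [154, 161, 172, 173, 181]

# Step 1: sample mean.
sample_mean = statistics.mean(heights_cm)

# Step 2: sum of squared deviations from the sample mean.
sum_sq = sum((h - sample_mean) ** 2 for h in heights_cm)

# Step 3: sample standard deviation (SS divided by n - 1, then square root).
s = statistics.stdev(heights_cm)

# Rule-of-thumb check: s should be less than half the range.
value_range = max(heights_cm) - min(heights_cm)

print(f"mean  = {sample_mean:.1f}")
print(f"SS    = {sum_sq:.1f}")
print(f"s     = {s:.3f}")
print(f"range = {value_range}, s below half the range: {s < value_range / 2}")
```

If the hand calculation is correct, the printed mean, SS, and s should match the values obtained in steps 1 to 3.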

Calculations for the standard deviation of a population are very similar to those for a sample, with the key differences being the use of the population rather than the sample mean, and the use of N rather than n - 1.
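To illustrate the difference, the following sketch treats the same five height measurements as if they were an entire population, purely for comparison. It assumes Python's standard-library `statistics` module, where `pstdev` divides SS by N and `stdev` divides SS by n - 1.

```python
import math
import statistics

heights_cm = [154, 161, 172, 173, 181]
n = len(heights_cm)

# Population standard deviation: SS divided by N.
sigma = statistics.pstdev(heights_cm)

# Sample standard deviation: SS divided by n - 1.
s = statistics.stdev(heights_cm)

# Because the mean is the same in both cases here, the two results
# differ only by the factor sqrt(n / (n - 1)).
ratio = s / sigma

print(f"population SD = {sigma:.3f}")
print(f"sample SD     = {s:.3f}")
print(f"ratio         = {ratio:.3f} (sqrt(n / (n - 1)) = {math.sqrt(n / (n - 1)):.3f})")
```

The gap between the two results shrinks as n grows, since n / (n - 1) approaches 1.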