Descriptive statistics

Descriptive statistics is a branch of statistics that, through tools such as tables, graphs, averages, and correlations, provides us with the means to organize, analyze, and summarize the characteristics of a given set of data. A "descriptive statistic" is also a type of datum that describes or summarizes a collection of observations or information.

Examples of descriptive statistics include:

A graph showing the change in temperature in a region over 1 year
A table of SAT scores for 11th graders in various schools for a given year
The average height of the people on a soccer team

Below are examples of some of the key tools and topics covered by descriptive statistics.

Tables and graphs

Tables and graphs, particularly of frequency distributions, are an important aspect of descriptive statistics. Frequency distributions are used to impose some order on the inevitable variability in observed data to help us determine whether there are any patterns in the data. For example, if the distribution of SAT scores for one high school is vastly different from that of other schools in the area, it could be worth trying to determine why the discrepancy exists.
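As a rough illustration of a frequency distribution, the sketch below groups a handful of test scores into equal-width bins and prints a simple text histogram. The scores, the bin width of 100, and the function name frequency_distribution are all assumptions made up for this example, not real data.

```python
from collections import Counter

# Hypothetical test scores, invented for illustration only.
scores = [1010, 1150, 1230, 1080, 1310, 1190, 1120, 1270, 1050, 1160]

BIN_WIDTH = 100  # assumed bin width; other widths give other pictures


def frequency_distribution(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


freq = frequency_distribution(scores, BIN_WIDTH)

# Print one row per bin, with one '#' per observation in that bin.
for start, count in freq.items():
    print(f"{start}-{start + BIN_WIDTH - 1}: {'#' * count}")
```

Even with ten values, the binned view makes it easier to see where the scores cluster than the raw list does.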

Because graphs can very effectively summarize data, in some cases a graph can be the end product of basic statistical analysis.

Measures of central tendency

Measures of central tendency, also referred to as averages, are commonly used descriptive statistics. Generally, a measure of central tendency describes the middle, or typical value, of a distribution. Mean, median, and mode are three measures of central tendency used in statistics. Mean is arguably the most important measure, both in descriptive and inferential statistics, but all three have their uses depending on what is being measured.
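The three measures can give different answers for the same data. The following minimal sketch uses Python's standard statistics module on a set of hypothetical heights (in centimeters) for a small team; the numbers are invented for illustration.

```python
import statistics

# Hypothetical heights in centimeters, invented for illustration only.
heights_cm = [168, 172, 172, 175, 180, 183, 196]

mean_height = statistics.mean(heights_cm)      # sum of values / number of values
median_height = statistics.median(heights_cm)  # middle value of the sorted data
mode_height = statistics.mode(heights_cm)      # most frequently occurring value

print("mean:  ", mean_height)
print("median:", median_height)
print("mode:  ", mode_height)
```

In this made-up data set, the single very tall player pulls the mean above the median, which is one reason the median is often preferred when a distribution has extreme values.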

Measures of variability

Measures of variability (also referred to as spread) measure the degree to which data is scattered in a distribution. Measures of variability include variance, range, interquartile range, standard deviation, and more. These measures can provide information about data that measures of central tendency cannot. For example, if we were told that the mean depth of a fast-flowing river is 3 meters, and that the depth varies by 1-2 meters in either direction, we may not be too worried about the water being too deep. If, however, we didn't know this, or knew that the depth varied by 10-15 meters instead, we may be more hesitant to enter the river. Measures of variability can tell us the amount of variation in a given set of values and how far from the mean they may stray.
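To make the river example concrete, the sketch below compares two hypothetical sets of depth readings that share the same mean but differ in spread. The readings and the helper name spread_summary are assumptions for illustration. The population forms (pvariance, pstdev) are used because the data is being described as-is rather than treated as a sample from something larger; the sample forms (statistics.variance, statistics.stdev) would divide by n - 1 instead.

```python
import statistics


def spread_summary(values):
    """Return a few common measures of variability for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,  # interquartile range
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
    }


# Hypothetical depth readings in meters, invented for illustration only.
calm_river = [2.0, 2.5, 3.0, 3.5, 4.0]
uneven_river = [0.5, 1.0, 3.0, 5.0, 5.5]

calm = spread_summary(calm_river)
uneven = spread_summary(uneven_river)

for name, summary in (("calm", calm), ("uneven", uneven)):
    print(name, {key: round(value, 2) for key, value in summary.items()})
```

Both sets have a mean depth of 3 meters, so the mean alone cannot distinguish them; every measure of spread is larger for the second set. Note that quartile conventions differ between tools, so the interquartile range may not match other software exactly.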

Normal distributions and Z-scores

A normal distribution, also referred to as a Gaussian distribution, is a type of continuous probability distribution that is commonly described as having a bell shape. It represents a distribution of data that is symmetric about the mean. It is named the "normal" distribution because many observed frequency distributions in a variety of fields and contexts exhibit a normal distribution.

Z-scores (also referred to as standard scores) indicate the number of standard deviations by which a given value is above or below the mean of whatever is being observed or measured. A Z-score of 0 indicates that a given value is identical to the mean. A negative Z-score indicates that a value is below the mean and a positive Z-score indicates that it is above the mean. The larger the magnitude of the Z-score, the further the value is from the mean, with a Z-score of 1.0 indicating that the value is 1 standard deviation above the mean. Together, normal distributions and Z-scores can provide us with information on how likely it is for a given value or observation to occur in a given set of data.
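A Z-score is computed as (value - mean) / standard deviation. The sketch below applies that formula and then, assuming the data really is normally distributed, uses the standard normal distribution to estimate the share of values that fall below a given value. The mean of 100, the standard deviation of 15, and the function name z_score are assumptions chosen for illustration.

```python
from statistics import NormalDist


def z_score(value, mean, std_dev):
    """Number of standard deviations by which value lies above or below mean."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (value - mean) / std_dev


# Assumed parameters of a normally distributed measurement (illustrative only).
MEAN, STD_DEV = 100.0, 15.0

for value in (85.0, 100.0, 130.0):
    z = z_score(value, MEAN, STD_DEV)
    # Share of a standard normal distribution lying below this Z-score.
    share_below = NormalDist().cdf(z)
    print(f"value={value}: z={z:+.1f}, share below={share_below:.3f}")
```

The second step only makes sense when the normality assumption is reasonable; the Z-score itself can be computed for any data with a nonzero standard deviation.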

Correlation

Another aspect of descriptive statistics is finding correlations between pairs of variables. For example, is there a correlation between age and the number of times a person eats at McDonald's in a month? Descriptive statistics, particularly the use of scatterplots or correlation coefficients, can be used to determine the answers to questions such as these.
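One widely used correlation coefficient is Pearson's r, which ranges from -1 (perfect negative linear relationship) to +1 (perfect positive linear relationship). The sketch below computes it from scratch for a small set of hypothetical ages and monthly restaurant visits; the data and the function name pearson_r are invented for illustration and do not reflect any real survey.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical ages and visits per month, invented for illustration only.
ages = [18, 25, 32, 41, 50, 63]
visits_per_month = [9, 7, 6, 4, 3, 1]

r = pearson_r(ages, visits_per_month)
print(f"Pearson r = {r:.3f}")
```

A value of r close to -1 for this made-up data describes a strong negative linear relationship within the observed pairs. It does not by itself show that age causes the difference.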

Descriptive and inferential statistics

The study of statistics exists as a way to help us analyze and better understand variability in the world around us. Descriptive statistics is arguably the simplest form of statistics. It provides us with tools for organizing and summarizing variability in collections of data. There is no uncertainty in descriptive statistics, as it is not based on the assumption that a set of data represents a larger population. It only describes these sets of observed data in terms of attributes such as distribution (frequency), central tendency (averages), and variability (spread).

In contrast, inferential statistics uses observed data to draw conclusions, generalizations, or predictions about a population. This involves the use of probability theory to account for sampling error, which arises because a sample is smaller than the population it is intended to represent. Thus, the validity of the conclusions drawn in inferential statistics is subject to factors such as sample size and the sampling method used. These are factors that do not need to be taken into consideration when using descriptive statistics.

Both descriptive and inferential statistics are widely used, often together, depending on the intent of the study. Generally, descriptive statistics is useful for observing patterns in data, while inferential statistics examines the sample data to make predictions about the relationships between variables in the data and how they may relate to the larger population.