Histogram

A histogram is a type of chart used to represent the distribution of a set of data. The width of each bar in a histogram represents what is referred to as a "bin" or "bucket," and the height of the bar shows how many values fall into that bin.

A bin is an interval into which a given set of data is divided. For example, if we have a range of values from 1-100, we could create bins in intervals of 20: the first bin contains values from 1-20, the second 21-40, the third 41-60, and so on until 100. Counting the values in each bin gives us an idea of how many of the values in a given data set fall within a certain range, i.e., the distribution of the data. Bins are usually equal in size, but do not necessarily have to be.
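The binning described above can be sketched in a few lines of Python. This is only an illustration: the function name bin_counts and the sample values are made up for the example, and it assumes integer data between 1 and 100 with equal bins of width 20.

```python
def bin_counts(values, low=1, high=100, width=20):
    """Count how many integer values fall into each equal-width bin."""
    n_bins = (high - low + 1) // width
    counts = [0] * n_bins
    for v in values:
        if low <= v <= high:
            # Integer division maps 1-20 to bin 0, 21-40 to bin 1, and so on.
            counts[(v - low) // width] += 1
    labels = [
        f"{low + i * width}-{low + (i + 1) * width - 1}" for i in range(n_bins)
    ]
    return dict(zip(labels, counts))


# Hypothetical sample data, chosen to include values on the bin boundaries.
sample = [3, 18, 20, 21, 40, 41, 55, 60, 61, 99, 100]
result = bin_counts(sample)
print(result)
# {'1-20': 3, '21-40': 2, '41-60': 3, '61-80': 1, '81-100': 2}
```

Note that boundary values such as 20 and 21 land in different bins, so each value is counted exactly once.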

Example

The histogram in this example is a representation of the table below. The bins are in intervals of 20 from 1-100. The count, or frequency, is how many numbers in our assumed data set fall into each bin. From the histogram, we can see that most of our values fall within the 41-60 bin and taper off on either side of this bin. The shape of the histogram here is referred to as a symmetric, unimodal distribution.

Bin          Count
1 to 20      12
21 to 40     30
41 to 60     65
61 to 80     35
81 to 100    15
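As a rough illustration, the counts in the table can be turned into a simple text histogram. The sketch below uses the table's values directly; the helper name text_histogram and the scale of one "#" per 5 values are arbitrary choices made for this example.

```python
# Counts copied from the table above.
counts = {
    "1 to 20": 12,
    "21 to 40": 30,
    "41 to 60": 65,
    "61 to 80": 35,
    "81 to 100": 15,
}


def text_histogram(counts, scale=5):
    """Draw one row per bin, with one '#' for every `scale` values."""
    rows = []
    for label, count in counts.items():
        rows.append(f"{label:>9} | {'#' * (count // scale)} {count}")
    return "\n".join(rows)


total = sum(counts.values())             # number of values in the data set
modal_bin = max(counts, key=counts.get)  # bin with the highest count

print(text_histogram(counts))
print("Total values:", total)
print("Tallest bin:", modal_bin)
```

The longest row belongs to the 41 to 60 bin, with shorter rows on either side, which matches the single central peak described above.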
 

Histogram distributions


Below are examples of some of the histogram distributions you may encounter, and their names. Unimodal, bimodal, and multimodal refer to the number of modes in the distribution, which in a histogram are the peaks, also referred to as local maxima. "Local" indicates that a histogram can have more than one maximum: each peak is taller than the bars next to it, even if it is not the tallest bar overall.
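One simple way to count the peaks in a list of bin counts is to look for bars that are taller than both of their neighbors. The sketch below is a minimal version of that idea; the function name find_modes and the second list of counts are hypothetical, and flat-topped peaks (adjacent bars of equal height) are not detected.

```python
def find_modes(counts):
    """Return the indices of bins that are strictly taller than both neighbors."""
    # Pad both ends so the first and last bins can also be peaks.
    padded = [float("-inf")] + list(counts) + [float("-inf")]
    return [
        i - 1
        for i in range(1, len(padded) - 1)
        if padded[i - 1] < padded[i] > padded[i + 1]
    ]


unimodal = [12, 30, 65, 35, 15]       # counts from the example table
bimodal = [5, 20, 8, 3, 9, 25, 6]     # made-up counts with two peaks

print(find_modes(unimodal))  # [2]
print(find_modes(bimodal))   # [1, 5]
```

One peak corresponds to a unimodal distribution, two to a bimodal distribution, and more than two to a multimodal distribution.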

[Figure: six example histograms, labeled symmetric, unimodal distribution; skewed right distribution; skewed left distribution; bimodal distribution; multimodal distribution; symmetric distribution.]

In all of these, the bin widths were the same. Given a set of data, plotting it with several different bin widths can be helpful, since smaller bins may reveal information that we would otherwise miss. In the examples above the bin size is quite large. It is entirely possible, for example in the symmetric distribution, that smaller bin widths would reveal a multimodal distribution.
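To see how bin width can change the apparent shape, the sketch below bins the same made-up data set twice, once with a width of 20 and once with a width of 5. The data values and the helper name counts_for_width are assumptions chosen so that two clusters sit inside the same wide bin.

```python
def counts_for_width(values, width, low=1, high=100):
    """Count integer values from low to high in equal bins of the given width."""
    counts = [0] * ((high - low + 1) // width)
    for v in values:
        counts[(v - low) // width] += 1
    return counts


# Hypothetical data: one cluster near 44, another near 57, plus a few strays.
data = [30, 35, 42, 43, 44, 44, 45, 56, 57, 57, 58, 59, 65, 70]

wide = counts_for_width(data, 20)
narrow = counts_for_width(data, 5)

print(wide)    # [0, 2, 10, 2, 0]: a single peak in the 41-60 bin
print(narrow)  # the 41-45 and 56-60 bins each hold 5 values, with empty bins between
```

With a width of 20 the data looks symmetric and unimodal, while a width of 5 shows two separate peaks. Bins that are too narrow can have the opposite problem, producing many small peaks that reflect noise rather than structure.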


Histogram vs. bar graph


On the left we have an example of a histogram and on the right a bar graph.

Although a histogram looks like a bar graph, they are not the same, and they convey different information. While a histogram is used for continuous data, bar graphs compare categorical data (data that takes on one of a limited number of possible values). Furthermore, the rectangles in a histogram are always adjacent (this may not be apparent if a bin is empty). Although bar graphs are often drawn in the same way, it can be helpful to leave spaces between the bars when drawing a bar graph to make it clear that it is a bar graph, not a histogram.

Also, unlike a bar graph, where the width of a bar does not have any special meaning, in a histogram the width of each bar represents the size of the bin. In the example above, the bars represent the range of weights from 0 pounds to 165 pounds divided into bins of 10 pounds each.
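The difference also shows up in how the counts are computed. In the sketch below, categorical data is counted by label, as for a bar graph, while continuous data is first grouped into bins of 10 pounds, as for a histogram. Both data sets are invented for illustration.

```python
from collections import Counter

# Bar graph: categorical data is counted per category, with no ordering or width.
colors = ["red", "blue", "red", "green", "blue", "red"]
bar_counts = Counter(colors)

# Histogram: continuous data is grouped into bins of a fixed width.
weights = [101.5, 118.0, 119.9, 120.0, 134.2, 158.7]  # pounds
bin_width = 10
hist_counts = Counter(int(w // bin_width) * bin_width for w in weights)

print(dict(bar_counts))

# List every bin in order, including empty ones, so the bins stay adjacent.
for start in range(100, 160, bin_width):
    print(f"{start} to under {start + bin_width}: {hist_counts[start]}")
```

The 140 to under 150 bin is empty here, which is the situation where the bars of a histogram may not look adjacent even though the bins themselves are.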