Median

The median is the middle number in a set of data when the data is arranged in ascending (the more common choice) or descending order. If there is an even number of values in the data set, the median is the arithmetic mean of the two middle numbers. Thus, the median separates the lower half of a data set from the higher half.

To find the median of a set of numbers, arrange the numbers in order and then find the middle number.

Examples

1. When there is a middle number:

1, 3, 6, 9, 12

6 is the median, since it is the number in the middle.

2. When there is no single middle number:

8, 7, 5, 4, 2, 1

Since there is an even number of values in the data set, we need to find the arithmetic mean of the two middle numbers, 5 and 4:

(5 + 4) / 2 = 4.5

So the median of this data set is 4.5.

3. If the data is not in order:

1, 6, 2, 42, 12, 15, 99

We need to first rearrange the numbers before attempting to determine the median. The median is not 42.

Rearranging the data in ascending order:

1, 2, 6, 12, 15, 42, 99

We see that the median is 12. Note that 12 could, by chance, have been in the middle of the unsorted data as well, but we should only read off the median once the data is listed in ascending or descending order.
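The procedure used in the three examples above (sort, then take the middle value or the mean of the two middle values) can be sketched in Python. This is a minimal illustration rather than a definitive implementation; the function name `median` is chosen here for the example and is not part of the original text.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)  # ascending order; the input itself is not modified
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is a single middle value.
        return ordered[mid]
    # Even count: arithmetic mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 3, 6, 9, 12]))           # 6
print(median([8, 7, 5, 4, 2, 1]))         # 4.5
print(median([1, 6, 2, 42, 12, 15, 99]))  # 12
```

Because the function sorts its input first, it gives the same result whether the data arrives in ascending, descending, or no particular order. Python's standard library offers `statistics.median` for the same purpose.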


How to quickly determine the number in the middle


The data sets in the examples above only had a few numbers, so we could determine the middle values just by looking at them. For longer sets of data, if we know how many numbers there are in total (or can count them), we can perform basic division to quickly determine which number is in the middle:

  1. Determine how many numbers there are in total
  2. Add 1 to this total
  3. Divide the result of step 2 by 2
    • If there is an odd number of values in step 1, the result of step 3 is the position of the median in the ordered data set
    • If there is an even number of values in step 1, round the result of step 3 both down and up to the nearest whole number; these two numbers are the positions of the two middle numbers in the ordered data set. The median is the arithmetic mean of the values at these two positions

Examples

Case 1 (odd): The data set has 131 values

We add 131 + 1 = 132, then divide by 2 to get 66, so the 66th value in the ordered data set is the median.

Case 2 (even): The data set has 256 values

We add 256 + 1 = 257, then divide by 2 to get 128.5, so the 128th and 129th values in the ordered data set are the two middle numbers. We find the arithmetic mean of the values in the 128th and 129th positions of the data set. The result is the median.
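The position rule can also be written as a short Python sketch. The helper names `median_positions` and `median_from_positions` are hypothetical, introduced only for this illustration; positions are counted from 1, as in the text, so they are shifted by one when indexing a Python list.

```python
import math


def median_positions(count):
    """Return the 1-based position(s) of the middle value(s) among `count` values."""
    if count < 1:
        raise ValueError("count must be at least 1")
    half = (count + 1) / 2  # steps 2 and 3: add 1, then divide by 2
    low, high = math.floor(half), math.ceil(half)
    # Odd count: `half` is a whole number, so there is a single position.
    # Even count: `half` ends in .5, so round down and up for two positions.
    return (low,) if low == high else (low, high)


def median_from_positions(ordered):
    """Median of an already ordered list, using the position rule."""
    positions = median_positions(len(ordered))
    middle = [ordered[p - 1] for p in positions]  # 1-based position -> 0-based index
    return sum(middle) / len(middle)


print(median_positions(131))  # (66,)
print(median_positions(256))  # (128, 129)
print(median_from_positions([1, 2, 6, 12, 15, 42, 99]))  # 12.0
```

The second function assumes the data has already been arranged in order; applied to unordered data it would return whatever happens to sit in the middle.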


Mean vs median


The mean and median are both types of averages. The mean is more useful when there are few outliers (values that are much bigger or smaller than the rest), so that the data is not skewed. When the data is skewed, the median is a better representation of the data: unless more than half of the data is made up of outliers, the median won't be arbitrarily large or small. The mean, on the other hand, is pulled heavily toward overly large or small values, making it less representative of the data.

Example

Income is an example of data that is commonly skewed. Generally, the few highest income earners earn vastly more than the majority of people. Modeling this on a small scale for simplicity's sake, assume that in a room of 50 people, 3 people earn $10,000,000 a year and 47 earn $55,000. Then we determine the median and the arithmetic mean:

Median: $55,000
Mean: (3 × $10,000,000 + 47 × $55,000) / 50
  = $32,585,000 / 50
  = $651,700

Even though only three people in this example earn more than $55,000, and they comprise only 6% of the people in the room, they earn so much more that they skew the mean income to almost 12 times what 94% of the people in the room earn. Clearly this is not a good representation of what the average person in the room earns.

In contrast, the median, since it is exactly in the middle of the values, isn't affected by either very large or very small values, and better represents the data since the median is what 94% of people in the room earn.

In this particular case, even if there were very small outliers (say if 2 out of the 47 people actually earned only $5,000 a year), they would not significantly affect the mean, but in different scenarios they could have a significant effect. In such cases, the arithmetic mean would again not be the best representation of the average value. The median, by contrast, would be largely unaffected by the very small outliers, just as it is by the very large ones.
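The income figures above can be checked with Python's standard `statistics` module. This is only an illustrative sketch: the variable names are made up for the example, and the second list models the variation in which 2 of the 47 people earn $5,000.

```python
from statistics import mean, median

# 3 people earn $10,000,000 and 47 people earn $55,000.
incomes = [10_000_000] * 3 + [55_000] * 47

print(f"median: ${median(incomes):,.0f}")  # median: $55,000
print(f"mean:   ${mean(incomes):,.0f}")    # mean:   $651,700

# Variation: 2 of the 47 lower earners make only $5,000 instead.
incomes_with_low = [10_000_000] * 3 + [55_000] * 45 + [5_000] * 2

print(f"median: ${median(incomes_with_low):,.0f}")  # median: $55,000
print(f"mean:   ${mean(incomes_with_low):,.0f}")    # mean:   $649,700
```

In both versions the median stays at $55,000, while the mean sits far above what most people in the room earn and shifts only slightly when the two low incomes are introduced.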


See also average.