One of the most important aspects of communicating information is clarity. But often, we aren’t clear or specific when we talk about averages.
Many websites compare the average cost of living between different countries. Or you might be looking at the class average for your college course. What specifically are we talking about when we say average?
Measuring the Average
The mean is the most familiar measure of central tendency. It is calculated by adding up every value in your dataset and dividing the sum by the number of data points. Because every value contributes to the sum, a single outlier can skew the mean, especially in a small dataset.
Imagine five houses are for sale at these asking prices: $250 000, $275 000, $300 000, $325 000, and $1 500 000. One house is priced higher than the other four combined, and it pulls the mean up to $530 000, well above what any of the other four houses costs. In larger datasets, you can detect and exclude these outliers. But in this small dataset, we need another way to calculate the average.
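To see the skew in numbers, here is a minimal sketch using Python's standard statistics module. The variable names are only illustrative, and the second calculation simply drops the most expensive house to show how much it moves the mean.

```python
from statistics import mean

# Asking prices from the five-house example
asking_prices = [250_000, 275_000, 300_000, 325_000, 1_500_000]

# Mean of all five houses, including the outlier
mean_price = mean(asking_prices)

# Mean of the four cheaper houses only (the list is already sorted,
# so the outlier is the last element)
mean_without_outlier = mean(asking_prices[:-1])

print(f"Mean with outlier:    ${mean_price:,.0f}")
print(f"Mean without outlier: ${mean_without_outlier:,.0f}")
```

One expensive house nearly doubles the mean, which is why the mean alone can mislead in a dataset this small.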
Using the median gives us a much better idea of the average price. The median is the middle value once the dataset is sorted. Here, the median is $300 000. If there were six values in this dataset, the median would be the mean of the third and fourth values.
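The same check for the median might look like the sketch below. The sixth price of $350 000 is a made-up value, added only to illustrate the even-sized case.

```python
from statistics import median

# Asking prices from the five-house example
asking_prices = [250_000, 275_000, 300_000, 325_000, 1_500_000]

# Odd number of values: the median is the middle value after sorting
median_price = median(asking_prices)

# Hypothetical sixth house, not part of the original example
six_prices = asking_prices + [350_000]

# Even number of values: the median is the mean of the two middle values
median_of_six = median(six_prices)

print(f"Median of five houses: ${median_price:,.0f}")
print(f"Median of six houses:  ${median_of_six:,.0f}")
```

Note that statistics.median sorts the data internally, so the input does not need to be in order.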
But not all data is quantitative. What if you’re working with nominal or ordinal values? These values are categories rather than numbers, but they still convey information. Here, the most useful average is usually the mode. Simply put, it’s the most common value in your set.
Suppose you have 5 black sweaters, 3 brown sweaters, and 2 yellow sweaters. If you were to select one at random, black would be the most likely color. Black is the mode of this sweater set.
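For categorical data like this, a short sketch with the standard library could look as follows. The list simply encodes the sweater counts from the example, and the variable names are illustrative.

```python
from collections import Counter
from statistics import mode

# 5 black, 3 brown, and 2 yellow sweaters
sweaters = ["black"] * 5 + ["brown"] * 3 + ["yellow"] * 2

# The mode is the most common value in the set
most_common_color = mode(sweaters)

# Counting each color shows why: black makes up half of the set
color_counts = Counter(sweaters)
black_share = color_counts["black"] / len(sweaters)

print(most_common_color)
print(color_counts)
print(black_share)
```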
Finally, the range helps describe the spread of values in your dataset. For our housing prices dataset, our range would be the difference between the highest and the lowest value: $1 250 000.
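As a quick check, the range can be computed from the highest and lowest asking prices. This is only a sketch with an illustrative variable name.

```python
# Asking prices from the five-house example
asking_prices = [250_000, 275_000, 300_000, 325_000, 1_500_000]

# Range = highest value minus lowest value
price_range = max(asking_prices) - min(asking_prices)

print(f"Range: ${price_range:,}")
```

Because the range depends only on the two extreme values, the single expensive house accounts for almost all of it.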
More Ways to Describe the Spread of Your Data
How do you know that your dataset is reliable? You might need to take more measurements or account for extraneous variables.
When you’re analyzing an experiment, standard deviation (SD) and variance can quickly tell you if something is off. Standard deviation measures roughly how far, on average, each value lies from the mean.
To calculate the sample standard deviation, s, first find the distance of each value from the mean. Square these distances and add them together. Divide this sum by N – 1, which is your sample size minus one. Finally, take the square root of the result to get your SD. The population standard deviation is calculated the same way, except the denominator is simply N. Variance is another measure of spread: it is the square of the standard deviation.
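The steps above can be followed by hand in a few lines of Python. This is a sketch that reuses the house prices from earlier as the data. The helper name spread_by_hand is made up for illustration, and the results are compared against the standard library's stdev and pstdev functions as a sanity check.

```python
import math
from statistics import mean, pstdev, stdev

# Asking prices from the five-house example
asking_prices = [250_000, 275_000, 300_000, 325_000, 1_500_000]


def spread_by_hand(values, sample=True):
    """Return (variance, standard deviation) following the steps in the text."""
    n = len(values)
    avg = mean(values)
    # Distance of each value from the mean, squared and added together
    squared_distances = sum((value - avg) ** 2 for value in values)
    # Sample SD divides by N - 1; population SD divides by N
    denominator = n - 1 if sample else n
    variance = squared_distances / denominator
    # The standard deviation is the square root of the variance
    return variance, math.sqrt(variance)


sample_variance, sample_sd = spread_by_hand(asking_prices, sample=True)
population_variance, population_sd = spread_by_hand(asking_prices, sample=False)

# Compare with the standard library implementations
print(math.isclose(sample_sd, stdev(asking_prices)))
print(math.isclose(population_sd, pstdev(asking_prices)))
```

Dividing by N – 1 instead of N always gives a slightly larger result, and the gap is most noticeable in small datasets like this one.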