The beautiful bell curve. Oh, how society loves to model everything on this “imaginary curve”. From salaries to Leaving Certificate grades, we’ve all heard of at least one use for this notorious distribution.
The bell curve does what it says on the tin - depicts a curve in the shape of a bell. A lovely grand church bell. To us statisticians, the common “bell curve” is the mother of all statistical distributions and one of the primary foundations to understanding data analysis. We like to call it the Normal Distribution.
There are many properties of the normal distribution that I find particularly intriguing. To start us off, did you know that the mode, median and mean all lie in the same place? The mode - the most common value within the data. The median - the value that lies smack in the middle of all the numerical data if you were to organise them from lowest to highest value.
Lastly, the mean - your bog-standard, typical “average” that we learn in primary school. Each of those values lies in the centre of the distribution. If you were to draw a vertical dotted line from the peak of the distribution down to the number line that the curve is being modelled on, you would find that the mode, median and mean all lie on that line. This value is denoted by an unusual-looking “u” with a tail: μ. We call this “mu”, pronounced “mew”. Cool.
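If you'd like to see the three values land in the same spot for yourself, here is a minimal Python sketch. The mean of 100 and standard deviation of 15 are made-up numbers chosen purely for illustration, and the "mode" is estimated by scanning a grid of values for the highest point on the curve, which is just one simple way of doing it.

```python
import random
import statistics

MU, SIGMA = 100.0, 15.0  # hypothetical values, chosen only for illustration

dist = statistics.NormalDist(mu=MU, sigma=SIGMA)

# Draw a large seeded sample so the run is repeatable.
rng = random.Random(42)
sample = [rng.gauss(MU, SIGMA) for _ in range(100_000)]

sample_mean = statistics.mean(sample)
sample_median = statistics.median(sample)

# The mode is where the curve peaks: scan a grid of candidate values
# (from MU - 60 to MU + 60 in steps of 0.1) and keep the one with the
# highest density.
grid = [MU + i * 0.1 for i in range(-600, 601)]
peak = max(grid, key=dist.pdf)

print(f"sample mean   = {sample_mean:.2f}")
print(f"sample median = {sample_median:.2f}")
print(f"curve peak    = {peak:.2f}")
```

The sample mean and median will not be exactly 100 because the sample is random, but with this many points they should both sit very close to the peak of the curve.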
The data collected is dispersed beneath our bell curve and we can measure the variation between the data points by means of a term called standard deviation. Think of standard deviations as steps:
One standard deviation is one step away from our “mew”, either left or right.
Two standard deviations are two steps away from our “mew”, either left or right.
Similarly for three standard deviations... you get the idea.
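The "steps" idea above can be sketched in a couple of lines of Python: subtract the mean from a value and divide by the standard deviation. The function name `steps_from_mean` and the height figures (mean 170 cm, standard deviation 10 cm) are my own made-up assumptions for the sake of the example.

```python
def steps_from_mean(x, mu, sigma):
    """Number of standard deviations between x and the mean.

    A negative result means x is to the left of the mean,
    a positive result means it is to the right.
    """
    return (x - mu) / sigma


MU, SIGMA = 170.0, 10.0  # hypothetical heights in cm, for illustration only

for height in (160.0, 170.0, 190.0):
    print(f"{height} cm is {steps_from_mean(height, MU, SIGMA):+.1f} steps from the mean")
```

With these made-up numbers, 160 cm is one step to the left, 170 cm is right on “mew”, and 190 cm is two steps to the right.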
What is fascinating about this property is that a certain percentage of our data lies within each “step” we take in both directions from the mean, μ:
The total percentage of data that lies within one standard deviation either side of “mew” is roughly 68.2%: about 34.1% to the left and 34.1% to the right.
95.4% of the data lies within two standard deviations of μ, 47.7% on either side.
Finally, 99.7% is within three standard deviations.
You best believe we have another fancy word for this property - it’s called the Empirical Rule. Very cool.
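You don't have to take these percentages on faith. For a normal distribution, the share of data within k standard deviations of the mean is erf(k/√2), where erf is the error function from Python's `math` module. The helper name `share_within` is my own; treat this as a quick sketch rather than the only way to check the rule.

```python
import math


def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.2%}")
```

This prints about 68.27%, 95.45% and 99.73%, which is where the rounded figures quoted above come from.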
I hope you’ve noticed by now that less data lies at the tails of our normal distribution. When I say “tails”, I refer to the far right and far left of our curve. How does this tie into topics we’ve discussed so far, you ask? Well, those small “tails” on either end are how we calculate our p-value. Remember when we discussed data lying within an interval of 95% and there was the small forgotten piece of 2.5% on either side?
This all referred to a bell-shaped curve distribution and data lying within a given number of standard deviations of the mean. Less data lies on either end, as we have clearly noticed, and we often analyse the probability of data lying within these smaller intervals. Look at us coming full circle!
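To put a number on those tails, here is a small sketch using `NormalDist` from Python's `statistics` module. It finds how many standard deviations you need to walk from the mean to leave exactly 2.5% in each tail, then adds up both tails. The helper `two_sided_tail` is a name I've made up, and adding both tails together is one common way of getting a p-value for a standardised value, not the only one.

```python
from statistics import NormalDist

standard = NormalDist()  # mean 0, standard deviation 1

# Cut-off that leaves 2.5% in the right tail (and, by symmetry, 2.5% in the left).
cutoff = standard.inv_cdf(0.975)


def two_sided_tail(z):
    """Probability of landing at least |z| standard deviations from the mean, on either side."""
    return 2 * (1 - standard.cdf(abs(z)))


print(f"cut-off: {cutoff:.2f} standard deviations")
print(f"both tails beyond the cut-off: {two_sided_tail(cutoff):.3f}")
```

The cut-off comes out at about 1.96 standard deviations, just shy of two full steps, and the two tails together hold the remaining 5%.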
The normal distribution models various types of natural phenomena: heights, blood pressure, IQ. It does it all. When collecting data, it may be worth considering whether the normal distribution can be applied, as it will make your life a WHOLE lot easier when conducting your analysis. Very very cool.
My name is Saoirse Trought and I am a Mathematical Sciences student at University College Cork, Ireland. Besides my obvious interest in all things maths, I happen to be fluent as Gaeilge. I enjoy sailing, staycation-ing and attempting to run UCC MathSoc. I'm looking forward to combining some of these interests with statistics and seeing where it takes us!