Correlation ≠ Causation

Cheese is very dangerous. Did you know that per capita cheese consumption is correlated with the number of people who die by getting tangled in their bedsheets? The correlation coefficient, after all, is almost 0.9. But wait a minute, that doesn’t really make much sense.

Eating cheese and getting tangled in bedsheets aren’t really related. While there is a correlation between these two values, it is unlikely to be meaningful.

As humans, we always look for causal factors behind the events in our lives, and this carries over to the way we read statistics. You’ve probably heard the phrase “correlation does not equal causation.” But when we’re presented with particularly juicy data, we might forget this simple rule.

Bogus Correlations

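When you’re conducting a large study that measures many different outcome variables, you’ll almost certainly find some correlation simply by chance: the more pairs of variables you compare, the more likely it is that at least one pair lines up by accident. Bad research practices include manipulating your data until you find the results that you like. Don’t do it, folks!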
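To get a feel for how easily chance produces a "finding," here is a minimal Python sketch. It is an illustration under made-up assumptions, not data from any real study: the variable count, sample size, and seed are arbitrary, and `pearson` and `strongest_chance_correlation` are hypothetical helper names. It generates variables that are pure random noise and reports the strongest correlation found among all pairs.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def strongest_chance_correlation(n_variables=40, n_samples=10, seed=0):
    """Generate unrelated noise variables and return the largest |r| of any pair."""
    rng = random.Random(seed)
    # Every variable is independent random noise: no real relationships exist.
    data = [[rng.gauss(0, 1) for _ in range(n_samples)]
            for _ in range(n_variables)]
    # 40 variables give 40 * 39 / 2 = 780 pairs to "test".
    return max(abs(pearson(a, b)) for a, b in combinations(data, 2))


print(f"strongest |r| among unrelated pairs: {strongest_chance_correlation():.2f}")
```

With only ten observations per variable and hundreds of pairs to pick from, at least one pair will look impressively correlated even though nothing is related to anything. Reporting only that pair is exactly the practice to avoid.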

You can see where this gets problematic. If I want to prove an association between two factors, all I need to do is play with the data until one appears. Psychologist Daryl Bem showed just how important a good understanding of statistics is when he published several misleading papers that “proved” psychics exist.

There are, however, cases where your correlations aren’t bogus, but they aren’t causal either.


Other Explanations for Correlations

There are other explanations for a correlation between two variables. They might be correlated because they are different scales for measuring the same outcome. You can check this with Cronbach’s alpha, a measure of how consistently a set of scales measures the same thing.
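For the curious, Cronbach’s alpha can be computed by hand from the variance of each scale and the variance of the summed scores. The sketch below uses the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the survey scores are invented for illustration.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k scales.

    `items` is a list of k lists; each inner list holds one score per
    respondent, in the same respondent order.
    """
    k = len(items)
    # Sum of the sample variances of the individual scales.
    item_variance_sum = sum(variance(item) for item in items)
    # Variance of each respondent's total score across all scales.
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical data: three scales answered by five respondents.
scale_a = [2, 4, 3, 5, 1]
scale_b = [3, 5, 3, 4, 2]
scale_c = [2, 5, 4, 5, 1]

print(f"alpha = {cronbach_alpha([scale_a, scale_b, scale_c]):.2f}")
```

For these made-up scores alpha comes out at about 0.95, which suggests the three scales are largely measuring the same thing. In practice you would probably use a statistics package rather than rolling your own.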

In another case, there might be a hidden, or confounding, third variable that causes the correlation. The variables A and B might be highly correlated only because both are caused by the same hidden variable.
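A small simulation makes the hidden-variable case concrete. In this sketch (the coefficients, sample size, seed, and helper names are all assumptions made for illustration), a hidden variable Z drives both A and B, and A and B never influence each other. The raw correlation between A and B is strong, but the partial correlation, which adjusts for Z, is close to zero.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def partial_correlation(a, b, z):
    """Correlation between a and b after adjusting for z (first-order partial)."""
    r_ab, r_az, r_bz = pearson(a, b), pearson(a, z), pearson(b, z)
    return (r_ab - r_az * r_bz) / sqrt((1 - r_az ** 2) * (1 - r_bz ** 2))


rng = random.Random(42)
# Z is the hidden variable; A and B each depend on Z plus their own noise.
z = [rng.gauss(0, 1) for _ in range(2000)]
a = [2.0 * zi + rng.gauss(0, 1) for zi in z]
b = [1.5 * zi + rng.gauss(0, 1) for zi in z]

print(f"raw correlation of A and B:     {pearson(a, b):.2f}")
print(f"partial correlation given Z:    {partial_correlation(a, b, z):.2f}")
```

The catch in real studies is that you can only adjust for a confounder you have thought to measure.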

From a statistical standpoint, it is extremely difficult (and sometimes impossible) to convincingly prove causation. Nonetheless, there will be cases, rare as they may be, where one variable does have a causal relationship with the other.

Remember that oftentimes, one variable is influenced by many different independent factors. Some statisticians and scientists are moving to a model of proportionality, where they ask what proportion of a measurement is explained by an independent variable.
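One common way to express "proportion explained" is the squared correlation coefficient, r², which in a simple linear setting is the share of the outcome's variance accounted for by one variable. Reading the proportionality idea this way is an assumption on my part, and the sketch below is purely illustrative: the outcome is built from three independent made-up factors of equal strength, so each should explain roughly a third of it. `variance_shares` is a hypothetical helper name.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def variance_shares(n_samples=5000, seed=7):
    """Return r squared between each of three independent factors and their sum."""
    rng = random.Random(seed)
    factors = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(3)]
    # The outcome is simply the sum of the three factors.
    outcome = [sum(values) for values in zip(*factors)]
    return [pearson(factor, outcome) ** 2 for factor in factors]


for i, share in enumerate(variance_shares(), start=1):
    print(f"factor {i} explains about {share:.0%} of the variance")
```

Each factor genuinely causes the outcome here, yet each has a correlation of only about 0.58 with it. The shares add up neatly only because the factors are independent; with correlated factors the split is no longer clean.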



Be very cautious and conservative when you describe correlational data and studies. Very seldom will you collect enough data and run enough analysis to prove a causal link. A strong correlation points to one of a few different scenarios, each of which tells you something interesting about your data:

  1. Your dependent variables are interrelated and may be different scales for measuring the same phenomenon
  2. Your correlation might be spurious if the two variables have nothing to do with each other
  3. There is a hidden third variable or confounding factor that is driving both variables

Importantly, remember that correlation usually doesn’t equal causation, but sometimes it does. Causation also does not require a linear correlation: one variable can fully determine another while their correlation coefficient is zero.
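That last point is easy to check with a toy example. In the sketch below (made-up numbers, with `pearson` as a hypothetical helper), y is completely determined by x through y = x², yet because the relationship is symmetric rather than linear, the Pearson correlation is zero.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]  # y depends entirely on x

print(f"r = {pearson(x, y):.2f}")
```

A scatter plot would show the U shape immediately, which is a good reason to look at your data before trusting a single coefficient.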