The Most Misunderstood Part of Statistics: Experimental vs Null Hypothesis

Whenever a science story hits the news, we often hear that something is statistically significant. For instance, “eating an apple a day has a significant effect on keeping the doctor away”. This is the crux of many experiments: determining whether there is a difference between the means of two independently sampled groups. This is commonly assessed with the P-value (do not worry if you have never heard of it, we will explain it step-by-step).

What exactly are we testing, and what does it mean when we have a non-significant P-value?

 

The Null and Experimental Hypotheses

In any experiment, there are two competing hypotheses. The experimental hypothesis (H1) states that there is a difference between the means of the two groups. The null hypothesis (H0) states that there is no difference between the two groups. A P-value provides us with information about how compatible our data are with the null hypothesis.

We first pick our false-positive error rate, alpha, which is the threshold the P-value must fall below for a result to be called significant. Commonly, this is set at 0.05 or 0.01.
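
To see what alpha means in practice, here is a minimal simulation sketch. It assumes normally distributed data with a known standard deviation of 1 and uses a simple two-sample z-test; the function names, sample sizes and seed are illustrative choices, not part of any particular study. Both groups are drawn from the same population, so every significant result is a false positive, and the share of such results should land close to alpha.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(group_a, group_b, sigma=1.0):
    """Two-sided P-value for a difference in means, assuming a known
    standard deviation `sigma` in both groups (a simplifying assumption)."""
    n_a, n_b = len(group_a), len(group_b)
    standard_error = sigma * (1 / n_a + 1 / n_b) ** 0.5
    z = (fmean(group_a) - fmean(group_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(alpha=0.05, n_per_group=30, n_experiments=2000, seed=1):
    """Share of simulated experiments called significant when H0 is true."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_experiments):
        # Both groups come from the same population, so H0 holds by construction.
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if z_test_p_value(group_a, group_b) < alpha:
            false_positives += 1
    return false_positives / n_experiments


rate = false_positive_rate()
print(f"Share of significant results with no real difference: {rate:.3f}")
```

Lowering alpha to 0.01 in the call above should shrink that share accordingly, at the cost of making real differences harder to detect.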

In recent years, misuse of the P-value has been recognized as a source of error in many studies across the psychological sciences. We often make mistakes when we try to explain, understand, or use the sacred P-value. As you may be aware, this is a problem.

Almost every field in the sciences and social sciences tests hypotheses and arrives at P-values for interpretation. However, there is a lot of nuance and subtlety to unravelling the meaning that a P-value holds. Does the P-value tell us about the magnitude of the effect? Or the probability that there is a difference between the groups? Can it confirm an experimental hypothesis?

 

Definition – Let’s Get Technical

The definition of the P-value is as follows: The probability of the observed result, plus more extreme results, if the null hypothesis were true.

The null hypothesis (often denoted H0) asserts that there is no difference in the mean or rank order of a variable sampled independently from two groups. These groups might be a treatment group and a control group, for example. Individuals within these groups are sampled from a larger pool, called the population.

Thus, the P-value tells us the chances of obtaining a test result at least as extreme as the one we observed if there is no true difference between the two groups.
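
One way to make this definition concrete is a permutation test, sketched below. This is only one of several ways to obtain a P-value, and the measurements are made-up numbers for illustration. If the null hypothesis were true, the group labels would be interchangeable, so we shuffle the labels many times and count how often the shuffled difference in means is at least as extreme as the one we actually observed.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a two-sided P-value for a difference in means by shuffling labels."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under H0 the labels carry no information, so any relabelling is equally likely.
        rng.shuffle(pooled)
        shuffled = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if shuffled >= observed:
            at_least_as_extreme += 1
    # Count the observed labelling itself so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical measurements, invented for this example.
treatment = [5.1, 6.0, 5.8, 6.3, 5.5, 6.1, 5.9, 6.4]
control = [4.8, 5.2, 5.0, 5.6, 4.9, 5.3, 5.1, 5.4]

p_value = permutation_p_value(treatment, control)
print(f"Estimated P-value: {p_value:.4f}")
```

Notice that the whole calculation takes place in a world where the null hypothesis is true. The result describes how surprising our data would be in that world, and nothing more.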

The P-value does not tell us the probability that the null hypothesis is true.

A P-value of 0.03 does not mean that there is a 3% chance that the null hypothesis is true. Recall that the statistical test is conducted upon the assumption that the null hypothesis is already true! It follows that the P-value does not tell us the probability that a significant result is a Type-I (false-positive) error. After all, how can we conclude a false positive when we assume that H0 is true?
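
A small piece of arithmetic shows how far apart these two quantities can be. The numbers below are purely hypothetical: we assume 1,000 experiments, of which 90% test an effect that does not exist, with alpha at 0.05 and an 80% chance of detecting an effect when it is real. None of these figures come from real data.

```python
def share_of_true_nulls_among_significant(n_tests, share_null, alpha, power):
    """Among significant results, the share where H0 was actually true.

    All inputs are assumptions supplied by the caller; in real research the
    share of true nulls and the power are usually unknown.
    """
    n_null = n_tests * share_null
    n_real = n_tests - n_null
    false_positives = n_null * alpha  # true nulls that cross the threshold by chance
    true_positives = n_real * power   # real effects that are detected
    return false_positives / (false_positives + true_positives)


share = share_of_true_nulls_among_significant(
    n_tests=1000, share_null=0.9, alpha=0.05, power=0.8
)
print(f"Share of significant results where H0 was true: {share:.2f}")
```

Under these assumptions, 45 of the 125 significant results (36%) come from experiments where the null hypothesis was true, even though every one of them has a P-value below 0.05. The answer depends on how many tested effects are real to begin with, which is information the P-value does not contain.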

If our P-value is higher than the threshold we set, it does not mean there is no real difference between the groups.

Recall that we are sampling values from a population. With any sample, it is always possible to reach a non-significant result, especially if the sampled groups are small. It is also possible that we are applying assumptions or statistical tests incorrectly and so not reaching the appropriate conclusions. Therefore, if we do not see a significant difference after testing, it does not mean there is no difference between our groups.

Additionally, two studies running the same experiment may obtain different P-values because they draw different samples. Even if one study shows significance while another does not, it is inaccurate to say that these two studies disagree.
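
The simulation sketch below illustrates both points. It assumes a real difference of half a standard deviation between the groups, normally distributed data with a known standard deviation of 1, and a simple z-test; the effect size, sample sizes and seed are arbitrary choices for illustration. The same experiment is repeated many times and we record how often it comes out significant.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(group_a, group_b, sigma=1.0):
    """Two-sided P-value for a difference in means, assuming a known
    standard deviation `sigma` in both groups (a simplifying assumption)."""
    n_a, n_b = len(group_a), len(group_b)
    standard_error = sigma * (1 / n_a + 1 / n_b) ** 0.5
    z = (fmean(group_a) - fmean(group_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def share_significant(true_difference, n_per_group, alpha=0.05, n_studies=2000, seed=2):
    """Share of repeated studies that reach significance for a fixed real effect."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_studies):
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        # The treatment group really is shifted, so H0 is false in every study.
        treatment = [rng.gauss(true_difference, 1) for _ in range(n_per_group)]
        if z_test_p_value(treatment, control) < alpha:
            significant += 1
    return significant / n_studies


small_studies = share_significant(true_difference=0.5, n_per_group=10)
large_studies = share_significant(true_difference=0.5, n_per_group=100)
print(f"Significant with 10 per group:  {small_studies:.2f}")
print(f"Significant with 100 per group: {large_studies:.2f}")
```

Every simulated study here examines the same real effect, yet with only 10 participants per group most of them fail to reach significance. A significant study and a non-significant study can therefore both be perfectly consistent with the same underlying difference.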

 

Statistically significant results are not necessarily meaningful or important

A statistical test alone cannot tell us what the data mean. If we conduct many statistical tests without correcting for multiple comparisons, we are prone to bogus correlations. A significant P-value does not tell us anything about the magnitude of the effect, or about the confidence we can have in this effect. When possible, a P-value should be put in context by considering the experimental question, the effect size and the confidence interval.
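
As a rough sketch of what that context can look like, the example below reports an effect size (Cohen's d) and a confidence interval alongside the P-value. It uses simulated data with a deliberately tiny real difference and very large groups, plus a large-sample normal approximation; the function name, sample sizes and seed are illustrative assumptions.

```python
import random
from statistics import NormalDist, fmean, variance


def summarize_difference(group_a, group_b, confidence=0.95):
    """Difference in means with Cohen's d, a confidence interval and a
    two-sided P-value, using a large-sample normal approximation."""
    n_a, n_b = len(group_a), len(group_b)
    difference = fmean(group_a) - fmean(group_b)
    var_a, var_b = variance(group_a), variance(group_b)
    # Pooled standard deviation, used to express the difference in SD units.
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    standard_error = (var_a / n_a + var_b / n_b) ** 0.5
    z_critical = NormalDist().inv_cdf(0.5 + confidence / 2)
    return {
        "difference": difference,
        "cohens_d": difference / pooled_sd,
        "ci": (difference - z_critical * standard_error,
               difference + z_critical * standard_error),
        "p_value": 2 * (1 - NormalDist().cdf(abs(difference / standard_error))),
    }


# Simulated data: a tiny real difference (0.08 SD) measured in very large groups.
rng = random.Random(3)
treatment = [rng.gauss(0.08, 1) for _ in range(20000)]
control = [rng.gauss(0.0, 1) for _ in range(20000)]

summary = summarize_difference(treatment, control)
print(f"P-value:   {summary['p_value']:.2g}")
print(f"Cohen's d: {summary['cohens_d']:.3f}")
print(f"95% CI:    ({summary['ci'][0]:.3f}, {summary['ci'][1]:.3f})")
```

With groups this large, the P-value falls far below 0.05, yet Cohen's d stays well under 0.2, the value conventionally described as a small effect. Whether a difference of that size matters is a question for the experimental context, not for the statistical test.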