# Sample Size

Whenever scientists want to test a null hypothesis, they need to consider their sample size, that is, how many individuals or samples will be measured per group. The sample size can even determine which statistical tests you can use with your data and how confident you can be in the results.

It is rarely possible to measure everyone in a population, so scientists and experimenters need to sample enough individuals to account for individual variability. If we compare height across Ireland and the USA using only a small sample, our results are unlikely to be representative of either population.

Imagine a scenario where you take a random sample of three people in each country. In the USA, you happen upon three people who are taller than two meters. Meanwhile, in Ireland, the first three people you find are closer to one and a half meters.

You could conclude that the Irish are shorter than Americans, but of course, you would be wrong. The more people we sample from a population, the more closely the sample values approximate those of the entire population. Even without advanced knowledge of statistics, it is clear that a sample size of three is too low here.
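The effect of sample size on this kind of mistake can be sketched with a small simulation. The numbers here are assumptions for illustration only: both groups are drawn from the same hypothetical height distribution (mean 170 cm, standard deviation 10 cm), so any difference between the two sample means is pure chance. The function name `mean_difference_spread` is likewise made up for this example.

```python
import random
import statistics

def mean_difference_spread(n, trials=1000, mean_cm=170.0, sd_cm=10.0, seed=0):
    """Standard deviation of the difference between two sample means when
    both groups come from the SAME hypothetical height distribution."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        group_a = [rng.gauss(mean_cm, sd_cm) for _ in range(n)]
        group_b = [rng.gauss(mean_cm, sd_cm) for _ in range(n)]
        diffs.append(statistics.fmean(group_a) - statistics.fmean(group_b))
    return statistics.stdev(diffs)

# Compare how large a purely accidental difference tends to be.
spread_small = mean_difference_spread(n=3)
spread_large = mean_difference_spread(n=300)
print(f"n=3:   typical chance difference ~ {spread_small:.1f} cm")
print(f"n=300: typical chance difference ~ {spread_large:.1f} cm")
```

Under these assumptions, theory predicts a spread of `sd_cm * sqrt(2 / n)`, which is about 8 cm for three people per group and under 1 cm for three hundred. With only three people per group, an apparent height gap of several centimeters can easily appear even when the populations are identical.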

## Technical and Biological Replicates

Calculating a sample size isn’t always so straightforward. If we are looking at gene expression from three individuals, we could take three blood samples from each of them. If we have three individuals, and nine total blood samples, what is our sample size?

In this case, we have technical replicates of the blood. These replicates are not independent of each other because they all come from the same person. They cannot give us good information about differences between groups or populations. However, they do give us a good idea of how consistent our experimental measurements are.
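One common way to handle the example above is to average the technical replicates so that each individual contributes a single value. The sketch below uses made-up expression values and hypothetical names (`measurements`, `person_A`, and so on); it only illustrates the bookkeeping, not a full analysis.

```python
import statistics

# Hypothetical expression values: 3 individuals x 3 blood samples each.
measurements = {
    "person_A": [10.1, 9.9, 10.0],
    "person_B": [12.2, 11.8, 12.0],
    "person_C": [8.1, 7.9, 8.0],
}

# Average the technical replicates so each individual contributes one value.
per_individual = {
    name: statistics.fmean(values) for name, values in measurements.items()
}

# The spread within one person reflects measurement consistency,
# not differences between people.
technical_sd = {
    name: statistics.stdev(values) for name, values in measurements.items()
}

n_measurements = sum(len(values) for values in measurements.values())
n_independent = len(per_individual)
print(n_measurements, n_independent)  # 9 3
```

Nine measurements were taken, but only three independent individuals were sampled, so the sample size for comparing groups is three.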

To determine our sample size, we must look at biological replicates, that is, the number of independent samples from which we measure dependent outcomes. But what if we have two extremely similar samples, such as those from identical twins or triplets? We can calculate the correlation between these samples and use it to calculate the effective sample size.

For twins with a correlation of 0.85 for their genetic similarity, we can use the following formula:

Plugging in the numbers, this would equal around 1.18. These two twins would therefore count as 1.18 effective samples.
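As a rough illustration of how correlation shrinks the effective sample size, here is a minimal sketch using one commonly used formula for `n` equally correlated samples, `n_eff = n / (1 + (n - 1) * rho)`. This formula is an assumption and is not necessarily the one intended in the text; the function name `effective_sample_size` is also hypothetical.

```python
def effective_sample_size(n, rho):
    """Effective number of independent samples among n equally correlated ones.

    Assumed formula (not taken from the text): n / (1 + (n - 1) * rho).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be between 0 and 1")
    return n / (1 + (n - 1) * rho)

# Two twins with a correlation of 0.85.
twins = effective_sample_size(2, 0.85)
print(f"{twins:.2f}")  # 1.08
```

Under this assumed formula, the two twins count as roughly 1.08 effective samples, slightly lower than the figure quoted above, so the original calculation may rest on a different formula. Either way, the conclusion is the same: two highly correlated samples contribute little more information than one. At the extremes, a correlation of 0 leaves both samples fully independent (2 effective samples) and a correlation of 1 reduces them to a single sample.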

## Takeaway

Friends do not let friends use low sample sizes. In many cases, a low sample size limits the conclusions you are able to draw. For example, in gene sequencing studies, a minimum number of samples is required to detect true group differences.

When your sample is too small, you do not capture the variation that exists within the population. Instead, you are more likely to pick up chance differences between individuals and end up with an unreliable result. By increasing the sample size, we can get more accurate information about any differences that might exist within the population.