Chi-Squared Test

You and your friend bought a pizza with thirteen pieces. After demolishing six pieces each, you both rush for the thirteenth. However, as researchers and pizza connoisseurs, you decide to settle things peacefully. Your friend takes out a coin to flip: heads, the last piece is yours; tails, your friend gets it.

You lose the coin toss, again. For whatever reason, your friend has won the coin flip fifteen times in a row. Meanwhile, the last ten times you flipped a coin, you won only half the time. You begin to suspect a conspiracy is afoot. Luckily, you can use the Chi-Squared test to check whether something fishy is going on.

The Chi-Squared test measures how far the observed frequencies of a categorical variable deviate from the frequencies you would expect. It can also test whether two categorical variables are related to or independent of one another. In this case, you can use it to figure out whether your friend is using a trick coin.

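You ask your friend to toss the coin fifty times, and you record which side it lands on. One side comes up 41 times and the other only 9 times, whereas a fair coin would be expected to land on each side about 25 times.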

Our Chi-Squared statistic is calculated by taking, for each outcome, the squared difference between the observed and expected counts, dividing it by the expected count, and summing the results. Here this would equal the following:

$$\chi^2 = \frac{(41 - 25)^2}{25} + \frac{(9 - 25)^2}{25} = 20.48$$
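The arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the function name `chi_squared_statistic` is made up for illustration, and the expected counts assume a fair coin, which is the null hypothesis here.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Counts from the fifty tosses: 41 on one side, 9 on the other.
observed = [41, 9]
# Assumption: a fair coin lands on each side half the time.
expected = [sum(observed) / 2] * 2

statistic = chi_squared_statistic(observed, expected)
print(round(statistic, 2))  # 20.48
```

Each side contributes 256 / 25 = 10.24 to the total, which is why a lopsided result in either direction pushes the statistic up.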

Then we can take a look at the Chi-Squared distribution with an alpha level of 0.05. We have 1 degree of freedom, since there are two categories of outcomes and the degrees of freedom equal the number of categories minus one. Because the statistic is built from squared differences, a deviation in either direction makes it larger, so we only need to look at the upper tail of the distribution. The critical value for 1 degree of freedom at an alpha level of 0.05 is 3.841.

Since our observed value of 20.48 is far above this critical value, we reject the null hypothesis that the coin is fair. Congratulations, you now have strong statistical evidence that your friend has been cheating you out of pizza. Next time, you’ll make sure to flip the coin.
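The decision step can also be sketched with the Python standard library alone. The sketch leans on one fact: a Chi-Squared variable with 1 degree of freedom is the square of a standard normal variable, so both the cutoff and the p-value follow from the normal distribution. The helper names `chi2_critical_df1` and `chi2_sf_df1` are hypothetical, they only work for 1 degree of freedom, and the cutoff used is the upper-tail critical value at alpha = 0.05, the usual convention for this test.

```python
import math
from statistics import NormalDist


def chi2_critical_df1(alpha):
    """Upper-tail critical value for a Chi-Squared variable with 1 df."""
    # If Z is standard normal, Z^2 is Chi-Squared with 1 df, so the
    # cutoff is the square of the two-sided normal cutoff.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z ** 2


def chi2_sf_df1(x):
    """Upper-tail probability P(X > x) for a Chi-Squared variable with 1 df."""
    return math.erfc(math.sqrt(x / 2))


alpha = 0.05
statistic = 20.48  # value computed from the fifty coin tosses

critical = chi2_critical_df1(alpha)
p_value = chi2_sf_df1(statistic)
reject_null = statistic > critical

print(round(critical, 3))  # 3.841
print(reject_null)         # True
```

The p-value here comes out far below 0.05, which is the same conclusion stated in a different form: a fair coin would very rarely produce a split this lopsided.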

 

Takeaway

The Chi-Squared test is useful for testing a null hypothesis when you have categorical outcomes. It works for looking at coin tosses, levels of education, and many other variables. By comparing the counts you observe against the counts you would expect, you can determine whether your data deviate from what the null hypothesis predicts.

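Here, the expected values act as a baseline, much like outcomes from a control group: they are the counts you would see if the null hypothesis were true, and the test asks whether your observed counts differ from them by more than chance alone would explain. In spirit, it does for counts of categorical outcomes what a T-test does for means of continuous measurements. Here are some other considerations for using this test: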

  1. Are all of your dependent variables categorical?
  2. Is your alternative hypothesis one-tailed or two-tailed?
  3. What kind of error are you willing to tolerate?
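
The other use mentioned earlier, testing whether two categorical variables are independent, follows the same recipe. The only new step is that the expected counts come from the row and column totals of a table rather than from a known ratio. The sketch below is illustrative only: the function names are hypothetical, and the counts in `table` are made up for the example, not data from the coin tosses above.

```python
def expected_counts(table):
    """Expected cell counts if the row and column variables are independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def independence_statistic(table):
    """Return the Chi-Squared statistic and degrees of freedom for a table."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts. Rows: who flipped the coin (you, your friend).
# Columns: who won the toss (you, your friend).
table = [
    [30, 20],
    [20, 30],
]

statistic, dof = independence_statistic(table)
print(statistic, dof)  # 4.0 1
```

With these made-up counts every expected cell is 25, each cell contributes 1.0, and the statistic of 4.0 on 1 degree of freedom would sit just above the 3.841 cutoff at an alpha level of 0.05.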