Simpson’s Paradox (also sometimes known as the reversal paradox or Simpson’s reversal) refers to the phenomenon in which a trend or result that appears in multiple groups of data no longer appears—or in fact reverses—when the groups are combined.
This can produce surprising results, best shown through examples:
Example 1: University Grades
Suppose student A earned a higher grade average than student B in both last semester and this semester. It is still possible for student B to have the higher average across the two semesters combined.
How is this possible? Both students did much better in one semester than in the other, and student B took more classes in the better semester while student A took more classes in the worse one:
- Student A, semester 1: 4.0 across 3 classes
- Student A, semester 2: 3.0 across 5 classes
- Student B, semester 1: 3.8 across 5 classes
- Student B, semester 2: 2.8 across 3 classes
This means student A has an overall average of (4.0 x 3 + 3.0 x 5) / 8 = 3.375 and student B has an overall average of (3.8 x 5 + 2.8 x 3) / 8 = 3.425. So despite earning lower grades than student A in each semester, student B has the higher overall average, because more of student B's classes fell in the higher-scoring semester.
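The arithmetic above can be reproduced with a short sketch. The helper name `weighted_average` and the dictionary layout are illustrative choices, not part of the original example; only the grades and class counts come from the text.

```python
def weighted_average(values, weights):
    """Average of `values`, weighting each one by the matching entry in `weights`."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Semester averages and class counts taken from the example above.
student_a = {"gpa": [4.0, 3.0], "classes": [3, 5]}
student_b = {"gpa": [3.8, 2.8], "classes": [5, 3]}

overall_a = weighted_average(student_a["gpa"], student_a["classes"])
overall_b = weighted_average(student_b["gpa"], student_b["classes"])

# Student A leads in each semester, yet trails once the semesters are pooled.
print(f"Student A overall: {overall_a:.3f}")
print(f"Student B overall: {overall_b:.3f}")
```

Swapping the two class counts for either student removes the reversal, which suggests the effect comes from the weights rather than from the grades themselves.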
Example 2: Batting Averages
A well-known example by mathematician Ken Ross involved baseball batting averages for Derek Jeter and David Justice in 1995-1996.
- In 1995
  - Justice had a 0.253 batting average over 411 at bats
  - Jeter had a 0.250 batting average over 48 at bats
- In 1996
  - Justice had a 0.321 batting average over 140 at bats
  - Jeter had a 0.314 batting average over 582 at bats
- Combined over 1995 and 1996
  - Justice had a (0.253 x 411 + 0.321 x 140) / (411 + 140) = 0.270 batting average
  - Jeter had a (0.250 x 48 + 0.314 x 582) / (48 + 582) ≈ 0.310 batting average (0.309 if computed from the rounded season averages shown here)
Even though Justice had the higher batting average in each year, Jeter actually had the higher batting average over the two years!
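As a rough check, the comparison can be sketched in a few lines. The `seasons` structure and the `combined_average` helper are hypothetical names introduced for illustration. Because the inputs are season averages rounded to three decimals, the combined figures may differ in the third decimal place from figures computed from raw hit counts.

```python
# Season averages and at bats as quoted in the example above.
seasons = {
    "Justice": {1995: (0.253, 411), 1996: (0.321, 140)},
    "Jeter": {1995: (0.250, 48), 1996: (0.314, 582)},
}

def combined_average(player_seasons):
    """Combine per-season averages, weighting each season by its at bats."""
    est_hits = sum(avg * at_bats for avg, at_bats in player_seasons.values())
    total_at_bats = sum(at_bats for _, at_bats in player_seasons.values())
    return est_hits / total_at_bats

# Within each season, Justice has the higher average.
for year in (1995, 1996):
    justice_avg, _ = seasons["Justice"][year]
    jeter_avg, _ = seasons["Jeter"][year]
    print(year, "Justice ahead:", justice_avg > jeter_avg)

# Pooled over both seasons, the ordering flips.
combined = {name: combined_average(s) for name, s in seasons.items()}
print("Combined, Jeter ahead:", combined["Jeter"] > combined["Justice"])
```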
How does Simpson’s Paradox occur?
Generally speaking, you can simplify it to two conditions:
First, although the first person's statistic (Justice in Example 2) is higher than the second person's (Jeter) within each group (1995 and 1996), the first person's statistics cannot all be higher than all of the second person's: the first person's lowest value must fall below the second person's highest.
Justice’s 0.253 in 1995 was higher than Jeter’s 0.250 in 1995, but not higher than Jeter’s 0.314 in 1996.
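If both of Justice's yearly averages had been higher than both of Jeter's, no weighting of the two years could have lifted Jeter's combined average above Justice's, because a combined average always lies between a player's lowest and highest yearly averages.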
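One way to see this first condition is to try many possible weightings and check whether any of them produces a reversal. The sketch below does that on a coarse grid. The function names and the `hypothetical_first` values are invented for illustration, and a grid search is only a numerical sanity check, not a proof.

```python
def blend(low, high, share_in_high):
    """Weighted average of two group values; `share_in_high` is the weight on `high`."""
    return low * (1 - share_in_high) + high * share_in_high

# Candidate weightings: 0%, 1%, ..., 100% of observations in the higher group.
shares = [i / 100 for i in range(101)]

def reversal_possible(first, second):
    """True if some pair of weightings lets `second` overtake `first` when combined."""
    return any(
        blend(*second, s2) > blend(*first, s1)
        for s1 in shares
        for s2 in shares
    )

actual_justice = (0.253, 0.321)      # yearly averages from the example
actual_jeter = (0.250, 0.314)
hypothetical_first = (0.320, 0.330)  # hypothetical: both values exceed Jeter's best year

can_reverse_actual = reversal_possible(actual_justice, actual_jeter)
can_reverse_hypothetical = reversal_possible(hypothetical_first, actual_jeter)
print(can_reverse_actual, can_reverse_hypothetical)
```

With the real averages a reversal is reachable. With the hypothetical ones it is not, since every blend of 0.250 and 0.314 stays at or below 0.314.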
Second, relative to the second person, the first person must have a larger share of their observations in the group with the lower statistic and a smaller share in the group with the higher statistic.
This is the key. Even though the first person has higher statistics in each group (0.253 > 0.250 and 0.321 > 0.314), the combination for the first person is more heavily weighted towards its lower statistic, relative to the second person.
Justice had many more at bats in his lower year, 411, compared to his higher year, 140. Jeter was the opposite with 48 in his lower year compared to 582 in his higher year. So Jeter’s combined statistic was more heavily weighted towards his better year, and Justice was the opposite, resulting in the unintuitive Simpson’s Paradox.
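The imbalance in weighting is easier to see as shares of each player's total at bats. This is a minimal sketch using the at-bat counts quoted above. The `share_by_year` helper is a hypothetical name.

```python
# At bats per season, as quoted in the example.
at_bats = {
    "Justice": {1995: 411, 1996: 140},
    "Jeter": {1995: 48, 1996: 582},
}

def share_by_year(player_at_bats):
    """Fraction of a player's total at bats that fell in each season."""
    total = sum(player_at_bats.values())
    return {year: n / total for year, n in player_at_bats.items()}

shares = {name: share_by_year(ab) for name, ab in at_bats.items()}

for name, by_year in shares.items():
    print(name, {year: round(share, 3) for year, share in by_year.items()})
```

Roughly three quarters of Justice's at bats came in 1995, the lower-average year for both players, compared with under a tenth of Jeter's.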
If you found Simpson’s Paradox interesting, you might also appreciate a related statistical phenomenon, Berkson’s Paradox.