The Whole is Different from the Sum of its Parts

“When a whole body of data displays one trend, yet when broken into subgroups, the opposite trend comes into view for each of those subgroups.”

[Weapons of Math Destruction, Cathy O’Neil]

In statistics, Simpson’s paradox describes a trend that appears in the whole group but differs from, or even reverses, the trend seen within each of its subgroups. It shows how easily we can misread the statistical results of a data-driven model and, as a result, attribute the wrong causes to phenomena in our world.
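
To make the reversal concrete, here is a minimal sketch in Python. The counts are hypothetical, invented purely for illustration, and the names `counts`, `subgroup_rates`, and `overall_rates` are my own. Option A has the higher success rate in both subgroups, yet option B has the higher rate once the subgroups are pooled, because the two options are applied to the subgroups in very different proportions.

```python
# Hypothetical (successes, trials) counts, invented purely for illustration.
counts = {
    "A": {"small": (90, 100), "large": (300, 1000)},
    "B": {"small": (800, 1000), "large": (20, 100)},
}


def subgroup_rates(data):
    """Success rate of each option within each subgroup."""
    return {
        option: {group: s / t for group, (s, t) in groups.items()}
        for option, groups in data.items()
    }


def overall_rates(data):
    """Success rate of each option after pooling all subgroups."""
    rates = {}
    for option, groups in data.items():
        successes = sum(s for s, _ in groups.values())
        trials = sum(t for _, t in groups.values())
        rates[option] = successes / trials
    return rates


sub = subgroup_rates(counts)
whole = overall_rates(counts)

for group in ("small", "large"):
    print(f"{group}: A={sub['A'][group]:.2f}  B={sub['B'][group]:.2f}")
print(f"whole: A={whole['A']:.2f}  B={whole['B']:.2f}")
```

In this made-up data, A wins in each subgroup while B wins overall: the subgroup sizes act as hidden weights when the rates are pooled.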

We must pay particular attention when building a model to avoid this misconception. A general model often fails to predict the behavior of subgroups. Likewise, we CANNOT guarantee that a combination of subgroup-specific models will effectively predict the coarse-scale behavior of the whole system. Hence, the use of Big Data calls for a proper balance between the general and the specific.
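
The gap between a general model and subgroup-specific models can be sketched with a simple least-squares line. This is only an illustration under stated assumptions: the data points are hypothetical, and `slope` is a small helper written for this example, not a library function. Each subgroup has a falling trend, while a single line fitted to the pooled data rises.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of y against x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical data: two subgroups, each with a clearly falling trend.
groups = {
    "group_1": ([1, 2, 3], [3, 2, 1]),
    "group_2": ([6, 7, 8], [8, 7, 6]),
}

# Specific models: one slope per subgroup.
specific = {name: slope(xs, ys) for name, (xs, ys) in groups.items()}

# General model: one slope fitted to all points pooled together.
all_x = [x for xs, _ in groups.values() for x in xs]
all_y = [y for _, ys in groups.values() for y in ys]
general = slope(all_x, all_y)

# Naive combination of the specific models: the average of their slopes.
combined = sum(specific.values()) / len(specific)

print("specific slopes:", specific)
print("general slope:", round(general, 3))
print("average of specific slopes:", combined)
```

Here the general slope is positive while every specific slope is negative, so the general model gets the subgroup trend wrong, and averaging the specific slopes gets the whole-system trend wrong.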