Statistical Model: A Map is Not the Territory

“A good analogy is that a model is like a map, rather than the territory itself. And we all know that some maps are better than others: a simple one might be good enough to drive between cities, but we need something more detailed when walking through the countryside.”

[The Art of Statistics, David Spiegelhalter]

When you visit Disneyland, you do not need a detailed map built from satellite imagery. A simple cartoon map showing the relative locations of the attractions is enough. A secret agent on an investigation, by contrast, needs a far more detailed map. A statistical model (or any data-driven model) is the same. The fidelity a statistical model needs depends on the purpose it serves, and the fidelity it can reach depends on the quality of the data fed into it.

When building a statistical model, there is a general trade-off between bias and variance (the bias-variance dilemma). If we reduce the variance by making the model simpler, it may fail to capture the underlying ground truth (underfitting, high bias). If we reduce the bias by making the model more flexible, on the other hand, it becomes sensitive to noise in the data and again fails to approximate the ground truth (overfitting, high variance). This dilemma shows that we cannot build a perfect model from data. A map is a map; it is not the territory. A menu is a menu; it is not the food. A statistical model is a model; it is not the ground truth. So we should neither overestimate nor underestimate the power of a statistical model. Just as a map is still useful for finding the right path, a statistical model is still useful for understanding and predicting a system.
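
The bias-variance trade-off described above can be made concrete with a small simulation. The sketch below is only an illustration under assumptions that are not from the original text: the ground truth is a made-up sine curve, the noise level and sample size are arbitrary, and the "model" is a simple k-nearest-neighbour average. It repeatedly draws fresh noisy datasets, predicts at one point, and measures how far the average prediction is from the truth (bias) and how much the predictions scatter between datasets (variance).

```python
import math
import random
import statistics


def true_function(x):
    """Hypothetical ground truth (the 'territory'); chosen only for illustration."""
    return math.sin(2 * math.pi * x)


def sample_dataset(rng, n=30, noise_sd=0.3):
    """Draw n noisy observations of the ground truth on [0, 1)."""
    xs = [rng.random() for _ in range(n)]
    ys = [true_function(x) + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


def knn_predict(xs, ys, x0, k):
    """Predict at x0 by averaging the y-values of the k nearest training points."""
    nearest = sorted(zip(xs, ys), key=lambda point: abs(point[0] - x0))[:k]
    return statistics.fmean([y for _, y in nearest])


def bias_variance(k, x0=0.25, trials=2000, seed=0):
    """Estimate squared bias and variance of the k-NN prediction at x0.

    Each trial uses a freshly simulated dataset, so the spread of the
    predictions reflects sensitivity to the particular sample drawn.
    """
    rng = random.Random(seed)
    preds = []
    for _ in range(trials):
        xs, ys = sample_dataset(rng)
        preds.append(knn_predict(xs, ys, x0, k))
    bias_sq = (statistics.fmean(preds) - true_function(x0)) ** 2
    variance = statistics.pvariance(preds)
    return bias_sq, variance


# k=1 is a very flexible model; k=30 averages every point (a very rigid one).
for k in (1, 5, 30):
    bias_sq, variance = bias_variance(k)
    print(f"k={k:2d}  bias^2={bias_sq:.4f}  variance={variance:.4f}")
```

With these settings, the most flexible model (k=1) should show a small bias but a large variance, while the most rigid one (k=30, which ignores x entirely and averages all points) should show the reverse. Neither extreme reproduces the territory; the choice of k, like the choice of map, depends on what the model is for.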