Signal and Noise: How to Understand Data?

“In the statistical world, what we see and measure around us can be considered as the sum of a systematic mathematical idealized form plus some random contribution that cannot yet be explained.”

[The Art of Statistics, David Spiegelhalter]

Nate Silver’s well-known book, The Signal and the Noise, is about how to find the signal in the noise. Because the relative levels of signal and noise depend heavily on the quality of the data, it is very hard to separate the two perfectly. Doing so also requires prior knowledge, intuition, and experience with the data. So, every statistical model has two components: a (deterministic) mathematical formulation and a (stochastic) residual error. When we build a statistical model to analyze data, we therefore need to check what we know (the mathematical form) and what we don’t know (the randomness). The name “residual error” seems to imply a bad model, but that is not the case. Of course, a large residual error may stem from a poor choice of model, but it often stems from gaps in our knowledge, a lack of data, or the way the data were acquired.
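To make the two components concrete, here is a minimal sketch in Python. It is only an illustration under assumptions of my own: the straight-line "signal", the Gaussian "noise", the parameter values, and the helper names `simulate` and `fit_line` are all hypothetical and do not come from either book. The sketch generates data as signal plus noise, fits a line by ordinary least squares, and then splits each observation into a fitted (deterministic) part and a residual (stochastic) part.

```python
import random
import statistics


def simulate(n=200, slope=2.0, intercept=1.0, noise_sd=0.5, seed=0):
    """Generate y = intercept + slope * x + noise (all values are assumptions)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for a straight line, written out by hand."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs, ys = simulate()
slope_hat, intercept_hat = fit_line(xs, ys)

# Deterministic part: what the model "knows".
fitted = [intercept_hat + slope_hat * x for x in xs]
# Stochastic part: what the model cannot explain.
residuals = [y - f for y, f in zip(ys, fitted)]
residual_sd = statistics.stdev(residuals)

print(f"estimated signal: y = {intercept_hat:.2f} + {slope_hat:.2f} * x")
print(f"residual standard deviation: {residual_sd:.2f}")
```

In this toy setting the residual spread is not a sign of a bad model; it simply reflects the noise that was put into the data. With real data we do not know the true signal, so the residuals mix model misfit with genuine randomness.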

When we analyze data, we don’t need to build a perfect model (in fact, that is impossible because of the issues above). If we try to build an error-free model, we run into overfitting: the model fits the noise in the data at hand and ends up as a poor model with no meaningful findings. Instead, we report both the mathematical formulation and the corresponding residual error. That is all a statistician can do.

Life is the same. We need not blame ourselves too much when a life plan falls through. It may not be our mistake but the randomness in our life. If the plan failed because of our mistake, we will fail several times in a row, and then we can review what we did and adjust the plan (or our mindset). If not, randomness may let us make a successful comeback next time. So, whatever the reason a plan fails, we keep trying for success in life.