Finding the Cause from the Effect in the Age of Big Data


“Just as it would be difficult to predict where the very next drop of water is going to fall, (…). But once the water has been spraying for a while and many drops have fallen, it’s relatively easy to observe from the pattern of the drops where the lawn sprinkler is likely to be situated.”

[Hello World: Being Human in the Age of Algorithms, Hannah Fry]

In science, the study of inverse problems is a research field that extracts the hidden law (or the mathematical formula) from observations (data). That is, an inverse problem asks us to find the "cause" from the "effect". The idea is similar to profiling a serial killer in criminology: from the data on all the victims, we infer the characteristics of the killer. More victims would make the prediction more accurate, but of course we don't want more victims. So the key challenge of an inverse problem is to find an appropriate formulation from a small data set. However, as the quote suggests, it is really hard to estimate something accurately from small data. This issue has long been a bottleneck in the development of inverse problems.
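To make the sprinkler analogy concrete, here is a minimal Python sketch. It assumes, purely for illustration, that drops scatter around the sprinkler with an isotropic Gaussian spread. The forward problem generates drops from a known sprinkler position; the inverse problem recovers that position from the drops alone, here as the mean of the drop positions. The function names `simulate_drops`, `estimate_center`, and `mean_error` are hypothetical, chosen for this example.

```python
import math
import random


def simulate_drops(center, spread, n, rng):
    """Forward problem: scatter n drops around a sprinkler at `center`.

    Assumption (illustrative only): each drop lands at a Gaussian offset
    with the same standard deviation `spread` in x and y.
    """
    cx, cy = center
    return [(rng.gauss(cx, spread), rng.gauss(cy, spread)) for _ in range(n)]


def estimate_center(drops):
    """Inverse problem: estimate the sprinkler position from the drops.

    Under the assumed isotropic Gaussian spray, the sample mean is the
    maximum-likelihood estimate of the center.
    """
    n = len(drops)
    return (sum(x for x, _ in drops) / n, sum(y for _, y in drops) / n)


def mean_error(n, trials=200, center=(2.0, -1.0), spread=1.0, seed=0):
    """Average distance between the estimated and true center over many trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        ex, ey = estimate_center(simulate_drops(center, spread, n, rng))
        total += math.hypot(ex - center[0], ey - center[1])
    return total / trials


if __name__ == "__main__":
    for n in (5, 50, 500):
        print(f"{n:4d} drops -> mean error {mean_error(n):.3f}")
```

With only a few drops the estimate wanders noticeably; as the number of drops grows, the average error shrinks roughly like 1/√n. That is the small-data difficulty in miniature: the method is the same, but its accuracy depends on how much of the "effect" we get to observe.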

In the age of Big Data, on the other hand, we collect massive data sets from individuals, autonomous systems, efficient measurements, and websites, which can lead to more accurate predictions of the cause. So many people thought that massive data would make inverse problems easy to solve. That is partly true: there have been many research achievements in data-driven modeling, which finds the underlying laws or governing equations (or a black-box model) that describe cause and effect directly from data. However, the inverse problem now struggles with another issue: finding the "right" causality. In big data, improbable things happen all the time, and this may lead to wrong causal links between input and output data. For example, the correlation between two variables may stem from mere coincidence, yet the algorithm cannot distinguish such a coincidence from real causality. Hence, humans must check data-driven causal claims in a rigorous way. That is why fundamental mathematics and statistics are becoming more important in the age of Big Data.
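A small simulation can illustrate how coincidence masquerades as correlation. The Python sketch below generates many mutually independent variables (Gaussian noise, an assumption made for the example), each with only a handful of observations, and then searches every pair for the strongest Pearson correlation. Because the variables are unrelated by construction, any strong correlation it reports is pure chance. The names `standardize` and `strongest_spurious_pair` are hypothetical helpers written for this illustration.

```python
import math
import random


def standardize(xs):
    """Center to zero mean and scale to unit length, so that the dot
    product of two standardized series equals their Pearson r."""
    m = sum(xs) / len(xs)
    centered = [x - m for x in xs]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def strongest_spurious_pair(n_vars=200, n_obs=20, seed=1):
    """Draw n_vars independent series of length n_obs, then search every
    pair for the largest |Pearson r|. Returns (r, i, j).

    Since the series are independent, no pair has any real relationship;
    whatever is found is coincidence.
    """
    rng = random.Random(seed)
    series = [
        standardize([rng.gauss(0.0, 1.0) for _ in range(n_obs)])
        for _ in range(n_vars)
    ]
    best = (0.0, None, None)
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            r = sum(a * b for a, b in zip(series[i], series[j]))
            if abs(r) > abs(best[0]):
                best = (r, i, j)
    return best


if __name__ == "__main__":
    r, i, j = strongest_spurious_pair()
    print(f"variables {i} and {j}: r = {r:+.2f} "
          f"(pairs searched: {200 * 199 // 2})")
```

With 200 variables there are 19,900 pairs, so a correlation that would be rare for any single pair is very likely to show up somewhere. Adjusting for the number of comparisons (for example, a Bonferroni correction) or confirming a finding on fresh data are standard ways to guard against this kind of false discovery.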
