The Blind Man and the Lame Man in Data Science

“But combining both the flawed government data with the imperfect night light data gives a better estimate than either source alone provide.”

[Everybody Lies, Seth Stephens-Davidowitz]

In the well-known fable “The Blind Man and the Lame Man”, neither man can cross the bridge alone, but together they can: ‘A blind man carried a lame man on his back, lending him his feet and borrowing from him his eyes’. The story shows how collaboration overcomes difficulties.

In the Big Data era, the same kind of collaboration is needed to extract useful information from data. Highly accurate data is generally slow and costly to obtain, while less accurate data is cheap and fast to access. We therefore need to blend the two kinds of data (a small amount of highly accurate data and a large amount of less accurate data) to analyze them better, which leads to more reliable forecasts and predictions.
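One common way to blend two such sources is inverse-variance weighting, sketched below. This is only an illustration under stated assumptions, not a method taken from the book: both sources are assumed to be unbiased and independent, their noise levels are assumed known, and the function names (blend, simulate) and all numbers are hypothetical.

```python
import random
import statistics


def blend(mean_a, var_a, mean_b, var_b):
    """Combine two independent, unbiased estimates by inverse-variance weighting.

    Each estimate is weighted by 1/variance, so the more precise source
    counts for more. Returns the blended mean and its variance.
    """
    w_a = 1.0 / var_a
    w_b = 1.0 / var_b
    blended_mean = (w_a * mean_a + w_b * mean_b) / (w_a + w_b)
    blended_var = 1.0 / (w_a + w_b)
    return blended_mean, blended_var


def simulate(true_value=100.0, n_accurate=5, sd_accurate=2.0,
             n_cheap=500, sd_cheap=30.0, seed=0):
    """Simulate a few accurate measurements and many noisy ones, then blend.

    All parameters are made-up values for illustration only.
    """
    rng = random.Random(seed)
    accurate = [rng.gauss(true_value, sd_accurate) for _ in range(n_accurate)]
    cheap = [rng.gauss(true_value, sd_cheap) for _ in range(n_cheap)]

    # Variance of a sample mean is (noise variance) / (sample size).
    mean_a = statistics.fmean(accurate)
    var_a = sd_accurate ** 2 / n_accurate
    mean_c = statistics.fmean(cheap)
    var_c = sd_cheap ** 2 / n_cheap

    blend_mean, blend_var = blend(mean_a, var_a, mean_c, var_c)
    return {
        "mean_accurate": mean_a, "var_accurate": var_a,
        "mean_cheap": mean_c, "var_cheap": var_c,
        "blend_mean": blend_mean, "blend_var": blend_var,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the blended variance, 1 / (1/var_a + 1/var_b), is always smaller than either input variance, which is the numerical version of the two men crossing together. If either source is biased, the weighting no longer guarantees an improvement.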

Beyond Pride and Prejudice

“First, and perhaps most important, if you are going to try to use new data to revolutionize a field, it is best to go into a field where old methods are lousy.”

[Everybody Lies, Seth Stephens-Davidowitz]

Our decisions are often based on prejudice, which leads to bad outcomes. Specifically, prejudice stems from wrong or weak cause-and-effect reasoning that rests entirely on our limited experience, pseudoscience, or popular misconceptions.

Big Data helps us escape bias, reveals more reliable cause-and-effect relationships, and ultimately suggests better choices. For example, according to the book, pedigree has commonly been regarded as the primary factor in choosing a racehorse, but it is not. The data show that the size of the heart (specifically the left ventricle) is a much more important factor. We should keep in mind, however, that data science and Big Data are not always perfect. Biased or incomplete data produce their own data-driven prejudices and misconceptions.

David Hume as a Data Scientist

“Hume believed that we can’t be absolutely certain about anything that is based only on traditional beliefs, testimony, habitual relationships, or cause and effect. In short, we can rely only on what we learn from experience.”

[The Theory That Would Not Die, Sharon Bertsch McGrayne]

The empiricism put forth by David Hume holds that observation and investigation are the correct way to extend our knowledge. However, an individual's observations are often incomplete and biased because personal experience is limited. In practice, then, it is doubtful that we can gain certainty from personal experience, observation, and investigation alone.

Nowadays, in the Big Data era, we have enormous amounts of data collected from people all over the world. Integrated data can provide a less biased, shared observation of human nature. In other words, it is time to revisit Hume’s empiricism. In fact, the fundamental philosophy of machine learning rests on Hume’s empiricism, and this could bring us closer to “Truth”. If Hume were still alive, he would be a Googler.

The Intrinsicscope

“The microscope showed us there is more to a drop of pond water than we think we see. The telescope showed us there is more to the night sky than we think we see. And new, digital data now shows us there is more to human society than we think we see”

[Everybody Lies, Seth Stephens-Davidowitz]

Sometimes we see other people behave in ways we cannot comprehend and conclude that individual human behavior is too complex and random to understand. However, group behavior, unlike individual behavior, often has an intrinsic and typical pattern. That is, group behavior is not just the sum of individual behaviors.

Data science spots these patterns in Big Data and extracts the dominant factors that produce them, which enables us to understand our society more clearly. While a telescope or microscope shows us finer detail, the intrinsicscope gives us a coarse-grained view that helps us understand the complex systems in our society.