Meet a Doppelgänger in Big Data

“But what else can these searches reveal? For one thing, doppelgänger searches have been used by many of the biggest internet companies to dramatically improve their offerings and user experience.”

[Everybody Lies, Seth Stephens-Davidowitz]

There is an urban legend about the doppelgänger, a look-alike or double of a living person who is not biologically related to them: if you meet your doppelgänger, you will die. We know this is fiction, but still, nobody wants to meet their own doppelgänger.

In data science, however, meeting a doppelgänger helps us understand and predict. Big Data methods often assume that similar inputs, like doppelgängers, produce similar outputs. Hence, if we can find the doppelgängers of a target, we can predict its output effectively, for example by averaging the doppelgängers' outputs, as k-nearest neighbors (kNN) does. Amazon and Netflix, for instance, figure out what you might like from your doppelgängers in their databases.
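
To make the idea concrete, here is a minimal sketch of a doppelgänger search used for prediction, in the spirit of kNN. The function names (`find_doppelgangers`, `predict`), the toy records, and the choice of Euclidean distance are all assumptions made for illustration; this is not how Amazon or Netflix actually implement their recommendations.

```python
import math

def find_doppelgangers(target, records, k=3):
    """Return the k records whose features are closest to the target.

    Each record is a (features, output) pair. Closeness is measured with
    Euclidean distance, which is an assumption of this sketch.
    """
    ranked = sorted(records, key=lambda rec: math.dist(target, rec[0]))
    return ranked[:k]

def predict(target, records, k=3):
    """Predict the target's output by averaging its doppelgangers' outputs."""
    neighbors = find_doppelgangers(target, records, k)
    return sum(output for _, output in neighbors) / len(neighbors)

# Hypothetical toy data: two features per record and one numeric output.
records = [
    ((1.0, 1.0), 10.0),
    ((1.2, 0.9), 12.0),
    ((0.9, 1.1), 11.0),
    ((5.0, 5.0), 50.0),
    ((5.2, 4.8), 52.0),
]

# The three nearest records have outputs 10, 11, and 12.
prediction = predict((1.0, 1.0), records, k=3)
print(prediction)
```

The two distant records are ignored because only the three closest look-alikes contribute to the average. In practice, the features would usually need to be scaled first so that no single one dominates the distance.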