Everything is Connected but Not Correlated

How not to be wrong

“Correlation is not transitive. … The non-transitivity of correlation is somehow obvious and mysterious at the same time.”

[How not to be wrong, Jordan Ellenberg]

In Hollywood, the Bacon number of an actress or actor is the smallest number of shared-movie links connecting that person to the actor Kevin Bacon. Surprisingly, almost all actresses and actors can be connected to Kevin Bacon within six steps, a phenomenon known as “Six Degrees of Separation” or the “Small World.” The Bacon number is modeled on the “Erdős number” in mathematics and science research, which measures collaborative distance to the mathematician Paul Erdős. (My Erdős number is 4, by the way.) What a small world, and we feel that everybody is connected!

Sometimes we confuse a correlation with a connection (or relation). Correlation is not transitive: even if A and B are strongly correlated and B and C are also correlated, nothing guarantees that A and C are correlated. However, we often assume there must be a correlation between A and C because we are used to syllogistic reasoning. Worse, when we mix up causality, correlation, and relation, the result is a disaster. So please do not assume transitivity for pairwise-correlated data. Also keep in mind that uncorrelated data can still be related to each other. We, you and I, are connected in the small world, but we may (or may not) be correlated with each other.
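To make the non-transitivity concrete, here is a minimal sketch on synthetic data. The construction is my own assumption, not an example from the book: A and C are drawn independently, and B is simply their sum. B then correlates with both A and C, while A and C stay uncorrelated.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20_000
a = [rng.gauss(0, 1) for _ in range(n)]
c = [rng.gauss(0, 1) for _ in range(n)]  # drawn independently of a
b = [x + y for x, y in zip(a, c)]        # b depends on both a and c

corr_ab = pearson(a, b)
corr_bc = pearson(b, c)
corr_ac = pearson(a, c)

print(f"corr(A, B) = {corr_ab:.2f}")
print(f"corr(B, C) = {corr_bc:.2f}")
print(f"corr(A, C) = {corr_ac:.2f}")
```

In this toy setup the first two correlations should land near 0.7 and the third near 0, so the "chain" from A to C through B carries no correlation at all.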

The Triumph of Mediocrity: Do not Stumble on Your Success

“That’s what causes regression to the mean: not a mysterious mediocrity-loving force, but the simple working of heredity intermingled with chance.”

[How not to be wrong, Jordan Ellenberg]

At the beginning of the month, I check the number of visitors and views on my blog and say: “What? So many people are coming in! My blog is ON PACE to break its monthly record!!” I am really excited about this sudden rise. At the end of the month, my eyes widen in surprise because only the average number of people visited, and there is no new record (sigh). This is “The Triumph of Mediocrity.” Data shaped by both deterministic factors and chance tend to regress to the mean.

This simple mathematical observation offers a lesson about how to live. There is no (deterministic) equation of success. Even if one exists, it has too many uncertainties for us to solve. When you achieve something you want, the success does not stem only from your skills, abilities, intelligence, and effort. Uncertainties (many people call this “luck”) may also have driven you to success. Just when you think you have found the equation of success, your next try may fail and you will be back at the mean. We call this the “Sophomore Slump.” So please be humble, and please do not stumble on your success. Also, if you did your best but failed, please try once more: the triumph of mediocrity may carry you to success.
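The “skill plus luck” picture can be simulated in a few lines. This is a minimal sketch under an assumed toy model (each outcome is a fixed skill plus fresh random luck, both standard normal); the numbers are illustrative, not measurements of anything.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable
n_people = 10_000

# Assumed toy model: outcome = stable skill + fresh luck on every attempt.
skill = [rng.gauss(0, 1) for _ in range(n_people)]
first = [s + rng.gauss(0, 1) for s in skill]
second = [s + rng.gauss(0, 1) for s in skill]

# Select the top 5% of performers on the first attempt.
ranked = sorted(range(n_people), key=lambda i: first[i], reverse=True)
top = ranked[: n_people // 20]

top_first = sum(first[i] for i in top) / len(top)
top_second = sum(second[i] for i in top) / len(top)
overall_second = sum(second) / n_people

print(f"top group, first attempt:  {top_first:.2f}")
print(f"top group, second attempt: {top_second:.2f}")
print(f"everyone, second attempt:  {overall_second:.2f}")
```

The first-round stars are still above average on the second attempt (their skill is real), but noticeably less so, because the luck that helped select them does not repeat.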

Make Your Problem Harder!

“Instead, we turn to the other strategy, which is the one Barbier used: make the problem harder. That doesn’t sound promising. But when it works, it works like a charm.”

[How not to be wrong, Jordan Ellenberg]

When a friend is struggling with a difficult problem, we often say: “Don’t make it complex, just start with a simple problem.” We say this because we have experienced that simplification provides clues for solving the difficult problem. This is what mathematicians actually do every day. When proving a statement, they start from the simplest case and extend it to the target problem. Sometimes, however, making the problem harder suggests a simple alternative way to solve your real problem effectively.

Many data scientists focus only on reducing the number of features to make a data-driven model simpler. However, this approach does not always give the simplest model. Projecting the data onto a lower dimension (fewer features) may make the data structure more complicated, so that the hidden pattern can no longer be spotted. Hence, sometimes they need to add features to make a model simpler (more features can mean a simpler structure). This alternative way of thinking embodies the trade-off between a simple model with many features and a complicated model with few features.
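A tiny illustration of “more features, simpler model,” using made-up data of my own: points on a line are labeled 1 when they are far from zero. No single cut on x separates the classes, but after adding the feature x², one cut is enough. The helper `best_threshold_accuracy` is a hypothetical name for this sketch.

```python
def best_threshold_accuracy(values, labels):
    """Best accuracy of any one-cut rule: predict 1 when value > t, or the mirrored rule."""
    n = len(values)
    best = 0.0
    for t in [min(values) - 1] + sorted(set(values)):
        correct = sum((v > t) == bool(y) for v, y in zip(values, labels))
        accuracy = correct / n
        # The mirrored rule (predict 1 when value <= t) scores 1 - accuracy.
        best = max(best, accuracy, 1.0 - accuracy)
    return best


# Made-up data: label 1 when the point is far from zero on either side.
xs = list(range(-20, 21))
labels = [1 if abs(x) > 10 else 0 for x in xs]

acc_raw = best_threshold_accuracy(xs, labels)

# Add one feature, x squared, and try the same one-cut model on it.
squared = [x * x for x in xs]
acc_squared = best_threshold_accuracy(squared, labels)

print(f"one cut on x:   {acc_raw:.2f}")
print(f"one cut on x^2: {acc_squared:.2f}")
```

With only x, the model would need two cuts (a more complicated model); with the extra feature, the simplest possible rule already classifies every point.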

Can We Predict our Future in Chaos?

“For human action we have no such model and may never have one. That makes the prediction problem massively harder.”

[How not to be wrong, Jordan Ellenberg]

In the weather, a very tiny amount of energy at one location can change the global outcome dramatically; we call this chaos. Edward Lorenz discovered this and wrote: “if the theory were correct, one flap of a sea gull’s wing would be enough to alter the course of the weather forever”. Even though we have accurate mathematical models (or data-driven models) for weather forecasting, fed with tons of measured data, we can make only short-range predictions.
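Sensitivity to tiny differences is easy to see in code. The sketch below uses the logistic map as a stand-in for a chaotic system; it is not Lorenz’s weather model, just a commonly used one-line example, and the starting values are arbitrary choices of mine.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x) and return the list of visited values."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two starting points that differ by one part in ten billion.
a = logistic_trajectory(0.2, 60)
b = logistic_trajectory(0.2 + 1e-10, 60)
gaps = [abs(p - q) for p, q in zip(a, b)]

for step in (0, 10, 20, 30, 40, 50, 60):
    print(f"step {step:2d}: gap = {gaps[step]:.3e}")
```

For the first few steps the two runs are indistinguishable, which is why short-range prediction works; after a few dozen steps the gap is as large as the values themselves, which is why long-range prediction does not.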

Our behavior in society is much more chaotic than the weather, so predictions of future outcomes often fail. Moreover, we have no mathematical model that describes our behavior effectively. Hence, it is really hard (or impossible) to find the “right” causation in massive data. In this chaotic system, we should keep the following in mind: (1) don’t infer causation from your success (rather, just say you were “lucky”); (2) don’t copy others’ successes (a tiny difference in conditions makes a totally different outcome); (3) don’t prejudge the situation using “common sense” (no one can predict the outcome).

Improbable Things Happen All the Time

“The universe is big, and if you’re sufficiently attuned to amazingly improbable occurrences, you’ll find them. Improbable things happen a lot.”

[How not to be wrong, Jordan Ellenberg]

You have a deck of cards and draw five cards from it. Surprisingly, the five cards you drew are the ace, 2, 3, 4, and 5 of spades. (Congrats! You made a straight flush.) You might then think that this is a new deck that has not been shuffled yet, because drawing these five cards in a row seems improbable (or at least very unlikely). However, improbable things happen all the time. Please go to Las Vegas and check this!

When analyzing results, we need to get used to BIG numbers in our fields. Our field of interest is pretty big, so you can see many improbable occurrences (we see lottery winners every week). Hence, we should be careful not to infer causality from a chance occurrence. In data science, even when a data-driven model finds patterns in Big Data, we should examine whether those patterns could have been produced by randomness. (It may be improbable that millions of people read this post and like it, but improbable things happen all the time!!)
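The arithmetic behind “improbable things happen all the time” is short enough to write out. This sketch assumes a well-shuffled 52-card deck and independent deals; the figure of ten million deals is an arbitrary number I picked for illustration.

```python
import math

# Number of distinct five-card hands from a 52-card deck (order ignored).
hands = math.comb(52, 5)
p_exact_hand = 1 / hands  # one specific hand, e.g. A-2-3-4-5 of spades


def p_at_least_once(p, n_deals):
    """Chance that an event of probability p occurs at least once in n independent deals."""
    return 1.0 - (1.0 - p) ** n_deals


print(f"distinct hands: {hands:,}")
print(f"chance in one deal: {p_exact_hand:.2e}")
for n_deals in (1_000, 1_000_000, 10_000_000):
    print(f"chance in {n_deals:>10,} deals: {p_at_least_once(p_exact_hand, n_deals):.4f}")
```

One deal makes the hand look like a miracle; ten million deals make it close to a sure thing that someone, somewhere, sees it.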

The Past is in the Past: the Law of Large Numbers

“That’s how the Law of Large Numbers works: not by balancing out what’s already happened, but by diluting what’s already happened with new data, until the past is so proportionally negligible that it can safely be forgotten.”

[How not to be wrong, Jordan Ellenberg]

You have a FAIR coin and toss it ten times. Surprisingly, you get ten heads in a row! Now you must bet on heads or tails. Where do you put your money? Fortunately, you have learned the law of large numbers, which says that the average over a large number of trials gets closer to the expected value. So the next toss will be “tails” to balance things out, right? BUT this is not true: the probability of getting heads or tails is still the same. We call this misconception the “Gambler’s Fallacy.” The law of large numbers CANNOT predict your future.
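Both halves of the story can be checked numerically. The sketch below is my own illustration: the first part computes how a ten-heads streak is diluted by later fair flips, and the second part simulates what comes right after a streak (I use streaks of five so that enough of them occur in the simulated run).

```python
import random


def expected_share_of_heads(streak, extra_flips, p=0.5):
    """Expected fraction of heads after `streak` heads followed by `extra_flips` fair flips."""
    return (streak + p * extra_flips) / (streak + extra_flips)


# Dilution: the streak is never "paid back", it just stops mattering.
for extra in (0, 10, 90, 990, 9_990):
    print(f"{extra:>5} more flips: expected share = {expected_share_of_heads(10, extra):.4f}")

# Simulation: how often does heads follow a run of five heads?
rng = random.Random(2)  # fixed seed so the run is repeatable
flips = [rng.random() < 0.5 for _ in range(200_000)]  # True means heads
streak_len = 5
after_streak = [
    flips[i]
    for i in range(streak_len, len(flips))
    if all(flips[i - streak_len:i])
]
heads_after_streak = sum(after_streak) / len(after_streak)
print(f"share of heads right after a streak: {heads_after_streak:.3f}")
```

The expected share drifts from 1.0 toward 0.5 without any tails being “owed,” and the flip right after a streak comes up heads about half the time.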

We often mistakenly believe that previous independent results are strongly related to the next result. Before you think like that, first check whether the previous results are really related to your future decision. If not, please forget about the past. Please don’t infer the wrong causality from the law of large numbers. Even if the average of repeated trials so far is far from the expected value, it says nothing about the future. Queen Elsa in Frozen sings “The past is in the past” in her famous song ‘Let It Go’.

Do You Want to Be a Nonlinear Thinker?

“Nonlinearity is a real thing! … Thinking nonlinearly is crucial, because not all curves are lines.”

[How not to be wrong, Jordan Ellenberg]

Many people want to be nonlinear thinkers, who do not follow a step-by-step progression but try to find solutions outside the box. Hence, the phrase ‘nonlinear thinking’ suggests a somewhat special ability. In the real world, however, most curves are nonlinear (only a few are lines). That is, becoming a nonlinear thinker means (maybe) being ordinary.

When predicting future behavior from the past (as predictive models in AI do), we should keep in mind that almost no curves are lines. We should consider the possibility that our prediction needs to be nonlinear. Moreover, if our case turns out to be nonlinear, our optimal decision depends on where we already stand on the curve. On the other hand, a linear prediction lets us quickly find the pattern in the past and efficiently predict (near) future behavior, because ALL smooth curves look like lines locally. Hence, a balance of linear and nonlinear thinking is essential in the age of Big Data.
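“Curves look like lines locally” can be checked with a tangent line. In this sketch the sine curve and the evaluation points are arbitrary choices of mine; any smooth curve would behave similarly.

```python
import math


def tangent_line(f, df, x0):
    """Return the linear approximation of f around x0, given its derivative df."""
    return lambda x: f(x0) + df(x0) * (x - x0)


# Linear prediction of sin(x), built from what we know at x = 0.
approx = tangent_line(math.sin, math.cos, 0.0)

near_error = abs(math.sin(0.1) - approx(0.1))  # a short step away
far_error = abs(math.sin(2.0) - approx(2.0))   # a long step away

print(f"error near the starting point: {near_error:.6f}")
print(f"error far from it:             {far_error:.6f}")
```

The linear prediction is excellent for the near future and badly wrong further out, which is the balance the paragraph above asks for: use the line locally, but remember the curve.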

As Human Beings, We are Flawed but We Learn

“But human decision making, while often flawed, has one chief virtue. It can evolve. As human beings learn and adapt, we change, and so do our processes.”

[Weapons of Math Destruction, Cathy O’Neil]

We are flawed and vulnerable. We are sometimes blinded by prejudice. We are often emotional and fail to make the right decision. Yes, we are human beings. However, we learn from our mistakes. We accepted the Copernican system. We changed our minds after Martin Luther King’s “I Have a Dream” speech. When we realize that something is wrong, we can change all at once.

Automated systems, by contrast, CANNOT change their model immediately. The only thing they can do is patch the model by adding more parameters and correlations (like the eccentrics and epicycles in the Ptolemaic system). This makes the model more complex and complicated (not the right direction!). This shows the main role of human beings in the age of Big Data: only we, human beings, can stop and change a data-driven model immediately when it goes wrong.

Justice: What’s the Right Thing to Do in Data Science?

“The model is optimized for efficiency and profitability, not for justice or the good of the “team”. This is, of course, the nature of capitalism.”

[Weapons of Math Destruction, Cathy O’Neil]

Michael J. Sandel’s magnum opus, Justice: What’s the Right Thing to Do?, called our attention to justice (and fairness) in a period of capitalist prosperity. Data science behaves much like capitalism: more data (money) means more power, and efficiency (profitability) is the most important factor for success. Hence, in data science, we should consider how fairness can be made compatible with efficiency (and profitability).

To take fairness into consideration in data-driven models, we need to think about what we can do. First, we should double-check that our data are unbiased. In particular, historical data are often biased because they were collected under different historical circumstances, so when combining data from a long span of history, we need careful effort to eliminate hidden bias. Second, we can add “fairness” directly to the main objectives of a data-driven model. Here we face the problem of how to quantify fairness (and also justice and morality). Hence, making a fair model is still challenging, but it is not impossible.
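As one possible way to put a number on fairness, the sketch below computes a “demographic parity gap”: the difference in positive-decision rates between groups. This is only one of several competing definitions, not the definition, and the decisions and group labels are made up for illustration.

```python
def positive_rate(decisions):
    """Share of decisions that are positive (1)."""
    return sum(decisions) / len(decisions)


def demographic_parity_gap(decisions, groups):
    """Return (largest rate minus smallest rate, per-group positive rates)."""
    rates = {}
    for group in set(groups):
        members = [d for d, g in zip(decisions, groups) if g == group]
        rates[group] = positive_rate(members)
    return max(rates.values()) - min(rates.values()), rates


# Hypothetical model outputs: 1 = approved, 0 = rejected.
decisions = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"]

gap, rates = demographic_parity_gap(decisions, groups)
print(f"approval rate per group: {rates}")
print(f"demographic parity gap:  {gap:.2f}")
```

A gap of zero would mean both groups are approved at the same rate. A quantity like this could be monitored, or added as a penalty to the model’s objective, though choosing which fairness measure to use is itself a judgment call.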

Don’t Put Me in, Data

“More times than not, birds of a feather do fly together. … Investors double down on scientific systems that can place thousands of people into what appear to be the correct buckets.”

[Weapons of Math Destruction, Cathy O’Neil]

To enhance computational efficiency, data-driven models often create subgroups and predict the future behavior of these subgroups (not of every single person). They do so under the premise that people with similar characteristics will make similar decisions on specific problems (as in a doppelganger search). Predicting for subgroups leads to an efficient and simple predictive model.

There are still some important questions about this efficient model. Are we in the correct subgroups? If so, is it true that all the people in the same subgroup always make the same (or a very similar) decision? Data scientists should keep these questions in mind. They should then check that the prediction results can really be divided into a (finite) set of subgroups and that the number of subgroups is enough to make the right prediction. Someone may want to say, “Don’t put me in any subgroup. I am so independent!”
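One simple way to ask “do people in the same subgroup really decide alike?” is to measure, for each subgroup, what share of its members made that subgroup’s most common decision. The sketch below uses made-up subgroup labels and decisions; `subgroup_agreement` is a hypothetical helper name.

```python
from collections import Counter, defaultdict


def subgroup_agreement(groups, decisions):
    """Share of each subgroup that made that subgroup's most common decision."""
    by_group = defaultdict(list)
    for group, decision in zip(groups, decisions):
        by_group[group].append(decision)
    return {
        group: Counter(ds).most_common(1)[0][1] / len(ds)
        for group, ds in by_group.items()
    }


# Hypothetical data: which subgroup each person was placed in, and what they did.
groups = ["young", "young", "young", "young", "older", "older", "older", "older"]
decisions = ["buy", "buy", "buy", "skip", "buy", "skip", "buy", "skip"]

agreement = subgroup_agreement(groups, decisions)
for group, share in agreement.items():
    print(f"{group}: {share:.0%} made the subgroup's most common decision")
```

A subgroup with agreement near 100% is a reasonable bucket for prediction; one near 50% (for a two-way decision) tells us almost nothing about its members and probably needs to be split or rethought.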