Am I the Same Person as I Was Yesterday?

“Mathematical models, by their nature, are based on the past, and on the assumption that patterns will repeat.”

[Weapons of Math Destruction, Cathy O’neil]

We always promise ourselves, ‘I will be not the same person as I was yesterday’. But, can we do that? It is really hard to change our daily routine and we repeat the same mistakes over and over. Hence, we are so predictable.

Data-driven models are finding patterns from past (or nearly present) to predict future behaviors under the belief that history always repeats itself. Hence, the data-driven model uses our repeated patterns to enhance its prediction accuracy. So, we need a break from routines and to be an outlier so that the model cannot predict our future.

How to Make my Model Unspoiled?

“Without feedback, however, a statistical engine can continue spinning out faulty and damaging analysis while never learning mistakes.”

[Weapons of Math Destruction, Cathy O’neil]

To deal with a spoiled child effectively, parents (or teachers) should consistently observe their child. When they become rude, parents should give them feedback immediately and teach responsibility, appreciation, and respect about others.

Data-driven models also require the right feedback so that they update them in the right direction. To this end, data-driven models use their mistakes to adjust the model. However, this feedback loop commonly includes only one-side mistakes. For example, the AI HR-screening model has only data about the bad performance of chosen applicants. It cannot have data about the successful career of unchosen applicants.

What Ingredients Do We Need for Yummy Data Soup?

“To create a model, then, we make choices about what’s important enough to include, simplifying the world into a toy version that can be easily understood.”

[Weapons of Math Destruction, Cathy O’neil]

Imagine you make chicken soup for dinner. What ingredients do you need for delicious soup? Chicken absolutely and maybe celery and onion and more; it depends on your mother’s recipe. Organic ingredients will be much better for your health.

Successful data-driven models commonly require enough important data. However, we do not know which data is important so we just put all the data into the model. Fortunately, a data-driven model may know what data is salient among all these data (via feature selection or dimensionality reduction) and make the own recipe. Also, unbiased (like organic) data will be much better for the accurate model.