Don’t Put Me in, Data

“More times than not, birds of a feather do fly together. … Investors double down on scientific systems that can place thousands of people into what appear to be the correct buckets.”

[Weapons of Math Destruction, Cathy O’Neil]

To improve computational efficiency, data-driven models often create subgroups and predict the future behavior of each subgroup rather than of every single person. The premise is that people who share similar characteristics are likely to make similar decisions on a given problem (much like a doppelganger search). Predicting at the subgroup level yields a simple and efficient predictive model.
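
As a rough illustration of the idea, here is a minimal sketch of a subgroup-based predictor. The records, the feature names (an age band and a region), and the helper names `fit_subgroup_model` and `predict` are all hypothetical, made up for this example. The sketch simply predicts the most common decision seen in each subgroup, which is one simple way to do it and not necessarily how any real system works.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (age_band, region, decision).
records = [
    ("20s", "urban", "buy"),
    ("20s", "urban", "buy"),
    ("20s", "urban", "skip"),
    ("40s", "rural", "skip"),
    ("40s", "rural", "skip"),
]


def fit_subgroup_model(rows):
    """Map each subgroup (tuple of features) to its most common decision."""
    counts_by_group = defaultdict(Counter)
    for *features, decision in rows:
        counts_by_group[tuple(features)][decision] += 1
    return {
        group: counts.most_common(1)[0][0]
        for group, counts in counts_by_group.items()
    }


def predict(model, features, default=None):
    """Predict for one person by looking up their subgroup, not the person."""
    return model.get(tuple(features), default)


model = fit_subgroup_model(records)
print(predict(model, ("20s", "urban")))  # the majority decision of that subgroup
print(predict(model, ("60s", "urban")))  # unseen subgroup, falls back to default
```

The model stores one answer per subgroup instead of one per person, which is where the efficiency comes from. It is also why the third person in the "20s, urban" group gets the wrong prediction.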

There are still important questions about this efficient model. Are we placed in the correct subgroups? If so, is it true that everyone in the same subgroup always makes the same (or a very similar) decision? Data scientists should keep these questions in mind. They should also check that the prediction results can really be divided into a finite number of subgroups, and that there are enough subgroups to make accurate predictions. After all, someone may want to say, “Don’t put me in any subgroup. I am so independent!”
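
One simple way to probe the second question is to measure how often members of a subgroup actually agree with their subgroup's majority decision. The sketch below assumes hypothetical records of the form (age band, region, decision); the data and the helper name `subgroup_agreement` are invented for illustration, and the agreement ratio is just one possible check.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (age_band, region, decision).
sample = [
    ("20s", "urban", "buy"),
    ("20s", "urban", "buy"),
    ("20s", "urban", "skip"),
    ("40s", "rural", "skip"),
    ("40s", "rural", "skip"),
]


def subgroup_agreement(rows):
    """Return, per subgroup, the share of members who made the majority decision."""
    counts_by_group = defaultdict(Counter)
    for *features, decision in rows:
        counts_by_group[tuple(features)][decision] += 1
    return {
        group: max(counts.values()) / sum(counts.values())
        for group, counts in counts_by_group.items()
    }


agreement = subgroup_agreement(sample)
for group, share in agreement.items():
    print(group, f"{share:.2f}")
```

A share of 1.0 means everyone in the subgroup decided the same way. Lower values suggest the subgroup is too coarse, or that some of its members are "so independent" that the bucket does not describe them.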
