The Dark Side of Big Data
Larry Maloney | October 18, 2016
Mathematical models built from Big Data are driving decisions in virtually every area of life – from the ads we see on computer screens to the types of products stocked in vending machines to the tests that are given to job applicants.
But what happens when these models are flawed, perhaps because of outright bias or faulty statistical data? People can get hurt, says Harvard Ph.D. mathematician Cathy O’Neil, who writes the mathbabe blog. Her book, Weapons of Math Destruction, describes models gone amuck that have put teachers out of work, hiked insurance premiums, and created burnout in factory workers, among other adverse impacts.
O’Neil knows this subject firsthand. In her career, she has worked as a Wall Street quant, building predictive models for a hedge fund. She has also worked as a data scientist, forecasting consumer purchases and online clicks for companies. Most recently, she designed a program on Data Journalism for Columbia University.
In an interview with Engineering360 contributing editor Larry Maloney, O’Neil discusses the causes of destructive algorithms, as well as ideas to ensure sounder, more ethical models. (Also read "Is Your Big Data Project a 'Weapon of Math Destruction'" at IEEE Spectrum.)
It’s a timely topic for engineers, who are asked increasingly for input on models affecting everything from product design to factory operations to project management.
Maloney: How prevalent is the misuse of mathematical modeling in our society and economy?
O’Neil: It is widespread and affects every area of life, including education, employment, credit and lending, and the criminal justice system. Many people have the erroneous assumption that algorithmic models and predictive analytics are inherently fair and objective. As a result, there is far too little scrutiny of these algorithms. Simply stated, algorithms are a way of automating past practices for making decisions. But if those past practices are flawed, then modeling codifies those mistakes.
Maloney: Can you cite some examples from your book?
O’Neil: I’ll cite three examples that show the breadth of the problem and the kinds of flaws that can creep into models.
First, there are statistical flaws, which you find in teacher value-added models that aim to assess teachers and identify those who are failing. Such a model evaluates teachers by comparing their students’ actual test scores with the scores expected for those students. The chief problem here is the statistical inconsistency of the model, which can yield vastly different scores from year to year, even for a teacher who has not changed his or her style of teaching.
Why is this? Among the many variables are the time of day the test is given, whether or not a student was hungry at test time, or even the temperature in the classroom. Such factors create uncertainty and make it very difficult to model what an expected score should be for any given student, and this uncertainty grows as the size of the class gets smaller.
Despite these problems, this model is being used all over the country for high-stakes decisions, including teacher tenure and teacher firings. What’s more, it is a very secretive and complex model. Most teachers don’t understand how these models work, nor are they being told how they can get better scores. In summary, it’s an unaccountable model that has largely failed in its aim of weeding out bad teachers.
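The year-to-year inconsistency described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the model any school district uses: the function name `simulate_value_added`, the class size of 25, the noise level, and the teacher's "true effect" of 2 points are all hypothetical values chosen for illustration. It treats a value-added score as the average gap between actual and expected test scores, where each student's gap is the teacher's unchanging effect plus random noise.

```python
import random
import statistics


def simulate_value_added(true_effect, class_size, noise_sd, years, seed=0):
    """Return one value-added score per year for a teacher whose
    true effect never changes (all parameters are hypothetical)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(years):
        # Each student's gap between actual and expected score is the
        # teacher's true effect plus noise (hunger, time of day,
        # classroom temperature, and so on).
        gaps = [true_effect + rng.gauss(0, noise_sd) for _ in range(class_size)]
        # The teacher's score for the year is the average gap.
        scores.append(statistics.mean(gaps))
    return scores


yearly = simulate_value_added(true_effect=2.0, class_size=25, noise_sd=15.0, years=5)
for year, score in enumerate(yearly, start=1):
    print(f"year {year}: value-added score = {score:+.1f}")
print(f"spread across years: {max(yearly) - min(yearly):.1f} points")
```

Even though the simulated teacher is identical every year, the printed scores differ from one year to the next, because student-level noise does not fully average out in a single class.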
Maloney: How about other areas of life?
O’Neil: Another prime example in law enforcement is modeling for predictive policing, which determines where police will be assigned. The flaw here is biased data, in that the models are based on arrest records, which are a very incomplete picture of crime. In many cities, poor sections, including those with large minority populations, historically get much more police scrutiny than richer sections. Blacks and whites may be smoking pot at the same rates, but blacks are far more likely to be arrested for it. So if you build your models based on arrests, you are simply sending more police to the same neighborhoods that are already being over-policed.
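The self-reinforcing pattern in the predictive policing example can be sketched in a few lines. This is a deliberately simplified toy, not a description of any real policing software: the function `reallocate_patrols`, the two neighborhoods, the 70/30 starting split, and the equal offense rates are all assumptions made for illustration. Recorded arrests are modeled as the offense rate multiplied by the share of patrols present, and each round the patrols are reassigned in proportion to those arrests.

```python
def reallocate_patrols(patrol_share, offense_rate, rounds):
    """Repeatedly assign patrols in proportion to the previous round's
    arrests (a hypothetical toy model with made-up numbers)."""
    history = [list(patrol_share)]
    for _ in range(rounds):
        # Recorded arrests depend on how much offending occurs AND on
        # how many police are present to observe it.
        arrests = [p * r for p, r in zip(patrol_share, offense_rate)]
        total = sum(arrests)
        # The next allocation follows the arrest data, not the true rates.
        patrol_share = [a / total for a in arrests]
        history.append(patrol_share)
    return history


# Two neighborhoods with identical offense rates but unequal starting patrols.
history = reallocate_patrols(patrol_share=[0.7, 0.3], offense_rate=[0.1, 0.1], rounds=5)
for round_number, shares in enumerate(history):
    print(f"round {round_number}: patrol shares = {shares[0]:.2f} / {shares[1]:.2f}")
```

In this toy setup the initial imbalance never corrects itself, even though both neighborhoods offend at the same rate, because the arrest data simply mirrors where the police already were.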
Then there are the algorithmic models for constructing personality tests that are used by many human resources departments. Even though there are now many regulations protecting individuals from unfair hiring practices, there is little or no auditing of testing models used to filter job applicants. For example, one model filters out those who have had past instances of mental illness, a practice that is illegal under the Americans with Disabilities Act.
Maloney: What concerns do you have about use of modeling in the development and sale of products?
O’Neil: If you ask Silicon Valley venture capitalists about using modeling to create targeted online product advertising, most will say that this practice constitutes a service, since the advertising is tailored to items that we might want to buy.
For many of us, such advertising is a service. For lower-income consumers, however, such ads can be predatory. They tend to see ads for such things as payday loans or for-profit colleges, even when they have not been doing searches in these areas.
As an experiment, I searched for information on food stamps, and three of the top five hits led me to the same web site, which passed along information on such inquiries to lead aggregators for for-profit colleges. The point is that when we set out to promote a product or service through online targeted advertising, we need to be asking ourselves who could be harmed by our methodology.
Maloney: You talk in your book about the increasing use of operations research to create leaner, more productive operations. Do you have concerns about modeling in that arena?
O’Neil: I don’t want to attack a whole field of endeavor, because some very good benefits derive from operations research and related manufacturing techniques, such as just-in-time production and supply-chain practices. But here, again, the technologists who build these models should be aware of who potentially could be harmed.
For example, scheduling software -- an extension of just-in-time -- runs the risk of bending the lives of workers to fit a mathematical model. Oftentimes, workers get little notice about when their shifts will occur, which makes it very difficult to plan for things like child care or a night school class that could lead to improved career prospects. In other instances, scheduling software is used as a tool to keep worker hours below the threshold for providing health insurance coverage.
Maloney: How about pricing models used for products and services?
O’Neil: The advent of Big Data in more and more cases has led to pricing structures based not so much on the inherent worth of the product or service, but on how much a person might be willing to pay. More and more we see pricing models based on the group in which someone resides.
In the book, I cite a Consumer Reports study on auto insurance, which showed great disparities in the rates people were charged based on demographic data, including credit scores. In this case, how you manage your money may count more than your driving record, when it comes to what you pay for insurance.
Maloney: You mention in your book that some high-tech companies have used software and modeling to assess the idea-generating talents of their technology personnel. Do such methods really work?
O’Neil: I would argue that such methods are very questionable. First of all, how do you define what constitutes a good idea? And since we can’t see into people’s minds, model builders are forced to choose rather weak proxies for good ideas, such as the number of emails a person generates or how often a person is promoted.
Maloney: What can be done to ensure that Big Data-driven models are fair and objective?
O’Neil: One of the critical steps needed in modeling is having an effective feedback loop, which can help create a healthy ecosystem for your model. The feedback loop identifies problems once the model is in the field so that improvements can be made. You can compare it to good auto design, where features are added, removed, or refined based on customer feedback. Design flaws can be fixed if you have a healthy feedback loop.
However, feedback loops don’t work when you have an algorithm that is so secretive that it prevents accountability. In the automotive world recently, the feedback loop failed when Volkswagen allegedly cheated on government emissions tests. The alleged cheating was so clever that it was hard for people to detect the problem.
Maloney: What are some of the other steps you recommend to prevent flawed models?
O’Neil: When you are constructing a data-driven algorithm, you need to have very strong evidence that it is automating a human decision-making process that you actually trust. Your model must be as good as or better than that process. And there needs to be ongoing monitoring to ensure that these models are fair, legal and non-discriminatory.
Movements toward auditing algorithms are already taking place. Researchers at Princeton, for example, have launched a software project to detect biases in automated systems from search engines to job placement sites. Elsewhere, Harvard mathematician Mira Bernstein built a model to scan industrial supply chains to help companies root out slave-built components in their products.
You also need more input from people who are the targets of these mathematical models. Getting back to the teacher evaluation example I noted earlier, a fair modeling system would include in its feedback loop ongoing input from the teachers themselves on how to make the model better.
Maloney: Do we need more government regulation of these models?
O’Neil: Government has a powerful role to play, just as it did when confronting the excesses of the first industrial revolution. But it won’t be easy, because some companies are making a lot of money from use of these questionable algorithms. With modeling affecting so many areas of life, regulators across the board must become more tech-savvy.
The Equal Employment Opportunity Commission is already looking at personality tests used in hiring, and the Federal Trade Commission has been involved in price discrimination suits connected with biased modeling.
Finally, there is the whole area of self-regulation. Data scientists and others who build data-based models need to consider the ethics of what they are doing. As in other areas of business education, courses in ethics need to be part of the curriculum for people who will be working in fields like data science and operations research.