Flawed data yields flawed results. Researchers understand the impact that flawed data may have on research that relies on experimental results, which affects everything from drug development to performance data to public policy.

MIT CSAIL researchers recently reported the development of an algorithm that removes bias from the training data sets that teach it to categorize facial images. The technique may offer a promising way to eliminate racial bias in image recognition systems.

The results were reported in the 2018 Gender Shades project, which examined bias in automated facial analysis algorithms, are an example of inaccurate results caused by inaccurate data. In one case discussed by the researchers, racially biased data resulted in serious mistakes in three major image recognition algorithms.

Why misclassification is a problem

Increased use of automated systems and the algorithms that power them magnify the potential for errors caused by misidentification.

For example, when facial recognition software misidentifies an employee who needs access to a business’s front door or a restricted lab, the result can be a minor inconvenience. When law enforcement misidentifies and arrests a person in a crowd at a public event, that individual may suffer the stigma of an arrest and negative publicity. Perhaps more troubling are the potential consequences that arise when the same software fails to identify a dangerous individual in that same crowd.

More broadly, biased algorithms that are baked into several systems may result in social stigmatization, economic losses and lost opportunities. For example, employment services and companies that use algorithms to evaluate resumes may tune the algorithm to look for specific words, and reject resumes that lack that language. Applicants who know how to game the system by using the right keywords will have an advantage over potentially better-qualified applicants who do not.

(Read about new applications of facial recognition software: Facial recognition tech used to stop illegal chimpanzee trade; Facial recognition technology moves into the classroom)

What causes bias?

Humans can inadvertently build biases into software. In the case of an image recognition system, the MIT researchers discovered that the machine learning algorithms used to train certain systems are a source of bias.

The Gender Shades project tested three AI gender classification products: IBM/Watson, Microsoft Cognitive Services and Face++ from Beijing Kuangshi Technology Co. Ltd. All three products more accurately identified male faces than female faces, and lighter-skinned faces than darker-skinned faces. Darker female faces fared the worst of any groups. The researchers learned that the training set was biased in favor of lighter skin and male faces.

The Gender Shades project team characterized the problem: “Automated systems are not inherently neutral. They reflect the priorities, preferences, and prejudices – the coded gaze – of those who have the power to mold artificial intelligence.”

(Read about anti-facial recognition technology activism: Coalition urges tech giants to stop sales of facial recognition tech to the government)

Computing a solution

Researchers led by Alexander Amini of CSAIL and Ava Soleimany from Harvard’s biophysics program set out to design an algorithm that can review and reduce sample-set data bias, and then iteratively apply the algorithm to the same data. The researchers learned that existing methods to build unbiased sample sets do not look at variability within a class of training data. Instead, they look for bias across classes. Researchers set out to discover how to remove bias from within data classes, which is referred to as removing latent biases from data.

The researchers chose to use an existing set of images, the Pilot Parliaments Benchmark (PPB) dataset, which an MIT group developed and used in the Gender Shades project. The PPB consists of 1,270 images selected from elected members of Parliaments from three African and three European countries; the images are grouped by binary gender, skin type and the intersection of these two characteristics. Gender breakdown was 44.6% female and 54.4% male. Skin color was evaluated by the Fitzpatrick skin type classification system, which is a standard tool for skin pigmentation research. The classification system recognizes six categories: 1 to 3 are lighter and 4 to are 6 darker. In this sample, 46.4% individuals had darker skin and 53.6% had lighter skin.

The researchers developed a variation on an existing technique, variational autoencoding (VAE) for discovering latent variables in a set of images, creating a debiasing-VAE tool. Their system used learned latent variables to re-weight sample variables, and the algorithm identified under-represented parts of the training data and increased their weight in the classifier. The system was unsupervised – meaning that humans do nothing to look for bias – as it learned latent variables and used the resulting knowledge to classify faces.

Experiment results

Before working with the PPB data, the team validated their algorithm using a training set compiled from two existing image sets that had been used by previous researchers. Validation consisted of comparing results using the test algorithm before and after de-biasing the training set. The research team them applied the trained algorithm to the PPB data.

The results showed several areas of real improvement over the three commercial systems that were tested. The debiased algorithm offered significant improvement in classifying dark-skinned males and reducing variation in accuracy between categories. It improved on the commercial systems’ accuracy with light-skinned males – a task that the three systems already handled well – showing that the CSAIL system did not lose any performance as a result of debiasing.

The team is offering its work as an additional tool to promote fairness in artificial intelligence applications and is providing an open-source model. The researchers point out that the algorithm also can be used for other computer vision applications.

The Algorithmic Justice League, founded by MIT Media Lab researcher Joy Buolamwini, is dedicated to fighting bias in algorithms. She was the lead author on the Gender Shades research. Amini and Soleimany and their team presented their results at the Artificial Intelligence, Ethics, and Society Conference in January 2019.