Facial recognition has an image problem. In 2014, a Denver man was arrested for robbing two banks on the basis of surveillance video stills shown to acquaintances and his ex-wife. The case was dismissed, but the man was rearrested a year later — this time based on FBI facial recognition technology.
The problem? The man who was arrested had a distinct mole on his face; the suspect in the image did not. Furthermore, a height analysis on the surveillance video found a 3-inch difference between the defendant and the suspect on camera. The case was dismissed a second time.
That’s not to say facial recognition is without its success stories. And while the wrongful identification case may be more a cautionary tale than a mainstream occurrence in law enforcement’s use of the technology, facial recognition has had to contend with real problems of bias and accuracy.
For instance, a 2012 study showed that face recognition systems are 5% to 10% less accurate at identifying African Americans than white subjects. The study also found that female subjects were more difficult to recognize than males, and younger subjects more difficult to identify than older adults.
“A lot of people would argue that face recognition technology itself is neutral, but technologies are only ever meaningful in their applications,” says Kelly Gates, author of Our Biometric Future: Facial Recognition Technology and the Culture of Surveillance. “Looking at algorithms alone doesn't really help us understand the problems that might come along with introducing much more sophisticated and widespread use of facial recognition technologies.”
To combat the bias and accuracy issues inherent in face recognition technology, researchers, algorithm developers, and even law enforcement agencies are taking a multipronged approach that emphasizes training and testing.
Training the Data
Face recognition algorithms leverage training data in order to distinguish between “noisy” facial characteristics and facial features consistent with a subject’s identity. If certain segments of the population are underrepresented in the training set, however, the algorithm’s performance could suffer.
“The quality of the training data and the makeup of the training data in terms of demographics are critical,” says Anil Jain, a biometric recognition expert and computer science professor at Michigan State University.
A test conducted by the U.S. National Institute of Standards and Technology (NIST) in 2006 found that algorithms developed in East Asia performed better on Asian subjects, while algorithms developed in the Western hemisphere were more accurate on white subjects.
The 2012 study on face recognition accuracy extended this line of inquiry, using 1 million mugshots to examine training biases not only across three races/ethnicities (white, black and Hispanic) but also across gender and age. Researchers concluded that face recognition performance improved when training exclusively on subjects of the same race/ethnicity.
What’s more, training face recognition systems on datasets evenly distributed across demographics offered consistently high accuracy on all demographic groups.
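The idea of demographically balanced training data can be sketched in a few lines. The toy example below is illustrative only — the field name `ethnicity`, the data layout, and the downsampling policy are assumptions, not details from the studies cited — but it shows one simple way to give every demographic group equal weight in a training set:

```python
import random
from collections import defaultdict

def balance_by_demographic(samples, key, per_group, seed=0):
    """Downsample a labeled dataset so that each demographic group
    contributes at most `per_group` training examples.

    `samples` is a list of dicts; `key` names the (hypothetical)
    demographic field, e.g. "ethnicity" or "gender"."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for s in samples:
        groups[s[key]].append(s)
    balanced = []
    for members in groups.values():
        # Underrepresented groups contribute everything they have;
        # larger groups contribute a fixed-size random subset.
        take = min(per_group, len(members))
        balanced.extend(rng.sample(members, take))
    rng.shuffle(balanced)
    return balanced

# Toy dataset: three groups with skewed counts, balanced to 2 each.
data = ([{"id": i, "ethnicity": "A"} for i in range(10)]
        + [{"id": i, "ethnicity": "B"} for i in range(4)]
        + [{"id": i, "ethnicity": "C"} for i in range(2)])
train = balance_by_demographic(data, "ethnicity", per_group=2)
print(len(train))  # 6
```

In practice, production systems weight or augment data rather than discard it, but the underlying goal is the same: no single group should dominate what the algorithm learns.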
Testing facial recognition on a large scale is necessary to evaluate an algorithm’s shortcomings and advantages.
“Accuracy claims are questionable because they’re always based on experiments from the team developing the facial recognition algorithm,” says Gates, the biometrics author and associate professor at the University of California San Diego.
Researchers at the University of Washington evaluated such a premise when they established the MegaFace Challenge. After developing a dataset comprising 1 million publicly available Flickr images representing more than 690,000 individuals, the UW team invited face recognition teams to evaluate how their algorithms performed.
For example, Google’s FaceNet had previously achieved nearly 100% accuracy on a facial recognition dataset that included more than 13,000 images. That success rate dropped to 75% when tested as part of UW’s million-image challenge.
Face recognition’s accuracy is only as good as the database of photos being used. If the database only has one photo of someone from 15 years ago, providing a match a significant time later may be difficult. After about seven years, even for mug shot–type images, performance drops by 5%.
“When you have multiple exemplars over a period of time, your confidence level goes up,” says Wyly Wade, CEO of Biometrica Systems, Inc., a group that uses biometric-enabled tools to track crime. “But in most current face recognition systems, you really still don’t match one-to-one.”
Even with a large database of photo galleries, image inconsistency could pose a problem.
“In a single casino, for example, you may have analog cameras all the way up to 7K cameras,” Wade says. “That camera that was installed in the 1980s isn’t going to have the same matching capability as a camera installed last week.”
In late 2016, Georgetown Law’s Center on Privacy & Technology released a report examining law enforcement face recognition and the risks it poses to privacy and civil liberties. The report’s co-authors suggest that mug shots, rather than driver’s license photos, be the default image databases for face recognition. The databases also should be “periodically scrubbed to eliminate the innocent.”
Testing the Algorithms
In addition to more diverse training sets and image databases, testing the algorithms can reduce bias and improve accuracy. But testing is voluntary, unregulated and sporadic. Another recommendation from the Georgetown report called on NIST to expand the scope and frequency of accuracy tests.
NIST currently offers voluntary face recognition vendor tests. These tests provide independent government evaluations of commercial and prototype face recognition technologies with the goal of helping government and law enforcement agencies determine where and how to best deploy the systems.
In February 2017, NIST is expected to start a new evaluation of face recognition technologies applied to 2D still images. Unlike previous evaluations, the program will be conducted on a continuing basis and remain open indefinitely. The tests will use large sets of facial imagery to measure the performance of face recognition algorithms.
Training the User
Even though facial recognition technology is largely automated, most systems still require a human to examine similarities and dissimilarities in order to make the final match.
This act could add another layer of bias, even if it’s unintentional. “There is an inherent ability in the expertise of the individual, their interest in the job, and their performance depending on their state of mind or how difficult the position is,” says MSU’s Jain.
The Georgetown law study indicates that without specialized training, human reviewers make the wrong decision about a match half the time. Yet of the 52 law enforcement face recognition systems the study examined, only eight used specialized personnel to review and narrow down potential matches.
Recent research from the University of New South Wales in Australia shows similar accuracy rates for untrained individuals, but it also demonstrates that experts performing comparisons with peer review attained near-perfect accuracy.
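Part of why peer review helps is simple statistics: independent opinions that are each better than chance become far more reliable in aggregate. Real forensic peer review is structured consultation, not a vote, so the sketch below is only the underlying intuition, modeling reviewers as independent decisions combined by majority:

```python
from math import comb

def majority_accuracy(p, n):
    """Probability that a majority of n independent reviewers, each
    correct with probability p, reaches the right decision."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(need, n + 1))

# An untrained reviewer who is right half the time is a coin flip,
# and no amount of aggregation fixes that.
print(round(majority_accuracy(0.5, 1), 3))  # 0.5

# Five independent reviewers at 70% each, combined by majority vote.
print(round(majority_accuracy(0.7, 5), 3))  # 0.837
```

The gain only materializes when each reviewer is individually better than chance, which is exactly what specialized training is meant to ensure.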
An organization called the Facial Identification Scientific Working Group aims to improve outcomes by standardizing training in facial comparison. The recommended training covers image science, the scientific principles of photographic comparison, biometric systems, and facial anatomy, including the facial bones, muscles, skin and features.
Subsequent to the training, ample practice — ideally in a mentorship situation where trainees can review comparisons with more senior examiners and receive feedback — is recommended.
“Forensic practice for face is to have peer review of searches and matches, and that is an excellent means of fostering that feedback,” says Nicole Spaun, principal face biometric expert at MorphoTrak.
Spaun recommends that smaller agencies, which may have only one or two facial examiners, reach out to a larger neighboring agency to find more senior examiners or additional peer reviewers.
Perhaps the biggest challenge for reviewers who want training is limited access to it. According to Spaun, few agencies have developed in-house training for face recognition. The FBI provides classes for law enforcement and government agencies, but there is a waiting list.
“Because fingerprint is so well established, there are numerous commercial entities that provide training, but face is in its infancy,” says Spaun. What’s more, agencies that rely on face recognition are facing tightening budgets and schedules, making them hesitant to send their full staff on a weeklong course.
UCSD’s Gates points out that no matter how well-trained the reviewer is, perception will always influence face matching.
“Faces look radically different in different images, poses and lighting conditions,” she says. “Whether we are talking about humans or algorithms, there is no way to make face recognition perfect.”
Despite its limitations, facial recognition technology is a reality of everyday life. For better or worse, the age of social media is giving key industry players access to millions of photos. To ensure facial recognition is as unbiased and accurate as possible, developers, law enforcement agencies and other end-users may need to more fully commit to robust testing and training.