Can you trust the accuracy claims of Amazon face recognition?

In July 2018, the American Civil Liberties Union (ACLU) conducted a test using Amazon’s face recognition tool, “Rekognition,” to match photos of US Congress members against mugshots of people arrested for a crime.

The ACLU found 28 false matches, highlighting the shortcomings of face recognition technology that’s being peddled to law enforcement agencies nationwide.

So, has it gotten any better? Curious whether and how quickly face recognition is improving, Comparitech decided to conduct a similar study almost two years later, adding UK politicians into the mix for a total of 1,959 lawmakers.

The study shows that Amazon’s face recognition software incorrectly matched more than 100 photos of US and UK lawmakers with police arrest photos, but measuring accuracy isn’t as simple as it sounds.

Before we discuss the results of the study between US and UK politicians in detail, let’s first review the fulcrum on which all of these tests pivot: confidence thresholds.

Confidence thresholds

When two images are compared by Amazon’s Rekognition, it doesn’t simply return a yes or no answer. Instead, results are given as percentages. The higher the percentage, the more confident Rekognition is that the two images are of the same person.
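Programmatically, these comparisons go through Rekognition's CompareFaces API. The sketch below (using the boto3 SDK; the file names are hypothetical) shows how a caller receives similarity percentages and can then filter them against a threshold of their choosing:

```python
def matches_at_or_above(response, threshold):
    """Keep only face matches whose similarity meets the chosen confidence threshold."""
    return [m for m in response.get("FaceMatches", []) if m["Similarity"] >= threshold]

# Hypothetical usage (requires AWS credentials and two local image files):
# import boto3
# client = boto3.client("rekognition")
# with open("portrait.jpg", "rb") as src, open("mugshot.jpg", "rb") as tgt:
#     response = client.compare_faces(
#         SourceImage={"Bytes": src.read()},
#         TargetImage={"Bytes": tgt.read()},
#         SimilarityThreshold=70,  # request lower-confidence matches, filter later
#     )
# matches_at_or_above(response, 95)  # apply a stricter threshold client-side

# The filtering logic works on any response shaped like Rekognition's:
sample = {"FaceMatches": [{"Similarity": 96.2}, {"Similarity": 81.5}]}
print(matches_at_or_above(sample, 95))  # [{'Similarity': 96.2}]
```

The key point: the threshold is not baked into the technology. The API hands back raw similarity scores, and whoever operates the system decides where to draw the line.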

The ACLU used Rekognition’s default settings, which set the confidence threshold at 80 percent.

Amazon disputed the ACLU’s findings, saying the threshold was too low. An Amazon spokesperson told GCN it should be set to at least 95 percent for law enforcement purposes, and a blog post on the Amazon Web Services website stated it should be 99 percent. However, a report by Gizmodo found that setting those thresholds is left to police discretion, and officers don’t always follow Amazon’s recommendations.

Raising the confidence threshold inevitably leads to fewer false positives (incorrectly matching two photos of different people), but also more false negatives (failure to match two photos of the same person). Unfortunately, the researchers couldn’t measure the latter in this experiment. More on that later.
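To make that tradeoff concrete, here is a toy sketch using made-up similarity scores (not real Rekognition output): the same set of comparisons produces different error types depending on where the threshold sits.

```python
# (similarity score, whether the two photos really show the same person)
comparisons = [
    (99.2, True), (96.5, True), (93.0, True),    # genuine matches
    (88.4, False), (82.1, False), (75.3, False), # different people
]

def error_counts(threshold):
    """Count false positives and false negatives at a given confidence threshold."""
    false_positives = sum(1 for score, same in comparisons if score >= threshold and not same)
    false_negatives = sum(1 for score, same in comparisons if score < threshold and same)
    return false_positives, false_negatives

print(error_counts(80))  # (2, 0): two false matches slip through, nothing missed
print(error_counts(95))  # (0, 1): no false matches, but one genuine match is missed
```

Raising the threshold from 80 to 95 trades the two false positives for a false negative; no threshold eliminates both error types at once.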

Comparitech researchers contacted both the ACLU and Amazon for comment and will update this article if we receive a response on the record.


US politicians

The US data set comprised photos of 430 Representatives and 100 Senators.

At an 80 percent confidence threshold, Rekognition incorrectly matched an average of 32 US Congresspersons to mugshots in the arrest database. That’s four more than in the ACLU’s experiment two years ago.

By that standard, Amazon’s face recognition hasn’t improved; if anything, it performed slightly worse than in the ACLU’s 2018 test.

When the researchers increased the threshold to what Amazon recommends for law enforcement, however, they found no incorrect matches at or above 95 percent confidence. The ACLU did not report results at this threshold in 2018, so there are no previous figures to compare against.


UK politicians

The UK data set consists of 1,429 politicians: 632 Members of Parliament and 797 Members of the House of Lords. The researchers matched them against the same arrest photos as the US politicians.

At an 80 percent confidence threshold, Rekognition incorrectly matched an average of 73 UK politicians to mugshots in the arrest database. The rate of false positives was lower for UK politicians (5 percent) than for US ones (13 percent), which might suggest UK politicians look substantially different from their US counterparts, at least according to Rekognition.
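The UK rate follows directly from the counts above:

```python
uk_politicians = 632 + 797   # MPs plus Members of the House of Lords
uk_false_matches = 73        # average false matches at the 80 percent threshold

rate = 100 * uk_false_matches / uk_politicians
print(f"{rate:.1f}%")  # 5.1% -- the roughly 5 percent cited above
```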

When the researchers raised the confidence threshold to 95 percent, there were no incorrect matches.

Racial bias

The ACLU alleged that, at an 80 percent confidence threshold, Amazon’s face recognition technology was racially biased, misidentifying non-white people at a higher rate than white people.

Comparitech’s results support this finding. Of the 12 politicians who were misidentified at a confidence threshold of 90 percent or higher, six were not white (as shown in the image at the top of this article). That means half of the misidentified people were people of color, even though non-whites make up only about one-fifth of the US Congress and one-tenth of the UK Parliament.


Methodology

Comparitech used publicly available photos of 430 US Representatives, 100 US Senators, 632 Members of the UK Parliament, and 797 Members of the House of Lords.

These were matched against four sets of 25,000 randomly chosen arrest photos using Amazon Rekognition. The experiment was repeated once for each set, and the results were averaged. Because the ACLU did not publish its test data, Comparitech could not use the exact same database of arrest photos.

In some instances, a single politician was incorrectly matched against multiple mugshots. Each such politician counts as a single false positive.

This spreadsheet contains all of the politicians who matched at or above 70 percent confidence, their photos, and the confidence at which Rekognition matched them.

Why you shouldn’t trust face recognition accuracy statistics

Be skeptical any time a company invested in face recognition peddles metrics about how well it works. The statistics are often opaque and sometimes downright misleading.

Here’s an example of how statistics about face recognition accuracy can be twisted. In the UK, the Met police force claimed its face recognition technology only makes a mistake in one of every 1,000 cases. They reached this number by dividing the number of incorrect matches by the total number of people whose faces were scanned. This inflates the accuracy rating by including true negatives—the vast majority of images that were not matched at all.

In contrast, independent researchers at the University of Essex found the technology had an error rate of 81 percent when they divided the number of incorrect matches by the total number of reported matches. The University’s report is much more in line with how most people would reasonably judge the accuracy, disregarding true negatives and focusing on the rate at which reported matches are correct.

A later report found the Met police used live face recognition to scan 8,600 people’s faces without consent in London. The results were in line with the University of Essex’s findings: one correct match leading to an arrest, and seven false positives.
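The gap between the two headline numbers comes down to the choice of denominator. Using the deployment figures just cited (8,600 scans, one correct match, seven false positives):

```python
scanned = 8600          # total faces scanned in the deployment
true_matches = 1        # correct match leading to an arrest
false_positives = 7     # incorrect matches
reported = true_matches + false_positives

# Met-style: errors divided by everyone scanned, so the huge pool of
# true negatives (people never matched at all) pads the score
met_error = false_positives / scanned
print(f"{met_error:.4f}")    # 0.0008 -- "roughly one mistake per 1,000 scans"

# Essex-style: errors divided by reported matches only
essex_error = false_positives / reported
print(f"{essex_error:.1%}")  # 87.5% -- in line with the 81 percent error rate
```

Same deployment, same errors: one framing sounds near-perfect, the other sounds unusable.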

False negatives

Even more rarely reported is the rate of false negatives: two images of the same person that should have been matched, but weren’t. As a hypothetical example of this error in practice, a face recognition-equipped camera at an airport would fail to trigger an alert upon seeing a person it should have recognized. Another form of false negative is failing to detect that an image contains a face at all.

In order to measure the rate of false negatives, Comparitech would have to populate the mugshot database with some real—but not identical—photos of the politicians. Because the aim was to recreate the ACLU’s test, this was beyond the scope of the experiment.

Real-world use cases

Let’s also consider what we’re comparing: two sets of headshots. One contains police mugshots and the other official portraits, but both offer clear views of each person’s face at eye level, facing the camera.

Real-world use cases are much different. Take CCTV surveillance, for example: police want to scan faces at an intersection and match them against a criminal mugshot database. Here are just a few factors that further muddy claims of how well face recognition performs in such a setting:

  • How far away is the camera from the subject?
  • At what angle is the camera pointed at the subject?
  • What direction is the subject facing?
  • Is the subject obscured by other humans, objects, or weather?
  • Is the subject wearing makeup, a hat, or glasses, or have they recently shaved?
  • How good are the camera and lens? Is it clean?
  • How fast is the subject moving? Are they blurry?

All of these factors affect face recognition accuracy and performance. Even the most advanced face recognition software available can’t make up for poor-quality or obscured images.

Putting too much faith in face recognition can lead to false arrests. In April 2019, for example, a student sued Apple after the company’s face recognition software falsely linked him to thefts at several Apple stores, leading to his arrest.

Using a threshold higher than 80 percent certainly improves results. But whether or not you agree with police use of face recognition, one thing is certain: it isn’t ready to be used for identification without human oversight.

Amazon states in its blog post, “In real-world public safety and law enforcement scenarios, Amazon Rekognition is almost exclusively used to help narrow the field and allow humans to expeditiously review and consider options using their judgment (and not to make fully autonomous decisions).”