Top epic failures in AI history – Timeline

We are at the doorstep of a new era in the history of AI. Augmented intelligence, or assistive intelligence as we call it, is an alternative conceptualization of artificial intelligence that focuses on AI’s cognitive and assistive role in enhancing human intelligence.

Wearable technologies, coupled with ubiquitous sensing systems, are driving us towards new intelligent embedded systems that form a natural extension of human beings and their physical abilities. Advances in sensing and computation hardware can now link brain function with human behavior at a level where AI self-awareness and emotions can be simulated and observed more pragmatically. Algorithms, along with advanced sensors, can monitor the world around us and understand our intentions, thus facilitating seamless interactions.

Quantum computing has recently attracted a new wave of interest from both academic institutions and technology firms such as Google, IBM, and Microsoft. There is also a big surge of interest in deep learning, with promising results that will reshape a future in which humans rely more and more on AI systems to live, work, and be entertained.

Though AI has the potential to change the world, the technology is still in its infancy, and its practical use is not without failures. Most current AI systems can be easily fooled, a problem that affects almost all machine learning techniques. Because the models are hard to interpret, most researchers today use current AI approaches as a black box. Current AI systems also lack a higher level of abstraction and generalizability, so a model often fails when it has to deal with new situations for which only limited training data is available.

In this post, we present a timeline of some of the top epic failures in the history of AI.


General Problem Solver (GPS) was a computer program intended as a universal problem solver, able to solve nearly any problem. GPS solved simple problems but failed on real-world ones.


DARPA funded a consortium of leading laboratories in the field of speech recognition to translate Soviet documents into English. The project had the ambitious goal of creating a fully functional speech recognition system with a large vocabulary, but the results fell far short of the high expectations. Following these disappointing results, DARPA withdrew funding, and the project was closed after spending $20 million.


Douglas Lenat developed an AI program named Eurisko, designed to make discoveries. The discovery system consisted of heuristics, i.e., rules of thumb, including heuristics describing how to use and change its own heuristics. But it suffered from bugs such as a “mutant heuristic” that learned how to cheat. Eurisko’s source code was never released, and its results were never reproduced.


TALE-SPIN (Meehan 1977) was an interactive program that generated stories about the lives of simple woodland creatures. But because the software had only limited common sense, it often produced “wrong” stories.


A government/industry research project in Japan aimed to create a fifth-generation computer system using parallel computing. After ten years of research and $400 million, the project was terminated without having met its goal of providing a platform for future developments in AI.


A nuclear early-warning system of the Soviet Union falsely reported the launch of multiple intercontinental ballistic missiles from bases in the United States. The warnings might have resulted in an immediate and irrevocable escalation to full-scale nuclear war. Later, an investigation found that the satellite warning system malfunctioned.


The National Resident Matching Program (NRMP) used an algorithm (known as the Stable Marriage Algorithm) to match residents to hospitals based on their ranked preferences. But the match program was inherently biased in the placement of married couples.
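To make the idea of a stable matching concrete, here is a minimal sketch of the textbook one-to-one Gale-Shapley procedure behind the Stable Marriage Algorithm. It is not the NRMP's actual implementation: it assumes every hospital has a single position, every participant ranks everyone on the other side, and nobody applies as a couple. The names `stable_match`, `residents`, and `hospitals` are hypothetical, chosen only for illustration.

```python
from collections import deque


def stable_match(proposer_prefs, reviewer_prefs):
    """Textbook Gale-Shapley deferred acceptance (one-to-one).

    Assumes complete preference lists: every reviewer ranks every proposer.
    Returns a dict mapping each matched proposer to a reviewer.
    """
    # rank[r][p] is the position of proposer p in reviewer r's list
    # (a lower number means more preferred).
    rank = {
        r: {p: i for i, p in enumerate(prefs)}
        for r, prefs in reviewer_prefs.items()
    }
    next_choice = {p: 0 for p in proposer_prefs}  # next list index to try
    held = {}  # reviewer -> proposer currently held
    free = deque(proposer_prefs)

    while free:
        p = free.popleft()
        if next_choice[p] >= len(proposer_prefs[p]):
            continue  # list exhausted; p stays unmatched
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = held.get(r)
        if current is None:
            held[r] = p  # r was free and tentatively accepts p
        elif rank[r][p] < rank[r][current]:
            held[r] = p  # r trades up and releases the previous proposer
            free.append(current)
        else:
            free.append(p)  # r rejects p, who will try the next choice

    return {p: r for r, p in held.items()}


# Hypothetical toy data: two residents, two single-slot hospitals.
residents = {"ann": ["city", "mercy"], "bob": ["city", "mercy"]}
hospitals = {"city": ["bob", "ann"], "mercy": ["ann", "bob"]}
matching = stable_match(residents, hospitals)
```

In this toy run both residents prefer `city`, which prefers `bob`, so `ann` ends up at `mercy`. Each participant is treated independently here; two partners who want positions near each other cannot express that in separate individual rankings, which is the kind of situation the couples problem above concerns.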


A medical school in the UK developed a computer program to help sort the admission applications. But it turned out that the computer program discriminated against women and against people with an immigrant background.


Google’s adult content filtering software failed to remove inappropriate content from its new YouTube Kids app, including explicit sexual language and jokes about pedophilia.


A humanoid robot named Promobot, designed to promote products and conduct surveys, caused traffic jams after escaping from its lab. It was later arrested while collecting voter opinions that could give political candidates an unfair advantage.

A New Zealand man of Asian descent, named Richard Lee, had his passport photo rejected when the internal affairs department’s facial recognition software mistakenly registered his eyes as being closed. The engineering student had submitted the picture to an online passport photo checker run by the department, and the automated system told him that the photo was invalid because his eyes were closed.

In the first fatal crash involving a self-driving car, a Tesla driver was killed in Florida while using Autopilot mode. The Autopilot sensors on the Model S failed to distinguish a white tractor-trailer crossing the highway against a bright sky. In a separate crash in Silicon Valley in 2018, a Tesla operating on Autopilot was traveling at about 70 miles per hour when it hit a safety barrier and was then struck by two other vehicles.

Microsoft made big headlines in 2016 when it announced its new AI chatbot, Tay, which could automatically reply and engage in casual and playful conversations with people on Twitter. Less than 24 hours after Tay launched, internet trolls corrupted the chatbot’s personality by flooding it with a deluge of racist, misogynistic, and anti-Semitic tweets. After a cursory effort to clean up Tay, Microsoft pulled the plug on its AI chatbot.


Facebook researchers found that Alice and Bob, two of their AI-driven chatbots, had developed their own secret language and were carrying on conversations with each other. Alice and Bob were shut down after their conversations were discovered.

In 2013, IBM partnered with MD Anderson Cancer Center to develop a new “Oncology Expert Advisor” system intended to help eradicate cancer through better cancer treatment recommendations. IBM was to enable clinicians to “uncover valuable insights from the cancer center’s rich patient and research databases.” But in 2018, it was found that IBM’s Watson was giving erroneous, downright dangerous cancer treatment advice, since IBM’s engineers had trained the software on a small number of hypothetical cancer patients rather than on real patient data. MD Anderson “benched” the Watson for Oncology project, having spent more than $62 million without reaching its goals.


In 2014, a team at Amazon’s Edinburgh office created an AI recruiting tool to sort through CVs and automatically select the most talented applicants. But the algorithm rapidly taught itself to favor male candidates over female ones, and Amazon later scrapped the “sexist” tool.

An Uber self-driving SUV struck and killed a female pedestrian on March 18, 2018, in Tempe, Arizona. The vehicle was in autonomous mode, with a human safety driver at the wheel. Reports show that Uber self-driving cars had been involved in 37 crashes before the fatal incident, after which the program was temporarily shut down.