History of artificial intelligence in medicine (AIM)

In 1950, Alan Turing became one of the first to describe the use of computers to simulate intelligent behavior and critical thinking. In his famous paper “Computing Machinery and Intelligence,” Turing described a simple test, later known as the “Turing test,” to determine whether computers are capable of human intelligence. Six years later, John McCarthy coined the term “artificial intelligence” (AI), which he described as “the science and engineering of making intelligent machines.”

In the early days, AI started out as a simple set of “if, then” rules. In 1961, the first industrial robot arm, named Unimate, joined the General Motors assembly line to perform automated die casting. Unimate could follow step-by-step commands and was a big success.

A few years later, in 1964, Joseph Weizenbaum introduced Eliza, a program that used natural language processing, specifically pattern matching and substitution, to imitate human conversation (superficial communication). It laid the groundwork for future chatterbots.
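The pattern matching and substitution approach described above can be sketched in a few lines of Python. This is only an illustration: the rules, the pronoun table, and the function names (`ELIZA_RULES`, `PRONOUN_SWAPS`, `reflect`, `respond`) are assumptions made for this example and are not taken from Weizenbaum's original script.

```python
import re

# Hypothetical rules in the spirit of Eliza (not the original script).
# Each rule pairs a pattern with a response template; {0} is filled with
# the text captured by the pattern.
ELIZA_RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (.+)", re.IGNORECASE), "Tell me more about your {0}."),
]

# Substitution table that turns first-person words into second-person ones.
PRONOUN_SWAPS = {"my": "your", "me": "you", "i": "you", "am": "are"}


def reflect(fragment):
    """Swap pronouns word by word so the fragment can be echoed back."""
    return " ".join(PRONOUN_SWAPS.get(word.lower(), word) for word in fragment.split())


def respond(utterance):
    """Return the response of the first rule whose pattern matches."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in ELIZA_RULES:
        match = pattern.search(text)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please go on."  # fallback when no pattern matches


reply = respond("I need a break from my work.")
```

Here `reply` is "Why do you need a break from your work?". The program has no model of what a break or work is; it only matches a pattern and substitutes words, which is why this style of conversation is considered superficial.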

Shakey, dubbed “the first electronic person,” was created in 1966 at Stanford Research Institute. It was the first mobile robot capable of interpreting instructions: rather than simply following one-step commands, Shakey could process more complex instructions and carry out the appropriate actions. This was an essential milestone in robotics and AI.

Despite the innovations in engineering, medicine was slow to adopt AI. However, this early period was an important time for digitizing data that later served as the foundation for future growth and utilization of artificial intelligence in medicine (AIM).

In the 1960s, the National Library of Medicine developed the Medical Literature Analysis and Retrieval System (MEDLARS); together with the web-based search engine PubMed, which followed in the 1990s, it became an important digital resource for biomedicine’s later acceleration. During the same early period, clinical informatics databases and medical record systems were also developed for the first time, laying the groundwork for future AIM developments.

The 1970s to 2000s

Much of the period between 1970 and 2000 is commonly referred to as the “AI winter,” a time of reduced funding and interest and, as a result, fewer significant developments. Two major winters are generally recognized: the first, in the late 1970s, was fueled by concerns about AI’s limitations; the second, from the late 1980s until the early 1990s, was fueled by the high cost of developing and maintaining expert digital information databases.

Despite the lack of general interest during this time, AI pioneers continued to collaborate. Saul Amarel founded The Research Resource on Computers in Biomedicine at Rutgers University in 1971.

In 1973, the Stanford University Medical Experimental–Artificial Intelligence in Medicine (SUMEX-AIM), a time-shared computer system, was established to improve networking capabilities among clinical and biomedical researchers from various institutions. As a result of these collaborations, the first National Institutes of Health-sponsored AIM workshop was held at Rutgers University in 1975. These events represented the initial collaborations among the pioneers in AIM.

The development of a glaucoma consultation program using the CASNET model was one of the first prototypes to demonstrate the feasibility of applying AI to medicine. The CASNET model is a causal–associational network made up of three distinct programs: model construction, consultation, and a database created and maintained by the collaborators. This model could apply disease information to individual patients and provide physicians with patient management advice. It was created at Rutgers University and first presented in 1976 at the Academy of Ophthalmology meeting in Las Vegas, Nevada.

MYCIN, a “backward chaining” AI system, was created in the early 1970s. Using patient information entered by physicians and a knowledge base of about 600 rules, MYCIN produced a list of potential bacterial pathogens and then recommended antibiotic treatment options adjusted for the patient’s body weight. MYCIN served as the foundation for EMYCIN, a later rule-based system. INTERNIST-1 was later developed to aid primary care physicians in diagnosis, using the same framework as EMYCIN and a larger medical knowledge base.
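Backward chaining, the reasoning style attributed to MYCIN above, starts from a goal and works back through rules until it reaches known facts. The following is a minimal sketch of that idea only. The two rules, the fact names, and the function `backward_chain` are hypothetical placeholders invented for this example; they are not drawn from MYCIN's knowledge base, and the sketch omits the certainty weighting and dosing logic a real system would need.

```python
# Hypothetical toy rules: (premises, conclusion). Not MYCIN's actual rules.
KB_RULES = [
    (("gram_negative", "rod_shaped", "anaerobic"), "suspect_bacteroides"),
    (("suspect_bacteroides",), "consider_antibiotic_A"),
]


def backward_chain(goal, facts, rules, trace=None):
    """Return True if `goal` can be established from `facts` using `rules`.

    Works backward: a goal holds if it is a known fact, or if some rule
    concludes it and every premise of that rule can itself be established.
    Assumes the rule set has no cycles.
    """
    if trace is None:
        trace = []
    if goal in facts:
        trace.append(f"{goal}: known fact")
        return True
    for premises, conclusion in rules:
        if conclusion == goal and all(
            backward_chain(p, facts, rules, trace) for p in premises
        ):
            trace.append(f"{goal}: derived from {list(premises)}")
            return True
    return False


# Findings entered by the physician (hypothetical).
patient_facts = {"gram_negative", "rod_shaped", "anaerobic"}
steps = []
supported = backward_chain("consider_antibiotic_A", patient_facts, KB_RULES, steps)
```

With all three findings present, `supported` is `True` and `steps` records the chain of reasoning from facts to the goal. If one finding is missing, the first rule cannot fire and the goal is not supported.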

The University of Massachusetts released DXplain, a decision support system, in 1986. This program creates a differential diagnosis based on the symptoms entered. It also functions as an electronic medical textbook, with detailed disease descriptions and additional references. DXplain covered approximately 500 diseases when it was first released and has since grown to include more than 2,400.

By the late 1990s, there was a resurgence of interest in machine learning, particularly in the medical field, which, combined with the above technological advancements, ushered in the modern era of AIM.

Early models had several flaws that prevented widespread acceptance and application in medicine. However, in the early 2000s, many of these limitations were overcome by the advent of machine learning (ML), deep learning (DL), and computer vision, creating opportunities for personalized medicine rather than algorithm-only-based medicine.

From 2000 to 2020

Watson, an open-domain question-answering system developed by IBM in 2007, competed against human participants on the television game show Jeopardy! in 2011 and won first place. Rather than using forward reasoning (following rules from data to conclusions), backward reasoning (following rules from conclusions to data), or hand-crafted if-then rules, Watson’s underlying technology, called DeepQA, used natural language processing and various searches to analyze unstructured content and generate probable answers. Compared with such rule-based systems, it was more convenient to use, easier to maintain, and less expensive.

DeepQA technology could provide evidence-based medicine responses by pulling data from a patient’s electronic medical record and other electronic resources. As a result, it expanded the scope of evidence-based clinical decision-making. In 2017, Bakkar and colleagues used IBM Watson to successfully identify new RNA-binding proteins altered in amyotrophic lateral sclerosis.

Building on this momentum and on improved computer hardware and software, digitalized medicine became more widely available, and AIM began to grow rapidly. Thanks to natural language processing, chatbots went from superficial communication (Eliza) to meaningful conversation-based interfaces. This technology was used to create Apple’s Siri virtual assistant in 2011 and Amazon’s Alexa virtual assistant in 2014. Pharmabot was a chatbot created in 2015 to assist pediatric patients and their parents with medication education. In 2017, Mandy was built as an automated patient intake system for a primary care practice.

In AIM, DL was a significant step forward. Unlike classical ML, which relies on a set of traits (features) selected with human input, DL can be trained to classify data on its own. Though DL was first studied in the 1950s, the problem of “overfitting” limited its application in medicine. Overfitting occurs when a model is fitted so closely to one dataset that it cannot accurately process new datasets; a lack of training data and computing capacity made the problem hard to avoid. With the availability of larger datasets and significantly improved computing power in the 2000s, these limitations were overcome. A convolutional neural network (CNN) is a type of DL algorithm that simulates the behavior of interconnected neurons in the brain.
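The overfitting problem described above can be illustrated with a small, self-contained experiment. This is a sketch under stated assumptions, not a method from the source: the synthetic data, the 20% label noise, the sample sizes, and the names `make_data`, `memorizer`, and `fit_threshold` are all invented for the example. A model that memorizes a small training set is compared with a much simpler model that learns a single cutoff.

```python
import random

random.seed(0)  # fixed seed so the experiment is repeatable


def make_data(n, noise=0.2):
    """Synthetic data: the true label is x > 0.5, but a fraction of labels is flipped."""
    data = []
    for _ in range(n):
        x = random.random()
        label = x > 0.5
        if random.random() < noise:
            label = not label  # simulated measurement or labeling error
        data.append((x, label))
    return data


train = make_data(50)    # small training set
test = make_data(1000)   # unseen data from the same source


def memorizer(x):
    """Overfit model: copy the label of the closest training point (1-nearest neighbor)."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]


def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)


def fit_threshold(data):
    """Simple model: choose the single cutoff that works best on the training data."""
    candidates = [i / 20 for i in range(21)]
    return max(candidates, key=lambda t: accuracy(lambda x: x > t, data))


threshold = fit_threshold(train)


def simple_model(x):
    return x > threshold


print("memorizer  train/test:", accuracy(memorizer, train), accuracy(memorizer, test))
print("threshold  train/test:", accuracy(simple_model, train), accuracy(simple_model, test))
```

The memorizer reproduces every training label, noise included, so its training accuracy is perfect while its accuracy on unseen data drops. The gap between the two numbers is the signature of overfitting; more training data narrows it, which is consistent with the role of larger datasets noted above.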

A CNN is made up of several layers that apply filters to an input image to detect patterns. Fully connected layers then combine all of the extracted features to produce the final result. LeNet, AlexNet, VGG, GoogLeNet, and ResNet are just a few of the CNN architectures that are now available.
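The layer structure described above (filters that detect patterns, followed by a fully connected step that combines the features) can be traced by hand on a tiny image. The sketch below is written in plain Python for readability and is not how production CNNs are implemented. The 5×5 image, the 2×2 vertical-edge filter, and the fixed fully connected weights are assumptions chosen for the example; in a real network the filter values and weights are learned from data.

```python
def conv2d(image, kernel):
    """Slide the filter over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0, value) for value in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Downsample by keeping the strongest response in each size x size block."""
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


def fully_connected(features, weights, bias=0.0):
    """Combine all features into a single score."""
    return sum(f * w for f, w in zip(features, weights)) + bias


# Toy 5x5 image: dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-picked filter that responds to a dark-to-bright vertical edge.
vertical_edge = [[-1, 1],
                 [-1, 1]]

feature_map = relu(conv2d(image, vertical_edge))   # 4x4 map of edge responses
pooled = max_pool(feature_map)                     # 2x2 summary
flat = [value for row in pooled for value in row]  # flatten for the final layer
score = fully_connected(flat, [0.5, 0.5, 0.5, 0.5])
```

The feature map is nonzero only in the column where the image changes from dark to bright, so the filter has located the edge. Pooling keeps that response while shrinking the map, and the fully connected step turns the pooled features into one output value (here 2.0).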


AI has advanced over several decades to include more complex algorithms that perform similarly to the human brain. Today, predictive models can diagnose diseases, predict therapeutic response, and potentially support preventive medicine in the future. AI has the potential to improve diagnostic accuracy, provider workflow, the efficiency of clinical operations, disease and therapeutic monitoring, procedure accuracy, and overall patient outcomes. With AI systems capable of running complex algorithms and learning on their own, we enter a new era in medicine, in which AI can be applied in clinical practice through risk assessment models.