AI-based tools and databases for Drug Discovery

Drug Discovery

Developing a new drug is a complex, expensive, and lengthy process, broadly divided into four major stages: target selection and validation; compound screening and lead optimization; preclinical studies; and clinical trials.

The average cost of a traditional drug discovery pipeline has been estimated at 2.6 billion USD, and a complete traditional workflow can take over 12 years. How to reduce these costs and speed up drug discovery is a central question for all pharmaceutical companies.

Many impressive artificial intelligence (AI) methods and machine learning (ML) tools have been developed recently to make the hunt for new drugs significantly quicker, cheaper, and more effective.

These AI tools are used across the drug discovery lifecycle, in tasks such as real-time image-based cell sorting, cell classification, quantum mechanics calculation of compound properties, computer-aided organic synthesis, design of new molecules, assay development, and prediction of the 3D structures of target proteins.

Notably, traditional experimental structural biology methods can take months to years to resolve a protein structure. In contrast, AI-based structure prediction takes only a few hours to a few days, making the process far more time-efficient.

By incorporating ML algorithms, pharmaceutical companies can also find new uses for existing drugs, predict drug-protein interactions, estimate drug efficacy, identify safety biomarkers, and optimize the bioactivity of molecules. They can automate and optimize the R&D process by using models that predict the chemical, biological, and physical characteristics of compounds.

Common ML algorithms widely used in drug discovery include random forest (RF), naive Bayes (NB), and support vector machines (SVM). The key use cases of AI in drug discovery are as follows:

  • Predicting 3D structure of target protein
  • Predicting drug-protein interactions
  • Determining the drug activity
  • Drug design
  • Designing biospecific drug molecules
  • Designing multitarget drug molecules
  • Predicting the reaction yield
  • Predicting the retrosynthesis pathways
  • Developing insights into reaction mechanisms
  • Designing the synthetic route
  • Identifying therapeutic targets
  • Predicting therapeutic use
  • Predicting toxicity and bioactivity
  • Predicting physicochemical properties
  • Identification and classification of target cells
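To make one of these use cases concrete, here is a minimal sketch of how a naive Bayes classifier, one of the algorithms named above, could be applied to activity prediction. It is written in plain Python for illustration only. The fingerprints and labels are made up, and the function names (train_bernoulli_nb, predict) are hypothetical; a real project would more likely use an established ML library and fingerprints computed from actual compounds.

```python
import math

# Toy, made-up data: each row is a binary substructure fingerprint
# (1 = feature present) and each label is 1 (active) or 0 (inactive).
FINGERPRINTS = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
LABELS = [1, 1, 1, 0, 0, 0]


def train_bernoulli_nb(rows, labels, alpha=1.0):
    """Estimate class priors and per-feature probabilities with Laplace smoothing."""
    n_features = len(rows[0])
    model = {}
    for cls in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == cls]
        prior = len(members) / len(rows)
        # P(feature_j = 1 | class), smoothed so no probability is exactly 0 or 1.
        probs = [
            (sum(r[j] for r in members) + alpha) / (len(members) + 2 * alpha)
            for j in range(n_features)
        ]
        model[cls] = (prior, probs)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior for one fingerprint."""
    best_cls, best_score = None, -math.inf
    for cls, (prior, probs) in model.items():
        score = math.log(prior)
        for bit, p in zip(row, probs):
            score += math.log(p if bit else 1.0 - p)
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls


model = train_bernoulli_nb(FINGERPRINTS, LABELS)
prediction_active_like = predict(model, [1, 1, 0, 0])
prediction_inactive_like = predict(model, [0, 0, 0, 1])
```

The classifier treats each fingerprint bit as independent given the class, which is the "naive" assumption. That keeps training cheap even for very long fingerprints, at the cost of ignoring correlations between substructures.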

AI tools used in drug discovery

Here are examples of AI tools currently used by pharmaceutical companies in drug discovery:

AlphaFold – Predicts protein 3D structure

Chemputer – Helps report chemical synthesis procedures in a standardized format

DeepChem – Open-source Python library for deep learning in drug discovery, including multilayer perceptron (MLP) models for finding suitable drug candidates

DeepNeuralNet-QSAR – Python-based system driven by computational tools that aid detection of the molecular activity of compounds

DeepTox – Software that predicts the toxicity of a total of 12,000 drugs

DeltaVina – A scoring function for rescoring protein-ligand binding affinity

Hit Dexter – ML models for the prediction of molecules that might respond to biochemical assays

Neural GraphFingerprints – Property prediction of novel molecules

NNScore – Neural network-based scoring function for protein-ligand interactions

ODDT – A comprehensive toolkit for use in chemoinformatics and molecular modeling

ORGANIC – An efficient molecular generation tool that helps to create molecules with desired properties

PotentialNet – Ligand-binding affinity prediction based on a graph convolutional neural network (GCNN)

PPB2 – Used for polypharmacology prediction

QML – A Python toolkit for quantum ML

REINVENT – Molecular de novo design using RNN (recurrent neural network) and RL (reinforcement learning)

SCScore – A scoring function to evaluate the synthesis complexity of a molecule

SIEVE-Score – An improved method of structure-based virtual screening via interaction-energy-based learning

Databases used for target discovery

Here are examples of databases used by pharmaceutical companies in target discovery:

  • BRENDA – Enzyme and enzyme-ligand information source.
  • KEGG – Database containing genomic information for functional interpretation and practical application.
  • PubChem – Comprehensive database of chemicals and their biological activities.
  • TTD – Therapeutic Target Database containing comprehensive information about drug resistance mutations, gene expression, and target combinations.
  • DrugBank – Detailed drug data and drug-target information database.
  • SuperTarget – Drug-related information database with more than 300,000 compound-target protein relations.
  • TDR targets – Database containing chemogenomic information for neglected tropical diseases.
  • STITCH – Database of chemical-protein interaction networks.
  • SMD – Database of raw microarray datasets.
  • Gene Expression Omnibus – Database of raw microarray datasets.
  • caArray – Database of cancer-related microarray datasets.
  • CGAP database – Database of cancer-related microarray datasets.
  • Oncomine – Database of cancer-related microarray datasets.
  • UniHI – Database of human molecular interaction networks.
  • Pathguide – Database of 702 biological pathway-related resources and molecular interactions.
  • UniProt – Comprehensive protein information resource.
  • InterPro – Database of protein domain information.
  • ADReCS – Database of toxicology information with 137,619 Drug-ADR pairs.
  • ChEMBL – Database of bioactive small molecules with drug-like properties.
  • ChemSpider – Comprehensive database of over 64 million chemical structures.
  • DrugCentral – Database of drug information, including activity, chemical identity, and mode of action.
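
Several of the databases listed above can be queried programmatically. As a rough sketch, the snippet below builds a request for PubChem's PUG REST interface that looks up a compound by name and asks for two properties. The helper names (pubchem_property_url, fetch_pubchem_properties) are hypothetical and chosen only for this illustration; the fetch function needs network access and assumes the JSON layout PubChem returns for property requests, so check the current PUG REST documentation before relying on it.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_property_url(name, properties=("MolecularFormula", "MolecularWeight")):
    """Build a PUG REST URL that asks for selected properties of a named compound."""
    return (
        f"{PUG_REST_BASE}/compound/name/{quote(name, safe='')}"
        f"/property/{','.join(properties)}/JSON"
    )


def fetch_pubchem_properties(name, properties=("MolecularFormula", "MolecularWeight")):
    """Fetch the properties over the network and return the first result as a dict."""
    with urlopen(pubchem_property_url(name, properties), timeout=30) as response:
        payload = json.load(response)
    # Assumed layout: results are nested under PropertyTable -> Properties.
    return payload["PropertyTable"]["Properties"][0]


# Building the URL needs no network access; the fetch is left to the caller.
example_url = pubchem_property_url("aspirin")
```

Calling fetch_pubchem_properties("aspirin") would then return a small dictionary of the requested properties. Keeping URL construction separate from the network call makes the request easy to inspect and to reuse with other HTTP clients.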