Types of attacks against AI systems and best practices


Traditional cyberattacks generally exploit bugs in code or human mistakes, whether intentional or unintentional. Their main aims are to steal data (extraction) and to disrupt a system.

Today, a new set of attacks is emerging. Called AI attacks, they fundamentally target AI systems and expand the set of entities, including physical objects, that can be used to execute cyberattacks.

It is not just software or hardware that can be attacked, as in traditional IT systems, but also the data that is critical to all AI systems. Attacks against AI systems often also aim to steal information or disrupt the system, but they are crafted in a more subtle form and with a longer-term orientation. They try to take control of the targeted system for a given purpose, or to intrude into the system, get the model to reveal its inner workings, and then change its behavior.

AI attackers can exploit several different vulnerabilities in an AI architecture, involving initial data inputs, data-conditioning processes, algorithms, and human-machine teaming. Together these form an attack surface that is potentially vulnerable to cyberattacks. For this reason, the whole information technology system is rendered more open to attacks.

This post explores some common types of AI attacks, such as data poisoning, tampering with categorization models, backdoors, and reverse engineering of the AI model, and how best to protect AI systems from malicious attacks. Commonly described attack classes include the following:

  • Perturbation attack: The attacker modifies the query to get the response they want.
  • Poisoning attack: The attacker contaminates the training phase of an ML system to get an intended result.
  • Model inversion: The attacker recovers the secret features used in the model through careful queries.
  • Membership inference: The attacker infers whether a given data record was part of the model’s training dataset.
  • Model stealing: The attacker recovers the model through carefully crafted queries.
  • Reprogramming the ML system: The attacker repurposes the ML system to perform an activity it was not programmed for.
  • Adversarial example in the physical domain: The attacker brings adversarial examples into the physical domain to subvert the ML system, e.g., 3D printing special eyewear to fool a facial recognition system.
  • Malicious ML provider recovering training data: A malicious ML provider queries the model used by the customer and recovers the customer’s training data.
  • Attacking the ML supply chain: The attacker compromises an ML model as it is being downloaded for use.
  • Backdoor ML: A malicious ML provider backdoors the algorithm so that it activates on a specific trigger.
  • Exploiting software dependencies: The attacker uses traditional software exploits, such as buffer overflows, to confuse or control ML systems.
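
To make the first item in this list concrete, here is a minimal sketch of a perturbation attack against a toy linear classifier. The weights, the input, and the epsilon budget are all invented for illustration; real attacks target far larger models, but the idea of nudging each feature slightly in the direction that hurts the score is the same.

```python
# Toy linear classifier: score = w . x + b; predicts 1 if score > 0.
# WEIGHTS, BIAS and the input below are made-up values for illustration only.
WEIGHTS = [0.9, -0.6, 0.4]
BIAS = -0.1

def score(x):
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS

def predict(x):
    return 1 if score(x) > 0 else 0

def perturb(x, epsilon):
    """Shift each feature by exactly epsilon in the direction that lowers the score.

    For a linear model the gradient of the score with respect to x is WEIGHTS,
    so stepping against sign(w) is a sign-gradient step.
    """
    def sign(v):
        return (v > 0) - (v < 0)
    return [xi - epsilon * sign(w) for w, xi in zip(WEIGHTS, x)]

original = [0.5, 0.2, 0.1]
adversarial = perturb(original, epsilon=0.2)
print(predict(original), predict(adversarial))
```

Each feature moves by only 0.2, yet the predicted class changes, which is what makes this kind of attack hard to spot by looking at the input alone.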

1. Data poisoning

Attackers may inject carefully crafted flawed data into the legitimate dataset used to train the system in order to modify its behavior. In one reported example, adding 8% of erroneous data to an AI system for drug dosage could produce a 75.06% change in the dosage for half of the patients relying on the system for their treatment.
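
The mechanism can be sketched with a deliberately simple model. The example below is not the drug-dosage system mentioned above and does not try to reproduce its figures; it fits a one-parameter line (dose = k * weight) on synthetic data and shows how a small fraction of crafted records moves the learned parameter. The function name `fit_slope` and every data point are assumptions made for this illustration.

```python
def fit_slope(records):
    """Least-squares slope for dose = k * weight (a line through the origin)."""
    num = sum(w * d for w, d in records)
    den = sum(w * w for w, d in records)
    return num / den

# Synthetic clean data: 100 records where dose is exactly 2.0 per unit of weight.
clean = [(50 + i % 50, 2.0 * (50 + i % 50)) for i in range(100)]

# The attacker appends 8 crafted records (8% of the clean set) with inflated doses.
poison = [(100, 2.0 * 100 * 5) for _ in range(8)]

k_clean = fit_slope(clean)
k_poisoned = fit_slope(clean + poison)
print(f"clean slope: {k_clean:.2f}, poisoned slope: {k_poisoned:.2f}")
```

Because the poisoned slope is then applied to every patient, a handful of bad records changes every recommendation the model makes.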

2. Tampering with categorization models

By manipulating the categorization models of, e.g., neural networks, attackers could modify the outcome of AI system applications. For instance, researchers used an algorithm to design 3D-printed turtles whose pictures deceived an AI system into classifying the turtles as rifles.

3. Backdoors

Adversaries can also use backdoor injection attacks to compromise AI systems. The adversary creates a “customized perturbation mask applied to selected images to carry out such attacks and override correct classifications.” “The backdoor is injected into the victim model via data poisoning of the training set, with a small poisoning fraction, and thus does not impair the learned deep neural network’s normal functioning.” As a result, such attacks “can exploit a deep learning system’s vulnerability stealthily and potentially cause great mayhem in many realistic applications, such as sabotaging an autonomous vehicle or impersonating another person to gain unauthorized access.” For example, a backdoored traffic sign classifier could misinterpret a No Entry sign carrying the trigger as an Ahead Only sign.
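
A toy version of this behavior can be sketched with a nearest-neighbour classifier standing in for the deep neural network. The three features (two sign characteristics plus a "trigger patch") and all values are invented; the point is only that one mislabelled, trigger-carrying training sample leaves clean inputs unaffected while flipping inputs that carry the trigger.

```python
def nearest_label(train, x):
    """1-nearest-neighbour prediction using squared Euclidean distance."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(train, key=lambda rec: dist(rec[0], x))[1]

# Features: (red_circle, blue_circle, trigger_patch). All values are invented.
clean_train = [
    ((1.0, 0.0, 0.0), "no_entry"),
    ((0.9, 0.1, 0.0), "no_entry"),
    ((0.0, 1.0, 0.0), "ahead_only"),
    ((0.1, 0.9, 0.0), "ahead_only"),
]
# Poisoned sample: looks like No Entry, carries the trigger, and is mislabelled.
poisoned_train = clean_train + [((1.0, 0.0, 1.0), "ahead_only")]

clean_sign = (0.95, 0.05, 0.0)      # ordinary No Entry sign
triggered_sign = (0.95, 0.05, 1.0)  # same sign with the trigger patch added

print(nearest_label(poisoned_train, clean_sign))
print(nearest_label(poisoned_train, triggered_sign))
```

Testing only on clean signs would not reveal the problem, which is why backdoors of this kind are described as stealthy.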

4. Reverse engineering the AI model

Attackers can perform more targeted and successful adversarial attacks by gaining access to the AI model through reverse engineering. For example, according to a study published by the Institute of Electrical and Electronics Engineers (IEEE), an adversary following the Differential Power Analysis methodology can target the ML inference, assuming the training phase is trusted, and learn the secret model parameters. The adversary can then build knockoffs of the system, putting security and intellectual property at risk.
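
Differential Power Analysis relies on physical measurements and cannot be shown in a few lines, so the sketch below swaps in a different and much simpler route to the same outcome: recovering model parameters purely through queries, as in the model stealing item listed earlier. The victim is a made-up linear model, and `victim_api` and `extract_linear_model` are hypothetical names used only for this illustration.

```python
# Hidden parameters of the victim model; the attacker never reads these directly.
_SECRET_WEIGHTS = [1.5, -2.0, 0.25]
_SECRET_BIAS = 0.5

def victim_api(x):
    """Black-box prediction endpoint that returns the raw linear score."""
    return sum(w * xi for w, xi in zip(_SECRET_WEIGHTS, x)) + _SECRET_BIAS

def extract_linear_model(query, n_features):
    """Recover the bias and weights of a linear model with n_features + 1 queries."""
    bias = query([0.0] * n_features)          # the all-zero input exposes the bias
    weights = []
    for i in range(n_features):
        unit = [0.0] * n_features
        unit[i] = 1.0                          # unit vector isolates one weight
        weights.append(query(unit) - bias)
    return weights, bias

stolen_weights, stolen_bias = extract_linear_model(victim_api, 3)
print(stolen_weights, stolen_bias)
```

Real models are not linear and real APIs rarely return raw scores, so practical extraction needs many more queries, but the knockoff risk described above follows the same logic.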

How to prevent AI attacks?

Ways to make AI systems safe and reliable when developing and deploying them include the following:

Companies must promote suitability testing before an AI system is implemented to evaluate the related security risks. Such tests should be performed by all stakeholders involved in the development and/or deployment project, and should gauge value, ease of attack, damage, opportunity cost, and alternatives.

Companies should address the risk of AI attacks once the AI system is implemented. General AI safety could also be strengthened by putting detection mechanisms in place. These would alert companies that adversarial attacks are occurring, or that the system in question is no longer functioning within specified parameters, so that a fallback plan can be activated.
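
One possible shape for such a detection mechanism is sketched below: a monitor that tracks a rolling mean of the model's confidence and switches to a fallback when that mean leaves an allowed band. The class name `DriftMonitor`, the band of 0.6 to 1.0, and the window size are assumptions for illustration; a real deployment would choose its own signals and thresholds.

```python
from collections import deque

class DriftMonitor:
    """Flags when the rolling mean of model confidence leaves an allowed band."""

    def __init__(self, low, high, window=5):
        self.low, self.high = low, high
        self.recent = deque(maxlen=window)   # keeps only the last `window` values

    def observe(self, confidence):
        """Record one confidence value; return True if the system is in bounds."""
        self.recent.append(confidence)
        mean = sum(self.recent) / len(self.recent)
        return self.low <= mean <= self.high

def serve(confidences, monitor):
    """Return the action taken for each prediction: 'model' or 'fallback'."""
    return ["model" if monitor.observe(c) else "fallback" for c in confidences]

# Made-up confidence stream that degrades partway through.
actions = serve([0.9, 0.92, 0.88, 0.3, 0.2, 0.25, 0.2], DriftMonitor(0.6, 1.0))
print(actions)
```

Using a rolling mean rather than single values means one odd prediction does not trigger the fallback, at the cost of reacting a few requests late.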

All AI systems must follow a secure development life cycle, from ideation to deployment, including runtime monitoring and post-deployment control and auditing.

Companies should strengthen AI security by maintaining accountability across intelligent systems. This means requiring adequate documentation of the system’s architecture, including the design and documentation of its components and how they are integrated. They should secure logs related to the system’s development, coding, and training: who changed what, when, and why? These are standard procedures for the revision control systems used in developing software, which also preserve older versions of the software so that differences and additions can be checked and reversed.
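
A secured log of this kind can be approximated with a hash chain, where each entry's hash covers both its content and the previous entry's hash, so a later edit to any record is detectable. This is a minimal sketch using Python's standard `hashlib` and `json` modules; the field names and the sample entries are invented, and a production system would also need to protect the latest hash itself.

```python
import hashlib
import json

def _digest(body):
    """SHA-256 over a canonical JSON encoding of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def append_entry(log, who, what, why, when):
    """Append a record whose hash covers its content and the previous hash."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = {"who": who, "what": what, "why": why, "when": when, "prev": prev_hash}
    log.append({**body, "hash": _digest(body)})

def verify(log):
    """Return True if no entry was altered, reordered, or removed from the middle."""
    prev_hash = "0" * 64
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True

audit_log = []
append_entry(audit_log, "alice", "raised learning rate to 0.01",
             "faster convergence", "2024-01-05T10:00:00Z")
append_entry(audit_log, "bob", "retrained on dataset v2",
             "new labels", "2024-01-06T09:30:00Z")
print(verify(audit_log))
```

Revision control systems such as Git rely on the same principle of content hashes that chain back through history.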

Developers must provide cybersecurity pedigrees for any data libraries used for training machine learning (ML) algorithms. They should keep track of the data, model parameters, and training procedure where ML is used and require records that demonstrate due diligence when testing the technology before releasing it.
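
As a rough illustration of what keeping track of the data, model parameters, and training procedure might look like, the sketch below stores a cryptographic fingerprint of the training data alongside its source and hyperparameters, and later checks that the data has not changed. The record layout, function names, and sample values are assumptions for this example, not an established pedigree format.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest of the raw dataset bytes."""
    return hashlib.sha256(data).hexdigest()

def build_training_record(dataset_name, dataset_bytes, source, hyperparameters):
    """Bundle what was trained on, where it came from, and how it was trained."""
    return {
        "dataset": dataset_name,
        "sha256": fingerprint(dataset_bytes),
        "source": source,
        "hyperparameters": dict(hyperparameters),
    }

def dataset_matches(record, dataset_bytes):
    """True if the bytes presented now are the ones recorded at training time."""
    return record["sha256"] == fingerprint(dataset_bytes)

# Invented in-memory dataset standing in for a real training file.
original_data = b"weight,dose\n70,140\n80,160\n"
record = build_training_record(
    "dosage-v1", original_data, "internal clinical export",
    {"learning_rate": 0.01, "epochs": 20},
)
tampered_data = original_data + b"100,1000\n"
print(dataset_matches(record, original_data), dataset_matches(record, tampered_data))
```

A fingerprint only proves the data is unchanged since it was recorded; it says nothing about whether the data was trustworthy to begin with, which is where the pedigree of the source matters.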

Companies must maintain logs of inputs and outputs for AI-powered operating systems where this is feasible given the system’s capacities, and provided the logs are cybersecurity and GDPR compliant. Beyond logging, they should enhance AI reliability and reproducibility by using techniques such as randomization, noise prevention, defensive distillation, and ensemble learning.
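
Of the techniques named above, ensemble learning is the easiest to sketch. In the example below, three stand-in models vote on each input, and the share of votes behind the winning label serves as a simple agreement signal. The models are trivial hand-written rules invented for illustration, not trained classifiers.

```python
from collections import Counter

def ensemble_predict(models, x):
    """Return (majority_label, agreement), where agreement is the winning vote share."""
    votes = Counter(model(x) for model in models)
    label, count = votes.most_common(1)[0]
    return label, count / len(models)

# Three stand-in models that look at different aspects of the input.
def model_a(x):
    return "high" if x[0] > 0.5 else "low"

def model_b(x):
    return "high" if x[1] > 0.5 else "low"

def model_c(x):
    return "high" if (x[0] + x[1]) / 2 > 0.5 else "low"

models = [model_a, model_b, model_c]
print(ensemble_predict(models, (0.9, 0.8)))  # all three models agree
print(ensemble_predict(models, (0.9, 0.2)))  # one model dissents
```

An attacker now has to fool a majority of the models at once, and a drop in agreement can itself be logged as a warning sign.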

