How hackers use machine learning to breach cybersecurity


From a technical standpoint, absolute cybersecurity is impossible, and machine learning does not change that. It does not promise to completely protect the confidentiality, integrity, and availability of data and networks; instead, it offers practical ways to reduce the scale of attacks and substantially improve security.

One reason machine learning cannot entirely prevent cybersecurity threats is that attackers are adopting the same technology for their own attacks, including malware, phishing, spam, DDoS, ransomware, and spyware. Moreover, offensive capabilities are much cheaper and easier to develop and deploy than the defensive measures needed to counter them.

The use of AI-powered malicious apps in large-scale cyberattacks increases the speed, adaptability, agility, coordination, and sophistication of attacks across large numbers of networks and devices. Using supervised and unsupervised learning, these malicious programs can hide within a victim’s system and guess credentials by automatically cycling through username and password combinations far faster than a human could test them. They can learn how and when to attack their target system and can evade defensive measures by changing their own signature and behavior in the event of a counterattack. Interestingly, this kind of adaptivity and dynamism has become a core characteristic of both attack and defense systems.

Traditionally, the most obvious defensive cybersecurity application has been the antivirus or anti-malware program, which scans for specific malicious code or apps with unique fingerprints. Such programs look for the signatures and characteristics of known malware families to identify attacks. But with new machine learning capabilities, attackers can stymie traditional antivirus applications by applying slight changes that throw off the signatures. Advanced tactics even allow hackers to bypass not only anomaly detection engines but also facial recognition security and spam filters, and to issue fake voice commands.
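
To see why a slight change can throw off an exact-match signature, here is a minimal sketch in Python. It assumes the simplest possible "signature", a SHA-256 hash of the whole file, which is far cruder than what real antivirus engines use. The byte strings and the names fingerprint, signature_db, and is_flagged are made up for illustration.

```python
import hashlib


def fingerprint(payload: bytes) -> str:
    """Return the SHA-256 hex digest, used here as a stand-in 'signature'."""
    return hashlib.sha256(payload).hexdigest()


# Hypothetical, harmless byte string standing in for a known malicious file.
known_sample = b"EXAMPLE-PAYLOAD-v1"

# Toy signature database: the set of fingerprints already seen.
signature_db = {fingerprint(known_sample)}


def is_flagged(payload: bytes) -> bool:
    """Flag a payload only if its exact fingerprint is already in the database."""
    return fingerprint(payload) in signature_db


# A variant that differs from the known sample by a single trailing byte.
variant = known_sample + b" "

print(is_flagged(known_sample))  # True: exact match with the database
print(is_flagged(variant))       # False: one extra byte changes the whole hash
```

The variant slips through because a cryptographic hash changes completely when any input byte changes. This is one reason defenders complement exact signatures with behavioral and anomaly-based detection.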

The 2018 Ponemon Institute study “Artificial Intelligence (AI) in Cyber-Security” found that zero-day vulnerabilities, i.e., undiscovered software flaws with no known patches, are among the surest ways to hack a system. The study reports that AI improves the detection of previously undetectable zero-day exploits by 63 percent, but when hackers use AI as well, these vulnerabilities become a big problem, because they are essential for advanced cyberattacks. Fuzzing is an old but standard method attackers use to find and exploit such vulnerabilities. With AI and machine learning, attackers can automate the fuzzing process and spot weaknesses before defenders find and fix them.
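
Fuzzing itself is simple to illustrate, and defenders use the same technique to find bugs before attackers do. The sketch below is a plain random-mutation fuzzer run against a toy parser with a deliberately planted bug. It does not model the machine learning part (choosing which mutations to try), and every name in it (last_body_byte, mutate, fuzz) is hypothetical.

```python
import random


def last_body_byte(data: bytes) -> int:
    """Toy parser: the first byte is a length field, the rest is the body.

    Planted bug: the length field is trusted without a bounds check.
    """
    if not data:
        raise ValueError("empty input")  # clean, expected rejection
    length = data[0]
    body = data[1:]
    if length == 0:
        return 0
    return body[length - 1]  # IndexError when length > len(body)


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte of the seed with a random value."""
    data = bytearray(seed)
    data[rng.randrange(len(data))] = rng.randrange(256)
    return bytes(data)


def fuzz(target, seed: bytes, rounds: int = 1000, rng_seed: int = 0) -> list[bytes]:
    """Feed mutated inputs to target and collect those that crash it."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(rounds):
        candidate = mutate(seed, rng)
        try:
            target(candidate)
        except ValueError:
            pass  # the parser rejected the input cleanly
        except Exception:
            crashes.append(candidate)  # unexpected failure: a potential bug
    return crashes


seed_input = bytes([3]) + b"abc"  # valid record: length 3, body "abc"
crashes = fuzz(last_body_byte, seed_input)
print(f"{len(crashes)} crashing inputs found")
```

Every input collected in crashes has a length byte larger than the body, which points directly at the missing bounds check. A learning-guided fuzzer would aim to reach such inputs in fewer attempts than blind mutation.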

CAPTCHA is a prevalent system used by websites and networks to distinguish human users from bots or machine input and so keep automated programs from gaining unauthorized access. But deep learning and computer vision have emerged as a way for hackers to break through CAPTCHA. Adrian Rosebrock has demonstrated one such attack. In his book “Deep Learning for Computer Vision with Python,” he explains how he bypassed the CAPTCHA system on the E-ZPass New York website using deep learning, training his model on a large dataset of downloaded CAPTCHA images. Experts say that if the source code of the CAPTCHA generator is available (as it is when one installs a WordPress plugin that generates CAPTCHAs), it takes less than 15 minutes to break the CAPTCHA using machine learning. In 2012, researchers proved that machine learning could bypass reCAPTCHA-based systems with an 82 percent success rate. More recently, in 2017, researchers used machine learning to sidestep Google reCAPTCHA protections with 98 percent accuracy.

Stealth attacks are another dangerous form of cyberattack hackers use to penetrate a system. In these attacks, hackers first create malware capable of mimicking trusted system components and let it blend into an organization’s security environment. The malware automatically learns the organization’s computing environment, patch update lifecycle, and preferred communication protocols. It can remain silent for years without detection while the hackers wait to strike when the systems are most vulnerable and no one expects an attack. Hackers can also predefine an application feature as an AI trigger that executes the attack at a specific time, say ten months after the application has been installed, or when the systems are least protected.

Deepfake technology is another common tactic thieves use to trick companies and individuals and steal their money. Deepfakes are fake videos, images, or audio of people, made with artificial neural networks and used to convince someone that a fabricated artifact is real. In October, a UK energy company’s chief executive was tricked into wiring €200,000 to a Hungarian supplier because he believed the chief executive of his firm’s parent company was instructing him to do so. In reality, a fraudster used deepfake software to mimic the voice and demand the payment within an hour. The software could imitate not only the voice but also its tonality, punctuation, and German accent.

To sum up, hackers are indeed turning to AI and machine learning to weaponize malware and counter the advances made in cybersecurity solutions. The good news is that AI developers keep stepping up their defenses. They use algorithms to simulate attacks against an ecosystem, which can help analysts probe, defend, and secure vulnerabilities.

Machine learning’s biggest strength in cybersecurity is anomaly detection: learning what “normal” behavior looks like for a system and flagging anything unusual for human review. Automating network scanning and anomaly detection lets us pinpoint suspicious behavior much faster, mitigate a potential breach, minimize its impact, and thereby enhance the resilience of a system.
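
The idea of learning what is "normal" and flagging the rest can be sketched with a simple statistical baseline. The example below uses a z-score threshold as a stand-in for a trained model. The hourly login counts are invented, and the names fit_baseline and is_anomalous and the threshold of 3.0 are assumptions made for illustration only.

```python
from statistics import mean, stdev


def fit_baseline(samples):
    """Learn 'normal' as the mean and sample standard deviation of past data."""
    return mean(samples), stdev(samples)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu  # no variation seen: any different value is unusual
    return abs(value - mu) / sigma > threshold


# Hypothetical hourly login counts observed during normal operation.
normal_logins = [42, 38, 45, 40, 41, 39, 44, 43, 37, 41]
baseline = fit_baseline(normal_logins)

for observed in (40, 46, 120):
    label = "flag for human review" if is_anomalous(observed, baseline) else "ok"
    print(observed, label)
```

With this data, only the count of 120 is flagged. Production systems typically learn from many features at once, but the division of labor is the same: the model narrows attention and a human analyst reviews what it flags.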

Machine learning is already being used in communication filtering, antivirus software, vulnerability scanning, malware and forensic analysis, spam filtering, and phishing defense. It is also used to tackle the spread of computational propaganda. In each of these areas, machine learning substantially speeds up the security analyst’s work. This is particularly crucial now that cybersecurity has moved beyond perimeter defense to include scanning networks for unusual behavior or anomalies that may indicate a breach, a concept that applies to all sorts of machine learning-assisted threat detection. Researchers say, however, that the interplay between machine learning and humans is the crucial strength of these techniques. Machine learning may be no cure-all, but it can surely help identify security holes and close them.