AI hardware – What it is and why it matters in 2023 [Updated]


Artificial intelligence (AI) has been around for decades, but limited access to large data sets and a lack of suitable computing architectures held back AI development until recently.

The emergence of deep learning, cloud computing, and parallel computing architectures, together with the race for sophisticated AI capabilities such as speech, image, video, and text recognition, has accelerated AI research. This race is driving a new wave of investment in premium AI hardware capable of accelerating application development.

AI hardware is one layer in the artificial intelligence technology stack, alongside storage, memory, logic, and networking. It orchestrates and coordinates computations among accelerators, serving as a differentiator in AI.

According to studies, demand for AI chips and application-specific hardware is expected to grow by about 10 to 15 percent, producing a $109bn AI hardware market by 2025. Owing to the continued growth in data availability, compute power, and the developer ecosystem, chipmakers are racing to build AI hardware to capture 40 to 50 percent of the total technology-stack value, the best opportunity they have had in decades.

AI hardware types

The hardware used for AI today mainly consists of one or more of the following:

  • CPU – Central Processing Units
  • GPU – Graphics Processing Units
  • FPGA – Field Programmable Gate Arrays
  • ASIC – Application Specific Integrated Circuits

Modern machines combine powerful multicore CPUs with dedicated hardware to handle parallel processing. GPUs and FPGAs are the most popular and widely available dedicated hardware in AI systems developed on workstations. A GPU is a chip designed to speed up the processing of multidimensional data, such as images. It is made up of thousands of smaller cores, each designed to work independently on a subset of the input data that requires heavy computation. Repetitive functions that can be applied to different parts of the input, such as texture mapping, image rotation, translation, and filtering, run faster and more efficiently on a GPU with its own dedicated memory.
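The data-parallel pattern a GPU exploits, the same small operation applied independently to every element, can be sketched on the CPU with NumPy. This is an illustrative sketch, not GPU code; the toy `brighten` filter and its names are invented for the example:

```python
import numpy as np

# A toy brightness filter applied to a 4x4 grayscale "image".
# On a GPU, each output pixel would be computed by its own thread;
# NumPy's whole-array form mimics that per-element independence.
image = np.arange(16, dtype=np.float32).reshape(4, 4)

def brighten_loop(img, gain):
    # Scalar loop: one pixel at a time (serial CPU style).
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = min(img[i, j] * gain, 255.0)
    return out

def brighten_vectorized(img, gain):
    # Whole-array expression: every pixel is transformed independently,
    # the same shape of work a GPU spreads across thousands of cores.
    return np.minimum(img * gain, 255.0)

assert np.allclose(brighten_loop(image, 1.5), brighten_vectorized(image, 1.5))
```

Both functions compute identical results; the difference is that the vectorized form exposes the independence between pixels, which is exactly what lets a GPU process them all at once.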

An FPGA is a reconfigurable digital logic device containing a set of programmable logic blocks and a reconfigurable interconnection hierarchy. An FPGA is not a processor, so it cannot run a program stored in memory. Instead, it is configured with a hardware description language (HDL), and unlike a traditional CPU it is truly parallel: a dedicated section of the chip is assigned to each independent processing task, so many parts of the same program can run simultaneously. A typical FPGA also includes dedicated memory blocks, digital clock managers, I/O banks, and several other features that vary across models. While a GPU is designed to run similar threads efficiently on different subsets of the input, an FPGA is designed to parallelize the serial, sequential stages of a single program by pipelining them.
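The pipelined parallelism described above can be sketched in software with one worker thread per stage. This is only an analogy (real FPGA stages are hardware circuits, not threads), and the two toy stages are invented for the example:

```python
import threading
import queue

# Each "stage" owns a dedicated worker, like a dedicated region of FPGA
# fabric: while stage 2 processes item N, stage 1 is already on item N+1.
def stage(fn, inbox, outbox):
    while True:
        item = inbox.get()
        if item is None:          # sentinel: propagate shutdown downstream
            outbox.put(None)
            return
        outbox.put(fn(item))

q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
threading.Thread(target=stage, args=(lambda x: x + 1, q1, q2)).start()
threading.Thread(target=stage, args=(lambda x: x * 2, q2, q3)).start()

for x in [1, 2, 3]:               # feed items into the front of the pipeline
    q1.put(x)
q1.put(None)

results = []
while (r := q3.get()) is not None:
    results.append(r)
# results == [4, 6, 8]
```

The key property is that the stages overlap in time: throughput is set by the slowest stage, not by the sum of all stages, which is why pipelining suits streaming workloads.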

Like general-purpose CPUs, AI chips gain speed and efficiency by packing in ever larger numbers of ever smaller transistors, which run faster and consume less energy than larger ones. Unlike CPUs, however, AI chips also have AI-optimized design features. These features dramatically speed up the identical, predictable, independent calculations AI algorithms require. They include:

  • Carrying out a large number of calculations in parallel rather than in sequence
  • Calculating numbers at low precision in a way that still implements AI algorithms successfully but reduces the number of transistors required for the same calculation
  • Speeding up memory access by storing an entire AI algorithm on a single AI chip
  • Using purpose-built programming languages to translate AI code efficiently for execution on an AI chip

Different types of AI chips suit different tasks. GPUs are most commonly used for the initial development and refinement of AI algorithms, a process known as “training.” FPGAs are mostly used to apply trained AI algorithms to real-world data inputs, often referred to as “inference.” ASICs can be designed for either training or inference.
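The training/inference split can be shown with a deliberately tiny model: an iterative loop that fits a weight (training), followed by a single forward pass on new input (inference). The model and constants are invented for illustration; real workloads just run these two phases at vastly larger scale on different chips:

```python
import numpy as np

# Tiny linear model y = w * x fitted by gradient descent ("training"),
# then applied to an unseen input ("inference").
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100).astype(np.float32)
y = 3.0 * x                       # ground truth: w = 3

w = 0.0
for _ in range(200):              # training: many iterative weight updates
    grad = np.mean(2 * (w * x - y) * x)   # gradient of mean squared error
    w -= 0.5 * grad

prediction = w * 2.0              # inference: a single cheap forward pass
assert abs(w - 3.0) < 1e-3
```

Training is the expensive, compute-hungry loop (hence GPUs and training ASICs); inference is the lightweight repeated forward pass (hence FPGAs and inference ASICs).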

Why do we need cutting-edge AI chips for AI?

Because of their unique features, AI chips are tens or even thousands of times faster and more efficient than CPUs at training and running inference on AI algorithms. Due to their greater efficiency on AI workloads, state-of-the-art AI chips are also significantly more cost-effective than state-of-the-art CPUs. An AI chip a thousand times as efficient as a CPU provides an improvement equivalent to 26 years of Moore’s-Law-driven CPU progress.
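The 26-year figure follows from back-of-the-envelope arithmetic. Note the doubling period of roughly 2.6 years is an assumption chosen here so the arithmetic reproduces the article’s number; classic statements of Moore’s Law use 18 to 24 months:

```python
import math

# Express a 1,000x efficiency gain as years of Moore's-Law CPU progress.
speedup = 1000
doublings = math.log2(speedup)                 # ~9.97 doublings needed
years_per_doubling = 2.6                       # assumed doubling period
print(round(doublings * years_per_doubling))   # -> 26
```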

Cutting-edge AI systems require not just AI-specific chips but state-of-the-art AI chips. Older AI chips, with their larger, slower, and more power-hungry transistors, incur enormous energy costs that quickly balloon to unsustainable levels. Using older AI chips today therefore means overall costs and slowdowns at least an order of magnitude greater than with state-of-the-art AI chips.

These cost and speed dynamics make it virtually impossible to develop and deploy cutting-edge AI algorithms without state-of-the-art AI chips. Even with them, training an AI algorithm can cost tens of millions of dollars and take weeks to complete.

With general-purpose chips, training would take considerably longer and cost orders of magnitude more, making it virtually impossible to stay at the research and deployment frontier. Similarly, inference on less advanced or less specialized chips could involve similar cost overruns and take orders of magnitude longer.

Overview of AI chip-related technologies

  • Video/Image: Face Recognition, Object Detection, Image Generation, Video Analysis, Video Content Audit, Image Beautify, Search by Image, AR.
  • Sound and Speech: Speech Recognition, Language Synthesis, Voice Wake-up, Voiceprint Recognition, Music Generation, Intelligent SoundBox, Intelligent Navigation.
  • NLP: Text Analysis, Language Translation, Human-Machine Communication, Reading Comprehension, Recommendation System
  • Control: Autopilot, UAV, Robotics, Industrial Automation
  • Neural network topology: Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Spiking Neural Network (SNN).
  • Deep neural networks: AlexNet, ResNet, VGGNet, GoogLeNet.
  • Neural Network Algorithms: Back Propagation, Transfer Learning, Reinforcement Learning, One-shot Learning, Adversarial Learning, Neural Turing Machine, STDP.
  • Machine Learning Algorithms: Support Vector Machine (SVM), K-Nearest Neighbor, Bayesian, Decision Tree, Markov Chain, Adaboost, Word Embedding.
  • Chip Performance Optimizations: Optimization on efficiency, low power, high-speed, flexibility, which are applied for deep learning accelerators, and face recognition chip.
  • Neuromorphic Chip: Brain-inspired Computing, Biological Brain Stimulation, Brain Mechanism Simulation.
  • Programmable Chips: Focus on flexibility, programmability, algorithm compatibility, and software compatibilities, such as DSP, GPU, and FPGA.
  • System-on-chip Architecture: Multi-core, Many-core, SIMD, Arithmetic Unit Arrays, Memory Interface, Network-on-chip, Multi-chip Interconnection, Communication Channels, Multi-level Cache.
  • Development Tool-chain: Interfaces to Deep Learning Frameworks (TensorFlow, Caffe), Compiler, Simulator, Optimizer (Quantization, Pruning), Atomic Operations (Network Layers) Library.
  • High Bandwidth off-chip Memory: HBM, DRAM, High-speed GDDR, LPDDR, STT-MRAM.
  • High-Speed Interface: SerDes, Optical Communication
  • Bionic Devices (Artificial Synapses, Artificial Neurons): Memristors
  • New Computing Devices: Analog Computing, Memory Computing (In-memory Computing)
  • On-Chip Memory (Synaptic Array): Distributed SRAM, ReRAM, PCRAM, etc.
  • CMOS Process: Process Node (16, 7, 5 nm)
  • CMOS 3D Stacking: 2.5D IC/SiP, 3D-stack Technology, Monolithic 3D, etc.
  • New Technologies: 3D NAND, Flash Tunneling FETs, FeFET, FinFET

At present, AI chips are still in their infancy, and many uncertainties remain. Research on AI chips is nonetheless making significant progress in neural-network-based machine learning, which is considered superior to human intelligence on some compute-intensive problems. When it comes to cognitive problems, however, the field is still embryonic and has a long way to go before achieving general-purpose intelligence (artificial general intelligence, AGI). AGI’s ideal computing capability and energy efficiency would need to be at least several orders of magnitude higher than those of today’s AI chips.