CPU vs GPU: Which is Better for AI Models?

Why do some AI models use CPUs while others use GPUs?

The computational tasks of AI models typically involve extensive matrix operations and parallel processing, with CPUs and GPUs having distinct advantages and disadvantages in this regard.

The CPU (central processing unit) is a general-purpose processor capable of executing many kinds of instructions, such as logical comparisons, branches, and memory accesses. CPUs typically have relatively few cores, but each core has a large cache and multiple arithmetic logic units (ALUs), and can support multithreading to improve computational efficiency. CPUs are well suited to complex, irregular computing tasks, but for large-scale, simple matrix operations they may be limited by factors such as cache capacity, memory bandwidth, and thread switching.

The GPU (graphics processing unit) is a specialized processor originally designed for graphics rendering, capable of executing a large number of identical or similar instructions at once, such as those in matrix multiplication and vector addition. GPUs typically have many cores (e.g. 3800 cores), but each core has a smaller cache and fewer arithmetic logic units, and the cores generally work on data that has been copied into GPU memory. GPUs are well suited to simple, regular computing tasks, but for complex, irregular tasks they may be held back by factors such as simpler control units, limited branch prediction, and synchronization overhead.

Therefore, whether an AI model uses a CPU or a GPU depends on its computational characteristics and requirements. In general, if an AI model needs to execute a large number of matrix operations in parallel, a GPU will be more efficient; if it needs to perform complex, irregular computing tasks, a CPU will be more flexible.
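To make the contrast between regular and irregular work concrete, here is a minimal Python sketch. The function names (matmul_cell, matmul, collatz_steps) are illustrative and do not come from any library. The thread pool is only there to show that every output cell of a matrix product can be computed independently; it is not a claim about real speedups, since Python threads do not run this arithmetic in parallel the way GPU cores would.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_cell(a, b, i, j):
    """Compute one output element: the dot product of row i of a and column j of b."""
    return sum(a[i][k] * b[k][j] for k in range(len(b)))


def matmul(a, b, workers=4):
    """Multiply two matrices by handing each output cell to a pool of workers.

    Every cell runs the same instructions on different data and depends on
    no other cell, which is the regular pattern that maps well onto many
    simple cores.
    """
    rows, cols = len(a), len(b[0])
    cells = [(i, j) for i in range(rows) for j in range(cols)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with `cells`.
        flat = list(pool.map(lambda ij: matmul_cell(a, b, *ij), cells))
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def collatz_steps(n):
    """Irregular work: the branch taken and the loop length depend on the data."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps
```

In matmul, the amount of work per cell is known in advance and identical for all cells. In collatz_steps, neither the branch direction nor the number of iterations can be predicted from the input size, which is the kind of control flow that favours a CPU core.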


Figure: CPU vs GPU for the deployment of deep learning models

CPU and GPU in AI Model Training

AI model training involves executing complex mathematical calculations on very large datasets. These calculations are commonly expressed as matrix operations and require substantial processing power. Although these computations can be performed on a CPU, training an AI model on a GPU is typically much faster and more efficient. Here are a few reasons why:

· Parallel processing: GPUs are designed to perform many operations in parallel, which makes them well suited to the matrix operations that dominate AI model training.

· Specialized hardware: GPUs are built around large numbers of arithmetic units optimized for high-throughput numerical calculations, whereas CPUs are intended for more general-purpose computing tasks.

· Memory bandwidth: Training an AI model involves moving large amounts of data. GPUs have significantly higher memory bandwidth than CPUs, which means that they can feed data from GPU memory to their cores much faster. Data must still be copied between CPU memory and GPU memory, and that transfer is usually slower than access to the GPU's own memory.
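The memory bandwidth point can be illustrated with a little arithmetic: the time to move data is simply its size divided by the available bandwidth. The sketch below is a rough illustration only. The two bandwidth constants are assumed round numbers chosen for the example, not measurements of any particular device, and the helper name transfer_seconds is hypothetical.

```python
def transfer_seconds(num_bytes, bandwidth_gb_per_s):
    """Return the time in seconds to move num_bytes at the given bandwidth.

    Bandwidth is in GB/s, with 1 GB taken as 1e9 bytes.
    """
    return num_bytes / (bandwidth_gb_per_s * 1e9)


# Assumed, illustrative figures only (not taken from the article or a datasheet):
HOST_TO_GPU_GB_PER_S = 16.0   # link between CPU memory and GPU memory
GPU_MEMORY_GB_PER_S = 900.0   # GPU cores reading from the GPU's own memory

# One batch of 256 float32 images, each 3 x 224 x 224 values, 4 bytes per value.
batch_bytes = 256 * 3 * 224 * 224 * 4

host_to_gpu_time = transfer_seconds(batch_bytes, HOST_TO_GPU_GB_PER_S)
on_gpu_read_time = transfer_seconds(batch_bytes, GPU_MEMORY_GB_PER_S)

print(f"batch size: {batch_bytes / 1e6:.1f} MB")
print(f"copy from CPU memory to GPU memory: {host_to_gpu_time * 1e3:.2f} ms")
print(f"read once from GPU memory: {on_gpu_read_time * 1e3:.2f} ms")
```

Under these assumptions the copy across the CPU-to-GPU link takes far longer than a read from GPU memory, which is why training pipelines try to keep data on the GPU once it has been transferred.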

The following figure compares the throughput of a GPU cluster with that of a 5-node CPU cluster running 35 pods, for all models in each framework.

Figure: CPU vs GPU for the deployment of deep learning models

The results indicate that GPU clusters consistently outperform CPU clusters in terms of throughput for all models and frameworks.

The Role of CPUs in AI Models

· Model training: Classical machine learning models, which have far fewer parameters than deep learning models, can still be trained effectively and cost-efficiently on CPUs. Here are some examples of such models:

- Linear regression: The number of parameters equals the number of input variables plus one (one weight per input, plus an intercept).

- Logistic regression: For binary classification, the number of parameters equals the number of input variables plus one.

- Decision trees: The number of parameters is determined by the number of tree nodes.

- Support vector machines: The number of parameters is roughly the product of the number of support vectors and the number of output categories.
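The parameter counts for the two regression models above are simple enough to write down directly. The following is a small sketch with hypothetical helper names; it assumes a single-output linear regression, a binary logistic regression, and 4-byte (float32) parameters for the memory estimate.

```python
def linear_model_param_count(n_inputs):
    """One weight per input variable plus one intercept term.

    Applies to single-output linear regression and to binary logistic
    regression, as described in the list above.
    """
    return n_inputs + 1


def param_memory_bytes(n_params, bytes_per_param=4):
    """Memory needed to store the parameters, assuming float32 by default."""
    return n_params * bytes_per_param


n_inputs = 100  # assumed example: a dataset with 100 input variables
n_params = linear_model_param_count(n_inputs)
print(f"{n_inputs} inputs -> {n_params} parameters, "
      f"{param_memory_bytes(n_params)} bytes as float32")
```

A model this small fits comfortably in a CPU cache, which is part of why training it on a CPU remains practical.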

Figure: Types of Machine Learning

· ML inference:

For many ML inference workloads, CPUs are still the most practical option.

Intel offers optimized versions of popular AI frameworks, along with a range of libraries and tools for end-to-end AI development, including inference, to complement the AI acceleration capabilities integrated into its hardware architectures.

Figure: Machine learning inference


How to Optimize ML Inference on a CPU or MPU?

· For x86 SBCs:

The Intel® Distribution of OpenVINO™ toolkit enables practitioners to optimize, tune, and run AI inference using its included model optimizer, runtime, and development tools. It supports many popular AI frameworks, including TensorFlow, ONNX, PyTorch, and Keras, and allows applications to be deployed across combinations of accelerators and environments, including CPUs, GPUs, and VPUs, from the edge to the cloud.

Furthermore, libraries such as MKL-DNN (now oneDNN) and NNPACK can be used to optimize CPU performance, just as TensorRT can be used to optimize GPU performance.

Figure: OpenVINO


· For ARM SBCs:

The Arm NN SDK is a set of open-source Linux software tools that enables machine learning workloads on power-efficient devices. This inference engine provides a bridge between existing neural network frameworks and power-efficient Arm Cortex-A CPUs, Arm Mali GPUs, and Ethos NPUs, and is designed to accelerate ML on that hardware.