Run YOLOv10 on LattePanda Mu (Intel N100 Processor) with OpenVINO
Preface
In the rapidly evolving fields of artificial intelligence and machine learning, object detection models play a crucial role in applications ranging from autonomous driving to security systems. Among these models, YOLO (You Only Look Once) has gained widespread attention for its outstanding speed and accuracy. With the growing demand for more powerful and efficient models, the release of YOLOv10 marks a significant milestone. To explore how the newly released YOLOv10 performs on the Intel N100 chip, we ran an experiment on the LattePanda Mu, an x86 compute module built around the Intel N100 processor. In this study, we deploy and run YOLOv10 through several methods to evaluate its practical performance under different configurations and conditions. The LattePanda Mu compute module has enough computing power to run a variety of machine learning and deep learning frameworks at speed. If you are interested, you can refer to our previous test documentation of YOLOv8 running on the LattePanda Mu compute module. Now, let's delve into the performance improvements of YOLOv10 and its potential to revolutionize object detection tasks.
Native Model
YOLOv10 comes in various model scales to cater to different application needs:
- YOLOv10-N: Nano version for extremely resource-constrained environments.
- YOLOv10-S: Small version balancing speed and accuracy.
- YOLOv10-M: Medium version for general-purpose use.
- YOLOv10-B: Balanced version with increased width for higher accuracy.
- YOLOv10-L: Large version for higher accuracy at the cost of increased computational resources.
- YOLOv10-X: Extra-large version for maximum accuracy and performance.
Due to the limitations of SBC hardware, we will use the YOLOv10n model for this experiment. Because the YOLOv10 model branch is currently unstable (with no clear optimization timeline from Ultralytics), it is not recommended to call the v10 model directly via `from ultralytics import YOLO`, as this may raise errors. Instead, we can temporarily follow the YOLOv10 GitHub tutorial to deploy and call the model locally.
1. First, download the GitHub project to your local machine:

```shell
git clone https://github.com/THU-MIG/yolov10.git
cd yolov10
```
2. Create the environment and install dependencies:

```shell
conda create -n yolov10 python=3.9
conda activate yolov10
pip install -r requirements.txt
pip install -e .
```
3. Run the YOLOv10 test code:

```shell
python yolov10.py
```
It can be observed that, without any optimization, running YOLOv10n directly on the Mu's CPU yields slow inference: roughly 1-3 FPS, with significant lag that makes it difficult to use in streaming detection environments.
![Figure: Original YOLOv10n model running on LattePanda Mu compute module](https://dfimg.dfrobot.com/6209babbaa9508d63a41b1bb/cmsen/28675b3e95f238e5d96c0be1905c9b4f.jpg)
Figure: Original YOLOv10n model running on LattePanda Mu compute module
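The FPS figures quoted throughout this test can be reproduced with a small rolling counter wrapped around the inference loop. The sketch below is a hypothetical helper (not part of the YOLOv10 repository), written in plain Python so it works with any of the inference backends discussed here:

```python
import time
from collections import deque


class FPSCounter:
    """Rolling-average FPS over the last `window` frames."""

    def __init__(self, window=30, clock=time.perf_counter):
        self.times = deque(maxlen=window)
        self.clock = clock  # injectable time source, handy for testing

    def tick(self):
        """Call once per processed frame; returns the current FPS estimate."""
        self.times.append(self.clock())
        if len(self.times) < 2:
            return 0.0
        elapsed = self.times[-1] - self.times[0]
        return (len(self.times) - 1) / elapsed if elapsed > 0 else 0.0
```

Calling `tick()` once after each `model.predict(frame)` call gives a smoothed FPS reading that can be drawn onto the output frame.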
ONNX Model
To accelerate inference, we will convert the model to ONNX format. The `dynamic=True` option allows the exported ONNX model to accept variable input image sizes at inference time. The export code is shown below:

```python
from ultralytics import YOLO

# Load the official YOLOv10n model
model = YOLO("yolov10n.pt")

# Export the model to ONNX with dynamic input shapes
model.export(format="onnx", dynamic=True)
```
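Before the exported ONNX model can consume a camera frame, the HxWx3 uint8 BGR image must be turned into a 1x3xHxW float32 tensor. The helper below is an illustrative NumPy sketch of the standard YOLO-style preprocessing (BGR-to-RGB flip, 0-1 scaling, channel-first layout); the exact normalization in your inference script may differ:

```python
import numpy as np


def preprocess(frame: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 BGR frame into a 1x3xHxW float32 tensor.

    Steps: BGR->RGB channel flip, HWC->CHW transpose,
    scale to [0, 1], then add a batch dimension.
    """
    rgb = frame[:, :, ::-1]                  # BGR -> RGB
    chw = rgb.transpose(2, 0, 1)             # HWC -> CHW
    tensor = chw.astype(np.float32) / 255.0  # uint8 -> float32 in [0, 1]
    return tensor[np.newaxis, ...]           # add batch dimension
```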
Next, place the Python file and model file in the current directory. You can download the converted model and sample code from the following GitHub link, or use these commands:

```shell
git clone https://github.com/winster-bai/yolo_on_Mu.git
cp yolo_on_Mu/yolo10.py path_to_your_yolov10_file    # change to your path
cp yolo_on_Mu/yolo10n.pt path_to_your_yolov10_file
cp yolo_on_Mu/text_onnx.py path_to_your_yolov10_file
```
After running the model, you may find that inference speed has not improved much compared to the native model. However, don't forget that the dynamic ONNX model accepts variable input resolutions, and the smaller the input size, the faster the inference. Camera frames are typically processed at 640x640. To increase speed, we can use `cv2.resize()` in the code to shrink the input image to 128x128:

```python
frame = cv2.resize(frame, (128, 128))
```
After compressing the images, the model inference speed significantly improved, with FPS increasing to 25-30.
![Figure: 128x128 input size running on the ONNX model](https://dfimg.dfrobot.com/6209babbaa9508d63a41b1bb/cmsen/b89621f26781c69a4c579007fad74026.png)
Figure: 128x128 input size running on the ONNX model
However, it's important to note that compressing the input this aggressively can reduce detection accuracy, since fine pixel detail is lost. You can trade inference speed against accuracy by choosing an intermediate input size such as 224x224 or 416x416, depending on your specific needs.
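One way to soften that tradeoff is to letterbox the frame instead of stretching it: scale the image while preserving its aspect ratio and pad the remainder, so objects keep their proportions even at small input sizes. The sketch below is a hypothetical helper that uses plain NumPy nearest-neighbor scaling to stay self-contained; in practice `cv2.resize` would handle the scaling step:

```python
import numpy as np


def letterbox(frame: np.ndarray, size: int = 128, pad_value: int = 114) -> np.ndarray:
    """Fit `frame` into a size x size square without distortion.

    Scales by the limiting dimension (nearest-neighbor), then centers the
    result on a gray canvas, unlike a plain resize to (size, size) which
    stretches objects.
    """
    h, w = frame.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbor source index maps for the scaled image
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    scaled = frame[rows][:, cols]
    canvas = np.full((size, size, frame.shape[2]), pad_value, dtype=frame.dtype)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = scaled
    return canvas
```

Remember that the detection boxes then refer to letterboxed coordinates, so the padding offset and scale must be undone before drawing them on the original frame.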
OpenVINO Optimization
OpenVINO (Open Visual Inference and Neural Network Optimization) is an open-source toolkit developed by Intel aimed at accelerating high-performance computer vision and deep learning applications. Through it, we can significantly enhance the inference speed of the YOLOv10 model.
Download the `yolov10-optimization.ipynb` notebook from OpenVINO.
Following the steps in the `yolov10-optimization.ipynb` notebook (you can skip the model validation section), the image inference results are as follows:
![run YOLOv10 on LattePanda Mu - CPU image inference 2ms](https://dfimg.dfrobot.com/6209babbaa9508d63a41b1bb/cmsen/33f612d5c07ebaccced63ae3c3110e36.png)
Figure: CPU image inference 2ms
![run YOLOv10 on LattePanda Mu - GPU image inference 1ms](https://dfimg.dfrobot.com/6209babbaa9508d63a41b1bb/cmsen/454f79aec7818b4210373ab24aba24ae.png)
Figure: GPU image inference 1ms
Using OpenVINO to accelerate:
Running the FP16 model on GPU stabilizes the frame rate at around 12 FPS.
![Running the FP16 model on GPU](https://dfimg.dfrobot.com/6209babbaa9508d63a41b1bb/cmsen/b0eee57516d3b5084392f96446cd1f27.png)
Figure: Running the FP16 model on GPU
![Running the FP16 model on CPU](https://dfimg.dfrobot.com/6209babbaa9508d63a41b1bb/cmsen/acf1848f55445df38e69ac4ee25bf5f4.png)
Figure: Running the FP16 model on CPU
After quantizing the YOLOv10n model to INT8, the inference speed significantly improves, reaching around 18-19 FPS.
![Run int8 format YOLOv10 on OpenVINO](https://dfimg.dfrobot.com/6209babbaa9508d63a41b1bb/cmsen/4cef6b27d3a1373fbe4b2857a2475dd9.png)
Figure: Run int8 format YOLOv10 on OpenVINO
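The INT8 speedup comes from replacing floating-point arithmetic with 8-bit integer arithmetic plus a scale factor. The toy round trip below illustrates the idea with symmetric per-tensor quantization; it is a simplified sketch, while OpenVINO's NNCF pipeline derives its scales from calibration data:

```python
import numpy as np


def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization: x is approximated by q * scale."""
    scale = float(np.abs(x).max()) / 127.0
    if scale == 0.0:
        scale = 1.0  # avoid division by zero for an all-zero tensor
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover the approximate float values from the INT8 tensor."""
    return q.astype(np.float32) * scale
```

Each weight shrinks from 4 bytes to 1, and the rounding error is bounded by half a quantization step, which is why accuracy usually drops only slightly while throughput improves.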
Summary
The versatility of YOLOv10 extends to various industries such as real-time surveillance, autonomous driving, and smart agriculture, providing robust solutions for complex object detection tasks. In this test, we showcased the performance of YOLOv10 on the Intel N100 using the LattePanda Mu compute module and discussed how to enhance model inference speed through ONNX and OpenVINO. Here's a comparison of performance using different inference methods for YOLOv10:
![The performance of YOLOv10 with ONNX and OpenVINO on the LattePanda Mu compute module](https://dfimg.dfrobot.com/6209babbaa9508d63a41b1bb/cmsen/94ddae09c4807d71245827fd1607984f.png)
Table: The performance of YOLOv10 with ONNX and OpenVINO on the LattePanda Mu compute module
While significant improvements in inference speed were achieved, balancing model accuracy against inference speed remains crucial in practical applications. We hope this article provides readers with optimization insights and methods for real-world deployments. If you have any questions or suggestions, feel free to discuss them in our Discord.